LabelledGeneric in Rust: What, Why, How?
What is LabelledGeneric? How does one encode type-level Strings in Rust? What is a labelled HList?
Hold on, let’s take a step back.
In a previous post about implementing Generic in Rust, I briefly mentioned the fact that Generic could cause silent failures at runtime if you have 2 structs that are identically shaped type-wise, but have certain fields swapped.
While we can work around this using wrapper types, that solution leaves something to be desired, because, well, more boilerplate adds noise and requires more maintenance.
Ideally, we want to have something like this, where the following works:
but the following fails at compile-time because the fields are mis-matched (first_name and last_name have been swapped):
The solution to this sort of problem has been in Shapeless for some time: use HLists in which each cell holds not just a value but a named field, so that each value is labelled at the type level.
Let’s take a look at how Frunk implements Field values and LabelledGeneric in Rust :)
Add Frunk to your project
Frunk is published to Crates.io, so to begin, add the crate to your list of dependencies:
Why? (Motivation)
Silent runtime errors with Generic
To illustrate the problem, observe that the following 2 structs have the exact same “shape”
That is, the Generic representation of both structs is simply HList![&'a str, &'a str, usize]. As a result, when we do the following:
Oh no! s_user has first_name and last_name flipped :(
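The failure mode can be reproduced without Frunk at all. The following is only a rough sketch: the Generic trait here is a simplified, hand-written stand-in (a tuple instead of an HList, no derive), and the struct definitions and the convert_from helper are assumptions modelled on the description above rather than Frunk's actual code.

```rust
// Simplified, hand-written stand-in for Frunk's Generic (assumption):
// the representation is a plain tuple rather than an HList.
trait Generic {
    type Repr;
    fn into_repr(self) -> Self::Repr;
    fn from_repr(repr: Self::Repr) -> Self;
}

#[derive(Debug)]
struct NewUser<'a> {
    first_name: &'a str,
    last_name: &'a str,
    age: usize,
}

// Same field types as NewUser, but the first two fields are swapped.
#[derive(Debug)]
struct JumbledUser<'a> {
    last_name: &'a str,
    first_name: &'a str,
    age: usize,
}

impl<'a> Generic for NewUser<'a> {
    type Repr = (&'a str, &'a str, usize);
    fn into_repr(self) -> Self::Repr {
        (self.first_name, self.last_name, self.age)
    }
    fn from_repr(r: Self::Repr) -> Self {
        NewUser { first_name: r.0, last_name: r.1, age: r.2 }
    }
}

impl<'a> Generic for JumbledUser<'a> {
    type Repr = (&'a str, &'a str, usize);
    fn into_repr(self) -> Self::Repr {
        (self.last_name, self.first_name, self.age)
    }
    fn from_repr(r: Self::Repr) -> Self {
        JumbledUser { last_name: r.0, first_name: r.1, age: r.2 }
    }
}

// Converts between any two types that share the same representation.
fn convert_from<A, B, R>(a: A) -> B
where
    A: Generic<Repr = R>,
    B: Generic<Repr = R>,
{
    B::from_repr(a.into_repr())
}

fn main() {
    let n_user = NewUser { first_name: "Joe", last_name: "Blow", age: 30 };
    // Compiles without complaint, because both Repr types are identical.
    let s_user: JumbledUser = convert_from(n_user);
    // first_name is now "Blow" and last_name is "Joe".
    println!("{:?}", s_user);
}
```

Nothing in the types distinguishes the first &str from the second, so the compiler has no way to notice the swap.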
As explained near the end of the post introducing Generic, you can catch this sort of mistake by introducing wrapper types like FirstName<'a>(&'a str) for each field, but that introduces more boilerplate. This sucks, because Generic is supposed to help avoid boilerplate!
Can we have our cake and eat it too?
LabelledGeneric to the rescue
LabelledGeneric was introduced in v0.1.12 of Frunk to solve this exact problem. This is how you use it.
There isn’t a whole lot different to using LabelledGeneric vs using Generic:
- Instead of deriving Generic, derive LabelledGeneric
- Instead of calling convert_from, call labelled_convert_from
These 2 changes buy you a lot more type-safety at compile time, with zero boilerplate. By the way, if you’d like the compiler to automatically “align” the generic representations so that you could instantiate a JumbledUser from a NewUser, then stay tuned for a later post ;)
The tl;dr version of how this works is that by deriving LabelledGeneric, we make the struct an instance of the LabelledGeneric typeclass. This typeclass is almost identical to the Generic typeclass, but the derive does something a bit different with the generic representation of the struct: it isn’t just an HList wrapping naked values.
Instead, the generic representation will be an HList where each cell will contain field name information, at the type-level, and conceptually has the following types:
This difference in type-level representation is how the compiler knows that one can’t simply convert a NewUser or SavedUser into a JumbledUser via labelled_convert_from.
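To make the difference concrete, here is a small std-only sketch that compares the three representations as types. Everything in it is an assumption made for illustration: the marker enums first_name, last_name and age stand in for the real type-level names, nested tuples stand in for HList cells, and Field is a local copy of the idea rather than Frunk's struct.

```rust
use std::any::TypeId;
use std::marker::PhantomData;

// Hypothetical marker types standing in for type-level field names.
#[allow(non_camel_case_types)]
enum first_name {}
#[allow(non_camel_case_types)]
enum last_name {}
#[allow(non_camel_case_types)]
enum age {}

// Local stand-in for a labelled cell: a value tagged with a name type.
#[allow(dead_code)]
struct Field<Name, Type> {
    name: PhantomData<Name>,
    value: Type,
}

// Nested tuples of (head, tail) stand in for HList cells, ending in ().
type NewUserRepr = (
    Field<first_name, &'static str>,
    (Field<last_name, &'static str>, (Field<age, usize>, ())),
);
type SavedUserRepr = (
    Field<first_name, &'static str>,
    (Field<last_name, &'static str>, (Field<age, usize>, ())),
);
// Same value types, but the first two labels are swapped.
type JumbledUserRepr = (
    Field<last_name, &'static str>,
    (Field<first_name, &'static str>, (Field<age, usize>, ())),
);

// True only if A and B are exactly the same type.
fn same_repr<A: 'static, B: 'static>() -> bool {
    TypeId::of::<A>() == TypeId::of::<B>()
}

fn main() {
    // NewUser and SavedUser agree on labels and order.
    println!("{}", same_repr::<NewUserRepr, SavedUserRepr>());
    // JumbledUser does not, even though the value types line up.
    println!("{}", same_repr::<NewUserRepr, JumbledUserRepr>());
}
```

Because the labels are part of each cell's type, a conversion that requires both sides to have the same representation no longer type-checks for the jumbled struct.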
Field ??
What is Field? It’s simply a container struct that is parameterised by 2 types, and has the following signature:
The first type parameter is Name and its purpose is to contain a type-level String, and the second type parameter is Type, which reflects the type of value contained inside the struct.
It may help to think of Field as an ad-hoc wrapper type.
How it works
Field<Name, Type>
The full definition of Field is currently as follows:
PhantomData is used to allow us to bind a concrete type to the Name type parameter in an instance of Field without actually having it take up any space (for more details on PhantomData, refer to the official docs).
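A minimal sketch of such a struct, assuming only what is described above (a Name parameter held in PhantomData and a value of type Type). The make_field helper and the age_label marker are hypothetical names used for illustration and are not part of Frunk.

```rust
use std::marker::PhantomData;
use std::mem::size_of;

// Hypothetical marker type used as a label in this sketch.
#[allow(non_camel_case_types)]
enum age_label {}

// A value tagged with a name that exists only at the type level.
#[allow(dead_code)]
struct Field<Name, Type> {
    name: PhantomData<Name>,
    value: Type,
}

// Hypothetical constructor so callers never touch PhantomData directly.
fn make_field<Name, Type>(value: Type) -> Field<Name, Type> {
    Field { name: PhantomData, value }
}

fn main() {
    let f: Field<age_label, usize> = make_field(3);
    println!("value = {}", f.value);
    // The label adds no bytes: the Field is as large as the value it wraps.
    println!(
        "{} == {}",
        size_of::<Field<age_label, usize>>(),
        size_of::<usize>()
    );
}
```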
To construct a Field, Frunk exposes a macro called field! so that you don’t need to touch PhantomData yourself.
For more information about the field! macro, please refer to its Rustdoc page. Astute readers will notice the odd (a,g,e) type used for naming. What is that about ???
Type-level characters and strings
In order to represent characters at the type level, Frunk currently uses enums that have zero variants. This is because empty enums are distinct types, yet they cannot be instantiated at runtime and are thus guaranteed to incur zero cost.
Conceptually, we declare one enum for every character we want to represent:
This means that characters outside the English alphanumeric range will need to be specially encoded (the LabelledGeneric derivation uses Unicode, but more on this later), but for the most part, this should suffice for the use case of encoding field names as types.
As you may have guessed, type-level strings are then simply represented as tuple types, hence (a,g,e). For the sake of reducing noise, in the rest of this post, we will refer to these name-types without commas and parentheses.
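The idea can be tried out with nothing but the standard library. In this sketch the chars module, the Age and Ega aliases and the same_type helper are illustrative assumptions; only the technique (one empty enum per character, a tuple per string) comes from the description above.

```rust
use std::any::TypeId;
use std::mem::size_of;

// One empty enum per character (only the three needed here).
#[allow(non_camel_case_types, dead_code)]
mod chars {
    pub enum a {}
    pub enum e {}
    pub enum g {}
}
use chars::{a, e, g};

// A type-level string is a tuple of type-level characters.
type Age = (a, g, e);
type Ega = (e, g, a);

// True only if A and B are exactly the same type.
fn same_type<A: 'static, B: 'static>() -> bool {
    TypeId::of::<A>() == TypeId::of::<B>()
}

fn main() {
    // Neither a character nor a whole string occupies any space.
    println!("{} {}", size_of::<a>(), size_of::<Age>());
    // The same characters in a different order form a different type.
    println!("{}", same_type::<Age, Ega>());
}
```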
Note: This type-level encoding of strings may change in the future.
(Anonymous) Records!
Combining the Field and HList constructs gets us something else: Records. I believe once upon a time, Rust supported anonymous structs; well, you can get most of that functionality back with Frunk!
This kind of thing is sometimes called an “anonymous Record” in Scala (see scala-records, or Shapeless).
In the future, the anonymous Records API in Frunk might be improved. As it stands, it exists mostly for the purpose of LabelledGeneric and is a bit noisy to use.
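As a rough illustration of what a record looks like structurally, here is a std-only sketch. HNil, HCons, Field and make_field are local stand-ins written for this example (Frunk's own HList types and field! macro are not used), and the joe function is purely hypothetical.

```rust
use std::marker::PhantomData;

// Minimal local HList: an empty list and a cons cell.
struct HNil;
#[allow(dead_code)]
struct HCons<H, T> {
    head: H,
    tail: T,
}

// A value tagged with a type-level name.
#[allow(dead_code)]
struct Field<Name, Type> {
    name: PhantomData<Name>,
    value: Type,
}

fn make_field<Name, Type>(value: Type) -> Field<Name, Type> {
    Field { name: PhantomData, value }
}

// Type-level characters for the labels "name" and "age".
#[allow(non_camel_case_types, dead_code)]
mod chars {
    pub enum a {}
    pub enum e {}
    pub enum g {}
    pub enum m {}
    pub enum n {}
}
use chars::{a, e, g, m, n};

// An anonymous record: no struct declaration, just labelled cells.
type Record = HCons<Field<(n, a, m, e), &'static str>, HCons<Field<(a, g, e), usize>, HNil>>;

// The labels are inferred from the declared return type.
fn joe() -> Record {
    HCons {
        head: make_field("Joe"),
        tail: HCons { head: make_field(30), tail: HNil },
    }
}

fn main() {
    let record = joe();
    println!("{} is {}", record.head.value, record.tail.head.value);
}
```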
Field and LabelledGeneric
So, what is the relationship between Field and the LabelledGeneric typeclass?
Quite simply, the associated Repr type of an instance of LabelledGeneric should have the type of an anonymous record (labelled HList).
So, given the following
This is one possible implementation of LabelledGeneric for Person:
But writing that yourself is tedious and error-prone, so Frunk provides a derivation for you.
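For a feel of what a hand-written instance involves, here is a sketch that compiles with only the standard library. It assumes a Person struct with a name: String and an age: usize field, and it defines its own simplified LabelledGeneric trait, HList cells and Field type; the method names into_repr and from_repr are chosen for this sketch and are not claimed to match Frunk's.

```rust
use std::marker::PhantomData;

struct HNil;
#[allow(dead_code)]
struct HCons<H, T> {
    head: H,
    tail: T,
}

#[allow(dead_code)]
struct Field<Name, Type> {
    name: PhantomData<Name>,
    value: Type,
}

fn make_field<Name, Type>(value: Type) -> Field<Name, Type> {
    Field { name: PhantomData, value }
}

#[allow(non_camel_case_types, dead_code)]
mod chars {
    pub enum a {}
    pub enum e {}
    pub enum g {}
    pub enum m {}
    pub enum n {}
}
use chars::{a, e, g, m, n};

// Simplified local version of the typeclass (assumption).
trait LabelledGeneric {
    type Repr;
    fn into_repr(self) -> Self::Repr;
    fn from_repr(repr: Self::Repr) -> Self;
}

// Assumed shape of the example struct.
struct Person {
    name: String,
    age: usize,
}

impl LabelledGeneric for Person {
    // Each cell carries the field's name (as a tuple of characters) and type.
    type Repr = HCons<Field<(n, a, m, e), String>, HCons<Field<(a, g, e), usize>, HNil>>;

    fn into_repr(self) -> Self::Repr {
        HCons {
            head: make_field(self.name),
            tail: HCons { head: make_field(self.age), tail: HNil },
        }
    }

    fn from_repr(repr: Self::Repr) -> Self {
        Person { name: repr.head.value, age: repr.tail.head.value }
    }
}

fn main() {
    let person = Person { name: "Joe".to_string(), age: 30 };
    // Round-trip through the labelled representation.
    let back = Person::from_repr(person.into_repr());
    println!("{} {}", back.name, back.age);
}
```

Even for two fields, the labels have to be spelled out character by character, once per struct, which is the tedium the derive removes.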
How the LabelledGeneric derivation is generated
As illustrated earlier, you can do the following to create an instance of LabelledGeneric for your struct:
It generates something conceptually similar to what we had above, so we won’t repeat that here.
That said, there is something special about the way that characters outside the range of the standard English alphabet and digits are handled. For each of those characters, we get the Unicode hexcode and use those digits, sandwiched by _uc and uc_ delimiters, as the type-level representation.
This allows us to effectively represent virtually any legal identifier at the type level, even when the ASCII-only restriction for identifiers is lifted from stable Rust. For more details, take a look at how characters are matched to identifiers here.
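The encoding scheme described above can be sketched as an ordinary function over strings. This is not Frunk's derive code: type_level_tokens is a hypothetical helper, the tokens are plain strings rather than types, and passing underscores through unchanged is an assumption (as is the exact spelling Frunk uses for individual digit types, which this sketch does not attempt to reproduce).

```rust
// Hypothetical helper: lists the sequence of type-level "characters" that
// would label a field name, following the _uc ... uc_ scheme described above.
fn type_level_tokens(field_name: &str) -> Vec<String> {
    let mut tokens = Vec::new();
    for c in field_name.chars() {
        if c.is_ascii_alphanumeric() || c == '_' {
            // Assumption: ASCII letters, digits and underscores map to themselves.
            tokens.push(c.to_string());
        } else {
            // Anything else becomes its hex code point between two delimiters.
            tokens.push("_uc".to_string());
            for digit in format!("{:x}", c as u32).chars() {
                tokens.push(digit.to_string());
            }
            tokens.push("uc_".to_string());
        }
    }
    tokens
}

fn main() {
    // U+00E9 is written as an escape so the input is unambiguous.
    // Prints ["c", "a", "f", "_uc", "e", "9", "uc_"]
    println!("{:?}", type_level_tokens("caf\u{e9}"));
}
```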
Conclusion
In closing, I’d like to stress that all the abstractions and techniques described in this post are type-safe (no casting happening) and thus get fully verified by Rust’s compiler and its strong type system.
As far as I am aware, this is the first implementation of labelled HLists (aka anonymous Records) and LabelledGeneric in Rust, and I hope this post did a good job of explaining what problems they solve, what they are, how they work, and why you might want to use them. As usual, please give them a go and chime in with questions, comments, ideas, or PRs!
Also, as alluded to in the section introducing LabelledGeneric, there is a way to automatically match up out-of-order fields. We’ll go through this in another post.