LabelledGeneric in Rust: What, Why, How?
What is LabelledGeneric? How does one encode type-level Strings in Rust? What is a labelled HList?
Hold on, let’s take a step back.
In a previous post about implementing Generic in Rust, I briefly mentioned the fact that Generic could cause silent failures at runtime if you have 2 structs that are identically shaped type-wise, but have certain fields swapped.
While we can work around this using wrapper types, that solution leaves something to be desired, because, well, more boilerplate adds noise and requires more maintenance.
Ideally, a conversion between two structs whose field names and types line up should just work, while a conversion between structs whose fields are mismatched (say, because first_name and last_name have been swapped) should fail at compile time.
Shapeless has had a solution to this sort of problem for some time: use HLists where each cell holds not just a bare value but a named field, so that each value is labelled at the type level.
Let’s take a look at how Frunk implements Field values and LabelledGeneric in Rust :)
Add Frunk to your project
Frunk is published to Crates.io, so to begin, add the crate to the [dependencies] section of your Cargo.toml.
Why? (Motivation)
Silent runtime errors with Generic
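To illustrate the problem, observe that 2 structs have the exact same “shape” whenever their fields have the same types in the same order, regardless of what those fields are called. Take a NewUser struct that declares first_name before last_name, and a second user struct that declares them the other way around, each followed by a usize field.
That is, the Generic representation of both is simply HList![&'a str, &'a str, usize]. As a result, when we build an instance of the second struct (s_user) from a NewUser via convert_from, the code compiles without complaint.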
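The swap can be reproduced without Frunk at all. The following is a minimal std-only sketch: the struct definitions, the age field, and the convert helper are assumptions made for illustration, and a plain tuple stands in for the unlabelled HList representation.

```rust
// Hypothetical structs: same field types in the same order,
// but first_name and last_name are declared in opposite order.
#[derive(Debug)]
pub struct NewUser<'a> {
    pub first_name: &'a str,
    pub last_name: &'a str,
    pub age: usize,
}

#[derive(Debug)]
pub struct SavedUser<'a> {
    pub last_name: &'a str,
    pub first_name: &'a str,
    pub age: usize,
}

// Stand-in for the shared, unlabelled generic representation.
pub type Repr<'a> = (&'a str, &'a str, usize);

impl<'a> NewUser<'a> {
    // Fields are emitted in declaration order.
    pub fn into_repr(self) -> Repr<'a> {
        (self.first_name, self.last_name, self.age)
    }
}

impl<'a> SavedUser<'a> {
    // Fields are filled in declaration order, so position 0 lands in last_name.
    pub fn from_repr(repr: Repr<'a>) -> Self {
        SavedUser { last_name: repr.0, first_name: repr.1, age: repr.2 }
    }
}

// Type-checks because both structs share the same Repr.
pub fn convert<'a>(user: NewUser<'a>) -> SavedUser<'a> {
    SavedUser::from_repr(user.into_repr())
}
```

Nothing in the types distinguishes the two &str positions, so the compiler has no way to object.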
Oh no! s_user has first_name and last_name flipped :(
As explained near the end of the post introducing Generic, you can catch this sort of mistake by introducing wrapper types like FirstName<'a>(&'a str) for each field, but that introduces more boilerplate. This sucks, because Generic is supposed to help avoid boilerplate!
Can we have our cake and eat it too?
LabelledGeneric to the rescue
LabelledGeneric was introduced in v0.1.12 of Frunk to solve this exact problem.
There isn’t a whole lot of difference between using LabelledGeneric and using Generic:
- Instead of deriving Generic, derive LabelledGeneric
- Instead of calling convert_from, call labelled_convert_from
These 2 changes buy you a lot more type-safety at compile time, with zero boilerplate. By the way, if you’d like the compiler to automatically “align” the generic representations so that you could instantiate a JumbledUser from a NewUser, then stay tuned for a later post ;)
The tl;dr version of how this works is that by deriving LabelledGeneric, we make the struct an instance of the LabelledGeneric typeclass. This typeclass is almost identical to the Generic typeclass, but the derive does something a bit different with the generic representation of the struct: it isn’t just an HList wrapping naked values.
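Instead, the generic representation will be an HList where each cell contains field name information at the type level. Conceptually, the representation of a struct that declares first_name before last_name starts with a Field<first_name, &'a str> cell followed by a Field<last_name, &'a str> cell, whereas a struct with those two fields swapped has the two cells in the opposite order.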
This difference in type-level representation is how the compiler knows that one can’t simply convert a NewUser or SavedUser into a JumbledUser via labelled_convert_from.
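To make the difference concrete, here is a minimal std-only sketch. HNil, HCons and Field are simplified stand-ins rather than Frunk's definitions, each field name is modelled as a single marker type instead of a type-level string, and the age field is an assumption. The comparison uses TypeId purely for demonstration; in real code the mismatch would surface as a compile error.

```rust
use std::any::TypeId;
use std::marker::PhantomData;

// Simplified stand-ins for a heterogeneous list and a labelled cell.
#[allow(dead_code)]
pub struct HNil;
#[allow(dead_code)]
pub struct HCons<H, T> {
    pub head: H,
    pub tail: T,
}
#[allow(dead_code)]
pub struct Field<Name, Type> {
    name: PhantomData<Name>,
    pub value: Type,
}

// One marker type per field name (a simplification of type-level strings).
#[allow(non_camel_case_types)]
pub enum first_name {}
#[allow(non_camel_case_types)]
pub enum last_name {}
#[allow(non_camel_case_types)]
pub enum age {}

// Conceptual representation of NewUser: first_name, last_name, age.
pub type NewUserRepr = HCons<
    Field<first_name, &'static str>,
    HCons<Field<last_name, &'static str>, HCons<Field<age, usize>, HNil>>,
>;

// SavedUser has the same field names in the same order, hence the same type.
pub type SavedUserRepr = NewUserRepr;

// JumbledUser declares last_name first, so its representation differs.
pub type JumbledUserRepr = HCons<
    Field<last_name, &'static str>,
    HCons<Field<first_name, &'static str>, HCons<Field<age, usize>, HNil>>,
>;

// Reports whether two representation types are the same type.
pub fn same_repr<A: 'static, B: 'static>() -> bool {
    TypeId::of::<A>() == TypeId::of::<B>()
}
```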
Field??
What is Field? It’s simply a container struct that is parameterised by 2 types, and has the signature Field<Name, Type>.
The first type parameter is Name and its purpose is to contain a type-level String, and the second type parameter is Type, which reflects the type of value contained inside the struct.
It may help to think of Field as an ad-hoc wrapper type.
How it works
Field<Name, Type>
The full definition of Field is currently small: a struct that pairs a PhantomData marker for Name with the contained value of type Type.
PhantomData is used to allow us to bind a concrete type to the Name type parameter in an instance of Field without actually having it take up any space (for more details on PhantomData, refer to the official docs).
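As a rough model of that idea, the sketch below defines a Field-like struct using only the standard library. The member names, the field helper function and the age marker type are assumptions for illustration, not Frunk's actual definitions; the point is only that the Name parameter adds no bytes to the value.

```rust
use std::marker::PhantomData;
use std::mem::size_of;

/// Simplified model of a labelled value (member names are assumptions).
pub struct Field<Name, Type> {
    // Zero-sized marker that ties Name to this instance.
    name: PhantomData<Name>,
    pub value: Type,
}

/// Hypothetical helper that hides PhantomData from the caller.
pub fn field<Name, Type>(value: Type) -> Field<Name, Type> {
    Field { name: PhantomData, value }
}

/// Marker type used purely as a name tag; it can never be instantiated.
#[allow(non_camel_case_types)]
pub enum age {}

/// Size in bytes of a usize labelled with the age marker.
pub fn age_field_size() -> usize {
    size_of::<Field<age, usize>>()
}
```

Because PhantomData is zero-sized, age_field_size() returns the same number as size_of::<usize>().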
To construct a Field, Frunk exposes a macro called field! so that you don’t need to touch PhantomData yourself.
For more information about the field! macro, please refer to its Rustdoc page. Astute readers will notice that a field named age is labelled with the odd-looking type (a,g,e). What is that about?
Type-level characters and strings
In order to represent characters at the type level, Frunk currently uses enums that have zero members. This is because empty enums have distinct types, and yet cannot be instantiated at runtime and thus are guaranteed to incur zero cost.
Conceptually, we declare one enum for every character we want to represent.
This means that characters outside the English alphanumeric range will need to be specially encoded (the LabelledGeneric derivation uses Unicode, but more on this later), but for the most part, this should suffice for the use case of encoding field names as types.
As you may have guessed, type-level strings are then simply represented as tuple types, hence (a,g,e). For the sake of reducing noise, in the rest of this post, we will refer to these name-types without commas and parentheses.
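The encoding described above can be sketched with the standard library alone. Only the three characters needed for the name age are declared here, and the alias and helper names are made up for this example.

```rust
use std::any::TypeId;
use std::mem::size_of;

// One empty enum per character; only the ones needed for "age" are shown.
#[allow(non_camel_case_types)]
pub enum a {}
#[allow(non_camel_case_types)]
pub enum g {}
#[allow(non_camel_case_types)]
pub enum e {}

// A type-level string is a tuple of character types.
pub type Age = (a, g, e);
// The same characters in a different order form a different type.
pub type Ega = (e, g, a);

/// Size in bytes of the type-level string "age".
pub fn age_name_size() -> usize {
    size_of::<Age>()
}

/// True when the two type-level strings are distinct types.
pub fn names_differ() -> bool {
    TypeId::of::<Age>() != TypeId::of::<Ega>()
}
```

The name type occupies zero bytes, and reordering the characters yields a different type, which is what lets the compiler tell field names apart.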
Note: This type-level encoding of strings may change in the future.
(Anonymous) Records!
Combining the Field and HList constructs gets us something else: Records. I believe once upon a time, Rust supported anonymous structs; well, you can get most of that functionality back with Frunk!
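As an illustration of the idea, the sketch below builds a two-field record from simplified stand-ins for HList and Field. All definitions, the field helper, and the choice of name and age as fields are assumptions; Frunk's own macros and types are not used.

```rust
use std::marker::PhantomData;

// Simplified stand-ins for a heterogeneous list.
pub struct HNil;
pub struct HCons<H, T> {
    pub head: H,
    pub tail: T,
}

// Simplified labelled value.
pub struct Field<Name, Type> {
    name: PhantomData<Name>,
    pub value: Type,
}

/// Hypothetical constructor; Name is inferred from the expected type.
pub fn field<Name, Type>(value: Type) -> Field<Name, Type> {
    Field { name: PhantomData, value }
}

// Type-level characters needed for the names "name" and "age".
#[allow(non_camel_case_types)]
pub enum a {}
#[allow(non_camel_case_types)]
pub enum e {}
#[allow(non_camel_case_types)]
pub enum g {}
#[allow(non_camel_case_types)]
pub enum m {}
#[allow(non_camel_case_types)]
pub enum n {}

// An anonymous record with a name field and an age field.
pub type Record = HCons<
    Field<(n, a, m, e), &'static str>,
    HCons<Field<(a, g, e), usize>, HNil>,
>;

/// Builds a record value; no named struct is declared anywhere.
pub fn make_record() -> Record {
    HCons {
        head: field("Joe"),
        tail: HCons { head: field(3), tail: HNil },
    }
}
```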
This kind of thing is sometimes called an “anonymous Record” in Scala (see scala-records, or Shapeless).
In the future, the anonymous Records API in Frunk might be improved. As it stands, it exists mostly for the purpose of LabelledGeneric and is a bit noisy to use.
Field and LabelledGeneric
So, what is the relationship between Field and the LabelledGeneric typeclass?
Quite simply, the associated Repr type of an instance of LabelledGeneric should have the type of an anonymous record (labelled HList).
So, given a struct named Person, one possible implementation of LabelledGeneric for Person maps each struct field to a Field cell labelled with that field's name, in declaration order, and converts back by reading the values out of those cells.
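A hand-written implementation along those lines might look like the sketch below. It is std-only and modelled on the description in this post rather than copied from Frunk: the trait's method names, the Person fields (name and age), the second Employee struct, and the helper names are all assumptions.

```rust
use std::marker::PhantomData;

pub struct HNil;
pub struct HCons<H, T> {
    pub head: H,
    pub tail: T,
}
pub struct Field<Name, Type> {
    name: PhantomData<Name>,
    pub value: Type,
}

pub fn field<Name, Type>(value: Type) -> Field<Name, Type> {
    Field { name: PhantomData, value }
}

// Type-level characters for "name" and "age".
#[allow(non_camel_case_types)]
pub enum a {}
#[allow(non_camel_case_types)]
pub enum e {}
#[allow(non_camel_case_types)]
pub enum g {}
#[allow(non_camel_case_types)]
pub enum m {}
#[allow(non_camel_case_types)]
pub enum n {}

/// Simplified model of the typeclass: a type plus its labelled representation.
pub trait LabelledGeneric {
    type Repr;
    fn into_repr(self) -> Self::Repr;
    fn from_repr(repr: Self::Repr) -> Self;
}

// Labelled HList shared by any struct with name: &str followed by age: usize.
pub type NameAge = HCons<
    Field<(n, a, m, e), &'static str>,
    HCons<Field<(a, g, e), usize>, HNil>,
>;

pub struct Person {
    pub name: &'static str,
    pub age: usize,
}

impl LabelledGeneric for Person {
    type Repr = NameAge;
    fn into_repr(self) -> NameAge {
        HCons { head: field(self.name), tail: HCons { head: field(self.age), tail: HNil } }
    }
    fn from_repr(repr: NameAge) -> Self {
        Person { name: repr.head.value, age: repr.tail.head.value }
    }
}

// Hypothetical second struct with identical field names and order.
pub struct Employee {
    pub name: &'static str,
    pub age: usize,
}

impl LabelledGeneric for Employee {
    type Repr = NameAge;
    fn into_repr(self) -> NameAge {
        HCons { head: field(self.name), tail: HCons { head: field(self.age), tail: HNil } }
    }
    fn from_repr(repr: NameAge) -> Self {
        Employee { name: repr.head.value, age: repr.tail.head.value }
    }
}

/// Converts between two types only when their labelled representations match.
pub fn labelled_convert_from<A, B, R>(value: A) -> B
where
    A: LabelledGeneric<Repr = R>,
    B: LabelledGeneric<Repr = R>,
{
    B::from_repr(value.into_repr())
}
```

A struct whose fields were named or ordered differently would have a different Repr, so a call to labelled_convert_from involving it would not type-check.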
But writing that yourself is tedious and error-prone, so Frunk provides a derivation for you.
How the LabelledGeneric derivation is generated
As illustrated earlier, you create an instance of LabelledGeneric for your struct by deriving LabelledGeneric on it.
The derive generates something conceptually similar to the hand-written implementation described in the previous section, so we won’t repeat that here.
That said, there is something special about the way that characters outside the range of the standard English alphabet and digits are handled. For each of those characters, we get the Unicode hexcode and use those digits, sandwiched by _uc and uc_ delimiters, as the type-level representation.
This allows us to represent virtually any legal identifier at the type level, even once the ASCII-only restriction for identifiers is lifted from stable Rust. For more details, take a look at how characters are matched to identifiers in the Frunk source.
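The escaping scheme can be sketched as a plain string transformation. This is only an approximation of the description above: the function names are hypothetical, the output is a list of token names rather than actual types, and the real derive may spell digits or handle characters such as the underscore differently.

```rust
/// Encodes one character as a list of type-level token names.
/// ASCII letters and digits map to themselves; anything else becomes its
/// hexadecimal code point, one token per hex digit, between `_uc` and `uc_`.
pub fn encode_char(c: char) -> Vec<String> {
    if c.is_ascii_alphanumeric() {
        vec![c.to_string()]
    } else {
        let mut tokens = vec!["_uc".to_string()];
        // Lower-case hex digits of the Unicode code point.
        tokens.extend(format!("{:x}", c as u32).chars().map(|d| d.to_string()));
        tokens.push("uc_".to_string());
        tokens
    }
}

/// Encodes a whole identifier, character by character.
pub fn encode_ident(ident: &str) -> Vec<String> {
    ident.chars().flat_map(encode_char).collect()
}
```

Under this sketch, a character such as ñ (U+00F1) is spelled with the tokens _uc, f, 1, uc_, while a plain ASCII name like age stays a, g, e.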
Conclusion
In closing, I’d like to stress that all the abstractions and techniques described in this post are type-safe (no casting happening) and thus get fully verified by Rust’s compiler and its strong type system.
As far as I am aware, this is the first implementation of labelled HLists (aka anonymous Records) and LabelledGeneric in Rust, and I hope this post did a good job of explaining what problems they solve, what they are, how they work, and why you might want to use them. As usual, please give them a go and chime in with questions, comments, ideas, or PRs!
Also, as alluded to in the section introducing LabelledGeneric, there is a way to automatically match up out-of-order fields. We’ll go through this in another post.