What is LabelledGeneric? How does one encode type-level Strings in Rust? What is a labelled HList?

Hold on, let’s take a step back.

In a previous post about implementing Generic in Rust, I briefly mentioned the fact that Generic could cause silent failures at runtime if you have 2 structs that are identically shaped type-wise, but have certain fields swapped.

While we can work around this using wrapper types, that solution leaves something to be desired, because, well, more boilerplate adds noise and requires more maintenance.

Ideally, we want something where the following compiles and works:

#[derive(LabelledGeneric)]
struct NewUser<'a> {
    first_name: &'a str,
    last_name: &'a str,
    age: usize,
}

#[derive(LabelledGeneric)]
struct SavedUser<'a> {
    first_name: &'a str,
    last_name: &'a str,
    age: usize,
}

let n_user = NewUser {
    first_name: "Moe",
    last_name: "Ali",
    age: 30,
};

// Convert from NewUser to SavedUser
let s_user: SavedUser = labelled_convert_from(n_user);

but the following fails at compile time because the fields are mismatched (first_name and last_name have been swapped):

// Uh-oh! Fields are jumbled :(
#[derive(LabelledGeneric)]
struct JumbledUser<'a> {
    last_name: &'a str,
    first_name: &'a str,
    age: usize
}

// This should fail at compile-time because last_name and first_name are swapped
// even if they have the same type
let d_user = <JumbledUser as LabelledGeneric>::convert_from(s_user);

Shapeless has had a solution to this sort of problem for some time: HLists whose cells hold not just bare values but named fields, where each value is labelled at the type level.

Let’s take a look at how Frunk implements Field values and LabelledGeneric in Rust :)

Add Frunk to your project

Frunk is published to Crates.io, so to begin, add the crate to your list of dependencies:


[dependencies]
frunk = "${latest_version}"


Why? (Motivation)

Silent runtime errors with Generic

To illustrate the problem, observe that the following 2 structs have exactly the same “shape”:

#[derive(Generic)]
struct NewUser<'a> {
    first_name: &'a str,
    last_name: &'a str,
    age: usize,
}

#[derive(Generic)]
struct JumbledUser<'a> {
    last_name: &'a str,
    first_name: &'a str,
    age: usize
}

That is, the Generic representation of both structs is simply HList![&'a str, &'a str, usize]. As a result, when we do the following:

let n_user = NewUser {
    first_name: "Moe",
    last_name: "Ali",
    age: 30,
};

// Convert from NewUser to JumbledUser
let s_user: JumbledUser = convert_from(n_user);

Oh no! s_user has first_name and last_name flipped :(
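For readers who want to watch the silent swap happen without pulling in Frunk, here is a minimal, self-contained sketch. It assumes a hypothetical Generic-like trait (its methods are named into_repr and from_repr rather than Frunk's into and from) whose representation is a plain tuple, so nothing in the type records which slot holds which field.

```rust
// Hypothetical stand-in for Frunk's Generic: the representation is a plain
// tuple of field values, so field names never appear in the type.
trait Generic: Sized {
    type Repr;
    fn into_repr(self) -> Self::Repr;
    fn from_repr(repr: Self::Repr) -> Self;
}

// Converts between any two types that share the same representation.
fn convert_from<Src, Dst, Repr>(src: Src) -> Dst
where
    Src: Generic<Repr = Repr>,
    Dst: Generic<Repr = Repr>,
{
    Dst::from_repr(src.into_repr())
}

struct NewUser<'a> {
    first_name: &'a str,
    last_name: &'a str,
    age: usize,
}

struct JumbledUser<'a> {
    last_name: &'a str,
    first_name: &'a str,
    age: usize,
}

impl<'a> Generic for NewUser<'a> {
    type Repr = (&'a str, &'a str, usize);
    fn into_repr(self) -> Self::Repr {
        (self.first_name, self.last_name, self.age)
    }
    fn from_repr((first_name, last_name, age): Self::Repr) -> Self {
        NewUser { first_name, last_name, age }
    }
}

impl<'a> Generic for JumbledUser<'a> {
    // Same tuple type as NewUser's Repr, but slot 0 holds last_name here.
    type Repr = (&'a str, &'a str, usize);
    fn into_repr(self) -> Self::Repr {
        (self.last_name, self.first_name, self.age)
    }
    fn from_repr((last_name, first_name, age): Self::Repr) -> Self {
        JumbledUser { last_name, first_name, age }
    }
}
```

Everything type-checks; the mix-up only surfaces when someone reads the values and finds "Ali" in first_name.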

As explained near the end of the post introducing Generic, you can catch this sort of mistake by introducing wrapper types like FirstName<'a>(&'a str) for each field, but that introduces more boilerplate. This sucks, because Generic is supposed to help avoid boilerplate!

Can we have our cake and eat it too?

LabelledGeneric to the rescue

LabelledGeneric was introduced in v0.1.12 of Frunk to solve this exact problem. This is how you use it.

#[derive(LabelledGeneric)]
struct NewUser<'a> {
    first_name: &'a str,
    last_name: &'a str,
    age: usize,
}

#[derive(LabelledGeneric)]
struct SavedUser<'a> {
    first_name: &'a str,
    last_name: &'a str,
    age: usize,
}

let n_user = NewUser {
    first_name: "Moe",
    last_name: "Ali",
    age: 30,
};

// Convert from NewUser to SavedUser
let s_user: SavedUser = labelled_convert_from(n_user);

#[derive(LabelledGeneric)]
struct JumbledUser<'a> {
    last_name: &'a str,
    first_name: &'a str,
    age: usize,
}
// ⬇︎ This will fail at compile time: the field labels are in a different order
let j_user: JumbledUser = labelled_convert_from(s_user);

Using LabelledGeneric isn’t very different from using Generic:

  1. Instead of deriving Generic, derive LabelledGeneric
  2. Instead of calling convert_from, call labelled_convert_from

These 2 changes buy you a lot more compile-time type safety, with zero boilerplate. By the way, if you’d like the compiler to automatically “align” the generic representations so that you could instantiate a JumbledUser from a NewUser, then stay tuned for a later post ;)

The tl;dr version of how this works is that by deriving LabelledGeneric, we make the struct an instance of the LabelledGeneric typeclass. This typeclass is almost identical to the Generic typeclass, but the derive does something a bit different with the struct’s generic representation: it isn’t just an HList wrapping naked values.

Instead, the generic representation is an HList where each cell carries its field name at the type level. Conceptually, the representations look like this:

// LabelledGeneric Representation for NewUser
type NewUserRepr = HList![
  Field<first_name, &'a str>,
  Field<last_name, &'a str>,
  Field<age, usize>];

// LabelledGeneric Representation for SavedUser
type SavedUserRepr = HList![
  Field<first_name, &'a str>,
  Field<last_name, &'a str>,
  Field<age, usize>];

// LabelledGeneric Representation for JumbledUser
type JumbledUserRepr = HList![
  Field<last_name, &'a str>,
  Field<first_name, &'a str>,
  Field<age, usize>];

This difference in type-level representation is how the compiler knows that one can’t simply convert a NewUser or SavedUser into a JumbledUser via labelled_convert_from.
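To see that difference outside the compiler, the following self-contained sketch writes the three representations out by hand and compares them with std::any::TypeId. The single-enum labels FirstName, LastName and Age, and the bare-bones Field, HCons and HNil types, are hypothetical stand-ins for Frunk's versions.

```rust
use std::any::TypeId;
use std::marker::PhantomData;

// Hypothetical labels standing in for the type-level names first_name,
// last_name and age (single empty enums keep the sketch short).
enum FirstName {}
enum LastName {}
enum Age {}

// Minimal Field and HList types; never constructed, only compared as types.
#[allow(dead_code)]
struct Field<Name, Type> {
    name: PhantomData<Name>,
    value: Type,
}
#[allow(dead_code)]
struct HNil;
#[allow(dead_code)]
struct HCons<H, T> {
    head: H,
    tail: T,
}

type Str = &'static str;

// The three labelled representations, written out by hand.
type NewUserRepr =
    HCons<Field<FirstName, Str>, HCons<Field<LastName, Str>, HCons<Field<Age, usize>, HNil>>>;
type SavedUserRepr =
    HCons<Field<FirstName, Str>, HCons<Field<LastName, Str>, HCons<Field<Age, usize>, HNil>>>;
type JumbledUserRepr =
    HCons<Field<LastName, Str>, HCons<Field<FirstName, Str>, HCons<Field<Age, usize>, HNil>>>;

// True when A and B are exactly the same type.
fn same_type<A: 'static, B: 'static>() -> bool {
    TypeId::of::<A>() == TypeId::of::<B>()
}
```

A conversion function bounded on a shared Repr performs the same comparison statically, which is why the jumbled case is rejected at compile time rather than at runtime.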

Field ??

What is Field? It’s simply a container struct that is parameterised by 2 types, with the following signature:

pub struct Field<Name, Type> { ... }

The first type parameter, Name, holds a type-level String (the field’s name), and the second, Type, is the type of the value contained inside the struct.

It may help to think of Field as an ad-hoc wrapper type.

How it works

Field<Name, Type>

The full definition of Field is currently as follows:

pub struct Field<Name, Type> {
    name_type_holder: PhantomData<Name>,
    pub name: &'static str, // runtime copy of the name, e.g. "age"
    pub value: Type,
}

PhantomData lets us bind a concrete type to the Name type parameter in an instance of Field without that type taking up any space (for more details on PhantomData, refer to the official docs).
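A quick way to check the zero-space claim is std::mem::size_of. This sketch uses a stripped-down, hypothetical Field that keeps only the PhantomData label and the value; the Age label is an assumed empty enum.

```rust
use std::marker::PhantomData;
use std::mem::size_of;

// A hypothetical label type: an empty enum, so no value of it can exist.
pub enum Age {}

// A stripped-down Field holding only the label marker and the value.
pub struct Field<Name, Type> {
    name: PhantomData<Name>,
    pub value: Type,
}

impl<Name, Type> Field<Name, Type> {
    // PhantomData is a zero-sized unit value, so it can be built from nothing,
    // even though no value of Name itself could ever exist.
    pub fn new(value: Type) -> Self {
        Field { name: PhantomData, value }
    }
}
```

The label exists only for the type checker: a Field<Age, u64> occupies exactly as many bytes as the u64 inside it.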

To construct a Field, Frunk exposes a macro called field! so that you don’t need to touch PhantomData yourself.

// Usage: we let the compiler figure out the value type for us
let age = field!((a, g, e), 3);

assert_eq!(age.name, "age");
assert_eq!(age.value, 3);
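Frunk’s real field! macro is documented on its Rustdoc page; as a rough illustration of the idea only, here is a hypothetical, much simplified macro_rules! version. It assumes a Field that stores a runtime name next to its value, and treats a, g and e as empty marker enums.

```rust
use std::marker::PhantomData;

// Hypothetical type-level characters (a full set would be macro-generated).
#[allow(non_camel_case_types)]
pub enum a {}
#[allow(non_camel_case_types)]
pub enum e {}
#[allow(non_camel_case_types)]
pub enum g {}

// A Field that keeps a runtime copy of its name alongside the value.
pub struct Field<Name, Type> {
    name_type_holder: PhantomData<Name>,
    pub name: &'static str,
    pub value: Type,
}

impl<Name, Type> Field<Name, Type> {
    pub fn new(name: &'static str, value: Type) -> Self {
        Field { name_type_holder: PhantomData, name, value }
    }
}

// Simplified field!-style macro: the tuple of character types becomes the
// Name parameter, and the same identifiers are stringified and concatenated
// into the runtime name.
macro_rules! field {
    (($($c:ident),+), $value:expr) => {
        Field::<($($c,)+), _>::new(concat!($(stringify!($c)),+), $value)
    };
}
```

The same identifiers are used twice: once as types for the Name parameter, and once, via stringify! and concat!, to build the "age" string stored in name.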

For more information about the field! macro, please refer to its Rustdoc page. Astute readers will notice the odd (a,g,e) type used for naming. What is that about ???

Type-level characters and strings

In order to represent characters at the type level, Frunk currently uses enums with zero variants. Each empty enum is a distinct type, yet it cannot be instantiated at runtime, so it is guaranteed to incur zero cost.

Conceptually, we declare one enum for every character we want to represent:

pub enum a {}
pub enum b {}
pub enum c {}
// ...
pub enum A {}
// ... etc
// Numbers can't be identifiers, so we preface them with an underscore
pub enum _1 {}
pub enum _2 {}

// In reality, the above is generated by a macro.

This means that characters outside the English alphanumeric range need to be specially encoded (the LabelledGeneric derivation uses Unicode, but more on this later), but for the most part, this suffices for encoding field names as types.
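The two properties relied on here, zero size and distinct identity, can be checked directly. A small sketch, with a handful of hand-declared character enums standing in for the macro-generated ones:

```rust
use std::any::TypeId;
use std::mem::size_of;

// Hypothetical type-level characters, declared by hand for illustration.
// Enums with no variants can never be instantiated.
#[allow(non_camel_case_types)]
pub enum a {}
#[allow(non_camel_case_types)]
pub enum b {}
pub enum A {}
// Digits are not valid identifiers, hence the leading underscore.
#[allow(non_camel_case_types)]
pub enum _1 {}

// True when values of T would take up no space at all.
pub fn is_zero_sized<T>() -> bool {
    size_of::<T>() == 0
}
```

Each enum is zero-sized, tuples of them are zero-sized too, and the compiler still treats a, b and A as three unrelated types.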

As you may have guessed, type-level strings are then simply represented as tuple types, hence (a,g,e). For the sake of reducing noise, in the rest of this post, we will refer to these name-types without commas and parentheses.

Note: This type-level encoding of strings may change in the future.
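Turning such a tuple back into a runtime string is a matter of giving each character type a constant and each tuple size an impl. The TypeChar and TypeString traits below are hypothetical, written for illustration; they are not part of Frunk’s API.

```rust
// Hypothetical trait: each type-level character knows which char it encodes.
pub trait TypeChar {
    const CHAR: char;
}

#[allow(non_camel_case_types)]
pub enum a {}
#[allow(non_camel_case_types)]
pub enum e {}
#[allow(non_camel_case_types)]
pub enum g {}

impl TypeChar for a { const CHAR: char = 'a'; }
impl TypeChar for e { const CHAR: char = 'e'; }
impl TypeChar for g { const CHAR: char = 'g'; }

// Hypothetical trait: a type-level string (a tuple of character types) can be
// turned back into a runtime String.
pub trait TypeString {
    fn to_runtime() -> String;
}

// Implemented for tuples of length 1 to 3 only; a fuller version would
// generate impls for many more lengths with a macro.
impl<C1: TypeChar> TypeString for (C1,) {
    fn to_runtime() -> String {
        [C1::CHAR].iter().collect()
    }
}
impl<C1: TypeChar, C2: TypeChar> TypeString for (C1, C2) {
    fn to_runtime() -> String {
        [C1::CHAR, C2::CHAR].iter().collect()
    }
}
impl<C1: TypeChar, C2: TypeChar, C3: TypeChar> TypeString for (C1, C2, C3) {
    fn to_runtime() -> String {
        [C1::CHAR, C2::CHAR, C3::CHAR].iter().collect()
    }
}
```

No value of (a, g, e) ever exists; the string is recovered purely from the type.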

(Anonymous) Records!

Combining the Field and HList constructs gets us something else: Records. I believe once upon a time, Rust supported anonymous structs; well, you can get most of that functionality back with Frunk!

let record = hlist![
    field!(name, "Joe"),
    field!(age, 30)
];

// We'll talk about pluck() in a later post, but just an FYI, it returns the
// target value with the type you specified as well as the remainder
// of the HList in a pair. It is checked at compile time to make sure it never
// fails at runtime.
let (name, _): (Field<name, _>, _) = record.pluck();
assert_eq!(name.value, "Joe");

This kind of thing is sometimes called an “anonymous Record” in Scala (see scala-records, or Shapeless).

In the future, the anonymous Records API in Frunk might be improved. As it stands, it exists mostly for the purpose of LabelledGeneric and is a bit noisy to use.
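For the curious, here is a self-contained sketch of how a label-directed pluck can be built. It follows the general type-level-index approach (Here and There<I> markers that the compiler infers), but every name in it (Field, field, HCons, HNil, Plucker, Here, There, Name, Age) is defined locally for illustration and should not be read as Frunk’s exact API.

```rust
use std::marker::PhantomData;

// Hypothetical labels; the post would write these as (n,a,m,e) and (a,g,e).
pub enum Name {}
pub enum Age {}

// A value tagged with a type-level label.
pub struct Field<L, V> {
    label: PhantomData<L>,
    pub value: V,
}

pub fn field<L, V>(value: V) -> Field<L, V> {
    Field { label: PhantomData, value }
}

// A minimal HList.
pub struct HNil;
pub struct HCons<H, T> {
    pub head: H,
    pub tail: T,
}

// Type-level indices: Here = "at the head", There<I> = "somewhere in the tail".
pub struct Here;
pub struct There<I>(PhantomData<I>);

// Removes the element of type Target, returning it with the remaining list.
// Index is inferred by the compiler, so callers only name the Target type.
pub trait Plucker<Target, Index> {
    type Remainder;
    fn pluck(self) -> (Target, Self::Remainder);
}

// Base case: the target is the head.
impl<Target, Tail> Plucker<Target, Here> for HCons<Target, Tail> {
    type Remainder = Tail;
    fn pluck(self) -> (Target, Self::Remainder) {
        (self.head, self.tail)
    }
}

// Recursive case: keep the head and pluck the target out of the tail.
impl<Head, Tail, Target, I> Plucker<Target, There<I>> for HCons<Head, Tail>
where
    Tail: Plucker<Target, I>,
{
    type Remainder = HCons<Head, Tail::Remainder>;
    fn pluck(self) -> (Target, Self::Remainder) {
        let (target, rest) = <Tail as Plucker<Target, I>>::pluck(self.tail);
        (target, HCons { head: self.head, tail: rest })
    }
}
```

Only the target type is named at the call site; the compiler works out the unique position of the Field<Age, _> cell, and asking for a label that is not in the record is a compile error rather than a runtime failure.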

Field and LabelledGeneric

So, what is the relationship between Field and the LabelledGeneric typeclass?

Quite simply, the associated Repr type of a LabelledGeneric instance should be the type of an anonymous record (a labelled HList).

So, given the following struct:

struct Person {
  name: String,
  age: usize
}

This is one possible implementation of LabelledGeneric for Person:

impl LabelledGeneric for Person {

  type Repr = HList![ Field<name, String>, Field<age, usize> ];

  fn into(self) -> Self::Repr {
    hlist![
      field!(name, self.name),
      field!(age, self.age)
    ]
  }

  fn from(r: Self::Repr) -> Self {
    let hlist_pat![ name, age ] = r;
    Person {
      name: name.value,
      age: age.value
    }
  }

}

But writing that yourself is tedious and error-prone, so Frunk provides a derivation for you.
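To see how the pieces fit together without the derive, here is a self-contained sketch with a hypothetical LabelledGeneric-style trait (methods named into_repr and from_repr), a labelled_convert_from function bounded on a shared Repr, and hand-written impls for NewUser and SavedUser. The label enums and the Field and HCons types are local stand-ins, not Frunk’s definitions.

```rust
use std::marker::PhantomData;

// Hypothetical label types, one per field name.
pub enum FirstName {}
pub enum LastName {}
pub enum Age {}

// A value tagged with a type-level label.
pub struct Field<L, V> {
    label: PhantomData<L>,
    pub value: V,
}

fn field<L, V>(value: V) -> Field<L, V> {
    Field { label: PhantomData, value }
}

// A minimal HList.
pub struct HNil;
pub struct HCons<H, T> {
    pub head: H,
    pub tail: T,
}

// A minimal LabelledGeneric-style trait (method names differ from Frunk's).
pub trait LabelledGeneric: Sized {
    type Repr;
    fn into_repr(self) -> Self::Repr;
    fn from_repr(repr: Self::Repr) -> Self;
}

// Only type-checks when both types have exactly the same labelled Repr.
pub fn labelled_convert_from<Src, Dst, Repr>(src: Src) -> Dst
where
    Src: LabelledGeneric<Repr = Repr>,
    Dst: LabelledGeneric<Repr = Repr>,
{
    Dst::from_repr(src.into_repr())
}

// The labelled representation shared by NewUser and SavedUser.
type UserRepr<'a> = HCons<
    Field<FirstName, &'a str>,
    HCons<Field<LastName, &'a str>, HCons<Field<Age, usize>, HNil>>,
>;

// Builds the labelled HList; the labels come from the UserRepr return type.
fn user_repr<'a>(first_name: &'a str, last_name: &'a str, age: usize) -> UserRepr<'a> {
    HCons {
        head: field(first_name),
        tail: HCons { head: field(last_name), tail: HCons { head: field(age), tail: HNil } },
    }
}

pub struct NewUser<'a> {
    pub first_name: &'a str,
    pub last_name: &'a str,
    pub age: usize,
}

pub struct SavedUser<'a> {
    pub first_name: &'a str,
    pub last_name: &'a str,
    pub age: usize,
}

// Hand-written stand-ins for what a derive would generate.
impl<'a> LabelledGeneric for NewUser<'a> {
    type Repr = UserRepr<'a>;
    fn into_repr(self) -> Self::Repr {
        user_repr(self.first_name, self.last_name, self.age)
    }
    fn from_repr(r: Self::Repr) -> Self {
        NewUser { first_name: r.head.value, last_name: r.tail.head.value, age: r.tail.tail.head.value }
    }
}

impl<'a> LabelledGeneric for SavedUser<'a> {
    type Repr = UserRepr<'a>;
    fn into_repr(self) -> Self::Repr {
        user_repr(self.first_name, self.last_name, self.age)
    }
    fn from_repr(r: Self::Repr) -> Self {
        SavedUser { first_name: r.head.value, last_name: r.tail.head.value, age: r.tail.tail.head.value }
    }
}
```

A JumbledUser whose Repr listed LastName before FirstName would not satisfy the shared Repr bound, mirroring the compile-time error shown earlier.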

How the LabelledGeneric derivation is generated

As illustrated earlier, you can do the following to create an instance of LabelledGeneric for your struct:

#[derive(LabelledGeneric)]
struct Person {
  name: String,
  age: usize
}

It generates something conceptually similar to what we had above, so we won’t repeat that here.

That said, characters outside the standard English alphabet and digits are handled specially. For each of those characters, we take its Unicode hex code and use those digits, sandwiched between _uc and uc_ delimiters, as the type-level representation.

// This isn't possible (yet) in Rust, but let's pretend it is
struct Fancy {
  ❤️: usize
}

// Since ❤️ is made up of the Unicode code points \u{2764} and \u{fe0f}, the
// labelled generic representation for the above would be
type Repr = HList![ Field<_ucu2764ufe0fuc_, usize> ];

This allows us to effectively represent virtually any legal identifier at the type level, even when the ASCII-only restriction for identifiers is lifted from stable Rust. For more details, take a look at how characters are matched to identifiers here.
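As a rough illustration of the escaping rule, the following hypothetical encode_name function applies it to a field name and returns the result as a flat string in the post’s comma-free notation. It is a sketch of the described scheme, not Frunk’s derive code.

```rust
// Hypothetical encoder: ASCII letters, digits and underscores pass through
// unchanged; every other character becomes its Unicode hex code wrapped in
// "_uc" and "uc_". Output is a flat string, not a list of Rust types.
pub fn encode_name(name: &str) -> String {
    let mut out = String::new();
    for c in name.chars() {
        if c.is_ascii_alphanumeric() || c == '_' {
            out.push(c);
        } else {
            // escape_unicode() yields e.g. "\u{e9}"; keep only "ue9".
            let hex: String = c
                .escape_unicode()
                .filter(|ch| ch.is_ascii_alphanumeric())
                .collect();
            out.push_str("_uc");
            out.push_str(&hex);
            out.push_str("uc_");
        }
    }
    out
}
```

Because it wraps each character separately, a multi-code-point name such as ❤️ would come out as two _uc…uc_ groups rather than the single group in the example above, so treat it as an approximation of the scheme.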

Conclusion

In closing, I’d like to stress that all the abstractions and techniques described in this post are type-safe (no casting happening) and thus get fully verified by Rust’s compiler and its strong type system.

As far as I am aware, this is the first implementation of labelled HLists (aka anonymous Records) and LabelledGeneric in Rust, and I hope this post did a good job of explaining what problems they solve, what they are, how they work, and why you might want to use them. As usual, please give them a go and chime in with questions, comments, ideas, or PRs!

Also, as alluded to in the section introducing LabelledGeneric, there is a way to automatically match up out-of-order fields. We’ll go through this in another post.

