Have you ever wanted to write a structurally typed function in Rust? Do you spend a lot of time and effort getting your Rust structs just so, and want to DRY-out data access for common field paths without declaring a new trait and implementing it for each struct (let’s say, Cat and Dog both have a name: String field)? If so, read on.

This post talks about how we can leverage LabelledGeneric to build Path traversers (functionally similar to lenses), and use them to write clean and performant structurally typed functions with all the compile-time safety that you’ve come to expect from Rust.

Re: radio silence

It’s been a while (4 years!) since I last updated this blog. Why?

  • I started working on the Cloud SWE team at Elastic (we’re hiring!). I’ve been busy leading project teams, implementing features, and writing (and deleting!) lots of Scala code (no Rust though, sadly 😭)
  • My kid gained sentience: Around the same time, my daughter turned 2, and it’s just been a complete whirlwind of activities, learning, viruses, emotions, etc. It’s been awesome and I wouldn’t trade it for the world, but people are DOWNRIGHT LYING if they claim having kids doesn’t change anything.
  • 2020: Covid was a big one, but the whole year felt like a trainwreck in slow motion … if the train was pulling dumpster fires.

Lastly, I just didn’t have the oomph to write a post that describes transmogrify() to follow up on the post on Struct transforms. Transmogrifier, which allows flexibile recursive transformation between similarly-structured structs, was added over 2.5 years ago, but writing about it was … intimidating.

Still, I recently decided to try to start writing again, so I picked a topic that’s slightly simpler, but related: Path, which introduced zero-overhead structurally-typed functions that you could use with normal structs to stable Rust back in Februrary of 2019 1.

Is the post late? Yes. Better than never? I hope so 🙏

Overview

Structural typing, you say?

“Structural typing” was thrown around up there ↑, but what do we mean? To quote Wiki:

A structural type system (or property-based type system) is a major class of type system in which type compatibility and equivalence are determined by the type’s actual structure or definition and not by other characteristics such as its name or place of declaration. Structural systems are used to determine if types are equivalent and whether a type is a subtype of another. It contrasts with nominative systems, where comparisons are based on the names of the types or explicit declarations, and duck typing, in which only the part of the structure accessed at runtime is checked for compatibility.

Out-of-the-box-Rust has nominally typed functions 2 3. For the purposes of this post (and frunk), we specifically mean structs and their fields when it comes to “structure”4, and not methods that they get from impls of themselves or traits. Why? Well, you can’t spell “structural typing without struct, I’ve been mostly focused on structs, and … simplicity 😂. Also, to my mind, traits already enable a kind of part-way “structural typing” of methods 5.

Show me yours

I Read Somewhere ™ that giving a concrete example upfront helps people decide if they want to keep reading (if it aligns with their interests), plus there are lots of movies where the first scene you see is chronologically from the end of the story, followed by a rewinding sound and jump back to the beginning … and Hollywood knows engagement. Anyway, we’ll end up with something that allows us to do write this sort of thing:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
/// Function that generically takes any struct `A` that is traversable with `.pet.name`, if
/// doing so returns a `String`
///
/// This is done without declaring any traits specific to this traversal
fn print_pet_name<A, Idx>(obj: A) -> ()
                                 // ↓ dot-separated structural path
    where A: PathTraverser<Path!(pet.name), Idx, TargetValue=String>
                                 // ↑ 🎉
{
    println!(
        "Pet name [{}]",
        path!(pet.name).get(obj)
    );
}

// Pass it any object that has `pet.name`
print_pet_name(dog_person);
print_pet_name(cat_person);
print_pet_name(hamster_person);
print_pet_name(snake_person);
print_pet_name(goldfish_person);
print_pet_name(house);

The objects you pass to the print_pet_name function don’t need to know anything specific to it nor structurally typed functions in general: their struct declarations just need to derive(LabelledGeneric) and have a structure that complies with the function’s type signature (i.e. have a pet.name path that returns a String):

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
#[derive(LabelledGeneric)]
struct Dog {
    name: String,
    age: u32
}

#[derive(LabelledGeneric)]
struct Cat {
    name: String,
    age: u32
}

// The next two structs can both be traversed with `pet.age`

#[derive(LabelledGeneric)]
struct DogPerson {
  name: String,
  pet: Dog
}

#[derive(LabelledGeneric)]
struct CatPerson {
  name: String,
  pet: Cat
}

// etc etc

let dog = Dog {
    name: "Odie".to_string(),
    age: 32
};

let cat = Cat {
    name: "Garfield".to_string(),
    age: 16
};

let dog_person = DogPerson {
  name: "Jon".to_string(),
  pet: dog
};

let cat_person = CatPerson {
  name: "Jon".to_string(),
  pet: cat
};

That’s it. The API is relatively clean, simple to write, read, and understand (IMO), and there are no unsafe or dyn traits anywhere (even in the implementation). And, you can still declare and treat your structs as you normally would, passing them to nominally typed functions, implementing traits as you normally would etc.

Still, when used with structurally typed functions like print_pet_name, the compiler will as usual ensure that:

  1. The paths accessed on the generic parameter A inside the structurally typed function are constrained by the function’s type signature.
  2. The LabelledGeneric objects passed as arguments to the structurally typed function support the required path in the function’s type signature.

The functions themselves are not constrained to just getting values, they can also set values too (see the other example at the end of the post)

Quick review of LabelledGeneric

By adding a #[derive(LabelledGeneric)] attribute to a struct, like so:

1
2
3
4
5
#[derive(LabelledGeneric)]
struct Dog {
    name: String,
    age: u32
}

we gain the ability to turn a Dog object into a labelled heterogenous list:

1
2
3
4
5
6
7
8
9
10
11
let dog = Dog {
    name: "Odie".to_string(),
    age: 32
};
let as_labelled = <Dog as LabelledGeneric>::into(dog);
let expected_labelled = hlist![
    // in reality the field label is a tuple of type-level chars, but ignore that for now
    field!(name, "Odie".to_string()),
    field!(age, 32)
];
assert_eq!(expected_labelled, as_labelled);

This ability to turn a struct into a heterogenous List of “fields” (type-level labels and values, henceforth “labelled HList”) paves the way for us to go from nominative typing (does this type have the right name?) to structural typing (does this type have a given structure?).

For a more thorough review of HLists and LabelledGeneric, see this post.

Detour: Plucking by labelled field

Given a labelled HList, it would be useful to be able to “pluck” a value out of it by using a type-level field name. That would allow us to have compile-time-checked access of a field in a labelled Hlist by type-level name:

1
2
3
// Following from the above `Dog` example
let (age_field, _): (Field<age, _>, _) = as_labelled.pluck_by_name();
assert_eq!(32, age_field.value);

This is the equivalent of accessing a specific .age field on a Dog struct in the normal Rust Way ™, but we’re doing it our own way on its labelled HList equivalent, using user-declared types and taking advantage of the type system.

The trait would look like this:

1
2
3
4
5
6
7
pub trait ByNameFieldPlucker<TargetKey, Index> {
    type TargetValue;
    type Remainder;

    /// Returns a pair consisting of the value pointed to by the target key and the remainder.
    fn pluck_by_name(self) -> (Field<TargetKey, Self::TargetValue>, Self::Remainder);
}

The implementation of this “by-name-field” Plucker shares much with the normal Plucker mentioned in the previous post, so instead of re-explaining things like the Index type param, I’ll simply add a link to that section and show the implementation for the exit and recursion implementations here:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
/// Implementation when the pluck target key is in the head.
impl<K, V, Tail> ByNameFieldPlucker<K, Here> for HCons<Field<K, V>, Tail> {
    type TargetValue = V;
    type Remainder = Tail;

    #[inline(always)]
    fn pluck_by_name(self) -> (Field<K, Self::TargetValue>, Self::Remainder) {
        let field = field_with_name(self.head.name, self.head.value);
        (field, self.tail)
    }
}

/// Implementation when the pluck target key is in the tail.
impl<Head, Tail, K, TailIndex> ByNameFieldPlucker<K, There<TailIndex>> for HCons<Head, Tail>
where
    Tail: ByNameFieldPlucker<K, TailIndex>,
{
    type TargetValue = <Tail as ByNameFieldPlucker<K, TailIndex>>::TargetValue;
    type Remainder = HCons<Head, <Tail as ByNameFieldPlucker<K, TailIndex>>::Remainder>;

    #[inline(always)]
    fn pluck_by_name(self) -> (Field<K, Self::TargetValue>, Self::Remainder) {
        let (target, tail_remainder) =
            <Tail as ByNameFieldPlucker<K, TailIndex>>::pluck_by_name(self.tail);
        (
            target,
            HCons {
                head: self.head,
                tail: tail_remainder,
            },
        )
    }
}

In truth, it probably makes sense to re-write the ByNameFieldPlucker implementation(s) in terms of Plucker, but this felt somewhat more straightforward when I wrote it at the time for transmogrifying.

PathTraverser

ByNameFieldPlucker provides us with a way of accessing a field on single struct, but we want to be able to traverse multiple levels of structs. For instance, given the aformentioned Dog and DogPerson structs, Rust allows us to get the age of his dog by doing dog_person.pet.age, and we’d like to be able to do that structurally. Enter PathTraverser:

1
2
3
4
5
6
pub trait PathTraverser<Path, Indices> {
    type TargetValue;

    /// Returns a pair consisting of the value pointed to by the target key and the remainder.
    fn get(self) -> Self::TargetValue;
}

Instead of Index, its second type param is Indices to reflect the fact that we’re going to need multiple Indexs to “pluck” by field name from. The “exit” (the last, aka no-more-dots, target field name and value type are on the current struct) and “recurse” (the last target field name and value type are in an “inner” struct) implementations of this trait are as follows:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
// For the case where we have no more field names to traverse
impl<Name, PluckIndex, Traversable> PathTraverser<Path<HCons<Name, HNil>>, PluckIndex>
    for Traversable
where
    Traversable: IntoLabelledGeneric,
    <Traversable as IntoLabelledGeneric>::Repr: ByNameFieldPlucker<Name, PluckIndex>,
{
    type TargetValue = <<Traversable as IntoLabelledGeneric>::Repr as ByNameFieldPlucker<
        Name,
        PluckIndex,
    >>::TargetValue;

    #[inline(always)]
    fn get(self) -> Self::TargetValue {
        self.into().pluck_by_name().0.value
    }
}

// For the case where a path nests another path (e.g. nested traverse)
impl<HeadName, TailNames, HeadPluckIndex, TailPluckIndices, Traversable>
    PathTraverser<Path<HCons<HeadName, Path<TailNames>>>, HCons<HeadPluckIndex, TailPluckIndices>>
    for Traversable
where
    Traversable: IntoLabelledGeneric,
    <Traversable as IntoLabelledGeneric>::Repr: ByNameFieldPlucker<HeadName, HeadPluckIndex>,
    <<Traversable as IntoLabelledGeneric>::Repr as ByNameFieldPlucker<HeadName, HeadPluckIndex>>::TargetValue:
        PathTraverser<Path<TailNames>, TailPluckIndices>,
{
    type TargetValue = <<<Traversable as IntoLabelledGeneric>::Repr as ByNameFieldPlucker<HeadName, HeadPluckIndex>>::TargetValue as
    PathTraverser<Path<TailNames>, TailPluckIndices>>::TargetValue ;

    #[inline(always)]
    fn get(self) -> Self::TargetValue {
        self.into().pluck_by_name().0.value.get()
    }
}

That type signature is a bit hairy.

It’s a bit “Inceptiony” to think about what the Indices type param might look like at a given callsite, and for the most part it doesn’t matter for users (we make it the compiler’s job to fill it in or error out trying), but for the purposes of trying to understand what’s going on, it’s reasonable to imagine this as the Indices for structurally accessing dog_person.pet.age:

1
2
3
4
HList![
  <There<Here>>, // First access is `.pet`, which is the 2nd field on `DogUser`
  <There<Here>>, // First access is `.age`, which is the 2nd field on `Dog`
]

Path, path! and Path!

The last piece we need is something that allows us to describe a path (e.g. pet.age). Since the path is going to be itself a type-level thing (reminder: we pluck values by type-level field name), we can model this as a newtype wrapper around the zero-sized PhantomData<T> type

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
pub struct Path<T>(PhantomData<T>);

impl<T> Path<T> {
    /// Creates a new Path
    pub fn new() -> Path<T> {
        Path(PhantomData)
    }

    /// Gets something using the current path
    pub fn get<V, I, O>(&self, o: O) -> V
    where
        O: PathTraverser<Self, I, TargetValue = V>,
    {
        o.get()
    }
}

Paths basically works like “lens”, only without the target type locked down (maybe that will be a future type in frunk…), enabling this sort of thing:

1
2
3
4
let p = path!(pet.name);

let dog_age: &u32 = p.get(&dog_person);
let cat_age: &u32 = p.get(&cat_person);

That’s all fine and good. From here on though, things get a bit tricky because we need to create friendly ways to declare Paths, and T needs to be a type level path, one that needs to be easy to use and compatible with the way LabelledGeneric encodes field names into type-level strings. Rubber, meet road.

To make declaring value and type level Paths easy to use, we’ll need to make use of procedural macros because they allow us to take user-defined expressions and turn them into type-level paths made of type-level field names, and doing so with declarative macros is extremely difficult (I gave it a stab) if not impossible.

A core function that is reused for generating value-level and type-value Paths is:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
fn build_path_type(path_expr: Expr) -> impl ToTokens {
    let idents = find_idents_in_expr(path_expr);
    idents
        .iter()
        .map(|i| build_label_type(i))
        .fold(quote!(::frunk_core::hlist::HNil), |acc, t| {
            quote! {
            ::frunk_core::path::Path<
                ::frunk_core::hlist::HCons<
                   #t,
                   #acc
                >
              >
            }
        })
}

Where find_idents_in_expr is a function turns a path expression like pet.age into a vector of Ident identifiers.

We then pass those through to the build_label_type function, which translates each Ident into a type-level name. This is also re-used by LabelledGeneric’s derivation macro, which is important because it ensures that the way field names are encoded as types for Paths is compatible with the way field names are encoded as types in LabelledGeneric-produced labelled HLists.

Value-level

The macro for creating a Path value simply instantiates a Path using Path::new(), but with a type ascription based on what gets returned from build_path_type.

1
2
3
4
5
6
7
8
9
10
11
12
pub fn path(input: TokenStream) -> TokenStream {
    let expr = parse_macro_input!(input as Expr);
    let path_type = build_path_type(expr);
    let ast = quote! {
        {
            let p: #path_type = ::frunk_core::path::Path::new();
            p
        }
    };
    //    println!("ast: [{}]", ast);
    TokenStream::from(ast)
}

Type-level

The macro for creating a Path type simply splices the type returned from build_path_type.

1
2
3
4
5
6
7
8
9
pub fn Path(input: TokenStream) -> TokenStream {
    let expr = parse_macro_input!(input as Expr);
    let path_type = build_path_type(expr);
    let ast = quote! {
        #path_type
    };
    //    println!("ast: [{}]", ast);
    TokenStream::from(ast)
}

Another example

Getting and setting ids of from structs, without declaring a GetId or SetId trait and implementing it for each type:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
#[derive(LabelledGeneric)]
struct User {
    id: String,
    is_admin: bool,
}

#[derive(LabelledGeneric)]
struct Book {
    id: String,
    title: String,
}

#[derive(LabelledGeneric)]
struct Store {
    id: String,
    address: String,
}

// Object references passed to this function just need to have an `id: String`
// in their struct defintion.
fn get_id<'a, A, Idx>(obj: &'a A) -> &'a str
where
    &'a A: PathTraverser<Path!(id), Idx, TargetValue = &'a String>,
{
    path!(id).get(obj).as_str()
}

// DRYed-out setter
fn set_id<'a, A, Idx>(obj: &'a mut A, set_to: String) -> ()
where
    &'a mut A: PathTraverser<Path!(id), Idx, TargetValue = &'a mut String>,
{
    *path!(id).get(obj) = set_to;
}


let mut user = User {
    id: "user_id".to_string(),
    is_admin: true,
};
let mut book = Book {
    id: "book_id".to_string(),
    title: "Tale of Three structs".to_string(),
};
let mut store = Store {
    id: "store_id".to_string(),
    address: "Sesame street".to_string(),
};

println!("{}", get_id(&user));
println!("{}", get_id(&book));
println!("{}", get_id(&store));

// ↑ Prints:
//
// user_id
// book_id
// store_id

set_id(&mut user, "new_user_id".to_string());
set_id(&mut book, "new_book_id".to_string());
set_id(&mut store, "new_store_id".to_string());

// Print again
println!("{}", get_id(&user));
println!("{}", get_id(&book));
println!("{}", get_id(&store));

// ↑ Prints:
//
// new_user_id
// new_book_id
// new_store_id

Conclusion

The PathTraverser trait and Path type build on LabelledGeneric and HList as core abstractions, which is nice because we get some more mileage out of them, and it means that there are no additional traits that you need to import nor implement (even as a macro).

As usual, it’s compile-time checked, but it’s also performant. In benchmarks, tests comparing lens_path* (structurally typed traversal) versus normal_path* (Rust lang built-in traversal) traversals show that they perform the same: in other words, using structural typing in this way adds zero overhead.

As usual, please give it a spin and chime in with any questions, corrections, and suggestions !

Footnotes

  1. Technically, everything for writing basic structurally typed functions minus support for jumping through .-separated fields was available in frunk since October of 2018 at the latest because ByNamePlucker was available already by then.

  2. In Rust, macros can and have been used to approximate structural typing (macro arguments aren’t typed, so you can just do something like $x.access.some.path and have the compiler expand and fail it if an object at the callsite doesn’t have that path). This is fine too, but macros can be hard to read and maintain (they have no type signature, so you’ll need to look in the implementation/docs to know what it expects), and they aren’t functions; they’re code that write code. Again, The Macro Way is Fine ™; this post just offers an alternative.

  3. Rust did at one point have built-in support structural records, but it was removed almost 9 years ago before 1.0 was released. I found an answer to a question on the internal Rust lang forum asking why, and the 3 reasons listed for removal at the time made sense; the Path implementation described here (and implemented in frunk) addresses 1, if not 2, of the 3 issues (field order requirement and recursion IIUC), leaving the issue of field visibility, which I believe can probably be addressed as an option to the LabelledGeneric derive.

  4. There are some who would call this “row polymorphism”, which is maybe (more) correct, but it’s also a term that is much more niche (pronounced: “less generally known” or “less searched for”). Indeed, depending on whom you ask, “row polymorphism” is regarded as being under the “structural typing” umbrella (1, 2), but in any case, I personally find the distinction to be of questionable value in the context of Rust 🤷‍♂️. Having said that, feel free to substitute “row polymorphism” in place of “structural typing” when reading this post if it helps you slog through the actual important bits :)

  5. traits can be adhoc and auto-implemented, and directly used as constraints in functions (though still nominally), so being structurally-typed on traits feels a bit less of a problem that needs solving, and I get the feeling that it will be even less so with things like specialization coming down the pipeline, which will allow for more blanket and overlapping impls.

Comments