BeachApe.

Containerising an Octopress 2.0 Blog

2024-01-01T13:41:00+00:00

This blog was started almost 11 years ago, on Octopress 2.0. It was a different world: Ruby 1.x was king of the hill, and Docker wasn’t even a thing yet. It was an amazing thing to use.

Over the years, every time I switched machines, I noticed it was getting harder and harder for me to get the blog and in particular Octopress 2.0 working on it. It was plagued by incompatibility between system versions, tooling, and dependencies on various levels (OS, Gem, Ruby, etc), but I was also getting more and more out of touch with the Ruby world, having long jumped over to other languages.

Still, I loved this Markdown-based blog, and didn’t think it was time to move to a newer version (Octpress 3.0) or another tool (I’d heard good things about Hugo). I simply didn’t have the time to upgrade or port, nor did I feel the need to: it may be using old versions of things, but at the end of the day, it was generating and deploying simple static HTML files that get served. Finally, this year, I decided to take a stab at containerising it so that I could hopefully easily keep using it for years to come (and lose another excuse to not write..).

Overview

References vs what’s new
Target audience
Dockerfile
Rakefile
Docker
- With Rancher Desktop
- Makefile to work with all of the above
Conclusion

References vs what’s new

I didn’t come up with everything from scratch and followed in the footsteps of those who already did most of the heavy lifting.

I just updated some things here and there:

Ubuntu LTS 22.04
Rancher Desktop

Target audience

People who have their own Octopress 2.0 blog already and want to continue to use it. You may have already containerised but it are looking to another take on it.
Geeks: Octopress says it’s for “hackers”

`Dockerfile`

To start with the conclusion, here is the complete Dockerfile in the root dir of my Octopress 2.0 project

FROM ubuntu:22.04

# System dependencies, including those required to install Ruby
RUN apt-get update -y && apt-get -y install software-properties-common
RUN add-apt-repository -y ppa:rael-gc/rvm && apt-get update -y && apt-get -y install \
  sudo \
  make \
  git \
  vim less \
  curl \
  build-essential \
  libreadline-dev \
  libssl1.0-dev \
  zlib1g-dev \
  python2.7

# Install gcc-7 for Ruby 2.3 by adding and removing focal source
RUN cp /etc/apt/sources.list /etc/apt/sources.list.bak
RUN echo "deb [arch=amd64] http://archive.ubuntu.com/ubuntu focal main universe" >> /etc/apt/sources.list
RUN apt-get update -y && apt-get -y install g++-7
RUN mv -f /etc/apt/sources.list.bak /etc/apt/sources.list
RUN apt-get update -y

# Install ruby-build & ruby
WORKDIR ~/
RUN curl -L https://github.com/rbenv/ruby-build/archive/refs/tags/v20231225.tar.gz > ruby-build.tar.gz
RUN tar -xzf ruby-build.tar.gz 
RUN PREFIX=/usr/local ./ruby-build-*/install.sh
RUN CC=/usr/bin/gcc-7 ruby-build 2.3.3 /usr/local

# Add blogger user so we don't use the root user and use it
RUN adduser --disabled-password --gecos "" blogger && \
echo "blogger ALL=(root) NOPASSWD:ALL" > /etc/sudoers
USER blogger

# Initialise ruby encording
ENV RUBYOPT -EUTF-8

# Directory for the blog files
RUN sudo mkdir /octopress
WORKDIR /octopress

# Set permissions so blogger can install gems
RUN sudo chown -Rv blogger:blogger /octopress
RUN sudo chown -Rv blogger:blogger /usr/local/lib/ruby
RUN sudo chown -Rv blogger:blogger /usr/local/bin

# # Expose port 4000 so we can preview the blog
EXPOSE 4000

# Add the Gemfile and install the gems
ADD Gemfile /octopress/Gemfile
ADD Gemfile.lock /octopress/Gemfile.lock
RUN gem update --system 3.2.3
RUN gem install bundler -v 2.3
RUN bundle install

# Git config file because Octopress pushes using Git but we might
# not want it to use the exact same settings as our host (e.g. signing)
ADD .gitconfig /home/blogger/.gitconfig

I’ll go through the things that are different from the awesome article at Octopress in a Docker Container. Whatever I don’t call out below should be taken as unchanged from that article, so use it as a reference.

Ubuntu 22.04

The Octopress in a Docker Container article uses Ubuntu but at version 16.04. I love Ubuntu and think it’s a great choice for an Octropess dev env, but since it’s Jan 2024, I wanted to use the latest Ubuntu LTS release (required for Ruby 2.3) install, 22.04, instead (knowing full well that the next LTS is slated for release in a few months..). Hence

FROM ubuntu:22.04

That brought interesting challenges, mostly stemming from the fact that the default apt-get install for ruby would be too new for (my) Octopress installation’s dependencies.

Installing Ruby 2.3

There are different ways to install Ruby on a system, but I opted for ruby-build, in particular the standalone install option because it was simple.

# System dependencies, including those required to install Ruby
RUN apt-get update -y && apt-get -y install software-properties-common
RUN add-apt-repository -y ppa:rael-gc/rvm && apt-get update -y && apt-get -y install \
  sudo \
  make \
  git \
  vim less \
  curl \
  build-essential \
  libreadline-dev \
  libssl1.0-dev \
  zlib1g-dev \
  python2.7

# Install gcc-7 for Ruby 2.3 by adding and removing focal source
RUN cp /etc/apt/sources.list /etc/apt/sources.list.bak
RUN echo "deb [arch=amd64] http://archive.ubuntu.com/ubuntu focal main universe" >> /etc/apt/sources.list
RUN apt-get update -y && apt-get -y install g++-7
RUN mv -f /etc/apt/sources.list.bak /etc/apt/sources.list
RUN apt-get update -y

# Install ruby-build & ruby
WORKDIR ~/
RUN curl -L https://github.com/rbenv/ruby-build/archive/refs/tags/v20231225.tar.gz > ruby-build.tar.gz
RUN tar -xzf ruby-build.tar.gz 
RUN PREFIX=/usr/local ./ruby-build-*/install.sh
RUN CC=/usr/bin/gcc-7 ruby-build 2.3.3 /usr/local

The main thing here was installing libssl1.0-dev (I used the RVM PPA), and installing GCC-7 (otherwise I got segfaults using Ruby).

Gemfile.lock file for project dependencies

Since my goal was to get this working with an old Octopress blog and I didn’t want to mess around with version conflicts, I ADDed the Gemfile.lock file as well, before RUNning bundle install

# Add the Gemfile and install the gems
ADD Gemfile /octopress/Gemfile
ADD Gemfile.lock /octopress/Gemfile.lock
RUN gem update --system 3.2.3
RUN gem install bundler -v 2.3
RUN bundle install

Compared with the reference article, we update Rubygems and lock down the bundler version.

Rakefile

This is mentioned in the Octopress in a Docker Container article as well, but I’ll mention it here too: in order to preview the blog, you need to change the Rakefile from

rackupPid = Process.spawn("rackup --port #{server_port}")

rackupPid = Process.spawn("rackup -o 0.0.0.0 --port #{server_port}")

Docker

With Rancher Desktop

Since it’s 2024, I also wanted to try using a Docker Desktop alternative, and chose Rancher Desktop. Overall, the entire experience was really smooth and in my Octopress usage so far, I. haven’t noticed much difference between Rancher Desktop and Docker Desktop, but I’ve only been lightly using docker CLI.

I did notice that the auto-regenerate-based-on-changes feature of rake preview worked better (faster, more reliably) with the VZ emulation mode and virtiofs volume mount type.

Makefile to work with all of the above

I added a Makefile to make it simpler for future me to deal with building the image and working with it

build-image:
	docker build . -t blog/octopress

start-env:
	docker run -p 4000:4000 --rm --volume $$(pwd):/octopress --volume ${HOME}/.ssh:/home/blogger/.ssh -ti blog/octopress /bin/bash

This is entirely optional/subjective but I find make start-env more manageable for starting an Octopress env that has everything mounted properly.

Conclusion

So that’s it: yet another containerised-Octopress-2.0 article, with this entry being the first beachape.com one that was written and published entirely using it.

Structural Typing in Rust

2021-05-25T21:42:00+00:00

Have you ever wanted to write a structurally typed function in Rust? Do you spend a lot of time and effort getting your Rust structs just so, and want to DRY-out data access for common field paths without declaring a new trait and implementing it for each struct (let’s say, Cat and Dog both have a name: String field)? If so, read on.

This post talks about how we can leverage LabelledGeneric to build Path traversers (functionally similar to lenses), and use them to write clean and performant structurally typed functions with all the compile-time safety that you’ve come to expect from Rust.

Re: radio silence

It’s been a while (4 years!) since I last updated this blog. Why?

I started working on the Cloud SWE team at Elastic (we’re hiring!). I’ve been busy leading project teams, implementing features, and writing (and deleting!) lots of Scala code (no Rust though, sadly 😭)
My kid gained sentience: Around the same time, my daughter turned 2, and it’s just been a complete whirlwind of activities, learning, viruses, emotions, etc. It’s been awesome and I wouldn’t trade it for the world, but people are DOWNRIGHT LYING if they claim having kids doesn’t change anything.
2020: Covid was a big one, but the whole year felt like a trainwreck in slow motion … if the train was pulling dumpster fires.

Lastly, I just didn’t have the oomph to write a post that describes transmogrify() to follow up on the post on Struct transforms. Transmogrifier, which allows flexibile recursive transformation between similarly-structured structs, was added over 2.5 years ago, but writing about it was … intimidating.

Still, I recently decided to try to start writing again, so I picked a topic that’s slightly simpler, but related: Path, which introduced zero-overhead structurally-typed functions that you could use with normal structs to stable Rust back in Februrary of 2019 ¹.

Is the post late? Yes. Better than never? I hope so 🙏

Overview

Structural typing, you say?
Show me yours
Quick review of LabelledGeneric
Detour: Plucking by labelled field
PathTraverser
Path, path! and Path!
- Value-level
- Type-level
Another example
Conclusion

Structural typing, you say?

“Structural typing” was thrown around up there ↑, but what do we mean? To quote Wiki:

A structural type system (or property-based type system) is a major class of type system in which type compatibility and equivalence are determined by the type’s actual structure or definition and not by other characteristics such as its name or place of declaration. Structural systems are used to determine if types are equivalent and whether a type is a subtype of another. It contrasts with nominative systems, where comparisons are based on the names of the types or explicit declarations, and duck typing, in which only the part of the structure accessed at runtime is checked for compatibility.

Out-of-the-box-Rust has nominally typed functions ² ³. For the purposes of this post (and frunk), we specifically mean structs and their fields when it comes to “structure”⁴, and not methods that they get from impls of themselves or traits. Why? Well, you can’t spell “structural typing without struct, I’ve been mostly focused on structs, and … simplicity 😂. Also, to my mind, traits already enable a kind of part-way “structural typing” of methods ⁵.

Show me yours

I Read Somewhere ™ that giving a concrete example upfront helps people decide if they want to keep reading (if it aligns with their interests), plus there are lots of movies where the first scene you see is chronologically from the end of the story, followed by a rewinding sound and jump back to the beginning … and Hollywood knows engagement. Anyway, we’ll end up with something that allows us to do write this sort of thing:

/// Function that generically takes any struct `A` that is traversable with `.pet.name`, if
/// doing so returns a `String`
///
/// This is done without declaring any traits specific to this traversal
fn print_pet_name<A, Idx>(obj: A) -> ()
                                 // ↓ dot-separated structural path
    where A: PathTraverser<Path!(pet.name), Idx, TargetValue=String>
                                 // ↑ 🎉
{
    println!(
        "Pet name [{}]",
        path!(pet.name).get(obj)
    );
}

// Pass it any object that has `pet.name`
print_pet_name(dog_person);
print_pet_name(cat_person);
print_pet_name(hamster_person);
print_pet_name(snake_person);
print_pet_name(goldfish_person);
print_pet_name(house);

The objects you pass to the print_pet_name function don’t need to know anything specific to it nor structurally typed functions in general: their struct declarations just need to derive(LabelledGeneric) and have a structure that complies with the function’s type signature (i.e. have a pet.name path that returns a String):

#[derive(LabelledGeneric)]
struct Dog {
    name: String,
    age: u32
}

#[derive(LabelledGeneric)]
struct Cat {
    name: String,
    age: u32
}

// The next two structs can both be traversed with `pet.age`

#[derive(LabelledGeneric)]
struct DogPerson {
  name: String,
  pet: Dog
}

#[derive(LabelledGeneric)]
struct CatPerson {
  name: String,
  pet: Cat
}

// etc etc

let dog = Dog {
    name: "Odie".to_string(),
    age: 32
};

let cat = Cat {
    name: "Garfield".to_string(),
    age: 16
};

let dog_person = DogPerson {
  name: "Jon".to_string(),
  pet: dog
};

let cat_person = CatPerson {
  name: "Jon".to_string(),
  pet: cat
};

That’s it. The API is relatively clean, simple to write, read, and understand (IMO), and there are no unsafe or dyn traits anywhere (even in the implementation). And, you can still declare and treat your structs as you normally would, passing them to nominally typed functions, implementing traits as you normally would etc.

Still, when used with structurally typed functions like print_pet_name, the compiler will as usual ensure that:

The paths accessed on the generic parameter A inside the structurally typed function are constrained by the function’s type signature.
The LabelledGeneric objects passed as arguments to the structurally typed function support the required path in the function’s type signature.

The functions themselves are not constrained to just getting values, they can also set values too (see the other example at the end of the post)

Quick review of `LabelledGeneric`

By adding a #[derive(LabelledGeneric)] attribute to a struct, like so:

#[derive(LabelledGeneric)]
struct Dog {
    name: String,
    age: u32
}

we gain the ability to turn a Dog object into a labelled heterogenous list:

let dog = Dog {
    name: "Odie".to_string(),
    age: 32
};
let as_labelled = <Dog as LabelledGeneric>::into(dog);
let expected_labelled = hlist![
    // in reality the field label is a tuple of type-level chars, but ignore that for now
    field!(name, "Odie".to_string()),
    field!(age, 32)
];
assert_eq!(expected_labelled, as_labelled);

This ability to turn a struct into a heterogenous List of “fields” (type-level labels and values, henceforth “labelled HList”) paves the way for us to go from nominative typing (does this type have the right name?) to structural typing (does this type have a given structure?).

For a more thorough review of HLists and LabelledGeneric, see this post.

Detour: Plucking by labelled field

Given a labelled HList, it would be useful to be able to “pluck” a value out of it by using a type-level field name. That would allow us to have compile-time-checked access of a field in a labelled Hlist by type-level name:

// Following from the above `Dog` example
let (age_field, _): (Field<age, _>, _) = as_labelled.pluck_by_name();
assert_eq!(32, age_field.value);

This is the equivalent of accessing a specific .age field on a Dog struct in the normal Rust Way ™, but we’re doing it our own way on its labelled HList equivalent, using user-declared types and taking advantage of the type system.

The trait would look like this:

pub trait ByNameFieldPlucker<TargetKey, Index> {
    type TargetValue;
    type Remainder;

    /// Returns a pair consisting of the value pointed to by the target key and the remainder.
    fn pluck_by_name(self) -> (Field<TargetKey, Self::TargetValue>, Self::Remainder);
}

The implementation of this “by-name-field” Plucker shares much with the normal Plucker mentioned in the previous post, so instead of re-explaining things like the Index type param, I’ll simply add a link to that section and show the implementation for the exit and recursion implementations here:

/// Implementation when the pluck target key is in the head.
impl<K, V, Tail> ByNameFieldPlucker<K, Here> for HCons<Field<K, V>, Tail> {
    type TargetValue = V;
    type Remainder = Tail;

    #[inline(always)]
    fn pluck_by_name(self) -> (Field<K, Self::TargetValue>, Self::Remainder) {
        let field = field_with_name(self.head.name, self.head.value);
        (field, self.tail)
    }
}

/// Implementation when the pluck target key is in the tail.
impl<Head, Tail, K, TailIndex> ByNameFieldPlucker<K, There<TailIndex>> for HCons<Head, Tail>
where
    Tail: ByNameFieldPlucker<K, TailIndex>,
{
    type TargetValue = <Tail as ByNameFieldPlucker<K, TailIndex>>::TargetValue;
    type Remainder = HCons<Head, <Tail as ByNameFieldPlucker<K, TailIndex>>::Remainder>;

    #[inline(always)]
    fn pluck_by_name(self) -> (Field<K, Self::TargetValue>, Self::Remainder) {
        let (target, tail_remainder) =
            <Tail as ByNameFieldPlucker<K, TailIndex>>::pluck_by_name(self.tail);
        (
            target,
            HCons {
                head: self.head,
                tail: tail_remainder,
            },
        )
    }
}

In truth, it probably makes sense to re-write the ByNameFieldPlucker implementation(s) in terms of Plucker, but this felt somewhat more straightforward when I wrote it at the time for transmogrifying.

`PathTraverser`

ByNameFieldPlucker provides us with a way of accessing a field on single struct, but we want to be able to traverse multiple levels of structs. For instance, given the aformentioned Dog and DogPerson structs, Rust allows us to get the age of his dog by doing dog_person.pet.age, and we’d like to be able to do that structurally. Enter PathTraverser:

pub trait PathTraverser<Path, Indices> {
    type TargetValue;

    /// Returns a pair consisting of the value pointed to by the target key and the remainder.
    fn get(self) -> Self::TargetValue;
}

Instead of Index, its second type param is Indices to reflect the fact that we’re going to need multiple Indexs to “pluck” by field name from. The “exit” (the last, aka no-more-dots, target field name and value type are on the current struct) and “recurse” (the last target field name and value type are in an “inner” struct) implementations of this trait are as follows:

// For the case where we have no more field names to traverse
impl<Name, PluckIndex, Traversable> PathTraverser<Path<HCons<Name, HNil>>, PluckIndex>
    for Traversable
where
    Traversable: IntoLabelledGeneric,
    <Traversable as IntoLabelledGeneric>::Repr: ByNameFieldPlucker<Name, PluckIndex>,
{
    type TargetValue = <<Traversable as IntoLabelledGeneric>::Repr as ByNameFieldPlucker<
        Name,
        PluckIndex,
    >>::TargetValue;

    #[inline(always)]
    fn get(self) -> Self::TargetValue {
        self.into().pluck_by_name().0.value
    }
}

// For the case where a path nests another path (e.g. nested traverse)
impl<HeadName, TailNames, HeadPluckIndex, TailPluckIndices, Traversable>
    PathTraverser<Path<HCons<HeadName, Path<TailNames>>>, HCons<HeadPluckIndex, TailPluckIndices>>
    for Traversable
where
    Traversable: IntoLabelledGeneric,
    <Traversable as IntoLabelledGeneric>::Repr: ByNameFieldPlucker<HeadName, HeadPluckIndex>,
    <<Traversable as IntoLabelledGeneric>::Repr as ByNameFieldPlucker<HeadName, HeadPluckIndex>>::TargetValue:
        PathTraverser<Path<TailNames>, TailPluckIndices>,
{
    type TargetValue = <<<Traversable as IntoLabelledGeneric>::Repr as ByNameFieldPlucker<HeadName, HeadPluckIndex>>::TargetValue as
    PathTraverser<Path<TailNames>, TailPluckIndices>>::TargetValue ;

    #[inline(always)]
    fn get(self) -> Self::TargetValue {
        self.into().pluck_by_name().0.value.get()
    }
}

That type signature is a bit hairy.

It’s a bit “Inceptiony” to think about what the Indices type param might look like at a given callsite, and for the most part it doesn’t matter for users (we make it the compiler’s job to fill it in or error out trying), but for the purposes of trying to understand what’s going on, it’s reasonable to imagine this as the Indices for structurally accessing dog_person.pet.age:

HList![
  <There<Here>>, // First access is `.pet`, which is the 2nd field on `DogUser`
  <There<Here>>, // First access is `.age`, which is the 2nd field on `Dog`
]

`Path`, `path!` and `Path!`

The last piece we need is something that allows us to describe a path (e.g. pet.age). Since the path is going to be itself a type-level thing (reminder: we pluck values by type-level field name), we can model this as a newtype wrapper around the zero-sized PhantomData type

pub struct Path<T>(PhantomData<T>);

impl<T> Path<T> {
    /// Creates a new Path
    pub fn new() -> Path<T> {
        Path(PhantomData)
    }

    /// Gets something using the current path
    pub fn get<V, I, O>(&self, o: O) -> V
    where
        O: PathTraverser<Self, I, TargetValue = V>,
    {
        o.get()
    }
}

Paths basically works like “lens”, only without the target type locked down (maybe that will be a future type in frunk…), enabling this sort of thing:

let p = path!(pet.name);

let dog_age: &u32 = p.get(&dog_person);
let cat_age: &u32 = p.get(&cat_person);

That’s all fine and good. From here on though, things get a bit tricky because we need to create friendly ways to declare Paths, and T needs to be a type level path, one that needs to be easy to use and compatible with the way LabelledGeneric encodes field names into type-level strings. Rubber, meet road.

To make declaring value and type level Paths easy to use, we’ll need to make use of procedural macros because they allow us to take user-defined expressions and turn them into type-level paths made of type-level field names, and doing so with declarative macros is extremely difficult (I gave it a stab) if not impossible.

A core function that is reused for generating value-level and type-value Paths is:

fn build_path_type(path_expr: Expr) -> impl ToTokens {
    let idents = find_idents_in_expr(path_expr);
    idents
        .iter()
        .map(|i| build_label_type(i))
        .fold(quote!(::frunk_core::hlist::HNil), |acc, t| {
            quote! {
            ::frunk_core::path::Path<
                ::frunk_core::hlist::HCons<
                   #t,
                   #acc
                >
              >
            }
        })
}

Where find_idents_in_expr is a function turns a path expression like pet.age into a vector of Ident identifiers.

We then pass those through to the build_label_type function, which translates each Ident into a type-level name. This is also re-used by LabelledGeneric’s derivation macro, which is important because it ensures that the way field names are encoded as types for Paths is compatible with the way field names are encoded as types in LabelledGeneric-produced labelled HLists.

Value-level

The macro for creating a Path value simply instantiates a Path using Path::new(), but with a type ascription based on what gets returned from build_path_type.

pub fn path(input: TokenStream) -> TokenStream {
    let expr = parse_macro_input!(input as Expr);
    let path_type = build_path_type(expr);
    let ast = quote! {
        {
            let p: #path_type = ::frunk_core::path::Path::new();
            p
        }
    };
    //    println!("ast: [{}]", ast);
    TokenStream::from(ast)
}

Type-level

The macro for creating a Path type simply splices the type returned from build_path_type.

pub fn Path(input: TokenStream) -> TokenStream {
    let expr = parse_macro_input!(input as Expr);
    let path_type = build_path_type(expr);
    let ast = quote! {
        #path_type
    };
    //    println!("ast: [{}]", ast);
    TokenStream::from(ast)
}

Another example

Getting and setting ids of from structs, without declaring a GetId or SetId trait and implementing it for each type:

#[derive(LabelledGeneric)]
struct User {
    id: String,
    is_admin: bool,
}

#[derive(LabelledGeneric)]
struct Book {
    id: String,
    title: String,
}

#[derive(LabelledGeneric)]
struct Store {
    id: String,
    address: String,
}

// Object references passed to this function just need to have an `id: String`
// in their struct defintion.
fn get_id<'a, A, Idx>(obj: &'a A) -> &'a str
where
    &'a A: PathTraverser<Path!(id), Idx, TargetValue = &'a String>,
{
    path!(id).get(obj).as_str()
}

// DRYed-out setter
fn set_id<'a, A, Idx>(obj: &'a mut A, set_to: String) -> ()
where
    &'a mut A: PathTraverser<Path!(id), Idx, TargetValue = &'a mut String>,
{
    *path!(id).get(obj) = set_to;
}


let mut user = User {
    id: "user_id".to_string(),
    is_admin: true,
};
let mut book = Book {
    id: "book_id".to_string(),
    title: "Tale of Three structs".to_string(),
};
let mut store = Store {
    id: "store_id".to_string(),
    address: "Sesame street".to_string(),
};

println!("{}", get_id(&user));
println!("{}", get_id(&book));
println!("{}", get_id(&store));

// ↑ Prints:
//
// user_id
// book_id
// store_id

set_id(&mut user, "new_user_id".to_string());
set_id(&mut book, "new_book_id".to_string());
set_id(&mut store, "new_store_id".to_string());

// Print again
println!("{}", get_id(&user));
println!("{}", get_id(&book));
println!("{}", get_id(&store));

// ↑ Prints:
//
// new_user_id
// new_book_id
// new_store_id

Conclusion

The PathTraverser trait and Path type build on LabelledGeneric and HList as core abstractions, which is nice because we get some more mileage out of them, and it means that there are no additional traits that you need to import nor implement (even as a macro).

As usual, it’s compile-time checked, but it’s also performant. In benchmarks, tests comparing lens_path* (structurally typed traversal) versus normal_path* (Rust lang built-in traversal) traversals show that they perform the same: in other words, using structural typing in this way adds zero overhead.

As usual, please give it a spin and chime in with any questions, corrections, and suggestions !

Footnotes

Technically, everything for writing basic structurally typed functions minus support for jumping through .-separated fields was available in frunk since October of 2018 at the latest because ByNamePlucker was available already by then.↩
In Rust, macros can and have been used to approximate structural typing (macro arguments aren’t typed, so you can just do something like $x.access.some.path and have the compiler expand and fail it if an object at the callsite doesn’t have that path). This is fine too, but macros can be hard to read and maintain (they have no type signature, so you’ll need to look in the implementation/docs to know what it expects), and they aren’t functions; they’re code that write code. Again, The Macro Way is Fine ™; this post just offers an alternative.↩
Rust did at one point have built-in support structural records, but it was removed almost 9 years ago before 1.0 was released. I found an answer to a question on the internal Rust lang forum asking why, and the 3 reasons listed for removal at the time made sense; the Path implementation described here (and implemented in frunk) addresses 1, if not 2, of the 3 issues (field order requirement and recursion IIUC), leaving the issue of field visibility, which I believe can probably be addressed as an option to the LabelledGeneric derive.↩
There are some who would call this “row polymorphism”, which is maybe (more) correct, but it’s also a term that is much more niche (pronounced: “less generally known” or “less searched for”). Indeed, depending on whom you ask, “row polymorphism” is regarded as being under the “structural typing” umbrella (1, 2), but in any case, I personally find the distinction to be of questionable value in the context of Rust 🤷‍♂️. Having said that, feel free to substitute “row polymorphism” in place of “structural typing” when reading this post if it helps you slog through the actual important bits :)↩
traits can be adhoc and auto-implemented, and directly used as constraints in functions (though still nominally), so being structurally-typed on traits feels a bit less of a problem that needs solving, and I get the feeling that it will be even less so with things like specialization coming down the pipeline, which will allow for more blanket and overlapping impls.↩

Rust: A Scala Engineer's Perspective

2017-05-24T13:45:00+00:00

The 1st year anniversary of my first line of Rust code is coming up, and it’s getting for 5 years since I wrote my first line of Scala code. I thought it would be a good idea to summarise my Scala-tinted perspective of The Rust Experience ^TM, one year on.

Rusty spiral staircase by Jano De Cesare

This is not an objective language vs language comparison. I’ve written this post as part experience dump, part waymark for other Scala devs who are exploring or thinking of exploring Rust.

A bit about me

I’ve written a few Rust libraries/tools as well as Scala ones. For all intents and purposes, I’m a Scala engineer: I get paid to do it and it’s by far my strongest language. I’ve used Rust in a few of my side projects (libraries and smaller utilities).

On the Scala side, I’m the author of enumeratum, which brings flexible enums and value-enums to Scala as a library. I’ve also dabbled in writing macro-based libraries to make things like Free Monads and Tagless Final nicer to use.

On the Rust side, I’ve written frunk, a Rust functional programming toolbelt that is roughly a port of Shapeless with a bit of cats/scalaz mixed in, which does some pretty funky things with the type system that I’ve blogged about (1, 2, 3, 4). I also wrote a Rust port of requestb.in called rusqbin based on Hyper, and a small WIP async client for Microsoft Cognitive services called cogs.

Forewarning

I’m biased towards Scala and I’ve mostly gotten used to Scala’s warts. That said, I make an effort to try to be as neutral as possible.
When I talk about Rust, I mean Rust stable. This is because I only use Scala stable.
Some of the stuff that I write about with regards to Rust might have changed by the time you read this. After all, there is an ongoing ergonomics initiative

Overview

Things I’m happy with
Things I’ve adjusted to
- Semicolons
- Ownership model: Stack vs heap
Things I wish were different
Conclusion

Things I’m happy with

Batteries included

The dev-environment-setup experience with Rust is amazing. The Rust community has striven to make it super easy to get started with Rust and it shows. Literally one shell command will set everything you need up.

rustup for managing your Rust toolbelts (different versions/channels of Rust)
cargo for managing your build and for publishing to crates.io, which includes, among other things:
- A test subcommand for running tests
- A bench subcommand for running benchmarks
rustfmt for formatting your code (runs on cargo projects via cargo fmt)
rustdoc for generating beautiful documentation websites.
- This tool supports doc tests with zero additional configuration/setup (runs as part of cargo test)

Coming from Scala, having all of this set up with no fuss right out of the gate is a breath of fresh air and feels like a big win for productivity. I know there are reasons for Scala’s more modular approach, but I think it would be nice if some of this rubbed off on ~~Scala~~ other languages.

Editor/IDE

When I first started with Rust, I used IntelliJ and its Rust plugin, but later switched to Microsoft Studio Code with the Rust plugin, which interfaces very well with Rust Language Server (installable as a rustup toolchain component). It feels very lightweight, and offers all the assistance I need.

Type System

If you lean more towards the functional programming paradigm side of Scala then you’ll probably love the following about Rust’s type system:

No inheritance for data types (there is a bottom type but it’s used much more sparingly)
No universal equality
No nulls
Traits are basically Haskell typeclasses
Many more primary types (instead of just Int, there are i8, i16, i32, i64, isize, as well as u8, u16 … )

Essentially Rust has a lot of the good things about Scala’s type system. One thing currently missing from Rust is first class support for higher-kinded types (HKT), which, to be honest, I don’t miss too much because:

There are ways to emulate it to an extent
Rust’s ownership/memory model tends to push you towards thinking more granularly about your values/references, something which is perhaps in conflict with the kind of programming typically involving HKT-based abstractions.

If this still sounds unacceptable, just know that you can get quite far in building reuseable abstractions using Rust’s traits + associated types, and BurnSushi’s port of quickcheck is available for writing and enforcing laws.

There are a few interesting things in the pipeline as well:

Adding functionality by using Rust’s traits should be familiar territory if you’ve written typeclass-like stuff in Scala. In fact, Rust’s trait system feels a lot more similar to Haskell’s typeclass system than Scala’s, something which has its pros and cons (no scoping of implementations for a given type, for example). I’ve written an intro/guide to Rust’s trait system in another post.

Type inference

Both Rust and Scala have local type inference, and overall, they work in pretty much the same way. In both of them, you need to write the types for your function parameters. In Scala, you can leave the return type off and have the compiler infer it for you, in Rust you can’t (if you leave it off, it is assumed to be (), unit).

Macros

The Rust macro system, while less powerful than Scala’s, is quite useful for keeping your code DRY and importantly, integrates really well with the rest of the language. It is in fact enabled and available out of the box without any additional dependencies/flags.

Compared with Scala’s macros, Rust’s macros feel like a very natural part of the language, and you’ll run into them quite often when reading/using Rust libraries. In Rust code bases, you’ll often see macros declared and used immediately for the purpose of code generation (e.g. deriving trait implementations for a list of numeric types, or for tuples up to N elements), something that Scala users have generally done “out-of-band” by hooking into SBT and using another templating or AST-based tool.

On the other hand, in Scala, the usual refrain is “don’t write macros if you don’t have to”. When I compare the approaches the two languages have taken, I feel that Scala may have been overambitious in terms of giving developers power, thus leading to deprecations of APIs that can’t be maintained due to complexity. Indeed, Scala’s metaprogramming toolkit is going through another reform with the migration to Scalameta.

Because of its simplicity (the macros work based on a series of patterns), Rust’s macro API may feel limiting at first, but if you stick with it, you’ll likely find that you can accomplish more than what you initially thought. For example, the fact that you can build/restructure macro arguments recursively (!) and call the macro again (or even call another macro) is a fairly powerful tool.

Having said that, in addition to the legacy macro system, Rust will soon be getting procedural macros, which are more similar to what Scala devs are used to seeing. You can get a peek of what procedural macros are like by looking at custom derives, which I’ve used to implement derive for LabelledGeneric in Rust.

Compile-time optimisations

I think it’s not news to anyone that Rust is fast and efficient. The home page of the official site says it runs “blazingly fast” and features “zero-cost abstractions”, and the Rust-faithfuls loudly trumpted Rust’s defeat of GCC-C in in k-nucleotide a few months ago. Even if you don’t completely buy into the “faster than C” part, it’s not a big jump to say that Rust performance is in the same ballpark as C, or at least, there is no reason for it not to be (yes, language and implementation are different, compilers make a difference, etc.).

I’m particularly impressed by the Rust compiler’s (though I’m not sure if it’s LLVM?) ability to compile abstractions away so that the operations they enable have zero overhead. As a personal anecdote, when I wrote LabelledGeneric in frunk, I expected there to be some performance difference between using that abstraction for conversions between structs versus writing the conversions by hand (using From). After all, there are non-negligible differences in the Shapeless version of it in Scala land (benchmark code):

// JMH benchmark results

[info] Benchmark                               Mode  Cnt     Score     Error  Units
[info] Benchmarks.from24FieldsManual           avgt   30    33.626 ±   1.032  ns/op
[info] Benchmarks.from24FieldsShapeless        avgt   30  4443.018 ± 101.612  ns/op
[info] Benchmarks.from25FieldsManual           avgt   30    33.066 ±   0.650  ns/op
[info] Benchmarks.from25FieldsShapeless        avgt   30  4859.432 ± 104.763  ns/op

To my surprise, Rust manages to compile frunk’s LabelledGeneric-based, non-trivial, multi-step, unoptimised (other than using the stack, no effort was spent) transform between structs into a zero-cost abstraction. That is, using LabelledGeneric for conversion adds zero overhead over writing the transform by hand (benchmark code):

// Cargo benchmark results

test from_24fields_manual           ... bench:         109 ns/iter (+/- 49)
test from_24fields_labelledgeneric  ... bench:         104 ns/iter (+/- 24)
test from_25fields_manual           ... bench:         129 ns/iter (+/- 9)
test from_25fields_labelledgeneric  ... bench:         131 ns/iter (+/- 13)

Note: The Rust vs Scala LabelledGeneric benchmarks are not completely apples-to-apples (the Rust version needs to instantiate new source objects every run because of move semantics), but they illustrate the performance difference between LabelledGeneric-based vs handwritten conversion in the two languages.

Syntax

Overall, the Rust’s syntax is very similar to Scala’s. Sure, there are small adjustments here and there (let and let mut vs var and val, you’ll be using angle brackets instead of square ones, etc), but overall the languages feel very similar because they’re both C-like languages that are heavily inspired by ML.

Scala people will probably rejoice at things like the enum being available (coming soon to Scala via Dotty) as well as partial destructuring (e.g. assuming struct Point { x: i32, y: 32}, you can do let Point { x, .. } = p;).

There are a handful of things that you’ll miss just from muscle memory in the beginning, but are either implemented as libraries or are done slightly differently, such as lazy values (rust-lazy or lazy-static) and methods such as Option’s foreach (try if let Some(x) = myOption { /* use x here */ } instead). Others are just plain missing, such as by-name parameters (not too big of a deal for me), for/do comprehensions, and keyword arguments (these last two hurt).

Oh, in Rust, types and traits are named the same way as in Scala, in CamelCase, but identifiers (bindings and methods) use snake_case, which I still find makes code look longer but isn’t a big problem. You’ll find references that can help if you are unsure and you’ll likely pick it up from reading library code anyways.

As with Swift, I haven’t been able to find conclusive evidence nor credit given to suggest that there was any influence from Scala on Rust …

Interoperability with C

Rust makes working with C as smooth as possible while sticking to its mantra of keeping things safe. For reference take a look at the section in the Rust book that deals with FFI.

// Taken from the Rust book
#[link(name = "snappy")]
extern {
    fn snappy_max_compressed_length(source_length: size_t) -> size_t;
}

let x = unsafe { snappy_max_compressed_length(100) };

The syntax might look familiar to those who have played around with Scala.Native.

// Taken from Scala Native homepage
@extern object stdlib {
  def malloc(size: CSize): Ptr[Byte] = extern
}

val ptr = stdlib.malloc(32)

Since calling C-code can be unsafe (wrt memory, thread-safety), Rust requires you to wrap your C-calls in unsafe. If you wish to hide this from your users, you can wrap these calls in another function.

// Taken from the Rust book
pub fn validate_compressed_buffer(src: &[u8]) -> bool {
    unsafe {
        snappy_validate_compressed_buffer(src.as_ptr(), src.len() as size_t) == 0
    }
}

Calling Rust code from C is also very smooth, something that Scala Native has yet to implement.

The current Zeitgeist

The current “feel” of Rust, and its community (or communities, since libraries/frameworks can have their own) is very welcoming and helpful. It’s also very difficult to quantify so I’ll just list some observations:

Rust stable is only 2 years old and yet there is an official ergonomics initiative to reduce friction
I’ve asked a hand full of questions on StackOverflow and have gotten prompt and helpful answers each time.
Rust is the #1 “most loved” language in StackOverflow’s 2017 survey
Rust feels very community driven: its got a very lively RFC repo and since I’ve started tinkering in it I’ve seen at least 3 RFCs make it into the language (type macros, custom derives, and ? syntax for Trys).

Things I’ve adjusted to

Semicolons

In Scala, semicolons are optional and almost everything is an expression and therefore return values.

3 // returns 3

val x = 3 // assignment, returns unit

// certain things don't return anything though, such as import
// statements, and blocks

import com.beachape._ // returns nothing
object Hello {} // returns nothing

In Rust, semicolons are non-optional and are of significance. Statements that end with semicolons return () (unit) and those that do not get turned into expressions and thus return a value.

// taken from the Rust book

let x = 5u32; // this is a statement

let y = {
    let x_squared = x * x;
    let x_cube = x_squared * x;

    // This expression will be assigned to `y`
    x_cube + x_squared + x
};

let z = {
    // The semicolon suppresses this expression and `()` is assigned to `z`
    2 * x;
};

Ownership model: Stack vs heap

Rust’s memory/ownership model is, to me, its main killer feature; it gives you tighter control over the way your program consumes memory while maintaining memory-safety, all without having to ship a garbage collector with the runtime. You get to decide whether to pass things by value or by reference as well as mutability of bindings (including when pattern matching).

There is also the matter of where things get allocated. In Scala (and perhaps with most JVM-based languages), there are a set of rules that decide whether or not something gets put on the stack or on the heap (and thus incur the future cost of garbage collection). In general, the only time something gets allocated on the stack are primitives that do not escape methods as fields of objects, and references to objects which themselves get allocated on the heap. There might be fun runtime tricks the runtime environment does, like escape analysis, but overall, you don’t get to choose.

In Rust, you can choose to allocate things on the heap by instantiating them inside (or transferring ownership of them to) data structures such as Boxes or Vecs, etc. Or you can choose to work with plain values. You get to pick your abstraction based on the cost you want to pay for the features and guarantees they offer, such as safe multi-thread access (this page is a great reference point). Either way, Rust’s ownership system will, at compile time, make sure that you won’t get data races caused by, for instance, modifying naked values in different threads with no access control.

Scala’s doesn’t give its users the same level of control, so naturally there is some adjustment to be made. However, contrary to the experiences of some others, I didn’t find the ownership stuff too hard to understand and get used to. Having experience with Scala’s rich type system meant that the lifetime annotation stuff was quite easy to come to grips with. Maybe doing C and C++ in Comsci courses in university helped too.

Note: If you’re a glass-half-full kind of person, I guess you can say that Rust forces you to take control rather than gives you control. It’s all a matter of perspective …
Note 2: If you find yourself doing lots of .clone()s to get the compiler off your back, maybe you’re doing something not quite right.

Mutability

Mutability deserves to be mentioned separately. If you’re coming from years of Scala (or pretty much any other language that stresses immutability and referential transparency as the road to enlightenment), writing your first let mut or &mut self can feel dirty.

It took me a while to get used to the idea, but hey, when in Rome, right? If it helps, remember that Rust is focused on speed and efficiency through (near, or actually) zero-cost abstractions and that, thanks to its strict ownership model, data races due to mutability are not a problem.

Things I wish were different

Async IO

In Scala, most frameworks that deal with any sort of IO have embraced non-blocking IO by utilising some kind of wrapper data type, such as Future[A], Task[A], or IO[A] (usually a Monad), that separates the description of your program from its execution, and identify, by type, the effect of talking with the scary and dirty outside world. This allows you to not block the executing thread when waiting for stuff to happen (such as data to come back) by choosing a suitable execution strategy.

In Rust land, most of the widely-used libraries that I’ve seen, such as the Redis client, and and Hyper (and all the various things built on it, such as Rusoto, Rocket, etc) are all blocking. While this works okay for stuff like single-user utilities, this is suboptimal for applications that are IO heavy and need to serve a large number of concurrent users because your application’s threads can get tied up just waiting for data, leaving it unable to serve other requests. Or, you end up with potentially huge thread pools (à la old school Java Servlet apps..), which seems to go against Rust’s spirit of efficiency.

Having said that I know that advances are being made in this area:

Tokio, “tokenised IO”, an async IO framework that exposes a Future-based API is making lots of progress. Looks production-ready.
Hyper, the defacto HTTP client server framework, is going to hit 0.11 soon, which will bring with it a Futures-based API based on Tokio. This will likely (I hope) cascade down to any libs based on Hyper.

Also, as of now, it’s painful to transform and return Futures from functions because every transformation causes the concrete type of your object to get chained and tagged with an arbitrary closure type. Since writing the result type is non-optional in Rust, the current solution is to declare your return type as Box>, but it’s less efficient at runtime because boxed trait objects necessitate dynamic dispatch and heap allocation. Hopefully soon “impl Trait” will be released to address this issue (tracking RFC)

Strings

In Rust there are a number of ways to represent Strings. Here are a few:

String runtime string value, with its contents allocated on the heap
&'a str string with a lifetime
- &' static str string with static lifetime (baked into your binary)
Vec

While I’ve mostly gotten used to this by now and understand the purpose of having each one, I hope the ergonomics initiative can make this situation better to understand, since strings are so ubiquitous. How? I have no idea..maybe I’m just ranting.

Cross compiling

Obviously, Scala devs are used to compiling once and running the same binaries everywhere thanks to the JVM (mostly :p). While I don’t expect the same for Rust because it compiles to native machine code, I do wish the cross-compilation tools were better out of the box (for example, like it is in Golang).

At the moment, depending on the target platform, cross-compilation for Rust is a bit involved and there are several options:

Adding a target toolchain via Rustup and possibly installing some more packages specifically for your target platform (as in this guide)
Using a pre-built Docker container that holds all the settings/environment variables/installations needed to compile to your target platform (see rust-on-raspberry-docker)
Using the cross, cargo tool that seems like it automates 2.

My use case is building for my Raspberry Pi and I’ve only tried the first 2, but that last one looks to be the winner here and it would be awesome to see something like that included by default as part of rustup or cargo.

Odd headscratchers

Just a few things I still don’t quite get:

Do we actually need `ref`?

In my opinion, ref is unnecessarily confusing. From what I can tell, it’s mostly used for binding pointers during pattern matching

match some_int {
  // Why not Some(& s) => ... ???
  Some(ref s) => println!("{}",s),
  None => unreachable!()
}

`&mut`

When handing out references of something bound with let mut, why do i need to do &mut instead of just & ?

// This uses mut for no reason other than to prove a point.
fn non_empty(s: &mut String) -> bool { s.len() > 0 }

let mut string = "hello".to_string();
hello(&mut string); // why can't this just be hello(& string) ??

Scoping of lifetimes with braces

I somehow managed to code my way into a deadlock when using RWLock because the lifetime-scoping behaviour of {} braces when used with pattern matching is, in my opinion, non-intuitive. If you’re interested, more about it in this issue.

Gimme

I know these things are in the pipeline but I wish they were in Rust yesterday:

Higher kinded types
“Specialisation”, aka finding the most specific implementation of a traits according to the type of value at the call site. Right now, if you implement a Rust trait for A, then it clashes with every other implementation you write. Specialisation should remedy that (tracking RFC)
A REPL. There’s Rusti but I think Rust is missing a trick by not supplying one out-of-the-box, especially when it’s got such a strong dev-env-setup game.
Some kind of do or for comprehension for working with container types (there are libs out there but built-in would be nice)

Conclusion

That concludes my take on what it’s like to use Rust, from a Scala dev’s perspective, one year on, in 2017. Overall I’m very happy that the me a year ago decided to look into Rust. It’s been a fun and exciting ride: for a while it felt like every few months I was getting new toys that I could immediately use: type macros and custom derives were game changers because they made it ergonomic to write Hlist types by hand, and made Generic/LabelledGeneric practical, respectively.

Overall, I believe there are a lot of things in Rust for Scala engineers to like. The community is friendly and diverse so you can easily find a library that interests you to get involved in (shameless plug: contributions to frunk are always welcome). Or, you can do your own side project and write a small system utility or program a microcontroller; online resources are very easy to find. In any case, with Rust, you really can’t say it’s hard to get started !

Boilerplate-free Struct Transforms in Rust.

2017-04-12T12:45:00+00:00

The last several posts have introduced a number of abstractions, namely HList, Generic, LabelledGeneric, as well as pluck() and sculpt(). Although each of those have impressive party tricks of their own, I’d like to share how you can use them to write a reuseable, generic function that handles converting between structs with mis-matched fields and thus have different LabelledGeneric representations.

Unlike the last post, this one will be relatively light on recursion and mind-bending type-level stuff; it’s time to sit back and enjoy the fruits of our labour.

Adding Frunk to your project

Much of this post will make use of Frunk’s types (e.g. HCons, HNil), methods, macros (esp. for describing macro types via the Hlist! type macro), and terminology.

It might be easier to follow along if you add Frunk to your project and play around with it. Frunk is published to Crates.io, so to add it your list of dependencies, simply put this in your Cargo.toml:

[dependencies]
frunk = "${latest_version}"

Alternatively, take a look at the published Rustdocs.

Boilerplate-free conversions between Structs

Suppose we have a bunch of structs that are similar-ish in terms of their data but ultimately, not necessarily exactly the same. This means we can’t just use the normal LabelledGeneric convert_from method to convert between them.

#[derive(LabelledGeneric)]
struct UserFromDb<'a> {
    id: u64,
    first_name: &'a str,
    last_name: &'a str,
    email: &'a str,
    age: u32,
    pw_hash: &'a str,
    is_admin: bool,
    created_at: i64
}

// Holds User data for rendering in a front-end view
// or for sending over an API. Striped of any sensitive
// information
#[derive(LabelledGeneric)]
struct PresentableUser<'a> {
    last_name: &'a str,
    first_name: &'a str,
    age: u32,
    created_at: i64
}

// Holds data for sending a User over our internal API
#[derive(LabelledGeneric)]
struct InternalApiUser<'a> {
    id: u64,
    first_name: &'a str,
    last_name: &'a str,
    age: u32,
    email: &'a str,
    is_admin: bool,
    created_at: i64
}

In our example, PresentableUser and InternalApiUser structs have fields that are subsets of the fields in UserFromDb, and not in the same order either. The scenario is that UserFromDb is a struct that we get from reading our persistence layer, and the other 2 are types that we use in our application for business logic.

Assuming a flow where we want to be able to go from UserFromDb to either PresentableUser or InternalApiUser, the idea is that we don’t want be holding on to sensitive data like pw_hash when we don’t need to, thus lowering the risk of accidentally leaking said data (e.g. serialising it by accident, or by rendering it in debug messages, etc).

While we could go about writing Froms by hand for each of these, and for every other time a similar situation arises, that’s quite a lot of boilerplate to write and maintain. Thankfully, we can make use of Frunk’s LabelledGeneric and Sculptor to write a single, reuseable generic function.

Note, for a review of:

LabelledGeneric, see this post
Sculptor, see this post

/// Converts from another type A into B assuming that A and B have labelled generic representations
/// that can be sculpted into each other.
///
/// Note that this method tosses away the "remainder" of the sculpted representation.
fn transform_from<A, B, Indices>(a: A) -> B
    where A: LabelledGeneric,
          B: LabelledGeneric,
// The labelled representation of A must be sculpt-able into the labelled representation of Self
          <A as LabelledGeneric>::Repr: Sculptor<<B as LabelledGeneric>::Repr, Indices> {
    // Turn A into its labelled generic representation
    let a_gen = <A as LabelledGeneric>::into(a);
    // Sculpt the generic labelled representation of A into the labelled generic representation
    // of B. We ignore the remainder.
    let (b_gen, _): (<B as LabelledGeneric>::Repr, _) = a_gen.sculpt();
    // Turn the lablled generic representation of B into B
    <B as LabelledGeneric>::from(b_gen)
}

Not bad. The body of the function is literally 3 lines long :) Now we can do this:

let u_db = UserFromDb {
    id: 3,
    first_name: "Joe",
    last_name: "Blow",
    email: "joe@gmail.com",
    age: 30,
    pw_hash: "asd35235adsf",
    is_admin: true,
    created_at: 12345,
};

let p_user: PresentableUser = transform_from(udb);
// or
let i_user: InternalApiUser = transform_from(udb);

In actuality, Frunk already ships with this function so you can use it out of the box.

Conclusion

Often times, you’ll hear that heterogeneous lists enable developers to write reuseable generic functions because they abstract over arity and types, and it might not be obvious exactly what that means on a practical level. The example shown in this post just scratches the surface of what is made possible through HList and LabelledGeneric, and there are definitely more creative usages out there, such as building of boilerplate-free (e.g. JSON) codecs (hint: look to Haskell and Scala libs for more).

As usual, please give it a spin and chime in with any questions, corrections, and suggestions !

Gentle Intro to Type-level Recursion in Rust: From Zero to HList Sculpting.

2017-03-12T12:03:00+00:00

Getting the type signature right was 99% of the work in implementing pluck and sculpt for HLists in Frunk.

Here’s what I’ve learnt along the way: what works, and what doesn’t work (and why).

As you may already know, Rust eschews the now-mainstream object-oriented model of programming (e.g. in Java, where behaviour for a type is added to the type/interface definition directly) in favour of a typeclass-like approach (e.g. in Haskell where you can ad-hoc add behaviour to a type separate from the type definition itself). Both approaches have their merits, and indeed, some languages, such as Scala, allow for a mix of both.

For those coming from the OOP school of programming, Rust’s system of adding behaviour to types might be daunting to come to grips with. At a glance, it might not be obvious how to get things done, especially when what you want to build goes beyond implementing Debug or Eq. If your abstraction has a certain degree of type-level recursiveness, it might be even harder to see the light at the end of the tunnel, and the lack of online material covering that sort of thing doesn’t help.

As a Scala guy with Haskell knowledge, I’m no stranger to typeclasses, but it took me a while and several failed attempts to figure out how to implement the following:

Plucking out a value by type from an HList and getting back the remainder **
Sculpting an HList into another shape, and getting back the remainder (in the case where we only want a smaller subset than the original) **

Of course, the type signature of the finished product can be intimidating !

In this post, I’ll briefly introduce Rust’s trait system and present my mental model for writing trait implementations that deal with type-level recursion. To do so, I will go through how pluck() and sculpt() were written in Frunk, as well as recount some of my failed approaches so you can learn from my mistakes.

Hopefully, by the end of it, you’ll be able to look at signatures like the one above and not go “WTF”, but rather, “FTW”.

“Type-level recursion”?

Ok, I may be butchering/making up a term, but by “type-level recursion”, I’m referring to recursive expansions/evaluations of types at compile-time, particularly for the purpose of proving that a certain typeclass instance exists at a function call site. This is distinct from runtime “value”-level recursion that occurs when you call a function that calls itself.

If you’re having trouble understanding the difference:

Value-level recursion: If it can’t find an exit condition, your program is stuck running forever.
Type-level recursion: If it can’t expand/find the exit-type, your compiler will either give up or never finish compiling; you won’t even have a program to run.

Outline

Basic Gist of Rust typeclasses (traits)
- Dependent trait implementations
- Mental model for type-level recursion
  - Recursion on the value level
  - Recursion on the type level
What the Frunk?
Plucking from HLists
Sculpting HLists
Conclusion
- Credit

Basic Gist of Rust typeclasses (traits)

In Rust, typeclass is spelt trait, and although that word is somewhat ambiguous and overloaded with different meanings depending on context (e.g. in Scala), I’ll try to stick with it throughout this article. Subsequently, a typeclass instance is called an “implementation” (impl in code) in Rust.

Here is a basic example of a simple trait and implementation for a type Circle, taken from the official Rust book.

// A trait that allows you to call "area" on something
trait HasArea {
    fn area(&self) -> f64;
}

// Our type
struct Circle {
    x: f64,
    y: f64,
    radius: f64,
}

// Our implementation of HasArea for Circle
impl HasArea for Circle {
    fn area(&self) -> f64 {
        std::f64::consts::PI * (self.radius * self.radius)
    }
}

For comparison, here is the Haskell equivalent

-- Our typeclass
class HasArea a where
  area :: a -> Float

-- Our type
data Circle = Circle { x :: Float, y :: Float, radius :: Float }

-- Our typeclass instance for Circle
instance HasArea Circle where
  area c = pi * radius c ^ 2

In both of these cases, what we see is

There is a trait, HasArea, which describes behaviour (must implement an area function that takes as its first argument the implementing type) for types that want to belong to, or join it.
Next, we have a type, Circle, which has one purpose: hold data.
Then, we add Circle to the HasArea trait by implementing an instance of the trait, fulfilling the contract by writing the area function.

The key difference between this approach and the OOP approach is that adding behaviour to an existing type does not require us to edit the original type declaration, nor does it require us to create a wrapper type. This allows us to add behaviour to types that do not belong to us (e.g. we don’t have access to its source)! This flexibility is a key advantage of the typeclass/trait approach. For a much more detailed comparison between OOP and typeclasses (traits), checkout this wiki entry on haskell.org.

Dependent trait implementations

Sometimes, you’ll want to write trait implementations for data types that have one or more type parameters. In these cases, your trait implementation will likely require that implementations of the trait exist for each of those type parameters.

For example

// The Add trait, which exists in core::ops, copied verbatim here.
//
// Note that the Add trait has a right hand side (RHS) type parameter
// to represent the type that the implementing trait is being added
// with.
pub trait Add<RHS=Self> {
    /// The resulting type after applying the `+` operator
    #[stable(feature = "rust1", since = "1.0.0")]
    type Output;

    /// The method for the `+` operator
    #[stable(feature = "rust1", since = "1.0.0")]
    fn add(self, rhs: RHS) -> Self::Output;
}

// Our Cup struct. We signal that its contents can be
// anything because it has an unrestricted type parameter
// of A
struct Cup<A> {
    content: A,
}

// In our case, we want to implement Add> because we want to add
// 2 cups with the same content type together, but we don't know in
// advance what kind of content would be in them; hence we keep
// it parameterised with A.
//
// Thus, we write an implementation of Cup for Add, but add a restriction:
// the implementation only exists for Cups where the content is bound to a
// type that is already implements the Add trait (thus "A: Add")
impl<A: Add<A>> Add<Cup<A>> for Cup<A>
{
    // This is what is called an associated type.
    // Here, Output is the type that will be returned
    // from the add operation
    type Output = Cup<< A as Add<A> >::Output>;

    fn add(self, rhs: Cup<A>) -> Self::Output {
        // Here we make use of the Add trait for A to add
        // the contents from both cups together
        let added_content = self.content.add(rhs.content);
        Cup { content: added_content }
    }
}

Making Cup part of the Add typeclass will allow us to call cup_a + cup_b, which is kind of neat. One thing to take note of here is the Output associated type. Pay attention to the fact that in our implementation of Add for Cup, the type of Output is Cup<< A as Add >::Output>, which means that ultimately, the output of Adding of 2 Cups will depend on what the Output of Add is. The < A as Add > part can be read as “summon the Add implementation for the type A” (the compiler will do the actual lookup work here; if one doesn’t exist, your code will fail to compile), and the ::Output following it means “retrieve the associated type, Output, from that implementation”. Let this sink in, because it’s important in order for us to move towards the concept of type-level recursion for traits.

Here is another way to write the same thing: using where clause syntax, so that the restriction goes at the end of the initial type signature in our implementation declaration. This is useful when you have more than 2 or 3 type parameters for your typeclass instance and you have a complex set of restraints. Using where can help cut down on initial noise.

impl<A> Add<Cup<A>> for Cup<A>
    where A: Add<A>
{
    type Output = Cup<<A as Add<A>>::Output>;

    fn add(self, rhs: Cup<A>) -> Self::Output {
        let added_content = self.content.add(rhs.content);
        Cup { content: added_content }
    }
}

Here’s another, more general implementation of Add for Cup. It’s more general because it lets us add Cups of different content types, provided that there exists an Add implementation for whatever concrete type is bound to A in any given Cup.

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
// Instead of just A, we introduce another type parameter, B, which // is passed as the type parameter for the Cup that we want to add with impl<A, B> Add<Cup<B>> for Cup<A> // This next line means "A must have an Add implementation" where A: Add<B> { // The Output associated type now depends on the Output of > type Output = Cup<<A as Add<B>>::Output>; fn add(self, rhs: Cup<B>) -> Self::Output { // Notice that we can use the operator "+" let added_content = self.content + rhs.content; Cup { content: added_content } } }

Mental model for type-level recursion

By this point, we have covered most of the basic understanding required to write more complex traits and implementations. To recap, they are:

The differences between a trait, a type, and a trait implementation

How to use bounds (A: Add or where clauses) when writing implementations for generic types

How to summon an implementation for a given type ()

How to write and use associated types (see Output in the above examples)

For a more thorough introduction to Rust’s trait system, by all means refer to the official Rust docs on traits.

Before going any further, I’d like to provide you with my mental model of how to think about recursion on the type level.

Recursion on the value level

You write a function that keeps calling itself until an exit condition is met, then returns a value.

Recursion on the type level

You write implementations of your trait for exit-types and work-to-be-done types. In order to prove an implementation of your trait exists for a concrete type at a function call site, the compiler will try to lookup and expand/expand types recursively until it can figure out a concrete implementation to use, or gives up with an error.

This may not make much sense at the moment, but hopefully it will soon.

What the Frunk?

Much of this post will make use of Frunk’s types (e.g. HCons, HNil), methods, macros (esp. for describing macro types via the Hlist! type macro), and terminology.

It might be easier to follow along if you add Frunk to your project and play around with it. Frunk is published to Crates.io, so to add it your list of dependencies, simply put this in your Cargo.toml:

1 2
[dependencies] frunk = "${latest_version}"

Alternatively, take a look at the published Rustdocs.

Plucking from HLists

Given an HList, how can we write a function that allows us to pluck out a value by type (if the HList does not contain this type, the compiler should let us know), and also return the rest of the HList?

Suppose we call this function pluck(), it should behave like so:

1 2 3 4 5 6 7 8
// h has type Hlist![ {integer}, &str, f32, bool ] let h = hlist![ 1, "Joe", 42f32, true ]; // We tell it the target type, and let the compiler infer the rest let (target, remainder): (f32, _) = h.pluck(); assert_eq!(target, 42f32); assert_eq!(remainder, hlist![1, "Joe", true]);

Implementation intuition

Our basic logic is fairly simple, given an HList and a Target type:

If the head of the Hlist matches the Target type, return the head of the Hlist and the tail of the Hlist as the remainder in a pair (2 element tuple)

Otherwise,

Store the head in current_head

Call pluck() again on the tail of the current Hlist with the same Target type (i.e. recursively call 1. with the tail), and store the result in (tail_target, tail_remainder) pair.

Return the target plucked from the tail, and prepend current_head to the remainder from the tail. Return both in a tuple like so: (tail_target, HCons { head: current_head, tail: tail_remainder} ).

First attempt

First, let’s assume we’ll be working with a trait; call it Plucker. For now, let’s also assume that it will be parameterised with 1 type, the target type, and will also have an associated type, Remainder. There isn’t really a hard and fast rule for when you should use type parameters vs associated types, but if you’re interested, you can take a look at this Stackoverflow question because Matthieu offers some great advice.

Personally, I always try use an associated type when I need to refer to the type from somewhere else (espescially recursively; more on this later). However, going with a type parameter is useful when you need to have different implementations of a trait for the same type in different circumstances. We saw this with Add, where the right hand side was a type parameter, RHS, allowing you to declare different Add implementations for the same left-hand-side type and letting the compiler find the correct implementation to use at + call sites depending on the type of thing being added with.

1 2 3 4 5 6 7 8
// Our trait trait Plucker<Target> { type Remainder; // Pluck should return the target type and the Remainder in a pair fn pluck(self) -> (Target, Self::Remainder); }

The “exit-type” implementation is for when the current head of the HList contains the target type, so let’s jot that down that:

1 2 3 4 5 6 7 8 9
impl <Target, Tail> Plucker<Target> for HCons<Target, Tail> { // Target is the head element, so the Remainder type is the tail! type Remainder = Tail; fn pluck(self) -> (Target, Self::Remainder) { (self.head, self.tail) } }

Now let’s implement the second piece; the non-trivial part where the target type is not in Head, but in the Tail of our HList. I’ll sometimes refer to this as the “work-to-be-done” type.

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
impl <Target, Head, Tail> Plucker<Target> for HCons<Head, Tail> where Tail: Plucker<Target> // Target is in the tail, so we add the current head type to the remainder // And use the Tail's Plucker's Remainder type as the tail :) type Remainder = HCons<Head, <Tail as Plucker<Target>>::Remainder>; fn pluck(self) -> (Target, Self::Remainder) { let (tail_target, tail_remainder): (Target, <Tail as Plucker<Target>>::Remainder) = self.tail.pluck(); ( tail_target, HCons { head: self.head, tail: tail_remainder} ) } }

Looks good right? But if you send that to the compiler, you’ll be hit with this message:

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
error[E0119]: conflicting implementations of trait `Plucker<_>` for type `frunk_core::hlist::HCons<_, _>`: --> tests/example.rs:306:1 | 296 | impl Plucker for HCons { | _- starting here... 297 | | 298 | | // Target is the head element, so the Remainder type is the tail! 299 | | type Remainder = Tail; 300 | | 301 | | fn pluck(self) -> (Target, Self::Remainder) { 302 | | (self.head, self.tail) 303 | | } 304 | | } | |_- ...ending here: first implementation here 305 | 306 | impl Plucker for HCons | ^ conflicting implementation for `frunk_core::hlist::HCons<_, _>`

What the Rust compiler is helpfully is telling us, is that it can’t distinguish between our two implementations, and if we look closely at the types, that is indeed true:

1 2 3 4 5
// exit (work done) type implementation impl <Target, Tail> Plucker<Target> for HCons<Target, Tail> // work-to-be-done implementation impl <Target, Head, Tail> Plucker<Target> for HCons<Head, Tail>

The Plucker part is exactly the same, and sure, we’ve used Target instead of Head in the for HCons<..> part in the first case, but simply using different type parameters isn’t enough to distinguish between the two.

Furthermore, note that you can’t use the lack of constraints (or where clauses) to distinguish between implementations either. This is because the current lack of an implementation for a given type parameter doesn’t mean that it can’t be added later (see this Stackoverflow questions for more details).

Welp, back to the drawing board.

Second attempt

What we’ve learnt is that we need to have another type parameter in order to distinguish the exit-type and the work-to-be-done-type implementations, so let’s add one to Plucker. Intuitively, we know that we want to have a way to distinguish between “the target is here in the HList” (exit) and “the target is over there in the HList” (recursion), so let’s call our type parameter Index.

1 2 3 4 5 6
// the new and improved Plucker trait trait Plucker<Target, Index> { type Remainder; fn pluck(self) -> (Target, Self::Remainder); }

Then, let’s add a type to identify the index for the exit-type implementation. We’ll use an empty enum because we just want to have a type, and we don’t want it to be available at runtime (ensuring zero runtime cost for our type).

1 2 3 4 5 6 7 8 9 10 11 12
// This will be the type we'll use to denote that the Target is in the Head enum Here {} impl <Target, Tail> Plucker<Target, Here> for HCons<Target, Tail> { // Target type is in the Head, so the Remainder type must be the tail! type Remainder = Tail; fn pluck(self) -> (Target, Self::Remainder) { (self.head, self.tail) } }

What about the work-to-be-done-type? Let’s imagine a scenario where we want to pluck a Target of type MagicType (let’s assume it’s declared as struct MagicType, so a type with a single element in it), and we have the following HLists to pluck() from; what would the Index be?

HNil

Trick question, there is no Index because our target of MagicType isn’t here. The compiler should fail to find an instance/implementation of our trait.

hlist[ MagicType ] (this is syntactic sugar for HCons)

Index would clearly be our Here enum type

hlist![ Foo, MagicType ] (this is syntactic sugar for HCons>)

Index can’t be Here, but we know that in order for the compiler to be satisfied that it can reach our end-type in 1. Here needs to be somewhere inside the type, but we can’t just use it as is, otherwise we’ll run into the same “conflicting implementation” error as before. So, let’s introduce new type There, that has one type parameter. In this case, the Index should resolve to There because the target type is in the head of the tail.

hlist![ Foo, Foo, MagicType ]

Following from 3. Index would have to be There>

hlist![ Foo, Foo, Foo, MagicType ]

What else could Index be but There>>

That Looks alright, so let’s give it a go. Since the new type has a type parameter but no real data to associate it with, we’ll need use the PhantomData trick (discussed in the last post).

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
// Type for representing a not-here Index struct There<T>(PhantomData<T>); impl<Head, Tail, Target, TailIndex> Plucker<Target, There<TailIndex>> for HCons<Head, Tail> // This where clause can be interpreted as "the target must be pluckable from the Tail" where Tail: Plucker<Target, TailIndex> { type Remainder = HCons<Head, <Tail as Plucker<Target, TailIndex>>::Remainder>; fn pluck(self) -> (Target, Self::Remainder) { let (target, tail_remainder): (Target, <Tail as Plucker<Target, TailIndex>>::Remainder) = <Tail as Plucker<Target, TailIndex>>::pluck(self.tail); (target, HCons { head: self.head, tail: tail_remainder, }) } }

And that’s it, we’ve written implementations of Plucker for HList. The implementation for work-to-be-done type is type-recursive in its Index type as well as its Remainder associated type. The cool thing is that the compiler is in charge of figuring out what the concrete types should be at any given pluck() call-site. In fact, you can see from this example in Frunk that the compiler will also happily infer the remainder for us too.

Type-level walkthrough

Let’s take a step back and work through what we’ve done.

We’ve declared an implementation of Plucker for the trivial exit-type (Target is in the head).

We’ve also declared an implementation for the work-to-be-done type (Target is in the tail). This implementation, however, is dependent on its recursive types of Tail and TailIndex (hint: look at the where clause). Intuitively speaking, an implementation of this type only exists if the current HList’s Tail has either:

An implementation for the exit-type; the Target type is in the head

Another work-to-be-done implementation of Plucker. This ultimately means that eventually there has to be a 1. in the tail somewhere.

Let’s try to walk through a mental model of how pluck() gets associated to the proper implementation.

1 2 3 4 5 6 7
// Given an HList (type explicitly declared for clarity) let h: Hlist![ &str, bool, f32, i32 ] = hlist![ "Joe", false, 42f32, 9 ]; // Suppose we want to get the float (f32) value out // We're ignoring the remainder and its type (Rust will figure it out), // because it isn't relevant for now. let (v, _): (f32,_) = h.pluck();

We’re ignoring the remainder and its type (Rust will figure it out if we use the underscore binding _), because it isn’t relevant for what we’re about to do.

In the following steps, we’ll substitute concrete types into our implementations where possible; similar to how functions get bound to values during the substitution model of evaluation (normally used for evaluating runtime values). We’ll do this in steps, so it’s possible that in the earlier stages, we don’t quite know the concrete type yet, but we’ll go down the “stack”, and come back up and fill those types in, too, once we know them.

pluck() on Hlist![ &str, bool, f32, i32 ]

Since our Target type (f32) is not in the head, it doesn’t match with the Here case, so we will try to use the work-to-be-done case (Index is There) and fill in as many types as we can for now. Let’s replace some type parameters with their concrete types where possible.

Concrete types:

Head → &str

Tail → Hlist![bool, f32, i32 ] (remember, this is syntactic sugar for HCons)

Target → f32 (this doesn’t change)

Remainder → Don’t know yet, but we already know that the current Head will be in it, since it isn’t the target type. And we know the tail of Remainder will be the remainder from pluck()ing f32 from the tail, so we can reference it as HCons< &str, < Hlist![bool, f32, i32] as Plucker> >::Remainder > for now.

TailIndex → Don’t know yet, but we’ll find out. Let’s call reference it as TailIndex1 for now.

pluck() on Hlist![bool, f32, i32] (Tail from 1.)

Again, f32 is not in the head of our type, so we know we aren’t going to be working with the exit-type typeclass implementation (e.g., Index is not Here yet.)

Concrete types:

Head → bool

Tail → Hlist![ f32, i32 ]

Target → f32 (again, this doesn’t change)

Remainder → Still don’t know yet, but we do know that bool will be in it since it isn’t our target. Similar to the previous step, we’ll tentatively call it HCons< bool, < Hlist![ f32, i32] as Plucker >::Remainder >

TailIndex → Don’t know yet, but let’s rename it TailIndex2 for now and fill it in later.

pluck() on Hlist![ f32, i32 ] (Tail from 2.)

The head has type f32 and the target type is f32, so we’ve arrived at the exit-type implementation.

Concrete types:

Head → f32

Tail → Hlist![ i32 ]

Target → f32 !

Remainder → Since we’ve found our target, we know that Remainder must be the tail, and thus Hlist![ i32 ], or its equivalent HCons< i32, HNil >

Index → Here !

Now, that we’ve finally resolved a concrete type for Index, we can go backwards up the type-level stack and fill in our unknowns:

Step 2:

TailIndex2 → Here, which means that Index is There>

Remainder → HList![ boo, i32 ]

Step 1:

TailIndex1 → There, which means that Index is There>>

Remainder → HList![ &str, boo, i32 ]

The compiler is thus able to find a trait implementation to pluck() a f32 out of an Hlist![ &str, bool, f32, i32 ] that looks like this (with all the type parameters bound to a concrete type):

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
// Remember Hlist![ ... ] is just a type-macro to make it easier to write nested Hcons impl Plucker< f32, There<There<Here>> > for Hlist![ &str, bool, f32, i32 ] where Hlist![ bool, f32, i32 ]: Plucker< f32, There<Here> > { type Remainder = HList![ &str, boo, i32 ]; fn pluck(self) -> (f32, Self::Remainder) { let (target, tail_remainder): (f32, < Hlist![bool, f32, i32] as Plucker<f32, There<Here>> >::Remainder) = < Hlist![ bool, f32, i32 ] as Plucker<f32, There<Here>> >::pluck(self.tail); (target, HCons { head: self.head, tail: tail_remainder, }) } }

Whew! That took a while, but I hope it helps illustrate how you can use a mental model similar to the substitution model of evaluation, but with types, in order to prove the existence of implementations for a given type.

By the way, by default, the compiler has a limit on how many levels of recursion/expansion this search for a typeclass instance goes. In my testing, I found this to be 64 levels and verified it to be so by looking at Rust’s source code. If you hit the limit, the compiler blow up, but will helpfully offer you a solution:

1 2 3 4 5 6 7 8
error[E0275]: overflow evaluating the requirement `frunk_core::hlist::HNil: frunk_core::hlist::Plucker<bool, _>` --> tests/derivation_tests.rs:296:35 | 296 | let (e, _): (bool, _) = hello.pluck(); | ^^^^^ | = note: consider adding a `#![recursion_limit="128"]` attribute to your crate = note: required because of the requirements on the impl of `frunk_core::hlist::Plucker<bool, frunk_core::hlist::There<_>>` for `frunk_core::hlist::HCons<bool, frunk_core::hlist::HNil>`

So, simply add #![recursion_limit="128"] to your crate. If you hit the limit again, the compiler will tell you to double the limit again. Ad infinitum.

Sculpting HLists

Great ! Now that we’ve finished with Plucker, let’s go one level deeper: making use of Plucker to do something even more interesting; sculpting HLists !

Here is the basic idea of what we want to be able to do:

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
// Given an HList of type Hlist![ i32, &str, f32, bool ] let h] = hlist![9000, "joe", 41f32, true]; // We'd like to be able to "sculpt" it into another, differently shaped HList. // // Of course, the types in the new HList must be a subset of the original HList, // and if not, compilation should fail. // // Similar to pluck(), we'd also want the remainder of the original HList _not_ // used in the final result. let (reshaped, remainder): (Hlist![ f32, i32, &str ], _) = h.sculpt(); assert_eq!(reshaped, hlist![41f32, 9000, "joe"]); assert_eq!(remainder, hlist![true]); // the following should fail to compile, because there is no char in the original Hlist let (reshaped, _) = (Hlist![char], _) = h.sculpt();

Implementation intuition

Let’s call our trait Sculptor. We should be able to re-use our Plucker trait, which which means we’ll work with Targets and Indexs, but there’s more than one of each!

Intuitively, this is the kind of logic that we want:

Given TargetHList (target HList) and SourceHList (source HList), and assuming the types in the former is a subset (not necessarily in order though) of the latter:

Pluck value with the head type of TargetHList from SourceHList:

Store the result in a (plucked, remainder) tuple

Call sculpt on remainder, passing the tail type of the current TargetHList as the new TargetHList type.

Store the result in a (sculpted_tail, sculpted_remainder) tuple

Return (HCons { head: plucked, tail: sculpted_tail }, sculpted_remainder)

Note that in 1. we are making use of pluck(), and there is a recursive call to sculpt() in 2. Since there is a recursive call to sculpt(), it means that we need an exit-type as well. Intuitively, we’ll pencil one in:

When the target HList is empty (HNil), return a tuple (HNil, SourceHList)

First attempt

Given our logic, let’s assume we want 4 type parameters in our trait. Our trait is a bit more complicated than our Pluck trait, but not by much. We make use of the same associated-type trick to hold the type of Remainder to be returned as the 2nd element in our type that will be filled-in when we write instances of the trait.

1 2 3 4 5 6
trait Sculptor<Target, TargetTail, HeadIndex, TailIndices> { type Remainder; fn sculpt(self) -> (HCons<Target, TargetTail>, Self::Remainder); }

The instance of Sculptor for the exit-type should be simple, right?:

1 2 3 4 5 6 7 8 9 10 11 12
// Our exit condition is when Target is HNil, so we don't care about the tail of the target // nor do we really care about the type of SourceHList impl <TargetTail, HeadIndex, TailIndices, SourceHList> Sculptor<HNil, TargetTail, HeadIndex, TailIndices> for SourceHList { type Remainder = Source; // ?!?!? HNil as the head type of an HCons doesn't make sense fn sculpt(self) -> (HCons<HNil, TargetTail>, Self::Remainder) { // nevermind } }

Ooops; that didn’t work; our type signature for the trait can’t be fulfilled when implementing our instance! We simply have too many type parameters in our trait, even for the exit-type implementation (try implementing for the recursion case…it’ll become more apparent)

Back to the drawing board.

Second attempt

Let’s collapse our target-related type parameters into a single Target type parameter and our indices-related type parameters into a single Indices type parameter in our Sculptor trait declaration, and rely on the implementations to dictate (specialise) what types they should be (similar to how the Plucker trait had no mention of There or Here).

1 2 3 4 5 6
trait Sculptor<Target, Indices> { type Remainder; fn sculpt(self) -> (Target, Self::Remainder); }

The exit-type implementation will still be when we have HNil as the target. Thinking it through further, in the case that we don’t have a HNil as the target, it’s obvious that Source can then be literally anything, so we’ll rename its type parameter Source. Since our intention for Sculptor is for Indices to be an HList of Here or There (one for each type in our Target HList), the exit Indices must therefore be a valid Hlist. Since we don’t need an index to find an empty target, let’s make Indices HNil for simplicity.

1 2 3 4 5 6 7 8
impl<Source> Sculptor<HNil, HNil> for Source { // Since Our Target is HNil, we just return the Source type Remainder = Source; fn sculpt(self) -> (HNil, Self::Remainder) { (HNil, self) } }

To figure out the type parameters needed for our work-to-be-done type, let’s work through the logic we laid out earlier.

At minimum, we know we’re writing an instance of Sculptor for a Source of type HList, and our Target type is also an HList, so we’ll use SHead and STail to describe the “Source” HList (so HCons), and THead and TTail to denote the “Target” HList (similarly, HCons).

Pluck value with the head type of TargetHList from SourceHList:

Store the result in a (plucked, remainder) tuple

Since we need to pluck() a THead from our Source HList, we’ll need a type parameter for the first index, so let’s name it IndexHead. In addition, in order to pluck(), we need a Plucker too, so this constraint is needed somewhere in our implementation declaration:

1
HCons<SHead, STail>: Plucker<THead, IndexHead>

Call sculpt() on remainder, passing the tail type of the current TargetHList as the new TargetHList type.

Store the result in a (sculpted_tail, sculpted_remainder) tuple

Since we want to sculpt the remainder of calling pluck() in step 1. into type TTail (tail of TargetHList), we’ll need to have an HList of indices for that purpose too, so let’s call it IndexTail. Note that we don’t need a separate type parameter for the remainder from 1 because we can take advantage of the associated type on Plucker.

1 2 3 4 5
// In English, this is read as: // "The remainder of plucking the Target head type (THead) out of the source HList // must have a Sculptor implementation that lets us turn it into the tail type of // the Target HList (TTail) using the tail of the current Indices (IndexTail)" <HCons<SHead, STail> as Plucker<THead, IndexHead>>::Remainder: Sculptor<TTail, IndexTail>

Return (HCons { head: plucked, tail: sculpted_tail }, sculpted_remainder)

What will the Remainder type be? It should be the remainder of sculpting the remainder from plucking the head type (THead) out of the current source HList into TTail (yeah…)

1
type Remainder = <<HCons<SHead, STail> as Plucker<THead, IndexHead>>::Remainder as Sculptor<TTail, IndexTail>>::Remainder;

Putting all these types together with the logic, we have

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21
impl <THead, TTail, SHead, STail, IndexHead, IndexTail> Sculptor<HCons<THead, TTail>, HCons<IndexHead, IndexTail>> for HCons<SHead, STail> where HCons<SHead, STail>: Plucker<THead, IndexHead>, <HCons<SHead, STail> as Plucker<THead, IndexHead>>::Remainder: Sculptor<TTail, IndexTail> { type Remainder = <<HCons<SHead, STail> as Plucker<THead, IndexHead>>::Remainder as Sculptor<TTail, IndexTail>>::Remainder; fn sculpt(self) -> (HCons<THead, TTail>, Self::Remainder) { let (p, r): (THead, <HCons<SHead, STail> as Plucker<THead, IndexHead>>::Remainder) = self.pluck(); let (tail, tail_remainder): (TTail, Self::Remainder) = r.sculpt(); ( HCons { head: p, tail: tail }, tail_remainder ) } }

As you can see, our implementations of Sculptor is type-recursive in an interesting way, and there are quite a few dependencies that need to be worked out between all the type parameters and the Plucker trait as well as the Sculptor trait itself (it appears in the where after all). Fortunately, the Rust compiler will do that for us (and if need be, tell you to raise the #![recursion_limit] in your crate).

If you’re not convinced this works, please by all means check out the hlist module in Frunk, in particular the Sculptor trait.

Conclusion

One last thing: the Plucker and Sculptor things aren’t just cute exercises; Plucker has already paid dividends when modeling Sculptor, and Sculptor, well, it’s instrumental in letting us do cool stuff like convert between structs with different LabelledGeneric implementations (to an extent, anyways), and other, even cooler generic functions. We’ll talk more about this in another post.

If you do a search, you’ll find a number of articles on the Interwebs that introduce Rust’s trait system, but not many that go deep into how to use it when you need to do non-trivial type-level recursion in your trait implementations (though how often this need arises is … another topic altogether). I also find that people generally don’t talk about what they did wrong, so I wanted to share my failed approaches as well.

The goal of this post is to hopefully help others who are curious, or have a need to do something similar, as well as to leave notes for myself in case I ever need to revisit this in the future. The mental models for breaking down the problem, defining types, and building up to an implementation might not work for everyone, but they’ve helped me.

Personally, I think it’s awesome that a close-to-the-metal systems programming language like Rust has a powerful enough compiler and type-system to allow for these kinds of techniques. As you can see, we’ve managed to build powerful, reusable abstractions without doing anything unsafe, and we’ve exposed an API that requires just the bare minimum of type annotations; Rust infers the rest :) In any case, I hope this post was useful, and as usual, please chime in with questions and suggestions.

Credit

The Here and There design was largely gleaned from this code. I stand on the shoulders of giants :)

** It goes without saying that these operations need to be type-safe. That is, they are verified by the compiler without using any unsafe tricks that could blow up at runtime.

LabelledGeneric in Rust: What, Why, How?

2017-03-04T12:23:00+00:00

What is LabelledGeneric? How does one encode type-level Strings in Rust? What is a labelled HList?

Hold on, let’s take a step back.

In a previous post about implementing Generic in Rust, I briefly mentioned the fact that Generic could cause silent failures at runtime if you have 2 structs that are identically shaped type-wise, but have certain fields swapped.

While we can work around this using wrapper types, that solution leaves something to be desired, because, well, more boilerplate adds noise and requires more maintenance.

Ideally, we want to have something like this, where the following works:

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22
#[derive(LabelledGeneric)] struct NewUser<'a> { first_name: &'a str, last_name: &'a str, age: usize, } #[derive(LabelledGeneric)] struct SavedUser<'a> { first_name: &'a str, last_name: &'a str, age: usize, } let n_user = NewUser { first_name: "Moe", last_name: "Ali", age: 30 }; // Convert from NewUser to SavedUser let s_user: SavedUser = labelled_convert_from(n_user);

but the following fails at compile-time because the fields are mis-matched (first_name and last_name have been swapped):

1 2 3 4 5 6 7 8 9 10 11
// Uh-oh! Fields are jumbled :( #[derive(LabelledGeneric)] struct JumbledUser<'a> { last_name: &'a str, first_name: &'a str, age: usize } // This should fail at compile-time because last_name and first_name are swapped // even if they have the same type let d_user = <JumbledUser as LabelledGeneric>::convert_from(s_user);

The solution to this sort of problem has been in Shapeless for some time; by using HLists where each cell contains not just a value, but instead hold named fields, where each value is labelled at the type level.

Let’s take a look at how Frunk implements Field values and LabelledGeneric in Rust :)

Add Frunk to your project

Frunk is published to Crates.io, so to begin, add the crate to your list of dependencies:

1 2
[dependencies] frunk = "${latest_version}"

Outline

Why? (Motivation)

Silent runtime errors with Generic

LabelledGeneric to the rescue

Field ??

How it works

Field

Type-level characters and strings

(Anonymous) Records!

Field and LabelledGeneric

How the LabelledGeneric derivation is generated

Conclusion

Links

Why? (Motivation)

Silent runtime errors with Generic

To illustrate the problem, observe that the following 2 structs have the exact same “shape”

1 2 3 4 5 6 7 8 9 10 11 12 13
#[derive(Generic)] struct NewUser<'a> { first_name: &'a str, last_name: &'a str, age: usize, } #[derive(Generic)] struct JumbledUser<'a> { last_name: &'a str, first_name: &'a str, age: usize }

That is, the Generic representation of their fields as Generic is simply HList![&'a str, &'a str, usize]. As a result, when we do the following:

1 2 3 4 5 6 7 8
let n_user = NewUser { first_name: "Moe", last_name: "Ali", age: 30 }; // Convert from NewUser to JumbledUser let s_user: JumbledUser = convert_from(n_user);

Oh no! s_user has first_name and last_name flipped :(

As explained near the end of the post introducing Generic, you can catch this sort of mistake by introducing wrapper types like FirstName<'a>(&' str) for each field, but that introduces more boilerplate. This sucks, because Generic is supposed to help avoid boilerplate!

Can we have our cake and eat it too ?

LabelledGeneric to the rescue

LabelledGeneric was introduced in v0.1.12 of Frunk to solve this exact problem. This is how you use it.

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
#[derive(LabelledGeneric)] struct NewUser<'a> { first_name: &'a str, last_name: &'a str, age: usize, } #[derive(LabelledGeneric)] struct SavedUser<'a> { first_name: &'a str, last_name: &'a str, age: usize, } let n_user = NewUser { first_name: "Moe", last_name: "Ali", age: 30 }; // Convert from NewUser to SavedUser let s_user: SavedUser = labelled_convert_from(n_user); #[derive(Generic)] struct JumbledUser<'a> { last_name: &'a str, first_name: &'a str, age: usize } // ⬇︎ This will fail at compile time let j_user: JumbledUser = labelled_convert_from(n_user);

There isn’t a whole lot different to using LabelledGeneric vs using Generic:

Instead of deriving Generic, derive LabelledGeneric

Instead of calling convert_from, call labelled_convert_from

These 2 changes buy you a lot more type-safety at compile time, with zero boilerplate. By the way, if you’d like the compiler to automatically “align”, the generic representations so that you could instantiate a JumbledUser from a NewUser, then stay tuned for a later post ;)

The tl;dr version of how this works is that deriving by LabelledGeneric, we make the struct an instance of the LabelledGeneric typeclass. This typeclass is almost identical to the Generic typeclass, but the derive does something a bit different with the generic representation of the struct: it isn’t just an HList wrapping naked values.

Instead, the generic representation will be an HList where each cell will contain field name information, at the type-level, and conceptually has the following types:

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
// LabelledGeneric Representation for NewUser type NewUserRepr = HList![ Field<first_name, &'a str>, Field<last_name, &'a str>, Field<age, usize>]; // LabelledGeneric Representation for SavedUser type SavedUserRepr = HList![ Field<first_name, &'a str>, Field<last_name, &'a str>, Field<age, usize>]; // LabelledGeneric Representation for JumbledUser type JumbledUserRepr = HList![ Field<last_name, &'a str>, Field<first_name, &'a str>, Field<age, usize>];

This difference in type-level representation is how the compiler knows that one can’t simply convert a NewUser or SavedUser into a JumbledUser via labelled_convert_from.

Field ??

What is Field ? It’s simply a container struct that is parameterised by 2 types, and has the following signature:

1
pub struct Field<Name, Type> { ... }

The first type parameter is Name and its purpose is to contain a type-level String, and the second type parameter is Type, which reflects the type of value contained inside the struct.

It may help to think of Field as an ad-hoc wrapper type.

How it works

Field

The full definition of Field is currently as follows:

1 2 3 4
pub struct Field<Name, Type> { name: PhantomData<Name>, pub value: Type, }

PhantomData is used to allow us to bind a concrete type to the Name type parameter in an instance of Field without actually having it take up any space (for more details on Phantom data, refer to the official docs).

To construct a Field, Frunk exposes a macro called field! so that you don’t need to touch PhantomData yourself.

1 2 3 4 5
// Usage: we let the compiler figure out the value type for us let age = field!((a, g, e), 3); assert_eq!(age.name, "age"); assert_eq!(age.value, 3);

For more information about the field! macro, please refer to its Rustdoc page. Astute readers will notice the odd (a,g,e) type used for naming. What is that about ???

Type-level characters and strings

In order represent characters at the type level, Frunk currently uses enums that have zero members. This is because empty enums have distinct types, and yet cannot be instantiated at runtime and thus are guaranteed to incur zero cost.

Conceptually, we declare one enum for every character we want to represent:

1 2 3 4 5 6 7 8 9 10 11
pub enum a {} pub enum b {} pub enum c {} // ... pub enum A {} // ... etc // Numbers can't be identifiers, so we preface them with an underscore pub enum _1 {} pub enum _2 {} // In reality, the above is generated by a macro.

This means that characters outside English alphanumeric range will need to be specially encoded (the LabelledGeneric derivation uses unicode, but more on this later), but for the most part, this should suffice for the use case of encoding field names as types.

As you may have guessed, type-level strings are then simply represented as tuple types, hence (a,g,e). For the sake of reducing noise, in the rest of this post, we will refer to these name-types without commas and parentheses.

Note: This type-level encoding of strings may change in the future.

(Anonymous) Records!

Combining the Field and HList constructs gets us something else: Records. I believe once upon a time, Rust supported anonymous structs; well, you can get most of that functionality back with Frunk!

1 2 3 4 5 6 7 8 9 10 11
let record = hlist![ field!(name, "Joe"), field!(age, 30) ]; // We'll talk about pluck() in a later post, but just an FYI, it returns the // target value with the type you specified as well as the remainder // of the HList in a pair. It is checked at compile time to make sure it never // fails at runtime. let (name, _): (Field<name, _>, _) = record.pluck(); assert_eq!(name.value, "Joe")

This kind of thing is sometimes called an “anonymous Record” in Scala (see scala-records, or Shapeless).

In the future, the anonymous Records API in Frunk might be improved. As it stands, it exists mostly for the purpose of LabelledGeneric and is a bit noisy to use.

Field and LabelledGeneric

So, what is the relationship between Field and the LabelledGeneric typeclass?

Quite simply, the associated Repr type of an instance of LabelledGeneric should have the type of an anonymous record (labelled HList).

So, given the following

1 2 3 4
struct Person { name: String, age: usize }

This is one possible implementation of LabelledGeneric for Person:

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
impl LabelledGeneric for Person { type Repr = HList![ Field<name, String>, Field<age, usize> ]; fn into(self) -> Self::Repr { hlist![ field!(name, self.name), field!(age, self.age) ] } fn from(r: Self::Repr) -> Self { let hlist_pat![ name, age ] = r; Person { name: name.value, age: age.value } } }

But writing that yourself is tedious and error-prone, so Frunk provides a derivation for you.

How the LabelledGeneric derivation is generated

As illustrated earlier, you can do the following to create an instance of LabelledGeneric for your struct:

1 2 3 4 5
#[derive(LabelledGeneric)] struct Person { name: String, age: usize }

It generates something conceptually similar to what we had above, so we won’t repeat that here.

That said, there is something special about the way that characters outside the range of the standard English alphabet and digits are handled. For each of those characters, we get the Unicode hexcode and use those digits, sandwiched by _uc and uc_ delimiters, as the type-level representation.

1 2 3 4 5 6 7 8
// This isn't possible (yet) in Rust, but let's pretend it is struct Fancy { ❤: usize } // Since ❤ has a Unicode hexcode of \u{2764}\u{fe0f}, the // labelled generic representation for the above would be type Repr = HList![ Field<_ucu2764ufe0fuc_, usize> ]

This allows us to effectively represent virtually any legal identifier at the type level, even when the ASCII-only restriction for identifiers is lifted from stable Rust. For more details, take a look at how characters are matched to identifiers here.

Conclusion

In closing, I’d like to stress that all the abstractions and techniques described in this post are type-safe (no casting happening) and thus get fully verified by Rust’s compiler and its strong type system.

As far as I am aware, this is the first implementation of labelled HLists (aka anonymous Records) and LabelledGeneric in Rust, and I hope this post did a good job of explaining what problems they solve, what they are, how they work, and why you might want to use them. As usual, please give them a go and chime in with questions, comments, ideas, or PRs!

Also, as alluded to in the section introducing LabelledGeneric, there is a way to automatically match up out-of-order fields. We’ll go through this in another post.

Links

Frunk on Github

Frunk on Crates.io

Rust Generic (Not Generics)

2017-02-04T02:14:00+00:00

Have you ever wanted to convert Hlists into Structs or to reuse logic across different types that are structurally identical or very similar (e.g. same data across different domains)? Generic can help you do that with minimal boilerplate.

Generic is a way of representing a type in … a generic way. By coding around Generic, you can write functions that abstract over types and arity, but still have the ability to recover your original type afterwards. This can be a fairly powerful thing.

Thanks to the new Macros 1.1 infrastructure added in Rust 1.15, Frunk comes out of the box with a custom Generic derivation so that boilerplate is kept to a minimum. Without further ado, let’s dive in to see what Generic can do for us.

Add Frunk to your project

Frunk is published to Crates.io, so to begin, add the crate to your list of dependencies:

1 2
[dependencies] frunk = "0.1.9"

Examples

HList ⇄ Struct

Have an HList lying around and want to turn it into a Struct with the same shape (maybe you’re using Validated)?

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
extern crate frunk; #[macro_use] // for the hlist macro extern crate frunk_core; use frunk::*; // for the Generic trait and HList #[derive(Generic, Debug, PartialEq)] struct Person<'a> { first_name: &'a str, last_name: &'a str, age: usize, } let h = hlist!("Joe", "Blow", 30); let p: Person = from_generic(h); assert_eq!(p, Person { first_name: "Joe", last_name: "Blow", age: 30, });

This also works the other way too; just pass a struct to into_generic and get its generic representation.

One usecase for something like this is if you have a bunch of fields that you want to validate “simultaneously”, and you want to transform the end result into a single Struct; this is often the case when you want to turn external input (e.g. coming into your API, a web form, or fields read from a database), and in a previous post I introduced Validated as a way of doing that.

With the introduction of Generic, that last step of transforming an HList into your struct gets much simpler:

1 2 3 4
let validated = get_first_name().into_validated() + get_last_name() + get_age(); let person: Result<Person, _> = validated .into_result() .map(|h| from_generic(h)); // <-- much simpler !

Converting between Structs

Sometimes you might have 2 or more types that are structurally the same (e.g. different domains but the same data) and you’d like to convert between them. An example of this might be when you have a model for deserialising from an external API and another one for internal application business logic, and yet another for persistence.

Generic comes with a handy convert_from method that helps here:

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22
// Assume we have all the imports needed #[derive(Generic)] struct ApiPerson<'a> { FirstName: &'a str, LastName: &'a str, Age: usize, } #[derive(Generic)] struct DomainPerson<'a> { first_name: &'a str, last_name: &'a str, age: usize, } let a_person = ApiPersion { first_name: "Joe", last_name: "Blow", age: 30, }; let d_person: DomainPersion = convert_from(a_person); // done

Another example of where this might be useful is if you want to use different types to represent the same data at different stages (see this post on StackOverflow).

How it works (what is going on ? is it safe ?)

At a glance, Generic might look magical and dangerous, but really it is no more mysterious than the From trait in the standard lib; the only difference (for now) is that every Generic instance is bidirectional (can turn an A into a Repr and a Repr into an A). If you don’t believe me, just look at the type signatures.

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
pub trait Generic<Repr> { /// Go from something to Repr fn into(self) -> Repr; /// Go from Repr to something fn from(r: Repr) -> Self; } /// Given a generic Representation of an A, returns A pub fn from_generic<A, Repr>(gen: Repr) -> A where A: Generic<Repr> /// Given an A, returns its generic Representation pub fn into_generic<A, Repr>(a: A) -> Repr where A: Generic<Repr> /// Converts one type into another assuming they have the same generic Representation pub fn convert_from<A, B, Repr>(a: A) -> B where A: Generic<Repr>, B: Generic<Repr>

Most of the magic resides in how the custom derive of Generic, made possible by the 1.15 release of Rust, is implemented. If you want to find out more, take a look at the derives directory of Frunk on Github. In regards to the end-result though, the following:

1 2 3 4 5 6
#[derive(Generic)] struct Person<'a> { first_name: &'a str, last_name: &'a str, age: usize, }

Gets expanded at compile-time to something resembling:

1 2 3 4 5 6 7 8 9 10 11 12
impl<'a> Generic<Hlist!(&'a str, &'a str, usize)> for Person<'a> { fn into(self) -> Hlist!(&'a str, &'a str, usize) { let Person { first_name, last_name, age } = self; hlist! [ first_name, last_name, age ] } fn from(r: Hlist!(&'a str, &'a str, usize)) -> Self { let hlist_pat! [ first_name, last_name, age ] = r; Person { first_name: first_name, last_name: last_name, age: age } } }

To be clear, the actual expanded coded is much gnarlier because we use fully qualified names for the sake of hygiene and I’ve sugared some things up with their macro-powered equivalents to cut down on noise (namely the HList type signature, pattern matching, and construction).

Someone on Twitter raised the point that if you had mixed up the ordering of the fields in your struct declaration (e.g. last name and first name are swapped between structs), then Generic would cause silent errors at runtime because the Structs’ shape would be the same, and that implementing From was more typesafe. With all due respect to that individual, the same could happen even if you hand-wrote your From implementation and got your field assignments crossed. In the worst case; you’ve now got fields that are not ordered correctly, your From is wrong, and you’ve got more boilerplate to maintain.

Really, the only way to truly prevent this kind of fat-fingering error is to have wrapper types (like struct FirstName(String), etc) for all your fields, in which case Generic conversion would be foolproof (if you got your field declaration orders wrong, you’d get a compile-time error). Ultimately, how typesafe you want to be is a choice you will need to make while weighing the risk of fat-fingering against the burden of maintaining more code.

I hope you’re now convinced that there is no dirty casting / unsafe stuff going on, so you can rest easy knowing your code is still as type-safe as it would have been if you had gone with something like From instead.

Conclusion

There are probably many other ways that Generic can be used to make code nicer (more reusable, DRYer, less noisy), so go ahead and see what you can cook up. As always, please don’t hesitate to get in touch via comments, on Github or on Gitter with suggestions, issues, questions, or PRs.

Links

Frunk on Github

Frunk on Crates.io

Credit

Shapeless

Rust Performance Testing on Travis CI

2016-11-02T15:39:00+00:00

Rust describes itself as:

a systems programming language that runs blazingly fast, prevents segfaults, and guarantees thread safety. ### Featuring * zero-cost abstractions * minimal runtime *efficient C bindings

So, it’s likely that developers who choose to program in Rust are focused on performance. You can make sure your code is efficient by writing benchmarks, but in order to prevent performance regressions, you’ll need to run benchmarks on your Pull Requests or patches and somehow compare before and after. Doing this can be tedious, especially as the changeset evolves over the course of code review or miscellaneous refactoring.

Let’s see how we can get automated benchmark comparisons across commits on Travis CI.

Putting benchmarks in your project

First off, you’ll need to have benchmarks in your codebase. There are a few ways to do this:

The standard way documented in the Rust Book

Making a benches directory in your project root, putting your benchmarks there, and running cargo bench (this is how I’ve done it in Frunk)

Running benchmarks on Travis

Next, in order to run benchmarks on Travis, we’ll need to make sure that your .travis.yml file has nightly listed as one of the Rust versions that your project is built with:

1 2 3
rust: - stable - nightly # so we can run benchmarks (required as of writing)

Then, in after_success, we’ll want the following in order to have benchmarks run when we are on a build that uses Rust nightly:

1 2 3 4
after_success: - if [ "$TRAVIS_RUST_VERSION" == "nightly" ]; then cargo bench; fi

Some readers might be wondering why I’m not using travis-cargo here. The reason is because travis-cargo doesn’t support arbitrary cargo libraries/commands, which is needed in the next section ;)

Getting benchmark comparisons in Pull Requests

So we have benchmarks running automatically on Travis, but what about the before-after comparisons that we talked about earlier? This is where the cargo-benchcmp library comes into play. benchcmp is:

A small utility for comparing micro-benchmarks produced by cargo bench. The utility takes as input two sets of micro-benchmarks (one “old” and the other “new”) and shows as output a comparison between each benchmark.

What we’ll want to do next is add a condition to only run these benchmarks when we’re building a Pull Request (henceforth PR), install the benchcmp tool, and use it:

Travis after_success bash script code (travis-after-success.sh) download

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
#!/usr/bin/env bash if [ "${TRAVIS_PULL_REQUEST_BRANCH:-$TRAVIS_BRANCH}" != "master" ] && [ "$TRAVIS_RUST_VERSION" == "nightly" ]; then REMOTE_URL="$(git config --get remote.origin.url)"; # Clone the repository fresh..for some reason checking out master fails # from a normal PR build's provided directory cd ${TRAVIS_BUILD_DIR}/.. && \ git clone ${REMOTE_URL} "${TRAVIS_REPO_SLUG}-bench" && \ cd "${TRAVIS_REPO_SLUG}-bench" && \ # Bench master git checkout master && \ cargo bench > benches-control && \ # Bench variable git checkout ${TRAVIS_COMMIT} && \ cargo bench > benches-variable && \ cargo install cargo-benchcmp --force && \ cargo benchcmp benches-control benches-variable; fi

The first conditional is simply to check that the current branch being built is not master. It’s a bit verbose because $TRAVIS_BRANCH does not always provide the current branch name. So instead, we use ${TRAVIS_PULL_REQUEST_BRANCH:-$TRAVIS_BRANCH}, which consists of $TRAVIS_PULL_REQUEST_BRANCH because it gives us the current branch if the build was triggered by a PR, and a default of $TRAVIS_BRANCH, which gives us the branch name of non-PR builds.

The second conditional checks that the current Travis build is using nightly, which is a requirement for running benchmarks (as of writing).

Inside the if statements body, we first cd out of our provided directory and clone our project anew. I’m not entirely sure why, but in my testing, I was unable to checkout another branch (e.g. master) otherwise. Next, we run cargo bench on the master branch, sending the output to benches-control. Afterwards, we checkout the commit for the current build by using TRAVIS_COMMIT, and run cargo bench again, sending the output to benches-variable.

Lastly, we install and run cargo benchcmp, passing the path of the control and variable benchmark result files as arguments, letting cargo-benchcmp do its job.

Oh, we shouldn’t forget to add our script to the after_success block in our Travis file.

1 2
after_success: - ./travis-after-success.sh

Here is some sample output from my Rust functional programming library, Frunk.

The benchmark comparisons show up in the build log.

Conclusion

That’s it. Now, you can go to the Travis build log of your PRs and see how performance has been affected. Please give it a try, and send any questions or feedback. Oh, if you’re interested in a library that does this for you or if you want to turn this into some kind of a service, do let me know ;-)

Accumulating Results in Rust With Validated

2016-10-24T15:00:00+00:00

Rust comes out of the box with a Result type in its standard library. For those not familiar with it, it is a union-like enum type where T is a type parameter denoting the kind object held in a Result in the success case (Result::Ok), and E is a type paramter denoting the kind of error object held in the failure case (Result::Err). In Scala, this is represented in the standard library as Either[+A, +B], where the the success and error type params are swapped (traditionally, the one on the left stands for error and the one on the right is…well, right).

By default, Result comes with really good support for what I call “early return on error”. That is, you can use map, and_then (flatMap in some other languages) to transform them, and if there’s an error at an intermediate step, the chain returns early with a Result::Err :

1 2 3 4 5 6
fn double_arg(mut argv: env::Args) -> Result<i32, String> { argv.nth(1) .ok_or("Please give at least one argument".to_owned()) .and_then(|arg| arg.parse::<i32>().map_err(|err| err.to_string())) .map(|n| 2 * n) }

But .. what happens when you have multiple Results that are independent of each other, and you want to accumulate not only their collective success case, but also all their collective errors in the failure case?

Let’s have a look at Validated in Frunk (which is itself inspired by Validated in Cats)

Frunk is published to Crates.io, so to begin, add the crate to your list of dependencies:

1 2
[dependencies] frunk = "0.1.9"

By the way, to take a dive into the deep end, jump straight to Validated’s Rustdocs.

Imports

Next, let’s add a few imports.

1 2
use frunk::hlist::*; // brings the .to_tuple2() method in scope so we can destructure our HList easily use frunk::validated::*;

Scenario

Suppose we have a Person struct defined as follows:

1 2 3 4 5 6
#[derive(PartialEq, Eq, Debug)] struct Person { age: i32, name: String, email: String, }

And, we have 3 methods that produce age, name and email for us, but all could potentially fail with a Nope error.

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35
/// For demonstrations purposes only #[derive(PartialEq, Eq, Debug)] pub enum YahNah { Yah, Nah, } /// Our Errors #[derive(PartialEq, Eq, Debug)] pub enum Nope { NameNope, AgeNope, EmailNope, } fn get_name(yah_nah: YahNah) -> Result<String, Nope> { match yah_nah { YahNah::Yah => Result::Ok("James".to_owned()), _ => Result::Err(Nope::NameNope), } } fn get_age(yah_nah: YahNah) -> Result<i32, Nope> { match yah_nah { YahNah::Yah => Result::Ok(32), _ => Result::Err(Nope::AgeNope), } } fn get_email(yah_nah: YahNah) -> Result<String, Nope> { match yah_nah { YahNah::Yah => Result::Ok("hello@world.com".to_owned()), _ => Result::Err(Nope::EmailNope), } }

In real life, these methods would probably be taking an HTML form as an argument and doing some kind of parsing/validation or making calls to a service somewhere, but for simplicity, in our example, each of them takes a single argument that will let us toggle between the success and error cases.

Using Validated

Having set all that up, actually using Validated to accumulate our Results is actually very simple:

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21
let v = get_name(YahNah::Yah).into_validated() + get_age(YahNah::Yah) + get_email(YahNah::Yah); // Turn it into a result and then map, passing a lambda that takes the HList contained inside let person = v.into_result() .map(|hlist| { let hlist_pat!(name, age, email) = hlist; Person { name: name, age: age, email: email, } }); assert_eq!(person.unwrap(), Person { name: "James".to_owned(), age: 32, email: "hello@world.com".to_owned(), });

As you can see, all we need to do is call into_validated() on a given Result to kick off the validation context, and use + to add subsequent Results into it. At the end, you call into_result() on the Validated to turn it back into a Result and map on the HList that is contained inside. Inside the lambda, we destructure the HList using the hlist_pat! macro, and then instantiate our Person.

Oh, in case it isn’t obvious, the hlist passed to the lambda when we map is statically typed in the order that your Results were added into the Validated context, so your code is completely type safe. If you want to learn more about HLists in Frunk, check out this blog post.

Having said that, perhaps in the success case, not much has really changed in comparison to using naked Results. That is, you could have gotten here simply by chaining with map and/or and_then. But take a look at what happens when one or more of these fail:

1 2 3 4 5 6 7 8 9
let v = get_name(YahNah::Nah).into_validated() + // get_name will fail get_age(YahNah::Yah) + // get_age will succeed get_email(YahNah::Nah); // get_email will fail let person = v.into_result() .map(|_| unimplemented!()); // won't get here anyways assert_eq!(person.unwrap_err(), vec![Nope::NameNope, Nope::EmailNope]);

As you can see, the failure case is more interesting because Validated gives us the ability to accumulate all errors cleanly. For operations like parsing user input or checking parameters passed into our API, this non-early-abort behaviour is highly desirable compared with telling the user what went wrong One. Thing. At. At. Time.

Oh, Validateds can also be appended to each other:

1 2 3 4 5 6 7 8
let r1: Result<String, String> = Result::Ok(String::from("hello")); let r2: Result<i32, String> = Result::Ok(1); let r3: Result<i32, String> = Result::Ok(3); let v1 = r1.into_validated(); let v2 = r2.into_validated(); let v3 = r3.into_validated(); let comb = v1 + v2 + v3; assert_eq!(comb, Validated::Ok(hlist!(String::from("hello"), 1, 3)))

Conclusion

Please take Validated out for a spin and send suggestions, comments, PRs ! I’ve found this abstraction to be helpful in the Scala world so I’m eager to hear impressions from Rustaceans.

Links

Frunk on Github

Frunk on Crates.io

Rust HLists (Heterogenous List)

2016-10-23T13:05:00+00:00

A heterogeneous list (henceforth “HList”) is a useful abstraction that is implemented in many statically-typed functional programming languages. Unlike normal list-like structures (e.g. Vec, Slice, Array), a heterogenous list is able to hold elements of different types (hence heterogenous) and expose those types in its own type signature.

1 2
let h = hlist!["Joe", "Blow", 30, true]; // h has a static type of: HCons<&str, HCons<&str, HCons<{integer}, HCons>>>

Now, you might be thinking “Isn’t that just a tuple?”. The answer is: in a way. Indeed, in terms of data structure, a given implementation of HList is usually really nothing more than deeply nested pairs (tuple of 2 elements) that each hold an element of arbitrary type in its 1st element and knows that its 2nd element is itself an HList-like thing. While it may seem convoluted, HList buys us the ability to abstract over arity, which turns out to be extremely useful, as you can see from this Stackoverflow answer by Miles Sabin, the creater of the Shapeless library, which provides an HList implementation in Scala.

Given that description and justification for the existence of HLists, let’s take a look at how to use Frunk’s implementation of HList in Rust.

Frunk is published to Crates.io, so to begin, add the crate to your list of dependencies:

1 2
[dependencies] frunk = "0.1.9"

By the way, to take a dive into the deep end, jump straight to HList’s Rustdocs.

Imports

Next, let’s add a few imports. In particular, note that we have a #[macro_use] directive in order to enable the hlist! macro, which makes declaring HLists nicer by saving you the trouble of writing deeply nested HCons.

1 2
#[macro_use] extern crate frunk; use frunk::hlist::*;

Creating an HList

Making an HList is easy if you use the hlist! macro:

1 2 3 4
let h = hlist!["Joe", "Blow", 30, true]; // You can choose to explicitly annotate the type of HList let h2: HCons<&str, HCons<&str, HCons<{integer}, HCons<bool, HNil>>>> = hlist!["Joe", "Blow", 30, true];

Writing the type of an HList

Since HLists are a bunch of nested HConss, you may think that writing the type annotation for one would be a PITA. Well, it might have been if not for the type-level macros introduced in Rust 1.13.

1 2 3 4
let h: Hlist!(&str, &str, i32, bool) = hlist!["Joe", "Blow", 30, true]; // We use the Hlist! type macro to make it easier to write // a type signature for HLists, which is a series of nested HCons // h has an expanded static type of: HCons<&str, HCons<&str, HCons>>>

Getting the head of an HList

To retrieve the head element of an HList, use the .head accessor

1 2
let h = hList![ "Joe" ]; let joe = h.head;

Getting multiple elements from an HList

To retrieve multiple elements, it’s highly recommended to use the hlist_pat! macro to deconstruct your HList.

1 2 3 4 5 6 7 8
let h = hlist!["Joe", "Blow", 30, true]; // h has a static type of: HCons<&str, HCons<&str, HCons<{integer}, HCons>>> let hlist_pat!(f_name, l_name, age, is_admin) = h; assert_eq!(f_name, "Joe"); assert_eq!(l_name, "Blow"); assert_eq!(age, 30); assert_eq!(is_admin, true);

Appending HLists

The Add trait is implemented for HList so that you can simply call + to append to an existing HList

1 2 3 4
let joe = hlist!["Joe", "Blow", 30]; let is_admin = hlist![true]; let joe_is_admin = joe + is_admin;

Length

To get the length of an HList, simply call its length() method

1 2
let joe = hlist!["Joe", "Blow", 30]; assert_eq!(joe.length(), 3);

Have fun !

It will be interesting to see what you can cook up with HList. As mentioned before, abstracting over arity allows you to do some really cool stuff, for example Frunk already uses HList to define a Validated abstraction to help accumulate errors over many different Result (we’ll go through this in another post):

1 2 3 4 5 6
pub enum Validated<T, E> where T: HList { Ok(T), Err(Vec<E>), }

So please check it out, take it for a spin, and come back with any ideas, criticisms, and PRs!

Links

Frunk on Github

Frunk on Crates.io

Enumeratum 1.4: ValueEnums + Circe

2016-04-16T16:25:00+00:00

It’s been a while since the last major release of Enumeratum, and in 1.4.0, minor changes include Play 2.5 support, integration library version bumps, and small internal refactorings. More excitingly though, the latest version adds support for a new kind of enumeration, ValueEnum, as well as an integration with the Circe JSON library.

Points of interest:

Unlike other value enum implementations, Enumeration’s value enums perform uniqueness checks at compile time to make sure you have unique values across your enum members.

Circe integration allows you to send and receive JSON data between your front end and your server using the same code

The 1.4.0 release page on Github has a more detailed list of changes, but we’ll specifically go through:

ValueEnums

Circe Integration

ValueEnums

What is a ValueEnum? It’s an enum that represents a primitive value (e.g. Int, Long, Short) instead of a String. I may have just made up the term, but it doesn’t matter as long as you know what I mean.

1 2 3 4 5 6 7 8
// Have something like object ContentType { case object Text(1) case object Image(3) } // Want to do assert(ContentType.withValue(3) == ContentType.Image)

This may sound mundane, since you can already build something like this yourself with the standard library’s Enumeration (or previous versions of Enumeratum ), but sometimes the most straightforward solutions are suboptimal.

The trouble with Enumeration

The standard lib’s Enumeration comes with the notion of a customisable id: Int on each member, which is a great starting point for implementing numbers-based enumerations.

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
/** * This is an anti-example to show what can go wrong. * * Do not use this */ object Things extends Enumeration { val First = Value(1) val Second = Value(2) val Third = Value(3) val Fourth = Value(3) // not enough coffeeeeee def withValueOpt(i: Int): Option[Things.Value] = values.find(_.id == i) } /* * The above Enumeration will compile. Fine, but imagine now your app is deployed * and somewhere else in your code you have to actually use your enum. */ Things.First // => java.lang.AssertionError: assertion failed: Duplicate id: 3 // If at first you don't succeed??? Things.First // => java.lang.NoClassDefFoundError: Could not initialize class Things$ // newp

This funny behaviour is caused by the fact that Enumeration#Values (First, Second, Third, Fourth) are not checked for unique ids at compile time, and are instantiated when their outer Enumeration object is lazily instantiated. When a Value is instantiated, its id is stuffed into a HashMap[Int, Value] after an assertion check that the id does not already exist in the map.

What has happend in the above example is the enumeration code compiles, but when we call Things.First, object Things gets instantiated, and throws an assertion error when val Fourth is being instantiated with an id of 3, which has already been assigned to Third and thus is already in the aforementioned HashMap. This prevents the singleton Things from getting instantiated, and the next time you try to use it, Scala will throw a NoClassDefFoundError.

One way to work around this is to write tests for every such Enumeration to make sure that no one working in the code base has fat-fingered any ids. I’m a big proponent of writing tests, but tests are also code and come with a maintenance and cognitive cost, so I would prefer not having to write tests to make sure my simple value enums can be safely initialised.

This kind of problem is not limited to Enumeration: careless implementation of something similar may result in arguably freakier outcomes such as silent failures (2 members with the same value but only one of the members can be retrieved by value).

ValueEnum

In version 1.4.0 of Enumeratum, we’ve introduced 3 pairs of traits: IntEnum and IntEnumEntry, LongEnum and LongEnumEntry, and ShortEnum and ShortEnumEntry. As their names suggest, these are value enum traits that allow you to create enums that are backed by Int, Long and Short respectively. Each pair extends ValueEnum and ValueEnumEntry. Note that this class hierarchy is a bit extensive for now, and it may be more streamlined in the future.

This is an example of how you would create an Long based value enum with Play integration (JSON readers and writers, Query string binders, Path binders, Form formatters, etc):

ContentType value enum with full Play integration (ContentType.scala) download

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28
import enumeratum.values._ sealed abstract class ContentType(val value: Long, name: String) extends LongEnumEntry case object ContentType extends LongPlayEnum[ContentType] { val values = findValues case object Text extends ContentType(value = 1L, name = "text") case object Image extends ContentType(value = 2L, name = "image") case object Video extends ContentType(value = 3L, name = "video") case object Audio extends ContentType(value = 4L, name = "audio") /* case object Sticker extends ContentType(value = 4L, name = "audio") * => Fails at compile time because 4L is already used with the following error * It does not look like you have unique values. Found the following values correspond to more than one members: Map(4 -> List(object Audio, object Sticker)) */ } assert(ContentType.withValue(1) == ContentType.Text) ContentType.withValue(10) // => java.util.NoSuchElementException: // Use with Play-JSON import play.api.libs.json.{ JsNumber, JsString, Json => PlayJson } ContentType.values.foreach { item => assert(PlayJson.toJson(item) == JsNumber(item.value)) }

The findValues method of ValueEnums works similarly to the findValues method of Enumeratum’s older Enum, except the macro will ensure that there is a literal value member or constructor for each enum entry and fails the compilation if more than one member shares the same value.

As the above example demonstrates, there are Play (and standalone Play-JSON) integrations available for this new kind of enum, as well as for UPickle, and Circe.

~~Note that this new feature is not yet available in Scala 2.10 and in the REPL due to Macro expansion differences~~ (update: now works in the REPL and is available for 2.10.x!).

Circe integration

Enumeratum 1.4.0 also adds support for serialising/deserialising to JSON using Circe, an up-and-coming performant and feature-filled JSON library published for both JVM and ScalaJS.

This is how you would use Circe with Enumeratum’s Enum (integrations for ValueEnum also exist)

ShirtSize Enum with Circe integration (ShirtSize.scala) download

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
import enumeratum._ sealed trait ShirtSize extends EnumEntry case object ShirtSize extends CirceEnum[ShirtSize] with Enum[ShirtSize] { case object Small extends ShirtSize case object Medium extends ShirtSize case object Large extends ShirtSize val values = findValues } import io.circe.Json import io.circe.syntax._ ShirtSize.values.foreach { size => assert(size.asJson == Json.fromString(size.entryName)) }

Conclusion

Hopefully, Enumeratum’s new ValueEnum implementations will make development easier and safer for engineers out there who need to use value enumerations. Since uniqueness is checked at compile-time, you can save yourself the trouble of writing a bunch of pedantic tests. Circe is a promising JSON library that was really easy to integrate with and I look forward to taking advantage of the fact that it works on both server side and on the front end.

As always, if you have any problems, questions, suggestions, or better yet, PRs, please do not hesitate to get in touch on Github.

Links

Enumeratum on Github

Scala and OpenCV Ep 2: Akka Face Detector

2016-03-14T17:24:00+00:00

In Episode 1 of this series on Scala and computer vision, we created a basic Akka-Streams-powered webcam feed app. To bring it to the next level, we will dig a little deeper into the OpenCV toolset and bring in feature detection as well as video stream editing.

We will build on the foundations from the previous post and continue with the usage of Akka Streams, modeling our application as a series of small transformations that are run asynchronously, with backpressure handled automatically.

Flow chart

Previously, our app could be represented by a somewhat trivial flow chart that nonetheless had all the elements of a useful Akka stream: a Source, multiple transformations, and controlled side-effecting.

To build our face detector, we will add the following:

Conversion to grey scale: Many image analysis tools need to be run on greyscale images, both for simplicity and efficiency.

Facial features detector: We will make use of OpenCV’s Haar Cascade feature detection API to detect and identify faces in our video feed.

Video editing: We want to draw rectangles around the faces that have been identified into the image.

Our updated flow chart is as follows (new transformations are highlighted by a light green rectangle):

Greyscale

To convert a given Mat to a greyscale Mat, we can make use of the OpenCV method cvtColor. The only slight niggle is that the method isn’t idempotent: if you try to convert a greyscale image to greyscale, the method will throw. No matter, we can try handle that scenario ourselves by detecting the number of channels in the matrix.

1 2 3 4 5 6 7 8 9 10 11 12 13
def toGreyScale(mat: Mat): Mat = { if (mat.channels() == 1) { mat // just hand back the matrix as is; it is already grey } else { // allocate a new Matrix with the same dimensions val greyMat = { val (rows, cols) = (mat.rows(), mat.cols()) new Mat(rows, cols, CV_8U) } opencv_imgproc.cvtColor(mat, greyMat, COLOR_BGR2GRAY, 1) greyMat } }

However, since we want to pass the original colour image and the new greyscale image down the pipeline, we’ll make things a bit easier for ourselves by defining a simple WithGreyscale case class to hold both:

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22
object WithGrey { /** * Simple transformer method that produces a [[WithGrey]] */ def build(orig: Mat): WithGrey = { val grey = toGreyScale(orig) WithGrey(orig = orig, grey = grey) } // toGreyScale is in here too } /** * Original Matrix with a Grey image. Useful because almost all analysis processing requires a greyscale image instead of * a colour image. * * The constructor is private to make sure we don't mix up the two references * * Passing [[WithGrey]] images along with the original saves us from having to process to grey scale over and over again. */ final case class WithGrey private (orig: Mat, grey: Mat)

Face detection

To find faces in the images in our video feed, we will make use of Haar feature-based cascade classifiers, which are supported directly by OpenCV. Haar Cascade classifers define how to look at an image and quickly identify any areas in it that are of interest to us. A given classifier definition will usually contain multiple stages, so that a region is considered to test positive if all features in all stages of the definition return positive (thus cascade).

In actual usage, this relies on careful training and tuning of classifier defintions, as well as a combination of clever mathematics and pragmatic optimisation for detection. I will not cover exactly how they work in this tutorial (my understanding is dubious and there is a wealth of information online about them), but the following are a couple links that really helped me understand the theory behind them and how they work in practice:

OpenCV documentation on using Haar Cascades for face detection

Youtube video covering Haar Cascades with a really good explanation of summed area tables

OpenCV’s Haar Classifier API (or perhaps JavaCV’s wrapping of it) is fairly straight forward and boils down to:

Instantiating a CascadeClassifier, passing in a path to a classifier definition (you can find some here) as a constructor argument

Instantiating an instance of RectVector, which is aptly named because it is a wrapper for a native vector of rectangles.

Pass the allocated instance of RectVector to the CascadeClassifier’s detectMultiScale along with a greyscale image and some other options (yes, OpenCV will mutate the RectVector you pass in by adding in Rects)

In our implementation of a face detector, we’ll wrap a few raw (but aliased) primitives that serve as option flags in OpenCV, just for our own sanity. We’ll also create a delegator class that has a detect(withGrey: WithGrey): (WithGrey, Seq[Face]) method and wraps the classifier to hold constant values for the classifier options because for our purposes, those won’t be changing on the fly.

Tuple-like class for holding width and height in pixels (Dimensions.scala) download

1 2 3 4
/** * Tuple-like class for holding width and height in pixels */ case class Dimensions(width: Int, height: Int)

Nothing face-specific in this class per say; it can hold ids and Rects for any detected object (Face.scala) download

1 2 3 4 5 6 7
/** * Holds an id and an OpenCV Rect defining the corners of a rectangle. * * There is nothing *face* specific in this class per say; it can hold ids and Rects for any detected * object */ case class Face(id: Long, faceRect: Rect)

Haar classifier option wrapper class (HaarDetectorFlag.scala) download

1 2 3 4 5 6 7 8 9 10
sealed abstract class HaarDetectorFlag(val flag: Int) case object HaarDetectorFlag { case object DoCannyPruning extends HaarDetectorFlag(CV_HAAR_DO_CANNY_PRUNING) case object ScaleImage extends HaarDetectorFlag(CV_HAAR_SCALE_IMAGE) case object FindBiggestObject extends HaarDetectorFlag(CV_HAAR_FIND_BIGGEST_OBJECT) case object DoRoughSearch extends HaarDetectorFlag(CV_HAAR_DO_ROUGH_SEARCH) }

Face detector class that holds a Haar classifier (FaceDetector.scala) download

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62
object FaceDetector { /** * Builds a FaceDetector with the default Haar Cascade classifier in the resource directory */ def defaultCascadeFile( dimensions: Dimensions, scaleFactor: Double = 1.3, minNeighbours: Int = 3, detectorFlag: HaarDetectorFlag = HaarDetectorFlag.DoCannyPruning, minSize: Dimensions = Dimensions(width = 30, height = 30), maxSize: Option[Dimensions] = None ): FaceDetector = { val classLoader = this.getClass.getClassLoader val faceXml = classLoader.getResource("haarcascade_frontalface_alt.xml").getPath new FaceDetector( dimensions = dimensions, classifierPath = faceXml, scaleFactor = scaleFactor, minNeighbours = minNeighbours, detectorFlag = detectorFlag, minSize = minSize, maxSize = maxSize ) } } class FaceDetector( val dimensions: Dimensions, classifierPath: String, scaleFactor: Double = 1.3, minNeighbours: Int = 3, detectorFlag: HaarDetectorFlag = HaarDetectorFlag.ScaleImage, minSize: Dimensions = Dimensions(width = 30, height = 30), maxSize: Option[Dimensions] = None ) { private val faceCascade = new CascadeClassifier(classifierPath) private val minSizeOpenCV = new Size(minSize.width, minSize.height) private val maxSizeOpenCV = maxSize.map(d => new Size(d.width, d.height)).getOrElse(new Size()) /** * Given a frame matrix, a series of detected faces */ def detect(frameMatWithGrey: WithGrey): (WithGrey, Seq[Face]) = { val currentGreyMat = frameMatWithGrey.grey val faceRects = findFaces(currentGreyMat) val faces = for { i <- 0L until faceRects.size() faceRect = faceRects.get(i) } yield Face(i, faceRect) (frameMatWithGrey, faces) } private def findFaces(greyMat: Mat): RectVector = { val faceRects = new RectVector() faceCascade.detectMultiScale(greyMat, faceRects, scaleFactor, minNeighbours, detectorFlag.flag, minSizeOpenCV, maxSizeOpenCV) faceRects } }

To be clear, there is really nothing face-specific in our classifier because what it detects is entirely dependent on the Haar cascade XML file passed to it on construction.

Drawing rectangles

Once we have a list of rectangles that denote where our objects are in the image matrix, the last thing we need to do is draw the rectangles on the original image matrix. OpenCV provides a rectangle method that takes a Mat and two points denoting the top left and bottom right corners of a rectangle and draws the rectangle to the matrix it in-place. Here again, our implementation will clone the matrix first before calling the OpenCV method so as to keep our code easy to reason about.

(FaceDrawer.scala) download

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
class FaceDrawer(fontScale: Float = 0.6f) { private val RedColour = new Scalar(AbstractCvScalar.RED) /** * Clones the Mat, draws squares around the faces on it using the provided [[Face]] sequence and returns the new Mat */ def drawFaces(withGrey: WithGrey, faces: Seq[Face]): Mat = { val clonedMat = withGrey.orig.clone() for (f <- faces) drawFace(clonedMat, f) clonedMat } private def drawFace(clonedMat: Mat, f: Face): Unit = { rectangle( clonedMat, new Point(f.faceRect.x, f.faceRect.y), new Point(f.faceRect.x + f.faceRect.width, f.faceRect.y + f.faceRect.height), RedColour, 1, CV_AA, 0 ) // draw the face number val cvPoint = new Point(f.faceRect.x, f.faceRect.y - 20) putText(clonedMat, s"Face ${f.id}", cvPoint, FONT_HERSHEY_SIMPLEX, fontScale, RedColour) } }

Our FaceDrawer will expose adrawFaces method that takes a WithGrey with a list of detected Faces and use the above method to draw rectanges around each face. We’ll also make use of OpenCV’s putText method to write the word “Face” along with a number right on top of the rectangle.

UI

We’ll hook up all our components in a simple Swing app. To make things a little more interesting, the app will consist of 2 frames:

An initial frame to allow the user to choose between loading a custom Haar cascade classifier file or to load the default one that’s packaged in resources

The actual CanvasFrame shows our feed along with rectangles around detected objects

WebcamFaceDetector UI (WebcamFaceDetector.scala) download

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79
object WebcamFaceDetector extends SimpleSwingApplication { def top: Frame = new OptionsFrame /** * This is the initial frame, which presents two simple options, to load a custom Haar cascade file for face detection, * or to use the default one */ private class OptionsFrame extends Frame { currentFrame => peer.setDefaultCloseOperation(javax.swing.JFrame.EXIT_ON_CLOSE) val imageDimensions = Dimensions(width = 640, height = 480) val chooseCascadeBtn = Button("Load custom Haar cascade file") { val filePath = openChooser() filePath.foreach { path => val detector = new FaceDetector(dimensions = imageDimensions, classifierPath = path) openFaceDetectionWindow(detector) } } val defaultCascadeBtn = Button("Use default face Haar cascade file") { val detector = FaceDetector.defaultCascadeFile(imageDimensions) openFaceDetectionWindow(detector) } val mainPanel = new GridPanel(rows0 = 0, cols0 = 1) { preferredSize = new Dimension(300, 200) contents ++= Seq(chooseCascadeBtn, defaultCascadeBtn) } contents = mainPanel private def openChooser(): Option[String] = { val chooser = new FileChooser(new java.io.File(".")) chooser.fileSelectionMode = FileChooser.SelectionMode.FilesOnly chooser.showOpenDialog(currentFrame) match { case FileChooser.Result.Approve => Some(chooser.selectedFile.toPath.toAbsolutePath.toString) case _ => None } } private def openFaceDetectionWindow(faceDetector: FaceDetector): Unit = { new DetectionFrame(faceDetector) peer.setDefaultCloseOperation(javax.swing.WindowConstants.DO_NOTHING_ON_CLOSE) currentFrame.close() } } /** * Our detection window; opened by Initial Frame */ private class DetectionFrame(faceDetector: FaceDetector) { implicit val system = ActorSystem() implicit val materializer = ActorMaterializer() val webcamSource = Webcam.source(deviceId = 0, dimensions = faceDetector.dimensions) val canvas = new CanvasFrame("Webcam") // //Set Canvas frame to close on exit canvas.setDefaultCloseOperation(javax.swing.JFrame.EXIT_ON_CLOSE) val faceDrawer = new FaceDrawer() val flow = webcamSource .map(MediaConversion.toMat) // most OpenCV manipulations require a Matrix .map(Flip.horizontal) .map(WithGrey.build) .map(faceDetector.detect) .map((faceDrawer.drawFaces _).tupled) .map(MediaConversion.toFrame) // convert back to a frame .map(canvas.showImage) .to(Sink.ignore) flow.run() } }

Notice that once again, the code defining the Akka Flow Graph maps almost one to one to our flow chart.

Conclusion

We now have a face detector that uses OpenCV’s Haar cascade classifier toolbelt and draws rectangles around any identified faces, and we made it by expanding on the Akka Stream foundations laid in the previous post. As before, the code for this tutorial can be found on Github.

In the next post, we’ll expand this further by classifying the faces that we’ve detected as smiling or not using a supervised machine-learning model. We could of course continue to use Haar cascades to identify smiles in our feed (we can simply choose to load a smile Haar cascade classifier file), but what would be the fun in that ? :)

Credits

Playing with OpenCV in Scala to do face detection with Haarcascade classifier using a webcam

Scala and OpenCV Ep 1: Akka Webcam

2016-03-08T16:16:00+00:00

In a previous post, I talked about SBT-OpenCV, a plugin for SBT that makes it easy to get started with OpenCV in any SBT-defined JVM app using just one line in project/plugins.sbt. Having handled the issue of getting the proper dependencies into a project, we can turn our attention to actually using the libraries to do something cool.

This post is the beginning of a series, where the end goal is to build a smile detector. Akka and OpenCV will be used, with Spark joining later on to complete the buzzwords treble.

A well-rounded and fun first step is to get a video feed from a webcam showing on our screen. To do this, we will cover a variety of things, including how to define a custom Akka Source, how to use JavaCV, and some basic OpenCV image manipulation utilities.

Akka Streams

Many of the OpenCV tutorials floating around on the interwebs use a procedural approach; perhaps because it better fits the programming language of the tutorial, or for performance. In this series of posts, we will instead adopt a stream processing model, specifically in the manner of Reactive Streams.

There are many benefits of using the Reactive Stream model (this blog post, and this slide deck by Roland Kuhn are great places to start reading), but the main ones I feel are relevant for us are:

Simplicity: by turning data processing into a series of simple stateless transformations, your code is easy to maintain, easy to change, and easy to understand: in other words, it becomes agile (relax: your code, not your team…).

Backpressure: Reactive Streams implementations ensure that backpressure (when downstream transforms take too long, upstream is informed so as to not overload your system) is handled automatically

Asynchronous: Reactive Streams are run asynchronously by default, leaving your main thread(s) responsive

In Scala, Akka-Streams is the defacto implementation of the Reactive Streams spec, and although it is labelled experimental, its adoption looks imminent (for example, there is already a Play integration and the innards of Play are being rewritten to use Akka-Http, which is based on Akka-Streams). Another nice Reactive Streams implementation in Scala is Monix, which offers a (subjectively) cleaner interface that is more familiar for people who come from RxScala/RxJava.

For the purposes of this tutorial, we will be using Akka-Streams because it seems to have higher chances of wide-spread adoption.

Note that this tutorial was written based on an experimental version of Akka streams.

Flow chart

Asides from wrapping OpenCV, JavaCV comes with a number of useful classes. One such class is CanvasFrame, which is a hardware-accelerated Swing Frame implementation for showing images. CanvasFrame’s .showImage method accepts a Frame, which is the exact same type that OpenCVFrameGrabber (another useful JavaCV class) returns from its .grabh() method.

Before showing the image, we will flip the image so that the feed we see on screen moves in the direction we expect. This requires us to do a simple transformation to a Mat, a wrapper type for OpenCV’s native matrix, do the actual flipping of the matrix, convert the Mat back into a Frame, and then show it on the CanvasFrame.

In short, our pipeline looks something like this:

The Source

As the diagram suggests, the first thing we need is a Source that produces Frames; in other words, a Source[Frame].

The OpenCVFrameGrabber API for grabbing frames from a webcam is fairly simple: you instantiate one passing in an Int for the device id of the webcam (usually 0), optionally pass some settings to it, and then call start to initialise the grabber. Afterwards, it is simly a matter of calling .grab() to obtain a Frame.

1 2 3 4 5 6 7 8 9 10
val grabber = new OpenCVFrameGrabber(deviceId) grabber.setImageWidth(imageWidth) grabber.setImageHeight(imageHeight) grabber.setBitsPerPixel(bitsPerPixel) grabber.setImageMode(imageMode) grabber.start() //... grabber.grab() // returns a Frame

In order to create an Akka Source[Frame], we will make use of the Akka-provided ActorPublisher class, which provides helper methods that specifically make it easy to send data only when there is downstream demand (this is how backpressure is automagically handled).

In the actor’s receive method, we match on

Request message type, which use to then call emitFrames()

A custom Continue object, which also calls emitFrames()

Cancel in order to know when to stop the actor.

The emitFrames() method is a method that checks to see if the Actor is currently active (whether it has any subscribers), and if it is, grabs a frame and sends it to the onNext helper method from ActorPublisher to send a piece of data. It then checks if totalDemand (another ActorPublisher method) is greater than 0, and sends itself a Continue message, which invokes emitFrames() again. This somewhat convoluted way of sending data downstream is required because grabber.grab() is a blocking call, and we don’t want to block the Actor threadpool for too long at a time (this pattern is used by the built-in InputStreamPublisher).

In order to make a Source[Frame], we instantiate an instance of our actor, pass its ActorRef to a method that creates a Publisher[Frame], and then pass the publisher to a method that makes a Source[Frame].

For the purposes of keeping our API clean, we make it a private class and expose only a static method for creating a source.

Webcam Source[Frame] (WebcamSource.scala) download

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113
package com.beachape.video import akka.actor.{ DeadLetterSuppression, Props, ActorSystem, ActorLogging } import akka.stream.actor.ActorPublisher import akka.stream.actor.ActorPublisherMessage.{ Cancel, Request } import akka.stream.scaladsl.Source import org.bytedeco.javacpp.opencv_core._ import org.bytedeco.javacv.{ FrameGrabber, Frame } import org.bytedeco.javacv.FrameGrabber.ImageMode /** * Created by Lloyd on 2/13/16. */ object Webcam { /** * Builds a Frame [[Source]] * * @param deviceId device ID for the webcam * @param dimensions * @param bitsPerPixel * @param imageMode * @param system ActorSystem * @return a Source of [[Frame]]s */ def source( deviceId: Int, dimensions: Dimensions, bitsPerPixel: Int = CV_8U, imageMode: ImageMode = ImageMode.COLOR )(implicit system: ActorSystem): Source[Frame, Unit] = { val props = Props( new WebcamFramePublisher( deviceId = deviceId, imageWidth = dimensions.width, imageHeight = dimensions.height, bitsPerPixel = bitsPerPixel, imageMode = imageMode ) ) val webcamActorRef = system.actorOf(props) val webcamActorPublisher = ActorPublisher[Frame](webcamActorRef) Source.fromPublisher(webcamActorPublisher) } // Building a started grabber seems finicky if not synchronised; there may be some freaky stuff happening somewhere. private def buildGrabber( deviceId: Int, imageWidth: Int, imageHeight: Int, bitsPerPixel: Int, imageMode: ImageMode ): FrameGrabber = synchronized { val g = FrameGrabber.createDefault(deviceId) g.setImageWidth(imageWidth) g.setImageHeight(imageHeight) g.setBitsPerPixel(bitsPerPixel) g.setImageMode(imageMode) g.start() g } /** * Actor that backs the Akka Stream source */ private class WebcamFramePublisher( deviceId: Int, imageWidth: Int, imageHeight: Int, bitsPerPixel: Int, imageMode: ImageMode ) extends ActorPublisher[Frame] with ActorLogging { private implicit val ec = context.dispatcher // Lazy so that nothing happens until the flow begins private lazy val grabber = buildGrabber( deviceId = deviceId, imageWidth = imageWidth, imageHeight = imageHeight, bitsPerPixel = bitsPerPixel, imageMode = imageMode ) def receive: Receive = { case _: Request => emitFrames() case Continue => emitFrames() case Cancel => onCompleteThenStop() case unexpectedMsg => log.warning(s"Unexpected message: $unexpectedMsg") } private def emitFrames(): Unit = { if (isActive && totalDemand > 0) { /* Grabbing a frame is a blocking I/O operation, so we don't send too many at once. */ grabFrame().foreach(onNext) if (totalDemand > 0) { self ! Continue } } } private def grabFrame(): Option[Frame] = { Option(grabber.grab()) } } private case object Continue extends DeadLetterSuppression }

We’ll also define a simple Dimensions case class to make things a bit clearer (keyword arguments FTW)

Tuple-like class for holding dimensions (Dimensions.scala) download

1 2 3 4
/** * Tuple-like class for holding width and height in pixels */ case class Dimensions(width: Int, height: Int)

Conversion

In order to begin processing our feed with OpenCV, we first need to transform our Frame, which is a JavaCV type, into a type that works with JavaCV’s wrapping of OpenCV’s main representation of images, the matrix, aka Mat. Fortunately, JavaCV has a OpenCVFrameConverter.ToMat helper class that helps us do this. Since the class uses a mutable private field for holding on to temporary results, it normally isn’t advisable to use it in multithreaded code unless we make new copies of it each time, but we can make it thread safe by binding it to a ThreadLocal.

Media conversion utility methods (MediaConversion.scala) download

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21
/** * Holds conversion and transformation methods for media types */ object MediaConversion { // Each thread gets its own greyMat for safety private val frameToMatConverter = ThreadLocal.withInitial(new Supplier[OpenCVFrameConverter.ToMat] { def get(): OpenCVFrameConverter.ToMat = new OpenCVFrameConverter.ToMat }) /** * Returns an OpenCV Mat for a given JavaCV frame */ def toMat(frame: Frame): Mat = frameToMatConverter.get().convert(frame) /** * Returns a JavaCV Frame for a given OpenCV Mat */ def toFrame(mat: Mat): Frame = frameToMatConverter.get().convert(mat) }

Manipulation

Once we have our Mat, we can use OpenCV methods to do manipulation. One thing though, is that (perhaps for efficiency) by default, these methods mutate the original object. This can cause strange issues in a multi-threaded, multi-path Flow graph, so instead of using them as is, we make use of the convenient clone method before doing our flip so that the original matrix remains as-is.

(Flip.scala) download

1 2 3 4 5 6 7 8 9 10 11 12
object Flip { /** * Clones the image and returns a flipped version of the given image matrix along the y axis (horizontally) */ def horizontal(mat: Mat): Mat = { val cloned = mat.clone() flip(cloned, cloned, 1) cloned } }

Hooking things up

Now that we have all our components, all we need to do is create a simple application that instantiates all our components and hooks them all together:

Instantiate our ActorSystem and Materializer

Instantiate a CanvasFrame

Instantiate our Source[Frame]

Define our Graph by using our components to transform it

Run the graph

Webcam feed app (WebcamWindow.scala) download

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22
object WebcamWindow extends App { implicit val system = ActorSystem() implicit val materializer = ActorMaterializer() val canvas = new CanvasFrame("Webcam") // Set Canvas frame to close on exit canvas.setDefaultCloseOperation(javax.swing.JFrame.EXIT_ON_CLOSE) val imageDimensions = Dimensions(width = 640, height = 480) val webcamSource = Webcam.source(deviceId = 0, dimensions = imageDimensions) val graph = webcamSource .map(MediaConversion.toMat) // most OpenCV manipulations require a Matrix .map(Flip.horizontal) .map(MediaConversion.toFrame) // convert back to a frame .map(canvas.showImage) .to(Sink.ignore) graph.run() }

Looking at the code, one of the rewards of using the stream processing model over the procedureal approach might jump out at you: the near 1 to 1 correspondence that the graph definition has with our earlier diagram.

Conclusion

So, with that we should now have a very simple app shows what your webcam sees, flipped so that when you move left, the image moves with you. We’ve done it by declaring a custom Akka Stream Source and transforming it a little bit before shoving it onto the screen.

In the next post, we will look at how to do something a bit more complex: face detection using OpenCV.

Note the code for this post is on Github

Credits

Playing with OpenCV in Scala to do face detection with Haarcascade classifier using a webcam

OpenCV SBT Plugin

2016-03-05T00:30:00+00:00

OpenCV is arguably the defacto free, open-source computer vision library, but setting it up for usage in a JVM project can be hard because OpenCV itself is written in C++, so there are a bunch of system-dependent things that you need to download/compile/install before you can use it.

JavaCV, written by Bytedeco is a library that makes it more bearable to use OpenCV from JVM projects by providing a bunch of wrapper classes and logic around OpenCV (there’s a lot more to it, see their page for details).

Still, because JavaCV depends on JavaCPP for common and OpenCV C++ wrappers, and JavaCPP requires you to set your target platform (what platform you want to run on), I thought getting started could be easier still.

After taking a look at this Github project, I created an SBT plugin, SBT-OpenCV, that allows you to add just one line to your project/plugins.sbt to begin playing around with OpenCV:

1
addSbtPlugin("com.beachape" % "sbt-opencv" % "1.4")

The following is a list of SBT setting keys that you can set in order to customise the behaviour of the plugin:

1 2 3 4
* `javaCVPlatform`: The platform that you want to compile for (defaults to the platform of the current computer). You can also set this via the "sbt.javacv.platform" System Property * `javaCppVersion`: Version of Java CPP that you want to use * `javaCppPresetsVersion`: Version of Java CPP Presets that you want to use * `javaCVVersion`: Version of Java CV that you want to use

I think javaCVPlatform is the one that will be most interesting, since you may want to compile JARs for different target platforms; for a list of supported strings, look at the classifiers supported by JavaCPP presets, or work out the different strings that can result from the JavaCPP Loader.

For example:

1
javaCppPlatform := "android-arm"

Feel free to try it out and submit issues, ideas, and PRs at the Github page :)

Slim Play App

2015-07-25T20:50:00+00:00

Play is one of two officially-supported web frameworks from Typesafe, the company behind Scala (the other is Spray). It runs on its own webserver, is non-blocking, and encourages the use of idiomatic Scala. It is often compared with Rails because of its emphasis on convention over configuration and because it’s a full-on framework that comes with most of the bells and whistles needed to build a full-featured webapp. Spray is considered by many to be the defacto API-centric alternative to Play, offering a Sinatra-esque DSL for routing and being slimmer to boot (from a files + LOC perspective).

After looking around I began suspecting that Play comes with the ability to be slimmed down. By combining the String Interpolating Routing DSL and Compile-time dependency injection of Play 2.4, I was able to build a Scala app that would give Sinatra a run for its money in terms of the whole brevity thing.

Methodology

All I did was:

Use activator to generate a new Play app ($ activator new slim-play play-scala)

Delete the auto-generated controller, public, and view directories (won’t be using them)

Create a AppLoader.scala file in the ./app directory, which holds an ApplicationLoader and the router, which is super simple:
Play app in 38 lines including imports and comments (AppLoader.scala) download

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38
import play.api.ApplicationLoader.Context import play.api._ import play.api.libs.concurrent.Execution.Implicits._ import play.api.mvc.Results._ import play.api.mvc._ import play.api.routing.Router import play.api.routing.sird._ import scala.concurrent.Future class AppLoader extends ApplicationLoader { def load(context: Context) = new BuiltInComponentsFromContext(context) { /** * Simple & fairly self-explanatory router */ val router = Router.from { // Essentially copied verbatim from the SIRD example case GET(p"/hello/$to") => Action { Ok(s"Hello $to") } /* Use Action.async to return a Future result (sqrt can be intense :P) Note the use of double(num) to bind only numbers (built-in :) */ case GET(p"/sqrt/${double(num)}") => Action.async { Future { Ok(Math.sqrt(num).toString) } } } }.application }
4. Add play.application.loader=AppLoader to ./conf/application.conf so that Play knows to load our custom app (that contains our simple router)

The end result is a small, one-file Play app powered by a custom router and compile-time dependency injection. For more information, take a look at the slim-play repo on Github.

Conclusion

Play is an awesome framework; scalable, idiomatic (type-safe, threadsafe), well documented, and well supported by Typesafe and a great community. I’ve been happily using it to build various-sized apps for the better part of 2.5 years. If you want to have a well-structured app, it comes out of the box configured to provide that. However, it also has the surprising ability to shed weight and turn into a slim API-focused engine.

A word Sinatra-clones in Scala

Ruby is fairly ubiquitous when it comes to server-side web programming. Rails aside, Sinatra has made its mark on the world and made a name for itself as the DSL to mimic, with imitators in Ruby (Cuba), Python (Bottle, Flask), PHP (Laravel), Scala (Scalatra and its wrapper Skinny), and Javascript (Express). Thanks to its simple and easy to follow DSL routing, it’s gained a large following as well.

That said, blindly copying Sinatra’s DSL in other languages may be problematic, because Sinatra’s DSL relies on the Rack execution model (one request at a time per process/thread), and embraces Ruby’s spirit of developer happiness at the cost of performance. This is especially true in Scala, where the language was designed for concurrency and the community places heavy emphasis on adhering to a non-blocking execution model, eschewing mutation of data.

For example, I filed an issue with Scalatra a few months ago that was largely caused by indiscriminate copying of Sinatra’s DSL, as well being based on the Servlet async API (an intro to why we should move away from Servlets). Among other things, it led to:

Loss of thread-safety, meaning you can no longer take advantage of Scala’s strength in concurrency for scaling purposes (a lot of Scala libraries also return Futures when dealing with I/O, as they should).

Loss of static typing, which is terrible at design-time (IDE assistence and refactoring perspective), as well as runtime (performance). Scalatra apps are written in non-idiomatic Scala because the routing implementation takes an Any as the result of a route definition, including…yes, shutting down the Servlet container. In addition, it encourages you to mutate existing data (setting statuses on responses).

If you’re coming to Scala from Ruby and what you want is to build a small app using Sinatra-esque DSL in Scala, I would highly suggest evaluating Spray or slim-Play (as presented here) before choosing to go with Scalatra and friends: “Thar be dragons” in the long-run.

Enumeratum: Sealed Trait Enums for Scala

2015-02-11T16:47:00+00:00

If you’ve been working with Scala for a while, you might have come across a few “problems” with the built in Enumeration that’s provided out-of-the-box. This is especially true if you have colleagues who come from a Java background and yearn for the Java-style Enum that gave them lots of power and flexibility.

A quick search on the internet for “Scala enumeration alternative” will yield a lot of results (perhaps on StackOverflow) where people have cooked up their own implementation of enumerations, usually built on sealed traits. Personally, I found most of them to be either too inconvenient to use, too over-powered, or too complicated, and I really didn’t want to have to copy-paste enum-related code into all my projects.

Thus Enumeratum was born.

Enumeratum aims to be simple to use, idiomatic, small (LoC), yet flexible enough to allow Scala devs to make power enums if they so wish. It is also Mavenised for easy import into any project.

To use it, simply add it as a dependency

1 2 3 4 5
val enumeratumVersion = "1.4.2" // latest version number can be found on the Maven Central version badge on the Github repo. libraryDependencies ++= Seq( "com.beachape" %% "enumeratum" % enumeratumVersion, "com.beachape" %% "enumeratum-play" % enumeratumVersion // if you are using Play and want to avoid boilerplate )

Then

Enumeratum example code (enumeratum_example.scala) download

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42
// For use in a REPL import enumeratum._ sealed trait Phone { def call(number: Int): String } extends EnumEntry case object Phone extends Enum[Phone] { case object Android extends Phone { def call(number: Int) = "This is Larry Page." } case object Iphone extends Phone { def call(number: Int) = "This is Steve Jobs." } case object WindowsPhone extends Phone { def call(number: Int) = "This is Bill Gates." } val values = findValues } import Phone._ // Use as needed. val myPhone = Iphone // Get exhaustive match warnings def rate(phone: Phone): String = phone match { case Android => "Great!" case Iphone => "Awesome!" } /* :17: warning: match may not be exhaustive. It would fail on the following input: WindowsPhone */

You get nice things like exhaustive match warnings at compile-time, enums with methods, no more Enum-value type erasure, and other nice stuff.

How it works

Some of the solutions for custom enums out there are based on macros that reflect at compile time using knownDirectSubclasses to find enum values, but as of writing, there is a 2 year old bug for that method.

As a result, Enumeratum uses another method of finding enum values: looking in an enclosed object to find the enum values. The macro behind findValues does this for you so that you don’t have to maintain your own collection of enum values, which is both error-prone and tedious.

Play

If you want to use Enumeratum in a Play app, you may as well add enumeratum-play as a dependency instead so that you can use the PlayEnum[A] trait (instead of Enum[A]), which will give you nice things like QueryStringBinders, PathBinders, form mappers, and Json Reads/Writes/Formats. To make use of this integration, just extend from PlayEnum instead of Enum in the above example.

This means less boilerplate in your project, which is A Good Thing, right?

Limitations

There are a few limitations with Enumeratum:

~~Ordinality is not taken care of. From what I’ve seen, this is one of the least-used functions of Enums in general. That said, nothing is stoping you from defining an Ordering in your companion object for your sealed trait.~~

Because the Enum values are case objects, they will be inferred to have their own specific type, which may cause problems with compilation for typeclasses that are not contravariant. In that case, simply help the compiler by adding a type (e.g. val myPhone: Phone = Iphone)

~~The method withName relies on the toString method of the Enum values for lookup. Make sure to override this if you have specific requirements.~~

Update 2016/04/22 Crossed out a bunch of limitations that no longer apply.

Enjoy

I hope Enumeratum can help you out of your Enumeration woes. Have a look, play around, and send a PR or two !

Scala Unless-when via Macros

2014-08-11T18:41:00+00:00

Last week, I decided to take a stab at learning Scala macros. I had played around with macros when I wrote Scheme for a living (yes, believe it or not, these places exist…and existed long before Clojure made Lisp hip again), but the complexity of Scala’s macros always put me off (if you don’t believe me, check out the example given in the offical docs for a simple print macro).

In Scala, things are not so simple, but with the introduction of quasiquotes and some refinements brought by Scala 2.11, things are smoother. Still, for a guy like me, the documentation was both sparse and DRY. Since I learn best when I’m actively engaged in building something, I decided to try writing the run-of-the-mill unless-when macros in Scala.

This post aims to summarise my journey towards implementing unless-when and hopefully along the way make Scala macros accessible, at least at an introductory level, for Most People. There are already a few Scala macro blog posts out there but another one can’t hurt.

Note: this blog post aims to explore macros as they are usable in Scala 2.10+. It also focuses on implementing macros with quasiquotes, as using them is more human-friendly than manually constructing Abstract Syntax Trees (AST).

Goal

For those unfamiliar with when and unless: the basic idea is that when is an if without an else, and unless is it’s opposite. The main reason for their existence is to make code more readable by adding a tiny bit of syntatic sugar. Without further ado, an example of what we want to achieve

1 2 3 4 5 6 7 8 9 10 11 12 13
/* This should replace if (true) { block of code } */ when (true) { // block of code } /* This should replace if (!true) { do something } */ unless (false) { // block of code }

Since we’re writing Scala, it would be nice if these constructs returned something useful; using the Option monad seems reasonable: If the block is run, we return the result in a Some and otherwise return a None. This tutorial is a good guide for Options in case you are unfamiliar with the concept.

Macro-y basics

Taking a look at the documentation, you will quickly notice the general pattern for implementing a simple Scala macro

1 2 3 4 5 6 7 8 9 10
import scala.language.experimental.macros import scala.reflect.macros._ object Example { def meth[A](x: A): A = macro implRef[A] ... def implRef[A: c.WeakTypeTag](c: Context)(x: c.Expr[A]): c.Expr[A] = ... }

What does this mean? Let’s break it down:

import scala.language.experimental.macros and import scala.reflect.macros._ are standard Scala imports that allow us to play around with macros. What’s not listed in this example is the declaration that your project depends on scala-reflect. You can do so by adding the following to your build.sbt:

libraryDependencies ++= Seq("org.scala-lang" % "scala-reflect" % scalaVersion.value)

def meth[A](x: A): A this is still just normal Scala code that we would normally see. It simply declares a method belonging to the Example singleton that is parameterised on the input type, and we want to make sure that the output type matches this type (e.g. if we invoke meth with an Int, we expect the output to be an Int because that is the contract of the method). For more info on writing parametric polymorphism, please check out this guide:

macro implRef[A] this is where things start looking macro-ish. The macro keyword lets the compiler know that the body of this method is going to be implemented via a macro definition, in this case implRef.

def implRef[A: c.WeakTypeTag](c: Context)(x: c.Expr[A]): c.Expr[A] .. wow. This itself needs to be broken down:

def implRef[A: c.WeakTypeTag] The first part def implRef is still standard Scala

(c: Context) (we’ll cover [A: c.WeakTypeTag] in a bit). In this part, (c: Context) declares that the first argument passed to the macro implementation must be a Context. This is a requirement for playing around with Scala macros, and is actually passed by the compiler when it invokes macro expansion, so that you can write code that accesses the compiler API.

[A: c.WeakTypeTag] This is a bit mischievous because we combine Scala-shorthand for typeclasses with macro-magic. This probably deserves a post in and of itself, but for now, please consider this to mean “A is a type parameter passed during macro invocation, but we must ALSO have in scope a WeakTypeTag coming from the Context that is parameterised to type A, which can be written in full as c.WeakTypeTag[A]”. This WeakTypeTag business is required so that we can pass along the type parameter from meth into the implRef macro expansion implementation, allowing us to have a type parameterised macro definition.

For more information on type classes and the shorthand we use here, I highly recommend this blog post on type classes

(x: c.Expr[A]) means that the first non-Context parameter of the macro implementation (remember that the first one is always taken by the compiler and must be a Context) is x and it is a c.Expr[A]. It is important that the name of the parameter matches that used in the invoking method (see how meth also has x as the first parameter). c.Expr is type of object that wraps the abstract syntax tree that represents the input to the invoking function, and it is typed to A.

NOTE: since the argument’s type is c.Expr (essentially an abstract syntax tree), any expression passed to the method meth actually may not get invoked or evaluated even though it is not a pass-by-name parameter. In other words, while the macro is expanding, it acts like a pass-by name parameter and is “lazy”.

: c.Expr[A] all this means is that the result of the macro expansion is also a c.Expr type parameterised to A.

Quasiquotes

Quasiquotes are not a Scala-exclusive construct, and a Google search will show that they are used in other languages that support metaprogramming, like Scheme.

In short, they offer the macro programmer an easy way to manipulate or create abstract syntax trees without having to build them manually. This makes them extremely helpful in Scala because: 1. Scala syntax does not map to ASTs easily like Lisps 2. Scala is typed, which means your manually-built AST also needs typing…which wraps non-macro-land types (notice how a normal type parameter like [A] becomes c.Expr[A] … that’s twice as many characters !)

Quasiquotes allow us to use string-interpolation-like syntax to interpolate elements into a tree as we define it.

For example:

1 2 3 4 5
scala> val aquasiquote = q"a quasiquote" aquasiquote: universe.Select = a.quasiquote scala> val tree = q"i am { $aquasiquote }" tree: universe.Tree = i.am(a.quasiquote)

The above example was taken from the official documentation on quasiquotes, which I highly recommend you take a look at if you find the rest of this post hard to follow.

Implementation

For when, we know that we roughly want the following:

1
when (someCondition) { result }

To expand via our macro into the following (yes we are using an inline if .. if you don’t like it, pretend we didn’t)

1
if (someCondition) Some(result) else None

Using what we know, the following should work:

1 2 3 4 5 6 7 8 9 10 11 12
import scala.language.experimental.macros import scala.reflect.macros._ object Example { def when[A](p: Boolean)(f: A): Option[A] = macro whenImp[A] def whenImp[A: c.WeakTypeTag](c: Context)(p: c.Expr[Boolean])(f: c.Expr[A]): c.Expr[Option[A]] = { import c.universe._ c.Expr[Option[A]](q"if ($p) Some($f) else None") } }

Implementing unless is left as an exercise for the reader :)

Trying it out

Putting the above into a Scala REPL (you will probably need to use :paste mode) will prove that it works.

For example:

1 2 3 4 5
scala> import Example._ import Example._ when(true)(3) res1: Option[Int] = Some(3)

Also, remember that since our when is backed by a macro, the f argument (our block) passed to the second parameter list, behaves “lazily” and won’t execute if our predicatep returns false. This is because when when is invoked, the compiler knows to pass the entire AST for that block parameter (well, wrapped inside a c.Expr) to our macro, which interpolates the it into the final tree.

For the performance-conscious, this means that we get “lazy” for free; that is, without using Scala’s call-by-name parameter feature, which, although nice to use in many cases, does incur some run-time performance penalty because it is implemented by instantiating anonymous classes (see this paper for more information about the performance cost of call-by-name parameters .. among other performance-related Scala things).

unless-when library

I’ve put the above into a library and included trailing variants of when and unless as bonuses (Rubyists should be familiar with these).

You can find the lib here on Github. It is fully tested and Mavenised for easy out-of-the-box usage.

Conclusion

I hope this post has been helpful in giving a simple, but full example of how to get started with macros in Scala. If you spot any errors, have questions or suggestions, please feel free to leave a comment!

RxScala and Schwatcher

2014-05-03T17:44:00+00:00

A couple days ago, I released v0.1.3 of Schwatcher, which introduces the ability to monitor events on file paths using a composable Rx Observable interface. “What does that even mean and why should you care?” is what this blog post tries to answer.

The original version of Schwatcher allowed you to tell a MonitorActor what callback you want to fire when a certain type of event happened on a file path. This is fine and there are people out there using it in production as is. The limitation to this approach is that (at least by default), the events are difficult to treat as data and thus difficult to compose.

With Rx, we turn file path events into an asynchronous stream/channel. Essentially, you tell a RxMonitor object what path and event type you want to monitor and when an event happens, it will get pushed into its observable (the stream). You can then choose to filter, map, or fold over this data stream, creating new data streams. If you wish to cause side-effects, you can add one or more observers to these data streams.

Note: this blog post applies to v0.1.3 of Schwatcher, which uses v0.18.1 of RxScala. Future versions may introduce breaking changes that invalidate the examples in this blog post.

Example

Suppose we have the following directory structure:

1 2
directory1 - directoryFile1

Let’s set up an RxMonitor object to monitor for file creation and modifications events in directory1 (note: all operations on RxMonitor objects are thread-safe).

While we’re at it, let’s grab the base observable from the monitor as well. Note that this Observable will, according to the registerPath and unregisterPath calls made to its parent RxMonitor, push all EventAtPaths to its Observers. More on what an Observer is later, but for now, think of an Observable as a data stream and an Observer as an object gets pushed new objects from the Observable that it is, well, observing.

1 2 3 4 5 6 7 8 9 10 11 12
import com.beachape.filemanagement.RxMonitor import com.beachape.filemanagement.Messages.EventAtPath import java.nio.file.Paths import java.nio.file.StandardWatchEventKinds._ import rx.lang.scala.Observer val monitor = RxMonitor() val observable = monitor.observable val directory1 = Paths get "/Users/lloyd/Desktop/directory1" monitor.registerPath(ENTRY_MODIFY, directory1) monitor.registerPath(ENTRY_CREATE, directory1)

Let’s create 2 more Observables. Let’s make one called createsOnly that will only care about create events in the directory and another one called scalaSourceCreatesOnly that only cares about create events for files ending in .scala. Notice that we’re composing here :)

1 2
val createsOnly = observable.filter(_.event == ENTRY_CREATE) val scalaSourceCreatesOnly = createsOnly.filter(_.path.toString.endsWith(".scala"))

Now, let’s create some basic Observers that we can pass to the subscribe method of our new Observables. An Observer at minimum implements an onNext function, which takes an element that will be pushed to it from the Observable that it subscribes to and returns nothing (Unit). It may optionally implement onError (a function which takes a Throwable as an argument and returns nothing) and onCompleted (0 argument function that is called when the Observable it is subscribed to is finished and will no longer send further objects):

attach observers (attachObservers.scala) download

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
val createAndModifyObserver = Observer[EventAtPath](onNext = { event => println(s"Something was created or modified: $event")}) val createOnlyObserver = Observer[EventAtPath](onNext = { event => println(s"Something was created: $event")}) observable.subscribe(createAndModifyObserver) createsOnly.subscribe(createOnlyObserver) /* * The same as * {{{ * val createScalaOnlyObserver = Observer[EventAtPath](onNext = { event => println(s"A Scala source file was created: $event")}) * scalaSourceCreatesOnly.subscribe(createScalaOnlyObserver) * }}} * * The same as declaring an Observer separately and attaching it via #subscribe (as seen above), * since Observer as a type is just a way of binding 3 different functions, onNext, onCompleted, and onError * */ scalaSourceCreatesOnly.subscribe(onNext = { event => println(s"A Scala source file was created: $event")})

Now let’s make stuff happen in another terminal.

1 2 3 4
$ ~/Desktop/directory1: touch hello $ ~/Desktop/directory1: echo lol >> hello $ ~/Desktop/directory1: touch speedy.scala $ ~/Desktop/directory1: echo 'println("hmm")' >> speedy.scala

The following will be outputted

1 2 3 4 5 6 7
Something was created or modified: EventAtPath(ENTRY_CREATE,/Users/lloyd/Desktop/directory1/hello) Something was created: EventAtPath(ENTRY_CREATE,/Users/lloyd/Desktop/directory1/hello) Something was created or modified: EventAtPath(ENTRY_MODIFY,/Users/lloyd/Desktop/directory1/hello) Something was created or modified: EventAtPath(ENTRY_CREATE,/Users/lloyd/Desktop/directory1/speedy.scala) Something was created: EventAtPath(ENTRY_CREATE,/Users/lloyd/Desktop/directory1/speedy.scala) A Scala source file was created: EventAtPath(ENTRY_CREATE,/Users/lloyd/Desktop/directory1/speedy.scala) Something was created or modified: EventAtPath(ENTRY_MODIFY,/Users/lloyd/Desktop/directory1/speedy.scala)

Lastly, since we’re done, let’s call the stop() method on the RxMonitor object so that subscribed Observers are notified and we stop the underlying MonitorActor as well. Cleaning up is A Good Thing (TM).

1
monitor.stop()

Conclusion

I hope this post has demonstrated the power of using RxScala’s Observable as an abstraction of asynchronous events into a tangible data structure, and how using it through Schwatcher might simplify the process of building your own applications. If you have any questions or spot any mistakes, please feel free to leave a comment.

Schwatcher v0.1.3 Released

2014-05-02T00:10:00+00:00

Version 0.1.3 of Schwatcher has been released.

This version brings a new Observable interface that exposes a “stream” (or channel) of EventAtPaths that can be composed. Using this interface, you no longer need to register callbacks - you simply register paths and get notifications for events on them either by subscribing to the Observable or by composing.

For more information on how to use Observables (especially how they compose in awesome ways), checkout the Rx homepage

Example usage:

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32
import com.beachape.filemanagement.RxMonitor import java.io.{FileWriter, BufferedWriter} import java.nio.file.Paths import java.nio.file.StandardWatchEventKinds._ val monitor = RxMonitor() val observable = monitor.observable val subscription = observable.subscribe( onNext = { p => println(s"Something was modified in a file mufufu: $p")}, onError = { t => println(t)}, onCompleted = { () => println("Monitor has been shut down") } ) val desktopFile = Paths get "/Users/lloyd/Desktop/test" monitor.registerPath(ENTRY_MODIFY, desktopFile) Thread.sleep(100) //modify a monitored file val writer = new BufferedWriter(new FileWriter(desktopFile.toFile)) writer.write("Theres text in here wee!!") writer.close // #=> Something was modified in a file mufufu: /Users/lloyd/Desktop/test // stop monitoring monitor.stop() // #=> Monitor has been shut down

Relevant links: - Github page with how to install and example usage - Release page

Schwatcher v0.1.0 Released

2014-03-06T21:45:00+00:00

Version 0.1.0 of Schwatcher has been released.

Changes:

Thanks to this pull request from georgeOsdDev, modifiers can be used when registering callbacks

Upgrade to 2.3.0 of Akka

Relevent info:

Schwatcher Github repo

BeachApe.

Containerising an Octopress 2.0 Blog

Overview

References vs what’s new

Target audience

Dockerfile

Ubuntu 22.04

Installing Ruby 2.3

Gemfile.lock file for project dependencies

Rakefile

Docker

With Rancher Desktop

Makefile to work with all of the above

Conclusion

Structural Typing in Rust

Re: radio silence

Overview

Structural typing, you say?

Show me yours

Quick review of LabelledGeneric

Detour: Plucking by labelled field

PathTraverser

Path, path! and Path!

Value-level

Type-level

Another example

Conclusion

Footnotes

Rust: A Scala Engineer's Perspective

A bit about me

Forewarning

Overview

Things I’m happy with

Batteries included

Editor/IDE

Type System

Type inference

Macros

Compile-time optimisations

Syntax

Interoperability with C

The current Zeitgeist

Things I’ve adjusted to

Semicolons

Ownership model: Stack vs heap

Mutability

Things I wish were different

Async IO

Strings

Cross compiling

Odd headscratchers

Do we actually need ref?

&mut

Scoping of lifetimes with braces

Gimme

Conclusion

Boilerplate-free Struct Transforms in Rust.

Adding Frunk to your project

Boilerplate-free conversions between Structs

Conclusion

Gentle Intro to Type-level Recursion in Rust: From Zero to HList Sculpting.

“Type-level recursion”?

Outline

Basic Gist of Rust typeclasses (traits)

Dependent trait implementations

Mental model for type-level recursion

Recursion on the value level

Recursion on the type level

What the Frunk?

Plucking from HLists

Implementation intuition

First attempt

Second attempt

Type-level walkthrough

Sculpting HLists

Implementation intuition

First attempt

Second attempt

Conclusion

Credit

`Dockerfile`

Quick review of `LabelledGeneric`

`PathTraverser`

`Path`, `path!` and `Path!`

Do we actually need `ref`?

`&mut`

Silent runtime errors with `Generic`

`LabelledGeneric` to the rescue

`Field` ??

`Field`

`Field` and `LabelledGeneric`

How the `LabelledGeneric` derivation is generated

The trouble with `Enumeration`

`ValueEnum`