Advent of Code Day 7 in Hare

Posted on 2022-12-15

hare

Lots of people are doing Advent of Codehttps://adventofcode.com/ 2022. Lots of people also blog about doing AoC in different languages. Solving these puzzles in C is probably not that interesting, but that’s what I’ve been doing - until I decided to attack them with Hare.

Tackling the day 7 puzzle in Hare was interesting, because it showcases a fair few of Hare's strengths: parsing an input stream of lines of bytes, managing some memory, string handling and tagged unions. So as an excuse to write about Hare, I'll write about my solution for this puzzle.

I’ll only be talking about modeling and parsing the puzzle input, not computing the actual answers.

If you don’t know, Harehttps://harelang.org/ is a relatively new systems programming language that aims to be small in size, minimal in features, fast to bootstrap, and get productive with. Now you know.

You should read the puzzle description yourself, but the gist of it is that we are looking through file system at directories (with names) and files (with names and sizes). A directory can contain files and more directories (duh). The puzzle input is a sort of transcript of a command-line session where the only commands issued are “cd ” and “ls”. At this point, I already had an idea on how to represent a file system in Hare - with tagged unions.https://harelang.org/tutorials/introduction/#tagged-unions-in-depth A tagged union, as the name should imply, is the union of multiple types treated as a single type. For example, (str | void) could be a way to indicate an optional string - a variable of this type is either a str or void. The language then has facilities at runtime to test which it really is.

Back to our file system:

type inode = (*dir | *file);

type dir = struct {
    name: str,
    parent: (*dir | void),
    children: []inode,
};

type file = struct {
    name: str,
    sz: size,
};

This representation says that each file system entry (inode) is either a directory or file. The dir type has a parent pointer which must be another dir, but its children are just inodes.

I used pointers here, because we’re going to allocate each inode, so just modeling everything with pointers seemed simpler. From my experience with Hare, when you’re laying out the data structure you’ll use, it’s often useful to write the function that frees any allocated space for those data structures. Here’s how we free our file system:

fn freeinode(e: inode) void = {
    match (e) {
    case let f: *file =>
        free(f.name);
        free(f);
    case let d: *dir =>
        for (let child .. d.children)
            freeinode(child);
        free(d.children);
        free(d.name);
        free(d);
    };
};

Now we have a data structure for our file system. Let’s parse some input! Usually, I would take the puzzle input and rewrite it to something that I could just embed directly in the program, to sidestep any input parsing completely. This lets me focus on the algorithm that we need to implement and not spend any time on parsing lines and bytes from stdin. But for this one, it felt like an obvious line-oriented parser that should be easy to implement.

We can use bufio::read_line with os::stdin to get lines from stdin and parse them one by one. I started with this skeleton:

let root: *dir = alloc(dir {
    name = strings::dup("/"),
    parent = void,
    children = [],
});
defer freeinode(root);

let cwd = root;

for (true) match (bufio::read_line(os::stdin)) {
case let buf: []u8 =>
    defer free(buf);
    // Parse the line here
case io::EOF =>
    break;
case let err: io::error =>
    fmt::fatalf("error: {}", io::strerror(err));
};

First, we create a root dir and defer the call to freeinode(root). This ensures that we will free the entire file system tree when our program exits. We’ll keep track of our current directory with cwd, and then consume stdin one line at a time.

To parse the buffer, there are two main cases to consider: A command line, or output line. If the line starts with ‘$’, it’s a command, either “cd” or “ls”. Here’s how command lines are handled:

let cmd = strings::fromutf8(buf[2..4])!;
switch (cmd) {
case "cd" =>
    let to = strings::fromutf8(buf[5..])!;
    if (to == "/" && cwd == root) {
        void;
    } else if (to == "..") {
        assert(cwd != root);
        cwd = cwd.parent as *dir;
    } else {
        let subfolder = match (finddir(cwd, to)) {
        case let d: *dir=> yield d;
        case void =>
            let subfolder = alloc(dir {
                name = strings::dup(to),
                parent = cwd,
                children = [],
            });
            append(cwd.children, subfolder);
            yield subfolder;
        };

        cwd = subfolder;
    };
case "ls" => void;
};

A few obversations to make here: strings::fromutf8 is really useful here - it allows us to construct strings from the buffer without allocation. Coupled with Hare’s slicing syntax, it’s really easy to make strings and switch on them instead of mucking around with bytes arrays. Both “cd” and “ls” commands are two characters so we’ll just focus on those two.

You’ll note that nothing is done for “ls”. We need to to parse its output which appears on subsequent lines. “cd” command need a destination (to) that we can fish out as another str. We do a little special-casing on the root folder, then check if the destination is “..” which just directs us to make the parent the current directory.

The finddir() function is a little helper function to figure out if a directory we’re to cd into already exists. If it does, we just use it, otherwise we allocate a new dir. Note the call to strings::dup to get an owned copy of the folder name.

fn finddir(d: *dir, name: str) (void | *dir) = {
    for (let child .. d.children) {
         match (child) {
         case let d0: *dir =>
             if (d0.name == name)
                return d0;
         case => void;
         };
    };
};

finddir() also makes use of tagged unions to model an optional value.

Now we have handled command lines, all that remains is to handle output from “ls”. Output lines either start with “dir” followed by a space and then some name, or they start with a number, followed by a space and then some name again. In either case, we can split on a single space with strings::cut and inspect the parts.

Again, we’ll use strings::fromutf8 to get a str to work with and go from there:

let line = strings::fromutf8(buf)!;
let (what, name) = strings::cut(line, " ");

if (what == "dir") {
    if (finddir(cwd, name) is void) {
        let subfolder = alloc(dir {
            name = strings::dup(name),
            parent = cwd,
            children = [],
        });
        append(cwd.children, subfolder);
    };
} else {
    let sz = strconv::stoz(what)!;
    let file = alloc(file {
        name = strings::dup(name),
        sz = sz,
    });

    append(cwd.children, file);
};

If the output line is a dir line, we create the dir entry only if it’s not already known. The finddir() function comes in handy here, as does the “is” operator that lets us check if the tagged union returned from finddir() matches a specific type. We could have used a match statement here, but we’re only interested in the “is void” case, and we get to showcase another feature of working with Hare’s type system.

Lastly, we get to use yet another stdlib function, strconv::stoz, to convert a str to a number, so we can get the file size.

Now we have all the bits parsed and can begin working with the data structure we’ve allocated, and actually solve the puzzle. But that’s for another post. Or do it yourself. The point of this post is to underline just how well Hare’s language features and stdlib work together to make line-oriented byte stream parsing simple and pleasant. You’ve seen memory management, string handling, slices, tagged unions, match statements, and no fewer than six stdlib modules (strings, os, io, bufio, strconv, and fmt) come into play. In particular, the string handling deserves another callout: being able to construct a str whose contents are borrowed from some provided buffer is genius. You get to leverage the language level support for strings without incurring the cost of allocation, and when you do need to own the str, just dup() it.

Writing this program really felt like I was just composing bits that were already present and did precisely what I wanted them to. I believe this is exactly how a good language should feel. In summary, I really enjoyed this little puzzle - mostly the parsing it turned out, but in any case, fun was had.

The full solution is here.