Zig Patterns: Iterators


Imagine that you are writing a piece of code that parses the header of some kind of message, and the header can contain any number of key-value pairs. How would your API look for returning these pairs to the caller?

When I was writing an HTTP request parser in Zig, my first instinct was to reach for a hash map. Although this works, I believe that there is an alternative approach that fits the generic case a more neatly. The problems that I have with using a hash map are:

  • A hash map requires memory allocations. Whether we use a managed or unmanaged appracoach, we’ll still ask the caller to pass us an allocator at some point, as the hash map requires dynamic growth
  • Apart from us writing a doc comment to explain, the caller can’t know what the lifetimes of the keys and values in our hash map are
  • What if the entire header is not passed to us at once, but instead streamed to us in chunks? Note: Take a look at the source code for std.http.HeadParser if you’re interested in how the std lib solves this
  • What if the caller is only interested in a subset of the headers in the request? We always parse all headers, whether they are needed or not

Iterators

An approach that has become a bit of a Zig idiom is the “iterator”. An iterator typically has a next() method that you can repeatedly call to iterate over some set of values. You can see this pattern used in the standard library, std.mem.SplitIterator for example.

Here’s what that theoretical message head parser might look like if we took the iterator approach:

const Header = struct {
    key: []const u8,
    value: []const u8,
};

const HeadIterator = struct {
    pos: usize = 0,
    src: []const u8,

    pub fn next(self: *HeadIterator) ?Header {
        while (self.pos < self.src.len) {
            // logic for locating key-value pairs...
            return .{ .key = key, .value = value };
        }
        return null;
    }
};

// ...

var iter: HeadIterator = .{ .src = src };
while (iter.next()) |kvp| {
    // do something with kvp...
}

const std = @import("std");

The code ends up being flexible enough to let the caller do what they want with it, and pretty self explanitory to boot:

  • It is clear that no allocations are taking place, because we never ask the caller for an allocator
  • Any confusion about lifetimes is gone, since there is no allocation happening
  • The caller is in control of what they do with the headers as they are returned them
  • The implementation of extracting the headers is still abstracted away from the caller
  • The caller is not obligated to iterate over all the headers, we only do the work that they ask for

Of course, whether or not you choose this approach depends on what you are writing, but for a generic use case iterators are great because of the lack of assumptions they make about how the caller wants to use them.