4th Wall Programming – A Cursory Look at Compile-Time Metaprogramming and Rust Declarative Macros

It is not rare for a class to skip parts of the textbook it sets out to cover. Often, especially in my primary education, I eagerly awaited that one chapter about World War II tucked away in the back of my world history book, or that penultimate chapter in my math book that covered matrix math (not for any good reason, mind you; likely because I had played Medal of Honor and watched The Matrix the night before). Even with these naive, pure-hearted intentions on my part, the teacher is under time constraints (and has movie days to fit in), so he or she has to send us off without such knowledge, with the peace of mind that, at the very least, we were taught the Magna Carta and long division.

Granted, some classes have such a large scope that it is nothing short of understandable when some part of the syllabus gets the chopping block. Such was the case later on in academia for me as well, when I took an introductory community college course on programming in C. I was taught all the basics of the procedural programming language: functions, structs, pointers, types, standard library interactions. It was not until I took higher-level programming courses at university that I finally realized that course had never taught me about macros.

It was at this moment that I was exposed to macros and the concept of metaprogramming for the first time. In the beginning, I couldn’t quite see the value in macros. Nothing about them seemed to contribute directly to the solution of my coding assignments. This is a natural reaction for most novice programmers, because they are often objective-oriented, seeking to solve a problem to get a good letter grade on their assignment. They can likely accept the value of abstractions like classes and OOP, and even make use of them to some extent, but metaprogramming may be just one too many layers of abstraction for the dude looking to get through his programming assignment. The code does not describe the solution; it is code that describes the code that describes the solution. This is metaprogramming, and it is for these reasons that it is often overlooked by syllabi and students alike.

But what is the value in all of this? Why the concern for code that produces more code, not an outright solution? Why add such obtuse layers of abstraction to writing code? Look no further than these important metaprograms that some readers might use on a daily basis:

  • Compilers (GCC)
  • Interpreters (Python, NodeJS)
  • Assemblers
  • Domain Specific Languages

Many of these programs are important on numerous levels, and have stood the test of time. Without metaprograms, we would not be able to turn our glorified high-level language text files into the machine code needed to tell the machine what to do. These are the very tools that make most of our work a reality in the field of computer science.

This technique itself not only produces these important tools, but also reduces code size and complexity of everyday projects and libraries, as well as allowing users to define languages and syntax extensions within code for freedom and succinct solution expression.

This blog post will concern some macro concepts in computer languages and metaprogramming. It will showcase Rust, a compiled, typed, memory safe systems programming language that has an amazing macro system that really changes the game compared to C macros. The reader might need some basic knowledge of Rust in order to parse through some of the finer points of the post.

A Look at C Macros

So, let’s get this out of the way. The macro. What is it, and how does it contribute to metaprogramming?

Macros, generally speaking, are a programming tool that maps one input to another, usually larger, output. Macros are important in the space of compile-time metaprogramming for this reason. They allow programmers to define tokens that translate to code that is too verbose or too menial for a developer to want to type out. Oftentimes this expansion is programmable in and of itself: macros can expand conditionally and accept parameters.

C implements this by allowing programmers to communicate with the compiler’s preprocessor through preprocessor directives. These begin with a pound sign, followed by the name of the directive. The #define directive is the means by which macros are defined in C. The following defines a collection of tokens that, when used together, compose a “Hello, world!” program that is not valid C as written, but becomes valid C once the preprocessor expands each token into the code we told it to.

#include <stdio.h>

#define MAIN int main()
#define BEGIN {
#define END }
#define PRINT_TO_SCREEN(string) printf("%s", string);

MAIN
    BEGIN
        PRINT_TO_SCREEN("Hello, world!\n")
    END
C

We can see the results of the macro expansion here by passing a flag to the C compiler with gcc -E macros.c

// ...
# 8 "macros.c"
int main()
    {
        printf("%s", "Hello, world!\n");
    }
C

We operate on a different level of abstraction here compared to a literal “Hello, world!” program in C. We express in our own way, not strictly in C, how to write these words to the screen, and we tell the preprocessor how to expand that into code the compiler can understand. Rather than write explicit C code, we pass the task off to GCC, allowing it to fill in the C code on our behalf. Consider this an introduction to metaprogramming in C.

These macro definitions can expand to very large outputs of C code. They can be parameterized like templates and conditionally expanded in response to certain build flags, to the point where the mission is not really to directly code your application, but to code the compiler to code your project for you! Take this common example of a conditionally expanded macro for debug statements in a C project.

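#include <stdio.h>

#ifdef DEBUG
/* Forward all arguments to fprintf; stderr is a common choice for debug output */
#define debug(...) fprintf(stderr, __VA_ARGS__)
#else
#define debug(...) /* Expand to nothing */
#endif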
C

It is at this point where your communication with the compiler takes on a life of its own that is not a direct “give me correct code to compile and I will translate it into machine code for you.” The preprocessor begins to not only replace your tokens, but conditionally expand in the presence of certain variables and instructions. It is here where we get to the heart of metaprogramming.

Downsides of C Macros

C macros have a nasty reputation for making code not only hard to read, but harder to debug as well. One of the bigger gripes with C macros is that they are not “hygienic”. A hygienic macro is one whose expansion cannot accidentally clash with identifiers in the scope where it is expanded; C macros make no such guarantee. Observe the following, obtained from the Wikipedia page for macro hygiene:

#include <stdio.h>

// This macro declares its own local variable named a
#define INCI(i) do { int a=0; ++i; } while (0)

int main(void)
{
    // We also initialize a here
    int a = 4, b = 8;
    INCI(a);
    INCI(b);
    
    // A clash of scope happens, and we fail to increment a, but
    // increment b
    printf("a is now %d, b is now %d\n", a, b);
    return 0;
}
C

The reality is that C does not differentiate between the scope of the macro and the scope where it is expanded, resulting in pernicious bugs that plague developers.

Diving Deeper into the Compiler and the Future with Rust Declarative Macros

C macros are a great and necessary invention, but they have their limitations, like lack of hygiene. Although you could avoid these mistakes by just handling macros responsibly, a critical fact remains: C macros can only be evaluated to C code at the end of the day, and there is little confidence that any bit of C code is completely memory safe.

Enter Rust, a language started by Graydon Hoare and later sponsored by Mozilla, which addresses not only the memory safety issues inherent to C but also data races in concurrent code, all while delivering bare-metal performance. Despite its youth compared to C, Rust has proven itself to be a powerful systems programming language. Clive Thompson of the MIT Technology Review writes that Rust has found its way into not only the Mozilla Firefox browser, but also powerful, performant applications such as Firecracker, which powers the serverless execution of AWS Lambda through virtualization, and Dropbox’s (rewritten!) sync engine for distributed sharing of content. Cloud platforms switching over to Rust solutions reportedly use half as much electricity as comparable Java-based solutions. Where performance matters in high-load applications and resources need granular management, Rust stands ready, but also willing to reprimand and correct you for excreting unsafe code into mission-critical applications.

The importance of Rust cannot be overstated, but this blog post is not about Rust; it is about macros. Rust is much more than just the safe C/C++ alternative: it builds upon C in many ways, one of which is an expanded macro system. Not only are these macros extremely powerful and expressive, they are hygienic as well.
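
To make the hygiene claim concrete, here is a minimal sketch of the earlier INCI macro translated into a Rust declarative macro. The names inci and bump_both are my own, made up for illustration. The macro declares its own local a, just like the C version did, yet the caller's a should still be incremented, because the compiler keeps identifiers introduced inside a macro separate from identifiers passed in from the call site.

```rust
// Hypothetical Rust counterpart of the C INCI macro.
macro_rules! inci {
    ($i:ident) => {{
        // This `a` belongs to the macro expansion. Hygiene keeps it
        // distinct from any `a` that exists at the call site.
        let a = 0;
        let _ = a; // silence the unused-variable warning
        // `$i` is the identifier the caller passed in.
        $i += 1;
    }};
}

// Mirrors the C example: increment a variable named `a` and one named `b`.
fn bump_both(mut a: i32, mut b: i32) -> (i32, i32) {
    inci!(a);
    inci!(b);
    (a, b)
}

fn main() {
    let (a, b) = bump_both(4, 8);
    println!("a is now {}, b is now {}", a, b);
}
```

Where the C version silently left a untouched, both variables are incremented here, with no renaming tricks needed inside the macro.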

Rust Declarative Macros

For those who are unaware, Rust macros come in two forms: declarative and procedural. For the sake of brevity, I will be discussing only the declarative variety in this blog post.

C macros generally do not play an extremely pivotal role in any exposed library API; one might never have to directly use a macro from a library compared to functions, structs, or other library constituents. Compare this with Rust, which unveils a slew of different declarative macros not only in its standard library, but also in its prelude (default imports used by a Rust program). Even the basic “Hello, world!” program for Rust uses a macro, not a function, to print those timeless words to screen.

fn main() {
    println!("Hello, world!");
} 
Rust

Yes, macro usage starts with a beginner “Hello, world!” program in Rust! The exclamation mark indicates that println is being invoked as a macro. We can expand these macros just like we did with C to reveal the larger output they map to. Compiling with a nightly version of Rust with rustc -Zunpretty=expanded main.rs reveals the expanded form:

#![feature(prelude_import)]
#[prelude_import]
use ::std::prelude::rust_2015::*;
#[macro_use]
extern crate std;
fn main() { 
    { ::std::io::_print(format_args!("Hello, world!\n")); }; 
}
Rust

Beginner Rustaceans might have run across other macros as well when coding simple Rust programs. All of these macros below evaluate to basic functionality, and all of it is exposed to people learning the language early on because of how prominent they are throughout Rust standard library and prelude. Building vectors, printing debug information, assert statements, and killing the process in response to irrecoverable errors are all facilitated through declarative macros! Observe the macros below, as well as their expanded forms.

fn main() {
    // Building a dynamically allocated array with vec!
    let my_vec = vec!(1,2,3,4,5);
    
    // Printing verbose debug information, anything implementing Debug 
    // trait can be printed to screen.
    // This would print file, line number, variable, contents of array...
    // [src/main.rs:7] my_vec = [1,2,...]
    dbg!(my_vec);
    
    // Assert statements in Rust use this macro. Like any other assert,
    // the program will stop if the condition is not true.
    assert_eq!(2 + 2, 4);
    
    // Fatal errors that kill in Rust can be triggered by panic! macro.
    // Sometimes, you just need to stop the program due to irrecoverable
    // error. Out of bounds access of arrays, failed asserts, etc.
    panic!("Crash and burn");
}
Rust
#![feature(prelude_import)]
#![no_std]
#[prelude_import]
use ::std::prelude::rust_2015::*;
#[macro_use]
extern crate std;
fn main() {
    // vec!
    let my_vec =
        <[_]>::into_vec(#[rustc_box] ::alloc::boxed::Box::new([1, 2, 3, 4,
                        5]));
    // dbg!
    match my_vec {
        tmp => {
            {
                ::std::io::_eprint(format_args!("[{0}:{1}] {2} = {3:#?}\n",
                        "main.rs", 9u32, "my_vec", &tmp));
            };
            tmp
        }
    };
    
    // assert_eq!
    match (&(2 + 2), &4) {
        (left_val, right_val) => {
            if !(*left_val == *right_val) {
                    let kind = ::core::panicking::AssertKind::Eq;
                    ::core::panicking::assert_failed(kind, &*left_val,
                        &*right_val, ::core::option::Option::None);
                }
        }
    };
    
    // panic!
    { ::std::rt::begin_panic("Crash and burn"); };
}
Rust

As you can see, macros do their job here and make code short and sweet, mapping a shorter input to a larger output. This is not to say that macros are the only way to print to screen or create dynamically sized arrays with values, but they are certainly a terse way of doing so compared to normal means.

Now, had I not told you outright what a declarative Rust macro looks like, along with examples, you could easily mistake one for a function. For the most part, the macros available in Rust’s std/prelude do look very similar to functions! Even so, if one continues to explore the Rust macro-space, learning their inner workings and investigating non-std library macros, that thought is promptly thrown out the window. Consider the following: Why can Rust macros be invoked with brackets and curly braces? How is vec! able to accept an arbitrary number of arguments? How can the “arguments” passed to an invocation break every known rule of Rust syntax?

fn main() {
    // Macros can be invoked by passing "arguments" 
    // to them inside [], {}, as well as (), 
    // while normal Rust functions only use ()
    // These invocations are all correct!
    println!{"Using curly braces..."};
    println!["Using brackets..."];
    println!("and finally using parentheses!");
    
    // Some macros, like vec!, can accept arbitrary 
    // number of arguments. Rust typically does not support 
    // this type of behavior - variadic functions
    let empty_vec = vec!();
    let vec_with_nums = vec!(1,2,3,4,5,6,7,8,9,10);
    
    // In addition, some macros seem to outright 
    // violate the syntax of function invocation.
    // Arguments passed to functions in Rust and most other langs
    // are delimited with comma, but here we find semicolon?!
    // This macro creates a vector of a hundred zero values
    // of unsigned 32 bit integer type (u32).
    // This is actually doable with simple arrays.
    let hundred_zeros = vec!{0u32 ; 100};
    
    // This is a macro from diesel, an external crate or package on
    // crates.io. An ORM for Rust. Note that we almost seem to be passing a   
    // created struct here, but then there are arrows. These seem out of
    // place once you consider they should only be used in function 
    // signatures to denote return value, but here they are outside
    // of that.
    // What is going on with this!?
    // Source: 
    // https://docs.rs/diesel/latest/diesel/prelud/macro.table.html
    diesel::table! {
        users {
            id -> Integer,
            name -> VarChar,
            favorite_color -> Nullable<VarChar>,
        }
    }
}
Rust

It becomes fairly obvious now that Rust macros, while they might look like functions, behave nothing like them. The vec! and table! macros here even reveal that their contents seem to sit syntactically outside of Rust itself.

How Declarative Macros Work

As previously stated, C macros are a product of the preprocessor stage of compilation. Tokens communicated by the #define directive are expanded lexically. Because of this, macros in a sense never touch the compiler, since the preprocessor will expand any macro to its appropriate C code beforehand, hence “preprocessor”.

Rust, on the other hand, has no separate preprocessor stage, so its macros are expanded later, during compilation itself. In fact, a macro invocation is turned into tokens along with the rest of the code while the compiler parses the source into an abstract syntax tree, or AST. Now, this seems strange given what we just witnessed concerning a macro’s ability to violate Rust syntax by using semicolons and “->” in unusual places. Certainly the word syntax must mean something in an abstract syntax tree, right?

The reality is that the Rust compiler “turns a blind eye” for a time when it sees a macro invocation. It parses the invocation, along with whatever arbitrary tokens it finds inside the delimiters, into a subtree of the AST, but puts off trying to understand any of it. This is why the insides of these macros can have completely different syntax; the compiler understands them to be syntax extensions. The “arguments” we have been passing to macros are actually just arbitrary collections of tokens. Parentheses, curly braces, and brackets might serve different purposes throughout Rust, but in a macro invocation they all simply enclose a collection of tokens (as long as the delimiters are balanced), so in a sense, we are passing collections of tokens.
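
A small sketch can make the “collection of tokens” idea tangible. The macro below, count_tts, is a hypothetical helper of my own and not something from the standard library. It uses the :tt capture, which matches a single token tree: either one token, or an entire delimited group. The macro never tries to understand the tokens; it only counts them.

```rust
// Hypothetical helper: counts the top-level token trees it is handed.
// A token tree is a single token, or a whole delimited group.
macro_rules! count_tts {
    // No tokens left: the count is zero.
    () => { 0usize };
    // Peel off one token tree and recurse on the remainder.
    ($head:tt $($tail:tt)*) => { 1usize + count_tts!($($tail)*) };
}

fn main() {
    // None of this is valid Rust on its own, but it is a valid
    // collection of tokens: `id`, `->`, `Integer`, `;` and `100`.
    println!("{}", count_tts!(id -> Integer ; 100));

    // A delimited group counts as one token tree, whatever it contains.
    println!("{}", count_tts![(a b c) [d e] {f}]);

    // The delimiters of the invocation itself are interchangeable.
    println!("{}", count_tts! { 0u32 ; 100 });
}
```

The first invocation sees five token trees, the second three (one per group), and the third three again, regardless of which delimiters wrap the invocation.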

Even though the code might be a syntax extension that does not follow the rules of Rust syntax, all of the tokens passed to the macro must be mapped, at some point, to applicable Rust code. Expansion to actual Rust code inside the AST is done via a pattern matching system that is part of the macro’s definition. This is not unlike the pattern matching you would find in C# or Rust match syntax, which acts as a smart if/else or switch statement, except this pattern matching returns Rust code based on token input. The compiler, after building out the AST, will take the tokens at the macro invocation and attempt to match them against the patterns in the macro definition, trying each in order. A macro can define any number of patterns, but the invocation must match at least one of them; otherwise, the program will not compile!


let possibly_none = Some(123);

// Some pattern matching against Some/None values
// This is a powerful tool that makes switch statements
// rather obsolete. Pattern => { arm to execute }
let ret = match possibly_none {
    Some(num) => { num + num },
    None => { 0 }
};

// Declarative macros are defined in a similar way
// Patterns are arbitrary tokens passed to macros
// Compiler expands tokens within AST to equivalent Rust code
// not exactly flow control like a switch statement,
// but the motivation is similar
macro_rules! macro_name {
    ( /*Tokens*/ ) => { /*Rust code to evaluate to*/ };
    ( /*Another pattern of tokens */) => { /*More code to match*/ };
    /*...*/
}
Rust
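
For a concrete, compilable instance of that skeleton, here is a minimal sketch with three arms. The macro name describe and its patterns are invented purely for illustration. Each arm pairs a token pattern with the Rust code it expands to, and the compiler picks the first arm whose pattern matches the invocation.

```rust
// Hypothetical macro with three arms, tried in order from top to bottom.
macro_rules! describe {
    // No tokens at all.
    () => {
        String::from("nothing")
    };
    // Exactly one expression.
    ($x:expr) => {
        format!("one value: {}", $x)
    };
    // Two expressions separated by the `=>` token.
    ($x:expr => $y:expr) => {
        format!("{} maps to {}", $x, $y)
    };
}

fn main() {
    println!("{}", describe!());
    println!("{}", describe!(1 + 1));
    println!("{}", describe!("a" => 3));

    // describe!(1, 2) would be rejected at compile time:
    // no arm accepts two expressions separated by a comma.
}
```

Unlike a match on values, none of this selection happens at run time. By the time the program runs, each invocation has already been replaced by the body of the arm it matched.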

macro_rules! is the mechanism by which declarative macros are defined. All you need to supply to define a macro is the name you want, along with patterns of tokens and the corresponding code to expand to. It’s through this pattern matching that we witness the multi-faceted nature of vec!, which works with a variety of different patterns of tokens passed to it. The source code below is from the Rust standard library, with comments above each pattern added by me for clarity:

macro_rules! vec {
    // The empty pattern creating an empty vector - vec!()
    () => (
        $crate::__rust_force_expr!($crate::vec::Vec::new())
    );
    
    // The ( value_to_fill_with ; size_of_vector ) pattern - vec!(0 ; 10)
    ($elem:expr; $n:expr) => (
        $crate::__rust_force_expr!($crate::vec::from_elem($elem, $n))
    );
    
    // The arbitrary number of values pattern - vec!(1,2,3,4,5)
    ($($x:expr),+ $(,)?) => (
        $crate::__rust_force_expr!(<[_]>::into_vec(
            // This rustc_box is not required, but it produces a dramatic improvement in compile
            // time when constructing arrays with many elements.
            #[rustc_box]
            $crate::boxed::Box::new([$($x),+])
        ))
    );
}
Rust

We have three patterns that characterize how we have used vec! thus far. The empty pattern may make enough sense, but token pattern syntax used for the non-empty cases is something some might not fully understand. That’s because, like macros themselves, macro_rules! is considered a syntax extension of Rust, so its patterns exist outside of the language in a sense.

($elem:expr; $n:expr) matches two Rust expressions separated by a semicolon. The dollar signs denote captures: each one captures tokens in the form of a Rust expression, as denoted by the :expr tag, and binds them to a name. These captures are then used in the expanded code itself. You will also notice that you can collect an arbitrary number of captures via the regex-like symbols “*”, “?”, and “+”. A capture only checks that the tokens form a valid expression; the expanded code must still type check. So although strings and floats are valid expressions that satisfy an :expr capture, compilation will fail if they end up somewhere they do not belong in the vector creation, such as a string passed as the length in vec!(0; "ten").
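
Here is a minimal sketch of captures and repetition working together, borrowing the ($($x:expr),+ $(,)?) pattern from vec!. The macro name sum_all is hypothetical. The pattern captures one or more comma-separated expressions with an optional trailing comma, and the expansion repeats “+ $x” once per captured expression.

```rust
// Hypothetical macro: adds up one or more comma-separated expressions.
macro_rules! sum_all {
    // `$(...),+` means one or more repetitions separated by commas.
    // `$(,)?` allows an optional trailing comma.
    ($($x:expr),+ $(,)?) => {
        // Expands to `0 + first + second + ...`
        0 $(+ $x)+
    };
}

fn main() {
    println!("{}", sum_all!(1, 2, 3));
    // Each capture stays a single expression, so `2 * 3` is not torn apart.
    println!("{}", sum_all!(2 * 3, 4,));

    // sum_all!() would not compile: `+` demands at least one expression.
    // sum_all!("a", 1) matches the pattern, since both are expressions,
    // but the expansion `0 + "a" + 1` then fails to type check.
}
```

The two commented-out invocations fail for different reasons: the first never matches a pattern, while the second matches fine and only breaks once the expanded code is type checked.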

You can read more about all the possible things you can capture in The Little Book of Rust Macros, available to read online. They are a critical part of learning to make declarative macros.

Building a Simple Rust Macro with map!()

For those that have only just started experimenting with Rust, you might have found the vec! macro to be quite helpful, especially if you are coming off of a dynamically typed scripting language with lists like Python. Creating basic data structures, like lists and hash maps has always been fairly easy in such languages. The vec! macro is the closest thing we have to defining simple linear collections of elements like a Python list in one line of declarative code.

fn main() {
    // Creation of vector without macro.
    // Either create then push elements...
    let mut v1: Vec<u32> = Vec::new();
    
    for _ in 0..100 {
        v1.push(0);
    }
    
    // Or just use the macro, a single line
    let v2 = vec!{0u32 ; 100};
    
    // PS: JK , Rust has
    // an amazing iterator API, which, together
    // with drain, collect, from_fn, etc
    // can create a one liner, but now we tread into
    // Rust's functional side, and some more mature concepts
    let v3: Vec<u32> = std::iter::from_fn(|| Some(0)).take(100).collect();
    
}
Rust

Some might begin researching whether or not something similar exists for hash maps, so that one could express a Rust collections HashMap quickly, like a Python dictionary. Unfortunately, the standard library has no such macro that expedites hash map creation! So one is left to be either quite verbose and imperative, or still quite succinct depending on their knowledge of functional Rust.

use std::collections::HashMap;

fn main() {
   // Being explicit, heap allocated strings for keys, 
   // signed 32-bit values
   let mut hash1: HashMap<String, i32> = HashMap::new();
   
   // Inserting entries manually
   hash1.insert("a".to_string(), 0);
   hash1.insert("b".to_string(), 1);
   hash1.insert("c".to_string(), 2);
   
   // No macro with syntax extension to allow us to be declarative
   // map1 = {"abc": 0, "def": 1, "ghi": 2} or similar
   
   // Alternatively, collect() works with tuples of kv pairs, 
   // learn some more functional Rust, some Range stuff, iterators
   // Below line is still imperative, but is less code than above
   let hash2: HashMap<String, i32> = ('a'..='z')
       .map(|c| c.to_string())
       .zip(0..)
       .take(3)
       .collect();
}
Rust

There is no declarative way to get a hash map like you can get a vector with the vec! macro. But that does not mean you cannot create that declarative means yourself. We can easily define one using a declarative macro. Let’s define a macro called map! that will help us achieve this.

A good place to start is by imagining the macro and syntax extension we want in the final product, some positive visualization:

// This code does not constitute anything
// that compiles yet!

// For starters, like calling vec!(), we want the result to be an empty
// data structure
let my_map: HashMap<Key,Value> = map!();

// To build on with existing key-value pairs, we want to be declarative
// here with the syntax, but allow for distinction between what is a key,
// what is value, and how these things map to one another, how about this?
let map_with_one_elem = map!{ 
    i32: String
    where
        1 => "abc".to_string()
};


// Much like vec!, we also want to accept arbitrary number of elems, or
// key-value pairs, delimited by comma
let map_with_many_elems = map!{
    String: i32
    where
        "abc".to_string() => 1,
        "def".to_string() => 2,
        "ghi".to_string() => 3,
        //...
};
Rust

Now that we have this image of what we want out of this macro, it might be easier to find the tokens to do this with. The naive case of creating an empty hash map is certainly the easiest, so let’s get that empty case mapped out. It should simply return an empty HashMap. Easy enough.

macro_rules! map {
    (  ) => { HashMap::new() };
}

// Empty case: map!()
Rust

Now, when we invoke this macro, even in the empty case, please note that the type must be specified. Rust is a statically typed language, but it can infer the types of values in many cases. In the cases where it has nothing to go on concerning a type, the compiler will complain. So the lone statement let m = map!(); (with nothing later in the code to pin the types down) will result in failure, but let m: HashMap<&str, u8> = map!(); will avoid such problems.

We could actually move the typing into the macro itself as tokens with the capture tag :ty, which captures tokens that form a Rust type. Consider the following pattern.

macro_rules! map {
    (  ) => { HashMap::new() };
    ( $kt:ty : $vt:ty ) => { HashMap::<$kt, $vt>::new() };
}

// Empty, Typed Case: map!(String : u32), i.e. key type => value type, String keys to unsigned 32 bit values
Rust

The above allows you to state the types of that empty map explicitly. It accepts two types, each captured with :ty, separated by “:” as shown. We can invoke the macro with two given types, like map!(String : i32), and create an empty map of type HashMap<String, i32>. The pattern itself is still pretty straightforward here.
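
As a quick sanity check, the sketch below exercises both empty forms side by side. The two wrapper functions, typed_empty and annotated_empty, are hypothetical names added only so that each form has a clear home; the macro itself is the two-arm version from above.

```rust
use std::collections::HashMap;

macro_rules! map {
    () => { HashMap::new() };
    ( $kt:ty : $vt:ty ) => { HashMap::<$kt, $vt>::new() };
}

// The types travel inside the macro invocation itself.
fn typed_empty() -> HashMap<String, i32> {
    map!(String : i32)
}

// The untyped arm relies on an annotation at the call site.
fn annotated_empty() -> HashMap<&'static str, u8> {
    let m: HashMap<&str, u8> = map!();
    m
}

fn main() {
    println!("{}", typed_empty().len());
    println!("{}", annotated_empty().len());
}
```

Note that the expansion mentions HashMap by its bare name, so this sketch assumes HashMap has been imported wherever the macro is invoked.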

Now for the next case: actually handling key-value pairs passed to the macro. There could very well be an arbitrary number of these, so we should investigate how vec! and its ilk accept an arbitrary number of repeating tokens. This behavior is declared in the macro definition with the regex-like symbols “?”, “*”, and “+”. Respectively, they match zero or one, zero or more, or one or more repetitions of a given group of tokens. We can then separate the type mapping from the expression mapping with the where keyword, which is one of the tokens allowed to follow a :ty capture, and then follow up with a repetition of expression pairs separated by “=>”. I would like to think this arrow syntax explains the idea that “this key” maps to “this value” well, but other tokens are allowed to come after expressions as well (namely “,” and “;”).

Capturing the values we need with the right pattern is simple, but crafting the optimal Rust code for this might throw some off. I chose to go with a functional approach to build out the hash map here.

macro_rules! map {
    (  ) => { HashMap::new() };
    ( $kt:ty : $vt:ty where $($key:expr => $val:expr),* $(,)? ) => {
        
        // Here I take all the captured key expressions and expand
        // them into an array, zip its iterator with an array of the
        // values, and collect the resulting (key, val) tuples
        // into a hash map. $(,)? permits an optional trailing comma.
        [$( $key ),* ]
            .into_iter()
            .zip([$( $val ),* ])
            .collect::<HashMap<$kt, $vt>>()
    };
    ( $kt:ty : $vt:ty ) => { HashMap::<$kt, $vt>::new() };
}

// Declarative hash map macro: 
map!{
    char : u32
        where
        'a' => 0,
        'b' => 1,
        'c' => 2
};

map!{
    u64 : u64
        where
        100 => 1
};
Rust

The final result is something that we did not have before, and is considered a syntax extension by the Rust compiler. A red-headed stepchild of vec!() if you will. We have technically created our own small declarative syntax for a HashMap, which is later expanded into some functional Rust that places the keys and values into two arrays, zips them together, then collect()s the pairs into the HashMap. Not the cheapest way to create a hash map in Rust, but…

What’s that? What is maplit? You mean there is a Rust crate that already has a similar macro? As well as similar macros for sets, ordered maps? Oh… that’s neat. Cheers for reinventing the wheel I guess. Their implementation is called hashmap!(), and is worth a peek. Their syntax is a bit different than mine, but expands to far more optimal Rust code. They configure necessary capacity ahead of time to avoid resizing of the map on larger inputs, as well as avoid intermediate arrays, which is something my code definitely does not do.
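
To give a feel for what a preallocating version might look like, here is a rough sketch of my own. To be clear, this is not maplit's actual source; the macro names prealloc_map and count_exprs are hypothetical. The idea is to count the key expressions at compile time, reserve that much capacity up front, and then insert each pair directly, with no intermediate arrays.

```rust
use std::collections::HashMap;

// Hypothetical helper: counts comma-separated expressions without
// evaluating them, expanding to `1 + 1 + ... + 0`.
macro_rules! count_exprs {
    () => { 0usize };
    ($head:expr $(, $tail:expr)*) => { 1usize + count_exprs!($($tail),*) };
}

// Hypothetical macro: builds a HashMap with capacity reserved up front.
macro_rules! prealloc_map {
    ($($key:expr => $val:expr),* $(,)?) => {{
        let mut m = HashMap::with_capacity(count_exprs!($($key),*));
        // One insert per captured key-value pair.
        $( m.insert($key, $val); )*
        m
    }};
}

fn letters() -> HashMap<char, u32> {
    prealloc_map! {
        'a' => 0,
        'b' => 1,
        'c' => 2,
    }
}

fn main() {
    let m = letters();
    println!("{} entries, capacity at least {}", m.len(), m.capacity());
}
```

This sketch drops the type prefix and the where keyword from my syntax and leans on type inference instead, which keeps the pattern down to a single arm.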

Eliminating Repetition in Code with Declarative Macros

Appeasing the inner Pythonic urge to implement declarative hash maps in Rust is fine, if not a little bit haram, but it doesn’t really do anything notable. We can achieve the same effect by just typing 2 or 3 more lines, or learning some functional Rust. What about a macro that saves us more than a few lines?

Often times when you are programming a Rust library to accomplish something, you will be cooperating with Rust traits. Traits are Rust’s way of logically grouping a collection of functions that certain data can execute (think interfaces). Implementing a trait is the process of defining how that type will execute that collection of functions. A lot of native data types in Rust have a lot of different traits already defined that enable these types to operate in different ways.
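
For readers who have not met traits yet, here is a minimal sketch. The trait Describe, the type Celsius, and the function announce are all invented for illustration. The trait groups a single function, and each impl block defines how one particular type carries it out, including a type native to Rust such as u8.

```rust
// Hypothetical trait: anything that can describe itself in words.
trait Describe {
    fn describe(&self) -> String;
}

// A custom data type...
struct Celsius(f64);

// ...and how that type executes the trait's function.
impl Describe for Celsius {
    fn describe(&self) -> String {
        format!("{} degrees Celsius", self.0)
    }
}

// Traits can also be implemented for types native to Rust.
impl Describe for u8 {
    fn describe(&self) -> String {
        format!("the byte {}", self)
    }
}

// Code can then be written against the trait rather than a concrete type.
fn announce<T: Describe>(item: &T) -> String {
    format!("This is {}.", item.describe())
}

fn main() {
    println!("{}", announce(&Celsius(21.5)));
    println!("{}", announce(&7u8));
}
```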

However, when writing some new data type that you define yourself, you start out with very little in terms of traits. So often times, you will find yourself implementing a lot of different traits for your custom data types. This can get tediously long and boring. Take for example this new data type I have created:

pub enum CustomEnum {
    Null,
    Text(String),
    Digit(u64),
}
Rust

This is just a simple enumeration that can be Null, Text, or Digit. I would like to be able to convert any Rust numeric data type into Digit. However, code like this needs a trait to actually compile, since my custom data type does not have this behavior defined yet.

// ERROR: need to implement From<u32/i32/f64...>
let digit1 = CustomEnum::from(123_u32);
let digit2 = CustomEnum::from(-1_i32);
let digit3 = CustomEnum::from(25.75_f64);
Rust

In fact, if we want to define CustomEnum::Digit according to any numeric data type that could be passed to it, we have 14 different From<_> implementations to write. Coding this by hand could easily take ~80 lines of code.

impl From<u32> for CustomEnum {
    fn from(n: u32) -> CustomEnum {
        CustomEnum::Digit(n as u64)
    }
}

impl From<i32> for CustomEnum {
    fn from(n: i32) -> CustomEnum {
        CustomEnum::Digit(n as u64)
    }
}

// Getting tired yet? 12 more to go!
Rust

Notice that there is a lot of repetition here in this code. Implementing the trait for the 14 numeric data types could be abstracted out into a general process. Oftentimes we use functions to reduce repetition in code, but a function cannot write trait implementations for us. Macros, however, are definitely able to do this!

// Abstract out the repetition with a macro, characterize the process
// as a normal implementation statement that uses arbitrarily captured
// types.
macro_rules! custom_enum_from_rust_numeric_impl {
    ( $( $numerictype:ident )* ) => {
        $(
            impl From<$numerictype> for CustomEnum {
                fn from(n: $numerictype) -> CustomEnum {
                    CustomEnum::Digit(n as u64)
                }
            }
        )*
    };
}

// Invoking the macro across the data types we need implements From
// for all 14 of them in fewer than 20 lines of code, compared to the
// ~80 lines of writing each impl by hand.
custom_enum_from_rust_numeric_impl!(
    u8 i8 u16 i16 u32 i32 u64 i64 u128 i128 usize isize f32 f64
);

This is a huge reduction in code size. Assuming about 6 lines of implementation code for every type, we would have written 84 lines for the 14 types. Using the macro, we write 14 or so. That’s 6 times less code that accomplishes the same task! Let’s walk through this macro definition.

There is a new capture here, :ident, which captures Rust identifiers. For us, these are the names of the numeric data types native to Rust, as listed above. We capture zero or more of these types with $( $numerictype:ident )* and then use them in a templated form of the impl statement. This template is expanded once for every type we pass to the macro, thereby implementing the traits at compile time.
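
To see what the expanded impls actually do with their input, here is a self-contained sketch of the same macro. The Debug and PartialEq derives are an assumption added so results can be compared. It is worth knowing that n as u64 is a plain cast, so not every conversion preserves the value.

```rust
// Sketch only: the derives are added for comparing results.
#[derive(Debug, PartialEq)]
pub enum CustomEnum {
    Null,
    Text(String),
    Digit(u64),
}

macro_rules! custom_enum_from_rust_numeric_impl {
    ( $( $numerictype:ident )* ) => {
        $(
            impl From<$numerictype> for CustomEnum {
                fn from(n: $numerictype) -> CustomEnum {
                    // `as` never fails: negative integers wrap around,
                    // wider integers are truncated, and floats lose
                    // their fractional part.
                    CustomEnum::Digit(n as u64)
                }
            }
        )*
    };
}

// One invocation expands to fourteen separate impl blocks.
custom_enum_from_rust_numeric_impl!(
    u8 i8 u16 i16 u32 i32 u64 i64 u128 i128 usize isize f32 f64
);
```

With these impls in place, CustomEnum::from(123_u32) is Digit(123), CustomEnum::from(-1_i32) is Digit(u64::MAX), and CustomEnum::from(25.75_f64) is Digit(25). If lossy conversions are not acceptable, TryFrom with a checked conversion would be the more careful choice.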

Standing Up a Web Server via webserv!()

Building a web server in Rust is the final project of The Rust Book. It is the culmination of a lot of heavy topics, including concurrency, thread communication, and smart pointers of several varieties. At the end of it, Steve Klabnik and Carol Nichols encourage readers to build on top of the code with better error handling, thread pools that process things other than web requests, etc. I think implementing a macro that stands up the server from the library code, and also defines its routes, would be a good exercise! I will be using this final project code here.

Let’s again imagine the syntax we want to use: the code that produces the Rust code that operates the web server, if you will. What syntax is desirable is a subjective matter among programmers, but at the very least I know the important parameters that go into creating a web server. The things that concern me are the interface to listen on, the thread count for handling requests concurrently, and the routes with the responses given at those routes. There are certainly more parameters that go into a web server, but for simplicity’s sake:

// My goal for a macro that sets up a webserver
webserv!{ 
    interface as "127.0.0.1:7878", thread_count as 100;
    { 
        // (Request) -> (Response (HTML file, status code))
        ("GET / HTTP/1.1") => { ("root.html", 200) },
        ("GET /help HTTP/1.1") => { ("help.html", 200) },
        (_) => { ("404.html", 404) }
        /*...*/
    } 
}

Looks like a reasonable way to describe a web server, right? The interesting thing about this syntax is that the greater part of it is really just a normal Rust pattern match, that part being how I handle requests sent to my web server. The patterns here characterize the request line of a client’s request, and the arm of each pattern is a block of Rust code that returns a filename for the content and the status code of the response. The (_) pattern is the idiomatic Rust way of handling anything the other patterns could not match, which, in our case, is a great way of describing a resource not found on the server, i.e. 404.
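
For comparison, this is roughly the plain Rust match that the route section of the goal syntax stands for. It is a sketch only: the function name route and the u16 status type are assumptions made for illustration.

```rust
// Hypothetical helper: the routes from the goal syntax written by hand.
// Each arm maps a request line to a (filename, status code) tuple.
pub fn route(request_line: &str) -> (&'static str, u16) {
    match request_line {
        "GET / HTTP/1.1" => ("root.html", 200),
        "GET /help HTTP/1.1" => ("help.html", 200),
        // Anything that matched no route is a resource not found.
        _ => ("404.html", 404),
    }
}
```

Calling route("GET /help HTTP/1.1") gives ("help.html", 200), and any unknown request line falls through to ("404.html", 404). The macro's job is to let us write only the arms and have the surrounding code generated.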

Now, we definitely need to use a few captures here. We can use the same expression capture to get both the interface and the thread count here. For the routes we describe with patterns, we need to introduce a few more captures that are available to use in macro_rules!. Let’s begin:

macro_rules! webserv {
    (
        interface as $interface:expr, thread_count as $thread_count:expr ;
            { $( ($req:pat) => $resp:block ),+ }
    ) => { /* Web server code to expand to */ };
}

Here we use two new captures: the pattern capture, :pat, and the block capture, :block. The pattern capture matches Rust patterns, which come in a variety of forms and are a vital part of the language’s pattern matching syntax. These patterns can be string literals, enumeration variants, integer values, etc., and are used as criteria to match something against. Here we want to match the request line of a client’s request to get the right response. The block capture is a bit more general. It describes any bit of Rust code that fits inside two curly braces. In a sense, we are recreating Rust match statements within the macro in a very lazy way. We can only hope that the code within each block returns a tuple of a filename for a file that exists and a real HTTP status code. Also note the repetition, written ),+ here: the comma before the + makes the routes comma-separated, as in the goal syntax, and the + requires at least one route, because what use is a web server that has no routes to serve content?
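
The :pat and :block captures can be tried in isolation before any networking is involved. The sketch below uses a hypothetical macro named route_table and a hypothetical wrapper function lookup, neither of which is part of the web server code. It captures comma-separated routes and expands them into an ordinary match expression.

```rust
// Hypothetical macro: captures an input expression plus one or more
// comma-separated routes, and expands to a match over that input.
macro_rules! route_table {
    ( $input:expr; { $( ($req:pat) => $resp:block ),+ } ) => {
        match $input {
            // One match arm per captured (pattern, block) pair.
            $( $req => $resp, )+
        }
    };
}

// Hypothetical wrapper so the expansion has a concrete return type.
pub fn lookup(request_line: &str) -> (&'static str, u16) {
    route_table!(request_line; {
        ("GET / HTTP/1.1") => { ("root.html", 200) },
        (_) => { ("404.html", 404) }
    })
}
```

Here lookup("GET / HTTP/1.1") evaluates to ("root.html", 200), while any other input lands in the (_) arm. Because each block is pasted in unchanged, the compiler only checks after expansion that every block returns the same tuple type.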

We then use the library code from the final project of The Rust Book to bind the TCP socket to the captured interface and to generate a thread pool with the captured thread count. After this, we alter the handle_connection function to collect the string slice of the request line, and expand the routes we described as patterns and blocks of code that return the tuples we need to construct the response. We match the request, read the HTML file for content, and pass the content and status code along to be written to the socket.
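
The request line slicing can be looked at on its own. This is a sketch with a hypothetical helper named request_line. Unlike the macro body, it returns an Option instead of unwrapping, which is a deliberate deviation for illustration.

```rust
// Hypothetical helper: returns the first line of a raw request,
// including its trailing "\r\n", or None if there is no data or the
// bytes are not valid UTF-8.
pub fn request_line(buffer: &[u8]) -> Option<&str> {
    // split_inclusive keeps the delimiter at the end of each chunk, so
    // the first chunk is everything up to and including the first '\n'.
    let first = buffer.split_inclusive(|byte| *byte == b'\n').next()?;
    std::str::from_utf8(first).ok()
}
```

Given the bytes of "GET /abc HTTP/1.1\r\nHost: localhost\r\n\r\n", this returns Some("GET /abc HTTP/1.1\r\n"). The line terminator stays attached, which is why route patterns matched against this slice need to end in \r\n. If the buffer holds no newline at all, the first chunk is the whole buffer.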

macro_rules! webserv{
    ( 
        interface as $interface:expr, thread_count as $thread_count:expr ; { $( ($req:pat) => $resp:block ),+ }
    ) => {
    
        // Our handle connection function is defined within the macro
        // itself to set up routes for request -> response
        fn handle_connection(mut stream: TcpStream) {
            let mut request_buffer = [0; 1024];
            stream.read(&mut request_buffer).unwrap();

            // Some string slicing to get the request line portion to
            // match against
            let request = std::str::from_utf8(
                request_buffer
                    .split_inclusive(|c| *c == 10)
                    .next()
                    .unwrap()
            ).unwrap();

            // Expand the match statement of macro
            let (filename, status_code) = match request {
                $( $req => $resp )+
            };
        
            let contents = fs::read_to_string(filename).unwrap();
        
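            // The status line needs the HTTP version in front of the status
            // code; the optional reason phrase after the code is left empty.
            let response = format!(
                "HTTP/1.1 {} \r\nContent-Length: {}\r\n\r\n{}",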
                status_code,
                contents.len(),
                contents
            );
        
            stream.write_all(response.as_bytes()).unwrap();
            stream.flush().unwrap();
        }

        // With routes defined, we get the captures for interface
        // and thread count to create socket, thread pool, and
        // begin accepting connections.
        let listener = TcpListener::bind($interface).unwrap();
        let pool = ThreadPool::new($thread_count);
    
        for stream in listener.incoming() {
            let stream = stream.unwrap();
    
            pool.execute(|| {
                handle_connection(stream)
            });
        }
    }
}

I had to adapt some of the code presented by Klabnik and Nichols to get this to work, but the library code in lib.rs for this exercise is left unaltered. handle_connection was not part of the library crate, so I went ahead and defined it in the macro itself. I expanded the route captures, $req and $resp, into the match statement that identifies the request and returns the right response in this function. As for the interface and thread count captures, these are simply plugged into the library code for creating sockets and thread pools. There are a lot of unhandled errors that could trigger panics, but for the sake of wrapping this blog post up, I will assume that the HTML files associated with responses exist, and that requests are properly formatted and can be decoded as UTF-8 strings to compare against. Note that the request line still carries its trailing \r\n when it is matched, so the route patterns must include it.
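
As one possible way to remove a panic, the file read can fall back to an error response instead of unwrapping. This is a sketch under assumptions: build_response and reason_phrase are hypothetical helpers, not part of The Rust Book's code, and the fallback body is made up for illustration.

```rust
use std::fs;

// Hypothetical helper: reason phrases for the few codes used here.
pub fn reason_phrase(code: u16) -> &'static str {
    match code {
        200 => "OK",
        404 => "Not Found",
        500 => "Internal Server Error",
        _ => "",
    }
}

// Hypothetical helper: builds the full response text, answering with a
// 500 instead of panicking when the file cannot be read.
pub fn build_response(filename: &str, status_code: u16) -> String {
    let (code, body) = match fs::read_to_string(filename) {
        Ok(contents) => (status_code, contents),
        Err(_) => (500, String::from("<h1>Internal Server Error</h1>")),
    };
    format!(
        "HTTP/1.1 {} {}\r\nContent-Length: {}\r\n\r\n{}",
        code,
        reason_phrase(code),
        body.len(),
        body
    )
}
```

Inside the macro, the response could then be built with a call like build_response(filename, status_code) in place of the read_to_string and format! steps, assuming the route blocks return a u16 status code.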

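// Imports used by the macro's expansion. The crate name `hello` is an
// assumption taken from The Rust Book's project; adjust it to your package.
use hello::ThreadPool;
use std::fs;
use std::io::prelude::*;
use std::net::{TcpListener, TcpStream};

fn main() {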
    webserv!{
        interface as "127.0.0.1:7878", thread_count as 100;
        {   
            ("GET /abc HTTP/1.1\r\n") => { ("hello.html", 200) },
            (_) => { ("404.html", 404) }
        }
    };
}

/* Compiling this and running, we see the macro expanded, and our
   configurations for requests and responses injected into our response
   handling. Thread pools spin up with this logic, and start handling
   requests coming in.
   
   hello.html - <h1>Hello from a web server stood up by a declarative
                  macro!</h1>
   
   $ cargo run
   
   Worker 0 got a job; executing.
   [main.rs:63] request = "GET / HTTP/1.1\r\n"
   Worker 2 got a job; executing.
   [main.rs:63] request = "GET /favicon.ico HTTP/1.1\r\n"
   Worker 1 got a job; executing.
   [main.rs:63] request = "GET /abc HTTP/1.1\r\n"
   Worker 3 got a job; executing.
   [main.rs:63] request = "GET /favicon.ico HTTP/1.1\r\n"
*/

In a sense, we have created a small, very bare-bones NGINX configuration of sorts inside Rust, where the macro injects our pattern matching control flow directly into the logic that the threads use to handle requests. We could extend this macro in a variety of directions, such as using regular expressions to match request lines or defining a nesting operation for routes. In terms of the response, we could offer ways of generating content other than HTML files, such as literal HTML code or redirects.
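
To make the response side of that idea concrete, here is one possible shape for it. Everything in this sketch is hypothetical: the Reply enum, its variants, and the render function are names invented for illustration, and 302 is just one of several redirect status codes.

```rust
// Hypothetical response kinds a route block could return instead of a
// (filename, status code) tuple.
pub enum Reply {
    // Literal HTML with a status code.
    Html(String, u16),
    // Redirect to another location.
    Redirect(String),
}

// Hypothetical renderer: turns a Reply into raw HTTP response text.
pub fn render(reply: &Reply) -> String {
    match reply {
        Reply::Html(body, status) => format!(
            "HTTP/1.1 {} \r\nContent-Type: text/html\r\nContent-Length: {}\r\n\r\n{}",
            status,
            body.len(),
            body
        ),
        Reply::Redirect(location) => format!(
            "HTTP/1.1 302 Found\r\nLocation: {}\r\nContent-Length: 0\r\n\r\n",
            location
        ),
    }
}
```

A route block could then return something like Reply::Redirect("/help".to_string()), and handle_connection would write render(&reply) to the socket instead of reading a file.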

Wrapping Up

As you can see, Rust macros are a vehicle into an exciting world of possibilities in (compile-time) metaprogramming. As Rust macros work on a lower level representation of our code, the AST, we allow ourselves to talk to the grandfather of all metaprograms, the compiler, in ways we probably would not have been able to in other languages. With this key line of communication to the compiler and its AST, facilitated by the token pattern matching of declarative macros, the common programmer can now consider metaprogramming a viable part of their arsenal.

The most interesting thing I discovered about Rust macros while writing this piece is that, in terms of Rust declarative macros, macro_rules! seems to be only the beginning! Rust, like many programming languages, reserves a host of keywords for distinguished use throughout the language. Languages can reserve keywords in large batches, using a few now while keeping the others reserved for a soon-to-be-delivered feature. Rust has a currently unused keyword, macro, that might become the basis of an even greater macro system built from the beta that is macro_rules!. See this open issue in the Rust GitHub repo for more information.

It’s also important to note that there are entire languages centered around the concept of metaprogramming. I feel somewhat guilty for not including any mention of the language Lisp anywhere in a blog post concerning metaprogramming, but my primary focus here was Rust at the end of the day. I also feel that I should have covered more embedded languages in Rust, and the incredibly important token tree (:tt) capture together with a recursive macro example, but these are some more advanced topics that probably deserve another blog post.

All that said, I urge the reader to experiment with Rust’s macro system, and begin considering metaprogramming as a worthwhile tool for solving programming problems, or in some way redefining the way we think about approaching/characterizing them.

