Building Your Own Programming Language with LLVM, Rust (Part 3): Optimization & Compilation

So far, we have contributed quite a bit of code to this Rust implementation of Kaleidoscope, but we have yet to compile and run a single program. All we have is a hand-made parser and an AST that can emit some LLVM IR. Nearly one thousand lines of Rust deep, one might grow disheartened and think that compiling or interpreting this model language, seeing it come to life on our native processors as it is translated to machine code, is far out of reach. Nothing could be further from the truth.

The whole point of working with the LLVM infrastructure is to make the complicated process of building a compiler easier. Readers will be happy to know that compiling any given piece of IR in an LLVM module is just a few library API calls away. So rest assured, the hardest part of this minimally viable language is already over! It’s all downhill from here.

This blog post covers the fourth part of the original LLVM tutorial, which was written in C++. The code for these posts can be found in this GitHub repository.

However, before we start discussing optimization and compilation, I’d like to do some preparatory refactoring of the code.

Some Quality-of-Life Improvements

A Better AST using Enum Dispatch

In the very first post of this series, I discussed the design and implementation of the AST for the frontend of this compiler. Recall that the parser is responsible for taking the source code provided by the user, either from a file or directly from standard input in a REPL session, breaking it down into tokens, then piecing those tokens together into a tree structure known as an AST. Also recall how the original tutorial’s implementation built this tree around an abstract base class, ASTExpr, which every node inherits from, with each node supplying its own IR code generation logic that is selected at runtime via dynamic dispatch. The nodes are all considered instances of ASTExpr, but each calls its own specific codegen method depending on whether it is a number expression, a call expression, a variable expression, and so on.

While this makes for a serviceable design, both in Rust and in C++, some things about it irked me, particularly when it came to testing the parser by comparing the resulting parse tree to an expected parse tree. This was a problem tied to the limitations of Rust itself.

I tried to build a test module for the parser, like I did with the lexer, by passing a string of source code to the parser, taking the resulting tree, then comparing it against an expected tree I built myself in memory. To compare things in Rust, you need to implement at least PartialEq. This can be done quite easily for any user-defined data structure with the derive macro, but it became increasingly obvious that it wouldn’t be that simple.

Problem 1: since we use a tree composed of trait objects (the dyn syntax), we cannot simply derive equality with #[derive(PartialEq)] on our tree nodes. Trait objects do not implement that trait because the compiler has no idea how to compare two type-erased objects whose underlying types could be completely different. For example, if we have a trait named Fruit and create two structs that implement it, like Apple and Orange, how would the Rust compiler know how to compare the two? They share the same interface, Fruit, are technically the same type (a boxed trait object), and are even composed of the same single seedless member. But if you compare fruit1 == fruit2, both being Box<dyn Fruit>, what should happen when they are the same kind of fruit, and what should happen when they are different? Rust cannot make this decision and requires more input from the user: a specific implementation of the equality trait, such as impl PartialEq for Box<dyn Fruit>.

// Say we are using trait objects, a Fruit trait defines the interface

trait Fruit {}

#[derive(PartialEq)] // Whenever we want to compare things, simply do this
struct Apple { seedless: bool }

#[derive(PartialEq)]
struct Orange { seedless: bool }

impl Fruit for Apple {}
impl Fruit for Orange {}

// This is totally legal since the Apple type implements PartialEq
// Same if these were oranges
let fruit1 = Apple { seedless: true };
let fruit2 = Apple { seedless: true };

assert!(fruit1 == fruit2);

// This code below is illegal, since Box<dyn Fruit> does not implement
// PartialEq, even though both of its underlying types do.
let fruit1: Box<dyn Fruit> = Box::new(Apple { seedless: true });
let fruit2: Box<dyn Fruit> = Box::new(Orange { seedless: true });

let same = fruit1 == fruit2; // Compiler will complain here

// How can we overcome this? If the concrete types implement the Any trait,
// we can downcast and revert the type erasure back to Apples and Oranges,
// which do implement PartialEq

Comparing apples to oranges is logical to most people. If the two are different kinds of fruit, they are unequal, so return false. If they are the same kind, compare their data members and see whether the first apple (or orange) has the same properties as the second. This can be accomplished by downcasting each Box<dyn Fruit> to its underlying type with Any. A third-party crate called dyn_partial_eq even supplies a derive macro for this behavior, so you can save yourself the trouble. But when I attempted to derive DynPartialEq for my tree, I came upon the second problem.
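For the curious, the manual downcasting route can be sketched without any third-party crate. This is a minimal sketch, not how dyn_partial_eq is implemented internally: the as_any and dyn_eq methods are hypothetical additions to the Fruit trait, and the impl of PartialEq for dyn Fruit is what lets two Box<dyn Fruit> values be compared with ==. Note that it only works because Apple and Orange are 'static, which is exactly the catch described next.

```rust
use std::any::Any;

// Hypothetical rework of the Fruit trait: two extra methods let us
// recover the concrete type hiding behind a `dyn Fruit`.
trait Fruit {
    /// Expose `self` as `&dyn Any` so callers can attempt a downcast.
    fn as_any(&self) -> &dyn Any;
    /// Compare against any other fruit, possibly of a different concrete type.
    fn dyn_eq(&self, other: &dyn Fruit) -> bool;
}

#[derive(PartialEq)]
struct Apple { seedless: bool }

#[derive(PartialEq)]
struct Orange { seedless: bool }

impl Fruit for Apple {
    fn as_any(&self) -> &dyn Any { self }
    fn dyn_eq(&self, other: &dyn Fruit) -> bool {
        // If `other` is not an Apple the downcast yields None: apples != oranges.
        other.as_any().downcast_ref::<Apple>().map_or(false, |o| self == o)
    }
}

impl Fruit for Orange {
    fn as_any(&self) -> &dyn Any { self }
    fn dyn_eq(&self, other: &dyn Fruit) -> bool {
        other.as_any().downcast_ref::<Orange>().map_or(false, |o| self == o)
    }
}

// With this impl in place, Box<dyn Fruit> picks up PartialEq through
// the standard library's blanket `impl PartialEq for Box<T>`.
impl PartialEq for dyn Fruit {
    fn eq(&self, other: &Self) -> bool {
        self.dyn_eq(other)
    }
}

fn main() {
    let a1: Box<dyn Fruit> = Box::new(Apple { seedless: true });
    let a2: Box<dyn Fruit> = Box::new(Apple { seedless: true });
    let o1: Box<dyn Fruit> = Box::new(Orange { seedless: true });
    println!("{} {}", a1 == a2, a1 == o1); // true false
}
```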

use dyn_partial_eq::*;

#[derive(DynPartialEq)]
struct MyTreeNode<'src> {
    child: Box<dyn ASTExpr<'src>>
}

// ERROR: DynPartialEq relies on the Any trait, and Any is only
// implemented for types with a 'static lifetime. 'src is shorter than
// 'static, so this will fail to compile unless you drop the 'src lifetime

Problem 2: the Any trait, which is required for downcasting types and is the underlying magic behind dyn_partial_eq, is only implemented for types with a 'static lifetime. Originally my tree followed a zero-copy principle, where I only took references into the source code and avoided copying strings. This gave it the 'src lifetime, which is shorter than 'static and thus prevented me from using Any and DynPartialEq. I ended up having to change all my &'src str types to copied Strings in order to actually compare my trees when testing my parser.

In the end I was able to compare trees, but had to start copying the source, which makes for slower parsing. I wondered if dynamic dispatch was the right way to approach this problem at all, so I mapped out a rough idea of what a new AST based on enum dispatch would look like. I then derived PartialEq on it, and it compiled with zero complaints. Perhaps there was an argument to be made here for this approach.

#[derive(Debug, Clone, PartialEq)]
pub enum ASTExpr<'src> {
    NumberExpr(f64),
    VariableExpr(&'src str),
    BinaryExpr {
        op: Ops,
        left: Box<ASTExpr<'src>>,
        right: Box<ASTExpr<'src>>,
    },
    CallExpr {
        callee: &'src str,
        args: Vec<Box<ASTExpr<'src>>>,
    },
}
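As a quick illustration of what the derive buys us, here is a minimal, self-contained sketch of a parser-style test. The Ops enum is a trimmed stand-in (only Plus and Minus, the two operators that appear in this post’s snippets), and the hand-built parsed tree plays the role of real parser output; the identifier is borrowed from a source buffer to show that the tree can stay zero-copy and still be compared.

```rust
// Trimmed stand-in for the project's Ops enum (assumption: only two operators).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Ops {
    Plus,
    Minus,
}

#[derive(Debug, Clone, PartialEq)]
pub enum ASTExpr<'src> {
    NumberExpr(f64),
    VariableExpr(&'src str),
    BinaryExpr {
        op: Ops,
        left: Box<ASTExpr<'src>>,
        right: Box<ASTExpr<'src>>,
    },
    CallExpr {
        callee: &'src str,
        args: Vec<Box<ASTExpr<'src>>>,
    },
}

fn main() {
    // Pretend this is the user's source buffer; the tree borrows from it.
    let source = String::from("x + 1");
    let ident: &str = &source[0..1];

    // What a parser might hand back for `x + 1`.
    let parsed = ASTExpr::BinaryExpr {
        op: Ops::Plus,
        left: Box::new(ASTExpr::VariableExpr(ident)),
        right: Box::new(ASTExpr::NumberExpr(1.0)),
    };

    // The tree we expect, written by hand in the test.
    let expected = ASTExpr::BinaryExpr {
        op: Ops::Plus,
        left: Box::new(ASTExpr::VariableExpr("x")),
        right: Box::new(ASTExpr::NumberExpr(1.0)),
    };

    // Derived PartialEq compares variant, operator and children recursively.
    assert_eq!(parsed, expected);
    println!("trees match: {:?}", parsed);
}
```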

One of Rust’s bigger selling points is that, as a systems programming language, every cost is upfront and transparent. If you start copying source code into owned String types, the types of the data inside the nodes reveal it, letting you know you might be keeping two copies of the source code where one would do. If you use references with lifetimes, it is obvious that you are not copying anything.

Likewise, dynamic dispatch has a stark keyword, dyn, that lets programmers know they are paying a cost for looking up the correct method in a vtable. Dynamic dispatch can be slower than static dispatch, partly because the indirect call generally prevents inlining, so the creators of Rust saw fit to make its use apparent to any programmer with a clear keyword. YouTuber Lavaforth has a great video on the costs associated with trait objects and dynamic dispatch.

I settled on refactoring the AST to use enum dispatch, which is quite a departure from the original tutorial’s code. In the end, though, I was able to compare resulting parse trees and keep my parser zero-copy. I believe this to be superior, faster, and a better showcase of the power of Rust enums!

// In ast.rs

#[derive(Debug, Clone, PartialEq)]
pub enum ASTExpr<'src> {
    NumberExpr(f64),
    VariableExpr(&'src str),
    BinaryExpr {
        op: Ops,
        left: Box<ASTExpr<'src>>,
        right: Box<ASTExpr<'src>>,
    },
    CallExpr {
        callee: &'src str,
        args: Vec<Box<ASTExpr<'src>>>,
    },
}

// Function
#[derive(Debug, PartialEq)]
pub struct Function<'src> {
    pub proto: Box<Prototype>,
    pub body: Box<ASTExpr<'src>>,
} 

// In parser.rs
// Overall, this file didn't need to change much. I just needed to
// prefix every tree node I created with ASTExpr::, for example:

/// numberexpr ::= number
fn parse_number_expr<'src>(
    tokens: &mut Peekable<impl Iterator<Item = Token<'src>>>,
) -> ExprParseResult<'src> {
    if let Some(Token::Number(num)) = tokens.next() {
        Ok(Box::new(ASTExpr::NumberExpr(num)))
    } else {
        panic!("Expected next token to be number for parse_number_expr!")
    }
}

// In llvm_backend.rs

// There are three lifetimes at play when working with references from the
// source code (AST) and the LLVM objects (the context object and the IR
// it produces). The IR cannot outlive the context, as the "where" clause states
pub trait LLVMCodeGen<'ctx, 'ir, 'src>
where
    'ctx: 'ir,
{
    fn codegen(&self, context: &LLVMContext<'ctx>) 
        -> IRGenResult<'ir, 'src>;
}

// Generating the IR for the new tree is just one implementation of this
// trait, handling each variant of the enum

impl<'ctx, 'ir, 'src> LLVMCodeGen<'ctx, 'ir, 'src> for ASTExpr<'src>
where
    'ctx: 'ir,
{
    fn codegen(&self, context: &LLVMContext<'ctx>) 
      -> IRGenResult<'ir, 'src> {
        use ASTExpr::*;
        
        match self {
            NumberExpr(num) => todo!(),
            VariableExpr(var) => todo!(),
            // ...so on and so forth
        }
    }
}
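For comparison with the trait-object version, here is what a complete match over every variant looks like. This is a hypothetical to_sexpr helper (not part of the project) that renders a tree as an S-expression instead of emitting IR; each arm is selected by matching on the enum’s discriminant, so no vtable lookup is involved. Ops is again a trimmed stand-in with only two operators.

```rust
// Trimmed stand-in for the project's Ops enum (assumption: only two operators).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Ops {
    Plus,
    Minus,
}

#[derive(Debug, Clone, PartialEq)]
pub enum ASTExpr<'src> {
    NumberExpr(f64),
    VariableExpr(&'src str),
    BinaryExpr { op: Ops, left: Box<ASTExpr<'src>>, right: Box<ASTExpr<'src>> },
    CallExpr { callee: &'src str, args: Vec<Box<ASTExpr<'src>>> },
}

impl<'src> ASTExpr<'src> {
    /// Hypothetical helper: render the tree as an S-expression.
    /// Every variant is handled in one `match`, the same shape codegen uses.
    pub fn to_sexpr(&self) -> String {
        use ASTExpr::*;
        match self {
            NumberExpr(num) => num.to_string(),
            VariableExpr(name) => name.to_string(),
            BinaryExpr { op, left, right } => {
                let sym = match op {
                    Ops::Plus => "+",
                    Ops::Minus => "-",
                };
                format!("({} {} {})", sym, left.to_sexpr(), right.to_sexpr())
            }
            // A call with no arguments renders as just "(callee)".
            CallExpr { callee, args } if args.is_empty() => format!("({})", callee),
            CallExpr { callee, args } => {
                let rendered: Vec<String> = args.iter().map(|arg| arg.to_sexpr()).collect();
                format!("({} {})", callee, rendered.join(" "))
            }
        }
    }
}

fn main() {
    // Tree for `f(x) + 1`
    let tree = ASTExpr::BinaryExpr {
        op: Ops::Plus,
        left: Box::new(ASTExpr::CallExpr {
            callee: "f",
            args: vec![Box::new(ASTExpr::VariableExpr("x"))],
        }),
        right: Box::new(ASTExpr::NumberExpr(1.0)),
    };
    println!("{}", tree.to_sexpr()); // (+ (f x) 1)
}
```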

A Quick and Dirty Command Line Interface

The next small bit of code has nothing to do with the compiler per se, but it will be helpful when we execute it. As you can imagine, compilers can have an absurdly large number of command line flags and options that modify how they generate code. They are complex programs, after all, so it goes without saying that learning all the different ways to use one from the command line could take months (see Clang’s command line argument reference). While we are not aiming for anything nearly that elaborate, it would be beneficial to support the basics, like an optimization flag, specific optimization pass names, a specific machine target to compile for, a positional file to compile from, or an output filename. You know, basic compiler stuff.

Nowadays, there are entire libraries dedicated to building command line interfaces. Even Python’s standard library has an incredibly easy-to-use argparse module that lets you quickly add flags, positionals, sub-commands, default arguments and whatever else you might ever want in a CLI. While Rust has no such thing in its standard library, there are third-party crates, like clap, which give a similar experience.

Clap is great for a few reasons. One is that it provides two different ways of building your CLI. The first is the builder approach, where the typical builder design pattern is used to chain together method calls that add flags, positionals, and so on. The second relies on metaprogramming with Rust procedural macros: a derive-based approach, where you simply derive a trait on a struct to generate all the necessary code. I settled on the derive-based approach, since it let me use Rust doc comments (“///”) to write the help messages in a very transparent way. See the code and the resulting help message below.

// In cli.rs
use std::path::PathBuf;

use clap::{
    builder::{OsStr, PossibleValue},
    Parser, ValueEnum,
};

#[derive(Parser)]
#[command(version, about, long_about = None)]
pub struct Cli {
    /// A positional file containing Kaleidoscope code to compile to object/assembly, if not given, starts interpreter instead
    pub file: Option<PathBuf>,

    /// Specifies a non-native target to compile for, can be any one of the CPUs listed using "llc --version", or string parseable as LLVMTargetTriple
    #[arg(long)]
    pub target: Option<String>,

    /// What optimization level to pass to LLVM
    #[arg(short = 'O', long, value_enum, default_value = OptLevel::O2)]
    pub opt_level: OptLevel,
    
    // Whatever else you may want to add
}

#[derive(Copy, Clone, PartialEq, Eq, PartialOrd, Ord)]
pub enum OptLevel {
    O0,
    O1,
    O2,
    O3,
}

impl ValueEnum for OptLevel {
    fn value_variants<'a>() -> &'a [Self] {
        &[OptLevel::O0, OptLevel::O1, OptLevel::O2, OptLevel::O3]
    }

    fn to_possible_value(&self) -> Option<clap::builder::PossibleValue> {
        Some(match self {
            OptLevel::O0 => PossibleValue::new("0")
              .help("No optimization"),
            OptLevel::O1 => PossibleValue::new("1")
              .help("Less optimization"),
            OptLevel::O2 => PossibleValue::new("2")
              .help("Default optimization"),
            OptLevel::O3 => PossibleValue::new("3")
              .help("Aggressive optimization"),
        })
    }
}

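// clap's default_value accepts anything convertible into its OsStr type;
// implementing From (rather than Into) provides the Into impl for free
impl From<OptLevel> for OsStr {
    fn from(level: OptLevel) -> OsStr {
        match level {
            OptLevel::O0 => "0".into(),
            OptLevel::O1 => "1".into(),
            OptLevel::O2 => "2".into(),
            OptLevel::O3 => "3".into(),
        }
    }
}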

// back in main.rs

mod cli;
use clap::Parser;

fn main() {
    let _externs: &[*const extern "C" fn(f64) -> f64] = &[
        putchard as _,
        printd as _,
    ];

    let cli = cli::Cli::parse();
}

cargo run -- --help
Usage: kaleidrs [OPTIONS] [FILE]

Arguments:
  [FILE]
          A positional file containing Kaleidoscope code to compile to object/assembly, if not given, starts interpreter instead

Options:
      --target <TARGET>
        Specifies a non-native target to compile for, can be any one of the CPUs listed in "llc --version", or valid target triple

  -O, --opt-level <OPT_LEVEL>
          What optimization level to pass to LLVM
          
          [default: O2]

          Possible values:
          - O0: No optimization
          - O1: Less optimization
          - O2: Default optimization
          - O3: Aggressive optimization

  -h, --help
          Print help (see a summary with '-h')

  -V, --version
          Print version

I encourage readers to read through clap’s documentation; you might find it useful for a future Rust project!

Adding LLVM Optimization Passes

The first thing the corresponding part of the LLVM tutorial covers is optimization. It goes into detail on some very rudimentary optimizations that are baked into IR construction itself.

You might have noticed while playing around with the REPL that when you enter a simple expression composed of a single numeric constant, or a binary expression made up exclusively of constants, the IR does not explicitly add anything together; rather, the value is calculated automatically and simply returned! See below, where I enter 2+2, but no explicit addition of the two constants is found in the IR.

Ready >> 2 + 2;
Parsed a top level expression.
; ModuleID = 'kaleidrs_module'
source_filename = "kaleidrs_module"

define double @__anonymous_expr() {
entry:
  ret double 4.000000e+00
}

Ready >> 1*2*3*4*5;
Parsed a top level expression.
; ModuleID = 'kaleidrs_module'
source_filename = "kaleidrs_module"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"

define double @__anonymous_expr() {
entry:
  ret double 1.200000e+02
}

Ready >>

This is called trivial constant folding. This type of optimization is special in LLVM in that it is not tied to any specific pass, but rather is automatically done on your behalf as you call the builder API to add instructions to an LLVM module.

Since all calls to build LLVM IR go through the LLVM IR builder, the builder itself checked to see if there was a constant folding opportunity when you call it. If so, it just does the constant fold and return the constant instead of creating an instruction.

Author(s) of “My First Language Frontend with LLVM Tutorial” on llvm.org

Recall that when we walk our parse tree to generate IR, binary expressions like “2+2” trigger this sort of code at the BinaryExpr nodes. After calling codegen on the left and right children, we check the operator and use Inkwell’s Builder type to add an LLVM instruction to the current basic block of the function.

// Generate the left and right code first, 
// then build the correct
// instruction depending on the operator.
BinaryExpr { op, left, right } => {
    let left_genval = left.codegen(context)
      .map(AnyValueEnum::into_float_value)?;

    let right_genval = right.codegen(context)
      .map(AnyValueEnum::into_float_value)?;

    let resulting_value = match *op {
        Ops::Plus => {
            context
                .builder
                .build_float_add(left_genval, right_genval, "addtmp")
        }
        Ops::Minus => {
            context
                .builder
                .build_float_sub(left_genval, right_genval, "subtmp")
        }
        
        //...the rest of the operators following

As we call methods like build_float_add and build_float_sub, the builder is constantly checking the values passed to it, specifically whether each LLVM value in question is statically known, i.e. a constant. If it notices that you asked it to build an fadd instruction adding two constant floats, it simply computes the result then and there, moving the calculation from runtime to compile time. This might seem a little opaque, given a method name suggesting the result is always another fadd instruction, but the computed value is the same; the IR simply contains fewer instructions, producing a faster program that saves the CPU some cycles.
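To make the idea concrete, here is a toy model of a folding builder. It is not LLVM’s or Inkwell’s implementation, just a sketch under the assumption that an operand is either a known constant or a named register: when both operands of an add are constants, the sum is returned directly and no instruction is recorded; otherwise an fadd line is emitted.

```rust
/// Toy operand: a value known at compile time, or a named SSA register.
#[derive(Clone, Debug, PartialEq)]
enum Value {
    Const(f64),
    Reg(String),
}

/// Toy stand-in for a folding IR builder; `insts` records the emitted "IR".
struct FoldingBuilder {
    insts: Vec<String>,
}

impl FoldingBuilder {
    fn render(v: &Value) -> String {
        match v {
            Value::Const(c) => format!("{:?}", c),
            Value::Reg(name) => format!("%{}", name),
        }
    }

    /// Build an add, folding it at "compile time" when both operands are constants.
    fn fadd(&mut self, lhs: &Value, rhs: &Value, name: &str) -> Value {
        if let (Value::Const(a), Value::Const(b)) = (lhs, rhs) {
            // Nothing is emitted: the result is just another constant.
            return Value::Const(a + b);
        }
        // At least one operand is only known at runtime, so emit an instruction.
        self.insts.push(format!(
            "%{} = fadd double {}, {}",
            name,
            Self::render(lhs),
            Self::render(rhs)
        ));
        Value::Reg(name.to_string())
    }
}

fn main() {
    let mut builder = FoldingBuilder { insts: Vec::new() };
    // `2 + 2` folds away entirely...
    let four = builder.fadd(&Value::Const(2.0), &Value::Const(2.0), "addtmp");
    // ...while `num + 4` has to become a real instruction.
    let sum = builder.fadd(&Value::Reg("num".to_string()), &four, "addtmp");
    println!("{:?} {:?}", four, sum);
    for inst in &builder.insts {
        println!("{}", inst);
    }
}
```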

For scenarios where any of the values are not known statically (variables, call expressions), the result is an additional fadd, fsub or fmul instruction, like below.

Ready >> def double(num) num * 2;
Parsed a function definition.
; ModuleID = 'kaleidrs_module'
source_filename = "kaleidrs_module"

define double @double(double %num) {
entry:
  %multmp = fmul double %num, 2.000000e+00
  ret double %multmp
}
Ready >> double(2);
Parsed a top level expression.
; ModuleID = 'kaleidrs_module'
source_filename = "kaleidrs_module"

define double @double(double %num) {
entry:
  %multmp = fmul double %num, 2.000000e+00
  ret double %multmp
}

define double @__anonymous_expr() {
entry:
  %calltmp = call double @double(double 2.000000e+00)
  ret double %calltmp
}
Ready >>

However, there are non-trivial cases where constant folding requires specific passes. The tutorial gives a prime example of such a case:

ready> def test(x) (1+2+x)*(x+(1+2));
ready> Read function definition:
define double @test(double %x) {
entry:
        %addtmp = fadd double 3.000000e+00, %x
        %addtmp1 = fadd double %x, 3.000000e+00
        %multmp = fmul double %addtmp, %addtmp1
        ret double %multmp
}

Here, you can see that LLVM succeeded in folding the 1+2 on both sides into 3, but it failed to exploit commutativity: (x+(1+2)) and (1+2+x) are the same value with the operands in a different order. As a result, it computes the same value twice, once for the SSA value %addtmp and again for %addtmp1. This could have been done with one fadd and one fmul, multiplying the resulting sum by itself.

To catch non-trivial cases of constant folding like this, we need to rely on more than just the builder: we need specific passes.

We first run our codegen methods, creating an LLVM module that contains all of our IR without any optimization passes applied. Once the initial IR is there, we direct LLVM to run passes on this module to catch this specific type of constant folding.

In the original tutorial, this delves into quite a bit of C++ code, but Inkwell makes it a whole lot simpler. The wrapper library provides an entire Rust module, named passes, to help you easily apply passes to your IR. We will add only a single method to our LLVMContext struct to run whatever passes we would like.

#[derive(Debug)]
pub struct LLVMContext<'ctx> {
    context: &'ctx Context,
    builder: Builder<'ctx>,
    module: Module<'ctx>,
    target: Target,
    machine: TargetMachine,
    sym_table: RefCell<HashMap<String, AnyValueEnum<'ctx>>>,
}

impl<'ctx> LLVMContext<'ctx> {
    //...
    
    pub fn run_passes(&self, passes: &str) {
        let pass_options = PassBuilderOptions::create();

        // pass options reflect what is used in tutorial
        pass_options.set_verify_each(true);
        pass_options.set_debug_logging(false);
        pass_options.set_loop_interleaving(true);
        pass_options.set_loop_vectorization(true);
        pass_options.set_loop_slp_vectorization(true);
        pass_options.set_loop_unrolling(true);
        pass_options.set_forget_all_scev_in_loop_unroll(true);
        pass_options.set_licm_mssa_opt_cap(1);
        pass_options.set_licm_mssa_no_acc_for_promotion_cap(10);
        pass_options.set_call_graph_profile(true);
        pass_options.set_merge_functions(true);
        
        // The run_passes method accepts a &str argument containing
        // a comma-delimited list of LLVM pass names to run
        self.module
            .run_passes(passes, &self.machine, pass_options)
            .unwrap();
    }
}

The tutorial mentions a few “pass options” to use, so I reflected this as best I could using Inkwell. The heart of running passes on our IR comes from the run_passes method on the Module type. Passes are applied to the module by passing a simple string that has comma delimited pass names. LLVM passes are associated with unique names. Here is a list of all of them for the curious.

Now we need to pass this method a string of specific pass names that handle this case of non-trivial constant folding. The tutorial mentions two transformations in particular.

This requires two transformations: reassociation of expressions (to make the add’s lexically identical) and Common Subexpression Elimination (CSE) to delete the redundant add instruction.

Author(s) of “My First Language Frontend with LLVM Tutorial” on llvm.org
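To get a feel for what these two transformations accomplish together, here is a toy sketch. It is a deliberate simplification and not how LLVM’s Reassociate or GVN passes actually work: putting the operands of commutative operations into a fixed order stands in for reassociation, and a hash table of already-built (operation, operands) triples stands in for common subexpression elimination. The 1+2 is assumed to have already been folded into 3.0 by the IR builder.

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
enum Op {
    Add,
    Mul,
}

/// Toy builder: canonical operand order stands in for reassociation,
/// and the `seen` table stands in for common subexpression elimination.
struct CseBuilder {
    seen: HashMap<(Op, String, String), String>,
    insts: Vec<String>,
}

impl CseBuilder {
    fn new() -> Self {
        CseBuilder { seen: HashMap::new(), insts: Vec::new() }
    }

    fn build(&mut self, op: Op, a: &str, b: &str) -> String {
        // Both ops are commutative, so put the operands in a fixed order.
        let (lhs, rhs) = if a <= b { (a, b) } else { (b, a) };
        let key = (op, lhs.to_string(), rhs.to_string());

        // Same operation on the same operands already built? Reuse that value.
        if let Some(existing) = self.seen.get(&key) {
            return existing.clone();
        }

        let name = format!("%t{}", self.insts.len());
        let mnemonic = match op {
            Op::Add => "fadd",
            Op::Mul => "fmul",
        };
        self.insts.push(format!("{} = {} double {}, {}", name, mnemonic, lhs, rhs));
        self.seen.insert(key, name.clone());
        name
    }
}

fn main() {
    let mut b = CseBuilder::new();
    // (1+2+x)*(x+(1+2)) after the IR builder has folded 1+2 into 3.0
    let left = b.build(Op::Add, "3.0", "%x");
    let right = b.build(Op::Add, "%x", "3.0");
    let _product = b.build(Op::Mul, &left, &right);
    for inst in &b.insts {
        println!("{}", inst);
    }
}
```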

The original tutorial adds these, along with two other passes, by name through the function pass manager API in C++:

// Add transform passes.
// Do simple "peephole" optimizations and bit-twiddling optzns.
TheFPM->addPass(InstCombinePass());
// Reassociate expressions.
TheFPM->addPass(ReassociatePass());
// Eliminate Common SubExpressions.
TheFPM->addPass(GVNPass());
// Simplify the control flow graph (deleting unreachable blocks, etc).
TheFPM->addPass(SimplifyCFGPass());

In the list of passes above, these have the names instcombine, reassociate, gvn and simplifycfg. With Inkwell, we can run them simply by passing the string “instcombine,reassociate,gvn,simplifycfg” to the method we just created.
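Building that string, or tidying one a user typed by hand, takes only a couple of lines of standard Rust. Both helpers below are hypothetical and not part of the project or Inkwell; in particular, stripping stray spaces and empty entries before handing the string to LLVM is a defensive assumption on my part, since I have not checked how LLVM’s pipeline parser treats them.

```rust
/// Hypothetical helper: join pass names into the comma-delimited
/// pipeline string that a method like `run_passes` takes.
fn pipeline(passes: &[&str]) -> String {
    passes.join(",")
}

/// Hypothetical helper: tidy a user-typed list such as " instcombine, gvn ,,"
/// by trimming whitespace around each name and dropping empty entries.
fn normalize_pipeline(raw: &str) -> String {
    raw.split(',')
        .map(str::trim)
        .filter(|name| !name.is_empty())
        .collect::<Vec<_>>()
        .join(",")
}

fn main() {
    let default = pipeline(&["instcombine", "reassociate", "gvn", "simplifycfg"]);
    println!("{}", default);
    println!("{}", normalize_pipeline(" instcombine, gvn ,, simplifycfg "));
}
```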

We can take this a step further and adjust our CLI to accept such a comma-delimited list from the user, defaulting to this specific set of passes. Passes seem like an important thing to let users control easily, and what better way to do that than a command line argument? The clap crate takes the spotlight again here.

// in cli.rs
#[derive(Parser)]
#[command(version, about, long_about = None)]
pub struct Cli {
    /// What optimization level to pass to LLVM
    #[arg(long, value_enum, default_value = OptLevel::O2)]
    pub opt_level: OptLevel,

    /// Comma separated list of LLVM passes (use opt for a list, also see https://www.llvm.org/docs/Passes.html)
    #[arg(short, long, default_value = "instcombine,reassociate,gvn,simplifycfg")]
    pub passes: String,
}

$ cargo run -- --help
Usage: kaleidrs [OPTIONS]

Options:
      --opt-level <OPT_LEVEL>
          What optimization level to pass to LLVM
          
          [default: O2]

          Possible values:
          - O0: No optimization
          - O1: Less optimization
          - O2: Default optimization
          - O3: Aggressive optimization

  -p, --passes <PASSES>
          Comma separated list of LLVM passes 
          (use opt for a list, also see https://www.llvm.org/docs/Passes.html)
          
          [default: instcombine,reassociate,gvn,simplifycfg]

      --use-frontend-only
          Interpret with frontend only, output AST, only valid for interpreter use

  -h, --help
          Print help (see a summary with '-h')

  -V, --version
          Print version

We can pass this list down to the REPL loop in repl.rs, and call the run_passes method after codegen has emitted IR into our module. It makes sense to run passes both after top-level expressions and after function definitions the user enters, since those functions might be called later.

pub fn llvm_ir_gen_driver(opt_level: OptLevel, passes: &str) {
  
  let context = inkwell::context::Context::create();

  let sesh_ctx = LLVMContext::new(&context, opt_level);
  let mut input_buf = String::new();

  loop {
      print!("Ready >> ");
      std::io::stdout().flush().unwrap();
      let _ = std::io::stdin().read_line(&mut input_buf);

      let mut tokens = input_buf.lex().peekable();

      match tokens.peek() {

        Some(_top_level_token) => match parse_top_level_expr(&mut tokens) {
                  Ok(ast) => {
                      println!("Parsed a top level expression.");
                      
                      match ast.codegen(&sesh_ctx) {
                          Ok(_ir) => {
                              sesh_ctx.run_passes(passes); // RUN PASSES
                              sesh_ctx.dump_module(); // THEN DUMP IR
                              
                              // ...

Success! It defaults to the passes we need, but you could also pass them explicitly like so, or even specify another list of passes altogether.

$ cargo r -- --passes "instcombine,reassociate,gvn,simplifycfg"
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.04s
     Running `target/debug/kaleidrs --passes instcombine,reassociate,gvn,simplifycfg`
Ready >> def test(x) (1+2+x)*(x+(1+2));
Parsed a function definition.
; ModuleID = 'kaleidrs_module'
source_filename = "kaleidrs_module"

define double @test(double %x) {
entry:
  %addtmp = fadd double %x, 3.000000e+00
  %multmp = fmul double %addtmp, %addtmp
  ret double %multmp
}
Ready >>

Even though this was made a bit easier with Inkwell, this seems like quite a bit of sweat to just optimize away a single unnecessary fadd instruction. But the result is an interface where the user can freely pass whatever LLVM passes they want to our Kaleidoscope interpreter!

JIT Compiling and Executing Code in REPL Session

At this point, compiler engineer hopefuls might be more than eager to finally execute a Kaleidoscope program. Admittedly, what good is all this IR and these optimization passes if we never actually run the damned thing? It is a landmark step, so we will tackle it next.

As previously discussed, LLVM IR is the currency, of sorts, of the entire infrastructure. With it, you have a variety of tools that let you optimize, analyze, compile, and yes, even directly execute the program on your CPU!

The way this works is via JIT, or Just-In-Time, compilation. If you are familiar with JavaScript engines like V8 in Google Chrome, with PyPy, a Python implementation that uses JIT compilation, or with Java’s JIT, you might already have some idea of this technique.

In essence, it’s compilation during runtime. In simple, statically compiled languages like C, we compile once before running the resulting executable. But what happens when we do both in the same breath, with a program that compiles its source code to machine code and runs it directly on the CPU? That is JIT compilation.

Inkwell provides wrapper mechanisms for doing just this, but the catch is that it requires unsafe Rust. I mentioned previously that Inkwell is a safe wrapper library that attempts to wrap the LLVM library in Rust’s safety guarantees. This situation, however, is not due to any limitation of Rust or failure of the library. Compiling and executing a non-Rust program is simply beyond the Rust compiler’s jurisdiction: there is no way for it to prove that doing so is safe! It is entirely possible to JIT compile and execute malfunctioning or even malicious programs using LLVM.

So, first things first, we need to define a type for the function we will JIT compile and execute. This will be the function executed after every top-level expression we enter in our REPL. It takes no arguments and evaluates to a floating point number, as that is the language’s sole type.

// in llvm_backend.rs
// This is the signature of the function that we call line-by-line in REPL
type TopLevelSignature = unsafe extern "C" fn() -> f64;

As you can see, this is the signature of a foreign C function, which carries none of the safety promises Rust provides. We will ask LLVM to compile a function with this signature for us; it will hand back a function pointer of this type, which we can then call.
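To see what calling through this type looks like without LLVM in the picture, here is a minimal sketch in which an ordinary Rust function compiled with the C ABI (the hypothetical fake_anonymous_expr) stands in for JIT-compiled code. The unsafe block is where we take responsibility for the pointer actually matching the signature, which is exactly what we do with the pointer LLVM hands back.

```rust
// Same alias as in llvm_backend.rs.
type TopLevelSignature = unsafe extern "C" fn() -> f64;

// Hypothetical stand-in for the machine code LLVM would JIT for `2 + 2;`:
// an ordinary Rust function compiled with the C calling convention.
extern "C" fn fake_anonymous_expr() -> f64 {
    2.0 + 2.0
}

fn call_top_level(f: TopLevelSignature) -> f64 {
    // SAFETY: the caller promises `f` really points to a function with this
    // exact signature; the compiler cannot check that for JIT-compiled code.
    unsafe { f() }
}

fn main() {
    // A safe `extern "C" fn` coerces to the `unsafe extern "C" fn` pointer type.
    let f: TopLevelSignature = fake_anonymous_expr;
    println!("Evaluated to: {}", call_top_level(f)); // Evaluated to: 4
}
```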

Next, we need some LLVM resources, particularly an execution engine (Inkwell’s ExecutionEngine type). Inkwell provides a module method, create_jit_execution_engine, that creates such an object. With it, we can look up our top-level anonymous function, “__anonymous_expr”, which resides in the module, have it JIT compiled into a function pointer of that type, and then call it within our REPL. (Note that I am not sure how best to handle a failure to JIT compile, so I just panic in that case; I got a bit lazy here in the end.)

    
impl<'ctx> LLVMContext<'ctx> {
    
    //...
     
    // JIT evaluation: creates an ExecutionEngine object,
    // JIT compiles the function, then attempts to call it,
    // returning the resulting floating point value.
    pub unsafe fn jit_eval(&self) -> Result<f64, BackendError> {
        let exec_engine = self
            .module
            .create_jit_execution_engine(OptimizationLevel::None)
            .expect("FATAL: Failed to create JIT execution engine!");

        let jitted_fn: JitFunction<'ctx, TopLevelSignature> = exec_engine
            .get_function("__anonymous_expr")
            .expect("FATAL: symbol '__anonymous_expr' not present in module!");

        let res = jitted_fn.call();

        // Detach the module so the next call can attach it to a new engine
        exec_engine.remove_module(&self.module).unwrap();

        Ok(res)
    }
}

We can then call this function every time the user enters a top level expression in the REPL, like 2+2. We will skip function definitions, as they only add a function to the module.

// In repl.rs, inside the REPL loop of llvm_ir_gen_driver

  Some(_top_level_token) => match parse_top_level_expr(&mut tokens) {
      Ok(ast) => {
          match ast.codegen(&sesh_ctx) {
              Ok(_ir) => {
                  // Run our passes first, then dump the optimized IR
                  sesh_ctx.run_passes(passes);
                  sesh_ctx.dump_module();
                  
                  // Allow ourselves to execute a function with no
                  // Rust safety promises, JIT compiled by LLVM
                  unsafe {
                      let res = sesh_ctx
                          .jit_eval()
                          .expect("Failed to JIT compile");

                      println!("Jit compiled and evaluated to: {res}");
                  }
              }
              Err(e) => eprintln!("Backend error: {}", e),
          }

And that's really it: we are now JIT compiling and executing LLVM IR directly on our CPU, line by line, in our REPL.

kaleidrs$ cargo run --
   Compiling kaleidrs v0.1.0 (/home/jdorman/projects/langs-test/kaleidrs)
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 3.83s
     Running `target/debug/kaleidrs`
Ready >> 2 + 2;
Parsed a top level expression.
; ModuleID = 'kaleidrs_module'
source_filename = "kaleidrs_module"

define double @__anonymous_expr() {
entry:
  ret double 4.000000e+00
}
Jit compiled and evaluated to: 4

Ready >> def square(x) x*x;
Parsed a function definition.
; ModuleID = 'kaleidrs_module'
source_filename = "kaleidrs_module"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"

define double @square(double %x) {
entry:
  %multmp = fmul double %x, %x
  ret double %multmp
}

Ready >> square(7);
Parsed a top level expression.
; ModuleID = 'kaleidrs_module'
source_filename = "kaleidrs_module"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"

define double @square(double %x) {
entry:
  %multmp = fmul double %x, %x
  ret double %multmp
}

define double @__anonymous_expr() {
entry:
  %calltmp = call double @square(double 7.000000e+00)
  ret double %calltmp
}
Jit compiled and evaluated to: 49
Ready >>

Externing Standard Lib, Adding I/O

Remember the extern keyword? It matters because it declares function symbols that are defined outside our language, something whose importance only becomes apparent once we actually run our code.

The LLVM tutorial demonstrates this by using the keyword to declare the sin and cos functions:

ready> extern sin(x);
Read extern:
declare double @sin(double)

ready> extern cos(x);
Read extern:
declare double @cos(double)

ready> sin(1.0);
Read top-level expression:
define double @2() {
entry:
  ret double 0x3FEAED548F090CEE
}

Evaluated to 0.841471

One might wonder where these sin and cos functions come from. We never give them a body ourselves; we simply declare that they exist in our LLVM module. In reality, they come from the C standard library's math functions, declared in math.h. The original tutorial explains it best:

…how does the JIT know about sin and cos? The answer is surprisingly simple: The KaleidoscopeJIT has a straightforward symbol resolution rule that it uses to find symbols that aren’t available in any given module: First it searches all the modules that have already been added to the JIT, from the most recent to the oldest, to find the newest definition. If no definition is found inside the JIT, it falls back to calling “dlsym("sin")” on the Kaleidoscope process itself. Since “sin” is defined within the JIT’s address space, it simply patches up calls in the module to call the libm version of sin directly. But in some cases this even goes further: as sin and cos are names of standard math functions, the constant folder will directly evaluate the function calls to the correct result when called with constants like in the “sin(1.0)” above.

Author(s) of “My First Language Frontend with LLVM Tutorial” on llvm.org

C linkage is what makes these standard library functions available to us. As long as a function both accepts and returns double-precision floating point values, we can freely declare it here. Our language only works with doubles, so declaring a standard library function that uses other types would not match its real signature; nothing checks the declaration, so calling it would silently produce garbage or worse.
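The same idea can be seen from the Rust side: Kaleidoscope's `extern sin(x);` corresponds to an `extern "C"` declaration with one f64 in and one f64 out, resolved here by the linker against the C math library rather than by a JIT. A minimal sketch, assuming the C math library is linked (Rust's standard library normally takes care of that on Linux and macOS); `c_sin` and `c_cos` are hypothetical wrapper names:

```rust
// The Rust equivalent of Kaleidoscope's `extern sin(x);` and `extern cos(x);`:
// one double in, one double out, resolved against the C math library.
extern "C" {
    fn sin(x: f64) -> f64;
    fn cos(x: f64) -> f64;
}

/// Safe wrapper: sound only because the declaration matches C's `double sin(double)`.
pub fn c_sin(x: f64) -> f64 {
    unsafe { sin(x) }
}

/// Safe wrapper: sound only because the declaration matches C's `double cos(double)`.
pub fn c_cos(x: f64) -> f64 {
    unsafe { cos(x) }
}
```

Declaring something like C's `abs` (which takes and returns an int) with f64 types here would still compile, which is exactly the kind of silent mismatch described above.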

This raises the question: what else, apart from the C standard library, can be externed?

The original LLVM tutorial touches on this in the same part. To illustrate more extern examples, it introduces a couple of small I/O functions written in C++ that add symbols for the JIT to resolve. They live alongside the rest of the interpreter's code, but the difference is that the interpreter itself never calls them; only the code the user JIT compiles does. Both functions accept a double and return 0.0, making them fully compatible with Kaleidoscope.


#include <cstdio>

// "Library" functions that can be "extern'd" from user code.
#ifdef _WIN32
#define DLLEXPORT __declspec(dllexport)
#else
#define DLLEXPORT
#endif

/// putchard - putchar that takes a double as ASCII code, prints, and 
/// returns 0.
extern "C" DLLEXPORT double putchard(double X) {
  fputc((char)X, stderr);
  return 0;
}

/// printd - printf that takes a double prints it as "%f\n", returning 0.
extern "C" DLLEXPORT double printd(double X) {
  fprintf(stderr, "%f\n", X);
  return 0;
}

Ready >> extern putchard(ascii);
LLVM IR Representation:
; ModuleID = 'kaleidrs_module'
source_filename = "kaleidrs_module"

declare double @putchard(double)

Ready >> putchard(97); # 97 ascii code for "a"
LLVM IR Representation:
; ModuleID = 'kaleidrs_module'
source_filename = "kaleidrs_module"

declare double @putchard(double)

define double @__anonymous_expr() {
entry:
  %calltmp = call double @putchard(double 9.700000e+01)
  ret double %calltmp
}

a # prints here
Jit compiled and evaluated to: 0
Ready >>

This only works because the tutorial provides a custom JIT configuration that looks up symbols within the interpreter process itself to resolve these external I/O functions. So our problem becomes: can we do the same from a non-C++ host language with Inkwell's default JIT execution engine? The common ground for this kind of interop is C, not Rust, so these functions will at least need to follow C conventions, which here means compiling some C code. Once compiled, they also need to be discoverable by the user's JIT'ed code.

Enter Cargo build scripts. These scripts are common in Rust crates that work closely with C/C++ codebases. The script lives outside the src/ directory, at the top level of the crate, as build.rs. If present, Cargo compiles and runs it before building the crate itself, so it can do whatever extra build work the developer requires. The idea is not only to invoke the Rust compiler to compile our binary crate, which produces the executable we've been running thus far, but also to compile these C functions into their own standalone shared library, then link it with the project and ensure the functions are discoverable from within our final compiler/interpreter binary.

First, I copied the code for putchard and printd above into a C file located at src/clib/io.c. (Since that file is compiled as C, the C++-only extern "C" specifier has to be dropped, and the header to include is <stdio.h>.) Then, I wrote the following build.rs script:

use std::env;
use std::process::Command;

fn main() {

    let out_dir = env::var("OUT_DIR").unwrap();

    // Only re-run this script when the C source changes
    println!("cargo:rerun-if-changed=src/clib/io.c");

    // Try to execute this series of command line arguments that
    // invokes clang to compile the I/O C code to a shared lib
    // (-fPIC: shared libraries should be position-independent code)
    let status = Command::new("clang")
        .args(["-shared", "-fPIC", "src/clib/io.c"])
        .arg("-o")
        .arg(format!("{}/libio.so", out_dir))
        .status()
        .expect("Failed to invoke C compiler and build shared library for external C functions!");

    if !status.success() {
        panic!("Compilation of C add-on libraries failed!");
    }

    // With the shared library in hand as libio.so, make sure rustc
    // links it so it's discoverable; these print statements
    // actually configure rustc rather than just print
    println!("cargo:rustc-link-search=native={out_dir}");
    println!("cargo:rustc-link-lib=dylib=io");
}

This script a) compiles putchard and printd, found in src/clib/io.c, into a shared library, then b) tells rustc to link against that library when building the Rust portion of the project. The result is a pair of C functions we can declare and call from our Rust code with a few lines:

// in main.rs
extern "C" {
    fn putchard(ascii_code: f64) -> f64;
    fn printd(float_value: f64) -> f64;
}

fn main() { /*...*/ }

However, when I first tried to extern these functions during a REPL session, it still segfaulted: the JIT could not find them, even though I had compiled and linked them with the final executable. After a bit of research, it seems they were being thrown out because they were never used or called from my Rust code (most likely by the linker, which rustc runs with --as-needed on Linux, dropping the seemingly unneeded libio.so dependency). The toolchain is happy to discard anything unused, so to avoid this, I needed to use these functions in some small way in the interpreter/compiler code itself. I found that collecting their pointers into an array at the beginning of main, and then never touching it again, was enough to keep them discoverable by the JIT.

fn main() {
    let _externs: &[*const extern "C" fn(f64) -> f64] = &[
        putchard as _,
        printd as _,
    ];

    let cli = cli::Cli::parse();
    
    //...

With this in place, the JIT could resolve these C functions, and Kaleidoscope code running in the Rust version of the interpreter could call them just as in the C++ version.
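An alternative worth knowing about, though not what this project does: Rust can define C-ABI functions itself, so the two helpers could be written in Rust with `#[no_mangle]`, and a `#[used]` static holding their pointers can play the role of the unused array in main, with no C build step at all. This is an untested sketch for this project. In particular, whether the JIT can find symbols defined in the executable itself depends on the platform and linker settings (on Linux they may need to be exported to the dynamic symbol table, e.g. with -rdynamic), so treat it as a starting point rather than a drop-in replacement. `EXTERNAL_FNS` is a hypothetical name.

```rust
use std::io::Write;

/// putchard: treats the double as an ASCII code, writes that byte to stderr, returns 0.
#[no_mangle]
pub extern "C" fn putchard(x: f64) -> f64 {
    // `as u8` saturates out-of-range values (e.g. 300.0 becomes 255)
    // instead of invoking undefined behavior like C's (char) cast can.
    let _ = std::io::stderr().write_all(&[x as u8]);
    0.0
}

/// printd: prints the double with six decimals (like "%f\n") to stderr, returns 0.
#[no_mangle]
pub extern "C" fn printd(x: f64) -> f64 {
    eprintln!("{x:.6}");
    0.0
}

/// `#[used]` asks the compiler to keep this static, and the function
/// pointers it holds, in the output even though nothing reads it.
#[used]
static EXTERNAL_FNS: [extern "C" fn(f64) -> f64; 2] = [putchard, printd];
```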

Compiling to a Native Object/Assembly File

Languages nowadays generally fall into two broad categories: interpreted and compiled.

In an interpreted language, we often have access to a REPL, such as the one we just developed, where a user can enter a program line by line and observe how the language operates. This has been a large driving force behind languages like JavaScript, Python, and Ruby, where the REPL, on top of high-level syntax, makes these languages highly accessible to anyone with fingers and eyes.

For this project, the JIT-interpreted REPL sessions were just as important during development, allowing us to experiment with our parser and LLVM code generation in real time, line by line.

However, while the tutorial leans heavily on JIT compiling code with LLVM, it is just as possible to use AOT (ahead-of-time) compilation to generate object and assembly files with LLVM. After all, LLVM's flagship frontend, Clang, is a compiler for one of the most important AOT-compiled languages (C, of course).

Most language implementations sit in either the interpreted or the compiled camp; comparatively few do both. Our simple language, however, can certainly be compiled to an object as well.

My idea mirrors how Python behaves at the command line. When you pass no positional file argument to python, it starts a REPL session. When you pass a file, Python knows you want it to interpret that file instead. I want to do something similar: with no positional argument, start a REPL session; given a positional file, compile it to an object.

Let’s start by adding such functionality to our CLI.

// in cli.rs

use clap::Parser;
use std::path::PathBuf;

#[derive(Parser)]
#[command(version, about, long_about = None)]
pub struct Cli {
    /// A positional file containing Kaleidoscope code to compile to object/assembly; if not given, starts the interpreter instead
    pub file: Option<PathBuf>,

    /// When compiling a file, specifies an output file to write to
    #[arg(short, long, default_value = "a.out")]
    pub output: PathBuf,

    /// When compiling a file, specifies the output should be assembly instead of an object file
    #[arg(short = 'S', long = "assembly")]
    pub asm_p: bool,

    // ... (other options such as target, opt_level, and passes)
}

pub fn compile_src<'src>(src_code: &'src str, cli: &Cli) 
    -> Result<(), Box<dyn Error + 'src>> 
{
    let ctx = inkwell::context::Context::create();
    let llvm_ctx = LLVMContext::new(&ctx, cli);

    let mut tokens = src_code.lex().peekable();

    while let Some(token) = tokens.peek() {
        match token {
            Token::Extern => match parse_extern(&mut tokens) {
                Ok(ast) => {
                    ast.codegen(&llvm_ctx)?;
                }
                Err(e) => eprintln!("Error: {}", e),
            },

            Token::FuncDef => match parse_definition(&mut tokens) {
                Ok(ast) => {
                    ast.codegen(&llvm_ctx)?;
                }
                Err(e) => eprintln!("Error: {}", e),
            },

            // Eat semicolons and move on
            Token::Semicolon => {
                tokens.next();
            }

            _top_level_expr => match parse_top_level_expr(&mut tokens) {
                Ok(ast) => {
                    ast.codegen(&llvm_ctx)?;
                }
                Err(e) => eprintln!("Error: {}", e),
            },
        }
    }

    // Run the optimization passes on IR in module, output to object/assembly file
    llvm_ctx.run_passes(&cli.passes);

    if cli.asm_p {
        llvm_ctx.compile(&cli.output.as_path(), FileType::Assembly);
    } else {
        llvm_ctx.compile(&cli.output.as_path(), FileType::Object);
    }

    Ok(())
}

When it comes to code generation, compile_src is very similar to the REPL loop, except that it parses the whole file into parse trees, calls codegen on each tree to generate IR into the module, and finally runs the passes and emits either an object or an assembly file, depending on whether the user passed -S to the CLI.
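The code that chooses between the REPL and compile_src isn't shown here. As a minimal std-only sketch of that decision, assuming only the file, output, and asm_p fields of the Cli struct above, a hypothetical `Mode` enum and `select_mode` function could capture the Python-like rule:

```rust
use std::path::PathBuf;

/// Hypothetical: what the driver should do, derived from the parsed CLI fields.
#[derive(Debug, PartialEq)]
pub enum Mode {
    /// No positional file: start an interactive REPL session.
    Repl,
    /// A file was given: compile it, emitting assembly if `emit_asm` is set.
    Compile { input: PathBuf, output: PathBuf, emit_asm: bool },
}

/// No positional argument means REPL; otherwise compile the given file.
pub fn select_mode(file: Option<PathBuf>, output: PathBuf, asm_p: bool) -> Mode {
    match file {
        None => Mode::Repl,
        Some(input) => Mode::Compile { input, output, emit_asm: asm_p },
    }
}
```

Keeping the decision in a small pure function like this makes the REPL/compile split easy to unit test without touching LLVM.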

Let's take this out for a spin, shall we? I have a test.ks file that contains a simple program for computing the tenth Fibonacci number. I compile it to an object, then use llvm-objdump from the LLVM toolchain to disassemble it. Here we can see the resulting x86_64 machine code.

kaleidrs$ cat test.ks 
def fibonacci(n)
    if n < 3 then 
        1 
    else 
        fibonacci(n-1) + fibonacci(n-2)
;

fibonacci(10);
kaleidrs$ cargo r -- test.ks -o test.o
   Compiling kaleidrs v0.1.0 (/home/jdorman/projects/langs-test/kaleidrs)
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 4.47s
     Running `target/debug/kaleidrs test.ks -o test.o`
kaleidrs$ llvm-objdump-17 -d test.o

test.o: file format elf64-x86-64

Disassembly of section .text:

0000000000000000 <fibonacci>:
       0: 66 0f 28 c8                   movapd  %xmm0, %xmm1
       4: f2 0f 10 05 00 00 00 00       movsd   (%rip), %xmm0           # xmm0 = mem[0],zero
                                                                        # 0xc <fibonacci+0xc>
       c: 66 0f 2e c1                   ucomisd %xmm1, %xmm0
      10: 76 09                         jbe     0x1b <fibonacci+0x1b>
      12: f2 0f 10 05 00 00 00 00       movsd   (%rip), %xmm0           # xmm0 = mem[0],zero
                                                                        # 0x1a <fibonacci+0x1a>
      1a: c3                            retq
      1b: 48 83 ec 18                   subq    $0x18, %rsp
      1f: f2 0f 10 05 00 00 00 00       movsd   (%rip), %xmm0           # xmm0 = mem[0],zero
                                                                        # 0x27 <fibonacci+0x27>
      27: f2 0f 58 c1                   addsd   %xmm1, %xmm0
      2b: f2 0f 11 4c 24 08             movsd   %xmm1, 0x8(%rsp)
      31: e8 00 00 00 00                callq   0x36 <fibonacci+0x36>
      36: f2 0f 11 44 24 10             movsd   %xmm0, 0x10(%rsp)
      3c: f2 0f 10 44 24 08             movsd   0x8(%rsp), %xmm0        # xmm0 = mem[0],zero
      42: f2 0f 58 05 00 00 00 00       addsd   (%rip), %xmm0           # 0x4a <fibonacci+0x4a>
      4a: e8 00 00 00 00                callq   0x4f <fibonacci+0x4f>
      4f: f2 0f 58 44 24 10             addsd   0x10(%rsp), %xmm0
      55: 48 83 c4 18                   addq    $0x18, %rsp
      59: c3                            retq
      5a: 66 0f 1f 44 00 00             nopw    (%rax,%rax)

0000000000000060 <__anonymous_expr>:
      60: 50                            pushq   %rax
      61: f2 0f 10 05 00 00 00 00       movsd   (%rip), %xmm0           # xmm0 = mem[0],zero
                                                                        # 0x69 <__anonymous_expr+0x9>
      69: e8 00 00 00 00                callq   0x6e <__anonymous_expr+0xe>
      6e: 58                            popq    %rax
      6f: c3                            retq
kaleidrs$

The object alone will not run as an executable, of course, but it could be linked together with a small C program that calls into it, or against your native platform's runtime. That is sadly beyond the scope of this blog post, but it is encouraged all the same.
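Since the object can't be run on its own here, a plain Rust transliteration of test.ks is a quick way to see what __anonymous_expr should return once linked. This is a hand-written sketch, not output of the compiler; because Kaleidoscope only has doubles, everything is f64, just as the movsd/addsd instructions above operate on doubles.

```rust
/// Direct transliteration of `fibonacci` from test.ks (all values are doubles).
pub fn fibonacci(n: f64) -> f64 {
    if n < 3.0 {
        1.0
    } else {
        fibonacci(n - 1.0) + fibonacci(n - 2.0)
    }
}

/// Equivalent of the top-level `fibonacci(10);` expression (`__anonymous_expr`).
pub fn anonymous_expr() -> f64 {
    fibonacci(10.0)
}
```

So calling __anonymous_expr from the linked object should yield 55.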

Realizing Every Language with an LLVM Backend Can Cross Compile

Remember how IR is an architecture-agnostic representation of a program that can be compiled to a variety of LLVM-supported targets? The object we compiled above targets our native platform, x86_64 ELF (elf64-x86-64), as the object dump showed. How easy might it be to cross compile for ARM mobile devices? For the Motorola 68k? For RISC-V? For WebAssembly?

Again, this is not hard; it is just a few API calls away using Inkwell. We can look up an LLVM Target for a specific architecture, create a TargetMachine from it, and then compile our module with that machine as the cross-compilation target. Let's go back to the LLVMContext object we made in llvm_backend.rs.

impl<'ctx> LLVMContext<'ctx> {
    pub fn new(context: &'ctx Context, cli_args: &Cli) -> Self {
        let builder = context.create_builder();
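        let module = context.create_module("kaleidrs_module");

        // Note: Target::from_triple can only find targets whose LLVM
        // backends have already been initialized (e.g. through
        // Target::initialize_all), presumably done before this point.
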
        // If the user supplied a specific target, attempt to create
        // it from a string, otherwise, get_default_triple, which is
        // just our native platform, no cross-compilation in this case
        let triple = match cli_args.target.as_ref() {
            None => TargetMachine::get_default_triple(),
            Some(target_str) => TargetTriple::create(target_str.as_str()),
        };

        let target = Target::from_triple(&triple)
            .expect("Unknown target: please specify a valid LLVM target");

        let machine = target
            .create_target_machine(
                &triple,
                "generic",
                "",
                cli_args.opt_level.into(),
                RelocMode::Default,
                CodeModel::Default,
            )
            .unwrap();

        Self {
            context,
            builder,
            module,
            machine,
            sym_table: RefCell::new(HashMap::new()),
        }
    }

Inkwell provides a convenient way to create a specific target object by passing a valid target triple string. What counts as a valid LLVM target triple? It follows the format <architecture>-<vendor>-<os>-<abi>. My current machine, an x86_64 Windows machine running Ubuntu under WSL, is expressed as x86_64-unknown-linux-gnu.
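To make the triple format concrete, here is a naive positional split into its four components. It is only an illustration, using a hypothetical `Triple` struct and `parse_triple` function; LLVM's own triple handling is considerably more involved and recognizes many spellings of each component.

```rust
/// Components of a target triple: <architecture>-<vendor>-<os>-<abi>.
/// Missing trailing components are `None`.
#[derive(Debug, PartialEq)]
pub struct Triple<'a> {
    pub arch: &'a str,
    pub vendor: Option<&'a str>,
    pub os: Option<&'a str>,
    pub abi: Option<&'a str>,
}

/// Naive positional split on '-'; anything past the third dash stays in `abi`.
pub fn parse_triple(s: &str) -> Triple<'_> {
    let mut parts = s.splitn(4, '-');
    Triple {
        arch: parts.next().unwrap_or(""),
        vendor: parts.next(),
        os: parts.next(),
        abi: parts.next(),
    }
}
```

With this split, a bare architecture string such as armv7a or riscv64 yields just an architecture, leaving vendor, OS, and ABI unspecified.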

However, the <architecture> name alone is often enough. llc, the command line utility that compiles LLVM IR to machine code, lists the targets this LLVM toolchain can cross-compile for when run as llc --version. Most of the names listed can be passed directly as the --target argument. x86-64 is the one to watch: because triples are split on dashes, it would be read as x86 with a vendor of 64, so spell it x86_64 instead.

~$ llc-17 --version
Ubuntu LLVM version 17.0.6
  Optimized build.
  Default target: x86_64-pc-linux-gnu
  Host CPU: alderlake

  Registered Targets:
    aarch64     - AArch64 (little endian)
    aarch64_32  - AArch64 (little endian ILP32)
    aarch64_be  - AArch64 (big endian)
    amdgcn      - AMD GCN GPUs
    arm         - ARM
    arm64       - ARM64 (little endian)
    arm64_32    - ARM64 (little endian ILP32)
    armeb       - ARM (big endian)
    avr         - Atmel AVR Microcontroller
    bpf         - BPF (host endian)
    bpfeb       - BPF (big endian)
    bpfel       - BPF (little endian)
    hexagon     - Hexagon
    lanai       - Lanai
    loongarch32 - 32-bit LoongArch
    loongarch64 - 64-bit LoongArch
    m68k        - Motorola 68000 family
    mips        - MIPS (32-bit big endian)
    mips64      - MIPS (64-bit big endian)
    mips64el    - MIPS (64-bit little endian)
    mipsel      - MIPS (32-bit little endian)
    msp430      - MSP430 [experimental]
    nvptx       - NVIDIA PTX 32-bit
    nvptx64     - NVIDIA PTX 64-bit
    ppc32       - PowerPC 32
    ppc32le     - PowerPC 32 LE
    ppc64       - PowerPC 64
    ppc64le     - PowerPC 64 LE
    r600        - AMD GPUs HD2XXX-HD6XXX
    riscv32     - 32-bit RISC-V
    riscv64     - 64-bit RISC-V
    sparc       - Sparc
    sparcel     - Sparc LE
    sparcv9     - Sparc V9
    systemz     - SystemZ
    thumb       - Thumb
    thumbeb     - Thumb (big endian)
    ve          - VE
    wasm32      - WebAssembly 32-bit
    wasm64      - WebAssembly 64-bit
    x86         - 32-bit X86: Pentium-Pro and above
    x86-64      - 64-bit X86: EM64T and AMD64
    xcore       - XCore
    xtensa      - Xtensa 32
~$

Yes, as long as you have IR in hand, it is possible to compile for any of the above targets! Most of the above are CPUs, some are GPUs, and others, such as wasm32 and wasm64, target virtual instruction sets rather than real metal, much like the Java Virtual Machine.

Here are some real examples of our compiler at work: cross compiling test.ks for ARMv7-A (Cortex-A class) and 64-bit RISC-V targets, using the handy -S flag to output assembly instead of an object file. I encourage readers to clone the repo if they haven't already and experiment too, since it is an extremely fun way to compare the same Kaleidoscope program compiled for architectures of all varieties!

kaleidrs$ cargo run -- -S --target armv7a test.ks -o test.S
   Compiling kaleidrs v0.1.0 (/home/jdorman/projects/langs-test/kaleidrs)
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 3.28s
     Running `target/debug/kaleidrs -S --target armv7a test.ks -o test.S`
kaleidrs$ cat test.S
        // ...
fibonacci:
        .fnstart
        push    {r11, lr}
        vpush   {d8}
        vmov.f64        d16, #3.000000e+00
        vmov    d8, r0, r1
        vcmp.f64        d8, d16
        vmrs    APSR_nzcv, fpscr
        bpl     .LBB0_2
        vmov.f64        d16, #1.000000e+00
        b       .LBB0_3
.LBB0_2:
        vmov.f64        d16, #-1.000000e+00
        vadd.f64        d16, d8, d16
        vmov    r0, r1, d16
        bl      fibonacci
        vmov.f64        d16, #-2.000000e+00
        vadd.f64        d16, d8, d16
        vmov    r2, r3, d16
        vmov    d8, r0, r1
        mov     r0, r2
        mov     r1, r3
        bl      fibonacci
        vmov    d16, r0, r1
        vadd.f64        d16, d8, d16
.LBB0_3:
        vmov    r0, r1, d16
        vpop    {d8}
        pop     {r11, pc}
.Lfunc_end0:
        .size   fibonacci, .Lfunc_end0-fibonacci
        .fnend

        .globl  __anonymous_expr
        .p2align        2
        .type   __anonymous_expr,%function
        .code   32
__anonymous_expr:
        .fnstart
        push    {r11, lr}
        vmov.f64        d16, #1.000000e+01
        vmov    r0, r1, d16
        bl      fibonacci
        pop     {r11, pc}
.Lfunc_end1:
        .size   __anonymous_expr, .Lfunc_end1-__anonymous_expr
        .fnend

        .section        ".note.GNU-stack","",%progbits
        
kaleidrs$ cargo run -- -S --target riscv64 test.ks -o test.S
   Compiling kaleidrs v0.1.0 (/home/jdorman/projects/langs-test/kaleidrs)
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 1.68s
     Running `target/debug/kaleidrs -S --target riscv64 test.ks -o test.S`
kaleidrs$ cat test.S
        .text
        //...
fibonacci:
        .cfi_startproc
        addi    sp, sp, -32
        .cfi_def_cfa_offset 32
        sd      ra, 24(sp)
        sd      s0, 16(sp)
        sd      s1, 8(sp)
        .cfi_offset ra, -8
        .cfi_offset s0, -16
        .cfi_offset s1, -24
        mv      s0, a0
        lui     a1, 2049
        slli    a1, a1, 39
        call    __ltdf2@plt
        bgez    a0, .LBB0_2
        li      a0, 1023
        slli    a0, a0, 52
        j       .LBB0_3
.LBB0_2:
        li      a1, -1025
        slli    a1, a1, 52
        mv      a0, s0
        call    __adddf3@plt
        call    fibonacci@plt
        mv      s1, a0
        li      a1, -1
        slli    a1, a1, 62
        mv      a0, s0
        call    __adddf3@plt
        call    fibonacci@plt
        mv      a1, a0
        mv      a0, s1
        call    __adddf3@plt
.LBB0_3:
        ld      ra, 24(sp)
        ld      s0, 16(sp)
        ld      s1, 8(sp)
        addi    sp, sp, 32
        ret
.Lfunc_end0:
        .size   fibonacci, .Lfunc_end0-fibonacci
        .cfi_endproc

        .globl  __anonymous_expr
        .p2align        2
        .type   __anonymous_expr,@function
__anonymous_expr:
        .cfi_startproc
        addi    sp, sp, -16
        .cfi_def_cfa_offset 16
        sd      ra, 8(sp)
        .cfi_offset ra, -8
        lui     a0, 4105
        slli    a0, a0, 38
        call    fibonacci@plt
        ld      ra, 8(sp)
        addi    sp, sp, 16
        ret
.Lfunc_end1:
        .size   __anonymous_expr, .Lfunc_end1-__anonymous_expr
        .cfi_endproc

        .section        ".note.GNU-stack","",@progbits
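Two details in the RISC-V listing are worth decoding. The calls to __ltdf2 and __adddf3 are soft-float helper routines, which indicates that no hardware floating point extension was enabled for this generic riscv64 target, so even comparing and adding doubles goes through library calls. And since that calling convention passes doubles in integer registers (a0, a1, ...), the constants are built as raw IEEE 754 bit patterns with li/lui followed by slli. The std-only sketch below replays that integer arithmetic and reinterprets the bits with f64::from_bits; the helper names `lui`, `li`, `slli`, and `as_f64` are hypothetical and only model the instruction forms that appear in the listing.

```rust
/// Models RV64 `lui rd, imm`: place the 20-bit immediate in bits 31..12,
/// then sign-extend the 32-bit result to 64 bits.
fn lui(imm20: i64) -> i64 {
    ((imm20 << 12) as i32) as i64
}

/// Models `li rd, imm` for a small immediate: the register simply holds the value.
fn li(imm: i64) -> i64 {
    imm
}

/// Models `slli rd, rs, shamt`: shift left; bits pushed past bit 63 are lost.
fn slli(rs: i64, shamt: u32) -> i64 {
    ((rs as u64) << shamt) as i64
}

/// Reinterprets a register's 64 bits as an IEEE 754 double.
fn as_f64(reg: i64) -> f64 {
    f64::from_bits(reg as u64)
}
```

Feeding in the operands from the listing gives 3.0 (`lui 2049` then `slli 39`), 1.0 (`li 1023` then `slli 52`), -1.0 (`li -1025` then `slli 52`), -2.0 (`li -1` then `slli 62`), and 10.0 (`lui 4105` then `slli 38`): the 3 in n < 3, the base case's 1, the -1 and -2 that LLVM adds to implement n-1 and n-2, and the 10 passed in by __anonymous_expr.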

An MVP Compiler/Interpreter In Hand

There you have it! We now have both a Kaleidoscope interpreter capable of JIT compiling and executing LLVM IR directly on our native machine's CPU, and a capable cross compiler that outputs either object or assembly files. We also have a professional-looking command line interface to boot, where we can pass targets, optimization levels, and LLVM optimization pass names to run, among other things! The only thing we truly lack is a way to link these objects into an executable or library, but beyond that, we can consider this an MVP compiler.

This blog post sums up part 4 of the official LLVM Kaleidoscope tutorial. The next post will focus on extending the language with control flow, mutability, and user-defined operators, which correspond with parts 5-7 of the original tutorial. Implementing more nuanced language features will require exploring more of LLVM IR.