AST Traversal Patterns

In the previous section, we handcrafted every LLVM instruction for a simple add function. That worked, but it was tedious - imagine doing that for every possible expression! We need a systematic way to turn any AST into LLVM IR.

The answer is recursive tree traversal. We already do this in the interpreter - walk the tree, evaluate each node. For code generation, we walk the tree and emit instructions for each node instead of computing values.

Two common patterns help structure this:

  • Builder pattern - Used here for LLVM IR generation
  • Visitor pattern - Introduced in Secondlang Optimizations for AST transformations

Builder Pattern

Think of the LLVM builder like a cursor in a text editor. You position it somewhere in your code, then “type” instructions at that position. The builder keeps track of where you are and ensures instructions are added in the right place.
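To make the cursor analogy concrete, here is a minimal, self-contained sketch. It is hedged: it assumes the same inkwell crate and API used by the JIT code later in this section, and the module name "cursor_demo" and function name "demo" are placeholders.

use inkwell::context::Context;

fn main() {
    let context = Context::create();
    let module = context.create_module("cursor_demo");
    let builder = context.create_builder();

    // Create an empty function `demo: () -> i32` with an entry block.
    let i32_type = context.i32_type();
    let fn_type = i32_type.fn_type(&[], false);
    let function = module.add_function("demo", fn_type, None);
    let entry = context.append_basic_block(function, "entry");

    // Position the "cursor" at the end of the entry block...
    builder.position_at_end(entry);

    // ...then "type" instructions at that position, in order.
    let lhs = i32_type.const_int(1, false);
    let rhs = i32_type.const_int(2, false);
    let sum = builder.build_int_add(lhs, rhs, "sum").unwrap();
    let _ = builder.build_return(Some(&sum));

    println!("{}", module.print_to_string().to_string());
}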

Let’s compare our interpreter’s recursive evaluation to the new JIT approach.

Interpreter: Walk tree, compute values

struct Eval;

impl Eval {
    pub fn new() -> Self {
        Self
    }
    pub fn eval(&self, node: &Node) -> i32 {
        match node {
            Node::Int(n) => *n,
            Node::UnaryExpr { op, child } => {
                let child = self.eval(child);
                match op {
                    Operator::Plus => child,
                    Operator::Minus => -child,
                }
            }
            Node::BinaryExpr { op, lhs, rhs } => {
                let lhs_ret = self.eval(lhs);
                let rhs_ret = self.eval(rhs);

                match op {
                    Operator::Plus => lhs_ret + rhs_ret,
                    Operator::Minus => lhs_ret - rhs_ret,
                }
            }
        }
    }
}

calculator/src/compiler/interpreter.rs

The interpreter looks at each node and returns a computed integer. For Int(5), it returns 5. For BinaryExpr { op: Plus, lhs: Int(1), rhs: Int(2) }, it recursively evaluates both sides and adds them.
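As a quick sanity check, evaluating a hand-built tree looks like this. This is a hedged sketch: it assumes the child nodes are boxed, which is what the recursive calls above suggest.

// Evaluate 1 + 2 without going through the parser.
let ast = Node::BinaryExpr {
    op: Operator::Plus,
    lhs: Box::new(Node::Int(1)),
    rhs: Box::new(Node::Int(2)),
};
assert_eq!(Eval::new().eval(&ast), 3);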

JIT: Walk tree, emit LLVM instructions

struct RecursiveBuilder<'a> {
    i32_type: IntType<'a>,
    builder: &'a Builder<'a>,
}

impl<'a> RecursiveBuilder<'a> {
    pub fn new(i32_type: IntType<'a>, builder: &'a Builder<'a>) -> Self {
        Self { i32_type, builder }
    }
    pub fn build(&self, ast: &Node) -> IntValue<'a> {
        match ast {
            Node::Int(n) => self.i32_type.const_int(*n as u64, true),
            Node::UnaryExpr { op, child } => {
                let child = self.build(child);
                match op {
                    Operator::Minus => child.const_neg(),
                    Operator::Plus => child,
                }
            }
            Node::BinaryExpr { op, lhs, rhs } => {
                let left = self.build(lhs);
                let right = self.build(rhs);

                match op {
                    Operator::Plus => self
                        .builder
                        .build_int_add(left, right, "plus_temp")
                        .unwrap(),
                    Operator::Minus => self
                        .builder
                        .build_int_sub(left, right, "minus_temp")
                        .unwrap(),
                }
            }
        }
    }
}

The structure is identical! We still match on node types and recurse. But instead of computing values directly, we build LLVM instructions:

  • Int(n) - Create an LLVM integer constant. Nothing is computed yet - this creates a value that will exist when the compiled code runs.

  • UnaryExpr - For negation, we recursively compile the inner expression (getting an LLVM value), then negate it with const_neg. In LLVM, -x is represented as 0 - x under the hood (see the sketch after this list).

  • BinaryExpr - Recursively compile both sides, then emit the appropriate arithmetic instruction (build_int_add, build_int_sub). The "plus_temp" and "minus_temp" strings are just names for the resulting values - they show up when you print the IR.
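For reference, here is what the negation arm could look like if it went through the builder instead of const_neg. This is a hedged sketch of an alternative, not what the code above does:

// Inside the Operator::Minus arm of UnaryExpr: emit -x explicitly as 0 - x.
// (const_neg builds a constant expression instead, which works here because
// every value in our calculator is a constant.)
Operator::Minus => {
    let zero = self.i32_type.const_zero();
    self.builder
        .build_int_sub(zero, child, "neg_temp")
        .unwrap()
}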

The key insight: RecursiveBuilder::build returns LLVM values (IntValue), not Rust integers. These values represent computation that will happen when the JIT-compiled code runs.

Putting It Together

Now we wire up the complete JIT pipeline:

// Type of the JIT-compiled entry point: no arguments, returns an i32.
type JitFunc = unsafe extern "C" fn() -> i32;

pub struct Jit;

impl Compile for Jit {
    type Output = Result<i32>;

    fn from_ast(ast: Vec<Node>) -> Self::Output {
        let context = Context::create();
        let module = context.create_module("calculator");

        let builder = context.create_builder();

        let execution_engine = module
            .create_jit_execution_engine(OptimizationLevel::None)
            .unwrap();

        let i32_type = context.i32_type();
        let fn_type = i32_type.fn_type(&[], false);

        let function = module.add_function("jit", fn_type, None);
        let basic_block = context.append_basic_block(function, "entry");

        builder.position_at_end(basic_block);

        for node in ast {
            let recursive_builder = RecursiveBuilder::new(i32_type, &builder);
            let return_value = recursive_builder.build(&node);
            let _ = builder.build_return(Some(&return_value));
        }
        println!(
            "Generated LLVM IR: {}",
            function.print_to_string().to_string()
        );

        unsafe {
            let jit_function: JitFunction<JitFunc> = execution_engine.get_function("jit").unwrap();

            Ok(jit_function.call())
        }
    }
}

calculator/src/compiler/jit.rs

Let’s trace through what happens when we JIT 1 + 2:

  1. Parse - Turn "1 + 2" into AST: BinaryExpr { op: Plus, lhs: Int(1), rhs: Int(2) }

  2. Setup LLVM - Create context, module, builder

  3. Create wrapper function - We need a function to call, so we create jit with signature () -> i32

  4. Compile AST - RecursiveBuilder::build walks the tree:

    • Compile Int(1) → creates i32 constant 1
    • Compile Int(2) → creates i32 constant 2
    • Compile Plus → emits an add i32 instruction on those constants and returns the result
  5. Return result - build_return emits a ret instruction with our computed value

  6. JIT compile - LLVM turns our IR into native machine code

  7. Execute - Call the function, get 3

Why This Matters

The recursive builder pattern scales to any expression, no matter how complex:

-((1 + 2) - (3 - 4))

This parses to a nested tree, and RecursiveBuilder::build handles each level automatically. Each call compiles its children first, then emits its own instruction using those children as operands (the sketch below shows the shape of that tree).
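The tree the parser produces for that expression looks roughly like this - a hedged sketch using the Node and Operator types from earlier, again assuming boxed children:

// -((1 + 2) - (3 - 4)): build() reaches the leaves first,
// then works its way back up, emitting one instruction per node.
let ast = Node::UnaryExpr {
    op: Operator::Minus,
    child: Box::new(Node::BinaryExpr {
        op: Operator::Minus,
        lhs: Box::new(Node::BinaryExpr {
            op: Operator::Plus,
            lhs: Box::new(Node::Int(1)),
            rhs: Box::new(Node::Int(2)),
        }),
        rhs: Box::new(Node::BinaryExpr {
            op: Operator::Minus,
            lhs: Box::new(Node::Int(3)),
            rhs: Box::new(Node::Int(4)),
        }),
    }),
};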

This is the foundation of every LLVM-based compiler. Rust, Swift, Julia - they all use this pattern (with many more node types and optimizations).

Testing

assert_eq!(Jit::from_source("1 + 2").unwrap(), 3)
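A few more cases make a reasonable smoke test. This is a hedged sketch: it assumes from_source parses the input and then calls from_ast, and that the grammar accepts parentheses and unary minus as used in this chapter.

#[test]
fn jit_arithmetic() {
    assert_eq!(Jit::from_source("1 + 2").unwrap(), 3);
    assert_eq!(Jit::from_source("-(1 + 2)").unwrap(), -3);
    assert_eq!(Jit::from_source("(1 + 2) - (3 - 4)").unwrap(), 4);
}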

Run tests locally:

cargo test jit --tests

In the next section, we’ll see an alternative approach: compiling to bytecode for a virtual machine, which trades some speed for portability.