AST Traversal Patterns
In the previous section, we handcrafted every LLVM instruction for a simple add function. That worked, but it was tedious - imagine doing that for every possible expression! We need a systematic way to turn any AST into LLVM IR.
The answer is recursive tree traversal. We already do this in the interpreter - walk the tree, evaluate each node. For code generation, we walk the tree and emit instructions for each node instead of computing values.
Two common patterns help structure this:
- Builder pattern - Used here for LLVM IR generation
- Visitor pattern - Introduced in Secondlang Optimizations for AST transformations
Builder Pattern
Think of the LLVM builder like a cursor in a text editor. You position it somewhere in your code, then “type” instructions at that position. The builder keeps track of where you are and ensures instructions are added in the right place.
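To make the cursor analogy concrete, here is a tiny standalone sketch using inkwell (the Rust LLVM bindings used throughout this chapter). The module and function names ("cursor_demo", "demo") are illustrative only and not part of the calculator crate:

```rust
use inkwell::context::Context;

// A tiny illustration of the cursor model; names like "cursor_demo" and
// "demo" are illustrative, not part of the calculator crate.
fn main() {
    let context = Context::create();
    let module = context.create_module("cursor_demo");
    let builder = context.create_builder();

    // One empty function with an entry block to write into.
    let i32_type = context.i32_type();
    let function = module.add_function("demo", i32_type.fn_type(&[], false), None);
    let entry = context.append_basic_block(function, "entry");

    // Move the cursor to the end of the entry block...
    builder.position_at_end(entry);

    // ...then "type" instructions; each build_* call appends at that position.
    let one = i32_type.const_int(1, false);
    let two = i32_type.const_int(2, false);
    let sum = builder.build_int_add(one, two, "sum").unwrap();
    let _ = builder.build_return(Some(&sum));

    println!("{}", module.print_to_string().to_string());
}
```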
Let’s compare our interpreter’s recursive evaluation to the new JIT approach.
Interpreter: Walk tree, compute values
struct Eval;

impl Eval {
    pub fn new() -> Self {
        Self
    }

    pub fn eval(&self, node: &Node) -> i32 {
        match node {
            Node::Int(n) => *n,
            Node::UnaryExpr { op, child } => {
                let child = self.eval(child);
                match op {
                    Operator::Plus => child,
                    Operator::Minus => -child,
                }
            }
            Node::BinaryExpr { op, lhs, rhs } => {
                let lhs_ret = self.eval(lhs);
                let rhs_ret = self.eval(rhs);
                match op {
                    Operator::Plus => lhs_ret + rhs_ret,
                    Operator::Minus => lhs_ret - rhs_ret,
                }
            }
        }
    }
}
calculator/src/compiler/interpreter.rs
The interpreter looks at each node and returns a computed integer. For Int(5), it returns 5. For BinaryExpr { op: Plus, lhs: Int(1), rhs: Int(2) }, it recursively evaluates both sides and adds them.
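Both the interpreter and the JIT builder below match on the same AST, whose definition lives with the parser rather than in this file. For reference, the shapes assumed in this chapter look roughly like this; field types such as Box<Node> are inferred from how the code uses them and may differ from the real definitions:

```rust
// Sketch of the AST types both backends consume; the real definitions live
// with the parser, and the exact derives and field types may differ.
#[derive(Debug, Clone, PartialEq)]
pub enum Operator {
    Plus,
    Minus,
}

#[derive(Debug, Clone, PartialEq)]
pub enum Node {
    Int(i32),
    UnaryExpr { op: Operator, child: Box<Node> },
    BinaryExpr { op: Operator, lhs: Box<Node>, rhs: Box<Node> },
}
```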
JIT: Walk tree, emit LLVM instructions
struct RecursiveBuilder<'a> {
    i32_type: IntType<'a>,
    builder: &'a Builder<'a>,
}

impl<'a> RecursiveBuilder<'a> {
    pub fn new(i32_type: IntType<'a>, builder: &'a Builder<'a>) -> Self {
        Self { i32_type, builder }
    }

    pub fn build(&self, ast: &Node) -> IntValue<'a> {
        match ast {
            Node::Int(n) => self.i32_type.const_int(*n as u64, true),
            Node::UnaryExpr { op, child } => {
                let child = self.build(child);
                match op {
                    Operator::Minus => child.const_neg(),
                    Operator::Plus => child,
                }
            }
            Node::BinaryExpr { op, lhs, rhs } => {
                let left = self.build(lhs);
                let right = self.build(rhs);
                match op {
                    Operator::Plus => self
                        .builder
                        .build_int_add(left, right, "plus_temp")
                        .unwrap(),
                    Operator::Minus => self
                        .builder
                        .build_int_sub(left, right, "minus_temp")
                        .unwrap(),
                }
            }
        }
    }
}
The structure is identical! We still match on node types and recurse. But instead of computing values directly, we build LLVM instructions:
- Int(n) - Create an LLVM integer constant. This doesn't "do" anything at compile time - it creates a value that will exist when the code runs.
- UnaryExpr - Recursively compile the inner expression to get an LLVM value, then negate it. The code above calls const_neg, which negates the child as a constant expression; more generally, LLVM represents -x as 0 - x (see the sketch after this list).
- BinaryExpr - Recursively compile both sides, then emit the appropriate arithmetic instruction (build_int_add, build_int_sub). The "plus_temp" and "minus_temp" strings are just names for debugging - they show up when you print the IR.
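The general "subtract from zero" form mentioned above would go through the builder instead of const_neg, so it also works when the operand is not a compile-time constant. A minimal sketch with the same inkwell API; the helper name and signature are ours, not part of the calculator crate:

```rust
use inkwell::builder::Builder;
use inkwell::types::IntType;
use inkwell::values::IntValue;

// General-purpose negation sketch: emit 0 - x through the builder.
// The name build_neg is illustrative; the crate does not define it.
fn build_neg<'a>(
    i32_type: IntType<'a>,
    builder: &Builder<'a>,
    value: IntValue<'a>,
) -> IntValue<'a> {
    let zero = i32_type.const_int(0, false);
    builder.build_int_sub(zero, value, "neg_temp").unwrap()
}
```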
The key insight: RecursiveBuilder::build returns LLVM values (IntValue), not Rust integers. These values represent computation that will happen when the JIT-compiled code runs.
Putting It Together
Now we wire up the complete JIT pipeline:
pub struct Jit;

impl Compile for Jit {
    type Output = Result<i32>;

    fn from_ast(ast: Vec<Node>) -> Self::Output {
        // Set up LLVM: one context, one module, one builder, and a JIT
        // execution engine with optimizations turned off.
        let context = Context::create();
        let module = context.create_module("calculator");
        let builder = context.create_builder();
        let execution_engine = module
            .create_jit_execution_engine(OptimizationLevel::None)
            .unwrap();

        // Wrapper function "jit" with signature () -> i32, plus its entry block.
        let i32_type = context.i32_type();
        let fn_type = i32_type.fn_type(&[], false);
        let function = module.add_function("jit", fn_type, None);
        let basic_block = context.append_basic_block(function, "entry");
        builder.position_at_end(basic_block);

        // Compile each top-level node and return its value.
        for node in ast {
            let recursive_builder = RecursiveBuilder::new(i32_type, &builder);
            let return_value = recursive_builder.build(&node);
            let _ = builder.build_return(Some(&return_value));
        }

        println!(
            "Generated LLVM IR: {}",
            function.print_to_string().to_string()
        );

        // Look up the JIT-compiled function and call it.
        unsafe {
            let jit_function: JitFunction<JitFunc> = execution_engine.get_function("jit").unwrap();
            Ok(jit_function.call())
        }
    }
}
calculator/src/compiler/jit.rs
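Two names in this snippet are defined elsewhere in the crate: the Compile trait (which also supplies from_source, used in the test below) and the JitFunc type alias handed to get_function. Result<i32> is likewise assumed to be the crate's result alias (for example anyhow::Result). A sketch of the assumed shapes:

```rust
// Assumed shapes of names referenced above; the real definitions live
// elsewhere in the calculator crate.

/// Signature of the JIT-compiled wrapper: no arguments, returns an i32.
type JitFunc = unsafe extern "C" fn() -> i32;

/// Stand-in for the crate's parser entry point (name and signature assumed).
fn parse(_source: &str) -> Vec<Node> {
    unimplemented!()
}

/// Common entry point for every backend; from_source is provided so each
/// backend only has to implement from_ast.
pub trait Compile {
    type Output;

    fn from_ast(ast: Vec<Node>) -> Self::Output;

    fn from_source(source: &str) -> Self::Output {
        Self::from_ast(parse(source))
    }
}
```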
Let’s trace through what happens when we JIT 1 + 2:
- Parse - Turn "1 + 2" into the AST BinaryExpr { op: Plus, lhs: Int(1), rhs: Int(2) }
- Set up LLVM - Create the context, module, and builder
- Create a wrapper function - We need a function to call, so we create jit with signature () -> i32
- Compile the AST - RecursiveBuilder::build walks the tree:
  - Compiling Int(1) creates the i32 constant 1
  - Compiling Int(2) creates the i32 constant 2
  - Compiling the Plus node emits an add instruction with those two constants as operands and returns the result
- Return the result - build_return emits a ret instruction with that value
- JIT compile - LLVM turns our IR into native machine code
- Execute - Call the function and get 3
Why This Matters
The recursive builder pattern scales to any expression, no matter how complex:
-((1 + 2) - (3 - 4))
This parses to a nested tree (sketched below), and RecursiveBuilder::build handles each level automatically. Each call compiles its children first, then emits its own instruction using those children as operands.
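Written out by hand, and assuming the Box<Node> field types sketched earlier, that expression corresponds to a tree like this:

```rust
// Hand-written AST for -((1 + 2) - (3 - 4)); field types follow the
// earlier sketch and are an assumption, not the crate's exact definition.
let ast = Node::UnaryExpr {
    op: Operator::Minus,
    child: Box::new(Node::BinaryExpr {
        op: Operator::Minus,
        lhs: Box::new(Node::BinaryExpr {
            op: Operator::Plus,
            lhs: Box::new(Node::Int(1)),
            rhs: Box::new(Node::Int(2)),
        }),
        rhs: Box::new(Node::BinaryExpr {
            op: Operator::Minus,
            lhs: Box::new(Node::Int(3)),
            rhs: Box::new(Node::Int(4)),
        }),
    }),
};
```

Calling build(&ast) compiles the two innermost BinaryExprs first, then the outer subtraction, then the negation.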
This is the foundation of every LLVM-based compiler. Rust, Swift, Julia - they all use this pattern (with many more node types and optimizations).
Testing
assert_eq!(Jit::from_source("1 + 2").unwrap(), 3)
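Wrapped in an ordinary #[test] (the module and test names here are illustrative), that assertion might look like:

```rust
#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn jit_compiles_addition() {
        // End to end: parse "1 + 2", JIT-compile it, run it, check the result.
        assert_eq!(Jit::from_source("1 + 2").unwrap(), 3);
    }
}
```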
Run tests locally:
cargo test jit --tests
In the next section, we’ll see an alternative approach: compiling to bytecode for a virtual machine, which trades some speed for portability.