Optimizing LLVM IR

So far, our compiler generates correct but naive LLVM IR. Every variable gets a stack allocation (alloca), every parameter is stored and reloaded, and the code includes redundant operations. In this chapter, we integrate LLVM’s pass manager to optimize our IR before execution.

Prerequisites: This chapter assumes you understand LLVM IR syntax. If instructions like alloca, load, store, and getelementptr are unfamiliar, read From AST to IR first.

Requirement: This chapter requires LLVM 20+ because the run_passes() API we use is part of LLVM’s New Pass Manager, which is fully supported in LLVM 20. Check your version with llvm-config --version.

Make sure the inkwell feature flag in thirdlang/Cargo.toml matches your installed LLVM version:

# For LLVM 20.x
inkwell = { version = "0.7", features = ["llvm20-1"] }

What Are Optimization Passes?

An optimization pass is a transformation that improves the IR without changing its behavior. LLVM provides dozens of passes that work together in a pipeline:

| Pass | What It Does |
|------|--------------|
| `mem2reg` | Promotes stack allocations to SSA registers |
| `dce` | Removes dead (unused) code |
| `instcombine` | Combines redundant instructions |
| `simplifycfg` | Simplifies the control flow graph |

The Optimization Pipeline in Action

Let us trace how each pass transforms the IR for a simple increment method. This is the most important section to understand.

Step 0: Original Source

Create examples/counter.tl with a Counter class (or use the existing one):

class Counter {
    value: int

    def increment(self) -> int {
        self.value = self.value + 1
        return self.value
    }
}

Step 1: Unoptimized IR (what our codegen produces)

Generate with:

cd thirdlang
rustup run nightly cargo run -- --ir examples/counter.tl

Output (showing just the increment method):

define i64 @Counter__increment(ptr %self) {
entry:
  %self1 = alloca ptr, align 8              ; allocate stack slot for self
  store ptr %self, ptr %self1, align 8      ; store parameter to stack
  %self2 = load ptr, ptr %self1, align 8    ; load self from stack
  %field_ptr = getelementptr %Counter, ptr %self2, i32 0, i32 0
  %field = load i64, ptr %field_ptr         ; load self.value
  %add = add i64 %field, 1                  ; add 1
  %self3 = load ptr, ptr %self1, align 8    ; load self again!
  %field_ptr4 = getelementptr %Counter, ptr %self3, i32 0, i32 0
  store i64 %add, ptr %field_ptr4           ; store to self.value
  %self5 = load ptr, ptr %self1, align 8    ; load self a third time!
  %field_ptr6 = getelementptr %Counter, ptr %self5, i32 0, i32 0
  %field7 = load i64, ptr %field_ptr6       ; load self.value for return
  ret i64 %field7
}

Problems: 13 instructions, redundant loads, and a stack allocation for a parameter.

Step 2: After mem2reg

Generate with:

rustup run nightly cargo run -- --ir --passes "mem2reg" examples/counter.tl

The mem2reg pass promotes the alloca to an SSA register:

define i64 @Counter__increment(ptr %self) {
entry:
  ; No more alloca! %self is used directly
  %field_ptr = getelementptr %Counter, ptr %self, i32 0, i32 0
  %field = load i64, ptr %field_ptr
  %add = add i64 %field, 1
  %field_ptr4 = getelementptr %Counter, ptr %self, i32 0, i32 0
  store i64 %add, ptr %field_ptr4
  %field_ptr6 = getelementptr %Counter, ptr %self, i32 0, i32 0
  %field7 = load i64, ptr %field_ptr6
  ret i64 %field7
}

Result: Eliminated alloca, store, and 3 redundant loads of %self.

Step 3: After mem2reg,instcombine

Generate with:

rustup run nightly cargo run -- --ir --passes "mem2reg,instcombine" examples/counter.tl

The instcombine pass merges redundant GEP instructions and simplifies patterns:

define i64 @Counter__increment(ptr %self) {
entry:
  %field = load i64, ptr %self              ; GEPs merged, %self IS the field ptr
  %add = add i64 %field, 1
  store i64 %add, ptr %self
  %field7 = load i64, ptr %self             ; still loading for return
  ret i64 %field7
}

Result: Since Counter has only one field at offset 0, GEP simplifies away.

Step 4: After full pipeline (with dce)

Generate with:

rustup run nightly cargo run -- --ir -O examples/counter.tl

Or explicitly:

rustup run nightly cargo run -- --ir --passes "mem2reg,instcombine,dce,simplifycfg" examples/counter.tl

With dce in the pipeline, the final load disappears: the value being returned is already available in %add, so nothing needs to be reloaded:

define i64 @Counter__increment(ptr %self) {
entry:
  %field = load i64, ptr %self, align 4
  %add = add i64 %field, 1
  store i64 %add, ptr %self, align 4
  ret i64 %add                              ; return %add directly!
}

Final Result: 4 instructions instead of 13!

Summary of the Pipeline


*(Figure: the LLVM optimization pipeline)*

Implementation

The run_passes Method

Inkwell provides access to the pass manager through Module::run_passes():

    /// Run LLVM optimization passes using the New Pass Manager
    ///
    /// # Arguments
    /// * `passes` - A comma-separated list of passes, e.g., "dce,mem2reg,instcombine"
    ///   or a preset like "default<O2>"
    ///
    /// # Common passes for teaching:
    /// - `dce` - Dead Code Elimination
    /// - `mem2reg` - Promote allocas to SSA registers
    /// - `instcombine` - Combine redundant instructions
    /// - `simplifycfg` - Simplify control flow graph
    /// - `gvn` - Global Value Numbering
    /// - `default<O0>` through `default<O3>` - Standard optimization levels
    pub fn run_passes(&self, passes: &str) -> Result<(), String> {
        // Initialize native target for the current machine
        Target::initialize_native(&InitializationConfig::default())
            .map_err(|e| format!("Failed to initialize native target: {}", e))?;

        // Get the default target triple for this machine
        let triple = TargetMachine::get_default_triple();

        // Get the target from the triple
        let target = Target::from_triple(&triple)
            .map_err(|e| format!("Failed to get target from triple: {}", e))?;

        // Create target machine with default settings
        let target_machine = target
            .create_target_machine(
                &triple,
                "generic", // CPU
                "",        // Features
                OptimizationLevel::Default,
                RelocMode::Default,
                CodeModel::Default,
            )
            .ok_or_else(|| "Failed to create target machine".to_string())?;

        // Create pass builder options
        let pass_options = PassBuilderOptions::create();
        pass_options.set_verify_each(true); // Verify IR after each pass

        // Run the passes
        self.module
            .run_passes(passes, &target_machine, pass_options)
            .map_err(|e| format!("Failed to run passes: {}", e))
    }

Key points:

  1. Initialize Native Target: Required before creating a TargetMachine
  2. Get Target Triple: The host machine description (e.g., x86_64-apple-darwin)
  3. Create TargetMachine: Needed for target-specific optimizations
  4. run_passes: Takes a comma-separated list of passes

Pass Pipeline String

The passes argument is a string like:

  • "dce,mem2reg,instcombine" - Custom pipeline
  • "default<O2>" - LLVM’s standard O2 optimization

Integration with JIT

We add an optional optimization step to our JIT runner:

/// JIT compile and run a program with optional optimization passes
///
/// # Arguments
/// * `program` - The parsed and type-checked program
/// * `classes` - Class registry from type checking
/// * `passes` - Optional optimization passes (e.g., "dce,mem2reg,instcombine")
pub fn jit_run_with_opts(
    program: &Program,
    classes: ClassRegistry,
    passes: Option<&str>,
) -> Result<i64, String> {
    let context = Context::create();
    let mut codegen = CodeGen::new(&context, "thirdlang", classes);

    codegen.compile(program)?;

    // Run optimization passes if specified
    if let Some(pass_pipeline) = passes {
        codegen.run_passes(pass_pipeline)?;
    }

    // Create execution engine
    let engine = codegen
        .module
        .create_jit_execution_engine(OptimizationLevel::Default)
        .map_err(|e| format!("Failed to create JIT: {}", e))?;

    // Call the __main wrapper function
    unsafe {
        let func: inkwell::execution_engine::JitFunction<unsafe extern "C" fn() -> i64> =
            engine.get_function("__main").map_err(|e| e.to_string())?;
        Ok(func.call())
    }
}

Understanding Each Pass

mem2reg: The Essential Pass

mem2reg converts stack-allocated variables to SSA registers. This is critical because our codegen creates allocas for every variable, but registers are much faster than memory.

Before:

%x = alloca i64
store i64 42, ptr %x
%val = load i64, ptr %x

After (the alloca, store, and load all disappear):

; every use of %val is rewritten to reference the constant 42 directly

dce: Dead Code Elimination

Removes instructions whose results are never used:

Before:

%unused = add i64 %a, %b    ; result never used
%result = mul i64 %c, %d
ret i64 %result

After:

%result = mul i64 %c, %d
ret i64 %result

instcombine: Instruction Combining

Simplifies patterns:

  • sub i64 %x, 1 becomes add i64 %x, -1
  • mul i64 %x, 2 becomes shl i64 %x, 1 (shift left)
  • Constant folding: add i64 3, 4 becomes 7

simplifycfg: Control Flow Simplification

Cleans up the control flow graph:

  • Removes empty basic blocks
  • Merges blocks with single predecessors
  • Simplifies trivial branches

Using the CLI

# Run without optimization
thirdlang examples/point.tl

# Run with optimization
thirdlang -O examples/point.tl

# Run with custom passes
thirdlang --passes "mem2reg,dce" examples/point.tl

# Print unoptimized IR
thirdlang --ir examples/point.tl

# Print optimized IR
thirdlang --ir -O examples/point.tl

# Use LLVM's O2 pipeline
thirdlang --passes "default<O2>" examples/point.tl

Testing Optimization

We verify that optimization produces correct results:

#[test]
fn test_optimization_pipeline() {
    use thirdlang::compile_to_ir_with_opts;

    let source = r#"
        class Simple {
            value: int

            def __init__(self, v: int) {
                self.value = v
            }

            def get(self) -> int {
                return self.value
            }
        }

        s = new Simple(100)
        result = s.get()
        delete s
        result
    "#;

    // Get unoptimized IR
    let unopt_ir = compile_to_ir_with_opts(source, None).unwrap();

    // Get optimized IR
    let opt_ir = compile_to_ir_with_opts(source, Some("mem2reg,dce,instcombine")).unwrap();

    // Optimized IR should be shorter (fewer alloca instructions)
    assert!(
        opt_ir.len() < unopt_ir.len(),
        "Optimized IR should be smaller"
    );

    // Unoptimized should have alloca for parameters
    assert!(
        unopt_ir.contains("alloca"),
        "Unoptimized IR should have allocas"
    );
}

Optimization Levels

LLVM provides preset pipelines:

| Level | Description |
|-------|-------------|
| `default<O0>` | No optimization (verification only) |
| `default<O1>` | Light optimization |
| `default<O2>` | Standard optimization (recommended) |
| `default<O3>` | Aggressive optimization |

For teaching, we use mem2reg,instcombine,dce,simplifycfg so the effect of each pass can be seen individually.

Summary

We added LLVM optimization:

  1. Created run_passes() method that accepts a pipeline string
  2. Integrated optimization into the JIT runner
  3. Added -O and --passes CLI flags

The key insight: our naive codegen produces correct but inefficient IR. LLVM passes transform it into efficient code:

  • mem2reg - Eliminates stack allocations
  • instcombine - Merges redundant instructions
  • dce - Removes unused code

Try thirdlang --ir examples/counter.tl vs thirdlang --ir -O examples/counter.tl to see the difference!

Cross-References