Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

LLVM Code Generation for Classes

Prerequisites: This chapter builds directly on Secondlang’s code generation. Make sure you understand how we compile expressions and functions to LLVM IR before proceeding.

Now we see how classes translate to LLVM IR. The key insight: classes become structs, methods become functions. We extend the patterns from From AST to IR with new concepts for object-oriented features.

The CodeGen Structure

Our code generator has new fields for class support:

/// Code generator state
pub struct CodeGen<'ctx> {
    context: &'ctx Context,
    module: Module<'ctx>,
    builder: Builder<'ctx>,
    /// Map from variable names to their stack allocations
    variables: HashMap<String, PointerValue<'ctx>>,
    /// Map from function names to LLVM functions
    functions: HashMap<String, FunctionValue<'ctx>>,
    /// Map from class names to their LLVM struct types
    class_types: HashMap<String, StructType<'ctx>>,
    /// Class registry from type checker
    classes: ClassRegistry,
    /// Current function being compiled
    current_fn: Option<FunctionValue<'ctx>>,
    /// Current class being compiled (for method compilation)
    current_class: Option<String>,
}

thirdlang/src/codegen.rs

The new fields:

  • class_types - Maps class names to LLVM struct types
  • classes - Maps class names to ClassInfo (field/method metadata)
  • current_class - The class we are currently compiling (for self resolution)

Compilation Pipeline

    /// Compile a program and return the module
    pub fn compile(&mut self, program: &Program) -> Result<(), String> {
        // Declare libc functions
        self.declare_libc_functions();

        // First pass: create LLVM struct types for classes
        for item in program {
            if let TopLevel::Class(class) = item {
                self.create_class_type(class)?;
            }
        }

        // Second pass: declare all functions and methods
        for item in program {
            match item {
                TopLevel::Class(class) => {
                    self.declare_class_methods(class)?;
                }
                TopLevel::Stmt(Stmt::Function {
                    name,
                    params,
                    return_type,
                    ..
                }) => {
                    self.declare_function(name, params, return_type)?;
                }
                _ => {}
            }
        }

        // Third pass: compile function and method bodies
        for item in program {
            match item {
                TopLevel::Class(class) => {
                    self.compile_class(class)?;
                }
                TopLevel::Stmt(stmt @ Stmt::Function { .. }) => {
                    self.compile_stmt(stmt)?;
                }
                _ => {}
            }
        }

        // Fourth pass: create __main wrapper for all top-level non-function statements
        self.compile_main_wrapper_all(program)?;

        // Verify module
        self.module
            .verify()
            .map_err(|e| format!("Module verification failed: {}", e.to_string()))?;

        Ok(())
    }

thirdlang/src/codegen.rs

The compilation happens in phases:

  1. Declare libc functions - malloc and free
  2. Create class struct types - Define LLVM struct for each class
  3. Declare methods - Create function signatures
  4. Compile class bodies - Generate method implementations
  5. Compile top-level code - Generate __main wrapper
  6. Verify module - Check IR is well-formed

Classes as LLVM Structs

Each class becomes an LLVM struct type:

class Point {
    x: int
    y: int
}

Becomes:

%Point = type { i64, i64 }
;               ^^^  ^^^
;                x    y (in field_order)

Creating the Struct Type

    /// Create LLVM struct type for a class
    fn create_class_type(&mut self, class: &ClassDef) -> Result<StructType<'ctx>, String> {
        let class_info = self
            .classes
            .get(&class.name)
            .ok_or_else(|| format!("Class {} not found in registry", class.name))?;

        // Create field types in order
        let mut field_types: Vec<BasicTypeEnum> = Vec::new();
        for field_name in &class_info.field_order {
            let field_type = class_info.get_field(field_name).unwrap();
            let llvm_type = self.llvm_basic_type(field_type)?;
            field_types.push(llvm_type);
        }

        // Create named struct type
        let struct_type = self.context.opaque_struct_type(&class.name);
        struct_type.set_body(&field_types, false);

        self.class_types.insert(class.name.clone(), struct_type);
        Ok(struct_type)
    }

thirdlang/src/codegen.rs

The field_order is crucial - it determines the memory layout.

Methods as Functions

Methods compile to regular functions with a naming convention:

class Point {
    def get_x(self) -> int {
        return self.x
    }
}

Becomes:

define i64 @Point__get_x(ptr %self) {
entry:
    %x_ptr = getelementptr %Point, ptr %self, i32 0, i32 0
    %x = load i64, ptr %x_ptr
    ret i64 %x
}

Method Naming

We use ClassName__methodName:

MethodLLVM Function Name
Point.__init__@Point____init__
Point.get_x@Point__get_x
Counter.increment@Counter__increment

This avoids name collisions between classes.

Self as First Parameter

Every method takes self (a pointer) as its first parameter:

define i64 @Point__get_x(ptr %self) { ... }
;                        ^^^^^^^^ self is a pointer to Point

When calling p.get_x(), we pass p as the first argument.

Field Access

Reading a field uses LLVM’s getelementptr (GEP):

return self.x

Becomes:

%x_ptr = getelementptr %Point, ptr %self, i32 0, i32 0
;                                         ^^^     ^^^
;                                         struct  field index
%x = load i64, ptr %x_ptr
ret i64 %x

The GEP instruction calculates the address of field 0 (which is x).

Writing a Field

self.x = 42

Becomes:

%x_ptr = getelementptr %Point, ptr %self, i32 0, i32 0
store i64 42, ptr %x_ptr

Same GEP, but followed by store instead of load.

Compiling Expressions

    /// Compile an expression
    fn compile_expr(&mut self, expr: &TypedExpr) -> Result<BasicValueEnum<'ctx>, String> {
        match &expr.expr {
            Expr::Int(n) => Ok(self.context.i64_type().const_int(*n as u64, false).into()),

            Expr::Bool(b) => Ok(self.context.bool_type().const_int(*b as u64, false).into()),

            Expr::Var(name) => {
                let ptr = self
                    .variables
                    .get(name)
                    .ok_or_else(|| format!("Undefined variable: {}", name))?;

                let load_type = if expr.ty.is_class() {
                    self.context
                        .ptr_type(AddressSpace::default())
                        .as_basic_type_enum()
                } else {
                    self.context.i64_type().as_basic_type_enum()
                };

                let val = self.builder.build_load(load_type, *ptr, name).unwrap();
                Ok(val)
            }

            Expr::SelfRef => {
                let ptr = self.variables.get("self").ok_or("'self' not in scope")?;
                let val = self
                    .builder
                    .build_load(self.context.ptr_type(AddressSpace::default()), *ptr, "self")
                    .unwrap();
                Ok(val)
            }
            Expr::Unary { op, expr: inner } => {
                let val = self.compile_expr(inner)?.into_int_value();
                let result = match op {
                    UnaryOp::Neg => self.builder.build_int_neg(val, "neg").unwrap(),
                    UnaryOp::Not => self.builder.build_not(val, "not").unwrap(),
                };
                Ok(result.into())
            }

            Expr::Binary { op, left, right } => {
                let l = self.compile_expr(left)?.into_int_value();
                let r = self.compile_expr(right)?.into_int_value();

                let result = match op {
                    BinaryOp::Add => self.builder.build_int_add(l, r, "add").unwrap(),
                    BinaryOp::Sub => self.builder.build_int_sub(l, r, "sub").unwrap(),
                    BinaryOp::Mul => self.builder.build_int_mul(l, r, "mul").unwrap(),
                    BinaryOp::Div => self.builder.build_int_signed_div(l, r, "div").unwrap(),
                    BinaryOp::Mod => self.builder.build_int_signed_rem(l, r, "mod").unwrap(),
                    BinaryOp::Lt => {
                        let cmp = self
                            .builder
                            .build_int_compare(IntPredicate::SLT, l, r, "lt")
                            .unwrap();
                        self.builder
                            .build_int_z_extend(cmp, self.context.i64_type(), "ext")
                            .unwrap()
                    }
                    BinaryOp::Gt => {
                        let cmp = self
                            .builder
                            .build_int_compare(IntPredicate::SGT, l, r, "gt")
                            .unwrap();
                        self.builder
                            .build_int_z_extend(cmp, self.context.i64_type(), "ext")
                            .unwrap()
                    }
                    BinaryOp::Le => {
                        let cmp = self
                            .builder
                            .build_int_compare(IntPredicate::SLE, l, r, "le")
                            .unwrap();
                        self.builder
                            .build_int_z_extend(cmp, self.context.i64_type(), "ext")
                            .unwrap()
                    }
                    BinaryOp::Ge => {
                        let cmp = self
                            .builder
                            .build_int_compare(IntPredicate::SGE, l, r, "ge")
                            .unwrap();
                        self.builder
                            .build_int_z_extend(cmp, self.context.i64_type(), "ext")
                            .unwrap()
                    }
                    BinaryOp::Eq => {
                        let cmp = self
                            .builder
                            .build_int_compare(IntPredicate::EQ, l, r, "eq")
                            .unwrap();
                        self.builder
                            .build_int_z_extend(cmp, self.context.i64_type(), "ext")
                            .unwrap()
                    }
                    BinaryOp::Ne => {
                        let cmp = self
                            .builder
                            .build_int_compare(IntPredicate::NE, l, r, "ne")
                            .unwrap();
                        self.builder
                            .build_int_z_extend(cmp, self.context.i64_type(), "ext")
                            .unwrap()
                    }
                };
                Ok(result.into())
            }

            Expr::Call { name, args } => {
                let function = self
                    .functions
                    .get(name)
                    .cloned()
                    .ok_or_else(|| format!("Undefined function: {}", name))?;

                let arg_values: Vec<BasicMetadataValueEnum> = args
                    .iter()
                    .map(|a| self.compile_expr(a).map(|v| v.into()))
                    .collect::<Result<_, _>>()?;

                let call = self
                    .builder
                    .build_call(function, &arg_values, "call")
                    .unwrap();
                Ok(call.try_as_basic_value().unwrap_basic())
            }

            Expr::MethodCall {
                object,
                method,
                args,
            } => {
                let obj_val = self.compile_expr(object)?;
                let obj_ptr = obj_val.into_pointer_value();

                // Get class name
                let class_name = object.ty.class_name().ok_or("Expected class type")?;
                let fn_name = format!("{}__{}", class_name, method);

                let function = self
                    .functions
                    .get(&fn_name)
                    .cloned()
                    .ok_or_else(|| format!("Undefined method: {}", fn_name))?;

                // Build argument list: self first, then other args
                let mut arg_values: Vec<BasicMetadataValueEnum> = vec![obj_ptr.into()];
                for arg in args {
                    arg_values.push(self.compile_expr(arg)?.into());
                }

                let call = self
                    .builder
                    .build_call(function, &arg_values, "call")
                    .unwrap();
                Ok(call.try_as_basic_value().unwrap_basic())
            }
            Expr::FieldAccess { object, field } => {
                let obj_val = self.compile_expr(object)?;
                let obj_ptr = obj_val.into_pointer_value();

                // Get field index
                let class_name = object.ty.class_name().ok_or("Expected class type")?;
                let class_info = self.classes.get(class_name).ok_or("Class not found")?;
                let field_idx = class_info.field_index(field).ok_or("Field not found")?;
                let field_type = class_info.get_field(field).ok_or("Field not found")?;

                // Get struct type
                let struct_type = self
                    .class_types
                    .get(class_name)
                    .ok_or("Class type not found")?;

                // GEP to field
                let field_ptr = self
                    .builder
                    .build_struct_gep(*struct_type, obj_ptr, field_idx as u32, "field_ptr")
                    .unwrap();

                // Load field value
                let load_type = self.llvm_basic_type(field_type)?;
                let val = self
                    .builder
                    .build_load(load_type, field_ptr, "field")
                    .unwrap();
                Ok(val)
            }
            Expr::New { class, args } => {
                // Get struct type and size
                let struct_type = self.class_types.get(class).ok_or("Class type not found")?;

                // Calculate size (number of fields * 8 bytes)
                let class_info = self.classes.get(class).ok_or("Class not found")?;
                let size = (class_info.size() * 8).max(8) as u64; // At least 8 bytes
                let size_val = self.context.i64_type().const_int(size, false);

                // Call malloc
                let malloc_fn = self.module.get_function("malloc").unwrap();
                let ptr = self
                    .builder
                    .build_call(malloc_fn, &[size_val.into()], "obj")
                    .unwrap()
                    .try_as_basic_value()
                    .unwrap_basic()
                    .into_pointer_value();

                // Initialize fields to zero
                for (i, _) in class_info.field_order.iter().enumerate() {
                    let field_ptr = self
                        .builder
                        .build_struct_gep(*struct_type, ptr, i as u32, "init_field")
                        .unwrap();
                    let zero = self.context.i64_type().const_int(0, false);
                    self.builder.build_store(field_ptr, zero).unwrap();
                }

                // Call constructor if exists
                let ctor_name = format!("{}____init__", class);
                if let Some(ctor) = self.functions.get(&ctor_name).cloned() {
                    let mut ctor_args: Vec<BasicMetadataValueEnum> = vec![ptr.into()];
                    for arg in args {
                        ctor_args.push(self.compile_expr(arg)?.into());
                    }
                    self.builder.build_call(ctor, &ctor_args, "").unwrap();
                }

                Ok(ptr.into())
            }
            Expr::If {
                cond,
                then_branch,
                else_branch,
            } => {
                let cond_val = self.compile_expr(cond)?.into_int_value();
                let cond_bool = self
                    .builder
                    .build_int_truncate(cond_val, self.context.bool_type(), "cond")
                    .unwrap();

                let function = self.current_fn.unwrap();
                let then_bb = self.context.append_basic_block(function, "then");
                let else_bb = self.context.append_basic_block(function, "else");
                let merge_bb = self.context.append_basic_block(function, "merge");

                self.builder
                    .build_conditional_branch(cond_bool, then_bb, else_bb)
                    .unwrap();

                // Then branch
                self.builder.position_at_end(then_bb);
                let mut then_val = self.context.i64_type().const_int(0, false);
                for stmt in then_branch {
                    if let Some(v) = self.compile_stmt(stmt)? {
                        then_val = v;
                    }
                }
                let then_end = self.builder.get_insert_block().unwrap();
                let then_has_terminator = then_end.get_terminator().is_some();
                if !then_has_terminator {
                    self.builder.build_unconditional_branch(merge_bb).unwrap();
                }

                // Else branch
                self.builder.position_at_end(else_bb);
                let mut else_val = self.context.i64_type().const_int(0, false);
                for stmt in else_branch {
                    if let Some(v) = self.compile_stmt(stmt)? {
                        else_val = v;
                    }
                }
                let else_end = self.builder.get_insert_block().unwrap();
                let else_has_terminator = else_end.get_terminator().is_some();
                if !else_has_terminator {
                    self.builder.build_unconditional_branch(merge_bb).unwrap();
                }

                // Merge
                if then_has_terminator && else_has_terminator {
                    unsafe {
                        merge_bb.delete().unwrap();
                    }
                    Ok(self.context.i64_type().const_int(0, false).into())
                } else {
                    self.builder.position_at_end(merge_bb);
                    let phi = self
                        .builder
                        .build_phi(self.context.i64_type(), "phi")
                        .unwrap();

                    if !then_has_terminator {
                        phi.add_incoming(&[(&then_val, then_end)]);
                    }
                    if !else_has_terminator {
                        phi.add_incoming(&[(&else_val, else_end)]);
                    }

                    Ok(phi.as_basic_value())
                }
            }

            Expr::While { cond, body } => {
                let function = self.current_fn.unwrap();
                let cond_bb = self.context.append_basic_block(function, "while_cond");
                let body_bb = self.context.append_basic_block(function, "while_body");
                let end_bb = self.context.append_basic_block(function, "while_end");

                self.builder.build_unconditional_branch(cond_bb).unwrap();

                // Condition
                self.builder.position_at_end(cond_bb);
                let cond_val = self.compile_expr(cond)?.into_int_value();
                let cond_bool = self
                    .builder
                    .build_int_truncate(cond_val, self.context.bool_type(), "cond")
                    .unwrap();
                self.builder
                    .build_conditional_branch(cond_bool, body_bb, end_bb)
                    .unwrap();

                // Body
                self.builder.position_at_end(body_bb);
                for stmt in body {
                    self.compile_stmt(stmt)?;
                }
                if self
                    .builder
                    .get_insert_block()
                    .unwrap()
                    .get_terminator()
                    .is_none()
                {
                    self.builder.build_unconditional_branch(cond_bb).unwrap();
                }

                // End
                self.builder.position_at_end(end_bb);
                Ok(self.context.i64_type().const_int(0, false).into())
            }

            Expr::Block(stmts) => {
                let mut last_val: BasicValueEnum =
                    self.context.i64_type().const_int(0, false).into();
                for stmt in stmts {
                    if let Some(v) = self.compile_stmt(stmt)? {
                        last_val = v.into();
                    }
                }
                Ok(last_val)
            }
        }
    }

thirdlang/src/codegen.rs

The important new cases:

Self Reference

            Expr::SelfRef => {
                let ptr = self.variables.get("self").ok_or("'self' not in scope")?;
                let val = self
                    .builder
                    .build_load(self.context.ptr_type(AddressSpace::default()), *ptr, "self")
                    .unwrap();
                Ok(val)
            }

self is stored as a local variable pointer, just like other parameters.

New Expression

            Expr::New { class, args } => {
                // Get struct type and size
                let struct_type = self.class_types.get(class).ok_or("Class type not found")?;

                // Calculate size (number of fields * 8 bytes)
                let class_info = self.classes.get(class).ok_or("Class not found")?;
                let size = (class_info.size() * 8).max(8) as u64; // At least 8 bytes
                let size_val = self.context.i64_type().const_int(size, false);

                // Call malloc
                let malloc_fn = self.module.get_function("malloc").unwrap();
                let ptr = self
                    .builder
                    .build_call(malloc_fn, &[size_val.into()], "obj")
                    .unwrap()
                    .try_as_basic_value()
                    .unwrap_basic()
                    .into_pointer_value();

                // Initialize fields to zero
                for (i, _) in class_info.field_order.iter().enumerate() {
                    let field_ptr = self
                        .builder
                        .build_struct_gep(*struct_type, ptr, i as u32, "init_field")
                        .unwrap();
                    let zero = self.context.i64_type().const_int(0, false);
                    self.builder.build_store(field_ptr, zero).unwrap();
                }

                // Call constructor if exists
                let ctor_name = format!("{}____init__", class);
                if let Some(ctor) = self.functions.get(&ctor_name).cloned() {
                    let mut ctor_args: Vec<BasicMetadataValueEnum> = vec![ptr.into()];
                    for arg in args {
                        ctor_args.push(self.compile_expr(arg)?.into());
                    }
                    self.builder.build_call(ctor, &ctor_args, "").unwrap();
                }

                Ok(ptr.into())
            }

Field Access

            Expr::FieldAccess { object, field } => {
                let obj_val = self.compile_expr(object)?;
                let obj_ptr = obj_val.into_pointer_value();

                // Get field index
                let class_name = object.ty.class_name().ok_or("Expected class type")?;
                let class_info = self.classes.get(class_name).ok_or("Class not found")?;
                let field_idx = class_info.field_index(field).ok_or("Field not found")?;
                let field_type = class_info.get_field(field).ok_or("Field not found")?;

                // Get struct type
                let struct_type = self
                    .class_types
                    .get(class_name)
                    .ok_or("Class type not found")?;

                // GEP to field
                let field_ptr = self
                    .builder
                    .build_struct_gep(*struct_type, obj_ptr, field_idx as u32, "field_ptr")
                    .unwrap();

                // Load field value
                let load_type = self.llvm_basic_type(field_type)?;
                let val = self
                    .builder
                    .build_load(load_type, field_ptr, "field")
                    .unwrap();
                Ok(val)
            }

Method Call

            Expr::MethodCall {
                object,
                method,
                args,
            } => {
                let obj_val = self.compile_expr(object)?;
                let obj_ptr = obj_val.into_pointer_value();

                // Get class name
                let class_name = object.ty.class_name().ok_or("Expected class type")?;
                let fn_name = format!("{}__{}", class_name, method);

                let function = self
                    .functions
                    .get(&fn_name)
                    .cloned()
                    .ok_or_else(|| format!("Undefined method: {}", fn_name))?;

                // Build argument list: self first, then other args
                let mut arg_values: Vec<BasicMetadataValueEnum> = vec![obj_ptr.into()];
                for arg in args {
                    arg_values.push(self.compile_expr(arg)?.into());
                }

                let call = self
                    .builder
                    .build_call(function, &arg_values, "call")
                    .unwrap();
                Ok(call.try_as_basic_value().unwrap_basic())
            }

Complete Example

Let us trace through compiling this class:

class Counter {
    count: int

    def __init__(self) {
        self.count = 0
    }

    def increment(self) -> int {
        self.count = self.count + 1
        return self.count
    }
}

c = new Counter()
c.increment()

Generated LLVM IR

; Struct type
%Counter = type { i64 }

; Constructor
define void @Counter____init__(ptr %self) {
entry:
    %count_ptr = getelementptr %Counter, ptr %self, i32 0, i32 0
    store i64 0, ptr %count_ptr
    ret void
}

; Increment method
define i64 @Counter__increment(ptr %self) {
entry:
    ; self.count + 1
    %count_ptr = getelementptr %Counter, ptr %self, i32 0, i32 0
    %count = load i64, ptr %count_ptr
    %new_count = add i64 %count, 1

    ; self.count = new_count
    store i64 %new_count, ptr %count_ptr

    ; return self.count
    %result = load i64, ptr %count_ptr
    ret i64 %result
}

; Main function
define i64 @__main() {
entry:
    ; c = new Counter()
    %raw = call ptr @malloc(i64 8)
    call void @Counter____init__(ptr %raw)

    ; c.increment()
    %result = call i64 @Counter__increment(ptr %raw)

    ret i64 %result
}

JIT Execution

Now that we can generate IR for classes, let’s actually run it. Here’s how we take our Thirdlang program from source code to executed result:

/// JIT compile and run a program
pub fn jit_run(program: &Program, classes: ClassRegistry) -> Result<i64, String> {
    jit_run_with_opts(program, classes, None)
}

thirdlang/src/codegen.rs

This follows the same pattern as Secondlang, but now with class support. Let’s walk through what happens:

  1. Create the code generator - This sets up our LLVM context, module, and builder. Think of it as preparing our workspace.

  2. Compile the program - We walk through the AST and emit LLVM IR for each class, method, and statement. When we encounter new Counter(), we emit malloc + constructor calls. When we see c.increment(), we emit a method call with c passed as self.

  3. Create the JIT engine - LLVM takes our IR and compiles it to native machine code for your CPU. This happens at runtime, hence “Just-In-Time”.

  4. Get the __main function - Remember how we wrapped top-level code in a __main function? We look it up in the compiled code.

  5. Call it and return - We execute the native code. When it calls malloc, it’s calling the real C malloc. When it calls our constructor, it’s running the native x86/ARM code we generated. Fast!

The magic is that the code we’re running isn’t being interpreted - it’s real compiled code, just like if you wrote it in C or Rust. Objects really live on the heap. Methods really jump to function addresses. It’s all native.

Memory Layout Summary

ThirdlangLLVM IRNotes
class Point { x: int }%Point = type { i64 }Struct type
new Point(10)call @malloc + call @Point____init__Heap allocation
p.xgetelementptr + loadField read
p.x = 5getelementptr + storeField write
p.method()call @Point__method(ptr %p)Method call
delete pcall @Point____del__ + call @freeDestruction

Performance Considerations

Our implementation is straightforward but not optimal:

What We Do

  • Direct field access via GEP (fast)
  • Method calls are static (no vtable lookup)
  • Objects are contiguous in memory

What Real Compilers Add

  • Inline method calls when possible
  • Escape analysis (stack allocate short-lived objects)
  • Field alignment optimization
  • Dead field elimination

Our simple approach is sufficient for learning the concepts.

Summary

ConceptImplementation
ClassLLVM struct type
ObjectPointer to struct on heap
FieldStruct element (GEP access)
MethodFunction with self as first param
ConstructorClassName____init__ function
DestructorClassName____del__ function
newmalloc + constructor call
deletedestructor call + free

At this point, you should be able to:

  • Run cargo run --bin thirdlang -- --ir examples/point.tl and see IR
  • See %Point struct type in the output
  • See Point__init and other method functions

In the final chapter, we run Thirdlang programs and see everything working together.