LLVM Code Generation for Classes
Prerequisites: This chapter builds directly on Secondlang’s code generation. Make sure you understand how we compile expressions and functions to LLVM IR before proceeding.
Now we see how classes translate to LLVM IR. The key insight: classes become structs, methods become functions. We extend the patterns from From AST to IR with new concepts for object-oriented features.
The CodeGen Structure
Our code generator has new fields for class support:
/// Code generator state
pub struct CodeGen<'ctx> {
context: &'ctx Context,
module: Module<'ctx>,
builder: Builder<'ctx>,
/// Map from variable names to their stack allocations
variables: HashMap<String, PointerValue<'ctx>>,
/// Map from function names to LLVM functions
functions: HashMap<String, FunctionValue<'ctx>>,
/// Map from class names to their LLVM struct types
class_types: HashMap<String, StructType<'ctx>>,
/// Class registry from type checker
classes: ClassRegistry,
/// Current function being compiled
current_fn: Option<FunctionValue<'ctx>>,
/// Current class being compiled (for method compilation)
current_class: Option<String>,
}
The new fields:
- class_types - Maps class names to LLVM struct types
- classes - Maps class names to
ClassInfo(field/method metadata) - current_class - The class we are currently compiling (for
selfresolution)
Compilation Pipeline
/// Compile a program and return the module
pub fn compile(&mut self, program: &Program) -> Result<(), String> {
// Declare libc functions
self.declare_libc_functions();
// First pass: create LLVM struct types for classes
for item in program {
if let TopLevel::Class(class) = item {
self.create_class_type(class)?;
}
}
// Second pass: declare all functions and methods
for item in program {
match item {
TopLevel::Class(class) => {
self.declare_class_methods(class)?;
}
TopLevel::Stmt(Stmt::Function {
name,
params,
return_type,
..
}) => {
self.declare_function(name, params, return_type)?;
}
_ => {}
}
}
// Third pass: compile function and method bodies
for item in program {
match item {
TopLevel::Class(class) => {
self.compile_class(class)?;
}
TopLevel::Stmt(stmt @ Stmt::Function { .. }) => {
self.compile_stmt(stmt)?;
}
_ => {}
}
}
// Fourth pass: create __main wrapper for all top-level non-function statements
self.compile_main_wrapper_all(program)?;
// Verify module
self.module
.verify()
.map_err(|e| format!("Module verification failed: {}", e.to_string()))?;
Ok(())
}
The compilation happens in phases:
- Declare libc functions -
mallocandfree - Create class struct types - Define LLVM struct for each class
- Declare methods - Create function signatures
- Compile class bodies - Generate method implementations
- Compile top-level code - Generate
__mainwrapper - Verify module - Check IR is well-formed
Classes as LLVM Structs
Each class becomes an LLVM struct type:
class Point {
x: int
y: int
}
Becomes:
%Point = type { i64, i64 }
; ^^^ ^^^
; x y (in field_order)
Creating the Struct Type
/// Create LLVM struct type for a class
fn create_class_type(&mut self, class: &ClassDef) -> Result<StructType<'ctx>, String> {
let class_info = self
.classes
.get(&class.name)
.ok_or_else(|| format!("Class {} not found in registry", class.name))?;
// Create field types in order
let mut field_types: Vec<BasicTypeEnum> = Vec::new();
for field_name in &class_info.field_order {
let field_type = class_info.get_field(field_name).unwrap();
let llvm_type = self.llvm_basic_type(field_type)?;
field_types.push(llvm_type);
}
// Create named struct type
let struct_type = self.context.opaque_struct_type(&class.name);
struct_type.set_body(&field_types, false);
self.class_types.insert(class.name.clone(), struct_type);
Ok(struct_type)
}
The field_order is crucial - it determines the memory layout.
Methods as Functions
Methods compile to regular functions with a naming convention:
class Point {
def get_x(self) -> int {
return self.x
}
}
Becomes:
define i64 @Point__get_x(ptr %self) {
entry:
%x_ptr = getelementptr %Point, ptr %self, i32 0, i32 0
%x = load i64, ptr %x_ptr
ret i64 %x
}
Method Naming
We use ClassName__methodName:
| Method | LLVM Function Name |
|---|---|
Point.__init__ | @Point____init__ |
Point.get_x | @Point__get_x |
Counter.increment | @Counter__increment |
This avoids name collisions between classes.
Self as First Parameter
Every method takes self (a pointer) as its first parameter:
define i64 @Point__get_x(ptr %self) { ... }
; ^^^^^^^^ self is a pointer to Point
When calling p.get_x(), we pass p as the first argument.
Field Access
Reading a field uses LLVM’s getelementptr (GEP):
return self.x
Becomes:
%x_ptr = getelementptr %Point, ptr %self, i32 0, i32 0
; ^^^ ^^^
; struct field index
%x = load i64, ptr %x_ptr
ret i64 %x
The GEP instruction calculates the address of field 0 (which is x).
Writing a Field
self.x = 42
Becomes:
%x_ptr = getelementptr %Point, ptr %self, i32 0, i32 0
store i64 42, ptr %x_ptr
Same GEP, but followed by store instead of load.
Compiling Expressions
/// Compile an expression
fn compile_expr(&mut self, expr: &TypedExpr) -> Result<BasicValueEnum<'ctx>, String> {
match &expr.expr {
Expr::Int(n) => Ok(self.context.i64_type().const_int(*n as u64, false).into()),
Expr::Bool(b) => Ok(self.context.bool_type().const_int(*b as u64, false).into()),
Expr::Var(name) => {
let ptr = self
.variables
.get(name)
.ok_or_else(|| format!("Undefined variable: {}", name))?;
let load_type = if expr.ty.is_class() {
self.context
.ptr_type(AddressSpace::default())
.as_basic_type_enum()
} else {
self.context.i64_type().as_basic_type_enum()
};
let val = self.builder.build_load(load_type, *ptr, name).unwrap();
Ok(val)
}
Expr::SelfRef => {
let ptr = self.variables.get("self").ok_or("'self' not in scope")?;
let val = self
.builder
.build_load(self.context.ptr_type(AddressSpace::default()), *ptr, "self")
.unwrap();
Ok(val)
}
Expr::Unary { op, expr: inner } => {
let val = self.compile_expr(inner)?.into_int_value();
let result = match op {
UnaryOp::Neg => self.builder.build_int_neg(val, "neg").unwrap(),
UnaryOp::Not => self.builder.build_not(val, "not").unwrap(),
};
Ok(result.into())
}
Expr::Binary { op, left, right } => {
let l = self.compile_expr(left)?.into_int_value();
let r = self.compile_expr(right)?.into_int_value();
let result = match op {
BinaryOp::Add => self.builder.build_int_add(l, r, "add").unwrap(),
BinaryOp::Sub => self.builder.build_int_sub(l, r, "sub").unwrap(),
BinaryOp::Mul => self.builder.build_int_mul(l, r, "mul").unwrap(),
BinaryOp::Div => self.builder.build_int_signed_div(l, r, "div").unwrap(),
BinaryOp::Mod => self.builder.build_int_signed_rem(l, r, "mod").unwrap(),
BinaryOp::Lt => {
let cmp = self
.builder
.build_int_compare(IntPredicate::SLT, l, r, "lt")
.unwrap();
self.builder
.build_int_z_extend(cmp, self.context.i64_type(), "ext")
.unwrap()
}
BinaryOp::Gt => {
let cmp = self
.builder
.build_int_compare(IntPredicate::SGT, l, r, "gt")
.unwrap();
self.builder
.build_int_z_extend(cmp, self.context.i64_type(), "ext")
.unwrap()
}
BinaryOp::Le => {
let cmp = self
.builder
.build_int_compare(IntPredicate::SLE, l, r, "le")
.unwrap();
self.builder
.build_int_z_extend(cmp, self.context.i64_type(), "ext")
.unwrap()
}
BinaryOp::Ge => {
let cmp = self
.builder
.build_int_compare(IntPredicate::SGE, l, r, "ge")
.unwrap();
self.builder
.build_int_z_extend(cmp, self.context.i64_type(), "ext")
.unwrap()
}
BinaryOp::Eq => {
let cmp = self
.builder
.build_int_compare(IntPredicate::EQ, l, r, "eq")
.unwrap();
self.builder
.build_int_z_extend(cmp, self.context.i64_type(), "ext")
.unwrap()
}
BinaryOp::Ne => {
let cmp = self
.builder
.build_int_compare(IntPredicate::NE, l, r, "ne")
.unwrap();
self.builder
.build_int_z_extend(cmp, self.context.i64_type(), "ext")
.unwrap()
}
};
Ok(result.into())
}
Expr::Call { name, args } => {
let function = self
.functions
.get(name)
.cloned()
.ok_or_else(|| format!("Undefined function: {}", name))?;
let arg_values: Vec<BasicMetadataValueEnum> = args
.iter()
.map(|a| self.compile_expr(a).map(|v| v.into()))
.collect::<Result<_, _>>()?;
let call = self
.builder
.build_call(function, &arg_values, "call")
.unwrap();
Ok(call.try_as_basic_value().unwrap_basic())
}
Expr::MethodCall {
object,
method,
args,
} => {
let obj_val = self.compile_expr(object)?;
let obj_ptr = obj_val.into_pointer_value();
// Get class name
let class_name = object.ty.class_name().ok_or("Expected class type")?;
let fn_name = format!("{}__{}", class_name, method);
let function = self
.functions
.get(&fn_name)
.cloned()
.ok_or_else(|| format!("Undefined method: {}", fn_name))?;
// Build argument list: self first, then other args
let mut arg_values: Vec<BasicMetadataValueEnum> = vec![obj_ptr.into()];
for arg in args {
arg_values.push(self.compile_expr(arg)?.into());
}
let call = self
.builder
.build_call(function, &arg_values, "call")
.unwrap();
Ok(call.try_as_basic_value().unwrap_basic())
}
Expr::FieldAccess { object, field } => {
let obj_val = self.compile_expr(object)?;
let obj_ptr = obj_val.into_pointer_value();
// Get field index
let class_name = object.ty.class_name().ok_or("Expected class type")?;
let class_info = self.classes.get(class_name).ok_or("Class not found")?;
let field_idx = class_info.field_index(field).ok_or("Field not found")?;
let field_type = class_info.get_field(field).ok_or("Field not found")?;
// Get struct type
let struct_type = self
.class_types
.get(class_name)
.ok_or("Class type not found")?;
// GEP to field
let field_ptr = self
.builder
.build_struct_gep(*struct_type, obj_ptr, field_idx as u32, "field_ptr")
.unwrap();
// Load field value
let load_type = self.llvm_basic_type(field_type)?;
let val = self
.builder
.build_load(load_type, field_ptr, "field")
.unwrap();
Ok(val)
}
Expr::New { class, args } => {
// Get struct type and size
let struct_type = self.class_types.get(class).ok_or("Class type not found")?;
// Calculate size (number of fields * 8 bytes)
let class_info = self.classes.get(class).ok_or("Class not found")?;
let size = (class_info.size() * 8).max(8) as u64; // At least 8 bytes
let size_val = self.context.i64_type().const_int(size, false);
// Call malloc
let malloc_fn = self.module.get_function("malloc").unwrap();
let ptr = self
.builder
.build_call(malloc_fn, &[size_val.into()], "obj")
.unwrap()
.try_as_basic_value()
.unwrap_basic()
.into_pointer_value();
// Initialize fields to zero
for (i, _) in class_info.field_order.iter().enumerate() {
let field_ptr = self
.builder
.build_struct_gep(*struct_type, ptr, i as u32, "init_field")
.unwrap();
let zero = self.context.i64_type().const_int(0, false);
self.builder.build_store(field_ptr, zero).unwrap();
}
// Call constructor if exists
let ctor_name = format!("{}____init__", class);
if let Some(ctor) = self.functions.get(&ctor_name).cloned() {
let mut ctor_args: Vec<BasicMetadataValueEnum> = vec![ptr.into()];
for arg in args {
ctor_args.push(self.compile_expr(arg)?.into());
}
self.builder.build_call(ctor, &ctor_args, "").unwrap();
}
Ok(ptr.into())
}
Expr::If {
cond,
then_branch,
else_branch,
} => {
let cond_val = self.compile_expr(cond)?.into_int_value();
let cond_bool = self
.builder
.build_int_truncate(cond_val, self.context.bool_type(), "cond")
.unwrap();
let function = self.current_fn.unwrap();
let then_bb = self.context.append_basic_block(function, "then");
let else_bb = self.context.append_basic_block(function, "else");
let merge_bb = self.context.append_basic_block(function, "merge");
self.builder
.build_conditional_branch(cond_bool, then_bb, else_bb)
.unwrap();
// Then branch
self.builder.position_at_end(then_bb);
let mut then_val = self.context.i64_type().const_int(0, false);
for stmt in then_branch {
if let Some(v) = self.compile_stmt(stmt)? {
then_val = v;
}
}
let then_end = self.builder.get_insert_block().unwrap();
let then_has_terminator = then_end.get_terminator().is_some();
if !then_has_terminator {
self.builder.build_unconditional_branch(merge_bb).unwrap();
}
// Else branch
self.builder.position_at_end(else_bb);
let mut else_val = self.context.i64_type().const_int(0, false);
for stmt in else_branch {
if let Some(v) = self.compile_stmt(stmt)? {
else_val = v;
}
}
let else_end = self.builder.get_insert_block().unwrap();
let else_has_terminator = else_end.get_terminator().is_some();
if !else_has_terminator {
self.builder.build_unconditional_branch(merge_bb).unwrap();
}
// Merge
if then_has_terminator && else_has_terminator {
unsafe {
merge_bb.delete().unwrap();
}
Ok(self.context.i64_type().const_int(0, false).into())
} else {
self.builder.position_at_end(merge_bb);
let phi = self
.builder
.build_phi(self.context.i64_type(), "phi")
.unwrap();
if !then_has_terminator {
phi.add_incoming(&[(&then_val, then_end)]);
}
if !else_has_terminator {
phi.add_incoming(&[(&else_val, else_end)]);
}
Ok(phi.as_basic_value())
}
}
Expr::While { cond, body } => {
let function = self.current_fn.unwrap();
let cond_bb = self.context.append_basic_block(function, "while_cond");
let body_bb = self.context.append_basic_block(function, "while_body");
let end_bb = self.context.append_basic_block(function, "while_end");
self.builder.build_unconditional_branch(cond_bb).unwrap();
// Condition
self.builder.position_at_end(cond_bb);
let cond_val = self.compile_expr(cond)?.into_int_value();
let cond_bool = self
.builder
.build_int_truncate(cond_val, self.context.bool_type(), "cond")
.unwrap();
self.builder
.build_conditional_branch(cond_bool, body_bb, end_bb)
.unwrap();
// Body
self.builder.position_at_end(body_bb);
for stmt in body {
self.compile_stmt(stmt)?;
}
if self
.builder
.get_insert_block()
.unwrap()
.get_terminator()
.is_none()
{
self.builder.build_unconditional_branch(cond_bb).unwrap();
}
// End
self.builder.position_at_end(end_bb);
Ok(self.context.i64_type().const_int(0, false).into())
}
Expr::Block(stmts) => {
let mut last_val: BasicValueEnum =
self.context.i64_type().const_int(0, false).into();
for stmt in stmts {
if let Some(v) = self.compile_stmt(stmt)? {
last_val = v.into();
}
}
Ok(last_val)
}
}
}
The important new cases:
Self Reference
Expr::SelfRef => {
let ptr = self.variables.get("self").ok_or("'self' not in scope")?;
let val = self
.builder
.build_load(self.context.ptr_type(AddressSpace::default()), *ptr, "self")
.unwrap();
Ok(val)
}
self is stored as a local variable pointer, just like other parameters.
New Expression
Expr::New { class, args } => {
// Get struct type and size
let struct_type = self.class_types.get(class).ok_or("Class type not found")?;
// Calculate size (number of fields * 8 bytes)
let class_info = self.classes.get(class).ok_or("Class not found")?;
let size = (class_info.size() * 8).max(8) as u64; // At least 8 bytes
let size_val = self.context.i64_type().const_int(size, false);
// Call malloc
let malloc_fn = self.module.get_function("malloc").unwrap();
let ptr = self
.builder
.build_call(malloc_fn, &[size_val.into()], "obj")
.unwrap()
.try_as_basic_value()
.unwrap_basic()
.into_pointer_value();
// Initialize fields to zero
for (i, _) in class_info.field_order.iter().enumerate() {
let field_ptr = self
.builder
.build_struct_gep(*struct_type, ptr, i as u32, "init_field")
.unwrap();
let zero = self.context.i64_type().const_int(0, false);
self.builder.build_store(field_ptr, zero).unwrap();
}
// Call constructor if exists
let ctor_name = format!("{}____init__", class);
if let Some(ctor) = self.functions.get(&ctor_name).cloned() {
let mut ctor_args: Vec<BasicMetadataValueEnum> = vec![ptr.into()];
for arg in args {
ctor_args.push(self.compile_expr(arg)?.into());
}
self.builder.build_call(ctor, &ctor_args, "").unwrap();
}
Ok(ptr.into())
}
Field Access
Expr::FieldAccess { object, field } => {
let obj_val = self.compile_expr(object)?;
let obj_ptr = obj_val.into_pointer_value();
// Get field index
let class_name = object.ty.class_name().ok_or("Expected class type")?;
let class_info = self.classes.get(class_name).ok_or("Class not found")?;
let field_idx = class_info.field_index(field).ok_or("Field not found")?;
let field_type = class_info.get_field(field).ok_or("Field not found")?;
// Get struct type
let struct_type = self
.class_types
.get(class_name)
.ok_or("Class type not found")?;
// GEP to field
let field_ptr = self
.builder
.build_struct_gep(*struct_type, obj_ptr, field_idx as u32, "field_ptr")
.unwrap();
// Load field value
let load_type = self.llvm_basic_type(field_type)?;
let val = self
.builder
.build_load(load_type, field_ptr, "field")
.unwrap();
Ok(val)
}
Method Call
Expr::MethodCall {
object,
method,
args,
} => {
let obj_val = self.compile_expr(object)?;
let obj_ptr = obj_val.into_pointer_value();
// Get class name
let class_name = object.ty.class_name().ok_or("Expected class type")?;
let fn_name = format!("{}__{}", class_name, method);
let function = self
.functions
.get(&fn_name)
.cloned()
.ok_or_else(|| format!("Undefined method: {}", fn_name))?;
// Build argument list: self first, then other args
let mut arg_values: Vec<BasicMetadataValueEnum> = vec![obj_ptr.into()];
for arg in args {
arg_values.push(self.compile_expr(arg)?.into());
}
let call = self
.builder
.build_call(function, &arg_values, "call")
.unwrap();
Ok(call.try_as_basic_value().unwrap_basic())
}
Complete Example
Let us trace through compiling this class:
class Counter {
count: int
def __init__(self) {
self.count = 0
}
def increment(self) -> int {
self.count = self.count + 1
return self.count
}
}
c = new Counter()
c.increment()
Generated LLVM IR
; Struct type
%Counter = type { i64 }
; Constructor
define void @Counter____init__(ptr %self) {
entry:
%count_ptr = getelementptr %Counter, ptr %self, i32 0, i32 0
store i64 0, ptr %count_ptr
ret void
}
; Increment method
define i64 @Counter__increment(ptr %self) {
entry:
; self.count + 1
%count_ptr = getelementptr %Counter, ptr %self, i32 0, i32 0
%count = load i64, ptr %count_ptr
%new_count = add i64 %count, 1
; self.count = new_count
store i64 %new_count, ptr %count_ptr
; return self.count
%result = load i64, ptr %count_ptr
ret i64 %result
}
; Main function
define i64 @__main() {
entry:
; c = new Counter()
%raw = call ptr @malloc(i64 8)
call void @Counter____init__(ptr %raw)
; c.increment()
%result = call i64 @Counter__increment(ptr %raw)
ret i64 %result
}
JIT Execution
Now that we can generate IR for classes, let’s actually run it. Here’s how we take our Thirdlang program from source code to executed result:
/// JIT compile and run a program
pub fn jit_run(program: &Program, classes: ClassRegistry) -> Result<i64, String> {
jit_run_with_opts(program, classes, None)
}
This follows the same pattern as Secondlang, but now with class support. Let’s walk through what happens:
-
Create the code generator - This sets up our LLVM context, module, and builder. Think of it as preparing our workspace.
-
Compile the program - We walk through the AST and emit LLVM IR for each class, method, and statement. When we encounter
new Counter(), we emit malloc + constructor calls. When we seec.increment(), we emit a method call withcpassed asself. -
Create the JIT engine - LLVM takes our IR and compiles it to native machine code for your CPU. This happens at runtime, hence “Just-In-Time”.
-
Get the
__mainfunction - Remember how we wrapped top-level code in a__mainfunction? We look it up in the compiled code. -
Call it and return - We execute the native code. When it calls
malloc, it’s calling the real C malloc. When it calls our constructor, it’s running the native x86/ARM code we generated. Fast!
The magic is that the code we’re running isn’t being interpreted - it’s real compiled code, just like if you wrote it in C or Rust. Objects really live on the heap. Methods really jump to function addresses. It’s all native.
Memory Layout Summary
| Thirdlang | LLVM IR | Notes |
|---|---|---|
class Point { x: int } | %Point = type { i64 } | Struct type |
new Point(10) | call @malloc + call @Point____init__ | Heap allocation |
p.x | getelementptr + load | Field read |
p.x = 5 | getelementptr + store | Field write |
p.method() | call @Point__method(ptr %p) | Method call |
delete p | call @Point____del__ + call @free | Destruction |
Performance Considerations
Our implementation is straightforward but not optimal:
What We Do
- Direct field access via GEP (fast)
- Method calls are static (no vtable lookup)
- Objects are contiguous in memory
What Real Compilers Add
- Inline method calls when possible
- Escape analysis (stack allocate short-lived objects)
- Field alignment optimization
- Dead field elimination
Our simple approach is sufficient for learning the concepts.
Summary
| Concept | Implementation |
|---|---|
| Class | LLVM struct type |
| Object | Pointer to struct on heap |
| Field | Struct element (GEP access) |
| Method | Function with self as first param |
| Constructor | ClassName____init__ function |
| Destructor | ClassName____del__ function |
| new | malloc + constructor call |
| delete | destructor call + free |
At this point, you should be able to:
- Run
cargo run --bin thirdlang -- --ir examples/point.tland see IR - See
%Pointstruct type in the output - See
Point__initand other method functions
In the final chapter, we run Thirdlang programs and see everything working together.