Class Syntax and Parsing
Now that we understand why we want classes, let us see how to add them to our language. The grammar changes are significant but follow patterns we have seen before in Secondlang’s grammar.
If the PEG syntax looks unfamiliar, review the PEG and pest Syntax section in the Crash Course.
New Grammar Rules
Types: Adding Class Types
In Secondlang, types were just int or bool. Now any class name is also a type:
// Types - now includes class types
Type = { IntType | BoolType | ClassType }
IntType = { "int" }
BoolType = { "bool" }
ClassType = { Identifier } // Class name as type
The ClassType rule matches any identifier. When we see Point in a type position, we now parse it as a class type. The type checker (later) verifies that a class with that name actually exists.
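Because the grammar accepts any identifier as a class type, a typo such as Piont still parses. As a rough illustration of the later check, here is a minimal sketch that assumes the type checker keeps a set of declared class names. The helper check_type_exists and the trimmed Type enum are hypothetical and are not taken from the Thirdlang sources.

```rust
use std::collections::HashSet;

/// Trimmed stand-in for the chapter's `Type` enum (assumption: only three variants).
#[derive(Debug, Clone, PartialEq)]
pub enum Type {
    Int,
    Bool,
    Class(String),
}

/// Hypothetical helper: reject class types whose name was never declared.
pub fn check_type_exists(ty: &Type, known_classes: &HashSet<String>) -> Result<(), String> {
    match ty {
        // Built-in types always exist.
        Type::Int | Type::Bool => Ok(()),
        // A class type is valid only if a class with that name was declared.
        Type::Class(name) if known_classes.contains(name) => Ok(()),
        Type::Class(name) => Err(format!("unknown class type '{}'", name)),
    }
}
```

With only Point declared, checking Type::Class("Point") succeeds, while Type::Class("Circle") yields an error message naming the missing class.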
Class Definition
Here is the grammar for class definitions:
// =============================================================================
// Class Definition
// =============================================================================
// class Point {
//     x: int
//     y: int
//
//     def __init__(self, x: int, y: int) {
//         self.x = x
//         self.y = y
//     }
//
//     def distance(self, other: Point) -> int {
//         dx = self.x - other.x
//         return dx * dx
//     }
//
//     def __del__(self) { }
// }
ClassDef = { "class" ~ Identifier ~ "{" ~ ClassBody ~ "}" }
ClassBody = { (FieldDef | MethodDef)* }
// Field definition: x: int
FieldDef = { Identifier ~ ":" ~ Type }
// Method definition: def name(self, params) -> type { body }
// First parameter must be 'self'
MethodDef = { "def" ~ Identifier ~ "(" ~ SelfParam ~ MethodParams? ~ ")" ~ ReturnType? ~ Block }
SelfParam = { "self" }
MethodParams = _{ "," ~ TypedParam ~ ("," ~ TypedParam)* }
Let us break this down:
- ClassDef - The whole class: class Name { body }
- ClassBody - Zero or more fields and methods
- FieldDef - A field declaration: name: type
- MethodDef - A method: def name(self, params) -> type { body }
- SelfParam - The literal self keyword
- MethodParams - Additional parameters after self. The leading _ makes this a silent rule, so its TypedParam children appear directly under MethodDef in the parse tree.
The key difference from regular functions: methods must have self as their first parameter.
Object Creation with New
// new Point(1, 2)
NewExpr = { "new" ~ Identifier ~ "(" ~ Args? ~ ")" }
A new expression is the new keyword followed by a class name and the constructor arguments. It allocates memory for the object and calls __init__.
Object Deletion
// Delete statement: delete obj
Delete = { "delete" ~ Expr }
The delete statement takes an expression (which should evaluate to an object) and frees its memory.
Field Access and Method Calls
// Postfix: field access and method calls
Postfix = { Primary ~ PostfixOp* }
PostfixOp = { MethodCall | FieldAccessOp }
MethodCall = { "." ~ Identifier ~ "(" ~ Args? ~ ")" }
FieldAccessOp = { "." ~ Identifier }
// For assignment target parsing
FieldAccess = { (SelfKeyword | Identifier) ~ ("." ~ Identifier)+ }
SelfKeyword = { "self" }
Postfix operations handle:
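- Field access: obj.field - read a field
- Method calls: obj.method(args) - call a method
MethodCall is listed before FieldAccessOp because PEG choice is ordered: if FieldAccessOp came first, obj.method(args) would match as a field access and leave the argument list unparsed.
The PostfixOp* means zero or more operations, which allows chaining: a.b.c.method().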
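One way to turn a chain such as a.b.method() into a tree is to fold the postfix operations from left to right, so that each operation wraps the expression built so far. The sketch below uses a simplified, untyped Expr and a hypothetical intermediate PostfixOp enum. Both are assumptions made for illustration and are not the chapter's actual parser types.

```rust
/// Simplified, untyped stand-in for the chapter's expression AST.
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Var(String),
    FieldAccess { object: Box<Expr>, field: String },
    MethodCall { object: Box<Expr>, method: String, args: Vec<Expr> },
}

/// One parsed postfix operation (hypothetical intermediate form).
#[derive(Debug, Clone, PartialEq)]
pub enum PostfixOp {
    Field(String),
    Method(String, Vec<Expr>),
}

/// Fold the operations left to right: each one takes the expression
/// built so far as its `object`.
pub fn fold_postfix(primary: Expr, ops: Vec<PostfixOp>) -> Expr {
    ops.into_iter().fold(primary, |object, op| match op {
        PostfixOp::Field(field) => Expr::FieldAccess {
            object: Box::new(object),
            field,
        },
        PostfixOp::Method(method, args) => Expr::MethodCall {
            object: Box::new(object),
            method,
            args,
        },
    })
}
```

For a.b.method(), the result is a MethodCall whose object is a FieldAccess on Var("a"). With no operations, the primary expression is returned unchanged, which matches the zero-or-more repetition in the grammar.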
The Typed AST
Top-Level Items
Programs now contain both classes and statements:
/// Top-level items in a program
#[derive(Debug, Clone, PartialEq)]
pub enum TopLevel {
    /// Class definition
    Class(ClassDef),
    /// Statement (function definition or expression)
    Stmt(Stmt),
}
A program is now Vec<TopLevel> instead of Vec<Stmt>. Each top-level item is either a class definition or a statement.
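Later passes often want to see every class before looking at any statement. As a small sketch of how a Vec<TopLevel> could be consumed, the hypothetical helper split_program below separates the two kinds of item while preserving their order. The ClassDef and Stmt types are trimmed stand-ins, not the full definitions from this chapter.

```rust
/// Trimmed stand-ins: the chapter's real `ClassDef` and `Stmt` carry more data.
#[derive(Debug, Clone, PartialEq)]
pub struct ClassDef {
    pub name: String,
}

#[derive(Debug, Clone, PartialEq)]
pub enum Stmt {
    /// Placeholder statement holding its source text.
    Expr(String),
}

#[derive(Debug, Clone, PartialEq)]
pub enum TopLevel {
    Class(ClassDef),
    Stmt(Stmt),
}

/// Hypothetical helper: separate class definitions from statements,
/// keeping each group in source order.
pub fn split_program(program: Vec<TopLevel>) -> (Vec<ClassDef>, Vec<Stmt>) {
    let mut classes = Vec::new();
    let mut stmts = Vec::new();
    for item in program {
        match item {
            TopLevel::Class(class) => classes.push(class),
            TopLevel::Stmt(stmt) => stmts.push(stmt),
        }
    }
    (classes, stmts)
}
```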
Class Definition AST
/// Class definition
#[derive(Debug, Clone, PartialEq)]
pub struct ClassDef {
    /// Class name
    pub name: String,
    /// Field definitions (in order)
    pub fields: Vec<FieldDef>,
    /// Method definitions
    pub methods: Vec<MethodDef>,
}
/// Field definition
#[derive(Debug, Clone, PartialEq)]
pub struct FieldDef {
    pub name: String,
    pub ty: Type,
}
/// Method definition
#[derive(Debug, Clone, PartialEq)]
pub struct MethodDef {
    /// Method name (e.g., "__init__", "__del__", "distance")
    pub name: String,
    /// Parameters (excluding self)
    pub params: Vec<(String, Type)>,
    /// Return type
    pub return_type: Type,
    /// Method body
    pub body: Vec<Stmt>,
}
impl MethodDef {
    pub fn is_constructor(&self) -> bool {
        self.name == "__init__"
    }
    pub fn is_destructor(&self) -> bool {
        self.name == "__del__"
    }
}
The ClassDef struct contains:
- name - The class name (e.g., "Point")
- fields - List of field definitions
- methods - List of method definitions
Each FieldDef has a name and type. Each MethodDef is like a function but with self implied.
Statements with Delete
/// Statements in our language
#[derive(Debug, Clone, PartialEq)]
pub enum Stmt {
    /// Function definition with types
    Function {
        name: String,
        params: Vec<(String, Type)>,
        return_type: Type,
        body: Vec<Stmt>,
    },
    /// Return statement
    Return(TypedExpr),
    /// Assignment with optional type annotation
    Assignment {
        target: AssignTarget,
        type_ann: Option<Type>,
        value: TypedExpr,
    },
    /// Delete statement
    Delete(TypedExpr),
    /// Expression statement
    Expr(TypedExpr),
}
The Stmt enum gains a Delete variant for the delete statement.
Assignment Targets
Assignments can now target fields:
/// Assignment target - can be a variable or field access
#[derive(Debug, Clone, PartialEq)]
pub enum AssignTarget {
    /// Simple variable: x = ...
    Var(String),
    /// Field access: self.x = ... or obj.field = ...
    Field {
        object: Box<TypedExpr>,
        field: String,
    },
}
This allows both:
- x = 10 - assign to variable
- self.x = 10 - assign to field
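To make the two target shapes concrete, the sketch below builds an assignment target from the dotted segments of the left-hand side. It is an illustration under assumptions: target_from_path is a hypothetical helper, and Expr is a simplified, untyped stand-in for TypedExpr. In the grammar above, the FieldAccess rule only covers paths with at least one dot, so the single-segment case here stands in for the plain variable rule.

```rust
/// Simplified, untyped stand-in for the chapter's `TypedExpr`.
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Var(String),
    SelfRef,
    FieldAccess { object: Box<Expr>, field: String },
}

#[derive(Debug, Clone, PartialEq)]
pub enum AssignTarget {
    Var(String),
    Field { object: Box<Expr>, field: String },
}

/// Hypothetical helper: `["x"]` becomes a variable target, while
/// `["self", "x"]` becomes a field target on `self`.
pub fn target_from_path(path: &[&str]) -> Result<AssignTarget, String> {
    // The last segment is the name being assigned to.
    let (last, rest) = path
        .split_last()
        .ok_or_else(|| "empty assignment target".to_string())?;
    let (head, middle) = match rest.split_first() {
        // No object part: a plain variable assignment.
        None => return Ok(AssignTarget::Var(last.to_string())),
        Some(parts) => parts,
    };
    // The first segment is either `self` or a variable name.
    let mut object = if *head == "self" {
        Expr::SelfRef
    } else {
        Expr::Var(head.to_string())
    };
    // Each middle segment reads a field of the expression built so far.
    for field in middle {
        object = Expr::FieldAccess {
            object: Box::new(object),
            field: field.to_string(),
        };
    }
    Ok(AssignTarget::Field {
        object: Box::new(object),
        field: last.to_string(),
    })
}
```

For a.b.c = ..., the object is the expression a.b and the assigned field is c, so only the final segment is written to and everything before it is read.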
New Expressions
/// Expressions in our language
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    /// Integer literal
    Int(i64),
    /// Boolean literal
    Bool(bool),
    /// Variable reference
    Var(String),
    /// Self reference (inside methods)
    SelfRef,
    /// Unary operation
    Unary { op: UnaryOp, expr: Box<TypedExpr> },
    /// Binary operation
    Binary {
        op: BinaryOp,
        left: Box<TypedExpr>,
        right: Box<TypedExpr>,
    },
    /// Function call
    Call { name: String, args: Vec<TypedExpr> },
    /// Method call: obj.method(args)
    MethodCall {
        object: Box<TypedExpr>,
        method: String,
        args: Vec<TypedExpr>,
    },
    /// Field access: obj.field
    FieldAccess {
        object: Box<TypedExpr>,
        field: String,
    },
    /// Object creation: new ClassName(args)
    New { class: String, args: Vec<TypedExpr> },
    /// Conditional
    If {
        cond: Box<TypedExpr>,
        then_branch: Vec<Stmt>,
        else_branch: Vec<Stmt>,
    },
    /// While loop
    While {
        cond: Box<TypedExpr>,
        body: Vec<Stmt>,
    },
    /// Block
    Block(Vec<Stmt>),
}
The Expr enum gains several new variants:
- SelfRef - The self keyword
- New - Object creation: new Point(1, 2)
- FieldAccess - Reading a field: obj.x
- MethodCall - Calling a method: obj.method(args)
Parser Implementation
Here is how we parse classes in Rust:
use pest::iterators::Pair;
// `Rule` is the enum that pest's derive macro generates from our grammar.
fn parse_class_def(pair: Pair<Rule>) -> Result<ClassDef, String> {
    let mut inner = pair.into_inner();
    let name = inner.next().unwrap().as_str().to_string();
    let body = inner.next().unwrap(); // ClassBody
    let mut fields = Vec::new();
    let mut methods = Vec::new();
    for item in body.into_inner() {
        match item.as_rule() {
            Rule::FieldDef => {
                fields.push(parse_field_def(item)?);
            }
            Rule::MethodDef => {
                methods.push(parse_method_def(item)?);
            }
            _ => {}
        }
    }
    Ok(ClassDef {
        name,
        fields,
        methods,
    })
}
fn parse_field_def(pair: Pair<Rule>) -> Result<FieldDef, String> {
    let mut inner = pair.into_inner();
    let name = inner.next().unwrap().as_str().to_string();
    let ty = parse_type(inner.next().unwrap())?;
    Ok(FieldDef { name, ty })
}
fn parse_method_def(pair: Pair<Rule>) -> Result<MethodDef, String> {
    let mut inner = pair.into_inner();
    let name = inner.next().unwrap().as_str().to_string();
    // Skip 'self' parameter
    inner.next(); // SelfParam
    let mut params: Vec<(String, Type)> = Vec::new();
    let mut return_type = Type::Unit;
    let mut body = Vec::new();
    // MethodParams is a silent rule, so TypedParam pairs appear here directly
    for item in inner {
        match item.as_rule() {
            Rule::TypedParam => {
                let mut param_inner = item.into_inner();
                let param_name = param_inner.next().unwrap().as_str().to_string();
                let param_type = parse_type(param_inner.next().unwrap())?;
                params.push((param_name, param_type));
            }
            Rule::ReturnType => {
                let type_pair = item.into_inner().next().unwrap();
                return_type = parse_type(type_pair)?;
            }
            Rule::Block => {
                body = parse_block(item)?;
            }
            _ => {}
        }
    }
    Ok(MethodDef {
        name,
        params,
        return_type,
        body,
    })
}
The parser:
- Extracts the class name from the first child
- Iterates over the class body, sorting items into fields and methods
- For each method, skips the self parameter (it is implicit)
- Returns a ClassDef with all collected data
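The parser functions above call unwrap() on each child, which panics if the grammar and the parser ever disagree. Since they already return Result<_, String>, one alternative is to turn a missing child into an error instead. The sketch below shows that idea on plain string iterators so that it runs without pest. Both expect_next and field_from_children are hypothetical helpers, not part of the chapter's code.

```rust
/// Hypothetical helper: take the next child, or report what was expected.
pub fn expect_next<I: Iterator>(iter: &mut I, what: &str) -> Result<I::Item, String> {
    iter.next().ok_or_else(|| format!("expected {}", what))
}

/// Stand-in for `parse_field_def`: reads a field name and a type name
/// from a sequence of string children instead of pest pairs.
pub fn field_from_children<'a, I>(mut children: I) -> Result<(String, String), String>
where
    I: Iterator<Item = &'a str>,
{
    let name = expect_next(&mut children, "field name")?.to_string();
    let ty = expect_next(&mut children, "field type")?.to_string();
    Ok((name, ty))
}
```

Given the children "x" and "int", this returns the pair ("x", "int"). Given only "x", it returns an error saying a field type was expected rather than panicking.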
Parsing Example
Let us trace through parsing this class:
class Point {
    x: int
    y: int
    def __init__(self, x: int, y: int) {
        self.x = x
        self.y = y
    }
    def get_x(self) -> int {
        return self.x
    }
}
Step 1: Match ClassDef
The parser sees class, then:
- Identifier: Point
- {: Start of body
- ClassBody: Fields and methods
- }: End of body
Step 2: Parse ClassBody
Inside the body:
- FieldDef: x: int → FieldDef { name: "x", ty: Type::Int }
- FieldDef: y: int → FieldDef { name: "y", ty: Type::Int }
- MethodDef: def __init__(...)
- MethodDef: def get_x(...)
Step 3: Parse MethodDef
For def __init__(self, x: int, y: int):
- def: Method keyword
- Identifier: __init__
- (: Start parameters
- SelfParam: self
- MethodParams: , x: int, y: int
- ): End parameters
- No ReturnType: Defaults to Unit
- Block: { self.x = x; self.y = y }
Step 4: Parse Method Body
Inside { self.x = x; self.y = y }:
- Assignment: self.x = x
  - Target: AssignTarget::Field { object: SelfRef, field: "x" }
  - Value: Expr::Var("x")
- Assignment: self.y = y
  - Similar structure
Final AST
TopLevel::Class(ClassDef {
    name: "Point".to_string(),
    fields: vec![
        FieldDef { name: "x".to_string(), ty: Type::Int },
        FieldDef { name: "y".to_string(), ty: Type::Int },
    ],
    methods: vec![
        MethodDef {
            name: "__init__".to_string(),
            params: vec![
                // Note: 'self' is NOT stored in params - it's implicit
                // Only the parameters AFTER self are stored
                ("x".to_string(), Type::Int),
                ("y".to_string(), Type::Int),
            ],
            return_type: Type::Unit,
            body: vec![/* assignments */],
        },
        MethodDef {
            name: "get_x".to_string(),
            params: vec![], // No params after self
            return_type: Type::Int,
            body: vec![/* return self.x */],
        },
    ],
})
Note: The self parameter is implicit in methods - it is not stored in the params list. The type checker knows every method receives self of the class type.
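Since self never appears in params, something has to put it back before a method body is checked. A minimal sketch of that step, assuming the type checker keeps a name-to-type map per scope, could look like this. The helper method_scope and the trimmed Type enum are hypothetical.

```rust
use std::collections::HashMap;

/// Trimmed stand-in for the chapter's `Type` enum.
#[derive(Debug, Clone, PartialEq)]
pub enum Type {
    Int,
    Bool,
    Class(String),
}

/// Hypothetical helper: build the initial variable scope for a method body.
pub fn method_scope(class_name: &str, params: &[(String, Type)]) -> HashMap<String, Type> {
    let mut scope = HashMap::new();
    // `self` is not stored in `params`, so it is added here with the
    // type of the enclosing class.
    scope.insert("self".to_string(), Type::Class(class_name.to_string()));
    // The explicit parameters follow, exactly as stored in the AST.
    for (name, ty) in params {
        scope.insert(name.clone(), ty.clone());
    }
    scope
}
```

For __init__ in Point, the resulting scope contains self with type Class("Point") plus x and y with type Int.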
Type Information for Classes
Classes need metadata for type checking. We store this in ClassInfo:
use std::collections::HashMap;
/// Information about a class definition
#[derive(Debug, Clone)]
pub struct ClassInfo {
    /// Class name
    pub name: String,
    /// Fields: name -> type
    pub fields: HashMap<String, Type>,
    /// Field order (for memory layout)
    pub field_order: Vec<String>,
    /// Methods: name -> method signature (see MethodInfo)
    pub methods: HashMap<String, MethodInfo>,
    /// Whether the class has a destructor
    pub has_destructor: bool,
}
/// Information about a method
#[derive(Debug, Clone)]
pub struct MethodInfo {
    /// Method name
    pub name: String,
    /// Parameters as (name, type) pairs (excluding self)
    pub params: Vec<(String, Type)>,
    /// Return type
    pub return_type: Type,
    /// Is this the constructor?
    pub is_constructor: bool,
    /// Is this the destructor?
    pub is_destructor: bool,
}
The ClassInfo struct tracks:
- name - Class name
- fields - Map from field name to type
- field_order - Order of fields (for LLVM struct layout)
- methods - Map from method name to MethodInfo
- has_destructor - Whether __del__ exists
The MethodInfo struct tracks each method’s signature.
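To show how these fields relate to the AST, here is a sketch that collects a ClassInfo from a parsed ClassDef. It is an illustration only: class_info_from_def is a hypothetical helper, the AST types are trimmed copies without method bodies, and the duplicate-field check is an assumption about what a type checker might enforce.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
pub enum Type { Int, Bool, Class(String), Unit }

// Trimmed copies of the chapter's AST types (method bodies omitted).
pub struct FieldDef { pub name: String, pub ty: Type }
pub struct MethodDef { pub name: String, pub params: Vec<(String, Type)>, pub return_type: Type }
pub struct ClassDef { pub name: String, pub fields: Vec<FieldDef>, pub methods: Vec<MethodDef> }

#[derive(Debug, Clone)]
pub struct MethodInfo {
    pub name: String,
    pub params: Vec<(String, Type)>,
    pub return_type: Type,
    pub is_constructor: bool,
    pub is_destructor: bool,
}

#[derive(Debug, Clone)]
pub struct ClassInfo {
    pub name: String,
    pub fields: HashMap<String, Type>,
    pub field_order: Vec<String>,
    pub methods: HashMap<String, MethodInfo>,
    pub has_destructor: bool,
}

/// Hypothetical helper: collect a `ClassInfo` from a parsed `ClassDef`.
pub fn class_info_from_def(def: &ClassDef) -> Result<ClassInfo, String> {
    let mut fields = HashMap::new();
    let mut field_order = Vec::new();
    for f in &def.fields {
        // `insert` returns the previous value, so `Some` means a duplicate name.
        if fields.insert(f.name.clone(), f.ty.clone()).is_some() {
            return Err(format!("duplicate field '{}' in class '{}'", f.name, def.name));
        }
        // The map loses declaration order, so record it separately.
        field_order.push(f.name.clone());
    }
    let mut methods = HashMap::new();
    for m in &def.methods {
        let info = MethodInfo {
            name: m.name.clone(),
            params: m.params.clone(),
            return_type: m.return_type.clone(),
            is_constructor: m.name == "__init__",
            is_destructor: m.name == "__del__",
        };
        methods.insert(m.name.clone(), info);
    }
    let has_destructor = methods.contains_key("__del__");
    Ok(ClassInfo { name: def.name.clone(), fields, field_order, methods, has_destructor })
}
```

The separate field_order vector is what makes the layout deterministic: a HashMap does not preserve insertion order, so the position of each field has to be recorded alongside it.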
The Type Enum
Our type system now includes class types:
/// Types in our language
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub enum Type {
    /// Integer type (64-bit signed)
    Int,
    /// Boolean type
    Bool,
    /// Class type (by name)
    Class(String),
    /// Function type: (param_types) -> return_type
    Function { params: Vec<Type>, ret: Box<Type> },
    /// Method type: (self_type, param_types) -> return_type
    Method {
        class: String,
        params: Vec<Type>,
        ret: Box<Type>,
    },
    /// Unit type (for statements with no value)
    Unit,
    /// Unknown type (for type inference)
    Unknown,
}
The new Class(String) variant holds the class name. When we see Point as a type, we create Type::Class("Point".to_string()).
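The mapping from source text to Type can be sketched as a small function. This is a simplified illustration: type_from_name is a hypothetical helper that works on the matched text, whereas the chapter's parse_type operates on pest pairs, and the enum here is trimmed to three variants.

```rust
/// Trimmed stand-in for the chapter's `Type` enum.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub enum Type {
    Int,
    Bool,
    Class(String),
}

/// Hypothetical helper: map the text matched in a type position to a `Type`.
pub fn type_from_name(name: &str) -> Type {
    match name {
        "int" => Type::Int,
        "bool" => Type::Bool,
        // Anything else is treated as a class name; whether that class
        // exists is a question for the type checker, not the parser.
        other => Type::Class(other.to_string()),
    }
}
```

The built-in names are checked first, mirroring the order of alternatives in the Type grammar rule.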
Comparison with Secondlang
| Aspect | Secondlang | Thirdlang |
|---|---|---|
| Types | int, bool | int, bool, ClassName |
| Top-level | Vec<Stmt> | Vec<TopLevel> |
| Callables | Functions only | Functions + methods |
| Field access | No | obj.field |
| Method calls | No | obj.method() |
| New expressions | No | new Class(args) |
| Delete statement | No | delete obj |
At this point, you should be able to:
- Parse class Point { x: int } without errors
- Parse methods with a self parameter
- Parse new Point(1, 2) expressions
In the next chapter, we look at constructors and object creation.