Class Syntax and Parsing

Now that we understand why we want classes, let us see how to add them to our language. The grammar changes are significant but follow patterns we have seen before in Secondlang’s grammar.

If the PEG syntax looks unfamiliar, review the PEG and pest Syntax section in the Crash Course.

New Grammar Rules

Types: Adding Class Types

In Secondlang, types were just int or bool. Now any class name is also a type:

// Types - now includes class types
Type = { IntType | BoolType | ClassType }
IntType = { "int" }
BoolType = { "bool" }
ClassType = { Identifier }  // Class name as type

thirdlang/src/grammar.pest

The ClassType rule matches any identifier. When we see Point in a type position, we now parse it as a class type. The type checker (later) verifies that a class with that name actually exists.
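
As a rough illustration of that mapping, the sketch below turns the text of a type annotation into a type value. It works on a plain string rather than a pest pair, the Type enum is reduced to the three variants discussed here, and type_from_name is a hypothetical helper rather than a function from the Thirdlang sources.

```rust
/// Simplified stand-in for the language's type enum (assumption: only
/// the three variants relevant to this section are included).
#[derive(Debug, Clone, PartialEq)]
pub enum Type {
    Int,
    Bool,
    Class(String),
}

/// Hypothetical helper: map the text of a type annotation to a `Type`.
/// Built-in names are tried first; anything else is treated as a
/// class name, in the same order as `IntType | BoolType | ClassType`.
pub fn type_from_name(name: &str) -> Type {
    match name {
        "int" => Type::Int,
        "bool" => Type::Bool,
        other => Type::Class(other.to_string()),
    }
}
```

Note that this step never fails: an unknown name such as Poit still becomes a class type, and rejecting it is left to the type checker.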

Class Definition

Here is the grammar for class definitions:

// =============================================================================
// Class Definition
// =============================================================================
// class Point {
//     x: int
//     y: int
//
//     def __init__(self, x: int, y: int) {
//         self.x = x
//         self.y = y
//     }
//
//     def distance(self, other: Point) -> int {
//         dx = self.x - other.x
//         return dx * dx
//     }
//
//     def __del__(self) { }
// }

ClassDef = { "class" ~ Identifier ~ "{" ~ ClassBody ~ "}" }
ClassBody = { (FieldDef | MethodDef)* }

// Field definition: x: int
FieldDef = { Identifier ~ ":" ~ Type }

// Method definition: def name(self, params) -> type { body }
// First parameter must be 'self'
MethodDef = { "def" ~ Identifier ~ "(" ~ SelfParam ~ MethodParams? ~ ")" ~ ReturnType? ~ Block }
SelfParam = { "self" }
MethodParams = _{ "," ~ TypedParam ~ ("," ~ TypedParam)* }

thirdlang/src/grammar.pest

Let us break this down:

  • ClassDef - The whole class: class Name { body }
  • ClassBody - Zero or more fields and methods
  • FieldDef - A field declaration: name: type
  • MethodDef - A method: def name(self, params) -> type { body }
  • SelfParam - The literal self keyword
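  • MethodParams - Additional parameters after self. The leading _ makes this a silent rule in pest, so its TypedParam children appear directly under MethodDef in the parse tree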

The key difference from regular functions: methods must have self as their first parameter.

Object Creation with New

// new Point(1, 2)
NewExpr = { "new" ~ Identifier ~ "(" ~ Args? ~ ")" }

thirdlang/src/grammar.pest

A new expression is the new keyword followed by a class name and constructor arguments. At runtime, it allocates memory for the object and calls __init__.

Object Deletion

// Delete statement: delete obj
Delete = { "delete" ~ Expr }

thirdlang/src/grammar.pest

The delete statement takes an expression (which should evaluate to an object) and frees its memory.

Field Access and Method Calls

// Postfix: field access and method calls
Postfix = { Primary ~ PostfixOp* }
PostfixOp = { MethodCall | FieldAccessOp }
MethodCall = { "." ~ Identifier ~ "(" ~ Args? ~ ")" }
FieldAccessOp = { "." ~ Identifier }

// For assignment target parsing
FieldAccess = { (SelfKeyword | Identifier) ~ ("." ~ Identifier)+ }
SelfKeyword = { "self" }

thirdlang/src/grammar.pest

Postfix operations handle:

  • Field access: obj.field - read a field
  • Method calls: obj.method(args) - call a method

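The PostfixOp* means zero or more, allowing chaining: a.b.c.method(). MethodCall is listed before FieldAccessOp because PEG choice is ordered: if FieldAccessOp came first, it would match .method and leave the parenthesized arguments unparsed.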
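
One way to turn such a chain into a nested tree is a left fold over the postfix operations. The sketch below assumes a simplified, untyped Expr and a hypothetical intermediate PostfixOp enum; neither is taken from the Thirdlang sources, where sub-expressions are wrapped in TypedExpr.

```rust
/// Simplified, untyped expression tree (assumption: the `TypedExpr`
/// wrapper used by the real AST is omitted).
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Var(String),
    FieldAccess { object: Box<Expr>, field: String },
    MethodCall { object: Box<Expr>, method: String, args: Vec<Expr> },
}

/// One parsed postfix operation (hypothetical intermediate type).
#[derive(Debug, Clone, PartialEq)]
pub enum PostfixOp {
    Field(String),
    Call { method: String, args: Vec<Expr> },
}

/// Fold the operations left to right so that each one wraps the
/// expression built so far: `a.b.c()` becomes
/// MethodCall { object: FieldAccess { object: a, field: b }, method: c }.
pub fn fold_postfix(primary: Expr, ops: Vec<PostfixOp>) -> Expr {
    ops.into_iter().fold(primary, |object, op| match op {
        PostfixOp::Field(field) => Expr::FieldAccess {
            object: Box::new(object),
            field,
        },
        PostfixOp::Call { method, args } => Expr::MethodCall {
            object: Box::new(object),
            method,
            args,
        },
    })
}
```

With an empty list of operations the primary expression is returned unchanged, which matches the zero-or-more reading of PostfixOp*.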

The Typed AST

Top-Level Items

Programs now contain both classes and statements:

/// Top-level items in a program
#[derive(Debug, Clone, PartialEq)]
pub enum TopLevel {
    /// Class definition
    Class(ClassDef),
    /// Statement (function definition or expression)
    Stmt(Stmt),
}

thirdlang/src/ast.rs

A program is now Vec<TopLevel> instead of Vec<Stmt>. Each top-level item is either a class definition or a statement.
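
A later pass may want to look at all classes before any statement, for example to register class names up front. A minimal sketch of that split follows; the AST types are cut down to stubs and split_program is a hypothetical helper, not part of the Thirdlang sources.

```rust
/// Minimal stand-ins for the AST types (assumption: the real ones
/// carry fields, methods, and typed expressions).
#[derive(Debug, Clone, PartialEq)]
pub struct ClassDef {
    pub name: String,
}

#[derive(Debug, Clone, PartialEq)]
pub enum Stmt {
    Expr(String),
}

#[derive(Debug, Clone, PartialEq)]
pub enum TopLevel {
    Class(ClassDef),
    Stmt(Stmt),
}

/// Hypothetical helper: separate class definitions from statements,
/// preserving source order within each group.
pub fn split_program(program: Vec<TopLevel>) -> (Vec<ClassDef>, Vec<Stmt>) {
    let mut classes = Vec::new();
    let mut stmts = Vec::new();
    for item in program {
        match item {
            TopLevel::Class(class) => classes.push(class),
            TopLevel::Stmt(stmt) => stmts.push(stmt),
        }
    }
    (classes, stmts)
}
```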

Class Definition AST

/// Class definition
#[derive(Debug, Clone, PartialEq)]
pub struct ClassDef {
    /// Class name
    pub name: String,
    /// Field definitions (in order)
    pub fields: Vec<FieldDef>,
    /// Method definitions
    pub methods: Vec<MethodDef>,
}

/// Field definition
#[derive(Debug, Clone, PartialEq)]
pub struct FieldDef {
    pub name: String,
    pub ty: Type,
}

/// Method definition
#[derive(Debug, Clone, PartialEq)]
pub struct MethodDef {
    /// Method name (e.g., "__init__", "__del__", "distance")
    pub name: String,
    /// Parameters (excluding self)
    pub params: Vec<(String, Type)>,
    /// Return type
    pub return_type: Type,
    /// Method body
    pub body: Vec<Stmt>,
}

impl MethodDef {
    pub fn is_constructor(&self) -> bool {
        self.name == "__init__"
    }

    pub fn is_destructor(&self) -> bool {
        self.name == "__del__"
    }
}

thirdlang/src/ast.rs

The ClassDef struct contains:

  • name - The class name (e.g., "Point")
  • fields - List of field definitions
  • methods - List of method definitions

Each FieldDef has a name and type. Each MethodDef is like a function but with self implied.
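
The is_constructor and is_destructor predicates make it easy to pick the special methods out of a class. The sketch below uses trimmed-down definitions (parameters, return types, and bodies are left out), and the constructor and has_destructor helpers on ClassDef are assumptions for illustration, not methods from the Thirdlang sources.

```rust
/// Trimmed-down method definition (assumption: only the name is kept).
#[derive(Debug, Clone, PartialEq)]
pub struct MethodDef {
    pub name: String,
}

impl MethodDef {
    pub fn is_constructor(&self) -> bool {
        self.name == "__init__"
    }

    pub fn is_destructor(&self) -> bool {
        self.name == "__del__"
    }
}

/// Trimmed-down class definition (assumption: fields are omitted).
#[derive(Debug, Clone, PartialEq)]
pub struct ClassDef {
    pub name: String,
    pub methods: Vec<MethodDef>,
}

impl ClassDef {
    /// Hypothetical helper: the constructor, if the class defines one.
    pub fn constructor(&self) -> Option<&MethodDef> {
        self.methods.iter().find(|m| m.is_constructor())
    }

    /// Hypothetical helper: true when the class defines `__del__`.
    pub fn has_destructor(&self) -> bool {
        self.methods.iter().any(|m| m.is_destructor())
    }
}
```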

Statements with Delete

/// Statements in our language
#[derive(Debug, Clone, PartialEq)]
pub enum Stmt {
    /// Function definition with types
    Function {
        name: String,
        params: Vec<(String, Type)>,
        return_type: Type,
        body: Vec<Stmt>,
    },
    /// Return statement
    Return(TypedExpr),
    /// Assignment with optional type annotation
    Assignment {
        target: AssignTarget,
        type_ann: Option<Type>,
        value: TypedExpr,
    },
    /// Delete statement
    Delete(TypedExpr),
    /// Expression statement
    Expr(TypedExpr),
}

thirdlang/src/ast.rs

The Stmt enum gains a Delete variant for the delete statement.

Assignment Targets

Assignments can now target fields:

/// Assignment target - can be a variable or field access
#[derive(Debug, Clone, PartialEq)]
pub enum AssignTarget {
    /// Simple variable: x = ...
    Var(String),
    /// Field access: self.x = ... or obj.field = ...
    Field {
        object: Box<TypedExpr>,
        field: String,
    },
}

thirdlang/src/ast.rs

This allows both:

  • x = 10 - assign to variable
  • self.x = 10 - assign to field
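
To see how a dotted path ends up in one of these two variants, consider the sketch below. It takes the identifiers of a path such as self.pos.x as a slice of strings: the last segment becomes the field being written, and everything before it becomes the object expression. The types are simplified (Box<Expr> instead of Box<TypedExpr>) and target_from_path is a hypothetical helper.

```rust
/// Simplified expression type (assumption: untyped, few variants).
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Var(String),
    SelfRef,
    FieldAccess { object: Box<Expr>, field: String },
}

#[derive(Debug, Clone, PartialEq)]
pub enum AssignTarget {
    Var(String),
    Field { object: Box<Expr>, field: String },
}

/// Hypothetical helper: build an assignment target from the segments
/// of a dotted path. Returns `None` for an empty path.
pub fn target_from_path(path: &[&str]) -> Option<AssignTarget> {
    let (last, prefix) = path.split_last()?;
    let (first, middle) = match prefix.split_first() {
        Some(parts) => parts,
        // A single identifier is a plain variable assignment.
        None => return Some(AssignTarget::Var(last.to_string())),
    };
    let base = if *first == "self" {
        Expr::SelfRef
    } else {
        Expr::Var(first.to_string())
    };
    // Each intermediate segment wraps the object built so far.
    let object = middle.iter().fold(base, |object, field| Expr::FieldAccess {
        object: Box::new(object),
        field: field.to_string(),
    });
    Some(AssignTarget::Field {
        object: Box::new(object),
        field: last.to_string(),
    })
}
```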

New Expressions

/// Expressions in our language
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    /// Integer literal
    Int(i64),
    /// Boolean literal
    Bool(bool),
    /// Variable reference
    Var(String),
    /// Self reference (inside methods)
    SelfRef,
    /// Unary operation
    Unary { op: UnaryOp, expr: Box<TypedExpr> },
    /// Binary operation
    Binary {
        op: BinaryOp,
        left: Box<TypedExpr>,
        right: Box<TypedExpr>,
    },
    /// Function call
    Call { name: String, args: Vec<TypedExpr> },
    /// Method call: obj.method(args)
    MethodCall {
        object: Box<TypedExpr>,
        method: String,
        args: Vec<TypedExpr>,
    },
    /// Field access: obj.field
    FieldAccess {
        object: Box<TypedExpr>,
        field: String,
    },
    /// Object creation: new ClassName(args)
    New { class: String, args: Vec<TypedExpr> },
    /// Conditional
    If {
        cond: Box<TypedExpr>,
        then_branch: Vec<Stmt>,
        else_branch: Vec<Stmt>,
    },
    /// While loop
    While {
        cond: Box<TypedExpr>,
        body: Vec<Stmt>,
    },
    /// Block
    Block(Vec<Stmt>),
}

thirdlang/src/ast.rs

The Expr enum gains several new variants:

  • SelfRef - The self keyword
  • New - Object creation: new Point(1, 2)
  • FieldAccess - Reading a field: obj.x
  • MethodCall - Calling a method: obj.method(args)

Parser Implementation

Here is how we parse classes in Rust:

fn parse_class_def(pair: Pair<Rule>) -> Result<ClassDef, String> {
    let mut inner = pair.into_inner();

    let name = inner.next().unwrap().as_str().to_string();
    let body = inner.next().unwrap(); // ClassBody

    let mut fields = Vec::new();
    let mut methods = Vec::new();

    for item in body.into_inner() {
        match item.as_rule() {
            Rule::FieldDef => {
                fields.push(parse_field_def(item)?);
            }
            Rule::MethodDef => {
                methods.push(parse_method_def(item)?);
            }
            _ => {}
        }
    }

    Ok(ClassDef {
        name,
        fields,
        methods,
    })
}

fn parse_field_def(pair: Pair<Rule>) -> Result<FieldDef, String> {
    let mut inner = pair.into_inner();
    let name = inner.next().unwrap().as_str().to_string();
    let ty = parse_type(inner.next().unwrap())?;
    Ok(FieldDef { name, ty })
}

fn parse_method_def(pair: Pair<Rule>) -> Result<MethodDef, String> {
    let mut inner = pair.into_inner();

    let name = inner.next().unwrap().as_str().to_string();

    // Skip 'self' parameter
    inner.next(); // SelfParam

    let mut params: Vec<(String, Type)> = Vec::new();
    let mut return_type = Type::Unit;
    let mut body = Vec::new();

    for item in inner {
        match item.as_rule() {
            Rule::TypedParam => {
                let mut param_inner = item.into_inner();
                let param_name = param_inner.next().unwrap().as_str().to_string();
                let param_type = parse_type(param_inner.next().unwrap())?;
                params.push((param_name, param_type));
            }
            Rule::ReturnType => {
                let type_pair = item.into_inner().next().unwrap();
                return_type = parse_type(type_pair)?;
            }
            Rule::Block => {
                body = parse_block(item)?;
            }
            _ => {}
        }
    }

    Ok(MethodDef {
        name,
        params,
        return_type,
        body,
    })
}

thirdlang/src/parser.rs

The parser:

  1. Extracts the class name from the first child
  2. Iterates over the class body, sorting items into fields and methods
  3. For each method, skips the self parameter (it is implicit)
  4. Returns a ClassDef with all collected data

Parsing Example

Let us trace through parsing this class:

class Point {
    x: int
    y: int

    def __init__(self, x: int, y: int) {
        self.x = x
        self.y = y
    }

    def get_x(self) -> int {
        return self.x
    }
}

Step 1: Match ClassDef

The parser sees class, then:

  1. Identifier: Point
  2. {: Start of body
  3. ClassBody: Fields and methods
  4. }: End of body

Step 2: Parse ClassBody

Inside the body:

  1. FieldDef: x: int → FieldDef { name: "x", ty: Type::Int }
  2. FieldDef: y: int → FieldDef { name: "y", ty: Type::Int }
  3. MethodDef: def __init__(...)
  4. MethodDef: def get_x(...)

Step 3: Parse MethodDef

For def __init__(self, x: int, y: int):

  1. def: Method keyword
  2. Identifier: __init__
  3. (: Start parameters
  4. SelfParam: self
  5. MethodParams: , x: int, y: int
  6. ): End parameters
  7. No ReturnType: Defaults to Unit
  8. Block: { self.x = x; self.y = y }

Step 4: Parse Method Body

Inside { self.x = x; self.y = y }:

  1. Assignment: self.x = x
    • Target: AssignTarget::Field { object: SelfRef, field: "x" }
    • Value: Expr::Var("x")
  2. Assignment: self.y = y
    • Similar structure

Final AST

TopLevel::Class(ClassDef {
    name: "Point".to_string(),
    fields: vec![
        FieldDef { name: "x".to_string(), ty: Type::Int },
        FieldDef { name: "y".to_string(), ty: Type::Int },
    ],
    methods: vec![
        MethodDef {
            name: "__init__".to_string(),
            params: vec![
                // Note: 'self' is NOT stored in params - it's implicit
                // Only the parameters AFTER self are stored
                ("x".to_string(), Type::Int),
                ("y".to_string(), Type::Int),
            ],
            return_type: Type::Unit,
            body: vec![/* assignments */],
        },
        MethodDef {
            name: "get_x".to_string(),
            params: vec![],  // No params after self
            return_type: Type::Int,
            body: vec![/* return self.x */],
        },
    ],
})

Note: The self parameter is implicit in methods - it is not stored in the params list. The type checker knows every method receives self of the class type.
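
A later stage that needs the complete parameter list can put self back in front. The following sketch shows one way to do that; Type and MethodDef are reduced versions of the definitions above, and params_with_self is a hypothetical helper, not a function from the Thirdlang sources.

```rust
/// Reduced type enum (assumption: only the variants used here).
#[derive(Debug, Clone, PartialEq)]
pub enum Type {
    Int,
    Unit,
    Class(String),
}

/// Reduced method definition (assumption: the body is omitted).
#[derive(Debug, Clone, PartialEq)]
pub struct MethodDef {
    pub name: String,
    /// Parameters as stored by the parser, without `self`.
    pub params: Vec<(String, Type)>,
    pub return_type: Type,
}

/// Hypothetical helper: the stored parameters with the implicit
/// `self` of the enclosing class type prepended.
pub fn params_with_self(class_name: &str, method: &MethodDef) -> Vec<(String, Type)> {
    let self_param = ("self".to_string(), Type::Class(class_name.to_string()));
    std::iter::once(self_param)
        .chain(method.params.iter().cloned())
        .collect()
}
```

For the __init__ method of Point, this yields three entries: self of type Point, followed by x and y of type int.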

Type Information for Classes

Classes need metadata for type checking. We store this in ClassInfo:

/// Information about a class definition
#[derive(Debug, Clone)]
pub struct ClassInfo {
    /// Class name
    pub name: String,
    /// Fields: name -> type
    pub fields: HashMap<String, Type>,
    /// Field order (for memory layout)
    pub field_order: Vec<String>,
    /// Methods: name -> MethodInfo
    pub methods: HashMap<String, MethodInfo>,
    /// Whether the class has a destructor
    pub has_destructor: bool,
}

/// Information about a method
#[derive(Debug, Clone)]
pub struct MethodInfo {
    /// Method name
    pub name: String,
    /// Parameter names and types (excluding self)
    pub params: Vec<(String, Type)>,
    /// Return type
    pub return_type: Type,
    /// Is this the constructor?
    pub is_constructor: bool,
    /// Is this the destructor?
    pub is_destructor: bool,
}

thirdlang/src/types.rs

The ClassInfo struct tracks:

  • name - Class name
  • fields - Map from field name to type
  • field_order - Order of fields (for LLVM struct layout)
  • methods - Map from method name to MethodInfo
  • has_destructor - Whether __del__ exists

The MethodInfo struct tracks each method’s signature.
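
The sketch below shows how such a record might be filled in from a parsed class. It keeps field_order next to the fields map because a HashMap does not preserve insertion order, so the map alone could not supply the layout order. The structs are reduced versions of the ones above (the methods map is left out, and methods are represented by name only), and from_def and field_index are hypothetical helpers.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
pub enum Type {
    Int,
    Bool,
    Class(String),
}

pub struct FieldDef {
    pub name: String,
    pub ty: Type,
}

/// Reduced class definition (assumption: methods by name only).
pub struct ClassDef {
    pub name: String,
    pub fields: Vec<FieldDef>,
    pub method_names: Vec<String>,
}

/// Reduced `ClassInfo` without the `methods` map.
pub struct ClassInfo {
    pub name: String,
    pub fields: HashMap<String, Type>,
    pub field_order: Vec<String>,
    pub has_destructor: bool,
}

impl ClassInfo {
    /// Hypothetical constructor. Duplicate field names are not
    /// detected in this sketch.
    pub fn from_def(def: &ClassDef) -> ClassInfo {
        let mut fields = HashMap::new();
        let mut field_order = Vec::new();
        for field in &def.fields {
            fields.insert(field.name.clone(), field.ty.clone());
            field_order.push(field.name.clone());
        }
        ClassInfo {
            name: def.name.clone(),
            fields,
            field_order,
            has_destructor: def.method_names.iter().any(|m| m == "__del__"),
        }
    }

    /// Position of a field in declaration order, if it exists.
    pub fn field_index(&self, name: &str) -> Option<usize> {
        self.field_order.iter().position(|f| f == name)
    }
}
```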

The Type Enum

Our type system now includes class types:

/// Types in our language
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub enum Type {
    /// Integer type (64-bit signed)
    Int,
    /// Boolean type
    Bool,
    /// Class type (by name)
    Class(String),
    /// Function type: (param_types) -> return_type
    Function { params: Vec<Type>, ret: Box<Type> },
    /// Method type: (self_type, param_types) -> return_type
    Method {
        class: String,
        params: Vec<Type>,
        ret: Box<Type>,
    },
    /// Unit type (for statements with no value)
    Unit,
    /// Unknown type (for type inference)
    Unknown,
}

thirdlang/src/types.rs

The new Class(String) variant holds the class name. When we see Point as a type, we create Type::Class("Point".to_string()).
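
Because the grammar accepts any identifier as a class type, something has to confirm that the name refers to a real class. A minimal sketch of such a check is shown here; it assumes the set of known class names has already been collected, uses a reduced Type enum, and check_type_exists is a hypothetical function, not the type checker from the Thirdlang sources.

```rust
use std::collections::HashSet;

/// Reduced type enum (assumption: function and method types omitted).
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub enum Type {
    Int,
    Bool,
    Class(String),
    Unit,
}

/// Hypothetical check: a class type is valid only if its name is in
/// the set of known classes; every other type is accepted as is.
pub fn check_type_exists(ty: &Type, known_classes: &HashSet<String>) -> Result<(), String> {
    match ty {
        Type::Class(name) if !known_classes.contains(name) => {
            Err(format!("unknown class '{}'", name))
        }
        _ => Ok(()),
    }
}
```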

Comparison with Secondlang

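| Aspect           | Secondlang | Thirdlang            |
|------------------|------------|----------------------|
| Types            | int, bool  | int, bool, ClassName |
| Top-level        | Vec<Stmt>  | Vec<TopLevel>        |
| Functions only   | Yes        | Functions + Methods  |
| Field access     | No         | obj.field            |
| Method calls     | No         | obj.method()         |
| New expressions  | No         | new Class(args)      |
| Delete statement | No         | delete obj           |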

At this point, you should be able to:

  • Parse class Point { x: int } without errors
  • Parse methods with self parameter
  • Parse new Point(1, 2) expressions

In the next chapter, we look at constructors and object creation.