Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

Type Annotations

Now that we understand why we want types, let us see how to add them to our language. The good news: the grammar changes are minimal. Most of the work happens in the type checker.

What Changes in the Grammar?

Remember Firstlang’s function definition:

Function = { "def" ~ Identifier ~ "(" ~ Params? ~ ")" ~ Block }
Params = _{ Identifier ~ ("," ~ Identifier)* }

A function like def add(a, b) { ... } has parameters a and b, but we do not know their types.

In Secondlang, we add type annotations:

Function = { "def" ~ Identifier ~ "(" ~ TypedParams? ~ ")" ~ ReturnType? ~ Block }
TypedParams = _{ TypedParam ~ ("," ~ TypedParam)* }
TypedParam = { Identifier ~ ":" ~ Type }
ReturnType = { "->" ~ Type }

Now we write def add(a: int, b: int) -> int { ... }. The changes are:

  1. Parameters become name: type instead of just name
  2. Functions can have a return type -> type

This syntax is similar to type hints in Python, TypeScript, and Rust.

We also need to define what a Type is:

Type = { IntType | BoolType }
IntType = { "int" }
BoolType = { "bool" }

That is it. Just these few lines enable static typing in our language.

The Complete Grammar

Here is the full Secondlang grammar. Notice that most of it is identical to Firstlang. Only the rules involving declarations changed:

// Secondlang Grammar - A typed Python-like language
// Same syntax as Firstlang, but with type annotations

WHITESPACE = _{ " " | "\t" | NEWLINE }
COMMENT = _{ "#" ~ (!NEWLINE ~ ANY)* }

// Keywords
KEYWORD = @{ "def" | "if" | "else" | "while" | "true" | "false" | "return" | "int" | "bool" }

// Identifiers (variable/function names)
Identifier = @{ !KEYWORD ~ (ASCII_ALPHA | "_") ~ (ASCII_ALPHANUMERIC | "_")* }

// Types
Type = { IntType | BoolType }
IntType = { "int" }
BoolType = { "bool" }

// Program is a sequence of statements/expressions
Program = _{ SOI ~ Stmt* ~ EOI }

// Statements
Stmt = { Function | SimpleStmt }
SimpleStmt = _{ (Return | Assignment | Expr) }

// Function definition with types: def name(x: int, y: int) -> int { body }
Function = { "def" ~ Identifier ~ "(" ~ TypedParams? ~ ")" ~ ReturnType? ~ Block }
TypedParams = _{ TypedParam ~ ("," ~ TypedParam)* }
TypedParam = { Identifier ~ ":" ~ Type }
ReturnType = { "->" ~ Type }

// Block: { statements }
Block = { "{" ~ Stmt* ~ "}" }

// Return statement
Return = { "return" ~ Expr }

// Assignment with optional type: x: int = 10  or  x = 10
Assignment = { Identifier ~ (":" ~ Type)? ~ "=" ~ Expr }

// Expressions (ordered by precedence - lowest to highest)
Expr = { Conditional | WhileLoop | Comparison }

// if (cond) { ... } else { ... }
Conditional = { "if" ~ "(" ~ Expr ~ ")" ~ Block ~ "else" ~ Block }

// while (cond) { ... }
WhileLoop = { "while" ~ "(" ~ Expr ~ ")" ~ Block }

// Comparison operators
Comparison = { Additive ~ (CompOp ~ Additive)* }
CompOp = { "<=" | ">=" | "<" | ">" | "==" | "!=" }

// Addition/Subtraction
Additive = { Multiplicative ~ (AddOp ~ Multiplicative)* }
AddOp = { "+" | "-" }

// Multiplication/Division
Multiplicative = { Unary ~ (MulOp ~ Unary)* }
MulOp = { "*" | "/" | "%" }

// Unary operators
Unary = { UnaryOp ~ Unary | Call }
UnaryOp = { "-" | "!" }

// Function call: name(args)
Call = { Primary ~ CallArgs* }
CallArgs = { "(" ~ Args? ~ ")" }
Args = _{ Expr ~ ("," ~ Expr)* }

// Primary expressions
Primary = _{ Literal | Identifier | "(" ~ Expr ~ ")" }

// Literals
Literal = { Bool | Int }
Int = @{ ASCII_DIGIT+ }
Bool = @{ "true" | "false" }

secondlang/src/grammar.pest

Take a moment to compare this with Firstlang’s grammar. The expression rules (Expr, Comparison, Additive, etc.) are exactly the same. The only differences are in Function, TypedParam, ReturnType, Type, and Assignment (which now optionally accepts a type annotation). If the pest syntax looks unfamiliar, review the PEG and pest Syntax section.

The Typed AST

In Firstlang, expressions were simple:

enum Expr {
    Int(i64),
    Bool(bool),
    Var(String),
    Binary { op: BinaryOp, left: Box<Expr>, right: Box<Expr> },
    // ...
}

In Secondlang, every expression carries its type. This is sometimes called a decorated AST or annotated AST:

/// A typed expression: expression + its inferred type
#[derive(Debug, Clone, PartialEq)]
pub struct TypedExpr {
    pub expr: Expr,
    pub ty: Type,
}

impl TypedExpr {
    pub fn new(expr: Expr, ty: Type) -> Self {
        TypedExpr { expr, ty }
    }

    pub fn unknown(expr: Expr) -> Self {
        TypedExpr {
            expr,
            ty: Type::Unknown,
        }
    }
}

secondlang/src/ast.rs

The TypedExpr struct wraps an expression with its type. Let us understand the two constructors:

  • TypedExpr::new(expr, ty) - creates an expression with a known type
  • TypedExpr::unknown(expr) - creates an expression with Type::Unknown

When we parse 1 + 2, we do not yet know the type of the result. So we create:

TypedExpr::unknown(Expr::Binary {
    op: BinaryOp::Add,
    left: Box::new(TypedExpr::unknown(Expr::Int(1))),
    right: Box::new(TypedExpr::unknown(Expr::Int(2))),
})

All the types are Unknown. The type checker (next chapter) will fill them in.

Statements with Types

Statements also include type information:

/// Statements in our language
#[derive(Debug, Clone, PartialEq)]
pub enum Stmt {
    /// Function definition with types
    Function {
        name: String,
        params: Vec<(String, Type)>,
        return_type: Type,
        body: Vec<Stmt>,
    },
    /// Return statement
    Return(TypedExpr),
    /// Assignment with optional type annotation
    Assignment {
        name: String,
        type_ann: Option<Type>,
        value: TypedExpr,
    },
    /// Expression statement
    Expr(TypedExpr),
}

secondlang/src/ast.rs

Notice the differences from Firstlang:

  • Function now has params: Vec<(String, Type)> instead of params: Vec<String>
  • Function now has a return_type: Type field
  • Assignment now has an optional type_ann: Option<Type> for explicit type annotations

The Expression Enum

/// Expressions in our language
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    /// Integer literal
    Int(i64),
    /// Boolean literal
    Bool(bool),
    /// Variable reference
    Var(String),
    /// Unary operation
    Unary { op: UnaryOp, expr: Box<TypedExpr> },
    /// Binary operation
    Binary {
        op: BinaryOp,
        left: Box<TypedExpr>,
        right: Box<TypedExpr>,
    },
    /// Function call
    Call { name: String, args: Vec<TypedExpr> },
    /// Conditional
    If {
        cond: Box<TypedExpr>,
        then_branch: Vec<Stmt>,
        else_branch: Vec<Stmt>,
    },
    /// While loop
    While {
        cond: Box<TypedExpr>,
        body: Vec<Stmt>,
    },
    /// Block
    Block(Vec<Stmt>),
}

secondlang/src/ast.rs

This looks almost identical to Firstlang. The key difference is that child expressions are Box<TypedExpr> instead of Box<Expr>. Every sub-expression carries its type.

Parsing Example

Let us trace through parsing def add(a: int, b: int) -> int { return a + b }:

  1. Function rule matches - we have def, an identifier, parameters, return type, and a block

  2. Parse function name - Identifier matches add

  3. Parse parameters - TypedParams matches a: int, b: int

    • First TypedParam: a: int("a", Type::Int)
    • Second TypedParam: b: int("b", Type::Int)
  4. Parse return type - ReturnType matches -> intType::Int

  5. Parse body - Block matches { return a + b }

    • Return statement with expression a + b
    • The expression is parsed as TypedExpr::unknown(Expr::Binary { ... })

The final AST looks like:

Stmt::Function {
    name: "add".to_string(),
    params: vec![
        ("a".to_string(), Type::Int),
        ("b".to_string(), Type::Int),
    ],
    return_type: Type::Int,
    body: vec![
        Stmt::Return(TypedExpr {
            expr: Expr::Binary {
                op: BinaryOp::Add,
                left: Box::new(TypedExpr { expr: Expr::Var("a"), ty: Type::Unknown }),
                right: Box::new(TypedExpr { expr: Expr::Var("b"), ty: Type::Unknown }),
            },
            ty: Type::Unknown,  // <-- filled in by type checker
        })
    ],
}

Notice that all expression types are Unknown. The type checker will walk through this AST and fill in Type::Int everywhere.

In the next chapter, we will implement the type checker.