Type Annotations
Now that we understand why we want types, let us see how to add them to our language. The good news: the grammar changes are minimal. Most of the work happens in the type checker.
What Changes in the Grammar?
Remember Firstlang’s function definition:
Function = { "def" ~ Identifier ~ "(" ~ Params? ~ ")" ~ Block }
Params = _{ Identifier ~ ("," ~ Identifier)* }
A function like def add(a, b) { ... } has parameters a and b, but we do not know their types.
In Secondlang, we add type annotations:
Function = { "def" ~ Identifier ~ "(" ~ TypedParams? ~ ")" ~ ReturnType? ~ Block }
TypedParams = _{ TypedParam ~ ("," ~ TypedParam)* }
TypedParam = { Identifier ~ ":" ~ Type }
ReturnType = { "->" ~ Type }
Now we write def add(a: int, b: int) -> int { ... }. The changes are:
- Parameters become
name: typeinstead of justname - Functions can have a return type
-> type
This syntax is similar to type hints in Python, TypeScript, and Rust.
We also need to define what a Type is:
Type = { IntType | BoolType }
IntType = { "int" }
BoolType = { "bool" }
That is it. Just these few lines enable static typing in our language.
The Complete Grammar
Here is the full Secondlang grammar. Notice that most of it is identical to Firstlang. Only the rules involving declarations changed:
// Secondlang Grammar - A typed Python-like language
// Same syntax as Firstlang, but with type annotations
WHITESPACE = _{ " " | "\t" | NEWLINE }
COMMENT = _{ "#" ~ (!NEWLINE ~ ANY)* }
// Keywords
KEYWORD = @{ "def" | "if" | "else" | "while" | "true" | "false" | "return" | "int" | "bool" }
// Identifiers (variable/function names)
Identifier = @{ !KEYWORD ~ (ASCII_ALPHA | "_") ~ (ASCII_ALPHANUMERIC | "_")* }
// Types
Type = { IntType | BoolType }
IntType = { "int" }
BoolType = { "bool" }
// Program is a sequence of statements/expressions
Program = _{ SOI ~ Stmt* ~ EOI }
// Statements
Stmt = { Function | SimpleStmt }
SimpleStmt = _{ (Return | Assignment | Expr) }
// Function definition with types: def name(x: int, y: int) -> int { body }
Function = { "def" ~ Identifier ~ "(" ~ TypedParams? ~ ")" ~ ReturnType? ~ Block }
TypedParams = _{ TypedParam ~ ("," ~ TypedParam)* }
TypedParam = { Identifier ~ ":" ~ Type }
ReturnType = { "->" ~ Type }
// Block: { statements }
Block = { "{" ~ Stmt* ~ "}" }
// Return statement
Return = { "return" ~ Expr }
// Assignment with optional type: x: int = 10 or x = 10
Assignment = { Identifier ~ (":" ~ Type)? ~ "=" ~ Expr }
// Expressions (ordered by precedence - lowest to highest)
Expr = { Conditional | WhileLoop | Comparison }
// if (cond) { ... } else { ... }
Conditional = { "if" ~ "(" ~ Expr ~ ")" ~ Block ~ "else" ~ Block }
// while (cond) { ... }
WhileLoop = { "while" ~ "(" ~ Expr ~ ")" ~ Block }
// Comparison operators
Comparison = { Additive ~ (CompOp ~ Additive)* }
CompOp = { "<=" | ">=" | "<" | ">" | "==" | "!=" }
// Addition/Subtraction
Additive = { Multiplicative ~ (AddOp ~ Multiplicative)* }
AddOp = { "+" | "-" }
// Multiplication/Division
Multiplicative = { Unary ~ (MulOp ~ Unary)* }
MulOp = { "*" | "/" | "%" }
// Unary operators
Unary = { UnaryOp ~ Unary | Call }
UnaryOp = { "-" | "!" }
// Function call: name(args)
Call = { Primary ~ CallArgs* }
CallArgs = { "(" ~ Args? ~ ")" }
Args = _{ Expr ~ ("," ~ Expr)* }
// Primary expressions
Primary = _{ Literal | Identifier | "(" ~ Expr ~ ")" }
// Literals
Literal = { Bool | Int }
Int = @{ ASCII_DIGIT+ }
Bool = @{ "true" | "false" }
Take a moment to compare this with Firstlang’s grammar. The expression rules (Expr, Comparison, Additive, etc.) are exactly the same. The only differences are in Function, TypedParam, ReturnType, Type, and Assignment (which now optionally accepts a type annotation). If the pest syntax looks unfamiliar, review the PEG and pest Syntax section.
The Typed AST
In Firstlang, expressions were simple:
enum Expr {
Int(i64),
Bool(bool),
Var(String),
Binary { op: BinaryOp, left: Box<Expr>, right: Box<Expr> },
// ...
}
In Secondlang, every expression carries its type. This is sometimes called a decorated AST or annotated AST:
/// A typed expression: expression + its inferred type
#[derive(Debug, Clone, PartialEq)]
pub struct TypedExpr {
pub expr: Expr,
pub ty: Type,
}
impl TypedExpr {
pub fn new(expr: Expr, ty: Type) -> Self {
TypedExpr { expr, ty }
}
pub fn unknown(expr: Expr) -> Self {
TypedExpr {
expr,
ty: Type::Unknown,
}
}
}
The TypedExpr struct wraps an expression with its type. Let us understand the two constructors:
TypedExpr::new(expr, ty)- creates an expression with a known typeTypedExpr::unknown(expr)- creates an expression withType::Unknown
When we parse 1 + 2, we do not yet know the type of the result. So we create:
TypedExpr::unknown(Expr::Binary {
op: BinaryOp::Add,
left: Box::new(TypedExpr::unknown(Expr::Int(1))),
right: Box::new(TypedExpr::unknown(Expr::Int(2))),
})
All the types are Unknown. The type checker (next chapter) will fill them in.
Statements with Types
Statements also include type information:
/// Statements in our language
#[derive(Debug, Clone, PartialEq)]
pub enum Stmt {
/// Function definition with types
Function {
name: String,
params: Vec<(String, Type)>,
return_type: Type,
body: Vec<Stmt>,
},
/// Return statement
Return(TypedExpr),
/// Assignment with optional type annotation
Assignment {
name: String,
type_ann: Option<Type>,
value: TypedExpr,
},
/// Expression statement
Expr(TypedExpr),
}
Notice the differences from Firstlang:
Functionnow hasparams: Vec<(String, Type)>instead ofparams: Vec<String>Functionnow has areturn_type: TypefieldAssignmentnow has an optionaltype_ann: Option<Type>for explicit type annotations
The Expression Enum
/// Expressions in our language
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
/// Integer literal
Int(i64),
/// Boolean literal
Bool(bool),
/// Variable reference
Var(String),
/// Unary operation
Unary { op: UnaryOp, expr: Box<TypedExpr> },
/// Binary operation
Binary {
op: BinaryOp,
left: Box<TypedExpr>,
right: Box<TypedExpr>,
},
/// Function call
Call { name: String, args: Vec<TypedExpr> },
/// Conditional
If {
cond: Box<TypedExpr>,
then_branch: Vec<Stmt>,
else_branch: Vec<Stmt>,
},
/// While loop
While {
cond: Box<TypedExpr>,
body: Vec<Stmt>,
},
/// Block
Block(Vec<Stmt>),
}
This looks almost identical to Firstlang. The key difference is that child expressions are Box<TypedExpr> instead of Box<Expr>. Every sub-expression carries its type.
Parsing Example
Let us trace through parsing def add(a: int, b: int) -> int { return a + b }:
-
Functionrule matches - we havedef, an identifier, parameters, return type, and a block -
Parse function name -
Identifiermatchesadd -
Parse parameters -
TypedParamsmatchesa: int, b: int- First
TypedParam:a: int→("a", Type::Int) - Second
TypedParam:b: int→("b", Type::Int)
- First
-
Parse return type -
ReturnTypematches-> int→Type::Int -
Parse body -
Blockmatches{ return a + b }Returnstatement with expressiona + b- The expression is parsed as
TypedExpr::unknown(Expr::Binary { ... })
The final AST looks like:
Stmt::Function {
name: "add".to_string(),
params: vec![
("a".to_string(), Type::Int),
("b".to_string(), Type::Int),
],
return_type: Type::Int,
body: vec![
Stmt::Return(TypedExpr {
expr: Expr::Binary {
op: BinaryOp::Add,
left: Box::new(TypedExpr { expr: Expr::Var("a"), ty: Type::Unknown }),
right: Box::new(TypedExpr { expr: Expr::Var("b"), ty: Type::Unknown }),
},
ty: Type::Unknown, // <-- filled in by type checker
})
],
}
Notice that all expression types are Unknown. The type checker will walk through this AST and fill in Type::Int everywhere.
In the next chapter, we will implement the type checker.