Materials in this book are distributed under the terms of Creative Commons BY-NC-SA 4.0

Motivations and Goals

This book arose from my frustration at not finding modern, clear, and concise teaching materials that are accessible to beginners like me who want to learn how to create their own programming language.

“If you don’t know how compilers work, then you don’t know how computers work” 1

“If you can’t explain something in simple terms, you don’t understand it” 2

Pedagogically, one of the most effective methods of teaching is co-creating interactively. Introducing the core aspects around the simplest example (here, our calculator language) helps build knowledge and confidence. We use mature technologies instead of reinventing the wheel.


Getting Started

This book assumes basic knowledge of Rust. If you’re new to Rust, start with the official Rust book.

The code and materials are available on GitHub. To follow along:

git clone https://github.com/ehsanmok/create-your-own-lang-with-rust
cd create-your-own-lang-with-rust

Calculator and Firstlang (stable Rust)

These projects work with stable Rust 1.70+ and need no system dependencies beyond the Rust toolchain:

# Calculator - interpreter mode
cd calculator
cargo run --bin main examples/simple.calc

# Calculator - VM mode
cargo run --bin main --features vm examples/simple.calc

# Firstlang - interpreter
cd ../firstlang
cargo run -- examples/fibonacci.fl
cargo run  # REPL

Secondlang and Thirdlang (nightly Rust + LLVM)

These projects require nightly Rust and LLVM for JIT compilation:

# Install nightly Rust
rustup toolchain install nightly

# Install LLVM (macOS)
brew install llvm

# Install LLVM (Debian/Ubuntu) - see https://apt.llvm.org/

Check your LLVM version with llvm-config --version and update the inkwell dependency in Cargo.toml to match:

| LLVM version | inkwell feature |
|--------------|-----------------|
| 20.x         | llvm20-1        |
| 19.x         | llvm19-1        |
| 18.x         | llvm18-1        |

For example, with LLVM 20:

inkwell = { version = "0.7", features = ["llvm20-1"] }

Then run the examples with the nightly toolchain:

# Secondlang
cd secondlang
rustup run nightly cargo run -- examples/fibonacci.sl
rustup run nightly cargo run -- --ir examples/fibonacci.sl  # view LLVM IR

# Thirdlang
cd ../thirdlang
rustup run nightly cargo run --bin thirdlang -- examples/point.tl
rustup run nightly cargo run --bin thirdlang -- examples/counter.tl

Learning Progression

We build four languages, each building on concepts from the previous one:

| Language   | Grammar   | New Concepts                                      | Execution                |
|------------|-----------|---------------------------------------------------|--------------------------|
| Calculator | 18 lines  | PEG basics, AST, operators                        | Interpreter, VM, JIT     |
| Firstlang  | 70 lines  | Variables, functions, control flow, recursion     | Tree-walking interpreter |
| Secondlang | 77 lines  | Types, type inference, optimization passes        | LLVM JIT compilation     |
| Thirdlang  | 140 lines | Classes, methods, constructors, memory management | LLVM JIT compilation     |

Part I: Calculator

We start with the simplest possible language: integer arithmetic with + and -. The grammar fits in 18 lines:

Program = _{ SOI ~ Expr ~ EOF }
Expr = { UnaryExpr | BinaryExpr | Term }
Term = _{Int | "(" ~ Expr ~ ")" }
...

This minimal language lets us focus on the fundamentals: what is a grammar? How does pest generate a parser? What is an AST? We explore three different backends (interpreter, bytecode VM, JIT) to show that the same AST can be executed in multiple ways.
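To make the "one AST, several backends" idea concrete, here is a minimal sketch in plain Rust. It is not the book's implementation: the names `Expr`, `Op`, `eval`, `compile`, and `run` are hypothetical, and the AST is built by hand rather than parsed by pest. It only illustrates how a tree-walking interpreter and a small stack-based VM can consume the same tree.

```rust
/// Hypothetical AST for integer arithmetic with unary minus and binary + and -.
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Int(i64),
    Neg(Box<Expr>),
    Add(Box<Expr>, Box<Expr>),
    Sub(Box<Expr>, Box<Expr>),
}

/// Backend 1: a tree-walking interpreter that evaluates the AST directly.
pub fn eval(expr: &Expr) -> i64 {
    match expr {
        Expr::Int(n) => *n,
        Expr::Neg(e) => -eval(e),
        Expr::Add(l, r) => eval(l) + eval(r),
        Expr::Sub(l, r) => eval(l) - eval(r),
    }
}

/// Instructions for a tiny stack machine (an assumed instruction set).
#[derive(Debug, Clone, PartialEq)]
pub enum Op {
    Push(i64),
    Neg,
    Add,
    Sub,
}

/// Backend 2, step 1: compile the AST to bytecode in post-order,
/// so operands are pushed before the operator that consumes them.
pub fn compile(expr: &Expr, code: &mut Vec<Op>) {
    match expr {
        Expr::Int(n) => code.push(Op::Push(*n)),
        Expr::Neg(e) => {
            compile(e, code);
            code.push(Op::Neg);
        }
        Expr::Add(l, r) => {
            compile(l, code);
            compile(r, code);
            code.push(Op::Add);
        }
        Expr::Sub(l, r) => {
            compile(l, code);
            compile(r, code);
            code.push(Op::Sub);
        }
    }
}

/// Backend 2, step 2: execute the bytecode on an operand stack.
/// Returns `None` if the bytecode underflows the stack.
pub fn run(code: &[Op]) -> Option<i64> {
    let mut stack: Vec<i64> = Vec::new();
    for op in code {
        match op {
            Op::Push(n) => stack.push(*n),
            Op::Neg => {
                let v = stack.pop()?;
                stack.push(-v);
            }
            Op::Add => {
                let r = stack.pop()?;
                let l = stack.pop()?;
                stack.push(l + r);
            }
            Op::Sub => {
                let r = stack.pop()?;
                let l = stack.pop()?;
                stack.push(l - r);
            }
        }
    }
    stack.pop()
}
```

For the expression `1 + (2 - 5)`, both `eval` on the tree and `run` on the compiled bytecode should agree on the result; only the execution strategy differs.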

Part II: Firstlang

With the basics understood, we add the features that make up a real programming language. The grammar grows to 70 lines:

// Statements instead of just expressions
Stmt = { Function | Return | Assignment | Expr }

// Functions with parameters
Function = { "def" ~ Identifier ~ "(" ~ Params? ~ ")" ~ Block }

// Control flow
Conditional = { "if" ~ "(" ~ Expr ~ ")" ~ Block ~ "else" ~ Block }
WhileLoop = { "while" ~ "(" ~ Expr ~ ")" ~ Block }

We focus on a single backend (tree-walking interpreter) to deeply understand scoping, call stacks, and recursion. The culminating example is computing Fibonacci recursively.
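As a rough illustration of how scoping and recursion fall out of a tree-walking interpreter, here is a simplified sketch. It is an assumption-laden stand-in, not Firstlang itself: the `Expr`, `Function`, and `Interpreter` types are hypothetical, the language is expression-only, and the AST is built by hand. Each call evaluates the body in a fresh environment, and the host (Rust) call stack plays the role of the interpreted call stack.

```rust
use std::collections::HashMap;

/// Hypothetical expression-only AST (no statements or loops).
#[derive(Debug, Clone)]
pub enum Expr {
    Int(i64),
    Var(String),
    Add(Box<Expr>, Box<Expr>),
    Sub(Box<Expr>, Box<Expr>),
    Less(Box<Expr>, Box<Expr>),
    If(Box<Expr>, Box<Expr>, Box<Expr>),
    Call(String, Vec<Expr>),
}

pub struct Function {
    pub params: Vec<String>,
    pub body: Expr,
}

pub struct Interpreter {
    pub functions: HashMap<String, Function>,
}

impl Interpreter {
    /// Evaluate `expr` with `env` as the current local scope.
    pub fn eval(&self, expr: &Expr, env: &HashMap<String, i64>) -> Result<i64, String> {
        match expr {
            Expr::Int(n) => Ok(*n),
            Expr::Var(name) => env
                .get(name)
                .copied()
                .ok_or_else(|| format!("undefined variable '{name}'")),
            Expr::Add(l, r) => Ok(self.eval(l, env)? + self.eval(r, env)?),
            Expr::Sub(l, r) => Ok(self.eval(l, env)? - self.eval(r, env)?),
            // Comparisons yield 1 or 0 in this sketch (no separate bool type).
            Expr::Less(l, r) => Ok((self.eval(l, env)? < self.eval(r, env)?) as i64),
            Expr::If(cond, then, otherwise) => {
                if self.eval(cond, env)? != 0 {
                    self.eval(then, env)
                } else {
                    self.eval(otherwise, env)
                }
            }
            Expr::Call(name, args) => {
                let f = self
                    .functions
                    .get(name)
                    .ok_or_else(|| format!("undefined function '{name}'"))?;
                if f.params.len() != args.len() {
                    return Err(format!("'{name}' expects {} argument(s)", f.params.len()));
                }
                // Arguments are evaluated in the caller's scope; the callee
                // gets a fresh frame that contains only its parameters.
                let mut frame = HashMap::new();
                for (param, arg) in f.params.iter().zip(args) {
                    frame.insert(param.clone(), self.eval(arg, env)?);
                }
                // Recursion reuses the Rust call stack as the call stack.
                self.eval(&f.body, &frame)
            }
        }
    }
}

/// Builds an interpreter that knows one function:
/// fib(n) = if n < 2 then n else fib(n - 1) + fib(n - 2)
pub fn fib_interpreter() -> Interpreter {
    let n = || Expr::Var("n".to_string());
    let call_fib = |k: i64| {
        Expr::Call(
            "fib".to_string(),
            vec![Expr::Sub(Box::new(n()), Box::new(Expr::Int(k)))],
        )
    };
    let body = Expr::If(
        Box::new(Expr::Less(Box::new(n()), Box::new(Expr::Int(2)))),
        Box::new(n()),
        Box::new(Expr::Add(Box::new(call_fib(1)), Box::new(call_fib(2)))),
    );
    let mut functions = HashMap::new();
    functions.insert(
        "fib".to_string(),
        Function { params: vec!["n".to_string()], body },
    );
    Interpreter { functions }
}
```

Evaluating `Expr::Call("fib".to_string(), vec![Expr::Int(10)])` against an empty environment walks the same body once per recursive call, each time with its own binding for `n`.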

Part III: Secondlang

We add static types and compile to native code. The grammar changes are minimal (just 7 more lines), but the compiler grows significantly:

// Type annotations
Type = { IntType | BoolType }
TypedParam = { Identifier ~ ":" ~ Type }
ReturnType = { "->" ~ Type }

// Functions now have types
Function = { "def" ~ Identifier ~ "(" ~ TypedParams? ~ ")" ~ ReturnType? ~ Block }

Types are primarily a semantic addition, not a syntactic one. The grammar changes are small, but we need new compiler phases (type checking, type inference) and can now generate efficient native code via LLVM.
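To show what a "new compiler phase" can look like, here is a minimal type-checking sketch over a hand-built AST. The `Type`, `Expr`, `type_of`, and `expect` names are hypothetical and much smaller than what Secondlang needs (no variables, functions, or inference); the point is only that checking is a separate pass over the tree that runs before any code generation.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Type {
    Int,
    Bool,
}

/// Hypothetical typed-expression AST, reduced to a few node kinds.
#[derive(Debug, Clone)]
pub enum Expr {
    Int(i64),
    Bool(bool),
    Add(Box<Expr>, Box<Expr>),
    Less(Box<Expr>, Box<Expr>),
    If(Box<Expr>, Box<Expr>, Box<Expr>),
}

/// Compute the type of an expression, or report the first mismatch found.
pub fn type_of(expr: &Expr) -> Result<Type, String> {
    match expr {
        Expr::Int(_) => Ok(Type::Int),
        Expr::Bool(_) => Ok(Type::Bool),
        Expr::Add(l, r) => {
            // int + int -> int
            expect(l, Type::Int)?;
            expect(r, Type::Int)?;
            Ok(Type::Int)
        }
        Expr::Less(l, r) => {
            // int < int -> bool
            expect(l, Type::Int)?;
            expect(r, Type::Int)?;
            Ok(Type::Bool)
        }
        Expr::If(cond, then, otherwise) => {
            // The condition must be bool and both branches must agree.
            expect(cond, Type::Bool)?;
            let then_ty = type_of(then)?;
            expect(otherwise, then_ty)?;
            Ok(then_ty)
        }
    }
}

/// Check that `expr` has type `want`.
fn expect(expr: &Expr, want: Type) -> Result<(), String> {
    let got = type_of(expr)?;
    if got == want {
        Ok(())
    } else {
        Err(format!("expected {want:?}, found {got:?}"))
    }
}
```

A well-typed tree such as `if 1 < 2 { 10 } else { 1 + 2 }` checks as `Int`, while `1 + true` is rejected before any code is generated.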

Part IV: Thirdlang

Finally, we add object-oriented programming with classes, methods, and memory management. The grammar grows to 140 lines:

// Class definitions
ClassDef = { "class" ~ Identifier ~ "{" ~ ClassBody ~ "}" }
FieldDef = { Identifier ~ ":" ~ Type }
MethodDef = { "def" ~ Identifier ~ "(" ~ SelfParam ~ ... ~ ")" ~ ... ~ Block }

// Object operations
NewExpr = { "new" ~ Identifier ~ "(" ~ Args? ~ ")" }
Delete = { "delete" ~ Expr }

This introduces heap allocation (malloc/free), struct types in LLVM, and the self parameter for methods. We see how OOP features map to lower-level constructs.
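The mapping from OOP features to lower-level constructs can be sketched in Rust itself. This is an analogy under stated assumptions, not Thirdlang's generated code: the `Point` class with fields `x` and `y` is hypothetical, and Rust's `Box` allocator stands in for the malloc/free calls mentioned above. A class becomes a struct, `new` becomes a heap allocation that returns a pointer, a method becomes a plain function whose first argument is `self`, and `delete` frees the allocation.

```rust
/// The fields of a (hypothetical) `Point` class become a plain struct.
/// `repr(C)` fixes the field order, similar to an LLVM struct type.
#[repr(C)]
pub struct Point {
    pub x: i64,
    pub y: i64,
}

/// `new Point(x, y)`: allocate on the heap and hand back a raw pointer.
/// `Box` stands in for malloc here.
pub fn point_new(x: i64, y: i64) -> *mut Point {
    Box::into_raw(Box::new(Point { x, y }))
}

/// A method that reads fields: an ordinary function taking `self` first.
///
/// # Safety
/// `this` must come from `point_new` and must not have been deleted.
pub unsafe fn point_sum(this: *const Point) -> i64 {
    unsafe { (*this).x + (*this).y }
}

/// A method that mutates fields through the `self` pointer.
///
/// # Safety
/// `this` must come from `point_new` and must not have been deleted.
pub unsafe fn point_translate(this: *mut Point, dx: i64, dy: i64) {
    unsafe {
        (*this).x += dx;
        (*this).y += dy;
    }
}

/// `delete p`: give the allocation back. Rebuilding the `Box` and dropping
/// it stands in for free.
///
/// # Safety
/// `this` must come from `point_new` and must be deleted at most once.
pub unsafe fn point_delete(this: *mut Point) {
    unsafe { drop(Box::from_raw(this)) }
}
```

A call like `p.translate(1, 1)` in the source language then corresponds to `point_translate(p, 1, 1)`, which is why manual `new`/`delete` carries the usual risks of leaks and use-after-free.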


Outline


Support
If you found this book useful, please consider donating to Child Foundation, Black Lives Matter, or Food Bank of Canada.