Abstract Syntax Tree (AST)

AST comes into picture when we want to go from the string representation of our program like "-1" or "1 + 2" to something more manageable and easier to work with. Since our program is not a random string (the grammar is for), we can use the structure within the expressions "-1" and "1 + 2" to our own advantage and come up with a new representation like a tree


ast

One thing to note here is that the kinds of the nodes in our tree are not the same i.e. + node is different from 1 node. In fact, + has an Operator type and 1 is an integer Int type


ast

so we define our AST nodes as


#![allow(unused)]
fn main() {
pub enum Operator {
    Plus,
    Minus,
}

pub enum Node {
    Int(i32),
}
}

Referring back to our grammar, we actually have different kinds of recursive expressions;

  • unary grammar
    UnaryExpr = { Operator ~ Term }
    
  • binary grammar
    BinaryExpr = { Term ~ (Operator ~ Term)* }
    

So for example, the expression "-1 + (2 + 3)" has this recursive structure


compiler

To include those into our AST to make it an actual tree data structure, we complete our AST definition as follows


#![allow(unused)]
fn main() {
pub enum Operator {
    Plus,
    Minus,
}

pub enum Node {
    Int(i32),
    UnaryExpr {
        op: Operator,
        child: Box<Node>,
    },
    BinaryExpr {
        op: Operator,
        lhs: Box<Node>,
        rhs: Box<Node>,
    },
}
}

Filename: calculator/src/ast.rs

Now, we can use the pest generated CalcParser::parse to map the Rules of our Calc language string to our AST.


pub fn parse(source: &str) -> std::result::Result<Vec<Node>, pest::error::Error<Rule>> {
    let mut ast = vec![];
    let pairs = CalcParser::parse(Rule::Program, source)?;
    for pair in pairs {
        if let Rule::Expr = pair.as_rule() {
            ast.push(build_ast_from_expr(pair));
        }
    }
    Ok(ast)
}

Checkout calculator/src/parser.rs.

Note that CalcParser::parse takes care of the AST traversal and correctly maps it to Vec<Node> for easier access in later stages of compilation.

Interpreter

CPU is the ultimate interpreter. That is, it executes opcodes as it goes. To do that, after we have changed the representation (aka lowered the representation) of our source code &str to AST Node, a basic interpreter looks at each node of the AST (via any tree traversal methods) and simply evaluates it recursively

    pub fn eval(&self, node: &Node) -> i32 {
        match node {
            Node::Int(n) => *n,
            Node::UnaryExpr { op, child } => {
                let child = self.eval(&child);
                match op {
                    Operator::Plus => child,
                    Operator::Minus => -child,
                }
            }
            Node::BinaryExpr { op, lhs, rhs } => {
                let lhs_ret = self.eval(&lhs);
                let rhs_ret = self.eval(&rhs);

                match op {
                    Operator::Plus => lhs_ret + rhs_ret,
                    Operator::Minus => lhs_ret - rhs_ret,
                }
            }
        }
    }

To sum up, we define a Compile trait that we will use throughout this chapter

pub trait Compile {
    type Output;

    fn from_ast(ast: Vec<Node>) -> Self::Output;

    fn from_source(source: &str) -> Self::Output {
        println!("Compiling the source: {}", source);
        let ast: Vec<Node> = parser::parse(source).unwrap();
        println!("{:?}", ast);
        Self::from_ast(ast)
    }
}

and we can now implement our interpreter

pub struct Interpreter;

impl Compile for Interpreter {
    type Output = Result<i32>;

    fn from_ast(ast: Vec<Node>) -> Self::Output {
        let mut ret = 0i32;
        let evaluator = Eval::new();
        for node in ast {
            ret += evaluator.eval(&node);
        }
        Ok(ret)
    }
}

Filename: calculator/src/compiler/interpreter.rs

and test

assert_eq!(Interpreter::from_source("1 + 2").unwrap(), 3);

Run such tests locally with

cargo test interpreter --tests