Virtual Machine
We’ve seen two ways to execute code: interpret the AST directly, or JIT compile to native machine code. There’s a third approach that sits in between: compile to bytecode, then interpret that.
Why would we do this?
- Simpler than JIT - No LLVM dependency, works everywhere
- Faster than AST interpretation - Bytecode is more compact and cache-friendly
- Portable - Same bytecode runs on any machine with our VM
This is how Python, Java, Ruby, and many other languages work. You compile source code to bytecode once, then run the bytecode interpreter wherever you need it.
What Is Bytecode?
Think of bytecode as a simplified assembly language designed for our VM. Real assembly has hundreds of instructions. Our bytecode has just six:
| Opcode | What it does |
|---|---|
| OpConstant(index) | Push a constant onto the stack |
| OpAdd | Pop two values, push their sum |
| OpSub | Pop two values, push their difference |
| OpPlus | Pop one value, push it back unchanged (unary +) |
| OpMinus | Pop one value, push its negation (unary -) |
| OpPop | Pop and discard the top value |
That’s it! With just these six operations, we can evaluate any arithmetic expression our calculator supports.
Step 1: Define Opcodes
We represent opcodes as bytes. This makes bytecode compact and fast to decode:
pub enum OpCode {
    OpConstant(u16), // index into the constants table
    OpPop,           // clean up after each expression statement
    OpAdd,
    OpSub,
    OpPlus,  // unary +
    OpMinus, // unary -
}
calculator/src/compiler/vm/opcode.rs
Each opcode is assigned a unique byte value. When the VM sees 0x03, it knows to add. When it sees 0x01, it knows to load a constant.
We also need a way to construct these bytes:
OpCode::OpConstant(arg) => make_three_byte_op(0x01, arg),
OpCode::OpPop => vec![0x02],
OpCode::OpAdd => vec![0x03],
OpCode::OpSub => vec![0x04],
OpCode::OpPlus => vec![0x0A],
OpCode::OpMinus => vec![0x0B],
OpConstant is special - it takes an argument (which constant to load). We encode this as two bytes following the opcode, giving us up to 65,536 constants.
Step 2: Define Bytecode Structure
Our bytecode consists of two parts:
pub struct Bytecode {
    pub instructions: Vec<u8>,
    pub constants: Vec<Node>,
}
calculator/src/compiler/vm/bytecode.rs
- instructions - A flat array of bytes. Opcodes and their arguments, all mixed together.
- constants - A table of literal values. Instead of encoding 42 in the instruction stream, we store it in the constants table and reference it by index.
Example: Compiling 1 + 2
Let’s trace through what happens when we compile 1 + 2:
The AST has three nodes: Int(1), Int(2), and BinaryExpr(+). We walk the tree and emit bytecode:
1. Visit 1 - Add Int(1) to the constants table (index 0). Emit OpConstant(0).
2. Visit 2 - Add Int(2) to the constants table (index 1). Emit OpConstant(1).
3. Visit + - Emit OpAdd.
4. Done - Emit OpPop to clean up.
The result:
Bytecode {
    // [OpConstant(0) | OpConstant(1) | OpAdd | OpPop]
    instructions: [1, 0, 0,  1, 0, 1,  3,  2],
    constants: [Int(1), Int(2)],
}
Let’s decode those bytes:
- [1, 0, 0] = OpConstant with index 0 (two operand bytes: 0x00, 0x00)
- [1, 0, 1] = OpConstant with index 1
- [3] = OpAdd
- [2] = OpPop
Now we implement the compiler that generates this bytecode:
pub struct Interpreter {
    bytecode: Bytecode,
}

impl Compile for Interpreter {
    type Output = Bytecode;

    fn from_ast(ast: Vec<Node>) -> Self::Output {
        let mut interpreter = Interpreter {
            bytecode: Bytecode::new(),
        };
        for node in ast {
            println!("compiling node {:?}", node);
            interpreter.interpret_node(node);
            // pop one element from the stack after
            // each expression statement to clean up
            interpreter.add_instruction(OpCode::OpPop);
        }
        interpreter.bytecode
    }
}
calculator/src/compiler/vm/bytecode.rs
Notice how similar this is to the AST interpreter - we still recurse through the tree. The difference is that instead of computing results, we emit instructions.
Step 3: The Virtual Machine
Now we need a machine to execute our bytecode. Our VM is a stack machine - it keeps intermediate values on a stack.
Here’s the structure:
const STACK_SIZE: usize = 512;

pub struct VM {
    bytecode: Bytecode,
    stack: [Node; STACK_SIZE],
    stack_ptr: usize, // points to the next free slot
}
- bytecode - The program to execute
- stack - Where we store intermediate values (a fixed-size array for speed)
- stack_ptr - Stack pointer: points to the next free slot
How Execution Works
The VM reads instructions left to right with an instruction pointer (IP). For 1 + 2, it processes OpConstant(0) (push 1), OpConstant(1) (push 2), OpAdd (pop both, push 3), and OpPop (pop the result 3, leaving the stack clean). Each opcode manipulates the stack accordingly.
Here’s the execution loop:
pub fn run(&mut self) {
let mut ip = 0; // instruction pointer
while ip < self.bytecode.instructions.len() {
let inst_addr = ip;
ip += 1;
match self.bytecode.instructions[inst_addr] {
0x01 => {
//OpConst
let const_idx = convert_two_u8s_to_usize(
self.bytecode.instructions[ip],
self.bytecode.instructions[ip + 1],
);
ip += 2;
self.push(self.bytecode.constants[const_idx].clone());
}
0x02 => {
//OpPop
self.pop();
}
0x03 => {
// OpAdd
match (self.pop(), self.pop()) {
(Node::Int(rhs), Node::Int(lhs)) => self.push(Node::Int(lhs + rhs)),
_ => panic!("Unknown types to OpAdd"),
}
}
0x04 => {
// OpSub
match (self.pop(), self.pop()) {
(Node::Int(rhs), Node::Int(lhs)) => self.push(Node::Int(lhs - rhs)),
_ => panic!("Unknown types to OpSub"),
}
}
0x0A => {
// OpPlus
match self.pop() {
Node::Int(num) => self.push(Node::Int(num)),
_ => panic!("Unknown arg type to OpPlus"),
}
}
0x0B => {
// OpMinus
match self.pop() {
Node::Int(num) => self.push(Node::Int(-num)),
_ => panic!("Unknown arg type to OpMinus"),
}
}
_ => panic!("Unknown instruction"),
}
}
}
pub fn push(&mut self, node: Node) {
self.stack[self.stack_ptr] = node;
self.stack_ptr += 1; // ignoring the potential stack overflow
}
pub fn pop(&mut self) -> Node {
// ignoring the potential of stack underflow
// cloning rather than mem::replace for easier testing
let node = self.stack[self.stack_ptr - 1].clone();
self.stack_ptr -= 1;
node
}
calculator/src/compiler/vm/vm.rs
This is a fetch-decode-execute loop:
1. Fetch - Read the next byte (self.bytecode.instructions[ip])
2. Decode - Match on the opcode to determine what operation to perform
3. Execute - Manipulate the stack accordingly
4. Repeat - Increment IP and continue until we’re out of instructions
Why Stacks?
You might wonder: why use a stack? Why not just have named variables?
Stacks are neat because they handle any nesting automatically. For (1 + 2) * (3 + 4), we first compute 1 + 2 = 3 (leaving 3 on the stack), then 3 + 4 = 7 (now stack has [3, 7]), then multiply to get 21. Each operation just pops its inputs and pushes its output. The stack naturally tracks what’s “in progress.”
Running the VM
To compile and run code through our VM:
let byte_code = Interpreter::from_source(source);
println!("byte code: {:?}", byte_code);
let mut vm = VM::new(byte_code);
vm.run();
println!("{}", vm.pop_last());
Run tests locally:
cargo test vm --tests --features vm
Trade-offs: VM vs JIT vs Interpreter
| Approach | Startup | Execution | Complexity | Portability |
|---|---|---|---|---|
| AST Interpreter | Fast | Slow | Simple | Everywhere |
| Bytecode VM | Medium | Medium | Medium | Everywhere |
| JIT (LLVM) | Slow | Fast | Complex | LLVM platforms |
The VM hits a sweet spot for many use cases. It’s why Python and Ruby use bytecode VMs - fast enough for most code, simple enough to implement everywhere.
Check out the next section to see how to create a REPL for our Calc to compare the different compilation paths interactively.