Virtual Machine
Recall from the crash course that a (process) VM abstracts away hardware specific instructions so that its Bytecodes (abstract instructions) can be executed in any environment that has the Bytecode runtime support. So to create our VM and its runtime, we need to define our
- Opcodes (new encoding atoms)
- Bytecode representation and
- Runtime model as Stack Machine
Opcode
Since our expression based Calc
language is made up of
- constant integers
- unary (plus, minus sign) operators and
- binary (addition, subtraction) operators
we can define our new opcode encodings like
pub enum OpCode {
OpConstant(u16), // pointer to constant table
OpPop, // pop is needed for execution
OpAdd,
OpSub,
OpPlus,
OpMinus,
}
Filename: calculator/src/compiler/vm/opcode.rs
We choose the simplest form of encoding i.e. encoding the ops as bytes u8
(in hex format). That is,
OpCode::OpConstant(arg) => make_three_byte_op(0x01, arg),
OpCode::OpPop => vec![0x02], // decimal repr is 2
OpCode::OpAdd => vec![0x03], // decimal repr is 3
OpCode::OpSub => vec![0x04], // decimal repr is 4
OpCode::OpPlus => vec![0x0A], // decimal repr is 10
OpCode::OpMinus => vec![0x0B], // decimal repr is 11
For easy of access, we store constant Node::Int(i32)
nodes in a separate memory and OpConstant(arg)
tracks these values.
In a unary expression like "1"
, we encode Node::Int(1)
as the opcode [1, 0, 0]
as the first constant. (0x01 in decimal is 1).
Bytecode
We define
pub struct Bytecode {
pub instructions: Vec<u8>,
pub constants: Vec<Node>,
}
Filename: calculator/src/compiler/vm/bytecode.rs
and pictorially, here is how "1 + 2"
AST to Bytecode conversion would look like
and in Rust
ByteCode {
instructions: [1, 0, 0, 1, 0, 1, 3, 2],
constants: [Int(1), Int(2)]
}
Now, we can implement our Bytecode interpreter
pub struct Interpreter {
bytecode: Bytecode,
}
impl Compile for Interpreter {
type Output = Bytecode;
fn from_ast(ast: Vec<Node>) -> Self::Output {
let mut interpreter = Interpreter {
bytecode: Bytecode::new(),
};
for node in ast {
println!("compiling node {:?}", node);
interpreter.interpret_node(node);
// pop one element from the stack after
// each expression statement to clean up
interpreter.add_instruction(OpCode::OpPop);
}
interpreter.bytecode
}
}
Filename: calculator/src/compiler/vm/bytecode.rs
Runtime
From previous example, our interpreter goes through the bytecode instructions and executes them.
Continuing our "1 + 2"
Bytecode example,
instructions: [1, 0, 0, 1, 0, 1, 3, 2],
-------- -------- - -
| | | |
constants: [ Int(1), Int(2)] OpAdd OpPop
[1, 0, 0] points to the first element in constants table i.e. Int(1)
[1, 0, 1] points to Int(2)
[3] (or [0x03]) corresponding to the Opcode *OpAdd*, performs the addition operation Int(1 + 2)
[2] (or [0x02]) corresponding to the Opcode *OpPop* pops out the computed Bytecodes
and since we want to model our runtime as a Stack Machine so we define our VM as struct with Bytecode, stack memory (in Stack Machine) and a stack pointer to the next free space
const STACK_SIZE: usize = 512;
pub struct VM {
bytecode: Bytecode,
stack: [Node; STACK_SIZE],
stack_ptr: usize, // points to the next free space
}
and with the help of instruction pointer (IP), we execute the Bytecodes as follows
pub fn run(&mut self) {
let mut ip = 0; // instruction pointer
while ip < self.bytecode.instructions.len() {
let inst_addr = ip;
ip += 1;
match self.bytecode.instructions[inst_addr] {
0x01 => {
//OpConst
let const_idx = convert_two_u8s_to_usize(
self.bytecode.instructions[ip],
self.bytecode.instructions[ip + 1],
);
ip += 2;
self.push(self.bytecode.constants[const_idx].clone());
}
0x02 => {
//OpPop
self.pop();
}
0x03 => {
// OpAdd
match (self.pop(), self.pop()) {
(Node::Int(rhs), Node::Int(lhs)) => self.push(Node::Int(lhs + rhs)),
_ => panic!("Unknown types to OpAdd"),
}
}
0x04 => {
// OpSub
match (self.pop(), self.pop()) {
(Node::Int(rhs), Node::Int(lhs)) => self.push(Node::Int(lhs - rhs)),
_ => panic!("Unknown types to OpSub"),
}
}
0x0A => {
// OpPlus
match self.pop() {
Node::Int(num) => self.push(Node::Int(num)),
_ => panic!("Unknown arg type to OpPlus"),
}
}
0x0B => {
// OpMinus
match self.pop() {
Node::Int(num) => self.push(Node::Int(-num)),
_ => panic!("Unknown arg type to OpMinus"),
}
}
_ => panic!("Unknown instruction"),
}
}
}
pub fn push(&mut self, node: Node) {
self.stack[self.stack_ptr] = node;
self.stack_ptr += 1; // ignoring the potential stack overflow
}
pub fn pop(&mut self) -> Node {
// ignoring the potential of stack underflow
// cloning rather than mem::replace for easier testing
let node = self.stack[self.stack_ptr - 1].clone();
self.stack_ptr -= 1;
node
}
Filename: calculator/src/compiler/vm/vm.rs
To examine the generated Bytecodes and run our VM, we can do
let byte_code = Interpreter::from_source(source);
println!("byte code: {:?}", byte_code);
let mut vm = VM::new(byte_code);
vm.run();
println!("{}", vm.pop_last());
Run tests locally for our VM with
cargo test vm --tests
Checkout the next section on how to create a REPL for our Calc
to compare different compilation paths.