Building a toy language to learn language design
Day 1: Why Moye?
Decided to build a toy language called Moye just for fun and learning.
The goal is to understand language design and how compilers work at a fundamental level.
Step 1: Create a Parser
What is a parser?
A parser takes a flat structure (text source code) and converts it into a tree structure (like an Abstract Syntax Tree or AST).
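To make the "flat text to tree" idea concrete, here is a minimal sketch of the tree a parser might produce for the text 1 + 2 * 3. The Expr type, its variants, and the eval and example_tree helpers are hypothetical names chosen for illustration; they are not Moye's actual design.

```rust
// Hypothetical AST for a tiny arithmetic language (illustration only).
#[derive(Debug, PartialEq)]
enum Expr {
    Number(i64),
    Add(Box<Expr>, Box<Expr>),
    Mul(Box<Expr>, Box<Expr>),
}

// Walk the tree recursively and compute its value.
fn eval(expr: &Expr) -> i64 {
    match expr {
        Expr::Number(n) => *n,
        Expr::Add(left, right) => eval(left) + eval(right),
        Expr::Mul(left, right) => eval(left) * eval(right),
    }
}

// The tree a parser could build for the flat text "1 + 2 * 3".
// Multiplication binds tighter, so it sits deeper in the tree.
fn example_tree() -> Expr {
    Expr::Add(
        Box::new(Expr::Number(1)),
        Box::new(Expr::Mul(
            Box::new(Expr::Number(2)),
            Box::new(Expr::Number(3)),
        )),
    )
}

fn main() {
    let tree = example_tree();
    println!("{:?} = {}", tree, eval(&tree));
}
```

The nesting of the tree encodes operator precedence, which is exactly the information the flat text does not make explicit.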
Useful Reference: Parsing - Wikipedia
Designing the Syntax
Started defining the basic syntax rules for Moye:
- Identifiers (variable names)
- Literals (numbers, strings)
- Keywords
- Basic expressions
Why Can’t Identifiers Start With a Number?
“The compiler should be able to identify a token as an identifier or a literal after looking at the first character.”
If variable names could start with digits, there would be ambiguity:
- Is 123 a variable name or a number?
- 123abc: is that a malformed number or a bad identifier?
To avoid this confusion, most languages (Rust, C, Python) require identifiers to start with a letter or underscore, not a digit.
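The first-character rule can be sketched as a small tokenizer. This is a rough sketch under assumptions: the Token type and tokenize function are hypothetical, only ASCII identifiers and integer literals are handled, and rejecting 123abc outright is one possible policy, not necessarily what Moye does.

```rust
// Hypothetical token type (illustration only).
#[derive(Debug, PartialEq)]
enum Token {
    Identifier(String),
    Number(String),
}

// Decide what kind of token we are reading from its first character.
fn tokenize(src: &str) -> Result<Vec<Token>, String> {
    let chars: Vec<char> = src.chars().collect();
    let mut tokens = Vec::new();
    let mut i = 0;
    while i < chars.len() {
        let c = chars[i];
        if c.is_whitespace() {
            i += 1;
        } else if c.is_ascii_digit() {
            // First character is a digit: this can only be a number literal.
            let start = i;
            while i < chars.len() && chars[i].is_ascii_digit() {
                i += 1;
            }
            // Assumed policy: a letter or underscore glued to the digits
            // (as in `123abc`) is an error.
            if i < chars.len() && (chars[i].is_ascii_alphabetic() || chars[i] == '_') {
                return Err(format!("invalid token starting at position {}", start));
            }
            tokens.push(Token::Number(chars[start..i].iter().collect()));
        } else if c.is_ascii_alphabetic() || c == '_' {
            // First character is a letter or underscore: an identifier.
            let start = i;
            while i < chars.len() && (chars[i].is_ascii_alphanumeric() || chars[i] == '_') {
                i += 1;
            }
            tokens.push(Token::Identifier(chars[start..i].iter().collect()));
        } else {
            return Err(format!("unexpected character {:?} at position {}", c, i));
        }
    }
    Ok(tokens)
}

fn main() {
    println!("{:?}", tokenize("count 42 _tmp1"));
    println!("{:?}", tokenize("123abc"));
}
```

Because each branch is chosen from a single character, the tokenizer never has to guess and later change its mind about what kind of token it is reading.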
Compiler Design 101: 7 Phases of Compilation
- Lexical Analysis (Tokenizing input text)
- Syntax Analysis (Parsing into AST)
- Semantic Analysis (Checking for meaning and type correctness)
- Intermediate Code Generation (Turning AST into low-level IR)
- Code Optimization (Speed/efficiency improvements)
- Code Generation (Producing assembly/machine code)
- Symbol Table Management (Tracking variables, functions, etc.; this runs alongside all the other phases rather than as a separate step)
Ref: Compilers: Principles, Techniques, and Tools (“Dragon Book”)
Backtracking
Backtracking is like trying different paths: when you hit a dead end, you go back to the last choice and try a different route. In a parser, that means rewinding to an earlier position in the input and trying another grammar rule.
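Here is a minimal sketch of backtracking in a parser, assuming a made-up grammar with two statement forms (name = value and name ( )) and whitespace-separated tokens. The Parser and Stmt types and all method names are hypothetical, chosen only to show the save-position, try, rewind pattern.

```rust
// Hypothetical statement forms (illustration only).
#[derive(Debug, PartialEq)]
enum Stmt {
    Assign(String, String), // name = value
    Call(String),           // name ( )
}

struct Parser<'a> {
    tokens: Vec<&'a str>,
    pos: usize,
}

impl<'a> Parser<'a> {
    fn new(src: &'a str) -> Self {
        Parser { tokens: src.split_whitespace().collect(), pos: 0 }
    }

    // Return the next token and move forward, if any token is left.
    fn advance(&mut self) -> Option<&'a str> {
        let tok = self.tokens.get(self.pos).copied();
        if tok.is_some() {
            self.pos += 1;
        }
        tok
    }

    // Consume the next token only if it equals `expected`.
    fn eat(&mut self, expected: &str) -> bool {
        if self.tokens.get(self.pos).copied() == Some(expected) {
            self.pos += 1;
            true
        } else {
            false
        }
    }

    fn parse_assign(&mut self) -> Option<Stmt> {
        let name = self.advance()?;
        if !self.eat("=") {
            return None;
        }
        let value = self.advance()?;
        Some(Stmt::Assign(name.to_string(), value.to_string()))
    }

    fn parse_call(&mut self) -> Option<Stmt> {
        let name = self.advance()?;
        if self.eat("(") && self.eat(")") {
            Some(Stmt::Call(name.to_string()))
        } else {
            None
        }
    }

    // Try each alternative in turn, rewinding to the checkpoint on failure.
    fn parse_stmt(&mut self) -> Option<Stmt> {
        let checkpoint = self.pos;
        if let Some(stmt) = self.parse_assign() {
            return Some(stmt);
        }
        self.pos = checkpoint; // dead end: backtrack and try the next rule
        if let Some(stmt) = self.parse_call() {
            return Some(stmt);
        }
        self.pos = checkpoint; // nothing matched: leave the input untouched
        None
    }
}

fn main() {
    println!("{:?}", Parser::new("x = 1").parse_stmt());
    println!("{:?}", Parser::new("f ( )").parse_stmt());
}
```

Both rules start by consuming a name, so the parser cannot tell them apart up front. Restoring pos to the checkpoint is what lets the second rule start from the same token as the first.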
Blocks
Blocks group statements together, create a scope for variables, and help control the flow of execution.
In Rust, everything wrapped inside { ... } is part of the block.
In Rust, and also in our case, blocks are a way to group a bunch of bindings together and ensure that they are not accessible from outside the block.
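A small sketch of that scoping rule in Rust; the function name block_value and the bindings a, b, and total are made up for illustration.

```rust
fn block_value() -> i32 {
    let total = {
        // `a` and `b` exist only inside this block.
        let a = 2;
        let b = 3;
        a * b // the last expression (no semicolon) is the block's value
    };
    // `a` and `b` are out of scope here.
    // Uncommenting the next line would be a compile error:
    // let c = a + b;
    total
}

fn main() {
    println!("{}", block_value());
}
```

Only the block's final value escapes; the intermediate bindings stay private to the block.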
Usage of blocks
Consider this real-world example of reading user input and storing it in a variable in Rust:
let mut input = String::new();
std::io::stdin().read_line(&mut input)?;
Note that we need to explicitly declare input as mutable (mut) because read_line on the next line writes the user input into it, which changes the value of input. But what if we want input to be immutable afterwards? We can use a block here.
let input = {
    let mut s = String::new();
    std::io::stdin().read_line(&mut s)?;
    s
};
Here input has the same value that we read from the user, but it is immutable. The mutable binding s exists only inside the block.
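The snippets above use the ? operator, so they need to live inside a function that returns a Result. Here is one way to wrap the block version into a complete program. The read_input helper and its generic reader parameter are assumptions added for illustration, so the same code can read from stdin or from an in-memory buffer.

```rust
use std::io::{self, BufRead};

// Hypothetical helper: read one line from any buffered reader.
fn read_input<R: BufRead>(mut reader: R) -> io::Result<String> {
    let input = {
        // The mutable binding is confined to this block.
        let mut s = String::new();
        reader.read_line(&mut s)?;
        s
    };
    // `input` is immutable here; this line would not compile:
    // input.push('!');
    Ok(input)
}

fn main() -> io::Result<()> {
    let stdin = io::stdin();
    let input = read_input(stdin.lock())?;
    println!("read {} bytes", input.len());
    Ok(())
}
```

Note that read_line keeps the trailing newline, so a caller would typically call trim_end() on the result before using it.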