Writing a Compiler (Part 4): Designing the Language

One of the first things you do when you build a programming language is you figure out what you want it to look like: How is it going to feel? What do you want the code to look like? What paradigm will it be in? What differentiates your language from the rest?

When you are making your first language, it is a lot easier to take from an existing language, since they have gone through all the hard work of designing it for you.

For this series, we will create an F#-like language. It won't have all of the fancy frills, just the bare minimum for us to call it a "real" programming language, something that is actually "useful".

For the purposes of our language, we will consider "useful" as being able to communicate with a C library. Any feature that our language cannot support, a C function could be created as a work-around.

The reason I am choosing F# is it is a very simple, functional language. The language constructs we need to support are very small, and the parsing of F# (and similar functional languages) is pretty straight-forward.

What We Are Going To Build

Although subject to change, this the basic functionality which our language will support:

// This is a comment!

let num = 123
let pi: double = 3.1415

println "Hello, World!"

let square x = x * x
let addN x y = x + y

let chainExample = 3 |> square |> addN 2

let intTuple = (1, 8, 3, 4, -7, 0, 2)

if num = 123 then
  println "num is 123"
  println "num is not 123"

extern cFunc()
cFunc ()

module m =
  let helloWorld () = println "Hello, world!"

m.helloWorld ()

So now what?

Breaking It Down

Basically, we need to break up these lines of code into statements and expressions. An expression is something that returns a value, such as a numeric literal, a function call, or a variable. A statement is something that doesn't return a value, like a variable declaration, module declaration, if statement, and so on.

F# does actually support if statements which return an expression, basically making it a ternary operator. We probably won't support this, though that might change in the future.


Let's start by defining what an expression is. Expressions are at the heart of all programming languages, and are one of the hardest part to get right. An expression node will (probably) have the following fields:

Expression Node:

lhsExpression (optional)Left-hand-side of expression (in the case of binary/unary operator)
rhsExpression (optional)Right-hand-side of expression (same as lhs)
operExpressionOperatorThe operator that this expression is (ie, +, -, FuncCall, Literal
valueAny (optional)Value of the expression, in the case of a terminal expression, such as a literal
typeTBDType of the expression

The lhs and rhs are only needed for binary operators, such as +, -, and so on. In the example below:

(1 + 2) * 3 + 4

We would get an expression tree somewhat similar to this:

        / \
       *   4
      / \
     +   3
    / \
   1   2

As you can see, nodes 1, 2, 3, and 4 are all terminal nodes (have no child nodes), meaning they will have a value set. The + and * nodes have a lhs/rhs, so those fields will be set respectively.


These next few nodes are pretty simple, so I will go thought them all at once:

Variable Declaration Node:


External Declaration Node:


Module Declaration Node:

stmtsList[Statement]List of statements that make up this module

There will also have to be a Statement type which includes variable, external data, and module declarations (that way the Module node can have a list of sub modules/variables etc).

That's It!

Really this blog has been just a stepping stone, something to help lay the groundwork for the next few blogs.

Next time, we will actually flesh out these node objects, and start to turn our token stream into AST nodes!

[prev] [next]