Starforth: A Minimal Self-Hosting Compiler
This is going to be the beginning of a series of articles on writing a self-hosting compiler. This is an introductory post. No source code or actual development is going to be discussed in this installment, or possibly not even in the next one. But we’ll get there…hopefully!
EDIT: Second installment is now available.
I’ve always been vaguely aware of Forth. It’s the language that works with only
a stack. Which also means it uses reverse polish notation, that is, instead of
2 + 2, you write
2 2 +.
My (inaccurate) knowledge of Forth ended there. But then recently I was reading
up on Forth and I realized something. Here’s how you define a variable named
foo in Forth:
Here’s how you write
100 foo !
And for reading it:
! are called “fetch” and “store” in Forth nomenclature. I sorta
understood why you’d need “store”, but what about “fetch”? You don’t need an
operator to read variables in other languages, do you? Also, if
@ reads a
variable, and puts its value on the stack, what exactly does
foo put on the
A bit of experimentation, and it dawned on me:
foo puts the address of the
variable on the stack, and
@ reads an address and de-references it. Forth has
Just like that, Forth found its place in the language hierarchy I keep in my head. If Lisp, is the highest level language, Forth is the lowest-level one (except for assembly, obviously).
I liked that so much, that I decided it’s time I start a new compiler project. I’m gonna call it Starforth. For no reason at all!
I set these goals for myself:
- The compiler should produce machine code. I’ve written compiler’s targeting virtual machines before. This time I want to generate real machine code.
- The compiler should be self-hosting, that is, it should be able to compile its own source code.
- This is going to be an ahead-of-time compiler. While Forths are generally written in an interpreter style known as threaded-code, I’ve always had a fondness for compilers that do their job and get out of the way. I realized later that there are good reasons for Forths being written the way they are, but the goal stands nonetheless.
- The compiler should be bootstrappable with nothing more than an assembler. I want the starting binary to be as small as possible.
And here are some decisions to make things more concrete:
- The target environment is Linux. Both the compiler, and the binary it produces run on Linux. It would be nice if I can write a version of the compiler capable of producing code that can run on bare metal, but that should be left for another time.
- The compiler will produce 32-bit x86 code. The code is smaller than 64-bit, and more similar to the good old days! Also, 4 gigabytes of memory should be enough for everyone, right? Note that 32-bit binaries can run just fine on Linux, as long as they don’t reference any non-existing shared objects, which brings us to the next point…
- The compiler will output self-contained ELF binaries. No shared objects allowed, and libc is not used (neither statically nor dynamically).
I’ve already written Starforth at the time of this writing. You can find the source code on Sourcehut. I plan to write a series of articles about the development process, starting from my earliest iterations and going forward from there. I’ll be using my memory, which is hopefully still fresh, and my trusty git logs.