Another tutorial for writing a Forth interpreter in assembly (Part 1)

This tutorial will teach you how to make a Forth interpreter in assembly with initial prototype simulators made in Python.


The aim is twofold. First, we want to provide a description that is more flexible to allow other implementations (possibly drastically different ones from the description here). Stopping to read after the first few section should still give some idea of how to implement Forth. Second, we want each intermediate step to be runnable so that feedback can be used to verify the reader's understanding.

We will skip most optimization that only improves speed or memory usage by a constant factor except for one. These optimizations may be discussed at the end of the final part.

The assembly interpreter will be in x86-64 assembly for Linux but since most of the program is written in "Forth style" using only call/ret. Of course, system calls have to be translated.

Computation model

Forth consists of five parts:

Each entry of the stacks and array are fixed width. We'll use the usual trick of storing a location of (i.e., a pointer to) anything more complicated.

Intended use

The memory array is intended to be segmented where each segment corresponds to the body of a single function. A segment for f stores a list of function names, which are the calls to make (one after the other) when f is called.

The data stack is intended to store the arguments/parameters for function calls, effectively function calls need no explicit argument specification, only the occasional data stack shuffling before the call. A function will pop the first few elements of the data stack and push its output(s), if any, onto the data stack. This is the defining property of Stack-oriented programming languages like Forth.

The call stack is intended to store locations in memory where we should continue execution.

The names map is a table from function names to their first instruction in memory (if not a primitive function) or to a predefined function.

The primitive functions should be defined in the host language in which our Forth interpreter will be written. Usually, in a Forth interpreter, there are enough primitive functions so that interpreted instructions are able to read and write to memory, data, call and names.

Function names can be anything that does not contain a whitespace.

Main loop

The main loop of the Forth interpreter repeats the following indefinitely.

  1. Read the location of the top of the call stack.
  2. Read the function name at that memory location.
  3. Look up this name in the names map to find f.
  4. Increment the top of the stack (to point to the next location).
  5. If f is a primitive function, execute it.
  6. Otherwise, f is the location (in memory) of the function we should call. We "make this call" by pushing f on the call stack.

There's no input handling yet (that is done by an outer loop).

We can already witness this loop working with a hard coded initial call stack, data stack and set of primitive functions. See for an example with only calls to primitive functions. See for an example with function definition and call.

Feel free to skip examples like and instead implement your own interpreter from these instructions.

Debugging hints

For debugging, we can piggyback on Python's debugger to use as our own debugger. (Optionally) import

from pdb import stack_trace as bp

And add "bp": bp to names. Now writing bp to memory anywhere will insert a break point. Use pdb's shell to explore the program's state at that point.


By adding some more primitive functions, we could make the language interpreted Turing-complete. We won't do that here and will instead add primitives as we need them but feel free to add them to your own interpreter at this step.

We'll now add input/output to our interpreter. We've already seen an example of output using the primitive function print.

The user inputs a list of whitespace separated function names that the interpreter calls in order. We call these function names as input tokens from now on.

We add the exit primitive to terminate execution no matter how deeply nested the current call is.

Try and type push1 print at the input prompt.

Press Control-C or Control-D to stop the program. This gives an error and there's current no other way to exit the input loop.

Implementation detail: In this implementation, we chose to use memory index 0 to contain an command (one at a time) and memory index 1 to contain exit effectively ending the current command.

Writing to memory

We'll now add primitives to write to memory. This will also give us function definition (both anonymous and named).

[ will call a primitive function write_loop that is a loop which reads the input until it finds a ]. Every function name in between will be appended to memory. At the end of the loop, return is appended.

What is between [ and ] will effectively contain the (anonymous) function's body. Note that this is our first primitive function which consumes input tokens.

To name our anonymous function, we add a bind: primitive which reads the next function name and binds it to the index at the top of the stack. We make [ return the first index of memory written so the two work in conjection to define functions. To get the function call example of, we can now write

[ push1 print ] bind: my_func my_func

at the input prompt. The second my_func calls the function we just defined.


Implementation detail: We change how input is handled since input needs to be stored so both input_loop and write_loop can access it. We also add a check for the end of file so input_loop can be exited cleanly by pressing Control-D.

Implementation detail: write_loop is the name of the Python function bound to the token [.

Language differences

Note that the language implemented in this tutorial isn't exactly Forth but the same steps here can be used to implement the original Forth and the computation model is the same for both languages.

Self interpreter

We now have the three central parts of the interpreter written: main_loop, input_loop and write_loop.

We will add more functions (primitive and non-primitive) to our interpreter with the goal of writing as much of main_loop, input_loop and write_loop themselves as possible in Forth. Our plan is to eventually write "Forth-style" assembly with mostly just call and ret.


We don't want a new function for each constant we want to push onto the stack (such as push1, push2, push3, ...). Instead, just like bind:, we can have a generic push: function which pushes the next input token onto the stack.

Implementation detail: For our Python simulator, we will have an extra function pushe: which evaluates the result using Python's ast.literal_eval before pushing it onto the data stack so now we can push things other than strings.

In general, we'll name functions that consume the next input token like bind: and push: with a semi-colon ":" at the end.

Try with input

pushe: 1 print
[ pushe: 1 print ] bind: my_func my_func


if takes two arguments, a boolean and memory index. If the boolean is true, what's at the memory index is interpreted. Otherwise, it is skipped.

Now is a good time to add some arithmetic and comparison operations to our interpreter.

They are all primitive and we put them in to be imported.

Try Try inputs

pushe: 3 pushe: 2 == [ pushe: 1 print ] if
pushe: 2 pushe: 2 == [ pushe: 1 print ] if
[ pushe: 1 print ] bind: print1
pushe: 3 pushe: 3 == push: print1 names.get if

Operators are defined in the file.


repeat repeats the last three instructions (before repeat's position in memory) indefinitely.

call calls the index at the top of the stack.

In tandem, they can be used to execute loops. Try with input

[ pushe: 10 print ] bind: foo push: foo call
[ pushe: 11 print ] bind: foo [ push: foo call repeat ] bind: bar bar

The problem is that repeat only works when inside a function. This is OK for now but we will eventually change how write_loop works so that repeat can be called on the input (maybe not in this tutorial).

Stack shuffling

We'll need a number of primitives that simply rearrange the top of the data stack. In this implementation, they will all be named s#### where #### is a sequence of stack indices.

Here are some examples of primitives and their effect.

In general, s#### removes the top n stack elements from the stack where n is the maximum in ####, and the elements at indices #### (before the removal) are put at the top of the stack.

Just like for constants, we could instead implement only one function s: that treats the next input token as indices (e.g., s: 11 instead of s11). However, we actually need very few stack shufflers so in this implementation, we'll use a separate function for each.


push: a push: b s21 print print
push: a push: b s2 print

Add wrappers

For data_stack, call_stack, memory and names, we add getters, setters, push, pop, etc as primitive functions as needed.

Forth in Forth

We're finally ready to write Forth in Forth itself. This implementation follows the Python implementation of the same functions.

[ call_stack.len pushe: 0 == [ pushe: 3 call_stack.pop_many ] if
  call_stack.pop s11 memory.get names.get
  s21 pushe: 1 + call_stack.push
  s11 is-primitive s11 not [ s21 call_stack.push ] if
  push: call-primitive names.get if
] call repeat ] bind: main-loop

[ next-input pushe: 2 memory.set_at pushe: 2 call_stack.push main-loop
] call repeat ] bind: input-loop

[ memory.len
  [ next-input
    s11 pushe: ']' ==
      [ s2 push: return memory.append pushe: 3 call_stack.pop_many ] if
  ] call repeat ] bind: write-loop

Here, we're using nested square brackets which isn't defined in our interpreter. So for this version, we must first move the anonymous functions defined by inner square bracket and name these functions. forth.f contains the version that can be read by our interpreter.

However, running main-loop after our function definitions unexpectedly stops at the (first) call_stack.pop. Indeed popping the call stack should return from the current function but we actually meant to pop the stack of the inner Forth that's now running in our Forth interpreter! To fix this, we will create a second call stack call_stack2 for the inner Forth and make all call_stack.* functions and a few other functions refer to call_stack2, except for call_stack.pop_many where we actually want to return from inner functions.

We also need an ugly hack so the write loop in forth.f is run as a primitive with respect to call_stack2 (this means that its run on call_stack instead of call_stack2). This change is made in the is-primitive and call-primitive functions. Alternatively, we could put 2's everywhere inside the body of this write loop but then we couldn't test it independently of main-loop.

Surprisingly data_stack, memory and names can be shared between both inner and outer Forth interpreters without major issue. We'll use memory index 2 and 3 for storing the current input token of the inner Forth (and 0 and 1 for the outer Forth as usual).

Remark: We use call_stack.pop_many in place of return and break. For the moment, we just count the number of elements to pop from the call stack. To avoid keeping track of these numbers, we should implement continuations, which is not covered in this tutorial.

Try Enable some debugging messages to see that the last line is actually running inside the inner Forth.

Related links


location: Integer index, usually in the memory array (but possibly elsewhere like the strings array). Sometimes called pointers. function name: A string naming a function. This string can contain any character that's not a whitespace. They are call words in Forth.

Blog index