Another tutorial for writing a Forth interpreter in assembly (Part 3)

Part 1 of this tutorial
Part 2 of this tutorial

Direct link to the final (pregenerated) interpreter at the end of this tutorial. Try running

nasm -f elf64 forth.asm -o forth.o ;and ld forth.o ;and ./a.out < test2.f

inside forth22.

Input loop

We will finally add input-loop.

Initializing the names dictionnary

We first need to initialize the names dictionnary with the names of all primitive functions. Since values in the (writable) .bss section cannot be directly initialized, we put initial values in the .data section and copy it over with a new copyinit macro. setup in header.asm will do the copying and other initialization. We expect _start to call it before anything else. We'll also move _start to forth.f.

compiler.py will generate these initial values for us. Which assembly functions should be exposed as primitives? We might as well provide all of them.

Add is_primitive

We can now replace our placeholder is_primitive by the actual function. We'll waste the first # primitive entries in memory and call anything with lower index a primitive and everything else a function.

Make compiler.py automatically generate the number of primitives as well as the index of some useful primitives like return (and replace return by return-index in write-loop).

Input loop

Everything is ready and we add input-loop. Try this in forth20:

python compiler.py > forth.asm
nasm -f elf64 forth.asm -o forth.o ;and ld forth.o ;and ./a.out

Try it with input

push1 push1 + print

forth-debug.f is also provided for a version with more debugging output.

Adding remaining functions

Let's at least add all the primitives needed for a self interpreter our simulator ran.

Adding bind:

If we had already defined bind:, we could add it this way:

[ next-input names.set ] bind: bind:

(assuming bind isn't called from inside functions). But we can actually replace the only call to bind: here by its body

[ next-input names.set ] next-input bind: names.set

Running this once our interpreter has started will define bind! (We put its name bind: right after next-input instead of at the very end since that's where next-input will read it).

Adding push:

push: transforms the next token into a string and pushes onto a stack

[ next-command strings.address ] bind: push:

next-command is next-input when the stack is empty except for the special memory locations 0 and 1 we reserved. Otherwise, it pushes the next value in memory onto the stack. In this particular case, we expect it to be a string index. We implement it as a primitive since it depends on the call stack length.

It could possibly be interpreted instead of compiled by accounting for the calls to itself.

New primitive added: >.

Adding call

As with the Python simulator, call was already part of main-loop and we just need to move it out. In compiled Forth, it is

[ is-primitive push: call-primitive push: call_stack.push if-else ] bind: call

We rename call to call_function in assembly using compiler.py. Try forth21.

python compiler.py > forth.asm
nasm -f elf64 forth.asm -o forth.o ;and ld forth.o ;and ./a.out < test.f

test.f is used as input.

Detect popping from an empty stack

We should have added the option to detect (data) stack underflows earlier.

%macro popdata 1
push r5
dec qword [data_stack_length]
js %%negative
mov r5, [data_stack_length]
mov %1, [data_stack + 8*r5]
pop r5
jmp %%end
%%negative:
prints "Popping empty stack! "
inc qword [data_stack_length]
call call_stack.len
call print
call call_stack.pop
call print
pushdata __LINE__
call print
jmp exit
%%end:
%endmacro

This debug version of popdata detects an error in s2. When there's only one element on the stack, s2 still tries to pop two elements. Fix this by moving s2 to header.asm.

Adding repeat

repeat works the same way as in the Python version and just moves the index at top of the call stack back. We need an extra strings.address and memory.get to call loops.

[ push1 print ] bind: foo
push: foo strings.address memory.get call repeat

As in the Python simulator, since the input isn't saved, repeat only works when inside the body of a function.

Adding pushe

pushe evaluates the next token and then pushes onto the stack. Let's aim for something simpler: parsing integers only. Add primitives int which turns a string into an integer (read in base 10) and pushi which reads the following string and turns it into an integer.

int reads on digit at a time keeping a tally of what the integer would be if te current characters is the last one in the token. No validity checks are made on the input.

New primitive added: *.

write-loop is updated to take into account pushi just like we did for push.

Try these two functions in forth22.

python compiler.py > forth.asm
nasm -f elf64 forth.asm -o forth.o ;and ld forth.o ;and ./a.out < test.f
nasm -f elf64 forth.asm -o forth.o ;and ld forth.o ;and ./a.out < test2.f

The first test runs an infinite loop printing 1 and needs to be interrupted (by press Control-C). 123 is 0x7b and 456 is 0x1c8.

Conclusion

This completes our tutorial for making a Forth interpreter in assembly. Below we discuss some possible extensions and optimizations. The other interesting thing to do would be to add more primitives (either compiled or interpreted).

Appendix: Extensions

Here are some possible extensions not shown in this tutorial

Add (dynamically allocated) objects by reserving them in memory and making the right memory.* calls.
Dynamically allocated more memory by calling the system call blk. This probably needs memory to be the array allocated in the .bss section.
Add arrays and dictionnaries that can be created from the interpreter. This could be done by reusing some of the existing array functions.
Add a debugger. Possibly trap SIGSEGV and print the call_stack (maybe even the primitives stack).

Appendix: Optimizations

Here are some possible optimizations not done in this tutorial.

Some improvement also make the assembly source look more human written.

Asymptotic improvement: names dictionnary

Use a binary search tree or hash table for the names dictionnary. Even a simple implementation should drop the linear search for lookup down to logarithmic.

Constant factor improvement: Shorten instructions list

Some useless instructions (like for s2) or longer sequences that can be replaced by shorter sequences.

s11:
    popdata r1
    pushdata r1
    pushdata r1
    ret

could instead just increase [data_stack_length] and duplicate the top of the stack.

Other constant factor improvement:

Don't bother saving r5 for pushdata and popdata since it not read from anywhere else.
Use registers for call stack and data stack length.

Appendix: Tips for using Forth

Maybe the data stack always seem to have the wrong arguments at the top.

If there's no branching in execution, setup the stack first and then call all functions. Try avoiding shuffling the stack in the middle.

Appendix: Full primitives list

A list of primitives in the final version, arranged by type.

Three main loops (generated from Forth): main_loop, input_loop, write_loop
System calls: exit, read_stdin
I/O: next_char, next_input, next_command, print, print_string, printspace, printeol, print_top, push, pushi
Strings: string_equal, equal_right_bracket, int
Binary: add, sub, not, equal, mul, greater
Control flow: return, if, if_else, call_function
Stack shuffling s11, s21, s2, s1212, s1312, s231
Call stack, memory, strings and names methods: call_stack.is_empty, call_stack.pop, call_stack.push, memory.get, memory.set_at, memory.len, memory.append, strings.len, strings.set_at, strings.append, strings.pop, names.get, names.len, names.keys.get, names.values.get, names.set, strings.get, strings.address
Input buffer: input_buffer.len, input_buffer.push, input_buffer.append, input_buffer.pop, input_buffer.get, input_buffer.set_at, input_buffer.address
Internal (could be hidden): setup
Auxiliary (could be hidden, some generated from Forth): exit_main_loop, primitive_case, names_found, names_not_found, space_loop, word_loop, next_char_is_space, is_input_end
Auxiliary (could be hidden): is_primitive, call_primitive, is_whitespace, raw_write