Part 1 of this tutorial
Part 2 of this tutorial
Direct link to the final (pregenerated) interpreter at the end of this tutorial. Try running
nasm -f elf64 forth.asm -o forth.o ;and ld forth.o ;and ./a.out < test2.f
inside forth22.
We will finally add input-loop
.
We first need to initialize the names dictionnary with the names of all primitive functions. Since values in the (writable) .bss
section cannot be directly initialized, we put initial values in the .data
section and copy it over with a new copyinit
macro. setup
in header.asm will do the copying and other initialization. We expect _start
to call it before anything else. We'll also move _start
to forth.f.
compiler.py will generate these initial values for us. Which assembly functions should be exposed as primitives? We might as well provide all of them.
We can now replace our placeholder is_primitive
by the actual function. We'll waste the first # primitive entries in memory
and call anything with lower index a primitive and everything else a function.
Make compiler.py automatically generate the number of primitives as well as the index of some useful primitives like return (and replace return
by return-index
in write-loop
).
Everything is ready and we add input-loop
. Try this in forth20:
python compiler.py > forth.asm
nasm -f elf64 forth.asm -o forth.o ;and ld forth.o ;and ./a.out
Try it with input
push1 push1 + print
forth-debug.f is also provided for a version with more debugging output.
Let's at least add all the primitives needed for a self interpreter our simulator ran.
If we had already defined bind:
, we could add it this way:
[ next-input names.set ] bind: bind:
(assuming bind isn't called from inside functions). But we can actually replace the only call to bind:
here by its body
[ next-input names.set ] next-input bind: names.set
Running this once our interpreter has started will define bind! (We put its name bind:
right after next-input
instead of at the very end since that's where next-input
will read it).
push:
transforms the next token into a string and pushes onto a stack
[ next-command strings.address ] bind: push:
next-command
is next-input
when the stack is empty except for the special memory locations 0 and 1 we reserved. Otherwise, it pushes the next value in memory onto the stack. In this particular case, we expect it to be a string index. We implement it as a primitive since it depends on the call stack length.
It could possibly be interpreted instead of compiled by accounting for the calls to itself.
New primitive added: >
.
As with the Python simulator, call
was already part of main-loop
and we just need to move it out. In compiled Forth, it is
[ is-primitive push: call-primitive push: call_stack.push if-else ] bind: call
We rename call
to call_function
in assembly using compiler.py. Try forth21.
python compiler.py > forth.asm
nasm -f elf64 forth.asm -o forth.o ;and ld forth.o ;and ./a.out < test.f
test.f is used as input.
We should have added the option to detect (data) stack underflows earlier.
%macro popdata 1
push r5
dec qword [data_stack_length]
js %%negative
mov r5, [data_stack_length]
mov %1, [data_stack + 8*r5]
pop r5
jmp %%end
%%negative:
prints "Popping empty stack! "
inc qword [data_stack_length]
call call_stack.len
call print
call call_stack.pop
call print
pushdata __LINE__
call print
jmp exit
%%end:
%endmacro
This debug version of popdata
detects an error in s2
. When there's only one element on the stack, s2
still tries to pop two elements. Fix this by moving s2
to header.asm
.
repeat
works the same way as in the Python version and just moves the index at top of the call stack back. We need an extra strings.address
and memory.get
to call loops.
[ push1 print ] bind: foo
push: foo strings.address memory.get call repeat
As in the Python simulator, since the input isn't saved, repeat
only works when inside the body of a function.
pushe
evaluates the next token and then pushes onto the stack. Let's aim for something simpler: parsing integers only. Add primitives int
which turns a string into an integer (read in base 10) and pushi
which reads the following string and turns it into an integer.
int
reads on digit at a time keeping a tally of what the integer would be if te current characters is the last one in the token. No validity checks are made on the input.
New primitive added: *
.
write-loop
is updated to take into account pushi
just like we did for push
.
Try these two functions in forth22.
python compiler.py > forth.asm
nasm -f elf64 forth.asm -o forth.o ;and ld forth.o ;and ./a.out < test.f
nasm -f elf64 forth.asm -o forth.o ;and ld forth.o ;and ./a.out < test2.f
The first test runs an infinite loop printing 1 and needs to be interrupted (by press Control-C). 123
is 0x7b
and 456
is 0x1c8
.
This completes our tutorial for making a Forth interpreter in assembly. Below we discuss some possible extensions and optimizations. The other interesting thing to do would be to add more primitives (either compiled or interpreted).
Here are some possible extensions not shown in this tutorial
memory
and making the right memory.*
calls.blk
. This probably needs memory
to be the array allocated in the .bss
section.SIGSEGV
and print the call_stack
(maybe even the primitives stack).Here are some possible optimizations not done in this tutorial.
Some improvement also make the assembly source look more human written.
Use a binary search tree or hash table for the names
dictionnary. Even a simple implementation should drop the linear search for lookup down to logarithmic.
Some useless instructions (like for s2
) or longer sequences that can be replaced by shorter sequences.
s11:
popdata r1
pushdata r1
pushdata r1
ret
could instead just increase [data_stack_length]
and duplicate the top of the stack.
r5
for pushdata
and popdata
since it not read from anywhere else.Maybe the data stack always seem to have the wrong arguments at the top.
If there's no branching in execution, setup the stack first and then call all functions. Try avoiding shuffling the stack in the middle.
A list of primitives in the final version, arranged by type.
main_loop, input_loop, write_loop
exit, read_stdin
next_char, next_input, next_command, print, print_string, printspace, printeol, print_top, push, pushi
string_equal, equal_right_bracket, int
add, sub, not, equal, mul, greater
return, if, if_else, call_function
s11, s21, s2, s1212, s1312, s231
call_stack.is_empty, call_stack.pop, call_stack.push, memory.get, memory.set_at, memory.len, memory.append, strings.len, strings.set_at, strings.append, strings.pop, names.get, names.len, names.keys.get, names.values.get, names.set, strings.get, strings.address
input_buffer.len, input_buffer.push, input_buffer.append, input_buffer.pop, input_buffer.get, input_buffer.set_at, input_buffer.address
setup
exit_main_loop, primitive_case, names_found, names_not_found, space_loop, word_loop, next_char_is_space, is_input_end
is_primitive, call_primitive, is_whitespace, raw_write