Yet Another Stack Machine

This Sunday, I had a couple hours free, and having gained a couple levels in Hearthstone, leveled up two toon in Heroes of the Storm, and finished my daily farming in World of Warcraft, I decided to prototype another stack machine.

I did this on in javascript, and had a working instruction fetch/decode cycle + ALU modeled in about 10 minutes. I opted for a 7bit instruction set packed 4 to a 32bit word, with 31bit literals. A high bit on the literal would flag the full 32 bit value as a literal. A literal 0 can be coded with one byte less: dup dup xor, but this only occasionally helps alignment.

The big question in my mind was if I had push ( >R in forth) and pop ( R> in forth), did I need any other operation than return ( ; in forth ), otherwise know as jump to the address on the top of the return stack.

For a loop it is simple:

: loop 1 + loop ;

Would compile to:


This would product an infinite loop and the binary would look like

1000 0001 // 1
0100 0000 // + nop nop nop
1000 0000 // loop
0f1f 0000 // push return nop nop

It looks doable if not terribly verbose. A16bit version is much more memory efficient:

1001 0100 // 1 + nop
1000 0f1f // loop push return

The trade off is of course addressable space and range of math operations. But it scales well enough. When implemented in Verilog it will be interesting to see how many cores of the different sizes could fit on different FPGAs. I could easily see a 32bit core surrounded by 16bit cores.

But what about if / then? Well we can compute a relative jump, assuming -1 is jump to branch and 0 is continue:

: loop 1 + dup 10 < loop ?

Will add 1 until the top of the stack is 10 or greater.

then // address of loop exit
- // loop - then (negative offset)
and // if flag (loop -then) else 0
then // then
+ // if flag (loop - then +then) else (0 + then)
push // loop or then
return // jump to loop or then.

So this implementation of conditional branching works by computing the goto address in software.

The unwritten requirement for both examples is that we know the actual address at compile time or have a link table and process to rewrite addresses. We can get around this by introducing the ability to access the instruction pointer ip and fetch its value to calculate an absolute address from a relative offset. This would add two opcodes to the address jump:

ip + push return

That would make all labels relative to the cell containing the ip instruction. This however is really awkward unless you force alignment of the ip instruction to a word boundary.