One of the peculiar implementation details of ArrayBuffer objects is that the buffer's contents are accessed through typed array views that act as proxies onto the raw bytes. If you look at the source of an emscripten project you'll see an ArrayBuffer used as the heap, and a separate HEAP# variable holding a typed array view onto that heap for each of the six width and sign combinations. HEAP8 provides signed byte access, whereas HEAPU32 provides unsigned 32-bit access to the same memory space. While this scheme works well for machine generated code, the human aspect is suspect. When I wrote my accessors, instead of separate variables, I attached each view to the ArrayBuffer object itself as an additional property:
Memory = new ArrayBuffer(RAM)
Memory.char = new Int8Array(Memory)
Memory.word = new Int16Array(Memory)
Memory.int = new Int32Array(Memory)
Memory.byte = new Uint8Array(Memory)
Memory.short = new Uint16Array(Memory)
Memory.long = new Uint32Array(Memory)
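Since every view aliases the same underlying buffer, a write through one property is immediately visible through the others. A minimal sketch (RAM is a hypothetical size here, and only two of the six views are shown):

```javascript
// Two views over one buffer alias the same bytes, so the same bits read
// differently through the signed and unsigned views.
const RAM = 65536;                     // hypothetical heap size
const Memory = new ArrayBuffer(RAM);
Memory.char = new Int8Array(Memory);   // signed byte view
Memory.byte = new Uint8Array(Memory);  // unsigned byte view

Memory.byte[0] = 0xff;
console.log(Memory.char[0]);  // -1: same bits, reinterpreted as signed
```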
This made the code easy to read and gave a nice programmer interface. Using this interface I built a stack object wrapping an ArrayBuffer, used for both the return and data stacks:
stack = function() ...
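The original stack object isn't shown in full, but a minimal sketch consistent with the method calls used in the instruction dispatch (tos, nos, push, pop) and with the circular 32-cell layout described later might look like this; the cell count and the combined getter/setter shape are assumptions:

```javascript
// Hypothetical sketch of the stack wrapper: 32 data cells plus one extra
// cell holding the stack index, all addressed circularly with & 31.
function stack() {
  const buf = new ArrayBuffer(33 * 4);
  const cell = new Int32Array(buf);
  const SP = 32;                       // index cell lives past the data cells
  return {
    push: function (v) { cell[++cell[SP] & 31] = v; },
    pop:  function ()  { return cell[cell[SP]-- & 31]; },
    tos:  function (v) { if (v !== undefined) cell[cell[SP] & 31] = v;
                         return cell[cell[SP] & 31]; },
    nos:  function (v) { if (v !== undefined) cell[(cell[SP] - 1) & 31] = v;
                         return cell[(cell[SP] - 1) & 31]; }
  };
}
```

With this shape, `s.nos(s.nos() + s.tos()); s.pop();` adds the top two cells and drops the old top, as the add instruction below does.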
Using this stack interface object, I could then cleanly implement most of the instructions as method calls:
case 2: s.nos(s.nos() + s.tos()); s.pop(); break;
This sort of straightforward code was a direct translation of the stack machine's behavior, largely because during the port I was translating what the C version did:
case 2: add(s); break;
Where "add" was a macro:
#define tos(S) S[S[-1]&31]
#define nos(S) S[(S[-1]-1)&31]
#define pops(S) S[(S[-1]--)&31]
#define add(S) binop(S,+)
In the C implementation a 34-cell array was allocated on the stack using alloca, and the pointer was set to the third element of the array, so that an offset masked with & 31 would index the remaining 32 cells as a circular stack, with the two cells before the pointer holding the stack index and a temporary. A similar technique was applied to the ArrayBuffer implementation, allocating 2 cells past the end of the stack for the stack index and a temporary buffer.
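The & 31 mask is what makes the stack circular: any cell index, even a negative one, wraps back into the 32-cell window. A quick demonstration:

```javascript
// JavaScript's bitwise ops coerce to 32-bit integers, so & 31 maps any
// index, positive or negative, into the range 0..31.
for (const i of [0, 31, 32, -1, -2]) {
  console.log(i, '->', i & 31);
}
// 32 wraps to 0 and -1 wraps to 31, so pushes and pops past either end
// simply come around the other side instead of running off the array.
```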
This design produced a human readable code base, but it definitely traded performance for human readability. Just how much so?
Well, invoking the compiler as:
Compiler.eval('1600000 push 0 ~ : loop 1 + <- loop')
Which executes ~4.8 million VM instructions (1.6 million passes through a three-instruction loop), produced:
Safari: 1s, 4.8 mips
Chrome: 0.5s, 8 mips
Firefox: 10s, 0.5 mips
But a quick rewrite later, restructuring the memory access to use the typed array views directly without the property lookup through the Memory object, removing the function call overhead of the stack accessors, and coding the assignments inline in the instruction cases, produced:
Safari: 0.495s, 9.7 mips (2x improvement)
Chrome: 0.085s, 56 mips (7x improvement)
Firefox: 3.6s, 1.5 mips (3x improvement)
These numbers are nothing short of astonishing. Safari doubles its performance when you remove the function call overhead. Firefox benefited from a 3x improvement from removing the function call overhead. But Chrome! Chrome demonstrates what happens when the JIT can finally kick in. Not only did the function call overhead go away, but we obviously hit a code path that the JIT liked. In fact, Chrome's performance is not only approaching the performance of native code, but approaching what the FPGA implementation would achieve. As such the performance is at a level suitable for prototyping actual applications, although at almost 100x the power draw.
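For reference, the restructured access pattern is roughly this shape; the names and exact layout are assumptions, not the actual rewrite:

```javascript
// Before: s.nos(s.nos() + s.tos()); s.pop();  — property lookups plus three
// function calls per instruction.
// After (sketch): a local typed array view and a local stack pointer,
// with the stack operations inlined into the instruction case.
const data = new Int32Array(34);       // 32 cells + index + temporary, as in C
let sp = 0;                            // hypothetical local stack pointer

data[++sp & 31] = 2;                   // push 2
data[++sp & 31] = 3;                   // push 3
data[(sp - 1) & 31] += data[sp & 31];  // add: nos += tos, inlined
sp--;                                  // pop
console.log(data[sp & 31]);            // 5
```

Every access is now a direct typed-array index off locals the JIT can keep in registers, which is exactly the code path Chrome rewarded.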