DAVE'S LIFE ON HOLD

JavaScript Abstract Machine

If you look at projects like Low Level JavaScript and understand the ramifications of what JIT technology can do, you will quickly realize that most tasks that require the manipulation of compact data structures lose their overhead when JIT-enabled TypedArray access is built into the browser. Since the browser can establish the security model of the memory region by mmapping it with PROT_EXEC turned off, the CPU's no-execute flag is honored for that region, which prevents a wide category of malicious code exploits.
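As a rough illustration of what "compact data structures" over typed memory can look like, here is a minimal sketch that packs fixed-size records into an ArrayBuffer and accesses them through a DataView. The record layout and the names createStore, writeRecord, and readRecord are invented for this example and do not come from Low Level JavaScript or any other project.

```javascript
// Hypothetical record layout (an assumption for this sketch):
//   offset 0: uint32 id
//   offset 4: float32 x
//   offset 8: float32 y
const RECORD_SIZE = 12; // bytes per record

// Allocate one flat region of memory for `capacity` records.
function createStore(capacity) {
  const buffer = new ArrayBuffer(capacity * RECORD_SIZE);
  return { buffer, view: new DataView(buffer), capacity };
}

// Write one record at the given slot (little-endian fields).
function writeRecord(store, index, id, x, y) {
  if (index < 0 || index >= store.capacity) {
    throw new RangeError("record index out of range");
  }
  const base = index * RECORD_SIZE;
  store.view.setUint32(base, id, true);
  store.view.setFloat32(base + 4, x, true);
  store.view.setFloat32(base + 8, y, true);
}

// Read one record back out of the flat buffer.
function readRecord(store, index) {
  const base = index * RECORD_SIZE;
  return {
    id: store.view.getUint32(base, true),
    x: store.view.getFloat32(base + 4, true),
    y: store.view.getFloat32(base + 8, true),
  };
}

const store = createStore(4);
writeRecord(store, 2, 7, 1.5, -2.25);
const record = readRecord(store, 2);
console.log(record.id, record.x, record.y);
```

The data never becomes a graph of heap objects; it stays in one flat, non-executable byte region, which is the property the paragraph above relies on.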

In my own work, NewScript and Firth both exploited techniques similar to emscripten's mode 2 memory model. When I first publicly demonstrated the NewScript VM, I had it compiling native x86 machine code in the browser. On Linux, with the addition of a minimal ELF header, the JavaScript compiler could produce executable binaries locally. For Mac OS X, I was able to produce Mach-O format object files that were also natively executable. I left it as an exercise to figure out how to add an EXE header for Windows. All of this was done before emscripten, and used base64-encoded URLs to download the locally produced binary.
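The download trick can be sketched in a few lines. This is not the NewScript code; it is a minimal example that assumes the "base64 encoded URLs" were data: URLs, and it emits only six raw bytes of x86 (mov eax, 42 followed by ret) with no ELF or Mach-O header. The helper names toBase64 and toDataUrl are made up for the illustration.

```javascript
// Raw x86 machine code for:  mov eax, 42 ; ret
// (B8 imm32 loads a 32-bit immediate into EAX, C3 returns.)
const machineCode = new Uint8Array([0xb8, 0x2a, 0x00, 0x00, 0x00, 0xc3]);

// Base64-encode a byte array. Uses btoa where available (browsers,
// recent Node) and falls back to Node's Buffer otherwise.
function toBase64(bytes) {
  if (typeof btoa === "function") {
    let binary = "";
    for (const b of bytes) binary += String.fromCharCode(b);
    return btoa(binary);
  }
  return Buffer.from(bytes).toString("base64");
}

// Wrap the bytes in a data: URL that a browser can offer as a download.
// The MIME type is an assumption chosen for this sketch.
function toDataUrl(bytes) {
  return "data:application/octet-stream;base64," + toBase64(bytes);
}

const downloadUrl = toDataUrl(machineCode);
console.log(downloadUrl);
```

To get a runnable binary, the byte array would need a valid executable header for the target platform in front of the code, which is the part described above and deliberately left out here.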

These days, with modern JITs, we can do even better, and compile systems that emulate entire machines. The speed is approaching native, and will eventually become indistinguishable from native apps written for the CLR or JVM. The reason for this is that the JavaScript JIT compiles efficient memory-access native code in exactly the same way the JVM or CLR compiles properly bounds-checked native code. On x86_64, all will produce the same SIB mov instructions, baking in the offset, scale factor, and bounds-checked index. C/C++/Assembler may produce faster code, but only by throwing out the bounds check. A clever programmer can make the bounds check almost free by laying out memory intelligently and using arrays whose sizes are powers of 2, but most won't bother.
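The power-of-2 trick mentioned above can be sketched as follows. With a heap whose length is a power of two, masking the index guarantees it is in range, so an engine has the opportunity to drop or cheapen the explicit bounds check. Whether a particular JIT actually does so is engine-dependent and is not verified here; the names HEAP_SIZE, loadWord, and storeWord are invented for the example.

```javascript
// Heap size must be a power of two for the mask to work.
const HEAP_SIZE = 1024;
const HEAP_MASK = HEAP_SIZE - 1;
const heap = new Int32Array(HEAP_SIZE);

// Every access masks the index, so it always lands in [0, HEAP_SIZE).
// Out-of-range indices wrap around instead of escaping the heap.
function loadWord(index) {
  return heap[index & HEAP_MASK];
}

function storeWord(index, value) {
  heap[index & HEAP_MASK] = value;
}

storeWord(5, 99);
console.log(loadWord(5));             // reads back the stored word
console.log(loadWord(5 + HEAP_SIZE)); // wraps to the same slot
```

The cost is that an out-of-range access silently aliases another slot rather than failing, so this suits emulated machine memory better than ordinary application arrays.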

After you take all of these factors into account, and see that the bottlenecks in most programs arise not even in the raw manipulation of data structures but:


You then realize most tasks can be completed equally fast by a high or low level language. About 15 years ago, I had this epiphany and produced bindings for the Simple DirectMedia Layer for Perl (and Python too, but I handed what code I had to Pete to add to PyGame...), and it is as true today as it was 15 years ago; very little of your code is HLL. Most HLL code, too, isn't so high level or dynamic that it defeats JIT optimization.

What this means for JavaScript is that we are at the point where compiling JavaScript to native code, in JavaScript, is not just possible but inevitable. As typed arrays give us the ability to match the C code of the engine itself, JavaScript can now compile its own object model. A simple memory model and an implementation of JavaScript's base objects in x86 machine code are now both easy to write in JS.
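To give a feel for what "a simple memory model" over typed arrays might look like, here is a small sketch: a bump allocator over an Int32Array, with one tagged object type laid out by hand. It is an illustration only. The layout, the tag value, and the names alloc, makePair, first, and rest are all assumptions, and nothing here emits machine code.

```javascript
// A flat heap of 32-bit words. Word 0 is reserved so that 0 can act
// as a null pointer.
const HEAP_WORDS = 1024;
const heap = new Int32Array(HEAP_WORDS);
let nextFree = 1;

// Bump allocation: hand out the next `words` words, never free them.
function alloc(words) {
  if (nextFree + words > HEAP_WORDS) throw new RangeError("out of memory");
  const pointer = nextFree;
  nextFree += words;
  return pointer;
}

// Hypothetical object layout: [tag, first, rest]
const TAG_PAIR = 1;

function makePair(firstValue, restPointer) {
  const pointer = alloc(3);
  heap[pointer] = TAG_PAIR;
  heap[pointer + 1] = firstValue;
  heap[pointer + 2] = restPointer;
  return pointer;
}

function first(pointer) {
  if (heap[pointer] !== TAG_PAIR) throw new TypeError("not a pair");
  return heap[pointer + 1];
}

function rest(pointer) {
  if (heap[pointer] !== TAG_PAIR) throw new TypeError("not a pair");
  return heap[pointer + 2];
}

// Build the list (1 2) entirely inside the typed array.
const list = makePair(1, makePair(2, 0));
console.log(first(list), first(rest(list)));
```

A real object model would add strings, property tables, and garbage collection on top of the same idea: every object is a tagged run of words at a known offset in one buffer.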

Once JavaScript in JavaScript JITs JavaScript, the evolution of the language is limited only by the weight of culture. Few people will ever run a native JavaScript engine, or an operating system written entirely in levels of JavaScript. This is not because it isn't possible, but because we have inherited a vast ecosystem of other people's code which runs on the extant systems of yesterday. Nor is this to say that projects like Node are doomed to the dustbin of history once proper JavaScript programs can be written. But it does offer the entertaining possibility of cutting the web free of the cord of the browser, which gathers layers of sedimentary code with each passing year.

What changes would be desirable in JavaScript in a native implementation?
There are security concerns with the last of these, in that any third-party code can in theory trigger that behavior, but segmenting the memory model merely helps prevent the JIT from breaking the JIT while JITting. Since third-party code may be compiled into a separate memory space, keeping a separation of concerns between memory regions becomes critical to ensuring safety. Message passing and fundamental types allow data to be passed across segments, avoiding taint problems and privilege escalation attack vectors.
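One way to picture message passing with fundamental types across segments is the sketch below. Each segment owns its own typed-array memory, and the only way to move data between them is a send function that accepts numbers, strings, and booleans and copies them. The segment structure and the names makeSegment, isFundamental, and send are hypothetical, chosen for this example only.

```javascript
// Each segment has private memory and an inbox of received messages.
function makeSegment(words) {
  return { memory: new Int32Array(words), inbox: [] };
}

// Only primitive values may cross a segment boundary.
function isFundamental(value) {
  const type = typeof value;
  return type === "number" || type === "string" || type === "boolean";
}

// Deliver a message (an array of primitives) to the target segment.
// The array is copied, so the sender keeps no reference into the
// receiver's data and cannot mutate it after the fact.
function send(target, message) {
  if (!Array.isArray(message) || !message.every(isFundamental)) {
    throw new TypeError("messages may contain only fundamental types");
  }
  target.inbox.push(message.slice());
}

const trusted = makeSegment(256);
const thirdParty = makeSegment(256);

const outgoing = ["setPixel", 10, 20, true];
send(trusted, outgoing);
outgoing[1] = 9999; // later mutation does not reach the receiver
console.log(trusted.inbox[0]);
```

Because object references and typed-array views never cross the boundary, code in one segment has no handle with which to read or write the other segment's memory.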

The net result of all this would be a system that more easily implements JavaScript in JavaScript, without the headaches of defining a virtual machine and an unsafe-at-any-cost execution model.