Intermediate Runtime Languages and the Abstract Tree of Abstraction

Many times we find ourselves running around in circles, not because we don't know better, but because we are on a closed track and have nothing better to do. That almost captures my feelings towards the JavaScript language games I've been playing. The idea of targeting a high-level language as the architecture to which software is compiled seems counterintuitive. Surely the extra abstraction must make every operation that much more costly. How could code generated for an HLL be faster than what a human could write by hand?

This takes us into the strange world of abstract machine representations and the limits of compiler technology. Years ago, I came across a Forth2C compiler which boasted faster speeds than C for many applications. At face value this claim seemed ridiculous: how could a language compile to another language and generate code faster than code written directly in the host language? The answer lay in what the compiler could do that no human C programmer would.

The resulting C code was an unreadable listing of a single function filled with goto statements and address labels. It was essentially a high-level assembler listing designed to hit many of the sweet spots of the compiler technology of the day. It did an SSA pass before most C compilers could, did branch optimization in ways no human would, and produced the sort of spaghetti code no human would want to maintain. But as an intermediate representation that could be compiled to a wide range of architectures, C was a great language. And since the programmer was maintaining the Forth and not the generated C code, no human had to deal with the complexity of the intermediate representation.
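To give a feel for that style, here is a small hand-made sketch of what such generated C might look like. It is not output from the Forth2C compiler mentioned above; the Forth word `sum-to`, the function name `forth_sum_to`, and the label names are all made up for illustration, and the input is assumed to be a small non-negative number.

```c
#include <stdint.h>

/* Hypothetical translation of the Forth word
 *   : sum-to ( n -- sum )  0 swap begin dup while tuck + swap 1- repeat drop ;
 * into one flat C function: an explicit data stack, labels, and gotos.
 * Assumes n >= 0 and small enough that the sum fits in int32_t. */
int32_t forth_sum_to(int32_t n)
{
    int32_t stack[8];      /* data stack; this word never uses more than 3 cells */
    int32_t *sp = stack;   /* points at the next free cell */
    int32_t t;

    *sp++ = n;             /* the argument arrives on the data stack */

    /* 0     */ *sp++ = 0;
    /* swap  */ t = sp[-1]; sp[-1] = sp[-2]; sp[-2] = t;
L_begin:
    /* dup   */ *sp = sp[-1]; sp++;
    /* while */ if (*--sp == 0) goto L_exit;
    /* tuck  */ sp[0] = sp[-1]; sp[-1] = sp[-2]; sp[-2] = sp[0]; sp++;
    /* +     */ sp[-2] += sp[-1]; sp--;
    /* swap  */ t = sp[-1]; sp[-1] = sp[-2]; sp[-2] = t;
    /* 1-    */ sp[-1] -= 1;
    /* repeat */ goto L_begin;
L_exit:
    /* drop  */ sp--;
    return sp[-1];         /* the remaining cell is the accumulated sum */
}
```

Calling `forth_sum_to(4)` walks the stack through the loop four times and returns 10. Nobody would write a summing loop this way by hand, but every Forth primitive maps to a line or two of straight-line C, which leaves the C compiler free to keep the stack cells in registers and collapse the whole thing.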

This is the same thinking behind languages which target the JVM or JavaScript. By targeting a platform that has a peculiar set of complicated optimization characteristics but wide availability, one may actually gain both portability and performance through translation. By providing convenient abstractions that compile down to forms more suitable for JIT compilation, the additional level of abstraction can help programmers avoid writing code which is difficult to optimize. It is not that human programmers are incapable of writing efficient code, just that by making it easy to do so we remove the barrier of specialized knowledge and attention to detail.

This problem space grows even more complicated when one realizes that different target JS engines have wildly different performance characteristics. Building a differentiating optimizer which can convert common forms to engine-specific representations can also help contain the proliferation of optimization techniques necessary to produce performant code across multiple browser platforms. This efficient management of programmer time, by shifting the cognitive burden from one level of abstraction to another, is a fundamental precept of all computer language development.

Each bifurcation of the platform hints at a long-term trend: unification through successive layers of translation. As the quantity of hardware and software platforms grows over time, and the use of heterogeneous networks becomes commonplace, no single abstract virtual machine, high-level scripting language, or meta-object system will work. Languages like OMeta demonstrate a formal methodology for describing these linguistic systems, which hints at the lessons learned by Forth: a unified language will always produce domain-specific, problem-oriented languages, which become the basis of a programmer's mental model in a given context.

The meta language which will win will be the first to purge itself of the structural components based upon the underlying platform.