So Crazy It Might Just Work
What if the problems we face in making object-oriented programming languages efficient are merely a product of confusion on the part of most object-oriented programmers? What if the implementation and model bore no resemblance to each other, and message sends were entirely compiled away at run time? What if we could avoid garbage collection entirely by writing object-oriented programs that never produced garbage? What if we never had a class or any form of inheritance (biological metaphors poorly adapted to artificial constructs), but instead just built objects from a box of parts? What if we had a finite box of Lego, and could rearrange the pieces as we wished, but could not fundamentally alter the structures we had already built without disassembling them first?
There is a fundamental flaw in the way we think about software in general, and that is the belief that the language is the program. It is not; it is a description of the program. The actual program is a collection of bits in memory that the hardware interprets to generate a result. There is no sufficient reason that your program need be preserved in the artifacts running on the CPU. Most of the effort in C/C++ compilers over the years has been specifically aimed at rewriting your code into something more CPU friendly. Yet at some fundamental level, we let this reasoning cloud our thinking when we go up the stack, and see things like virtual machines with dynamic dispatch instructions (ignoring the fact that the whole VM is a dynamic dispatch loop). Why must the VM's implementation map to the language model, while the VM itself need not be mapped into the language? Why can't I dynamically extend the VM in the VM while it is running? Why must I resort to a statically compiled language? It's not because of efficiency, because the context switch kills that.
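To make the parenthetical about the dispatch loop concrete, here is a minimal sketch of a toy stack machine. Everything in it is an assumption made for illustration: the opcode names, the `HANDLERS` table, and the `run` function are hypothetical and not taken from any real VM.

```python
# Hypothetical three-instruction stack machine, for illustration only.

def op_push(stack, arg):
    # Push a literal onto the stack.
    stack.append(arg)

def op_add(stack, _arg):
    # Pop two values and push their sum.
    right = stack.pop()
    left = stack.pop()
    stack.append(left + right)

def op_mul(stack, _arg):
    # Pop two values and push their product.
    right = stack.pop()
    left = stack.pop()
    stack.append(left * right)

# The opcode table is plain data: a name mapped to a handler function.
HANDLERS = {"push": op_push, "add": op_add, "mul": op_mul}

def run(program, handlers=HANDLERS):
    """Interpret a list of (opcode, argument) pairs and return the stack."""
    stack = []
    for opcode, arg in program:
        # Every instruction, not only a "send", goes through a table lookup
        # followed by an indirect call. The loop itself is dynamic dispatch.
        handlers[opcode](stack, arg)
    return stack

# (2 + 3) * 4
program = [("push", 2), ("push", 3), ("add", None), ("push", 4), ("mul", None)]
result = run(program)
```

In this sketch an "add" instruction and a hypothetical "send" instruction would cost the same lookup and indirect call, which is the sense in which a dedicated dynamic dispatch instruction sits on top of a loop that already dispatches dynamically.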
I feel there is a huge need to challenge these basic assumptions about how languages and machines should work. Even the cutting edge bits of VPRI's cola, ometa, and maru miss the mark in this regard. While small and malleable, cola's object model is a hash data structure with a magic pointer to a vtable (née class map). This is essentially a compression technique, designed to reduce repetition of data. Much like classes and prototype trees, these techniques allow for code and data to be separated, but at a cost of multiple layers of indirection. The peg work is nice as it generalizes a pattern matching pattern generator, but it too is bound by an assumption that a program should be expressed as a linear string of text. Here too we apply a concept, "human readable", and thoroughly mistake having an abundance of existing tools for accessibility. If you are already a programmer or have a math background the representation may make sense, but it does little for those who are not and do not. The idea of homoiconicity is even more dangerous when applied in this manner, as the language's ability to manipulate its own representation is more a reflection of the language's facilities for mangling text than for mangling the machine representation. For example, it is easy to represent a macro in Lisp, but damn near impossible to represent a block of machine code and jump to it. Doing so will require violating the user illusion.
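The shape of that object model, a hash of slots plus a pointer to a shared method table, can be sketched roughly as follows. This is not cola's actual implementation; `VTable`, `Obj`, `send`, and the hop counter are hypothetical names introduced only to show where the layers of indirection come from.

```python
class VTable:
    """Hypothetical method table shared by many objects."""

    def __init__(self, methods, parent=None):
        self.methods = methods  # selector -> function
        self.parent = parent    # delegation link, or None at the root

    def lookup(self, selector):
        # Walk the delegation chain, counting how many tables we touch.
        table = self
        hops = 0
        while table is not None:
            hops += 1
            if selector in table.methods:
                return table.methods[selector], hops
            table = table.parent
        raise AttributeError(selector)


class Obj:
    """Per-object data (the hash) plus the 'magic pointer' to a vtable."""

    def __init__(self, vtable, **slots):
        self.vtable = vtable
        self.slots = slots


def send(receiver, selector, *args):
    # object -> vtable -> (maybe parent vtable) -> method -> call
    method, hops = receiver.vtable.lookup(selector)
    return method(receiver, *args), hops


base_vt = VTable({
    "describe": lambda self: "point at %d,%d" % (self.slots["x"], self.slots["y"]),
})
point_vt = VTable({"x": lambda self: self.slots["x"]}, parent=base_vt)

p = Obj(point_vt, x=3, y=4)
x_value, x_hops = send(p, "x")                # found in the object's own vtable
text, describe_hops = send(p, "describe")     # found one table further up
```

The code is stored once and shared by every object that points at the table, which is the compression. The price is that each send follows the pointer, searches one or more tables, and only then makes the call.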
This is not to say that the work done by the people involved with VPRI isn't worthwhile. In fact the work that Ian has done is spectacularly clear and built upon proven methods. Adopting what VPRI has done would represent a massive improvement on the state of the art as implemented in industry. But even so, the techniques espoused are not necessarily suitable for general consumption. At the end of the day, most programmers are little more than welders who use glue code to stitch together other people's algorithms. There is no need to build a system that gives these programmers more expressive power or greater insight into their systems, as it does not support the economic model used by large corporations. Code at scale requires that a different set of challenges be met:
- how do I hide complexity from programmers
- how do I discover available interfaces
- how do I access accurate documentation
- how do I enforce policy and obligations
- how do I avoid tight coupling at all levels
- how do I handle failure
- how do I distribute processing
- how do I handle upgrades
- how do I characterize a system's performance
- how do I identify potential problems
- how do I resolve conflicts
All of these problems fall far outside the scope of most language designs. Java attempts to address a few of the social issues, but fails miserably at most of the operational and structural problems. Erlang does a good job of managing many of the operational details and addresses a few of the structural questions, but fails to address the social aspect. Self probably provides the best social model so far, but doesn't have the operational legs to stand on. Most other languages (especially those of a more academic bent like Haskell) don't even bother paying lip service to these critical problems. The pragmatic languages like Ruby, Python, and Perl are popular to the degree they are because they've been good enough at the social and structural aspects, and leave programmers to struggle with the operational aspect.
Consider for example a type system. Which of these issues does a complex type system in a language like Scala or OCaml actually address? It doesn't hide complexity, it makes it explicit. It doesn't make interfaces discoverable, you need reflection to do that. It doesn't provide documentation; in fact, it necessitates the writing of more. It may enforce policy if one models one's policy as a type system, which seems like an odd way to define requirements for behavior. It actually necessitates tight coupling to the degree that any interface must share the same type definitions. It doesn't allow for graceful handling of failure, but rather fails us repeatedly until we satisfy the constraints of the type system, while providing no guard against common errors in logic, sequencing, or synchronization. The type system makes little contribution to distribution as that lies outside of what is typically modeled in this manner, cf. monads. Type safety makes upgrades more difficult, as defining a transformation from one type system to the next is on the order of difficulty of defining the type system in the first place. While strong typing can improve performance, it does so by restricting possibilities. This makes profiling a bit easier as the runtime is well known. But this same process also makes the system less flexible, making it harder to patch with diagnostics or to add backtracking and deoptimization for live debugging, and often means a fundamental change in design to add such tooling. Finally, a type system makes it difficult to resolve conflicts due to the a priori nature of a type system; without knowing the types ahead of time you can't perform static analysis. Runtime type checking allows you to work around these limitations, but does not help when one wants to edit the code live and restart, as is typical in Smalltalk and Self.
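As a small illustration of the sequencing point, here is a sketch in Python with type hints. The `Journal` class and `save` function are hypothetical, and the claim that a conventional checker such as mypy accepts this code is my assumption about how such checkers treat it; richer systems (typestate, linear types) can encode call ordering, but that is not what a typical signature-level type system checks.

```python
from typing import List


class Journal:
    """Hypothetical resource with an ordering rule: write before close."""

    def __init__(self) -> None:
        self.entries: List[str] = []
        self.closed: bool = False

    def write(self, line: str) -> None:
        if self.closed:
            raise RuntimeError("write after close")
        self.entries.append(line)

    def close(self) -> None:
        self.closed = True


def save(journal: Journal, line: str) -> bool:
    # Every call below matches its signature, so the annotations are
    # satisfied. The calls are simply in the wrong order.
    journal.close()
    try:
        journal.write(line)
    except RuntimeError:
        return False  # the sequencing error only shows up at run time
    return True


saved = save(Journal(), "hello")
```

Here `saved` ends up `False`: the annotations describe what each call accepts and returns, and say nothing about the order in which the calls may be made.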
A large part of the problem is that the people who design languages first and foremost optimize for the person who designs languages, then design for the computer, then placate their friends, and as an afterthought address the issues of actually using the language to get shit done. Languages like Perl & Forth get ridiculed because of their idiosyncratic natures, but both languages tend to be looked upon fondly by people whose primary goal is not writing software in the abstract, but solving problems on a specific piece of hardware. C, with its unsafe macros, squirrelly typing, and simplistic flow control, still accounts for most of the software that matters. C doesn't address any of the above issues at the language level, and after decades of development only has the tooling to compensate if you look at the higher order languages and systems written in C.