Object Orientation (and other poorly understood idioms)

There is an overwhelming preponderance of evidence that computer scientists should never be allowed to touch a computer. With that overly inflammatory remark out of the way, it's time to poke at a sacred cow, Object Oriented design. I'm not the only programmer to poke at this religious bovine:

17 Principles of OO Design

Probably the harshest critic of all, however, is a voice from the past on the very design principles behind Smalltalk and Object Oriented programming in general. Dan Ingalls' article in Byte magazine in August 1981 listed 17 design principles behind Smalltalk:

  1. Personal Mastery: If a system is to serve the creative spirit, it must be entirely comprehensible to a single individual.

  2. Good Design: A system should be built with a minimum set of unchangeable parts; those parts should be as general as possible; and all parts of the system should be held in a uniform framework.

  3. Purpose of Language: To provide a framework for communication.

  4. Scope: The design of a language for using computers must deal with internal models, external media, and the interaction between these in both the human and the computer.

  5. Objects: A computer language should support the concept of "object" and provide a uniform means for referring to the objects in its universe.

  6. Storage Management: To be truly "object-oriented", a computer system must provide automatic storage management.

  7. Messages: Computing should be viewed as an intrinsic capability of objects that can be uniformly invoked by sending messages.

  8. Uniform Metaphor: A language should be designed around a powerful metaphor that can be uniformly applied in all areas.

  9. Modularity: No component in a complex system should depend on the internal details of any other component.

  10. Classification: A language must provide a means for classifying similar objects, and for adding new classes of objects on equal footing with the kernel classes of the system.

  11. Polymorphism: A program should specify only the behavior of objects, not their representation.

  12. Factoring: Each independent component in a system would appear in only one place.

  13. Leverage: When a system is well factored, great leverage is available to users and implementers alike.

  14. Virtual Machine: A virtual machine specification establishes a framework for the application of technology.

  15. Reactive Principle: Every component accessible to the user should be able to present itself in a meaningful way for observation and manipulation.

  16. Operating System: An operating system is a collection of things that don't fit into a language. There shouldn't be one.

  17. Natural Selection: Languages and systems that are of sound design will persist, to be supplanted only by better ones.

Good Ideas Gone Bad

Now if you examine the general design principles, they are all, at face value, quite reasonable. A system must be masterable to be useful; otherwise it is inherently unmanageable by mere mortals. The design itself should be as simple as possible without being too simple, and self-consistent without being awkward. A language which fails to communicate intent and meaning is not a language, and a computer programming language which doesn't program a computer is useless.

Where OO goes wrong, however, is in the implementation of the specific details. "Everything is an object" should not mean that everything must be implemented in object code. "Automatic storage management" need not mean that you should produce garbage. "Sending messages" does not imply that one need actually send a message. The "uniform metaphor" does not require the elimination of context in syntax and grammar. Similarly, "modularity" does not mandate modules or packages, but only that objects do not meddle in the internals of other objects. "Classification" does not mean classes, but rather introspection. And "polymorphism" does not mean overriding and bad puns, but rather a separation between the semantics of expression and the semantics of implementation. "Factoring" does not mean inheritance and classes, but rather the elimination of repetitive structures and elements. The principle of "leverage" derives from maintaining a maximal vocabulary with minimal words, not from expansive verbosity. A "virtual machine" need not be implemented as a virtual machine; it is merely a model for thinking about application onto an actual machine. The "reactive principle" does not mean MVC, but rather the capacity to directly interact with both data and code. Similarly, an operating system is an application which hosts your platform, and is inherently redundant; trying to support "system calls" is to view the software as the machine.

Evolution of Thinking about OO

The reason I began with that inflammatory statement concerning comp-scientists is that comp-sci is a branch of mathematics, not a branch of linguistics. Computer programs are first and foremost about the use of language, not the use of mathematics. The representation of a program as a sequence of binary values is no more relevant to a program than vibrations in the air are to a poem. A program exists as much in your head as it does on the hardware, and this is the point of polymorphism. If we distinguish the semantics of expression (the meaning and intent conveyed by the program) from the semantics of implementation (the binary representation for a particular piece of hardware), then we can focus on saying what we mean, and incidentally produce a functional system as well. By focusing on how you would like to say something, rather than what the computer needs to hear, your programs can more readily match what a programmer will understand.

Rather than confusing our notion of object and the metaphor of sending messages, consider the difference between code and data. Code is data which the computer will interpret to produce a result. Data is just any binary encoded information. In an object oriented application, objects are code. Objects manipulate data which represents the state of our virtual machine, the internally consistent metaphor for the physical hardware. Objects, themselves, are also data which may be manipulated by other objects, within the context of the virtual machine. If your language does not have this basic facility, then it is either not object oriented or crippled (I'm looking at you, Java).
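To make "objects are also data" concrete, here is a minimal sketch in JavaScript. The names `counter` and `tracer` are hypothetical, made up for illustration; the only point is that one object can read and replace another object's code the same way it would any other value.

```javascript
// Hypothetical example: an object whose behavior is stored as ordinary data.
const counter = {
  count: 0,
  increment() {
    this.count += 1;
    return this;
  },
};

// A second object treats `counter` as data: it reads and rewrites its code.
const tracer = {
  log: [],
  wrap(target, name) {
    const original = target[name]; // code read like any other value
    const log = this.log;
    target[name] = function (...args) {
      // code replaced like any other value
      log.push(name);
      return original.apply(this, args);
    };
  },
};

tracer.wrap(counter, "increment");
counter.increment().increment();
console.log(counter.count, tracer.log.length); // both are 2
```

Nothing about `counter` had to be declared "traceable" in advance. The language only needs to let objects hold and exchange code as values.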

From this viewpoint, a class is simply an optimization, an object which factors out shared code. All of the "meta-data" associated with an "object" is merely cruft resulting from insufficient factoring of the runtime implementation. Reactivity and classification require some means to distinguish objects from each other, but do not actually require such meta-data. For example, one could replace most type information in most existing OO language implementations by simply using a memory partitioning scheme, where classification is a product of virtual machine addressing, and not of a complex runtime meta-data structure. For the system to be reactive, the system must merely provide tools to interpret its own internal state.
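As a rough sketch of what a memory partitioning scheme could look like, the JavaScript below splits one flat heap into fixed regions, one per kind of object. The region layout, the sizes, and the names `allocate` and `classify` are all assumptions made for illustration, not a description of any real runtime.

```javascript
// Hypothetical layout: one flat heap of words, split into a region per kind.
const heap = new Float64Array(1024);

// Each region records its bounds, the object size in words, and a bump pointer.
const regions = [
  { kind: "Point", start: 0, end: 512, size: 2, next: 0 },
  { kind: "Circle", start: 512, end: 1024, size: 3, next: 512 },
];

// Allocation bumps the pointer inside the region for the requested kind.
function allocate(kind, ...fields) {
  const region = regions.find((r) => r.kind === kind);
  if (!region) throw new Error(`unknown kind: ${kind}`);
  if (fields.length !== region.size) throw new Error(`${kind} takes ${region.size} fields`);
  if (region.next + region.size > region.end) throw new Error(`${kind} region is full`);
  const address = region.next;
  heap.set(fields, address); // raw words only, no per-object type header
  region.next += region.size;
  return address;
}

// Classification looks only at the address.
function classify(address) {
  const region = regions.find((r) => address >= r.start && address < r.end);
  return region ? region.kind : undefined;
}

const p = allocate("Point", 3, 4);
const c = allocate("Circle", 100, 100, 20);
console.log(classify(p), classify(c)); // Point Circle
```

The objects themselves hold nothing but their fields. What kind of thing lives at an address is answered by where the address falls.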

When it comes to the concepts of storage management, many OO language implementations mistake automatic garbage collection for storage management. If you think of storage management in terms of the expressive semantics of the program, a Forth system's "block" is sufficient. A Forth block is typically just 1024 bytes linearly addressed. The expression "10 block" would merely address the 1024 bytes at address 10 * 1024. What is automatic about the Forth system's storage management is that Forth systems typically memory map storage, automatically saving RAM blocks to their associated disk blocks. Similarly, Forth's colon definitions, variable definitions, and CREATE DOES> mechanism all provide automatic storage management for Forth's "objects". The word "forget" even provides a means to reclaim system resources, when necessary. What is key to understand regarding automatic storage management is that the system requires a simple way for the user to produce both code and data. Whether you create garbage objects is a matter of programming style, or lack thereof.
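The block idea is small enough to sketch in a few lines of JavaScript. This is an illustration under assumptions, not how any particular Forth is implemented: the "disk" is just an in-memory byte array, and `flush` writes back every resident block rather than tracking which ones changed.

```javascript
const BLOCK_SIZE = 1024;

// Stand-in for a disk: 64 blocks of raw bytes (an assumption for this sketch).
const disk = new Uint8Array(64 * BLOCK_SIZE);
const resident = new Map(); // block number -> in-memory copy

// Like "10 block": hand back the 1024 bytes that start at n * 1024.
function block(n) {
  if (!Number.isInteger(n) || n < 0 || (n + 1) * BLOCK_SIZE > disk.length) {
    throw new RangeError(`no such block: ${n}`);
  }
  if (!resident.has(n)) {
    const start = n * BLOCK_SIZE;
    resident.set(n, disk.slice(start, start + BLOCK_SIZE)); // load on first use
  }
  return resident.get(n);
}

// Write every resident block back to its place on disk and drop the copies.
function flush() {
  for (const [n, bytes] of resident) disk.set(bytes, n * BLOCK_SIZE);
  resident.clear();
}

block(10)[0] = 42; // the caller never allocates or frees anything
flush();
console.log(disk[10 * BLOCK_SIZE]); // 42
```

The user of `block` only ever names the storage they want. Loading it and saving it again is the system's business.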

Modularity is another one of those concepts that most OO language implementations honor more in the breach than in the observance. The purpose of modularity is to prevent fragility. Changes to the internals of one object should not break the operation of other objects. This concept is familiar to C programmers as maintaining a consistent ABI. If you change an existing API, then necessarily you will break dependent components. Similarly, if you modify the API of another component, say by adding a method to a base class, you've once again violated the principle of modularity. Source code, add-on packages, and code libraries are not sufficient conditions for modularity; what matters is that each object is inviolate in its person.

Message passing, similarly, is often confused with arguments over early or late binding. Whether or not a branch to a piece of code is computed at runtime or at compile time is immaterial to the metaphor of message passing. What is important is that the invocation of a block of code can be achieved through a semantic action. The when is immaterial. This can best be seen by looking at JIT vs inline blocks vs dynamic linking. If I express the concept of "a(Circle).at(100,100).radius(20)", to quote a bit of JavaScript code, the exact binary implementation of this representation can actually be calculated at compile time. The corresponding JSON would look something like:


Whether the "a", "at", and "radius" methods are invoked at compile, link, or runtime is immaterial to the semantics: each is a message sent to an object: "a" to Circle, "at" to the product of "a(Circle)", and "radius" to the product of "a(Circle).at(100,100)". There need not be any vtable dispatch, associated inline polymorphic cache, or even function call at run time.
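One hypothetical reading of that expression, sketched in JavaScript: the definitions of `a` and `Circle`, and the field names `type`, `x`, `y`, and `r`, are my own assumptions for illustration. The thing to notice is that every message is a pure function of constants, so the whole chain could be evaluated once, ahead of time, and replaced by the plain data it produces.

```javascript
// Hypothetical definitions: `a`, `Circle`, and the field names are illustrative.
const Circle = { type: "Circle", x: 0, y: 0, r: 0 };

// Each "message" returns a new frozen value; nothing depends on run-time state.
function a(prototype) {
  const make = (state) =>
    Object.freeze({
      ...state,
      at: (x, y) => make({ ...state, x, y }),
      radius: (r) => make({ ...state, r }),
    });
  return make(prototype);
}

const circle = a(Circle).at(100, 100).radius(20);

// Serializing drops the functions and leaves only the data the chain produced.
const asJson = JSON.stringify(circle);
console.log(asJson); // {"type":"Circle","x":100,"y":100,"r":20}
```

A tool that ran this chain at build time and emitted only `asJson` would preserve the meaning of all three messages without a single dispatch at run time.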

A Way Out of This Mess

So how do we find our way out of this wet paper sack sans sherpa guide, torch, and length of rope? By recognizing that the critiques of Object Oriented techniques are more critiques of particular implementations and languages, than critiques of principles.

For example, one of Java's greatest failings lies in its separation of some fundamental types from its object model, and subsequent failure to adequately manage arbitrary binary data and code. Java has a virtual machine, with a byte code format, and some capacity for runtime introspection, and just-in-time compilation, but it has trouble with structured data. If you find yourself copying data into parallel arrays to avoid paying the Java object overhead cost, you've found yourself both fighting the metaphor and paying an immense semantic cost.

Similarly, if your Smalltalk system contains 378 methods in your base Object class, most of which are there to catch fall through cases in your introspection system, and your object model implements "doesNotUnderstand: aMessage", you've failed to factor. Clearly, any such implementation does not have the minimal number of unchangeable parts, as these parts require being overridden to be useful. Moreover, if your basic vocabulary exceeds the reasonable capacity for any programmer to remember, then you've ignored the first principle of OO design - Personal Mastery. If you still discover new parts of your base system year after year for decades, your base system is exceedingly complex.

And just because your favorite OO language implementation sucks, doesn't mean that it has to. Just remember Principle 17!