Language Musings

So the other day I was thinking some more about cyclic dependencies in the definition of certain core Smalltalk classes. Class objects like Array and Compiler have static references to each other driven largely by the need for the Compiler to use Arrays to compile code, which may also compile Arrays. While it could be possible to unravel these dependencies, and create a CompilationArray class, which implements only that subset of necessary behavior, this would serve little practical purpose and would defeat the purpose of using class objects in the first place (namely to remove duplication of code).

This then led to thinking about how one could collapse codependent classes into a single uber class that understood how to accommodate these cases. If you look at it, a Compiler is a utility that takes in objects and outputs new objects which can be interpreted by the VM. The fact that the Compiler is an object that produces objects from other objects means that it is an auxiliary class. It exists primarily to aid developers in maintaining a mental model which separates out concerns dealing with the machine representation of an executable. But by that same token, because it is auxiliary, there is nothing in and of itself which mandates that its features need be separate.

A few years ago, I altered my Smalltalk image to move the compiler into the string class. You would simply tell a string to compile itself and it would return a compiled method, class, or object literal based on the contents of the string. This is similar to having eval operate on a string containing a function definition or object literal in JavaScript. Since all source files are just strings, rolling the compiler into a ByteArray class that could generate other arrays of bytes would seem like a natural fit for the compiler interface. But could it go deeper? After all, what if we didn't want to represent our program as a string, but rather as an object graph, similar to Lisp? Could the base Object class know how to compile itself?
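The idea of telling a string to compile itself can be roughly sketched in Python. This is not the Smalltalk change described above, only an analogue: `SourceString` and `compile_self` are hypothetical names, and the built-in `compile`, `eval`, and `exec` stand in for the image's compiler.

```python
class SourceString(str):
    """Hypothetical string subclass that knows how to compile itself."""

    def compile_self(self):
        try:
            # An expression (for example an object literal) compiles in
            # "eval" mode and evaluates directly to a value.
            code = compile(self, "<string>", "eval")
        except SyntaxError:
            # Statements such as def/class are executed into a fresh
            # namespace, and the names they define are returned.
            namespace = {}
            exec(compile(self, "<string>", "exec"), namespace)
            namespace.pop("__builtins__", None)
            return namespace
        return eval(code)


literal = SourceString("[1, 2, 3]").compile_self()
methods = SourceString("def double(x):\n    return x * 2").compile_self()
```

Here `literal` is a list built from the literal in the string, and `methods["double"]` is a callable function, which mirrors the string returning either an object literal or a compiled method depending on its contents.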

What would it mean to have a system where any object could be sent a message that would return a machine executable representation of that object? Any object could represent the input to a program; that program may not do anything useful, and may cause the CPU to fault, but it would still be able to generate a representation. Would it mean that any object which received a compile message would cascade it to any object for which it has a static reference? Or would the implementation assume a link phase to dynamically bind the references (turning them into message sends to a global object, à la the Smalltalk object or the lobby in Self)?

What is the minimal set of such objects that one would need to implement to build useful systems?
My initial guess would be: bytes, machine registers, memory, and a base object.

The first two simply wrap the basic machine instructions for a given architecture. The third allows for placing the first two next to each other in memory and manipulating ranges of address space. The fourth provides the necessary abstractions to treat the other three as if they were structured entities, and would provide the message passing infrastructure, the compilation interface, and the basic behavior for transforming a program's representation into a mechanically functional one.

That is to say, if we represent bytes as an object, machine registers as an object, and memory as an object, then the structure of our communications is itself represented as an object. I have built a few implementations of this set of entities over the years as the basis of NewScript, and am curious to see if a general purpose object system could be defined at an even lower level. After all, a byte is an array of 8 bits, and is only really interesting because it is the unit of addressing on many common processors. Integers are just arrays of bytes, either LSB-first or MSB-first in format, and are interesting inasmuch as hardware operates primarily upon entities of this size. I once wrote a DSL which captured this, where a bit was the base level object, and all larger entities responded to array syntax at an appropriate level of detail. Since array access is merely another variation on a function call, treating the symbolic reference as a function which maps an index number to a value, one could dereference any bit in an array using sequential calls:

Memory(integer_index)(byte_index)(bit_index) = 1|0

This allowed one to write to any addressable bit in memory. This language is sufficient to represent any possible program or object system. Since we can shorthand this, we could just write literal machine code, as I did in my JavaScript based compiler for NewScript. This sort of translation merely maps a specification for bit/byte/int/array operations into a machine native representation. In this methodology a message send is merely a transfer of control from one object to another. If objects may only alter themselves or inform Memory to alter itself, then all compilation is just a sequence of message sends. In this scheme the Memory array object is the system, the compiler, and the global context. It is what can enforce partitioning and separation of concerns. It is also the only object which can produce objects, as all objects are subcomponents of it.
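The chained addressing form Memory(integer_index)(byte_index)(bit_index) can be approximated in Python as a minimal sketch. This is not the DSL itself: the class names, the 4-byte integer size, the choice of bit 0 as the least significant bit, and the `get`/`set` methods are all assumptions made for illustration (Python cannot assign to the result of a call, so `set` replaces the `= 1|0` form).

```python
WORD_SIZE = 4  # assumption: integers are 4 bytes wide


class Bit:
    """One addressable bit inside the backing store."""

    def __init__(self, store, offset, bit_index):
        self._store, self._offset, self._bit = store, offset, bit_index

    def get(self):
        return (self._store[self._offset] >> self._bit) & 1

    def set(self, value):
        if value:
            self._store[self._offset] |= 1 << self._bit
        else:
            self._store[self._offset] &= ~(1 << self._bit) & 0xFF


class Byte:
    """A byte viewed as a function from bit index to Bit."""

    def __init__(self, store, offset):
        self._store, self._offset = store, offset

    def __call__(self, bit_index):
        if not 0 <= bit_index < 8:
            raise IndexError("bit index out of range")
        return Bit(self._store, self._offset, bit_index)


class Integer:
    """An integer viewed as a function from byte index to Byte."""

    def __init__(self, store, offset):
        self._store, self._offset = store, offset

    def __call__(self, byte_index):
        if not 0 <= byte_index < WORD_SIZE:
            raise IndexError("byte index out of range")
        return Byte(self._store, self._offset + byte_index)


class Memory:
    """The whole address space, viewed as a function from integer index."""

    def __init__(self, integer_count):
        self._count = integer_count
        self._store = bytearray(integer_count * WORD_SIZE)

    def __call__(self, integer_index):
        if not 0 <= integer_index < self._count:
            raise IndexError("integer index out of range")
        return Integer(self._store, integer_index * WORD_SIZE)

    def raw(self):
        return bytes(self._store)


memory = Memory(2)
memory(1)(2)(3).set(1)  # integer 1, byte 2, bit 3
```

Every view shares the one `bytearray` owned by `Memory`, which is the point of the scheme: the smaller entities are only windows onto the single object that actually holds state.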

In a distributed system, Memory objects could easily communicate and share data over a bus. No Memory object has the physical ability to alter another, and conceptually each should be isolated. The net result of this model is that a physical system could be subdivided into a collection of separate memory regions (aka processes) which share nothing and communicate through buses (IPC/RPC). This would make running programs on millions of cores consistent across wide varieties of hardware. It could also allow for models of shipping state around many-core systems. Granted, this system sounds a lot like Forth meets Erlang meets Self, which is exactly what NewScript is meant to become.
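The share-nothing arrangement can be illustrated with a small Python sketch. The `Bus` and `Region` classes and their methods are hypothetical, and an in-process queue stands in for a real bus or IPC channel; the only property being shown is that data crosses between regions by copy, never by reference.

```python
from collections import deque


class Bus:
    """Hypothetical bus: carries immutable copies of bytes between regions."""

    def __init__(self):
        self._queue = deque()

    def send(self, payload):
        self._queue.append(bytes(payload))  # copy, so nothing is shared

    def receive(self):
        return self._queue.popleft() if self._queue else None


class Region:
    """Hypothetical isolated memory region; only it touches its own store."""

    def __init__(self, size, bus):
        self._store = bytearray(size)
        self._bus = bus

    def write(self, offset, data):
        if offset < 0 or offset + len(data) > len(self._store):
            raise IndexError("write outside region")
        self._store[offset:offset + len(data)] = data

    def read(self, offset, length):
        return bytes(self._store[offset:offset + length])

    def publish(self, offset, length):
        # Ship a copy of local state onto the bus.
        self._bus.send(self.read(offset, length))

    def accept(self, offset):
        # Pull one message off the bus into local memory, if any is waiting.
        payload = self._bus.receive()
        if payload is None:
            return 0
        self.write(offset, payload)
        return len(payload)


bus = Bus()
left, right = Region(8, bus), Region(8, bus)
left.write(0, b"\x01\x02")
left.publish(0, 2)
right.accept(4)
left.write(0, b"\xff\xff")  # later changes in one region never reach the other
```

After this runs, `right` holds the two bytes it received at offset 4, while the later write to `left` affects only `left`.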