The Firth Machine

So for about the past 22 years, I have had a series of projects which involved implementing programming languages. In addition to building forths, lisps, and C compilers, I have also poked in the internals of about a dozen other languages. Most of this is driven by curiosity, but in the case of other people's code frustration. 

For about the past 12 years, I've worked off and on on a language I've dubbed Firth, which is a pun on the "Firth of Forth" (the tidal estuary on which Edinburgh resides).   In that time I've done about 18 partial implementations. Assembler, Firth in Firth, FirthML, Objective-Firth, SmallFirth, cFirth, plFirth, and a host of variations on the above.  Over a decade it is easy to reiterate over an idea exploring the same space from different angles. 

The initial inspiration I had for Firth was the first language I wrote as a teenager, Glish, which bore a striking resemblance to TCL (which I didn't learn for about 6 years later). It had the following properties:

At the age of 14, I only knew C, Pascal, Forth, and Basic. The design of the system was basically the best Forth I could think of that I could build in C.  These were the days where I'd type Intel machine code into MS Debug to write COM files.  Having a big hard drive and C compiler had gone to my head I think.  I couldn't quite get a handle on how to do message sends. I had seen Smalltalk only in magazines, and the closest approximation to a message send was a parameter passed on the stack.  Since the only data structures were C strings and ints, the dictionary was a linked list of the form:

  Address, Length, String, ...

Where Address was the value associated with the string, Length was the size of the string plus null, and String was the string data itself.   Walking the list was simple and involved a for loop.   The heap only grew, and it took a while to develop the idea of walking this structure backwards to all for redefining terms.   But I was young and only got to read code in magazines and off BBSes at friends houses. 

So now thinking about this design, 22 years later, I really love the idea still. There are a lot of rough edges, but the basic concepts were sound. I had worked around scary concepts like Abstract Syntax Trees by writing Forth, a linear sequence of instructions.  I didn't have a hash table, but a linear scan of a neatly packed RLE structure.  (I was familiar with RLE bitmaps). Programs were parsed and converted to string pointer lists, which at 16bit pointers, these were cheaper than words, and made the interpreter very simple. A simple switch statement covered the core words of the language, and any word beyond that was a literal or subroutine. If you walked to the end of a dictionary the value was just left on the stack.   And then...

22 years pass

Modern Firth designs (of which there are many in many languages) look like this:
And it works in practice because of how one must approach programming in this environment.   Objects don't access each others memories, pass around large objects (by reference only), or mutate remote state. In fact nearly all Objects in Firth are immortal. Once created they are never unmade, but are referenced over and over again.  Rather the style treats data as separate from objects, and code as another form of data.   An Object may use its entire heap however it sees fit, and may implement a GC internally based on its needs. This shift in thinking allows for many approaches to managing memory usage, and avoids the on size fits all schemes common in other languages. 

Firth also dedicates registers specifically to streaming data from and to locations in memory in a linear fashion. Fetch + Increment and Store + Increment instructions make it easy to fetch from two cache pages and store in a third. This means an object can make efficient use of 4 separate cache lines in short order, and take advantage of  streaming instructions within its model. Since each object can manage its own heap, it can also ensure that two regions do not overlap, avoiding aliasing problems which prevent optimization in other languages. 

The Firth model also supports massively parallel processing similar to Erlang, as each Object is a process unto itself. Passing control from one object to another is not only done cooperatively, but also pre-emptively via a Scheduler object. The Scheduler is itself an object which makes use of the Timer to prevent any one object from monopolizing a core.  The combination of inexpensive process switching with a per-core watchdog allows for fine tuning on resource usage.  For example the Screen object registers with the scheduler to prevent preemption during a Screen draw, ensuring graphics performance. The Mixer does the same for audio data, and makes use of the scheduler to ensure full DMA channels.  Since the Scheduler itself is only just an object, it too can be replaced with another object which meets the specific needs of the application. 

And this brings me to the design ideology of Firth, it is just Lego. There are known points of interface, they plug together end on end, and each piece has its own peculiar design for its purpose. If you need more than one, you make more than one, and once made, you keep it in a toolbox to pull out as necessary.  No block ever depends on the internal layout of the other, and few key blocks make up the foundation of most structures.   Objects are concrete, data is ephemeral.  Like with human lives, Objects represent a continuity of structure, not the instantaneous vagaries of state.  Firth objects are like firths, they ebb and flow, changing with the tide, yet always remaining the same. 

Philosophically, Firth is an object oriented system.  Firth does not, however, treat everything as an object. All code and data in Firth are just that. Conceptually, all Objects are implemented as data managed by the Memory object.  This Object is responsible for managing all access to a system's memory. They System object, itself, communicates with the Memory object to perform reads, writes, updates, etc.  Since the Memory object is used for so many things, its operations are delegated to by nearly all objects in the system.  These sorts of calls in practice, do not result in a context switch, as the Compiler object is smart enough to inline all Memory method calls. The Compiler itself, relies upon the Memory object to store all of the code and provide all of the source data.   As such, in Firth, all code is data and all data is code, an Object is neither.  An Object is a set of related behaviors exhibited by the system. 

The Firth system explicitly exposes the entire set of behaviors at compile time as well. Rather than making a clear distinction between a program which is run later and a script which is interpreted now, Firth allows for words in a source stream to modify the compilation environment by immediately triggering the Compiler to invoke code.  The Compiler may be passed a message in a source stream using a   parenthetical note allowing for modification of the environment, the source stream, or the Compiler itself. This alternative approach replaces preprocessing, macro expansions, and code rewriting engines.  Doing something as trivial of including other source files, downloading 3rd party objects, or setting up a distributed build environment can all be done using compiler instructions. 

Current prototypes of the Compiler with minimal optimizations, peephole optimizer, tail call optimizations,  and streaming instruction support, get performance between gcc -O1 and -O2 on comparable programs.  The lack of SSA analysis or complex AST rewriting engines doesn't seem to have crippled the base performance of Firth.  While not as fast as gcc -O2 in general, most programs I have written in Firth exhibit better memory usage, startup time, threading, and I/O characteristics.   The use of multiple heaps and stacks combined with cheap function calls make context switches as expensive as the typical C function call. Using a register based strategy for passing stack values between processes makes message sending incredibly inexpensive as well.  Heavy use of inlining and avoiding branching in general also helps.   Since the Firth programming style encourages use of permanent objects, most core message sends are actually inlined and avoid any branch, dynamic dispatch, or context change. 

The lack of an interpreter also makes for a very responsive environment.  Programming a system image involves sending messages to live objects.  The interactive environment, reads a line from STDIN to a buffer, passes it to the Compiler which compiles it to code, and then executes the resulting binary, returning to the REPL.  So typing 1 + 2 literally compiles:

  mov rax,2
  add rax,1
  jmp Console_print

Where "Console print" writes the value in rax to STDOUT and returns to the main loop.  Because the compiler is incredibly fast, there is no reason to interpret anything.  This compile and discard approach to code also highlights one of the approaches Firth programs use to make things fast. Rather than try to outsmart the developer, Firth allows the programmer to write code that writes code.  If you have an app that needs to calculate fib(n) and n is known to be 17, write an app that compiles code that calculates fib(17) at compile time and the app just returns the result.   If you need to write an app that does something exactly 5 times in a row, write an app that compiles the same code 5 times in a row, even if it is just 5 identical message sends. 

And that's about it. Firth is a retro-future programming system with bare metal ambitions. Hopefully it will be useful in practice one day.