The "new" Newscript

Oh what tangled webs we weave! The past few weeks I've been slowly gleaning a few spare moments to build a new version of Newscript. Progress has been slow but steady, and I'm approaching the first release any day now. The basic premise behind the new Newscript is that it supports a syntax not entirely dissimilar to Javascript. Wait, rewind. Javascript? Yeah, Javascript. The rationale goes something like this:

That being said, 99% of all the Javascript in the world will not run on the Newscript engine. In fact, I'm making some rather radical changes to the syntax, but retaining much of the object model. The idea is more to write a language in the spirit of Javascript than to strictly implement ECMAScript or one of its bastards. After all, why repeat all of the old mistakes when you can make bizarre new ones?

What is working?

The short list of things that work:

What is left to do?

The things left on the ToDo list are:

So how is it implemented?

Well here's a breakdown of the current code using my code metrics suite:

Globals: 16
Functions: 60
Statements: 188
Lines: 305
Characters: 8394

Yes, you read that right: it is 188 statements in 60 functions, with 16 globals, in a 305 line file (about 28 characters per line). The average function is 3.4 statements long, and I haven't finished the latest refactorings, so I could probably cut that down further with a little effort. For most postfix based code, it can interpret and produce a result, read input from the keyboard or a file, and print out the results. It can also dump rather complex nested objects into JSON that can be read by any standard Javascript engine.

To give you an idea of the programming style, 29 of the functions are inline one liners. 22 functions are 2-3 liners. And 4 of the remaining 9 functions take a variable number of arguments. The longest function in all of the code is 8 lines long, and 3 of those lines are local variable declarations. This is highly factored code in C. One reason that so few lines are required is that the object model was designed to facilitate this. Data is laid out in very straightforward ways, and memory is partitioned to make type identification easy. Strings are internalized and counted. Lengths and data are referenced using macros based on raw pointers. A few utility macros keep initialization of data structures a simple affair.
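To make the "internalized and counted" idea concrete, here is a minimal sketch of one way such a layout could look. It is not the actual Newscript code: the names counted, lengthOf, dataOf, counted_new and intern are hypothetical, as are the layout (one length word followed directly by the bytes) and the fixed-size, linearly searched intern table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical layout: one length word followed directly by the bytes. */
typedef size_t *counted;

#define lengthOf(p) (*(size_t *)(p))              /* first word holds the count   */
#define dataOf(p)   ((char *)((size_t *)(p) + 1)) /* bytes start after that word  */

/* Allocate a counted string holding a copy of a NUL-terminated C string. */
static counted counted_new(const char *text) {
    size_t n = strlen(text);
    counted p = malloc(sizeof(size_t) + n + 1);
    assert(p != NULL);
    lengthOf(p) = n;
    memcpy(dataOf(p), text, n + 1);  /* keep the NUL so dataOf() is printable */
    return p;
}

/* Hypothetical intern table: a small array searched linearly. */
#define MAX_INTERNED 64
static counted interned[MAX_INTERNED];
static size_t interned_count;

/* Return the single shared copy of text, creating it on first use. */
static counted intern(const char *text) {
    size_t n = strlen(text);
    for (size_t i = 0; i < interned_count; ++i)
        if (lengthOf(interned[i]) == n &&
            memcmp(dataOf(interned[i]), text, n) == 0)
            return interned[i];
    assert(interned_count < MAX_INTERNED);
    return interned[interned_count++] = counted_new(text);
}
```

Because every distinct string is stored once, two interned strings are equal exactly when their pointers are equal, so comparing keys comes down to comparing addresses.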

Some sample code

If you would like to see some of the ways I've translated ideas from the phosphor / ad hoc code base into C, you can probably go with no better example than the object composition function, from. The from function produces an ordered union of the key,value pairs of its arguments in a new composite object. This is the way that inheritance works in Phosphor, and the model I'm adopting in Newscript as well. The way it is currently implemented is:

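object from(object o, ...) {
        object retval = Object(0);
        va_list args;
        va_start(args,o);
        /* nested function: copies one slot into retval and bumps its count */
        object copy_slot(object x) { slot(x->value,x->key); ++intAt(retval); }
        /* walk the NULL terminated argument list, copying every slot */
        for (object t = o; t; t = va_arg(args,object)) each(t,copy_slot);
        va_end(args);
        return retval;
}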

From takes a variable number of arguments, terminated with a NULL, and returns a new object after doing a shallow copy of each object's slots. The function also uses a nested function (a GCC extension), which is passed as a function pointer to the each iterator function. Slot allocates a new slot in memory, and the intAt() macro treats retval as an integer pointer. Strictly speaking, slots are numbered 1 ... length(object), and the exact size is dependent on the processor architecture. Object(0) allocates an object with 0 slots and, like from, also takes a variable list of key,value pairs. You can probably guess how Object(...) is implemented under the hood too!
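For readers without GCC's nested functions, the same NULL terminated varargs pattern can be sketched in standard C. This is only an illustration under stated assumptions, not the Newscript implementation: struct pairs, add_slot and compose are hypothetical names, and a fixed-size array stands in for the real object layout.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <string.h>

#define MAX_SLOTS 16

/* Hypothetical stand-in for an object: a fixed array of key,value slots. */
struct pairs {
    int count;
    const char *keys[MAX_SLOTS];
    const char *values[MAX_SLOTS];
};

/* Append one slot to dst (shallow: only the pointers are copied). */
static void add_slot(struct pairs *dst, const char *key, const char *value) {
    assert(dst->count < MAX_SLOTS);
    dst->keys[dst->count] = key;
    dst->values[dst->count] = value;
    ++dst->count;
}

/* Copy the slots of every argument, in argument order, into *dst.
   The argument list must end with a NULL pointer. */
static void compose(struct pairs *dst, const struct pairs *first, ...) {
    va_list args;
    dst->count = 0;
    va_start(args, first);
    for (const struct pairs *p = first; p; p = va_arg(args, const struct pairs *))
        for (int i = 0; i < p->count; ++i)
            add_slot(dst, p->keys[i], p->values[i]);
    va_end(args);
}
```

A call would look like compose(&child, &base, &mixin, (const struct pairs *)NULL). The cast on the terminator matters: a bare NULL may be passed as a plain integer through the variable argument list, which is one of the reasons stdarg.h is easy to get wrong.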

Forward Looking Statements

Whenever I do a project like this, something about the implementation language inevitably pisses me off. C's stdarg.h is one of those things. The ABI for 64-bit Unix is another. I've got some prototyped code for directly compiling the interpreter's functions to Intel native machine code. I think that the full compiler for the core language will be somewhere in the 50-100 lines of C code range, half of which will be a table of binary data. I've toyed around with playing games with the page table settings, but am probably going to settle for doing ad hoc JIT. One of the focuses of this work is to make floating point and integer math integral to the language. I've decided to go with an OCaml-style +. operator for floating point math, and to remove the utterly braindead string overloading of +. Instead, the , operator will serve for all sorts of concatenation, just as the : operator will construct key:value pairs.
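As a rough illustration of what separate + and +. operators buy you, here is a tiny postfix evaluator sketch. Everything in it is an assumption made for the example rather than Newscript's actual behavior: the value type, the eval_postfix function, the space-separated token syntax, and in particular the decision to reject mixed integer and floating point operands outright.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical tagged value: either an integer or a floating point number. */
typedef struct {
    enum { V_INT, V_FLT } tag;
    union { long i; double f; } as;
} value;

/* Evaluate a space-separated postfix expression such as "1 2 +" or
   "1.5 2.25 +.". Returns 0 on success, -1 on a type or syntax error. */
static int eval_postfix(const char *source, value *result) {
    char buffer[128];
    value stack[32];
    int depth = 0;

    if (strlen(source) >= sizeof buffer) return -1;
    strcpy(buffer, source);  /* strtok modifies its input, so work on a copy */

    for (char *tok = strtok(buffer, " "); tok; tok = strtok(NULL, " ")) {
        if (strcmp(tok, "+") == 0 || strcmp(tok, "+.") == 0) {
            int want = tok[1] == '.' ? V_FLT : V_INT;
            if (depth < 2) return -1;
            value b = stack[--depth], a = stack[--depth];
            /* assumption: no implicit conversion between the two kinds */
            if (a.tag != want || b.tag != want) return -1;
            value r;
            r.tag = want;
            if (want == V_INT) r.as.i = a.as.i + b.as.i;
            else r.as.f = a.as.f + b.as.f;
            stack[depth++] = r;
        } else {
            char *end;
            value v;
            if (depth == 32) return -1;
            /* a literal containing '.' is floating point, otherwise integer */
            if (strchr(tok, '.')) { v.tag = V_FLT; v.as.f = strtod(tok, &end); }
            else { v.tag = V_INT; v.as.i = strtol(tok, &end, 10); }
            if (*end != '\0') return -1;
            stack[depth++] = v;
        }
    }
    if (depth != 1) return -1;
    *result = stack[0];
    return 0;
}
```

With the operator itself naming the arithmetic, the evaluator never has to guess what + means from the types of its operands, which is the same property that frees + from string concatenation duty.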

All in all I expect to finish the initial interpreter in under 500 lines of C code. The final native compiling JIT will fall somewhere well under 1k LOC. Since I'm not even to 200 statements yet, I may even be able to sneak a functional JIT in under 500 statements. Pulling this off has some serious upside for a wide variety of projects I'm working on. I can embed this sucker, as its memory footprint is very small. It takes nearly no time to initialize, and would be suitable for integration with Jawas. And finally, the code base is small enough that it can be used for teaching. I have a pet project I've always wanted to build, and I think this will be the perfect tool for embedding scripting into it. But that is a post for another day.