The "new" Newscript
- I want to use Newscript in other projects with other programmers
- Most programmers speak some dialect of Algol
- Why not implement an Algol like parser for Newscript, so you can use either postfix or infix notation?
- Great idea! Let's try that!
What is working?
The short list of things that work:
- Objects & Object operations
- Arrays & Array operations
- Integers & Integer operations
- JSON output
- Postfix Interpreter
What is left to do?
The things left on the ToDo list are:
- Infix -> Postfix translation
- String operations
- C ABI calls
- Native compilation
- JSON input
- Standard Library
- DLL Support
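For the first item on that list, one common way to do infix -> postfix translation is Dijkstra's shunting-yard algorithm. The sketch below is not the Newscript parser; it assumes single-character operands, the four arithmetic operators, and parentheses, and the names precedence and infix_to_postfix are made up for illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Binding strength of a binary operator; 0 means "not an operator". */
int precedence(char op)
{
    switch (op) {
    case '+': case '-': return 1;
    case '*': case '/': return 2;
    default:            return 0;
    }
}

/* Translate infix text with single-character operands into space-separated
   postfix. out needs room for 2 * strlen(infix) + 1 bytes. Returns 0 on
   success, -1 on unbalanced parentheses or an unknown character. */
int infix_to_postfix(const char *infix, char *out)
{
    char stack[256];            /* pending operators and open parentheses */
    size_t top = 0, n = 0;

    for (const char *p = infix; *p; p++) {
        char c = *p;
        if (isspace((unsigned char)c)) continue;
        if (isalnum((unsigned char)c)) {            /* operand: emit at once */
            out[n++] = c; out[n++] = ' ';
        } else if (c == '(') {
            if (top == sizeof stack) return -1;
            stack[top++] = c;
        } else if (c == ')') {                      /* flush back to the '(' */
            while (top && stack[top - 1] != '(') {
                out[n++] = stack[--top]; out[n++] = ' ';
            }
            if (!top) return -1;
            top--;                                  /* discard the '(' */
        } else if (precedence(c)) {
            /* Pop operators that bind at least as tightly (left-associative). */
            while (top && precedence(stack[top - 1]) >= precedence(c)) {
                out[n++] = stack[--top]; out[n++] = ' ';
            }
            if (top == sizeof stack) return -1;
            stack[top++] = c;
        } else {
            return -1;
        }
    }
    while (top) {                                   /* flush what is left */
        if (stack[top - 1] == '(') return -1;
        out[n++] = stack[--top]; out[n++] = ' ';
    }
    if (n) n--;                                     /* drop trailing space */
    out[n] = '\0';
    return 0;
}
```

Feeding it "(a + b) * c" yields "a b + c *", which is the kind of token stream a postfix interpreter can consume directly.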
So how is it implemented?
Well, here's a breakdown of the current code using my code metrics suite:
To give you an idea of the programming style, 29 of the functions are inline one-liners. 22 functions are 2-3 liners. And 4 of the remaining 9 functions take a variable number of arguments. The longest function in all of the code is 8 lines long, and 3 of those lines are local variable declarations. This is highly factored C code. One reason so few lines are required is that the object model was designed to facilitate this. Data is laid out in very straightforward ways, and memory is partitioned to make type identification easy. Strings are interned and counted. Lengths and data are referenced using macros based on raw pointers. A few utility macros keep initialization of data structures a simple affair.
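To make the string handling concrete, here is a minimal sketch of interned, counted strings accessed through raw-pointer macros. It is an illustration rather than the actual Newscript layout: it reads "counted" as length-prefixed, and the macro names STR_LEN and STR_DATA, the intern function, and the fixed-size table are all assumptions.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Assumed layout: [size_t length][bytes...][NUL], addressed by raw pointer. */
#define STR_LEN(p)  (*(size_t *)(p))
#define STR_DATA(p) ((char *)(p) + sizeof(size_t))

#define MAX_INTERNED 1024
void *interned[MAX_INTERNED];   /* every distinct string seen so far */
size_t interned_count;

/* Return the single stored copy of text, creating it on first use.
   Equal strings therefore always compare equal by pointer. */
void *intern(const char *text)
{
    size_t len = strlen(text);

    /* Linear search keeps the sketch short; a hash table would scale better. */
    for (size_t i = 0; i < interned_count; i++)
        if (STR_LEN(interned[i]) == len &&
            memcmp(STR_DATA(interned[i]), text, len) == 0)
            return interned[i];

    assert(interned_count < MAX_INTERNED);
    void *s = malloc(sizeof(size_t) + len + 1);
    assert(s);
    STR_LEN(s) = len;                       /* the macro is an lvalue */
    memcpy(STR_DATA(s), text, len + 1);     /* copy bytes plus the NUL */
    interned[interned_count++] = s;
    return s;
}
```

With this layout, key comparison in an object lookup reduces to a pointer comparison, which is one way a small code base can avoid calling strcmp everywhere.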
Some sample code
If you would like to see some of the ways I've translated ideas from the Phosphor / ad hoc code base into C, there is probably no better example than the object composition function, from. From produces an ordered union of the key,value pairs of its arguments in a new composite object. This is the way that inheritance works in Phosphor, and the model I'm adopting in Newscript as well. The way it is currently implemented is:
object from(object o, ...)
for (object t = o; t; t = va_arg(args,object)) each(t,copy_slot);
From takes a variable number of arguments, terminated with a NULL, and returns a new object after doing a shallow copy of each object's slots. The function also uses a nested function (a GCC extension to C), which is passed as a function pointer to the each iterator function. Slot allocates a new slot in memory, and the intAt() macro treats retval as an integer pointer. Strictly speaking, slots are numbered 1 ... length(object), and the exact size depends on the processor architecture. Object(0) allocates an object with 0 slots and, like from, also takes a variable list of key,value pairs. You can probably guess how Object(...) is implemented under the hood too!
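For readers who want to try the NULL-terminated variadic pattern in standard C, here is a self-contained sketch. It is not the Newscript source: the struct layout and the names new_object, set_slot, and compose are hypothetical, a plain inner loop stands in for the nested function and the each iterator, and letting a later object override an earlier key is an assumption about what "ordered union" means.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical object model: an ordered array of key/value slots. */
typedef struct { const char *key; long value; } slot;
typedef struct object_s { size_t length; slot *slots; } *object;

/* Allocate an object with no slots. */
object new_object(void)
{
    object o = calloc(1, sizeof *o);
    assert(o);
    return o;
}

/* Overwrite key in place if it exists, otherwise append a new slot.
   Keys are copied by pointer only, so this is a shallow copy. */
void set_slot(object o, const char *key, long value)
{
    for (size_t i = 0; i < o->length; i++)
        if (strcmp(o->slots[i].key, key) == 0) {
            o->slots[i].value = value;
            return;
        }
    slot *grown = realloc(o->slots, (o->length + 1) * sizeof *grown);
    assert(grown);
    o->slots = grown;
    o->slots[o->length++] = (slot){ key, value };
}

/* Ordered union of every argument object; the list must end with NULL. */
object compose(object first, ...)
{
    object result = new_object();
    va_list args;

    va_start(args, first);
    for (object t = first; t; t = va_arg(args, object))
        for (size_t i = 0; i < t->length; i++)
            set_slot(result, t->slots[i].key, t->slots[i].value);
    va_end(args);
    return result;
}
```

A call looks like compose(a, b, (object)NULL). The cast on the terminator matters: a bare NULL may be passed as an int, which is not guaranteed to have the same size as a pointer when va_arg reads it back.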
Forward Looking Statements
Whenever I do a project like this, something about the implementation language inevitably pisses me off. C's stdarg.h is one of those things. The ABI for 64-bit Unix is another. I've got some prototyped code for directly compiling the interpreter's functions to Intel native machine code. I think that the full compiler for the core language will be somewhere in the range of 50-100 lines of C, half of which will be a table of binary data. I've toyed around with playing games with the page table settings, but am probably going to settle for doing ad hoc JIT. One of the focuses of this work is to make floating point and integer math integral to the language. I've decided to go with an OCaml-style +. operator for floating point math, and to remove the utterly braindead string overloading of +. Instead, the , operator will serve for all sorts of concatenation, just as the : operator will construct key:value pairs.
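To illustrate the split between + and +., here is a toy postfix evaluator in which each operator accepts exactly one numeric type. It is only a sketch under assumptions: the tagged value struct, the eval_postfix name, and the rule that mixed operands are rejected rather than coerced are mine, and the , and : operators are left out.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef enum { INT_VAL, FLOAT_VAL } value_tag;
typedef struct { value_tag tag; union { long i; double f; } as; } value;

/* Evaluate a space-separated postfix program such as "1 2 +".
   Stores the single result in *out; returns 0 on success, -1 on error. */
int eval_postfix(const char *program, value *out)
{
    value stack[64];
    size_t top = 0;
    char buf[256];

    if (strlen(program) >= sizeof buf) return -1;
    strcpy(buf, program);                   /* strtok modifies its input */

    for (char *tok = strtok(buf, " "); tok; tok = strtok(NULL, " ")) {
        if (strcmp(tok, "+") == 0 || strcmp(tok, "+.") == 0) {
            if (top < 2) return -1;
            value b = stack[--top], a = stack[--top];
            value_tag want = tok[1] == '.' ? FLOAT_VAL : INT_VAL;
            /* Assumption: no implicit coercion between the two types. */
            if (a.tag != want || b.tag != want) return -1;
            value r = { want, { 0 } };
            if (want == INT_VAL) r.as.i = a.as.i + b.as.i;
            else                 r.as.f = a.as.f + b.as.f;
            stack[top++] = r;
        } else {                            /* literal: a '.' marks a float */
            char *end;
            value v;
            if (top == 64) return -1;
            if (strchr(tok, '.')) { v.tag = FLOAT_VAL; v.as.f = strtod(tok, &end); }
            else                  { v.tag = INT_VAL;   v.as.i = strtol(tok, &end, 10); }
            if (end == tok || *end) return -1;
            stack[top++] = v;
        }
    }
    if (top != 1) return -1;
    *out = stack[0];
    return 0;
}
```

Because the operator names the type, the evaluator never has to inspect its operands to decide which addition to perform; the tag check here exists only to report misuse.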
All in all, I expect to finish the initial interpreter in under 500 lines of C code. The final native compiling JIT will fall somewhere well under 1k LOC. Since I'm not even at 200 statements yet, I may even be able to sneak a functional JIT in under 500 statements. Pulling this off has some serious upside for a wide variety of projects I'm working on. I can embed this sucker, as its memory footprint is very small. It takes nearly no time to initialize, and would be suitable for integration with Jawas. And finally, the code base is small enough that it can be used for teaching. I have a pet project I've always wanted to build, and I think this will be the perfect tool for embedding scripting into it. But that is a post for another day.