DAVE'S LIFE ON HOLD

Sour Grapes of Syntax

How often do you see a comparison of two languages fall into the syntax trap?

  "Oh YYZ has terrible syntax!"

Where YYZ is whichever programming language is not the one you are most familiar with.   And then there is my favorite statement:

  "YYZ has decent syntax."

The general criteria by which languages are typically judged are:


At no point in these criteria does anyone actually look at syntax in any depth or detail. They look at it from the standpoint of mimicking the syntax they learned as children in school for mathematical expressions.  A serious consideration of syntax never enters the picture.

Look at Lisp:

  ( + 1 2 3 )

( is punctuation marking the start of an expression, + is a verb, and 1, 2, and 3 are predicate objects. ) is punctuation meaning much the same as a period in English.

Ignoring punctuation for a second, Lisp has a base syntax of a verb followed by an arbitrary number of predicate objects with the subject being implicit. 
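
To make the verb-first reading concrete, here is a minimal sketch of an evaluator for that base syntax. It is written in Python purely as neutral notation; the VERBS table and the evaluate function are hypothetical names invented for this illustration, and nested lists stand in for parenthesized expressions.

```python
import operator
from functools import reduce

# Hypothetical verb table; the names are illustrative only.
VERBS = {
    "+": lambda *args: sum(args),
    "*": lambda *args: reduce(operator.mul, args, 1),
}

def evaluate(form):
    """Evaluate a nested list written verb-first, like a Lisp expression."""
    if not isinstance(form, list):
        return form                      # a bare object evaluates to itself
    verb, *objects = form                # the first element is the verb
    # Every predicate object is reduced before the verb is applied.
    return VERBS[verb](*[evaluate(obj) for obj in objects])

print(evaluate(["+", 1, 2, 3]))              # stands for (+ 1 2 3)
print(evaluate(["+", 1, ["*", 2, 3], 4]))    # stands for (+ 1 (* 2 3) 4)
```

The verb takes however many objects follow it, and no subject appears anywhere in the expression.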

Look at Smalltalk:

  1 + 2 + 3 .

In Smalltalk 1 is the subject, + is the verb, 2 is the predicate object, and this defines a clause.  The clause (1+2) is the subject of the sentence to which + is the verb and 3 the predicate object.   It differs from Lisp in that there is an explicit subject and implicit clauses. Sometimes we have explicit clauses in Smalltalk:

  Transcript print: 'hello world'; cr .

Here Transcript is the subject, print: is a verb, and 'hello world' is a predicate object. ; is punctuation which means roughly, ignore the result of the last clause and apply the following predicate to the prior subject. Hence Transcript is the subject and cr the verb. 
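
The reading of ; given above can be sketched roughly as follows, again in Python as neutral notation. The Transcript class, its print_ and cr methods, and the cascade helper are all hypothetical stand-ins made up for this illustration, not Smalltalk's actual implementation.

```python
class Transcript:
    """Hypothetical stand-in for Smalltalk's Transcript; collects output in a list."""
    def __init__(self):
        self.out = []

    def print_(self, text):
        self.out.append(text)
        return len(text)          # some result that the cascade will ignore

    def cr(self):
        self.out.append("\n")
        return None

def cascade(subject, *clauses):
    """Send each (verb, *objects) clause to the same subject, like ';' does."""
    result = None
    for verb, *objects in clauses:
        result = getattr(subject, verb)(*objects)   # earlier results are dropped
    return result

t = Transcript()
cascade(t, ("print_", "hello world"), ("cr",))
print("".join(t.out), end="")
```

Each clause after the first reuses the original subject rather than the result of the clause before it.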

Look at C and you'll find:

  printf("hello world");

Where printf is the verb, and "hello world" is the predicate object. ; still means forget the result of that last clause, but like Lisp, the subject is implicitly the program itself.   C uses another syntax:

  Foo = 1 + 2;

Where Foo is the indirect object of the predicate, = is the verb, and 1 + 2 is a subordinate clause that serves as the predicate object. Strictly speaking, + is a verb with 1 and 2 as predicate objects, as the subject is implicitly the program itself. Foo is an indirect object because it is a symbolic reference in the dative rather than the nominative case, being the left value of the = operator.   C's syntax in this case is defined largely by precedence rules which establish the subordination of one clause with respect to the next.  These rules are fixed in the language, and do not extend to user-defined words like variables and functions, which have a fixed order in the syntax.

Some of the ML family of languages share C's operator-based precedence scheme, but allow the programmer to extend the table from within the program itself. This makes it possible to introduce new syntactic forms into the language by providing a basic mechanism to order subordinate clauses. This in turn results in what some programmers call "write only code", because their limited knowledge of syntax and grammar makes it difficult for them to read other people's syntax extensions.  C++ with operator overloading can draw the same complaint, because the programmer has more control over clause creation.
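
As a rough illustration of an extensible precedence table, here is a small precedence-climbing evaluator in Python. It is only a sketch in the spirit of a Standard ML infix declaration, not how any ML compiler works. TABLE, parse, and the "min" operator are hypothetical names chosen for this example, and every operator is treated as left-associative to keep it short.

```python
import operator

# Precedence table: symbol -> (binding strength, function).
TABLE = {
    "+": (1, operator.add),
    "*": (2, operator.mul),
}

def parse(tokens, min_prec=0):
    """Precedence climbing: evaluate a flat token list using TABLE."""
    left = tokens.pop(0)                       # an operand (a number)
    while tokens and tokens[0] in TABLE and TABLE[tokens[0]][0] >= min_prec:
        prec, fn = TABLE[tokens.pop(0)]
        right = parse(tokens, prec + 1)        # tighter clauses become subordinate
        left = fn(left, right)
    return left

print(parse([1, "+", 2, "*", 3]))        # * binds tighter: 1 + (2 * 3)

# A user-defined verb that binds tighter than *: 2 * (3 min 5)
TABLE["min"] = (3, min)
print(parse([2, "*", 3, "min", 5]))

# The same verb declared looser than +: (2 * 3) min 5
TABLE["min"] = (0, min)
print(parse([2, "*", 3, "min", 5]))
```

The same sentence yields two different clause structures depending on a declaration the reader may never have seen, which is where much of the "write only" complaint comes from.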

Perl has additional mechanisms to extend the syntax of the language.  By specifying either a function prototype or declaring attributes, it is possible to alter the clause formation rules on a per subroutine basis:

  sub mygrep (&@)    ...  

By coercing parameter types or taking blocks of code, Perl allows the average programmer to create new special forms.  Compared to Lisp or C's special forms, this is a radical departure.   C and Lisp both make use of special forms for flow control; Perl uses them for syntax extension.  Consider:

  for ($i = 0; $i < length(); ++$i )  ;

This special form is implemented in C, Perl, and JavaScript. In fact, depending on your compiler, this literal code will work in all 3 languages.  It is a special form in that each subordinate clause has different evaluation characteristics, and it has significantly different semantics from the code:

  fore( $i = 0, $i < length(), ++$i , );

which is valid Perl, JavaScript, and almost C.  What differs between the special and regular forms is that the clauses of the regular form are reduced before the verb is applied, whereas the special form defers evaluation to fixed points in the flow.  Any word in C that takes a block is a special form, such as: do, while, for, if, else, and switch.  In these languages the user is incapable of defining words which exhibit these syntaxes.
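
The difference in evaluation timing can be made visible with a small trace. The sketch below uses Python as neutral notation; fore, for_form, traced, and the log list are hypothetical names for this illustration, and the special form is only emulated, by wrapping each clause in a function so that it can be run later.

```python
log = []

def traced(label, value):
    """Record that a clause was evaluated, then pass its value along."""
    log.append(label)
    return value

def fore(init, cond, step):
    """Regular form: by the time this runs, all three clauses are already values."""
    return cond

# Regular form: each clause is reduced exactly once, before the verb runs.
fore(traced("init", 0), traced("cond", True), traced("step", 1))
regular_log = list(log)

def for_form(init, cond, step, body):
    """Emulated special form: clauses run at fixed points in the flow."""
    init()
    while cond():
        body()
        step()

log.clear()
state = {"i": 0}
for_form(
    lambda: traced("init", state.update(i=0)),
    lambda: traced("cond", state["i"] < 2),
    lambda: traced("step", state.update(i=state["i"] + 1)),
    lambda: traced("body", None),
)
special_log = list(log)

print(regular_log)
print(special_log)
```

In the regular form each clause appears in the log once; in the emulated special form the condition and step clauses recur on every pass through the loop.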

When most programmers talk about write-only languages what they really mean is that the range of vocabulary and syntax falls outside of their personal comfort level.  Languages which have irregular syntax like C can be deemed normal when one learns the two primary ways of producing subordinate clauses and a limited set of special forms for flow control.  Consider the use of ( and ) punctuation in C.  You can think of them as subclause generators:

  Foo(1, (2+bar))

In this case Foo is a verb and (...) is the subordinate clause as the predicate object.  Inside of (...) is an expression:

  1,(...)

where 1 and (...) are predicate objects and , is the adjacency operator which specifies which objects are next to each other on the stack or on the heap.  The final subordinate clause:

  2+bar

holds 2 and bar as predicate objects with the verb being +. In C the addition of multiply nested () has no net effect on the program. 

  3 == (((((1+1+1)))))

These punctuation marks have no semantic meaning and are not a syntax error; they are poor style, but not wrong. In Lisp, on the other hand, (()) has an explicit meaning, as it represents trying to evaluate the verb nil, a.k.a. (). This is the core difference between C, which puts the verb outside the clause, and Lisp, which puts it inside. And this is why C, Java, and Algol language family programmers rail against the ugliness of Lisp (though they find it conceptually beautiful). 

But Lisp, C, and Smalltalk have a fourth sibling in the family tree: Forth.  Among programmers familiar with any of the above, the complaint of write-only-language is leveled most frequently against Forth. Forth has only two syntactic forms:

  verb object
  object verb

In Forth a word either consumes the word to the right at compile time, as a defining word does, or operates on the word to its left at run time.  That's all of Forth's syntax.  Now what makes Forth's syntax alien is that it has two modes of operation in mind: compiling and running.  Since a word may be a compiling word that runs some code at compile time, Forth words can introspect about the source itself. For example the word "variable" or "var" will actually read the next word in the compiler's buffer (the TIB, or Terminal Input Buffer, in Forth parlance) and compile a dictionary entry that allocates space and generates a named routine to access that space.   Another word "create" will generate a named dictionary entry based on the next word in the TIB and leave no data there, to be filled in by other words.  Some words like "immediate" modify the last word defined, flagging it as a compiling word. 
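
A toy interpreter shows both forms side by side. This is a minimal sketch in Python, not a real Forth: it collapses compiling and running into one loop, a plain list of tokens stands in for the input buffer, and the run function and its helper names are hypothetical. The words variable, ! (store), and @ (fetch) are used with their conventional Forth meanings.

```python
def run(source):
    """Interpret a tiny Forth-like language; returns (stack, memory)."""
    tokens = source.split()        # stands in for the input buffer
    stack, memory = [], []
    words = {}                     # the dictionary: name -> routine

    def define_variable():
        name = tokens.pop(0)       # verb object: consume the word to the right
        address = len(memory)
        memory.append(0)           # allocate one cell
        words[name] = lambda: stack.append(address)   # named accessor routine

    def store():                   # ( value address -- )
        address = stack.pop()
        memory[address] = stack.pop()

    def fetch():                   # ( address -- value )
        stack.append(memory[stack.pop()])

    def add():
        stack.append(stack.pop() + stack.pop())

    words.update({"variable": define_variable, "!": store, "@": fetch, "+": add})

    while tokens:
        token = tokens.pop(0)
        if token in words:
            words[token]()         # object verb: act on what came before
        else:
            stack.append(int(token))
    return stack, memory

print(run("variable x  5 x !  x @ 2 +"))
```

Only variable looks to its right; every other word here works on what is already on the stack, and the word x did not exist until the program itself defined it.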

From the perspective of a programmer unfamiliar with Forth, this means the syntax is a bizarre collection of special forms. Moreover, programs themselves are capable of extending the compiler semantics, and often do, producing domain specific languages (DSLs), which in the heyday of Forth were called Problem Specific Languages. Because each DSL could define new syntaxes, and programmers writing code found this handy, people with a partial understanding of the code and problem space would often find it impossible to modify the code without deep introspection.  This is the root origin of write-only-language complaints, and of most comments on syntax.  If your language of choice has a limited syntax, it is easier for a novice to start editing without knowledge of the system. This doesn't mean it is easier to program or maintain, merely that the barrier to entry is lower. 

Looking at these 4 schools of syntax, there is a reason why each school has its own adherents.  The Algol family is the most popular because it mimics what most people learn in school. Programming in schools then follows suit, and programmers cut their teeth on very limited, highly inconsistent, and inflexible syntax. Languages which only deviate from this root in terms of punctuation and spacing achieve varying levels of popularity based on their core similarity. 

Lisp, Smalltalk, and Forth are three schools of the same camp, but represent the other ends of the spectrum. All three are exceedingly simple, genuinely consistent, and far less popular because they depart from a cultural tradition that is instilled from grammar school on.  Of these three, Lisp and Forth are the oldest languages and the most capable of meta-programming.  In fact, the trend over the years has been away from both bare metal and meta-programming, two tasks both of these languages excel at. The machine descriptions for gcc, and its RTL intermediate form, are written in a Lisp-like notation.  Forth is often the first language ported to a new embedded controller simply because it is easy to port and has a small footprint. 

Smalltalk, being of the Lisp camp, has both the most regular syntax and in some ways the most rigid. It has a whopping 5 rules, and flow control involves no special forms. It also lacks the meta-programming capabilities of Lisp or Forth, and runs on a more complicated virtual machine.  It has all the hallmarks that would seem to make it ideal for programming by novices, as it was designed with that in mind, but it lacks popularity as it fails to integrate cleanly with the world of C. By this point, it should come as little surprise that after years of languishing, Objective-C, which marries a Smalltalk runtime with the C language, adding yet another inconsistent syntax to C, is more popular than Smalltalk itself. 

Much of what is said about syntax by people comparing programming languages is sour grapes.  There are only a handful of ways to define subordinate clauses in programming languages in use today. Each language lacks the richness of natural language, and the complexities of human thought. At best they are shorthands, and at worst piles of tradition. Like most human languages they evolve by taste, necessity, and historical accident; probably in that order. New languages are popping up nearly every month, but few venture into new meaningful territory.   Most are rehashes of ideas that have gone before, and few are as capable. OMeta is a good example, as it defines a language based on another, without having the ability to implement itself in practice. It is a beautiful meta-programming language which is only a symbiont and not a full-fledged individual.   Other languages are far more parasitic, like Scala or Groovy, which rely upon another language's runtime and conform to its restrictions, without amounting to anything more than sugar. There is little of substance and much of theory.   And those, I would contend, are mere expressions of the sour grapes of syntax.