Language vs. Syntax vs. Vocabulary

The Scheme Steering Committee recently released a position statement. Overall, as someone who has developed commercial applications in Scheme in the past, I can appreciate the reasoning behind this statement. However, a few bits of it stand out as symptomatic of a misconception about what languages are and how they should be designed. For example:

It is almost misleading to call Scheme a "programming language;" it would be more accurate to characterise Scheme as a family of dialects, all loosely related by the common features of lexical scope, dynamic typing, list structure, higher-order functions, proper tail-recursion, garbage collection, macros, and (some form of) s-expression based lexical syntax.

What the authors of this statement are asserting is essentially that Scheme is a language without sufficient vocabulary. The argument is that the R6RS specification does not so much define a language as the formal structure upon which a language could be built. This assertion is indicative of a larger problem with the perception of programming held by most programmers.

Grammar + Syntax + Vocabulary = Language

The meat and potatoes of the position statement is that Scheme needs more of an extended vocabulary, such as the module systems of Perl, PHP, Squeak Smalltalk, Python, and Ruby. These have less to do with formal, committee-based language design and more to do with the informal, ad hoc use of the language by native speakers in their day-to-day activities. Or in the words of the position statement:

More importantly, libraries or modules of code, such as URL parsers, network-protocol stacks, regular-expression implementations, and so forth, wind up stranded within the realm of a specific implementation – which, in turn, means that Scheme programmers are constantly building "from scratch," rather than being able to benefit from the fruits of each others labors.

While building such an archive system is to be applauded, and in today's heavily networked open source community is not terribly difficult, such a solution is not a panacea. In fact, even among those languages which have long held publicly managed name spaces, like Perl, huge swaths of code in the repository are often out of date, broken, incomplete, or incorrect. The implementations of Perl5 and Perl6 are still moving targets to this day, as are the myriad "Core" CPAN modules that are often necessary to get a system running. The mere existence of a resource doesn't mean that you should use it.

Now try to normalize the core language across multiple vendors, create portable code across multiple Schemes, and provide a coherent vocabulary that can be used across all Large Scheme implementations, and you run into the same problems faced by the various Smalltalk vendors. The task is not insurmountable, but it is costly: a project like Seaside has put considerable effort into portability in order to run on the 6 platforms it supports. Read through the coding conventions for the project and you will quickly realize how even rather fundamental concepts such as strings and integers lack portability.

Write from Scratch

I'm going to do more than play devil's advocate here; I'm going to flat out advocate an opposing point of view. Write from scratch. Don't pretend that you can write portable code. Don't convolute your language to accommodate other people's idioms. Whenever possible, don't reuse other people's code. Write from scratch.

Each time you work on a project, start with a blank page. Think about what you would like to say, and how you would like to say it, and then say it. Don't beat around the bush, poring over mounds of inaccurate documentation in search of a 3rd party library that will meet some of your needs. Learn how to solve your actual problem yourself. Learn how to create your solution by coming up with a hypothesis, testing it in real world settings, and revising your mental model accordingly. Prototype, test, revise! This is how both the scientific method and the design methodology work, and it is how we learn.

And read, for gods' sake

What is wrong with the steering committee's view of language is that it views language as features, as libraries, and as algorithms. The statement equates the language (incorrectly, I might add) with a shark (only about 2 dozen species are obligate ram ventilators). A language does not need to move to live; it merely requires people who use it day to day. A programming language, by the very nature of programming, largely consists of synthetic vocabulary. We make up words and give them definitions as we go along. A successful language makes it easy for people to add new words to the general lexicon of all those who speak the language. It is in this respect that things like CPAN have been phenomenally successful, and not because there is a plethora of high quality libraries. (The quality of most modules is actually quite low.)

More importantly, the ease with which new programmers can read other programmers' source code directly influences how successful a language becomes. Compare the rise of Python and JavaScript over languages like Smalltalk. Smalltalk has simpler syntax and is generally written such that English speakers can understand the code, and yet a language like Python can eclipse it in terms of popularity. Why? The reason is that Python's syntax is more similar to that of the other popular languages. Java, C, C++, PHP, Python, C#, Perl, JavaScript, and Ruby all share a common heritage when it comes to syntax. All these languages are members of the ALGOL family. Moreover, the style of ALGOL has influenced the style in which algorithms are described in textbooks and academic papers. Even BNF, the notation we use to describe the syntax of most languages, was devised to describe ALGOL. Since we teach kids this style in school, it is familiar. We then teach them these languages in this familiar style, and these languages then become popular, even though this familiar style is totally alien to every other human language we use in everyday life.

So if you're writing from scratch, why should you read other people's programs? Because that's the quickest way to learn how things work. Being able to read a language is a prerequisite to learning anything written in it. Learning to read Scheme is actually the greatest barrier to new programmers mastering the language. Because its syntax is unfamiliar, not matching the normative syntax of algebra taught in schools, learning the language requires a good deal of retraining and reading large amounts of Scheme code. Not only must you learn what the program does, you must also learn how to read the syntax of the program. As such, no amount of new vocabulary will make Scheme any more popular. In fact, adding new unfamiliar vocabulary to an already alien language will do little to breed new interest in Scheme.

The problem with languages like Klingon is they don't have enough words.

In my view, the Scheme Steering Committee is basically making the argument that "Klingon would be more popular if only it had more words." The goal of Large Scheme to add more standard vocabulary to the language is exactly the same as the proposition to add words to the Klingon language. In reality, the only way Klingon would become more popular is if:

So short of NASA actually discovering actual real live Klingons out in space, we would probably only see an uptick in Klingon popularity if there were a government sponsored KSL mandate and someone opened a Klingon-only Star Trek theme park. This is the same reality that marginal languages like Scheme, like Forth, and like Smalltalk will always find themselves facing. There will never be a performance or economic incentive great enough to sway people from languages that are more familiar, where that familiarity is a synthetic one that comes from having learned them as kids in school. Once you've trained a kid to read one language, all others tend to become hard to read, not because they actually are harder, but because subjectively they seem harder.