The Preeminence of Vocabulary

A few days ago I was writing about some of the fundamental concepts that keep my brain continuously turning. Probably the most important of these is vocabulary. Most programs fail at legibility not because the programmer abused the language syntax, but because the programmer gave too few clues to his intent, both by not refactoring and by not choosing his words carefully. When you write a program, it is for two primary audiences: the first is you, and the second is your fellow programmers. You need to be able to inspect it, understand it, and maintain it. If you can't, your comrades-in-arms won't be able to save you either.

The secondary audience for the program is the compiler or runtime environment, which must translate your instructions into a performant machine representation suitable for your chosen architecture. In other words, the program you write is used by another program to write one or more new programs that the machine will execute. If your language targets a high-level intermediate runtime or an abstract virtual machine, you may find two or three additional layers of translation between your program and the bare metal. Tracing JIT compilers modify the machine representation based on the paths actually taken at runtime. Specializing OO compilers generate new versions of branches based on the types of objects in use. All of these translation techniques further separate your mental model from the underlying reality. The point is not that worrying about optimization is futile, but rather that it must take a back seat to more fundamental concerns, such as writing code you will still understand six months from now.

One way people have tried to make this easier is with naming conventions. But if you follow them, your methods will be named just like everyone else's and will be no better than the status quo. Conventions have a funny way of failing: they lull you into an expectation of uniformity where none actually exists. Look at .length, .length(), .size, and .size() in Java. Depending on the implementation of a class or runtime construct (such as arrays, for which no proper class exists), it can be any of the above. The convention is even less consistent than that, because some classes also offer a third contender, .capacity or .capacity(), which is the size or length of the underlying storage, not just the portion you are using. All of these exist to let a programmer micro-manage some implementation detail on a specific instance of the VM. That seems like a good thing if you need to optimize for memory footprint or array performance. What it botches is providing a consistent interface with well-defined words that do the right thing every time.
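To make the inconsistency concrete, here is a minimal sketch that asks "how big is this?" of four standard-library things. The class name SizeVocabulary and the particular objects measured are my own choices for illustration; only the length, length(), size(), and capacity() members come from the JDK itself. Note how StringBuilder answers with two different words depending on whether you mean the characters in use or the storage allocated.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class: the same question asked in four different words.
public class SizeVocabulary {

    static int[] measure() {
        int[] numbers = {3, 1, 4};                               // array: no source-level class, so a field
        String word = "vocabulary";                              // String: a method named length()
        List<String> names = new ArrayList<>(List.of("a", "b")); // Collection: a method named size()
        StringBuilder buffer = new StringBuilder(64);            // pre-allocate room for 64 chars
        buffer.append("abc");

        return new int[] {
            numbers.length,    // 3
            word.length(),     // 10
            names.size(),      // 2
            buffer.length(),   // 3  (characters in use)
            buffer.capacity()  // 64 (characters allocated, nothing appended beyond it)
        };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(measure())); // prints [3, 10, 2, 3, 64]
    }
}
```

Four spellings for one idea, and the reader has to remember which type uses which.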

Take .length property access vs. .length() method invocation. The original argument for .length is that property access is fast and avoids a potential vtable lookup (or some nonsense to that effect). The method form, .length(), should only be used when the size is unknown and requires computation, like walking a list (or similar nonsense).
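The call syntax is a poor signal of cost. In the sketch below, Node is a hypothetical hand-rolled singly linked list whose length() really does walk every node, while java.util.LinkedList.size() looks identical at the call site but simply returns a counter the list updates on each add and remove. Both are method calls; only the implementation tells you which one is cheap.

```java
import java.util.LinkedList;

// Hypothetical demo class contrasting a computed length with a cached size.
public class LengthCost {

    // Hypothetical minimal linked-list node, for illustration only.
    static final class Node {
        final int value;
        final Node next;

        Node(int value, Node next) {
            this.value = value;
            this.next = next;
        }

        // Computed length: visits every node, so cost grows with the list.
        int length() {
            int count = 0;
            for (Node n = this; n != null; n = n.next) {
                count++;
            }
            return count;
        }
    }

    public static void main(String[] args) {
        Node chain = new Node(1, new Node(2, new Node(3, null)));

        LinkedList<Integer> list = new LinkedList<>();
        list.add(1);
        list.add(2);
        list.add(3);

        // Same call shape, different costs: a traversal vs. a stored counter.
        System.out.println(chain.length()); // prints 3
        System.out.println(list.size());    // prints 3
    }
}
```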