DAVE'S LIFE ON HOLD

Doing Something Wrong

So how do you know when you're doing something wrong? The first indication will probably show up when you do an average lines of code per function analysis on your code. Currently, Phosphor's complete code base is hovering around 480 lines of code and around 220 functions. This means it contains slightly more than 2 lines of code per function on average. If JavaScript were a better language, we could drive this down to under 2, but there's still a lot of API wrangling that goes on.
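
If you want to run that sort of analysis yourself, a quick-and-dirty version takes only a few lines of JavaScript. This is just a rough sketch, assuming Node.js and counting function keywords and arrow functions with a naive regex, so treat the numbers as approximate:

// loc-per-function.js - rough average lines of code per function
// assumes Node.js; counts "function" keywords and "=>" arrows with a naive regex
const fs = require('fs');

let lines = 0;
let functions = 0;

for (const file of process.argv.slice(2)) {
  const source = fs.readFileSync(file, 'utf8');
  lines += source.split('\n').filter((l) => l.trim().length > 0).length;
  functions += (source.match(/\bfunction\b|=>/g) || []).length;
}

console.log(lines + ' lines, ' + functions + ' functions, ' +
            (lines / functions).toFixed(2) + ' lines per function');

Run it as node loc-per-function.js src/*.js. A real analysis would want a proper parser, but even a crude count like this is enough to spot a code base drifting toward bloated functions.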

x.x million lines of code


So why do we have projects that consist of x.x million lines of code? Is it because these projects are so irreducibly complex that there is no better representation of the solutions? Is it because the programming languages we choose are insufficient to the task? Are the engineers simply incompetent, or just perversely incentivized towards excessive verbosity? My guess is that it's a fair smattering of each of these, but more due to habit and custom than anything else.


When you look at what an x.x million line of code project represents, the irreducibly complex argument is doubtful. Consider that a line of code is approximately 80 characters. Even in an excessively verbose language, each line should easily be able to contain 10 terms, which is more than sufficient to define most operations. Since each term may itself represent


terms per function = average lines per function * terms per line
10 to 50 = (1 to 5) * 10

somewhere between 10 and 50 terms, we can increase the number of operations represented by an order of magnitude every 1-5 lines of code. An application that consists of 200 functions represents between 200 and 10^200 operations.

100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 operations

A program with x.x million lines of code should have somewhere between x.x/5 million and x.x million functions, and would represent between x.x/5 million and 10^(x.x million) operations. To put that in perspective, if your machine were capable of executing 1 trillion operations per second, your computer would evaporate due to the heat death of the universe long before it executed every instruction once. Obviously, if you need x.x million lines of code, you're doing something wrong.
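
To make that perspective concrete, here is a rough back-of-the-envelope calculation in the same vein. It uses the 10^200 upper bound from the 200-function example above and an age of the universe of roughly 4 × 10^17 seconds; both figures are purely illustrative:

// back-of-the-envelope: how long would 10^200 operations take?
const operations = 1e200;        // upper bound from the 200-function example
const opsPerSecond = 1e12;       // 1 trillion operations per second
const ageOfUniverse = 4e17;      // roughly 13.8 billion years, in seconds

const seconds = operations / opsPerSecond;     // 1e188 seconds
const lifetimes = seconds / ageOfUniverse;     // about 2.5e170 universe lifetimes so far

console.log(seconds, lifetimes);

That works out to roughly 10^188 seconds, or on the order of 10^170 times the current age of the universe. The exact figures don't matter; the point is that the number is absurd.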

a different perspective


Rather than bragging about how many lines of code you write, it is probably better to brag about how many lines of code you didn't. In any project, you should quickly reach a critical mass of code at which every future revision results in a smaller code base than the one that came before. This can be done through a continual process of refactoring and redesigning your code in successive revisions. Often you will find that once you refactor your code in one location, you gain knock-on effects which allow the subsequent removal of now-unnecessary glue and support code.


For example, today I removed the 2 switch statements from the Phosphor code base, moving them into a new HotKeys object. As a result of that change, I was able to remove all of the hotkey tests from the Text widget and run them only once per event cycle. This prompted two changes to the state machine, which made another 20 or so lines of code disappear from the code base. Not only did the user gain the ability to add new hot keys at run time, but typically over 5000 function calls per second were also avoided.
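
The actual Phosphor code isn't reproduced here, but the shape of the change is the familiar switch-to-lookup-object refactoring. Only the HotKeys name comes from the paragraph above; the key names, the addHotKey method, and the handler signatures in this sketch are all invented for illustration:

// placeholder handlers, just so the sketch runs
function deleteChar(event) { /* ... */ }
function newLine(event) { /* ... */ }

// Before: every widget tests each key event against a hard-coded switch
function onKeyBefore(event) {
  switch (event.key) {
    case 'Backspace': deleteChar(event); break;
    case 'Enter':     newLine(event);    break;
    // ... one case per hot key, repeated wherever keys are handled
  }
}

// After: a single lookup object consulted once per event cycle,
// with new bindings addable at run time
const HotKeys = {
  bindings: {},
  addHotKey(key, handler) { this.bindings[key] = handler; },
  dispatch(event) {
    const handler = this.bindings[event.key];
    if (handler) handler(event);
  }
};

HotKeys.addHotKey('Backspace', deleteChar);
HotKeys.addHotKey('Enter', newLine);

function onKeyAfter(event) {
  HotKeys.dispatch(event);
}

Beyond the lines it saves, the key test now lives in one place instead of inside every widget, which is what opens the door to the knock-on deletions described above.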


By reducing the size of your code base, you make it easier to re-read, easier to maintain, easier to re-write, easier to re-factor, and easier to explain. Even if you suffer a slight performance decrease due to a refactoring, it is rarely worth the added complexity in the final code to retain the optimization. In fact, any optimization which comes at the expense of comprehensibility is better considered a de-optimization over the lifespan of the code. When considered in the context of the life span of the software vs. the hardware on which it runs, any such optimization may be of only temporary value, but imposes great cost in the long run.