Learning to Design

Process and Limitations

Most of the problems found in the design of software are artificial. Aside from resource constraints, which are limited by the available technology and the fundamental forces and structure of the universe, all other limitations are the product of the mere aggregation of design decisions. When you start learning to program, many sage and experienced programmers will encourage you to learn a language with a wealth of APIs and 3rd party libraries that will make producing a solution easier. So the common wisdom goes: it is more important to start producing solutions than it is to understand how they work.

Risk Aversion

Part of the motivation for this train of thought lies in the business realities most programmers face on a daily basis. In order to reduce risk, it is better to reuse an old design that is known to work marginally well than it is to attempt to build a better one. As it is easy to measure the cost of an existing system, but hard to measure the actual cost of a new design, business logic will favor the tried and true. This is the appeal of operating systems like Windows and Unix, as it is the appeal of languages like Perl, Java, and Smalltalk. As most of the design decisions concerning how the software works have already been made, there is far less room for an inexperienced or foolish programmer to screw it up.


But what are the trade-offs of relying upon these tools? In what ways do the choices made by other people impact your finished work? In yesterday's post, I detailed a process for designing software in which one repeatedly measures and tests the outcomes of one's design decisions. When it comes to measuring software, there are a few important categories of measurement that should be tracked at each step of the development process:
  • How many lines of code were written? (total, per term, per function, per statement, etc.)
  • How many terms are defined? (variables, functions, classes, methods, etc.)
  • How much processing time is required by each functional unit? (function, thread, etc.)
  • How much memory is required? (for processing, for storage, for I/O buffering, etc.)
The first two sets of measurements deal with the complexity of the solution from a reader's point of view, the latter two from the machine's. By using some general rules of thumb, you can judge your designs against reasonable metrics. Remember that these are not strict rules, but rather guidelines derived from real world experience.

Lines of Code

While many people attempt to compare programming languages by comparing the lines of code needed to solve comparable problems, these comparisons are generally disingenuous. As a result, there is a widespread sentiment that counting lines of code is a poor metric by which to judge code quality. And this is true. When used to guide your design, on the other hand, line counts can be invaluable for identifying components in need of redesign:

Rule of Thumb 0: One line of code per term defined is the ideal.

Rule of Thumb 1: Write no more than 5 lines of code per function on average.

People are generally incapable of holding more than a handful of ideas in their heads at any one time. As such, breaking your problem down into digestible chunks makes it much easier to learn. And as with English, programming languages that allow multiple clauses to be chained together into more complex sentences also allow for lower line counts.


As you reduce the number of lines of code per function, you will find that the number of user defined functions increases. At a ratio of 5 to 1, a 1000 line program will have approximately 200 functions defined. The size of the vocabulary of the project is a good indication of the complexity of your solution. By factoring out common elements, common routines, and common idioms, you effectively compress your source code.

Rule of Thumb 2: The ideal solution uses the smallest number of definitions.

The size of the vocabulary also determines how hard it is for a new programmer to learn how your software works. In English, for example, 80% of all written text uses the 2000 most common words. A college educated American will typically know 15,000-20,000 words well, and it is difficult to learn more than 8-10 new words well per day. Software which consists of millions of lines of code is thus effectively incomprehensible, as it would take years to learn how it all worked.

Rule of Thumb 3: Define no more than 2000 terms in a single application.

The lower limit of how much complexity can be encapsulated in a single line of code and the upper limit of how many discrete concepts a programmer can easily learn and understand define the range of well written software. However, these limitations are not nearly as constraining as one would at first think. Since definitions can build upon each other, the complexity expressed grows geometrically with each term added. With 2000 definitions, the total number of possible combinations is so large as to be practically incomprehensible.


Once you've managed to keep your code base to a reasonable size, and you can explain what each term means, you're ready to start worrying about performance. Many of the performance problems we encounter in software design are rooted in decisions made by the hardware engineers. For example, if your processor runs at 2GHz and has a 19 instruction deep pipeline, each instruction will take between 500 and 9500 picoseconds to execute. These figures represent the lower limits on what the CPU is physically capable of. When you read manufacturers' benchmarks, they will typically tend towards the idealized 500 picosecond number and shy away from the 9500 picosecond number.

If you choose to write software for a popular preemptive multitasking operating system, you also choose additional performance trade-offs. For example, depending on your processor architecture, a Linux kernel will make a task switch in between 24µs and 240µs. These numbers derive less from what the CPU is capable of and more from the speed of the memory bus and the size of your executable. For a high performance webserver, where 10,000 requests per second are expected, 24µs is a substantial portion of the 100µs of processing time available per client.

Rule of Thumb 4: Design based on observed timing metrics, not theoretical performance.

There are many profiling tools available to monitor the performance of your software. Some, like Apple's Shark, allow you to attach to an already running process and monitor its progress over a controllable span of time. Others let you monitor performance over the entire life of the program. It is important to remember that external monitoring tools do impact the performance of your software and can introduce their own sources of error. Similarly, adding performance tracking and debugging routines within your application can also directly impact performance.

Once you establish what effect your monitoring has on the performance of your application, and can reliably predict the impact different changes to your software will have, you can then design with performance in mind. Relying upon 3rd party libraries or advanced compilers to provide performance will guarantee failure to produce efficient code. Similarly, focusing on algorithmic efficiency without empirically studying real world constraints will also result in poorly performing code.

Rule of Thumb 5: Optimize critical paths first, and be wary of 3rd party libraries and optimizers.

For example, on one recent project, a microprocessor emulator I was writing displayed a virtual performance of 8MHz, well below any realistic measure of what it should have achieved on a 1.8GHz host. Removing a single debugging statement which periodically logged performance to a file increased the observed performance to 24MHz. That line consisted of a single standard library fprintf call, yet its removal tripled performance, a 200% increase. Running the code through the profiler, I noticed that one of the 3rd party library routines, which periodically checked for skew between the host and simulated system clocks, was taking far too long. Examining its implementation, I found it used usleep, a "microsleep" call, to ensure synchronization with the hardware clock. Removing the routine roughly doubled performance again, from 24MHz to 50MHz. Rearranging a series of functions in the critical code path to take advantage of compiler optimizations increased the observed performance from 50MHz to 75MHz, a 50% gain, and applying further compiler hints as to which functions could be compiled inline boosted it from 75MHz to 100MHz, a 33% improvement.


It is not terribly unusual to hear programmers say silly things like, "Who cares, memory is cheap!" They say this especially when trying to justify programming techniques which supposedly achieve performance at the expense of greater memory consumption. A whole host of technologies, from memory models to heap allocation routines to automatic garbage collection, have been created to free the programmer from having to think about how they use memory. In fact, the trend in programming languages has been to remove any concept of RAM from the programmer's experience.

Unfortunately, for new programmers trying to learn to program actual computers, these layers of abstraction present a grossly distorted view of reality. Raw memory may be cheap in economic terms compared to years ago, but it is not raw memory that a programmer should be concerned with; it is memory bandwidth. Within a computer, there are different speeds of memory. Historically, remote storage on a server was the slowest, with local disk being the next slowest. Leaving the disk cache, data travels over the main system bus to main memory, also known as RAM, where the heap lives. From main memory, it travels across a higher speed memory bus to the CPU's onboard L2 cache, from the L2 cache to the full speed L1 cache, and eventually to the register file. Each of these memories has its own size constraints, bus speeds, and timing latencies.

Rule of Thumb 6: Data structures should be designed with cache size and memory bandwidth in mind.

Rule of Thumb 7: Data should be laid out in memory so that it can be accessed sequentially, not randomly.

Developing an appreciation for the physical limitations of your system's architecture is probably the hardest part of learning to program. Even once you know your CPU's instruction set, have a firm grasp on instruction timings, and know your way around your tools, learning to design with your hardware, and not around it, takes a lot of practice.

For example, most modern Intel CPUs can address at most 8 pages of memory simultaneously. Each page consists of 4kB of data (128 cache lines), and each cache line holds only 32 bytes. At any point, the CPU will probably need 2 of the 8 pages for accessing instruction memory, and another 2 for addressing the return stack and heap. As such, your active data structures should ideally fit within 16kB (4 pages), and address no more than 4 distinct 32 byte regions at any one point in time. It is only by designing software with these limitations in mind that you can avoid wasting time and energy on unproductive designs.

Of Art and Engineering

Mastering the art of computer programming is more than learning the basics of software engineering. While the above rules of thumb provide good guidelines for making software engineering decisions, they speak little to the artistic side of the endeavor. More than just functioning within tolerances, software must also communicate the author's intent. It is easy to produce code which functions, yet which is incomprehensible to everyone. There are even competitions, such as the IOCCC, to render the most illegible yet functional code possible.

Learning to design software which is easy to read and maintain requires particular attention to the artistic side of programming. Engineering discipline alone is not sufficient to write sustainable code; rather, it is how well one marries one's native tongue to the formulation of the solution that determines the code's long term viability. Algorithmic elegance never trumps eloquence when it comes to well written code, just as documentation is never a substitute for legible code.