Lots of Little Functions

This past Friday, over lunch, a good friend of mine commented on my coding style. He observed that “you tend to write a ton of little functions, which are much smaller than most people would be comfortable with.” The observation was based on code I had written earlier that day, which breaks down as follows:

As you can see, most of the functions are 4 lines or fewer (including the function prototype), and most include a comment block describing how to use them. The 2-line functions each consist of a prototype and a single line of code, and they usually handle the callback for an associated asynchronous function call (whose comment covers the pair). The average is 4.45 lines per function, including comment lines!
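A minimal sketch of the pairing described above, assuming a toy event loop built from a `deque`: `load_config` does the (simulated) asynchronous work and carries the comment for the pair, while `_on_config_loaded` is the 2-line callback. All names here are hypothetical, and the "asynchronous" call only queues a value rather than doing real I/O.

```python
from collections import deque
from functools import partial

# Hypothetical stand-in for an event loop: deferred (callback, value) pairs.
_pending = deque()

def run_pending():
    """Run every deferred callback in FIFO order."""
    while _pending:
        callback, value = _pending.popleft()
        callback(value)

def load_config(path, results):
    """Simulate loading the config at `path` asynchronously.

    This docstring documents the pair: the work is queued here, and
    _on_config_loaded appends the result to `results` when the loop runs.
    """
    _pending.append((partial(_on_config_loaded, results), f"config from {path}"))

def _on_config_loaded(results, text):
    results.append(text.upper())
```

Nothing lands in `results` until `run_pending()` drains the queue, so the caller sees the same "call now, result later" shape as a real asynchronous API.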

When writing this code, I made little to no conscious effort to write this way; years of practice and measurement have made it second nature. Note, though, that getting here took years of deliberate effort (practice) and a set of tools I built to analyze my work output (measurement). Intentionally writing well-factored code produces a handful of side effects:

These behaviors are emergent. Smaller functions not only make each individual unit easier to test and understand; they also force you to develop approaches for modeling the complexity of having so many named concepts in play. Often this means paying greater attention to the design of state machines, which operate at a higher level of organization. The interplay between these state machines eventually shifts the focus to the design of interfaces and messaging. The communication between two or more state machines becomes yet another state machine, one in which the data is the program.
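A table-driven machine is one minimal way to see what "the data is the program" can mean: the transitions live in a dictionary, and the code that interprets it stays small and generic. The protocol, its states and events, and the names `TRANSITIONS`, `step` and `run` are hypothetical; this is a sketch, not code from the post.

```python
# Hypothetical connection protocol: the transition table is plain data, and
# the functions below only interpret it.
TRANSITIONS = {
    ("idle", "connect"): "connecting",
    ("connecting", "ack"): "ready",
    ("ready", "send"): "ready",
    ("ready", "close"): "idle",
}

def step(state, event):
    """Return the state that follows `event`, or raise if it is not allowed."""
    if (state, event) not in TRANSITIONS:
        raise ValueError(f"{event!r} not allowed in state {state!r}")
    return TRANSITIONS[(state, event)]

def run(events, state="idle"):
    """Feed events through the table and return the final state."""
    for event in events:
        state = step(state, event)
    return state
```

Changing the protocol means editing the table, not the control flow in `step` or `run`.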

In fact, the code in question contained an entire indirect-threaded interpreter for handling the transitions between states for a number of state machines that modeled complex interactions between asynchronous callbacks. This finite state machine model even supported making subroutine calls and injecting continuations. This high-level architectural model allowed me to write functions that looked like Lisp (but were just Python data structures), which were then evaluated sequentially. Each callback was written to advance the state machine by invoking a context-relative _next() instruction, similar to the NEXT routine of an indirect-threaded FORTH. And while this may seem difficult to comprehend, the implementation consists of a 6-line _next() function, a 7-line _eval() function, and judicious use of these two functions across callbacks and within higher-level functions.
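The original code isn't shown, so here is a hedged reconstruction of the idea under stated assumptions: programs are nested Python lists, a nested list is treated as a subroutine call, and every primitive (or the callback it schedules) ends by calling `_next()`. It is a loose Python analogue of Forth's inner interpreter rather than true indirect threading. The `Machine` class, the `say`/`fetch`/`inject_retry` primitives, `inject`, and the deque-based `loop` are all hypothetical.

```python
from collections import deque

# Hypothetical event loop: async operations queue zero-argument callbacks here.
loop = deque()

def run_loop():
    """Run queued callbacks (and any they queue) until the loop is idle."""
    while loop:
        loop.popleft()()

class Machine:
    """Sketch of a threaded-interpreter-style driver for async callbacks.

    Each program element is either a primitive (a callable taking the
    machine) or a nested list, which _eval treats as a subroutine call.
    """

    def __init__(self, program):
        self.ip = (program, 0)  # (current list, index of next instruction)
        self.rstack = []        # return addresses saved by subroutine calls
        self.trace = []         # what ran, in order, for inspection
        self.halted = False

    def _next(self):
        """Return from finished subroutines, then fetch and eval one instruction."""
        code, i = self.ip
        while i >= len(code) and self.rstack:
            code, i = self.rstack.pop()
        if i >= len(code):
            self.halted = True
            return
        self.ip = (code, i + 1)
        self._eval(code[i])

    def _eval(self, instr):
        """Nested list: save a return address and enter it. Otherwise run it."""
        if isinstance(instr, list):
            self.rstack.append(self.ip)
            self.ip = (instr, 0)
            self._next()
        else:
            instr(self)

    def inject(self, code):
        """Splice in a continuation: run `code` next, then resume as before."""
        self.rstack.append(self.ip)
        self.ip = (code, 0)

def say(word):
    """Synchronous primitive: record `word`, then continue immediately."""
    def prim(m):
        m.trace.append(word)
        m._next()
    return prim

def fetch(word):
    """Asynchronous primitive: the machine resumes only when the callback fires."""
    def prim(m):
        loop.append(lambda: _on_fetched(m, word))
    return prim

def _on_fetched(m, word):
    m.trace.append(word)
    m._next()

def inject_retry(m):
    """Primitive that injects a continuation before the rest of the program."""
    m.inject([say("retrying"), fetch("connect")])
    m._next()

login = [fetch("connect"), say("authenticated")]  # a "subroutine"
m = Machine([say("start"), login, fetch("query"), say("done")])
m._next()
run_loop()
print(m.trace)  # ['start', 'connect', 'authenticated', 'query', 'done']
```

Note that `_on_fetched` contains no sequencing logic; it never knows what runs next. One caveat of this sketch: synchronous primitives call `_next()` recursively, so a long run of them deepens the Python call stack, while asynchronous ones unwind back to the loop. A fuller version might use a trampoline for long synchronous stretches.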

When it came time to test this code, aside from a couple of typos that were quickly caught, the code worked correctly the first time it ran. More importantly, the initial test cases had state machines that needed to successfully navigate chains of 10 to 20 callbacks before the basic test operation was attempted. I would be hard pressed to write this in continuation-passing style (CPS) and achieve the same level of correctness, because it becomes harder to reason about state the more hoops one has to jump through. Because the callbacks did nothing but advance the finite state machine, the design separated the code that managed state from the code that advanced it.
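To make that contrast concrete, here is a hypothetical three-step exchange written first in nested CPS, and then as a flat step list driven by a sequencer whose single callback only advances it. The names `async_op`, `cps_session`, `Session` and the deque-based `loop` are illustrative assumptions, not the post's code; this is a sketch, not the author's implementation.

```python
from collections import deque

loop = deque()  # hypothetical stand-in for an event loop

def run_loop():
    """Run queued callbacks, including any they queue, until idle."""
    while loop:
        loop.popleft()()

def async_op(name, callback):
    """Hypothetical async call: deliver `name` to `callback` on a later turn."""
    loop.append(lambda: callback(name))

def cps_session(log):
    """CPS style: each callback hard-codes the step that follows it."""
    def on_connect(result):
        log.append(result)
        async_op("login", on_login)

    def on_login(result):
        log.append(result)
        async_op("query", on_query)

    def on_query(result):
        log.append(result)

    async_op("connect", on_connect)

class Session:
    """Sequencer style: the order lives in `steps`; the callback only advances."""

    def __init__(self, steps):
        self.steps = list(steps)
        self.pos = 0
        self.log = []

    def _next(self):
        """Start the next step, if any; its result arrives via _on_done."""
        if self.pos < len(self.steps):
            self.pos += 1
            async_op(self.steps[self.pos - 1], self._on_done)

    def _on_done(self, result):
        self.log.append(result)
        self._next()

cps_log = []
cps_session(cps_log)
session = Session(f"step{i}" for i in range(15))
session._next()
run_loop()
```

In the CPS version, reordering or inserting a step means editing the closures themselves; in the sequencer only the data changes, and the whole 15-step chain runs through the same short callback.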

Writing lots of little functions shifts how you decompose problems and compose solutions. It becomes easier to distinguish what is important from what is merely a product of your own confusion. With careful measurement, it can also help reduce the amount of complexity you bite off at any one stage. This opens opportunities to explore different forms of architecture, because your components are reliable and comprehensible enough to be recombined in new ways, and exploiting that newfound flexibility can lead to significant design improvements.