DAVE'S LIFE ON HOLD

Designing Software Systems for Fun and Profit

An ever increasing amount of my time is being spent documenting the design and operation of complex systems. And as I try to convey the rationale behind each design decision, I find that the terminology we use to discuss complexity to be inadequate. Part of the problem is most audiences glaze over once you mention complexity. The next most common breakdown occurs when the topic of O() notion enters into the conversation. You can talk in terms of constant, linear, logarithmic, or exponential, but intuition in that space is severly lacking in general.

So the net result of all this is a conversation that goes like this:

We have two solutions. One performs a linear search of the data set and generates a lookup table, and then applies a transformation to the entire data set in a second linear scan. This transform is in constant time due to the lookup table. The second solution partioons the data set at population time and then performs a exponential sweep of each partition.

The math thinkers then fall into O(n) vs O(n^2) and deem the second more complex. The code thinkers look at the first solution and correctly deem it to be twice as much work and deem it much more complex to maintain, but the second one's partitioning scheme is too scary, so they'd rather do the first. The systems thinkers realize that when you build the lookup table vs the partition sets, the lookup table can not fit sufficiently in ram for large n so it slows the entire process down to the speed of ram. Negating any algorithmic advantage, as it will trash the northbridge with random reads. The lookup table must either be generated on all modes of the cluster or synchronized along with the data to each node also making it a pain to distribute. They like the second one though as the partitions can be made to fit in L1 cache and run at CPU speed for n will be always kept small. The service can also be kept stateless making it easy to distribute.

Each group views the complexity in it's own domain correctly, and each is making an adequate evaluation from their area of concern. But they come to very different real world outcomes. In the end the mathematical model requires real world constraints. Emperical evidence trumps conjecture. But measuring complexity in the real world is seldom simple.