Building Better Systems: Part 5

Once you have identified the value proposition for your project, sold it to your management structure by creating a high perceived value among the investing group, and structured your team according to the abilities of each member, it is time to start logging data. For a system to work in reality, instead of just in theory, a system designer needs metrics. Without the ability to know how a system will perform, very little design can actually be accomplished. Any sufficiently complex system will have enough hidden variables to affect how a design actually functions. Building systems which can account for the unintuitive gracefully requires examining their behavior in a wide range of states. In order to examine those state changes we must first derive representative metrics.

Let us say you have a web server with a hypothetical 400us average response time. In theory, your inner loop will be capable of processing 2500 requests per second at 100% utilization. This is not a benchmark, but a theoretical performance limit. In reality, your web server will never be able to sustain 100% utilization. Other processes will be competing for resources, and a fair scheduler will restrict your CPU usage to well below this limit. The same is true of 10GbE, where TCP/IP overhead will leave you roughly 80% of your pipe on a good day, and other network traffic like monitoring, syslog, database connections, and backend service requests (never mind DNS, NTP, etc.) will all share that pipe with your service. Theory here is limited by the extent of our modeling. While we could derive better models by examining the behavior of each component, we still won't have an accurate picture until we get some actual data.

Taking benchmarks should not be seen as a futile exercise in dick waving. The perception that benchmarks can aid in purchasing decisions has been one of the greatest disservices the popular press has done to system engineers.
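To make the arithmetic in the 400us web server example concrete, here is a minimal sketch in Python. The function name theoretical_max_rps and the 60% utilization figure are assumptions for illustration only; the real fraction of the CPU your service gets is exactly the thing you have to measure.

```python
def theoretical_max_rps(service_time_us: float, utilization: float = 1.0) -> float:
    """Upper bound on requests per second for one serial worker.

    service_time_us: average time to handle one request, in microseconds.
    utilization: fraction of time the worker actually gets to run (0.0 to 1.0).
    """
    if service_time_us <= 0:
        raise ValueError("service_time_us must be positive")
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0.0 and 1.0")
    # One second is 1,000,000 microseconds.
    return utilization * 1_000_000 / service_time_us


if __name__ == "__main__":
    # The theoretical limit from the text: 400us per request at 100% utilization.
    print(theoretical_max_rps(400))
    # Hypothetical: the scheduler only gives the server 60% of the CPU.
    print(theoretical_max_rps(400, utilization=0.6))
```

This is a model, not a benchmark. It tells you what number to be suspicious of when your measurements come in above it, and how far below it you are when they don't.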
The true purpose of a benchmark is to allow you to measure the scaling up from a development prototype to a production quality device. If in your dev environment you achieve better single unit performance than in production, that tells you something important about your scaling model. If in production you see linear growth in performance when you add units, this also tells you something important about the scalability of your system. The value of a benchmark is that you can control for a set of variables and see how those variables impact the design. Benchmarks do not tell you how great your design is, or whether it works better than the competition. Empirical data gathering requires several difficult things to gather:

The first time you take a set of metrics, you want to run through your usage pattern a sufficient number of times, at a rate that produces a steady state. If your system exhibits a nontrivial but inconsistent number of errors during ordinary operation, your baseline should be associated with a known failure rate. Typically a 99.9% success rate is acceptable for a baseline. If you are consistently seeing 1 in 1000 events fail across your system, you are probably pushing one of your system limits. This may be something simple like running out of file handles or process ids.

Once you have a baseline that is pushing your system limits, it is time to start varying system level components and identifying how the system responds. Increasing RAM, tweaking kernel buffer sizes, and altering niceness levels will allow you to identify which parts of your system impact behavior. If you alter your TCP/IP buffer sizes and your performance improves, that is a strong hint that you are burning too many system calls. If you increase the size and see a performance degradation, that probably means your MTU is the limiting factor. By tweaking each parameter in turn you can examine how changes in the environment alter system performance.

This allows you to develop your intuition about how specific solutions to a problem may influence the stability and failure points of the system. It also gives you ample ammunition when debugging problems with a production system, as you can quickly identify the different types of behavior that result from various kinds of resource starvation. But most importantly, having metrics about how your system responds allows you to make design choices based on actual real world behavior. This is often a liberating experience, as you will often discover that the bottlenecks and the theoretical limitations are not your true limits.
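The baseline-then-vary procedure described above can be sketched in Python as follows. Everything here is a hypothetical harness, not a prescribed tool: run_baseline, sweep, and make_flaky_operation are names invented for this example, and the flaky operation merely simulates a system that fails 1 in N calls. In a real test the operation would issue a request against your system, and the swept value would be a buffer size, niceness level, or similar setting.

```python
import math
import statistics
import time
from typing import Callable, Dict, Iterable


def run_baseline(operation: Callable[[], None], iterations: int = 1000) -> Dict[str, object]:
    """Run operation repeatedly; report success rate and latency of successes."""
    if iterations <= 0:
        raise ValueError("iterations must be positive")
    latencies_us = []
    failures = 0
    for _ in range(iterations):
        start = time.perf_counter()
        try:
            operation()
        except Exception:
            # Any exception counts as one failed event.
            failures += 1
            continue
        latencies_us.append((time.perf_counter() - start) * 1e6)
    latencies_us.sort()
    if latencies_us:
        median_us = statistics.median(latencies_us)
        # Nearest-rank 99th percentile of the successful calls.
        p99_us = latencies_us[math.ceil(0.99 * len(latencies_us)) - 1]
    else:
        median_us = p99_us = None
    return {
        "iterations": iterations,
        "failures": failures,
        "success_rate": (iterations - failures) / iterations,
        "median_us": median_us,
        "p99_us": p99_us,
    }


def sweep(make_operation: Callable[[int], Callable[[], None]],
          values: Iterable[int],
          iterations: int = 1000) -> Dict[int, Dict[str, object]]:
    """Vary one parameter at a time and take a baseline at each setting."""
    return {value: run_baseline(make_operation(value), iterations) for value in values}


def make_flaky_operation(fail_every: int) -> Callable[[], None]:
    """Hypothetical stand-in for a real request: every Nth call fails."""
    calls = 0

    def operation() -> None:
        nonlocal calls
        calls += 1
        if calls % fail_every == 0:
            raise RuntimeError("simulated resource exhaustion")

    return operation


if __name__ == "__main__":
    for fail_every, stats in sweep(make_flaky_operation, [1000, 100, 10]).items():
        print(fail_every, stats)
```

The point of keeping the harness this small is that only one thing changes between runs. If the success rate or the 99th percentile moves when you change a single setting, you have learned something about that setting rather than about your test.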
Often these kinds of benchmarks shed the necessary light to reveal a simpler solution using the features of the platform you chose to implement on. In the end, no real engineering can happen without applying empirical data to the evaluation and design of your system.