Statistics and Programming
We are a profession of amateurs. If you look at the average developer in just about any shop, you'll fail to see a single instance of the "software engineering" class of development. By this I mean, the development methodology lacks any empirical basis for its design decisions, and little to no attempt is made to validate one's models against real-world requirements using quantitative metrics (which would be oxymoronic were it not for the vast number of fuzzy metrics one finds in "user stories").
Part of the blame lies with the sad state of software engineering in education. Few academics in computer science are rewarded for theories that work in reality, with today's economics in mind. In fact, focusing on computational work that is repeatable and measurable opens you up to actual review by peers who might actually run your code. As the average PhD student will never need to defend his thesis on the grounds of "it can be made to work more profitably on today's consumer electronics", any thoughts of practicality which may float into consciousness are quickly discarded for something more esoteric and shiny (which will serve as the basis for entering the club of professorship). At no point does one need to justify one's opinions about idealized computers against the cold reality of silica and heavy metals.
A few years back, I did a comparison of four different web servers. I had for years been running a web application server of my own devising. It was proven to serve 6000 requests per second for the typical payload of that application. The trick it used was a state machine that modeled the change of the system as a function of time with a 200ms period. This allowed it to cache each result internally for 200ms, while a worker process managed the update of each world element. The resulting server spent 78-83% of its time in system calls handling socket events and writing data. Based on this set of expectations, I wanted to evaluate how the competition stacked up:
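The 200ms state-machine cache is easy to sketch. Here's a minimal Python version of the idea (a hypothetical sketch, not the original server's code): every caller that lands in the same 200ms bucket gets the same cached result, so the expensive render runs at most once per period.

```python
import time

class PeriodCache:
    """Cache a result in fixed time buckets (default 200 ms).

    All requests arriving within the same bucket get the same cached
    value; only the first request of each bucket pays for a render.
    """

    def __init__(self, render, period_s=0.2):
        self.render = render      # expensive function of world state
        self.period_s = period_s
        self._bucket = None       # bucket index of the cached value
        self._value = None

    def get(self, now=None):
        now = time.monotonic() if now is None else now
        bucket = int(now / self.period_s)
        if bucket != self._bucket:      # first hit in this period
            self._bucket = bucket
            self._value = self.render()
        return self._value
```

With a 200ms period, a server answering thousands of requests per second renders at most five times a second, and everything else is a memory copy.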
- Apache + mod_memcache
- Nginx
- Lighttpd
- Perl + tmpfs
The test would be as follows:
- Each web server would be tuned as documented and measured with ab (ApacheBench)
- A range of network buffer sizes would be configured in the kernel, and the optimal settings based on ab results would be used
- A static file in the same directory would be altered; it would stay the same size but its contents would change
- 10-1000 curls would be run from machines on the same subnet
- Total bandwidth used would not exceed 20% of the capacity of the interface
- RAM, open file descriptors, and disk would all be kept identical
- 1 in 1000 requests would be intentionally malformed
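To give a flavor of the last point, here's how a harness might inject the malformed requests — a hypothetical Python sketch, assuming plain HTTP/1.0 GET request lines (the real tests drove curl):

```python
import itertools

def request_stream(path, malformed_every=1000):
    """Yield (request_line, is_malformed) pairs forever.

    Every Nth request carries a deliberately broken request line so
    the server's error paths get exercised alongside the happy path.
    """
    for i in itertools.count(1):
        if i % malformed_every == 0:
            # Bogus protocol version: a compliant server should
            # reject this without falling over.
            yield ("GET %s HTTP/9.9\r\n\r\n" % path, True)
        else:
            yield ("GET %s HTTP/1.0\r\n\r\n" % path, False)
```

The point of the 1-in-1000 poison pill is to measure how failure handling degrades throughput, not just the best case.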
The results were telling:
- Apache: 1800-2000 requests per second, <1% request failures
- Nginx: 200-4000 requests per second, up to 90% request failures
- Lighttpd: 800-1200 requests per second
- Perl: 4000-4400 requests per second, <1% request failures
Why was Perl the winner? Because with Perl I cheated. With Perl, I remounted the directory as a tmpfs RAM file system and then used a 5-line Perl script built on the old Net::HTTP::Daemon module. Funnily enough, Perl was the only one using select() on the file descriptors; all of the others used epoll.
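For the curious, one iteration of a select()-based loop looks roughly like this in Python (an illustrative sketch, not the original Perl code). select() rescans the whole descriptor set on every call, which is perfectly adequate at the concurrency levels in this test:

```python
import select

def serve_ready(sockets, handle, timeout=0.1):
    """One pass of a classic select()-based event loop.

    Blocks up to `timeout` seconds for readable sockets, then hands
    each ready socket and its data to `handle`. Returns how many
    sockets were serviced this pass.
    """
    readable, _, _ = select.select(sockets, [], [], timeout)
    for sock in readable:
        handle(sock, sock.recv(4096))
    return len(readable)
```

The O(n) scan inside select() is the cost epoll avoids — but as noted below, that cost only shows up once you're juggling on the order of a thousand descriptors.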
What did I discover in all this?
- epoll only matters if you are handling over 1000 concurrent connections per node
- no real application can deliver that much real content anyway
- Lighttpd naively read files from disk each time; mmap is a pure win unless you already guarantee your buffers are in RAM
- Multi-threaded, single-process models do poorly in the event of failure; multi-process, single-threaded models tend to do much better, but that relies on clients not sharing processes
- pre-fork does wonders, especially if you can share the cache among processes via COW semantics
- sendfile vs. writev: always favor writev; it does the intuitive thing, and iovecs play very nicely with most web response formats
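On the writev point: Python exposes the POSIX gather write as os.writev, so the pattern is easy to show (a Unix-only sketch with made-up header values, not code from any of the servers above). Status line, headers, and body stay separate buffers right up until the kernel assembles them into one write:

```python
import os

def send_response(fd, body):
    """Write a minimal HTTP response as a single gather write.

    Status line, headers, and body are passed as separate iovecs,
    mirroring how responses are naturally assembled, with no
    intermediate concatenation buffer. Returns bytes written.
    """
    status = b"HTTP/1.0 200 OK\r\n"
    headers = b"Content-Length: %d\r\n\r\n" % len(body)
    return os.writev(fd, [status, headers, body])
```

This is the "intuitive thing": the response pieces never get copied into one another, yet the wire sees one contiguous stream.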
Looking at the design of these systems a few years back, I realized that my application servers had spoiled me. Having coded them in an "unportable fashion" (I only ever ran them on Linux, FreeBSD, and Mac OS X), I was able to write software and tune it for the hardware I had. My server had tuning scripts that would configure kernel buffers to the right size for the application based on observed characteristics. Because I had written the metrics and tooling together, I also had the server log the data relevant to tuning it. Keeping the application server current and optimized became a cron job. Going through and attempting to tune other people's servers was much too hard, because it required patching them and running them in a fake environment to collect enough data.
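A tiny taste of that kind of tuning, sketched in Python at the per-socket level (my scripts worked on the kernel-wide sysctls, which this does not touch): ask for a buffer size, then check what the kernel actually granted, because the two are rarely the same.

```python
import socket

def tune_buffers(sock, rcv_bytes, snd_bytes):
    """Request socket buffer sizes and return what was granted.

    Linux typically doubles the requested value for bookkeeping and
    clamps it to the net.core.rmem_max / wmem_max sysctls, so the
    only honest move is to read the values back after setting them.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, rcv_bytes)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, snd_bytes)
    return (sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF),
            sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF))
```

Logging the granted values alongside throughput numbers is what turns "tune the kernel buffers" from folklore into a feedback loop.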
I'm soon going to release another application server, powering applications driven by WebSockets and canvas. It too has deep introspection built in. Its tolerances are well known, so much so that I've been using it to characterize problems with popular web browsers (hint: Chrome cheats and is unstable under load; Safari maxes out but reaches a steady state). In addition to the metrics, load testing, and call graph analysis, the development process of this server revolves around making small changes and then measuring their impact each step of the way. The observed behaviors are all constrained by a simple price point: 2000 concurrent users per standard server with a maximum throughput of 20k messages per second. The only way one can meet that design goal is to measure and repeat.
The process of software engineering is an art of cost/benefit analysis mixed with the creative application of analogy. Design decisions based on fashion or popularity do little for the bottom line. And real programmers ship profitable code.