Lessons Learned From Jawas
Jawas, like Node, is a C10k-style application: it uses mmap, epoll, and kqueue to provide efficient kernel-level event signaling. Jawas aggressively caches results, and in fact will only recalculate a given resource once per fixed update period. By default Jawas uses a period of 200ms, meaning that any request for a given URI will return the same result for 200ms. If a series of POST or PUT requests updates the state of a resource, that change only becomes apparent at the start of the next 200ms period. This design came out of a desire for consistency: game clients would typically request a status update every 200ms (which leaves enough headroom for most network latency) and update accordingly. Conflicts between client requests were typically resolved out of band, so the server never waited for backend state machines to update. The lesson here is to learn to create a perception of consistency, even though your system state may be inconsistent. As long as you can resolve conflicts within the update period, there is no conflict from the user's point of view.
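To make the mechanism concrete, here is a minimal sketch of that period cache, assuming a simple in-process map; the names and the `render` callback are illustrative, not Jawas's actual internals.

```typescript
// Minimal sketch of period caching: every request that lands in the same
// 200 ms window gets the identical cached body, and writes only become
// visible at the next window boundary. Names here are illustrative.
const PERIOD_MS = 200;

type Entry = { slot: number; body: string };
const cache = new Map<string, Entry>();

function serve(uri: string, render: (uri: string) => string): string {
  const slot = Math.floor(Date.now() / PERIOD_MS); // current 200 ms window
  const hit = cache.get(uri);
  if (hit && hit.slot === slot) return hit.body;   // same window, same result
  const body = render(uri);                        // recompute at most once per window per URI
  cache.set(uri, { slot, body });
  return body;
}
```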
Since I had been building systems where thousands of concurrent users would hit hundreds of servers several times a second, I understood how important maintaining a consistent view could be for performance. Consider the following:
10k concurrent users @ 1 request/200ms = 50k requests per second.
At 200 users per server, we needed 50 servers just to handle the load (running at a steady 60% of capacity). That meant each server needed to process 1,000 requests per second, giving each request about 1ms on average to parse, dispatch, process, and return. Most servers in production today are 10x slower than this; most static asset servers are on par with these numbers. The lesson here is that you can get static-asset performance at an acceptable cost by designing pseudo-static data sources. This is not the same as relying on dynamic caching; rather, you gain the benefits of caching by designing data that has a known update period.
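The back-of-the-envelope arithmetic, spelled out (these are the figures above, not new measurements):

```typescript
// Capacity math for the figures above: 10k users polling every 200 ms.
const users = 10_000;
const reqPerUserPerSec = 1000 / 200;             // 5 req/s per user
const clusterLoad = users * reqPerUserPerSec;    // 50,000 req/s
const perServerLoad = 200 * reqPerUserPerSec;    // 200 users/server -> 1,000 req/s
const servers = clusterLoad / perServerLoad;     // 50 servers
const budgetMsPerRequest = 1000 / perServerLoad; // ~1 ms to parse, dispatch, process, return
console.log({ clusterLoad, servers, budgetMsPerRequest });
```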
Since we don't have a lot of spare cycles, we not only need to design so that all backend processing is cheap (database access and the like); we need to design so that requests produce similar results for a significant percentage of our users. This is true of any app that relies on a coherent caching strategy to scale. You need to make sure your shared resources are doing work that can be shared, and your individual resources (clients' machines) are doing the personalization. Jawas was designed to scale this way. The lesson learned was that shared resources are a precious commodity while your users' machines are cheap as free: push customization to the edge by design, and never serve a unique asset.
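A minimal sketch of what "never serve a unique asset" can look like, assuming a hypothetical shared /world.json endpoint: the server emits one cacheable payload per period, and every client personalizes it locally.

```typescript
// Every client fetches the *same* URL, so server-side caches (and CDNs)
// do only shared work; nothing user-specific is computed on the server.
// The endpoint and field names are hypothetical.
type WorldState = {
  tick: number;
  players: Record<string, { x: number; y: number }>;
};

async function renderMyView(myId: string): Promise<string> {
  const world: WorldState = await (await fetch("/world.json")).json();
  const me = world.players[myId]; // personalization happens at the edge
  return me
    ? `tick ${world.tick}: you are at (${me.x}, ${me.y})`
    : `tick ${world.tick}: spectating`;
}
```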
Part of the reason benchmarks often broke down on Jawas is that the comparisons never took this design objective into account. Sure, Jawas could serve 6,000 requests per second for a page that rendered the current time, but the time would have a granularity of 200ms. Sure, a comparable Catalyst app would get you higher-precision time, but since it would crap out at 40 requests per second, it would have an effective resolution of 25ms (one request every 25ms). That might be attractive if an 8x improvement in timer resolution mattered, but you'd spend roughly 150x in hardware, power, and rack space to get it. The lesson here was to optimize for what you need, and know your trade-offs. Better to sacrifice resolution than to pay for something you don't need.
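Working that trade-off out explicitly, with the same numbers as above:

```typescript
// Trade-off math for the benchmark comparison above.
const jawasRps = 6000;
const catalystRps = 40;
const jawasResolutionMs = 200;                   // fixed update period
const catalystResolutionMs = 1000 / catalystRps; // 25 ms between requests
const resolutionGain = jawasResolutionMs / catalystResolutionMs; // 8x finer timing
const hardwareMultiple = jawasRps / catalystRps; // ~150x the servers for the same load
console.log({ catalystResolutionMs, resolutionGain, hardwareMultiple });
```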
Most of the lessons here are applicable to any highly distributed system. My original Connection Server used database-backed serialized C++ objects, and had an in-memory distributed object store (like an early precursor to Redis or Memcache). Jawas servers used even more aggressive memory cycling, and would spend 65-70% of their run time in kernel space (the production average was 68% sys time). When writev is the top thing you spend processing time on, you know you've reached the limit of optimizing the problem away. Since the first apps were Flash heavy, all of the presentation we did was client-rendered. When we switched to HTML5, we persisted in the style of shipping data only (a minimal sketch of that client loop closes this post). This scales beautifully and performs fantastically, especially when your screen refreshes are tied to the browser animation loop. And that's the key: performance at scale is not about how fast you can serve an individual request or x000 requests/second. It is about creating a perception of consistency with reliable performance at a predictable cost. And that's a lesson most platform developers have yet to learn.
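For concreteness, a minimal sketch of that data-only client loop, assuming a hypothetical /state.json endpoint and a client-specific render function: state arrives on the 200ms cadence, while painting is left to the browser's animation loop.

```typescript
// Data-only client loop: fetch state on the server's 200 ms period,
// paint at the display refresh rate via requestAnimationFrame.
// The /state.json endpoint and render() are placeholders.
declare function render(state: unknown): void; // client-specific drawing code

let latest: unknown = null;

setInterval(async () => {
  latest = await (await fetch("/state.json")).json(); // matches the update period
}, 200);

function frame(): void {
  if (latest !== null) render(latest); // draw whatever state we have
  requestAnimationFrame(frame);        // repaint in step with the browser
}
requestAnimationFrame(frame);
```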