What is Wrong with Web Frameworks

In my line of work, I spend a considerable amount of time reading the implementations of new web frameworks. Part of it is simple curiosity about how other people have solved common problems, but mainly it is because I build prototypes of new products for a living, and am responsible for evaluating the suitability of new off-the-shelf components to reduce time to market. Over the past two years, I've professionally (not as a hobby) built systems that went into production in 13 programming languages on 10 different platforms. For every system I built something in, I probably evaluated a half dozen possible technologies, and the only way to give anything a fair shake is to actually use it to build something. You don't really learn much about solving your problem this way, but you definitely learn how other people's assumptions impact your ability to meet deadlines and deliverables.

Web frameworks are probably the single most common piece of software I've been asked to evaluate. You'd think that after 17 years of programming on the web, I'd have developed a favorite and identified a "best of breed" solution which I would recommend as the "go to" platform, but nothing could be further from the truth. This is not because I haven't tried out dozens of frameworks in dozens of languages. Nor is it that many of the existing solutions won't solve the problems they were built to solve. It is because every solution I've found makes fundamentally the same set of assumptions about the nature of web programming, and at the heart of the matter something has gone horribly wrong.

The problem is that every web framework is focused on building web pages. More specifically, the creation and manipulation of DOM elements and the associated style information as defined in the CSS style sheet. Basically, the entire purpose of most web frameworks is to generate a text file and send it over the wire to a web browser. Behavior, style, and most content are typically delivered from a collection of static assets, and are treated as auxiliary to the main artifact of production, which is the text string containing the HTML. And it is this focus on generating HTML that continues to hamper the development of a robust web platform. In my experience, this is the core of what is wrong with web frameworks.

Even those frameworks that work entirely on the client side, like jQuery, YUI, and ExtJS, suffer from the same conceptual model. If you build something through DOM manipulation, you are still focused on generating a document in terms of its structured HTML representation. If you go as far as model-driven templates on the client side with Mustache, you are doing nothing more than string generation browser-side, making the same mistake over again, just using more of the client's resources to do it (which is not a bad idea in itself, as it shifts the burden of work to the edge, where there is far more computing power). Newer frameworks that provide seamless integration between Javascript and the server-side code, like Seaside (Smalltalk) or Ocsigen (OCaml), are a step in the right direction, but at their core they still require modeling your data in terms of the DOM.

To see why this is a problem, let's look at this website. If you break it down, everything in it is just a "smart box" containing one of a small set of resources: text, images, sounds, and videos. I could add 3D canvases to the list, but from the user's perspective they are semantically indistinguishable from images or videos. Layout is done through a composition of boxes (aka the web browser's box model), and all style is a manipulation of the properties of one of those four primary resource types. All behavior in the system is either intrinsic to the resource type (videos and sounds have a temporal aspect to them) or an interaction between the boxes and the mouse, touchpad, or keyboard. Time itself is one of the most overlooked aspects of any web content, yet it is intrinsic to the viewing experience and the defining characteristic of our interaction with two of the most compelling media types, sound and video.

If you look at most web frameworks, they utterly fail to capture any of these key concepts. Rather than designing flows and user experiences as fluid transitions between states, we generate static pages and HTML. Rather than laying out text, images, and videos, we construct pseudo-hierarchical data structures in a serialized textual representation. And rather than editing content and designing transitional effects that lead the user both visually and cognitively through a mental landscape, we attach behaviors to DOM elements (or worse, schedule event callbacks). Meanwhile we struggle to ensure that no change to shared assets like style sheets, images, or video formats breaks the behavior of any of the other extant pieces of our web applications. In fact, our global style sheets are designed with a view towards violating any notion of encapsulation and componentization, recreating a fragile base class scenario in every web application.

I've tried to address these shortcomings in Phos. I've defined Resources (image, sound, video), Controllers (keyboard, touchpad, mouse), and Boxes, but even then I haven't taken it far enough to define the sorts of non-linear event flows necessary to build compelling interactions. That is something I hope to fix the next time around.