DAVE'S LIFE ON HOLD

Some Thoughts on Unstructured Programming

Since the days of yore, when Edsger Dijkstra railed against the dragon of unstructured programming, it has become commonplace to assume structured programming is essential to best practice. Gone are the days when programmers manually computed offsets to jump within programs. Long dead are the games masters played with non-local returns.

Now we let the machines do it.

Wait what!?! We let the machines do it? What sort of insanity is this?

Yes, we let the machines do it. Modern day compilers pull stunts that would get your hands slapped if you ever tried them. They produce insanely complex chains of non-local exits with exception handling. They elide entire control structures, unroll loops, optimize away entire branches, and gratuitously inline whole functions. Stack frames? Don't count on them; you may never even touch the stack if enough registers can be preserved between calls. And JITs? Those are just a fancy word for self-modifying code. Seriously, you shouldn't write self-modifying code yourself because it can cause cache misses. Don't worry, your JIT uses cache coloring to work around those problems for you.
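To make that concrete, here is a minimal C sketch (the function name is mine) of the kind of thing an optimizer does behind your back. At -O2, GCC and Clang will commonly recognize the induction variable, replace the loop with a closed-form computation, and inline the whole thing at its call sites, so neither the loop nor the call nor a stack frame need survive in the emitted assembly.

    #include <stdint.h>

    /* Sum 0..n-1 the "structured" way. A modern optimizer will typically
       emit something closer to n * (n - 1) / 2, with no loop, no branch,
       and quite possibly no call left in sight. */
    uint64_t sum_to(uint64_t n)
    {
        uint64_t total = 0;
        for (uint64_t i = 0; i < n; i++)
            total += i;
        return total;
    }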

When you look at how the sausage is made, unstructured programming is alive and well in the assembler output of modern optimizing compilers.

People accept this because they never read the assembler output, never debug compiler bugs, and never question the wisdom of the compiler writers. After all, it is taken as an article of faith that the machine will write better assembler than a human; why question it?

I think there is a weird sort of hubris surrounding our tools. We take too much pride in our clever tools that outthink us. Millions of optimization rules applied against every line of code will automatically produce better results, right? Take pride in our complex creations, for they will obsolete the artist. Revel in the ability of the compiler writer to compensate for the mediocre programmers in the cubes next to him. Marvel at how clever this code is! How meta-great we are to create such incredible tools!

That is, until they fail. When I'm working with compiled code, or even JIT code in a dynamic language, I invariably run into a bug in one of these beasts. Often these bugs only become self-evident once you test the code on multiple versions of the compiler. It is only after seeing it work on multiple compilers that people will believe the bug lies not in the code but in the compiler. I have run into these bugs about once every 18 months for the past 14 years with compiled languages, and about every 6 months with more dynamic JITed languages.

I've encountered C++ compilers failing to compile nested templates, inline functions failing to inline serially, loops failing due to busted static analysis, and exceptions caught in the wrong scope. I've had JITs never produce a single collectable piece of garbage, miscalculate basic additions, generate invalid type coercions, and crash with empty function objects. All of this code compiled cleanly, ran on multiple language implementations, and conformed to the language spec, but it didn't work.

Each time, I knew the code was right, but suspected the compiler to be wrong. Inspecting the memory dumps of the compiled code usually revealed the horrific quality of the machine code produced, and the failure of the compiler to recognize that some optimization or design shortcut was inappropriate. Often it was a result of multiple clever algorithms having unintended side effects when applied together. But mostly, it was a failure of the imagination of the compiler designers themselves. They never imagined a suitable test case, because they never imagined the code I wrote.

Early in my career, I always believed my code was in error. I trusted the compiler to be right, and modified my program until it worked. As I got older, I ran into more and more problems with compilers and interpreters, and got my hands dirty with their internals. I lost faith in them. Then I started consulting, and in a different company every 3 months, I saw plenty of reasons compilers fail. I saw old code built under a version of GCC that relied on the old string implementation, which prevented us from picking up fixes for compiler issues. I saw libraries which needed to link against binaries for which the source had long been lost. I saw ports of code from other architectures using memory overlays, which had to retain the data and alignment characteristics of the original COBOL code. I saw the horror and the reality of the mission critical systems you rely upon every day, where the cost of fixing them often exceeded the profit margins of the companies maintaining them.

Recently Apple had a TLS bug, the infamous "goto fail", which would have been caught via static analysis if and only if you added the -Weverything flag, the one that means what people expect -Wall to mean. Sure, dead code elimination will also strip the unreachable segment from production binaries, so if you looked at the compiler output, the skipped check would not exist anywhere in the system. The question of whether you can run -Weverything and still get a functional build should pop into your head. Sometimes code will, for historical reasons, violate several of these options, doing unsafe things but still working. If you are working on a critical system that generates a sea of warnings, but must ship every release, you will always choose to ship over cleaning up the code. If you work in a sufficiently complicated subsystem that is always undergoing threat assessment, you will also never have the time to fix all those warnings.
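For reference, the shape of that bug looks roughly like this. It is a simplified sketch with made-up helper names, not the actual Apple source, but it shows why -Wall stays quiet while Clang's -Wunreachable-code, which -Weverything pulls in, complains, and why dead code elimination leaves no trace of the skipped check in the binary.

    #include <stdio.h>

    /* Hypothetical stand-ins for the real hashing and verification steps. */
    static int  update_hash(void) { return 0; }
    static int  final_check(void) { return 1; }   /* "the signature is bad" */
    static void cleanup(void)     { }

    static int verify_signature(void)
    {
        int err = 0;

        if ((err = update_hash()) != 0)
            goto fail;
            goto fail;                  /* the accidental duplicate: always taken */
        if ((err = final_check()) != 0) /* unreachable: the check is skipped */
            goto fail;

    fail:
        cleanup();
        return err;                     /* still 0: verification "succeeds" */
    }

    int main(void)
    {
        printf("err = %d\n", verify_signature());
        return 0;
    }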

And it should be remembered that in any legacy system with over two decades of development history, dead code is everywhere. Often code is patched and repatched over the years, resulting in branches which will never be taken. In many code bases, both published and unpublished control flags are used to enable and disable access to various paths. Sometimes these pathways exist only for legacy purposes, and are only utilized by old code. Sometimes the dead code is the result of the elimination of a constant in a header file. Sometimes the code is undead: not possible to eliminate through static analysis, but still never taken in practice due to environmental factors. The larger and older your code base, the more difficulty you will invariably have bringing it up to the state of the art.
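As a small illustration, with invented names: a constant frozen in a shared header years ago turns one arm of a branch into provably dead code. Constant folding strips it from every binary, the warnings flag it on every build, and the source keeps dragging it along through every patch and rename.

    #include <stdio.h>

    /* Imagine this lives in a shared header and was frozen to 0 long ago. */
    #define ENABLE_LEGACY_BILLING 0

    static void legacy_billing_path(int amount) { printf("legacy: %d\n", amount); }
    static void modern_billing_path(int amount) { printf("modern: %d\n", amount); }

    void post_invoice(int amount)
    {
        /* The first arm is provably dead after constant folding; it never
           ships in the binary, but it never leaves the source tree either. */
        if (ENABLE_LEGACY_BILLING)
            legacy_billing_path(amount);
        else
            modern_billing_path(amount);
    }

    int main(void)
    {
        post_invoice(42);
        return 0;
    }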

And the state of the art is still buggy. Just because static analysis by a fancy optimizing compiler complains about your code doesn't mean your code is wrong. In C, you can safely use negative indexes into arrays, index into static zero-length arrays, and play all sorts of pointer math games, provided you are also managing the memory yourself. If you are writing your own generational garbage collector in C, you are going to run afoul of most of these well intentioned warnings. Why? Because you are not writing general application code, but system level infrastructure. Most warnings are targeted towards "generic" code, the sort of code written by average programmers in daily work. Write C the C way and you'll know the sort of code this means. This code is structure-heavy, pointer-averse, and filled with flow control structures. In general, 90% of the code out there is the sort that benefits from these warnings. But when you primarily write the other 10%, you become profoundly aware of the limitations of your tools.
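Here is a minimal sketch of the kind of pointer math I mean, using a made-up allocator: the allocation header lives just below the pointer handed to the caller, and a negative index reaches back into it. This is perfectly well defined, because both pieces live inside the same block you allocated and manage yourself, yet it is exactly the pattern that generic warnings and analyzers are tuned to flag.

    #include <stdlib.h>

    typedef struct {
        size_t   size;
        unsigned marked;    /* e.g. a mark bit for a homegrown collector */
    } ObjHeader;

    /* Hand the caller a pointer just past the header we keep for ourselves. */
    void *gc_alloc(size_t size)
    {
        ObjHeader *h = malloc(sizeof(ObjHeader) + size);
        if (!h)
            return NULL;
        h->size   = size;
        h->marked = 0;
        return h + 1;
    }

    /* Reach backwards into the header with a negative index. */
    size_t gc_size(void *obj)
    {
        return ((ObjHeader *)obj)[-1].size;
    }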