The Golang, The Bad and The Ugly

Over the past year, I've been reading a lot of Go code. Projects like Docker and etcd are written in it, and any attempt to use them in a pre-production system requires paying attention to what the code actually does (as opposed to what was documented for some prior release). While these might not be great examples of Go written in a fully golangish style, they are substantial, readable bodies of code. I don't limit this practice to "new" languages like Go; I read the code of any project that is core to a product I am building. To truly appreciate what a body of code does for you, it is best to give it a read.

This past week, I decided to start an experiment to see what programming in Go actually feels like. Over the past 15 years, I have written production systems in 19 different programming languages. Having worked on everything from embedded systems, to mobile, to globally distributed applications, to machine learning, I've developed a fairly nuanced approach to my choice of languages: "what is the client willing to pay?" Some languages are a joy to work with: flexible, dynamic, and meta-friendly. For those, I've discounted my rates by as much as 50% simply because the project would be fun and over with quickly. Other languages are overly verbose, boilerplate heavy, fragile, and fussy, and I've tripled my rates for those projects due to the inevitable pain and suffering involved. When I approach Go, I have no real interest in seeing it succeed or fail, and I don't really care whether there's a great community or not; I'm only concerned with what works in reality and what is simply frustrating. Working in reality is what matters from a production standpoint, and frustration matters from a quality of life standpoint.

The project I chose to implement in Go was ConnServer2, a port of ConnServer, one of my old game engine servers, which ran a number of games in production over the years. Go was ostensibly designed for writing distributed server software for the modern era, so it should be a good fit for ConnServer2. The original codebase was started in 2003 and written in C++. It provided a real-time chat and object cache, with the ability to dynamically load new modules and features at runtime. Some versions of the production codebase could pull the Erlang trick of upgrading existing modules at runtime, transitioning data between versions on the fly. ConnServer used PostgreSQL as a blob store for the C++ objects that represented the game state, and managed the communications between the game clients (usually written in Flash or J2ME) and the bots which managed the game world (written in Perl, Python, Lisp, or OCaml). The purpose of the server was to provide connections between the separate partitioned object spaces and chat rooms which formed the basis of the game worlds. Each server was responsible for one of 64k ID spaces, each with 48-bit object identifiers. Each server would load and store its object cache dynamically based on usage, and all manipulation of game state was done by sending messages over socket connections to the server.
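The arithmetic behind that ID scheme is tidy: 64k ID spaces is 2^16, and 16 + 48 = 64, so a space number and an object number fit together in one 64-bit integer. The sketch below assumes exactly that packing, with the space in the high 16 bits; the text does not say how ConnServer actually laid out its IDs, and the names ObjectID, MakeID, Space and Local are hypothetical.

```go
package main

import "fmt"

// ObjectID is a hypothetical 64-bit identifier: the high 16 bits select one of
// 64k (2^16) ID spaces, and the low 48 bits identify an object in that space.
type ObjectID uint64

const (
	localBits = 48
	localMask = (uint64(1) << localBits) - 1 // low 48 bits set
)

// MakeID packs a space number and a 48-bit local object number.
// It returns an error if local does not fit in 48 bits.
func MakeID(space uint16, local uint64) (ObjectID, error) {
	if local > localMask {
		return 0, fmt.Errorf("object number %d exceeds 48 bits", local)
	}
	return ObjectID(uint64(space)<<localBits | local), nil
}

// Space returns the 16-bit ID space, i.e. which server owns the object.
func (id ObjectID) Space() uint16 { return uint16(uint64(id) >> localBits) }

// Local returns the 48-bit object number within its space.
func (id ObjectID) Local() uint64 { return uint64(id) & localMask }

func main() {
	id, err := MakeID(7, 123456)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%#x space=%d local=%d\n", uint64(id), id.Space(), id.Local())
}
```

With this layout, finding the server that owns an object is a single shift, which is one plausible reason to partition IDs this way.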

Obviously, since it was written over 10 years ago, a lot of the concepts ConnServer implemented in terms of message structure, routing, and performance were geared towards the limitations of the day. Flash's XMLSockets allowed for sending null-terminated strings over the socket connection, so that is what ConnServer used. Messages were URL encoded, because that was the fastest safe object decoder available to Flash, and JSON wasn't yet in common use. Routing was "room" based or point-to-point, because the system overhead of broadcasting messages to more than 200 concurrent connections on a $500 server took up 70%+ of the CPU (this code predates the Amazon EC2 beta). Things were designed to live in RAM and to fail over gracefully. The system also had to support runtime upgrades, as we couldn't afford to disconnect all of the game clients every time we wanted to patch a feature. While most of the game logic was written in bots that could be restarted and retooled without impacting the production services, it was often necessary to add new object types or message handlers on the fly, which required loading DLLs into the runtime's plugin system. A decade ago, ConnServer and its sister program Jawas (think Node.js before Node existed) were capable of servicing 10k concurrent users on about $5k of capital cost in servers.

Times Change

Now computers are immensely faster and cheaper, and new standards exist. WebSockets (RFC 6455) provide a nice bidirectional link similar to the old XMLSockets, but use a crazily complex framing system, pseudo encryption, and a half dozen hoops to provide security against browser-based attacks. As such, they provide an efficient way to waste cycles preventing programmers from abusing sockets, but they are supported by all of the major browsers (whereas Flash is less so), so they can be used. JSON is now ubiquitous, so it is easy to send all of the static bits of a JavaScript object, without those pesky behaviors, between clients and servers. Since JSON parsers are baked into all modern browsers, parsing is fairly fast, somewhat safer than eval('(' + text + ')'), and occasionally even faster than it. Finally, computers are now fast enough that network traffic handling typically doesn't swamp the CPU (because you don't have the memory bandwidth to do so), and so we have a bunch of spare cycles we can spend moving data around our vast quantities of RAM (which we don't technically have enough bandwidth to do either, but it's a theory). So we can probably switch to a modern garbage collected language and avoid having to write all of our own memory management routines again. Those are the major modernization efforts for ConnServer2:

1.) replace Flash XMLSockets with WebSockets
2.) replace URL encoded messages with JSON
3.) replace hand-written memory management with a garbage collected language
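The "pseudo encryption" in WebSockets is most likely RFC 6455's client-to-server masking. Every frame a client sends carries a 4-byte masking key, and byte i of the payload is XORed with key[i % 4]. Applying the same XOR twice restores the original, so the server unmasks with the same function. The minimal Go sketch below covers that step and the 2-byte header of a small, unfragmented, masked text frame. It deliberately ignores extended payload lengths (126 bytes and up) and fragmentation, and the function names are illustrative.

```go
package main

import (
	"crypto/rand"
	"fmt"
)

// maskPayload applies RFC 6455 masking in place: byte i is XORed with key[i%4].
// Masking and unmasking are the same operation.
func maskPayload(payload []byte, key [4]byte) {
	for i := range payload {
		payload[i] ^= key[i%4]
	}
}

// clientTextFrame builds a single, unfragmented, masked text frame.
// This sketch only handles payloads of at most 125 bytes.
func clientTextFrame(text string, key [4]byte) ([]byte, error) {
	if len(text) > 125 {
		return nil, fmt.Errorf("payload of %d bytes needs an extended length field", len(text))
	}
	frame := []byte{
		0x80 | 0x1,             // FIN bit set, opcode 0x1 (text)
		0x80 | byte(len(text)), // MASK bit set, 7-bit payload length
	}
	frame = append(frame, key[:]...)
	payload := []byte(text)
	maskPayload(payload, key)
	return append(frame, payload...), nil
}

func main() {
	var key [4]byte
	if _, err := rand.Read(key[:]); err != nil {
		panic(err)
	}
	frame, err := clientTextFrame("hello", key)
	if err != nil {
		panic(err)
	}
	// The server reads the key from bytes 2..5 and unmasks the rest.
	var k [4]byte
	copy(k[:], frame[2:6])
	body := append([]byte(nil), frame[6:]...)
	maskPayload(body, k)
	fmt.Printf("% x -> %q\n", frame[:2], body)
}
```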

All three modernizations are doable in Go: there is a WebSocket package and a JSON package, and garbage collection is built into the language. The core of the rewrite focuses on using goroutines and channels to better distribute the workload across multiple actors, each running in its own goroutine. The original ConnServer codebase was event driven, with each message a client or bot sent over its socket triggering a cascade of message sends. The ConnServer2 model will be similar, except that each message-sending entity will be modeled as an actor. Each actor will have an inbox and an outbox channel, and will run its main receive loop inside a dedicated goroutine. A top-level actor registry will provide the means for first-level actors to send messages to each other by name. Additionally, topic (regex-based) and fanout (room-based) routers will be supported in addition to the registry (direct map-based) router, so that multiple tree topologies can be created. Finally, in order to support directed graphs, a node may be pointed to by multiple nodes, but only the topic, room, and registry nodes may have multiple outputs. These changes make substantially more complex messaging topologies possible, making it easier for bots to do more things.

The Road to v1.0

Starting out programming with Go wasn't difficult. The language is just Algol in yet another funny hat. Whitespace is significant, but only in a couple of very special places. Luckily for me, my standard formatting practice matches Go's standard format, so it didn't hurt. It took a bit of effort to get the simple test framework working with my test code, but that was mostly due to Go's quixotic directory structure decisions. It took me a couple of days to find the bit of help which explained how it actually works, as a couple of the tutorials on testing in Go that I read skipped over such "obvious" details. The lack of a REPL for testing out ideas is still very annoying. Running code with go run doesn't make up for the inability to quickly test and examine packages interactively. In languages like LISP, Lua, JavaScript, CoffeeScript, Python, Jython, Perl, Ruby, Smalltalk, Self, OCaml, Erlang, SML/NJ, Haskell, PostScript, Forth, F-Script, and now even C and C++, there are nice interactive tools for playing with language constructs without having to erect the scaffolding of an entire program. For a modern language not to support interactive compilation and evaluation is just unacceptable, considering the technology is older than most of the languages I've listed above.

Once I had my basic development environment set up, and had verified that I could import packages, run tests, and get code working, I started to make progress on implementing the core package: message.go. It should probably be a cardinal sin to implement a statically typed language and categorically skip both unions and variant types. Writing unit tests for parsing arbitrary JSON via the JSON package resulted in two days of pure frustration. The crux of the issue is the use of the empty interface type, interface{}, as a kind of void*. Rather than having a tagged union or an actual type algebra, Go has adopted a default type that binds to anything. The problem, however, is that the compiler treats that anything as interface{}, even when reflection reports that the actual value is a []interface{}. As such, without a priori knowledge of what was sent to you, the Unmarshalled data structure has to be picked apart at runtime before any of it can be used.

Even though reflect reports that the value in position 1 of the slice is in fact a slice, the compiler rejects the code if you attempt to index that element as if it were a slice. This is because the compiler believes it to be of type interface{} and not []interface{}. Simply put, the compiler is incapable of compiling the code that would work, because it doesn't defer the check to runtime and instead strictly insists that each element MUST be of type interface{}. This means you MUST write explicit type assertions, which are only checked at runtime, to force the type. I have found that programming in Go is an exercise of asserting types until the compiler allows the code to compile, and then writing runtime type checks to catch all of the incorrect type assertions the compiler failed to catch. This is the worst of both worlds! Writing unit tests for this has also become a chore, as many of my tests compile, but the type assertions fail at run time, causing the unit tests to panic: not fail, but panic, as in crash. Attempting to protect against clients passing arbitrary JSON and malformed messages is actually a nightmare as a result, as all input access must be moderated through reflection before basic access is attempted. That said, I had become very familiar with the reflection capabilities of Go before I had even completed my first file!

Before starting on the messaging code for the first simple router (topic.go), I had watched a couple of videos of Rob Pike talking about goroutines and channels. Having spent a lot of time programming in Erlang (though not nearly as much as Joe Armstrong), I have certain biases when it comes to distributed communication:

1.) it should be trivial to send a message to another entity via symbolic reference
2.) it should be trivial to spawn processes and have them live forever
3.) it should just work out of the box

Now consider the following Go code:
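Here is a minimal sketch of the kind of program in question. It assumes two goroutines that bounce a counter back and forth over unbuffered channels while main returns right after serving the first value; the names player, ping and pong are illustrative. Go does not wait for other goroutines when main returns, so how many lines get printed before the process exits depends on scheduling.

```go
package main

import "fmt"

// player receives a counter on in, prints it, and passes counter+1 on out,
// forever. It is meant to run in its own goroutine.
func player(name string, in <-chan int, out chan<- int) {
	for n := range in {
		fmt.Println(name, n)
		out <- n + 1
	}
}

func main() {
	ping := make(chan int)
	pong := make(chan int)
	go player("ping", ping, pong)
	go player("pong", pong, ping)
	ping <- 0 // blocks only until the ping player has received the ball
	// main returns here; the program exits without waiting for the players,
	// so the amount of output varies from run to run.
}
```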

If you run this a bunch of times, you’ll get different output from run to run. The result is non-deterministic.

By contrast, write the same program in Erlang and run it via

erl -run test main

and you’ll always get the same response: an infinite stream of processes communicating.

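And should you think, "Well, I’ll just add an infinite loop at the end of the Go program to do the same," you’ll end up with two lines of output, and then it will hang in the infinite loop, and neither goroutine will ever iterate the next step of its for loop or receive another message. (On the Go releases of the time, a busy loop that never yields could starve the other goroutines; since Go 1.14, the runtime preempts such loops asynchronously.) What you need is the magic of select{}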

Which is truly magic, as the only place you’ll find it in the core documentation is a code comment in an example in The Go Programming Language Specification. The statement itself is described as:

A "select" statement chooses which of a set of possible send or receive operations will proceed. It looks similar to a "switch" statement but with the cases all referring to communication operations.

Which is about as far as you’re going to get out of the document, because the actual description of what select does in this specific case is stated as:

Since communication on nil channels can never proceed, a select with only nil channels and no default case blocks forever.

And that is all you will ever understand of this magical beast. So poorly documented and so poorly explained is this behavior that it took me a week to puzzle out how the threading model actually works under the hood, and how the implicit joins occur. For most consumers of Go, this is hidden in one of the default server’s select calls, deep within a package no one bothers to read. This select, which Rob Pike says in the video is so critical to get right, is so wrong in so many ways.

From here

I’m almost done with v0.1, and will release it by the end of the day on October 1st. I have a much deeper appreciation for what Go is and is not. I also no longer believe the hype. Go is yet another flawed language, just like all the rest. No panacea is around the corner.