Visualizing Squeak

So in addition to building a 64-bit compiler for NewScript, I've been playing around with a set of tools to analyze the interconnectedness of various classes in Squeak Smalltalk. These tools aren't intended just for Smalltalk, and I will be porting them to JavaScript, NewScript, and a few other languages too. The basic idea is that you can't properly prune a tree if you can't see the branches. Similarly, you can't refactor a class hierarchy or a mesh of objects if you can't identify all of the explicit references.

KnownWorld: a Territorial Map of Squeak

What I've done is create a new Morph called Territory. To create a new Territory and inspect its neighbors you say something like:

Territory for: #Object

You then simply right-click on it to see all of its neighbors. By neighbors, I mean all of the classes that are explicitly referred to by its CompiledMethods. Essentially, you run through each CompiledMethod (CM) associated with the class and look at its literal fields. You then test whether each literal is an association that binds a symbol to a class, and if so, create a new Territory for that class. The radius of each circle is directly proportional to the number of methods in the class, and the color indicates whether it inherits from Behavior, Trait, or Morph. What you can quickly find out is how incredibly interconnected some things get:
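The literal scan described above can be tried on its own in a workspace. What follows is only a minimal sketch, not the Territory implementation: the variable names `target` and `neighbors` are made up for illustration, and it assumes an image where `selectors`, `compiledMethodAt:`, `literals`, `isVariableBinding`, and `isBehavior` behave as they do in stock Squeak.

```smalltalk
"Hypothetical workspace sketch: collect the classes that Array's own
 instance-side methods refer to through their literal frames."
| target neighbors |
target := Array.
neighbors := IdentitySet new.
target selectors do: [:selector |
	(target compiledMethodAt: selector) literals do: [:literal |
		"A class reference is stored as a binding whose value is the class.
		 Symbols, numbers, and other literals fail the first test."
		(literal isVariableBinding and: [literal value isBehavior])
			ifTrue: [neighbors add: literal value]]].
"Each method normally carries a binding for its own class as well,
 so drop the target itself from the result."
neighbors remove: target ifAbsent: [].
neighbors
```

Printing the last expression should answer the set of neighboring classes, and `target selectors size` gives the method count that drives the radius. Only the instance side is scanned here; running the same loop over `target class` would cover the class-side methods too.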


[Image: Array.png (http://4.bp.blogspot.com/_XCDTVvEbBMU/Sl-Xu1bjwaI/AAAAAAAAAE4/I07a6Y3lRuw/s1600-h/Array.png)]

If you look at the Array class, it is actually nice and clean. DependentsArray is a bit of cruft which itself references WriteStream, and DiskProxy is probably better refactored out, as it is a utility class for dealing with DataStream, Project, and ImageSegments. But overall it is quite reasonable.


[Image: Compiler.png (http://1.bp.blogspot.com/_XCDTVvEbBMU/Sl-Xumb95FI/AAAAAAAAAEw/FLjdM1vQh2k/s1600-h/Compiler.png)]

Compiler looks pretty clean as well. SyntaxErrorNotification is a bit crufty, and SyntaxError is small and provides access to the Debugger and ClassOrganizer. SystemChangeNotifier is both small and well factored. The 80-ton gorilla in the room is PositionableStream, which is both large and insanely complex for its intended purpose. That's probably the best target for refactoring or simply replacing.


[Image: TPureBehavior.png (http://2.bp.blogspot.com/_XCDTVvEbBMU/Sl-XuWqL7uI/AAAAAAAAAEo/cW6gZqo6p90/s1600-h/TPureBehavior.png)]

TPureBehavior is a Godzilla-sized monster weighing in at 126 methods, and provides the base functionality for a lot of the class and metaclass side of things. But even as large as it is, it is still dwarfed by String (240) and Behavior (256) proper. It is important to note that these numbers do not include inherited methods. It isn't clear at first glance what is crufty and what isn't. But what is clear is that this cornerstone of the entire object system is in need of some serious refactoring and analysis.


[Image: String.png (http://4.bp.blogspot.com/_XCDTVvEbBMU/Sl-XuJtWiOI/AAAAAAAAAEg/5jqWPZQrivc/s1600-h/String.png)]

If you think of String as a critical component in most applications, you are probably correct. While ByteString is actually quite compact and well defined, the more general String class is a mythical Hydra. There's so much going on in this image that the layout logic gets a little overwhelmed. While many objects refer to String to do all sorts of display and file related tasks, String itself explicitly refers to 50 classes! This is one of those areas where a sizeable number of those references are cruft, and could be fixed by reorganizing how files are processed through streams, converters, compressors, and the like. Many other bits of String detritus have to do with the inappropriate pollution of the String class with display logic. Similarly, chronology seems to be muddled in with the String internals to an unhealthy extent, which could be solved with some refactoring.


[Image: Object.png (http://2.bp.blogspot.com/_XCDTVvEbBMU/Sl-WLw0JZoI/AAAAAAAAAEY/1e7le45tfb0/s1600-h/Object.png)]

At the heart of this mess is the much abused and polluted Object class. The Object class has 379 methods and refers to 54 classes in my image, and seems to accumulate 2 new ones for every 0 it sheds. Object is heavily polluted by Morphic, as well as nearly every other add-on in the image. The big blue circle in the upper left is the Class object, which is minuscule in comparison (but remember that Class inherits from Behavior, and consequently TPureBehavior, so it can afford to be smaller). The big red PasteUpMorph in the lower left corner has 356 methods, and is almost as large as Object. Compare that to the Deprecation class, which has 0 methods, yes 0 methods, and is about as pointless a bit of cruft as there can be.

Future Direction

What I'm going to add to these pictures next are connectors that show the names of the methods that connect each class to its neighbors. I am then going to add the ability to identify CompiledMethods that refer to no other classes and are generally suspect for merely wrapping variables. I'd like to add some statistics to each bubble as well, showing how many class and instance variables each takes up, and what percentage of the image each accounts for at any point in time. With a little work in the process code, I'd like to track how much time each object spends processing its message sends. Finally, I'd like to have each object directly editable. Rather than running through the SystemBrowser, you should be able to just click on a method and refactor it or prune it right there. This would make editing the class web a lot more reasonable.
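As a rough illustration of the second item, methods that refer to no other classes can be picked out with the same kind of literal scan. This is only a sketch under assumptions: `target` and `suspects` are hypothetical workspace variables, and the planned tool may end up working quite differently.

```smalltalk
"Hypothetical workspace sketch: selectors of Array's own methods whose
 literal frames mention no class other than Array itself."
| target suspects |
target := Array.
suspects := target selectors select: [:selector |
	((target compiledMethodAt: selector) literals
		detect: [:literal |
			literal isVariableBinding
				and: [literal value isBehavior
				and: [literal value ~~ target]]]
		ifNone: [nil]) isNil].
suspects
```

A selector that shows up here is not necessarily cruft, since plenty of useful methods only send messages to self or to their arguments. It is just a list of candidates worth a second look.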