Visualizing Software Systems in the Vivacity Virtual Environment

Clinton Jeffery
University of Idaho
jeffery@uidaho.edu

unfinished draft 0.2
May 2, 2018

Abstract

Software developers have been slow to adopt software visualization, whose promise has seemed tantalizing and obvious since its inception in the 1980s. Although many researchers believed that visualization would someday revolutionize difficult tasks in debugging and software maintenance, many software visualizations have been so abstract as to be useless, or have depicted information that was easy to obtain but not directly useful.

More recently, several research groups have visualized the evolution of large software systems using progressively more sophisticated "city" metaphors, mapping information about software components onto familiar architectural features such as buildings and roads. Up to now, this metaphor has been applied to static or slow-changing information, such as examining months or years of changes in a software repository.

This paper surveys existing work on visualizing software as cities and then introduces a "living city" metaphor, in which a set of programs written by a set of authors is visualized as a city populated by dynamic entities such as users, data structures, threads of execution, and bugs. An implementation of the "living city" is proposed. The paper also discusses what will be needed, in terms of both open research problems and existing and needed software tools.

1.0 Introduction

Visualization, the graphical depiction of information, means many different things to people working in different areas. This paper is a study within the subfield of software visualization, the graphical depiction of information about computer programs. Programmers can learn much from static views of program structure such as UML Class Diagrams, but to really switch on the light bulb, one needs to animate the invisible, dynamic behavior of program executions.

In the 1980s, the promise of software visualization was made apparent by films such as Sorting out Sorting [1] and by subsequent software systems such as Balsa [2]. It was clear then that appropriate graphical depiction of software would enhance understanding and be useful for tasks including construction and debugging. However, software visualization remains exotic in practice. We still approach programs primarily by scrutinizing static source code, or by wading through volumes of debug, profile, or log data. Yet the vocabulary we use when discussing code is full of intuitive visual metaphors.

1.1 Visualizing Software As Cities

Wettel and Lanza [3] introduced a potent metaphor: a software system can be visualized as a city. They implemented a tool called CodeCity in Smalltalk that explores this metaphor. In CodeCity, classes and interfaces are depicted as buildings. A building's height indicates the number of methods in the class. Width and length are both proportional to the number of attributes, so all buildings are square when viewed from above. Beyond basic size, Wettel and Lanza consider position, hue, saturation, and transparency useful for plotting additional information about the software.

With this simple metaphor, Wettel and Lanza depicted very large software systems, such as an 8,000-class Smalltalk program. The left image shows class dimensions mapped 1:1 onto building dimensions; the right image shows class dimensions scaled to reduce the effect of outliers. In CodeCity, the altitude or base land topography depicts package structure. Some languages, such as Java, emphasize the use of packages for code organization; in languages that do not, directory organization may play a similar role in large programs.

Apart from grouping classes in the same package together, Wettel and Lanza's overall city layout is fairly arbitrary. A modified treemap algorithm [4] places buildings largest-first, splitting rectangles into smaller pieces into which smaller rectangles can be placed. Their modified treemap algorithm might be improved upon by a static layout that uses software coupling to position related classes near each other.
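Wettel and Lanza describe their layout only as a modified treemap, so the following Python sketch should not be read as their algorithm. It is a minimal guillotine-style packer, assuming rectangular footprints given as hypothetical `(name, width, depth)` tuples: footprints are placed largest-first into the smallest free rectangle that can hold them, and the leftover space is split into two smaller free rectangles.

```python
from dataclasses import dataclass


@dataclass
class Rect:
    x: float
    y: float
    w: float
    h: float


def place_largest_first(footprints, lot_w, lot_h):
    """Place (name, width, depth) footprints in a lot_w x lot_h lot.

    Returns {name: Rect}; footprints that no longer fit are skipped.
    """
    free = [Rect(0, 0, lot_w, lot_h)]  # free rectangles still available
    placed = {}
    # Largest footprint area first; ties keep their input order.
    for name, w, h in sorted(footprints, key=lambda f: f[1] * f[2], reverse=True):
        fits = [r for r in free if r.w >= w and r.h >= h]
        if not fits:
            continue
        # Best fit: the smallest free rectangle that can hold the building.
        r = min(fits, key=lambda r: r.w * r.h)
        free.remove(r)
        placed[name] = Rect(r.x, r.y, w, h)
        # Guillotine split of the leftover space into two disjoint rectangles:
        # a strip to the right of the building and a strip above it.
        right = Rect(r.x + w, r.y, r.w - w, h)
        above = Rect(r.x, r.y + h, r.w, r.h - h)
        free.extend(s for s in (right, above) if s.w > 0 and s.h > 0)
    return placed
```

A coupling-aware layout, as suggested above, would replace the "smallest free rectangle" rule with one that prefers free space near already-placed, related classes.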

Wettel and Lanza used their software city metaphor to study how software systems evolve over weeks, months, or years in a revision control repository. This use focuses on relatively static information, in which change is perceptible only through the equivalent of time-lapse photography. Their metaphor provides the static backdrop for the visualization of dynamic program behavior in this research.

1.2 Related Work

Wettel and Lanza were not the first to use a city metaphor to describe software systems. Not counting small-scale related metaphors such as "software architecture", a city as a depiction of a software system was suggested in science fiction in the 1980s and possibly earlier. This section mentions a few of the more important related visualization projects.

Knight and Munro [8] developed a prototype called Software World for visualizing Java code, in which a program is treated as a virtual world, a directory or package is a country, each file is a city, each class is a district, and each method corresponds to a building. Like Wettel and Lanza, they do not go so far as to propose how to map individual statements and expressions in the code onto the interior of a building, but they do assign building exterior characteristics based on size, number of parameters, and so on. The finer granularity of this mapping of code onto virtual-world entities has both pros and cons. A 3D volume (a building interior) might allow a better mapping for large methods than a metaphor in which an entire class is a building and each method occupies one storey, which implies an essentially 2D layout for control structures.

Wettel and Lanza's work also inspired several other researchers to go beyond the study of software repository evolution. Kuhn, Loretan, and Nierstrasz [5] developed a more topological form of software map, applying some interesting mathematics to produce what look like island chains depicting the relationships and attributes of software components. Honestly, I think these maps are very pretty, and the mathematics in the paper for constructing them from real relationships between components is interesting and deep. However, these maps do not invite the scrutiny of multi-dimensional detail that CodeCity does. Still, they and the mathematics underneath them might be useful for positioning related programs that are not explicitly connected.

Steinbruckner and Lewerentz [6] pursued the city metaphor further in a tool called CrocoCosmos, developing a much more sophisticated, information-based layout. They adopt practices from cartography: they establish a primary model (a data structure abstracting the software system in its own terms) and then use secondary and tertiary models to depict detailed information graphically. Subsystems/packages are depicted as streets, with contained classes as buildings on those streets. Age appears to be depicted by both centrality (older modules are near the center, as in real cities) and elevation.

Caserta, Zendra, and Bodenes [7] develop hierarchical attraction points in order to superimpose a depiction of relationships between classes, as lines drawn above the buildings of a city. While many or most programs (especially object-oriented ones) are understood largely through the static relationships between their classes and the runtime relationships between instances, I don't think a big wad of red and green yarn is going to contribute greatly to clarity. If rendered transparently enough, it might provide subtle additional hints, or serve as a Futurama-style teleportation tube system.

1.3 On the Shortcomings and Limitations of Metaphors

It is unrealistic to expect a metaphor by itself to convey meaning and understanding of large software systems. Cities inhabited by humans have structures designed to support mammalian bipeds. Computer programs have structures designed to support von Neumann computation: control flow and data encapsulation. There is no inherent reason for a city metaphor to "make sense", or to provide intuitions about how software is (or should be) structured.

For example, the CodeCity authors noted that many classes were impossibly tall (over 1000 methods) relative to their length and width (only a few attributes). As buildings, such classes look inhuman. A human looking for problems in the code will be tempted to look for "wrong-looking" structures, but there is not necessarily anything wrong with a class just because it looks impossibly tall and skinny. Worse, the CodeCity authors elected to adopt a non-linear scale for building height in order to make such classes look more like plausible buildings in a human city. Admittedly, visualization is full of adjustments to scaling; in an exponential computing world, a logarithmic scale is a visualization designer's friend. However, the reason to adopt a non-linear scale had better not be to make the data fit the metaphor.

2. The Living City Metaphor

Although the software city metaphor is interesting when used in a post-facto study of the evolution of large software repositories, this paper contends that the "city" metaphor is more useful when it becomes a means of sharing, interacting with, and debugging a group's collective software development efforts.

The vision advocated in this paper is to use the software city metaphor to implement a collaborative virtual environment for software development. In order to achieve this vision, many extensions to the relatively static software city metaphor are needed, but the primary extensions will be the introduction of dynamic entities with which to interact, and things to do in the city.

2.1 Visualizing a Software Ecosystem

The term ecosystem is used here to suggest not just a single program, but a set of programs being worked on in full view of a set of developers. This could be a set of related open source projects, an internally open corporate development environment, or a group of hobbyists with common programming interests.

The main issues are willingness to share and offering participants something to gain. The main technical gains from collaboration include peer assistance and review, but the main reasons to do all this in a collaborative software visualization environment are to inject interest and fun and to reduce the cost of collaboration.

Most prior work focuses on visualizing single (albeit large) systems written in a single programming language. In the general case, an ecosystem entails visualizing a rich heterogeneous set of entities written in multiple (programming) languages.

2.2 Static Extensions to the City Metaphor

Before moving on to the good stuff, it is worth considering a set of extensions to the static backdrop developed by Wettel and Lanza. Although the metaphor is useful as proposed, CodeCity does not automatically depict relationships or semantic proximity between classes; it goes only as far as colocating classes in the same package.

Directories and packages as roads

Directories and packages naturally map onto roads. In fact, several sizes of road are needed, at least the equivalents of streets, arterials, and freeways.

Dimensions of class buildings

Rather than height being simply proportional to the number of methods, each family of related methods becomes one storey of the building. Instead of width and length both being proportional to the number of class variables, the width can be proportional to the number of class variables while the length is proportional to the square root or logarithm of the length of the longest method in the class. Private methods would be depicted as subterranean, below-ground levels.
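The proposed mapping can be written down as a small function. This is a sketch only: the `ClassMetrics` and `Building` types are hypothetical, the grouping of methods into families is assumed to have been done already, private methods are assumed to be grouped into families as well, and the square root is used for length (the logarithm would serve equally well).

```python
import math
from dataclasses import dataclass


@dataclass
class ClassMetrics:
    public_families: int     # groups of related public methods
    private_families: int    # groups of related private methods
    attributes: int          # number of class variables
    longest_method_loc: int  # lines of code in the longest method


@dataclass
class Building:
    storeys: int     # above-ground floors, one per public method family
    basements: int   # below-ground levels, one per private method family
    width: float
    length: float


def building_for(m: ClassMetrics, unit: float = 1.0) -> Building:
    """Map class metrics onto building dimensions (unit = meters per step)."""
    return Building(
        storeys=m.public_families,
        basements=m.private_families,
        # Width grows linearly with the number of class variables.
        width=unit * max(1, m.attributes),
        # Length grows with the square root of the longest method, so a
        # 400-line method yields a building only 20 units long.
        length=unit * math.sqrt(max(1, m.longest_method_loc)),
    )
```

Separating width from length in this way lets a class with few attributes but one enormous method look long and narrow instead of impossibly tall.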

Building exterior appearance

Previous research suggested using color to encode aspects of a building such as its author(s). I suggest a texture that encodes a class's original age, and a shading or blending color that indicates when a change to it was last committed. By this standard, much of the code I work with is 20 to 30 years old; some of it is still maintained and modified, while other programs are truly ancient ruins.

Source: http://www.myspace.com/rockthe3d/blog/536537130

Class details/internals

The inside would ideally be an informative layout that you can walk around in. Starting from a ground-floor directory, one might access methods mainly by elevator. Jafar suggested that the ground floor consist of constructors (most classes have them, although some rely on a default constructor). It might also be possible to classify other methods into functional categories that should be grouped together.

Floor/storey method/function body details

This part is both difficult and important; it is an open research topic. One could punt and simply revert to an IDE "file open" operation whenever one stepped off the elevator into a method. In general, though, one wants more than source code: that space should indicate the number of activation records currently live, the processes or threads actually executing there, bugs, and so on. Source code might be a giant floor, ceiling, or wall texture, which might also include a flow chart or flow graph, summary statistics, and so on. The 3D volume is reserved for dynamic information.

What is the model of time?

Real time is not satisfactory for program debugging purposes. Programmers frequently need to freeze time, or slow it dramatically. But it will be difficult to coordinate timescales across multiple users' tasks on anything but real time. Programmers already know there is a difference between "wall-clock time" and "CPU time". Presumably it is CPU time that needs to freeze, or even go backwards if we manage to acquire the Dagger of Time. If you don't know what the Dagger of Time is, you need to go level up and then come back (in time) to the talk.
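One way to reconcile a shared wall clock with per-program freezing and slow motion is to give each monitored program its own virtual clock that is advanced from the wall clock. The `VirtualClock` class below is a hypothetical sketch of that idea, not an existing API; running backwards is left out, pending acquisition of the Dagger of Time.

```python
class VirtualClock:
    """Per-program execution clock driven by a shared wall clock.

    rate 1.0 is normal speed and 0.1 is slow motion; a paused clock stays
    frozen while the wall clock, and everyone else's programs, keep going.
    """

    def __init__(self, wall_start: float):
        self.now = 0.0               # virtual (execution) time
        self.rate = 1.0
        self.paused = False
        self._last_wall = wall_start

    def sync(self, wall: float) -> float:
        """Advance virtual time by the wall-clock time elapsed since the last sync."""
        elapsed = wall - self._last_wall
        self._last_wall = wall
        if not self.paused:
            self.now += elapsed * self.rate
        return self.now

    def pause(self, wall: float) -> None:
        self.sync(wall)
        self.paused = True

    def resume(self, wall: float) -> None:
        self.sync(wall)              # wall time spent paused is discarded
        self.paused = False

    def step(self, dt: float) -> None:
        """Single-step a frozen program forward by dt units of virtual time."""
        if self.paused:
            self.now += dt
```

Because every client shares the same wall clock, collaborators stay synchronized while each program's execution time is frozen, slowed, or stepped independently.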

What are executions (processes and threads)?

Programs and threads should appear to developers like "weeping angels". This monster is a frozen statue whenever anyone looks at it, but moves (often impossibly fast) when unseen. Of course, unlike in Doctor Who, our "weeping angels" are the good guys. They are frozen and stepped ("blink") so that one can view their state (mainly, the values of local variables).

Representing everything that has an execution call stack as a statue that moves as soon as you stop looking at it does not explain how such entities connect to related heap structures or global variable regions. Nor does it explain how the stack should be depicted, or made accessible by inspecting the statue.
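The freeze-when-watched behavior itself is easy to state in code. In the following hypothetical sketch, an `Execution` tracks its observers; ticks advance it many steps only when nobody is looking, and `blink()` single-steps it while it is frozen. The integer `pc` is a stand-in for real execution state, and the sketch deliberately leaves open the heap and stack questions raised above.

```python
class Execution:
    """A thread rendered as a 'weeping angel' statue (hypothetical sketch)."""

    def __init__(self, name: str, steps_per_tick: int = 1000):
        self.name = name
        self.steps_per_tick = steps_per_tick
        self.observers = set()
        self.pc = 0  # stand-in for the real execution state

    def look(self, who: str) -> None:
        self.observers.add(who)

    def look_away(self, who: str) -> None:
        self.observers.discard(who)

    @property
    def frozen(self) -> bool:
        return bool(self.observers)

    def tick(self) -> None:
        # Unseen angels move impossibly fast; watched ones do not move at all.
        if not self.frozen:
            self.pc += self.steps_per_tick

    def blink(self) -> None:
        # An observer deliberately "blinks" to advance a frozen execution one step.
        if self.frozen:
            self.pc += 1
```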

How to represent procedures

Procedures/functions can be treated as singleton classes containing one public method each. A large library of procedures will then look like a village, with many low-lying buildings. Ah, for the good old days of small-scale software, where OOP would be overkill.

How to represent (class) instances

More broadly, the question is how to represent heap-allocated data, as distinct from the processes/threads depicted as weeping angels. As humanoids? Library instances as robots?

How to represent garbage? As garbage on the ground? As undead? One idea was a garbage collector character that goes around blasting the undead while a viewer watches or helps.
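Whatever the depiction, the visualizer must know which instances are garbage. If the collector does not report this directly, the set can be computed with the mark phase of a mark-and-sweep collector. The sketch below assumes a heap snapshot given as a hypothetical dictionary from object id to the ids it references, plus root ids taken from stacks and globals.

```python
def find_undead(roots, heap):
    """Return the heap objects that are garbage (unreachable from roots).

    heap maps each object id to the ids it references; roots are the ids
    referenced from stacks and global variables.
    """
    reachable = set()
    pending = list(roots)
    # Mark phase: walk references outward from the roots.
    while pending:
        obj = pending.pop()
        if obj in reachable:
            continue  # already marked; this also stops cycles
        reachable.add(obj)
        pending.extend(heap.get(obj, ()))
    # Everything allocated but never marked is garbage: the "undead".
    return set(heap) - reachable
```

Note that an unreachable cycle (two objects referencing each other) is still reported as undead, as a tracing collector would treat it.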

How to represent atoms (numbers, strings, etc)

Not at all? As text? As virtual books (strings), hammers (integers?), and saws (reals?)? What about tables and lists? Records might get special treatment as people; tables and lists might be bookshelves, or buses, or something else.

How to represent external entities

Network connections, I/O handles, files... Jafar suggests that network connections are airports, databases are seaports, and local files are mines. One reservation about this is that operating system handles are associated with particular processes; they are runtime entities, not static structure. Aircraft, ships, and perhaps miners or mine carts are possible solutions.

Why should one need associations in the metaphor?

Associations provide added connectivity, beyond the street system.

What associations are depicted, and how?

We need at least: inheritance, aggregation, and reference.

How to depict inheritance?

Inheritance implies physical resemblance: a subclass might appear as a copy of its superclass's building with extra floors.

How to depict aggregation?

Aggregation is adjacency or containment. Since it is a runtime relationship

How to represent the stack

Gradually dimming lights in buildings?
Portals/teleporters/bridges/moving sidewalks?
Beam of light?

In discussion, there seemed to be support for the beam-of-light model, pointing backwards from callee to caller. The beam of light might be a good metaphor for an instant-teleportation feature...

How to represent bugs and warnings

As monsters. Actually, it is more complicated than that. Each bug report in each program is a spawn point (whose location may move randomly until the location of the bug is known). That spawn point keeps emitting monsters that kill or maim executions until the bug is fixed. Killing an instance of a bug might yield a clue as to the bug's nature, but the spawn point does not go away until you fix the bug.
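The spawn-point mechanic can be sketched as a small state machine. Everything here is hypothetical (the class name, the fixed emission period, and the list of clues); it only illustrates the rules stated above: the spawn point wanders until localized, emits monsters until fixed, and kills yield clues without removing it.

```python
import random


class BugSpawnPoint:
    """A bug report rendered as a monster spawn point (hypothetical sketch)."""

    def __init__(self, bug_id, candidates, clues, period=5, seed=0):
        self.bug_id = bug_id
        self.candidates = list(candidates)  # places the bug might be
        self.clues = list(clues)            # hints released by kills, in order
        self.period = period                # ticks between monsters
        self.rng = random.Random(seed)
        self.location = self.rng.choice(self.candidates)
        self.localized = False
        self.fixed = False
        self._ticks = 0

    def localize(self, location):
        """Once the bug's true location is known, the spawn point stops wandering."""
        self.location = location
        self.localized = True

    def tick(self):
        """Advance one tick; return a new monster (a dict) or None."""
        if self.fixed:
            return None
        self._ticks += 1
        if not self.localized:
            self.location = self.rng.choice(self.candidates)
        if self._ticks % self.period == 0:
            return {"bug": self.bug_id, "at": self.location}
        return None

    def kill(self, monster):
        """Killing a monster may yield a clue, but the spawn point remains."""
        return self.clues.pop(0) if self.clues else None

    def fix(self):
        self.fixed = True
```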

How to layout buildings?

Around an older, urban core? Minimizing the total length of call-graph edges? Besides placing the largest buildings first and placing others around them, and the obvious idea of placing associated or coupled classes together, "Software Cartography" (Peter Loretan, M.S. thesis, University of Bern, 2011) suggests associating classes by their lexical vocabulary. For example, classes that share many method names would be considered similar.
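Both criteria can be expressed as scores that a layout algorithm could optimize. The sketch below uses Jaccard similarity of method-name sets as a simple stand-in for vocabulary-based similarity (the thesis's own technique is more elaborate), and the total Euclidean length of call-graph edges as a layout cost. Function names and input formats are assumptions.

```python
import math


def vocabulary_similarity(methods_a, methods_b):
    """Jaccard similarity of two classes' method-name sets (0.0 to 1.0)."""
    a, b = set(methods_a), set(methods_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def call_graph_cost(positions, calls):
    """Total Euclidean length of call-graph edges for a candidate layout.

    positions maps class name -> (x, y); calls is a list of (caller, callee).
    Lower is better when related classes should sit close together.
    """
    return sum(math.dist(positions[u], positions[v]) for u, v in calls)
```

A layout search could, for example, accept a swap of two buildings only when it lowers the call-graph cost, and seed initial positions so that lexically similar classes start near each other.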

What are ghosts?

Remembrances of fixed bugs and deleted code

How to present data details

Well, instances are much of the data, and atoms are the rest. A prime issue here is one of aggregation: when is an object a citizen of the world, and when is it just somebody's foot? I guess the answer is: when it is referenced globally, or by two or more other instances.
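That rule is simple enough to state as code. The sketch below assumes a heap snapshot given as a hypothetical dictionary from object id to the ids it references, plus a set of globally referenced ids. Objects referenced only from stacks, or not at all, fall into the "parts" category here and would need separate treatment in practice.

```python
from collections import Counter


def classify_objects(heap, global_refs):
    """Split heap objects into world "citizens" and mere "parts".

    heap maps object id -> ids it references; global_refs is the set of ids
    referenced from global variables.
    """
    referrers = Counter()
    for src, targets in heap.items():
        for dst in set(targets):  # count each referring object once
            if dst != src:        # a self-reference does not count
                referrers[dst] += 1
    citizens = {o for o in heap if o in global_refs or referrers[o] >= 2}
    return citizens, set(heap) - citizens
```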

2.3 Dynamic Extensions

A living city has inhabitants, who do things and interact. The world itself is changed by actors and actions in the world. Second Life or Minecraft might be a good mental model for this, with software developers as construction and maintenance workers.

Users: What are users to software developers? Usually they are invisible. In a living city, they are citizens. They are NPCs unless they choose to use their software through the living city client itself. What happens to users? They perform use cases, working more rapidly or more slowly during them. Each user's client periodically completes a use case, or aborts one to work on something else, or suspends, logs out, or crashes.
Developers: Other developers are also usually invisible. In a living city, they are (possibly municipal) employees. They are NPCs unless they are writing their software using the living city client itself.
Processes and threads: The static software code forms the buildings of the city, but the executions of that code are NPCs that move through the buildings at high speed. If you see a thread NPC standing still, it is because it is blocked or thrashing in some way and needs help.
Bugs
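The user lifecycle described above can be sketched as a per-tick state machine. The `UserNPC` class, its interruption probabilities, and the fixed amount of work per use case are all hypothetical placeholders; a real city would derive such numbers from observed sessions.

```python
import random

# Hypothetical per-tick chances that a working user is interrupted.
INTERRUPTIONS = {"abort": 0.05, "suspend": 0.03, "logout": 0.01, "crash": 0.005}
NEXT_STATE = {"complete": "idle", "abort": "idle", "suspend": "suspended",
              "logout": "logged_out", "crash": "crashed"}


class UserNPC:
    """A user NPC who performs use cases at their own speed (hypothetical)."""

    def __init__(self, name, use_cases, speed=1.0, work_per_case=5.0, seed=0):
        self.name = name
        self.use_cases = list(use_cases)
        self.speed = speed                  # >1.0 works faster, <1.0 slower
        self.work_per_case = work_per_case
        self.rng = random.Random(seed)
        self.state = "idle"
        self.current = None
        self.progress = 0.0
        self.log = []                       # (use case, outcome) history

    def _finish(self, outcome):
        self.log.append((self.current, outcome))
        self.state = NEXT_STATE[outcome]
        if outcome != "suspend":            # a suspended case keeps its progress
            self.current, self.progress = None, 0.0

    def step(self):
        if self.state in ("logged_out", "crashed"):
            return self.state               # this client is gone
        if self.state == "idle":            # start a fresh use case
            self.current = self.rng.choice(self.use_cases)
        self.state = "working"              # suspended users resume here
        roll = self.rng.random()
        for outcome, p in INTERRUPTIONS.items():
            if roll < p:
                self._finish(outcome)
                return self.state
            roll -= p
        self.progress += self.speed
        if self.progress >= self.work_per_case:
            self._finish("complete")
        return self.state
```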

3. Case Study: Unicon

This section presents a design for a living city in which to develop code in the Unicon programming language. This city will be called Unicon City, in the hope that the name harks back to Raccoon City and the Umbrella Corporation, but if that means nothing to you, you can still follow the discussion.

Unicon City is a city built for Unicon users. The case study considers only the source code bundled with the main language distribution. Just for fun, the ecosystem studied does explicitly include the 195,000 lines of C code in the language implementation with which that code is distributed, although frankly, most users will not want to wander around that side of town at night. It does not include the millions of lines of proprietary Icon and Unicon code written and used in various organizations that have not made the effort to place their work in the public eye.

The .icn files in the Unicon distribution comprise some 350,000 lines of code written by more than 60 authors. A first goal for a living city would be to select a space-filling algorithm as the primary 2D layout tool. To lay out Unicon City, a primary horizontal axis street named unicon/ might be sized so that a person could reasonably walk across it. Most MMORPGs, even those with "World" in their names, cover their entire world in a few kilometers at most. Reasonable arguments can be made for various, ultimately arbitrary, scaling measures for software cities.

For example, we have a grand total of 570K LOC to lay out. Should the length and width of the primary axis street be proportional to the number of lines of code, or to some other measure? A street 1 cm/LOC long would be 5.7 km. That is pretty small, but large enough that one might want to take public transport. As for the width of this primary arterial, ln(LOC) meters (roughly 13.3 m) might be about right. Using ln(LOC) for all streets, however, makes all streets fat and similar-looking, since the logarithm compresses large differences in size.
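The arithmetic behind these street dimensions is easy to check, and it also shows why logarithmic widths all look alike. The helper below is a hypothetical illustration using the 1 cm/LOC length and ln(LOC)-meter width discussed above.

```python
import math


def street_dimensions(loc, meters_per_loc=0.01):
    """Return (length, width) in meters for a street serving `loc` lines of code.

    Length is 1 cm per line by default; width is ln(loc) meters.
    """
    return loc * meters_per_loc, math.log(loc)


for name, loc in [("whole ecosystem", 570_000), ("small directory", 1_000)]:
    length, width = street_dimensions(loc)
    print(f"{name}: {length:,.0f} m long, {width:.1f} m wide")
# whole ecosystem: 5,700 m long, 13.3 m wide
# small directory: 10 m long, 6.9 m wide
```

A directory 570 times smaller gets a street only about half as wide, which is the sameness noted above.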

Self-critique is in order here. This first attempt at a street layout for Unicon City has pros and cons. A pro is that it was generated by a 128-line Unicon program and is easily modified. A con is that the street network is a tree, because the Unicon directory hierarchy is a tree; real cities are not trees, and do not have a single central point through which all cross-town traffic must pass. The CodeCity treemap algorithm implies more crossing roads, owing to its tendency to lay out disconnected parts of the tree adjacently. The CrocoCosmos street-based layout is closer to what you see here, albeit still quite tree-ish and underconnected.
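To make the tree problem concrete, here is a hypothetical sketch of a directory-driven street layout; it is not the 128-line Unicon program, whose details are not given here. Each directory becomes a street whose length is proportional to the LOC beneath it, its own files line the first stretch, and each subdirectory branches off perpendicularly after them. No collision avoidance is attempted.

```python
def total_loc(tree):
    """Lines of code in a directory and everything beneath it."""
    _, own_loc, children = tree
    return own_loc + sum(total_loc(c) for c in children)


def layout_streets(tree, x=0.0, y=0.0, horizontal=True, meters_per_loc=0.01):
    """Lay out a (name, own_loc, [subtrees]) directory tree as street segments.

    Returns a list of (name, (x0, y0), (x1, y1)) tuples, one per directory.
    """
    name, own_loc, children = tree
    length = total_loc(tree) * meters_per_loc
    dx, dy = (length, 0.0) if horizontal else (0.0, length)
    segments = [(name, (x, y), (x + dx, y + dy))]
    # The directory's own files line the first stretch of its street;
    # subdirectories branch off perpendicularly after them, in order.
    offset = own_loc * meters_per_loc
    for child in children:
        bx, by = (x + offset, y) if horizontal else (x, y + offset)
        segments += layout_streets(child, bx, by, not horizontal, meters_per_loc)
        offset += total_loc(child) * meters_per_loc
    return segments
```

Because every street except the root attaches to exactly one parent street, the result is necessarily a tree; a more city-like layout would have to add cross streets that this construction never produces.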

A hand-drawn city layout would very probably be better than an algorithmically-generated one. One research goal would be to come up with a more natural, more city-like automatic layout algorithm.

4. Implementation

What tools have to exist in order to implement a living city?

collaborative virtual environment: CVE can be adapted for this purpose.
initial "world" generation from software codebase: This is a procedural-generation job. Generating world models of basic street layouts and buildings from classes and procedures would not be hard.
convenient incremental algorithms for code updates: Code commits to a version control repository arrive in bursts. Updating these slowly changing (almost static) data models online would be challenging, and we have only dabbled with dynamic changes to world structure so far.
rich source of high-performance dynamic data: we have exceptional monitoring facilities, starting with a virtual machine instrumented to report roughly 120 types of events, ranging from control structures and control flow, to stack behavior, calls, and returns, to data structure allocation and access and garbage collection.
heaps of NPC AI: CVE has a rudimentary NPC interface but no real AI. We would benefit from a collaborator here.
heaps of gameplay mechanics: CVE has previously seen kickboxing and first-person-shooter ideas incorporated, but "gameplay" is not its original domain.
enthusiastic user base: Originally I planned to start with the Unicon language community. More recently, I came to think it would be best to start with CS1 programming in C++ and Java.
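For the incremental-update item above, the simplest strategy is to diff per-class metrics before and after a commit and rebuild only what changed. The sketch below is hypothetical; it assumes each snapshot maps class names to comparable metric values.

```python
def diff_city(old, new):
    """Compare {class_name: metrics} snapshots from before and after a commit.

    Returns the world edits needed: buildings to demolish, construct, or
    renovate. Unchanged classes produce no edits, so a commit touching a
    few classes rebuilds only those few buildings.
    """
    demolish = sorted(old.keys() - new.keys())
    construct = sorted(new.keys() - old.keys())
    renovate = sorted(k for k in old.keys() & new.keys() if old[k] != new[k])
    return {"demolish": demolish, "construct": construct, "renovate": renovate}
```

Diffing only the first and last snapshots of a burst coalesces the burst into one set of edits. Demolished buildings could linger as ghosts of deleted code.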

5. Conclusions and Future Work

Vivacity, the Living City, is an ambitious attempt to gamify human understanding of computer program behavior by placing humans within the machine. It would not be possible without a high-performance program execution monitoring engine and an easy-to-use, general-purpose 3D programming facility. By happy coincidence, this summarizes my professional training and life's work.

Acknowledgement

Jafar Al-Gharaibeh provided thoughtful suggestions regarding an earlier draft of this work.

References

[1] Ronald Baecker, "Sorting out Sorting", 1981, video, 30 minutes.

[2] Marc Brown and Robert Sedgewick, "A System for Algorithm Animation", Computer Graphics 18(3), 1984, pp. 177-186.

[3] Richard Wettel and Michele Lanza, "Visualizing Software Systems as Cities", In Proceedings of VISSOFT 2007 (4th IEEE International Workshop on Visualizing Software For Understanding and Analysis), pp. 92-99, IEEE Computer Society Press, 2007.

[4] Ben Shneiderman, "Tree visualization with treemaps: a 2-d space-filling approach", ACM Transactions on Graphics 11(1), Jan. 1992, pp. 92-99.

[5] Adrian Kuhn, Peter Loretan, and Oscar Nierstrasz, "Consistent Layout for Thematic Software Maps", Proceedings of the 15th Working Conference on Reverse Engineering, WCRE '08, Oct. 2008, pp. 209-218.

[6] Frank Steinbruckner and Claus Lewerentz, "Representing Development History in Software Cities", in Proceedings of the 5th International Symposium on Software Visualization, SOFTVIS 2010, ACM, New York, pp. 193-202. http://csbob.swan.ac.uk/visWeek10/softvis/docs/p193.pdf

[7] Pierre Caserta, Olivier Zendra and Damien Bodenes, "3D Hierarchical Edge Bundles to Visualize Relations in a Software City Metaphor" in 6th IEEE International Workshop on Visualizing Software for Understanding and Analysis (VISSOFT 2011).

[8] Claire Knight and Malcolm Munro, "Comprehension with[in] Virtual Environment Visualizations", in Proceedings of the Seventh International Workshop on Program Comprehension, IWPC '99, Pittsburgh, PA, 5-7 May 1999, pp. 4-11.

[9] Craig Anslow, Stuart Marshall, and James Noble, "X3D-Earth in the Software Visualization Pipeline", X3D Earth Requirements '06 Workshop, November 14-15, 2006, Naval Postgraduate School, Monterey, California, USA.

[10] Thomas Panas, R. Berrigan, and John Grundy, "A 3D metaphor for software production visualization", Proceedings. Seventh International Conference on Information Visualization, IV 2003.