Visualizing Software Systems in the Vivacity Virtual Environment
Clinton Jeffery
University of Idaho
jeffery@uidaho.edu
unfinished draft 0.2
May 2, 2018
Abstract
Software developers have been slow in adopting software visualization, whose
promise has been so tantalizing and obvious since its inception in the
1980's. Although many researchers believed that visualization would someday
revolutionize difficult tasks in debugging and software maintenance, many
software visualizations have been so abstract as to be useless, or depicted
information that was easy to obtain but not directly useful.
More recently, several research groups have visualized the evolution of
large software systems using progressively more sophisticated "city"
metaphors, mapping information about software components onto familiar
architectural features such as buildings and roads. Up to now, this
metaphor has been applied to static or slow-changing information, such
as examining months or years of changes in a software repository.
This paper presents a survey of existing work on visualizing software as
cities, and then introduces a "living city" metaphor, in which a set of
programs written by a set of authors is visualized as a city populated by
dynamic entities such as users, data structures, threads of execution,
and bugs. An implementation of the "living city" is proposed.
The paper includes a discussion of what will be needed, both in
terms of open research problems and existing and needed software tools.
1. Introduction
Visualization, the graphical depiction of information, means many
different things to people working in different areas. This paper is a
study within the subfield of software visualization, the graphical depiction
of information about computer programs. Programmers can learn much from
static views of program structure such as UML Class Diagrams, but to really
switch on the light bulb, one needs to animate the invisible, dynamic
behavior of program executions.
In the 1980's, the promise of software visualization was made apparent by
films such as Sorting out Sorting [1], and subsequent software systems
such as Balsa [2]. It was clear then that appropriate graphical depiction of
software would enhance understanding and be useful for tasks including
construction and debugging. However, software visualization is still an
exotic activity. Understanding a program remains primarily a matter of
scrutinizing static source code, or wading through volumes of debug, profile,
or log data. Yet our vocabulary when discussing code constantly draws on
visual metaphors.
1.1 Visualizing Software As Cities
Wettel and Lanza [3] introduced a potent metaphor: a software system can be
visualized as a city. They implemented a tool called CodeCity in Smalltalk
that explores this metaphor. In CodeCity, classes and interfaces are
depicted as buildings. A building's height indicates the number of methods
in the class. Width and length are both proportional to the number of
attributes; this implies that all buildings are square when viewed from the
top. In addition to basic size, Wettel and Lanza consider position, hue,
saturation, and transparency as useful for plotting additional information
about the software.
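The basic mapping can be sketched in a few lines. This is only an illustration of the metaphor as described above, not CodeCity's actual code; the names ClassMetrics, Building, and to_building, the unit scale, and the minimum footprint of one unit are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class ClassMetrics:
    name: str
    num_methods: int
    num_attributes: int


@dataclass
class Building:
    name: str
    width: float
    length: float
    height: float


def to_building(m: ClassMetrics, unit: float = 1.0) -> Building:
    # Width and length share one value, so the footprint is always square.
    # Assumption: a class with no attributes still gets a 1-unit footprint
    # so that it remains visible.
    side = unit * max(m.num_attributes, 1)
    # Height grows linearly with the number of methods.
    return Building(m.name, side, side, unit * m.num_methods)


b = to_building(ClassMetrics("Parser", num_methods=40, num_attributes=6))
print(b)
```

A class with 40 methods and 6 attributes becomes a 6 by 6 tower 40 units tall, which already hints at how quickly method-heavy classes turn into needles.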
With this simple metaphor, Wettel and Lanza depicted very large software
systems, such as an 8,000-class Smalltalk program. The left
image shows class dimensions mapped 1:1 onto building dimensions; the
right image shows class dimensions scaled to reduce the effect of outliers.
In CodeCity, the altitude or base land topography depicts package structure.
Software systems written in languages such as Java emphasize the
use of packages for code organization; in many languages that do not,
directory organization might play a similar role in large programs.
Wettel and Lanza's overall city layout, other than grouping classes in the
same package together, is fairly arbitrary. A modified treemap algorithm [4]
is used to place buildings largest-first, splitting rectangles into smaller
pieces into which smaller rectangles can be placed. Their modified treemap
algorithm might be improved upon by a static layout that uses software
coupling to position related classes near each other.
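To make the largest-first idea concrete, here is a minimal sketch of a plain slice-and-dice treemap. It is not Wettel and Lanza's modified algorithm, whose details are not reproduced here; the function name, the (name, weight) input format, and the rule of cutting along the longer side are assumptions chosen for illustration.

```python
def slice_and_dice(items, x, y, w, h):
    """Lay out (name, weight) pairs inside the rectangle (x, y, w, h).

    Items are placed largest-first. Each one takes a strip of the free
    rectangle whose area is proportional to its weight; the remainder
    is left for the smaller items. Returns {name: (x, y, w, h)}.
    """
    placed = {}
    remaining = sorted(items, key=lambda it: it[1], reverse=True)
    total = sum(weight for _, weight in remaining)
    for name, weight in remaining:
        frac = weight / total
        if w >= h:
            # Free rectangle is wide: cut a vertical strip off its left side.
            strip = w * frac
            placed[name] = (x, y, strip, h)
            x, w = x + strip, w - strip
        else:
            # Free rectangle is tall: cut a horizontal strip off its top.
            strip = h * frac
            placed[name] = (x, y, w, strip)
            y, h = y + strip, h - strip
        total -= weight
    return placed


layout = slice_and_dice([("C", 1), ("A", 4), ("D", 1), ("B", 2)], 0, 0, 4, 2)
for name, rect in layout.items():
    print(name, rect)
```

Each footprint area is proportional to its weight, and the pieces exactly tile the parent rectangle. Nothing in this scheme looks at coupling between classes, which is why neighbors in the resulting city are related only by package membership and size.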
Wettel and Lanza used their software city metaphor to study
how software systems evolve over weeks, months, or years in a software
revision control repository. This use focuses on relatively static
information, and any change would only be perceived by the equivalent
of slow-motion photography. Their metaphor provides the static backdrop
for the visualization of dynamic program behavior in this research.
1.2 Related Work
Wettel and Lanza weren't the first with the notion of using
a city metaphor to describe software systems. Not counting
small-scale related metaphors such as "software architecture",
a city as a depiction of a software system was suggested in
science fiction in the 1980's and possibly earlier. This section
mentions a few of the more important relevant visualization projects.
Knight and Munro [8]
developed a prototype called Software World for visualizing Java code in
which a program is treated as a virtual world, a directory or package
is a country, each single file is a city, each class is a district,
and each method corresponds to a building. Like Wettel and Lanza, they do
not go so far as to propose how to map individual statements and expressions
in the code into an interior of their building, but they do assign building
exterior characteristics based on size, number of parameters, and so on.
In general, the finer granularity of this mapping of code onto virtual-world
entities has its pros and cons. Having a 3D volume (building interior)
might allow a better mapping for large methods than a metaphor where an
entire class is a building and each method occupies one storey, implying
an essentially 2D layout for control structures.
In addition, Wettel and Lanza's work inspired several other researchers to
go beyond the study of software repository evolution.
Kuhn, Loretan, and Nierstrasz [5] developed a more topological form
of software maps, with some interesting mathematics applied to develop
what look like island chains depicting relationships and attributes
of software components. Honestly, I think these maps are very pretty,
and the math in the paper for constructing them from real relationships
between components is interesting and deep. These maps do not
inspire the scrutiny of multi-dimensional detail that CodeCity does.
However, they and the mathematics underneath them might be useful in
positioning related programs that are not explicitly connected.
Steinbruckner and Lewerentz [6] pursued the
city metaphor further in a tool called CrocoCosmos in which they
developed a much more sophisticated and information-based layout.
They adopt practices from cartography in terms of establishing a
primary model (a data structure abstracting the software system
in its own terms) and then using secondary and tertiary models to
graphically depict detailed information. Subsystems/packages are
depicted as streets, with contained classes as buildings on those
streets. Age is seemingly depicted by both centrality (older modules
are near the center, as with real cities) and elevation.
Caserta, Zendra, and Bodenes [7] develop hierarchical attraction points
in order to superimpose a depiction of relationships between classes as
lines drawn above buildings in a city metaphor. While many or most
(especially object-oriented) programs are understood largely through
the static relationships between their classes and runtime relationships
between instances, I don't think a big wad of red and green yarn is going
to contribute greatly to clarity. It might, if it were transparent enough,
provide subtle additional hints, or a Futurama-style teleportation tube
system.
1.3 On the Shortcomings and Limitations of Metaphors
It is unrealistic to expect that a metaphor will convey meaning and
understanding of large software systems. Cities
inhabited by humans have structures designed to support mammalian bipeds.
Computer programs have structures designed to support von Neumann
computation -- control flow and data encapsulation. There is no reason
for a city metaphor to "make sense", or provide intuitions about how
software is (or should be) structured.
For example, the CodeCity authors
noted that many classes were impossibly tall (over 1000 methods)
relative to their length and width (only a few attributes). As a building,
such a class looks inhuman. If a human is looking for problems in the code, they
will be tempted to look for "wrong-looking" structures, but there is
potentially nothing wrong with these classes just because they look
impossibly tall and skinny. Worse, the CodeCity authors elected
to adopt a non-linear scale of the buildings' height in order to make
such classes look more like plausible buildings that might appear in a
human city. Actually,
visualization is full of adjustments to scaling. In an exponential
computing world, a logarithmic scale is a visualization designer's friend.
However, the reason to adopt a non-linear scale had better not be to make
the data fit the metaphor.
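As a small illustration of what a logarithmic scale does to building heights, the sketch below compares a linear mapping with a log mapping. The function names and the constants (1 unit per method, 3 units per doubling) are arbitrary assumptions for the example and are not taken from CodeCity.

```python
import math


def linear_height(num_methods, units_per_method=1.0):
    # Height grows in direct proportion to the method count.
    return units_per_method * num_methods


def log_height(num_methods, units_per_doubling=3.0):
    # log2(n + 1) keeps a class with zero methods at height 0 and adds
    # a fixed amount of height each time the method count roughly doubles.
    return units_per_doubling * math.log2(num_methods + 1)


for n in (10, 100, 1000):
    print(n, linear_height(n), round(log_height(n), 1))
```

On the linear scale a 1000-method class is 100 times taller than a 10-method class; on this log scale it is only about three times taller. That compression is justified when it makes differences among ordinary classes readable, not when its purpose is to make outliers resemble office towers.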
2. The Living City Metaphor
Although the software city metaphor is interesting when used in a post-facto
study of the evolution of large software repositories, this paper contends
that the "city" metaphor is more useful when it becomes a means of sharing,
interacting with, and debugging a group's collective software development
efforts.
The vision advocated in this paper is to use the software city metaphor to
implement a collaborative virtual environment for software development. In
order to achieve this vision, many extensions to the relatively static software
city metaphor are needed, but the primary extensions will be the introduction
of dynamic entities with which to interact, and things to do in the city.
2.1 Visualizing a Software Ecosystem
The term ecosystem is used here to suggest not just a single program, but a set
of programs being worked on in full view of a set of developers. This could
be a set of related open source projects, or an internally-open corporate
development environment, or a group of hobbyists with common programming
interests.
The main issues are willingness to share and offering something to gain by
participation. The main technical gains from collaboration include peer
assistance and review, but the main reason to do all this in a collaborative
software visualization environment is to inject interest and fun, and to reduce
the cost of collaboration.
Most prior work focuses on visualizing single (albeit large) systems
written in a single programming language.
In the general case, an ecosystem entails visualizing a rich
heterogeneous set of entities written in multiple (programming)
languages.
2.2 Static Extensions to the City Metaphor
Before moving on to the good stuff, it is worth considering a set of
extensions to the static backdrop developed by Wettel and Lanza.
Although the metaphor is useful as proposed, CodeCity doesn't
automatically depict relationships between classes or semantic
proximities at all. It goes only as far as colocating classes in the
same package.
- directories and packages are roads, sure
- And in fact we need several different sizes, at least equivalent to
street, arterial, and freeway.
- Classes' buildings' dimensions
- Height is not just proportional to # of methods, each family of
methods is a story of the building.
Instead of width and length both being
proportional to the number of class variables, the width can be
proportional to the number of class variables while the length of
the building is proportional to (the sqrt or log of) the length
of the longest method in the class.
Private methods will be depicted as subterranean/below ground levels.
- Building exterior appearance
- Previous research suggested the use of color to code
aspects of the building such as author(s). I suggest a texture that
encodes a class' origin age and a shading or blending color to suggest
the last time a change to it was committed. By this standard, a lot of
the code I work with is 20-30 years old, but some of it is maintained
and modified and other programs are really ancient ruins.
- Class details/internals
- The inside would ideally be an informative layout that
you can walk around in. Starting from a ground floor directory,
my thought would be that one might mainly access methods via elevator.
Jafar suggested the ground floor consist of constructors (most
classes have them, although there exist default constructors).
It might be possible to classify other methods into functional
categories that should be grouped together.
- Floor/storey method/function body details
- This part is really difficult, and important. It is a research topic.
One could punt, and just revert to an IDE "file open" operation whenever
one stepped off the elevator into a method. In general, though, one may
want more than just source code, one wants that space to indicate the
number of activation records live at the moment, processes/threads actually
executing there, bugs, etc. Source code might be a giant floor, ceiling
or wall texture, which might also include flow chart or flow graph, summary
statistics etc. The 3D volume is for dynamic information.
- What is the model of time?
- Real-time is not satisfactory for program debugging purposes.
Programmers frequently need to freeze time, or slow it dramatically.
But, it will be difficult to coordinate timescales across multiple users'
tasks on anything but real-time. Programmers already know there is a
difference between "wall-clock time" and "CPU time". I guess it is
CPU time that may need to freeze, or go backwards if we manage to
acquire the Dagger of Time. If you don't know what the Dagger of Time
is, you need to go level up and then come back (in time) to the talk.
- What are executions (processes and threads)?
- Programs and threads should
appear to developers like "weeping angels".
This monster is a frozen statue whenever anyone
looks at it, but moves (often impossibly fast) when unseen.
Of course, unlike in Dr. Who, our "weeping angels" are the good guys.
They are frozen and stepped ("blink") so that one can view
their state (mainly, values of local variables).
If everything that has an execution call stack is represented as a
statue that moves as soon as you stop looking at it, this does not
explain the connection of such entities to related heap structures
or global variable regions. It does not explain how the stack
should be depicted/accessible by inspecting the statue.
- How to represent procedures
- Each procedure/function can be treated as a singleton class containing
one public method. A large library of procedures
will look like a village; it will have many low-lying buildings.
Ah for the good old days of small-scale software where OOP would
be overkill.
- How to represent (class) instances
- More broadly, this is how to represent heap-allocated data, as distinct
from the processes/threads depicted as weeping angels.
As humanoids? Library instances as robots?
- How to represent garbage?
Garbage on the ground? As undead?
There was an idea of a Garbage Collector going around blasting
the undead while a viewer watches or helps...
- How to represent atoms (numbers, strings, etc)
- Not at all? As text? As virtual books (strings), hammers?? (ints) and
saws?? (reals)? What about tables and lists? Records got special
treatment as people; tables and lists as bookshelves, or buses, or?
- How to represent external entities
- network connections, I/O handles, files... Jafar says that network
connections are airports, databases are sea ports, local files are mines.
One reservation about this is that operating system handles are associated
with particular processes; they are runtime entities, not static structure.
Aircraft, ships, and perhaps miners or mine carts are possible solutions.
- Why should one need associations in the metaphor?
- Associations provide added connectivity, beyond the street system.
- What associations are depicted, and how?
- We need at least: inheritance, aggregation, and reference.
- How to depict inheritance?
- Inheritance implies physical resemblance. Copied buildings with extra
floors.
- How to depict aggregation?
- aggregation is adjacency or containment. Since it is a runtime
relationship
- How to represent the stack
- Gradually dimming lights in buildings?
- Portals/teleporters/bridges/moving sidewalks?
- Beam of light?
In discussion, there seemed to be support for the beam-of-light
model, pointing backwards from callee to caller.
The beam of light might be a good metaphor for an instant-teleportation
feature...
- How to represent bugs and warnings
- As monsters. Actually, it is more complicated than that. Each bug
report in each program is a spawn-point (whose location may move randomly
until the location of the bug is known). That spawn-point will keep emitting
monsters that kill or maim executions until fixed.
Killing an instance of a bug
might yield you a clue as to the bug's nature, but it won't be until you
fix the bug that the spawn-point goes away.
- How to lay out buildings?
- Around an older, urban core? Minimize distance of overall call graph?
Besides placing the largest buildings first, and placing others around them,
and the obvious thought of placing associated or coupled classes together,
"Software Cartography", by Peter Loretan, M.S. Thesis, University of Bern, 2011
suggests associating classes by their lexical vocabulary. For example, classes
with a great many of the same method names are similar.
- What are ghosts?
- Remembrances of fixed bugs and deleted code
- How to present data details.
- Well, instances are a lot of the data, and atoms are the rest.
A prime issue here is one of aggregation. When is an object
a citizen of the world, and when is it just somebody's foot?
I guess the answer is: when referenced globally, or by two or
more other instances.
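The tentative rule in the last item (an instance is a citizen when it is referenced globally, or by two or more other instances) is easy to state as code. The sketch below assumes a snapshot of the heap is available as a simple reference map; the function name, the input format, and the example instance names are all hypothetical.

```python
def citizens(references, global_roots):
    """Return the set of instances that count as citizens.

    references:   {referrer id: set of ids it references}
    global_roots: ids referenced directly from global variables
    """
    inbound = {}
    for referrer, targets in references.items():
        for target in targets:
            if target != referrer:  # a self-reference is not "another instance"
                inbound.setdefault(target, set()).add(referrer)
    result = set(global_roots)
    # Anything shared by two or more distinct referrers is also a citizen.
    result |= {obj for obj, refs in inbound.items() if len(refs) >= 2}
    return result


heap = {
    "order": {"customer", "lines"},
    "invoice": {"customer"},
    "lines": {"item1"},
}
print(sorted(citizens(heap, global_roots={"order"})))
```

Here "order" is a citizen because a global refers to it, and "customer" is one because two instances share it. "lines" and "item1" each have a single owner, so they would be drawn as parts of that owner (somebody's foot) rather than as free-standing inhabitants.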
2.3 Dynamic Extensions
A living city has inhabitants, who do things, and interact.
The world itself is changed by actors and actions in the world.
Second Life or Minecraft might be a good mental model for this,
where software developers are construction and maintenance workers.
- Users
- What are users to software developers? Usually they are invisible.
In a living city, they are citizens. They are NPC's unless they wish
to use their software by means of the living city client itself. What
happens to users? They perform use cases. They work more rapidly or
more slowly during such use cases. Their client either completes a
use case every so often, or aborts a use case to work on something else,
or suspends, or logs out, or crashes.
- Developers
- Other developers are also usually invisible. In a living city, they
are (possibly municipal) employees. They are NPC's unless they are
writing their software using the living city client itself.
- Processes and threads
- The static software code forms the buildings of the city, but the
executions of that software code are NPC's that move around through the
building, at high speed. If you see a thread NPC standing still, it is
because it is blocked or thrashing in some way and needs help.
- Bugs
-
3. Case Study: Unicon
This section will present a design for a living city in which to
develop code for the Unicon programming language. This city will be called
Unicon City, in the hopes that the name harkens back to Raccoon City and
the Umbrella Corporation, but if that means nothing to you, you can still
follow the discussion.
Unicon City is a city built for Unicon users. The case study considers
only the source code that happens to be bundled with the main language
distribution. Just for fun, the ecosystem of Unicon source code studied
explicitly includes the 195,000 lines of C code in the language
implementation with which that code is distributed, although frankly, most
users will not want to wander around that side of town at night. It doesn't
consider the millions of lines of proprietary Icon and Unicon code written
and used in organizations that haven't gone to the effort of placing their
work in the public eye.
The .icn files in the Unicon distribution comprise some 350,000 lines of code
written by 60+ authors. A first goal for a living city would be to select
some space-filling algorithm for a primary 2D layout tool. To lay out Unicon
city, a primary horizontal axis street named unicon/ might be sized reasonably
such that a person could walk across it. Most MMORPGs, even if "World" is in
their name, cover the entire world in a few kilometers at most. There are
reasonable arguments as to
what arbitrary scaling measures one should adopt for software cities.
For example, we have a grand total of 570K LOC to lay out. Should the
length and width of the primary axis street be proportional to the number
of lines of code or to some other measure? A street 1cm/LOC long would be
5.7km. Pretty small, but large enough that one might want to take public
transport. As for the width of this primary arterial, ln(LOC) meters (13.2m)
might be about right. Using ln(LOC) for all streets, however, makes every
street fat and makes them all look about the same.
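The arithmetic behind these numbers can be checked with a short script. Python is used here only for illustration (the layout tool itself is a Unicon program); the function names and the 1,000-line subdirectory are hypothetical.

```python
import math


def street_length_km(loc, cm_per_loc=1.0):
    """Street length in kilometres at a linear scale of cm_per_loc."""
    return loc * cm_per_loc / 100 / 1000


def street_width_m(loc):
    """Street width in metres, taken as the natural log of the line count."""
    return math.log(loc)


for name, loc in [("unicon/", 570_000), ("hypothetical subdirectory", 1_000)]:
    print(name,
          round(street_length_km(loc), 2), "km long,",
          round(street_width_m(loc), 2), "m wide")
```

The main street comes out 5.7 km long and a little over 13 m wide. A directory with 570 times less code still gets a street about 6.9 m wide, roughly half the width of the main arterial, which is the sense in which ln(LOC) makes all streets look alike.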
Self-critique is in order here. This first attempt at a street layout for
Unicon City has some pros and cons. A pro is that it was generated by a
Unicon program (128 lines) and is easily modified. A con is that it is a tree,
because the Unicon directory hierarchy is a tree; real cities are not trees
and do not have a single central point that all cross-town traffic would have
to go through. The "CodeCity" treemap algorithm implies more crossing
roads due to its tendency to lay out disconnected parts of the tree adjacently.
The "CrocoCosmos" street-based layout is closer to what you see here,
albeit still quite tree-ish and underconnected.
A hand-drawn city layout would very probably be better than an
algorithmically-generated one. One research goal would be to
come up with a more natural, more city-like automatic layout
algorithm.
4. Implementation
What tools have to exist in order to implement a living city?
- collaborative virtual environment
- We can adapt from CVE
- initial "world" generation from software codebase
- This is a procedural-generation job.
Generating world models of basic street layouts and buildings
from classes and procedures would not be hard.
- convenient incremental algorithms for code updates
- Code commits to a version control repository trickle in in bursts.
Updating these slowly-changing (almost static) data models online
would be challenging, and we have only dabbled with dynamic changes
to world structure so far.
- rich source of high performance dynamic data
- We have exceptional monitoring facilities, starting with a virtual
machine instrumented to report ~120 types of events ranging from
control structures and control flow to stack behavior and calls and
returns to data structure allocation and access and garbage collection.
- heaps of NPC AI
- CVE has a rudimentary NPC interface but no real AI.
We would benefit from a collaborator here.
- heaps of gameplay mechanics
- CVE has previously seen kickboxing and first-person-shooter
ideas incorporated, but "gameplay" is not its original domain.
- enthusiastic user base
- Originally I planned to start with the
Unicon language community.
More recently, I came to think it would be best
to start with CS1 programming in C++ and Java.
5. Conclusions and Future Work
Vivacity, the Living City, is an ambitious attempt to gamify human understanding of
computer program behavior by placing humans within the machine. It would
not be possible without a high performance program execution monitoring
engine, and an easy to use general purpose 3D programming
facility. By happy coincidence, this summarizes my professional training
and life's work.
Acknowledgement
Jafar Al-Gharaibeh provided thoughtful suggestions regarding
an earlier draft of this work.
References
[1] Ronald Baecker, "Sorting out Sorting", 1981, video, 30 minutes.
[2] Marc Brown and Robert Sedgewick, "A System for Algorithm Animation",
Computer Graphics 18(3), 177-186.
[3] Richard Wettel and Michele Lanza, "Visualizing Software Systems as
Cities", In Proceedings of VISSOFT 2007 (4th IEEE International Workshop on
Visualizing Software For Understanding and Analysis), pp. 92 - 99, IEEE
Computer Society Press, 2007.
[4] Shneiderman, B. "Tree visualization with treemaps: a 2-d space-filling
approach", ACM Transactions on Graphics, vol. 11, 1 (Jan. 1992) 92-99.
[5] Adrian Kuhn, Peter Loretan, Oscar Nierstrasz.
"Consistent Layout for Thematic Software Maps",
Proceedings of the 15th Working Conference on
Reverse Engineering, WCRE '08. Oct. 2008. pp. 209 - 218.
[6] Frank Steinbruckner and Claus Lewerentz.
"Representing Development History in Software Cities",
in Proceedings of the 5th International Symposium on Software Visualization,
SOFTVIS 2010, ACM, New York, pp. 193-202.
http://csbob.swan.ac.uk/visWeek10/softvis/docs/p193.pdf
[7] Pierre Caserta, Olivier Zendra and Damien Bodenes,
"3D Hierarchical Edge Bundles to Visualize Relations in a
Software City Metaphor" in 6th IEEE International Workshop on
Visualizing Software for Understanding and Analysis (VISSOFT 2011).
[8] Claire Knight and Malcolm Munro,
"Comprehension with[in] Virtual Environment Visualizations",
in Proceedings of the Seventh International Workshop on Program
Comprehension, IWPC '99, Pittsburgh, PA, 5-7 May 1999, pp. 4-11.
[9] Craig Anslow, Stuart Marshall, and James Noble,
"X3D-Earth in the Software Visualization Pipeline",
X3D Earth Requirements '06 Workshop,
November 14-15, 2006,
Naval Postgraduate School, Monterey, California, USA.
[10] Thomas Panas, R. Berrigan, and John Grundy,
"A 3D metaphor for software production visualization",
Proceedings. Seventh International Conference on
Information Visualization, IV 2003.