Lecture Notes for CS 404/504 Program Monitoring and Visualization
What this Course is About
This course is a blend of dynamic analysis,
the study of program execution behavior, with visualization,
the graphical depiction of large amounts of information.
We could spend half the semester on monitoring and then half on
visualization, but I am going to try to blend the topics into each
class period.
Early History of Monitoring and Visualization according to Jeffery
Others may have more and better information, but this is my version of
that subset of computing history relevant to this course.
In the beginning, there were programs. And programs begat bugs. In the
punchcard era, the highlight of one's afternoon often was getting back one's
output from one's daily program run, a short stack of punched cards to the
effect that the program was not executed at all, due to an error in the
source code. But eventually programs started to compile or assemble. When
a program ran and did not produce expected output, one was supposed to go
back to the source code and study it to find out why. This still works,
some of the time.
When the computing industry reached a stage of having interactive, text
screen terminals, all kinds of new bugs became commonplace. Along with
mankind's increased ability to generate bugs, a whole slew of tools and
techniques were developed to understand program executions, including
tracing and source-level debuggers. These tools still work; they just
don't scale well. Sadly, if you look at a modern IDE, its debugging and
tracing capabilities are not much improved from what was available 30
years ago. This is (I claim) because problems in monitoring and debugging
are hard, and the cost of building new tools which might advance the
state of the art is very high.
By the 1980s interactive 2D graphics was ubiquitous and improving rapidly
in performance. People started to use graphics to help understand program
execution behavior partly because text-only techniques did not scale well,
and partly just because the graphics was available. A movie called
"Sorting out Sorting" (parts 1, 2, 3),
originally presented at SIGGRAPH, made a compelling argument that
graphical techniques could be valuable in teaching and understanding
algorithms.
"Sorting out Sorting" was done one frame at a time on truly ancient facilities.
A group at Brown University (home of graphics guru Andy van Dam, algorithms
guru Robert Sedgewick and a cast of thousands) set out to replicate on
interactive workstations what Ron Baecker had done a frame at a time in
that movie. One
result of this effort was Marc Brown's Ph.D. and related software. We will
present more history in a later session.
What About Us?
The reason for this course has to do with my Ph.D. (insert story of Dr. J's
Ph.D. here). The central premise of my Ph.D. is that if we build the
infrastructure needed to reduce program monitoring and visualization to
"no harder than writing ordinary applications" and then use a rapid prototyping
language suitable for research experimentation, we should be able to propel
the state of the art forward. My Ph.D. produced an execution monitoring
framework and a 2D graphics API well-suited to these goals. Since then the
monitoring framework has been improved and 3D graphics has become ubiquitous.
This semester we will find out what we can do with this framework.
About Unicon
Unicon comes from unicon.org, programs are in .icn files and are compiled
into VM bytecode, unless you use the optimizing compiler, blah blah blah.
Monitoring Framework Intro
A monitor observes events in a target program, blah blah blah.
There are two-process, one-process (callback), and thread models.
Graphic Design of the Day: a map
Napoleon's March into Russia: proof you can
legibly plot extra dimensions atop a map.
Maps have legends to explain what's on them,
along with two primary dimensions which are
intuitively based on actual geometry.
Reading Assignment
Yes, read the purple book, let's say chapters 1,3-6 (short chapters),
OK to skim chapter 2.
Buzzwords of the Day
Volume, dimensionality, intrusion, and access. Solve these four
unsolvable problems and you've got the makings of a decent
monitoring and visualization framework.
Unicon Feature of the Day: Co-expressions
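Co-expressions matter for this course because, as a later lecture notes, event reports are delivered through co-expression switches. The following is only a minimal sketch of the language feature itself, not of the monitoring framework; the helper name take() is made up for illustration.

```unicon
# take(C, n): collect up to n results from co-expression C into a list
procedure take(C, n)
   local L
   L := []
   every 1 to n do
      put(L, @C) | break          # @C resumes C for exactly one result
   return L
end

procedure main()
   local evens
   evens := create seq(2, 2)      # captured, not yet evaluated
   every write(!take(evens, 3))   # 2, 4, 6
   write("next: ", @evens)        # picks up where it left off: 8
   write("results so far: ", *evens)   # 4
end
```

The point to notice is that the co-expression keeps its own state between activations, so the consumer decides when the next result is computed.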
Graphic Design Principles
We need Tufte's principles in preparation for visualization work.
- show the information
- show as much as you can with as little ink as possible
- remove ink that isn't showing useful information
- remove redundant information
- revise and edit
Graphic Design of the Day: a scatter plot
Let's see what the good doctor saw in old London...
lecture 3
Graphic Design of the Day: Line Plots
Multiple dimensions of weather along a primary time axis.
Look at HW#1 submissions
Writing your first Unicon monitor
Consider the beauty and virtue of m0.icn,
m1.icn and events.icn.
Now check out sos.evt
lecture 4
Summary Notes from HW#1
- solutions ranged from 138-335 lines. longer is not better.
Writing good Unicon is like haiku or other short poetry.
Practice toward mastery of the art. Don't settle for odd-syntax-Java.
- avoid platform-dependent colors - stick to the portable color
names (see the Icon Graphics Book) or use RGB values.
- avoid platform-dependent fonts - stick to mono, sans, serif, typewriter
- do not assume that the display is larger than 1024x768
- put your name in a header comment at the top of your homework
- check user input for validity, avoid crashes
Unicon Language Topic: Surprised by Failure?
Don't be like the novice Icon and Unicon programmers who are surprised
when fallible expressions fail. Failure in this language isn't some rare
event like an exception; failure is part of every program's life.
First, you should learn enough to know
how to identify fallible expressions. Then, you should expect failure.
When to check for failure: everywhere that failure can occur, and
everywhere that failure will matter. Examples:
- comparisons are designed to fail, most folks don't miss these
- type conversions like integer() are also designed to fail
- open() and similar system functions that ask for an operating
system resource that might not be available -- check them!
- find() and similar built-ins, UNLESS you can prove data is valid
- subscripts, unless you can prove valid index ranges
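The checks in the list above might look roughly like the following. This is a sketch under my own assumptions: the helper names to_int() and first_word() and the usage message are invented for illustration.

```unicon
# to_int(s): convert s to an integer, or fail if it is not one
procedure to_int(s)
   return integer(s)              # integer() fails on "abc"; so does to_int()
end

# first_word(s): the text before the first blank, or fail if there is none
procedure first_word(s)
   local i
   if i := find(" ", s) then return s[1:i]
end

procedure main(av)
   local f, n
   # av[1] fails if there is no argument; integer() fails on bad input
   n := to_int(av[1]) | stop("usage: prog count [file]")
   # av[2] may be missing and open() may be refused: both simply fail
   if f := open(av[2]) then {
      write(first_word(read(f))) | write("(no first word found)")
      close(f)
      }
   else write("no readable file given; count was ", n)
end
```

Each fallible expression either sits inside a control structure that handles failure or is followed by an alternative, so nothing fails silently.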
lecture 5
Graphic Design of the Day
William Playfair's chart depicting area, population, and tax revenues
of countries in Europe is another excellent example of depicting multiple
dimensions of data. An excerpt is given here.
The slope between the population and tax revenues points down for most
countries and sharply up for England (and, less so, for Spain).
This version is based on Mike Wilder's HW#1 solution, because it had
some interesting and valuable properties.
Visualization Principles (book section 3.1.2)
- animation
- incremental algorithms are a primary means of achieving efficient
animation. instead of minimizing ink, this is like minimizing the
motion of the plotter arm, or in our case, the # of memory writes.
- least astonishment
- use the golden rectangle, labels and legends
- metaphors
- use a familiar metaphor
- interconnection
- connecting different pieces of data is key, follow Playfair's example
- interaction
- the big difference between a visualization and a paper chart or graph
is that the user can interact with the data. exploit this.
- dynamic scale
- visualizations compete for screen space and hardware varies widely.
it is extra work, but if you write everything so that it scales, your
visualization will be useful on more machines and in more ways.
- static backdrop
- one of the best ways to make dynamic data understandable is to present
it in terms of static data. An execution is an instance of the underlying
universal abstract thing that is the program.
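As a small illustration of the incremental idea under "animation" above: when a bar in a bar chart changes height, only the strip between the old and new heights needs repainting. The procedure name bar_delta() and the "fill"/"erase" tags are hypothetical, chosen just for this sketch.

```unicon
# bar_delta(old, new): the strip [start, size, action] that must be
# repainted when a bar's height changes from old to new pixels
procedure bar_delta(old, new)
   if new > old then return [old, new - old, "fill"]
   if new < old then return [new, old - new, "erase"]
   # equal heights: nothing to repaint, so fall off the end and fail
end

procedure main()
   local d
   every d := bar_delta(10, 25) | bar_delta(25, 10) do
      write(d[3], " ", d[2], " rows starting at height ", d[1])
end
```

In a real monitor the "fill" strips would presumably go to FillRectangle() and the "erase" strips to EraseArea(), so the number of pixels written is proportional to the change rather than to the size of the bar.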
lecture 6
HW#2 Results
Notes on HW#2 Code
- overall, submissions were fantastic
- main(av)
- av is always a list of strings; if no arguments, *av = 0
- paramnames() is a generator
- use it with every, or ask questions like "if type(x:=paramnames(...))=="list" then..."
- consider the virtues of the apply operator p ! L
- consider the virtues of every maxval <:= !L
- isn't max() a built-in function at this point? maxval := (max ! L)
- failure and success
- say "if i := find() then ...", not: "i := find(); if \i then ..."
- check for open() failure
- I asked nicely before, now I am telling you
- sticking &fail at the end of a routine is a no-op
- a routine fails if it falls off its end; &fail does not return a failure.
Unlike Lisp, the return value of a function is not its final expression's
evaluation.
- some folks used the
structure() function
- please be nice and tell folks how it was useful here
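Several of the idioms in these notes fit in one short program. This is a sketch for illustration only; the procedure names maxval(), sum() and is_even() are mine, and I avoid relying on a built-in max() since the notes themselves are unsure about it.

```unicon
# maxval(L): largest element of a list of numbers, or fail if L is empty
procedure maxval(L)
   local m
   m := L[1] | fail               # subscripting an empty list fails
   every m <:= !L                 # m <:= x assigns x to m only when m < x
   return m
end

# sum(args[]): add up any number of arguments
procedure sum(args[])
   local t
   t := 0
   every t +:= !args
   return t
end

# is_even(n): return n if it is even; otherwise fall off the end and fail
procedure is_even(n)
   if n % 2 = 0 then return n
   # no trailing &fail needed here
end

procedure main(av)
   local L, i
   if *av = 0 then write("no arguments: *av = 0")
   L := [3, 9, 2]
   write("max: ", maxval(L))
   write("sum: ", sum ! L)        # apply: same as sum(3, 9, 2)
   if i := find("=", "k=v") then write("'=' at position ", i)
   if not is_even(3) then write("is_even(3) failed")
end
```

Note that m must be initialized before the every loop, because comparing the null value with a number is a run-time error rather than a failure.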
Graphic Design of the Day
Fisheye Views. Read Furnas' Generalized Fisheye Views paper.
Monitoring Feature of the Day
Monitoring Location (pianoroll, tiny one-pixel-per-char views of source code).
lecture 7
Announcement
If any of you are proficient Flash programmers and would like
to make some quick cash, Scott Lynch at "The Beach" in Moscow has a small
job he wants done, his number is 208-794-2354.
As we proceed into the "meat" of the course, we have a need for lots
of subject programs to study, lots of example monitors, and bigger
programs that presumably will have more complex behavior.
- Suspects
- This directory was compiled by Ralph Griswold as a collection of
interesting or weird programs whose behavior could be understood
by program visualization. The good part of the Suspects directory
is that the programs all run non-interactively, in some cases they
were modified to do so, and those that require input have sample
.dat files on which they run nicely. This lets monitors do their
thing unimpeded. We should probably add some representative
object-oriented programs to this collection this semester. I probably
can dig out my "gui recorder" and create recordings of GUI programs
so that we can monitor them conveniently in this context.
- tools
- This directory was compiled by Clinton Jeffery as a collection of
simple program visualization programs and library procedures. Many
of these codes are featured in the book, Program Monitoring and
Visualization.
- Big Programs
- The largest programs in the suspects directory are typeinfer (2.6k lines),
and yhcheng (1.9k lines). These were considered large in the Icon
language, where source codes are typically 1/3 to 1/10 the size of
C programs that do the same thing. The other largest public domain
Icon programs are in the ipl/*packs directories. Among these,
ibpag2 is 3.7k lines, itweak is 3.5k lines, skeem is 3.1k lines,
ged is 3.6k lines, htetris is 4.3k lines, vib is 4.4k lines, and weaving
is 11.3k lines (?). Monitoring these might or might not be easy, since
they may be interactive, and you might or might not know what to click
at them in order to get them to behave. The largest known Icon program
(source not available) was Bill Wulf's testcase generator (rumored to
be on the order of a half-million lines, perhaps machine-generated).
In the Unicon language, programs are far larger on average.
The unicon translator itself is 10k lines of Unicon.
The uni/lib class library is 20K lines, and the uni/gui
GUI class library is 14.5K lines; large subsets of these libraries
may be added onto whatever the tool size is. The Unicon IDE is 17K
lines, the IVIB user interface builder is 16K lines, and so on.
Some of these you can actually monitor.
The largest Icon/Unicon programs for which I have source code
include the SSEUS database review/update system (35K lines), and a
Knowledge Representation language and system (50K lines) done by an
AT&T scientist. It might be possible to find these and monitor
them, but it would take work to set them up for monitoring.
Unicon feature of the day: Packages
Packages were added to Unicon more or less against my will,
but they are obviously of growing importance in larger scale
development. Packages are about protecting a name space from
collisions. Without them, global variables in all modules
are shared, and accidentally, these variables may conflict
with globals (and undeclared, thought-to-be locals!) in other
modules. The more libraries you use, the more inevitable these
conflicts. Proof that packages are needed is evident in the
Icon Program Library, where, after fundamental built-in functions
like "type" were accidentally assigned one too many times by
client code, Ralph Griswold got in the habit of protecting "type"
or similar built-in functions the hard way, inside each library
procedure that uses them:
static type
initial type := proc("type", 0) # protect attractive name
This gets old in a hurry, and it actually bloats code a little bit.
So anyhow, Robert Parlett implemented packages, and I accepted them,
and now they are here to stay, and they aren't bad. You do have to
know the "package" and "import" keywords, and the ::foo syntax, and
that is about it.
MiniLoc
miniloc.icn is a "miniature location profiler" as discussed in the purple
book. What is mini about it is that each source code line and column is
one pixel row and column (this is a scaling problem for larger programs,
miniloc could be rewritten to scale its graphics). The frequency of
location events at various locations is recorded using a log scale
through a range of colors from boring to red-hot. Humans
don't really perceive red as a larger # than green, but the metaphor
of a temperature map is widely recognizable anyhow.
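A log-scale mapping from event counts to colors can be sketched without any graphics. This is not miniloc's actual code: the procedure name heat_index() and the particular list of colors are assumptions for illustration.

```unicon
# heat_index(n, ncolors): map an event count n >= 1 onto 1..ncolors,
# advancing one step per doubling of n and saturating at ncolors
procedure heat_index(n, ncolors)
   local i
   i := 1
   while (n /:= 2) > 0 & i < ncolors do
      i +:= 1
   return i
end

procedure main()
   local heat, n
   # hypothetical palette, from boring to red-hot
   heat := ["gray", "blue", "green", "yellow", "orange", "red"]
   every n := 1 | 2 | 10 | 100 | 100000 do
      write(right(n, 7), " -> ", heat[heat_index(n, *heat)])
end
```

Integer halving is used instead of a floating-point logarithm so that exact powers of two always land on the same step.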
Go ye and write a source-location-oriented visualization of "something
interesting". If it is interesting enough, we may write it up and submit
it to a conference.
lecture 8
Hani's Clever Case Tag
Case expressions in Icon use === semantics, looking for an exact match with
no type conversions. Case branches are evaluated sequentially as if one
were writing
if x === firstbranchexpr then firstcodebody
else if x === secondbranchexpr then secondcodebody
else if x === thirdbranchexpr then thirdcodebody
...
If all the branch labels are constants, this is colossally inefficient
compared with a C switch statement. But, it is fully general and you
can use arbitrary expressions, including generators, for which the entire
result sequence will be generated in trying to find a match.
You can add a predicate filter on the front, or have your values supplied from
subroutines, or whatever:
case x of {
p() & q() & foo: { ... }
a | b | 1 to 10 | f(): { ... }
}
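A small runnable sketch of these semantics follows; the procedure name kind() and its branch labels are invented for illustration. It shows a generator as a case label, the absence of type conversion under ===, and a fallible expression used as a filter.

```unicon
# kind(x): classify x, exercising the === semantics of case labels
procedure kind(x)
   return case x of {
      1 to 3:      "small integer"     # generator label: tries 1, 2, 3
      "1" | "2":   "digit string"      # === never converts "1" to 1
      integer(x):  "other integer"     # skipped if integer(x) fails
      default:     "something else"
      }
end

procedure main()
   local x
   every x := 2 | "1" | 7 | "x" do
      write(image(x), ": ", kind(x))
end
```

The string "1" falls through the first branch because the integers generated by 1 to 3 are never identical to a string, even one that looks numeric.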
This afternoon we are about to see examples that use this generator
capability with cset event masks, as in the following, but it would
work with sets, table keys, or any other generator you wanted to write.
case x of {
...
!ProcMask: {
}
...
}
This makes for short, elegant code, but it is inefficient. Generating
the individual elements out of a cset costs a type conversion (cset to
string) which isn't cheap, and all generators pay for extra bookkeeping
on the stack, for that suspend/resume capability, which is slow at
times. You are paying for convenience and generality, and a good
optimizing compiler might make some of that go away, but the VM sure
does not. In a couple of minutes we will see another measure of how much
you pay. But in the meantime...
Hani showed me some code this afternoon that looked like
case x of {
...
member(a_set, x): {
}
...
}
This has probably been done before, but it surprised me so we should
talk about it. member(a_set, x) tests whether x is a member and returns
x if it is, so it is just a filter, and by the way it avoids a linear
search via a generator so it is fast. It's got a seemingly redundant
comparison of x===x after the member() test succeeds, but that is C
code and probably not too bad.
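The two styles of case label can be put side by side. The sketch below is not from any monitor; the procedure name classify() and its data are assumptions chosen for illustration.

```unicon
# classify(c): contrast a generator label with a member() filter label
procedure classify(c)
   static vowels, stops
   initial {
      vowels := 'aeiou'
      stops := set([".", "!", "?"])
      }
   return case c of {
      !vowels:           "vowel"          # generates "a", "e", ... until one === c
      member(stops, c):  "sentence end"   # one lookup, then c === c
      default:           "other"
      }
end

procedure main()
   local c
   every c := !"hi!" do
      write(image(c), ": ", classify(c))
end
```

When member() fails, its branch is skipped without generating anything, which is where the speed advantage over the !vowels style comes from.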
Monitoring Procedure Activity
Procedure activity is a special case of the control flow behavior of
expression evaluation. In a normal language monitoring procedure activity
would mean monitoring the stack of procedure activation records, or in a
multithread context, monitoring a set of stacks of procedure activation
records. You would perhaps observe how deep the stack gets (usually not a
problem) and might look for patterns that suggest bugs (Q: Can anyone think
of a call-return sequence that suggests a bug?). Besides correctness, you
might imagine looking for performance problems or tuning opportunities.
Monitoring Icon and Unicon is a little more complicated because procedures
can suspend and be resumed. The events for this behavior are
E_Pcall, E_Psusp, E_Presum, E_Pret, E_Pfail, E_Prem.
The "call stack" becomes a "call tree", or as
section 8.1 in the text calls it, an activation tree (a better term since
procedures can be activated by more than just calls).
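A monitor that does nothing but count these six events might look roughly like this. It is a sketch that uses only names appearing elsewhere in these notes (evdefs.icn, evinit, EvInit(), EvGet(), &eventcode); it needs the monitoring library and a target program on the command line, and the output labels are my own.

```unicon
$include "evdefs.icn"
link evinit

procedure main(av)
   local mask, counts, pair
   EvInit(av) | stop("can't monitor")
   # event codes are one-character strings, so a cset of them is a mask
   mask := cset(E_Pcall || E_Psusp || E_Presum || E_Pret || E_Pfail || E_Prem)
   counts := table(0)
   while EvGet(mask) do           # fails when the target program is done
      counts[&eventcode] +:= 1
   every pair := ![["call", E_Pcall], ["suspend", E_Psusp],
                   ["resume", E_Presum], ["return", E_Pret],
                   ["fail", E_Pfail], ["remove", E_Prem]] do
      write(left(pair[1], 9), counts[pair[2]])
end
```

Because suspends and resumes show up as their own counts, calls need not equal returns plus failures, which is the first hint that a simple stack model is not enough.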
You can just ask for all the procedure activity events, but if your monitor
is doing more than just counting them then it potentially will need to do
more. One way to monitor the activation tree is to build a model of the
tree itself. You can do this by hand, or your monitor can use a library
procedure named evaltree() which does it for you. (Study in detail the
implementation of evaltree on p88-89). We will look at examples that use
evaltree, but first a word on timing.
The time cost of monitoring
Monitoring costs time. If it costs too much, folks won't want to do it even
if you do make pretty moving pictures (successful program visualizations). The
instrumentation of all those events costs time even if you don't ask for the
event reports, and the event reports (co-expression switches) cost time. It
is difficult to even measure the timings of different parts of the
monitoring process. You may be able to do a good job by going into the VM C
code and using your own expertise, or using specialty tools for doing
timing, such as gprof. This discussion is just based on what I can observe
casually.
Example. In the Suspects/ directory are many candidates (which one runs the
longest?). We will consider the poetry scrambler for this example.
time ./scramble <scramble.dat
uses the UNIX time(1) command to
measure the runtime externally. It reports something like:
1.0u 0.0s 0:03 32% 0+0k 0+0io 0pf+0w
That's 1.0 seconds of user time, 0.0 seconds of system time, 3 seconds
of wall-clock observed time. Out of curiosity, since it writes out a lot
to standard out, I retime it directing output to /dev/null, and it still
takes a second of user time, but the wall clock is down to 1 second.
Now I take an almost-empty monitor, timer.icn,
and time it using the UNIX utility.
time timer ./scramble <scramble.dat
and it writes out
tp time: 1830-0=1830
em time: 0-0=0
1.0u 0.0s 0:03 30% 0+0k 0+0io 0pf+0w
What is this telling me? The time function hasn't seen any extra time spent
by the monitor (that's odd; that's bad). The monitor thinks it has spent no
time, but that the program is spending 1.8 seconds. Which times are more
accurate?
One problem with measurement is that accuracy is limited by tools of
observation and hardware/OS limitations. Another problem with measurement
is that external environmental considerations (load average, user activity)
change results to some extent. These measurements were done on
mars.cs.uidaho.edu, a SPARC Solaris machine. The "who" command
reported 5 different
people logged in at the time, although the load average was apparently low
(inactive terminal sessions).
Now, suppose Ziad says monitoring the evaltree is slow. Why might that be?
- procedure activity events are frequent. Not as frequent as line
number changes... or are they more frequent?
- each procedure activity event costs two co-expression switches
- each procedure activity event costs however much execution time
evaltree itself requires...
It would be useful to know whether the co-expression switch totally
dominates the time spent in the monitor. Although our intuition says
it does, intuition is not always correct. Evaltree costs: a big case
statement (not very efficient in Icon/Unicon), whose labels are generators
(not very efficient), whose code bodies do allocations and list operations
(pretty darned fast), and call the monitor callback procedure. One way
to do our experiment is to measure &time before and after each EvGet(),
and instead of measuring time spent in the target program, measure
the other time, time spent in the monitor. Another way to do the experiment
is to rewrite the evaltree() functionality for speed instead of clarity, and
see if it is measurably different or not.
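The first experiment, reading &time on either side of each EvGet(), could be sketched as follows. This is not the code of timer.icn or the evaltime programs; it assumes that &time read in the monitor advances while the target runs, which the measurements above suggest may not hold, and with millisecond granularity most individual differences will be zero.

```unicon
$include "evdefs.icn"
link evinit

procedure main(av)
   local mask, n, last, now, tp_ms, em_ms
   EvInit(av) | stop("can't monitor")
   mask := cset(E_Pcall || E_Pret)   # any event mask would do
   n := tp_ms := em_ms := 0
   last := &time
   while EvGet(mask) do {
      now := &time
      tp_ms +:= now - last           # elapsed while waiting inside EvGet()
      n +:= 1                        # stand-in for the monitor's real work
      last := &time
      em_ms +:= last - now           # elapsed while handling the event
      }
   write(&errout, n, " events; waiting in EvGet: ", tp_ms,
         " ms; handling events: ", em_ms, " ms")
end
```

Only the totals over a long run mean anything here, and even those should be averaged over several runs.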
Compare evaltime.icn,
evaltime2.icn,
evaltime3.icn, showing an
attempt to do this experiment.
time evaltime ./scramble <scramble.dat
shows
tp time: 2760--10=2770
em time: 6670-0=6670
10.0u 0.0s 0:18 55% 0+0k 0+0io 0pf+0w
Using evaltree, the monitor is accounting for more than two-thirds of the
time, and the time reported for the target program is much higher than
for the unmonitored or empty monitored cases. evaltime2, which skips
the evaltree mechanism but uses a big case statement, gives:
tp time: 2490-0=2490
em time: 2660-0=2660
5.0u 0.0s 0:08 61% 0+0k 0+0io 0pf+0w
Cost of monitoring is substantially lower, although the particular
details may be affected by machine load fluctuation. One would have
to run several times and take averages for the numbers to be meaningful.
Using evaltime3, which avoids the large case statement, we get
tp time: 2580-0=2580
em time: 2050-0=2050
5.0u 0.0s 0:07 70% 0+0k 0+0io 0pf+0w
At this point, monitoring procedure activity is seen to impact
execution time substantially, but at least the monitor is taking
less time than the target program. Where is the co-expression
time being charged here?
Many Morals of the story:
- the UNIX time(1) command is not very fine-grained
or precise.
- The monitoring of &time gives times in milliseconds which
might or might not be reliable; they report what the C millisec()
function returns.
- The monitoring facilities attempt to explicitly separate the
&time reported by the TP from that of the EM.
- The coding of the EM has a (surprisingly?)
large impact on the practicality of the EM. Mastering
the language and coding elegantly actually matters for EM authors.
- Co-expression switch time may dominate but not totally dominate timings.
Griswold was fond of saying that on at least one old CPU where it was
measured, the co-expression switch cost less than a procedure call
in Icon. This is probably not true for us, but co-expression costs
are not the only factor in performance and not always the primary factor.
- The evaltree.icn module might be rewritable for
much better speed.
- Icon and Unicon VM compilers need a decent case expression optimization.
iconc might already do one, I am not sure.
scat
The scat program is the first application of evaltree in the purple book.
It links in a scatterplot library which
might or might not be useful to you. It implements the log scaling
that scat uses.
$include "evdefs.icn"
link evinit
link evaltree
link scatlib
Scat uses several global variables, three tables to remember what
has been plotted, and three clones set with different colors.
global at, # table: sets of procedures at various locations
call, # table: call counts
rslt, # table: result counts
red,
green,
black
Scat uses a generic evaltree-compatible record type for modeling;
no extra payload added.
record activation (node, parent, children)
The initialization is straightforward.
procedure main(av)
local mask, current_proc, L, max, i, k, child, e
EvInit(av) | stop("can't monitor")
scat_init()
red := Clone(&window, "fg=red")
green := Clone(&window, "fg=green")
black := Clone(&window, "fg=black")
current_proc := activation(,activation(,,,,[]),[])
Control is handed over to evaltree, which calls scat_callback
with events
evaltree(ProcMask ++ FncMask ++ E_MXevent,
scat_callback, activation)
WAttrib("label=scat (finished)")
EvTerm(&window)
end
scat_callback mostly calls scat_plot, which calls colorfor to decide
what color to plot with.
procedure scat_callback(new, old)
case &eventcode of {
E_Pcall:
scat_plot(new.node, 1, 0, , colorfor)
E_Psusp | E_Pret:
scat_plot(old.node, 0, 1, , colorfor)
E_Fcall:
scat_plot(new.node, 1, 0, , colorfor)
E_Fsusp | E_Fret:
scat_plot(old.node, 0, 1, , colorfor)
E_MXevent: {
case &eventvalue of {
"q" | "\033": stop("terminated")
&lpress : {
repeat {
scat_click(proced_name)
if Event() === &lrelease then
break
}
}
}
}
}
end
Procedure proced_name returns the name of a procedure, taken from its image.
procedure proced_name(p)
return image(p) ? {
[ =("procedure "|"function "), tab(0) ]
}
stop(image(p), " is not a procedure")
end
Procedure colorofone distinguishes procedures from functions.
procedure colorofone(p)
return if match("procedure ", image(p))
then red else green
end
Procedure colorfor uses a list (of procedures/functions) to select
what color to plot. If it is not the first color choice and the
subsequent value should be a different color, resort to black.
Return a red or green if all values say to be red or all say to be green.
procedure colorfor(L)
if *L = 0 then return &window
every x := !L do {
if not (/c := colorofone(x)) then
if colorofone(x) ~=== c then
return black
}
return c
end
What is scat good for?
scat is cooler than you think. It shows not just who the hot procedures
are, it also shows what procedures always fail, what procedures generate
lots of results per call, and what procedures (predicates) generate
between 0 and 1 result per call.
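The categories that scat makes visible come down to comparing two counts per procedure. The sketch below is not part of scat; the procedure name behavior() and its wording are assumptions for illustration.

```unicon
# behavior(calls, results): describe a procedure by its results per call
procedure behavior(calls, results)
   if calls = 0 then fail                      # never called: nothing to say
   if results = 0 then return "always fails"
   if results < calls then return "predicate (0 to 1 results per call)"
   if results = calls then return "one result per call"
   return "generator (more than 1 result per call)"
end

procedure main()
   local pair
   every pair := ![[50, 0], [50, 20], [50, 50], [5, 400]] do
      write(pair[1], " calls, ", pair[2], " results: ",
            behavior(pair[1], pair[2]))
end
```

On the scatterplot these cases correspond to where a point sits relative to the diagonal where calls equal results.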
algae
The flagship demonstration of the evaltree framework is a literal
visualization of the activation tree.
EvInit(av) | stop("Can't EvInit ",av[1])
codes := algae_init(algaeoptions)
evaltree(codes, algae_callback, algae_activation)
WAttrib("windowlabel=Algae: finished")
EvTerm(&window)
Algae takes command line options to say how much to monitor, how to
graphically depict the tree, etc. It deliberately chooses a simple-minded
incremental graphic, coming from a time when graphic performance was deemed
to be a likely monitor bottleneck. By default it uses hexagons for
activation records (compare hexagons with a square grid). A real but still
INCREMENTAL tree layout algorithm would be better.
procedure algae_init(algaeoptions)
local t, position, geo, codes, i, cb, coord, e, s, x, y, m, row, column
t := options(algaeoptions,
winoptions() || "P:-S+-geo:-square!-func!-scan!-op!-noproc!-step!")
/t["L"] := "Algae"
/t["B"] := "cyan"
scale := \t["S"] | 12
delete(t, "S")
if \t["square"] then {
spot := square_spot
mouse := square_mouse
}
else {
scale /:= 4
spot := hex_spot
mouse := hex_mouse
}
codes := cset(E_MXevent)
if /t["noproc"] then codes ++:= ProcMask
if \t["scan"] then codes ++:= ScanMask
if \t["func"] then codes ++:= FncMask
if \t["op"] then codes ++:= OperMask
if \t["step"] then step := 1
hotspots := table()
&window := Visualization := optwindow(t) | stop("no window")
numrows := (WHeight() / (scale * 4))
numcols := (WWidth() / (scale * 4))
wHexOutline := Color("white") # used by the hexagon library
if /t["square"] then starthex(Color("black"))
return codes
end
The real work happens in algae_callback()
procedure algae_callback(new, old)
local coord, e
initial {
old.row := old.parent.row := 0; old.column := old.parent.column := 1
}
case &eventcode of {
!CallCodes: {
new.column := (old.children[-2].column + 1 | computeCol(old)) | stop("eh?")
new.row := old.row + 1
new.color := Color(&eventcode)
spot(\old.color, old.row, old.column)
}
!ReturnCodes |
!FailCodes: spot(Color("light blue"), old.row, old.column)
!SuspendCodes |
!ResumeCodes: spot(old.color, old.row, old.column)
!RemoveCodes: {
spot(Color("black"), old.row, old.column)
WFlush(Color("black"))
delay(100)
spot(Color("light blue"), old.row, old.column)
}
E_MXevent: do1event(&eventvalue, new)
}
spot(Color("yellow"), new.row, new.column)
coord := location(new.column, new.row)
if \step | (\breadthbound <= new.column) | (\depthbound <= new.row) |
\ hotspots[coord] then {
step := &null
WAttrib("windowlabel=Algae stopped: (s)tep (c)ont ( )clear ")
while e := Event() do
if do1event(e, new) then break
WAttrib("windowlabel=Algae")
if \ hotspots[coord] then spot(Color("light blue"), new.row, new.column)
}
end
Boring square graphics:
procedure square_spot(w, row, column)
FillRectangle(w, (column - 1) * scale, (row - 1) * scale, scale, scale)
end
# encode a location value (base 1) for a given x and y pixel
procedure square_mouse(y, x)
return location(x / scale + 1, y / scale + 1)
end
A whole new meaning for the term "graphical breakpoints":
#
# setspot() sets a breakpoint at (x,y) and marks it orange
#
procedure setspot(loc)
hotspots[loc] := loc
y := vertical(loc)
x := horizontal(loc)
spot(Color("orange"), y, x)
end
#
# clearspot() removes a "breakpoint" at (x,y)
#
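procedure clearspot(loc)
   local s2, x2, y2
   # the parameter is named loc, not spot, so that the call at the end
   # still reaches the global spot() drawing procedure
   hotspots[loc] := &null
   y := vertical(loc)
   x := horizontal(loc)
   every s2 := \!hotspots do {
      x2 := horizontal(s2)
      y2 := vertical(s2)
      }
   spot(Visualization, y, x)
end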
User input handling:
#
# do1event() processes a single user input event.
#
procedure do1event(e, new)
local m, xbound, ybound, row, column, x, y, s
case e of {
"q" |
"\e": stop("Program execution terminated by user request")
"s": { # execute a single step
step := 1
return
}
"C": { # clear a single break point
clearspot(location(new.column, new.row))
return
}
" ": { # space character: clear all break points
if \depthbound then {
every y := 1 to numcols do {
if not who_is_at(depthbound, y, new) then
spot(Visualization, depthbound, y)
}
}
if \breadthbound then {
every x := 1 to numrows do {
if not who_is_at(x, breadthbound, new) then
spot(Visualization, x, breadthbound)
}
}
every s := \!hotspots do {
x := horizontal(s)
y := vertical(s)
spot(Visualization, y, x)
}
hotspots := table()
depthbound := breadthbound := &null
return
}
&mpress | &mdrag: { # middle button: set bound box break lines
if m := mouse(&y, &x) then {
row := vertical(m)
column := horizontal(m)
if \depthbound then { # erase previous bounding box, if any
every spot(Visualization, depthbound, 1 to breadthbound)
every spot(Visualization, 1 to depthbound, breadthbound)
}
depthbound := row
breadthbound := column
#
# draw new bounding box
#
every x := 1 to breadthbound do {
if not who_is_at(depthbound, x, new) then
spot(Color("orange"), depthbound, x)
}
every y := 1 to depthbound - 1 do {
if not who_is_at(y, breadthbound, new) then
spot(Color("orange"), y, breadthbound)
}
}
}
&lpress | &ldrag: { # left button: toggle single cell breakpoint
if m := mouse(&y, &x) then {
xbound := horizontal(m)
ybound := vertical(m)
if hotspots[m] === m then
clearspot(m)
else
setspot(m)
}
}
&rpress | &rdrag: { # right button: report node at mouse loc.
if m := mouse(&y, &x) then {
column := horizontal(m)
row := vertical(m)
if p := who_is_at(row, column, new) then
WAttrib("windowlabel=Algae " || image(p.node))
}
}
}
end
Calculating which activation a given click refers to:
#
# who_is_at() - find the activation tree node at a given (row, column) location
#
procedure who_is_at(row, col, node)
while node.row > 1 & \node.parent do
node := node.parent
return sub_who(row, col, node) # search children
end
#
# sub_who() - recursive search for the tree node at (row, column)
#
procedure sub_who(row, column, p)
local k
if p.column === column & p.row === row then return p
else {
every k := !p.children do
if q := sub_who(row, column, k) then return q
}
end
A similar calculation for placing new nodes
#
# computeCol() - determine the correct column for a new child of a node.
#
procedure computeCol(parent)
local col, x, node
node := parent
while \node.row > 1 do # find root
node := \node.parent
if node === parent then return parent.column
if col := subcompute(node, parent.row + 1) then {
return max(col, parent.column)
}
else return parent.column
end
#
# subcompute() - recursive search for the rightmost tree node at depth row
#
procedure subcompute(node, row)
# check this level for correct depth
if \node.row = row then return node.column + 1
# search children from right to left
return subcompute(node.children[*node.children to 1 by -1], row)
end
How to use Clone()
#
# Color(s) - return a binding of &window with foreground color s;
# allocate at most one binding per color.
#
procedure Color(s)
   static t, magenta
   initial {
      magenta := Clone(&window, "fg=magenta") | stop("no magenta")
      t := table()
      /t[E_Fcall] := Clone(&window, "fg=red") | stop("no red")
      /t[E_Ocall] := Clone(&window, "fg=chocolate") | stop("no chocolate")
      /t[E_Snew] := Clone(&window, "fg=purple") | stop("no purple")
      }
   if *s > 1 then
      / t[s] := Clone(&window, "fg=" || s) | stop("no ",image(s))
   else
      / t[s] := magenta
   return t[s]
end
lecture 9
3D Graphics Facilities
Known Additions to the 3D Facilities, at least semi-implemented:
- blending texture and foreground/material property
- "buffered 3D mode"
- WSection
- JPEG textures, preliminary PNG support (on Linux)
- dynamic textures
- preliminary transparency support
- meshmode attribute for FillPolygon
- slices and rings attributes for changing the cost and precision of spheres and cylinders
- subwindows
- freetype fonts (needs further test-and-port work)
- tr := Texture(); ...; Texture(tr) to re-use a texture
lecture 10
"Open Mike" Night: HW#3 Demos
You get to demo your stuff in front of a supportive audience.
Graphic Design(s) of the Day: Tukey's Multiwindow- and Box-Plots
And Tufte's Data-ink maximization of box-plots.
GUI Monitors
Some of you have already written homeworks that involved GUI's,
but for most of you, some explanation and reinforcement are needed.
Unicon has a GUI class library, written by Robert Parlett, that
has extraordinary capabilities. Although I would like to say that
GUI's are amazingly simpler in Unicon than in other languages, it
is more honest to say that GUI programming in Unicon has a learning
curve comparable to GUI programming in other languages.
Step #1 in GUI exploration is usually to get familiar with the interface
builder program; in our case that is IVIB. (Demo of IVIB goes here).
IVIB generates code that looks like this.
Note that the 70-line application creates a dialog and calls show_modal(),
and for a normal VB-style app you then fill in the method bodies for
whatever events you've requested. For normal applications, it is not
necessary to understand much of the scaffolding in this file and the
large classes you inherit behavior from. Note that there is a Unicon
Technical Report, UTR#6, which tries to teach the IVIB basics.
IVIB lets you draw a GUI and generates the code for you. For a program
execution monitor the main question will be: how to merge the event streams,
or how to merge the event processing loops, from the GUI and from the
monitored program's events. To accomplish this, you need to know more about
the underlying GUI classes.
There are in fact a total of 3 classes that most Unicon GUI programmers
need to become semi-comfortable with: Component, Dialog, and Dispatcher.
Component is the superclass of all basic visible GUI elements in an application:
buttons, sliders, lists, editable text boxes, and so on. Components are
generally organized hierarchically -- they form a tree in Venn diagram style,
with larger background components containing smaller more active components.
A Dialog is a component that constitutes the root of some window -- it owns
a window and therefore can receive input events, which it then needs to route
down the tree to the correct leaf. The Dispatcher class handles the actual
event-processing loop, allowing for multiple dialogs, and wall-clock time
events in addition to GUI events.
In order to merge the Monitor and GUI event streams, we might do one of
the following:
- keep the monitor event loop primary, and poll for GUI events (!)
- keep the GUI event loop primary, and periodically read monitor events (!)
Note that there is no way to select() from between GUI and monitor or
poll both, because to ask for an EvGet() is to transfer control to the
target program (freezing the GUI of the monitor until an event occurs).
However, you can call EvGet() with an E_Tick along with your other events
if you want to be sure to regain control periodically even if the other
monitored events do not occur for long periods... then your only danger
is: what if the target program that you are monitoring chooses to block
on some input it wants to read?
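The first option (monitor loop primary, GUI polled) can be sketched in a language-neutral way. The sketch below is in Python rather than Unicon so that it runs without the monitoring framework; the generator target_program, the "tick" event code, and gui_queue are illustrative stand-ins, not the Alamo or Unicon GUI APIs.

```python
from collections import deque

def target_program(n_steps, tick_every):
    """Stand-in for a monitored program: yields (code, value) events.

    A "tick" event is emitted every `tick_every` steps. It plays the role
    that E_Tick plays in the notes: the monitor regains control even when
    no other requested event occurs.
    """
    for step in range(1, n_steps + 1):
        if step % tick_every == 0:
            yield ("tick", step)
        if step == n_steps:
            yield ("call", "main_done")

def run_monitor(events, gui_queue):
    """Monitor loop is primary; pending GUI input is polled, never waited on."""
    log = []
    for code, value in events:           # analogous to: while EvGet(mask) do
        if code != "tick":
            log.append(("monitor", code, value))
        while gui_queue:                 # non-blocking poll of GUI events
            log.append(("gui", gui_queue.popleft()))
    return log

gui = deque(["click", "resize"])
log = run_monitor(target_program(n_steps=6, tick_every=2), gui)
```

In this toy run the GUI events are serviced at the first tick, before the target produces its only "real" event. The danger noted above remains: if the target blocks on input, no tick arrives and the poll never runs.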
Additional notes on GUI-monitors:
- "piano.icn" had been doing its own input event processing, with
E_MXevent at the top level monitor loop and nested loops calling
Event() whenever a "breakpoint" was in place.
- You can't call Event() cavalierly on your own in the middle of your app,
or the GUI won't respond any more. The GUI owns input processing, and calls
you when a component gets an event.
- How does one "pause" or "single step" in a GUI environment? The GUI
is not allowed to freeze. You cannot call EvGet(E_MXevent) to freeze
the program; you had better not call EvGet() at all.
lecture 11
Graphic Design of the Day
CASSE POSTALI DI RISPARMIO ITALIANE by
Antonio Gabaglio, via the revered Tufte, and cited in
a nice discussion of cyclic data, apparently by Benj Lipchak.
Unicon 3D: Unfinished Business
Mesh modes?
These values determine how lists of vertices are interpreted by OpenGL.
There is an attribute meshmode, set via WAttrib(w, "meshmode=value") where
the legal values are
- points
- lines
- linestrip
- lineloop
- triangles
- trianglefan
- trianglestrip
- quads
- quadstrip
- polygon
However, in a trivial test,
the mesh modes did not work!
They probably did for the grad student who implemented them...
but without a working test/demo they remain undocumented/unfinished business.
Minimally, you might expect that I'll have to put out some fixed Unicon
sources and/or binaries for you before these will work. You are welcome
to try them and find out if things are better than I report.
Transparency?
This feature of OpenGL determines to what extent light can go through
a substance, or to what extent objects behind it can be seen through it.
Color names, set via Fg(color) or WAttrib(w, "fg=value") can include a
diaphaneity. The legal transparency adjectives are
- transparent
- subtransparent
- translucent
- subtranslucent
- opaque
This feature is implemented. In a trivial test
it appears to work. However, in testing it a seeming bug was identified
in the color attributes: when you set the fg= attribute with a simple
color it sets the diffuse value for that material property but apparently
does not reset or disable the other lighting colors (specular, ambient,
emission), which may give surprising results. Also:
it is not clear that transparency works correctly on all primitives yet; for
example, the last time I checked, either cubes or maybe filled polygons
looked not as transparent as they ought, because backfacing polygons weren't
transparent.
Monitoring Memory Allocation and Collection (book ch. 9)
(Heap)-based memory allocation is one of the simpler and yet very
interesting forms of behavior that we can monitor. Allocations in
Icon/Unicon are kept as cheap as possible, but in some programs they
still play a major role, especially when code does them by accident,
or does far more memory allocation than is needed for a problem.
Garbage collection is usually pretty fast -- we don't usually go for
coffee when the GC message hits the console, like old Lispers -- but
if a program is garbage collecting a lot (thrashing) it can significantly
impact performance. How can we measure whether allocation appears
excessive or garbage collection seems too frequent?
(Per the book, examine a series of memory allocation monitors.)
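One rough way to put numbers on those two questions is to track bytes allocated between collections and the number of collections. The sketch below is in Python and assumes a made-up event stream of ("alloc", nbytes) and ("gc", None) tuples; the real framework's allocation and collection events are described in the book, and the thresholds for "excessive" are left to the reader.

```python
def summarize_allocation(events):
    """Summarize a stream of ("alloc", nbytes) and ("gc", None) events.

    Returns total bytes allocated, the number of collections, and the
    bytes allocated in each interval that ended with a collection.
    """
    total = 0
    since_gc = 0
    per_gc = []
    for code, value in events:
        if code == "alloc":
            total += value
            since_gc += value
        elif code == "gc":
            per_gc.append(since_gc)
            since_gc = 0
    return {
        "total_bytes": total,
        "collections": len(per_gc),
        "bytes_between_gcs": per_gc,
    }

# Hypothetical trace: two collections, with shrinking intervals between them.
trace = [("alloc", 40), ("alloc", 24), ("gc", None),
         ("alloc", 16), ("gc", None), ("alloc", 8)]
summary = summarize_allocation(trace)
```

If the intervals in bytes_between_gcs are small relative to the heap size, the program is collecting often while reclaiming little, which is one plausible symptom of thrashing.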
Mempie
lecture 12
Graphic Design of the Day
Procedure-grained flow graphs and the comet metaphor.
Kaestle, Fooscape, and Song Liang's Cata. A peek at
some old student projects and
Ralph Griswold's notes.
Mempie finds a bug
We noted last time that mempie and napoleon were drawing very different
pictures, and that one of them must be wrong. The bug was in the MS Windows
implementation of the FillArc function (our C code, not Win32) -- when the
"extent" (angle) of the arc approaches 0, and the calculated start and end
points become the same pixel, Win32 interprets that as a request for a complete
circle.
Griswold's claim examined
Ralph Griswold liked to claim that co-expression activations were about the
same speed as procedure calls in Icon... and this matters a lot for
execution monitors based on co-expressions, so I re-examined this claim with
the following program:
procedure main()
   t1 := &time
   every i := 1 to 10000000 do p()
   write("10000000 calls: ", &time - t1)
   ce := create |1
   t2 := &time
   every i := 1 to 10000000 do @ce
   write("10000000 @: ", &time - t2)
end
procedure p()
   return 1
end
The results (on Linux x86_64) seem to suggest that co-expression activations
are quite cheap, only about 28% slower than procedure calls (times are in
milliseconds):
10000000 calls: 6210
10000000 @: 7920
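As a quick check on those two timings, the arithmetic can be worked through directly. This is plain Python used as a calculator; the only inputs are the two numbers printed above and the loop count from the program.

```python
calls_ms = 6210          # time for 10,000,000 procedure calls, from the run above
activations_ms = 7920    # time for 10,000,000 co-expression activations
n = 10_000_000

slowdown = activations_ms / calls_ms               # ratio of the two total times
percent_slower = (slowdown - 1) * 100              # how much slower, in percent
per_call_us = calls_ms * 1000 / n                  # microseconds per loop iteration
per_activation_us = activations_ms * 1000 / n
```

This gives a slowdown of roughly 27.5%, or about 0.62 versus 0.79 microseconds per iteration. Both figures include the cost of the every loop itself, so the ratio between a bare call and a bare activation is somewhat larger than the ratio of the totals.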
Synchronous threads are a lot cheaper than true concurrent threads!
Playing with a mac implementation earlier this semester, I plugged in
a pthreads-based co-expression switch available from the current Icon
language implementation, and it was an order of magnitude slower...
More memory monitors: mini-memmon and nova
Check out mmm, nova
and oldnova. You should look at them as
unfinished prototypes of the type of tool that your HW#4
should consist of.
lecture 13
Apologies
My apologies, but there will be no midterm exam for this course.
Instead, there is now posted a homework #5
and I want your work on this to be good.
A more honest mmm
In the process of giving mmm a fix, I wound up searching high and low...
to find my own bugs
Monitoring String Scanning (Ch 10)
Icon's string scanning control structure has a very natural depiction,
that of a progress bar or pointer working its way through a string.
Issues include: how to abstract/scale a very large number of operations,
how to depict backtracking, how to depict nested scanning environments
(which might or might not involve analysis of a substring of the enclosing
scanning environment).
Some programs use scanning a lot -- they are mostly string scanning -- and
others do not use it at all.
The ScanMask events include E_Snew, E_Sfail, E_Spos, E_Ssusp, E_Sresum, E_Srem.
E_Spos events are the most frequent. Compared with procedures, what is
missing?
For what it's worth, evaltree() can model scanning environments just like it
does procedure call activity. It can also model built-in functions and
operators; all expressions can be modelled as call/ret/susp/resum/fail/rem.
Now for a deep-thought question: what kinds of graphic depiction emphasizing
what kinds of behavior would make for a genuinely useful string scanning
visualization?
Monitoring Structures and Variable References (Ch 11)
The monitoring framework has fairly thorough instrumentation for
the built-in data structures of the language -- lists, tables,
records and sets. These one-level structures all support implicit
reference semantics, and are routinely composed into big multi-level
structures such as trees and graphs.
What we learn from the simple list visualizer:
- There are basic events for list construction, shape changes, and
accesses.
- lists are highly variable in size, frequency of access, and frequency
of structural change
- many lists are complex structures almost entirely unnoticed by a tool
that visualizes all lists as arrays.
- many or most lists are really just internal glue (non-root)
- many lists are uninteresting, there should probably be a threshold
beneath which no screen space is allocated (what should an empty list
look like?)
What we learn from the structure spy
- It is quite possible to infer structures from provided events
- Many programs will have 1-2 huge structures and dozens or hundreds
of small ones.
lecture 14
mKE/mKR: the Largest Publicly Available Unicon Program
It has its own website. It is a knowledge representation engine with its own
knowledge representation language built-in. It is developed by a (now retired)
AT&T scientist. It is something like 50K LOC. Let's study it.
Monitoring Variable References
Variable use is arguably one of the most important aspects of program
behavior, but it is easily overlooked. Some programs are primarily stack,
some primarily heap (especially, e.g. OOP programs), while some programs
use primarily static / global data layout.
What do we want to know about variables?
- What proportion of data is static/global, stack, or heap?
How can these be measured?
- What data type they hold; whether they ever change type
- Scope: From where-all are they read? From where-all are they assigned?
- Lifetime: are they short, medium, or long-lived?
- Frequency: are they heavily referenced?
- Dependence: are they aliases for data held under other, primary names?
Are they pointers into the middle of a larger structure, e.g. for
traversal?
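Several of these questions (types held, type changes, where a variable is read and assigned, how often it is referenced) reduce to bookkeeping over a stream of reference events. The Python sketch below assumes a hypothetical event format of (name, op, type_name, site) tuples; it is not the format the monitoring framework delivers, only an illustration of the bookkeeping.

```python
from collections import defaultdict

class VarStats:
    """Per-variable summary built from (name, op, type_name, site) events."""
    def __init__(self):
        self.types = []            # distinct types assigned, in order first seen
        self.read_sites = set()    # source locations that read the variable
        self.assign_sites = set()  # source locations that assign it
        self.references = 0        # total reads plus assignments

def summarize_variables(events):
    stats = defaultdict(VarStats)
    for name, op, type_name, site in events:
        s = stats[name]
        s.references += 1
        if op == "assign":
            s.assign_sites.add(site)
            if type_name not in s.types:
                s.types.append(type_name)
        elif op == "read":
            s.read_sites.add(site)
    return stats

# Hypothetical trace: "count" starts as an integer and later becomes a string.
events = [
    ("count", "assign", "integer", "main:3"),
    ("count", "read",   "integer", "main:4"),
    ("count", "assign", "string",  "report:9"),
    ("buf",   "assign", "list",    "main:5"),
]
stats = summarize_variables(events)
changed_type = sorted(v for v, s in stats.items() if len(s.types) > 1)
```

Lifetime and aliasing are harder: they need timestamps and value identity, which this event format deliberately leaves out.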
Gnames shows you all your global data; variable names are written out,
color coded by their type. If you click on a variable name, up pops a
window showing that variable's details. Bugs and limitations:
- gnames should continue to support interaction after a program terminates,
so you can view variable state posthumously.
- gnames should (maybe) issue a breakpoint if a non-null variable
changes type.
- gnames should (maybe) highlight variable assignment and dereferencing,
for example flashing black (or white) for a brief time
lecture 15
vars is a local variable visualizer; it shows each activation record in a
manner similar to gnames. There is a strong scalability limit here which
vars does not solve; some programs it depicts well, others it does not.
It is more proof of concept/demonstration than finished and working tool.
Also, at present it has bad bitrot.
Under the Covers of the evinit library
EvInit(av) and EvGet(mask) are not always entirely what they seem.
They live in evinit.icn
and have some features tailored to allow multiple monitors to share
the observation of a program execution, which we will discuss in detail
in a couple more lectures. The main thing for you to know for today is:
EvInit() checks if the monitor's &eventsource is already initialized (by
a parent monitor who could pre-assign the value of &eventsource), and
if so, it does not load anything, it just requests events from its
&eventsource.
We might want to develop a similar architecture for windows!
Monitors that use 2D or 3D graphics might want to check and see
if their &window is already set, and if so, just draw to it
instead of opening a new window. This would allow a GUI for a
debugger or multi-visualization tool to allow independently-compiled
visualizations to "plug in". Of course, for it to work well, such
a model would need to cover how to handle window resizing, and how
to handle input by various tools. Subwindows, and subwindow resizing,
are more or less adequate to this task.
Monitor Coordinators (Chapter 12)
Basic premise: the Alamo architecture is intended to reduce the difficulty
of writing monitors. Monitors are easier to write if they are simpler and
smaller, and look for specific behaviors. But, we want to be able to
monitor several aspects of behavior for a given execution, and potentially
we want to look for interactions between behaviors. A monitor coordinator
is a monitor that hosts the execution of the target program under the
observation of multiple monitors.
Eve
The reference implementation monitor
coordinator is called Eve (eve.icn).
Eve is probably my last remaining "old Icon GUI" program, and needs
to be rewritten using the modern GUI class library.
It also looks like it has never been run on Windows. :-(
Eve configuration
Eve reads in a list of monitors from a ~/.eve file in the format:
"title" command line
For example:
"Line Number Monitor" /home/jeffery/tools/piano
"UFO" /home/jeffery/tools/ufo
"Algae" /home/jeffery/tools/algae
"Big Algae" /home/jeffery/tools/algae -func -op -step -S 48
"Memory bar chart" /home/jeffery/tools/barmem
"Global variables" /home/jeffery/tools/gnames
"Local Variables" /home/jeffery/tools/vars
"Lists" /home/jeffery/tools/tinylist
"Minimemmon" /home/jeffery/tools/mmm
"Miniloc" /home/jeffery/tools/miniloc
"Scat" /home/jeffery/tools/scat
"String scanner" /home/jeffery/tools/ss
From this datafile, eve draws an opening window that allows selection
of which monitors you want to run (selectEMs).
Eve's Global State
- unioncset
- cset mask that is the union of all monitor masks
- EventCodeTable
- table of lists; keys are event codes, values are
"list of interested monitors"
Monitor State
This thinly-veiled "class" holds eve's knowledge about the monitors it loads.
"prog" is the actual loaded program (a co-expression value), while "mask" is
the program's event mask -- what it returned from its last EvGet().
record client_rec(name, args, eveRow, prog, state, mask, enabled)
#
# client() - create and initialize a client_rec.
#
procedure client(args[])
   local self
   self := client_rec ! args
   if /self.name then stop("empty client?")
   self.prog := load(self.name, self.args) | stop("can't load ", image(self.name))
   variable("&eventsource", self.prog) := &current | stop("no EventSource?")
   variable("Monitored", self.prog) := &eventsource | stop("no Monitored?")
   /self.state := "Running"
   /self.mask := ''
   /self.enabled := E_Enable
   return self
end
Initialization
After selecting monitors to run, eve has to load them all, and then
activate them all, running them up until their first EvGet() call.
Their EvInit's will be disabled by eve's having already set their
&eventsource. After their first EvGet() call, eve will register
them on the "list of interested monitors" for each of the event
codes in their mask.
every i := 1 to *clients do
   clients[i].mask := @ clients[i].prog
Event Forwarding
event(code, value, recipient) - sends a (monitoring framework) event,
where code defaults to &eventcode and value defaults to &eventvalue.
In retrospect, this is a poor choice of function names. Note that
event() allows any value to be sent, not just what the EM requested
in its event mask, and not even limited to 1-letter string codes.
Eve's Main Loop
procedure mainLoop()
   while EvGet(unioncset) do {
      #
      # Call Eve's own handler for this event, if there is one.
      #
      (\ EveHandlers[&eventcode]) ()
      #
      # Forward the event to those EM's that want it.
      #
      every monitor := !EventCodeTable[&eventcode] do
         if C := event( , , monitor.prog) then {
            if C ~=== monitor.mask then {
               while type(C) ~== "cset" do {
                  if C === "abort" then fail
                  #
                  # The EM has raised a signal; pass it on, then
                  # return to the client to get his next event request.
                  #
                  broadcast(C, monitor)
                  if not (C := event( , , monitor.prog)) then {
                     unschedule(monitor)
                     break next
                     }
                  }
               if monitor.mask ~===:= C then
                  computeUnionMask()
               }
            }
         else {
            unschedule(monitor)
            }
      delay(6 < delayval)
      }
end
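The shape of this loop (run each monitor to its first request, index monitors by event code, forward each event, and rebuild the index when a monitor changes its mask) can be imitated with Python generators standing in for co-expressions. Everything here is an analogy under stated assumptions: counting_monitor, coordinate, and the single-letter event codes are invented for the sketch, and signals, Eve's own handlers, and the delay are omitted.

```python
def counting_monitor(mask, seen):
    """A toy monitor: requests events in `mask` and records what it receives."""
    while True:
        event = yield set(mask)       # analogous to EvGet(mask)
        seen.append(event)

def union_mask(masks):
    """Union of all monitors' current masks (the role of unioncset)."""
    return set().union(*masks.values())

def build_table(masks):
    """Map each event code to the monitors interested in it (EventCodeTable)."""
    table = {}
    for name, mask in masks.items():
        for code in mask:
            table.setdefault(code, []).append(name)
    return table

def coordinate(monitors, events):
    """Forward each event to the monitors whose current mask contains its code."""
    # Run every monitor up to its first event request to learn its mask.
    masks = {name: next(gen) for name, gen in monitors.items()}
    table = build_table(masks)
    for code, value in events:
        if code not in union_mask(masks):
            continue                  # nobody asked for this event
        for name in list(table.get(code, [])):
            try:
                new_mask = monitors[name].send((code, value))
            except StopIteration:     # monitor finished: unschedule it
                del masks[name]
                table = build_table(masks)
                continue
            if new_mask != masks[name]:
                masks[name] = new_mask
                table = build_table(masks)
    return masks

seen_a, seen_b = [], []
monitors = {
    "a": counting_monitor({"C", "R"}, seen_a),
    "b": counting_monitor({"C"}, seen_b),
}
final_masks = coordinate(monitors, [("C", 1), ("R", 2), ("X", 3)])
```

Event "X" is dropped because it is outside the union mask; in Eve the same effect comes from passing unioncset to EvGet(), so the target never reports such events in the first place.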
lecture 16
Papers for the Rest of the Semester
Timeslots:
- Apr 3 - Nathan, The Paradox of Software Visualization
- Apr 8 - J Al Gharaibeh, X3D for Software Visualization
- Apr 10 - Wilder, Visualizing Dynamic Memory Allocations
- Apr 15 - Wilder, Software Systems as Cities; Eklund, 3D Representations
- Apr 17 - Z Sharif, Omniscient Debugging, P Nathan, Alg. Anim. using Shape Analysis
- Apr 22 - H Bani Salameh, Memory Graphs; Eklund, Infoviz Using Game Engines
- Apr 24 - J Al Gharaibeh, Visualizing Live Software Systems in 3D, H Bani Salameh, KScope
- Apr 29 -
- May 1 - Z Sharif, Evolve
How many papers do we have time to discuss? Let's have
each person present 2. There are really many sources for
software visualization research papers, but let's say that
the main ones are ACM SOFTVIS and IEEE VISSOFT. These
every-other-year conferences were in lock-step for a while,
but may have moved to alternating years from each other
so that there is a software visualization conference each year
(how nice).
From OOPSLA 2007
From VISSOFT 2007
From the SOFTVIS 2006 conference
- X3D Software Visualization, by Anslow, Marshall, Noble, and Biddle (presented by Al Gharaibeh)
- Metaphor-Based Animation of OO Programs, by Sajaniemi, Byckling, and Gerdt (presented by Jeffery)
- Visualizing Live Software Systems in 3D, by Greevy, Lanza, Wysseier (presented by Al Gharaibeh)
- Mondrian: An Agile Information Visualization Framework, by Meyer, Girba, and Lungu
From the SOFTVIS 2005 Conference
From VISSOFT 2005
- The Paradox of Software Visualization, by Reiss (presented by Paul)
From the SOFTVIS 2003 Conference
- 3D Representations for Software Visualization, by Marcus, Feng, and Maletic (presented by Eklund)
- EVolve: An Open Extensible Software Visualization Framework, by Wang, Wang, Brown, Driesen, Dufour, Hendren, and Verbrugge (to be presented by Ziad)
- A System for Graph-Based Visualization of the Evolution of Software, by Collberg, Kobourov, Nagra, Pitts, and Wampler (presented by Jeffery)
From VISSOFT 2003
- Techniques for Reducing the Complexity of Object-Oriented Execution Traces, by Hamou-Lhadj and Lethbridge
- ADG: Annotated Dependency Graphs for Software Understanding
- Source Viewer 3D (sv3D): A System for Visualizing Multi Dimensional Software Analysis Data
- MetaViz Issues in Software Visualizing Beyond 3D, by Rilling, Wang, and Mudur
- KScope: A Modularized Tool for 3D Visualization of Object-Oriented Programs, by Davis, Pestka, and Kaplan (Bani Salameh)
From the Dagstuhl seminar, May 2001
- Visualizing the Execution of Java Programs, by Wim De Pauw, et al
- Visualizing Memory Graphs, by Zimmermann and Zeller (Bani Salameh)
From Software Visualization: Programming as a Multimedia Experience
- A Menagerie of Program Visualization Techniques, by Jeffery
- Algorithm Animation Using Interactive 3D Graphics, by Brown and Najork
- ZStep95: A Reversible, Animated Source Code Stepper
- Visualization of Dynamics in Real World Software Systems, by Kimelman et al
From IEEE Visualization 94
- Strata-various: multi-layer visualization of dynamics in software system behavior, by Kimelman et al
From the 6th New Zealand CHI conference
Semester Project Topic Ideas
The perfect semester project would be a tool that...
- is actually useful, to someone
- is usable on any (Unicon) program; is useful on programs having some
common property X
- does some actual analysis of the events to extract higher level semantic
information
- is scalable; can be run on at least medium sized programs, and preferably
large ones
- depicts information in a way that is easily and rapidly interpreted
correctly by ordinary humans; contains legends or axes or metaphors
or a help system that enables users to understand what they are looking at
Where to get your ideas:
- Previous homework assignments suggested many possible projects
that looked interesting but were too hard to attempt as a HW
- Your own intuitions about what ought to be possible to visualize
- Your readings of the research papers
lecture 17
Final Project Demos
In class in the scheduled final examination period, Monday May 5, 3-5pm.
Each student will have around 20 minutes including setup and tear-down.
Graphic Design of the Day
A note on lying in charts and graphs;
Thoughts on visualizing large-ish trees in 2D and 3D.
Tool of the day: redconv
Redundant conversion catcher. Even if conversions are not redundant,
they may be an indicator of a bug or a performance problem. When is a
conversion "unhealthy"?
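A first cut at such a catcher just counts, per source location, the conversions whose input already has the requested type. The Python sketch below assumes that this is what "redundant" means here and uses a hypothetical (site, from_type, to_type) event tuple; redconv itself may define and detect redundancy differently.

```python
from collections import Counter

def redundant_conversions(events):
    """Count conversions whose source type already equals the target type.

    `events` is an iterable of (site, from_type, to_type) tuples; the result
    maps each source location to its number of redundant conversions.
    """
    counts = Counter()
    for site, from_type, to_type in events:
        if from_type == to_type:
            counts[site] += 1
    return counts

# Hypothetical trace with one real conversion and two redundant ones.
trace = [
    ("parse.icn:12", "string",  "integer"),   # a real conversion
    ("parse.icn:30", "integer", "integer"),   # redundant
    ("parse.icn:30", "integer", "integer"),   # redundant again, same site
]
report = redundant_conversions(trace)
```

Counting real conversions per site in the same way would be one starting point for the "unhealthy" question: a site that converts the same kind of value back and forth many times is a candidate performance problem even though none of its conversions is redundant.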
Reading assignment for today's lecture
Generally, after you pick your paper and dates, we need to pass out the
reading assignments ahead of time, with either hyperlinks or printed
copy of what is to be read. So for example, we have two papers so far
assigned that Ziad will be presenting. Also: for
each paper/presentation there are some specific questions I'd like you
to think about:
- What data domain(s) is the described system able to observe?
- What analysis does the described system perform?
- What visualization or novel data presentation techniques are employed, if any?
Rube
- idea: users should develop their own (visual) metaphors.
- 3d, web-based
- Separate geometry from inter-object semantic relations
- Model Fusion Engine merges object geometry and dynamic behavior models
into a 3D scene (VRML scene file).
- generates X3D
Rube methodology
- choose system to be modeled
- select structural and dynamic behavioral model types
- choose a metaphor
- define mappings/analogies
- create model
Example: a lightbulb is to be modeled. A finite state machine is chosen
to model the bulb. S1=disconnected, S2=off, S3=on.
For each different dynamic model type, there may be any number of defined
visual metaphors, or a programmer may wish to create a new one. A "water
tank" metaphor for a finite state machine would "fill the tank" of whichever
state the machine is in, and the water would be pumped over to a different
tank whenever a transition to a new state occurs.
In a gazebo metaphor, a person would indicate the state, and a transition
would be depicted by that person walking.
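The lightbulb example and the water tank metaphor can be sketched as a small state machine with a text rendering. The three states are from the example above; the input names (plug_in, switch_on, and so on) and the transition table are assumptions made for this Python sketch, since the notes do not list the transitions, and the text "tanks" stand in for Rube's 3D geometry.

```python
STATES = {"S1": "disconnected", "S2": "off", "S3": "on"}

# Transition table: (state, input) -> next state. These inputs are assumed.
TRANSITIONS = {
    ("S1", "plug_in"): "S2",
    ("S2", "switch_on"): "S3",
    ("S3", "switch_off"): "S2",
    ("S2", "unplug"): "S1",
    ("S3", "unplug"): "S1",
}

def step(state, symbol):
    """Return the next state, staying put on inputs with no transition."""
    return TRANSITIONS.get((state, symbol), state)

def tanks(state):
    """Water-tank view: the tank of the current state is full, others empty."""
    return " ".join(
        f"[{'~~~' if s == state else '   '}]{STATES[s]}" for s in STATES
    )

state = "S1"
frames = [tanks(state)]
for symbol in ["plug_in", "switch_on", "switch_off"]:
    state = step(state, symbol)
    frames.append(tanks(state))
```

The point Rube makes is visible even at this scale: step() is the dynamic model, tanks() is the metaphor, and swapping in a gazebo rendering would not touch the transition table.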
Rube Summary
-
There are benefits to a visualization system that supports 3D models and
external tools. The benefits include richer, reusable visual metaphors, and
better portability.
-
lecture 18
Quote of the Day
The fastest way to a million-line program is through the Clipboard.
Copied code is like cancer for software.
Graphic Design of the Day: Kiviat Diagrams
One way to represent many-dimensioned data is to lay out the dimensions
around a circle; the 2D shape (and its degree of circularity or lack
thereof) tell you something about which dimensions are interesting.

Kiviat diagram for software quality. Source: geeks with blogs, via google image
Kiviat diagrams are easy to criticize. There are problems with the relative
scales of dimension; do you reduce them all to 0.0-1.0 ranges, or not? There
are problems identifying normal or acceptable ranges of values. There are
problems that adjacent dimensions don't really have any more connection with
each other than remote dimensions, but the Kiviat makes them look like they
do. The area inside the Kiviat shape is really meaningless.
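Two of these criticisms are easy to demonstrate numerically: the choice of scaling, and the fact that the enclosed area depends on the arbitrary order of the axes. The Python sketch below is a minimal illustration with made-up metric values; normalize, kiviat_points, and polygon_area are names invented for the sketch.

```python
import math

def normalize(values, ranges):
    """Scale each value to 0.0-1.0 using its own (low, high) range."""
    return [(v - lo) / (hi - lo) for v, (lo, hi) in zip(values, ranges)]

def kiviat_points(radii):
    """Place dimension i at angle 2*pi*i/n, at distance radii[i] from the center."""
    n = len(radii)
    return [(r * math.cos(2 * math.pi * i / n), r * math.sin(2 * math.pi * i / n))
            for i, r in enumerate(radii)]

def polygon_area(points):
    """Shoelace formula for the area of a simple polygon."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2

scaled = normalize([5, 50], [(0, 10), (0, 100)])   # both become 0.5

radii = [1.0, 0.2, 1.0, 0.2]        # four normalized metrics
reordered = [1.0, 1.0, 0.2, 0.2]    # same data, axes in a different order
area_a = polygon_area(kiviat_points(radii))
area_b = polygon_area(kiviat_points(reordered))
```

The same four values enclose an area of 0.4 in one axis order and 0.72 in the other, so the area says more about the layout than about the data.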
HW#5 statii
I have still not received some of your homework #5's.
What About the Dynamic Analysis?
You-all have been too polite, perhaps, to ask the question above.
There are no doubt different definitions, but here is a paper for you
to read on the subject:
According to Ball, dynamic analysis has the following properties compared
with static analysis:
- precision of information; derived from 1+ actual program run(s)
- input-centric mentality; shows dependence of internal behavior
on particular inputs of a given execution
Ball's paper mentions two particular types of dynamic analysis, out of myriads:
- frequency spectrum analysis
- analyze frequencies of different kinds of events, e.g. to identify related
computations
- coverage concept analysis
FSA
- low-frequency operations are generally at higher-levels of abstraction
- frequency clusters -- if foo and bar are both called 1033 times, there
is probably a connection
- frequencies that match a program's input or output domain may reveal
portions of the program related to input or output.
- frequencies can tip you off regarding the big-Oh complexity of an
algorithm
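The frequency cluster idea is simple enough to sketch: count how often each procedure is called, then group procedures that share a count. This is a Python illustration over a made-up call trace, not Ball's tooling; exact-match grouping is the crudest possible clustering, and a real tool would also want near matches.

```python
from collections import Counter, defaultdict

def frequency_clusters(call_trace):
    """Group procedure names by how many times each was called."""
    counts = Counter(call_trace)
    clusters = defaultdict(list)
    for name, n in counts.items():
        clusters[n].append(name)
    # Keep only frequencies shared by two or more procedures.
    return {n: sorted(names) for n, names in clusters.items() if len(names) > 1}

# Hypothetical trace: three input lines, each read and then parsed.
trace = ["main"] + ["read_line", "parse_line"] * 3 + ["report"]
clusters = frequency_clusters(trace)
```

Here read_line and parse_line cluster at frequency 3, which also matches the size of the (hypothetical) input, and main and report cluster at frequency 1, consistent with low-frequency operations sitting at a higher level of abstraction.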
CCA
- coverage profile
- profile of what was executed (no frequency info)
- concept analysis
- (T, E), T a set of tests and E a set of program entities,
is a concept if every test in T covers all of E and no test not in T
covers all of E.
Given a (boolean) table showing all the tests and entities, Ball points out
that you can form a concept lattice, and that the concept lattice shows
control flow relationships within 1+ actual executions, analogous to the
kinds produced by control flow static analysis.
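For a small coverage table the concepts can be enumerated by brute force. The Python sketch below uses the usual formal concept analysis closure, which adds one condition to the definition above: E must also be maximal, meaning no entity outside E is covered by every test in T. The coverage table is made up, and trying every subset of tests is exponential, so this is only for toy inputs.

```python
from itertools import combinations

def common_entities(tests, coverage):
    """Entities covered by every test in `tests` (all entities if `tests` is empty)."""
    entities = set().union(*coverage.values())
    for t in tests:
        entities &= coverage[t]
    return entities

def covering_tests(entities, coverage):
    """Tests that cover every entity in `entities`."""
    return {t for t, covered in coverage.items() if entities <= covered}

def concepts(coverage):
    """Enumerate all (T, E) concepts by closing every subset of tests."""
    found = set()
    names = sorted(coverage)
    for k in range(len(names) + 1):
        for subset in combinations(names, k):
            e = common_entities(subset, coverage)
            t = covering_tests(e, coverage)
            found.add((frozenset(t), frozenset(e)))
    return found

# Hypothetical coverage table: which procedures each test executed.
coverage = {
    "t1": {"main", "parse"},
    "t2": {"main", "parse", "report"},
    "t3": {"main"},
}
lattice = concepts(coverage)
```

The three concepts found form a chain: every test runs main, the tests that run parse are a subset of those, and the tests that run report are a subset again, which hints that report is only reached by way of parse in these executions.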
More Dynamic Analyses
OK, so where do we find more examples of dynamic analysis?
Here are some of Dr. J's notions of examples of interesting dynamic analyses.
- statistical
- Summarizing data by accumulation or averaging to give the big picture.
- FSA seems to be an example of statistical analysis.
- pattern-of-interest
- parsing event patterns to find bugs, or even just to find items of
interest. Note that event pattern parsing must carefully define its domain,
skipping over events that don't affect the pattern match. Note also that
event pattern parsing will usually be done non-deterministically and maybe
in a massively-parallel model.
- higher-level-events
- one variant of the pattern-of-interest notion is to identify events at
a higher semantic level, such as aggregates of lower level events, or
application domain events
- categorization
- figuring out when a class implements a stack, or is using dynamic
programming, or whether it employs a feature for which a specialized
tool is available
- profiling; coverage
- treating hotspots and coldspots specially; for example the former deserve
extra performance tuning monitors, while the latter deserve extra
typographic paranoia monitors
lecture 19
Reading assignment
For Thursday, read Reiss' Paradox paper.
Hey, did you notice that there is an "information visualization wiki"?
Interesting...
This is a "short paper" pointing out an interesting tool with lots of
ideas to think about.
- millions of lines of unfamiliar code
- to add a feature, one must
- identify the relevant "entry points"
- read the source code
- current IDE's poorly suited to this task
- to follow the calls, one is switching constantly between files
- the source navigation tree does not show connections, does not
emphasize the files relevant to the feature under study, and
does not scale well to hundreds/thousands of files.
- no context for navigation, have to go-and-see, can't see-and-go
- idea: use dynamic call graph data to organize navigation activity
- similar to a dynamic tracing facility...but the IDE uses the data to
emphasize or structure the navigation bar to the relevant code
automatically.
- superimpose the call graph structure on the source code views
- present a perspective-wall-like view of the call graph...
- apply level-of-detail techniques; present more information for the
nearer / focus nodes where there is space for it.
lecture 20
Nate's Structure Monitor
Simple graphics, reminiscent of Playfair's classic graphic design.
Ya, it is a cheap trick, but it works.
Paul Nathan presents comments on Steve Reiss' Paradox of SV paper
- SV conference pub is a poster abstract; Finnish author has
written some other related papers.
-
Context is novice programmer education, a perpetually popular
SV area.
- Project was done as Flash animations.
- watch panel metaphor for instances
- role metaphors for member variables
- blueprint for class, found in a blueprint book;
blueprint page will visually depict methods, which don't show
on the watch panel
- workshop for method invocation, workbench for its result (lame)
- method call is also visualized as an envelope ("message passing") that
delivers parameters to the watch panel
- object references use a "pennant" metaphor; color is used to match.
No pennants = garbage to be collected
- variable roles include: fixed value, organizer, stepper, most-recent holder, one-way flag, most-wanted holder, gatherer, container, walker, follower, temporary, other
lecture 21
X3D for Software Visualization
Mondrian
Viz tools conflict: gnuplot generality of reading file formats vs.
Alamo-style run-time access to original data. Mondrian sez: instead
of moving the data to the viz tool, move the visualization tool to the
data. Provide not a file format, but an interface and allow a declarative
script to specify the visualization. Work directly with the objects in the
data model. Let the programmer visualize what they are doing in their
environment/tools. Smalltalk-based tools trying to be relevant to a
non-Smalltalk world.
Challenges for InfoVis Engines
vis. engine should be domain independent
visualizations should be composed from simpler parts
visualization should be definable at a fine grained level
instance-based, not type-based; sometimes different instances
of the same type play different roles
minimize object-creation overhead
vis. works off a model of a running system, but instead of
duplicating objects in the system, how about using them directly?
visualization description should be declarative
compare w/ Tango, Dance, and UFO for that matter
Other Mondrian Highlights
- Declarative Syntax which look like...
-
view nodes: model classes using: Rectangle withBorder
forEach: [:eachClass | eachClass viewMethodsIn: view]
- Screen-Filling System
- Mondrian has a lot of structures to visualize simultaneously...
And it has structures that are too wide to fit the window.
- Built on top of Moose
- You just know it has to be good.
- Interesting Mention of CodeCrawler
- "visualizations of combined metrics and structural information"
lecture 22
Visualizing Dynamic Memory Allocations
JIVE (Java Interactive Visualization Environment, Gestwicki et al)
- multiple concurrent representations
- reverse execution
- graphical queries
Major requirements:
- depict objects as environments. method calls happen inside one.
- multiple views. different granularities. detailed view and compact view.
- histories - of execution, of method interaction... show sequence or
collaboration diagrams (how do they address scalability? From Figure 1
the answer initially seems to be: they don't; from Figure 2 one answer
is, things shrink down to points). This is
not summary statistics, it is timelines and such
- forward and backward execution. state-saving model. big big logs.
- queries on the runtime state. when did a variable change; or when did
it achieve a certain value
- clear and legible
- use the stock JVM
- be able to visualize programs with GUIs!!
Graphic design: simple, relatively easy to understand, scales poorly
(minimal "visualization" involved, maximum IDE/debugger-like feel)
Analysis: hardwired, except that it supports a range of queries. What is
the query language?
Implementation: Two-process model, supports multiple threads so long as
only one runs at a time. Log file coupled with "in-memory"
execution history database. Events are able to commit and un-commit
themselves.
7 event types: static context creation, object creation, method call,
method return, exception thrown/caught, change in source line, and
change in variable value.
Stepping backward does not modify the client program, it is suspended
until you get back to the current state and move forward. (Means: you
can't modify the past, but maybe you can modify the present).
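One way to picture events that commit and un-commit themselves is the
sketch below. It is an assumption about the general technique, not JIVE's
actual code; VarChange and History are hypothetical names, and only one of
the seven event types (change in variable value) is modeled.

```python
class VarChange:
    """Event for a change in variable value; remembers enough to undo itself."""
    def __init__(self, name, new_value):
        self.name = name
        self.new_value = new_value
        self.old_value = None
        self.had_old = False

    def commit(self, state):
        self.had_old = self.name in state
        self.old_value = state.get(self.name)
        state[self.name] = self.new_value

    def uncommit(self, state):
        if self.had_old:
            state[self.name] = self.old_value
        else:
            del state[self.name]


class History:
    """The visualizer's model of the program, not the suspended program itself."""
    def __init__(self):
        self.events = []
        self.position = 0    # number of events currently committed
        self.state = {}

    def record(self, event):
        # New events may only be appended at the present: the past is read-only.
        if self.position != len(self.events):
            raise RuntimeError("step forward to the present before recording")
        self.events.append(event)
        self.step_forward()

    def step_forward(self):
        if self.position < len(self.events):
            self.events[self.position].commit(self.state)
            self.position += 1

    def step_backward(self):
        if self.position > 0:
            self.position -= 1
            self.events[self.position].uncommit(self.state)


h = History()
h.record(VarChange("x", 1))
h.record(VarChange("x", 2))
h.record(VarChange("y", 5))
h.step_backward()
h.step_backward()
past_state = dict(h.state)      # what the views would show two steps ago
h.step_forward()
h.step_forward()
present_state = dict(h.state)
```

The refusal to record while positioned in the past mirrors the remark
above: you can look at the past, but you can only extend the present.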
Queries: on program history; may return values, sets of states,
or portions of program history. Visual representation of program
states and program history means queries and results may be done
graphically. Queries vis-a-vis variables in single instances or classwide.
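The notes leave the query language open, so the following is only a guess
at what the two example queries (when did a variable change, when did it
reach a certain value) look like over a recorded history. The Change
record and both function names are hypothetical.

```python
from collections import namedtuple

# Hypothetical history record: step number, owning instance, variable, new value
Change = namedtuple("Change", "step obj name value")


def when_changed(history, name, obj=None):
    """Steps at which `name` changed; classwide unless one instance is given."""
    return [c.step for c in history
            if c.name == name and (obj is None or c.obj == obj)]


def first_reached(history, name, predicate, obj=None):
    """First change at which `name` took a value satisfying `predicate`, or None."""
    for c in history:
        if c.name == name and (obj is None or c.obj == obj) and predicate(c.value):
            return c
    return None


history = [
    Change(1, "acct#1", "balance", 100),
    Change(2, "acct#2", "balance", 50),
    Change(3, "acct#1", "balance", 70),
    Change(4, "acct#2", "balance", -5),
]

classwide = when_changed(history, "balance")
one_instance = when_changed(history, "balance", obj="acct#1")
overdrawn = first_reached(history, "balance", lambda v: v < 0)
```

The results are steps or portions of the history, which is what lets a
tool jump its views back to the step a query returns.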
No evaluation of scalability or effectiveness of using UML-like depictions.
JPDA
Earlier there were the JVMDI and the JVMPI; now there is the JPDA.
JIVE lives with whatever the JVM dishes it. JPDA includes the JDI
(Debug Interface), JDWP (Wire Protocol), and JVM TI (Tools Interface)
which replaced JVMDI/JVMPI.
"remote view of a virtual machine in the debuggee process".
theStackFrame.getValue(theLocalVariable)
... transmitted via a socket / JDWP ...
jvmti->GetLocalInt(frame, slot, &intValue)
... result transmitted back...
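The layering in that call sequence can be mimicked with a toy model. This
is not the real JDI or JVM TI: the class names are made up, JSON stands in
for the binary JDWP format, and a plain function call stands in for the
socket.

```python
import json


class BackEnd:
    """Debuggee side; the real back end would call JVM TI at this point."""
    def __init__(self, frames):
        self.frames = frames    # frame id -> list of local slot values

    def handle(self, packet):
        request = json.loads(packet)
        if request["command"] == "GetLocalInt":
            value = self.frames[request["frame"]][request["slot"]]
            return json.dumps({"value": value})
        return json.dumps({"error": "unknown command"})


class StackFrameMirror:
    """Debugger side; holds no program data, it only forwards requests."""
    def __init__(self, frame_id, slots, send):
        self.frame_id = frame_id
        self.slots = slots      # local variable name -> slot number
        self.send = send        # carries a packet to the back end, returns the reply

    def get_value(self, local_name):
        packet = json.dumps({
            "command": "GetLocalInt",
            "frame": self.frame_id,
            "slot": self.slots[local_name],
        })
        reply = json.loads(self.send(packet))
        return reply["value"]


backend = BackEnd({0: [7, 42]})
frame = StackFrameMirror(0, {"count": 1}, backend.handle)
count = frame.get_value("count")
```

The point of the sketch is that the debugger-side object is only a proxy:
every value it reports costs a round trip to the debuggee.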
lecture 23
Visualizing software as cities; 3D "visualizations" using barcharts...
lecture 24
Final Project Presentations
Next Monday 3-5pm, except Nate, who is going this Thursday.
5 students in 120 minutes, hmm, that's 24 minutes per student.
Figure you will be allowed at most 24 minutes. You can come
in under that.
Paul Nathan was kind enough to share this link.
For me, this paper is mainly eye-candy, but it is another representative of
the class of visualizations that are geared towards understanding the changes
in software over time, the same perspective the authors of the
visualizing-software-as-cities paper took. It is not the here and now of
a current execution, it is the view across the ages.
This paper says it is all about filtering techniques, which makes it
potentially important.
Execution traces are very large, and very redundant. The analysis used
in Visualization abstracts and filters before it starts drawing lines.
Figure 2 of this paper gives a nice toy example in which a tiny duplication
is removed; now scale it up many orders of magnitude.
Idea of multiplicity; how about regular expressions to describe multiplicity?
A->B*->C*D
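A minimal sketch of the multiplicity idea: collapse each run of identical
consecutive calls in a trace into one symbol. The function name and the
optional {n} count notation are my own; the "*" is used loosely, as in the
notes, to mean "repeated".

```python
from itertools import groupby


def compress(trace, keep_counts=False):
    """Collapse runs of the same consecutive call into a single marked symbol."""
    parts = []
    for name, run in groupby(trace):
        n = len(list(run))
        if n == 1:
            parts.append(name)
        elif keep_counts:
            parts.append(f"{name}{{{n}}}")   # e.g. B{3}
        else:
            parts.append(name + "*")
    return "->".join(parts)


trace = ["A", "B", "B", "B", "C", "C", "D"]
short = compress(trace)
counted = compress(trace, keep_counts=True)
```

This only handles immediate repetition of one call; repeated sequences
such as (B->C)* would need a real pattern-finding pass.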
Removing "utilities": constructors/destructors, accessor methods, utility
and library classes. Potentially many incoming edges, with few or no
outgoing dependencies.
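The fan-in/fan-out heuristic can be sketched as a filter over caller/callee
pairs. The thresholds and function names below are arbitrary assumptions
for illustration, not values from the paper.

```python
from collections import defaultdict


def find_utilities(calls, min_fan_in=3, max_fan_out=0):
    """Flag nodes with many distinct callers and few or no outgoing calls."""
    fan_in = defaultdict(set)
    fan_out = defaultdict(set)
    for caller, callee in calls:
        fan_out[caller].add(callee)
        fan_in[callee].add(caller)
    nodes = set(fan_in) | set(fan_out)
    return {n for n in nodes
            if len(fan_in.get(n, ())) >= min_fan_in
            and len(fan_out.get(n, ())) <= max_fan_out}


def remove_utilities(calls, utilities):
    """Drop every call edge that touches a utility."""
    return [(a, b) for a, b in calls
            if a not in utilities and b not in utilities]


calls = [
    ("Parser.parse", "Log.write"),
    ("Lexer.next", "Log.write"),
    ("Main.run", "Log.write"),
    ("Main.run", "Parser.parse"),
    ("Parser.parse", "Lexer.next"),
]

utilities = find_utilities(calls)
filtered = remove_utilities(calls, utilities)
```

Constructors, destructors, and accessors would more likely be filtered by
name or kind; the graph test is what catches utility and library classes.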
Polymorphic methods: execution tree differences can be ignored when the
abstract function performed is understood.
Visualizing Software Executions as Populated, Dynamic Cities
- integrate CVS logs, bug tracker, static analysis, runtime data
- do this for Unicon, with mix of available and (new, needed) tools
- push "city" metaphor much farther than in previous papers
- overcome various fatal flaws with the whole city metaphor.
Dr. J's fatal-flaw view of visualizing software as cities: many or most
(especially OO) programs are understood largely through their relationships
between classes and between instances. Software as cities doesn't
automatically manage to depict such relationships at all. It got as far
as colocating classes in the same package.
- Classes are buildings, sure
- height=# methods, width=#variables, length=(log of) longest code.
Privates below ground.
- What is the model of time?
- Today = current execution run. CVS repositories and previous execution
logs make for remembrances of things past.
- Limited ("Prince of Persia") backwards-in-time capability?
- I think limited-reversible is better than no reversible, and is
more scalable than full-reversible. Limited reversible may mean,
if you go back past a certain point, you'll not be able to see as
many details, or change the execution from that point. Assuming
we are collecting fairly detailed traces, you can go backward
farther than that in a replay-only mode.
- How to represent procedures
- treat like a class w/ 1 method. Lotta procedures = village.
- How to represent instances
- As people? Library instances as robots? Garbage as undead?
There was an idea of a Garbage Collector going around blasting
the undead while a viewer watches or helps...
- How to represent atoms
- Not at all? As text? As virtual books (strings), hammers?? (ints) and
saws?? (reals)? What about tables and lists? Records got special
treatment as people; tables and lists as bookshelves, or buses, or?
- How to represent external entities
- network connections, I/O handles, files...
- Why should one need associations in the metaphor?
- Because we are in Venice, or in hell, or in New York. Step off the
sidewalk and you are dead.
- What associations are depicted, and how?
- We need at least: inheritance, aggregation, and reference.
- How to depict inheritance and aggregation?
- aggregation = adjacency, bridges. inheritance = physical resemblance
- How to depict reference?
- boats
- What are the streets?
- In Venice, there are a few streets to handle high traffic.
- How to represent the stack
- Gradually dimming lights in buildings?
- Portals/teleporters/bridges/moving sidewalks?
- Beam of light?
In discussion, there seemed to be support for the beam-of-light
model, pointing backwards from callee to caller. Dr. J would add:
the beam of light might be a good metaphor for an instant-teleportation
feature...
- How to represent bugs and warnings
- As monsters
- How to layout buildings?
- Around an older, urban core? Minimize distance of overall call graph?
- What are ghosts?
- Remembrances of fixed bugs and deleted code
- How to present source code control structure details.
- There is the raw codesize, the extent of nesting
- How to present data details.
- Well, instances are a lot of the data, and atoms are the rest.
A prime issue here is one of aggregation. When is an object
a citizen of the world, and when is it just somebody's foot?
I guess the answer is: when referenced globally, or by two or
more other instances.
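Two items in the list above are concrete enough to sketch: the building
dimensions (height = # methods, width = # variables, length = log of
longest code) and the citizen-versus-foot rule. The function names, the
natural log, and the +1 inside the log are my assumptions.

```python
import math


def building(num_methods, num_variables, longest_code_lines):
    """Map class metrics to building dimensions."""
    return {
        "height": num_methods,
        "width": num_variables,
        # The log keeps one huge method from dwarfing the city; +1 avoids log(0).
        "length": math.log(longest_code_lines + 1),
    }


def is_citizen(obj_id, global_refs, instance_refs):
    """True if referenced globally, or by two or more other instances."""
    if obj_id in global_refs:
        return True
    referrers = {src for src, dst in instance_refs
                 if dst == obj_id and src != obj_id}
    return len(referrers) >= 2


global_refs = {"a"}
instance_refs = [("a", "c"), ("b", "c"), ("a", "d")]

dims = building(num_methods=12, num_variables=3, longest_code_lines=99)
citizens = [o for o in "abcd" if is_citizen(o, global_refs, instance_refs)]
```

Objects that fail the test (here "b" and "d") would be drawn as part of
whatever instance owns them, if they are drawn at all.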
Question: How to Make Static Analysis in Unicon Much Easier?
Suppose I want tools like the software-as-cities, and it's too much work.
Maybe the Alamo framework makes the dynamic events easy enough to grab,
but how do I make the static info easy enough to grab? The lexer and
parser for Unicon are widely available, what else do I need for this
type of project? What generic static analysis tool(s) should we invent?
What should be its model? Execution monitoring was modeled as a sequence
of events (while EvGet()). Is there a collection of
static analysis foundational data, and a set of generic operations,
that we should standardize? For example, for a hypothetical USA tool,
analysis produces a tuple (Σ, Π, Χ) where Σ is the
set of source files, Π is the Parse Tree forest, and Χ is the
control flow graph? Yeah, this is a lame start, but at least it will
allow you to tell me what should really be there.
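To give the tuple something to argue with, here is a sketch that builds
all three parts for a toy input. Python's ast module stands in for the
Unicon lexer and parser, and the "control flow graph" is deliberately
crude: only fall-through edges between consecutive top-level statements of
each function, with branches and loops ignored. Analysis, analyze, and
functions are hypothetical names.

```python
import ast
from collections import namedtuple

# sources ~ the set of source files, trees ~ the parse forest, flow ~ the flow graph
Analysis = namedtuple("Analysis", "sources trees flow")


def analyze(sources):
    """sources: dict mapping file name -> source text."""
    trees = {name: ast.parse(text, filename=name)
             for name, text in sources.items()}
    flow = {}
    for name, tree in trees.items():
        for node in ast.walk(tree):
            if isinstance(node, ast.FunctionDef):
                lines = [stmt.lineno for stmt in node.body]
                # Crude approximation: statement i falls through to statement i+1.
                flow[(name, node.name)] = list(zip(lines, lines[1:]))
    return Analysis(sources, trees, flow)


def functions(analysis):
    """One candidate generic operation: list every (file, function) pair."""
    return sorted(analysis.flow)


demo = {"demo.py": "def f(x):\n    y = x + 1\n    z = y * 2\n    return z\n"}
result = analyze(demo)
edges = result.flow[("demo.py", "f")]
names = functions(result)
```

Even this toy version raises the real design question: which generic
operations over the three parts belong in the standard set.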