Lecture Notes for CS 404/504 Program Monitoring and Visualization

Syllabus

What this Course is About

This course is a blend of program monitoring (dynamic analysis) and visualization (computer graphics). It turns out that much of the key connecting glue between monitoring and visualization comes from static analysis, the study of program properties observable from the source code.

Each week, you can expect part of the lecture material to come from dynamic analysis and part from graphics/visualization. Similarly, part of the time each week will be studying interesting work done by others, and part of the time will be engaged playing with my research infrastructure, working on software tools that will (hopefully) advance the state of the art.

Reading Assignment #1

Early History of Monitoring and Visualization according to Jeffery

Others may have more and better information, but this is my version of that subset of computing history relevant to this course.

When the computing industry reached a stage of having interactive, text-screen terminals, all kinds of new bugs became commonplace. Along with mankind's increased ability to generate bugs, a whole slew of tools and techniques was developed to understand program executions, including tracing and source-level debuggers. These tools still work, they just don't scale well. Sadly, if you look at a modern IDE, its debugging and tracing capabilities are not much improved from what was available 40 years ago. This is (I claim) because problems in monitoring and debugging are hard, and the cost of building new tools which might advance the state of the art is very high.

By the 1980's, interactive 2D graphics was ubiquitous and improving rapidly in performance. People started to use graphics to help understand program execution behavior, partly because text-only techniques did not scale well, and partly just because the graphics were available. A movie called "Sorting Out Sorting" (parts 1, 2, 3), originally presented at SIGGRAPH, made a compelling argument that graphical techniques could be valuable in teaching and understanding algorithms.

Sorting Out Sorting was done one frame at a time on truly ancient facilities. A group at Brown University (home of graphics guru Andy Van Dam, algorithms guru Robert Sedgewick and a cast of thousands) set out to replicate on interactive workstations what Ron Baecker had done a frame at a time. One result of this effort was Marc Brown's Ph.D. and related software. We will present more history in a later session.

What About Us?

Announcements

There is a bblearn for this course now. It has a HW#1 posted, but I am not so sure I like it. I may think of a better HW#1 for you, by this weekend. Check for HW#1 on Monday. In the meantime, learn some Unicon.

Unicon 101

Unicon: the Easiest Parts

Let's ssh into a test machine to live-demo the following:
Types      Control Flow
string     success vs. failure
integer    if-then-else
real       while-do
cset       calls, argument rules
list       generators
table      case-of
file       every-do
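
As a warm-up, here is a tiny self-contained sketch (mine, not from the live demo) that touches several of these at once: lists and tables, generators with every-do, case-of, and the success/failure of find():

procedure main()
   local words, t, w, i
   words := ["monitor", "visualize", "monitor", "trace"]
   t := table(0)                    # table with default value 0
   every w := !words do             # !words generates the list elements
      t[w] +:= 1
   every w := key(t) do
      case t[w] of {
         1:       write(w, " appears once")
         default: write(w, " appears ", t[w], " times")
         }
   if i := find("mon", words[1]) then   # find() fails rather than returning a sentinel
      write("\"mon\" found at position ", i)
end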

Alternate Resources for Unicon Study

None of this is assigned reading. It is here for your convenience; you know, in case you just hate the Unicon book.

Monitoring Framework Intro

An execution monitor (EM) observes events in a target program (TP). There are two-process, one-process (callback), and thread-models.
two-process model
EM and TP communicate via network sockets, pipes, or files.
one-process/callback
The TP calls the EM when an event occurs. The EM is organized as a set of callbacks, i.e. it doesn't have its own main() or control flow, it just responds to things.
thread
EM and TP are threads in the same address space, making communication far easier.
Which model do most debuggers use? The two-process model. Which model should we use for visualization tools? What is different about their requirements?
two-process model
Pros: Cons:
one-process/callback
Pros: Cons:
thread
Pros: Cons:

Graphic Design of the Day: a map

Napoleon's March into Russia: proof you can legibly plot extra dimensions atop a map. Maps have legends to explain what's on them, along with two primary dimensions which are intuitively based on actual geometry.

lecture 3

Reading for this week

HW#1 revised

Compared with the last time I taught this class, I want you to spend enough time to really learn Unicon, or rather the half of it that will be useful for writing visualization tools.

Highlights from Hirose

[Hirose97] describes research from the University of Tokyo, presented at the annual conference of the World Society for Computer Graphics.

Cheesey Movie References

What movies present topics relevant to this class, i.e. program visualization, program behavior monitoring, or virtual environments where such activities occur?

Graphic Design Principles

We need graphic design principles in preparation for visualization work. The following can be attributed to Edward Tufte, a renowned Ivy League graphic designer who has written some beautiful books.

Graphic Design of the Day: a scatter plot

A map of London by John Snow, 1854, cleaned up by John Mackenzie of the University of Delaware.

lecture 4

Mailbag

I am having trouble using the star operator on lists, *L
The size operator *L works only after L has been assigned a list value. L := []
How do I check if a string is not in my list of strings?
Well, first off, if one were doing this a lot, maybe one should use a set instead of a list; Unicon has a set type. But for occasional use on lists of reasonable size, s == !L tells you whether s is in the list L. s ~== !L is not so good: it will almost surely succeed, unless every value in L is s. Instead use not (s == !L)
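
A small sketch contrasting the two approaches (the list contents are made up):

procedure main()
   local L, S, s
   L := ["if", "then", "else", "while"]
   s := "repeat"
   if not (s == !L) then                 # occasional use: scan the list
      write(image(s), " is not in the list")
   S := set(L)                           # frequent use: build a set once
   if not member(S, s) then
      write(image(s), " is not in the set")
end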

Unicon: the next level

Let's peek at CS210 lecture notes on Unicon to see if I missed any highlights during the live demo.

Monitoring Buzzwords

Volume, dimensionality, intrusion, and access. Solve these four unsolvable problems and you've got the makings of a decent monitoring and visualization framework.
volume
if you think static analysis of source code has a lot of information the programmer may have to understand and/or deal with, wait until you see the amount of information dynamic analysis generates. Even small, short-running programs can generate millions and millions of events of interest. Monitoring and visualization tools have to filter/discard, condense/simplify, and analyze their input, turning low level data into higher-level information.
dimensionality
understanding program behavior involves many dimensions: control flow, data structures, algorithms, memory access patterns, input/output behavior... Visualizations can be selective, but often want to depict more than just 2 or 3 dimensions' worth of data even though they are using a 2D (or 3D) output device.
intrusion
The act of observing program execution behavior changes that behavior. Monitors have to minimize/mitigate this or they will be visualizing their own side effects more than the thing they purport to show. The first form of intrusion is to skew the timing of the observed behavior. Monitoring a program may also alter its memory layouts (e.g. on the stack), which might make bugs disappear (or merely exaggerate them).
access
Simple monitors might graphically depict exactly the information contained in the sequence of events that they receive, but most monitors need to ask for additional information, by accessing potentially the entire state of the program being executed.

Graphic Design of the Day: Line Plots

Multiple dimensions of weather along a primary time axis.
From the New York Times, popularized by Tufte.

lecture 5

Announcements

Unicon: Goal-Directed Evaluation

Surprised by Failure?

When to check for failure: everywhere that failure can occur, and everywhere that failure will matter. Examples:
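
A couple of hedged illustrations (the file name and search string are made up): open() failing is a failure that matters and must be checked; find() failing is routine and simply steers the control flow.

procedure main()
   local f, line, i
   f := open("input.dat") | stop("can't open input.dat")
   while line := read(f) do              # read() failing just ends the loop
      if i := find("error", line) then   # find() failing just means "not on this line"
         write("\"error\" at column ", i, ": ", line)
   close(f)
end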

Graphic Design of the Day

William Playfair's chart depicting area, population, and tax revenues of countries in Europe is another excellent example of depicting multiple dimensions of data.

The slope between the population and tax revenues points down for most countries and sharply up for England (and less so, for Spain).

Introduction to Unicon Monitoring Facilities

events
billions and billions of tiny points in time, with a tiny data payload, and the ability to easily inspect the entire program state. Event names like E_Pcall or E_Lbang
event keywords
&eventcode and &eventvalue
built-atop co-expression data type
threads that take turns. AKA coroutine, goroutine, or co-operative or synchronous thread.
the VM is instrumented for you
asymmetric coroutines. VM C code sends events to monitors written in Unicon

lecture 6

Introduction to Unicon Monitoring Facilities, Part 2

built-in function EvGet(c)
c is an event mask. Activates &eventsource (Monitored) to get next event
link evinit
library function EvInit(argv) loads program

Writing your first Unicon monitor

Consider the beauty and virtue of m0.icn, m1.icn and events.icn. Now check out sos.evt.
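
For flavor, here is a minimal monitor in the spirit of m0.icn; this is my sketch, not the actual file. It just counts procedure activity events by event code.

$include "evdefs.icn"
link evinit

procedure main(av)
   local count, c
   EvInit(av) | stop("can't monitor ", image(av[1]))
   count := table(0)
   while EvGet(ProcMask) do        # ProcMask: the procedure activity events
      count[&eventcode] +:= 1
   every c := key(count) do
      write(image(c), "\t", count[c])
end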

Notes from Past Students' Unicon Program Visualizations

HW#1, clean co-expressions sample solution

This version is based on Mike Wilder's HW#1 solution, because it had some interesting and valuable properties.

Visualization Principles (book section 3.1.2)

animation
incremental algorithms are a primary means of achieving efficient animation. instead of minimizing ink, this is like minimizing the motion of the plotter arm, or in our case, the # of memory writes.
least astonishment
use the golden rectangle, labels and legends
metaphors
use a familiar metaphor
interconnection
connecting different pieces of data is key, follow Playfair's example
interaction
the big difference between a visualization and a paper chart or graph is that the user can interact with the data. exploit this.
dynamic scale
visualizations compete for screen space and hardware varies widely. it is extra work, but if you write everything so that it scales, your visualization will be useful on more machines and in more ways.
static backdrop
one of the best ways to make dynamic data understandable is to present it in terms of static data. An execution is an instance of the underlying universal abstract thing that is the program.

HW#2 Results

Notes on HW#2 Code

overall, submissions were fantastic
main(av)
av is always a list of strings; if no arguments, *av = 0
paramnames() is a generator
use it with every, or ask questions like "if type(x:=paramnames(...))=="list" then..."
consider the virtues of the apply operator p ! L
consider the virtues of every maxval <:= !L
isn't max() a built-in function at this point? maxval := (max ! L) (see the sketch after these notes)
failure and success
say "if i := find() then ...", not: "i := find(); if \i then ..."
check for open() failure
I asked nicely before, now I am telling you
sticking &fail at the end of a routine is a noop
a routine fails if it falls off its end; &fail does not return a failure. Unlike Lisp, the return value of a function is not its final expression's evaluation.
some folks used the structure() function
please be nice and tell folks how it was useful here
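
To make the two maximum-finding idioms above concrete, here is a hedged sketch (maxof is a made-up name; the second write assumes Unicon's built-in max()):

procedure maxof(L)
   local maxval
   maxval := L[1] | fail          # fail, rather than crash, on an empty list
   every maxval <:= !L            # <:= assigns only when the right operand is larger
   return maxval
end

procedure main()
   write(maxof([3, 1, 4, 1, 5]))  # 5
   write(max ! [3, 1, 4, 1, 5])   # same result via the apply operator
end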

Graphic Design of the Day

Fisheye Views. Read Furnas' Generalized Fisheye Views paper.

Monitoring Feature of the Day

Monitoring Location (pianoroll, tiny one-pixel-per-char views of source code).

lecture 7

Announcement

If any of you are proficient Flash programmers and would like to make some quick cash, Scott Lynch at "The Beach" in Moscow has a small job he wants done, his number is 208-794-2354.

Suspects, Tools, and Big Programs

As we proceed into the "meat" of the course, we have a need for lots of subject programs to study, lots of example monitors, and bigger programs that presumably will have more complex behavior.
Suspects
This directory was compiled by Ralph Griswold as a collection of interesting or weird programs whose behavior could be understood by program visualization. The good part of the Suspects directory is that the programs all run non-interactively, in some cases they were modified to do so, and those that require input have sample .dat files on which they run nicely. This lets monitors do their thing unimpeded. We should probably add some representative object-oriented programs to this collection this semester. I probably can dig out my "gui recorder" and create recordings of GUI programs so that we can monitor them conveniently in this context.
tools
This directory was compiled by Clinton Jeffery as a collection of simple program visualization programs and library procedures. Many of these codes are featured in the book, Program Monitoring and Visualization.
Big Programs
The largest programs in the suspects directory are typeinfer (2.6k lines) and yhcheng (1.9k lines). These were considered large in the Icon language, where source codes are typically 1/3 to 1/10 the size of C programs that do the same thing. The other largest public domain Icon programs are in the ipl/*packs directories. Among these, ibpag2 is 3.7k lines, itweak is 3.5k lines, skeem is 3.1k lines, ged is 3.6k lines, htetris is 4.3k lines, vib is 4.4k lines, and weaving is 11.3k lines (?). Monitoring these might or might not be easy, since they may be interactive, and you might or might not know what to click at them in order to get them to behave. The largest known Icon program (source not available) was Bill Wulf's test-case generator (rumored to be on the order of a half-million lines, perhaps machine-generated).

In the Unicon language, programs are far larger on average. The unicon translator itself is 10k lines of Unicon. The uni/lib class library is 20K lines, and the uni/gui GUI class library is 14.5K lines; large subsets of these libraries may be added onto whatever the tool size is. The Unicon IDE is 17K lines, the IVIB user interface builder is 16K lines, and so on. Some of these you can actually monitor.

The largest Icon/Unicon programs for which I have source code include the SSEUS database review/update system (35K lines), and a Knowledge Representation language and system (50K lines) done by an AT&T scientist. It might be possible to find these and monitor them, but it would take work to set them up for monitoring.

Unicon feature of the day: Packages

Packages were added to Unicon more or less against my will, but they are obviously of growing importance in larger-scale development. Packages are about protecting a name space from collisions. Without them, global variables in all modules are shared, and these variables may accidentally conflict with globals (and undeclared, thought-to-be locals!) in other modules. The more libraries you use, the more inevitable these conflicts become. Proof that packages are needed is evident in the Icon Program Library: after fundamental built-in functions like "type" were accidentally assigned one too many times by client code, Ralph Griswold got in the habit of protecting "type" and similar built-in functions the hard way, inside each library procedure that uses them:
   static type
   initial type := proc("type", 0)	# protect attractive name
This gets old in a hurry, and it actually bloats code a little bit.

So anyhow, Robert Parlett implemented packages, and I accepted them, and now they are here to stay, and they aren't bad. You do have to know the "package" and "import" keywords, and the ::foo syntax, and that is about it.
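
A minimal hedged sketch (the package and procedure names are made up); the point is just that area no longer collides with anybody else's global named area:

# file geometry.icn
package geometry

procedure area(w, h)              # visible outside the package as geometry::area
   return w * h
end

# file main.icn
import geometry

procedure main()
   write(geometry::area(3, 4))    # writes 12
end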

MiniLoc

miniloc.icn is a "miniature location profiler" as discussed in the purple book. What is mini about it is that each source code line and column is one pixel row and column (this is a scaling problem for larger programs, miniloc could be rewritten to scale its graphics). The frequency of location events at various locations is recorded using a log scale through a range of colors from boring to red-hot. Humans don't really perceive red as a larger # than green, but the metaphor of a temperature map is widely recognizable anyhow.

HW#3

Go ye and write a source-location-oriented visualization of "something interesting". If it is interesting enough, we may write it up and submit it to a conference.

lecture 8

Hani's Clever Case Tag

Case expressions in Icon use === semantics, looking for an exact match with no type conversions. Case branches are evaluated sequentially as if one were writing
  if x === firstbranchexpr then firstcodebody
  else if x === secondbranchexpr then secondcodebody
  else if x === thirdbranchexpr then thirdcodebody
  ...
If all the branch labels are constants, this is colossally inefficient compared with a C switch statement. But, it is fully general and you can use arbitrary expressions, including generators, for which the entire result sequence will be generated in trying to find a match.

You can add a predicate filter on the front, or have your values supplied from subroutines, or whatever:

   case x of {
   p() & q() & foo: { ... }
   a | b | 1 to 10 | f(): { ... }
   }

This afternoon we will see examples that use this generator capability with cset event masks, as in the following; it would also work with sets, table keys, or any other generator you wanted to write.

case x of {
   ...
   !ProcMask: {
      }
   ...
   }
This makes for short elegant code, but it is inefficient. Generating the individual elements out of a cset costs a type conversion (cset to string) which isn't cheap, and all generators pay for extra bookkeeping on the stack, for that suspend/resume capability, which is slow at times. You are paying for convenience and generality, and a good optimizing compiler might make some of that go away, but the VM sure does not. In a couple of minutes we will see another measure of how much you pay. But in the meantime...

Hani showed me some code this afternoon that looked like

case x of {
   ...
   member(a_set, x): {
      }
   ...
   }
This has probably been done before, but it surprised me so we should talk about it. member(a_set, x) tests whether x is a member and returns x if it is, so it is just a filter, and by the way it avoids a linear search via a generator, so it is fast. It has a seemingly redundant comparison of x === x after the member() test succeeds, but that is C code and probably not too bad.

Monitoring Procedure Activity

Procedure activity is a special case of the control flow behavior of expression evaluation. In a normal language monitoring procedure activity would mean monitoring the stack of procedure activation records, or in a multithread context, monitoring a set of stacks of procedure activation records. You would perhaps observe how deep the stack gets (usually not a problem) and might look for patterns that suggest bugs (Q: Can anyone think of a call-return sequence that suggests a bug?). Besides correctness, you might imagine looking for performance problems or tuning opportunities.

Monitoring Icon and Unicon is a little more complicated because procedures can suspend and be resumed. The events for this behavior are E_Pcall, E_Psusp, E_Presum, E_Pret, E_Pfail, E_Prem. The "call stack" becomes a "call tree", or as section 8.1 in the text calls it, an activation tree (a better term since procedures can be activated by more than just calls).

You can just ask for all the procedure activity events, but if your monitor is doing more than just counting them then it potentially will need to do more. One way to monitor the activation tree is to build a model of the tree itself. You can do this by hand, or your monitor can use a library procedure named evaltree() which does it for you. (Study in detail the implementation of evaltree on p88-89). We will look at examples that use evaltree, but first a word on timing.

The time cost of monitoring

Monitoring costs time. If it costs too much, folks won't want to do it even if you do make pretty moving pictures (successful program visualizations). The instrumentation of all those events costs time even if you don't ask for the event reports, and the event reports (co-expression switches) cost time. It is difficult to even measure the timings of different parts of the monitoring process. You may be able to do a good job by going into the VM C code and using your own expertise, or using specialty tools for doing timing, such as gprof. This discussion is just based on what I can observe casually.

Example. In the Suspects/ directory are many candidates (which one runs the longest?). We will consider the poetry scrambler for this example.

time ./scramble <scramble.dat
uses the UNIX time(1) command to measure the runtime externally. It reports something like:
1.0u 0.0s 0:03 32% 0+0k 0+0io 0pf+0w
That's 1.0 seconds of user time, 0.0 seconds of system time, 3 seconds of wall-clock observed time. Out of curiosity, since it writes out a lot to standard out, I retime it directing output to /dev/null, and it still takes a second of user time, but the wall clock is down to 1 second.

Now I take an almost-empty monitor, timer.icn, and time it using the UNIX utility.

time timer ./scramble <scramble.dat
and it writes out
tp time: 1830-0=1830
em time: 0-0=0
1.0u 0.0s 0:03 30% 0+0k 0+0io 0pf+0w
What is this telling me? The time function hasn't seen any extra time spent by the monitor (that's odd; that's bad). The monitor thinks it has spent no time, but that the program is spending 1.8 seconds. Which times are more accurate?

One problem with measurement is that accuracy is limited by the tools of observation and hardware/OS limitations. Another problem is that external environmental considerations (load average, user activity) change results to some extent. These measurements were done on mars.cs.uidaho.edu, a SPARC Solaris machine. The "who" command reported 5 different people logged in at the time, although the load average was apparently low (inactive terminal sessions).

Now, suppose Ziad says monitoring the evaltree is slow. Why might that be?

It would be useful to know whether the co-expression switch totally dominates the time spent in the monitor. Although our intuition says it does, intuition is not always correct. Evaltree costs: a big case statement (not very efficient in Icon/Unicon), whose labels are generators (not very efficient), whose code bodies do allocations and list operations (pretty darned fast), and which calls the monitor callback procedure. One way to do our experiment is to measure &time before and after each EvGet(), and instead of measuring time spent in the target program, measure the other time, the time spent in the monitor. Another way to do the experiment is to rewrite the evaltree() functionality for speed instead of clarity, and see if it is measurably different or not.
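
A rough sketch of the first approach; this is not the actual evaltime.icn, and the bookkeeping names are made up. Time spent inside EvGet() is charged to the target program (plus the co-expression switch); everything else is charged to the monitor.

$include "evdefs.icn"
link evinit

procedure main(av)
   local tptime, t0, start
   EvInit(av) | stop("can't monitor")
   tptime := 0
   start := &time
   while (t0 := &time) & EvGet(ProcMask) do {
      tptime +:= &time - t0       # time spent inside EvGet()
      # ... the monitor's per-event work would go here ...
      }
   write("tp time: ", tptime)
   write("em time: ", &time - start - tptime)
end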

Compare evaltime.icn, evaltime2.icn, evaltime3.icn, showing an attempt to do this experiment.

time evaltime ./scramble <scramble.dat
shows
tp time: 2760--10=2770
em time: 6670-0=6670
10.0u 0.0s 0:18 55% 0+0k 0+0io 0pf+0w
Using evaltree, the monitor is accounting for more than two-thirds of the time, and the time reported for the target program is much higher than in the unmonitored or empty-monitor cases. evaltime2, which skips the evaltree mechanism but uses a big case statement, gives:
tp time: 2490-0=2490
em time: 2660-0=2660
5.0u 0.0s 0:08 61% 0+0k 0+0io 0pf+0w
Cost of monitoring is substantially lower, although the particular details may be affected by machine load fluctuation. One would have to run several times and take averages for the numbers to be meaningful. Using evaltime3, which avoids the large case statement, we get
tp time: 2580-0=2580
em time: 2050-0=2050
5.0u 0.0s 0:07 70% 0+0k 0+0io 0pf+0w
At this point, monitoring procedure activity is seen to impact execution time substantially, but at least the monitor is taking less time than the target program. Where is the co-expression time being charged here?

Many Morals of the story:

scat

The scat program is the first application of evaltree in the purple book. It links in a scatterplot library which might or might not be useful to you. It implements the log scaling that scat uses.
$include "evdefs.icn"
link evinit
link evaltree
link scatlib
Scat uses several global variables, three tables to remember what has been plotted, and three clones set with different colors.
global	at,   # table: sets of procedures at various locations
	call, # table: call counts
	rslt, # table: result counts
        red,
        green,
        black
Scat uses a generic evaltree-compatible record type for modeling; no extra payload added.
record activation (node, parent, children)
The initialization is straightforward.
procedure main(av)
   local mask, current_proc, L, max, i, k, child, e

   EvInit(av) | stop("can't monitor")

   scat_init()
   red := Clone(&window, "fg=red")
   green := Clone(&window, "fg=green")
   black := Clone(&window, "fg=black")

   current_proc := activation(,activation(,,,,[]),[])
Control is handed over to evaltree, which calls scat_callback with events
   evaltree(ProcMask ++ FncMask ++ E_MXevent,
	    scat_callback, activation)

   WAttrib("label=scat (finished)")
   EvTerm(&window)
end
scat_callback mostly calls scat_plot, which calls colorfor to decide what color to plot with.
procedure scat_callback(new, old)
   case &eventcode of {
      E_Pcall:
	 scat_plot(new.node, 1, 0, , colorfor)
      E_Psusp | E_Pret:
	 scat_plot(old.node, 0, 1, , colorfor)
      E_Fcall:
	 scat_plot(new.node, 1, 0, , colorfor)
      E_Fsusp | E_Fret:
	 scat_plot(old.node, 0, 1, , colorfor)
      E_MXevent: {
         case &eventvalue of {
	    "q" | "\033": stop("terminated")
	    &lpress : {
	       repeat {
	          scat_click(proced_name)
		  if Event() === &lrelease then
		     break
		  }
	       }
	    }
	 }
      }
end
Procedure proced_name returns the name of a procedure, taken from its image.
procedure proced_name(p)
   return image(p) ? {
      [ =("procedure "|"function "), tab(0) ]
      }
  stop(image(p), " is not a procedure")
end
Procedure colorofone distinguishes procedures from functions.
procedure colorofone(p)
  return if match("procedure ", image(p))
	 then red else green
end
Procedure colorfor examines a list of procedures/functions to decide what color to plot with. If the values do not all agree on a single color choice, it resorts to black; it returns red or green only when every value calls for red, or every value calls for green.
procedure colorfor(L)
   if *L = 0 then return &window
   every x := !L do {
      if not (/c := colorofone(x)) then
	 if colorofone(x) ~=== c then
	    return black
      }
   return c
end

What is scat good for?

scat is cooler than you think. It shows not just who the hot procedures are, it also shows what procedures always fail, what procedures generate lots of results per call, and what procedures (predicates) generate between 0 and 1 result per call.

algae

The flagship demonstration of the evaltree framework is a literal visualization of the activation tree.
   EvInit(av) | stop("Can't EvInit ",av[1])
   codes := algae_init(algaeoptions)
   evaltree(codes, algae_callback, algae_activation)
   WAttrib("windowlabel=Algae: finished")
   EvTerm(&window)
Algae takes command line options to say how much to monitor, how to graphically depict the tree, etc. It deliberately chooses a simple-minded incremental graphic, coming from a time when graphics performance was deemed a likely monitor bottleneck. By default it uses hexagons for activation records (compare hexagons with a square grid). A real but still INCREMENTAL tree layout algorithm would be better.
procedure algae_init(algaeoptions)
   local t, position, geo, codes, i, cb, coord, e, s, x, y, m, row, column
   t := options(algaeoptions,
	   winoptions() || "P:-S+-geo:-square!-func!-scan!-op!-noproc!-step!")
   /t["L"] := "Algae"
   /t["B"] := "cyan"
   scale := \t["S"] | 12
   delete(t, "S")
   if \t["square"] then {
      spot := square_spot
      mouse := square_mouse
      }
   else {
      scale /:= 4
      spot := hex_spot
      mouse := hex_mouse
      }
   codes := cset(E_MXevent)
   if /t["noproc"] then codes ++:= ProcMask
   if \t["scan"]   then codes ++:= ScanMask
   if \t["func"]   then codes ++:= FncMask
   if \t["op"]     then codes ++:= OperMask
   if \t["step"]   then step := 1
   hotspots := table()
   &window := Visualization := optwindow(t) | stop("no window")
   numrows := (WHeight() / (scale * 4))
   numcols := (WWidth() / (scale * 4))
   wHexOutline := Color("white") # used by the hexagon library
   if /t["square"] then starthex(Color("black"))
   return codes
end
The real work happens in algae_callback()
procedure algae_callback(new, old)
   local coord, e
   initial {
      old.row := old.parent.row := 0; old.column := old.parent.column := 1
      }
   case &eventcode of {
      !CallCodes: {
	 new.column := (old.children[-2].column + 1 | computeCol(old)) | stop("eh?")
	 new.row := old.row + 1
	 new.color := Color(&eventcode)
	 spot(\old.color, old.row, old.column)
	 }
      !ReturnCodes |
      !FailCodes: spot(Color("light blue"), old.row, old.column)
      !SuspendCodes |
      !ResumeCodes: spot(old.color, old.row, old.column)
      !RemoveCodes: {
	 spot(Color("black"), old.row, old.column)
	 WFlush(Color("black"))
	 delay(100)
	 spot(Color("light blue"), old.row, old.column)
	 }
      E_MXevent: do1event(&eventvalue, new)
      }
   spot(Color("yellow"), new.row, new.column)
   coord := location(new.column, new.row)
   if \step | (\breadthbound <= new.column) | (\depthbound <= new.row) |
      \ hotspots[coord] then {
      step := &null
      WAttrib("windowlabel=Algae stopped: (s)tep (c)ont ( )clear ")
      while e := Event() do
	 if do1event(e, new) then break
      WAttrib("windowlabel=Algae")
      if \ hotspots[coord] then spot(Color("light blue"), new.row, new.column)
      }
end
Boring square graphics:
procedure square_spot(w, row, column)
   FillRectangle(w, (column - 1) * scale, (row - 1) * scale, scale, scale)
end

# encode a location value (base 1) for a given x and y pixel
procedure square_mouse(y, x)
   return location(x / scale + 1, y / scale + 1)
end
A whole new meaning for the term "graphical breakpoints":
#
# setspot() sets a breakpoint at (x,y) and marks it orange
#
procedure setspot(loc)
   hotspots[loc] := loc
   y := vertical(loc)
   x := horizontal(loc)
   spot(Color("orange"), y, x)
end

#
# clearspot() removes a "breakpoint" at (x,y)
#
procedure clearspot(spot)
   local s2, x2, y2
   hotspots[spot] := &null
   y := vertical(spot)
   x := horizontal(spot)
   every s2 := \!hotspots do {
      x2 := horizontal(s2)
      y2 := vertical(s2)
   }
   spot(Visualization, y, x)
end
User input handling:
#
# do1event() processes a single user input event.
#
procedure do1event(e, new)
   local m, xbound, ybound, row, column, x, y, s
   case e of {
      "q" |
      "\e": stop("Program execution terminated by user request")
      "s": { # execute a single step
	 step := 1
	 return
	 }
      "C": { # clear a single break point
	 clearspot(location(new.column, new.row))
	 return
	 }
      " ": { # space character: clear all break points
	 if \depthbound then {
	    every y := 1 to numcols do {
	       if not who_is_at(depthbound, y, new) then
		  spot(Visualization, depthbound, y)
	       }
	    }
	 if \breadthbound then {
	    every x := 1 to numrows do {
	       if not who_is_at(x, breadthbound, new) then
		  spot(Visualization, x, breadthbound)
	       }
	    }
	 every s := \!hotspots do {
	    x := horizontal(s)
	    y := vertical(s)
	    spot(Visualization, y, x)
	    }
	 hotspots := table()
	 depthbound := breadthbound := &null
	 return
	 }
      &mpress | &mdrag: { # middle button: set bound box break lines
	 if m := mouse(&y, &x) then {
	    row := vertical(m)
	    column := horizontal(m)
	    if \depthbound then {       # erase previous bounding box, if any
	       every spot(Visualization, depthbound, 1 to breadthbound)
	       every spot(Visualization, 1 to depthbound, breadthbound)
	       }
	    depthbound := row
	    breadthbound := column
	    #
	    # draw new bounding box
	    #
	    every x := 1 to breadthbound do {
	       if not who_is_at(depthbound, x, new) then
		  spot(Color("orange"), depthbound, x)
	       }
	    every y := 1 to depthbound - 1 do {
	       if not who_is_at(y, breadthbound, new) then
		  spot(Color("orange"), y, breadthbound)
	       }
	    }
	 }
      &lpress | &ldrag: { # left button: toggle single cell breakpoint
	 if m := mouse(&y, &x) then {
	    xbound := horizontal(m)
	    ybound := vertical(m)
	    if hotspots[m] === m then
	       clearspot(m)
	    else
	       setspot(m)
	    }
	 }
      &rpress | &rdrag: { # right button: report node at mouse loc.
	 if m := mouse(&y, &x) then {
	    column := horizontal(m)
	    row := vertical(m)
	    if p := who_is_at(row, column, new) then
	       WAttrib("windowlabel=Algae " || image(p.node))
	    }
	 }
      }
end
Calculating which activation a given click refers to:
#
# who_is_at() - find the activation tree node at a given (row, column) location
#
procedure who_is_at(row, col, node)
   while node.row > 1 & \node.parent do
      node := node.parent
   return sub_who(row, col, node)		# search children
end

#
# sub_who() - recursive search for the tree node at (row, column)
#
procedure sub_who(row, column, p)
   local k
   if p.column === column & p.row === row then return p
   else {
      every k := !p.children do
	 if q := sub_who(row, column, k) then return q
      }
end
A similar calculation for placing new nodes
#
# computeCol() - determine the correct column for a new child of a node.
#
procedure computeCol(parent)
   local col, x, node
   node := parent
   while \node.row > 1 do	# find root
      node := \node.parent
   if node === parent then return parent.column
   if col := subcompute(node, parent.row + 1) then {
      return max(col, parent.column)
      }
   else return parent.column
end

#
# subcompute() - recursive search for the leftmost tree node at depth row
#
procedure subcompute(node, row)
   # check this level for correct depth
   if \node.row = row then return node.column + 1
   # search children from right to left
   return subcompute(node.children[*node.children to 1 by -1], row)
end
How to use Clone()
#
# Color(s) - return a binding of &window with foreground color s;
#  allocate at most one binding per color.
#
procedure Color(s)
  static t, magenta
  initial {
     magenta := Clone(&window, "fg=magenta") | stop("no magenta")
     t := table()
     /t[E_Fcall] := Clone(&window, "fg=red") | stop("no red")
     /t[E_Ocall] := Clone(&window, "fg=chocolate") | stop("no chocolate")
     /t[E_Snew] :=  Clone(&window, "fg=purple") | stop("no purple")
     }
  if *s > 1 then
     / t[s] := Clone(&window, "fg=" || s) | stop("no ",image(s))
  else
     / t[s] := magenta
  return t[s]
end

lecture 9

3D Graphics Facilities

Known Additions to the 3D Facilities, at least semi-implemented:

lecture 10

"Open Mike" Night: HW#3 Demos

You get to demo your stuff in front of a supportive audience.

Graphic Design(s) of the Day: Tukey's Multiwindow- and Box-Plots

And Tufte's Data-ink maximization of box-plots.

GUI Monitors

Some of you have already written homeworks that involved GUI's, but for most of you, some explanation and reinforcement are needed.

Unicon has a GUI class library, written by Robert Parlett, that has extraordinary capabilities. Although I would like to say that GUI's are amazingly simpler in Unicon than in other languages, it is more honest to say that GUI programming in Unicon has a learning curve comparable to GUI programming in other languages.

Step #1 in GUI exploration is usually to get familiar with the interface builder program; in our case that is IVIB. (Demo of IVIB goes here). IVIB generates code that looks like this. Note that the 70-line application creates a dialog and calls show_modal(), and for a normal VB-style app you then fill in the method bodies for whatever events you've requested. For normal applications, it is not necessary to understand much of the scaffolding in this file and the large classes you inherit behavior from. Note that there is a Unicon Technical Report, UTR#6, which tries to teach the IVIB basics.

IVIB lets you draw a GUI and generates the code for you. For a program execution monitor the main question will be: how to merge the event streams, or how to merge the event processing loops, from the GUI and from the monitored program's events. To accomplish this, you need to know more about the underlying GUI classes.

There are in fact a total of 3 classes that most Unicon GUI programmers need to become semi-comfortable with: Component, Dialog, and Dispatcher. Component is the superclass of all basic visible GUI elements in an application: buttons, sliders, lists, editable text boxes, and so on. Components are generally organized hierarchically -- they form a tree in Venn diagram style, with larger background components containing smaller, more active components. A Dialog is a component that constitutes the root of some window -- it owns a window and therefore can receive input events, which it then needs to route down the tree to the correct leaf. The Dispatcher class handles the actual event-processing loop, allowing for multiple dialogs, and wall-clock time events in addition to GUI events.

In order to merge the Monitor and GUI event streams, we might do one of the following:

Note that there is no way to select() from between GUI and monitor or poll both, because to ask for an EvGet() is to transfer control to the target program (freezing the GUI of the monitor until an event occurs). However, you can call EvGet() with an E_Tick along with your other events if you want to be sure to regain control periodically even if the other monitored events do not occur for long periods... then your only danger is: what if the target program that you are monitoring chooses to block on some input it wants to read?

Additional notes on GUI-monitors:

lecture 11

Graphic Design of the Day

CASSE POSTALI DI RISPARMIO ITALIANE by Antonio Gabaglio, via the revered Tufte, and cited in a nice discussion of cyclic data, apparently by Benj Lipchak.

Unicon 3D: Unfinished Business

Mesh modes?

These values determine how lists of vertices are interpreted by OpenGL. There is an attribute meshmode, set via WAttrib(w, "meshmode=value") where the legal values are
points
lines
linestrip
lineloop
triangles
trianglefan
trianglestrip
quads
quadstrip
polygon
However, in a trivial test, the mesh modes did not work! They probably did for the grad student who implemented them... but without a working test/demo they remain undocumented/unfinished business. Minimally, you might expect that I'll have to put out some fixed Unicon sources and/or binaries for you before these will work. You are welcome to try them and find out if things are better than I report.
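
For reference, the sort of trivial test meant above looks something like the following; this is my sketch, which assumes that 3D primitives such as DrawPolygon take x,y,z triples in a "gl" window, and it is exactly the kind of thing that did not behave as expected.

procedure main()
   local w
   w := open("mesh test", "gl", "size=400,400") | stop("no 3D-capable display")
   WAttrib(w, "meshmode=triangles")     # one of the values listed above
   DrawPolygon(w, 0.0, 0.0, -3.0,       # with triangles mode, each group of three
                  1.0, 0.0, -3.0,       # vertices should form one triangle
                  0.0, 1.0, -3.0)
   Event(w)                             # hold the window open until an event arrives
end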

Transparency?

This feature of OpenGL determines to what extent light can go through a substance, or to what extent objects behind it can be seen through it. Color names, set via Fg(color) or WAttrib(w, "fg=value"), can include a diaphaneity. The legal transparency adjectives are
transparent
subtransparent
translucent
subtranslucent
opaque
This feature is implemented. In a trivial test it appears to work. However, in testing it a seeming bug was identified in the color attributes: when you set the fg= attribute with a simple color it sets the diffuse value for that material property but apparently does not reset or disable the other lighting colors (specular, ambient, emission), which may give surprising results. Also: it is not clear that transparency works correctly on all primitives yet; for example, the last time I checked, either cubes or maybe filled polygons looked not as transparent as they ought, because backfacing polygons weren't transparent.

Monitoring Memory Allocation and Collection (book ch. 9)

(Heap)-based memory allocation is one of the simpler and yet very interesting forms of behavior that we can monitor. Allocations in Icon/Unicon are kept as cheap as possible, but in some programs they still play a major role, especially when code does them by accident, or does far more memory allocation than is needed for a problem. Garbage collection is usually pretty fast -- we don't usually go for coffee when the GC message hits the console, like old Lispers -- but if a program is garbage collecting a lot (thrashing) it can significantly impact performance. How can we measure whether allocation appears excessive or garbage collection seems too frequent?

(Per the book, examine a series of memory allocation monitors.)
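
Before looking at the book's tools, here is a bare-bones sketch of the underlying idea, assuming the AllocMask cset from evdefs.icn and that for allocation events the event value is the size in bytes:

$include "evdefs.icn"
link evinit

procedure main(av)
   local counts, bytes, c
   EvInit(av) | stop("can't monitor")
   counts := table(0)
   bytes := table(0)
   while EvGet(AllocMask) do {
      counts[&eventcode] +:= 1
      bytes[&eventcode] +:= &eventvalue
      }
   every c := key(counts) do
      write(image(c), "\t", counts[c], " allocations\t", bytes[c], " bytes")
end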

Mempie

lecture 12

Graphic Design of the Day

Procedure-grained flow graphs and the comet metaphor. Kaestle, Fooscape, and Song Liang's Cata. A peek at some old student projects and Ralph Griswold's notes.

Mempie finds a bug

We noted last time that mempie and napoleon were drawing very different pictures, and that one of them must be wrong. The bug was in the MS Windows implementation of the FillArc function (our C code, not Win32) -- when the "extent" (angle) of the arc approaches 0, and the calculated start and end points become the same pixel, Win32 interprets that as a request for a complete circle.

Griswold's claim examined

Ralph Griswold liked to claim that co-expression activations were about the same speed as procedure calls in Icon... and this matters a lot for execution monitors based on co-expressions, so I re-examined this claim with the following program:
procedure main()
   t1 := &time
   every i := 1 to 10000000 do p()
   write("10000000 calls: ", &time - t1)
   ce := create |1
   t2 := &time
   every i := 1 to 10000000 do @ce
   write("10000000 @: ", &time - t2)
end

procedure p()
   return 1
end
The results (on Linux x86_64) seem to suggest that co-expression activations are quite cheap, only 25% slower than procedure calls
10000000 calls: 6210
10000000 @: 7920
Synchronous threads are a lot cheaper than true concurrent threads! Playing with a mac implementation earlier this semester, I plugged in a pthreads-based co-expression switch available from the current Icon language implementation, and it was an order of magnitude slower...

More memory monitors: mini-memmon and nova

Check out mmm, nova and oldnova. You should look at them as unfinished prototypes of the type of tool that your HW#4 should consist of.

lecture 13

Apologies

My apologies, but there will be no midterm exam for this course. Instead, there is now posted a homework #5 and I want your work on this to be good.

A more honest mmm

In the process of giving mmm a fix, I wound up searching high and low... to find my own bugs

Monitoring String Scanning (Ch 10)

Icon's string scanning control structure has a very natural depiction, that of a progress bar or pointer working its way through a string. Issues include: how to abstract/scale a very large number of operations, how to depict backtracking, how to depict nested scanning environments (which might or might not involve analysis of a substring of the enclosing scanning environment).

Some programs use scanning a lot -- they are mostly string scanning -- and others do not use it at all.

The ScanMask events include E_Snew, E_Sfail, E_Spos, E_Ssusp, E_Sresum, E_Srem. E_Spos events are the most frequent. Compared with procedures, what is missing?
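
A crude textual sketch (not one of the book's tools) that echoes each new subject and marks each position event beneath it, assuming E_Snew's event value is the new subject and E_Spos's event value is the new &pos:

$include "evdefs.icn"
link evinit

procedure main(av)
   EvInit(av) | stop("can't monitor")
   while EvGet(ScanMask) do
      case &eventcode of {
         E_Snew: write(&eventvalue)                        # the new scanning subject
         E_Spos: write(repl(" ", &eventvalue - 1), "^")    # caret under the new position
         }
end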

For what it's worth, evaltree() can model scanning environments just like it does procedure call activity. It can also model built-in functions and operators; all expressions can be modelled as call/ret/susp/resum/fail/rem.

Now for a deep-thought question: what kinds of graphic depiction emphasizing what kinds of behavior would make for a genuinely useful string scanning visualization?

Monitoring Structures and Variable References (Ch 11)

The monitoring framework has fairly thorough instrumentation for the built-in data structures of the language -- lists, tables, records and sets. These one-level structures all support implicit reference semantics, and are routinely composed into big multi-level structures such as trees and graphs.

What we learn from the simple list visualizer:

What we learn from the structure spy

lecture 14

mKE/mKR: the Largest Publicly Available Unicon Program

It has its own website. It is a knowledge representation engine with its own knowledge representation language built-in. It is developed by a (now retired) AT&T scientist. It is something like 50K LOC. Let's study it.

Monitoring Variable References

Variable use is arguably one of the most important aspects of program behavior, but it is easily overlooked. Some programs are primarily stack, some primarily heap (especially, e.g. OOP programs), while some programs use primarily static / global data layout.

What do we want to know about variables?

gnames

Gnames shows you all your global data; variable names are written out, color coded by their type. If you click on a variable name, up pops a window showing that variable's details. Bugs and limitations:

lecture 15

vars

vars is a local variable visualizer; it shows each activation record in a manner similar to gnames. There is a strong scalability limit here which vars does not solve; some programs it depicts well, others it does not. It is more proof of concept/demonstration than finished and working tool. Also, at present it has bad bitrot.

Under the Covers of the evinit library

EvInit(av) and EvGet(mask) are not always entirely what they seem. They live in evinit.icn and have some features tailored to allow multiple monitors to share the observation of a program execution, which we will discuss in detail in a couple more lectures. The main thing for you to know for today is: EvInit() checks if the monitor's &eventsource is already initialized (by a parent monitor who could pre-assign the value of &eventsource), and if so, it does not load anything; it just requests events from its &eventsource.
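
In outline, the check amounts to something like this; it is a paraphrase of the idea, not the real library code:

procedure EvInit(args)
   if \&eventsource then return        # a parent monitor already assigned our source
   &eventsource := Monitored :=
      load(args[1], args[2:0]) | fail  # otherwise load the TP ourselves
   return
end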

We might want to develop a similar architecture for windows! Monitors that use 2D or 3D graphics might want to check and see if their &window is already set, and if so, just draw to it instead of opening a new window. This would allow a GUI for a debugger or multi-visualization tool to allow independently-compiled visualizations to "plug in". Of course, for it to work well, such a model would need to cover how to handle window resizing, and how to handle input by various tools. Subwindows, and subwindow resizing, are more or less adequate to this task.

Monitor Coordinators (Chapter 12)

Basic premise: the Alamo architecture is intended to reduce the difficulty of writing monitors. Monitors are easier to write if they are simpler and smaller, and look for specific behaviors. But, we want to be able to monitor several aspects of behavior for a given execution, and potentially we want to look for interactions between behaviors. A monitor coordinator is a monitor that hosts the execution of the target program under the observation of multiple monitors.

Eve

The reference implementation monitor coordinator is called Eve (eve.icn). Eve is probably my last remaining "old Icon GUI" program, and needs to be rewritten using the modern GUI class library. It also looks like it has never been run on Windows. :-(

Eve configuration

Eve reads in a list of monitors from a ~/.eve file in the format:
"title" command line

For example:

"Line Number Monitor" /home/jeffery/tools/piano
"UFO" /home/jeffery/tools/ufo
"Algae" /home/jeffery/tools/algae
"Big Algae" /home/jeffery/tools/algae -func -op -step -S 48
"Memory bar chart" /home/jeffery/tools/barmem
"Global variables" /home/jeffery/tools/gnames
"Local Variables" /home/jeffery/tools/vars
"Lists" /home/jeffery/tools/tinylist
"Minimemmon" /home/jeffery/tools/mmm
"Miniloc" /home/jeffery/tools/miniloc
"Scat" /home/jeffery/tools/scat
"String scanner" /home/jeffery/tools/ss
From this datafile, eve draws an opening window that allows selection of which monitors you want to run (selectEMs).

Eve's Global State

unioncset
cset mask that is union of all monitor masks
EventCodeTable
table of lists; keys are event codes, values are "list of interested monitors"

Monitor State

This thinly-veiled "class" holds eve's knowledge about the monitors it loads. "prog" is the actual loaded program (a co-expression value), while "mask" is the program's event mask -- what it returned from its last EvGet().
record client_rec(name, args, eveRow, prog, state, mask, enabled)
#
# client() - create and initialize a client_rec.
#
procedure client(args[])
   local self
   self := client_rec ! args
   if /self.name then stop("empty client?")
   self.prog := load(self.name, self.args) | stop("can't load ", image(self.name))
   variable("&eventsource", self.prog) := ¤t | stop("no EventSource?")
   variable("Monitored", self.prog) := &eventsource | stop("no Monitored?")
   /self.state := "Running"
   /self.mask := ''
   /self.enabled := E_Enable
   return self
end

Initialization

After selecting monitors to run, eve has to load them all, and then activate them all, running them up until their first EvGet() call. Their EvInit's will be disabled by eve's having already set their &eventsource. After their first EvGet() call, eve will register them on the "list of interested monitors" for each of the event codes in their mask.
   every i := 1 to *clients do
      clients[i].mask := @ clients[i].prog
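
Registration then amounts to roughly the following; this is a sketch of what computeUnionMask() does, using the client_rec fields and globals shown earlier:

procedure computeUnionMask()
   local monitor, c
   unioncset := ''
   EventCodeTable := table()
   every monitor := !clients do {
      unioncset ++:= monitor.mask
      every c := !monitor.mask do {      # each one-letter event code in the mask
         /EventCodeTable[c] := []
         put(EventCodeTable[c], monitor)
         }
      }
end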

Event Forwarding

event(code, value, recipient) - sends a (monitoring framework) event, where code defaults to &eventcode and value defaults to &eventvalue. In retrospect, this is a poor choice of function names. Note that event() allows any value to be sent, not just what the EM requested in its event mask, and not even limited to 1-letter string codes.

Eve's Main Loop

procedure mainLoop()
   while EvGet(unioncset) do {
      #
      # Call Eve's own handler for this event, if there is one.
      #
      (\ EveHandlers[&eventcode]) ()
      #
      # Forward the event to those EM's that want it.
      #
      every monitor := !EventCodeTable[&eventcode] do
	 if C := event( , , monitor.prog) then {
	    if C ~=== monitor.mask then {
	       while type(C) ~== "cset" do {
		  if C === "abort" then fail
		  #
		  # The EM has raised a signal; pass it on, then
		  # return to the client to get his next event request.
		  #
		  broadcast(C, monitor)
		  if not (C := event( , , monitor.prog)) then {
		     unschedule(monitor)
		     break next
		     }
		  }
	       if monitor.mask ~===:= C then
		  computeUnionMask()
	       }
	    }
	 else {
	    unschedule(monitor)
	    }
      delay(6 < delayval)
      }
end

lecture 16

Papers for the Rest of the Semester

Timeslots:

How many papers do we have time to discuss? Let's have each person present two. There are really many sources for software visualization research papers, but let's say that the main ones are ACM SOFTVIS and IEEE VISSOFT. These every-other-year conferences were in lock-step for a while, but may have moved to alternating years so that there is a software visualization conference each year (how nice).

From OOPSLA 2007

From VISSOFT 2007
From the SOFTVIS 2006 conference
From the SOFTVIS 2005 Conference
From VISSOFT 2005
From the SOFTVIS 2003 Conference
From VISSOFT 2003
From the Dagstuhl seminar, May 2001
From Software Visualization: Programming as a Multimedia Experience
From IEEE Visualization 94
From the 6th New Zealand CHI conference

Semester Project Topic Ideas

The perfect semester project would be a tool that... Where to get your ideas:

lecture 17

Final Project Demos

In class in the scheduled final examination period, Monday May 5, 3-5pm. Each student will have around 20 minutes including setup and tear-down.

Graphic Design of the Day

A note on lying in charts and graphs.

Thoughts on visualizing large-ish trees in 2D and 3D.

Tool of the day: redconv

Redundant conversion catcher. Even if conversions are not redundant, they may be an indicator of a bug or a performance problem. When is a conversion "unhealthy"?

Reading assignment for today's lecture

Generally, after you pick your paper and dates, we need to pass out the reading assignments ahead of time, with either hyperlinks or printed copy of what is to be read. So for example, we have two papers so far assigned that Ziad will be presenting. Also: for each paper/presentation there are some specific questions I'd like you to think about:

Rube

Rube methodology

  1. choose system to be modeled
  2. select structural and dynamic behavioral model types
  3. choose a metaphor
  4. define mappings/analogies
  5. create model
Example: a lightbulb is to be modeled. A finite state machine is chosen to model the bulb. S1=disconnected, S2=off, S3=on.

For each different dynamic model type, there may be any number of defined visual metaphors, or a programmer may wish to create a new one. A "water tank" metaphor for a finite state machine would "fill the tank" of whichever state the machine is in, and the water would be pumped over to a different tank whenever a transition to a new state occurs.

In a gazebo metaphor, a person would indicate the state, and a transition would be depicted by that person walking.

Rube Summary

lecture 18

Quote of the Day

The fastest way to a million-line program is through the Clipboard. Copied code is like cancer for software.

Graphic Design of the Day: Kiviat Diagrams

One way to represent many-dimensioned data is to lay out the dimensions around a circle; the 2D shape (and its degree of circularity or lack thereof) tells you something about which dimensions are interesting.
Kiviat diagram for software quality. Source: geeks with blogs, via google image

Kiviat diagrams are easy to criticize. There are problems with the relative scales of the dimensions: do you reduce them all to 0.0-1.0 ranges, or not? There are problems identifying normal or acceptable ranges of values. And adjacent dimensions don't really have any more connection with each other than remote dimensions, but the Kiviat makes them look like they do. The area inside the Kiviat shape is really meaningless.
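
To make the circular layout concrete, here is a small hypothetical Unicon sketch (not a class tool) that draws one Kiviat polygon from a list of values already normalized to the 0.0-1.0 range:

procedure main()
   kiviat([0.8, 0.3, 0.6, 0.9, 0.5])
end

procedure kiviat(vals)
   local n, i, a, pts, cx, cy, r
   &window := open("kiviat", "g", "size=220,220") | stop("no window")
   cx := cy := 110
   r := 100
   n := *vals
   pts := []
   every i := 1 to n do {
      a := 2 * &pi * (i - 1) / n
      DrawLine(cx, cy, integer(cx + r * cos(a)), integer(cy + r * sin(a)))  # axis i
      put(pts, integer(cx + vals[i] * r * cos(a)))
      put(pts, integer(cy + vals[i] * r * sin(a)))
      }
   DrawPolygon ! pts           # connect the per-dimension points into the Kiviat shape
   until Event() === "q"       # wait for the user to type q
end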

HW#5 statii

I have still not received some of your homework #5's.

What About the Dynamic Analysis?

You-all have been too polite, perhaps, to ask the question above. There are no doubt different definitions, but here is a paper for you to read on the subject: According to Ball, dynamic analysis has the following properties compared with static analysis:
  1. precision of information; derived from 1+ actual program run(s)
  2. input-centric mentality; shows dependence of internal behavior on particular inputs of a given execution
Ball's paper mentions two particular types of dynamic analysis, out of myriads:
frequency spectrum analysis
analyze frequencies of different kinds of events, e.g. to identify related computations
coverage concept analysis

FSA

CCA

coverage profile
profile of what was executed (no frequency info)
concept analysis
(T, E), T a set of tests and E a set of program entities, is a concept if every test in T covers all of E and no test not in T covers all of E.
Given a (boolean) table showing all the tests and entities, Ball points out that you can form a concept lattice, and that the concept lattice shows control flow relationships within 1+ actual executions, analogous to the kinds produced by control flow static analysis.
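
To pin down the definition, here is a hedged sketch (the names cover, alltests, and the helpers are made up) that checks whether a given (T, E) pair is a concept. Here cover maps each test to the set of program entities it covers, and all the arguments are Unicon sets:

procedure covers_all(covered, E)
   local e
   every e := !E do
      member(covered, e) | fail
   return
end

procedure is_concept(T, E, cover, alltests)
   local t
   every t := !T do                    # every test in T covers all of E ...
      covers_all(cover[t], E) | fail
   every t := !(alltests -- T) do      # ... and no test outside T does
      if covers_all(cover[t], E) then fail
   return
end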

More Dynamic Analyses

OK, so where do we find more examples of dynamic analysis? Here are some of Dr. J's notions of examples of interesting dynamic analyses.
statistical
Summarizing data by accumulation or averaging to give the big picture. FSA seems to be an example of statistical analysis.
pattern-of-interest
parsing event patterns to find bugs, or even just to find items of interest. note that event pattern parsing must carefully define its domain, skipping over events that don't affect the pattern match. note also that event pattern parsing will usually be done non-deterministically and maybe in a massively-parallel model
higher-level-events
one variant of the pattern-of-interest notion is to identify events at a higher semantic level, such as aggregates of lower level events, or application domain events
categorization
figuring out when a class implements a stack, or is using dynamic programming, or whether it employs a feature for which a specialized tool is available
profiling; coverage
treating hotspots and coldspots specially; for example the former deserve extra performance tuning monitors, while the latter deserve extra typographic paranoia monitors

lecture 19

Reading assignment

For Thursday, read Reiss' Paradox paper.

Graphic Design of the Day: Perspective Wall

Hey, did you notice that there is an "information visualization wiki"? Interesting...

Research Paper of the Day: Bohnet and Dollner

This is a "short paper" pointing out an interesting tool with lots of ideas to think about.

lecture 20

Nate's Structure Monitor

Simple graphics, reminiscent of Playfair's classic graphic design. Ya, it is a cheap trick, but it works.

Paul Nathan presents comments on Steve Reiss' Paradox of SV paper

Metaphor-Based Animation of OO Programs

lecture 21

X3D for Software Visualization

Mondrian

Viz tools conflict: gnuplot-style generality of reading file formats vs. Alamo-style run-time access to the original data. Mondrian sez: instead of moving the data to the viz tool, move the visualization tool to the data. Provide not a file format but an interface, and allow a declarative script to specify the visualization. Work directly with the objects in the data model. Let the programmer visualize what they are doing in their environment/tools. Smalltalk-based tools trying to be relevant to a non-Smalltalk world.
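
To make the idea concrete in our own terms, here is a minimal Unicon sketch (mine, not Mondrian's API): the "viz engine" is handed the live objects plus a declarative spec, a table mapping visual properties to metric procedures, instead of being handed a file dump.

# Sketch: render live objects directly from a declarative property spec.
procedure render(objects, spec)
   local o
   every o := !objects do
      write("rect ", spec["width"](o), " x ", spec["height"](o),
            " label ", spec["label"](o))
end

procedure size_of(x)
   return *x                         # stand-in metric: just the size
end

procedure main()
   local spec
   spec := table()
   spec["width"]  := size_of         # declarative part: property -> metric
   spec["height"] := size_of
   spec["label"]  := image
   render(["abc", [1, 2, 3], table()], spec)
end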

Challenges for InfoVis Engines

vis. engine should be domain independent
visualizations should be composed from simpler parts
visualization should be definable at a fine grained level
instance-based, not type-based; sometimes different instances of the same type play different roles
minimize object-creation overhead
vis. works off a model of a running system, but instead of duplicating objects in the system, how about using them directly?
visualization description should be declarative
compare w/ Tango, Dance, and UFO for that matter

Other Mondrian Highlights

Declarative Syntax, which looks like...
view nodes: model classes using: Rectangle withBorder
   forEach: [:eachClass | eachClass viewMethodsIn: view]
Screen-Filling System
Mondrian has a lot of structures to visualize simultaneously... And it has structures that are too wide to fit the window.
Built on top of Moose
You just know it has to be good.
Interesting Mention of CodeCrawler
"visualizations of combined metrics and structural information"

lecture 22

Visualizing Dynamic Memory Allocations

JIVE (Java Interactive Visualization Environment, Gestwicki et al)

Major requirements:
  1. depict objects as environments. method calls happen inside one.
  2. multiple views. different granularities. detailed view and compact view.
  3. histories - of execution, of method interaction... show sequence or collaboration diagrams (how do they address scalability? From Figure 1 the answer initially seems to be: they don't; from Figure 2 one answer is, things shrink down to points). This is not summary statistics, it is timelines and such
  4. forward and backward execution. state-saving model. big big logs.
  5. queries on the runtime state. when did a variable change; or when did it achieve a certain value
  6. clear and legible
  7. use the stock JVM
  8. be able to visualize programs with GUI's!!
Graphic design: simple, relatively easy to understand, scales poorly (minimal "visualization" involved, maximum IDE/debugger-like feel)

Analysis: hardwired, except that it supports a range of queries. What is the query language?

Implementation: Two-process model, supports multiple threads so long as only one runs at a time. Log file coupled with "in-memory" execution history database. Events are able to commit and un-commit themselves.

7 event types: static context creation, object creation, method call, method return, exception thrown/caught, change in source line, and change in variable value.

Stepping backward does not modify the client program; the program is suspended until you step back up to the current state and move forward. (Means: you can't modify the past, but maybe you can modify the present.)

Queries: on program history; may return values, sets of states, or portions of program history. Visual representation of program states and program history means queries and results may be done graphically. Queries vis-a-vis variables in single instances or class-wide.
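
JIVE's query language is not spelled out here, but a minimal sketch of the idea (mine, not JIVE's) is easy: keep the execution history as a list of [time, variable, value] triples and generate the times at which a named variable changed value.

# Sketch: generate each time at which variable v took on a new value.
procedure when_changed(history, v)
   local entry, prev
   every entry := !history do
      if entry[2] == v then {
         if \prev & prev ~=== entry[3] then
            suspend entry[1]          # a moment at which v changed
         prev := entry[3]
      }
end

# usage: every t := when_changed(history, "x") do write("x changed at ", t)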

No evaluation of scalability or effectiveness of using UML-like depictions.

JPDA

Earlier there were the JVMDI and the JVMPI; now there is the JPDA. JIVE lives with whatever the JVM dishes out. JPDA includes the JDI (Java Debug Interface), JDWP (Java Debug Wire Protocol), and the JVM TI (Tool Interface), which replaced JVMDI/JVMPI.

"remove view of a virtual machine in the debuggee process".

theStackFrame.getValue(theLocalVariable)      // debugger side: a JDI call
... request transmitted via a socket, using JDWP ...
jvmti->GetLocalInt(frame, slot, &intValue)    // debuggee side: the JVM TI call that services it
... result transmitted back the same way ...

lecture 23

Visualizing software as cities; 3D "visualizations" using barcharts...

lecture 24

Final Project Presentations

Next Monday 3-5pm, except Nate, who is going this Thursday. 5 students in 120 minutes, hmm, that's 24 minutes per student. Figure you will be allowed at most 24 minutes. You can come in under that.

Metaphors Hall of Shame

Paul Nathan was kind enough to share this link.

Graph-Based Visualization of Software Evolution

For me, this paper is mainly eye-candy, but it is another representative of the class of visualizations that are geared towards understanding the changes in software over time, the same perspective the authors of the visualizing-software-as-cities paper took. It is not the here-and-now of a current execution; it is the view across the ages.

Reducing the Complexity of Object-Oriented Execution Traces

This paper says it is all about filtering techniques, which makes it potentially important.

Execution traces are very large, and very redundant. The analysis used in visualization abstracts and filters before it starts drawing lines. Figure 2 of this paper gives a nice toy example in which a tiny duplication is removed; now scale it up many orders of magnitude.

Idea of multiplicity; how about regular expressions to describe multiplicity?
A->B*->C*D
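
A minimal sketch of the multiplicity idea (mine, assuming the trace is just a list of call names): run-length compress consecutive repeats into name*count, the degenerate case of the regular-expression description above.

# Sketch: collapse runs of repeated calls into name*count notation.
procedure compress(trace)
   local out, name, count, i
   out := ""
   i := 1
   while i <= *trace do {
      name := trace[i]
      count := 0
      while trace[i] == name do {      # count the run of repeats
         count +:= 1
         i +:= 1
      }
      out ||:= name
      if count > 1 then out ||:= "*" || count
      out ||:= " "
   }
   return out
end

# compress(["A", "B", "B", "B", "C", "D"]) produces "A B*3 C D "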

Removing "utilities": constructors/destructors, accessor methods, utility and library classes. Potentially many incoming edges, with few or no outgoing dependencies.
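
A minimal sketch of that filter (mine, assuming the call graph is a table mapping each node name to the set of nodes it calls, with every node present as a key): treat a node as a "utility" when it has at least min_in callers and at most max_out callees, and keep everything else.

# Sketch: return the set of non-utility nodes in a call graph.
procedure drop_utilities(graph, min_in, max_out)
   local indeg, n, callee, keep
   indeg := table(0)
   every n := key(graph) do
      every callee := !graph[n] do
         indeg[callee] +:= 1                  # count incoming edges
   keep := set()
   every n := key(graph) do
      if not (indeg[n] >= min_in & *graph[n] <= max_out) then
         insert(keep, n)                      # not a utility; keep it
   return keep
end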

Polymorphic methods: execution tree differences can be ignored when the abstract function performed is understood.

Visualizing Software Executions as Populated, Dynamic Cities

Dr. J's fatal-flaw view of visualizing software as cities: many or most (especially OO) programs are understood largely through the relationships between classes and between instances. Software-as-cities doesn't automatically manage to depict such relationships at all. It got as far as colocating classes in the same package.
Classes are buildings, sure
height = # methods, width = # variables, length = (log of) longest code. Privates go below ground. (See the sketch after this list.)
What is the model of time?
Today = current execution run. CVS repositories and previous execution logs make for remembrances of things past.
Limited ("Prince of Persia") backwards-in-time capability?
I think limited-reversible is better than not reversible at all, and more scalable than fully-reversible. Limited-reversible may mean: if you go back past a certain point, you will not be able to see as many details, or to change the execution from that point. Assuming we are collecting fairly detailed traces, you can go backward farther than that in a replay-only mode.
How to represent procedures
treat like a class w/ 1 method. Lotta procedures = village.
How to represent instances
As people? Library instances as robots? Garbage as undead? There was an idea of a Garbage Collector going around blasting the undead while a viewer watches or helps...
How to represent atoms
Not at all? As text? As virtual books (strings), hammers?? (ints) and saws?? (reals)? What about tables and lists? Records got special treatment as people; tables and lists as bookshelves, or buses, or?
How to represent external entities
network connections, I/O handles, files...
Why should one need associations in the metaphor?
Because we are in Venice, or in hell, or in New York. Step off the sidewalk and you are dead.
What associations are depicted, and how?
We need at least: inheritance, aggregation, and reference.
How to depict inheritance and aggregation?
aggregation = adjacency, bridges. inheritance = physical resemblance
How to depict reference?
boats
What are the streets?
In Venice, there are a few streets to handle high traffic.
How to represent the stack
In discussion, there seemed to be support for the beam-of-light model, pointing backwards from callee to caller. Dr. J would add: the beam of light might be a good metaphor for an instant-teleportation feature...
How to represent bugs and warnings
As monsters
How to layout buildings?
Around an older, urban core? Minimize distance of overall call graph?
What are ghosts?
Remembrances of fixed bugs and deleted code
How to present source code control structure details.
There is the raw code size, and there is the extent of nesting.
How to present data details.
Well, instances are a lot of the data, and atoms are the rest. A prime issue here is one of aggregation. When is an object a citizen of the world, and when is it just somebody's foot? I guess the answer is: when referenced globally, or by two or more other instances.
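
Here is the minimal sketch promised up in the "Classes are buildings" item, with hypothetical metric parameters of my own choosing rather than anything from the paper:

# Sketch: map one class's static metrics onto a building's dimensions.
record building(height, width, length, basement)

procedure class_to_building(nmethods, nvars, maxcodelen, nprivates)
   return building(nmethods,               # height = number of methods
                   nvars,                  # width  = number of variables
                   log(maxcodelen + 1),    # length = (log of) longest code
                   nprivates)              # private members go below ground
end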

Question: How to Make Static Analysis in Unicon Much Easier?

Suppose I want tools like the software-as-cities visualization, and it's too much work. Maybe the Alamo framework makes the dynamic events easy enough to grab, but how do I make the static info easy enough to grab? The lexer and parser for Unicon are widely available; what else do I need for this type of project? What generic static analysis tool(s) should we invent? What should their model be? Execution monitoring was modeled as a sequence of events (while EvGet()). Is there a collection of static analysis foundational data, and a set of generic operations, that we should standardize? For example, for a hypothetical USA tool, analysis produces a tuple (Σ, Π, Χ) where Σ is the set of source files, Π is the parse tree forest, and Χ is the control flow graph? Yeah, this is a lame start, but at least it will allow you to tell me what should really be there.
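
Just to have something to argue with, here is a minimal sketch of that tuple as a Unicon record, with hypothetical names of my own; the point is that a standard static-analysis library could hand back one of these the way while EvGet() hands back events on the dynamic side.

# Sketch: hypothetical result record for the proposed USA tool.
record usa_result(files, parsetrees, cfg)   # (sigma, pi, chi)

procedure analyze(filenames)
   local r
   r := usa_result(filenames, [], table())
   # hypothetical: run the existing Unicon lexer/parser over each file,
   # append each parse tree to r.parsetrees, then build r.cfg from them
   return r
end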