Lecture Notes for CS 404/504 Program Monitoring and Visualization

Note to Dr. J: next time you teach this course, review and re-order some papers and lecture material up to the front.

Syllabus

What this Course is About

This course is a blend of program monitoring (dynamic analysis) and graphics/visualization. It turns out that much of the key connecting glue between monitoring and visualization comes from static analysis, the study of program properties observable from the source code.

Each week, you can expect part of the lecture material to come from dynamic analysis and part from graphics/visualization. Similarly, part of the time each week will be studying interesting work done by others, and part of the time will be engaged playing with my research infrastructure, working on software tools that will (hopefully) advance the state of the art.

Reading Assignment #1

Early History of Monitoring and Visualization according to Jeffery

Others may have more and better information, but this is my version of that subset of computing history relevant to this course.

When the computing industry reached a stage of having interactive, text screen terminals, all kinds of new bugs became commonplace. Along with mankind's increased ability to generate bugs, a whole slew of tools and techniques were developed to understand program executions, including tracing and source-level debuggers. These tools still work; they just don't scale well. Sadly, if you look at a modern IDE, its debugging and tracing capabilities are not much improved from what was available 40 years ago. This is (I claim) because problems in monitoring and debugging are hard, and the cost of building new tools which might advance the state of the art is very high.

By the 1980's, interactive 2D graphics was ubiquitous and improving rapidly in performance. People started to use graphics to help understand program execution behavior partly because text-only techniques did not scale well, and partly just because the graphics was available. A movie called "Sorting out Sorting" (parts 1,2,3), originally presented at SIGGRAPH, made a compelling argument that graphical techniques could be valuable in teaching and understanding algorithms.

Sorting Out Sorting was done one frame at a time on truly ancient facilities. A group at Brown University (home of graphics guru Andy Van Dam, algorithms guru Robert Sedgewick and a cast of thousands) set out to replicate on interactive workstations what Ron Baecker had done a frame at a time. One result of this effort was Marc Brown's Ph.D. and related software. We will present more history in a later session.

What About Us?

Announcements

There is a bblearn for this course now. It has a HW#1 posted, but I am not so sure I like it. I may think of a better HW#1 for you, by this weekend. Check for HW#1 on Monday. In the meantime, learn some Unicon.

Unicon 101

Unicon: the Easiest Parts

Let's ssh into a test machine to live-demo the following:
Types      Control Flow
string     success vs. failure
integer    if-then-else
real       while-do
cset       calls, argument rules
list       generators
table      case-of
file       every-do

Alternate Resources for Unicon Study

None of this is assigned reading. It is here for your convenience; you know, in case you just hate the Unicon book.

Monitoring Framework Intro

An execution monitor (EM) observes events in a target program (TP). There are two-process, one-process (callback), and thread models.
two-process model
EM and TP communicate via network sockets, pipes, or files.
one-process/callback
The TP calls the EM when an event occurs. The EM is organized as a set of callbacks, i.e. it doesn't have its own main() or control flow, it just responds to things.
thread
EM and TP are threads in the same address space, making communication far easier.
Which model do most debuggers use? The two-process model. Which model should we use for visualization tools? What is different about their requirements?
two-process model
Pros: Cons:
one-process/callback
Pros: Cons:
thread
Pros: Cons:

Graphic Design of the Day: a map

Napoleon's March into Russia: proof you can legibly plot extra dimensions atop a map. Maps have legends to explain what's on them, along with two primary dimensions which are intuitively based on actual geometry.

lecture 3

Reading for this week

HW#1 revised

Compared with last time I taught this class, I want you to spend enough time to learn Unicon, or rather the 1/2 of it that will be useful for writing visualization tools.

Highlights from Hirose

[Hirose97] describes research from the University of Tokyo, presented at the annual conference of the World Society for Computer Graphics.

Cheesy Movie References

What movies present topics relevant to this class, i.e. program visualization, program behavior monitoring, or virtual environments where such activities occur?

Graphic Design Principles

We need graphic design principles in preparation for visualization work. The following can be attributed to Edward Tufte, a renowned Ivy League graphic designer who has written some beautiful books.

Graphic Design of the Day: a scatter plot

A map of London by John Snow, 1854, cleaned up by John Mackenzie of the University of Delaware.

lecture 4

Mailbag

I am having trouble using the star operator on lists, *L
The size operator *L works only after L has been assigned a list value, e.g. L := []
How do I check if a string is not in my list of strings?
Well, first off, if one were doing this a lot, maybe one should use a set instead of a list. Unicon has a set type. But for occasional use on lists of reasonable size, s == !L tells if s is in the list L. s ~== !L is not so good; it succeeds as soon as any value in L differs from s, so it will almost surely succeed unless every value in L is s. Instead use not (s == !L)
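Here is a small sketch of the membership tests discussed above. The list contents and variable names are made up for illustration.

```unicon
procedure main()
   L := ["cflow", "dot", "unicon"]
   s := "gcc"

   # s == !L succeeds if any element of L equals s
   if s == !L then write(s, " is in L")

   # not (...) succeeds only when no element of L matches
   if not (s == !L) then write(s, " is not in L")

   # Pitfall: s ~== !L succeeds as soon as one element differs from s,
   # so this prints even though "dot" is in L.
   s := "dot"
   if s ~== !L then write("misleading: ", s, " differs from some element")

   # For frequent lookups, a set avoids the linear search.
   S := set(L)
   if member(S, s) then write(s, " is in S")
end
```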

Unicon: the next level

Let's peek at CS210 lecture notes on Unicon to see if I missed any highlights during the live demo.

Monitoring Buzzwords

Volume, dimensionality, intrusion, and access. Solve these four unsolvable problems and you've got the makings of a decent monitoring and visualization framework.
volume
if you think static analysis of source code has a lot of information the programmer may have to understand and/or deal with, wait until you see the amount of information dynamic analysis generates. Even small, short-running programs can generate millions and millions of events of interest. Monitoring and visualization tools have to filter/discard, condense/simplify, and analyze their input, turning low level data into higher-level information.
dimensionality
understanding program behavior involves many dimensions: control flow, data structures, algorithms, memory access patterns, input/output behavior... Visualizations can be selective, but often want to depict more than just 2 or 3 dimensions' worth of data even though they are using a 2D (or 3D) output device.
intrusion
The act of observing program execution behavior changes that behavior. Monitors have to minimize/mitigate this or they will be visualizing their own side effects more than the thing they purport to show. The first form of intrusion is to skew the timing of the observed behavior. Monitoring a program may also alter its memory layouts (e.g. on the stack), which might make bugs disappear (or merely exaggerate them).
access
Simple monitors might graphically depict exactly the information contained in the sequence of events that they observe, but most monitors need to ask for additional information, by accessing potentially the entire state of the program being executed.

Graphic Design of the Day: Line Plots

Multiple dimensions of weather along a primary time axis.
From the New York Times, popularized by Tufte.

lecture 5

Announcements

Unicon: Goal-Directed Evaluation

Surprised by Failure?

When to check for failure: everywhere that failure can occur, and everywhere that failure will matter. Examples:
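One illustration, as a minimal sketch: open() and find() are two places where failure both can occur and matters. The default file name data.txt is hypothetical.

```unicon
procedure main(av)
   fname := av[1] | "data.txt"      # hypothetical default file name
   # open() fails if the file cannot be opened; stop rather than read from &null
   f := open(fname) | stop("cannot open ", fname)
   while line := read(f) do {
      # find() fails when the line has no "#"; the if guards the use of i
      if i := find("#", line) then
         write("comment starts at column ", i)
      }
   close(f)
end
```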

Graphic Design of the Day

William Playfair's chart depicting area, population, and tax revenues of countries in Europe is another excellent example of depicting multiple dimensions of data.

The slope between the population and tax revenues points down for most countries and sharply up for England (and less so, for Spain).

Introduction to Unicon Monitoring Facilities

events
billions and billions of tiny points in time, with a tiny data payload, and the ability to easily inspect the entire program state. Event names like E_Pcall or E_Lbang
event keywords
&eventcode and &eventvalue
built atop the co-expression data type
threads that take turns. AKA coroutine, goroutine, or co-operative or synchronous thread.
the VM is instrumented for you
asymmetric coroutines. VM C code sends events to monitors written in Unicon
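To make the "threads that take turns" idea concrete, here is a tiny sketch of a co-expression outside of any monitoring context. The variable name producer is made up. Roughly speaking, a monitor takes turns with the target program in the same way.

```unicon
procedure main()
   producer := create (1 to 3)       # a co-expression wrapping a generator
   # Each @producer transfers control to the co-expression until it
   # produces its next result; activation fails when results run out.
   while x := @producer do
      write("main received ", x)
end
```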

lecture 6

Reading for this week

Ideas from Visualizing Software in an Immersive Virtual Reality Environment

More Unicon

Notes from Past Students' Unicon Program Visualizations

Sorting Out Sorting, Unicon sample solution

This version is based on one by Mike Wilder.

lecture 7

How's the Homework Going? Any questions?

Things that might be useful:

Unicon: Threads and Co-expressions

Sorting Out Sorting, Unicon sample solution

Let's look at the code from this sorting visualization, based on one by Mike Wilder. Start at the bottom, with main().
You've seen !x before, but how about x ! y
x ! y is the apply operator. It calls function x with parameters given by the elements of list y.
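A short sketch of the apply operator; the procedure area() is a made-up example.

```unicon
procedure area(w, h)
   return w * h
end

procedure main()
   dims := [3, 4]
   write(area ! dims)                   # same as area(3, 4)
   write ! ["x = ", 10, ", y = ", 20]   # same as write("x = ", 10, ", y = ", 20)
end
```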

lecture 8

Reading for this Week

Thoughts on 3D Visualization for Software Development

Bonyuet/Ma/Jaffrey, ICWS 2004
Basic PC GPUs existed by this time; World of Warcraft came out in 2004.
Key Criteria: usefulness, intuition, and scalability
What were their definitions for these?
Shneiderman's 7 tasks
overview, zoom, filter, detail-on-demand, relate, history, extract
CodeMapping achieves: labeled "atomic metaphor" 3D graphs.
did they achieve their key criteria?

Introduction to Unicon Monitoring Facilities, Part 2

built-in function EvGet(c)
Activates &eventsource (Monitored) to get the next event whose code is in the event mask c
event codes and masks
an event code is a one-letter string. an event mask is a cset. This is, literally, just grad-student-drj exploiting the handy bit vector implementation that was in Icon.
link evinit
library function EvInit(argv) loads program
$include "evdefs.icn"
include file evdefs contains definitions of event codes
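Putting these pieces together, a monitor can be quite short. The following is a minimal sketch, not the actual m0.icn or m1.icn; it assumes the monitor is run with the compiled target program as its first argument, and that EvGet() fails once the target program finishes.

```unicon
$include "evdefs.icn"
link evinit

procedure main(av)
   # av[1] names the target program; remaining elements are its arguments
   EvInit(av) | stop("cannot load target program")
   # Request only procedure-call events, using a one-member cset as the mask
   while EvGet(cset(E_Pcall)) do
      write("call: ", image(&eventvalue))
end
```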

Writing your first Unicon monitor

Consider the beauty and virtue of m0.icn, m1.icn and events.icn. Now check out sos.evt

lecture 9

Windows Unicon Trouble?

Summary of Event Monitoring Libraries

From unicon/ipl/mprocs
evinit
EvInit(args) loads another Unicon program that is to be monitored
evnames
evnames(e) maps event codes to English, e.g. E_Pcall -> "procedure call"
evsyms
returns a table that maps codes to symbols t[E_Pcall] -> "E_Pcall"
...
there are several more that we will introduce as needed
From unicon/ipl/mincl
evdefs.icn
$defines for all 100+ event codes. We should probably tour this.
patdefs.icn
$defines for the ~100 integer &eventvalue's of the E_PatMatch event

HW#2 status

Subject to some tweakage, here it is.

Unicon 2D Graphics Functions

We briefly discussed the built-in 2D graphics function set.

Functions you might have a use for in this class:

3D Functions We Will Worry about Later

Functions you probably don't need in this class:

lecture 10

Unicon Mailbag Questions

How does open mode "p" work?
You don't have to use it; you can do anything that works for you. But open(cmdline, "p") runs cmdline in a shell and opens a file that reads its standard output into your program.
Linux example:
f := open("ls -l | grep icn", "p")
while filename := read(f) do stuff(filename)
close(f)
Windows example:
f := open("cmd /C dir", "p")
while line := read(f) do
   if find(".icn", line) then stuff(line)
close(f)
How would I make global lists or tables that I can access in other procedures?
  1. Declare global variables.
  2. Assign them list or table values (maybe in main())
  3. They will then be visible everywhere.
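The three steps above can be sketched as follows; the names counts, order and tally are made up for illustration.

```unicon
global counts, order      # step 1: declare; visible in every procedure

procedure main()
   counts := table(0)     # step 2: assign values; 0 is the default for unseen keys
   order := []
   every tally("dot" | "cflow" | "dot")
   every k := !order do write(k, ": ", counts[k])
end

procedure tally(k)
   # step 3: another procedure uses the globals directly
   if counts[k] = 0 then put(order, k)   # first time we see k
   counts[k] +:= 1
end
```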

Partial Highlights from HW#1 Solutions

    outfile := open("output.json","w")
	# OK, but check whether open() fails or not

    s := f(s, "morestuff")
    	# functional style is fine and appropriate. no reference parameters.

    L := []
    every put(L, !fileIO)
	# OK, but consider L := [: !fileIO :]

    truth := 1
       ...
	    if truth = 1 then {
	# fine, use boolean flags if you must. no boolean data type.
	# More common to use &null as false and non-null as true.

     every x := find("(", line) do {  #finds every instance o
     	# outstanding; uses find() to iterate through line

     if not member(&letters ++ &digits, line[x-1]) then {
     	# fine, but if you do this a lot of times, pull ++ out of the loop

     hashIndex := &null
     hashIndex := find("#", line)
     if hashIndex ~=== &null then{
     	# fine, but consider
	#   if hashIndex := find("#", line) then { ...


     system (["cflow", "--omit-arguments", name], f, f, f3)
	# wow, kudos for using the full power of system()! Is this better than
	#   system("cflow --omit-arguments " || name, f, f, f3)

      word := tab(upto("("))
	# kudos for using string scanning!
	# consider using tab(find("(")) or change to tab(upto('('))

      every i := 1 to *args do tableofprogs[args[i]] := preprocess(args[i])
	#  every arg := !args do tableofprogs[arg] := preprocess(arg)

      p := <[_a-zA-Z][_a-zA-Z0-9]*[ \n\t]*"(">
      p2 := p || .>y
      s ?? p2 -> s2
	# wow, regexps and patterns!

      if s2[j] == (" "|"\n"|"\t"|"(") then {
        # if any(' \n\t(', s2[j]) then { ...

      system("cflow  cflow.c > info.txt", "p")
	# hmm, possible mixed metaphor

      if(pos ~== 0 ) then
	# if pos ~= 0 then

      if(lenghtOfString(L[i]) = 1 )then {
	# not just misspelled, also misleading

      wchar := &letters ++ &digits ++'\'_'
      lista ? while tab(upto(wchar)) do {
	# this is good practice

      n_pos :=  find("()", p_name)
      f_pos := find(")", p_name)
      if  p_name[n_pos] == "(" then {write("nice")}
	# better know for sure that these can't fail, or check

      procedure getSpaceNumber(line)
      local pos:=0
        space := line[1]
        while (space == ' ') do {
          pos := pos + 1
          space := line[pos]
        }
        return pos
      end
	# many(' ', line)

      if not (tab(find("class"|"procedure"))) then {
	# cool

      &pos := &pos + 6
      	# move(6)

      lineno=0
	# lineno := 0

Visualization Principles (according to Dr. J)

animation
incremental algorithms are a primary means of achieving efficient animation. complementary to the principle of minimizing ink (or # pixels) used to convey a given set of information, this is like minimizing the motion of the plotter arm, or in our case, the # of memory writes.
least astonishment
use the golden rectangle, labels and legends
metaphors
a familiar metaphor saves the user a lot of time and improves understanding. Metaphors can be taught, and become familiar over time, but that is often laborious.
interconnection
connecting different pieces of data is key, follow Playfair's example
interaction
the big difference between a visualization and a paper chart or graph is that the user can interact with the data. exploit this.
dynamic scale
visualizations compete for screen space and hardware varies widely. it is extra work, but if you write everything so that it scales, your visualization will be useful on more machines and in more ways.
static backdrop
one of the best ways to make dynamic data understandable is to present it in terms of static data. An execution is an instance of the underlying universal abstract thing that is the program.

Notes from Past Students' Unicon Code

main(av)
av is always a list of strings; if no arguments, *av = 0
paramnames() is a generator
use it with every, or ask questions like "if type(x:=paramnames(...))=="list" then..."
the apply operator p ! L is pretty awesome
what does every maxval <:= !L do?
max() is a built-in function, so maxval := (max ! L)
failure and success
if i := find() then ... is cooler than i := find(); if \i then ...
check for open() failure
I asked nicely before, now I am telling you
sticking &fail at the end of a routine is a no-op
a routine fails for free if it falls off its end; &fail does not return a failure and is in fact seldom used. Unlike Lisp, the return value of a function is not its final expression's evaluation.
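A few of the points above, as a small sketch. The procedure firstneg() is hypothetical, and max() is used as the built-in function mentioned in these notes.

```unicon
# Fails for free when no negative element exists: falling off the end fails,
# so no &fail is needed.
procedure firstneg(L)
   every x := !L do
      if x < 0 then return x
end

procedure main()
   L := [3, 9, 4]

   # x <:= y assigns y to x when x < y, so this loop keeps the largest value
   maxval := L[1]
   every maxval <:= !L
   write(maxval)

   # max() with the apply operator does the same in one expression
   write(max ! L)

   # Alternation supplies a fallback when firstneg() fails
   write(firstneg(L) | "no negatives")
end
```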

Graphic Design of the Day

Fisheye Views.

If you want, you can read Furnas' paper on Generalized Fisheye Views.

Suspects, Tools, and Big Programs

As we proceed into the "meat" of the course, we have a need for lots of subject programs to study, lots of example monitors, and bigger programs that presumably will have more complex behavior.
Suspects
This directory was compiled by Ralph Griswold as a collection of interesting or weird programs whose behavior could be understood by program visualization. The good part of the Suspects directory is that the programs all run non-interactively, in some cases they were modified to do so, and those that require input have sample .dat files on which they run nicely. This lets monitors do their thing unimpeded. We should probably add some representative object-oriented programs to this collection this semester. I probably can dig out my "gui recorder" and create recordings of GUI programs so that we can monitor them conveniently in this context.
tools
This directory was compiled by Clinton Jeffery as a collection of simple program visualization programs and library procedures. Many of these codes are featured in the book, Program Monitoring and Visualization.
Big Programs
The largest programs in the suspects directory are typeinfer (2.6k lines), and yhcheng (1.9k lines). These were considered large in the Icon language, where source codes are typically 1/3 to 1/10 the size of C programs that do the same thing. The other largest public domain Icon programs are in the ipl/*packs directories. Among these, ibpag2 is 3.7k lines, itweak is 3.5k lines, skeem is 3.1k lines, ged is 3.6k lines, htetris is 4.3k lines, vib is 4.4k lines, and weaving is 11.3k lines (?). Monitoring these might or might not be easy, since they may be interactive, and you might or might not know what to click at them in order to get them to behave. The largest known Icon program (source not available) was Bill Wulf's testcase generator (rumored to be on the order of a half-million lines, perhaps machine-generated).

The Unicon language supports larger programs than Icon was intended for. The unicon translator itself is 10k lines of Unicon. The uni/lib class library is 20K lines, and the uni/gui GUI class library is 14.5K lines; large subsets of these libraries may be added onto whatever the tool size is. The Unicon IDE is 17K lines, the IVIB user interface builder is 16K lines, and so on. Some of these you can actually monitor.

The largest Icon/Unicon programs for which I have source code include the SSEUS database review/update system (35K lines), and a Knowledge Representation language and system (50K lines) done by an AT&T scientist. It might be possible to find these and monitor them, but it would take work to set them up for monitoring.

lecture 11

Mailbag

How can I set the width and height of the string that I print with DrawString() using the values height and width from the dot output?
Great question. Text labels are going to be important all through this course. Visualizations often botch them: either not enough, or too many to the point they are unreadable.

Reading

Highlights from OGRE [Milne/Rowe 04]

topics related to memory are the most difficult
pointers, dynamic memory allocation, copying, polymorphism... (9/10 of the most difficult topics for novices identified in a previous paper)
[Knight and Munro 2000] "Software World" sounds interesting.
Not assigned as homework/reading. It proposes a city metaphor in which:
Object-oriented systems can be harder to understand than traditional imperative code.
So maybe it would be more important for us to figure out how to visualize them.
A conceptual view is needed more than a literal view of memory
At least for novices, sizing each object to its # of bytes is not the main point.
Understanding scopes is important. Each one gets a plane.
Local scopes are mostly extremely numerous and short-lived. We need a metaphor in which these "planes" or sets of variables/objects come in together in a rush, and leave together with a whimper. A lot. We are looking for a metaphor for the stack. Of course, we could depict them as a stack. Pancakes? Waffles?

Note OGRE's target: novice C++ programmers who need to develop a very concrete mental model of how pointers work.

More Unicon highlights from HW#1 code?

      if first:=find("at ", line) & lineNumb==1 then{
         move(first+2)
		# what's the difference between tab() and move() again?
	# extremely common: tab(find(...)), tab(upto(...))

      while move(1) == " " do {
         count:=count+1
      }
	# count +:= 1 ok, but how about count +:= *tab(many(' '))
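As a reminder of the difference: tab(i) moves to an absolute position i, while move(n) moves n characters relative to the current position. A small sketch on a made-up input line:

```unicon
procedure main()
   line := "   foo(x) at line 12"
   line ? {
      indent := *tab(many(' '))    # tab() to the position after the blanks: 3
      name := tab(upto('('))       # tab() to an absolute position: "foo"
      move(1)                      # move() is relative: skip the "("
      arg := tab(upto(')'))        # "x"
      tab(find("at ")) & move(3)   # jump to "at ", then step over it
      rest := tab(0)               # remainder of the subject: "line 12"
      }
   write(indent, " ", name, " ", arg, " ", rest)
end
```

Note that *tab(many(' ')) fails outright if the line has no leading blanks, which is one more place to think about failure.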

Graphic Design of the Day

CASSE POSTALI DI RISPARMIO ITALIANE by Antonio Gabaglio, via the revered Tufte, and cited in a nice discussion of cyclic data, apparently by Benj Lipchak.

Unicon feature of the day: Packages

Packages were added to Unicon more or less against my will, but they are obviously of growing importance in larger scale development. Packages are about protecting a name space from collisions. Without them, global variables in all modules are shared, and accidentally, these variables may conflict with globals (and undeclared, thought-to-be locals!) in other modules. The more libraries you use, the more inevitable these conflicts. Proof that packages are needed is evident in the Icon Program Library, where, after fundamental built-in functions like "type" were accidentally assigned one too many times by client code, Ralph Griswold got in the habit of protecting "type" or similar built-in functions the hard way, inside each library procedure that uses them:
   static type
   initial type := proc("type", 0)	# protect attractive name
This gets old in a hurry, and it actually bloats code a little bit.

So anyhow, Robert Parlett implemented packages, and I accepted them, and now they are here to stay, and they aren't bad. You do have to know the "package" and "import" keywords, and the ::foo syntax, and that is about it.

lecture 12

Mailbag

I am stuck trying to parse dot output. For example, if I have a string
    s == "     a -> b  [pos=\"e,63,108.41 63,143.83 63,136.13 63,126.97 63,118.42\"];"
how do I parse it?
Well, obviously we are still learning Unicon and I will take whatever bloody harvest of bytes you manage to deliver me. But if I had to do this homework, I might start with something like:
   s ? {
   tab(upto(&letters))       # discard up to node name
   srcnodename := tab(many(&letters))
   tab(many(' \t'))          # discard whitespace
   if ="->" then {           # we have an edge
      tab(many(' \t'))       # discard whitespace
      dstnodename := tab(many(&letters))
      tab(many(' \t'))       # discard whitespace
      if ="[pos=\"e," then {
         L := []
         while num := tab(upto(', \"')) do {
            put(L, numeric(num))
            if ="\"" then break
            else tab(many(', '))
            }
         }
      else write(&errout, "malformed edge")
      }
   }
I am stuck trying to use DrawCurve(). From reading the book, I understand that the arguments need to be x,y pairs. My issue is when I try to pass DrawCurve() a string or a list as an argument containing all of the x,y pairs. For example a string or list containing [127.7,180.41, 127.7,215.83, 127.7,208.13, 127.7,198.97, 127.7,190.42]. I am assuming the string or list gets treated as just one argument when I do this, is there something else I can do?
DrawCurve() and the other Draw*() functions do not take their parameters in a list or a string. If you have all your arguments in a list L, you can turn them into parameters using the apply operator, as in DrawCurve ! L

cflow on Windows

If you dare, check out https://github.com/noahp/cflow-mingw. It is either a nice guy who built cflow on Windows and shared it with the world, or a nefarious ransomware hacker luring victims with offers of cflow binaries. If you don't like trusting his .exe's you can certainly examine the source code and try to follow this GitHub project's build instructions. How I found it: googled cflow.exe. A random GitHub repository is not a highly reputed official distribution, but at least it comes with source code, so it is not obviously one of those malware sites of ill repute, like a fake device driver repository.

dot on Windows

There have been reports of problems running dot on Windows. graphviz.org provides windows executables and dot.exe seems to work OK. Maybe it conspicuously chooses not to add itself to your PATH; adding the directory where it was installed to the Path got it working for one student. In another student's case, instead of open("dot ...", "p") we ended up using open("cmd /C dot ...", "p"). That smells also like a Path issue, but I am not sure.

Monitoring Location Events

MiniLoc

Visualization Idea: The program miniloc.icn is a "miniature location profiler". It is our first example from the tools/ directory mentioned in an earlier lecture. It is 66 lines of code. What is "mini" about miniloc is that each source code line and column is one pixel row and column. This is a scaling problem for large programs or small monitors. Miniloc could be rewritten to scale its graphics. The frequency of location events at various locations is recorded using a log scale through a range of colors from boring to red-hot. Humans don't really perceive red as a larger # than green, but the metaphor of a temperature map is widely recognizable anyhow.

lecture 13

Reflections on Miniloc

My first thought after briefly running miniloc last time was:
After sleeping on it, additional ideas came calling:
For the small-font legibility question, we might take a look at this font demo.

Bigger Questions

Piano

Visualization Idea:

Hani's Clever Case Tag

Case expressions in Icon use === semantics, looking for an exact match with no type conversions. Case branches are evaluated sequentially as if one were writing
  if x === firstbranchexpr then firstcodebody
  else if x === secondbranchexpr then secondcodebody
  else if x === thirdbranchexpr then thirdcodebody
  ...
If all the branch labels are constants, this is colossally inefficient compared with a C switch statement. But, it is fully general and you can use arbitrary expressions, including generators, for which the entire result sequence will be generated in trying to find a match.

You can add a predicate filter on the front, or have your values supplied from subroutines, or whatever:

   case x of {
   p() & q() & foo: { ... }
   a | b | 1 to 10 | f(): { ... }
   }

This generator capability can be used with cset event masks, as in the following; it would also work with sets, table keys, or any other generator you wanted to write.

case x of {
   ...
   !ProcMask: {
      }
   ...
   }
This makes for short elegant code, but it is inefficient. Generating the individual elements out of a cset costs a type conversion (cset to string) which isn't cheap, and all generators pay for extra bookkeeping on the stack, for that suspending resuming capability, which is slow at times. You are paying for convenience and generality, and a good optimizing compiler might make some of that go away, but the VM sure does not. In a couple minutes we will see another measure of how much you pay. But in the meantime...

Hani Bani Salameh showed me some code once that looked like:

case x of {
   ...
   member(a_set, x): {
      }
   ...
   }
member(a_set, x) tests whether x is a member and returns x if it is, so it is just a filter, and by the way it avoids a linear search via a generator so it is fast. It's got a seemingly redundant comparison of x===x after the member() test succeeds, but that is C code and probably very fast compared with a case with a lot of alternation | or generate ! operators in it.
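A runnable sketch contrasting the two styles of case label: a member() filter and a generator. The set and strings here are made up for illustration and have nothing to do with event masks.

```unicon
procedure main()
   vowels := set(["a", "e", "i", "o", "u"])
   every c := !"unicon" do
      case c of {
         member(vowels, c): write(c, ": vowel")        # set lookup, then c === c
         !"xyz":            write(c, ": rare letter")  # generator label, tried in order
         default:           write(c, ": consonant")
         }
end
```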

lecture 14

Reading

Highlights from [Wettel and Lanza]


Monitoring Procedure Activity

Monitoring Icon and Unicon is a little more complicated because procedures can suspend and be resumed. The events for this behavior are given below. The include file evdefs.icn defines an event mask named ProcMask that will select all six of these events.

Event      Value               Description
E_Pcall    procedure called    Procedure call
E_Psusp    value produced      Procedure suspended to caller
E_Presum   procedure resumed   Resume a previous suspension
E_Pret     value produced      Procedure returned to caller
E_Pfail    failing procedure   Procedure failed
E_Prem     removed procedure   Procedure removed

In the presence of suspend/resume, the "call stack" becomes a "call tree", a.k.a. an activation tree (a better term since procedures can be activated by more than just calls).

You can just ask for all the procedure activity events, but if your monitor is doing more than just counting them then it potentially will need to do more. One way to monitor the activation tree is to build a model of the tree itself.
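For the "just counting" case, a monitor might look like the following sketch. It assumes EvGet() fails when the target program terminates, and it writes its report to &errout so as not to mix with the target program's output.

```unicon
$include "evdefs.icn"
link evinit

procedure main(av)
   EvInit(av) | stop("cannot load target program")
   counts := table(0)
   # Ask for all six procedure activity events and tally them by code
   while EvGet(ProcMask) do
      counts[&eventcode] +:= 1
   # Report in a fixed order; the labels are just the names of the codes
   names := ["E_Pcall", "E_Psusp", "E_Presum", "E_Pret", "E_Pfail", "E_Prem"]
   codes := [E_Pcall, E_Psusp, E_Presum, E_Pret, E_Pfail, E_Prem]
   every i := 1 to *codes do
      write(&errout, left(names[i], 10), counts[codes[i]])
end
```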

We will look at examples that use evaltree, but first a word on timing.

The time cost of monitoring

Example. In the suspects/ directory are many candidates (which one runs the longest?). We will consider the poetry scrambler for this example.

time ./scramble <scramble.dat
uses the UNIX time(1) command to measure the runtime externally. It reports something like:

Sun Sparc, ~9/2007:
1.0u 0.0s 0:03 32% 0+0k 0+0io 0pf+0w
Threadripper, 2/2019:
0.019u 0.025s 0:00.15 20.0% 0+0k 0+0io 2pf+0w

Over a decade ago, that program took 1.0 seconds of user time, 0.0 seconds of system time, 3 seconds of wall-clock observed time. Out of curiosity, since it writes out a lot to standard out, I re-timed it directing output to /dev/null, and it still took a second of user time, but the wall clock is down to 1 second.

Now I take an almost-empty monitor, timer.icn, and time it using the UNIX utility.

time timer ./scramble <scramble.dat
and it writes out

Sun Sparc, ~9/2007:
tp time: 1830 - 0 = 1830
em time: 0 - 0 = 0
1.0u 0.0s 0:03 30% 0+0k 0+0io 0pf+0w
Threadripper, 2/2019:
tp time: 35 - -5 = 40
em time: 5 - 5 = 0
0.025u 0.018s 0:00.15 20.0% 0+0k 0+0io 2pf+0w

Given that timer.icn is the "empty monitor", what do these numbers tell us?

Time measurement accuracy is limited by tools of observation and hardware/OS limitations. Another problem with measurement is that external environmental considerations (load average, user activity) change results to some extent. The 2007 measurements were done long ago on mars.cs.uidaho.edu, a sparc Solaris machine. The "who" command reported 5 different people logged in at the time, although the load average was apparently low (inactive terminal sessions). The 2019 Threadripper numbers were for the machine in my office running Fedora. Lots of processes, only 1 user.

lecture 15

No Class on Monday

Monday is President's Day.

Mailbag

How do I draw arrowheads?
The arrow is to be drawn from the last point to the point given with the "e,x,y" at the beginning of the pos attribute. Possible implementations:

Upcoming Conferences

Some of you should consider doing a semester project worthy of a research paper. Some of you might even want to target one of these venues.

A Brief on Windows Unicon

HW#3

Timing, Part 2

time ../tools/timerloc ./scramble < scramble.dat > /dev/null
tp time: 366 - -6 = 372ms
em time: 394 - 6 = 388ms
0.490u 0.881s 0:01.46 93.8%	0+0k 0+0io 2pf+0w
Wow! Is that a factor of 100x? BTW, a pthreads context switch, where the OS gets involved because you want to support true multicore or whatever, can easily cost another 100x.

Now, I wonder how much evaltree costs? A past student once claimed it was "slow". I wonder why that would be... It would be useful to know whether the co-expression switch totally dominates the time spent in the monitor. Although our intuition says it does, intuition is not always correct. Evaltree costs: a big case statement (not very efficient in Icon/Unicon), whose labels are generators (not very efficient), whose code bodies do allocations and list operations (pretty darned fast) and call the monitor callback procedure.

One way to do our experiment is to measure &time before and after each EvGet(), and instead of measuring time spent in the target program, measure the other time, the time spent in the monitor. Another way to do the experiment is to rewrite the evaltree() functionality for speed instead of clarity, and see if it is measurably different or not.
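The first experiment, measuring &time around each EvGet(), might be sketched as follows. This is not the code of evaltime.icn. It assumes that &time in the monitor advances while the target program runs, and its millisecond granularity makes the attribution coarse when events are frequent.

```unicon
$include "evdefs.icn"
link evinit

procedure main(av)
   local tptime, emtime, mark, now
   EvInit(av) | stop("cannot load target program")
   tptime := emtime := 0
   mark := &time
   while EvGet(ProcMask) do {
      now := &time
      tptime +:= now - mark       # time that passed while the target ran
      mark := now
      # ... the monitor's real work on &eventcode / &eventvalue goes here ...
      now := &time
      emtime +:= now - mark       # time that passed inside the monitor
      mark := now
      }
   tptime +:= &time - mark        # final stretch until the target terminated
   write(&errout, "tp time: ", tptime, "ms")
   write(&errout, "em time: ", emtime, "ms")
end
```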

Compare evaltime.icn, evaltime2.icn, evaltime3.icn, showing an attempt to do this experiment.

time evaltime ./scramble <scramble.dat
shows

Sun Sparc, ~9/2007:
tp time: 2760--10=2770
em time: 6670-0=6670
10.0u 0.0s 0:18 55% 0+0k 0+0io 0pf+0w

Threadripper, 2/2019:
tp time: 56--7=63
em time: 207-7=200
0.212u 0.094s 0:00.30 100.0%    0+0k 0+0io 0pf+0w

Using evaltree, the monitor accounts for the vast majority of the time, and the time reported for the target program is much higher than for the unmonitored or empty monitored cases. evaltime2, which skips the evaltree mechanism but uses a big case statement, gives:

Sun Sparc, ~9/2007:
tp time: 2490-0=2490
em time: 2660-0=2660
5.0u 0.0s 0:08 61% 0+0k 0+0io 0pf+0w

Threadripper, 2/2019:
tp time: 55--7=62
em time: 90-7=83
0.113u 0.085s 0:00.19 100.0%    0+0k 0+0io 0pf+0w

Cost of monitoring is substantially lower, although the particular details may be affected by machine load fluctuation. One would have to run several times and take averages for the numbers to be meaningful. Using evaltime3, which avoids the large case statement, we get

Sun Sparc, ~9/2007:
tp time: 2580-0=2580
em time: 2050-0=2050
5.0u 0.0s 0:07 70% 0+0k 0+0io 0pf+0w

Threadripper, 2/2019:
tp time: 60--8=68
em time: 76-8=68
0.088u 0.103s 0:00.19 94.7%     0+0k 0+0io 0pf+0w
At this point, monitoring procedure activity is seen to impact execution time substantially, but at least the monitor is taking no more time than the target program.
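To compare the three variants at a glance, it helps to reduce each run to a single ratio of monitor time to target-program time. The Python below is just arithmetic on the tp/em numbers reported above; the dictionary layout and the function name are my own, and single runs like these are still subject to the load-fluctuation caveat already mentioned.

```python
# (tp time, em time) in ms, copied from the three sets of results above.
runs = {
    "evaltime":  {"Sun Sparc": (2770, 6670), "Threadripper": (63, 200)},
    "evaltime2": {"Sun Sparc": (2490, 2660), "Threadripper": (62, 83)},
    "evaltime3": {"Sun Sparc": (2580, 2050), "Threadripper": (68, 68)},
}


def monitor_ratio(tp_ms, em_ms):
    """Monitor time per unit of target-program time."""
    return em_ms / tp_ms


for variant, machines in runs.items():
    for machine, (tp_ms, em_ms) in machines.items():
        ratio = monitor_ratio(tp_ms, em_ms)
        print(f"{variant:10} {machine:13} em/tp = {ratio:.2f}")
```

On both machines the ratio falls from well above 2 with evaltree to roughly 1 or below with evaltime3.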

Many Morals of the story:

scat

The scat program is a simple application of evaltree. You kind of have to see this one running to appreciate it, so let's try and demo it. It links in a scatterplot library which might or might not be useful to you; scatlib implements the log scaling that scat uses.

$include "evdefs.icn"
link evinit
link evaltree
link scatlib
Scat uses several global variables, three tables to remember what has been plotted, and three clones set with different colors.
global	at,   # table: sets of procedures at various locations
	call, # table: call counts
	rslt, # table: result counts
        red,
        green,
        black
Scat uses a generic evaltree-compatible record type for modeling; no extra payload added.
record activation (node, parent, children)
The initialization is straightforward.
procedure main(av)
   local mask, current_proc, L, max, i, k, child, e

   EvInit(av) | stop("can't monitor")

   scat_init()
   red := Clone(&window, "fg=red")
   green := Clone(&window, "fg=green")
   black := Clone(&window, "fg=black")

   current_proc := activation(,activation(,,,,[]),[])
Control is handed over to evaltree, which calls scat_callback with events
   evaltree(ProcMask ++ FncMask ++ E_MXevent,
	    scat_callback, activation)

   WAttrib("label=scat (finished)")
   EvTerm(&window)
end
scat_callback mostly calls scat_plot, which calls colorfor to decide what color to plot with.
procedure scat_callback(new, old)
   case &eventcode of {
      E_Pcall:
	 scat_plot(new.node, 1, 0, , colorfor)
      E_Psusp | E_Pret:
	 scat_plot(old.node, 0, 1, , colorfor)
      E_Fcall:
	 scat_plot(new.node, 1, 0, , colorfor)
      E_Fsusp | E_Fret:
	 scat_plot(old.node, 0, 1, , colorfor)
      E_MXevent: {
         case &eventvalue of {
	    "q" | "\033": stop("terminated")
	    &lpress : {
	       repeat {
	          scat_click(proced_name)
		  if Event() === &lrelease then
		     break
		  }
	       }
	    }
	 }
      }
end
Procedure proced_name returns the name of a procedure, taken from its image.
procedure proced_name(p)
   return image(p) ? {
      [ =("procedure "|"function "), tab(0) ]
      }
  stop(image(p), " is not a procedure")
end
Procedure colorofone distinguishes procedures from functions.
procedure colorofone(p)
  return if match("procedure ", image(p))
	 then red else green
end
Procedure colorfor uses a list (of procedures/functions) to select what color to plot. If it is not the first color choice and the subsequent value should be a different color, resort to black. Return a red or green if all values say to be red or all say to be green.
procedure colorfor(L)
   if *L = 0 then return &window
   every x := !L do {
      if not (/c := colorofone(x)) then
	 if colorofone(x) ~=== c then
	    return black
      }
   return c
end

What is scat good for?

scat is cooler than you think. It shows not just who the hot procedures are, it also shows what procedures always fail, what procedures generate lots of results per call, and what procedures (predicates) generate between 0 and 1 result per call.
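The categories scat makes visible all come from the ratio of results to calls. As a rough illustration (in Python rather than Unicon, with made-up procedure names and counts, and category labels that are mine rather than scat's), the classification looks like this:

```python
def classify(calls, results):
    """Label a procedure by how many results it produces per call."""
    if calls == 0:
        return "never called"
    ratio = results / calls
    if ratio == 0:
        return "always fails"
    if ratio < 1:
        return "predicate (between 0 and 1 result per call)"
    if ratio == 1:
        return "one result per call"
    return "generator (many results per call)"


# Hypothetical (calls, results) counts, not measurements from a real run.
counts = {
    "main": (1, 1),
    "is_vowel": (40, 12),
    "never_ok": (25, 0),
    "chars": (3, 90),
}
for name, (calls, results) in counts.items():
    print(f"{name:10} {classify(calls, results)}")
```

In the scatterplot the same information is carried by position: call count on one axis and result count on the other.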

lecture 16

Office Hours Pushback

My office hours today will start at 3pm due to my boss requesting the half hour from 2:30-3.

More Class Cancellations

I am going to ACM SIGCSE in Minneapolis February 26-March 3. We will miss a Wednesday and a Friday class that week, sorry! I will be reachable by e-mail and will try to accommodate office appointment requests via Zoom.

HW#3 Discussion

algae

The flagship demonstration of the evaltree framework is a fairly literal visualization of the activation tree.

   EvInit(av) | stop("Can't EvInit ",av[1])
   codes := algae_init(algaeoptions)
   evaltree(codes, algae_callback, algae_activation)
   WAttrib("windowlabel=Algae: finished")
   EvTerm(&window)
Algae takes command line options to say how much to monitor, how to graphically depict the tree, etc. It deliberately chooses a simple-minded incremental graphic, coming from a time when graphics performance was deemed to be a likely monitor bottleneck. By default it uses hexagons for activation records (compare hexagons with a square grid). A real but still INCREMENTAL tree layout algorithm would be better.
procedure algae_init(algaeoptions)
   local t, position, geo, codes, i, cb, coord, e, s, x, y, m, row, column
   t := options(algaeoptions,
	   winoptions() || "P:-S+-geo:-square!-func!-scan!-op!-noproc!-step!")
   /t["L"] := "Algae"
   /t["B"] := "cyan"
   scale := \t["S"] | 12
   delete(t, "S")
   if \t["square"] then {
      spot := square_spot
      mouse := square_mouse
      }
   else {
      scale /:= 4
      spot := hex_spot
      mouse := hex_mouse
      }
   codes := cset(E_MXevent)
   if /t["noproc"] then codes ++:= ProcMask
   if \t["scan"]   then codes ++:= ScanMask
   if \t["func"]   then codes ++:= FncMask
   if \t["op"]     then codes ++:= OperMask
   if \t["step"]   then step := 1
   hotspots := table()
   &window := Visualization := optwindow(t) | stop("no window")
   numrows := (WHeight() / (scale * 4))
   numcols := (WWidth() / (scale * 4))
   wHexOutline := Color("white") # used by the hexagon library
   if /t["square"] then starthex(Color("black"))
   return codes
end
The real work happens in algae_callback()
procedure algae_callback(new, old)
   local coord, e
   initial {
      old.row := old.parent.row := 0; old.column := old.parent.column := 1
      }
   case &eventcode of {
      !CallCodes: {
	 new.column := (old.children[-2].column + 1 | computeCol(old)) | stop("eh?")
	 new.row := old.row + 1
	 new.color := Color(&eventcode)
	 spot(\old.color, old.row, old.column)
	 }
      !ReturnCodes |
      !FailCodes: spot(Color("light blue"), old.row, old.column)
      !SuspendCodes |
      !ResumeCodes: spot(old.color, old.row, old.column)
      !RemoveCodes: {
	 spot(Color("black"), old.row, old.column)
	 WFlush(Color("black"))
	 delay(100)
	 spot(Color("light blue"), old.row, old.column)
	 }
      E_MXevent: do1event(&eventvalue, new)
      }
   spot(Color("yellow"), new.row, new.column)
   coord := location(new.column, new.row)
   if \step | (\breadthbound <= new.column) | (\depthbound <= new.row) |
      \ hotspots[coord] then {
      step := &null
      WAttrib("windowlabel=Algae stopped: (s)tep (c)ont ( )clear ")
      while e := Event() do
	 if do1event(e, new) then break
      WAttrib("windowlabel=Algae")
      if \ hotspots[coord] then spot(Color("light blue"), new.row, new.column)
      }
end
Boring square graphics:
procedure square_spot(w, row, column)
   FillRectangle(w, (column - 1) * scale, (row - 1) * scale, scale, scale)
end

# encode a location value (base 1) for a given x and y pixel
procedure square_mouse(y, x)
   return location(x / scale + 1, y / scale + 1)
end
A whole new meaning for the term "graphical breakpoints":
#
# setspot() sets a breakpoint at (x,y) and marks it orange
#
procedure setspot(loc)
   hotspots[loc] := loc
   y := vertical(loc)
   x := horizontal(loc)
   spot(Color("orange"), y, x)
end

#
# clearspot() removes a "breakpoint" at (x,y)
#
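procedure clearspot(loc)
   # The parameter is named loc, not spot, so that it does not
   # shadow the global drawing procedure spot() called below.
   local s2, x2, y2, x, y
   hotspots[loc] := &null
   y := vertical(loc)
   x := horizontal(loc)
   every s2 := \!hotspots do {   # x2 and y2 are computed but not used
      x2 := horizontal(s2)
      y2 := vertical(s2)
   }
   spot(Visualization, y, x)
end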
User input handling:
#
# do1event() processes a single user input event.
#
procedure do1event(e, new)
   local m, xbound, ybound, row, column, x, y, s
   case e of {
      "q" |
      "\e": stop("Program execution terminated by user request")
      "s": { # execute a single step
	 step := 1
	 return
	 }
      "C": { # clear a single break point
	 clearspot(location(new.column, new.row))
	 return
	 }
      " ": { # space character: clear all break points
	 if \depthbound then {
	    every y := 1 to numcols do {
	       if not who_is_at(depthbound, y, new) then
		  spot(Visualization, depthbound, y)
	       }
	    }
	 if \breadthbound then {
	    every x := 1 to numrows do {
	       if not who_is_at(x, breadthbound, new) then
		  spot(Visualization, x, breadthbound)
	       }
	    }
	 every s := \!hotspots do {
	    x := horizontal(s)
	    y := vertical(s)
	    spot(Visualization, y, x)
	    }
	 hotspots := table()
	 depthbound := breadthbound := &null
	 return
	 }
      &mpress | &mdrag: { # middle button: set bound box break lines
	 if m := mouse(&y, &x) then {
	    row := vertical(m)
	    column := horizontal(m)
	    if \depthbound then {       # erase previous bounding box, if any
	       every spot(Visualization, depthbound, 1 to breadthbound)
	       every spot(Visualization, 1 to depthbound, breadthbound)
	       }
	    depthbound := row
	    breadthbound := column
	    #
	    # draw new bounding box
	    #
	    every x := 1 to breadthbound do {
	       if not who_is_at(depthbound, x, new) then
		  spot(Color("orange"), depthbound, x)
	       }
	    every y := 1 to depthbound - 1 do {
	       if not who_is_at(y, breadthbound, new) then
		  spot(Color("orange"), y, breadthbound)
	       }
	    }
	 }
      &lpress | &ldrag: { # left button: toggle single cell breakpoint
	 if m := mouse(&y, &x) then {
	    xbound := horizontal(m)
	    ybound := vertical(m)
	    if hotspots[m] === m then
	       clearspot(m)
	    else
	       setspot(m)
	    }
	 }
      &rpress | &rdrag: { # right button: report node at mouse loc.
	 if m := mouse(&y, &x) then {
	    column := horizontal(m)
	    row := vertical(m)
	    if p := who_is_at(row, column, new) then
	       WAttrib("windowlabel=Algae " || image(p.node))
	    }
	 }
      }
end
Calculating which activation a given click refers to:
#
# who_is_at() - find the activation tree node at a given (row, column) location
#
procedure who_is_at(row, col, node)
   while node.row > 1 & \node.parent do
      node := node.parent
   return sub_who(row, col, node)		# search children
end

#
# sub_who() - recursive search for the tree node at (row, column)
#
procedure sub_who(row, column, p)
   local k
   if p.column === column & p.row === row then return p
   else {
      every k := !p.children do
	 if q := sub_who(row, column, k) then return q
      }
end
A similar calculation for placing new nodes
#
# computeCol() - determine the correct column for a new child of a node.
#
procedure computeCol(parent)
   local col, x, node
   node := parent
   while \node.row > 1 do	# find root
      node := \node.parent
   if node === parent then return parent.column
   if col := subcompute(node, parent.row + 1) then {
      return max(col, parent.column)
      }
   else return parent.column
end

#
# subcompute() - recursive search for the rightmost tree node at depth row;
# returns the column just to the right of it
#
procedure subcompute(node, row)
   # check this level for correct depth
   if \node.row = row then return node.column + 1
   # search children from right to left
   return subcompute(node.children[*node.children to 1 by -1], row)
end
How to use Clone()
#
# Color(s) - return a binding of &window with foreground color s;
#  allocate at most one binding per color.
#
procedure Color(s)
  static t, magenta
  initial {
     magenta := Clone(&window, "fg=magenta") | stop("no magenta")
     t := table()
     /t[E_Fcall] := Clone(&window, "fg=red") | stop("no red")
     /t[E_Ocall] := Clone(&window, "fg=chocolate") | stop("no chocolate")
     /t[E_Snew] :=  Clone(&window, "fg=purple") | stop("no purple")
     }
  if *s > 1 then
     / t[s] := Clone(&window, "fg=" || s) | stop("no ",image(s))
  else
     / t[s] := magenta
  return t[s]
end

Graphic Design(s) of the Day

Consider the Tukeys' Multiwindow- and Box-Plots on the left, and Tufte's Data-ink maximization on the right.

lecture 17

HW#2 Feedback

Windows Users' Notes

Reading

Discussion of "Overview of 3D Software Visualization"

GUI Monitors

Step #1 in GUI exploration is usually to get familiar with the interface builder program; in our case that is IVIB. (Demo of IVIB goes here). IVIB generates code that looks like this.

IVIB lets you draw a GUI and generates the code for you. For a program execution monitor the main question will be: how to merge the event streams, or how to merge the event processing loops, from the GUI and from the monitored program's events. To accomplish this, you need to know more about the underlying GUI classes.

There are 3 classes that most Unicon GUI programmers need to become semi-comfortable with:

Component
Component is the superclass of all basic visible GUI elements in an application: buttons, sliders, lists, editable text boxes, and so on. Components are generally organized hierarchically -- they form a tree in Venn diagram style, with larger background components containing smaller, more active components.
Dialog
A Dialog is a component that constitutes the root of some window -- it owns a window and therefore can receive input events, which it then needs to route down the tree to the correct leaf.
Dispatcher
The Dispatcher class handles the actual event-processing loop, allowing for multiple dialogs, and wall-clock time events in addition to GUI events.

In order to merge the Monitor and GUI event streams, we might do one of the following:

There is no way to select() from between GUI and monitor or poll both, because to ask for an EvGet() is to transfer control to the target program (freezing the GUI of the monitor until an event occurs). However, you can call EvGet() with an E_Tick along with your other events if you want to be sure to regain control periodically even if the other monitored events do not occur for long periods... then your only danger is: what if the target program that you are monitoring chooses to block on some input it wants to read?

Additional notes on GUI-monitors:

lecture 18

Monitoring Memory Allocation and Garbage Collection

Allocation and Collection Events

Mempie

See mempie.icn

More memory monitors: mini-memmon and nova

Check out mmm, nova and oldnova. You should look at them as unfinished prototypes.

Griswold's claim examined

Ralph Griswold liked to claim that co-expression activations were about the same speed as procedure calls in Icon... and this matters a lot for execution monitors based on co-expressions, so I re-examined this claim with the following program:
procedure main()
   t1 := &time
   every i := 1 to 10000000 do p()
   write("10000000 calls: ", &time - t1)
   ce := create |1
   t2 := &time
   every i := 1 to 10000000 do @ce
   write("10000000 @: ", &time - t2)
end

procedure p()
   return 1
end
The results (on Linux x86_64) seem to suggest that co-expression activations are quite cheap, only about 28% slower than procedure calls (7920 ms vs. 6210 ms):
10000000 calls: 6210
10000000 @: 7920
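Working the ratio out from the two timings above (plain arithmetic, shown in Python; the variable names are mine). Keep in mind that both loops also include the cost of the every ... to loop itself, so the per-iteration figures are upper bounds on the cost of a call or an activation.

```python
calls_ms = 6210        # time for 10,000,000 procedure calls
activations_ms = 7920  # time for 10,000,000 co-expression activations
n = 10_000_000

slowdown = (activations_ms - calls_ms) / calls_ms
print(f"co-expression activation is {slowdown:.1%} slower")   # 27.5% slower
print(f"per call:       {calls_ms * 1e6 / n:.0f} ns")          # 621 ns
print(f"per activation: {activations_ms * 1e6 / n:.0f} ns")    # 792 ns
```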
Synchronous threads are a lot cheaper than true concurrent threads! Playing with a mac implementation earlier this semester, I plugged in a pthreads-based co-expression switch available from the current Icon language implementation, and it was an order of magnitude slower...

lecture 19

Discussion of Last Week's Reading

Communicating Software Architecture using a Unified Single-View Visualization

Just as a reminder for this metaphor:
big shots
Tell me what you know about LLNL. They might need their visualizations to work on the hardest real-world (very large, complex, C/C++) programs
"single view"
their argument for the city metaphor is to visualize multiple aspects about a program, for multiple stakeholders with differing roles and concerns, so that they will all be able to see the same thing and communicate effectively with each other over the shared artifact.
"static and dynamic"
they recognize the need for information based on program runs, not just code. Dynamic info consists of whatever gprof will tell them. Static info includes standard software engineering metrics: lines of code, cyclomatic complexity, and various safety static analysis checker outputs. They do not do, but anticipate the value of, incorporating repository log information used in others' city metaphor visualization research.
"source level" vs. "middle level" vs. "architectural level"
multi-graph mindset
function call graph sure, but instead of visualizing one big multi-purpose graph they see it as a "union of graphs": class call graph, class contents graph, class inherits graph, file call graph, file contents graph, directory contents graph...

"Representing Development History in Software Cities"

requirements
Evo-Streets
If you are going to do cities and maps, adopt techniques of cartographers
Primary, secondary, and tertiary models
Primary == original collected data. Secondary == all aspects of primary that might ever be drawn together on a map. Tertiary == specific aspects (selections, projections, coloring, symbols, legends...) for a single view
Layout based on four things
code hierarchy, elements' types and sizes, (multiple types of) dependencies, and development time(stamps)

Reading Assignment

Monitoring String Scanning

Icon's string scanning control structure has a very natural depiction, that of a progress bar or pointer working its way through a string. Issues include: how to abstract/scale a very large number of operations, how to depict backtracking, how to depict nested scanning environments (which might or might not involve analysis of a substring of the enclosing scanning environment).

Some programs use scanning a lot -- they are mostly string scanning -- and others do not use it at all.

The ScanMask events are shown in the table below. E_Spos events are the most frequent. Compared with the procedure activity events, one appears to be missing. Which one is it? Is it a problem?
code       value   description
E_Snew             create/enter a new scanning environment
E_Sfail            fail/exit a scanning environment
E_Spos             move the string scanning position
E_Ssusp            suspend a result from a scanning environment
E_Sresum           resume a suspended scanning environment
E_Srem             remove a never-to-be-resumed scanning environment
May God bless richly the team that goes 

For what it's worth, evaltree() can model scanning environments just like it does procedure call activity. It can also model built-in functions and operators; all expressions can be modelled as call/ret/susp/resum/fail/rem.

Now for a deep-thought question: what kinds of graphic depiction emphasizing what kinds of behavior would make for a genuinely useful string scanning visualization?

Monitoring Structures and Variable References

The monitoring framework has fairly thorough instrumentation for the built-in data structures of the language -- lists, tables, records and sets. These one-level structures all support implicit reference semantics, and are routinely composed into big multi-level structures such as trees and graphs.

lecture 20

A Simple List Visualizer

What we learn from the simple list visualizer, lst.icn:

The Structure Spy

What we learn from the structure spy

Unicon 3D Graphics Facilities

Design goals: 3D Windows:
  W := open("win","gl")

3D Coordinate System

Camera and viewing Frustum

The scene is viewed from a particular (x,y,z) that is looking at a particular (x2,y2,z2). There is also a question of what direction is "up" from the point of view of the camera, given as a vector but equivalent to specifying what angle the camera is at on the vector between the position and direction.

Drawing Primitives

Originally I thought these would be the defining things about the 3D facilities. They are mostly built in to OpenGL, although some are in the OpenGL utilities (GLU) library. Most 3D applications, once they acquire a certain level of sophistication, probably don't need all these primitives; they just use FillPolygon with lots of little triangles specified via large data structures called 3D models.

Transformations

Lighting, Materials

This is an example of an area where things are far more complicated than a non-specialist programmer would want to deal with. Unicon tries to have sensible default behaviors.

Textures

Important, especially in more serious 3D such as games.
texture
2D image whose contents are used to paint the pixels of a 3D primitive
texture coordinates
(u,v) in the texture image normalized to Cartesian 0.0-1.0. Actually, they wrap around so a texture coordinate of 2.5 says to repeat a texture two and a half times in that direction.
Unicon turns on texture mapping via WAttrib("texmode=on"). Texture coordinates are supplied via Texcoord(u1,v1,...) which must correspond in 1:1 relationship to vertices in a subsequent primitive, e.g. FillPolygon(x1,y1,z1, ...). There is also a WAttrib("texcoord=auto") which might be needed in order e.g. to map textures onto spheres, tori, etc.
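The wrap-around behavior of texture coordinates is easy to state as arithmetic. The following is a small Python illustration, assuming the repeat-style wrapping described above (other wrap modes exist in OpenGL and behave differently); the function name is mine.

```python
import math


def wrap(coord):
    """Map a texture coordinate onto [0.0, 1.0) under repeat-style wrapping."""
    return coord - math.floor(coord)


for u in (0.25, 1.0, 2.5, -0.25):
    print(u, "->", wrap(u))
```

So a primitive whose u coordinate runs from 0.0 to 2.5 sees the texture twice in full and then its first half once more.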

lecture 21

Try Again with Lst and Nova Demos

3D Examples

Miscellaneous Other 3D Facilities:

We might need to talk about various extra features in future lectures. They are listed here so we can know to bring them up.

Mesh modes

These values determine how lists of vertices are interpreted by OpenGL. There is an attribute meshmode, set via WAttrib(w, "meshmode=value") where the legal values are
points
lines
linestrip
lineloop
triangles
trianglefan
trianglestrip
quads
quadstrip
polygon
However, in a trivial test, the mesh modes did not work! They probably did for the grad student who implemented them... but without a working test/demo they remain undocumented/unfinished business. Minimally, you might expect that I'll have to put out some fixed Unicon sources and/or binaries for you before these will work. You are welcome to try them and find out if things are better than I report.

Transparency

This feature of OpenGL determines to what extent light can go through a substance, or to what extent objects behind it can be seen through it. Color names, set via Fg(color) or WAttrib(w, "fg=value"), can include a diaphaneity. The legal transparency adjectives are
transparent
subtransparent
translucent
subtranslucent
opaque
This feature is implemented. In a trivial test it appears to work. However, in testing it a seeming bug was identified in the color attributes: when you set the fg= attribute with a simple color it sets the diffuse value for that material property but apparently does not reset or disable the other lighting colors (specular, ambient, emission), which may give surprising results. Also: it is not clear that transparency works correctly on all primitives yet; for example, the last time I checked, either cubes or maybe filled polygons looked not as transparent as they ought, because backfacing polygons weren't transparent.

mKE/mKR: the Largest Publicly Available Unicon Program

It has its own website. It is a knowledge representation engine with its own knowledge representation language built-in. It is something like 50K LOC. Let's study it.

lecture 22

Reading Assignment

Discussion of Visualization of the static aspects of software: a survey

lecture 23

Announcements

Semester Project Topic Ideas

The perfect semester project would be a tool that... Where to get your ideas:

Monitoring Variable References

Variable use is arguably one of the most important aspects of program behavior, but it is easily overlooked.

What do we want to know about variables?

Unicon Variable Events

We can start with E_Assign and E_Value, the two events associated with assignment operators such as :=
E_Assign
This event's &eventvalue gives the variable name, plus a one-letter suffix indicating scope:
Code  Scope
+     global
:     static
-     local
^     parameter
E_Value
This event, after the assignment, tells you the value that was assigned.

gnames

Gnames shows you all your global data; variable names are written out, color coded by their type. If you click on a variable name, up pops a window showing that variable's details. Bugs and limitations:

vars

vars is a local variable visualizer; it shows each activation record in a manner similar to gnames. There is a strong scalability limit here which vars does not solve; some programs it depicts well, others it does not. It is more a proof of concept/demonstration than a finished and working tool. Also, at present it has bad bitrot.

assignments to structure types

Consider the following program
procedure main()
  L := list(3)
  L[2] := "hello"
end
What does assigning to L[2] look like? The events program shows the E_Assign for a structure reference does not look the same as an assignment to the variable itself:
E_Ocall       operator call                      function []
E_Deref       dereference                        L-main
E_Lref        list reference                     list_1(3)
E_Lsub        list subscript                     2
E_Oret        operator return                    &null
E_Opcode      virtual-machine instruction        Str
E_Literal     literal reference                  hello
E_Loc         location change                    3:8
E_Opcode      virtual-machine instruction        Asgn
E_Ocall       operator call                      function :=
E_Assign      assignment                         list_1[2]
E_Value       value assigned                     hello
E_Oret        operator return                    hello

Under the Covers of the evinit library

EvInit(av) and EvGet(mask) are not always entirely what they seem.

We might want to develop a similar architecture for windows. Monitors that use 2D or 3D graphics might want to check and see if their &window is already set. If so, just draw to it instead of opening a new window. This would allow a GUI for a debugger or multi-visualization tool to allow independently-compiled visualizations to "plug in". Of course, for it to work well, such a model would need to cover how to handle window resizing, and how to handle input by various tools. Subwindows, and subwindow resizing, are more or less adequate to this task.

lecture 24

NKN Data Science Competition

On Improving the performance of Unicon 3D

Your program could be CPU bound. Or it could be GPU bound. Or it could be I/O bound e.g. on network traffic. Or in our case, it could be "TP bound", i.e. spending most of its time in the target program and/or monitoring context switch costs. Optimizing the wrong thing might not help much.

Unicon 3D Display List Management

Cheesy (incomplete and buggy) UTR9 example:
sphere := DrawSphere(w, x, y, z, r)
increment := 0.2
every i := 1 to 100 do {
   every j := 1 to 100 do {
      sphere.y +:= increment
      Refresh(w)
      }
   }
What would this look like if it were changing the color of a sphere, instead of changing its y coordinate? Setting the foreground color generates a display list entry that is itself a list. For a simple foreground color setting (one that only sets the diffuse property) it is a list of 7 elements*: the string "Fg", the integer code 160 that corresponds to a fgcolor setting, the string "diffuse" that indicates what color property is being set, and then four 16-bit unsigned values that give the RGBA color setting.

*The current color-setting display list entry format might get turned into a record type so we can use field names instead of L[4] etc. but for now it is a list.

The following example gives a sphere that bounces and changes its colors randomly between red, white, and blue each frame:

procedure main()
   &window := open("win","gl","size=800,800","bg=black")
   colors := [[65535,0,0],[65535,65535,65535],[0,0,65535]]
   Fg("blue")
   spherecolor := WindowContents()[-1] # fg=most recent display list entry

   sphere := DrawSphere(0, 0, -50, 2)
   increment := 0.2
   every i := 1 to 100 do {
      every j := 1 to 100 do {
         sphere.y +:= increment
         c := ?colors
         spherecolor[4] := c[1]
         spherecolor[5] := c[2]
         spherecolor[6] := c[3]
         Refresh()
         }
      increment *:= -1
      }
   Event()
end
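The 7-element color entry described earlier can be modeled outside Unicon to make the index arithmetic explicit. This Python sketch is only a model of the data layout, not of the Unicon runtime: the helper names are mine, the alpha value of 65535 is an assumption, and the 8-bit to 16-bit scaling by 257 is just one common convention for anyone starting from 0-255 colors.

```python
def to16(v8):
    """Scale an 8-bit channel (0-255) to 16 bits (0-65535)."""
    return v8 * 257


def set_diffuse(entry, r8, g8, b8):
    """Overwrite the RGB slots of a 7-element color entry in place."""
    # Python indexes from 0, so Unicon's entry[4], entry[5], entry[6]
    # are entry[3], entry[4], entry[5] here.
    entry[3], entry[4], entry[5] = to16(r8), to16(g8), to16(b8)
    return entry


# Stand-in for the entry that Fg("blue") is described as producing;
# the final (alpha) value is assumed, not taken from the notes.
blue = ["Fg", 160, "diffuse", 0, 0, 65535, 65535]
print(set_diffuse(blue, 255, 0, 0))
# ['Fg', 160, 'diffuse', 65535, 0, 0, 65535]
```

Because the entry is mutated in place, anything holding a reference to it (here, the display list) sees the new color on the next refresh, which is what the Unicon example relies on.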

On Drawing Text on 3D Windows

Arbitrary DrawStrings from a Single Texture Load (duh)

text.icn

lecture 25

Homework #4 Due Date Change

Reading Assignment

One of these two is very short, while one is a regular full conference paper.

Discussion of SynchroVis: 3D Visualization of Monitoring Traces in the City Metaphor

This was an extremely short paper you were assigned to read this past week.

Monitor Coordinators

Basic premise: A monitor coordinator is a monitor that hosts the execution of the target program under the observation of multiple monitors.

Eve

The reference implementation monitor coordinator is called Eve (eve.icn). Eve is one of the last remaining "old Icon GUI" programs, and needs to be rewritten using the modern GUI class library.

Eve configuration

Eve reads in a list of monitors from a ~/.eve file in the format:
"title" command line

For example:

"Line Number Monitor" /home/jeffery/tools/piano
"UFO" /home/jeffery/tools/ufo
"Algae" /home/jeffery/tools/algae
"Big Algae" /home/jeffery/tools/algae -func -op -step -S 48
"Memory bar chart" /home/jeffery/tools/barmem
"Global variables" /home/jeffery/tools/gnames
"Local Variables" /home/jeffery/tools/vars
"Lists" /home/jeffery/tools/tinylist
"Minimemmon" /home/jeffery/tools/mmm
"Miniloc" /home/jeffery/tools/miniloc
"Scat" /home/jeffery/tools/scat
"String scanner" /home/jeffery/tools/ss
From this datafile, eve draws an opening window that allows selection of which monitors you want to run (selectEMs).
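For anyone writing a replacement for Eve, the file format is simple enough to parse in a few lines. Here is a sketch in Python, assuming shell-style quoting for the title; that quoting rule is my assumption, and eve.icn may parse its file differently. The function name is hypothetical.

```python
import shlex


def parse_eve_line(line):
    """Split one '"title" command line' entry into (title, argv)."""
    parts = shlex.split(line)
    if len(parts) < 2:
        raise ValueError(f"expected a quoted title and a command: {line!r}")
    return parts[0], parts[1:]


sample = '"Big Algae" /home/jeffery/tools/algae -func -op -step -S 48'
title, argv = parse_eve_line(sample)
print(title)   # Big Algae
print(argv)    # ['/home/jeffery/tools/algae', '-func', '-op', '-step', '-S', '48']
```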

Eve's Global State

unioncset
cset mask that is union of all monitor masks
EventCodeTable
table of lists; keys are event codes, values are "list of interested monitors"
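How these two pieces of global state relate to the individual monitors' masks can be sketched as follows. This is Python rather than Unicon, with sets standing in for csets; the monitor names, the single-letter codes, and the function name are all hypothetical.

```python
from collections import defaultdict


def build_tables(monitor_masks):
    """monitor_masks maps a monitor name to the set of event codes it wants.

    Returns (union_mask, event_code_table), where event_code_table maps
    each event code to the list of monitors interested in it.
    """
    union_mask = set()
    event_code_table = defaultdict(list)
    for name, mask in monitor_masks.items():
        union_mask |= mask
        for code in sorted(mask):
            event_code_table[code].append(name)
    return union_mask, dict(event_code_table)


# Hypothetical masks; single letters stand in for real event codes.
masks = {"algae": {"C", "R", "F"}, "scat": {"C", "R"}, "mempie": {"A"}}
union_mask, table = build_tables(masks)
print(sorted(union_mask))   # ['A', 'C', 'F', 'R']
print(table["C"])           # ['algae', 'scat']
```

The coordinator asks the target program for the union mask, then uses the table to decide which monitors see each event. Both structures have to be recomputed whenever a monitor changes its mask.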

Monitor State

This "class" holds eve's knowledge about the monitors it loads. "prog" is the actual loaded program (a co-expression value), while "mask" is the program's event mask -- what it returned from its last EvGet().
record client_rec(name, args, eveRow, prog, state, mask, enabled)
#
# client() - create and initialize a client_rec.
#
procedure client(args[])
   local self
   self := client_rec ! args
   if /self.name then stop("empty client?")
   self.prog := load(self.name, self.args) | stop("can't load ", image(self.name))
   variable("&eventsource", self.prog) := &current | stop("no EventSource?")
   variable("Monitored", self.prog) := &eventsource | stop("no Monitored?")
   /self.state := "Running"
   /self.mask := ''
   /self.enabled := E_Enable
   return self
end

Initialization

After selecting monitors to run, eve has to load them all, and then activate them all, running them up until their first EvGet() call. Their EvInit's will be disabled by eve's having already set their &eventsource. After their first EvGet() call, eve registers them on the "list of interested monitors" for each of the event codes in their mask.
   every i := 1 to *clients do
      clients[i].mask := @ clients[i].prog

Event Forwarding

EvSend(code, value, recipient) - sends a monitoring framework event, where code defaults to &eventcode and value defaults to &eventvalue. Note that EvSend() allows any value to be sent, not just what the EM requested in its event mask, and not even limited to 1-letter string codes.

Eve's Main Loop

procedure mainLoop()
   while EvGet(unioncset) do {
      #
      # Call Eve's own handler for this event, if there is one.
      #
      (\ EveHandlers[&eventcode]) ()
      #
      # Forward the event to those EM's that want it.
      #
      every monitor := !EventCodeTable[&eventcode] do
	 if C := EvSend( , , monitor.prog) then {
	    if C ~=== monitor.mask then {
	       while type(C) ~== "cset" do {
		  if C === "abort" then fail
		  #
		  # The EM has raised a signal; pass it on, then
		  # return to the client to get his next event request.
		  #
		  broadcast(C, monitor)
		  if not (C := EvSend( , , monitor.prog)) then {
		     unschedule(monitor)
		     break next
		     }
		  }
	       if monitor.mask ~===:= C then
		  computeUnionMask()
	       }
	    }
	 else {
	    unschedule(monitor)
	    }
      delay(6 < delayval)
      }
end
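
The forwarding logic of the main loop can be modeled with Python generators standing in for co-expressions. This is only a sketch under assumptions: tally_monitor() and forward() are hypothetical names, signals and the "abort" case are left out, and a monitor that finishes is simply dropped, which plays the role of unschedule().

```python
def tally_monitor(mask, counts, limit):
    """A toy monitor: counts events by code, then finishes after `limit` events."""
    seen = 0
    event = yield mask                  # first activation: report the mask
    while seen < limit:
        code, _value = event
        counts[code] = counts.get(code, 0) + 1
        seen += 1
        if seen < limit:
            event = yield mask          # request the next event
    # falling off the end models a monitor that terminates

def forward(events, monitors):
    """Forward each (code, value) event to every monitor whose mask contains code."""
    masks = {name: next(gen) for name, gen in monitors.items()}
    for code, value in events:
        interested = [name for name, mask in masks.items() if code in mask]
        for name in interested:
            try:
                # The monitor answers with its (possibly changed) mask.
                masks[name] = monitors[name].send((code, value))
            except StopIteration:
                del masks[name]         # unschedule a finished monitor
    return masks

counts_a, counts_b = {}, {}
monitors = {
    "a": tally_monitor({"C"}, counts_a, 2),
    "b": tally_monitor({"C", "R"}, counts_b, 10),
}
remaining = forward([("C", 1), ("R", 2), ("C", 3), ("C", 4)], monitors)
```

Monitor "a" finishes after its second event and never sees the last one, while "b" stays scheduled.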

lecture 26

Brainstorm with me on "3D Monitor Coordinators"

What would it take for us to see/share all your visualizations in the same 3D window, from separate monitors? What would a 3D monitor coordinator need to do?

Unicon City: a Brief Discussion

Want: Some Code Prototypes:

Layout in 3D

I haven't converted to 3D yet, so the following are open to your suggestions and/or better ideas.
# Unicon City Template Model

default {
  name Unicon City
  home [5.0, 0.0, 5.0]
  angle 4.6
  origin_node toplevel directory
}


Room {
name toplevel directory
x 0
y 0
z 0
w 10
h 10
l 10
texture wall.gif
}
Within the CVE format, there are a couple of possible ways to introduce the buildings.

Graphic Design of the Day: Kiviat Diagrams

One way to represent many-dimensioned data is to lay out the dimensions around a circle; the 2D shape (and its degree of circularity or lack thereof) tells you something about which dimensions are interesting.
Kiviat diagram for software quality. Source: geeks with blogs, via google image

Kiviat diagrams are easy to criticize. The relative scales of the dimensions are a problem: do you reduce them all to 0.0-1.0 ranges, or not? It is hard to identify normal or acceptable ranges of values. Adjacent dimensions don't really have any more connection with each other than remote dimensions, but the Kiviat makes them look like they do. The area inside the Kiviat shape is really meaningless.
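
To make the last two criticisms concrete, here is a small Python sketch (not Unicon, and not taken from any particular tool). It scales each dimension to 0.0-1.0, places the axes evenly around a circle, and computes the enclosed area. The function names and the sample values are assumptions for illustration.

```python
import math

def kiviat_vertices(values, ranges):
    """One (x, y) vertex per dimension, on axes spaced evenly around a circle.
    Each value is scaled to 0.0-1.0 using that dimension's own (lo, hi) range."""
    n = len(values)
    points = []
    for i, (v, (lo, hi)) in enumerate(zip(values, ranges)):
        r = (v - lo) / (hi - lo)            # normalized radius on this axis
        angle = 2 * math.pi * i / n
        points.append((r * math.cos(angle), r * math.sin(angle)))
    return points

def polygon_area(points):
    """Area of the Kiviat shape, by the shoelace formula."""
    total = 0.0
    for i, (x1, y1) in enumerate(points):
        x2, y2 = points[(i + 1) % len(points)]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0

ranges = [(0.0, 1.0)] * 4
# The same four measurements, with the axes in two different orders.
area_alternating = polygon_area(kiviat_vertices([1.0, 0.1, 1.0, 0.1], ranges))
area_grouped = polygon_area(kiviat_vertices([1.0, 1.0, 0.1, 0.1], ranges))
```

The two shapes hold identical data, yet the grouped ordering encloses about three times the area of the alternating one, so the area reflects the arbitrary axis order rather than the data.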

lecture 27

Reading Assignment

Discussion of VR City Papers

Search and Exploring Software Repositories in VR

VR City

So, we have reached current state-of-the-art getting-published software city research! How does it compare?

Look at HW#5

Update on Dr. J's Code Analyzer Tool

Type Conversion Events

Unicon does more automatic type conversion than C/C++. At almost every operator, and every built-in function, the types of arguments are checked, and if necessary, converted.
Event     Value            Description
E_Aconv   input value      attempt to convert
E_Tconv   example target   conversion target
E_Nconv   input value      no conversion was needed
E_Sconv   output value     conversion was successful
E_Fconv   input value      conversion failed

Tool of the day: redconv

Redundant conversion catcher. This is not a visualization tool, but it is an execution monitor. Even if conversions are not redundant, they may be an indicator of a bug or a performance problem. When is a conversion "unhealthy"?
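
One possible first step toward such a monitor is simply to tally which conversions actually happen. The sketch below is Python rather than Unicon and works on a made-up event stream: the (code, value) tuples, the use of Python type names, and tally_conversions() itself are assumptions for illustration, not how redconv is implemented.

```python
from collections import Counter

def tally_conversions(events):
    """Count successful conversions by (input type, target type)."""
    tally = Counter()
    source = target = None
    for code, value in events:
        if code == "E_Aconv":              # attempt: value is the input
            source = type(value).__name__
        elif code == "E_Tconv":            # value is an example of the target type
            target = type(value).__name__
        elif code == "E_Sconv":            # conversion succeeded
            if source is not None and target is not None:
                tally[(source, target)] += 1
            source = target = None
        elif code in ("E_Nconv", "E_Fconv"):   # not needed, or failed
            source = target = None
    return tally

# Hypothetical stream: two string-to-integer conversions and one no-op.
events = [
    ("E_Aconv", "42"), ("E_Tconv", 0), ("E_Sconv", 42),
    ("E_Aconv", 7),    ("E_Tconv", 0), ("E_Nconv", 7),
    ("E_Aconv", "42"), ("E_Tconv", 0), ("E_Sconv", 42),
]
conversion_counts = tally_conversions(events)
```

A tally like this does not answer when a conversion is "unhealthy", but a pair that dominates the counts is a reasonable place to start looking.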

lecture 28

WSection, 3D Object Selection, and Level of Detail

3D graphics is computationally intense. Unicon's 3D Facilities are a compromise between the dynamic language and the requirements of the underlying 3D API's in C/C++.

History:

Options for better performance include:
We settled on a Uniconish way to implement the concept of level of detail without rebuilding the display list each frame.

Level of Detail

Level of Detail in typical games:

WSection(): Basic Idea

WSection() Example #1

WSection("redrect") # beginning of a new object named redrect
Fg("red")
FillPolygon(0,0,0, 0,1,0, 1,1,0, 1,0,0)
WSection()          # end of the object redrect

WSection() in 3D Object Selection

Visualization Evaluation Questions

Specific questions to think about as you consider other folks' visualizations, or design your semester project

X3D for Software Visualization

You should at least hear of X3D in this class. Let's discuss it.

Rube

This work is described in "The rube Framework for Personalized 3D Software Visualization", by Hopkins and Fishwick, Dagstuhl software visualization seminar, 2001.

Rube methodology

  1. choose system to be modeled
  2. select structural and dynamic behavioral model types
  3. choose a metaphor
  4. define mappings/analogies
  5. create model
Example: a lightbulb is to be modeled. A finite state machine is chosen to model the bulb. S1=disconnected, S2=off, S3=on.

For each different dynamic model type, there may be any number of defined visual metaphors, or a programmer may wish to create a new one. A "water tank" metaphor for a finite state machine would "fill the tank" of whichever state the machine is in, and the water would be pumped over to a different tank whenever a transition to a new state occurs.

In a gazebo metaphor, a person would indicate the state, and a transition would be depicted by that person walking.
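
A small Python sketch of the lightbulb example with the water tank metaphor. The three states come from the example above; the transition events (connect, switch_on, and so on) are assumptions, since the notes do not list them, and WaterTankFSM is a hypothetical name.

```python
STATES = ("disconnected", "off", "on")     # S1, S2, S3 from the example

# Assumed transitions; the notes name the states but not the events.
TRANSITIONS = {
    ("disconnected", "connect"): "off",
    ("off", "switch_on"): "on",
    ("on", "switch_off"): "off",
    ("off", "disconnect"): "disconnected",
    ("on", "disconnect"): "disconnected",
}

class WaterTankFSM:
    """Finite state machine whose current state is shown as the one full tank."""

    def __init__(self, start="disconnected"):
        self.state = start
        self.tanks = {s: 0.0 for s in STATES}
        self.tanks[start] = 1.0

    def fire(self, event):
        """Take a transition if one is defined; pump the water to the new tank."""
        target = TRANSITIONS.get((self.state, event))
        if target is None:
            return False
        self.tanks[self.state] = 0.0
        self.tanks[target] = 1.0
        self.state = target
        return True

bulb = WaterTankFSM()
bulb.fire("connect")
bulb.fire("switch_on")
```

Swapping in a different metaphor, such as the gazebo, would change only how the tanks dictionary is drawn and leave the state machine alone.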

Rube Summary

lecture 29

HW4 Report/Show/Tell

Comments on your HW4 Code

On Dynamic Analysis

Here is a classic paper on the subject. Grad students, go ahead and read this. We will skim it today to try and pick out the highlights.

According to Ball, dynamic analysis has the following properties compared with static analysis:
  1. greater precision of information, derived from 1+ actual program run(s)
  2. input-centric mentality; shows dependence of internal behavior on particular inputs of a given execution
  3. ability to reveal semantic dependencies that are far apart in scope
Ball's paper mentions (claims to introduce) two particular types of dynamic analysis, out of myriads:
frequency spectrum analysis
analyze frequencies of different kinds of events, e.g. to identify related computations
coverage concept analysis
comparing actual control flow from a set of executions against a static control flow graph can show what's missing from a set of tests
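
As a rough illustration of frequency spectrum analysis, the Python sketch below groups procedures by how many times each was called, on the theory that procedures with identical counts may belong to the same computation. The trace and the function name are made up; this is a simplification, not the full analysis in Ball's paper.

```python
from collections import Counter, defaultdict

def frequency_spectrum(trace):
    """Map each call count to the sorted list of procedures with that count."""
    counts = Counter(trace)
    spectrum = defaultdict(list)
    for name, n in counts.items():
        spectrum[n].append(name)
    return {n: sorted(names) for n, names in spectrum.items()}

# Hypothetical call trace, one procedure name per call event.
trace = ["main", "read", "parse", "read", "parse", "emit", "read", "parse"]
spectrum = frequency_spectrum(trace)
```

Here read and parse share a frequency of 3, which hints that they run together once per input item.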

FSA

CCA

coverage profile
profile of what was executed (no frequency info)
concept analysis
(T, E), T a set of tests and E a set of program entities, is a concept if every test in T covers all of E and no test not in T covers all of E.
Given a (boolean) table showing all the tests and entities, Ball points out that you can form a concept lattice, and that the concept lattice shows control flow relationships within 1+ actual executions, analogous to the kinds produced by control flow static analysis.
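
The concept definition can be checked by brute force on a small table. The Python sketch below enumerates every subset of tests, so it is only suitable for toy inputs; the coverage data and the function name are assumptions for illustration.

```python
from itertools import combinations

def concepts(coverage):
    """coverage maps each test to the set of entities it covers.
    Returns the set of (tests, entities) pairs that are concepts."""
    tests = list(coverage)
    all_entities = set().union(*coverage.values())
    found = set()
    for k in range(len(tests) + 1):
        for group in combinations(tests, k):
            if group:
                common = set.intersection(*(coverage[t] for t in group))
            else:
                common = set(all_entities)
            # Every test that covers all of `common`, whether in `group` or not.
            extent = frozenset(t for t in tests if common <= coverage[t])
            found.add((extent, frozenset(common)))
    return found

# Hypothetical coverage table: three tests over entities a, b, c.
coverage = {"t1": {"a", "b"}, "t2": {"a", "c"}, "t3": {"a", "b", "c"}}
lattice_nodes = concepts(coverage)
```

Ordering the resulting pairs by inclusion of their test sets gives the concept lattice; for this table there are four concepts, with ({t1, t2, t3}, {a}) at one end and ({t3}, {a, b, c}) at the other.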

More Dynamic Analyses

OK, so where do we find more examples of dynamic analysis? Here are some more examples of interesting dynamic analyses.
statistical
Summarizing data by accumulation or averaging to give the big picture. FSA seems to be an example of statistical analysis.
pattern-of-interest
Parsing event sequences using patterns to find bugs, or even just to find items of interest. Event pattern parsing must carefully define its domain, skipping over events that don't affect the pattern match. Event pattern parsing will usually be done non-deterministically and maybe in a ``massively parallel'' model. Tools like flex take a massively parallel set of patterns and merge them into a single DFA, but not all pattern matching can be so reduced.
higher-level-events
one variant of the pattern-of-interest notion is to identify events at a higher semantic level, such as aggregates of lower level events, or application domain events
categorization
figuring out when a class implements a stack, or is using dynamic programming, or whether it employs a feature for which a specialized tool is available
profiling; coverage
treating hotspots and coldspots specially; for example the former deserve extra performance tuning monitors, while the latter deserve extra typographic paranoia monitors

lecture 30

Reading Assignment

This week you get a very cool paper that is one of the best at integrating visualization with the views of the code inside a code city.

Discussion of Using High Rising Cities to Visualize Performance in Real-Time

Graphic Design of the Day: Perspective Wall

Hey, did you notice that there is an "information visualization wiki"? Interesting...

Update on Dr. J's Software City Effort

2 1/2 D Visualizations of Call Graphs

(From Facilitating Exploration of Unfamiliar Source Code by Providing 2.5D Visualizations of Dynamic Call Graphs, by Bohnet and Dollner, 2007, 4th IEEE Workshop on Visualizing Software for Understanding and Analysis)

A "short paper" in 2007 gives lots of ideas to think about.

Nate's Structure Monitor

Simple graphics, reminiscent of Playfair's classic graphic design. Ya, it is a cheap trick, but it works.

Metaphor-Based Animation of OO Programs

lecture 31

Reminder HW#5 Due Tomorrow Night

Write me your design document. I will endeavor to give you timely feedback on this one.

Status Update and Demo on Dr. J's Software City

Demo, if the Laptop Cooperates

Jeffery's Current Todo List

I have ~2.5 weeks before my next conference paper deadline.

Brainstorming: Visualizing Software Executions as Populated, Dynamic Cities

Help me improve my metaphors. Dr. J's fatal-flaw view of visualizing software as cities: many or most (especially OO) programs are understood largely through their relationships between classes and between instances. Software as cities doesn't automatically manage to depict such relationships at all. It got as far as colocating classes in the same package.
Classes are buildings, sure
height=# methods, width=#public variables, length=(log of) longest code. (Private variables not included)
What is the model of time in-game?
Today = a current execution run. CVS repositories and previous execution logs make for remembrances of things past.
Limited ("Prince of Persia") backwards-in-time capability?
limited-reversible is better than no reversible, and is more scalable than full-reversible. Limited reversible may mean, if you go back past a certain point, you'll not be able to see as many details, or change the execution from that point. Assuming we are collecting fairly detailed traces, you can go backward farther than that in a replay-only mode.
How to represent procedures
treat like a class w/ 1 method. Lotta procedures = village.
How to represent instances
As robots? Garbage would be broken-down robots...lots and lots of broken-down robots! (thanks, A.P.)
How to represent "atoms" (scalar values)
Not at all? As text? As virtual books (strings), hammers?? (ints) and saws?? (reals)? What about tables and lists? Records got special treatment as people; tables and lists as bookshelves, or buses, or?
How to represent external entities
In software engineering/software design, an "association" refers to a relationship between classes or instances.
Why does the metaphor need associations?
Because making correct code is difficult and perilous.
What associations are depicted, and how?
We need at least: inheritance, aggregation, and "other"
How to depict inheritance and aggregation?
How have other researchers depicted these? Tubes running into a roof?
How to depict reference?
boats?
What are the streets?
In Venice, there are streets. And canals.
How to represent the stack
In past discussion, there has been support for the beam-of-light model, pointing backwards from callee to caller. Dr. J would add: the beam of light might be a good metaphor for an instant-teleportation feature...
How to represent bugs and warnings
As monsters
How to lay out buildings?
Around an older, urban core? Minimize distance of overall call graph?
What are ghosts?
Remembrances of fixed bugs and deleted code
How to present source code control structure details.
There is the raw codesize, the extent of nesting
How to present data details.
Well, instances are a lot of the data, and atoms are the rest. A prime issue here is one of aggregation. When is an object a citizen of the world, and when is it just somebody's foot? I guess the answer is: when referenced globally, or by two or more other instances.
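
Two of the rules in the list above are concrete enough to sketch: the class-to-building mapping and the "citizen or foot" test. The Python below is a sketch under assumptions: "longest code" is read as the line count of the longest method, the logarithm is taken as the natural log, and both function names are hypothetical.

```python
import math

def building_dimensions(num_methods, num_public_vars, longest_method_lines):
    """height = # methods, width = # public variables, length = log of longest code.
    Private variables are not counted, as in the notes."""
    length = math.log(longest_method_lines) if longest_method_lines > 0 else 0.0
    return {"height": num_methods, "width": num_public_vars, "length": length}

def is_citizen(referenced_globally, referencing_instances):
    """An object is a citizen if referenced globally or by two or more instances;
    otherwise it is treated as part of its single owner."""
    return referenced_globally or referencing_instances >= 2

# A procedure is treated like a class with one method and no public variables.
hut = building_dimensions(1, 0, 1)
tower = building_dimensions(40, 3, 200)
```

One consequence of the mapping is visible right away: a class with no public variables gets a width of 0, so a real layout would need some minimum footprint.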

lecture 32

Status of HW#4 Grading

Status of HW#5 Grading

Remainder of the Course

Question: How to Get More Static Analysis for Unicon if You Need it

What Static Analysis Information Might We Want?

What are Options for Getting It?

# you would have to adjust these paths to refer to your uni/udb directory
link "/home/jeffery/unicon/uni/udb/icode"
link "/home/jeffery/unicon/uni/udb/srcfile"
link "/home/jeffery/unicon/uni/udb/symtab"
link "/home/jeffery/unicon/uni/udb/system"
procedure main(argv)
   icode := Icode()
   write("Icode file: ", argv[1] | "not supplied")
   src := icode.getSrcFileNames(argv[1])
   write("source files: ", image(src))
   every write("\t", !\src)
   srcFile := SourceFile()
   srcFile.loadSourceFiles(argv[1], src)
   write("srcFile: ", image(srcFile))
   every k := key(srcFile) do {
      write("\t", k, " ", image(srcFile[k]))
      if type(srcFile[k]) == ("set"|"list") then
         every write("\t\t", image(!srcFile[k]))
      else if type(srcFile[k]) == ("table") then
         every kk := key(srcFile[k]) do {
            if type(srcFile[k][kk]) == "list" then {
               write("\t\t", image(kk), ":")
               every write("\t\t\t", image(!(srcFile[k][kk]))) \ 10
               write("\t\t\t...")
               }
            else
               write("\t\t", image(kk), ": ", image(srcFile[k][kk]))
            }
      }
end
Live Demo this one.

More thoughts on How to Make Static Analysis in Unicon Much Easier

Suppose I want tools like the software-as-cities, and it's too much work. Yeah, this is a lame start, but at least it will allow us to consider what should really be there.

Mondrian

Viz tools conflict
gnuplot generality of reading file formats vs. Alamo-style run-time access to original data.
Mondrian sez:
instead of moving the data to the viz tool, move the visualization tool to the data.
Provide not a file format
but instead, an interface. Allow a declarative script to specify the visualization.
Work directly with the objects in the data model.
Let the programmer visualize what they are doing in their environment/tools.
at one time this felt to me like:
Smalltalk-based tools trying to be relevant to a non-Smalltalk world.

lecture 33

Mailbag

I am currently trying to get all of the procedures from a Unicon program that is being passed to my hw6. I was thinking of scanning the file and looking for them, but I don't think this is the best option. Is there a different direction you can point me to look at and do some sort of static analysis before I begin to monitor, or should I stick to scanning the file?
Great question. You could use the udb modules I demo'ed last class to find all your source files, and then run HW#1 style code. But, instead of looking for the procedures in the source code, if procedures is what you want, I think you could scan all the global variables using globalnames() and if the value is of type procedure, it is a procedure. Maybe something like
every g := globalnames(Monitored) do
   if type(variable(g, Monitored)) == "procedure" then ...
BTW, beware of "procedure" versus "function". A procedure is Unicon code, a function is generally built-in, i.e. C code.
Here are...what I'd be interested to see for static information provided
  1. memory requirements for global data
  2. minimum memory on stack required for each procedure
  3. minimum heap memory required for program's run time execution, and
  4. the amounts of minimum heap allocation requested by each procedure. Included in this could be amount of memory allocated each time procedure is called, and minimum number of times that procedure is called.
  5. indicator for procedures that have the potential to allocate more than the minimum denoted above (procedures called in a loop, memory being allocated in a loop, etc.)
Great list. Let's work on these a bit. Interestingly, some of them may be statically calculable, but some of them sound more like dynamics to me.

Memory requirements and Sizes in Unicon

  1. Memory requirements for global data: 16 bytes per global PLUS heap memory pointed-at.
  2. Minimum memory on stack required for each procedure:
  3. Minimum heap memory required for program's runtime execution: would require hard analysis to statically guesstimate, but maybe pretty easy to derive empirically. Q: how to keep around static or dynamic analysis results across time and/or multiple runs?
  4. Minimum heap allocation required by each procedure: hard to be accurate in all cases, but maybe not too hard to do a crude lower bound
  5. Indicator for procedures that have the potential to allocate more than the minimums: semi-difficult to do statically, maybe easy to do dynamically.

Reading Assignment

Discussion of Code Park: A New 3D Code Visualization Tool



Challenges for InfoVis Engines

vis. engine should be domain independent
visualizations should be composed from simpler parts
visualization should be definable at a fine grained level
instance-based, not type-based; sometimes different instances of the same type play different roles
minimize object-creation overhead
vis. works off a model of a running system, but instead of duplicating objects in the system, how about using them directly?
visualization description should be declarative
compare w/ Tango, Dance, and UFO for that matter

Other Mondrian Highlights

Declarative syntax which looks like...
view nodes: model classes using: Rectangle withBorder
   forEach: [:eachClass | eachClass viewMethodsIn: view]
Screen-Filling System
Mondrian has a lot of structures to visualize simultaneously... And it has structures that are too wide to fit the window.
Built on top of Moose
You just know it has to be good.
Interesting Mention of CodeCrawler
"visualizations of combined metrics and structural information"

lecture 34

Static vs. Dynamic: Memory Size Requirements, Take Two

enumerate globals
Static:
   parse all source code
      including includes and linked library modules
or
   "parse" the binary.
      It has a header, might be compressed
      Header includes "pointer" to array of globals
      udb has some of this; see icode.icn
      # of bytes of globals is Gnames-Globals
      note...global names are also part of their memory cost
Dynamic:
   G := [: globalnames(Monitored) :]
   write("there are ", *G, " globals, including procedures")

size globals
Static:
  • Unlike traditional compiled mainstream languages, Unicon does not have pre-initialized variables, other than procedures.
  • The icode does have a constant region of known size (Filenms-Strcons)
  • Although it is called Strcons and holds a lot of strings, it also holds cset blocks, and previously held real #'s as well.
  • From parsed code or binary, static analysis starting from main() could identify some variables that are always initialized
Dynamic:
16 bytes per global, plus 16 bytes for the slot to hold its name. Sizes of pointed-at values are mostly calculable; on 64-bit machines they are 16 bytes per slot, plus some overhead for headers, pointers, etc. It is difficult to find out from a structure value how many list element blocks or hash table buckets it is using.

enumerate locals
Static:
   parse source code, build symbol table
or
   "parse" the binary
      the icode for each procedure has a "procedure block"
      that contains relevant information (see struct b_proc from rstructs.h)
Dynamic:
   P := [: paramnames(Monitored) :]
   write("there are ", *P, " params")
   L := [: localnames(Monitored) :]
   write("there are ", *L, " locals")

size locals
Static:
On the stack: 16 bytes per local and parameter. In static memory: 16 bytes per name. Not counting any heap memory they point at.
Dynamic:
Regarding measuring stack depth before/after a call, earlier I mentioned an E_Stack event that reports changes in stack depth. This is for the VM interpreter stack. There is also an E_Cstack event, but it looks to me like the grad student tasked with it did not implement it correctly.

size heap entities
Static:
No heaps at compile time. Static analysis could determine, for some parts of the program that are guaranteed to work, some amount of the heap allocation that would occur.
Dynamic:
  • String: 1 byte per character.
  • Cset: block of X bytes of overhead plus a 32-byte bit vector
  • List: 16 bytes per slot, plus any data pointed at, plus list header block (96 bytes) and one or more list element blocks (56 bytes). Element blocks grown via put/push hold a lot more slots than are actually used
  • Table: 64 byte header + var. size hash table starting ~288 bytes? + 56 bytes/element
  • Set: 64 byte header + var. size hash table starting ~288 bytes? + 40 bytes/element
  • Record: 48 bytes of overhead plus 16 bytes per field
  • Object: 80 bytes of overhead plus 16 bytes per field
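
The byte counts listed above can be turned into a crude size estimator. The Python sketch below just encodes those numbers; it treats the "~288 bytes?" hash table figure as exact, ignores the cset overhead (given only as X), counts a string as one byte per character, and does not include the sizes of pointed-at values. The function names are hypothetical.

```python
SLOT = 16            # bytes per descriptor slot on a 64-bit machine
HASH_TABLE = 288     # starting hash table size; the notes mark this as uncertain

def globals_bytes(num_globals):
    """16 bytes per global plus 16 bytes for the slot holding its name."""
    return num_globals * (SLOT + SLOT)

def string_bytes(s):
    return len(s)                                   # 1 byte per character

def list_bytes(num_slots, num_element_blocks=1):
    """96-byte header, 56 bytes per element block, 16 bytes per allocated slot."""
    return 96 + 56 * num_element_blocks + SLOT * num_slots

def table_bytes(num_elements):
    return 64 + HASH_TABLE + 56 * num_elements

def set_bytes(num_elements):
    return 64 + HASH_TABLE + 40 * num_elements

def record_bytes(num_fields):
    return 48 + SLOT * num_fields

def object_bytes(num_fields):
    return 80 + SLOT * num_fields

# Example: an object with two fields, and a ten-slot list in one element block.
two_field_object = object_bytes(2)
ten_slot_list = list_bytes(10)
```

Because element blocks grown via put/push hold more slots than are in use, num_slots here should be the allocated slot count, which is exactly the number that is hard to observe from outside.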

On the monitoring of OOP Behavior

Consider the program
class C(x,y)
   method m(a)
      write(a, ": x,y: ", image(x), ",", image(y))
   end
initially
   x := 1
   y := 3.14
end

procedure main()
   o := C()
   o.m("hey")
end

These questions boil down to: what dynamic analysis of the event stream do we have to do in order to turn it into useful higher level information?

construction
  • E_Fcall for a function whose image says "class constructor C__state" instead of "function whatever"
  • E_Fret from that function call returns the created instance itself
  • an instance's image is "object C_serial#(numfields)"
method call
A call to o.m() is an E_Pcall to a procedure whose name is C_m, with an extra parameter for o on the front.
field access
A field access is an E_Opcode to the Field VM instruction, resulting in an E_Rref on the object, and an E_Rsub identifying the field.
A serious side consideration: if the monitor holds direct references to object instances, those instances will never become garbage. Need to think about this one some more.
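
The naming conventions above suggest a simple way to lift raw events into OO-level information. The Python sketch below only parses strings: it assumes an instance image looks exactly like "object C_1(2)" and that method procedures are named Class_method, as described above. The function names are hypothetical, and class names are matched against a known set because a class name may itself contain an underscore.

```python
import re

INSTANCE_IMAGE = re.compile(r"object (\w+)_(\d+)\((\d+)\)$")

def parse_instance(image):
    """Split an instance image into class name, serial number, and field count."""
    m = INSTANCE_IMAGE.match(image)
    if m is None:
        return None
    return {"class": m.group(1), "serial": int(m.group(2)), "fields": int(m.group(3))}

def classify_call(proc_name, known_classes):
    """Decide whether an E_Pcall target is a method of a known class."""
    for cls in known_classes:
        prefix = cls + "_"
        if proc_name.startswith(prefix) and len(proc_name) > len(prefix):
            return ("method", cls, proc_name[len(prefix):])
    return ("procedure", None, proc_name)

instance = parse_instance("object C_1(2)")
call = classify_call("C_m", {"C"})
```

Storing the parsed class and serial number instead of the instance itself is also one way around the garbage problem mentioned above, since the monitor then holds no reference to the object.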

Play around with this interactively in moncls.icn

lecture 35

Updated moncls.icn

Mapping Code to World Coordinates (and maybe vice-versa)

More on Visualizing Dynamic Memory Allocations

Making Unicon Garbage Collect, for Science

"Turning CVE Into a Visualization Environment" Update

Start with: how to wire together CVE Architecture with Alamo Architecture?

lecture 36

Mailbag

I have been trying to get the time spent on each function. What I have tried so far is recording the &time during an E_Pcall, then recording the &time during an E_Psusp or E_Pret, and subtracting the E_Pcall time from that to get the time spent. The trouble I'm having right now is that those times are coming back with the same value, giving me 0 when I subtract them. Do you have any suggestions on how I can approach timing functions?
Great question. Let's talk some more about timing.

More on Execution Timing

Earlier when we talked about timing, I gave examples that use the Unicon &time keyword, but a student has clearly found and reported that it is not always sufficient.
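
&time counts in milliseconds, so a procedure that finishes within one tick shows a difference of 0. Two common remedies, not necessarily the ones presented in lecture, are to read a finer-grained clock and to accumulate time across many calls. The Python sketch below shows the bookkeeping with a call stack; CallTimer is a hypothetical name, and the clock is injectable so the logic can be checked with fake ticks.

```python
import time

class CallTimer:
    """Accumulates inclusive time per procedure from call/return events."""

    def __init__(self, clock=time.perf_counter_ns):
        self.clock = clock
        self.stack = []          # (procedure name, start time) for active calls
        self.totals = {}         # procedure name -> accumulated time
        self.calls = {}          # procedure name -> number of completed calls

    def on_call(self, name):
        self.stack.append((name, self.clock()))

    def on_return(self):
        name, started = self.stack.pop()
        self.totals[name] = self.totals.get(name, 0) + (self.clock() - started)
        self.calls[name] = self.calls.get(name, 0) + 1

# Fake clock ticks so the example is deterministic.
ticks = iter([0, 10, 25, 40])
timer = CallTimer(clock=lambda: next(ticks))
timer.on_call("main")
timer.on_call("f")
timer.on_return()      # f returns
timer.on_return()      # main returns
```

The totals are inclusive, so main's time includes the time spent in f; subtracting the callee totals would give exclusive time.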

Reading Assignment

Discussion of A Controlled Experiment on Spatial Orientation in VR-Based Software Cities


Hypothesis #1: Users navigate more effectively and efficiently in EvoStreets when they use a 3D HMD instead of a pseudo 3D desktop system as a display device.
Was this confirmed? Is it generalizable?
Hypothesis #2: Users who are familiar with navigating using a keyboard in computer games achieve higher task completion efficiency.
Was this confirmed? Is it generalizable?
Hypothesis #3: Users who are already familiar with the EvoStreet of a software for one particular metric mapping can navigate equally well if only the metric mapping changes (same structure, same starting point).
Was this confirmed?

Brief Update on Dr. J's City Efforts

lecture 37

Grading Update

HW#4 grades varied widely. Feel free to improve and resubmit.

Timing Update

What Dr. J is Thinking About

lecture 38

Yeah, so, how was EXPO?

JIVE (Java Interactive Visualization Environment, Gestwicki et al)

This paper is too old for me to assign as a required reading, but it has some nice properties: it is about a mainstream language (Java), and it lays out an ambitious set of goals for us to compare, and see if we should be aspiring to also do them. Major requirements:
  1. depict objects as environments. method calls happen inside one. This immediately challenges the objects-as-robots metaphor.
  2. multiple views. Different Granularities. detailed view and compact view.
  3. histories - of execution, of method interaction... show sequence or collaboration diagrams (how do they address scalability? From Figure 1 the answer initially seems to be: they don't; from Figure 2 one answer is, things shrink down to points). This is not summary statistics, it is timelines and such
  4. forward and backward execution. state-saving model. big Big logs.
  5. queries on the runtime state. when did a variable change; or when did it achieve a certain value
  6. clear and legible
  7. use the stock JVM
  8. be able to visualize programs that themselves have GUI's!!
Graphic design: simple, relatively easy to understand, scales poorly (minimal "visualization" involved, maximum IDE/debugger-like feel)

Analysis: hardwired, except that it supports a range of queries. What is the query language?

Implementation: Two-process model, supports multiple threads so long as only one runs at a time. Log file coupled with "in-memory" execution history database. Events are able to commit and un-commit themselves.

7 event types: static context creation, object creation, method call, method return, exception thrown/caught, change in source line, and change in variable value.

Stepping backward does not modify the client program, it is suspended until you get back to the current state and move forward. (Means: you can't modify the past, but maybe you can modify the present).

Queries: on program history; may return values, sets of states, or portions of program history. Visual representation of program states and program history means queries and results may be done graphically. Queries vis-a-vis variables in single instances or classwide.
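
To get a feel for what "queries on program history" means, here is a Python sketch over a made-up log of variable assignments. The (step, variable, value) record format and both function names are assumptions for illustration; this is not JIVE's query language or storage format.

```python
_UNSET = object()    # sentinel: no value seen yet

def changes(history, variable):
    """Return (step, new value) for each point where `variable` changed value."""
    result = []
    previous = _UNSET
    for step, name, value in history:
        if name == variable and (previous is _UNSET or value != previous):
            result.append((step, value))
            previous = value
    return result

def first_reached(history, variable, wanted):
    """Return the first step at which `variable` held `wanted`, or None."""
    for step, name, value in history:
        if name == variable and value == wanted:
            return step
    return None

# Hypothetical execution history.
history = [(1, "x", 0), (2, "y", 5), (3, "x", 0), (4, "x", 2), (5, "x", 3)]
x_changes = changes(history, "x")
when_x_was_2 = first_reached(history, "x", 2)
```

Both queries scan the whole log, which hints at why JIVE pairs its log file with an in-memory history database.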

No evaluation of scalability or effectiveness of using UML-like depictions.

JPDA: Java Platform Debugger Architecture

Originally there were the JVMDI and the JVMPI; now there is the JPDA. JIVE has to live on whatever the JVM feeds it. JPDA includes the JDI (Debug Interface), JDWP (Wire Protocol), and JVM TI (Tools Interface) which replaced JVMDI/JVMPI.

Compare this access to the value of a variable in Java, with the Unicon/Alamo access to a variable via variable(s, Monitored):

theStackFrame.getValue(theLocalVariable)
... transmitted via a socket / JDWP ...
jvmti->GetLocalInt(frame, slot, &intValue)
... result transmitted back...

Graph-Based Visualization of Software Evolution

This paper is ancient eye-candy I am including for sentimental reasons, but it is another representative of the class of visualizations that are geared towards understanding the changes in software over time, the same perspective the authors of the visualizing-software-as-cities paper took. It is not the here-and-now of a current execution, it is the view of code across the ages.

Given a software repository (they talk about CVS, a fine predecessor to Subversion; you might do the same for Git), how do we visualize a program's change over time? For each revision, they collect/measure/compute:

  1. The author of each change of each file.
  2. The control flow graphs of each method in the program.
  3. The change in each basic block in the control-flow graphs.
  4. The inheritance graph of the program.
  5. The call-graphs of the methods of the program.
  6. The time of each change to each file.

lecture 39

No Office Hours today

Sorry, search committee meeting. If you need office consultation, please e-mail me and suggest your available time(s).

Mailbag

I was wondering if you have any test suspects or programs I could use to monitor for classes and methods?
Great question. unicon (~7K LOC), ui (~9K LOC), and ivib (~16K LOC) are three example OO programs that one might try to monitor, but maybe we need something smaller. Within unicon/uni/progs a couple programs are possible: deen.icn (200 LOC) is a toy German-to-English dictionary, while umake.icn (300 LOC) is a simplified variant of the "make" program.

Deen takes German words on its command line and writes out English. A sample run might look like:

$ ./deen Ich bin ein Berliner
Opened file(de-en.txt).
Reading.....................................................................................................................................................................................................
done.  Read 197771 lines
Ich: self
bin is not in the dictionary.
ein: a
Berliner: doughnut
Deen is a toy program and is a far from ideal representative of object-orientation, but it is small enough that it would be easy to use as a suspect. At least it is OO enough to have some inheritance and some aggregation going on. Monitoring the unicon compiler compiling itself, or a ui session, or an ivib session, would be a far more impressive and challenging OO demonstration.

Discussion of Overcoming Issues of 3D Software Visualization through Immersive Augmented Reality

What were the Issues of 3D Software Visualization that they wanted to overcome?


what's difficult about navigation
what's difficult about occlusion
what's difficult about selection
what's difficult about text readability
Is the hypothesis ("displaying 3d software visualizations in immersive augmented reality can help to overcome usability issues of 3D visualizations and increase their effectiveness to support software concerns") almost the same as that posed by Rudel?
What was their test of this hypothesis, and what was the outcome?
In their conclusions they assert that augmented reality provided the "highest performance to find outliers", but in the results section they state that a standard computer screen required the least time and gave the highest correctness for this task. What gives?

Techniques for Reducing the Complexity of Object-Oriented Execution Traces, by Hamou-Lhadj and Lethbridge

Execution traces are very large, and very redundant.
The ubiquity and reliance of most algorithms on loops guarantees this will be true for most programs.
The analysis used in a software visualization should generally abstract and filter the data before it starts drawing graphics.
Figure 2 of this paper gives a toy example in which a tiny duplication is removed; in practice, scale it up many orders of magnitude.
multiplicity
In software engineering design diagrams, multiplicity is commonly used to indicate the number of instances involved in a given association relationship. Might we use regular expressions to describe multiplicity in execution traces?
A->B*-*>C*D
Removing "utilities"
constructors/destructors, accessor methods, utility and library classes. Potentially many incoming edges, with few or no outgoing dependencies.
Polymorphic methods
execution tree differences can be ignored when the abstract function performed is understood.
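
Two of these reductions are easy to sketch in Python: filtering out utilities, and collapsing consecutive repeats into a call plus a multiplicity. This is a simplification with hypothetical function names; it only collapses immediate repeats of a single call, not repeated subsequences or subtrees as the paper's techniques do.

```python
def drop_utilities(trace, utilities):
    """Remove calls to accessors, constructors, and other utility routines."""
    return [call for call in trace if call not in utilities]

def collapse_repeats(trace):
    """Replace runs of the same call with (call, multiplicity) pairs."""
    collapsed = []
    for call in trace:
        if collapsed and collapsed[-1][0] == call:
            collapsed[-1] = (call, collapsed[-1][1] + 1)
        else:
            collapsed.append((call, 1))
    return collapsed

# Hypothetical trace in which "get" is an accessor called from a loop over B.
trace = ["A", "get", "B", "B", "get", "B", "C"]
reduced = collapse_repeats(drop_utilities(trace, {"get"}))
```

Order matters here: filtering first lets the three calls to B merge into one entry, which would not happen if the accessor calls were still interleaved.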

lecture 40

Mailbag

When I tried monitoring OO examples, on Windows I was unable to get them to run. On Linux they work fine...but my Linux Unicon does not do 3D.
Thank you for the screen shot. I recommend an office consultation to look at your 3D issues. I may be able to get things to run on Windows. Zoom is a good way to do an appointment, if you can't bring the hardware to my office.
In the fifth example of the HW4's you showed in class, one of the students drew text in a 3D environment quite well. That didn't appear to be a texture, but instead a sole graphic. If you have time, could you tell me how this student went about displaying this text? I would like to use it for my final project.
Sure, let's go look at those.

Dr. J Status Report

Questions Regarding Final Exam Project Demos

Simultaneous Visual Analysis of Multiple Software Hierarchies

This paper appeared in the 2018 Working Conference on Software Visualization.



More Research Papers?

Some papers that I didn't have on our reading list. Discovered while preparing final copy of a literature survey on software cities.

lecture 41

Mailbag

I was the author of the code that had the string implementation that was requested. You have my permission to share the code. My implementation came from one of your examples shared in class, where you started by opening a 2D window.
Thanks for your permission. Code presented below is from your HW#4; I have not checked if you changed anything from what I gave earlier.
I am currently done making the buildings in my semester project, but I wanted to add some detail to the city I'm trying to build. How would you recommend I approach making a road or a ground surface so my buildings do not look like they are floating?
For my city, I took a single big 2D image and used it as a texture for a single rectangular ground surface. Since my area was large, this stretches out the pixels enormously. It would be possible to either (a) use an image that repeats many times in both the x and z dimensions so that it doesn't look pixelated, by using texture coordinates > 1.0, or (b) plot a non-flat ground surface, if you preferred, perhaps using a 2D matrix whose values are the "y" values at the various x,z locations around your ground surface.
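
A rough sketch of option (b), in Python rather than Unicon: turn a 2D matrix of "y" values into a list of four-vertex ground patches. The function name `ground_quads` and the `cell` spacing parameter are made up for illustration; each patch would then be handed to whatever polygon-drawing call you use.

```python
def ground_quads(heights, cell=1.0):
    """Turn a 2D matrix of y values into quads of (x, y, z) vertices.

    heights[i][j] is the y value at x = j*cell, z = i*cell.
    """
    quads = []
    for i in range(len(heights) - 1):
        for j in range(len(heights[i]) - 1):
            quads.append([
                (j * cell,       heights[i][j],         i * cell),
                (j * cell,       heights[i + 1][j],     (i + 1) * cell),
                ((j + 1) * cell, heights[i + 1][j + 1], (i + 1) * cell),
                ((j + 1) * cell, heights[i][j + 1],     i * cell),
            ])
    return quads

# A 2x3 grid of heights yields 1x2 = 2 ground patches.
heights = [[0.0, 0.0, 0.5],
           [0.0, 1.0, 0.5]]
quads = ground_quads(heights, cell=10.0)
print(len(quads))
```

The four corners of a patch are generally not coplanar, so one may prefer to split each patch into two triangles before drawing.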

Fonts from the Fifth HW#4 Example

Well, there is this bit. It depends on a texture already being set, and a twidths table already being initialized.
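# Code from Dr. Jeffery's text.icn example #
# Draws string s starting at (x,y,z), one textured rectangle per character.
# Assumes the current texture is a 512x512 image holding a 16x16 grid of
# 32x32 pixel character cells, and that the twidths table maps each
# character to its width in pixels.
procedure myDrawString(x,y,z,s)
    local c, i, row, col, ht, wd, u1, v1, u2, v2, u3, v3, u4, v4
    WAttrib("texmode=on")
    every c := !s do {
        i := ord(c)
        row := i/16      # cell row in the 16x16 grid
        col := i%16      # cell column in the 16x16 grid
        ht := 20.5       # rectangle height in world units
        wd := 20.5 * real(twidths[c]) / 32   # width scaled by character width
        # texture coordinates: lower left, upper left, upper right, lower right
        u1 := col*32.0/512
        v1 := 1.0-(row+1)*32.0/512
        u2 := col*32.0/512
        v2 := 1.0-row*32.0/512
        u3 := (col+(wd/ht))*32.0/512
        v3 := 1.0-row*32.0/512
        u4 := (col+wd/ht)*32.0/512
        v4 := 1.0-(row+1)*32.0/512

        # vertices are given in the same order as the texture coordinates
        Texcoord(u1,v1, u2,v2, u3,v3, u4,v4)
        DrawPolygon(x-wd/2,y-ht/2,z, x-wd/2,y+ht/2,z,
                    x+wd/2,y+ht/2,z, x+wd/2,y-ht/2,z)
        x +:= wd + 0.1   # advance to the next character position
    }
end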
The initialization code was found in main()
   &window := open("win","g","size=512,512",
		    "font=sans,32,bold", "canvas=hidden") # 2D window is hidden
   #### Code from Dr. Jeffery's text.icn example to draw strings #####
   asc := WAttrib("ascent") 
   every i := 1 to 16 do
      every j := 1 to 16 do {
         DrawString((j-1)*32, (i-1)*32+asc, char((i-1)*16+(j-1)))
         }
   twidths := table()
   every i := 0 to 255 do twidths[char(i)] := TextWidth(char(i))
   wfont := &window
   &window:= open("HW4", "gl", "size="||size)
   WAttrib("texmode=on")
   Texture(&window, wfont)

Brief Discussion of Texture Tiling

Mostly review, I would guess. Mini-example:
  • In CVE, we have carpeting and flooring and walls.
  • If we tried to use textures that cover the entire area, we would either be far too low-resolution, or use far too much texture memory.
  • We need high resolution textures that can repeat
  • For an arbitrary space to be textured, how many times should I repeat the texture?
  • Measure/estimate/record real-world size of NxM pixel image.
  • In CVE, in the textures directory we placed a mini-database of the textures' real-world sizes. I suppose I should convert to JSON:
    floor_1.jpg
    {
       name floor1
       real_world_x .4
       real_world_y .4
    }
    
  • Divide real-world size of space to be textured (i.e. x,y,z world coordinates of vertices) by real world size of image.
  • Result is (u,v) texture coordinates saying how many times to tile
  • For the JEB tile, we estimated it as 0.4x0.4 (a little less than half a meter). You would tile it 2.5 times in each dimension to fill 1 square meter.
  • For the JEB 2nd floor corridor outside my office, we measured 21.1x3.4 meters. The (u,v) is (52.75,8.5). The four texture coordinates might be (0.0, 0.0), (0.0, 8.5), (52.75,8.5), (52.75,0.0).
  • Vertex order matters. It will look crazy if (x,y,z) vertices are not given in same order as (u,v) texture coordinates. Easy to get things flipped, skewed, etc.
  • In my city, I tossed in some building textures real fast, but didn't supply texture coordinates, so my buildings did not know how to tile last time I showed them to you. Maybe by next Friday, they will. :-)
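
The tiling arithmetic above can be sketched as follows, in Python rather than Unicon. The function names `tiling_uv` and `quad_texcoords` are invented for illustration; the numbers are the JEB corridor and tile sizes quoted above.

```python
def tiling_uv(space_w, space_h, tile_w, tile_h):
    """Divide the real-world size of the space by the real-world tile size."""
    return space_w / tile_w, space_h / tile_h

def quad_texcoords(u, v):
    """Texture coordinates for a rectangle, in the same order as its vertices."""
    return [(0.0, 0.0), (0.0, v), (u, v), (u, 0.0)]

# The 21.1 x 3.4 meter corridor, covered with the 0.4 x 0.4 meter tile.
u, v = tiling_uv(21.1, 3.4, 0.4, 0.4)
print(round(u, 2), round(v, 2))
print(quad_texcoords(round(u, 2), round(v, 2)))
```

The vertex list passed to the drawing call has to walk the rectangle's corners in the same order as `quad_texcoords` does, or the texture will come out flipped or skewed.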

    Visualizing Live Software Systems in 3D

    by Greevy, Lanza, Wysseier (SOFTVIS 2006)

    From the same group that gave us CodeCity (and preceding that paper!), this paper gives me great hope of addressing some of the issues that I am passionate about, regarding the visualization of static+dynamic information.

    "feature-centric reverse engineering"
    that is, captured traces of selected runtime behavior, as if you used an event mask to ask only for features of interest.
    how static source artifacts contribute to runtime behavior
    the connection of statics to dynamics is a central task
    "feature trace"
    a record of the steps a program takes during execution of a feature
    "feature"
    user-triggerable functionality of a software system
    which parts of the code are active during the execution of a feature?
    what's instantiated and how objects collaborate on a feature
    what patterns of activity are common across features?
    alleged to give insights into the architectural structure of the system
    what activities are specific to one feature?
    The Greevy approach:
    1. apply static analysis, extract a static model
    2. instrument the code
    3. execute code to obtain traces ("trees of method calls") of feature executions.
    4. resolve/bind/connect trace events back to static model
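
A toy sketch of step 4 and the questions it supports, in Python rather than Unicon. The static model, the feature traces, and all names below are hypothetical; the paper's model and traces are far richer (trees of method calls, not flat lists).

```python
# Hypothetical static model: class name -> set of method names.
static_model = {
    "Editor": {"open", "save"},
    "Buffer": {"load", "write"},
    "Printer": {"render"},
}

# Hypothetical feature traces: feature name -> list of (class, method) events.
traces = {
    "open_file": [("Editor", "open"), ("Buffer", "load")],
    "save_file": [("Editor", "save"), ("Buffer", "write")],
    "print_file": [("Editor", "open"), ("Printer", "render")],
}

def bind(traces, model):
    """Map each feature to the static classes its trace events resolve to."""
    active = {}
    for feature, events in traces.items():
        active[feature] = {cls for cls, method in events
                           if method in model.get(cls, set())}
    return active

def common_classes(active):
    """Classes active in every feature (assumes at least one feature)."""
    return set.intersection(*active.values())

def specific_classes(active, feature):
    """Classes active in this feature and in no other."""
    others = set().union(*(c for f, c in active.items() if f != feature))
    return active[feature] - others

active = bind(traces, static_model)
print(sorted(common_classes(active)))
print(sorted(specific_classes(active, "print_file")))
```

Events that do not resolve to anything in the static model are silently dropped here; a real tool would probably want to report them.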

    Trace summarization may eliminate details that provide valuable insights!

    Visualization is static class hierarchy + "towers of communicating instances". (Sounds very similar to SynchroVis, which came after.)

    5 Dimensions of Interest of Software Visualization (Maletic):

    1. Task: Why is the visualization needed?
    2. Audience: Who will use the visualization?
    3. Target: What low level aspects are visualized?
    4. Representation: What best conveys the target information to users?
    5. Medium: Where are the visualizations rendered?

    KScope: A Modularized Tool for 3D Visualization of Object-Oriented Programs

    by Davis, Pestka, and Kaplan (VISSOFT 2003)

    KScope

    • compare "reverse engineering" of standard UML (left) with KScope visualization (right)
    • there is a class under study (multicolored cube)
    • cube vs. pyramid for class vs. interface
    • dark blue == "terminator class" (library class)
    • line color (red=association, blue=dependency, magenta=composition, black=implementation, green=inheritance, yellow=interface inheritance)
    • click things for info detail
    • BCEL: Byte Code Engineering Library, a Java thing from Apache. Perhaps subsumed by ASM

    Visualizing Memory Graphs

    by Zimmermann and Zeller (Dagstuhl seminar, 2001)

    Who needs visualization? Programmers debugging bugs need visualization!

    (gdb) print *tree
    *tree = {value = 7,name = 0x8049e88 "Ada", _left = 0x804d7d8,
      _right = 0x0, left_thread = false, right_thread = false,
      date ={day_of_week = Thu, day = 1, month = 1, year = 1970,
      _vptr. = 0x8049f78}, static shared = 4711}
    
    Modern GUI debuggers still mostly show these values as text. If you use a good one, you might get some depiction of pointers:

    DDD (pictured above) makes you expand/follow each pointer manually.

    • Pro: programmer is in control, sets focus of what is to be displayed.
    • Con: wow, to display a linked list of length 100, click 100 next pointers.

    A memory graph (pictured above) might in fact be a graphic depiction of an entire program state. Consider it to be a (relatively) brute force or literal depiction of memory, with pointers as arrowed edges. Given this depiction, how easy is it to answer questions like these:

    • are there any pointers pointing to this address?
    • how many elements does this data structure have?
    • is this allocated block reachable from within my module?
    • did this tree change during the last function call?
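
Once the graph exists as data, some of these questions become short graph queries. A minimal sketch in Python rather than Unicon, assuming a made-up representation in which each address maps to the list of addresses it points to; this is not the representation used in the paper.

```python
# Hypothetical memory graph: address -> list of addresses it points to.
graph = {
    0x10: [0x20],   # list head
    0x20: [0x30],
    0x30: [],       # last node of the list
    0x40: [0x20],   # another pointer into the middle of the list
    0x50: [],       # allocated, but nothing points here
}

def pointers_to(graph, target):
    """Are there any pointers pointing to this address?"""
    return sorted(src for src, dsts in graph.items() if target in dsts)

def reachable(graph, roots):
    """All addresses reachable from the given roots by following pointers."""
    seen, stack = set(), list(roots)
    while stack:
        addr = stack.pop()
        if addr in seen:
            continue
        seen.add(addr)
        stack.extend(graph.get(addr, []))
    return seen

print(pointers_to(graph, 0x20))          # who points at 0x20?
print(len(reachable(graph, [0x10])))     # elements in the list headed at 0x10
print(0x50 in reachable(graph, [0x10]))  # is block 0x50 reachable from 0x10?
```

Detecting whether a structure changed during a call could be approached the same way, by comparing the reachable subgraph before and after.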
    Now: what downsides or challenges can you suggest might occur with memory graphs?

    How do they get these memory graphs? I think it is fair to say: painfully.