CS 404/504 Program Monitoring and Visualization

Lecture Notes for CS 404/504 Program Monitoring and Visualization

Note to Dr. J: next time you teach this course, review and re-order some papers and lecture material up to the front.

Syllabus

What this Course is About

This course is a blend of

dynamic analysis -- the study of program execution behavior, with
visualization -- the graphical depiction of large amounts of information.

It turns out that much of the key connecting glue between monitoring and visualization comes from static analysis, the study of program properties observable from the source code.

Each week, you can expect part of the lecture material to come from dynamic analysis and part from graphics/visualization. Similarly, part of the time each week will be spent studying interesting work done by others, and part will be spent playing with my research infrastructure, working on software tools that will (hopefully) advance the state of the art.

Reading Assignment #1

Go read unicon.org. Download the main Unicon book. Read chapter 1. By Friday, e-mail the instructor with questions you have about Unicon based on the website and chapter 1.
Read Program Visualization in a Virtual Environment, by Hirose/Ogi/Riesterer. By Monday, be prepared to discuss this paper.

Early History of Monitoring and Visualization according to Jeffery

Others may have more and better information, but this is my version of that subset of computing history relevant to this course.

In the beginning, there were programs.
And programs begat bugs. In the punchcard era, the highlight of one's afternoon often was getting back one's output from one's daily program run, a short stack of punched cards to the effect that the program was not executed at all, due to an error in the source code.
Eventually programs started to compile or assemble.
When a program ran and did not produce expected output, one was supposed to go back to the source code and read/study/stare at it to find out why. This still works, some of the time.

When the computing industry reached a stage of having interactive, text screen terminals, all kinds of new bugs became commonplace. Along with mankind's increased ability to generate bugs, a whole slew of tools and techniques were developed to understand program executions, including tracing and source-level debuggers. These tools still work; they just don't scale well. Sadly, if you look at a modern IDE, its debugging and tracing capabilities are not much improved from what was available 40 years ago. This is (I claim) because problems in monitoring and debugging are hard, and the cost of building new tools which might advance the state of the art is very high.

By the 1980s, interactive 2D graphics were ubiquitous and improving rapidly in performance. People started to use graphics to help understand program execution behavior partly because text-only techniques did not scale well, and partly just because the graphics were available. A movie called "Sorting Out Sorting" (parts 1, 2, 3), originally presented at SIGGRAPH, made a compelling argument that graphical techniques could be valuable in teaching and understanding algorithms.

Sorting Out Sorting was done one frame at a time on truly ancient facilities. A group at Brown University (home of graphics guru Andy Van Dam, algorithms guru Robert Sedgewick and a cast of thousands) set out to replicate on interactive workstations what Ron Baecker had done a frame at a time. One result of this effort was Marc Brown's Ph.D. and related software. We will present more history in a later session.

What About Us?

This course is a follow-on to the work presented in my Ph.D. (insert story of Dr. J's Ph.D. here).
The central premise of my Ph.D. is that if we build the infrastructure needed to reduce program monitoring and visualization to "no harder than writing ordinary applications" and then use a rapid prototyping language suitable for research experimentation, we should be able to propel the state of the art forward.
My Ph.D. produced an execution monitoring framework and a 2D graphics API well-suited to these goals. Since then the monitoring framework has been improved and 3D graphics has become ubiquitous.
This semester we will find out what we can do with this framework. Each time I have taught this class, we have propelled the research forward a little bit.

Announcements

There is a bblearn for this course now. It has a HW#1 posted, but I am not so sure I like it. I may think of a better HW#1 for you, by this weekend. Check for HW#1 on Monday. In the meantime, learn some Unicon.

Unicon 101

Unicon is found at unicon.org
We will teach Unicon from scratch in this class
Unicon source code lives in .icn files and is compiled into VM bytecode by default (the execution model vaguely resembles Java's)
Unicon programs are organized similarly to C++ programs: a set of functions and, optionally, classes, starting from main().
For this class, we will have to make sure that you have access to a machine with Unicon, including its (2D and 3D) graphics facilities.
Unicon comes with a simple IDE; feel free to use it or skip it. I use Emacs, but the IDE has a nice help menu that will quick-launch most reference documentation. The IDE has a known bug on macOS.
Unicon itself has one pending 3D bug I intend to fix for this class, and we may encounter other bugs. In response to those, I will almost surely (re)build Unicon from sources one or more times during the semester, so plan to re-install one or more times. It is research software. If you want to build Unicon yourself after I announce a bugfix, you will generally need a complete C/C++ compiler with supporting tools such as "make" and "subversion". On Windows we use MSYS and Mingw64.

Unicon: the Easiest Parts

Let's ssh into a test machine to live-demo the following:

Types	Control Flow
string	success vs. failure
integer	if-then-else
real	while-do
cset	calls, argument rules
list	generators
table	case-of
file	every-do

Alternate Resources for Unicon Study

None of this is assigned reading. It is here for your convenience; you know, in case you just hate the Unicon book.

my CS210 lecture notes on Unicon
Chapter 2 of Graphics Programming in Icon is probably the clearest available description of Icon and Unicon's goal-directed expression evaluation. Did you know that I give away hard copies of this book?

Monitoring Framework Intro

An execution monitor (EM) observes events in a target program (TP). There are two-process, one-process (callback), and thread models.

two-process model: EM and TP communicate via network sockets, pipes, or files.
one-process/callback: The TP calls the EM when an event occurs. The EM is organized as a set of callbacks, i.e. it doesn't have its own main() or control flow, it just responds to things.
thread: EM and TP are threads in the same address space, making communication far easier.

Which model do most debuggers use? The two-process model. Which model should we use for visualization tools? What is different about their requirements?

two-process model

Pros:

easiest to do language-neutral and cross-language.
Least intrusive

Cons:

slowest option; fast enough for debuggers but not for visualization
low-level and/or difficult access to TP state

one-process/callback

Pros:

fastest

Cons:

intrusive (monitor shares target program stack and globals)
most difficult for monitor programming (no control flow!)

thread

Pros:

best of both worlds

Cons:

Graphic Design of the Day: a map

Napoleon's March into Russia: proof you can legibly plot extra dimensions atop a map. Maps have legends to explain what's on them, along with two primary dimensions which are intuitively based on actual geometry.

lecture 3

Reading for this week

Unicon book, chapters 2, 3, and 5.1-5.2.
Visualizing Software in an Immersive Virtual Reality Environment, by Maletic, Leigh, and Marcus. ICSE'01 Workshop on Software Visualization, 2001.

HW#1 revised

Compared with the last time I taught this class, I want you to spend enough time to actually learn Unicon, or rather the half of it that will be useful for writing visualization tools.

Highlights from Hirose

[Hirose97] describes research from the University of Tokyo, presented at the annual conference of the World Society for Computer Graphics.

What do Hirose et al mean by "virtual environment"? What does a virtual environment have to do with program visualization?
Goal of visualization: help a human build a cognitive model of a large, complex software system.
When are graphic representations more useful than purely textual representations?
What is the paper's proposed solution to the "bird's nest graph" problem with program visualizations, in which the many crossing lines make the depicted graph difficult to understand?
Programs have neither intuitive appearance nor physical form. What info to show? How to depict it graphically? How to position it within 2D or 3D space?
Proposed static analysis: from source code, generate an "intermediate file" containing:
- list of source files
- list of functions in each file
- for each function, a list of calls to other functions
- "data structures"
(Hirose note: "intermediate file" format might make sense for multiple/many languages, allowing subsequent visualization on all of them).
(Jeffery note: many tools have been written to extract this before, might not have to do it from scratch)
don't ask the user to place all the objects -- too much work
force-directed placement algorithm:
- each object exerts a repulsive force on every other object
- relationships between objects exert attractive forces
- model it as a physical system, start with random positions within a small space and turn forces loose. layout is finished when it reaches equilibrium
Fundamental in visualization: too much information. Too complex. Need to select/simplify/abstract. For example: merging similar entities, omitting peripheral ones.
Levels of abstraction, levels of detail. At a distance: only show external coupling. From close up: object becomes transparent, source code is visible.
Color-coding: a limited, extra dimension. Example: color GUI components red, database components green, computations blue. Example: distance from main() (or current point of execution) indicated by brightness/saturation/grayscale. Hirose does not note the significant percentage of users with some colorblindness.
Geometric appearance. E.g. size and shape. Hirose uses cubes sized to the log of the #lines of code. Shape is another limited, extra dimension that can be used.
Does VR matter? What impact do VR goggles or a CAVE have on users' ability to understand the information we want to depict in this class?
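
The force-directed placement steps listed above can be sketched in a few lines of Unicon. This is a minimal sketch, not Hirose's implementation: the record node, the procedure step(), the four-node graph, and the constants (spring strength 0.1, step size 0.1, 200 iterations) are all assumptions chosen for illustration, and it runs a fixed number of iterations rather than testing for equilibrium.

```unicon
# Hypothetical sketch of force-directed placement; all names are illustrative.
record node(x, y, dx, dy)

procedure main()
   local nodes, edges, n
   # start with random positions within a small space
   nodes := []
   every 1 to 4 do put(nodes, node(?0 * 10.0, ?0 * 10.0, 0.0, 0.0))
   # relationships (e.g. calls) as pairs of node indices
   edges := [[1, 2], [1, 3], [3, 4]]
   every 1 to 200 do step(nodes, edges)
   every n := !nodes do write(n.x, " ", n.y)
end

procedure step(nodes, edges)
   local a, b, e, dx, dy, d2
   every a := !nodes do a.dx := a.dy := 0.0
   # each object exerts a repulsive force on every other object
   every a := !nodes & b := !nodes do
      if a ~=== b then {
         dx := a.x - b.x
         dy := a.y - b.y
         d2 := dx * dx + dy * dy + 0.01     # avoid division by zero
         a.dx +:= dx / d2
         a.dy +:= dy / d2
      }
   # relationships between objects exert attractive forces
   every e := !edges do {
      a := nodes[e[1]]
      b := nodes[e[2]]
      dx := b.x - a.x
      dy := b.y - a.y
      a.dx +:= 0.1 * dx
      a.dy +:= 0.1 * dy
      b.dx -:= 0.1 * dx
      b.dy -:= 0.1 * dy
   }
   # move each node a small step along its net force
   every a := !nodes do {
      a.x +:= 0.1 * a.dx
      a.y +:= 0.1 * a.dy
   }
end
```

A real tool would probably stop when the total movement per step drops below a threshold, and would draw the nodes instead of writing coordinates.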

Cheesy Movie References

What movies present topics relevant to this class, i.e. program visualization, program behavior monitoring, or virtual environments where such activities occur?

TRON
Disclosure
Ready Player One

Graphic Design Principles

We need graphic design principles in preparation for visualization work. The following can be attributed to Edward Tufte, a renowned Ivy League graphic designer who has written some beautiful books.

show the information
show as much as you can with as little ink as possible
remove ink that isn't showing useful information
remove redundant information
revise and edit

Graphic Design of the Day: a scatter plot

A map of London by John Snow, 1854, cleaned up by John Mackenzie of the University of Delaware.

lecture 4

Mailbag

I am having trouble using the star operator on lists, *L: The size operator *L works only after L has been assigned a list value, e.g. L := []
How do I check if a string is not in my list of strings?: Well, first off, if one were doing this a lot maybe one should use a set instead of a list. Unicon has a set type. But for occasional use on lists of reasonable size, s==!L tells if s is in the list L. s ~== !L is not so good, it will almost surely succeed unless every value in L is s. Instead use not (s == !L)
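
A small sketch of the membership tests discussed above. The list contents and the variable names L, S, and s are made up for illustration.

```unicon
procedure main()
   local L, S, s
   L := ["alpha", "beta"]      # *L is only meaningful once L holds a list
   write("size of L: ", *L)
   s := "gamma"

   # s == !L succeeds if any element of L equals s
   if s == !L then write(s, " is in L")

   # the recommended negative test
   if not (s == !L) then write(s, " is not in L")

   # the misleading test: succeeds as soon as ANY element differs from s
   if s ~== !L then write("some element of L differs from ", s)

   # for frequent lookups, a set is the better structure
   S := set(L)
   if not member(S, s) then write(s, " is not in S")
end
```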

Unicon: the next level

Let's peek at CS210 lecture notes on Unicon to see if I missed any highlights during the live demo.

Monitoring Buzzwords

Volume, dimensionality, intrusion, and access. Solve these four unsolvable problems and you've got the makings of a decent monitoring and visualization framework.

volume: if you think static analysis of source code has a lot of information the programmer may have to understand and/or deal with, wait until you see the amount of information dynamic analysis generates. Even small, short-running programs can generate millions and millions of events of interest. Monitoring and visualization tools have to filter/discard, condense/simplify, and analyze their input, turning low level data into higher-level information.
dimensionality: understanding program behavior involves many dimensions: control flow, data structures, algorithms, memory access patterns, input/output behavior... Visualizations can be selective, but often want to depict more than just 2 or 3 dimensions' worth of data even though they are using a 2D (or 3D) output device.
intrusion: The act of observing program execution behavior changes that behavior. Monitors have to minimize/mitigate this or they will be visualizing their own side effects more than the thing they purport to show. The first form of intrusion is to skew the timing of the observed behavior. Monitoring a program may also alter its memory layouts (e.g. on the stack), which might make bugs disappear (or merely exaggerate them).
access: Simple monitors might graphically depict exactly the information contained in the sequence of events that they observe, but most monitors need to ask for additional information, potentially accessing the entire state of the program being executed.

Graphic Design of the Day: Line Plots

Multiple dimensions of weather along a primary time axis.

From the New York Times, popularized by Tufte.

lecture 5

Announcements

Regular office hours: after class MW 2:30-3:20. Also by appointment.
UI closed - no class on Monday

Unicon: Goal-Directed Evaluation

Surprised by Failure?

Don't be surprised when fallible expressions fail.
Failure in Unicon isn't a rare event like an exception.
Failure is part of every program's life.
Learn to identify fallible expressions.
Expect failure. Write code to accommodate it.

When to check for failure: everywhere that failure can occur, and everywhere that failure will matter. Examples:

comparisons are designed to fail, most folks don't miss these
type conversions like integer() are also designed to fail
open() and similar system functions that ask for an operating system resource that might not be available -- check them!
find() and similar built-ins, UNLESS you can prove data is valid
subscripts, unless you can prove valid index ranges
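
A minimal sketch of checking for failure at the places listed above. The program reads a file named on the command line; the usage message and the output strings are illustrative.

```unicon
procedure main(av)
   local fname, f, line, n, i
   # subscripting fails if no argument was given
   fname := av[1] | stop("usage: prog filename")
   # open() fails if the file cannot be opened
   f := open(fname) | stop("cannot open ", fname)
   while line := read(f) do {
      # integer() fails if line is not convertible to an integer
      if n := integer(line) then
         write("number: ", n)
      # find() fails if "#" does not occur in line
      else if i := find("#", line) then
         write("comment starts at column ", i)
      else
         write("other: ", line)
   }
   close(f)
end
```

Each assignment sits inside the test, so the variable is used only on the path where the expression succeeded.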

Graphic Design of the Day

William Playfair's chart depicting area, population, and tax revenues of countries in Europe is another excellent example of depicting multiple dimensions of data.

The slope between the population and tax revenues points down for most countries and sharply up for England (and less so, for Spain).

Introduction to Unicon Monitoring Facilities

events: billions and billions of tiny points in time, with a tiny data payload, and the ability to easily inspect the entire program state. Event names like E_Pcall or E_Lbang
event keywords: &eventcode and &eventvalue
built atop the co-expression data type: threads that take turns. AKA coroutine, goroutine, or co-operative or synchronous thread.
the VM is instrumented for you: asymmetric coroutines. VM C code sends events to monitors written in Unicon
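
To make "threads that take turns" concrete, here is a plain co-expression example. It is not the monitoring framework itself, only a sketch of the control transfer that the framework builds on; the name target and the three strings are invented stand-ins for events.

```unicon
procedure main()
   local target, ev
   # a co-expression that produces three values, then fails
   target := create ("call" | "return" | "exit")
   # each @ activation runs target until it produces its next value,
   # then control comes back here
   while ev := @target do
      write("monitor received: ", ev)
end
```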

lecture 6

Reading for this week

Unicon book, chapters 4, 5.6 (skim rest of 5), 7
3D Visualization for Software Development by Bonyuet, Ma, Jaffrey. In Proceedings of the IEEE International Conference on Web Services, 2004.

Ideas from Visualizing Software in an Immersive Virtual Reality Environment

What does the paper claim is the distinction between 3D and VR?
Research issues:
- What are the most efficient methods to visually represent software systems?
- What are the perceptual limits of these visualizations with respect to large software systems?
- How do we represent static and dynamic information about the software?
- What are the user's most important information needs with respect to particular software engineering tasks?
- What features are most suited to supporting collaborative problem solving in this domain?
- What software engineering tasks benefit (most) from this collaborative environment?
VRML - late 1990s, expected to be a big hit follow-on to HTML; it did the static part OK but did not handle dynamics well.
"Imsovision" - class as a rectangular platform, methods as columns (laid out how?)

More Unicon

Did you notice system(s) and open(s, "p") ?
Built-in structure types as the associations between objects
Dr. J's intro to OO design was a user ...
Classes
Windows

Notes from Past Students' Unicon Program Visualizations

"Longer" is not better. Writing good Unicon is like haiku or other short poetry. Practice toward mastery of the art.
Avoid platform-dependent colors - stick to the portable color names (see Unicon Graphics Book) or use RGB values.
avoid platform-dependent fonts - stick to mono, sans, serif, typewriter
do not assume that the display is larger than 1024x768*
put your name in a header comment at the top of your homework
check user input for validity, avoid crashes

Sorting Out Sorting, Unicon sample solution

This version is based on one by Mike Wilder.

lecture 7

How's the Homework Going? Any questions?

Things that might be useful:

open(commandstring, "p")
UTR#20 (json.icn)

Unicon: Threads and Co-expressions

Sorting Out Sorting, Unicon sample solution

Let's look at the code from this sorting visualization, based on one by Mike Wilder. Start at the bottom, with main().

You've seen !x before, but how about x ! y: x ! y is the apply operator. It calls function x with parameters given by the elements of list y.
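
A short sketch of the apply operator; the lists are made up for illustration.

```unicon
procedure main()
   local args, L
   args := ["hello", " ", "world"]
   write ! args            # same as write("hello", " ", "world")
   L := [3, 9, 4]
   write(max ! L)          # same as write(max(3, 9, 4))
end
```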

lecture 8

Reading for this Week

OGRE: Three-Dimensional Program Visualization for Novice Programmers, by Milne and Rowe. A journal article published by Springer. As a reminder, you may have to view/download from on-campus. Check your access now and inform me of issues.
Unicon book chapter 9

Thoughts on 3D Visualization for Software Development

Bonyuet/Ma/Jaffrey, ICWS 2004: Basic PC GPUs existed by this time; World of Warcraft came out in 2004.
Key Criteria: usefulness, intuition, and scalability: What were their definitions for these?
Shneiderman's 7 tasks: overview, zoom, filter, detail-on-demand, relate, history, extract
CodeMapping achieves labeled "atomic metaphor" 3D graphs: did they achieve their key criteria?

Introduction to Unicon Monitoring Facilities, Part 2

built-in function EvGet(c): Activates &eventsource (Monitored) to get the next event whose code is in the event mask c
event codes and masks: an event code is a one-letter string. an event mask is a cset. This is, literally, just grad-student-drj exploiting the handy bit vector implementation that was in Icon.
link evinit: library function EvInit(argv) loads program
$include "evdefs.icn": include file evdefs contains definitions of event codes
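
Putting the four pieces above together, a first monitor might look like the following. This is a minimal sketch along the lines of the small examples in the Unicon book, not a copy of the course's m0.icn or m1.icn; the error message is illustrative.

```unicon
$include "evdefs.icn"
link evinit

procedure main(av)
   # av[1] names the target program; any further arguments are passed to it
   EvInit(av) | stop("usage: monitor program [args]")
   # request only procedure call events by passing an event mask (a cset)
   while EvGet(cset(E_Pcall)) do
      write("call: ", image(&eventvalue))
end
```

Calling EvGet() with no mask would report every event instead, and the monitor would then test &eventcode itself.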

Writing your first Unicon monitor

Consider the beauty and virtue of m0.icn, m1.icn and events.icn. Now check out sos.evt

lecture 9

Windows Unicon Trouble?

Windows Unicon wasn't running for one student due to a long PATH environment variable. On current Windows you can pretty much set a path arbitrarily long, but for a new process to see it there is a length limit (2047 characters, possibly less).
Windows Unicon bug reports are extremely welcome; please send me cut/paste console messages or screenshots...

Summary of Event Monitoring Libraries

From unicon/ipl/mprocs

evinit: EvInit(args) loads another Unicon program that is to be monitored
evnames: evnames(e) maps event codes to English, e.g. E_Pcall -> "procedure call"
evsyms: returns a table that maps codes to symbols t[E_Pcall] -> "E_Pcall"
...: there are several more that we will introduce as needed

From unicon/ipl/mincl

evdefs.icn: $defines for all 100+ event codes. We should probably tour this.
patdefs.icn: $defines for the ~100 integer &eventvalue's of the E_PatMatch event
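
As a sketch of how these libraries combine, the following monitor counts events by code and prints the counts with English names. The table counts, the variable pair, and the output format are assumptions for illustration; the fallback to image() covers the case where evnames() has no name for a code.

```unicon
$include "evdefs.icn"
link evinit, evnames

procedure main(av)
   local counts, pair
   EvInit(av) | stop("usage: evcount program [args]")
   counts := table(0)
   # tally every event by its one-letter event code
   while EvGet() do
      counts[&eventcode] +:= 1
   # sort(T, 2) yields [key, value] pairs ordered by value
   every pair := !sort(counts, 2) do
      write(right(pair[2], 10), "  ", evnames(pair[1]) | image(pair[1]))
end
```

Even a short target program is likely to produce large counts here, which is the "volume" problem in miniature.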

HW#2 status

Subject to some tweakage, here it is.

Unicon 2D Graphics Functions

We briefly discussed the built-in 2D graphics function set.

Functions you might have a use for in this class:

Bg - get/set the background color
CopyArea - copies a rectangular area
DrawArc - draw a partial or complete oval
DrawCircle - draw a partial or complete circle
DrawCurve - draw a smooth curve through points
DrawImage - draw an image from an ASCII-art string representation
DrawLine - draw one or more connected lines
DrawSegment - draw disconnected line segments
DrawPoint - draw points
DrawPolygon - draw a polygon
DrawString - draw text
DrawRectangle - draw a rectangle
EraseArea - fill a rectangle with the background color
Event - read the next keyboard or mouse event
Fg - get/set the foreground color
FillArc - draw an arc filled with the foreground color
FillCircle - draw a circle filled with the foreground color
FillRectangle - draw a rectangle filled with the foreground color
FillPolygon - draw a polygon filled with the foreground color
Font - set the font
GotoXY - set the text cursor to pixel (x,y) based on current font
Pattern
Pending - return the list of pending user input on this window
QueryPointer - obtain the current mouse location
ReadImage - read an image from a file into a window
TextWidth - return the # of pixels wide a string is in the current font
Texture
Texcoord
WAttrib
WDefault
WFlush
WSync
WriteImage
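
A minimal sketch using a few of the functions above to draw a bar chart. The data values, window size, and bar geometry are arbitrary choices for illustration.

```unicon
procedure main()
   local w, data, i, x, h
   w := open("bars", "g", "size=400,300", "bg=white") |
      stop("cannot open window")
   data := [5, 12, 7, 3]
   every i := 1 to *data do {
      x := 20 + (i - 1) * 60
      h := data[i] * 20
      Fg(w, "blue")
      FillRectangle(w, x, 280 - h, 40, h)     # one bar per data value
      Fg(w, "black")
      DrawString(w, x, 295, string(data[i]))  # label under the bar
   }
   Event(w)      # wait for a key or mouse event before exiting
end
```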

3D Functions We Will Worry about Later

DrawTorus
DrawCylinder
DrawDisk
DrawCube
DrawSphere
PopMatrix
PushMatrix
Rotate
Scale
Translate

Functions you probably don't need in this class:

Active - returns a window with pending input
Color - set a mutable color (color-index frame buffers only)
Couple - bind a canvas from w1 to a context from w2
FreeColor - free a mutable color (color-index frame buffers only)
GotoRC - set the text cursor to (row,column) based on current font
NewColor - allocate a mutable color (color-index frame buffers only)
PaletteChars
PaletteColor
PaletteKey
Uncouple

lecture 10

Unicon Mailbag Questions

How does open mode "p" work?

You don't have to use it; you can do anything that you find works for you. But open(cmdline, "p") runs cmdline in a shell and opens a file that reads its standard output into your program.

Linux Example:

    f := open("ls -l | grep icn", "p")
    while filename := read(f) do stuff(filename)
    close(f)

Windows Example:

    f := open("cmd /C dir", "p")
    while line := read(f) do
       if find(".icn", line) then stuff(line)
    close(f)

how would I make global lists or tables that I can access in other procedures?

Declare global variables.
Assign them list or table values (maybe in main())
They will then be visible everywhere.
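
For example, a global table shared between procedures might look like this. The names calls and addcall are hypothetical.

```unicon
global calls        # table: caller name -> list of callee names

procedure main()
   local caller
   calls := table()            # assign the global its value once, in main()
   addcall("main", "parse")
   addcall("main", "report")
   every caller := key(calls) do
      every write(caller, " -> ", !calls[caller])
end

procedure addcall(caller, callee)
   /calls[caller] := []        # create the list on first use
   put(calls[caller], callee)  # the global is visible here too
end
```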

Partial Highlights from HW#1 Solutions

    outfile := open("output.json","w")
	# OK, but check whether open() fails or not

    s := f(s, "morestuff")
    	# functional style is fine and appropriate. no reference parameters.

    L := []
    every put(L, !fileIO)
	# OK, but consider L := [: !fileIO :]

    truth := 1
       ...
	    if truth = 1 then {
	# fine, use boolean flags if you must. no boolean data type.
	# More common to use &null as false and non-null as true.

     every x := find("(", line) do {  #finds every instance o
     	# outstanding; uses find() to iterate through line

     if not member(&letters ++ &digits, line[x-1]) then {
     	# fine, but if you do this a lot of times, pull ++ out of the loop

     hashIndex := &null
     hashIndex := find("#", line)
     if hashIndex ~=== &null then{
     	# fine, but consider
	#   if hashIndex := find("#", line) then { ...


     system (["cflow", "--omit-arguments", name], f, f, f3)
	# wow, kudos for using the full power of system()! Is this better than
	#   system("cflow --omit-arguments " || name, f, f, f3)

      word := tab(upto("("))
	# kudos for using string scanning!
	# consider using tab(find("(")) or change to tab(upto('('))

      every i := 1 to *args do tableofprogs[args[i]] := preprocess(args[i])
	#  every arg := !args do tableofprogs[arg] := preprocess(arg)

      p := <[_a-zA-Z][_a-zA-Z0-9]*[ \n\t]*"(">
      p2 := p || .>y
      s ?? p2 -> s2
	# wow, regexps and patterns!

      if s2[j] == (" "|"\n"|"\t"|"(") then {
        # if any(' \n\t(', s2[j]) then { ...

      system("cflow  cflow.c > info.txt", "p")
	# hmm, possible mixed metaphor

      if(pos ~== 0 ) then
	# if pos ~= 0 then

      if(lenghtOfString(L[i]) = 1 )then {
	# not just misspelled, also misleading

      wchar := &letters ++ &digits ++'\'_'
      lista ? while tab(upto(wchar)) do {
	# this is good practice

      n_pos :=  find("()", p_name)
      f_pos := find(")", p_name)
      if  p_name[n_pos] == "(" then {write("nice")}
	# better know for sure that these can't fail, or check

      procedure getSpaceNumber(line)
      local pos:=0
        space := line[1]
        while (space == ' ') do {
          pos := pos + 1
          space := line[pos]
        }
        return pos
      end
	# many(' ', line)

      if not (tab(find("class"|"procedure"))) then {
	# cool

      &pos := &pos + 6
      	# move(6)

      lineno=0
	# lineno := 0
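
Several of the suggestions above come together in a short string-scanning loop. This is a sketch, not a sample solution; the input line and the names idchars and word are invented for illustration.

```unicon
procedure main()
   local line, idchars, word
   line := "x = foo(a) + bar(b)"
   idchars := &letters ++ &digits ++ '_'   # build the cset once, outside the loop
   line ? while tab(upto(idchars)) do {
      word := tab(many(idchars))           # grab one whole identifier
      if ="(" then                         # is it followed by "("?
         write("call to ", word)
   }
end
```

The loop ends by failure: when upto() finds no more identifier characters, tab() fails and the while stops.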

Visualization Principles (according to Dr. J)

animation: incremental algorithms are a primary means of achieving efficient animation. complementary to the principle of minimizing ink (or # pixels) used to convey a given set of information, this is like minimizing the motion of the plotter arm, or in our case, the # of memory writes.
least astonishment: use the golden rectangle, labels and legends
metaphors: a familiar metaphor saves the user a lot of time and improves understanding. Metaphors can be taught, and become familiar over time, but that is often laborious.
interconnection: connecting different pieces of data is key, follow Playfair's example
interaction: the big difference between a visualization and a paper chart or graph is that the user can interact with the data. exploit this.
dynamic scale: visualizations compete for screen space and hardware varies widely. it is extra work, but if you write everything so that it scales, your visualization will be useful on more machines and in more ways.
static backdrop: one of the best ways to make dynamic data understandable is to present it in terms of static data. An execution is an instance of the underlying universal abstract thing that is the program.

Notes from Past Students' Unicon Code

main(av): av is always a list of strings; if no arguments, *av = 0
paramnames() is a generator: use it with every, or ask questions like "if type(x:=paramnames(...))=="list" then..."
the apply operator p ! L is pretty awesome
what does every maxval <:= !L do?: max() is a built-in function, so maxval := (max ! L)
failure and success: if i := find() then ... is cooler than i := find(); if \i then ...
check for open() failure: I asked nicely before, now I am telling you
sticking &fail at the end of a routine is a noop: a routine fails for free if it falls off its end; &fail does not return a failure and is in fact seldom used. Unlike lisp, the return value of a function is not its final expression's evaluation.
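
A few of these points in runnable form. The procedure firstneg and the list values are hypothetical.

```unicon
procedure firstneg(L)
   local x
   every x := !L do
      if x < 0 then return x
   # falling off the end fails; no &fail needed
end

procedure main()
   local L, maxval, v
   L := [3, 9, 4]

   # every maxval <:= !L keeps the largest element seen so far,
   # but maxval must start with a number, not &null
   maxval := L[1]
   every maxval <:= !L
   write(maxval)

   # the same result with the built-in and the apply operator
   write(max ! L)

   # test and assign in one step
   if v := firstneg(L) then write("first negative: ", v)
   else write("no negatives")
end
```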

Graphic Design of the Day

Fisheye Views.

If you want, you can read Furnas' paper on Generalized Fisheye Views.

Suspects, Tools, and Big Programs

As we proceed into the "meat" of the course, we have a need for lots of subject programs to study, lots of example monitors, and bigger programs that presumably will have more complex behavior.

Suspects

This directory was compiled by Ralph Griswold as a collection of interesting or weird programs whose behavior could be understood by program visualization. The good part of the Suspects directory is that the programs all run non-interactively, in some cases they were modified to do so, and those that require input have sample .dat files on which they run nicely. This lets monitors do their thing unimpeded. We should probably add some representative object-oriented programs to this collection this semester. I probably can dig out my "gui recorder" and create recordings of GUI programs so that we can monitor them conveniently in this context.

tools

This directory was compiled by Clinton Jeffery as a collection of simple program visualization programs and library procedures. Many of these codes are featured in the book, Program Monitoring and Visualization.

Big Programs

The largest programs in the suspects directory are typeinfer (2.6k lines), and yhcheng (1.9k lines). These were considered large in the Icon language, where source codes are typically 1/3 to 1/10 the size of C programs that do the same thing. The other largest public domain Icon programs are in the ipl/*packs directories. Among these, ibpag2 is 3.7k lines, itweak is 3.5k lines, skeem is 3.1k lines, ged is 3.6k lines, htetris is 4.3k lines, vib is 4.4k lines, and weaving is 11.3k lines (?). Monitoring these might or might not be easy, since they may be interactive, and you might or might not know what to click at them in order to get them to behave. The largest known Icon program (source not available) was Bill Wulf's testcase generator, rumored to be on the order of a half-million lines, perhaps machine-generated.

The Unicon language supports larger programs than Icon was intended for. The unicon translator itself is 10k lines of Unicon. The uni/lib class library is 20K lines, and the uni/gui GUI class library is 14.5K lines; large subsets of these libraries may be added onto whatever the tool size is. The Unicon IDE is 17K lines, the IVIB user interface builder is 16K lines, and so on. Some of these you can actually monitor.

The largest Icon/Unicon programs for which I have source code include the SSEUS database review/update system (35K lines), and a Knowledge Representation language and system (50K lines) done by an AT&T scientist. It might be possible to find these and monitor them, but it would take work to set them up for monitoring.

lecture 11

Mailbag

How can I set the width and height of the string that I print with DrawString() using the values height and width from the dot output?

Great question. Text labels are going to be important all through this course. Visualizations often botch them: either not enough, or too many to the point they are unreadable.

Unicon fonts have height, ascent, and descent attributes that are independent of what string you are trying to output. Units are pixels.
Some fonts are fixed width, and some are spaced such that wider characters use more pixels.
The four portable font names are "sans", "serif", "mono", and "typewriter"
You set the font with the Font(s) function, or assign it with WAttrib("font="||...).
Typical font strings look like Font("serif,14") or WAttrib("font=sans,11,italic").
DrawString(x,y,s) draws s starting from (x,y).
If you link gpxop you get a procedure CenterString(x,y,s) where s is centered at (x,y).
There are several calculations that might be relevant: determine whether a given font will fit a given (width,height) size, calculate what is the largest font that will fit, or print only as many characters of a string that will fit, if the font is not to get smaller.
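
One of those calculations, finding the largest font that fits a given box, might be sketched as follows. The box size (120x30 pixels), the label text, and the range of font sizes tried are assumptions for illustration.

```unicon
procedure main()
   local w, s, size
   w := open("label demo", "g", "size=300,100") |
      stop("cannot open window")
   s := "preprocess()"
   # try sizes from large to small; stop at the first one that fits
   every size := 48 to 6 by -1 do {
      Font(w, "sans," || size) | next      # skip sizes that are unavailable
      if TextWidth(w, s) <= 120 & WAttrib(w, "fheight") <= 30 then break
   }
   # DrawString's y is the text baseline, so offset by the font ascent
   DrawString(w, 10, 10 + WAttrib(w, "ascent"), s)
   Event(w)      # wait for a key or mouse event before exiting
end
```

If no size fits, the loop simply leaves the smallest font tried in effect; a real tool might truncate the string instead.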

Reading

"Visualizing Software Systems as Cities", by Wettel and Lanza.
Unicon Book Chapters 10-11

Highlights from OGRE [Milne/Rowe 04]

topics related to memory are the most difficult

pointers, dynamic memory allocation, copying, polymorphism... (9/10 of the most difficult topics for novices identified in a previous paper)

[Knight and Munro 2000] "Software World" sounds interesting.

Not assigned as homework/reading. It proposes a city metaphor in which:

each class is a "district" and each function is a building.
The height of the building gives the source code size in lines.
Building exterior color shows visibility (light=public, dark=private).
Outside doors indicate the number and types of parameters.

Object-oriented systems can be harder to understand than traditional imperative code.

So maybe it would be more important for us to figure out how to visualize them.

A conceptual view is needed more than a literal view of memory

At least for novices, sizing each object to its # of bytes is not the main point.

Understanding scopes is important. Each one gets a plane.

Local scopes are mostly extremely numerous and short-lived. We need a metaphor in which these "planes" or sets of variables/objects come in together in a rush, and leave together with a whimper. A lot. We are looking for a metaphor for the stack. Of course, we could depict them as a stack. Pancakes? Waffles?

Note OGRE's target: novice C++ programmers who need to develop a very concrete mental model of how pointers work.

More Unicon highlights from HW#1 code?

      if first:=find("at ", line) & lineNumb==1 then{
         move(first+2)
		# what's the difference between tab() and move() again?
	# extremely common: tab(find(...)), tab(upto(...))

      while move(1) == " " do {
         count:=count+1
      }
	# count +:= 1 ok, but how about count +:= *tab(many(' '))
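To answer the question in the comment: tab(i) sets the scanning position to the absolute position i, while move(n) moves n characters relative to the current position; both return the substring they pass over. A small self-contained sketch (the input line is made up):

```unicon
procedure main()
   local line, count, rest
   line := "    at main.c:12"
   line ? {
      # tab(many(' ')) returns the whole run of leading blanks in one step,
      # so its length is the count; use 0 if there are no blanks at all.
      count := *tab(many(' ')) | 0
      # tab() takes an absolute position, so tab(find(...)) jumps to a match
      tab(find("at "))
      # move() is relative: skip the 3 characters of "at "
      move(3)
      rest := tab(0)          # everything up to the end of the line
      }
   write(count, " leading blanks, rest is ", image(rest))
end
```

This writes: 4 leading blanks, rest is "main.c:12"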

Graphic Design of the Day

CASSE POSTALI DI RISPARMIO ITALIANE by Antonio Gabaglio, via the revered Tufte, and cited in a nice discussion of cyclic data, apparently by Benj Lipchak.

Unicon feature of the day: Packages

Packages were added to Unicon more or less against my will, but they are obviously of growing importance in larger scale development. Packages are about protecting a name space from collisions. Without them, global variables in all modules are shared, and accidentally, these variables may conflict with globals (and undeclared, thought-to-be locals!) in other modules. The more libraries you use, the more inevitable these conflicts. Proof that packages are needed is evident in the Icon Program Library, where, after fundamental built-in functions like "type" were accidentally assigned one too many times by client code, Ralph Griswold got in the habit of protecting "type" or similar built-in functions the hard way, inside each library procedure that uses them:

   static type
   initial type := proc("type", 0)	# protect attractive name

This gets old in a hurry, and it actually bloats code a little bit.

So anyhow, Robert Parlett implemented packages, and I accepted them, and now they are here to stay, and they aren't bad. You do have to know the "package" and "import" keywords, and the ::foo syntax, and that is about it.

lecture 12

Mailbag

I am stuck trying to parse dot output. For example, if I have a string s == " a -> b [pos=\"e,63,108.41 63,143.83 63,136.13 63,126.97 63,118.42\"];" how do I parse it?

Well, obviously we are still learning Unicon and I will take whatever bloody harvest of bytes you manage to deliver me. But if I had to do this homework, I might start with something like:

   s ? {
   tab(upto(&letters))       # discard up to node name
   srcnodename := tab(many(&letters))
   tab(many(' \t'))          # discard whitespace
   if ="->" then {           # we have an edge
      tab(many(' \t'))       # discard whitespace
      dstnodename := tab(many(&letters))
      tab(many(' \t'))       # discard whitespace
      if ="[pos=\"e," then {
         L := []
         while num := tab(upto(', \"')) do {
            put(L, numeric(num))
            if ="\"" then break
            else tab(many(', '))
            }
         }
      else write(&errout, "malformed edge")
      }
   }

I am stuck trying to use DrawCurve(). From reading the book, I understand that the arguments need to be x,y pairs. My issue is when I try to pass DrawCurve() a string or a list as an argument containing all of the x,y pairs. For example a string or list containing [127.7,180.41, 127.7,215.83, 127.7,208.13, 127.7,198.97, 127.7,190.42]. I am assuming the string or list gets treated as just one argument when I do this, is there something else I can do?

DrawCurve() and the other Draw*() functions do not take their parameters in a list or a string. If you have all your arguments in a list L, you can turn them into parameters using the apply operator, as in DrawCurve ! L
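A minimal sketch using the numbers from the question. The integer() conversion is a cautious extra step to get whole pixel coordinates, not something the apply operator requires.

```unicon
procedure main()
   local pts, L
   WOpen("size=300,300") | stop("can't open window")
   pts := [127.7,180.41, 127.7,215.83, 127.7,208.13, 127.7,198.97, 127.7,190.42]
   L := []
   every put(L, integer(!pts))   # truncate each coordinate to a whole pixel
   DrawCurve ! L                 # same as DrawCurve(127, 180, 127, 215, ...)
   Event()                       # wait for input so the window stays up
end
```

If the data arrives as a string instead, parse it into a list first (as in the scanning example earlier in this mailbag) and then apply.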

cflow on Windows

If you dare, check out https://github.com/noahp/cflow-mingw. It is either a nice guy who built cflow on Windows and shared it with the world, or a nefarious ransomware hacker luring victims with offers of cflow binaries. If you don't like trusting his .exe's you can certainly examine the source code and try to follow this github project's build instructions. How I found it: googled cflow.exe. Random github is not a highly reputed official distribution, but at least with source code it is not obviously one of those codehosts of ill repute malware sites, like a fake device driver repository.

dot on Windows

There have been reports of problems running dot on Windows. graphviz.org provides windows executables and dot.exe seems to work OK. Maybe it conspicuously chooses not to add itself to your PATH; adding the directory where it was installed to the Path got it working for one student. In another student's case, instead of open("dot ...", "p") we ended up using open("cmd /C dot ...", "p"). That smells also like a Path issue, but I am not sure.

Monitoring Location Events

E_Line events show line # changes
E_Loc events contain line, column, and a syntax code. Somewhat more frequent than E_Line.
to get line # out of an E_Loc: iand(&eventvalue, 65535)
to get column # out of an E_Loc: ishift(&eventvalue, -16) (actually, maybe a bit trickier)
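A minimal sketch of the two extractions, run on a made-up packed value rather than a real &eventvalue. It assumes the line is in the low 16 bits and the column occupies everything above; if the syntax code shares those upper bits, the column would need an extra mask, which is the "trickier" part.

```unicon
procedure main()
   local v, line, col
   v := ior(ishift(7, 16), 42)   # pack column 7, line 42 (made-up value)
   line := iand(v, 65535)        # low 16 bits
   col := ishift(v, -16)         # bits 16 and up
   write("line ", line, " column ", col)
end
```

This writes: line 42 column 7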

MiniLoc

Visualization Idea:

map line #'s to y coordinate, one pixel per text row.
map column #'s to x coordinate, one pixel per text column
aggregate ALL events at a coordinate to a color. that way, nothing has to move, it just gets "hotter" over time.

The program miniloc.icn is a "miniature location profiler". It is our first example from the tools/ directory mentioned in an earlier lecture. It is 66 lines of code. What is "mini" about miniloc is that each source code line and column is one pixel row and column. This is a scaling problem for large programs or small monitors. Miniloc could be rewritten to scale its graphics. The frequency of location events at various locations is recorded using a log scale through a range of colors from boring to red-hot. Humans don't really perceive red as a larger # than green, but the metaphor of a temperature map is widely recognizable anyhow.
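The log-scale color idea can be sketched as follows. This is not miniloc's actual code: the procedure name heatcolor() and the blue-to-red ramp are assumptions made for illustration.

```unicon
# Map an event count to an "r,g,b" color string on a log scale:
# a count of 0 gives pure blue (cold), count = maxcount gives pure red (hot).
procedure heatcolor(count, maxcount)
   local f, r
   f := log(count + 1) / log(maxcount + 1)   # 0.0 to 1.0; needs maxcount >= 1
   r := integer(65535 * f)
   return r || ",0," || (65535 - r)
end

procedure main()
   local n
   every n := 0 | 9 | 99 | 999 do
      write(n, " -> ", heatcolor(n, 999))
end
```

In a monitor, the result would go to Fg() before a DrawPoint(column, line). The log scale keeps a handful of very hot locations from washing every other location out to the coldest color.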

lecture 13

Reflections on Miniloc

My first thought after briefly running miniloc last time was:

I need bigger dots (for legibility on TV's).

After sleeping on it, additional ideas came calling:

How hard is it to adjust the scale interactively?
How hard is it to tell how many lines/columns this viz. needs to be able to scale to show the whole file?
I wonder how big the dots have to be before I could just be printing text?
I wonder how to scale miniloc to programs with (many) files?

For the small-font legibility question, we might take a look at this font demo.

What is the smallest row you can read?
Below that point, illegible font shapes drawn might still be useful for miniloc's purpose, which is to be a "mini-map" of the code.

my problem last time wasn't just small dots on a TV screen
the copy of miniloc.icn I showed you hadn't been updated when the E_Loc event was enhanced for Ziad Al-Sharif's Ph.D.
I have too many copies of the execution monitoring tools, on too many machines
This is a great argument for placing them under revision control

Bigger Questions

How much more static information will we need, e.g. beyond the function call graph? What types?
Does that "static" information include source code?
How much could we get if we wanted to not depend on source code?

Piano

Visualization Idea:

map line #'s to y coordinate, one pixel per text row.
map column #'s to historical sequence of line numbers, show last (K) events, where K=width of window in pixels
wrap around when you get to K+1'th column (why?)

Hani's Clever Case Tag

Case expressions in Icon use === semantics, looking for an exact match with no type conversions. Case branches are evaluated sequentially as if one were writing

  if x === firstbranchexpr then firstcodebody
  else if x === secondbranchexpr then secondcodebody
  else if x === thirdbranchexpr then thirdcodebody
  ...

If all the branch labels are constants, this is colossally inefficient compared with a C switch statement. But, it is fully general and you can use arbitrary expressions, including generators, for which the entire result sequence will be generated in trying to find a match.

You can add a predicate filter on the front, or have your values supplied from subroutines, or whatever:

   case x of {
   p() & q() & foo: { ... }
   a | b | 1 to 10 | f(): { ... }
   }

This generator capability can be used with cset event masks, as in the following; it would also work with sets, table keys, or any other generator you wanted to write.

case x of {
   ...
   !ProcMask: {
      }
   ...
   }

This makes for short, elegant code, but it is inefficient. Generating the individual elements out of a cset costs a type conversion (cset to string), which isn't cheap, and all generators pay for extra bookkeeping on the stack to support suspending and resuming, which is slow at times. You are paying for convenience and generality, and a good optimizing compiler might make some of that go away, but the VM sure does not. In a couple minutes we will see another measure of how much you pay. But in the meantime...

Hani Bani Salameh showed me some code once that looked like:

case x of {
   ...
   member(a_set, x): {
      }
   ...
   }

member(a_set, x) tests whether x is a member and returns x if it is, so it is just a filter, and by the way it avoids a linear search via a generator, so it is fast. It's got a seemingly redundant comparison of x===x after the member() test succeeds, but that is C code and probably very fast compared with a case with a lot of alternation | or generate ! operators in it.
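A small self-contained sketch of the same trick, with a made-up set of vowels standing in for the event codes:

```unicon
procedure main()
   local vowels, x
   vowels := set(["a", "e", "i", "o", "u"])
   every x := !"unicon" do
      case x of {
         # member() returns x on success, so the case test is x === x;
         # on failure the clause produces no value and the next one is tried.
         member(vowels, x): write(x, " is a vowel")
         default:           write(x, " is a consonant")
         }
end
```

This reports u, i, and o as vowels and n, c, n as consonants. In a monitor the set would hold event codes, built once before the event loop starts.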

lecture 14

Reading

An Overview of 3D Software Visualization, by AR Teyseyre and MR Campo, IEEE Trans. Vis Comput. Graph 15(1):87-105, 2009
Unicon Book chapter 18

Highlights from [Wettel and Lanza]

CodeCity
classes and interfaces are buildings
building height == # of methods
width and length proportional to # of attributes (all square)
position, hue, saturation, and transparency are all available to depict more information
land topography/elevation depicts package structure
Visualized on real, large systems, e.g. 8,000 classes
Layout: largest first, splitting rectangles into pieces, treemap algorithm
Studied software evolution in a repository

Monitoring Procedure Activity

Procedure activity is a subset of the control flow behavior of expression evaluation.
It is related to but far coarser-grained than monitoring line#/col#
In a normal language, monitoring procedure activity would mean monitoring the stack of procedure activation records.
In a multi-threaded context, there would be a set of stacks of procedure activation records.
Just looking at the stack, one might note how deep the stack gets (not a problem unless it overflows) and look for patterns that suggest bugs
Can anyone think of a call-return sequence that suggests a bug?
Besides correctness, you might imagine looking for performance problems or tuning opportunities.

Monitoring Icon and Unicon is a little more complicated because procedures can suspend and be resumed. The events for this behavior are given below. The include file evdefs.icn defines an event mask named ProcMask that will select all six of these events.

Event	Value	Description
`E_Pcall`	procedure called	Procedure call
`E_Psusp`	value produced	Procedure suspended to caller
`E_Presum`	procedure resumed	Resume a previous suspension
`E_Pret`	value produced	Procedure returned to caller
`E_Pfail`	failing procedure	Procedure failed
`E_Prem`	removed procedure	Procedure removed
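A minimal sketch of a monitor that asks for exactly these six events and only counts them. It uses the same EvInit() setup as the tools in this course; the tool name procount used below is made up.

```unicon
$include "evdefs.icn"
link evinit

procedure main(av)
   local counts, k
   EvInit(av) | stop("can't monitor")
   counts := table(0)
   while EvGet(ProcMask) do        # one report per procedure activity event
      counts[&eventcode] +:= 1
   every k := key(counts) do       # event codes print as one-character strings
      write(image(k), ": ", counts[k])
end
```

It would be run like the other monitors, for example procount ./scramble <scramble.dat. To print names instead of raw codes, compare each key against E_Pcall, E_Psusp, and so on.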

In the presence of suspend/resume, the "call stack" becomes a "call tree", a.k.a. an activation tree (a better term since procedures can be activated by more than just calls).

You can just ask for all the procedure activity events, but if your monitor is doing more than just counting them then it potentially will need to do more. One way to monitor the activation tree is to build a model of the tree itself.

A library procedure named evaltree() does this for you.
Use link evaltree to link this module into your program.
Study the implementation of evaltree.icn in detail.

We will look at examples that use evaltree, but first a word on timing.

The time cost of monitoring

Monitoring costs time.
If it costs too much, folks won't want to do it even if your successful program visualizations do make pretty moving pictures.
The instrumentation of all events costs time even if you don't ask for the event reports
when you do ask for an event, the event report (co-expression switch) costs a lot more time.
It is difficult to even measure the timings of different parts of the monitoring process.
You may be able to do a good job by going into the VM C code and using your own expertise, or using specialty tools for doing timing, such as gprof.
This discussion is just based on casual observation.

Example. In the suspects/ directory are many candidates (which one runs the longest?). We will consider the poetry scrambler for this example.

time ./scramble <scramble.dat

uses the UNIX time(1) command to measure the runtime externally. It reports something like:

Sun Sparc, ~9/2007	Threadripper, 2/2019
1.0u 0.0s 0:03 32% 0+0k 0+0io 0pf+0w	0.019u 0.025s 0:00.15 20.0% 0+0k 0+0io 2pf+0w

Over a decade ago, that program took 1.0 seconds of user time, 0.0 seconds of system time, 3 seconds of wall-clock observed time. Out of curiosity, since it writes out a lot to standard out, I re-timed it directing output to /dev/null, and it still took a second of user time, but the wall clock is down to 1 second.

Now I take an almost-empty monitor, timer.icn, and time it using the UNIX utility.

time timer ./scramble <scramble.dat

and it writes out

Sun Sparc, ~9/2007	Threadripper, 2/2019
tp time: 1830 - 0 = 1830 em time: 0 - 0 = 0 1.0u 0.0s 0:03 30% 0+0k 0+0io 0pf+0w	tp time: 35 - -5 = 40 em time: 5 - 5 = 0 0.025u 0.018s 0:00.15 20.0% 0+0k 0+0io 2pf+0w

Given that timer.icn is the "empty monitor", what do these numbers tell us?

The time command doesn't see any appreciable extra time spent due to the act of monitoring (in 1993 this would have reported ~50% slowdown for the privilege of having the instrumentation present)
The "monitor" thread thinks it has spent no time at all (almost true)
2007 Sparc monitoring facilities reported the program is spending 1.8 seconds, versus system time's 1.0u CPU time, or 3sec wall clock.
System time program in 2019 reports same overall time (0.15sec) but fluctuates as to whether user or system time is more to blame... [clock tick fluctuation?]
Are threadrippers more like 6x faster than Sparcs, or is it more like 40x?
CPU (user+system) is pretty routinely a small fraction of wall clock time
Which timers are more accurate/relevant?
Is there time spent that is unaccounted for?
Timing facilities on PCs have resolution/accuracy limits and you will have to run something for a long time in order to get any useful accuracy.

Time measurement accuracy is limited by tools of observation and hardware/OS limitations. Another problem with measurement is that external environmental considerations (load average, user activity) change results to some extent. The 2007 measurements were done long ago on mars.cs.uidaho.edu, a sparc Solaris machine. The "who" command reported 5 different people logged in at the time, although the load average was apparently low (inactive terminal sessions). The 2019 Threadripper numbers were for the machine in my office running Fedora. Lots of processes, only 1 user.

lecture 15

No Class on Monday

Monday is President's Day.

Mailbag

How do I draw arrowheads?

The arrow is to be drawn from the last point to the point given with the "e,x,y" at the beginning of the pos attribute. Possible implementations:

no arrow head, just a line from one to the other
compute midpoint, draw a circle there
calculate points on each side of the last point on the line segment orthogonal to the last line segment, form triangle between them and the point listed in the "e,x,y"
Draw using decreasing line widths along the final line segment
... what did you-all end up doing?
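The triangle option above can be sketched as follows. The procedure name arrowhead() and the half-width parameter w are made up for illustration, and the coordinates in main() are rounded values from the earlier dot example.

```unicon
# Return [x2,y2, bx1,by1, bx2,by2]: a triangle whose tip is (x2,y2) and whose
# base is centered on (x1,y1), orthogonal to the segment, with half-width w.
procedure arrowhead(x1, y1, x2, y2, w)
   local dx, dy, len, px, py
   dx := x2 - x1
   dy := y2 - y1
   len := sqrt(dx ^ 2 + dy ^ 2)
   if len = 0 then fail             # degenerate segment: no direction
   px := -dy / len * w              # offset perpendicular to the segment
   py := dx / len * w
   return [x2, y2, integer(x1 + px), integer(y1 + py),
           integer(x1 - px), integer(y1 - py)]
end

procedure main()
   WOpen("size=200,200") | stop("can't open window")
   DrawLine(63, 143, 63, 118)                     # end of the edge
   FillPolygon ! arrowhead(63, 118, 63, 108, 4)   # last point to "e,x,y" point
   Event()
end
```

Here (x1,y1) is the last point of the curve and (x2,y2) is the point given by "e,x,y". Remember that dot's y axis points up while window coordinates point down, so y values usually need flipping before any of this.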

Upcoming Conferences

Some of you should consider doing a semester project worthy of a research paper. Some of you might even want to target one of these venues.

WSCG 2019, papers due March 10, conference in Pilsen Cz 5/27-31
CGI 2019, papers due March 25, conference in Calgary 6/17-20, (notification Apr 21)
VISSOFT 2019, papers due Apr 26, conference in Cleveland 9/30-10/1. (notification May 26)

A Brief on Windows Unicon

I did some debugging recently, related to spaces in pathnames, and unbreaking an event monitoring bug that was Windows-specific.
I have put up a .zip with my current .exe's and libraries at http://www2.cs.uidaho.edu/~jeffery/unicon.zip. It might or might not be usable, or less broken than the public Windows Unicon, at the moment.
I will work on testing it, and on a proper Windows installer, at my earliest convenience.
I expect to put out some more improvements, related to drawing text in the 3D facilities, in time for you to use them in homeworks

HW#3

Timing, Part 2

Last time we saw that the timing under a monitor that isn't asking for any events costs very little.
I think the times I reported were on cs-445 instead of the threadripper in my office. Mea culpa. Threadripper is about 4x faster.
Now let's look at how much it costs to monitor every single location change, a very frequent event. Check out timerloc.icn
Remember: every event report, you are doing a context switch to a different program, and back! This timer is mostly measuring that context switch time.

time ../tools/timerloc ./scramble < scramble.dat > /dev/null
tp time: 366 - -6 = 372ms
em time: 394 - 6 = 388ms
0.490u 0.881s 0:01.46 93.8%	0+0k 0+0io 2pf+0w

Wow! Is that a factor of 100x? BTW, a pthreads context switch, where the OS gets involved because you want to support true multicore or whatever, costs maybe easily another 100x.

You don't want E_Loc events unless you really want them.
We will see for other types of events, different costs proportional to event frequency
The most frequent event of all is probably E_Opcode, E_Loc pales in comparison.
The high cost of event reports is an open area of research. To work on it we'd go inside the VM runtime system (C code; outside the scope of this class).
Suppose you only want E_Loc for certain locations (e.g. breakpoints), or E_Opcode only for certain instructions?
Besides event masks, there are also value masks.
- installable, one per event code, thus far rarely used

Now, I wonder how much evaltree costs? A past student once claimed it was "slow". I wonder why that would be...

procedure activity events are frequent.
- Not as frequent as line number changes...but then again there can be several/many calls on a line of code...
- far more so if you ask for built-ins as well as user-level
each procedure activity event report costs two co-expression switches
evaltree itself uses time building and maintaining the tree
the evaltree callback procedures (monitor application code) use time

It would be useful to know whether the co-expression switch totally dominates the time spent in the monitor. Although our intuition says it does, intuition is not always correct. Evaltree costs: a big case statement (not very efficient in Icon/Unicon), whose labels are generators (not very efficient), whose code bodies do allocations and list operations (pretty darned fast), and call the monitor callback procedure. One way to do our experiment is to measure &time before and after each EvGet(), and instead of measuring time spent in the target program, measure the other time, the time spent in the monitor. Another way to do the experiment is to rewrite the evaltree() functionality for speed instead of clarity, and see if it is measurably different or not.

Compare evaltime.icn, evaltime2.icn, evaltime3.icn, showing an attempt to do this experiment.

time evaltime ./scramble <scramble.dat

shows

Sun Sparc, ~9/2007	Threadripper, 2/2019
tp time: 2760--10=2770 em time: 6670-0=6670 10.0u 0.0s 0:18 55% 0+0k 0+0io 0pf+0w	tp time: 56--7=63 em time: 207-7=200 0.212u 0.094s 0:00.30 100.0% 0+0k 0+0io 0pf+0w

Using evaltree, the monitor is accounting for the vast majority of the time, and the time reported for the target program is much slower than for the unmonitored or empty monitored cases. evaltime2, which skips the evaltree mechanism but uses a big case statement, gives:

Sun Sparc, ~9/2007	Threadripper, 2/2019
tp time: 2490-0=2490 em time: 2660-0=2660 5.0u 0.0s 0:08 61% 0+0k 0+0io 0pf+0w	tp time: 55--7=62 em time: 90-7=83 0.113u 0.085s 0:00.19 100.0% 0+0k 0+0io 0pf+0w

Cost of monitoring is substantially lower, although the particular details may be affected by machine load fluctuation. One would have to run several times and take averages for the numbers to be meaningful. Using evaltime3, which avoids the large case statement, we get

Sun Sparc, ~9/2007	Threadripper, 2/2019
tp time: 2580-0=2580 em time: 2050-0=2050 5.0u 0.0s 0:07 70% 0+0k 0+0io 0pf+0w	tp time: 60--8=68 em time: 76-8=68 0.088u 0.103s 0:00.19 94.7% 0+0k 0+0io 0pf+0w

At this point, monitoring procedure activity is seen to impact execution time substantially, but at least the monitor is taking no more time than the target program.

Many Morals of the story:

the UNIX time(1) command is not very fine-grained or precise.
The monitoring of &time gives times in milliseconds, which might or might not be reliable; they report what the C millisec() function returns.
The monitoring facilities attempt to explicitly separate the &time reported by the TP from that of the EM. My best guess is that this is imperfect, and TP is being charged for part of the co-expression time.
The coding of the EM has a (surprisingly?) large impact on the practicality of the EM. Mastering the language and coding elegantly actually matters for EM authors.
Co-expression switch time may dominate but not totally dominate timings. Griswold was fond of saying that on at least one old CPU where it was measured, the co-expression switch cost less than a procedure call in Icon. This is probably not true for us, but co-expression costs are not the only factor in performance and not always the primary factor.
The evaltree.icn module might be rewritable for much better speed. It begs to have Hani's clever case tag applied to it, and timings measured.
Icon and Unicon VM compilers need a decent case expression optimization. iconc might already do one, I am not sure.

scat

The scat program is a simple application of evaltree. You kind of have to see this one running to appreciate it, so let's try and demo it. It links in a scatterplot library which might or might not be useful to you; scatlib implements the log scaling that scat uses.

$include "evdefs.icn"
link evinit
link evaltree
link scatlib

Scat uses several global variables, three tables to remember what has been plotted, and three clones set with different colors.

global	at,   # table: sets of procedures at various locations
	call, # table: call counts
	rslt, # table: result counts
        red,
        green,
        black

Scat uses a generic evaltree-compatible record type for modeling; no extra payload added.

record activation (node, parent, children)

The initialization is straightforward.

procedure main(av)
   local mask, current_proc, L, max, i, k, child, e

   EvInit(av) | stop("can't monitor")

   scat_init()
   red := Clone(&window, "fg=red")
   green := Clone(&window, "fg=green")
   black := Clone(&window, "fg=black")

   current_proc := activation(,activation(,,,,[]),[])

Control is handed over to evaltree, which calls scat_callback with events

   evaltree(ProcMask ++ FncMask ++ E_MXevent,
	    scat_callback, activation)

   WAttrib("label=scat (finished)")
   EvTerm(&window)
end

scat_callback mostly calls scat_plot, which calls colorfor to decide what color to plot with.

procedure scat_callback(new, old)
   case &eventcode of {
      E_Pcall:
	 scat_plot(new.node, 1, 0, , colorfor)
      E_Psusp | E_Pret:
	 scat_plot(old.node, 0, 1, , colorfor)
      E_Fcall:
	 scat_plot(new.node, 1, 0, , colorfor)
      E_Fsusp | E_Fret:
	 scat_plot(old.node, 0, 1, , colorfor)
      E_MXevent: {
         case &eventvalue of {
	    "q" | "\033": stop("terminated")
	    &lpress : {
	       repeat {
	          scat_click(proced_name)
		  if Event() === &lrelease then
		     break
		  }
	       }
	    }
	 }
      }
end

Procedure proced_name returns the name of a procedure, taken from its image.

procedure proced_name(p)
   return image(p) ? {
      [ =("procedure "|"function "), tab(0) ]
      }
  stop(image(p), " is not a procedure")
end

Procedure colorofone distinguishes procedures from functions.

procedure colorofone(p)
  return if match("procedure ", image(p))
	 then red else green
end

Procedure colorfor takes a list of procedures/functions and selects the color to plot with: red if every element is a procedure, green if every element is a function, and black if the list mixes the two. An empty list gets the window's default color.

procedure colorfor(L)
   local x, c
   if *L = 0 then return &window
   every x := !L do {
      # the first element sets c; any later element of the other kind means black
      if not (/c := colorofone(x)) then
	 if colorofone(x) ~=== c then
	    return black
      }
   return c
end

What is scat good for?

scat is cooler than you think. It shows not just who the hot procedures are, it also shows what procedures always fail, what procedures generate lots of results per call, and what procedures (predicates) generate between 0 and 1 result per call.

lecture 16

Office Hours Pushback

My office hours today will start at 3pm due to my boss requesting the half hour from 2:30-3.

More Class Cancellations

I am going to ACM SIGCSE in Minneapolis February 26-March 3. We will miss a Wednesday and a Friday class that week, sorry! I will be reachable by e-mail and will try to accommodate office appointment requests via Zoom.

HW#3 Discussion

As a reminder, if you did HW#1 for C, you get to re-tool a bit for HW#3 and beyond, as we will be monitoring Unicon programs. If any of you are interested in working on building/updating my monitoring framework for C/C++ or Java after the semester, please come visit.
I tweaked one of your HW#1 solutions a bit and ran it on all the suspects, so you should have no shortage of json files to play with. Try out your HW#3 on all of them, and look for interesting images/animations.
I recommend you do as much HW#3 between now and Monday as possible, so that if you need to consult me in person, you can do it before Minneapolis.

algae

The flagship demonstration of the evaltree framework is a fairly literal visualization of the activation tree.

   EvInit(av) | stop("Can't EvInit ",av[1])
   codes := algae_init(algaeoptions)
   evaltree(codes, algae_callback, algae_activation)
   WAttrib("windowlabel=Algae: finished")
   EvTerm(&window)

Algae takes command line options to say how much to monitor, how to graphically depict the tree, etc. It deliberately chooses a simple-minded incremental graphic, coming from a time when graphic performance was deemed to be a likely monitor bottleneck. By default it uses hexagons for activation records (compare hexagons with a square grid). A real but still INCREMENTAL tree layout algorithm would be better.

procedure algae_init(algaeoptions)
   local t, position, geo, codes, i, cb, coord, e, s, x, y, m, row, column
   t := options(algaeoptions,
	   winoptions() || "P:-S+-geo:-square!-func!-scan!-op!-noproc!-step!")
   /t["L"] := "Algae"
   /t["B"] := "cyan"
   scale := \t["S"] | 12
   delete(t, "S")
   if \t["square"] then {
      spot := square_spot
      mouse := square_mouse
      }
   else {
      scale /:= 4
      spot := hex_spot
      mouse := hex_mouse
      }
   codes := cset(E_MXevent)
   if /t["noproc"] then codes ++:= ProcMask
   if \t["scan"]   then codes ++:= ScanMask
   if \t["func"]   then codes ++:= FncMask
   if \t["op"]     then codes ++:= OperMask
   if \t["step"]   then step := 1
   hotspots := table()
   &window := Visualization := optwindow(t) | stop("no window")
   numrows := (WHeight() / (scale * 4))
   numcols := (WWidth() / (scale * 4))
   wHexOutline := Color("white") # used by the hexagon library
   if /t["square"] then starthex(Color("black"))
   return codes
end

The real work happens in algae_callback()

procedure algae_callback(new, old)
   local coord, e
   initial {
      old.row := old.parent.row := 0; old.column := old.parent.column := 1
      }
   case &eventcode of {
      !CallCodes: {
	 new.column := (old.children[-2].column + 1 | computeCol(old)) | stop("eh?")
	 new.row := old.row + 1
	 new.color := Color(&eventcode)
	 spot(\old.color, old.row, old.column)
	 }
      !ReturnCodes |
      !FailCodes: spot(Color("light blue"), old.row, old.column)
      !SuspendCodes |
      !ResumeCodes: spot(old.color, old.row, old.column)
      !RemoveCodes: {
	 spot(Color("black"), old.row, old.column)
	 WFlush(Color("black"))
	 delay(100)
	 spot(Color("light blue"), old.row, old.column)
	 }
      E_MXevent: do1event(&eventvalue, new)
      }
   spot(Color("yellow"), new.row, new.column)
   coord := location(new.column, new.row)
   if \step | (\breadthbound <= new.column) | (\depthbound <= new.row) |
      \ hotspots[coord] then {
      step := &null
      WAttrib("windowlabel=Algae stopped: (s)tep (c)ont ( )clear ")
      while e := Event() do
	 if do1event(e, new) then break
      WAttrib("windowlabel=Algae")
      if \ hotspots[coord] then spot(Color("light blue"), new.row, new.column)
      }
end

Boring square graphics:

procedure square_spot(w, row, column)
   FillRectangle(w, (column - 1) * scale, (row - 1) * scale, scale, scale)
end

# encode a location value (base 1) for a given x and y pixel
procedure square_mouse(y, x)
   return location(x / scale + 1, y / scale + 1)
end

A whole new meaning for the term "graphical breakpoints":

#
# setspot() sets a breakpoint at location loc and marks it orange
#
procedure setspot(loc)
   hotspots[loc] := loc
   y := vertical(loc)
   x := horizontal(loc)
   spot(Color("orange"), y, x)
end

#
# clearspot() removes a "breakpoint" at location loc
#
procedure clearspot(loc)
   # The parameter is named loc, not spot: a parameter named spot would
   # hide the spot() drawing procedure that is called below.
   hotspots[loc] := &null
   y := vertical(loc)
   x := horizontal(loc)
   spot(Visualization, y, x)
end

User input handling:

#
# do1event() processes a single user input event.
#
procedure do1event(e, new)
   local m, xbound, ybound, row, column, x, y, s
   case e of {
      "q" |
      "\e": stop("Program execution terminated by user request")
      "s": { # execute a single step
	 step := 1
	 return
	 }
      "C": { # clear a single break point
	 clearspot(location(new.column, new.row))
	 return
	 }
      " ": { # space character: clear all break points
	 if \depthbound then {
	    every y := 1 to numcols do {
	       if not who_is_at(depthbound, y, new) then
		  spot(Visualization, depthbound, y)
	       }
	    }
	 if \breadthbound then {
	    every x := 1 to numrows do {
	       if not who_is_at(x, breadthbound, new) then
		  spot(Visualization, x, breadthbound)
	       }
	    }
	 every s := \!hotspots do {
	    x := horizontal(s)
	    y := vertical(s)
	    spot(Visualization, y, x)
	    }
	 hotspots := table()
	 depthbound := breadthbound := &null
	 return
	 }
      &mpress | &mdrag: { # middle button: set bound box break lines
	 if m := mouse(&y, &x) then {
	    row := vertical(m)
	    column := horizontal(m)
	    if \depthbound then {       # erase previous bounding box, if any
	       every spot(Visualization, depthbound, 1 to breadthbound)
	       every spot(Visualization, 1 to depthbound, breadthbound)
	       }
	    depthbound := row
	    breadthbound := column
	    #
	    # draw new bounding box
	    #
	    every x := 1 to breadthbound do {
	       if not who_is_at(depthbound, x, new) then
		  spot(Color("orange"), depthbound, x)
	       }
	    every y := 1 to depthbound - 1 do {
	       if not who_is_at(y, breadthbound, new) then
		  spot(Color("orange"), y, breadthbound)
	       }
	    }
	 }
      &lpress | &ldrag: { # left button: toggle single cell breakpoint
	 if m := mouse(&y, &x) then {
	    xbound := horizontal(m)
	    ybound := vertical(m)
	    if hotspots[m] === m then
	       clearspot(m)
	    else
	       setspot(m)
	    }
	 }
      &rpress | &rdrag: { # right button: report node at mouse loc.
	 if m := mouse(&y, &x) then {
	    column := horizontal(m)
	    row := vertical(m)
	    if p := who_is_at(row, column, new) then
	       WAttrib("windowlabel=Algae " || image(p.node))
	    }
	 }
      }
end

Calculating which activation a given click refers to:

#
# who_is_at() - find the activation tree node at a given (row, column) location
#
procedure who_is_at(row, col, node)
   while node.row > 1 & \node.parent do
      node := node.parent
   return sub_who(row, col, node)		# search children
end

#
# sub_who() - recursive search for the tree node at (row, column)
#
procedure sub_who(row, column, p)
   local k
   if p.column === column & p.row === row then return p
   else {
      every k := !p.children do
	 if q := sub_who(row, column, k) then return q
      }
end

A similar calculation for placing new nodes:

#
# computeCol() - determine the correct column for a new child of a node.
#
procedure computeCol(parent)
   local col, x, node
   node := parent
   while \node.row > 1 do	# find root
      node := \node.parent
   if node === parent then return parent.column
   if col := subcompute(node, parent.row + 1) then {
      return max(col, parent.column)
      }
   else return parent.column
end

#
# subcompute() - recursive search for the rightmost tree node at depth row
#
procedure subcompute(node, row)
   # check this level for correct depth
   if \node.row = row then return node.column + 1
   # search children from right to left
   return subcompute(node.children[*node.children to 1 by -1], row)
end

How to use Clone()

#
# Color(s) - return a binding of &window with foreground color s;
#  allocate at most one binding per color.
#
procedure Color(s)
  static t, magenta
  initial {
     magenta := Clone(&window, "fg=magenta") | stop("no magenta")
     t := table()
     /t[E_Fcall] := Clone(&window, "fg=red") | stop("no red")
     /t[E_Ocall] := Clone(&window, "fg=chocolate") | stop("no chocolate")
     /t[E_Snew] :=  Clone(&window, "fg=purple") | stop("no purple")
     }
  if *s > 1 then
     / t[s] := Clone(&window, "fg=" || s) | stop("no ",image(s))
  else
     / t[s] := magenta
  return t[s]
end

Graphic Design(s) of the Day

Consider Tukey's Multiwindow- and Box-Plots on the left, and Tufte's Data-ink maximization on the right.

lecture 17

HW#2 Feedback

vague HW spec? That is on Dr. J
bugs in Unicon? That is on Dr. J
Still, to give your program proper credit I want to see it work on a variety of inputs.
I tested with: hirose.json, poem.json, and typeinfer.json. More broadly I might want to run it on any suspects/*.json
Found a bug thanks to typeinfer.json: .dot format has reserved words that we have to avoid! Like "node" for example. What all should we mangle in translating .json to .dot format?
Found a bug with open(cmd, "p"): when "dot.exe" was on the path in a directory with spaces in its name, things did not go well. Fixed.
I will accept one fixed/improved resubmit of HW#2 for regrading purposes.

Windows Users' Notes

Windows binary distribution for this class will live at http://www2.cs.uidaho.edu/~jeffery/setup-unicon-13.1.exe
I built a new one with the above fixes, and posted it at 1:20pm today before class.
If you try it, please report successes/failures/quirks.

Reading

Discussion of "Overview of 3D Software Visualization"

What was your favorite visualization idea from this paper?
Define the three types of cognitive processes that visualization supports:
- Exploratory/Discovery
- Analytical/Decision-making
- Descriptive/Explanation
What is the difference between scientific visualization and information visualization?

What are the main arguments as to whether to use 2D vs. 3D?

	2D	3D
Pro	fast; low CPU; cheap; ubiquitous; trained from infancy	density; composition; perception; familiarity
Con	scalability challenges; ??	compute intensive; hard to program; hard to do 3DUI; occlusion

I dismiss "Augmented 2D" views out of hand. What about Adapted 2D views, are they useful? What about combined 2D+3D presentations?
Space metaphor: HUGE potential. See Figure 4.
Which was your favorite 3D layout? Which ones pose questions?
Which was your favorite 3D tree layout? Which ones pose questions?
What did you think of the circular/spiral depictions of call stacks?

GUI Monitors

Unicon has a GUI class library, written by Robert Parlett
big, feature rich, inspired-by-Java
GUI programming in Unicon has a learning curve comparable to GUI programming in other languages.
Homework #4 probably will include adding a GUI to your work

Step #1 in GUI exploration is usually to get familiar with the interface builder program; in our case that is IVIB. (Demo of IVIB goes here). IVIB generates code that looks like this.

The 70-line application creates a dialog and calls show_modal()
For a normal GUI app you then fill in the method bodies for whatever events you've requested.
For most applications, it is not necessary to understand much of the scaffolding and large classes you inherit behavior from.
Unicon Technical Report UTR#6 teaches the IVIB basics.

IVIB lets you draw a GUI and generates the code for you. For a program execution monitor the main question will be: how to merge the event streams, or how to merge the event processing loops, from the GUI and from the monitored program's events. To accomplish this, you need to know more about the underlying GUI classes.

There are 3 classes that most Unicon GUI programmers need to become semi-comfortable with:

Component: Component is the superclass of all basic visible GUI elements in an application: buttons, sliders, lists, editable text boxes, and so on. Components are generally organized hierarchically -- they form a tree in Venn diagram style, with larger background components containing smaller more active components.
Dialog: A Dialog is a component that constitutes the root of some window -- it owns a window and therefore can receive input events, which it then needs to route down the tree to the correct leaf.
Dispatcher: The Dispatcher class handles the actual event-processing loop, allowing for multiple dialogs, and wall-clock time events in addition to GUI events.

In order to merge the Monitor and GUI event streams, we might do one of the following:

keep the monitor event loop primary, and poll for GUI events (!)
keep the GUI event loop primary, and periodically read monitor events (!)

There is no way to select() from between GUI and monitor or poll both, because to ask for an EvGet() is to transfer control to the target program (freezing the GUI of the monitor until an event occurs). However, you can call EvGet() with an E_Tick along with your other events if you want to be sure to regain control periodically even if the other monitored events do not occur for long periods... then your only danger is: what if the target program that you are monitoring chooses to block on some input it wants to read?
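
The first option (monitor loop primary, polling for user input) might be sketched as follows. This is a minimal sketch, not a finished monitor: it uses the plain Event()/Pending() window functions rather than the GUI class library's Dispatcher, and handle_input() and handle_event() are hypothetical placeholder procedures. It assumes E_Tick is delivered often enough for the window to feel responsive, and it does nothing about the danger noted above of a target program that blocks on input.

```unicon
$include "evdefs.icn"
link evinit

procedure main(av)
   local mask
   EvInit(av) | stop("cannot load target program")
   &window := open("monitor", "g", "size=300,100") | stop("no window")
   # ask for procedure events, plus clock ticks so control comes back
   # to the monitor even when no procedure events occur for a while
   mask := ProcMask ++ cset(E_Tick)
   while EvGet(mask) do {
      if &eventcode === E_Tick then {
         # poll: drain whatever user input has queued up, without blocking
         while *Pending() > 0 do
            handle_input(Event())
         }
      else {
         handle_event(&eventcode, &eventvalue)
         }
      }
end

# hypothetical placeholder: react to one user input event
procedure handle_input(e)
   if e === ("q" | "\e") then stop("monitor terminated by user")
end

# hypothetical placeholder: react to one monitored event
procedure handle_event(code, value)
   WAttrib("windowlabel=last event: " || image(code))
end
```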

Additional notes on GUI-monitors:

"piano.icn" had been doing its own input event processing, with E_MXevent at the top level monitor loop and nested loops calling Event() whenever a "breakpoint" was in place.
can't call Event() cavalierly on your own in the middle of your app -- or GUI won't respond any more. GUI owns input processing, and calls you when a component gets an event.
how does one "pause" or "single step" in a GUI environment? GUIs are not allowed to freeze. You cannot call EvGet(E_MXevent) to freeze the program; while paused, do not call EvGet() at all.

lecture 18

Monitoring Memory Allocation and Garbage Collection

Heap memory allocation is an important form of behavior that we can monitor.
Allocations in Icon/Unicon are "as cheap as possible", but in many programs they play a major role in performance
Sometimes code does lots of allocation by accident, or does far more memory allocation than is needed for a problem.
Garbage collection is usually pretty fast -- we don't usually go for coffee when the GC message hits the console, like old Lispers -- but...
if a program is garbage collecting continually (thrashing) it can significantly impact performance. This would generally be because a program is allocating excessively.
How can we tell whether allocation is excessive, or garbage collection seems too frequent?

Allocation and Collection Events

one memory allocation event for each built-in type.
&eventcode gives the type of memory allocated
&eventvalue gives the number of bytes.
Garbage collection is an E_Collect. If, in response to an E_Collect, the next EvGet() includes a request for an E_EndCollect, it can also request "reallocation" events: allocation events that list the types and sizes of memory found to be live after the collection.
It is fair to say that the heap situation is a little more complex now than when these events were engineered: these events don't account for multiple heaps, a.k.a. regions. Region events are a missing piece.
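
One way to start answering the "is allocation excessive?" question is simply to total the bytes reported per allocation event code and count the collections. The sketch below does that. It assumes that evdefs.icn defines a mask named AllocMask covering the allocation events, and that every event in that mask carries a byte count in &eventvalue; check both against your Unicon sources before relying on it.

```unicon
$include "evdefs.icn"
link evinit

procedure main(av)
   local bytes, collections, code
   EvInit(av) | stop("cannot load target program")
   bytes := table(0)          # event code -> total bytes allocated
   collections := 0
   # AllocMask is assumed to be the evdefs.icn mask of allocation events
   while EvGet(AllocMask ++ cset(E_Collect)) do {
      if &eventcode === E_Collect then
         collections +:= 1
      else
         bytes[&eventcode] +:= &eventvalue
      }
   every code := key(bytes) do
      write(image(code), ": ", bytes[code], " bytes")
   write(collections, " garbage collections")
end
```

Dividing total bytes by the number of collections gives a rough feel for how quickly the program fills its regions.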

Mempie

See mempie.icn

More memory monitors: mini-memmon and nova

Check out mmm, nova and oldnova. You should look at them as unfinished prototypes.

Griswold's claim examined

Ralph Griswold liked to claim that co-expression activations were about the same speed as procedure calls in Icon... and this matters a lot for execution monitors based on co-expressions, so I re-examined this claim with the following program:

procedure main()
   t1 := &time
   every i := 1 to 10000000 do p()
   write("10000000 calls: ", &time - t1)
   ce := create |1
   t2 := &time
   every i := 1 to 10000000 do @ce
   write("10000000 @: ", &time - t2)
end

procedure p()
   return 1
end

The results (on Linux x86_64) seem to suggest that co-expression activations are quite cheap: in this run, about 28% slower than procedure calls (7920 ms vs. 6210 ms).

10000000 calls: 6210
10000000 @: 7920

Synchronous threads are a lot cheaper than true concurrent threads! Playing with a mac implementation earlier this semester, I plugged in a pthreads-based co-expression switch available from the current Icon language implementation, and it was an order of magnitude slower...

lecture 19

Discussion of Last Week's Reading

Communicating Software Architecture using a Unified Single-View Visualization

Just as a reminder for this metaphor:

building==function, texture/color=> LOC
"city"==blue plate
"pillar"==class definition
"water tower" sphere==header file
green "landscape" == directory

big shots

Tell me what you know about LLNL. They might need their visualizations to work on the hardest real-world (very large, complex, C/C++) programs

"single view"

their argument for the city metaphor is to visualize multiple aspects about a program, for multiple stakeholders with differing roles and concerns, so that they will all be able to see the same thing and communicate effectively with each other over the shared artifact.

"static and dynamic"

they recognize the need for information based on program runs, not just code. Dynamic info consists of whatever gprof will tell them. Static info includes standard software engineering metrics: lines of code, cyclomatic complexity, and various safety static analysis checker outputs. They do not do, but anticipate the value of, incorporating repository log information used in others' city metaphor visualization research.

"source level" vs. "middle level" vs. "architectural level"

"source level" == fine grained; debuggers, profilers etc.
"middle level" == problem-specific, tailored visualizations.
"architectural level" == summarized, abstracted views of whole system

multi-graph mindset

function call graph sure, but instead of visualizing one big multi-purpose graph they see it as a "union of graphs": class call graph, class contents graph, class inherits graph, file call graph, file contents graph, directory contents graph...

"Representing Development History in Software Cities"

requirements

support orientation and spatial memory
multiple levels of detail
different development stages; uniform, consistent representation
robust against small changes
integrate many different data
scale to millions of lines of code

Evo-Streets

If you are going to do cities and maps, adopt techniques of cartographers

Primary, secondary, and tertiary models

Primary == original collected data. Secondary == all aspects of primary that might ever be drawn together on a map. Tertiary == specific aspects (selections, projections, coloring, symbols, legends...) for a single view

Layout based on four things

code hierarchy, elements' types and sizes, (multiple types of) dependencies, and development time(stamps)

Reading Assignment

Visualization of the static aspects of software: a survey. IEEE Trans. Vis. Comput. Graph. 17(7):913-933 (2011).

Monitoring String Scanning

Icon's string scanning control structure has a very natural depiction, that of a progress bar or pointer working its way through a string. Issues include: how to abstract/scale a very large number of operations, how to depict backtracking, how to depict nested scanning environments (which might or might not involve analysis of a substring of the enclosing scanning environment).

Some programs use scanning a lot -- they are mostly string scanning -- and others do not use it at all.

The ScanMask events are shown in the table below. E_Spos events are the most frequent. Compared with procedure activity events, there appears to be one missing. Which one is it? Is it a problem?

code	value	description
E_Snew		create/enter a new scanning environment
E_Sfail		fail/exit a scanning environment
E_Spos		move the string scanning position
E_Ssusp		suspend a result from a scanning environment
E_Sresum		resume a suspended scanning environment
E_Srem		remove a never-to-be-resumed scanning environment
May God bless richly the team that goes


For what it's worth, evaltree() can model scanning environments just like it does procedure call activity. It can also model built-in functions and operators; all expressions can be modelled as call/ret/susp/resum/fail/rem.
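
Before reaching for evaltree(), a monitor can get a first feel for a program's scanning behavior with a few counters. The sketch below is one plausible reading of the events in the table, not a definitive model: it assumes a scanning environment stays alive from its E_Snew until an E_Sfail or E_Srem, and treats suspension and resumption as not changing the count.

```unicon
$include "evdefs.icn"
link evinit

procedure main(av)
   local live, maxlive, moves
   EvInit(av) | stop("cannot load target program")
   live := maxlive := moves := 0
   while EvGet(ScanMask) do
      case &eventcode of {
         E_Snew: {
            live +:= 1
            maxlive <:= live      # remember the deepest nesting seen
            }
         E_Sfail | E_Srem: live -:= 1
         E_Spos: moves +:= 1
         }
   write("position changes: ", moves)
   write("most scanning environments alive at once: ", maxlive)
end
```

A program that reports zero position changes does not use scanning at all; one with a large maximum is a candidate for a nested-environment depiction.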

Now for a deep-thought question: what kinds of graphic depiction emphasizing what kinds of behavior would make for a genuinely useful string scanning visualization?

Monitoring Structures and Variable References

The monitoring framework has fairly thorough instrumentation for the built-in data structures of the language -- lists, tables, records and sets. These one-level structures all support implicit reference semantics, and are routinely composed into big multi-level structures such as trees and graphs.

lecture 20

A Simple List Visualizer

What we learn from the simple list visualizer, lst.icn:

basic events for list construction, shape changes, and accesses.
lists are highly variable in size, frequency of access, and frequency of structural change
many lists are complex structures almost entirely unnoticed by a tool that visualizes all lists as arrays.
many or most lists are really just internal glue (non-root)
many lists are uninteresting, there should probably be a threshold beneath which no screen space is allocated (what should an empty list look like?)

The Structure Spy

What we learn from the structure spy

It is quite possible to infer structures from provided events
Many programs will have 1-2 huge structures and dozens or hundreds of small ones.
Within a software city, where should structures live?

Unicon 3D Graphics Facilities

Design goals:

enough 3D functionality to write games and virtual environments
simplicity, ease of use -- reduce OpenGL 250+ functions down to ~30, don't require the programmer to learn lots of new types
consistency with Unicon and with Unicon's 2D facilities

3D Windows:

  W := open("win","gl")

3D Coordinate System

Not pixels
real numbers using any units you choose
x,y are traditional cartesian
z is coming out of the screen at you

Camera and viewing Frustum

The scene is viewed from a particular (x,y,z) that is looking at a particular (x₂,y₂,z₂). There is also a question of what direction is "up" from the point of view of the camera, given as a vector but equivalent to specifying what angle the camera is at on the vector between the position and direction.
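
A minimal sketch of positioning the camera, assuming the eyepos, eyedir and eyeup window attributes described in UTR9 (position, the point being looked at, and the up vector respectively). The coordinates are arbitrary illustration values.

```unicon
procedure main()
   local w
   w := open("camera", "gl", "size=400,400") | stop("cannot open 3D window")
   DrawCube(w, 0, 0, 0, 1)          # something to look at, at the origin
   WAttrib(w, "eyepos=3,2,6")       # where the camera is
   WAttrib(w, "eyedir=0,0,0")       # the point it is looking at
   WAttrib(w, "eyeup=0,1,0")        # which way is "up" for the camera
   Refresh(w)
   Event(w)                         # wait for a key or click before exiting
end
```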

Drawing Primitives

Originally I thought these would be defining things about the 3D facilities; in fact they are mostly built into OpenGL, although some are in the OpenGL utilities (glu) library.

cube
point, line, line segment
sphere, torus, cylinder, disk, partial disk
polygon

Most 3D applications, once they acquire a certain level of sophistication, probably don't need all these primitives; they just use FillPolygon with lots of little triangles specified via large data structures called 3D models.

Transformations

scaling, rotation, and translation are applied to all coordinates on all drawing primitives via transformation matrices
Unicon has Translate(), Scale(), etc.
historically opengl maintains matrix stacks to do these hierarchically.
Actually two of them, one for the objects/data and one for the camera, but just barely.
Unicon has PushMatrix(), PopMatrix(), etc.
If used, these operations are so common that, e.g. PushTranslate() is a combined PushMatrix() and Translate()
once one goes to 3D models...the need for matrix stacks goes away
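
As a small illustration of the matrix stack, the sketch below draws four cubes in a row, each with its own translation and rotation, by bracketing every cube with PushMatrix() and PopMatrix() so that one cube's transformation does not leak into the next. The argument orders follow my reading of UTR9 (Translate(w, dx, dy, dz) and Rotate(w, angle, x, y, z)); the numbers are arbitrary.

```unicon
procedure main()
   local w, i
   w := open("transform", "gl", "size=500,300") | stop("cannot open 3D window")
   WAttrib(w, "eyepos=0,0,10")
   every i := 0 to 3 do {
      PushMatrix(w)                     # save the current transformation
      Translate(w, -3 + 2 * i, 0, 0)    # move to this cube's slot
      Rotate(w, 15 * i, 0, 1, 0)        # turn it about the y axis
      DrawCube(w, 0, 0, 0, 1)
      PopMatrix(w)                      # restore for the next cube
      }
   Refresh(w)
   Event(w)
end
```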

Lighting, Materials

ambient, diffuse, specular
Unicon has light0 ... light7 as graphics context attributes, they are on or off, and if on, have each light type, plus a position
objects can either be drawn with a material (which defines how it responds to the different lighting types, plus have emissive light) OR a texture (which by default won't respond to lighting at all) OR a blend of both.

This is an example of an area where things are far more complicated than a non-specialist programmer would want to deal with. Unicon tries to have sensible default behaviors.

Textures

Important, especially in more serious 3D such as games.

texture: 2D image whose contents are used to paint the pixels of a 3D primitive
texture coordinates: (u,v) in the texture image normalized to Cartesian 0.0-1.0. Actually, they wrap around so a texture coordinate of 2.5 says to repeat a texture two and a half times in that direction.

Unicon turns on texture mapping via WAttrib("texmode=on"). Texture coordinates are supplied via Texcoord(u1,v1,...) which must correspond in 1:1 relationship to vertices in a subsequent primitive, e.g. FillPolygon(x1,y1,z1, ...). There is also a WAttrib("texcoord=auto") which might be needed in order e.g. to map textures onto spheres, tori, etc.
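
Putting those calls together, a textured rectangle might look like the sketch below. The image file name brick.gif is hypothetical; substitute any image you have. The u coordinate runs to 2.5, so the image should repeat two and a half times across the rectangle.

```unicon
procedure main()
   local w
   w := open("texture", "gl", "size=400,300") | stop("cannot open 3D window")
   WAttrib(w, "texmode=on")
   Texture(w, "brick.gif") | stop("cannot load texture")   # hypothetical file
   # one (u,v) pair per vertex of the polygon that follows, in the same order
   Texcoord(w, 0.0, 0.0,   2.5, 0.0,   2.5, 1.0,   0.0, 1.0)
   FillPolygon(w, -2, -1, -5,   2, -1, -5,   2, 1, -5,   -2, 1, -5)
   Refresh(w)
   Event(w)
end
```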

lecture 21

Try Again with Lst and Nova Demos

3D Examples

Miscellaneous Other 3D Facilities:

We might need to talk about various extra features in future lectures. They are listed here so we can know to bring them up.

blending texture and foreground/material property
"buffered 3D mode"
WSection
JPEG textures, preliminary PNG support (on Linux)
dynamic textures
preliminary transparency support
meshmode attribute for FillPolygon
slices and rings attributes for changing the cost and precision of spheres and cylinders
subwindows
freetype fonts (needs further test-and-port work)
tr := Texture(); ...; Texture(tr) to re-use a texture

Mesh modes

These values determine how lists of vertices are interpreted by OpenGL. There is an attribute meshmode, set via WAttrib(w, "meshmode=value") where the legal values are

points
lines
linestrip
lineloop
triangles
trianglefan
trianglestrip
quads
quadstrip
polygon

However, in a trivial test, the mesh modes did not work! They probably did for the grad student who implemented them... but without a working test/demo they remain undocumented/unfinished business. Minimally, you might expect that I'll have to put out some fixed Unicon sources and/or binaries for you before these will work. You are welcome to try them and find out if things are better than I report.

Transparency

This feature of OpenGL determines to what extent light can go through a substance, or to what extent objects behind it can be seen through it. Color names, set via Fg(color) or WAttrib(w, "fg=value"), can include a diaphaneity. The legal transparency adjectives are

transparent
subtransparent
translucent
subtranslucent
opaque

This feature is implemented. In a trivial test it appears to work. However, in testing it a seeming bug was identified in the color attributes: when you set the fg= attribute with a simple color it sets the diffuse value for that material property but apparently does not reset or disable the other lighting colors (specular, ambient, emission), which may give surprising results. Also: it is not clear that transparency works correctly on all primitives yet; for example, the last time I checked, either cubes or maybe filled polygons looked not as transparent as they ought, because backfacing polygons weren't transparent.

mKE/mKR: the Largest Publicly Available Unicon Program

It has its own website. It is a knowledge representation engine with its own knowledge representation language built-in. It is something like 50K LOC. Let's study it.

lecture 22

Reading Assignment

SynchroVis: 3D Visualization of Monitoring Traces in the City Metaphor for Analyzing Concurrency, by Waller, et al. In 2013 First IEEE Working Conference on Software Visualization.

Discussion of Visualization of the static aspects of software: a survey

static aspects == properties of program code itself, rather than a program's run-time execution behavior
some static aspects can be extracted from binaries, but many are about source code
crudest source code visualizations might map each text char to one pixel; color code with control structure or whatever. Although source code is perhaps the finest level of granularity, these tools have been used to help study and navigate within very large programs. Source code visualizations have been integrated into IDEs, yes? Which IDEs?
class internals visualizations seem to be an important medium-granularity visualization whose goal is to convey the structure of a whole class. Which methods call which other methods, which methods access which class variables, and how much of this class is dependent on which other (super)classes might all be vital information that can be depicted. It is interesting to compare these visualizations with the types of diagrams that software engineers develop when designing the software in the first place: to what extent are these just reverse engineering of software design information? My gut tells me that this area is under-researched.
architecture visualization describes coarse-grained depictions of entire software systems. A lot of interesting visualizations have been developed for this category.
Solar System Metaphor: star == package, planet==class, etc.
Do you visualize the classes, or the relationships between the classes?

lecture 23

Announcements

No class next Wednesday; I am going to PNNL in Richland, Washington for a discussion of education/research
The HW#4 due date is pushed back a couple days. It is posted on bblearn.
Graphics facilities trouble? I am in the process of looking at some misbehaving 3D programs, using valgrind on Linux and DrMemory on Windows. I welcome additional example programs where the graphics crashes or surprises you.

Semester Project Topic Ideas

The perfect semester project would be a tool that...

is actually potentially useful to someone
is usable on any (Unicon) program; is useful on programs having some common property X
does some actual analysis of the events to extract higher level semantic information
is scalable; can be run on at least medium sized programs, and preferably large ones
depicts information in a way that is easily and rapidly interpreted correctly by ordinary humans; contains legends or axes or metaphors or a help system that enables users to understand what they are looking at
Team projects are welcome. They should be more ambitious.

Where to get your ideas:

I am still charging towards the theme of: using dynamic information to populate/animate a 3D city (or other 3D space) that reflects the static structure of a program. Projects that relate to, or contribute a small step towards this goal, are especially welcome. However, you can do whatever you think will be most interesting/useful.
Previous homework assignments suggested many possible projects that looked interesting but were too hard to attempt as a HW
Your own intuitions about what ought to be possible to visualize
Your readings of the research papers

Monitoring Variable References

Variable use is arguably one of the most important aspects of program behavior, but it is easily overlooked.

Some programs primarily use the stack (i.e. local variables)
Others primarily use the heap (especially, e.g. OOP programs)
Historically many programs use primarily static / global data, particularly real-time, embedded and other performance-focused systems.

What do we want to know about variables?

What proportion of data is static/global, stack, or heap? How can these be measured?
What data type variables hold; whether they ever change type
Actual Scope: From where-all are variables read? From where-all are they assigned?
Lifetime: for any variable, is it short-lived, medium, or long-lived? (Can you define what is a short- or long-lived variable?)
Frequency: for any variable, is it heavily referenced? Are its references in bunches, or relatively distributed across execution?
Dependence: are some variables aliases for data held under other, primary names? Are they pointers into the middle of a larger structure, e.g. for traversal?

Unicon Variable Events

We can start with E_Assign and E_Value, the two events associated with assignment operators such as :=

E_Assign

This event's &eventvalue gives the variable name, plus a one-letter suffix indicating scope:

Code	Scope
+	global
:	static
-	local
^	parameter

E_Value

This event, after the assignment, tells you the value that was assigned.
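
A monitor usually wants the name and the scope separately. The sketch below splits an E_Assign value at the scope code. The helper names parse_varname() and scope_name() and the record varref are hypothetical. It also assumes, based on the "L-main" form that the events program prints for a local, that anything after the scope code is the name of the owning procedure; values with no scope code, such as a structure element reference, are rejected.

```unicon
record varref(name, scope, owner)

# map a one-letter scope code to a readable name; fails on anything else
procedure scope_name(code)
   return case code of {
      "+": "global"
      ":": "static"
      "-": "local"
      "^": "parameter"
      }
end

# split a value such as "count+" or "L-main" into name, scope and owner
procedure parse_varname(s)
   local name, code, owner
   s ? {
      name := tab(upto('+:-^')) | fail   # no scope code: not a plain variable
      code := move(1)
      owner := tab(0)                    # assumed: owning procedure, may be empty
      }
   return varref(name, scope_name(code), owner)
end

procedure main()
   local s, v
   every s := "count+" | "L-main" | "list_1[2]" do
      if v := parse_varname(s) then
         write(v.name, ": ", v.scope, " ", v.owner)
      else
         write(s, ": not a plain variable name")
end
```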

gnames

Gnames shows you all your global data; variable names are written out, color coded by their type. If you click on a variable name, up pops a window showing that variable's details. Bugs and limitations:

gnames should continue to support interaction after a program terminates, so you can view variable state posthumously.
gnames should (maybe) issue a breakpoint if a non-null variable changes type.
gnames should (maybe) highlight variable assignment and dereferencing, for example flashing black (or white) for a brief time

vars

vars is a local variable visualizer; it shows each activation record in a manner similar to gnames. There is a strong scalability limit here which vars does not solve; some programs it depicts well, others it does not. It is more proof of concept/demonstration than finished and working tool. Also, at present it has bad bitrot.

assignments to structure types

Consider the following program

procedure main()
  L := list(3)
  L[2] := "hello"
end

What does assigning to L[2] look like? The events program shows the E_Assign for a structure reference does not look the same as an assignment to the variable itself:

E_Ocall       operator call                      function []
E_Deref       dereference                        L-main
E_Lref        list reference                     list_1(3)
E_Lsub        list subscript                     2
E_Oret        operator return                    &null
E_Opcode      virtual-machine instruction        Str
E_Literal     literal reference                  hello
E_Loc         location change                    3:8
E_Opcode      virtual-machine instruction        Asgn
E_Ocall       operator call                      function :=
E_Assign      assignment                         list_1[2]
E_Value       value assigned                     hello
E_Oret        operator return                    hello
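
As a first step towards the "Frequency" question above, a monitor can simply count E_Assign events per &eventvalue. In this minimal sketch, structure element assignments such as list_1[2] are counted under their own keys alongside ordinary variables; nothing here tries to relate an element back to the variable that names its structure.

```unicon
$include "evdefs.icn"
link evinit

procedure main(av)
   local counts, pair
   EvInit(av) | stop("cannot load target program")
   counts := table(0)               # assigned name -> number of assignments
   while EvGet(cset(E_Assign)) do
      counts[&eventvalue] +:= 1
   # sort(T, 2) yields [key, value] pairs ordered by value
   every pair := !sort(counts, 2) do
      write(right(pair[2], 8), "  ", pair[1])
end
```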

Under the Covers of the evinit library

EvInit(av) and EvGet(mask) are not always entirely what they seem.

They live in evinit.icn
They can allow multiple monitors to share the observation of a program execution, which we will discuss in detail.
EvInit() checks if the monitor's &eventsource is already initialized (by a parent monitor who could pre-assign the value of &eventsource)
if so, it does not load anything, it just requests events from its &eventsource

We might want to develop a similar architecture for windows. Monitors that use 2D or 3D graphics might want to check and see if their &window is already set. If so, just draw to it instead of opening a new window. This would allow a GUI for a debugger or multi-visualization tool to allow independently-compiled visualizations to "plug in". Of course, for it to work well, such a model would need to cover how to handle window resizing, and how to handle input by various tools. Subwindows, and subwindow resizing, are more or less adequate to this task.
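
The window-side check could be as small as the sketch below, which mirrors what EvInit() does for &eventsource. The procedure name shared_window() is hypothetical, and the sketch ignores the harder questions just mentioned (resizing and who receives input).

```unicon
# return the window this monitor should draw on: the one handed down by a
# parent tool if &window is already set, otherwise a newly opened one
procedure shared_window(title)
   if \&window then return &window
   return &window := open(title, "g") | stop("cannot open window")
end

procedure main()
   local w
   w := shared_window("my visualization")
   DrawString(w, 10, 20, "hello from a plug-in monitor")
   Event(w)
end
```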

lecture 24

NKN Data Science Competition

has a $1k prize
has a data visualization category
has a due date of April 16th
"effectiveness, creativity, relevance to the state of Idaho"

On Improving the performance of Unicon 3D

OpenGL performance depends violently on (a) hardware and (b) drivers
if your Unicon 3D program is too slow, what do you do? Options include
draw fewer triangles
use fewer or smaller textures
on integrated graphics the above constraints are pretty serious
don't do lots of bytecode interpretation at runtime; set up the scene and then let the C code just Refresh() or Eye() each frame
it is sometimes possible to just use better hardware, or get better drivers and achieve goals without making your software so clever

Your program could be CPU bound. Or it could be GPU bound. Or it could be I/O bound e.g. on network traffic. Or in our case, it could be "TP bound", i.e. spending most of its time in the target program and/or monitoring context switch costs. Optimizing the wrong thing might not help much.

Unicon 3D Display List Management

OpenGL has an internal concept of a display list that is more or less a way to record a sequence of OpenGL calls in a data structure and repeat them with high performance.
Unicon's display list is not an OpenGL display list, it is a Unicon list
Unicon's display list is discussed in UTR9 section 3.7 and the corresponding place in the Unicon book.
Unicon's display list records an entry for each state-changing operation on a 3d canvas.
As a Unicon list, it is fully manipulable by the application program.
The display list contains a mixture of (sub)list and record entries, mostly lists for variable-length things and records for things with a fixed set of fields.
For drawing primitives, the function generally returns the display list entry. For other functions (e.g. that return a string result) you can get the display list entry by asking for WindowContents()[-1]

Cheesy (incomplete and buggy) UTR9 example:

sphere := DrawSphere(w, x, y, z, r)
increment := 0.2
every i := 1 to 100 do {
   every j := 1 to 100 do {
      sphere.y +:= increment
      Refresh(w)
      }
   }

What would this look like if it were changing the color of a sphere, instead of changing its y coordinate? Setting the foreground color generates a display list entry that is itself a list. For a simple foreground color setting (one that only sets the diffuse property) it is a list of 7 elements*: the string "Fg", the integer code 160 that corresponds to a fgcolor setting, the string "diffuse" that indicates what color property is being set, and then four 16-bit unsigned values that give the RGBA color setting.

*The current color-setting display list entry format might get turned into a record type so we can use field names instead of L[4] etc. but for now it is a list.

The following example gives a sphere that bounces and changes its colors randomly between red, white, and blue each frame:

procedure main()
   &window := open("win","gl","size=800,800","bg=black")
   colors := [[65535,0,0],[65535,65535,65535],[0,0,65535]]
   Fg("blue")
   spherecolor := WindowContents()[-1] # fg=most recent display list entry

   sphere := DrawSphere(0, 0, -50, 2)
   increment := 0.2
   every i := 1 to 100 do {
      every j := 1 to 100 do {
         sphere.y +:= increment
         c := ?colors
         spherecolor[4] := c[1]
         spherecolor[5] := c[2]
         spherecolor[6] := c[3]
         Refresh()
         }
      increment *:= -1
      }
   Event()
end

On Drawing Text on 3D Windows

It is bluntly embarrassing to me that text in 3D is even an issue
OpenGL doesn't provide a built-in text rendering capability
Brute Force Version 1 (e-mail last weekend) used a cube instead of just a rectangle. Pro: potentially readable from all sides. Con: doesn't scale super-well. 6x slowdown? How much trigonometry would it take to rotate the text so it always faced the camera?
HUD vs. in-world text labels: text rendered as a texture is generally going to be fuzzy (big problem on e.g. Oculus Rift) and/or pixelated. Might want to calculate size/position and then render in 2D, if your API supports that.
Lots of games exhibit good solutions, but fewer open source libraries that run on all major platforms than you might expect.
Unicon's 3D DrawString() depends on FreeType (a font rendering engine), Xft (on X Windows), and FTGL (an obscure C++ FreeType-for-OpenGL library).
Windows has been an ongoing portability problem for FTGL. Libraries have been built for specific Windows compiler versions but are never just "there" for current MinGW gcc. FTGL and similar libraries are kind of hard to build, and semi-abandonware. FTGL was abandoned by its original developer on SourceForge, but was picked up by someone else and is now on GitHub (last commit in 2018).
Alternative libraries that I could try: GLText (last updated 2003), QuesoGLC (last updated 2011). I don't know of one that would prove easier to support consistently on Windows. Feel free to find me one.
Or we could roll our own
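
On the trigonometry question above: if a label only needs to turn about the vertical axis, one atan2 per label per frame is enough. Here is a minimal sketch in Python (not Unicon). The function name and the convention that an unrotated label faces +z are assumptions made for illustration, not part of Unicon's API.

```python
import math

def billboard_yaw(label_pos, camera_pos):
    """Yaw in degrees, about the y axis, that turns a label toward the camera.

    Assumes an unrotated label faces the +z direction.
    """
    dx = camera_pos[0] - label_pos[0]
    dz = camera_pos[2] - label_pos[2]
    # Rotating +z by angle t about y gives (sin t, 0, cos t), hence atan2(dx, dz).
    return math.degrees(math.atan2(dx, dz))
```

The height difference between camera and label is ignored on purpose, so the text stays upright. The result would be fed to whatever rotation call precedes the label's geometry.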

Arbitrary DrawStrings from a Single Texture Load (duh)

Instead of allocating a separate texture for each text label (bad)...
Allocate one texture for an entire font
Render each letter in a single "quad" (hardware will use 2 triangles)
Figure out how to render only the fg pixels, bg as transparent
Today we will get started
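
To make the single-texture idea concrete, here is a sketch in Python (not Unicon) of the bookkeeping involved: finding the part of the font texture that holds a given character, and laying out one quad per character. The 16x16 grid, the fixed-width cells, and both function names are assumptions; a real font texture may be laid out differently.

```python
def glyph_uv(ch, cols=16, rows=16):
    """Texture coordinates (u0, v0, u1, v1) of one character's cell.

    Assumes a fixed-width font laid out row-major in a cols x rows grid,
    indexed by character code, with (0, 0) at the top-left of the texture.
    """
    code = ord(ch)
    if code >= cols * rows:
        raise ValueError("character not in atlas")
    col, row = code % cols, code // cols
    return (col / cols, row / rows, (col + 1) / cols, (row + 1) / rows)

def string_quads(s, x, y, width, height):
    """One ((x0, y0, x1, y1), uv) pair per character, left to right."""
    return [((x + i * width, y, x + (i + 1) * width, y + height), glyph_uv(c))
            for i, c in enumerate(s)]
```

The texture is loaded once. Drawing a new string only adds quads that point into it.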

text.icn

lecture 25

Homework #4 Due Date Change

The HW#4 Due Date is further extended to 4/1 at 11:59pm.
I would like you to make your HW#4 a bit more polished than previous homeworks. Make it as flexible/general/good as you can manage by then. Try it on a lot of suspects. Report bugs.
To maximize your points, you can demo it for minutes or so and receive feedback, by volunteering, on either Friday or Monday in class. Demoing is optional. Slots are first-come first-serve; you may request a spot by e-mail in advance.
If you show your tool, plan to ask and answer basic questions like: how well does your tool scale to larger programs? How do you manage to make labels legible? How do you insert delays or otherwise scale time so that the animation/activity is legible?

Reading Assignment

One of these two is very short, while one is a regular full conference paper.

Searching and Exploring Software Repositories in Virtual Reality, by Vincur, Polasek, and Navrat. In Proceedings of the 23rd ACM Symposium on Virtual Reality Software and Technology, VRST 2017.
VR City: Software Analysis in a Virtual Reality Environment, by Vincur, Navrat, and Polasek. In IEEE International Conference on Software Quality, Reliability and Security Companion (QRS-C), 2017.

Discussion of SynchroVis: 3D Visualization of Monitoring Traces in the City Metaphor

This was an extremely short paper you were assigned to read this past week.

objective: visualize concurrent behavior (dynamic, from traces)
they talk about "simultaneously visualizing" static and dynamic properties
different city metaphor: each instance gets its own "storey" -- not scalable.
"streets" are lines drawn whenever instance A calls instance B. -- utter annihilation of the city metaphor
color coded threads, created as instances from a special "thread" building in a separate neighborhood. Similar separate building for semaphores, each semaphore is a storey.
roof-top arrows depict static relations: black==inheritance, gray==implements, white=="other"

Monitor Coordinators

Basic premise:

Unicon's monitoring facilities are intended to reduce the difficulty of writing monitors.
Monitors are easier to write if they are simpler and smaller, and look for specific behaviors.
But, we want to be able to monitor several aspects of behavior for a given execution, and
potentially we want to look for interactions between behaviors.

A monitor coordinator is a monitor that hosts the execution of the target program under the observation of multiple monitors.

Eve

The reference implementation monitor coordinator is called Eve (eve.icn). Eve is one of the last remaining "old Icon GUI" programs, and needs to be rewritten using the modern GUI class library.

Eve configuration

Eve reads in a list of monitors from a ~/.eve file in the format:
"title" command line

For example:

"Line Number Monitor" /home/jeffery/tools/piano
"UFO" /home/jeffery/tools/ufo
"Algae" /home/jeffery/tools/algae
"Big Algae" /home/jeffery/tools/algae -func -op -step -S 48
"Memory bar chart" /home/jeffery/tools/barmem
"Global variables" /home/jeffery/tools/gnames
"Local Variables" /home/jeffery/tools/vars
"Lists" /home/jeffery/tools/tinylist
"Minimemmon" /home/jeffery/tools/mmm
"Miniloc" /home/jeffery/tools/miniloc
"Scat" /home/jeffery/tools/scat
"String scanner" /home/jeffery/tools/ss

From this datafile, eve draws an opening window that allows selection of which monitors you want to run (selectEMs).
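
The "title" command line format is easy to parse. The sketch below is in Python (not Unicon, and not eve's actual code). It reads such text into (title, argv) pairs. Skipping blank or malformed lines is an assumption about how such a reader might behave.

```python
import shlex

def parse_eve_config(text):
    """Return a list of (title, argv) pairs from ~/.eve style text."""
    monitors = []
    for line in text.splitlines():
        words = shlex.split(line)  # honors the double quotes around the title
        if len(words) < 2:
            continue  # skip blank or malformed lines (an assumption)
        monitors.append((words[0], words[1:]))
    return monitors
```

For the "Big Algae" line above, this yields the title "Big Algae" and an argv that starts with /home/jeffery/tools/algae, followed by its five option words.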

Eve's Global State

unioncset: cset mask that is union of all monitor masks
EventCodeTable: table of lists; keys are event codes, values are "list of interested monitors"

Monitor State

This "class" holds eve's knowledge about the monitors it loads. "prog" is the actual loaded program (a co-expression value), while "mask" is the program's event mask -- what it returned from its last EvGet().

record client_rec(name, args, eveRow, prog, state, mask, enabled)
#
# client() - create and initialize a client_rec.
#
procedure client(args[])
   local self
   self := client_rec ! args
   if /self.name then stop("empty client?")
   self.prog := load(self.name, self.args) | stop("can't load ", image(self.name))
   variable("&eventsource", self.prog) := &current | stop("no EventSource?")
   variable("Monitored", self.prog) := &eventsource | stop("no Monitored?")
   /self.state := "Running"
   /self.mask := ''
   /self.enabled := E_Enable
   return self
end

Initialization

After selecting monitors to run, eve has to load them all, and then activate them all, running them up until their first EvGet() call. Their EvInit's will be disabled by eve's having already set their &eventsource. After their first EvGet() call, eve registers them on the "list of interested monitors" for each of the event codes in their mask.

   every i := 1 to *clients do
      clients[i].mask := @ clients[i].prog
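
The registration step can be modeled in a few lines. This sketch is in Python (not Unicon), with masks modeled as sets of event codes rather than csets. The function name and the (name, mask) pair representation are assumptions for illustration; eve's own data structures are unioncset and EventCodeTable as described above.

```python
from collections import defaultdict

def register(clients):
    """clients: list of (name, mask) pairs, mask being a set of event codes.

    Returns (union_mask, table), where table maps each event code to the
    names of the monitors that asked for it.
    """
    table = defaultdict(list)
    union_mask = set()
    for name, mask in clients:
        union_mask |= mask
        for code in sorted(mask):
            table[code].append(name)
    return union_mask, dict(table)
```

The coordinator requests the union mask from the target program, then uses the table to forward each event only to the monitors that want it.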

Event Forwarding

EvSend(code, value, recipient) - sends a monitoring framework event, where code defaults to &eventcode and value defaults to &eventvalue. Note that EvSend() allows any value to be sent, not just what the EM requested in its event mask, and not even limited to 1-letter string codes.

Eve's Main Loop

procedure mainLoop()
   while EvGet(unioncset) do {
      #
      # Call Eve's own handler for this event, if there is one.
      #
      (\ EveHandlers[&eventcode]) ()
      #
      # Forward the event to those EM's that want it.
      #
      every monitor := !EventCodeTable[&eventcode] do
         if C := EvSend( , , monitor.prog) then {
            if C ~=== monitor.mask then {
               while type(C) ~== "cset" do {
                  if C === "abort" then fail
                  #
                  # The EM has raised a signal; pass it on, then
                  # return to the client to get his next event request.
                  #
                  broadcast(C, monitor)
                  if not (C := EvSend( , , monitor.prog)) then {
                     unschedule(monitor)
                     break next
                     }
                  }
               if monitor.mask ~===:= C then
                  computeUnionMask()
               }
            }
         else {
            unschedule(monitor)
            }
      delay(6 < delayval)
      }
end

lecture 26

Brainstorm with me on "3d Monitor Coordinators"

What would it take for us to see/share all your visualizations in the same 3D window, from separate monitors? What would a 3D monitor coordinator need to do?

Unicon City: a Brief Discussion

Want:

ability to generate a 3D city from arbitrary software repository, such as project(s) on Github or sourceforge
static information via directory traversal of many source files, scaled up a bit from your earlier homework
highly dynamic information via monitoring facilities
potential for semi-dynamic information of at least two types:
- revisions to code repository, slow code change over time
- multiple program executions, such as post-mortem traces/logfiles
ideally, would work on many popular languages
- initially, Unicon (duh), plus:
- option #1: replicate monitoring facilities, etc.
  Labor intensive, becomes "development" instead of research
- option #2: implement (subsets of) other languages in Unicon.
  Very viable for education context, e.g. CS 120-121 subsets; maybe not in other contexts

Some Code Prototypes:

Layout in 3D

I haven't converted to 3D yet, so the following are open to your suggestions and/or better ideas.

create 1+ prototype "box" building textures
layout 1+ "levels" for different languages used. In Unicon, this is 2, or 3 (Unicon, C, and in the middle RTL).
distribute files along streets
distribute classes/functions within files.
output in a 3D model file format. Initially, for example, CVE .dat file format. This uses units of 1.0==1 meter. Sample file

# Unicon City Template Model

default {
  name Unicon City
  home [5.0, 0.0, 5.0]
  angle 4.6
  origin_node toplevel directory
}


Room {
name toplevel directory
x 0
y 0
z 0
w 10
h 10
l 10
texture wall.gif
}

Within the CVE format, there are a couple possible ways to introduce the buildings

in version 0, just place them as boxes within one gigantic "room"==world (as long as users cannot walk into buildings)
in version 1, place them as separate "rooms" (allow users to enter buildings)
in version 2, generate many internal "rooms" per building
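
As a sketch of the "version 1" idea, the Python code below (not Unicon) emits one Room entry per building. It uses only the field names that appear in the sample file above. Both function names are hypothetical, and whether CVE accepts every room generated this way is an assumption that would need testing.

```python
def room_block(name, x, y, z, w, h, l, texture="wall.gif"):
    """Format one Room entry using only the fields shown in the sample file."""
    fields = [("name", name), ("x", x), ("y", y), ("z", z),
              ("w", w), ("h", h), ("l", l), ("texture", texture)]
    body = "\n".join(f"{key} {value}" for key, value in fields)
    return "Room {\n" + body + "\n}\n"

def city_model(buildings):
    """buildings: iterable of dicts whose keys match room_block's parameters."""
    return "\n".join(room_block(**b) for b in buildings)
```

Calling room_block("toplevel directory", 0, 0, 0, 10, 10, 10) reproduces the Room entry from the sample.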

Graphic Design of the Day: Kiviat Diagrams

One way to represent many-dimensioned data is to lay out the dimensions around a circle; the 2D shape (and its degree of circularity or lack thereof) tells you something about which dimensions are interesting.

Kiviat diagram for software quality. Source: geeks with blogs, via google image

Kiviat diagrams are easy to criticize. There are problems with the relative scales of dimension; do you reduce them all to 0.0-1.0 ranges, or not? There are problems to identify normal or acceptable ranges of values. There are problems that adjacent dimensions don't really have any more connection with each other than remote dimensions, but the Kiviat makes them look like they do. The area inside the Kiviat shape is really meaningless.
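
For reference, here is the geometry of a Kiviat diagram, sketched in Python (not Unicon). It takes the "reduce them all to 0.0-1.0" side of the scale question above. The function name, the evenly spaced spokes, and the choice to start at the top are assumptions.

```python
import math

def kiviat_points(values, ranges, radius=1.0):
    """Polygon vertices for a Kiviat diagram.

    values: one number per dimension; ranges: matching (lo, hi) pairs.
    Each value is rescaled to 0.0-1.0 and placed on its own spoke,
    with spokes spaced evenly around the circle, starting at the top.
    """
    n = len(values)
    points = []
    for i, (v, (lo, hi)) in enumerate(zip(values, ranges)):
        t = (v - lo) / (hi - lo)
        angle = math.pi / 2 - 2 * math.pi * i / n
        points.append((radius * t * math.cos(angle), radius * t * math.sin(angle)))
    return points
```

The spoke order is just the order of the input list, which is exactly the arbitrary adjacency criticized above: reordering the dimensions changes the shape and its area without changing the data.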

lecture 27

Reading Assignment

Using High Rising Cities to Visualize Performance in Real-Time, by Ogami, et al.

Discussion of VR City Papers

Searching and Exploring Software Repositories in VR

Set of (open source) code repositories == planetary system
"world clouds", a play on "word clouds", try to connect semantic search terms to class names
Code structure details via a city view.
Using HTC Vive despite the fuzziness of text in current-gen VR

VR City

So, we have reached current state-of-the-art getting-published software city research! How does it compare?

slightly refined (how?) city metaphor as in Lanza et al
Class == building, method==floor, assembled Minecraft-style from little cubes corresponding to some changeable metric.
Some fancy math (Hilbert curve! LOL), bringing a grenade to a knife fight.
Ability to color code buildings, connect an author's commits or replay "trace" files containing method calls/returns
HTC Vive; all the fuzzy source code you can eat
massively stuck in birds-eye view when city begs for First Person
city is still uninhabited
but at least it is prettier! and somewhat more detailed

Look at HW#5

Update on Dr. J's Code Analyzer Tool

Over the weekend, I combined the two previous prototypes
cod3d.icn reads the entire directory hierarchy and generates the 2D layout from live data
"streets" now have a collection of "file" objects which contain the source code
next steps:
- layout file objects in 2D
- populate "file" objects with building objects corresponding to classes and functions
- generate a CVE .dat file of the buildings
  (former CS 428 students, if your old Java program handles .dat file format, you could potentially render the results)
- inject various computer controlled characters (dinosaurs, anyone?) corresponding to different types of events
- map memory references into micro-forces directing NPC's to locations in the code where they are referenced

Type Conversion Events

Unicon does more automatic type conversion than C/C++. At almost every operator, and every built-in function, the types of arguments are checked, and if necessary, converted.

Event         Value             Description
E_Aconv       input value       attempt to convert
E_Tconv       example target    conversion target
E_Nconv       input value       no conversion was needed
E_Sconv       output value      conversion was successful
E_Fconv       input value       conversion failed

Tool of the day: redconv

Redundant conversion catcher. This is not a visualization tool, but it is an execution monitor. Even if conversions are not redundant, they may be an indicator of a bug or a performance problem. When is a conversion "unhealthy"?
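
One possible reading of "redundant" is that the same input value gets successfully converted over and over. The sketch below is in Python (not Unicon, and not the actual redconv source). It tallies such repeats from a stream of (code, value) pairs. The string event codes, the assumed ordering (E_Aconv, then E_Tconv, then one outcome event), and the threshold are all assumptions.

```python
from collections import Counter

def repeated_conversions(events, threshold=2):
    """events: iterable of (code, value) pairs, e.g. ("E_Aconv", "42").

    Pairs each E_Aconv input with the following E_Sconv and counts how often
    the same input value was converted; returns inputs seen >= threshold times.
    """
    counts = Counter()
    pending = None
    for code, value in events:
        if code == "E_Aconv":
            pending = value
        elif code == "E_Sconv" and pending is not None:
            counts[pending] += 1
            pending = None
        elif code in ("E_Nconv", "E_Fconv"):
            pending = None  # nothing was converted for this attempt
    return {v: n for v, n in counts.items() if n >= threshold}
```

A real monitor would probably also key the tally on source location (E_Loc), so that the report points at the line doing the repeated work.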

lecture 28

WSection, 3D Object Selection, and Level of Detail

3D graphics is computationally intense. Unicon's 3D Facilities are a compromise between the dynamic language and the requirements of the underlying 3D API's in C/C++.

History:

Classic OpenGL C mindset: render entire scene in code each frame. On a VM interpreted language that was a non-starter: too slow.
Unicon mindset is: build scene on display list, let C code render display list each frame as fast as it can.
Worked well on single-room FPS-style tests
On integrated graphics, we brought the machines to their knees very quickly as we extended from a "single room" to a "building floor" scale.
On NVIDIA with decent drivers, we got a LOT farther, but the code still dropped to unacceptable frame rates when we tried to do a whole floor
In large environments: a TINY percent of your scene will be visible in each frame. Start with the 50% that's behind you. But wait, how many degrees of field of vision does your display do? And how about the 50% that is behind larger objects, like walls for example? If you OpenGL-render it all, you pay for all the invisible stuff.
For Unicon, a crude metric turned out to be the display list size. Each CPU/GPU is different, but somewhere between 10,000 and 50,000 entries, the runtime system C code just can't walk through it 60x/sec. For a larger display list, we have to have a way to skip invisible stuff.

Options for better performance include:

Buy faster hardware ($ works to some extent)
Make the compiler/VM faster (if app is CPU bound)
Tune the language runtime system OpenGL C code.
Hardwire more in C. Use much fancier data structures/algorithms in C. This is what a "game engine" would do.
Write smarter Unicon code that asks the GPU to do less work per frame. Except Unicon is slow, so the CPU budget per frame is also very limited.

We settled on a Uniconish way to implement the concept of level of detail without rebuilding the display list each frame.

Level of Detail

Level of Detail in typical games:

objects farther away are smaller, fewer pixels
the full up-close OpenGL render would be a waste
related concept: mipmapping. You can use smaller textures for farther away objects.
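
A minimal sketch of distance-based level selection, in Python (not Unicon). The doubling rule, the base distance, and the level cap are assumptions chosen to mirror how mipmap levels halve in size; they are not taken from any particular game engine.

```python
def lod_level(distance, base=10.0, max_level=4):
    """Level 0 is full detail; each doubling of distance past base adds a level."""
    level = 0
    while distance > base and level < max_level:
        distance /= 2.0
        level += 1
    return level

def texture_size(full_size, level):
    """Halve the texture edge once per level, as mipmapping does."""
    return max(1, full_size >> level)
```

With these defaults, an object 40 units away gets level 2, so a 256-pixel texture would be sampled at 64 pixels.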

WSection(): Basic Idea

Add the ability to group, and turn on/off, sections of the display list each frame without having to rebuild it
Boolean flags inserted into the display list, include an extent which says how far you can skip over when the section is disabled.
Sections are named for humans' sake
Start a section with WSection(name), end it with WSection()
Sections may be nested hierarchically
Sections are also used to group 3D primitives together for 3D object selection
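
The flag-plus-extent idea can be modeled as follows, in Python (not Unicon, and not the runtime system's C code). The dict representation of a section marker is an assumption for illustration. In this model the extent replaces the end marker, since it already says where the section stops.

```python
def visible_entries(display_list):
    """Yield drawable entries, skipping the bodies of disabled sections.

    A section marker is a dict {"section": name, "enabled": bool, "extent": n}
    where n counts the entries between the marker and the end of its section.
    Every other entry is treated as something to render.
    """
    i = 0
    while i < len(display_list):
        entry = display_list[i]
        if isinstance(entry, dict) and "section" in entry:
            if not entry["enabled"]:
                i += entry["extent"]  # jump over the whole section body
        else:
            yield entry
        i += 1
```

A disabled section costs a single step no matter how many entries it contains, nested sections included. That is the property needed for large display lists.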

WSection() Example #1

WSection("redrect") # beginning of a new object named redrect
Fg("red")
FillPolygon(0,0,0, 0,1,0, 1,1,0, 1,0,0)
WSection()          # end of the object redrect

WSection() in 3D Object Selection

WAttrib("pick=on") turns on object selection. If it is on:
After each event, &pick generates strings that identify each named (by WSection()) 3D object that the mouse is on.
There is a library, selection3D, that encapsulates that in a more GUI-class-library callback-method style. See UTR

Visualization Evaluation Questions

Specific questions to think about as you consider other folks' visualizations, or design your semester project

What data domain(s) is the described system able to observe?
What analysis does the described system perform?
What visualization or novel data presentation techniques are employed, if any?

X3D for Software Visualization

You should at least hear of X3D in this class. Let's discuss it.

First there was VRML, and it was a parody of HTML with a 1:1 correspondence to a popular SGI C++ 3D library used for early VR.
Detailed graphics scenes depiction in a browser that had to have a special plugin, hard to support all browsers, updates etc.
Browser plugin has hardwired basic 3D UI (camera control etc) but not a full featured general purpose applications programming interface.
Then came X3D, an XML-based successor to VRML.
WebGL or Web3D or HTML5 or whatever standards have managed to mostly solve the 3D web portability problem
Remaining issues: data formats don't solve the code/API generality problem. Most web applications are interpreted and/or run real slow. Site load times threaten to be "bad" on legacy broadband. Imagine needing to download WoW on the fly every time you play.

Rube

This work is described in "The rube Framework for Personalized 3D Software Visualization", by Hopkins and Fishwick, Dagstuhl software visualization seminar, 2001.

idea: users should develop their own (visual) metaphors.
3d, web-based
Separate geometry from inter-object semantic relations
Model Fusion Engine merges object geometry and dynamic behavior models into a 3D scene (VRML scene file).
generates X3D

Rube methodology

choose system to be modeled
select structural and dynamic behavioral model types
choose a metaphor
define mappings/analogies
create model

Example: a lightbulb is to be modeled. A finite state machine is chosen to model the bulb. S1=disconnected, S2=off, S3=on.

For each different dynamic model type, there may be any number of defined visual metaphors, or a programmer may wish to create a new one. A "water tank" metaphor for a finite state machine would "fill the tank" of whichever state the machine is in, and the water would be pumped over to a different tank whenever a transition to a new state occurs.

In a gazebo metaphor, a person would indicate the state, and a transition would be depicted by that person walking.
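
To see how the model and the metaphor stay separate, here is a Python sketch (not rube itself) of the lightbulb machine, with the water tank metaphor reduced to one fill level per state. The notes name only the three states, so the event names and the transition table are assumptions.

```python
STATES = ("disconnected", "off", "on")  # S1, S2, S3

TRANSITIONS = {  # assumed for illustration; the notes only name the states
    ("disconnected", "connect"): "off",
    ("off", "switch_on"): "on",
    ("on", "switch_off"): "off",
    ("off", "disconnect"): "disconnected",
    ("on", "disconnect"): "disconnected",
}

def run(events, state="disconnected"):
    """Return the tank levels after each event: 1.0 for the current state."""
    frames = []
    for ev in events:
        state = TRANSITIONS.get((state, ev), state)  # ignore invalid events
        frames.append({s: float(s == state) for s in STATES})
    return frames
```

Switching to the gazebo metaphor would change only how each frame is drawn (where the person stands), not the transition table.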

Rube Summary

There are benefits to a visualization system that supports 3D models and external tools. The benefits include richer, reusable visual metaphors, and better portability.

lecture 29

HW4 Report/Show/Tell

Comments on your HW4 Code

submitted solutions ranged from 400-700 LOC
solutions ranged widely both in looks and performance
did you figure out how to pull most of the work out of DrawString, so that it only executes the first time it is called? For example, re-using the font texture using its display list record instead of re-extracting it from the wfont window over and over again?
what fraction of your monitor code is on the critical path executed repeatedly every event?
kind of impressed that of you used a co-expression in your own code, but...in a big loop? probably should re-think that
put(numbers, pop(numbers), pop(numbers), pop(numbers))
while not member(&digits ++ '.', line[i]) do ...
while ... do ... tab(many(&digits || "."))
if = "=" || "." || "," || "\"" then ...
i +:= 1
Which is better: x *:= -1 or x := -x
if (3 < i < *TempL) then ...
keyList := [: key(jObject[keysOfjObject[i]]) :] ...
tab(upto(',')) ... find('.', filename)
using evaltree() but also your own EvGet() loop??
evaltree() callback? Probably should consider using a table of procedures instead of giant case expression. Or maybe Hani's trick.
if color == ("red"|"green") then ...
if (postpone | cont) == "T" then ...

On Dynamic Analysis

Dynamic analysis refers to computing higher level information from program execution behavior such as an event stream.
In pragmatic terms, dynamic analysis consists of additional code and data structures, besides just a switch/case, applied to the processing of an event stream, before deciding what gets depicted graphically. Do you summarize? Compute statistics on it? Place it into categories?

Here is a classic paper on the subject. Grad students, go ahead and read this. We will skim it today to try and pick out the highlights.

The Concept of Dynamic Analysis by Tom Ball.

According to Ball, dynamic analysis has the following properties compared with static analysis:

greater precision of information, derived from 1+ actual program run(s)
input-centric mentality; shows dependence of internal behavior on particular inputs of a given execution
ability to reveal semantic dependencies that are far apart in scope

Ball's paper mentions (claims to introduce) two particular types of dynamic analysis, out of myriads:

frequency spectrum analysis: analyze frequencies of different kinds of events, e.g. to identify related computations
coverage concept analysis: comparing actual control flow from a set of executions against a static control flow graph can show what's missing from a set of tests

FSA

low-frequency operations are generally at higher-levels of abstraction
frequency clusters -- if foo and bar are both called 1033 times, there is probably a connection
frequencies that match a program's input or output domain may reveal portions of the program related to input or output.
frequencies can tip you off regarding the big-Oh complexity of an algorithm
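
The frequency-cluster idea takes only a few lines to try. This is a sketch in Python (not Unicon). The input is assumed to be a flat list of procedure names, one per call event, and the function name is made up for illustration.

```python
from collections import Counter, defaultdict

def frequency_clusters(calls):
    """Group names that occur exactly the same number of times."""
    clusters = defaultdict(list)
    for name, count in Counter(calls).items():
        clusters[count].append(name)
    # Only frequencies shared by two or more names are interesting.
    return {n: sorted(names) for n, names in clusters.items() if len(names) > 1}
```

Exact matches are a deliberately crude test. A real tool might also cluster frequencies that are close, or that differ by a constant factor.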

CCA

coverage profile: profile of what was executed (no frequency info)
concept analysis: (T, E), T a set of tests and E a set of program entities, is a concept if every test in T covers all of E and no test not in T covers all of E.

Given a (boolean) table showing all the tests and entities, Ball points out that you can form a concept lattice, and that the concept lattice shows control flow relationships within 1+ actual executions, analogous to the kinds produced by control flow static analysis.
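
For a small table, the concepts can be enumerated by brute force. The sketch below is in Python (not Unicon) and uses the standard formal-concept-analysis closure, which makes both sides maximal: T is every test that covers all of E, and E is everything covered by every test in T. The function name and the dict-of-sets input are assumptions, and the running time is exponential in the number of tests.

```python
from itertools import combinations

def concepts(coverage):
    """coverage: dict mapping test name -> set of entities it covers.

    Returns a set of (frozenset tests, frozenset entities) pairs in which each
    side is exactly what the other determines. Brute force; small tables only.
    """
    tests = sorted(coverage)
    all_entities = set().union(*coverage.values()) if coverage else set()
    found = set()
    for k in range(len(tests) + 1):
        for group in combinations(tests, k):
            shared = set(all_entities)
            for t in group:
                shared &= coverage[t]
            # Close the test side: every test that covers all shared entities.
            closed = frozenset(t for t in tests if shared <= coverage[t])
            found.add((closed, frozenset(shared)))
    return found
```

Ordering the resulting pairs by inclusion of their test sets gives the concept lattice.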

More Dynamic Analyses

OK, so where do we find more examples of dynamic analysis? Here are some more examples of interesting dynamic analyses.

statistical: Summarizing data by accumulation or averaging to give the big picture. FSA seems to be an example of statistical analysis.
pattern-of-interest: Parsing event sequences using patterns to find bugs, or even just to find items of interest. Event pattern parsing must carefully define its domain, skipping over events that don't affect the pattern match. Event pattern parsing will usually be done non-deterministically and maybe in a "massively parallel" model. Tools like flex take a massively parallel set of patterns and merge them into a single DFA, but not all pattern matching can be so reduced.
higher-level-events: one variant of the pattern-of-interest notion is to identify events at a higher semantic level, such as aggregates of lower level events, or application domain events
categorization: figuring out when a class implements a stack, or is using dynamic programming, or whether it employs a feature for which a specialized tool is available
profiling; coverage: treating hotspots and coldspots specially; for example the former deserve extra performance tuning monitors, while the latter deserve extra typographic paranoia monitors

lecture 30

Reading Assignment

This week you get a very cool paper that is one of the best at integrating visualization with the views of the code inside a code city.

Code Park: A New 3D Code Visualization Tool, by Khaloo et al.

Discussion of Using High Rising Cities to Visualize Performance in Real-Time

Profiling is an important use of dynamic runtime execution behavior
Classic profilers are text-heavy. Readers can extract useful information but it is usually just: who are the heaviest overall consumers of time
Typical GUI profilers will maybe show this as a bar chart.
This paper: draws a software city, where the height of the building corresponds to the % of time spent in a method in the last time interval (k seconds, or milliseconds or whatever)
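
The height mapping can be sketched as follows, in Python (not Unicon, and not the paper's implementation). The input is assumed to be one method name per profiler sample taken during the most recent interval. The function name and the maximum height are made up for illustration.

```python
from collections import Counter

def building_heights(samples, max_height=100.0):
    """samples: method names observed during the last interval, one per tick.

    Returns a height per method proportional to its share of the samples.
    """
    counts = Counter(samples)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {m: max_height * n / total for m, n in counts.items()}
```

Recomputing this every interval makes the skyline rise and fall as the hot spots move.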

Graphic Design of the Day: Perspective Wall

Hey, did you notice that there is an "information visualization wiki"? Interesting...

Update on Dr. J's Software City Effort

Windows Unicon 3D is not as stable as Linux Unicon 3D.
Windows doesn't have as good tools for finding memory issues, i.e. nothing as good as valgrind.

2 1/2 D Visualizations of Call Graphs

(From Facilitating Exploration of Unfamiliar Source Code by Providing 2.5D Visualizations of Dynamic Call Graphs, by Bohnet and Dollner, 2007, 4th IEEE Workshop on Visualizing Software for Understanding and Analysis)

A "short paper" in 2007 gives lots of ideas to think about.

millions of lines of unfamiliar code
to add a feature, one must
- identify the relevant "entry points"
- read the source code
current IDE's poorly suited to this task
- to follow the calls, one is switching constantly between files
- the source navigation tree does not show connections, does not emphasize the files relevant to the feature under study, and does not scale well to hundreds/thousands of files.
- no context for navigation, have to go-and-see, can't see-and-go
idea: use dynamic call graph data to organize navigation activity
- similar to a dynamic tracing facility...but the IDE uses the data to emphasize or structure the navigation bar to the relevant code automatically.
superimpose the call graph structure on the source code views
present a perspective-wall-like view of the call graph...
apply level-of-detail techniques; present more information for the nearer / focus nodes where there is space for it.

Nate's Structure Monitor

Simple graphics, reminiscent of Playfair's classic graphic design. Ya, it is a cheap trick, but it works.

Metaphor-Based Animation of OO Programs

SV conference pub is a poster abstract; Finnish author has written some other related papers.
Context is novice programmer education, a perpetually popular SV area.
Project was done as Flash animations.
watch panel metaphor for instances
role metaphors for member variables
blueprint for class, found in a blueprint book; blueprint page will visually depict methods, which don't show on the watch panel
workshop for method invocation, workbench for its result (lame)
method call is also visualized as an envelope ("message passing") that delivers parameters to the watch panel
object references use a "pennant" metaphor; color is used to match. No pennants = garbage to be collected
variable roles include: fixed value, organizer, stepper, most-recent holder, one-way flag, most-wanted holder, gatherer, container, walker, follower, temporary, other

lecture 31

Reminder HW#5 Due Tomorrow Night

Write me your design document. I will endeavor to give you timely feedback on this one.

Status Update and Demo on Dr. J's Software City

As reported last class, ~8900 buildings render pretty OK in CVE despite how slow Unicon is and how primitive its 3D facilities are.
Essentially: I am procedurally generating a "game level" from a source code repository. I plan to support several mainstream languages.
One could do a whole M.S. thesis improving the procedural generation based on static program information, and making it more human-city-like.
Monday night was spent changing building heights to correspond to # of methods in a class, and littler things like improving the sky to not tile badly.
2D city layout size modified to 1024x1024 because texture power of 2; (should I go with 2048x1024 to closer approximate HD?)
Current 3D generation is set to "10 meters per pixel". That means Unicon City is 100 square KM of land. Actually a lot to walk around; had to modify CVE client to allow for faster movement, need to do more with that.
Some surprising happy serendipities: buildings' heights and clusters of classes look more organic than expected.
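
The scale arithmetic above can be checked in a few lines of Python (not Unicon). The constant and function names are hypothetical, and mapping the layout's y axis onto the world's z axis is an assumption.

```python
METERS_PER_PIXEL = 10
LAYOUT_SIZE = 1024  # pixels per side of the 2D city layout

def layout_to_world(px, py):
    """Map a 2D layout pixel to ground-plane (x, z) coordinates in meters."""
    return (px * METERS_PER_PIXEL, py * METERS_PER_PIXEL)

side_km = LAYOUT_SIZE * METERS_PER_PIXEL / 1000  # 10.24 km per side
area_km2 = side_km ** 2                          # roughly 105 square km
```

So "100 square KM" is a slight underestimate. A 2048x1024 layout at the same scale would double the area.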

Demo, if the Laptop Cooperates

Jeffery's Current Todo List

I have ~2.5 weeks before my next conference paper deadline.

come up with a great name
add to procedural generation a mechanism for mapping source code locations (file:line:col) to city locations (x,y,z). What should that mapping look like?
Design one or more architectures for how the execution monitoring connects to the videogame. "Easiest" is to just embed it directly in the CVE client, but that has downsides compared with putting it in a server, in the long run. "Easiest" just monitors one program at a time, but the long run should support multiple programs being monitored.
think about how to incorporate new programs into an existing city
add a menu item for launching an execution
add calls to EvGet() into the null handler in cve
write a standalone monitor that monitors class instance construction and member variable references.
how to talk about, produce analysis of, and visualize the fact that some directories contain unrelated/unordered collections of code, while others contain members of a single system. Compare the IPL directory categories: procs (library modules), progs (separate programs in each file), packs (multi-file programs).

Brainstorming: Visualizing Software Executions as Populated, Dynamic Cities

Help me improve my metaphors.

integrate CVS logs, bug tracker, static analysis, runtime data
do this for Unicon, with mix of available and (new, needed) tools
push "city" metaphor much farther than in previous papers
overcome various fatal flaws with the whole city metaphor.

Dr. J's fatal-flaw view of visualizing software as cities: many or most (especially OO) programs are understood largely through their relationships between classes and between instances. Software as cities doesn't automatically manage to depict such relationships at all. It got as far as colocating classes in the same package.

Classes are buildings, sure

height=# methods, width=#public variables, length=(log of) longest code. (Private variables not included)
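
A rough illustration of this mapping, sketched in Python rather than Unicon. The function name, the use of the natural log, and the reading of "longest code" as the line count of the longest method are assumptions for illustration, not the actual city generator:

```python
import math

def building_dimensions(num_methods, num_public_vars, longest_method_lines):
    """Return (height, width, length) for a class drawn as a building."""
    height = num_methods
    width = num_public_vars  # private variables are not counted
    # The log keeps one very long method from producing an absurdly long
    # building; max(1, ...) avoids log(0) for a class with no code.
    length = math.log(max(1, longest_method_lines))
    return height, width, length

print(building_dimensions(12, 3, 200))
```

Under the same assumptions, a procedure treated as a class with one method would be `building_dimensions(1, 0, n)`.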

What is the model of time in-game?

Today = a current execution run. CVS repositories and previous execution logs make for remembrances of things past.

Limited ("Prince of Persia") backwards-in-time capability?

limited-reversible is better than no reversible, and is more scalable than full-reversible. Limited reversible may mean, if you go back past a certain point, you'll not be able to see as many details, or change the execution from that point. Assuming we are collecting fairly detailed traces, you can go backward farther than that in a replay-only mode.

How to represent procedures

treat like a class w/ 1 method. Lotta procedures = village.

How to represent instances

As robots? Garbage would be broken-down robots...lots and lots of broken-down robots! (thanks, A.P.)

How to represent "atoms" (scalar values)

Not at all? As text? As virtual books (strings), hammers?? (ints) and saws?? (reals)? What about tables and lists? Records got special treatment as people; tables and lists as bookshelves, or buses, or?

How to represent external entities

network connections, I/O handles, files... need representation
try and give me some better ideas here.
what have we seen other folks try? Letters/envelopes...

In software engineering/software design, an "association" refers to a relationship between classes or instances. Why does the metaphor need associations?

Because making correct code is difficult and perilous.

Our city is someplace like Venice, or hell, or New York.
Step off the sidewalk on the wrong street and you are dead.
Associations provide your security.

What associations are depicted, and how?

We need at least: inheritance, aggregation, and "other"

How to depict inheritance and aggregation?

How have other researchers depicted these? Tubes running into a roof?

aggregation = adjacency, or containment, or bridges.
inheritance = physical resemblance

How to depict reference?

boats?

What are the streets?

In Venice, there are streets. And canals.

How to represent the stack

Gradually dimming lights in buildings?
Portals/teleporters/bridges/moving sidewalks?
Beam of light?

In past discussion, there has been support for the beam-of-light model, pointing backwards from callee to caller. Dr. J would add: the beam of light might be a good metaphor for an instant-teleportation feature...

How to represent bugs and warnings

As monsters

How to lay out buildings?

Around an older, urban core? Minimize distance of overall call graph?

What are ghosts?

Remembrances of fixed bugs and deleted code

How to present source code control structure details.

There is the raw codesize, the extent of nesting

How to present data details.

Well, instances are a lot of the data, and atoms are the rest. A prime issue here is one of aggregation. When is an object a citizen of the world, and when is it just somebody's foot? I guess the answer is: when referenced globally, or by two or more other instances.

lecture 32

Status of HW#4 Grading

I have received 7 HW#4 submissions so far.
I will grade HW#4 as soon as I can, but it will take a bit longer.
I will print and read your code and make suggestions there
I will test on several suspects, and maybe on both Windows and Linux.
Your grade will be based on whichever platform produces the best results on the tests.

Status of HW#5 Grading

I have received 7 HW#5 submissions so far.
I will attempt to grade them this weekend.

Remainder of the Course

more: research papers
more: DrJ-tries-to-build-Unicon-City
but: what do you need to know in order to coolify your semester project?

Question: How to Get More Static Analysis for Unicon if You Need it

Some of you did a Unicon-based HW#1, some didn't
We kind of punted some, by putting up .json files for all suspects, fine.
All that has is a table-of-lists representation of a call graph
What other static information might we want?
What are our options for getting it?

What Static Analysis Information Might We Want?

What are Options for Getting It?

Some of what we might want, has already been done by udb.
How hard would it be to use parts of udb as a library?
So far, this is my test of this question:

# you would have to adjust these paths to refer to your uni/udb directory
link "/home/jeffery/unicon/uni/udb/icode"
link "/home/jeffery/unicon/uni/udb/srcfile"
link "/home/jeffery/unicon/uni/udb/symtab"
link "/home/jeffery/unicon/uni/udb/system"
procedure main(argv)
   icode := Icode()
   write("Icode file: ", argv[1] | "not supplied")
   src := icode.getSrcFileNames(argv[1])
   write("source files: ", image(src))
   every write("\t", !\src)
   srcFile := SourceFile()
   srcFile.loadSourceFiles(argv[1], src)
   write("srcFile: ", image(srcFile))
   every k := key(srcFile) do {
      write("\t", k, " ", image(srcFile[k]))
      if type(srcFile[k]) == ("set"|"list") then
         every write("\t\t", image(!srcFile[k]))
      else if type(srcFile[k]) == ("table") then
         every kk := key(srcFile[k]) do {
            if type(srcFile[k][kk]) == "list" then {
               write("\t\t", image(kk), ":")
               every write("\t\t\t", image(!(srcFile[k][kk]))) \ 10
               write("\t\t\t...")
               }
            else
               write("\t\t", image(kk), ": ", image(srcFile[k][kk]))
            }
      }
end

Live Demo this one.

More thoughts on How to Make Static Analysis in Unicon Much Easier

Suppose I want tools like the software-as-cities, and it's too much work.

Unicon's monitoring framework makes the dynamic events easy to get, but how do I make the static info easy enough to grab? (Today's answer was: make a library from udb + ???)
The lexer and parser for Unicon are widely available; do monitors need the type of static analysis they could get from parse trees? (Discuss static info obtainable from icode VM binary vs. from list of strings of source code vs. from parse trees and symbol tables)
What generic static analysis tool(s) should we invent?
Execution monitoring was modeled as a sequence of events (while EvGet()); what should be the conceptual model for a general purpose static analysis tool?
Is there a collection of static analysis foundational data, and a set of generic operations, that is standard, or that we should standardize? For example, for a hypothetical Unicon Static Analysis (USA) tool, analysis produces a tuple (Σ, Π, Χ) where Σ is the set of source files, Π is the Parse Tree forest, and Χ is the control flow graph?

Yeah, this is a lame start, but at least it will allow us to consider what should really be there.

Mondrian

Mondrian: An Agile Information Visualization Framework, by Meyer, Girba, and Lungu (SOFTVIS 2006)
scripting visualizations with mondrian
video

Viz tools conflict: gnuplot generality of reading file formats vs. Alamo-style run-time access to original data.
Mondrian sez: instead of moving the data to the viz tool, move the visualization tool to the data.
Provide not a file format but, instead, an interface. Allow a declarative script to specify the visualization.
Work directly with the objects in the data model. Let the programmer visualize what they are doing in their environment/tools.
At one time this felt to me like: Smalltalk-based tools trying to be relevant to a non-Smalltalk world.

lecture 33

Mailbag

I am currently trying to get all of the procedures from a Unicon program that is being passed to my hw6. I was thinking of scanning the file and looking for them, but I don't think this is the best option. Is there a different direction you can point me to look at and do some sort of static analysis before i begin to monitor or should i stick to scanning the file?

Great question. You could use the udb modules I demo'ed last class to find all your source files, and then run HW#1 style code. But, instead of looking for the procedures in the source code, if procedures is what you want, I think you could scan all the global variables using globalnames() and if the value is of type procedure, it is a procedure. Maybe something like

every g := globalnames(Monitored) do
   if type(variable(g, Monitored)) == "procedure" then ...

BTW, beware of "procedure" versus "function". A procedure is Unicon code, a function is generally built-in, i.e. C code.

Here are...what I'd be interested to see for static information provided:

memory requirements for global data
minimum memory on stack required for each procedure
minimum heap memory required for program's run time execution, and the amounts of minimum heap allocation requested by each procedure. Included in this could be amount of memory allocated each time procedure is called, and minimum number of times that procedure is called.
indicator for procedures that have the potential to allocate more than the minimum denoted above (procedures called in a loop, memory being allocated in a loop, etc.)

Great list. Let's work on these a bit. Interestingly, some of them may be statically calculable, but some of them sound more like dynamics to me.

Memory requirements and Sizes in Unicon

Memory requirements for global data: 16 bytes per global PLUS heap memory pointed-at.
Minimum memory on stack required for each procedure:
- Use E_Stack events?
- Are there access functions or keywords that report stack depth?
- Oh by the way, Unicon uses two stacks, the C stack and a VM interpreter stack.
Minimum heap memory required for program's runtime execution: would require hard analysis to statically guesstimate, but maybe pretty easy to derive empirically. Q: how to keep around static or dynamic analysis results across time and/or multiple runs?
Minimum heap allocation required by each procedure: hard to be accurate in all cases, but maybe not too hard to do a crude lower bound.
Indicator for procedures that have the potential to allocate more than the minimums: semi-difficult to do statically, maybe easy to do dynamically.

Reading Assignment

A Controlled Experiment on Spatial Orientation in VR-Based Software Cities, by Rudel, Ganser, and Koschke. In 2018 IEEE Working Conference on Software Visualization, pp. 21-31.

Discussion of Code Park: A New 3D Code Visualization Tool

Challenges for InfoVis Engines

vis. engine should be domain independent
visualizations should be composed from simpler parts
visualization should be definable at a fine grained level: instance-based, not type-based; sometimes different instances of the same type play different roles
minimize object-creation overhead: vis. works off a model of a running system, but instead of duplicating objects in the system, how about using them directly?
visualization description should be declarative: compare w/ Tango, Dance, and UFO for that matter

Other Mondrian Highlights

Declarative Syntax which looks like...

view nodes: model classes using: Rectangle withBorder
   forEach: [:eachClass | eachClass viewMethodsIn: view]

Screen-Filling System

Mondrian has a lot of structures to visualize simultaneously... And it has structures that are too wide to fit the window.

Built on top of Moose

You just know it has to be good.

Interesting Mention of CodeCrawler

"visualizations of combined metrics and structural information"

lecture 34

Static vs. Dynamic: Memory Size Requirements, Take Two

	Static	Dynamic
enumerate globals	Parse all source code, including includes and linked library modules; or "parse" the binary. It has a header, and might be compressed. The header includes a "pointer" to the array of globals. udb has some of this; see icode.icn. The number of bytes of globals is Gnames-Globals. Note: global names are also part of their memory cost.	G := [: globalnames(Monitored) :]; write("there are ", *G, " globals, including procedures")
size globals	Unlike traditional compiled mainstream languages, Unicon does not have pre-initialized variables, other than procedures. The icode does have a constant region of known size (Filenms-Strcons). Although it is called Strcons and holds a lot of strings, it also holds cset blocks, and previously held real numbers as well. From parsed code or the binary, static analysis starting from main() could identify some variables that are always initialized.	16 bytes per global, plus 16 bytes for the slot to hold its name. Sizes of pointed-at values are mostly calculable: on 64-bit machines they are 16 bytes per slot, plus some overhead for headers, pointers, etc. It is difficult to find out, from a structure value, how many list element blocks or hash table buckets it is using.
enumerate locals	Parse source code and build a symbol table; or "parse" the binary: the icode for each procedure has a "procedure block" that contains relevant information (see struct b_proc from rstructs.h).	P := [: paramnames(Monitored) :]; write("there are ", *P, " params"); L := [: localnames(Monitored) :]; write("there are ", *L, " locals")
size locals	On the stack: 16 bytes per local and parameter. In static memory: 16 bytes per name. Not counting any heap memory they point at.	Regarding measuring stack depth before/after a call, earlier I mentioned an E_Stack event that reports changes in stack depth. This is for the VM interpreter stack. There is also an E_Cstack event, but it looks to me like the grad student tasked with it did not implement it correctly.
size heap entities	No heaps at compile time. Static analysis could determine, for some parts of the program that are guaranteed to run, some amount of the heap allocation that would occur.	String: 1 byte per character. Cset: block of X bytes of overhead plus a 32-byte bit vector. List: 16 bytes per slot, plus any data pointed at, plus a list header block (96 bytes) and one or more list element blocks (56 bytes). Element blocks grown via put/push hold a lot more slots than are actually used. Table: 64-byte header + variable-size hash table starting at ~288 bytes? + 56 bytes/element. Set: 64-byte header + variable-size hash table starting at ~288 bytes? + 40 bytes/element. Record: 48 bytes of overhead plus 16 bytes per field. Object: 80 bytes of overhead plus 16 bytes per field.
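
The per-entity numbers in the last row can be turned into a back-of-the-envelope estimator. This is a Python sketch, not Unicon code. The function names are made up, the 56 bytes is assumed to be overhead per list element block, and the ~288-byte initial hash table size is the uncertain figure from the table:

```python
SLOT = 16  # bytes per slot on a 64-bit machine, per the table above

def string_bytes(num_chars):
    return num_chars                       # 1 byte per character

def record_bytes(num_fields):
    return 48 + SLOT * num_fields          # 48 bytes of overhead

def object_bytes(num_fields):
    return 80 + SLOT * num_fields          # 80 bytes of overhead

def list_bytes(num_slots, num_element_blocks=1):
    # 96-byte header block, plus (assumed) 56 bytes of overhead per element
    # block. num_slots counts allocated slots, which can exceed the slots
    # in use when the list was grown via put/push.
    return 96 + 56 * num_element_blocks + SLOT * num_slots

def table_bytes(num_elements, hash_bytes=288):
    return 64 + hash_bytes + 56 * num_elements

def set_bytes(num_elements, hash_bytes=288):
    return 64 + hash_bytes + 40 * num_elements

# An instance of a class with two fields, such as C(x,y):
print(object_bytes(2))   # 112
```

None of these estimates include the heap data that the slots point at.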

On the monitoring of OOP Behavior

Consider the program

class C(x,y)
   method m(a)
      write(a, ": x,y: ", image(x), ",", image(y))
   end
initially
   x := 1
   y := 3.14
end

procedure main()
   o := C()
   o.m("hey")
end

Running it results in around 400 events.
Maybe we just want to know:
- whenever an object instance is created,
- when a method is called, and
- whenever a field is read/written.
How many of those 400 events do we have to look at?
How might an execution monitor build a model of the objects in the target program?

These questions boil down to: what dynamic analysis of the event stream do we have to do in order to turn it into useful higher level information?

construction

E_Fcall for a function whose image says "class constructor C__state" instead of "function whatever"
E_Fret from that function call returns the created instance itself
an instance's image is "object C_serial#(numfields)"

method call: A call to o.m() is an E_Pcall to a procedure whose name is C_m, with an extra parameter for o on the front.
field access: A field access is an E_Opcode to the Field VM instruction, resulting in an E_Rref on the object, and an Rsub identifying the field.
A serious side consideration: if the monitor holds direct references to object instances, those instances will never become garbage. Need to think about this one some more.

Play around with this interactively in moncls.icn
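
To make the filtering idea concrete, here is a Python sketch (not the contents of moncls.icn) that reduces a simulated event stream to "which instances were created" and "which methods were called". The (code, image) pair representation and the exact image strings are assumptions based on the descriptions above. Storing image strings instead of the instances themselves is one possible way to sidestep the garbage issue, at the cost of losing access to the live object:

```python
import re
from collections import Counter

# Image formats follow the notes above; real images may differ in detail.
CONSTRUCTOR = re.compile(r"^class constructor (\w+?)__state$")
INSTANCE = re.compile(r"^object (\w+)_(\d+)\((\d+)\)$")

def summarize(events):
    """Reduce (event_code, image) pairs to created instances and method calls."""
    classes = set()            # class names seen via their constructors
    instances = []             # images of created instances, not references
    method_calls = Counter()   # "C_m" -> number of calls
    in_constructor = False
    for code, image in events:
        if code == "E_Fcall":
            m = CONSTRUCTOR.match(image)
            in_constructor = m is not None
            if m:
                classes.add(m.group(1))
        elif code == "E_Fret":
            # the value returned by a constructor is the new instance
            if in_constructor and INSTANCE.match(image):
                instances.append(image)
            in_constructor = False
        elif code == "E_Pcall":
            prefix = "procedure "
            name = image[len(prefix):] if image.startswith(prefix) else image
            # a method of class C is a procedure whose name starts with "C_"
            if any(name.startswith(c + "_") for c in classes):
                method_calls[name] += 1
    return instances, method_calls

events = [
    ("E_Pcall", "procedure main"),
    ("E_Fcall", "class constructor C__state"),
    ("E_Fret", "object C_1(2)"),
    ("E_Pcall", "procedure C_m"),
    ("E_Fcall", "function write"),
    ("E_Fret", "hey: x,y: 1,3.14"),
]
print(summarize(events))
```

The sketch assumes a constructor's E_Fcall is followed directly by its E_Fret; handling nested function calls would need a stack.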

lecture 35

Updated moncls.icn

in order to map member variables, some class "modeling" in the monitor
visualization will want to spawn robots at point of object creation, so where is that, exactly? -- could ask keyword("line", Monitored) but E_Loc gives better answer !? So use/remember last E_Loc before E_Pcall.

Mapping Code to World Coordinates (and maybe vice-versa)

need to map code locations to (x,y,z) world locations. This probably needs to be output by the program that generates the world, which knows the (x,y,z) coordinates. The prototype is cod3d.icn
What are "code locations"?
- In file system terms this might come down to sourcepath/file:line:column.
- In language terms a location might be more about scopes: package/class/procedure
- The buildings correspond to named procedures and classes.

So what should the API for that mapping even look like, anyhow?

# given a procedure (and scope) what [x,y,z] location is it at?
#   note: names get reused in different scopes
procedure symbol2world(procname, classname, packagename) : list
    # determine/compute filename, lookup in a table of [filename||":"||symbol]

Other possibilities:

# given a filename, line number, and column, return a [x,y,z] location.
# this will be needed when we have buildings' internals mapped.
# for now, it might calculate from a source location, what building is it in?
procedure source2world(filen,lineno,colno:1):list


# given a procedure (and scope), what filename and line number is it at?
# this has surely been done in udb. Look for it.
procedure symbol2source(procname, classname, packagename):list

What data structure does this need?
```
table[filename]  --> ???
```
So far, cod3d.icn generates something like this model.json. Almost-legal json, raises whole new cans of worms.
Aside from syntax, how would you improve the JSON to make it more usable/useful?
What do our static analysis tools (and for that matter, our visualizations) do about multiple versions of the same function, under ifdefs for example?
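
One possible answer to the data structure question, sketched in Python rather than Unicon. The class name CodeMap, the add() method, and the per-file list of line spans are all assumptions for illustration. Each file maps to a list of (first line, last line, scope, xyz) spans sorted by first line, and a second table maps scopes straight to coordinates:

```python
from bisect import bisect_right

class CodeMap:
    """Hypothetical mapping between code locations and world coordinates."""

    def __init__(self):
        self.spans = {}    # filename -> [(first_line, last_line, scope, xyz)]
        self.symbols = {}  # (packagename, classname, procname) -> xyz

    def add(self, filename, first_line, last_line, scope, xyz):
        spans = self.spans.setdefault(filename, [])
        spans.append((first_line, last_line, scope, xyz))
        spans.sort(key=lambda s: s[0])     # keep spans ordered by first line
        self.symbols[scope] = xyz

    def symbol2world(self, procname, classname=None, packagename=None):
        # names get reused in different scopes, so the key is the full scope
        return self.symbols.get((packagename, classname, procname))

    def source2world(self, filename, lineno, colno=1):
        # colno is ignored until buildings' internals are mapped
        spans = self.spans.get(filename, [])
        starts = [s[0] for s in spans]
        i = bisect_right(starts, lineno) - 1   # last span starting at or before lineno
        if i >= 0 and spans[i][1] >= lineno:
            return spans[i][3]
        return None                            # line is outside any building

cm = CodeMap()
cm.add("a.icn", 10, 20, (None, "C", "m"), (1, 2, 3))
cm.add("a.icn", 30, 40, (None, None, "main"), (4, 5, 6))
print(cm.source2world("a.icn", 15), cm.symbol2world("main"))
```

The world generator would call add() once per building as it places it, then write the tables out alongside the model.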

Making Unicon Garbage Collect, for Science

Garbage collection used to operate out of two 64KB regions: string and block
Modern Unicon operates out of two regions sized to 1% of available memory.
Three ways to make Unicon monitor a program with small regions:
1. Reduce available memory before launching program (bad idea)
2. Set environment variables STRSIZE/BLKSIZE
3. Pass strsize, blksize as arguments to load()
  - In the bowels of evinit.icn are lines that look like:
```
&eventsource := EventSource := Monitored := load(f[1],f[2:0],input,output,error) | fail
```
  - But it turns out load()'s full signature is:
```
load(s,arglist,infile,outfile,errfile, blocksize, stringsize, stacksize)
```
So, you/we could easily hack EvInit to take region size arguments

"Turning CVE Into a Visualization Environment" Update

How to get live/dynamic data into my CVE
turning it from a collaborative virtual environment, into a collaborative visualization environment

Start with: how to wire together CVE Architecture with Alamo Architecture?

CVE runs standalone, but normally uses a traditional client/server two-process TCP socket-based architecture.
Unicon uses Alamo, a single-process synchronous thread-based architecture
Want: multiple users, running multiple programs, to (be able to) see each other's program behavior
Want: lots of little computer controlled NPC's running around
Possible solution 1: CVE client is the EM
- Pros: can develop standalone mode initially
- Cons: need to develop way for others to see what you are seeing
Possible solution 2: CVE server is the EM
- Pros: server is the logical space for shared environment entities
- Cons: server can be a bottleneck; server must remain robust
Possible solution 3: another client is the EM
- Pros: we already have some infrastructure for computer-controlled characters (NPCs) written as clients
- Cons: increases latency; reaction to program event is now two (buffered) network packets away from client
- May need to extend CVE network protocol to handle multiple avatars managed from a single client ("Legion")

lecture 36

Mailbag

I have been trying to get the time spent on each function. What I have tried so far is recording the &time during an E_Pcall() then recording the &time during an E_Psusp or E_Pret and subtracting that from the E_Pcall time to get the time spent. The trouble I'm having right now is that those times are coming back with the same value, giving me 0 when I subtract them. Do you have any suggestions on how I can approach timing functions?

Great question. Let's talk some more about timing.

More on Execution Timing

Earlier when we talked about timing, I gave examples that use the Unicon &time keyword, but a student has clearly found and reported that it is not always sufficient.

Machine "cycles" are billions per second, but how precisely can we measure time in a pre-emptive multi-tasking operating system?
The underlying hardware has fundamental limits, as does the operating system. Traditional "clock ticks" are 10ms on Linux, 55 ms on classic Windows. On a modern system a program (or function) has to run a long time in order for timing to be remotely accurate.
When you evaluate &time, the Unicon runtime makes a C library call to ask what time it is (or how much CPU has elapsed) and which library function is used determines what you get.
On UNIX/Linux, &time uses getrusage(), or clock_gettime(), or times(), or sysconf(_SC_CLK_TCK). We have gone to a fair bit of trouble to use the best one available.
On Windows it seems to be using an ANSI C function called clock(). There is some experimental Windows code for a better clock precision that is #ifdef'ed out. I am inquiring with our main Windows Unicon developer, Jafar, to see if we can improve the Windows implementation to address this.

In the meantime, some experiments:

procedure main()
    t1 := &time
    f(10000)
    t2 := &time
    write("f() took ", t2-t1, "ms")
end

procedure f(n)
    every i := 1 to n
end

For n somewhere between ~7500 and 50000, the number of ms reported becomes nonzero. If you were an aggressive student of Unicon you might bump into the function gettimeofday(), which reports wall clock time, not CPU time, but might seem promising since it reports in microseconds, not milliseconds. However, on Windows it might not actually be any more precise:

procedure main()
    t1 := &time
    g := gettimeofday()
    f(100000)
    t2 := &time
    g2 := gettimeofday()
    write("f() took ", t2-t1, "ms")
    write(image(g), " vs. ", image(g2))
    every k := key(g) do write("\t",k,": ", g[k])
    write("vs.")
    every k := key(g2) do write("\t",k,": ", g2[k])
    write("gettimeofday reports ", (g2.sec-g.sec)*1000000 + (g2.usec-g.usec), " usec")
end

procedure f(n)
    every i := 1 to n
end
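
A common workaround when the clock is too coarse for a single call is to repeat the call until the total is comfortably above the clock's resolution, then divide. This is a Python sketch of that idea, not Unicon code. Here time.process_time() stands in for &time, and the function name and the 0.2 second threshold are arbitrary choices:

```python
import time

def cpu_time_per_call(func, min_elapsed=0.2):
    """Estimate CPU seconds per call by repeating func until the total is measurable."""
    reps = 1
    while True:
        start = time.process_time()
        for _ in range(reps):
            func()
        elapsed = time.process_time() - start
        if elapsed >= min_elapsed:
            return elapsed / reps
        reps *= 2   # too fast to measure; try twice as many calls

def f(n=10000):
    for i in range(n):
        pass

print("f() took about", cpu_time_per_call(f) * 1000, "ms per call")
```

This only works for code you can call repeatedly on demand, which is not the situation a monitor is in when it times calls inside a running target program.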

Monitoring and counting VM instructions elapsed, or E_Tick events, might give you alternative ways to measure computation expended, but monitoring VM instructions is expensive, and E_Tick events should be charged evenly against all the code since the last E_Tick event, not just where the PC is at the time of the E_Tick, or you are just doing random sampling (like gprof). By the way, if you run for a long enough number of E_Tick events, random sampling is not a bad statistical approximation of overall time spent.
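
The difference between the two ways of charging a tick can be shown on a simulated event stream. This is a Python sketch, not Unicon code. The ("loc", procname) and ("tick", None) pairs are made-up stand-ins for location and E_Tick events, and using the count of location events as the measure of "all the code since the last E_Tick" is an assumption:

```python
from collections import Counter, defaultdict

def charge_ticks(events):
    """Return (sampled, spread) tick totals per procedure."""
    sampled = Counter()           # whole tick goes to whoever is running at the tick
    spread = defaultdict(float)   # tick is divided over everything since the last tick
    since_last = Counter()        # location events per procedure since the last tick
    current = None
    for kind, proc in events:
        if kind == "loc":
            current = proc
            since_last[proc] += 1
        elif kind == "tick":
            if current is not None:
                sampled[current] += 1
            total = sum(since_last.values())
            if total == 0 and current is not None:
                spread[current] += 1.0    # nothing new seen; still in current
            for p, n in since_last.items():
                spread[p] += n / total
            since_last.clear()
    return sampled, dict(spread)

events = [("loc", "f"), ("loc", "f"), ("loc", "f"), ("loc", "g"), ("tick", None)]
print(charge_ticks(events))
```

In this stream, sampling charges the whole tick to g, while spreading charges three quarters of it to f.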

Reading Assignment

Overcoming Issues of 3D Software Visualization through Immersive Augmented Reality

Discussion of A Controlled Experiment on Spatial Orientation in VR-Based Software Cities

What was the controlled experiment?

Hypothesis #1: Users navigate more effectively and efficiently in EvoStreets when they use a 3D HMD instead of a pseudo 3D desktop system as a display device. Was this confirmed? Is it generalizable?
Hypothesis #2: Users who are familiar with navigating using a keyboard in computer games achieve higher task completion efficiency. Was this confirmed? Is it generalizable?
Hypothesis #3: Users who are already familiar with the EvoStreet of a software for one particular metric mapping can navigate equally well if only the metric mapping changes (same structure, same starting point). Was this confirmed?

Brief Update on Dr. J's City Efforts

So far this semester I have submitted two conference papers, one a literature survey, one on the basic city layout.
Third paper, on the architecture, is under construction.
If you were to go to grad school (and say, do an M.S. thesis), you may well follow a similar arc: get a vision, do a literature survey, design and implement something new, and evaluate evaluate evaluate
Some of you have demonstrated that you'd be able to contribute to this research, even if we haven't managed to make it magically happen.
Typical CS undergrad doesn't realize whether they are grad school capable, let alone whether or not it would serve their interests.
For the grad students in this class: thank you for your participation. Whatever else you learned here, I hope you picked up on the notion that communication and relationships are important in research, even if you find that out via my flaws and limitations.

lecture 37

Grading Update

HW#4 grades varied widely. Feel free to improve and resubmit.

Timing Update

current versions of Mingw64 GCC now have clock_gettime() in their C libraries.
You can read more about clock_gettime() at Paul Krzyzanowski's page. Basically, it provides CPU timers at nanosecond resolution.
Unicon has a tendency towards a lowest-common denominator and/or "best Windows approximation of UNIX behavior" when it comes to non-portable C API's.
That might mean, for example, that we live with millisecond timing in &time even though some OS'es can do microseconds or nano-seconds.
In the case of clock_gettime(), the Lowest Common Denominator is that it gives us user CPU time, but not system CPU time. On UNIX we add in system CPU time via a call to (lower resolution) times() function, but that is not present on Windows.
Summary: Dr. J has built a Windows Unicon that uses clock_gettime() with its finer resolution; not tested yet; will make binaries available.

What Dr. J is Thinking About

I am thinking about how to evaluate monitoring and visualization framework scalability.
Toy suspects are great, but there are no object-oriented programs in the mix.
The Unicon benchmark suite is also basically not object-oriented (a couple classes are used to implement a Thread Pool for the concurrent tests).
How about a validation test for OO behavior consisting of the three biggest OO Unicon programs in the distribution? unicon itself, the ui IDE, and the ivib visual interface builder.
They need to do some defined operations on some defined test input. Fine, how about if we add a button to a dialog, and define its code body to exit when clicked.
Two of them are user-interactive GUI's, so that will take some engineering. Like, turn them into non-interactive by inserting a mechanism for recording and replaying their logged GUI events.
Then, run them on recorded GUI logs, and observe for our software city virtual environment visualization/monitor:
- how many objects were created
- how many method calls/returns can the tool process per second
- how many frames per second does the visualization maintain, with this number of objects spawned and moving around
I fully expect to bring my visualization system to its knees, and spend some quality time studying and improving its performance
Are you expecting to bring your semester project to its knees, and spend some quality time studying and improving its performance?

lecture 38

Yeah, so, how was EXPO?

JIVE (Java Interactive Visualization Environment, Gestwicki et al)

This paper is too old for me to assign as a required reading, but it has some nice properties: it is about a mainstream language (Java), and it lays out an ambitious set of goals for us to compare, and see if we should be aspiring to also do them.

multiple concurrent (visual) representations
reverse execution
graphical queries

Major requirements:

depict objects as environments. method calls happen inside one. This immediately challenges the objects-as-robots metaphor.
multiple views. Different Granularities. detailed view and compact view.
histories - of execution, of method interaction... show sequence or collaboration diagrams (how do they address scalability? From Figure 1 the answer initially seems to be: they don't; from Figure 2 one answer is, things shrink down to points). This is not summary statistics, it is timelines and such
forward and backward execution. state-saving model. big Big logs.
queries on the runtime state. when did a variable change; or when did it achieve a certain value
clear and legible
use the stock JVM
be able to visualize programs that themselves have GUI's!!

Graphic design: simple, relatively easy to understand, scales poorly (minimal "visualization" involved, maximum IDE/debugger-like feel)

Analysis: hardwired, except that it supports a range of queries. What is the query language?

Implementation: Two-process model, supports multiple threads so long as only one runs at a time. Log file coupled with "in-memory" execution history database. Events are able to commit and un-commit themselves.

7 event types: static context creation, object creation, method call, method return, exception thrown/caught, change in source line, and change in variable value.

Stepping backward does not modify the client program, it is suspended until you get back to the current state and move forward. (Means: you can't modify the past, but maybe you can modify the present).

Queries: on program history; may return values, sets of states, or portions of program history. Visual representation of program states and program history means queries and results may be done graphically. Queries vis-a-vis variables in single instances or classwide.

No evaluation of scalability or effectiveness of using UML-like depictions.

JPDA: Java Platform Debugger Architecture

Originally there were the JVMDI and the JVMPI; now there is the JPDA. JIVE has to live on whatever the JVM feeds it. JPDA includes the JDI (Debug Interface), JDWP (Wire Protocol), and JVM TI (Tool Interface), which replaced JVMDI/JVMPI.

Compare this access to the value of a variable in Java, with the Unicon/Alamo access to a variable via variable(s, Monitored):

theStackFrame.getValue(theLocalVariable)
... transmitted via a socket / JDWP ...
jvmti->GetLocalInt(thread, depth, slot, &intValue)
... result transmitted back...

This is a two-process, separate address space model, limited to the speed of whatever OS/network connection the EM has to the TP.
If you only have to execute as fast as a human user types "print" commands, this is plenty fast enough.
If you want to run it far faster than human, in order to drive a big animated visualization with lots of moving parts, this might not be fast enough.
If I am doing a software city visualization that runs the target program on a server, I might get the same network performance bottleneck.
From experience, modern networks have the bandwidth we need, but clients and servers can only process so many packets per second.
If you use JDWP, or any other network-based two-process communication, you are likely to need to bundle many messages per packet and send only a few packets per second.
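Here is a minimal Python sketch of that bundling idea. It is not JDWP's wire format: the two-byte length prefix, the EventBatcher class, the unpack helper, and the default size and delay limits are all assumptions chosen for illustration. Messages accumulate until the bundle would exceed a size limit or the oldest pending message has waited too long, and only then is one packet handed to the send function.

```python
import struct
import time
from typing import Callable, List


class EventBatcher:
    """Bundle many small event messages into few packets (hypothetical framing)."""

    def __init__(self, send: Callable[[bytes], None], max_bytes: int = 1400,
                 max_delay: float = 0.25,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._send = send            # e.g. a socket's sendall method
        self._max_bytes = max_bytes  # flush before a bundle grows past this
        self._max_delay = max_delay  # flush once the oldest message is this old
        self._clock = clock
        self._pending: List[bytes] = []
        self._pending_bytes = 0
        self._oldest = 0.0

    def add(self, message: bytes) -> None:
        # Frame each message with a 2-byte big-endian length (at most 65535 bytes).
        framed = struct.pack("!H", len(message)) + message
        if self._pending and self._pending_bytes + len(framed) > self._max_bytes:
            self.flush()
        if not self._pending:
            self._oldest = self._clock()
        self._pending.append(framed)
        self._pending_bytes += len(framed)
        if self._clock() - self._oldest >= self._max_delay:
            self.flush()

    def flush(self) -> None:
        if self._pending:
            self._send(b"".join(self._pending))
            self._pending = []
            self._pending_bytes = 0


def unpack(packet: bytes) -> List[bytes]:
    """Receiver side: split one bundled packet back into its messages."""
    messages: List[bytes] = []
    offset = 0
    while offset < len(packet):
        (length,) = struct.unpack_from("!H", packet, offset)
        offset += 2
        messages.append(packet[offset:offset + length])
        offset += length
    return messages
```

The trade-off is latency: a larger max_delay means fewer packets per second but a visualization that lags further behind the target program, so an EM driving an animation would want to tune both limits. A timer that calls flush() periodically would also be needed if events can stop arriving.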

Graph-Based Visualization of Software Evolution

This paper is ancient eye-candy I am including for sentimental reasons, but it is another representative of the class of visualizations that are geared towards understanding the changes in software over time, the same perspective the authors of the visualizing-software-as-cities paper took. It is not the here-and-now of a current execution, it is the view of code across the ages.

Given a software repository (they talk about CVS, a fine predecessor to Subversion; you might do the same for Git), how do we visualize a program's change over time? For each revision, they collect/measure/compute:

The author of each change of each file.
The control flow graphs of each method in the program.
The change in each basic block in the control-flow graphs.
The inheritance graph of the program.
The call-graphs of the methods of the program.
The time of each change to each file.

lecture 39

No Office Hours today

Sorry, search committee meeting, if you need office consultation please e-mail me and suggest your available time(s).

Mailbag

I was wondering if you have any test suspects or programs I could use to monitor for class and methods?

Great question. unicon (~7K LOC), ui (~9K LOC), and ivib (~16K LOC) are three example OO programs that one might try to monitor, but maybe we need something smaller. Within unicon/uni/progs a couple programs are possible: deen.icn (200 LOC) is a toy German-to-English dictionary, while umake.icn (300 LOC) is a simplified variant of the "make" program.

Deen takes German words on its command line and writes out English. A sample run might look like:

$ ./deen Ich bin ein Berliner
Opened file(de-en.txt).
Reading.....................................................................................................................................................................................................
done.  Read 197771 lines
Ich: self
bin is not in the dictionary.
ein: a
Berliner: doughnut

Deen is a toy program and is a far from ideal representative of object-orientation, but it is small enough that it would be easy to use as a suspect. At least it is OO enough to have some inheritance and some aggregation going on. Monitoring the unicon compiler compiling itself, or a ui session, or an ivib session, would be a far more impressive and challenging OO demonstration.

Discussion of Overcoming Issues of 3D Software Visualization through Immersive Augmented Reality

What were the Issues of 3D Software Visualization that they wanted to overcome?
what's difficult about navigation
what's difficult about occlusion
what's difficult about selection
what's difficult about text readability
Is the hypothesis ("displaying 3d software visualizations in immersive augmented reality can help to overcome usability issues of 3D visualizations and increase their effectiveness to support software concerns") almost the same as that posed by Rudel?
What was their test of this hypothesis, and what was the outcome?
In their conclusions they assert that augmented reality provided the "highest performance to find outliers", but in the results section they state that a standard computer screen required the least time and gave the highest correctness for this task. What gives?

Techniques for Reducing the Complexity of Object-Oriented Execution Traces, by Hamou-Lhadj and Lethbridge

This paper is about filtering techniques, which makes it potentially important.
The paper's phrase "execution traces" could equally well be read as "log files" or "event sequences".
Execution traces are very large, and very redundant. The ubiquity and reliance of most algorithms on loops guarantees this will be true for most programs.
The analysis used in a software visualization should generally abstract and filter the data before it starts drawing graphics.
Figure 2 of this paper gives a toy example in which a tiny duplication is removed; in practice, scale it up many orders of magnitude.

multiplicity
    In software engineering design diagrams, multiplicity is commonly used to indicate the number of instances involved in a given association relationship.
    Might we use regular expressions to describe multiplicity in execution traces? A->B*-*>C*D
Removing "utilities"
    constructors/destructors, accessor methods, utility and library classes.
    Potentially many incoming edges, with few or no outgoing dependencies.
Polymorphic methods
    execution tree differences can be ignored when the abstract function performed is understood.

lecture 40

Mailbag

When I tried monitoring OO examples, on Windows I was unable to get them to run. On Linux they work fine...but my Linux Unicon does not do 3D.

Thank you for the screen shot. I recommend an office consultation to look at your 3D issues. I may be able to get things to run on Windows. Zoom is a good way to do an appointment, if you can't bring the hardware to my office.

In the fifth example of the HW4's you showed in class one of the students drew text in a 3d environment quite well. That didn't appear to be a texture, but instead a sole graphic. If you have time could you tell me how this student went about displaying this text? I would like to use it for my final project.

Sure, let's go look at those.

Dr. J Status Report

I have been tied up a lot of the past two weeks writing the final version of a conference paper based on the literature survey I did for this course.
The good part about writing a final version is, I found many more software city visualizations than I knew about in my first pass in January.
Another good part is that I got acquainted by e-mail with a lot of the people who have done important work in this area.
The downside is that I haven't had free time to work on the next steps in my implementation.
Remainder of today's class, and Friday's, looks at some of these papers.

Questions Regarding Final Exam Project Demos

How are you going to spend your 10 minutes?
Tell a story. Show pictures.
Evaluate scalability: does it run on larger examples, or only toys?
Analyze the events before you start kicking out graphic primitives.
If possible, ascertain how much time is spent on the monitoring/analysis versus how much time the graphic rendering costs. Balance would be good.

Simultaneous Visual Analysis of Multiple Software Hierarchies

This paper appeared in the 2018 Working Conference on Software Visualization.

More Research Papers?

Some papers that I didn't have on our reading list. Discovered while preparing final copy of a literature survey on software cities.

Interactive Views to Improve the Comprehension of UML Models - An Experimental Validation, by Lange and Chaudron. 15th International Conference on Program Comprehension, ICPC'07.
2D and 3D Visualization of AspectJ Programs, by Bentrad and Meslati. In Proceedings of the 10th International Symposium on Programming and Systems (ISPS), pp. 183-190, IEEE Computer Society Press, 2011.
An empirical study assessing the effect of SeeIT 3D on comprehension by B. Sharif, G. Jetty, J. Aponte, and E. Parra, in Proc. of VISSOFT. IEEE, 2013, pp. 1-10.

lecture 41

Mailbag

I was the author of the code that had the string implementation that was requested. You have my permission to share the code. How I implemented that was from one of your examples shared in class where you started by opening a 2D window.

Thanks for your permission. Code presented below is from your HW#4; I have not checked if you changed anything from what I gave earlier.

I am currently done making my buildings in my semester project but I wanted to add some detail to the city I'm trying to build. How would you recommend me to approach making a road or a ground surface so my building do not look like they are floating?

For my city, I took a single big 2D image and used it as a texture for a single rectangular ground surface. Since my area was large, this stretches out the pixels enormously. It would be possible to either (a) use an image that repeats many times in both the x and z dimensions so that it doesn't look pixelated, by using texture coordinates > 1.0, or (b) plot a non-flat ground surface, if you preferred, perhaps using a 2D matrix whose values are the "y" values at the various x,z locations around your ground surface.

Fonts from the Fifth HW#4 Example

Well, there is this bit. It depends on a texture already set, and a twidths table already initialized.

# Code from Dr. Jeffery's text.icn example
#
procedure myDrawString(x,y,z,s)
   WAttrib("texmode=on")
   every c := !s do {
      i := ord(c)
      row := i/16
      col := i%16
      ht := 20.5
      wd := 20.5 * real(twidths[c]) / 32
      u1 := col*32.0/512
      v1 := 1.0-(row+1)*32.0/512
      u2 := col*32.0/512
      v2 := 1.0-row*32.0/512
      u3 := (col+(wd/ht))*32.0/512
      v3 := 1.0-row*32.0/512
      u4 := (col+wd/ht)*32.0/512
      v4 := 1.0-(row+1)*32.0/512
      Texcoord(u1,v1, u2,v2, u3,v3, u4,v4)
      DrawPolygon(x-wd/2,y-ht/2,z, x-wd/2,y+ht/2,z, x+wd/2,y+ht/2,z, x+wd/2,y-ht/2,z)
      x +:= wd + 0.1
      }
end

The initialization code was found in main()

&window := open("win","g","size=512,512", "font=sans,32,bold", "canvas=hidden") # 2D window is hidden
#### Code from Dr. Jeffery's text.icn example to draw strings #####
asc := WAttrib("ascent")
every i := 1 to 16 do
   every j := 1 to 16 do {
      DrawString((j-1)*32, (i-1)*32+asc, char((i-1)*16+(j-1)))
      }
twidths := table()
every i := 0 to 255 do twidths[char(i)] := TextWidth(char(i))
wfont := &window
&window:= open("HW4", "gl", "size="||size)
WAttrib("texmode=on")
Texture(&window, wfont)

Brief Discussion of Texture Tiling

Mostly review, I would guess.

Textures are 2D images whose sizes are powers of two. E.g. you might have a 512x1024 pixel image.
Textures are always rectangles, even though in 3D, you always render triangles.
texture coordinates are Cartesian and "normalized"
(0.0, 0.0) is the lower left of the texture, (1.0, 1.0) is upper right
(x,y,z) vertices in 3D space map to (u,v) points in normalized texture space
usually this is, for any triangle in 3-space, extracting a triangle slice of the 2d image
common to pack many triangular textures into one big texture image
also common to repeat a texture many times to fill a large triangle
texture coordinates > 1.0 say to repeat texture
by default this will have seams at edges
some textures are inherently seamy; on others seams are ugly/unnatural
techniques exist to modify a texture so it is seamless, good for e.g. grass, clouds, etc.

Mini-example.
In CVE, we have carpeting and flooring and walls. If we tried to use textures that cover the entire area, we would either be far too low-resolution, or use far too much texture memory.
We need high resolution textures that can repeat
For an arbitrary space to be textured, how many times should I repeat the texture?
Measure/estimate/record real-world size of NxM pixel image.
In CVE, in the textures directory we placed a mini-database of the textures' real-world sizes. I suppose I should convert to JSON:

floor_1.jpg {
name floor1
real_world_x .4
real_world_y .4
}

Divide real-world size of space to be textured (i.e. x,y,z world coordinates of vertices) by real world size of image. Result is (u,v) texture coordinates saying how many times to tile
For the JEB tile, we estimated it as 0.4x0.4 (a little less than half a meter). You would tile it 2.5 times in each dimension to fill 1 square meter.
For the JEB 2nd floor corridor outside my office, we measured 21.1x3.4 meters. The (u,v) is (52.75,8.5). The four texture coordinates might be (0.0, 0.0), (0.0, 8.5), (52.75,8.5), (52.75,0.0).
Vertex order matters. It will look crazy if (x,y,z) vertices are not given in same order as (u,v) texture coordinates. Easy to get things flipped, skewed, etc.
In my city, I tossed in some building textures real fast, but didn't supply texture coordinates? So my buildings did not know how to tile last time I showed them to you. Maybe by next Friday, they will. :-)

Visualizing Live Software Systems in 3D by Greevy, Lanza, Wysseier (SOFTVIS 2006)

From the same group that gave us CodeCity (and preceding that paper!), this paper gives me great hope of addressing some of the issues that I am passionate about, regarding the visualization of static+dynamic information.

"feature-centric reverse engineering"
    you know, captured traces of selected runtime behavior. Like as if you used an Event Mask to only ask for features of interest.
how static source artifacts contribute to runtime behavior
    the connection of statics to dynamics is a central task
"feature trace"
    a record of the steps a program takes during execution of a feature
"feature"
    user-triggerable functionality of a software system
which parts of the code are active during the execution of a feature?
    what's instantiated and how objects collaborate on a feature
what patterns of activity are common across features?
    alleged to give insights into the architectural structure of the system
what activities are specific to one feature?

The Greevy approach:

apply static analysis, extract a static model
instrument the code
execute code to obtain traces ("trees of method calls") of feature executions.
resolve/bind/connect trace events back to static model

Trace summarization may eliminate details that provide valuable insights!

Visualization is Static class hierarchy + "towers of communicating instances". (Sounds very similar to SynchroVis, which came after).

5 Dimensions of Interest of Software Visualization (Maletic):

Task. Why is the visualization needed?
Audience. Who will use the visualization?
Target: What low level aspects are visualized?
Representation: What best conveys the target information to users?
Medium: where are the visualizations rendered?

KScope: A Modularized Tool for 3D Visualization of Object-Oriented Programs by Davis, Pestka, and Kaplan (VISSOFT 2003)

KScope

compare "reverse engineering" of standard UML (left) with KScope visualization (right)
there is a class under study (multicolored cube)
cube vs. pyramid for class vs. interface
dark blue == "terminator class" (library class)
line color (red=association, blue=dependency, magenta=composition, black=implementation, green=inheritance, yellow=interface inheritance)
click things for info detail
BCEL: Byte Code Engineering Library, a Java thing from Apache.
Perhaps subsumed by ASM

Visualizing Memory Graphs by Zimmermann and Zeller (Dagstuhl seminar, 2001)

Who needs visualization? Programmers debugging bugs need visualization!

(gdb) print *tree
*tree = {value = 7,name = 0x8049e88 "Ada", _left = 0x804d7d8, _right = 0x0, left_thread = false, right_thread = false, date ={day_of_week = Thu, day = 1, month = 1, year = 1970, _vptr. = 0x8049f78}, static shared = 4711}

Modern GUI debuggers still mostly show these values as text. If you use a good one, you might get some depiction of pointers:

DDD (pictured above) makes you expand/follow each pointer manually.

Pro: program is in control, sets focus of what is to be displayed.
Con: wow, to display a linked list of length 100, click 100 next pointers.

A memory graph (pictured above) might in fact be a graphic depiction of an entire program state. Consider it to be a (relatively) brute force or literal depiction of memory, with pointers as arrowed edges. Given this depiction, how easy is it to answer questions like these:

are there any pointers pointing to this address?
how many elements does this data structure have?
is this allocated block reachable from within my module?
did this tree change during the last function call?

Now: what downsides or challenges can you suggest might occur with memory graphs?

How do they get these memory graphs? I think it is fair to say: painfully.
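
Once a memory graph has been extracted (the painful part, which is not shown here), the questions listed above become short graph computations. The following Python sketch assumes a made-up representation, a dictionary from block address to a list of (field name, target address) pairs with 0 standing for NULL. The representation and the function names are illustrative assumptions, not the paper's data structures.

```python
from collections import deque
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical representation: each allocated block's address maps to its
# outgoing pointers as (field name, target address) pairs; 0 means NULL.
MemoryGraph = Dict[int, List[Tuple[str, int]]]


def pointers_to(graph: MemoryGraph, address: int) -> List[Tuple[int, str]]:
    """Are there any pointers pointing to this address? Returns (source block, field)."""
    return [(src, fld)
            for src, edges in graph.items()
            for fld, dst in edges
            if dst == address]


def reachable(graph: MemoryGraph, roots: Iterable[int]) -> Set[int]:
    """Blocks reachable from the roots, found by breadth-first search."""
    seen: Set[int] = set()
    queue = deque(r for r in roots if r in graph)
    while queue:
        block = queue.popleft()
        if block in seen:
            continue
        seen.add(block)
        for _, dst in graph[block]:
            if dst in graph and dst not in seen:
                queue.append(dst)
    return seen


def changed_blocks(before: MemoryGraph, after: MemoryGraph) -> Set[int]:
    """Did this structure change? Blocks added, removed, or with different pointers."""
    return {addr for addr in before.keys() | after.keys()
            if before.get(addr) != after.get(addr)}
```

With these, "how many elements does this data structure have?" is len(reachable(graph, [root])), and "is this allocated block reachable from within my module?" is a membership test on reachable(graph, module_roots), where the roots would be the module's global and stack variables.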