Tree Structure
Detection and Visualization
Michael Chase Brazell and Clinton L. Jeffery
August 6, 1997
Technical Report CS-97-7
Abstract
Trees are a common data structure used in many programs. Monitoring tree
growth and changes in structure can be difficult, especially when trees become
large. This report describes a mechanism for automatic detection of tree
data structures in ANSI C programs and subsequent visualization of the
dynamic tree structure using a 3-dimensional graphics display.
Division of Computer Science
The University of Texas at San Antonio
San Antonio, TX 78249
Introduction
In order to understand or debug a program that utilizes tree data
structures, it is often necessary to determine the structural dynamics of
the tree while the program is executing. This may involve printing the
contents of nodes with a tabular layout to distinguish the levels of the
tree. For larger trees, such textual representations are not easily read
and may be inadequate for the programmer to visualize the tree mentally.
Graphical depictions of trees are more useful for program understanding and
debugging, but often they are not included in applications or source-level
debuggers.
This report offers an alternative approach to printf
s and
tabular displays.
The methodology described consists of monitoring the target program
during its execution, detecting the allocation and access patterns of tree
structures, and displaying the resulting trees using a 3-dimensional,
graphical display. CTV, C Tree Visualizer, is a tree visualization tool that
works on ANSI C programs using various tree representations. The technique
that CTV employs requires no modification to the target program source code.
Alamo--A Framework for Execution Monitoring
The detection of tree data types and structure dynamics is facilitated
by monitoring the execution of the target program. Alamo, A
Lightweight Architecture for MOnitoring, is employed
for this task [2]. Alamo provides
a framework for monitoring the execution of ANSI C programs [1]. The framework
consists of three major components: a Configurable C Instrumentation (CCI)
tool, the Alamo Monitor Executive (AME), and run-time support libraries for
target program access.
Before a program can be monitored by Alamo, it must be instrumented to
generate events--individual units of program behavior such as memory
references and heap allocations [2]. Instrumentation of the target program
to generate events is performed prior to compilation by the framework
component called CCI. CCI is a preprocessor that performs a full front-end
compiler analysis (lexical, syntax and semantic)
for ANSI C, and generates instrumented C which is then compiled into
relocatable object file format.
The framework component called AME provides an execution environment in
which program execution monitors can observe a C program's execution [1].
AME dynamically loads monitors into the same address space as the target
program under observation. AME allows monitors to obtain event information
reported by the instrumentation and provides access to the target program's
state and variable information through run-time support libraries.
With AME, execution monitors use descriptors to reference target program
variables. Descriptors are defined as:
typedef struct {
struct ctype *type;
void *addr;
} Desc;
type is a pointer to a structure that contains type information.
addr is a generic pointer that holds the variable's memory location.
Service functions are provided by AME that extract information for a given
descriptor, such as variable type and size.
Events
During the target program's execution, events are generated which describe
run-time behavior. The events are reported to the execution
monitor for processing.
For example,
tree *root;
root = (tree *)malloc(sizeof(tree));
would generate the events:
E_Refp [pointer reference]
E_Setp [pointer being assigned]
E_Fcall [function call]
E_Fparam [function parameter]
E_Fret [function return]
E_Assignp [pointer assignment]
Events have associated event values [1]. The value of the event depends
upon the type of event generated. For example, the event value of E_Refp,
the event indicating pointer reference, is the value of the memory address
of the pointer being referenced. The event value of E_Assignp, the event
indicating pointer assignment, is the value being assigned.
Detection of Tree Operations
Each type of tree-related activity such as node allocations and node references
are indicated by a specific series of events. Because other non-tree behavior
produces similar event sequences, a finite state machine (FSM) is used to
track series of events and detect tree operations. A push-down FSM uses a
stack, which is useful for saving states and coming back to them later
(e.g., push on a function call, pop on a function return).
A push-down FSM is implemented in the function
handle_Event() to process the events and determine whether or not the
target program activity is of interest. Figure 1 shows the state diagram for
the FSM.
To illustrate using the example of events in the previous section,
handle_Event() would first be called
with the E_Refp event and its event value, the address of the pointer being
referenced. Using the AME access function EvStab(), a descriptor for the
variable is obtained. The descriptor is then tested by isRecursiveType()
(described in the next section) to determine if it is a recursive structure.
If it is, the FSM goes to state 1. Upon seeing the E_Setp event, the FSM
goes to state 2. The E_Fcall event causes a push of the FSM, and the state
is reset to 0. When the E_Fret event is encountered, the FSM is popped, and
the state is restored to 2. On the E_Assignp event, the value for the
variable is available, and the variable is stored as a root in the tree
table (described in a later section). At this point, the FSM is reset to
state 0, and the process continues.
A similar operation occurs for child nodes. States 3 and 4 of the FSM
are used for child node references.
To handle deallocation of nodes, a flag is used to indicate a call to free().
On the E_Fparam event, if the FSM has seen a recursive structure
(state 1 or 3), and the free_called flag is set, the node is removed
from the tree table.
Figure 1: state diagram for the finite state machine used in handle_Event()
Detection of Recursive Structures
The function isRecursiveType() determines if a descriptor refers to a
recursively-defined structure. If the descriptor's variable is a structure type
that contains a self-reference, the descriptor is considered to be recursively
defined. Specifically, if any element of the structure or any nested structure
refers to the outermost structure, the structure is considered to be of a
recursive type.
isRecursiveType() returns true for standard recursively-defined structures
declared like:
struct tree {
...
struct tree *left, *right;
};
and:
struct tree {
...
struct tree *children[MAX];
};
as well as heterogeneous recursively-defined structures declared like:
struct node {
...
struct nodelist *children;
};
struct nodelist {
...
struct node *n;
struct nodelist *next;
};
Note that not all recursively-defined structures are trees. For example,
DAGs and cyclic graphs are recursively-defined structures, but are not trees.
Currently, CTV considers any recursively-defined structure to be a tree. As
discussed in the Future Work section of this report, it would be desirable
to "weed out" non-tree structures, and this is a subject for future
development.
Storage of "Mirror" Tree
To keep track of detected trees and their structures, CTV stores "mirror"
images of the trees. A hash table is used for this purpose. Each node
has an entry in the hash table. A list of roots is also maintained for
efficiency of visualization. An entry is defined as:
typedef struct entry_t {
Desc var;
int addr;
int numkids;
position pos;
struct entry_t *parent;
struct entry_t *kids[MAX_KIDS];
struct entry_t *last;
struct entry_t *next;
} entry;
var is a descriptor that holds the type and memory address of the
value of the node (what the node points to). addr holds the
memory address of the actual node pointer. numkids is the current
number of children the node possesses. pos is used for visualization and
stores the node's position in 3D coordinates. parent is a pointer to the
entry containing this node's parent (if it exists). kids is an array of
pointers to this node's children. last and next are used for
hash table operations. Entries are hashed using the addr field of the var
descriptor.
Several functions are employed for the insertion and deletion of nodes
within the hash table:
entry *insertRoot(table *t, Desc node, int addr);
Creates an entry for a root node whose address is addr and whose value
is referenced by the descriptor node; inserts the entry into table
t, and returns a pointer to the newly created entry.
entry *insertNode(table *t, Desc parent, Desc child, int index, int addr);
Creates an entry for an internal node whose address is addr and whose
value is referenced by the descriptor child; parent is the
descriptor referencing the parent node; if index is positive, this
refers to the index of the child in the parent's list of child pointers being
assigned to; the entry for parent is obtained and connected with the newly
created entry for child; inserts the new entry into table t,
and returns a pointer to the entry.
void insertEntry(table *t, Desc parent, entry *child, int index, int addr);
Connects an existing entry, child, with the entry in table t
that holds the descriptor for parent; if index is positive,
this refers to the index of the child in the parent's list of
child pointers being assigned to; addr is the address being assigned to.
void removeRoot(table *t, Desc root);
Removes from table t the entry that holds the descriptor for
root.
void removeNode(table *t, Desc parent, Desc child, int index);
Removes from table t the entry that holds the descriptor for
child; disconnects the parent entry from the child entry;
if index is positive, this refers to the index of the child in the
parent's list of child pointers.
Visualization Using OpenGL
Once a node is inserted into the tree table, it can be visualized. The
graphics library used for the visualization is OpenGL.
OpenGL, or Open Graphics Library, is a hardware-independent graphics library
that produces high-quality color images of 3D objects [3]. It is available on many window systems, including X, Windows, and Windows NT, and because of this broad availability, it is a sensible choice for graphics programming.
CTV makes use of the OpenGL aux library for creating and displaying the tree
window, as well as for rendering the trees. Trees are displayed using spheres
for nodes and cylinders for connections between nodes and children. Roots are
centered on the y-axis, and multiple trees are rendered along the y-axis,
separated by empty space. The nodes of different trees are colored
differently to further distinguish them.
To display a tree, the tree window is first created. This is accomplished
with a call to:
void initTreeWindow(void);
This function creates, positions, and displays an OpenGL window, and sets up
the lights in the render area. With the window now created, trees can be
displayed. To display all trees within a tree table, the function
void drawTrees(table *t);
is called. This function walks through the rootlist of the table t,
drawing each tree encountered.
The first call to drawTrees() occurs when the first node is inserted into
the tree table. After which, any changes to the tree table causes a redraw
to update the graphics display. When the target program has completed
execution, the tree window continues to display the final trees. The trees
are rotated about the y-axis continually to give a complete view of all sides.
Examples
The examples shown below illustrate the final output of CTV after monitoring
various sample programs.
Figure 2 shows the result of the following program:
#define mknode() (tree *)calloc(1, sizeof(tree))
typedef struct node_t {
struct node_t *kids[3];
} node;
void main(void)
{
node *root;
root = mknode();
root->kids[0] = mknode();
root->kids[1] = mknode();
root->kids[2] = mknode();
}
Figure 2: result of simple program
Figure 3 shows a display of multiple trees, demonstrating the use
of different colors to distinguish distinct trees.
Figure 3: display of multiple trees
Figure 4 shows the result of the following program which utilizes a
heterogeneous type tree:
#define mknode() (node *)calloc(1, sizeof(node))
#define mklist() (list *)calloc(1, sizeof(list))
typedef struct nodelist_t {
struct node_t *n;
struct nodelist_t *next;
} list;
typedef struct node_t {
struct nodelist_t *kids;
} node;
void main(void)
{
node *root;
root = mknode();
root->kids = mklist();
root->kids->n = mknode();
root->kids->next = mklist();
root->kids->next->n = mknode();
}
Figure 4: result of program utilizing heterogeneous type tree
Figure 5 shows the output for a more complex tree.
Figure 5: a more complex tree
Future Work
The tree detection and visualization tool was motivated partly by a desire
to create a debugging tool for programs utilizing trees. To this end, it
would be desirable to be able to get information about the nodes being
displayed. Such information would include node structure contents, and
could be obtained by "clicking" on the node in the display window.
As mentioned before, CTV currently considers all recursively-defined
structures to be trees. This poses a problem when a program utilizes a
recursive data type such as a DAG. Because CTV was created to visualize
trees, its behavior is undefined with non-tree, recursively-defined
structures. A future endeavor would include dynamic analysis of data types
to ensure that only trees are monitored and visualized.
This implementation of CTV uses a simple and naive algorithm to assign
positions to tree nodes. In some cases, two nodes may overlap. This
limitation could be eliminated by the use of a good tree-drawing
algorithm, such as cone tree [4] or fisheye [5].
Further enhancements could include user options such as types of trees
to display and extent of depth to display.
References
[1] Wenyi Zhou, "Implementation of the Alamo Monitor Executive", Master's Thesis,
Division of Computer Science, University of Texas at San Antonio, 1996.
[2] Clinton Jeffery, Wenyi Zhou, and Kevin Templer, "A Lightweight
Architecture for Program Execution Monitoring", Technical Report 96-7,
Division of Computer Science, University of Texas at San Antonio, 1996.
[3] Jackie Neider, Tom Davis, and Mason Woo, OpenGL Programming Guide,
Addison-Wesley Publishing Company, 1993.
[4] George G. Robertson, Jock D. MacKinlay, and Stuart K. Card, "Cone Trees:
Animated 3D Visualizations of Hierarchical Information", Proceedings of
CHI '91, New Orleans, April, 1991, pp. 189-194.
[5] Anthony G. Jones, Khahn K. Mai, Niem Tang, and Clinton L. Jeffery,
"Visualization Techniques for Large Trees", Technical Report 95-4,
Division of Computer Science, University of Texas at San Antonio, 1995.