Tree Structure
Detection and Visualization


Michael Chase Brazell and Clinton L. Jeffery
August 6, 1997
Technical Report CS-97-7







Abstract


Trees are a common data structure used in many programs. Monitoring tree growth and changes in structure can be difficult, especially when trees become large. This report describes a mechanism for automatic detection of tree data structures in ANSI C programs and subsequent visualization of the dynamic tree structure using a 3-dimensional graphics display.






Division of Computer Science
The University of Texas at San Antonio
San Antonio, TX 78249









Introduction

In order to understand or debug a program that utilizes tree data structures, it is often necessary to determine the structural dynamics of the tree while the program is executing. This may involve printing the contents of nodes with a tabular layout to distinguish the levels of the tree. For larger trees, such textual representations are not easily read and may be inadequate for the programmer to visualize the tree mentally. Graphical depictions of trees are more useful for program understanding and debugging, but often they are not included in applications or source-level debuggers.

This report offers an alternative approach to printfs and tabular displays. The methodology described consists of monitoring the target program during its execution, detecting the allocation and access patterns of tree structures, and displaying the resulting trees using a 3-dimensional, graphical display. CTV, C Tree Visualizer, is a tree visualization tool that works on ANSI C programs using various tree representations. The technique that CTV employs requires no modification to the target program source code.

Alamo--A Framework for Execution Monitoring

The detection of tree data types and structure dynamics is facilitated by monitoring the execution of the target program. Alamo, A Lightweight Architecture for MOnitoring, is employed for this task [2]. Alamo provides a framework for monitoring the execution of ANSI C programs [1]. The framework consists of three major components: a Configurable C Instrumentation (CCI) tool, the Alamo Monitor Executive (AME), and run-time support libraries for target program access.

Before a program can be monitored by Alamo, it must be instrumented to generate events--individual units of program behavior such as memory references and heap allocations [2]. Instrumentation of the target program to generate events is performed prior to compilation by the framework component called CCI. CCI is a preprocessor that performs a full front-end compiler analysis (lexical, syntax and semantic) for ANSI C, and generates instrumented C which is then compiled into relocatable object file format.

The framework component called AME provides an execution environment in which program execution monitors can observe a C program's execution [1]. AME dynamically loads monitors into the same address space as the target program under observation. AME allows monitors to obtain event information reported by the instrumentation and provides access to the target program's state and variable information through run-time support libraries.

With AME, execution monitors use descriptors to reference target program variables. Descriptors are defined as:
	typedef struct {
		struct ctype *type;
		void *addr;
	} Desc;
	
type is a pointer to a structure that contains type information. addr is a generic pointer that holds the variable's memory location. Service functions are provided by AME that extract information for a given descriptor, such as variable type and size.

Events

During the target program's execution, events are generated which describe run-time behavior. The events are reported to the execution monitor for processing.

For example,
        tree *root;
        root = (tree *)malloc(sizeof(tree));	
would generate the events:
        E_Refp          [pointer reference]
        E_Setp          [pointer being assigned]
        E_Fcall         [function call]
        E_Fparam        [function parameter]
        E_Fret          [function return]
        E_Assignp       [pointer assignment]
	
Events have associated event values [1]. The value of the event depends upon the type of event generated. For example, the event value of E_Refp, the event indicating pointer reference, is the value of the memory address of the pointer being referenced. The event value of E_Assignp, the event indicating pointer assignment, is the value being assigned.

Detection of Tree Operations

Each type of tree-related activity such as node allocations and node references are indicated by a specific series of events. Because other non-tree behavior produces similar event sequences, a finite state machine (FSM) is used to track series of events and detect tree operations. A push-down FSM uses a stack, which is useful for saving states and coming back to them later (e.g., push on a function call, pop on a function return). A push-down FSM is implemented in the function handle_Event() to process the events and determine whether or not the target program activity is of interest. Figure 1 shows the state diagram for the FSM.

To illustrate using the example of events in the previous section, handle_Event() would first be called with the E_Refp event and its event value, the address of the pointer being referenced. Using the AME access function EvStab(), a descriptor for the variable is obtained. The descriptor is then tested by isRecursiveType() (described in the next section) to determine if it is a recursive structure. If it is, the FSM goes to state 1. Upon seeing the E_Setp event, the FSM goes to state 2. The E_Fcall event causes a push of the FSM, and the state is reset to 0. When the E_Fret event is encountered, the FSM is popped, and the state is restored to 2. On the E_Assignp event, the value for the variable is available, and the variable is stored as a root in the tree table (described in a later section). At this point, the FSM is reset to state 0, and the process continues.

A similar operation occurs for child nodes. States 3 and 4 of the FSM are used for child node references.

To handle deallocation of nodes, a flag is used to indicate a call to free(). On the E_Fparam event, if the FSM has seen a recursive structure (state 1 or 3), and the free_called flag is set, the node is removed from the tree table.



Figure 1: state diagram for the finite state machine used in handle_Event()


Detection of Recursive Structures

The function isRecursiveType() determines if a descriptor refers to a recursively-defined structure. If the descriptor's variable is a structure type that contains a self-reference, the descriptor is considered to be recursively defined. Specifically, if any element of the structure or any nested structure refers to the outermost structure, the structure is considered to be of a recursive type.

isRecursiveType() returns true for standard recursively-defined structures declared like:
        struct tree {
               ...
               struct tree *left, *right;
        };
	
and:
        struct tree {
               ...
               struct tree *children[MAX];
        };
	
as well as heterogeneous recursively-defined structures declared like:
        struct node {
               ...
               struct nodelist *children;
        };

        struct nodelist {
               ...
               struct node *n;
               struct nodelist *next;
        };
	
Note that not all recursively-defined structures are trees. For example, DAGs and cyclic graphs are recursively-defined structures, but are not trees. Currently, CTV considers any recursively-defined structure to be a tree. As discussed in the Future Work section of this report, it would be desirable to "weed out" non-tree structures, and this is a subject for future development.

Storage of "Mirror" Tree

To keep track of detected trees and their structures, CTV stores "mirror" images of the trees. A hash table is used for this purpose. Each node has an entry in the hash table. A list of roots is also maintained for efficiency of visualization. An entry is defined as:
        typedef struct entry_t {
                Desc var;
                int addr;
                int numkids;
                position pos;
                struct entry_t *parent;
                struct entry_t *kids[MAX_KIDS];
                struct entry_t *last;
                struct entry_t *next;
        } entry;
        
var is a descriptor that holds the type and memory address of the value of the node (what the node points to). addr holds the memory address of the actual node pointer. numkids is the current number of children the node possesses. pos is used for visualization and stores the node's position in 3D coordinates. parent is a pointer to the entry containing this node's parent (if it exists). kids is an array of pointers to this node's children. last and next are used for hash table operations. Entries are hashed using the addr field of the var descriptor.

Several functions are employed for the insertion and deletion of nodes within the hash table:
        entry *insertRoot(table *t, Desc node, int addr);
        
Creates an entry for a root node whose address is addr and whose value is referenced by the descriptor node; inserts the entry into table t, and returns a pointer to the newly created entry.
        entry *insertNode(table *t, Desc parent, Desc child, int index, int addr);
        
Creates an entry for an internal node whose address is addr and whose value is referenced by the descriptor child; parent is the descriptor referencing the parent node; if index is positive, this refers to the index of the child in the parent's list of child pointers being assigned to; the entry for parent is obtained and connected with the newly created entry for child; inserts the new entry into table t, and returns a pointer to the entry.
        void insertEntry(table *t, Desc parent, entry *child, int index, int addr);
        
Connects an existing entry, child, with the entry in table t that holds the descriptor for parent; if index is positive, this refers to the index of the child in the parent's list of child pointers being assigned to; addr is the address being assigned to.
        void removeRoot(table *t, Desc root);
        
Removes from table t the entry that holds the descriptor for root.
        void removeNode(table *t, Desc parent, Desc child, int index);
        
Removes from table t the entry that holds the descriptor for child; disconnects the parent entry from the child entry; if index is positive, this refers to the index of the child in the parent's list of child pointers.

Visualization Using OpenGL

Once a node is inserted into the tree table, it can be visualized. The graphics library used for the visualization is OpenGL.

OpenGL, or Open Graphics Library, is a hardware-independent graphics library that produces high-quality color images of 3D objects [3]. It is available on many window systems, including X, Windows, and Windows NT, and because of this broad availability, it is a sensible choice for graphics programming.

CTV makes use of the OpenGL aux library for creating and displaying the tree window, as well as for rendering the trees. Trees are displayed using spheres for nodes and cylinders for connections between nodes and children. Roots are centered on the y-axis, and multiple trees are rendered along the y-axis, separated by empty space. The nodes of different trees are colored differently to further distinguish them.

To display a tree, the tree window is first created. This is accomplished with a call to:
        void initTreeWindow(void);
	
This function creates, positions, and displays an OpenGL window, and sets up the lights in the render area. With the window now created, trees can be displayed. To display all trees within a tree table, the function
        void drawTrees(table *t);
	
is called. This function walks through the rootlist of the table t, drawing each tree encountered.

The first call to drawTrees() occurs when the first node is inserted into the tree table. After which, any changes to the tree table causes a redraw to update the graphics display. When the target program has completed execution, the tree window continues to display the final trees. The trees are rotated about the y-axis continually to give a complete view of all sides.

Examples

The examples shown below illustrate the final output of CTV after monitoring various sample programs.

Figure 2 shows the result of the following program:
        #define mknode()        (tree *)calloc(1, sizeof(tree))

        typedef struct node_t {
                struct node_t *kids[3];
        } node;

        void main(void)
        {
           node *root;

           root = mknode();
           root->kids[0] = mknode();
           root->kids[1] = mknode();
           root->kids[2] = mknode();
        }
	


Figure 2: result of simple program


Figure 3 shows a display of multiple trees, demonstrating the use of different colors to distinguish distinct trees.



Figure 3: display of multiple trees


Figure 4 shows the result of the following program which utilizes a heterogeneous type tree:
        #define mknode()        (node *)calloc(1, sizeof(node))
        #define mklist()        (list *)calloc(1, sizeof(list))

        typedef struct nodelist_t {
                struct node_t *n;
                struct nodelist_t *next;
        } list;

        typedef struct node_t {
                struct nodelist_t *kids;
        } node;


        void main(void)
        {
           node *root;

           root = mknode();
           root->kids = mklist();
           root->kids->n = mknode();
           root->kids->next = mklist();
           root->kids->next->n = mknode();
        }
	


Figure 4: result of program utilizing heterogeneous type tree


Figure 5 shows the output for a more complex tree.



Figure 5: a more complex tree


Future Work

The tree detection and visualization tool was motivated partly by a desire to create a debugging tool for programs utilizing trees. To this end, it would be desirable to be able to get information about the nodes being displayed. Such information would include node structure contents, and could be obtained by "clicking" on the node in the display window.

As mentioned before, CTV currently considers all recursively-defined structures to be trees. This poses a problem when a program utilizes a recursive data type such as a DAG. Because CTV was created to visualize trees, its behavior is undefined with non-tree, recursively-defined structures. A future endeavor would include dynamic analysis of data types to ensure that only trees are monitored and visualized.

This implementation of CTV uses a simple and naive algorithm to assign positions to tree nodes. In some cases, two nodes may overlap. This limitation could be eliminated by the use of a good tree-drawing algorithm, such as cone tree [4] or fisheye [5].

Further enhancements could include user options such as types of trees to display and extent of depth to display.

References

[1] Wenyi Zhou, "Implementation of the Alamo Monitor Executive", Master's Thesis, Division of Computer Science, University of Texas at San Antonio, 1996.

[2] Clinton Jeffery, Wenyi Zhou, and Kevin Templer, "A Lightweight Architecture for Program Execution Monitoring", Technical Report 96-7, Division of Computer Science, University of Texas at San Antonio, 1996.

[3] Jackie Neider, Tom Davis, and Mason Woo, OpenGL Programming Guide, Addison-Wesley Publishing Company, 1993.

[4] George G. Robertson, Jock D. MacKinlay, and Stuart K. Card, "Cone Trees: Animated 3D Visualizations of Hierarchical Information", Proceedings of CHI '91, New Orleans, April, 1991, pp. 189-194.

[5] Anthony G. Jones, Khahn K. Mai, Niem Tang, and Clinton L. Jeffery, "Visualization Techniques for Large Trees", Technical Report 95-4, Division of Computer Science, University of Texas at San Antonio, 1995.