CS 370 Lab #8: Three Address Instructions

Three-address code (TAC) is the basis of intermediate code generation. The instruction set, described in the lecture notes, corresponds approximately to what CISC computers can do in 1-2 instructions and what RISC computers might do in 2-4 instructions. TAC skips all issues related to addressing modes and registers, which vary radically depending on the CPU.

In this lab you will develop a C data type for working with TAC instructions. Take the directions below as guidelines, not hard rules.

Representation of TAC Instructions

Here is one possible C representation of three-address instructions (struct addr, defined further below, must be declared before struct instr in your header):
struct instr {
   int opcode;
   struct addr dest, src1, src2;
   struct instr *next;
   };
What are the opcodes? Integer codes that correspond to each instruction. How do we print them out by name, instead of number? See if you can figure out how to use the following components:

In the .h file (the #define list), then in a .c file (the opcodes array):

#define NOP 0
#define ADD 1
#define SUB 2
#define MUL 3
#define DIV 4
#define NEG 5
#define ASN 6
#define ADDR 7
#define LCONT 8
#define SCONT 9
#define GOTO 10
#define BEQ 11
#define BGR 12
#define BGREQ 13
#define BLESS 14
#define BLEQ 15
#define BNEQ 16
#define BIF 17
#define BIFN 18
#define PARM 19
#define CALL 20
#define RET 21
#define GLOB 22
#define PROC 23
#define LOCAL 24
#define LAB 25
#define END 26

char *opcodes[] = {
"NOP", /* no operation */
"ADD",
"SUB",
"MUL",
"DIV",
"NEG",
"ASN",
"ADDR",
"LCONT",
"SCONT",
"GOTO",
"BEQ",
"BGR",
"BGREQ",
"BLESS",
"BLEQ",
"BNEQ",
"BIF",
"BIFN",
"PARM",
"CALL",
"RET",
"GLOB",
"PROC",
"LOCAL",
"LAB",
"END",
};
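
One way to connect the two pieces: because each #define value equals the position of its name in the array, the opcode can be used directly as an index. The following is a minimal sketch, not the required solution. It uses only the first three opcodes to stay short, and opname() and NOPCODES are hypothetical names that do not appear in the lab materials.

```c
/* A few opcodes from the list above; the full table has 27 entries. */
#define NOP 0
#define ADD 1
#define SUB 2

/* Names in the same order as the #define values. */
static const char *const opcodes[] = { "NOP", "ADD", "SUB" };

/* Entry count computed from the array, so the bound tracks the table. */
#define NOPCODES ((int)(sizeof opcodes / sizeof opcodes[0]))

/* opname() is a hypothetical helper: returns the printable name of an
   opcode, or "???" when the number is outside the table. */
const char *opname(int opcode)
{
   if (opcode < 0 || opcode >= NOPCODES) return "???";
   return opcodes[opcode];
}
```

With this in place, printf("%s\n", opname(ADD)); prints the name rather than the number. The range check matters because an out-of-range index into the array is undefined behavior in C.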

Yeah, OK, but what are addresses? For most languages, addresses are region:offset pairs:

struct addr {
   int region;
   int offset;
   };
For our purposes there are four regions (codes 1, 2, 3, 4, for "code", "global", "stack", and "heap" respectively). Offsets are measured from the beginning of the region and are non-negative, except for stack offsets, which are relative to a "frame pointer" register and may be negative.

There is also a "pseudo-region": integer constants can be expressed immediately as region 5, with the offset holding the actual value of the constant.

For brevity, regions have easy-to-remember one-letter codes to use when printing human-readable text versions of these addresses:

code  meaning
L     "label" (the standard way to refer to code addresses)
G     "global"
S     "stack" (the only region where offsets can be negative)
H     "heap" (used for certain string and array operations)
C     "constant" (immediate mode, for integer constants)
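
The mapping from region number to letter can be sketched as a small switch. This is only an illustration under stated assumptions: the R_* constant names, regionletter(), and addrstr() are hypothetical and are not defined anywhere in the lab materials, which give only the numeric codes 1 through 5.

```c
#include <stdio.h>

struct addr {
   int region;
   int offset;
};

/* Region codes from the text. The R_* names are assumptions;
   the handout specifies only the numbers. */
#define R_CODE   1
#define R_GLOBAL 2
#define R_STACK  3
#define R_HEAP   4
#define R_CONST  5   /* pseudo-region: offset holds the constant's value */

/* regionletter() is a hypothetical helper: region code to one-letter name. */
char regionletter(int region)
{
   switch (region) {
   case R_CODE:   return 'L';
   case R_GLOBAL: return 'G';
   case R_STACK:  return 'S';
   case R_HEAP:   return 'H';
   case R_CONST:  return 'C';
   default:       return '?';   /* unknown region, including NOTUSED's -1 */
   }
}

/* addrstr() is a hypothetical helper: writes the address as letter:offset
   into buf (at most n bytes, always terminated) and returns buf. */
char *addrstr(struct addr a, char *buf, size_t n)
{
   snprintf(buf, n, "%c:%d", regionletter(a.region), a.offset);
   return buf;
}
```

For instance, an address with region 3 and offset -8 would be formatted as S:-8, and a constant 5 (region 5, offset 5) as C:5.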

Operations on TAC Instruction Lists

Sequences of TAC instructions form linked lists, so we need linked-list operations.

  • Allocate one instruction:
    struct instr *gen(int o, struct addr d, struct addr s1, struct addr s2)
    {
       /* malloc() requires #include <stdlib.h> in instr.c */
       struct instr *p = (struct instr *)malloc(sizeof (struct instr));
       if (p == NULL) return NULL;
       p->opcode = o;
       p->dest = d;
       p->src1 = s1;
       p->src2 = s2;
       p->next = NULL;
       return p;
    }
    
    For opcodes that do not use one or more of the operands, define a global variable:
    struct addr NOTUSED = {-1,-1};
    
    Your code can then pass NOTUSED to gen() for each address that the opcode does not use.
  • Concatenate two lists:
    struct instr *cat(struct instr *l1, struct instr *l2)
    {
       struct instr *p = l1;
       if (l1 == NULL) return l2;   /* empty first list: the result is just l2 */
       while (p->next != NULL) p=p->next;
       p->next = l2;
       return l1;
    }
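
The two operations above are typically used together: start with an empty list and append each newly generated instruction. Here is a minimal sketch of that pattern, assuming region codes 2, 3, and 5 for global, stack, and constant as described earlier. build_example() and freelist() are hypothetical helpers added for illustration, and cat() here includes a guard for an empty first list.

```c
#include <stdlib.h>

struct addr { int region; int offset; };

struct instr {
   int opcode;
   struct addr dest, src1, src2;
   struct instr *next;
};

#define NEG 5
#define ASN 6

struct addr NOTUSED = {-1, -1};

/* gen() as given in the text. */
struct instr *gen(int o, struct addr d, struct addr s1, struct addr s2)
{
   struct instr *p = malloc(sizeof *p);
   if (p == NULL) return NULL;
   p->opcode = o;
   p->dest = d;
   p->src1 = s1;
   p->src2 = s2;
   p->next = NULL;
   return p;
}

/* cat() as in the text, with a guard so an empty first list is accepted. */
struct instr *cat(struct instr *l1, struct instr *l2)
{
   struct instr *p = l1;
   if (l1 == NULL) return l2;
   while (p->next != NULL) p = p->next;
   p->next = l2;
   return l1;
}

/* build_example() is a hypothetical helper that builds two instructions:
   ASN S:-8 C:3, then NEG G:16 S:-8. */
struct instr *build_example(void)
{
   struct addr s8  = {3, -8};   /* region 3 = stack, relative to frame pointer */
   struct addr g16 = {2, 16};   /* region 2 = global */
   struct addr c3  = {5, 3};    /* region 5 = constant, offset is the value */
   struct instr *code = NULL;

   code = cat(code, gen(ASN, s8, c3, NOTUSED));    /* ASN has no second source */
   code = cat(code, gen(NEG, g16, s8, NOTUSED));   /* NEG has no second source */
   return code;
}

/* freelist() is a hypothetical helper that releases every node. */
void freelist(struct instr *p)
{
   while (p != NULL) {
      struct instr *next = p->next;
      free(p);
      p = next;
   }
}
```

Note that each cat() call walks the whole first list, so building a long list this way takes quadratic time. For a lab-sized example that cost is negligible; keeping a tail pointer would be one alternative if it ever mattered.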
    

Lab Exercises

Do these by the start of the next lab.
  1. Create files instr.h and instr.c with the code given above. Put structures, #defines, extern references to globals, and function prototypes in the .h file. Put actual functions and global variables in the .c file.
  2. Write a function tacprint(struct instr *tac) that prints out an entire linked list of three-address instructions, one per line. Use the opcodes array to print opcodes by human-readable name. Print addresses in the format R:offset, where R is a letter naming the region and offset is an integer. For example: G:32 for global region offset 32.
  3. Test your three-address instructions by writing a toy main() function that builds up a list of three instructions and then calls tacprint() to produce the output:
      	ASN G:4 C:5
      	MUL G:8 G:4 C:7
      	ADD G:0 G:4 G:8
      
     This loads the value 5 into global memory at offset 4, multiplies the value at offset 4 by the constant 7 and stores the result (35) at global region offset 8, and then adds the values at global region offsets 4 and 8 and stores the result (40) at global region offset 0.
  4. Turn in (electronically, via the turnin.html page) your main.c and your instr.c files.