CS 370
Homework Assignment 6: Code Generation

Due: Tuesday May 2, 11:59pm

Take your 370-C semantic analyzer, and add intermediate code generation. If you are out of time and cannot do HW #7, you should write your intermediate code out in a special executable format described below.

Output from this phase should consist of a file containing intermediate code instructions. If the input was foo.c, the output file should be named foo.ic ("intermediate code"). The syntax is intended to be human readable, and readily translated into an executable format.

3CPO: 3-address C-based Partly-compiled Output

Note: if you do HW#7, you do not have to use this format for HW#6. However, you should still print out your list of 3-address instructions in a reasonable human-readable format, with opcodes written as strings and the regions of memory addresses written using a string or one-letter code. It should not be something where the TA or I have to guess what it means.
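
As one possible way to meet this requirement, the sketch below formats a single 3-address instruction as text. The struct layout, the opcode names, and the one-letter region codes (G, C, L, P) are all made up for illustration; they are not part of the assignment, so use whatever representation your compiler already has.

```c
#include <stdio.h>

/* Hypothetical regions and opcodes; extend to match your own compiler. */
enum region { R_GLOBAL, R_CONST, R_LOCAL, R_PARAM };
enum opcode { O_ADD, O_MUL, O_ASN };

struct addr  { enum region region; int offset; };
struct instr { enum opcode op; struct addr dest, src1, src2; };

static const char *opcode_name[] = { "ADD", "MUL", "ASN" };
static const char  region_code[] = { 'G', 'C', 'L', 'P' };

/* Write one instruction into buf as "OP dest, src1, src2".
 * Returns the number of characters snprintf reports. */
int format_instr(char *buf, size_t size, const struct instr *in)
{
    return snprintf(buf, size, "%s %c:%d, %c:%d, %c:%d",
                    opcode_name[in->op],
                    region_code[in->dest.region], in->dest.offset,
                    region_code[in->src1.region], in->src1.offset,
                    region_code[in->src2.region], in->src2.offset);
}
```

With a destination at local offset 0 and sources at constant offsets 0 and 4, this produces the line ADD L:0, C:0, C:4.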

The goal of 3CPO intermediate code is to be a machine-independent format that you can actually run in order to test your work. 3CPO is syntactically correct, greatly simplified C language source code. For example, after compiling hello.c your compiler should produce hello.c.c and then run the real C compiler in order to produce an executable file. You should recognize a special option -.c that stops after your compiler generates code and does not invoke the real C compiler; all other options should be passed on to the real C compiler, including -c, -S, and so forth.
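
A minimal sketch of the option handling, assuming a hypothetical helper named split_options. It only separates -.c from the arguments that go on to the real C compiler; invoking that compiler, and substituting the generated hello.c.c for the input file name, are left out.

```c
#include <string.h>

/* Hypothetical helper: scan argv, record whether "-.c" was given, and copy
 * every other argument into pass_through for the real C compiler.
 * pass_through must have room for argc - 1 pointers.
 * Returns the number of arguments copied. */
int split_options(int argc, char **argv, int *stop_after_codegen,
                  char **pass_through)
{
    int i;
    int n = 0;

    *stop_after_codegen = 0;
    for (i = 1; i < argc; i++) {
        if (strcmp(argv[i], "-.c") == 0)
            *stop_after_codegen = 1;      /* our own option: do not pass on */
        else
            pass_through[n++] = argv[i];  /* -c, -S, file names, and so on */
    }
    return n;
}
```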

No complex expressions

3CPO contains at most one assignment and one other operator per statement. For example,

memref := memref op memref

In this example memref stands for a memory reference; see the section on memory layout, below.
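
To illustrate, here is a sketch of how the source statement a = b + c * d might be split so that each statement has one assignment and one other operator. The offsets are a hypothetical layout chosen for this example, and the memory references use the cast-and-dereference form described in the memory layout section below.

```c
/* Hypothetical layout: a at 0, b at 4, c at 8, d at 12, temporary t1 at 16. */
char G1[20];

/* Source statement: a = b + c * d; */
void compute_a(void)
{
    *(int *)(G1+16) = *(int *)(G1+8) * *(int *)(G1+12);   /* t1 = c * d */
    *(int *)(G1+0) = *(int *)(G1+4) + *(int *)(G1+16);    /* a = b + t1 */
}
```

The compiler has to invent the temporary t1 and give it an offset of its own, just like a declared variable.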

No control structures

3CPO contains no if-then-else's, no switch statements, and no loops. You should use labels and go-to's instead. A conditional branch (an if statement with a goto for its "then" part, no "else" part, and at most one operator in the test expression) is allowed.
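
As a sketch, a while loop that sums the integers from 1 to n could be lowered to labels and go-to's as shown here. For readability this example uses ordinary named variables and literal constants; real 3CPO output would replace both with memory references. The function and label names are made up for the example.

```c
/* Source:
 *   i = 1; sum = 0;
 *   while (i <= n) { sum = sum + i; i = i + 1; }
 */
int sum_to(int n)
{
    int i;
    int sum;

    i = 1;
    sum = 0;
loop_top:
    if (i > n) goto loop_end;   /* conditional branch, one operator in the test */
    sum = sum + i;
    i = i + 1;
    goto loop_top;              /* back edge of the loop */
loop_end:
    return sum;
}
```

Note that the test is inverted: the branch leaves the loop when the original condition i <= n is false.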

No constants

Manifest constants should be reduced to memory references allocated from the global variable section; see below. Constants will have to be broken down into their individual byte contents and written out in appropriate byte order (which is different on Intel x86 than on Sparc, for example).
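
One way to break an integer constant into bytes is sketched here. The helper name int_to_bytes is hypothetical, and the code assumes 32-bit ints; your compiler would write the resulting bytes into the initializer for the constant's slot.

```c
/* Hypothetical helper: split a 32-bit constant into four bytes.
 * big_endian != 0 puts the most significant byte first (Sparc-style);
 * big_endian == 0 puts the least significant byte first (Intel x86-style). */
void int_to_bytes(unsigned int value, int big_endian, unsigned char out[4])
{
    int i;

    for (i = 0; i < 4; i++) {
        int shift = big_endian ? (3 - i) * 8 : i * 8;
        out[i] = (unsigned char)((value >> shift) & 0xFF);
    }
}
```

For the constant 5 this gives 0, 0, 0, 5 in big-endian order and 5, 0, 0, 0 in little-endian order.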

Memory layout

Do your own memory layout for globals and for each procedure. The layout must include the layout of structures and arrays. 3CPO code does not include the dot, arrow, or subscript operators (. -> []).
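
Since the dot and subscript operators are not available, an expression such as pts[i].y has to become explicit address arithmetic. The sketch below assumes 4-byte ints and a hypothetical array of structs; it uses named temporaries for readability, where real 3CPO output would keep them at offsets in the local variable area.

```c
/* Hypothetical source being compiled:
 *   struct point { int x; int y; };
 *   struct point pts[3];
 *   ... pts[i].y ...
 * Assumed layout: each element is 8 bytes, and field y is 4 bytes
 * into its element.
 */
int load_pts_y(char *base, int i)
{
    int t1;
    char *t2;
    char *t3;

    t1 = i * 8;          /* scale the index by the element size */
    t2 = base + t1;      /* address of pts[i] */
    t3 = t2 + 4;         /* address of pts[i].y */
    return *(int *)t3;   /* load the field */
}
```

The element size and field offset are constants that your compiler computes once, when it lays out the struct type.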
Two global variables
In 3CPO, you are allowed only two global variables, G1 and G2, which are arrays of char. G1 holds ordinary (uninitialized) variables. G2 is an initialized array and holds global variables with initial values as well as constants. Allocate all your global variables as byte-offsets and widths within these arrays. Access all base types other than char by casting the correct pointer to the appropriate type, and dereferencing it. Longword-align all ints and floats.

Example. To reference an int at offset 16 in the uninitialized global section:

   *(int *)(G1+16)
This format will be used for all memory references.
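
Offsets with longword alignment can be handed out by a small allocator like the one sketched here. The struct and function names are hypothetical, and a longword is assumed to be 4 bytes; the same bookkeeping would work for any region your compiler lays out.

```c
#include <assert.h>

/* Hypothetical bookkeeping for one memory region. */
struct region {
    int next_offset;   /* first byte not yet allocated */
};

/* Round offset up to the next multiple of alignment (a power of two). */
int align_up(int offset, int alignment)
{
    assert(alignment > 0 && (alignment & (alignment - 1)) == 0);
    return (offset + alignment - 1) & ~(alignment - 1);
}

/* Reserve width bytes at the requested alignment; return the offset. */
int region_alloc(struct region *r, int width, int alignment)
{
    int offset = align_up(r->next_offset, alignment);

    r->next_offset = offset + width;
    return offset;
}
```

For example, allocating a char, then an int, then another char from an empty region yields offsets 0, 4, and 8, with three bytes of padding after the first char.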
One Local Variable
You are allowed only one local variable per procedure, L, which is an array of char. Allocate all your local variables at byte-offsets and widths within this array. Longword-align all ints and floats.
One Parameter
You are allowed only one parameter per procedure, P, which is a pointer to char. Allocate all your parameters as byte-offsets and widths within this array. Longword-align all ints and floats.

Note that the stack space for parameters must be allocated by the caller in its own local variable area. Think of parameters as temporary variables allocated by the caller but accessible to the callee. The exception to these parameter-passing rules is that you must use ordinary calling conventions to call C library routines, such as printf. To tell the two kinds of call apart, treat all external procedures (i.e., ones for which you did not generate the code yourself) specially, and call them using normal parameter conventions.

Example

For the following program:
void p(char s[], int x);
int x;
int y = 5;
void main()
{
  int z;
  z = y + 2;
  p("hello, z is %d\n", z);
}
void p(char s[], int x)
{
  printf(s, x);
}
The 3CPO output might look like:
char G1[4];
char G2[24] = {
   0, 0, 0, 5,
   0, 0, 0, 2,
   'h', 'e', 'l', 'l', 'o', ',', ' ', 'z', ' ', 'i', 's', ' ', '%', 'd', '\n', 0,
   };
void main()
{
   char L[12];
   *(int *)(L+0) = *(int *)(G2+0) + *(int *)(G2+4);
   *(char **)(L+4) = G2+8;
   *(int *)(L+8) = *(int *)(L+0);
   p(L+4);
}
void p(char P[])
{
   printf(*(char **)(P+0), *(int *)(P+4));
}
3CPO is not pretty, but it will let you run your intermediate code. Generating real assembler code might be more educational, more challenging, and more fun.