CS 370
Homework Assignment 6: Code Generation
Due: Tuesday May 2, 11:59pm
Take your 370-C semantic analyzer and add intermediate code generation.
If you are out of time and cannot do HW #7, write your
intermediate code out in the special executable format described below.
- Use the struct type from lab 8 for lists of 3-address
instructions.
- Define a data type for "memory address". A memory address is a
<region,offset> pair, where region is one of GLOBAL, CONSTANT,
PARAMETER, or LOCAL. Offsets start at 0 in each region.
- Write out "declarations" (pseudocode instructions) for functions.
- For each variable in each symbol table, assign it a memory address.
Compute offsets assuming everything requires 4 bytes.
- Compute a synthesized attribute "location" for every expression.
Location is of type "memory address", i.e. a <region,offset> pair.
Allocate a "temporary variable" out of the LOCAL region for each
value computed by an operator or function call.
- Compute a synthesized attribute "code" that builds, for each
expression, a linked list of the 3-address instructions that compute
its value into its location.
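A minimal C sketch of the "memory address" type and of offset assignment, assuming the 4-byte simplification above. The names enum region, struct addr, alloc_slot, new_temp, and begin_function are hypothetical; your lab 8 types may look different.

```c
#include <assert.h>

/* Hypothetical names: the handout does not prescribe these. */
enum region { R_GLOBAL, R_CONSTANT, R_PARAMETER, R_LOCAL, R_NREGIONS };

/* A memory address is a <region,offset> pair. */
struct addr {
    enum region region;
    int offset;
};

/* Next free byte offset in each region; every region starts at 0. */
static int next_offset[R_NREGIONS];

/* Give a variable (or temporary) a 4-byte slot in region r. */
struct addr alloc_slot(enum region r)
{
    struct addr a;

    assert((int)r >= 0 && (int)r < R_NREGIONS);
    a.region = r;
    a.offset = next_offset[r];
    next_offset[r] += 4;        /* the handout: assume everything takes 4 bytes */
    return a;
}

/* Each operator or call result gets a fresh temporary from LOCAL. */
struct addr new_temp(void)
{
    return alloc_slot(R_LOCAL);
}

/* LOCAL and PARAMETER offsets restart for every function body. */
void begin_function(void)
{
    next_offset[R_LOCAL] = 0;
    next_offset[R_PARAMETER] = 0;
}
```

For instance, in a function that declares int z and then evaluates y + 2, z gets <LOCAL,0> and the temporary holding y + 2 gets <LOCAL,4>. Keeping one counter per region is what makes offsets start at 0 in each region.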
Output from this phase should consist of a file containing intermediate
code instructions. If the input was foo.c, the output file should be
named foo.ic ("intermediate code"). The syntax is intended to be
human readable, and readily translated into an executable format.
3CPO: 3-address C-based Partly-compiled Output
Note: if you do HW #7, you do not have to use this format for HW #6.
However, you should print out your list of 3-address instructions in a
reasonable human-readable format, with opcodes written as strings and
memory addresses' regions written as a string or a one-letter code.
The TA and I should not have to guess what anything means.
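One possible shape for the instruction list and its printer, sketched in C. The opcode set, the one-letter region codes (G, C, P, L), and the names gen, concat, format_instr, and print_list are assumptions for illustration, not lab 8's actual definitions.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical types; lab 8's struct may use different names and fields. */
enum region { R_GLOBAL, R_CONSTANT, R_PARAMETER, R_LOCAL };
struct addr { enum region region; int offset; };

enum opcode { O_ADD, O_SUB, O_MUL };

struct instr {
    enum opcode op;
    struct addr dest, src1, src2;
    struct instr *next;            /* linked list of 3-address instructions */
};

static const char *const opname[] = { "ADD", "SUB", "MUL" };
static const char regcode[] = { 'G', 'C', 'P', 'L' };   /* one letter per region */

/* Build a single instruction. */
struct instr *gen(enum opcode op, struct addr d, struct addr s1, struct addr s2)
{
    struct instr *i = malloc(sizeof *i);

    if (i == NULL) {
        perror("malloc");
        exit(EXIT_FAILURE);
    }
    i->op = op;
    i->dest = d;
    i->src1 = s1;
    i->src2 = s2;
    i->next = NULL;
    return i;
}

/* Append list b to list a; either may be empty (NULL). */
struct instr *concat(struct instr *a, struct instr *b)
{
    struct instr *p;

    if (a == NULL)
        return b;
    for (p = a; p->next != NULL; p = p->next)
        ;
    p->next = b;
    return a;
}

/* Format one instruction, e.g. "ADD L:8,G:0,C:4". */
int format_instr(char *buf, size_t n, const struct instr *i)
{
    assert(i != NULL);
    return snprintf(buf, n, "%s %c:%d,%c:%d,%c:%d", opname[i->op],
                    regcode[i->dest.region], i->dest.offset,
                    regcode[i->src1.region], i->src1.offset,
                    regcode[i->src2.region], i->src2.offset);
}

/* Print a whole list, one instruction per line. */
void print_list(FILE *f, const struct instr *i)
{
    char buf[128];

    for (; i != NULL; i = i->next) {
        format_instr(buf, sizeof buf, i);
        fprintf(f, "%s\n", buf);
    }
}
```

A list built this way prints as lines such as ADD L:8,G:0,C:4, where both the opcode and each region are spelled out rather than left as raw enum values.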
The goal of 3CPO intermediate code is to be a machine-independent format
that you can actually run in order to test your work. 3CPO is
syntactically correct, greatly simplified C language source code.
For example, when compiling hello.c, your compiler should produce
hello.c.c and then run the real C compiler on it to produce an
executable file. Your compiler should recognize a special option, -.c,
that stops after code generation and does not invoke the real C
compiler; all other options, including -c, -S, and so forth, should be
passed on to the real C compiler.
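A sketch of the option handling described above, assuming the real compiler is named cc and that any non-option argument ending in .c is a source file your compiler translates. build_cc_argv is a hypothetical helper; it only builds the argument vector and does not run anything.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAXNAME 256

/* Does s end with suffix? */
static int ends_with(const char *s, const char *suffix)
{
    size_t ls = strlen(s), lx = strlen(suffix);
    return ls >= lx && strcmp(s + ls - lx, suffix) == 0;
}

/*
 * Build the argument vector for the real C compiler.
 * Returns the number of entries placed in cc_argv (which is then
 * NULL-terminated), or -1 if "-.c" was given: stop after writing the
 * 3CPO files and do not run the real compiler.
 * names[] receives one generated file name per source file.
 */
int build_cc_argv(int argc, char *argv[], char *cc_argv[], char names[][MAXNAME])
{
    static char cc_name[] = "cc";       /* assumed name of the real compiler */
    int i, n = 0, nsrc = 0;

    assert(argc >= 1 && argv != NULL);
    for (i = 1; i < argc; i++)
        if (strcmp(argv[i], "-.c") == 0)
            return -1;

    cc_argv[n++] = cc_name;
    for (i = 1; i < argc; i++) {
        if (argv[i][0] != '-' && ends_with(argv[i], ".c")) {
            /* hello.c was translated by us into hello.c.c */
            snprintf(names[nsrc], MAXNAME, "%s.c", argv[i]);
            cc_argv[n++] = names[nsrc++];
        } else {
            cc_argv[n++] = argv[i];     /* -c, -S, -o file, ... passed through */
        }
    }
    cc_argv[n] = NULL;
    return n;
}
```

On a POSIX system the resulting vector could be handed to execvp(cc_argv[0], cc_argv) after the 3CPO files have been written; a return value of -1 means -.c was given and the driver should stop there.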
No complex expressions
3CPO contains at most one assignment and one other operator per statement.
For example,
memref = memref op memref
where memref stands for a memory reference;
see the section on memory layout, below.
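To illustrate the one-operator rule, here is the source statement a = b + c * d; hand-translated into 3CPO-style C. The byte offsets inside L are hypothetical, the function arguments stand in for values that would already be in memory, and _Alignas (C11) is an addition that keeps the int slots aligned on strict-alignment machines.

```c
#include <assert.h>

/*
 * a = b + c * d;  split so each statement has one assignment and at most
 * one other operator.  Hypothetical layout in L:
 * a@0, b@4, c@8, d@12, temporary t1@16.
 */
int one_op_per_statement(int b, int c, int d)
{
    _Alignas(int) char L[20];

    *(int *)(L+4) = b;                                   /* setup only */
    *(int *)(L+8) = c;
    *(int *)(L+12) = d;

    *(int *)(L+16) = *(int *)(L+8) * *(int *)(L+12);     /* t1 = c * d */
    *(int *)(L+0) = *(int *)(L+4) + *(int *)(L+16);      /* a = b + t1 */

    return *(int *)(L+0);
}
```

Each intermediate result (here c * d) needs its own temporary slot, which is why the "location" attribute allocates one per operator.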
No control structures
3CPO contains no if-then-else statements, no switch statements, and no
loops; use labels and gotos instead. A conditional branch is allowed: an
if statement whose "then" part is a goto, with no "else" part and at
most one operator in the test expression.
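A hand-translated sketch of a while loop in 3CPO style, using only labels, goto, and a conditional branch with one operator in its test. The layout inside L is hypothetical, and the constants 0 and 1 are written inline for brevity even though real 3CPO would read them from G2.

```c
#include <assert.h>

/*
 * Hand translation of:
 *     sum = 0; i = 0;
 *     while (i < n) { sum = sum + i; i = i + 1; }
 * Hypothetical layout in L: n@0, sum@4, i@8.
 * Constants 0 and 1 are inline for brevity; real 3CPO reads them from G2.
 */
int sum_below(int n)
{
    _Alignas(int) char L[12];

    *(int *)(L+0) = n;                                 /* setup only */
    *(int *)(L+4) = 0;                                 /* sum = 0 */
    *(int *)(L+8) = 0;                                 /* i = 0 */
top:
    if (*(int *)(L+8) >= *(int *)(L+0)) goto done;     /* leave when !(i < n) */
    *(int *)(L+4) = *(int *)(L+4) + *(int *)(L+8);     /* sum = sum + i */
    *(int *)(L+8) = *(int *)(L+8) + 1;                 /* i = i + 1 */
    goto top;
done:
    return *(int *)(L+4);
}
```

The loop test is negated so that the single conditional branch jumps out of the loop; the unconditional goto at the bottom replaces the loop's back edge.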
No constants
Manifest constants should be reduced to memory references allocated from
the global variable section; see below. Constants will have to be broken
down into their individual byte contents and written out in the
appropriate byte order for the target (which differs between Intel x86
and Sparc, for example).
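A sketch of breaking an int constant into bytes for G2's initializer, assuming 8-bit bytes and 4-byte ints. The function names are hypothetical. Because the bytes are extracted with shifts, the result does not depend on the machine your compiler itself runs on; you choose the target's byte order explicitly.

```c
#include <assert.h>
#include <stdio.h>

/* 1 if this machine stores the least significant byte first (e.g. x86). */
int host_is_little_endian(void)
{
    unsigned int one = 1;
    return *(unsigned char *)&one == 1;
}

/* Split a 4-byte int into bytes in the requested target byte order. */
void int_to_bytes(int value, int little_endian, unsigned char out[4])
{
    unsigned int v = (unsigned int)value;
    int k;

    for (k = 0; k < 4; k++) {
        unsigned char b = (unsigned char)(v >> (8 * k));   /* byte k, LSB first */
        out[little_endian ? k : 3 - k] = b;
    }
}

/* Write the bytes as one line of a C initializer, e.g. "5, 0, 0, 0,". */
void emit_int(FILE *f, int value, int little_endian)
{
    unsigned char b[4];

    int_to_bytes(value, little_endian, b);
    fprintf(f, "%d, %d, %d, %d,\n", b[0], b[1], b[2], b[3]);
}
```

int_to_bytes(5, 0, b) yields 0, 0, 0, 5 (big-endian, as on Sparc), while int_to_bytes(5, 1, b) yields 5, 0, 0, 0 (little-endian, as on x86). host_is_little_endian() can pick the order when the target is the machine you are running on.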
Memory layout
Do your own memory layout for globals and for each procedure, including
the layout of structures and arrays. 3CPO code does not include the dot,
arrow, or subscript operators (., ->, and []).
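Since 3CPO has no ., ->, or [], field and element accesses turn into plain byte offsets computed at compile time. A sketch, assuming char is 1 byte and int, float, and pointers are 4 bytes and longword-aligned; struct field, align_up, layout_struct, and element_offset are hypothetical names.

```c
#include <assert.h>

/* Round off up to the next multiple of align. */
int align_up(int off, int align)
{
    assert(align > 0);
    return (off + align - 1) / align * align;
}

/* Hypothetical description of one member: its size and alignment in bytes. */
struct field {
    int size;
    int align;
};

/* Assign each of the n fields an offset; return the padded struct size. */
int layout_struct(const struct field *f, int n, int *offsets)
{
    int off = 0, maxalign = 1, i;

    for (i = 0; i < n; i++) {
        off = align_up(off, f[i].align);   /* pad so the field is aligned */
        offsets[i] = off;
        off += f[i].size;
        if (f[i].align > maxalign)
            maxalign = f[i].align;
    }
    return align_up(off, maxalign);        /* keeps arrays of this struct aligned */
}

/* What a[i] becomes: byte offset of element i, given the array's base offset. */
int element_offset(int base, int i, int elem_size)
{
    return base + i * elem_size;
}
```

With this layout, struct { char c; int i; char d; } has offsets 0, 4, and 8 and size 12, so s.i for a struct at G1 offset 16 becomes *(int *)(G1+20). With a variable subscript, the multiply and add that element_offset performs must instead be emitted as 3CPO statements.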
Two global variables
In 3CPO, you are allowed only two global variables, G1 and G2, which are
arrays of char. G1 holds ordinary (uninitialized) variables. G2 is an
initialized array and holds global variables with initial values as well
as constants. Allocate each global variable at a byte offset, with its
width, within these arrays. Access all base types other than char by
casting the address (array name plus offset) to a pointer of the
appropriate type and dereferencing it. Longword-align all ints and floats.
Example. To reference an int at offset 16 in the uninitialized global
section:
*(int *)(G1+16)
This format will be used for all memory references.
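A sketch of an emitter that prints memory references in exactly this format. The mapping of regions to arrays (GLOBAL to G1, CONSTANT to G2, PARAMETER to P, LOCAL to L) is an assumption of this sketch; an initialized global would need to be placed in G2 instead. format_memref is a hypothetical name.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum region { R_GLOBAL, R_CONSTANT, R_PARAMETER, R_LOCAL };

/* Assumed mapping from region to the 3CPO array that holds it. */
static const char *const base_name[] = { "G1", "G2", "P", "L" };

/* Format a memory reference such as "*(int *)(G1+16)". */
int format_memref(char *buf, size_t n, enum region r, int offset, const char *ctype)
{
    const char *sep;

    assert(offset >= 0 && ctype != NULL && ctype[0] != '\0');
    if (strcmp(ctype, "char") == 0)                /* chars need no cast */
        return snprintf(buf, n, "*(%s+%d)", base_name[r], offset);
    sep = ctype[strlen(ctype) - 1] == '*' ? "" : " ";   /* "char **", "int *" */
    return snprintf(buf, n, "*(%s%s*)(%s+%d)", ctype, sep, base_name[r], offset);
}
```

format_memref(buf, sizeof buf, R_GLOBAL, 16, "int") produces *(int *)(G1+16), and a char * parameter at offset 0 produces *(char **)(P+0).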
One Local Variable
You are allowed only one local variable per procedure, L, which is an array
of char. Allocate all your local variables at byte offsets, with their
widths, within this array. Longword-align all ints and floats.
One Parameter
You are allowed only one parameter per procedure, P, which is a pointer to
char. Allocate all your parameters at byte offsets, with their widths,
within the memory P points to. Longword-align all ints and floats.
Note that the stack space for parameters must be allocated by the caller
in its own local variable area. Think of parameters as temporary
variables allocated by the caller but accessible to the callee.
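The caller/callee split can be sketched as runnable 3CPO-style C. The source functions in the comments are hypothetical, the constants are written inline for brevity (real 3CPO would read them from G2), and _Alignas (C11) is an addition to keep the int slots aligned.

```c
#include <assert.h>

_Alignas(int) char G1[4];        /* one uninitialized global int, g, at G1+0 */

/* Callee.  Hypothetical source:  void add(int a, int b) { g = a + b; }
   Its parameters a and b live at P+0 and P+4. */
void add(char P[])
{
    *(int *)(G1+0) = *(int *)(P+0) + *(int *)(P+4);
}

/* Caller.  Hypothetical source:  void caller(void) { int x; x = 3; add(x, 4); }
   Layout of L: x@0, then the parameter area for the call: a@4, b@8. */
void caller(void)
{
    _Alignas(int) char L[12];

    *(int *)(L+0) = 3;                 /* x = 3 */
    *(int *)(L+4) = *(int *)(L+0);     /* copy argument x into slot a */
    *(int *)(L+8) = 4;                 /* argument 4 into slot b */
    add(L+4);                          /* callee's P is the caller's L+4 */
}
```

After caller() runs, *(int *)(G1+0) holds 7: the callee reads its parameters through P, which points into the caller's L.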
The exception to the above parameter-passing rules is that you must use
ordinary C calling conventions to call C library routines such as printf.
To tell the two kinds apart, treat all external procedures (i.e. ones
whose code you did not generate yourself) specially, and call them using
normal parameter conventions.
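A sketch of emitting the two kinds of call. It assumes your symbol table records whether a function's body was compiled by you (passed here as is_external); emit_call and its parameters are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Write one call statement into buf.
 *  - Our own procedure: pass the address of the parameter area that the
 *    caller reserved at L+param_offset.
 *  - External routine (e.g. printf): pass each argument, already written
 *    as a 3CPO memory reference, as an ordinary C argument.
 * Returns a value >= n if the output was truncated.
 */
int emit_call(char *buf, size_t n, const char *name, int is_external,
              int param_offset, const char *const args[], int nargs)
{
    int len, i;

    assert(name != NULL && n > 0);
    if (!is_external)
        return snprintf(buf, n, "%s(L+%d);", name, param_offset);

    len = snprintf(buf, n, "%s(", name);
    for (i = 0; i < nargs && len >= 0 && (size_t)len < n; i++)
        len += snprintf(buf + len, n - (size_t)len, "%s%s",
                        i > 0 ? ", " : "", args[i]);
    if (len >= 0 && (size_t)len < n)
        len += snprintf(buf + len, n - (size_t)len, ");");
    return len;
}
```

For the example below, the call in main would come out as p(L+4); and the call in p as printf(*(char **)(P+0), *(int *)(P+4));.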
Example
For the following program:
void p(char s[], int x);
int x;
int y = 5;
void main()
{
    int z;
    z = y + 2;
    p("hello, z is %d\n", z);
}
void p(char s[], int x)
{
    printf(s, x);
}
The 3CPO output might look like:
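#include <stdio.h>          /* declares printf */
void p(char P[]);           /* p is called before it is defined */
char G1[4];                 /* x at G1+0 */
char G2[24] = {
    0, 0, 0, 5,             /* y = 5 at G2+0 (big-endian byte order, as on Sparc) */
    0, 0, 0, 2,             /* the constant 2 at G2+4 */
    'h', 'e', 'l', 'l', 'o', ',', ' ', 'z', ' ', 'i', 's', ' ', '%', 'd', '\n', 0,   /* string at G2+8 */
};
void main()
{
    char L[12];             /* z at L+0; p's parameter area (s, x) at L+4 */
    *(int *)(L+0) = *(int *)(G2+0) + *(int *)(G2+4);    /* z = y + 2 */
    *(char **)(L+4) = G2+8;                             /* s: assumes 4-byte pointers */
    *(int *)(L+8) = *(int *)(L+0);                      /* x = z */
    p(L+4);
}
void p(char P[])
{
    printf(*(char **)(P+0), *(int *)(P+4));
}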
3CPO is not pretty, but it will let you run your intermediate code.
Generating real assembler code might be more educational, more challenging,
and more fun.