In this assignment you will write a lexical analyzer in C or flex(1) for the "370-C" language. Your lexical analyzer should be compatible with (or written in) lex(1); i.e., you should write a function yylex() that returns an integer code for each token. Note: watch this page on the web for additional clarifications, which will be posted as needed in response to your questions.
The 370-C language has:
For the lexical analyzer, you should try to recognize the whole C language and print error messages for any parts that you are not required to implement. You are strongly urged to develop a set of regular expressions (and, if you code in C, the corresponding finite automata) describing the relevant token types before you start coding!
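As one possible illustration of "the corresponding finite automata" in C, the function below hand-codes a small DFA for two token types, assuming identifiers match [A-Za-z_][A-Za-z0-9_]* and integer constants match [0-9]+ (decimal only). The name classify() and the codes ICON and LEX_ERROR are hypothetical; IDENTIFIER reuses the 260 from the #define example on this page, and ICON uses 272, the category the sample output shows for integer constants.

```c
#include <ctype.h>

/* Hypothetical category codes for this sketch only. */
#define IDENTIFIER 260
#define ICON       272   /* integer constant (hypothetical name) */
#define LEX_ERROR  (-2)  /* not a valid identifier or integer */

/* DFA for the regular expressions
 *   identifier: [A-Za-z_][A-Za-z0-9_]*
 *   integer:    [0-9]+
 * States: 0 = start, 1 = in identifier, 2 = in integer, 3 = dead. */
int classify(const char *s)
{
    int state = 0;

    for (; *s != '\0'; s++) {
        unsigned char c = (unsigned char)*s;
        switch (state) {
        case 0:                               /* first character */
            if (isalpha(c) || c == '_') state = 1;
            else if (isdigit(c))        state = 2;
            else                        state = 3;
            break;
        case 1:                               /* inside an identifier */
            if (!(isalnum(c) || c == '_')) state = 3;
            break;
        case 2:                               /* inside an integer */
            if (!isdigit(c)) state = 3;
            break;
        default:                              /* dead state: no recovery */
            return LEX_ERROR;
        }
    }
    /* Accepting states are 1 and 2; start (empty input) and dead reject. */
    if (state == 1) return IDENTIFIER;
    if (state == 2) return ICON;
    return LEX_ERROR;
}
```

For example, classify("main") returns IDENTIFIER and classify("9lives") returns LEX_ERROR. A real scanner must also decide where each lexeme ends (the longest match), which this sketch sidesteps by classifying an already-isolated string.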
struct token {
    int category;     /* the integer code returned by yylex */
    char *text;       /* the actual string (lexeme) matched */
    int lineno;       /* the line number on which the token occurs */
    char *filename;   /* the source file in which the token occurs */
    int ival;         /* if you had an integer constant, store its value here */
    char *sval;       /* if you had a string constant, malloc space and store
                         the string (less quotes and after escapes) here */
};
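The sval field holds a string constant's value, not its lexeme. A minimal sketch of that conversion follows, assuming only the \n, \t, \\, \" and \' escapes need handling (octal and hex escapes are deliberately left out); string_value() is a hypothetical helper name.

```c
#include <stdlib.h>
#include <string.h>

/* Convert a string-literal lexeme such as "a\tb" (quotes included) into its
 * value: the surrounding quotes are removed and escapes are replaced.
 * Handles \n and \t; any other escaped character (\\, \", \', or an
 * unrecognized one) is kept as-is. Octal/hex escapes are not supported.
 * Returns a malloc'ed string, or NULL if allocation fails. */
char *string_value(const char *lexeme)
{
    size_t len = strlen(lexeme);
    /* The value is never longer than the lexeme minus its two quotes. */
    char *out = malloc(len >= 2 ? len - 1 : 1);
    char *p = out;
    size_t i;

    if (out == NULL)
        return NULL;
    for (i = 1; i + 1 < len; i++) {      /* skip opening and closing quote */
        char c = lexeme[i];
        if (c == '\\' && i + 2 < len) {  /* escape not followed by the end */
            c = lexeme[++i];
            switch (c) {
            case 'n': c = '\n'; break;
            case 't': c = '\t'; break;
            default:  break;             /* keep the escaped character */
            }
        }
        *p++ = c;
    }
    *p = '\0';
    return out;
}
```

For the lexeme "%d" in the sample input, string_value() returns %d, matching the Sval column of the sample output.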
In the last homework (#1) it was OK to have one global variable of this type and overwrite it on each call to yylex(), but in this homework your main() procedure should copy each token into a separate chunk of memory and build a linked list of all the token structs. In the next assignment, we will insert all these tokens into a giant (syntax) tree.
Example linked list structure:
struct tokenlist {
    struct token *t;
    struct tokenlist *next;
};
Use the malloc() function to allocate memory for each struct token and struct tokenlist.
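A minimal sketch of the copy-and-append step, assuming main() keeps a tail pointer so each append takes constant time; append_token() is a hypothetical name. The lexeme is duplicated because yytext is overwritten by the next call to yylex(), while filename and sval are copied as pointers, on the assumption that each sval was already malloc'ed and that tokens may share one filename string.

```c
#include <stdlib.h>
#include <string.h>

struct token {
    int category;
    char *text;
    int lineno;
    char *filename;
    int ival;
    char *sval;
};

struct tokenlist {
    struct token *t;
    struct tokenlist *next;
};

/* Copy *tok into freshly malloc'ed memory and append it to the list
 * described by *head and *tail. Returns 0 on success, -1 if malloc fails. */
int append_token(struct tokenlist **head, struct tokenlist **tail,
                 const struct token *tok)
{
    struct tokenlist *node = malloc(sizeof *node);
    struct token *copy = malloc(sizeof *copy);

    if (node == NULL || copy == NULL) {
        free(node);
        free(copy);
        return -1;
    }
    *copy = *tok;                          /* shallow copy of every field */

    /* Deep-copy the lexeme, since yytext will be reused by the scanner. */
    copy->text = malloc(strlen(tok->text) + 1);
    if (copy->text == NULL) {
        free(node);
        free(copy);
        return -1;
    }
    strcpy(copy->text, tok->text);

    node->t = copy;
    node->next = NULL;
    if (*tail == NULL)
        *head = node;                      /* list was empty */
    else
        (*tail)->next = node;
    *tail = node;
    return 0;
}
```

In main(), you would start with head = tail = NULL, fill in a local struct token after each yylex() call, and pass it to append_token(&head, &tail, &tok).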
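Give each token category a symbolic name with #define, for example:
#define IDENTIFIER 260
This is required for the sake of readability. Your yylex() should return -1 when it hits end of file (lex and flex return 0 at end of file by default, so you will need to arrange this yourself).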
In this assignment, your program should be organized the same way as in the last assignment. There should be (at least) two separately compiled .c files and a makefile. The yylex() function will be called by a main() procedure in a loop, similar to the last assignment. For each token, the main() procedure should write out a line containing the token category (an integer greater than 257) and its lexical attributes.
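One way to produce each output line is with printf-style field widths. The sketch below assumes hypothetical category codes ICON (272) and SCON (273), taken from the sample output on this page, to decide when the Ival/Sval column applies, and prints integer values as 8-digit hexadecimal as that sample does. The names format_token() and print_header() and the column widths are illustrative choices, not requirements.

```c
#include <stdio.h>

struct token {
    int category;
    char *text;
    int lineno;
    char *filename;
    int ival;
    char *sval;
};

/* Hypothetical category codes, taken from the sample output. */
#define ICON 272   /* integer constant */
#define SCON 273   /* string constant */

/* Print the two header lines of the token table. */
void print_header(FILE *out)
{
    fprintf(out, "%-9s %-12s %-7s %-10s %s\n",
            "Category", "Text", "Lineno", "Filename", "Ival/Sval");
    fprintf(out, "%s\n",
            "------------------------------------------------------------");
}

/* Format one table row for token t into buf (at most n bytes, including
 * the terminating NUL). Returns the length snprintf reports.
 * Integer constants get their value in 8-digit hex, string constants get
 * their processed value; every other token gets only four columns. */
int format_token(char *buf, size_t n, const struct token *t)
{
    if (t->category == ICON)
        return snprintf(buf, n, "%-9d %-12s %-7d %-10s %08X",
                        t->category, t->text, t->lineno, t->filename,
                        (unsigned int)t->ival);
    if (t->category == SCON && t->sval != NULL)
        return snprintf(buf, n, "%-9d %-12s %-7d %-10s %s",
                        t->category, t->text, t->lineno, t->filename,
                        t->sval);
    return snprintf(buf, n, "%-9d %-12s %-7d %s",
                    t->category, t->text, t->lineno, t->filename);
}
```

In main(), you might call print_header(stdout) once, then walk your linked list, formatting each token into a buffer and printing it followed by a newline.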
For example, if the input file tst.c contains:
void main() {
   printf("%d", 10+2);
}
your output should look something like:
Category  Text      Lineno  Filename  Ival/Sval
-------------------------------------------------------------------------
262       void      1       tst.c
271       main      1       tst.c
290       (         1       tst.c
291       )         1       tst.c
292       {         1       tst.c
271       printf    2       tst.c
290       (         2       tst.c
273       "%d"      2       tst.c     %d
288       ,         2       tst.c
272       10        2       tst.c     0000000A
300       +         2       tst.c
272       2         2       tst.c     00000002
291       )         2       tst.c
260       ;         2       tst.c
293       }         3       tst.c