CS 370
Homework Assignment 1: Doing Lexical Analysis by Hand

Due: Tuesday January 31, 1:00pm (firm deadline)

This first assignment is intended to verify that you are (or can become) capable of programming in C on UNIX systems, which will be necessary if you are to complete the assignments in this class. It is intended to be somewhat easy. It must be completed and turned in successfully, both electronically and on paper.

Your task is to write a C program that reads in text files and breaks them up into pieces (a "piece" here refers to a string of one or more non-whitespace characters; from now on we will refer to pieces as lexemes). Your program should discard spaces, tabs and newlines (called "whitespace" from here on), and write each contiguous sequence of non-whitespace characters on a separate line. You should identify each lexeme's category using the following table, and write its integer code at the start of the line, before the characters of the lexeme.

category name   integer code   description
identifier      1              1+ letters, initial lowerCase
name            2              1+ letters, first letter was Capital
number          3              1+ digits from 0-9, optional leading - and decimal point
punctuation     4              something not in categories 1-3
mixture         5              a mixture of characters from categories 1-4
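
The table can be read as a decision procedure. The following is only a minimal sketch of one possible reading, not a required design. The names classify() and is_number() are hypothetical, and several choices are assumptions the table does not settle: a number may contain at most one decimal point, a lone "-" counts as punctuation, and a lexeme made up entirely of punctuation characters (such as "==") is category 4 rather than 5.

```c
#include <ctype.h>

enum { IDENTIFIER = 1, NAME, NUMBER, PUNCTUATION, MIXTURE };

/* Returns 1 if s is an optional leading '-' followed by digits with at most
   one '.', and at least one digit overall (assumption: "1.2.3" is not a number). */
static int is_number(const char *s)
{
    int digits = 0, dots = 0;

    if (*s == '-')
        s++;
    for (; *s != '\0'; s++) {
        if (isdigit((unsigned char)*s))
            digits++;
        else if (*s == '.')
            dots++;
        else
            return 0;
    }
    return digits > 0 && dots <= 1;
}

/* Hypothetical helper: returns the category code (1-5) of a non-empty lexeme. */
int classify(const char *s)
{
    int letters = 0, digits = 0, total = 0;
    const char *p;

    for (p = s; *p != '\0'; p++) {
        if (isalpha((unsigned char)*p))
            letters++;
        else if (isdigit((unsigned char)*p))
            digits++;
        total++;
    }
    if (total > 0 && letters == total)       /* letters only: category 1 or 2 */
        return islower((unsigned char)s[0]) ? IDENTIFIER : NAME;
    if (is_number(s))
        return NUMBER;
    if (letters == 0 && digits == 0)         /* assumption: all punctuation is 4 */
        return PUNCTUATION;
    return MIXTURE;
}
```

The casts to unsigned char matter when the input is an arbitrary binary file, because passing a negative char value to the <ctype.h> functions is undefined behavior.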

In addition, your program must follow the particular organization described below, which will prepare you for the next assignment.

Organization
Your program must include two source files named main.c and yylex.c as well as a makefile.
Input and Output
Your input could be any file; no coredumps allowed even if I run it on executable files! But mainly it will be tested on "program source code" files. Your output should consist of one line per lexeme, each line containing the lexeme's category code followed by the non-whitespace character(s) that were grouped together.
main.c
Your main.c must contain a main() function that performs as follows:
    FILE *yyin;
    char yytext[1024];
    /* the following is pseudocode, you must write C code */
    for each filename given on the command line
	open the file and store its reference in global variable yyin
	while not end-of-file do
	    call a function yylex() to obtain the next lexeme
	    if yylex() returned -1 at end-of-file, quit this loop
	    print the lexeme
	close the file, move to the next filename
See the course lecture notes web page for an example of reading files and command line argument processing in C.
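
As a rough illustration of the loop in the pseudocode, here is a sketch in C. It is not the required solution. The function name scan_files() is hypothetical, and the lexer is passed in as a function pointer only so that the fragment stands alone; in your program, main() would call yylex() directly. Printing the category, a tab, and then the lexeme is an assumption based on the example output in this handout.

```c
#include <stdio.h>

FILE *yyin;          /* current input file; yylex() reads from it */
char yytext[1024];   /* yylex() stores the current lexeme here */

/* Hypothetical driver: opens each of names[0..count-1] in turn and calls
   lex() until it returns -1. Returns the number of files it could not open. */
int scan_files(int count, char *names[], int (*lex)(void), FILE *out)
{
    int i, category, failures = 0;

    for (i = 0; i < count; i++) {
        yyin = fopen(names[i], "rb");    /* "rb": the input may be any file */
        if (yyin == NULL) {
            perror(names[i]);            /* report the problem, keep going */
            failures++;
            continue;
        }
        while ((category = lex()) != -1)
            fprintf(out, "%d\t%s\n", category, yytext);
        fclose(yyin);
    }
    return failures;
}
```

Under these assumptions, main(argc, argv) would reduce to a call such as scan_files(argc - 1, argv + 1, yylex, stdout), skipping argv[0] because it holds the program name rather than an input file.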
yylex.c
Your yylex.c file must contain a function yylex() that performs several tasks. Each time yylex() is called it processes one lexeme of non-whitespace characters and stores them in an array of characters named yytext. yylex() returns a category (1 through 5) if it finds characters, and a -1 if it hits end of file. Details are given below:
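
The character-gathering part of yylex() might be sketched as follows. This is an illustration under stated assumptions, not the required implementation. The helper name read_lexeme() is hypothetical; yylex() could call it with yyin and yytext and then compute the category. Another assumption is that a lexeme too long for the buffer is split, with the remainder returned by the next call, so that a 1024-byte yytext is never overrun even on executable files.

```c
#include <stdio.h>

/* Whitespace as defined in this handout: spaces, tabs and newlines. */
static int is_white(int c)
{
    return c == ' ' || c == '\t' || c == '\n';
}

/* Hypothetical helper: copies the next lexeme from in into buf, which must
   have room for size >= 2 bytes. Returns the lexeme's length, or -1 if end
   of file is reached before any non-whitespace character is found. */
int read_lexeme(FILE *in, char *buf, int size)
{
    int c, len = 0;

    /* Skip leading whitespace. */
    while ((c = getc(in)) != EOF && is_white(c))
        ;
    if (c == EOF)
        return -1;

    /* Collect characters until whitespace, end of file, or a full buffer. */
    do {
        buf[len++] = (char)c;
    } while (len < size - 1 && (c = getc(in)) != EOF && !is_white(c));

    buf[len] = '\0';
    return len;
}
```

Note that c is declared int, not char, so that EOF can be distinguished from every valid byte value.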
makefile
You must write a makefile, per the discussion in class. Your makefile must include separate rules to compile each source file individually (one rule to build yylex.o and one to build main.o) and a rule to link the object files together to form an executable file named scan. The first (default) rule should be the link rule, which should in turn depend on the two object files, so that if I just type "make" your program will compile to an executable. It must also include a rule named turnin that looks like the following, where userid is your UNIX user id.
turnin:
	tar cf userid.tar makefile main.c yylex.c


Example:

For the input file

void main() { printf(" Twelve =%d", 10 + 2); }

your output should look like:

1	void
5	main()
4	{
5	printf("
2	Twelve
5	=%d",
3	10
4	+
5	2);
4	}

In the next few weeks, your scanner will have to become much smarter about breaking up source code into actual words.

Turning your Program In

Please test your program carefully and then turn your program in electronically at http://www.cs.nmsu.edu/~jeffery/turnin.html

Please be careful to turn it in as Homework 1, not some other lab or assignment. When you submit your assignment, the turnin web page shows you whether it was received in the correct format and whether "make" built an executable successfully. I must be able to unpack your assignment and compile it in order for you to get any credit. You will get your hard copy back with written comments as well as the results from executing your program on our tests. Please select and e-mail me your password for electronic turnins to this site.

Please also print a paper copy of your program and turn it in either in class or to Dr. J's office (SH 125) by 1:00pm. If my door is not open for some reason, you may slip assignments under the door.