CS 210 Homework #1: Flex

Due: Wednesday February 5, start of class (9:30am)
Please use bblearn.uidaho.edu to submit a .zip archive file containing a Flex .l source code file, with your name in a (C-style) comment at the top.

Use the declarative lexical analysis language Flex to develop a scanner for JSON files. JSON is described at json.org. Your solution to this homework will be hooked together with a parser in a subsequent homework assignment. Develop your work for this assignment in a single file (json.l); it will be built with the makefile below and linked with the main.c below. The main.c includes a file json.h that defines integer codes for the different categories of words/tokens. You may not substitute your own main() procedure. Your yylex() must return an integer code each time it recognizes a token; it may not just print the output itself.

makefile
# flex makefile.
#

json: main.o lex.yy.o
	cc -o json main.o lex.yy.o

main.o: main.c json.h
	cc -c -g -DLEX main.c

lex.yy.o: lex.yy.c json.h
	cc -c -g lex.yy.c

lex.yy.c: json.l
	flex json.l

main.c
#include "json.h"
#include <stdio.h>
#include <stdlib.h>
extern FILE *yyin;
extern char *yytext;
extern int yylex(void);   /* defined in the flex-generated lex.yy.c */
char *yyfilename;

int main(int argc, char *argv[])
{
   int t;
   if (argc < 2) { printf("usage: json file.json\n"); exit(-1); }
   yyin = fopen(argv[1],"r");
   if (yyin == NULL) { printf("can't open/read '%s'\n", argv[1]); exit(-1); }
   yyfilename = argv[1];
   while ((t=yylex()) > 0) {
      if (t <= 32) {
         printf("token %d text %s\n", t, yytext);
         }
      else {
         printf("token %c\n", t);
         }
      }
   return 0;
}

json.h
/* header file for JSON tokens, codes from json.org */

#define TRUE      1
#define FALSE     2
#define NULL      3   /* caution: this name is also the standard C NULL macro */
#define LCURLY   '{'
#define RCURLY   '}'
#define COMMA    ','
#define COLON    ':'
#define LBRACKET '['
#define RBRACKET ']'
#define STRINGLIT 4
/* #define CHARLIT 5 json does not have character literals! */
#define NUMBER    6

JSON: JavaScript Object Notation

JSON is an important real-world data file format that represents structured data in a more compact and human friendly manner than XML, making it well-suited for many tasks including web applications.

In this homework, your program writes out the category found for each "word" of a JSON file, one per line. If the word is from one of the lower integer categories (values 1-6), it also writes out the characters that were matched for that word.
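
The two output formats follow from the codes in json.h: the named categories are small integers, while each punctuation token uses its own ASCII value, and all of those are above 32. Here is a minimal sketch of that decision, mirroring the loop in main.c. The helper name format_token is hypothetical and is not part of the provided files.

```c
#include <stdio.h>

/* Hypothetical helper (not part of the provided main.c): formats one
 * token line the same way the loop in main.c prints it, without the
 * trailing newline. */
int format_token(char *buf, size_t size, int code, const char *text)
{
    if (code <= 32) {
        /* Named category such as STRINGLIT (4) or NUMBER (6):
         * show the code and the matched text. */
        return snprintf(buf, size, "token %d text %s", code, text);
    }
    /* Punctuation: the code is the character itself, e.g. '{' is 123. */
    return snprintf(buf, size, "token %c", code);
}
```

Calling format_token with code 4 and the text "name" (quotes included) yields the same line the example output shows for a string literal. Calling it with '{' yields a line with no text part.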

If the input file contains things that are not legal in JSON, you should write a line containing "lexical error on line n", where n is the line number. The easiest way to do this, as seen in class, is to use %option yylineno in your .l file, so you are required to do that. Lexical errors do not return integer categories from yylex(); they are treated as whitespace.
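
To make the "report and keep scanning" behavior concrete, here is a hand-written C stand-in. It is not Flex and not a solution: it only knows the six punctuation tokens, tracks the line number by hand (the job %option yylineno does for you in Flex), and treats everything else as an error. All names (toy_reset, toy_lex, toy_lineno, toy_errors) are hypothetical. Reporting one message per offending character is an assumption of this sketch, not something the assignment specifies.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical hand-written stand-in for a scanner, for illustration only.
 * A real json.l must also recognize strings, numbers, true, false, null. */
const char *toy_input = "";   /* remaining text to scan */
int toy_lineno = 1;           /* current line, counted by hand */
int toy_errors = 0;           /* number of lexical errors reported */

void toy_reset(const char *text)
{
    toy_input = text;
    toy_lineno = 1;
    toy_errors = 0;
}

int toy_lex(void)
{
    for (;;) {
        char c = *toy_input;
        if (c == '\0')
            return 0;                     /* end of input */
        toy_input++;
        if (c == '\n') {                  /* newline: count it, then skip it */
            toy_lineno++;
            continue;
        }
        if (c == ' ' || c == '\t' || c == '\r')
            continue;                     /* other whitespace: skip */
        if (strchr("{}[],:", c) != NULL)
            return c;                     /* punctuation: code is the character */
        /* Lexical error: report it, return nothing, and keep scanning,
         * so the caller sees it exactly as it would see whitespace. */
        printf("lexical error on line %d\n", toy_lineno);
        toy_errors++;
    }
}
```

After toy_reset("{\n @ }"), successive calls to toy_lex() return '{', then '}', then 0. The stray @ produces one error message naming line 2 and never reaches the caller as a token.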

Example

For the file
{
  "name":"Clint",
  "age":54,
  "cars": {
    "car1":"Ford",
    "car2":"Pontiac",
    "car3":"Pontiac"
  }
 }
You don't have to worry yet about the syntax or semantics of this file; your job is to break it into lexical tokens: adjacent sequences of characters that form indivisible values or entities. Your program (linked with the main.c above) would print out
token {
token 4 text "name"
token :
token 4 text "Clint"
token ,
token 4 text "age"
token :
token 6 text 54
token ,
token 4 text "cars"
token :
token {
token 4 text "car1"
token :
token 4 text "Ford"
token ,
token 4 text "car2"
token :
token 4 text "Pontiac"
token ,
token 4 text "car3"
token :
token 4 text "Pontiac"
token }
token }