CS 210 Homework #2: Bison

Due: Friday February 21, 9:30
Turnin: json.zip, via bblearn

Bright cheery words of encouragement: if you put off this assignment until shortly before it is due, you will probably fail to complete it.

Use Bison to develop a parser (syntax checker), syntax tree, and pretty printer for the JSON language. Develop your work for this assignment as a Bison grammar file (you write json.y), plus a tweaked version of your previous flex assignment. You may add .c or .h files; if you do, you must add correct dependency rules to your makefile.

The file json.h, which defines integer codes for the different categories of words (tokens), must be modified to take those codes from Bison. When run with the -d option, Bison generates them in a .tab.h file.

In order to print a useful syntax error message, the helper functions section of your .y file should define a function yyerror(char *s), as the YACC and Bison conventions require. The generated parser calls yyerror() whenever it detects a syntax error.
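A minimal sketch of yyerror() follows. It assumes json.l uses flex's %option yylineno, so that yylineno tracks the current input line, and it uses the yyfilename variable that main.c sets. The counter nsyntaxerrors is a hypothetical addition that Bison does not require; it only makes the function easy to check. Bison's generated parser passes yyerror() a message such as "syntax error".

```c
#include <stdio.h>

/* Stand-ins so this sketch compiles by itself. In the real project, delete
   these two definitions: main.c defines yyfilename, and lex.yy.c defines
   yylineno when json.l uses "%option yylineno". */
char *yyfilename = "example.json";
int yylineno = 1;

/* Hypothetical counter, not required by Bison; handy for testing. */
int nsyntaxerrors = 0;

/* Called by the Bison-generated parser with a message such as
   "syntax error". Reports it in the common file:line: message format. */
void yyerror(char *s)
{
   nsyntaxerrors++;
   fprintf(stderr, "%s:%d: %s\n", yyfilename, yylineno, s);
}
```

The generated parser calls both yylex() and yyerror(), so it helps to declare int yylex(void); and void yyerror(char *s); in the %{ %} prologue of json.y. That way, the compiler sees prototypes before the parser code uses them.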

makefile
jsonpp: main.o lex.yy.o json.tab.o
	cc -o jsonpp main.o lex.yy.o json.tab.o -ll

main.o: main.c json.tab.h json.h
	cc -c -g main.c

lex.yy.o: lex.yy.c
	cc -c -g lex.yy.c

lex.yy.c: json.l json.h json.tab.h
	flex json.l

json.tab.o: json.tab.c
	cc -c -g json.tab.c

json.tab.c: json.y
	bison json.y

json.h: json.tab.h
	touch json.h

json.tab.h: json.y
	bison -d json.y

main.c
#include "json.tab.h"
#include <stdio.h>
#include <stdlib.h>
extern FILE *yyin;
extern char *yytext;
extern int yyparse(void);
char *yyfilename;
int main(int argc, char *argv[])
{
   int i;
   if (argc < 2) { fprintf(stderr, "usage: jsonpp file.json\n"); exit(EXIT_FAILURE); }
   yyin = fopen(argv[1],"r");
   if (yyin == NULL) { fprintf(stderr, "can't open/read '%s'\n", argv[1]); exit(EXIT_FAILURE); }
   yyfilename = argv[1];
   if ((i=yyparse()) != 0) {
      printf("parse failed\n");
      }
   else printf("no errors\n");
   fclose(yyin);
   return i;   /* nonzero exit status when the parse failed */
}

json.h
#include "json.tab.h"

Examples

For valid input files, your program (linked with the main.c above) should print:
no errors

Note that the grammar given here imposes some restrictions that change the expected results for the Homework #1 examples. Since the title is required, those examples would report errors, as described below.

Building a Parse Tree in C

This is actually a fairly big job. It involves concepts in Bison like $$ that will be vital in CS 445. My advice: Keep IT Simple As Possible (KITSAP). A tree node will need to know:

"what" it is
   An integer code: which terminal symbol it is (for a leaf), or which grammar rule produced it (for an internal node). The non-terminal symbol is implied by the grammar rule number, but you could store it separately if you wanted.
how many children the tree node has
   Both terminal symbols and "epsilon productions" have 0 children.
pointers to kids, if any
   This can be an array, sized for the maximum number of children in any rule of the grammar.
lexical information, if any
   This comes from flex.

struct treenode {
   int label;		/* terminal symbol, or production rule # */
   int nkids;		/* 0 for tokens (tree leaves) */
   struct treenode *kids[5]; /* sized for muth.y; use the longest rule in json.y */
   struct token lexinfo;	/* lexical info, from your HW#1 token struct */
};
For every grammar rule, you are writing something like:
mynonterminal : MYTERM1 mynont2 MYTERM2 {
   $$ = calloc(1,sizeof(struct treenode));
   $$->label = PRODRULE;
   $$->nkids = 3;
   $$->kids[0] = $1;
   $$->kids[1] = $2;
   $$->kids[2] = $3;
   } ;
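Writing these allocation and assignment lines by hand for every rule gets repetitive. One way to keep it simple is a small variadic helper. The helper name alloctree() and the constant MAXKIDS are hypothetical, and struct token here is only a stand-in for your HW#1 token struct. The helper copies its extra arguments into kids[] in left-to-right order.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Stand-in for the HW#1 token struct (assumed fields). */
struct token {
   char *lexeme;
   int lineno;
};

#define MAXKIDS 5   /* assumption: longest right-hand side in your grammar */

struct treenode {
   int label;                       /* terminal symbol, or production rule # */
   int nkids;                       /* 0 for tokens (tree leaves) */
   struct treenode *kids[MAXKIDS];
   struct token lexinfo;
};

/* Hypothetical helper: allocate a node with nkids children, passed as
   extra struct treenode * arguments in left-to-right order. */
struct treenode *alloctree(int label, int nkids, ...)
{
   va_list ap;
   int i;
   struct treenode *t;

   if (nkids < 0 || nkids > MAXKIDS) {
      fprintf(stderr, "alloctree: bad child count %d\n", nkids);
      exit(EXIT_FAILURE);
   }
   t = calloc(1, sizeof(struct treenode));   /* zeroes all fields */
   if (t == NULL) {
      fprintf(stderr, "alloctree: out of memory\n");
      exit(EXIT_FAILURE);
   }
   t->label = label;
   t->nkids = nkids;
   va_start(ap, nkids);
   for (i = 0; i < nkids; i++)
      t->kids[i] = va_arg(ap, struct treenode *);
   va_end(ap);
   return t;
}
```

With this helper, the rule above shrinks to { $$ = alloctree(PRODRULE, 3, $1, $2, $3); }. If you ever pass a literal null child, cast it, as in (struct treenode *)NULL, because the helper reads every extra argument as a pointer.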
For the leaves of your tree, you have two options. You can modify HW#1 to create leaves in yylex(), or you can write your JSON grammar so that it creates a leaf every time a terminal symbol is recognized. The "modify HW#1" option looks like:
myregex  {
   yylval.tree = calloc(1, sizeof(struct treenode));
   yylval.tree->label = MYTERMINALSYMBOL;
   yylval.tree->nkids = 0;
   yylval.tree->lexinfo.lexeme = strdup(yytext);
   .../* insert code to preserve line and column #, filename, etc. */
   return MYTERMINALSYMBOL;
   }
The "encapsulate terminal symbols in the grammar" option adds a rule like the following to your grammar, one for each terminal symbol:
myterminalsymbol : MYTERMINALSYMBOL {
   $$ = calloc(1, sizeof(struct treenode));
   $$->label = MYTERMINALSYMBOL;
   $$->nkids = 0;
   $$->lexinfo.lexeme = strdup(yytext);
   .../* insert code to preserve line and column #, filename, etc. */
   } ;
With either option, you then plug these leaves into larger internal nodes as children, when the parser reduces the enclosing non-terminal rules.
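Both options build leaves the same way, so a single constructor can serve either one. The following is a sketch under some assumptions. The name mkleaf() is hypothetical, and the lineno and filename fields are guesses at what your HW#1 struct token holds. It copies the lexeme, just as the strdup(yytext) calls above do, because flex reuses the yytext buffer for the next token.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Assumed shape of the HW#1 token struct; adjust to match yours. */
struct token {
   char *lexeme;
   int lineno;
   char *filename;
};

struct treenode {
   int label;                 /* terminal symbol, or production rule # */
   int nkids;                 /* 0 for tokens (tree leaves) */
   struct treenode *kids[5];
   struct token lexinfo;
};

/* Hypothetical helper: build a leaf (0 children) for one token. */
struct treenode *mkleaf(int category, const char *text, int lineno, char *filename)
{
   size_t len = strlen(text);
   struct treenode *t = calloc(1, sizeof(struct treenode));
   if (t == NULL || (t->lexinfo.lexeme = malloc(len + 1)) == NULL) {
      fprintf(stderr, "mkleaf: out of memory\n");
      exit(EXIT_FAILURE);
   }
   t->label = category;
   t->nkids = 0;
   memcpy(t->lexinfo.lexeme, text, len + 1);   /* same effect as strdup(text) */
   t->lexinfo.lineno = lineno;
   t->lexinfo.filename = filename;   /* shared, not copied: main.c's argv[1] */
   return t;
}
```

Assuming %option yylineno and an extern char *yyfilename; declaration in json.l, a flex action for the leaf option could then read yylval.tree = mkleaf(MYTERMINALSYMBOL, yytext, yylineno, yyfilename); return MYTERMINALSYMBOL;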

Tree Traversal

Tree walks are recursive. They can be pre-order, post-order, or in-order. Both analyzing tree contents and pretty-printing are done by tweaking a basic tree traversal such as this one:
void print_tree(struct treenode *n)
{
  int i;
  if (n == NULL) return; /* don't segfault on NULLs */
  printf("node %d\n", n->label);
  if (n->nkids==0) {
     /* print stuff about leaf */
     }
  else {
     for(i=0;i<n->nkids;i++) print_tree(n->kids[i]);
     }
}
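The following sketch shows one way to tweak that traversal. It passes a depth parameter so that each child prints indented under its parent, which is a pre-order walk. It also adds two more recursive walks: count_nodes(), and free_tree(). The free_tree() walk must be post-order, because a parent's kids[] array has to stay readable until its children are freed. The function names, and the single lexeme field in the stand-in struct token, are assumptions made for illustration.

```c
#include <stdio.h>
#include <stdlib.h>

struct token {
   char *lexeme;        /* assumed field, as in the leaf examples */
};

struct treenode {
   int label;
   int nkids;
   struct treenode *kids[5];
   struct token lexinfo;
};

/* Pre-order: print this node, then its children two spaces deeper. */
void print_tree_indented(struct treenode *n, int depth)
{
   int i;
   if (n == NULL) return;              /* don't segfault on NULLs */
   printf("%*s", depth * 2, "");       /* indent by 2*depth spaces */
   if (n->nkids == 0)
      printf("leaf %d: %s\n", n->label,
             n->lexinfo.lexeme ? n->lexinfo.lexeme : "(no lexeme)");
   else
      printf("node %d\n", n->label);
   for (i = 0; i < n->nkids; i++)
      print_tree_indented(n->kids[i], depth + 1);
}

/* Total number of nodes in the tree rooted at n. */
int count_nodes(struct treenode *n)
{
   int i, total = 1;
   if (n == NULL) return 0;
   for (i = 0; i < n->nkids; i++)
      total += count_nodes(n->kids[i]);
   return total;
}

/* Post-order: free the children before the parent that points at them. */
void free_tree(struct treenode *n)
{
   int i;
   if (n == NULL) return;
   for (i = 0; i < n->nkids; i++)
      free_tree(n->kids[i]);
   free(n->lexinfo.lexeme);            /* free(NULL) is a no-op */
   free(n);
}
```

Calling print_tree_indented(root, 0) on a node with two leaf children prints the parent at the left margin and each leaf two spaces in. A JSON pretty printer would follow the same shape, but would print lexemes and punctuation instead of label numbers.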