CS 210: Programming Languages Lecture Notes

lecture #1

Welcome to CS 210! Here is our syllabus.

The Computer Science Assistance Center (CSAC), located in the JEB floor "2R" area, has tutors available during most business hours, Monday through Friday. You will most likely need help in this course; get to know who works in the CSAC and which tutors know which languages.

History and Overview of Programming Languages

Why Programming Languages

This course is central to most of computer science.
Definition of "programming language"
a human-readable textual or graphic means of specifying the behavior of a computer.
Programming languages have a short history
~60 years
The purpose of a programming language
allow a human and a computer to communicate
Humans are bad at machine language:
Computers are bad at natural language:
Time flies like an arrow.
So we use a language human and computer can both handle:
procedure main()
   w := open("binary","g", "fg=green", "bg=black")
   every i := 1 to 12 do {
      GotoRC(w,i,1); writes(w, randbits(80))
      }
   WriteImage(w, "binary.gif")
   Event(w)
end
procedure randbits(n)
   if n = 0 then return ""
   else return ((?2)-1) || randbits(n-1)
end

Even if humans could do machine language very well, it is still better to write programs in a programming language.

Auxiliary reasons to use a programming language:
portability
so that the program can be moved to new computers easily
natural (human) language ambiguity
Computers would either guess, or take us too literally and do the wrong thing, or be asking us constantly to restate the instructions more precisely.

At any rate, programming of computers started with machine language, and programming languages are characterized by how close, or how far, they are from the computers' hardware capabilities and instructions. Higher level languages can be more concise, more readable, more portable, less subject to human error, and easier to debug than lower-level languages. As computers get faster and software demands increase, the push for languages to become ever higher level is slow but inevitable.

Turing vs. Sapir

The first thing you learn in studying the formal mathematics of computational machines is that all computer languages are equivalent, because they all express computations that can be mapped down onto a Turing Machine, and from there, into any of the other languages. So who cares what language we use, right? This is from the point of view of the computer, and it should be taken with a grain of salt, but I believe it is true that the computer does not in fact care which language you use to write applications.

On the other hand, the Sapir-Whorf hypothesis suggests to us that improving the programming language notation in use will not cause just a first-order difference in programming productivity; it causes a second-order difference in allowing new types of applications to be envisioned and undertaken. This is from the human side of the human-computer relationship.

From a practical standpoint, we study programming languages in order to learn more tools that are good for different types of jobs. An expert programmer knows and uses many different programming languages, and can learn new languages easily when new programming tasks create the need. The kinds of solutions offered in some programming languages suggest approaches to problem solving that are usable in any language, but might not occur to you if you only know one language.

The Ideal programming language is an executable pseudocode that perfectly captures the desired program behavior in terms of software designs and requirements. The two nearly insurmountable problems with this goal are that (a) attempts to create such a language may be notoriously inefficient, and (b) no design notation fits all different types of programs.

A Brief History of Programming Languages

There have been a few major conferences on the History of Programming Languages. By the second one, the consensus was that "the field of programming languages is dead", because "all the important ideas in languages have been discovered". Shortly after this report from the 2nd History of Programming Languages (HOPL II) conference, Java swept the computing world clean, and major languages have been invented since then. It is conceivable that the opposite is true, and the field of programming languages is still in its infancy.

There are way over 1000 major (i.e. publicly available and at one point used for real applications) programming languages. Far fewer than half are still "alive" by most standards. Programming languages mostly have lifespans like pet cats and small dogs. Any language can expect to be obsoleted by advances in technology within a decade or at most two, and requires some justification for its continued existence after that. Nevertheless some dead languages are still in wide use and might be considered "undead", so long as people have businesses or governments that are depending on them.

History of Programming Languages, cont'd.

Languages evolved very approximately thus:
machine code, assembler
instruction sets vary enormously in size, complexity, and capabilities. Difficult for humans.
FORTRAN, COBOL
"high-level" languages. go-tos. flowcharts. chaos. imperative paradigm.
Lisp, SNOBOL, APL, BASIC
functional paradigm and alternatives. interpretive. user-friendlier. slow.
Algol, C, Pascal, PL/1
"structured" languages. fast. go-tos considered harmful.
Ada, Modula-2, C++
"modular" systems programming languages. data abstraction.
SmallTalk, Prolog, Icon, Perl
"pure OO", declarative, and scripting languages.
Visual Basic, Python, Java, Ruby, PHP, ...
GUI-oriented and web languages. mix-friendly languages.

OK, now it's your turn: what languages should be on this list? What new languages are "hot"?

Programming Language Buzzwords

"low level", "high level", and "very high level"
"low" (machine code level) vs. "high" (anything above machine level) is ubiquitous but inadequate
machine readable vs. human readable
certainly humans have difficulty reading binary codes, but machines find reading human language text vexing as well
data abstraction vs. control abstraction
really, I might prefer data vs. code as my counterpoints
kinds of data abstractions
basic/atomic/scalar vs. structural/composite
"first class" value
an entity in a programming language that can be computed/constructed at runtime, assigned to a variable, passed in or returned out of a subroutine.
kinds of control abstractions
many variants on selection, looping, subroutines
syntax and semantics
meat and potatoes of language comparison and use
translation models
compilation, interpretation, source/target/implementation languages

Googling for History

Here are some highlights from the history of programming languages; google them and see if they give clean answers or raise more questions (for exam purposes):

lecture #2

Paradigms and Languages

Several paradigms, or "schools of thought", have been promulgated regarding how best to program computers.

The dominant imperative paradigm has been gradually refined over time. It basically states that to program a computer, you give it instructions in terms it understands. It is a.k.a. the "procedural" paradigm: a program is a set of procedures/functions. You write new "instructions" by defining procedures. Since the underlying machine works this way, this is the default paradigm and the one that all other paradigms reduce to in order to execute.

Functional and object-oriented paradigms are arguably special cases of imperative programming. In functional programming you give the computer instructions in clean, mathematical formulas that it understands. In object-oriented programming, you give the computer instructions by defining new data types and instructions that operate on those types.

Declarative programming is a polar opposite of imperative programming, introduced in many different application contexts. In declarative programming, you specify what computation is required, without specifying how the computer is to perform that computation. The logic programming paradigm is arguably a special case of declarative programming.

Languages are implemented by compilers or interpreters. There are many implementation techniques that fall somewhere in between.

Pure vs. Impure; Multi-paradigm

Really, when we say a programming language embodies a particular paradigm, we are usually saying what it "mainly" does. Languages can be characterized by evaluating how "pure" is their adherence to their dominant paradigm. Impurity usually means: falling back on imperative paradigm when expedient or necessary. Purity is elegant but often comes at the price of idiocy.

Pure Language Examples
Language Example Commentary
SmallTalk
quadMultiply: i1 and: i2 
    "This method multiplies the given numbers by each other and the result by 4."
    | mul |
    mul := i1 * i2.
    ^mul * 4
Pure OO. Even ints are objects.
classic Lisp
(defun fibonacci (N)
  "Compute the N'th Fibonacci number."
  (if (or (zerop N) (= N 1))
      1
    (+ (fibonacci (- N 1)) (fibonacci (- N 2)))))
Pure functional. No I/O, no assignment statements, etc.
Prolog
perfect(N) :-
    between(1, inf, N), U is N // 2,
    findall(D, (between(1,U,D), N mod D =:= 0), Ds),
    sumlist(Ds, N).
Pure logic. Surprise failures, wild backtracking, nontermination

Different programming paradigms seem ideal for different application domains. What is great for business data processing may be terrible for rocket scientists. A computer scientist should know all the major paradigms well enough to know which paradigm is best for each new project that they come across. One option is to become proficient in several diverse languages.

Another option, sometimes, is to use a language that supports multiple paradigms. These run the risk of being Frankenlanguages. They are more likely to succeed when designed by a genius, and when pragmatic, viewing multi-paradigm as an extension of impurity rather than a theoretical ideal to aspire to.
Example Multi-Paradigm Languages
language example commentary
LEDA
relation grandChild(var X, Y : names);
var Z : names;
begin
  begin writeln('test father-father descent'); end;
  grandChild(X,Y) :- father(X,Z), father(Z,Y).
  begin writeln('test father-mother descent'); end;
  grandChild(X,Y) :- father(X,Z), mother(Z,Y).
  begin writeln('test mother-father descent'); end;
  grandChild(X,Y) :- mother(X,Z), father(Z,Y).
  begin writeln('test mother-mother descent'); end;
  grandChild(X,Y) :- mother(X,Z), mother(Z,Y).
end;
logic paradigm default; imperative when needed
Oz
proc {Insert Key Value TreeIn ?TreeOut}
   case TreeIn
   of nil then TreeOut = tree(Key Value nil nil)
   [] tree(K1 V1 T1 T2) then 
      if Key == K1 then TreeOut = tree(Key Value T1 T2)
      elseif Key < K1 then T in 
        TreeOut = tree(K1 V1 T T2)
        {Insert Key Value T1 T}
      else T in 
        TreeOut = tree(K1 V1 T1 T)
        {Insert Key Value T2 T}
      end 
   end 
end
Pattern matching seems inspired by FORMAN, which is under-credited.
Icon
#  Generate words
#
procedure words()
   while line := read() do {
      lineno +:= 1
      write(right(lineno, 6), "  ", line)
      map(line) ? while tab(upto(&letters)) do {
         s := tab(many(&letters))
         if *s >= 3 then suspend s   # skip short words
         }
      }
end
Imperative default, but logic-style programming when the programmer uses certain constructs. Unicon adds OO (along with a lot of I/O capabilities).

Syntax

At first glance the syntax of a language is its most defining characteristic. Languages differ in terms of how they form expressions (prefix, postfix, infix), what kinds of control structures govern the evaluation of expressions, and how the programmer composes complex operations from built-ins and simpler operations.

Syntax is described formally using a lexicon and a grammar. A lexicon describes the categories of words in the language. A grammar describes how words may be combined to make programs. We use regular expressions and context free grammars to describe these components in formal mathematical terms. We will define these notations in the coming weeks.

Example Regular Expressions

 ident   [a-z][a-z0-9]*
 intlit  [0-9]+

Example Context Free Grammar

 E : ident
 E : intlit
 E : E + E
 E : E - E

Many excellent languages have died (or, been severely hampered) simply because their syntax was poorly designed, or too weird. Introducing new syntax is becoming less and less popular. Recent languages such as Java demonstrate that it is possible to add more power to programming languages without turning their syntax inside out.

Syntax starts with lexicon, then expression syntax, and grammar. We are going to study these ideas in some detail in this course; expect to revisit this topic.

A context free grammar notation is sufficient to completely describe many programming languages, but most popular languages are described using a context free grammar plus a small set of cheat rules where surrounding context or semantic rules affect the legal syntax of the language.

Lexical syntax defines the individual words of the language. Often there are a set of "reserved words", a set of operators, a definition of legal variable names, and a definition of legal literal values for numeric and string types.

Expression syntax may be infix, prefix, or postfix, and may include precedence and associativity rules. Some languages are "expression-based", meaning that everything in the language is an expression. This might or might not mean the language is simple to parse without needing a grammar.

Context free grammars are a notion introduced by Chomsky and heavily used in programming languages. It is common to see a variant of BNF notation used to formally specify a grammar as part of a language definition. Context free grammars have terminals, nonterminals, and rewriting rules.

CFG's cannot describe all languages, and some grammars are inherently ambiguous. Consider

1 - 0 - 1

which evaluates to 0 or 2 depending on whether subtraction groups left or right, and

if E1 then if E2 then S1 else S2

where the "dangling else" may attach to either if.

Semantics

However much we love to study syntax, it is semantics that really defines the paradigms. Semantics generally includes type system details and an evaluation model. We will come back to it again and again this semester. For now, note that there can be axiomatic semantics, operational semantics, and denotational semantics.

Runtime Systems

Programming languages' semantics are partly defined by the compiler or interpreter, and partly by the runtime system. A runtime system consists of libraries that implement the language semantics. They range from tiny to gigantic. They may be linked into generated code, linked into an interpreter, or sometimes embedded directly in generated code. They include things ranging from language built-ins that aren't supported directly by hardware, to memory managers and garbage collectors, to thread schedulers, to my favorite...

I/O: the Key to All Power in the (Computing) Universe

Almost all programming languages treat I/O as an afterthought.

Dr. J's Conjecture: I/O is a dominant aspect of modern computing and of the effort required to develop most programs.

Evidence: dominance of graphics, networking, and storage in modern hardware advances; necessity of I/O in communication of results to humans; proliferation of different computing devices with different I/O capabilities.

Implications: programming language syntax and semantics should promote extensible I/O abstractions as central to their language definitions. Ubiquitous I/O hardware should be supported by language built-ins.

Expansion on the whole "Compilers" vs. "Interpreters" thing

Remind me of your definitions of "compiler" and "interpreter" in the domain of programming languages. What's the difference? Are they mutually exclusive?

Variants on the Compiler

classic
source code to machine code
preprocessor
source code to...simpler source code (Cfront, Unicon)
JIT
compiles at runtime, VM-to-native or otherwise
???
source code to hardware
???
source code to network message(s)

Variants on the Interpreter

classic
executes human-readable text, possibly a statement or line at a time
tokenizing
executes "tokenized" source code (array of array of tokens)
tree
executes via tree traversal
VM
executes via software interpretation of a virtual machine instruction set

Lisp Lecture #1

Functional Programming and Lisp

You must unlearn what you have learned. -- Master Yoda
Our first language, Lisp, is one of the oldest languages in common use today. It exemplifies the functional programming paradigm. Although Lisp is an acronym for "LISt Processor", its name is not usually given in all-capitals. Lisp tries to view the entirety of computing in terms of mathematical functions that operate on lists, and it is astonishing how much, and how easily, one can accomplish things with a few simple building blocks.

Lisp was invented by John McCarthy and colleagues at MIT around 1960. It was immediately and tremendously influential, serving as an example of how research in universities helped form computing as we know it, alongside industry R & D.

Lisp was the first interactive language, the first language to come bundled with an IDE, the first language to encourage self-modifying code, the chosen language of the field of artificial intelligence, and owns many other firsts. It was titanically influential in the development of other languages, from SmallTalk (Xerox InterLisp was the environment and culture in which SmallTalk was fostered), to scripting languages such as Python, to later functional languages such as ML and Haskell.

Whole companies were founded on the premise of making hardware-accelerated implementations of Lisp on $30,000 workstations. Large multi-million dollar companies built such machines, such as Symbolics and Texas Instruments. There were dozens of major Lisp dialects, with similar general syntax and myriads of incompatible variants. Eventually, a standard language called Common Lisp emerged and is still popular today. From a pragmatic standpoint, I am interested also in one other modern dialect, Emacs Lisp.

Lisp is small enough that it has been repeatedly used as a scripting/extension language, not just in Emacs but in major commercial programs such as AutoCAD's AutoLISP.

lecture #3

Announcements

Functional programming in a nutshell

Reading

1. Work your way through Sean Luke's Lisp Quickstart Tutorials 1, 2, and 3 (local mirror: 1, 2, and 3).

2. Skim or read the Common Lisp reference manual (CMU multi-formatted version) as needed in order to support your understanding.

Additional Common Lisp resources and other Lisp manuals may be useful for comparison, and some of them may suit you better.

Lisp Topics to Learn

Lisp language
syntax and semantics
Lisp runtime system
garbage collection, symbol table
Using Lisp
know a lot of particular functions and special forms
Lisp execution behavior
be able to diagram memory

Why we (still) study Lisp

Lisp: language considerations

Atoms and Lists

Lisp has two kinds of values: atoms and lists.

S-expressions

Both code and data are represented using symbolic expressions, which are parenthesized, and not comma separated. Because code and data are all the same stuff, it is fairly easy to build up some new code on the fly in a data structure, and then execute it.

cons cells

The fundamental building block of lists is the cons cell. It has a data payload (car) and a next-cell pointer (cdr).

Lists

A list is a null-terminated chain of cons cells. This has a recursive definition: nil is a list, and a cons cell whose cdr is a list is also a list.

A collection of cons cells that is not null terminated is not a list. A dot is used to denote cons cells that are not null-terminated, as in

("hello" . "there")

This is called "dotted-pair notation", mainly for the one-cons-cell case, but you can specify a chain of cells with a dot before the final element to indicate the absence of a null termination.
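A few illustrative cons examples, with typical clisp REPL results shown in comments:

```lisp
(cons "hello" "there")  ; → ("hello" . "there"): one cons cell, not a list
(cons 1 (cons 2 3))     ; → (1 2 . 3): the dot marks the non-nil final cdr
(cons 1 (cons 2 nil))   ; → (1 2): null-terminated, so a proper list
```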

read-eval-print

Lisp interpreters use a read-eval-print loop to interact with the programmer.

Using Lisp

Lisp is normally used interactively.
Normally, once you invoke a Lisp interpreter from the command-line (for example "clisp"), you are sitting at a lisp prompt (for example "[1]> "), typing source code directly at an interpreter.
"Grow" your programs bottom-up
You write one function at a time, often writing and unit-testing helper functions immediately, before using them in larger functions.
Run them as "scripts" when testing complete programs
For example, here is a complete, simple Lisp script that prompts the user and attempts to compute the square of the user's answer. It does not include error checking that might be helpful; it merely illustrates Lisp scripts.
#!/usr/local/bin/clisp
(defun square (x) (* x x))
(defun readsquare ()
 (print "gimme an x:")
 (setq x (read))
 (square x)
)
(print (readsquare))
Stored in some text file (say, "square") and marked as executable (via "chmod u+x square"), this program can be invoked from the command line prompt ("./square", or just "square" if it is located on your PATH). Note that a list of strings holding the command line arguments is available in the symbol *args* if there are any; if not, *args* will be NIL.
Compile them when you are finished.
"clisp" features a bytecode compiler; many common lisps will also feature an optimizing native code compiler.

Note on (quit)

With Lisp interpreters, if you fail to halt or (quit) properly, especially if you ran them from an interesting shell such as a subshell running under emacs, it is possible for your process to be left running after you logout. As far as Dr. J is concerned this is a bug in the operating system, but as pragmatic good citizens, we should make a point of (quit)ting properly and (kill -9)ing our Lisps when we have to.

Evaluation

As noted earlier, Lisp uses a read-eval-print loop, where "eval" means "evaluate an expression to obtain its value". Generally, evaluation goes like this: numbers, strings, and other literals evaluate to themselves; a symbol evaluates to the value stored in its symbol table entry; a list is evaluated by evaluating each element and then calling the first element's function value on the remaining values as arguments. Freaky parts: special forms and macros follow their own rules and may leave some of their arguments unevaluated.

The Most Universal Lisp Functions

There are several hundred built-in functions in Lisp. We start with the most universal.
(cons x y)
(car x)
(cdr x)
(+ x y)
Lists are commonly nested inside each other to form trees or other complex structures. It is common to walk through many car's and cdr's to get to the value that is needed. Built-in functions that perform multiple car's and cdr's are an old-school way of accessing elements in deeper structures. (caar x) produces the first element of the first element of x (assuming x is at least two levels deep). (cadr x) produces the 2nd element of the list x. (cdar x) produces the "rest" of the first element of x. (cddr x) produces the remainder of x after its first two elements. This pattern continues for any combination of two, three, or four a's and d's (cadar, cdddr, etc.).

The newer way of picking out element i, instead of saying (caddddddddr L), would be (nth i L). In this case, i is 0-based and comes first.
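A few of these accessors, as they might look at a clisp REPL:

```lisp
(setq L '((1 2) 3 4 5))
(car L)    ; → (1 2)
(cadr L)   ; → 3: car of the cdr
(caar L)   ; → 1: car of the car
(cddr L)   ; → (4 5)
(nth 2 L)  ; → 4: note the index comes first and is 0-based
```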

Predicates

Besides list construction and access/traversal, and numeric computations, one of the favorite categories of Lisp functions are those that return true or false. In Lisp, false is denoted as nil and anything that isn't nil is true. There is also a special reserved symbol, named t, that may be used as a generic "true" value. Many predicate functions follow the old Lisp naming convention of ending their name with a "p" for predicate.

Predicate examples:

(null x):bool		; is x nil?
(atom x):bool		; is x an atom?
(listp x):bool		; is x a list?
(numberp x):bool	; is x a number?
(integerp x):bool	; is x an integer?
(zerop x):bool		; is x the value 0?
(oddp x):bool		; is x odd?
(evenp x):bool		; is x even?
(consp x):bool		; is x a cons cell?
(plusp x):bool		; is x positive?
(minusp x):bool		; is x negative?
(< x y):bool		; is x less than y?
See also several equality-test predicates below

Symbols

Symbols are atoms that can be used as names for values. The concept of symbols replaces that of variables in ordinary languages. Unlike in mainstream languages, symbols routinely have characters like - and * in them. Several pieces of information may be associated with each symbol in the symbol table.

Symbol Table

The symbol table is an efficient structure for looking up stuff associated with symbols. Besides the name, the evaluation value, and a separate slot for the function value, there is more stuff -- at the least, a property list that can be used to associate various attributes with a symbol.

Example symbol table entry

 field        value
 name         "x"
 value        7
 function     (lambda (a b c) (+ (* a b) c))
 properties   ...
 ???          ...

The Lisp evaluator does an implicit/automatic symbol table lookup anytime a symbol appears during an evaluation. It uses the function slot when the symbol appears in the initial position in a list, and the value slot when the symbol appears in a 2nd or subsequent position. Rules are very different for "special forms". Symbol table entries can also be accessed explicitly by programs, which is how property lists are used.

Lisp generally emphasizes a single global symbol table, rather than a hierarchy of little local symbol tables as used in compilers for mainstream languages.
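A small illustration of the separate value and function slots, as it might look at a clisp REPL (the symbol double is just an example name):

```lisp
(defun double (x) (* 2 x))  ; stores into the function slot of DOUBLE
(setq double 7)             ; stores into the value slot of DOUBLE
(double double)             ; → 14: the initial position uses the function
                            ;   slot, the argument position uses the value slot
```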

Classical LISP Functions

(cons x L): new L	; allocate cons cell
(car L): x		; L->car
(cdr L): L		; L->cdr
(quit)			; quits LISP
(load f)		; load LISP code from filename f
(setf x 5)		; assignment
(+ x 1)			; arithmetic; takes any # of args

lecture #4

Tip for the homework: a student of mine once wrote:

I was working on my homework which received a [low score] due to load errors. I have spent a good deal of time trying to figure out this error but I cannot seem to get it. I used this online lisp debugger and it provided the same error as did using clisp. The error suggests that I am missing a parens at line 89, but I cannot tell where it needs to go.

If you cannot match your parentheses: use an editor (such as emacs) that highlights matching parentheses and auto-indents, and let the indentation show you where the missing parenthesis belongs.

Lisp Special Forms

Learn and understand the defun and let special forms. Compare let with setq. Learn the quote special form. In general, to write your own special forms, you write macros. Lisp macros make C/C++ macros look like "wimps".

Reason #1 that I dislike Common Lisp, compared with Emacs Lisp: Common Lisp's while loop is not a special form named "while".

defun

The defun special form defines a function. It is a list whose first three elements are the symbol defun, the symbol denoting the function name, and a list of arguments. The rest of the defun is a sequence of S-expressions that are evaluated when the function is called. The last expression produces the return value of the function.

The general format

(defun f (x)
   ; code for f given in 1 or more lists
   ; function return value is the value of the last thing evaluated
   )

Example:

(defun square (x)
   (* x x)
   )

Common Lisp also has return expressions, but they are more involved than in C or C++; look them up in the Common Lisp manual if you need them. Most of the time you should just let the final expression provide the return value.

A short summary is:

(return expr)
breaks out of a current loop (doesn't return from the function!)
(return-from f expr)
breaks out of block named f (f can be a function name)

Quote

The quote special form is sort of the simplest, no-operation special form, which just passes its argument on without evaluating it. Its syntax, (quote x), is commonly abbreviated with the apostrophe: 'x.
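A few illustrative uses, with typical clisp REPL results shown in comments:

```lisp
(+ 1 2)          ; → 3: evaluated as a function call
'(+ 1 2)         ; → (+ 1 2): a three-element list, not the number 3
(quote (+ 1 2))  ; → (+ 1 2): the same thing without the abbreviation
```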

Recursion

Recursion is central to Lisp, not only for certain mathematical computations, but also for various data structure operations, such as traversing a list. A recursive function is a function that may call itself as part of computing its result. It should always consist of
basis case:
a finishing-up circumstance where it does not need to call itself
recursion (induction) step:
circumstances where it solves the problem by combining a little work with a call to itself that does the "rest" of the work.
Example:
(defun factorial (n)
   (if (<= n 1) 1
       (* n (factorial (- n 1)))))

Suppose you didn't have a while special form, could you implement a recursion to execute 10 times?

(defun foo (x) (print x) (if (< x 10) (foo (+ x 1)) x))

Compound Expressions

(progn expr1 expr2 ... exprn)
Evaluates each expression in sequence. The value of the whole progn is the value of the final exprn.
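A minimal illustration; the print happens for its side effect, and the value of the whole progn is the value of the last expression:

```lisp
(progn
   (print "working...")  ; evaluated for its side effect
   (+ 1 2))              ; → 3, the value of the whole progn
```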

Recursion Tip

What is wrong with
(defun f (x y)
   (if (null x) y
     (f (cdr x))
   )
)
It has a basis case and it recurses on x, but the recursive call passes only one argument to a two-parameter function; it should be (f (cdr x) y).

Equality Functions

(eq x y)
t if x and y are the same exact object
(eql x y)
t if x and y are eq or if x and y are numerically the same
(equal x y)
t if x and y are eql or if x and y are structurally equivalent
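Some illustrative cases at a clisp REPL; eq on two freshly built lists is not guaranteed either way, but is usually nil:

```lisp
(eq 'a 'a)            ; → T: the same exact symbol
(eql 3 3)             ; → T: numerically the same
(equal '(1 2) '(1 2)) ; → T: structurally equivalent
(eq '(1 2) '(1 2))    ; usually NIL: two distinct cons chains
```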

Lisp Function-Writing-O-Rama

Given (defun) and the basic (car, cdr, cons) and functions on atoms, you can write almost any computation, generally using recursion where a loop might otherwise be suggested.
(listn n)
Returns a list containing n numbers from 1 to n.
(isprime x)
Returns whether x is a prime number or not.
(numprimes L)
Returns the number of prime numbers in L.
(app L x)
Return a list one longer than L: a copy of L with x added onto the end. Use only car, cdr, and cons.
(copy L)
Return a list that is a copy of L. Seldom needed in Lisp since, if you obey proper functional programming style, you can just pass L around where it is needed and not worry about a called function modifying it on you.
(reverse L)
Return a list that is the reverse of L. Use only car, cdr, and cons. You may use (app L x) if you define it successfully above. Note: there is a built-in reverse in Common Lisp; don't use it until you can write your own. You might need to name yours (myreverse L).
(cat L1 L2)
Return a list that is the concatenation of lists L1 and L2.
(squareL L)
Given a list L, return a list whose elements are the squares of corresponding elements in L
(mylength L)
Given a list L, compute its length.
(widest L)
Given a list L, return its longest (sub)list.
(average2 x y)
Compute the average (mean) of x and y.
(average L)
Compute the average (mean) of elements in L. Sum / Length.
(sum L)
Compute the sum (+) of elements in L.
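As a sketch of the expected flavor, here is one possible (app L x), using only car, cdr, and cons; the same pattern (basis case on null, cons the car onto a recursive call on the cdr) fits several of the others:

```lisp
(defun app (L x)
   (if (null L) (cons x nil)             ; basis: empty list becomes (x)
       (cons (car L) (app (cdr L) x))))  ; keep the head, recurse on the rest
```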

lecture #5 began here

...we looked at a bunch of recursion practice

Review of Dr. J's "Zen of Recursion"

If your Lisp function isn't recursive you are doing it wrong.
Sure you can write C with Lisp syntax, but why would you?*
What is the basis case?
It is usually easy if you look at the function return type (nil, 0, "" etc)
Is the recursive step summative or constructive?
Numeric recursions usually apply some arithmetic to combine current with "rest". List recursions usually cons a result list.
If you can't think of your recursion step, write a helper function
For example, consider how to write reverse. Easy to "reverse the rest", hard to place the first element at the end of such a list.
Add a parameter
Recursive helper functions often have more parameters than clean external public API functions.
Additional tips from past students:

Lisp Formatting

*Lisp can be less readable than C if you don't use newlines and indentation. Compare
(if (and (< x y) (< y z)) (progn (print "aha!") (exit)) (progn (print "OK")))
with
   (if (and (< x y) (< y z)) (progn ; then
        (print "aha!")
        (exit)
        )
       (progn                             ; else
        (print "OK")
       )
   )

Let and Let*

The special form named let introduces 1 or more local variables. let* does so and implies they will be introduced one at a time, such that previous ones are available as later ones appear.
(let ((x 1) (y 2))
     (print (+ x y)))
Notice the two opening parentheses after the (let part. You have an extra set of parentheses to bound a list of two-element (variable value) lists... even if you have only one local variable to declare. If a later initializer needs an earlier variable, as in (let* ((x 1) (y (+ x 1))) (print y)), you must use let*.

cond

A cond is like a chain of if-else expressions
(cond (bool1 exprs...)
      (bool2 exprs...)
       ...
      (booln exprs...))
Often, the final bool is a "t" (default).

Common Lisp also has a case special form

(case key
       ((keylist) exprs...)
       ((keylist) exprs...)
        (t exprs...))

lecture #6 began here

Strings and Characters

Lisp Strings are (0-based) arrays of characters.

String Recursion

Consider the following recursive version of the "ascii to integer" function atoi(s). The recursion would read in English as "if we are length 1, return our ASCII-converted numeric value, else recursively convert all the digits but the last one, multiply by 10, and add in the last one".
(defun atoi (s)
   (if (= (length s) 1) (digit-char-p (char s 0))
       (+ (* (atoi (subseq s 0 (- (length s) 1))) 10)
          (digit-char-p (char s (- (length s) 1)))))
)
Common Lisp has a built-in for this, (parse-integer "-64"), but this version of (atoi s) is a good example of recursing on strings.

Loops

Expect to spend a little while with your common lisp manual getting the details right on these. Recursion is simpler. :-)
(dotimes (var n result) exprs)
(dolist (var L result) exprs)
(do ((var init step) (var2 init step) ...)
    (end-test result...)
    exprs...)
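A few sketches of these forms in use (worked out by hand, so check them against your manual):

```lisp
(dotimes (i 3) (print i))        ; prints 0, 1, 2
(dolist (x '(a b c)) (print x))  ; prints A, B, C
(do ((i 0 (+ i 1))               ; step variables updated in parallel
     (sum 0 (+ sum i)))
    ((= i 5) sum))               ; returns 10 (0+1+2+3+4)
```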

Some more Lisp functions

Note for the last two functions: passing function f into apply or mapcar generally requires that you write 'func (or #'func) instead of just func, since func by itself would be evaluated to obtain its variable value before apply or mapcar ever saw it. apply and mapcar need the "function value", so they need the unevaluated symbol.

Common Lisp Data Types (pass 2)

Not just your usual int-float-string-array types; Common Lisp includes ratios, complex numbers, multiple character types, vectors, bit-vectors, hash tables, structures, random-states, and others. Learning the whole language is a challenge; the philosophy is to learn the parts you need on demand, building up your "working set".
I-am-a-symbol		; symbol
"I am a String"		; string
#\a			; character
#(1 2 3)		; vector

Lambda forms

(lambda (x) (* x x)) is an example of a lambda form. It is essentially an anonymous function, usable anywhere a function name would be used. In particular, lambda forms are the technique of choice when a function constructs new code and returns it as its return value.

lambda forms are a bit "deep" to understand. The Wikipedia entry for anonymous functions states that they are useful for "temporary" functions that might get created on the fly (say, by an AI program that generates custom functions for some algorithm), used immediately, and then discarded. They also appear in high-powered mathematics (the "lambda calculus", invented by Alonzo Church) and serve as building blocks for certain computing techniques, such as closures, which you can learn about. To sum up, there are a lot of computing techniques (in Lisp) that create new code on the fly as data, and such new code may not have a natural human name. Lambda forms are useful in such circumstances.

There was a basic question in class: how does an anonymous function recurse? A deep theoretical answer no doubt exists. I have a shallow, easier-to-swallow answer: an anonymous function can recurse by defining a local name for itself, and using that name to call itself within itself. The miraculous LABELS special form is a bit like a cross between a LET and a DEFUN... Consider

(funcall (lambda (x)
            (labels ((myname (x)
                        (if (<= x 1) 1
                            (+ (myname (- x 2)) (myname (- x 1))))))
               (myname x)))
         3)
Extra credit points if you can come up with a shorter / more understandable anonymous recursive function.

Of course, even though this works, it would be cooler if Lisp had a function-equivalent of the "self" or "this" variable used in OO languages. You know, sort of a this-func such that you could write

(lambda (x) (if (<= x 1) 1 (+ (this-func (- x 2)) (this-func (- x 1)))))
It is hard to imagine that this hasn't been done in some Lisp dialect already, maybe something similar to it has.

lecture #7 began here

Things I Learned While Writing This Handy Sudoku Solver

Formatted Output

Example:
(format t "~A" "hello, world")
More typically, like with printf(), the format string is used to stick a value in the middle of a larger string. ~A is interesting since it does the "right" thing with values of several/many Lisp data types. From Gigamonkeys.com:
(format nil "The value is: ~a" 10)           ==> "The value is: 10"
(format nil "The value is: ~a" "foo")        ==> "The value is: foo"
(format nil "The value is: ~a" (list 1 2 3)) ==> "The value is: (1 2 3)"

Applicative Programming

Eval, Apply, and Funcall

(eval expr) calls the Lisp evaluator on its argument. It is an ordinary Lisp function, and lets you execute arbitrarily constructed data as code.
(setf x '(+ 2 2))
x			; returns (+ 2 2)
(eval x)		; returns 4

Earlier we saw (apply f L) calls function f with list L as its parameters. Apply is kind of like (eval (cons f L))

(apply '+ '(1 3 5))	; returns 9
Note: apply does not work with special forms!

(funcall f &rest args...) is like apply, only the args are supplied directly in the normal function call manner.
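A quick sketch contrasting the two calling styles:

```lisp
(funcall #'+ 1 3 5)   ; returns 9; arguments supplied directly
(apply #'+ 1 '(3 5))  ; returns 9; apply also allows leading args before the final list
```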

Optional Parameters

Lisp allows functions to declare optional parameters like this:
(defun subseq (seq startpos &optional endpos) ...)

Keyword Parameters

Keywords are symbols starting with colon (:) as in
   (with-open-file (f "foo" :direction :input) ...)
This is really a user-definable mechanism.

Q: When do you use it?
A: When you have multiple (many) optional parameters

(defun f (x &key (y "because") (z 'zebra))
   ; ... code body uses x, y, and z
)
allows such calls to f as
(f 1)			; y and z default
(f 2 :y 'not)		; z defaults
(f 3 :z 'zbigniew)	; y defaults
Example: more of with-open-file's keyword parameters

File I/O (in Common Lisp)

By the way, stdout in Common Lisp is named *standard-output*
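A minimal file I/O sketch using with-open-file (the filename scratch.txt is just an example):

```lisp
;; write a line to a scratch file, then read it back
(with-open-file (out "scratch.txt" :direction :output :if-exists :supersede)
   (write-line "hello" out))
(with-open-file (in "scratch.txt" :direction :input)
   (read-line in))              ; returns "hello"
```

with-open-file closes the stream for you when its body exits, even on error.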

Programming environment issues

You write Lisp code in a .l (or, for Emacs Lisp, .el) file, and load it into your Lisp interpreter session by executing (load "file.l"); under Emacs that would be (load-file "file.el"). Many Emacs functions are helpful in finding online documentation on Emacs Lisp, including "apropos" and "describe-function". Invoke these interactively by typing M-x (Escape, then x), then typing the function name and pressing return.

A word about compilation

I did a naive test of the speedup generated by our GNU Common Lisp's bytecode compiler, to see if it is worth your bothering with. After defining a file fib.l:
(defun fib (n) 
  (if (< n 2) 1 (+ (fib (- n 1)) (fib (- n 2)))))
and executing (load "fib.l"), the call (fib 35) was taking me about 30 seconds, while if I do a (compile-file "fib.l") it generates a bytecode file named fib.fas, and if I do a (load "fib.fas"), the call (fib 35) takes about 5 seconds -- rough ballpark, a factor of 6 speedup on a trivial example. Your mileage may vary.

The function's code body

Suppose you
(defun foo (x) (* x x))
In common Lisp, for arbitrary function foo in the symbol table you can request its "function definition" with #'foo, but that gives you a function value, something that might be native code. On the other hand,
(function-lambda-expression #'foo)
peels out the actual lambda form of the function. In our example above you get something like
(LAMBDA (X) (DECLARE (SYSTEM::IN-DEFUN FOO)) (BLOCK FOO (* X X)))
...so you could get at the code block with something like
(nth 3 (function-lambda-expression #'foo))
giving an answer of
(BLOCK FOO (* X X))
Thus
(nth 2 (nth 3 (function-lambda-expression #'foo)))
gives
(* X X)
Now, how would we destructively change that * to a + ... if only Lisp weren't mathematically pure. If only it were evil...

rplaca, rplacd

Having gone to a lot of trouble to say that proper use of Lisp never modifies any existing variable or structure, but instead relies on pure mathematical functions (if you stick to those principles, for example, your code is thread-safe pretty much for free; some Lisps automatically parallelize your code for you)...

Now it is time to show how to modify an existing cons cell. (rplaca L x) replaces L's car with x, and (rplacd L x) replaces L's cdr with x.
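A small sketch of both on a throwaway list, before we get to the dramatic part:

```lisp
(setf L (list 1 2 3))
(rplaca L 99)    ; L is now (99 2 3)
(rplacd L '(0))  ; L is now (99 0): the whole rest of the list is replaced
```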

(rplaca (nth 2 (nth 3 (function-lambda-expression #'foo))) '+)
Does it actually modify the semantics of foo?
> (foo 3)
6
...You bet it does!

I guess you could say: self-modifying code starts with being able to modify code...

Sequences

Generalizing from Lists, Common Lisp has many data types that all fall under the umbrella of an ordered sequence of elements, on which a large set of built-in sequence functions work.

Arrays

(make-array dimensions &key :element-type :initial-element :initial-contents ...)
The dimensions parameter is either a single integer (for a one-dimensional array) or a list of integers, one per dimension. Supplying an :element-type allows more efficient implementation of specialized arrays. (aref a &rest subscripts) produces an element of array a, either for evaluation or as a place for assignment:
   (aref a 5)		; a[5]
   (aref m 3 2)		; m[3][2]
   (setf (aref a 5) 10) ; a[5] = 10
Note that &rest indicates a function with a variable number of arguments. There are several other array helper functions:
(array-element-type a)
(array-rank a)
(array-dimension a i)
(array-in-bounds-p a &rest subscripts)
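A sketch tying a few of these together on a 2x3 array (my example):

```lisp
(let ((m (make-array '(2 3) :initial-element 0)))
   (setf (aref m 1 2) 7)                          ; m[1][2] = 7
   (list (array-rank m)                            ; 2 (two dimensions)
         (array-dimension m 0)                     ; 2 (rows)
         (aref m 1 2)))                            ; 7
; returns (2 2 7)
```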

Categories of Sequence Operations

Simple functions
elt, length, reverse, subseq, make-sequence
Concatenate, map, reduce
(reduce #'+ '(1 2 3 4)) --> 10
(map 'list #'+ '(1 2 3) '(4 5 6)) --> (5 7 9)
Modifiers
fill, replace, remove, substitute
Predicates
some, every, notany, notevery
Search functions

concatenate

(concatenate 'result-type seq1 seq2 ... seqN)
and many other Lisp functions are polymorphic; they operate on any sequence type (lists, strings, arrays/vectors...). You may find it convenient to wrap them in helper functions.
(defun strcat (s1 s2) (concatenate 'string s1 s2))
(defun lcat (L1 L2) (concatenate 'list L1 L2))
(defun cat (x1 x2)
   (typecase x1
      (string (concatenate 'string x1 x2))
      (list (concatenate 'list x1 x2))
   )
)
Given such an awesome piece of Jeffery-wonderfulness, try
 (cat '(1 2 3) '(4 5 6))
or
 (cat "hello" "there")
or
 (cat '(1 2 3) "there") 

More map, and mapcar

We already saw map and reduce, but here's a tip: map may be handy in putting your input data (i.e. for a homework assignment) into a convenient format to work with.

(map type f sequences) calls function f once for each element in its sequences.

(map 'list #'- '(1 2 3)) ; returns (-1 -2 -3)
(defun oddc (n)
   (if (oddp n) #\1 #\0))
(map 'string #'oddc '(1 2 3 4)) ; returns "1010"
(map 'list #'string "abcd") ; returns ("a" "b" "c" "d")
(mapcar f L L2 ... Ln) is about the same, but is specific to lists. mapcar has several relatives.

mapcar is kind of like:

   (cons (f (car L) (car L2) ... (car Ln))
         (mapcar f (cdr L) (cdr L2) ... (cdr Ln)))
Examples:
(mapcar '1+ '(100 200)) ; returns (101 201)
(mapcar '+ '(1 2 3) '(100 200 300)) ; returns (101 202 303)

Search functions

(find item seq)     ; returns leftmost occurrence of item in seq
                    ; good particularly on lists of structures
(position item seq)  ; leftmost position
(count item seq)     ; # of matches
(mismatch seq1 seq2) ; position of mismatch
(search s1 s2)       ; position of s1 in s2
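Some illustrative calls (my examples):

```lisp
(find 3 '(1 2 3 4))       ; returns 3
(position #\l "hello")    ; returns 2 (0-based)
(count 1 '(1 0 1 1))      ; returns 3
(mismatch "abcd" "abed")  ; returns 2, where c and e differ
(search "lo" "hello")     ; returns 3
```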

Pragmatics of Lisp

Thus far we have covered classical Lisp topics. The focus of the final Lisp lecture(s) will be practical solutions to real-world problems in Common Lisp.

Reading Lines into a List

The wrong way:
(setf L ())
(read-em f)
(defun read-em (f)
   (let ((s (read-line f nil nil)))
      (cond ((not (null s))
		(setf L (cons s L))
		(read-em f))
      )
   )
)

The right way:

(defun read-em (f)
   (let ((s (read-line f nil nil)))
      (if (null s) nil (cons s (read-em f)))
   )
)

Load Error?

[16]> (load "words.l")
;; Loading file words.l ...
*** - READ: input stream
      #<INPUT BUFFERED FILE-STREAM CHARACTER #P"words.l" @30> ends within an
      object. Last opening parenthesis probably in line 13.
The following restarts are available:
ABORT          :R1      Abort main loop
Break 1 [17]> abort
abort
[18]> 

String Processing

String processing is not Lisp's strong point. On sourceforge you can find a valuable resource on string processing in Common Lisp.

Example helper functions

(defun emptystr (s) (= 0 (length s)))
(defun strcat (s1 s2) (concatenate 'string s1 s2))

Recursing on a String

We last time identified our basis case on a string (string length 0). To walk through a string with recursion we might use a "car" and a "cdr" to pick out the first and the rest of the string.
(defun carstr (s) (char s 0))
(defun cdrstr (s) (subseq s 1))
Now, what does the following recursion on a string do?
(defun woof (s)
   (if (emptystr s) nil
      (cons (carstr s) (woof (cdrstr s)))
   )
)

How Many Helper Functions

As I think about it, functional programming style is easier if you are more aggressive in writing more, smaller functions than would be normal in C. A good rule of thumb is, if you've got more than 7 (+/-2) levels deep of nesting in the code body (not counting the one for defun, or the parameters or local variable declarations) you might start to bump into your cognitive limits on grokking code, and perhaps should think about breaking it up into more helper functions. But note that this says "deep" and is not a simple matter of counting parentheses in most cases.

Back to (words s)

Last time we started on a function (words s) that would return a list of "words", given an input text (say, a line read from a file). We got about this far:
;; return a list of words from string s
(defun words (s)
   (if (= 0 (length s)) nil
    ; ... add "else" part here
    )
)
As a programmer, I tend to want a few more base cases with error checks here, a bunch of "if" conditions. I could combine all these conditions using a ton of "or" clauses, but I might prefer to write them separately so as to be less confusing. I could write a bunch of nested "if" expressions, chaining them along the else branches, but that also gets ugly if it gets too deep. Another option would be to learn and use Common Lisp's "cond" expression, which is perhaps a nicer encapsulation of a classic if-else-if-else-if-else-if chain.

One other thing we left off with last time was: what characters make up a "word" and how do we test for them in Common Lisp? Homework 4 is very precise (and simple) in saying a word starts with a-zA-Z0-9 and anything else is a non-word that separates words. You should look for (in Common Lisp) a built-in function, or write your own helper function to test whether a character is a "word character" or not.

(defun word-char (c) (or (alpha-char-p c) (digit-char-p c)))
As we walk along happily recursing down the string, we need to remember several things: whether we are currently inside a word, the characters of the current word so far, and the list of completed words so far. The easiest way to remember all these things as we walk along is to pass them as parameters.

Given that we do different things depending on whether we are in a word or not, you can either add extra conditions to track that, or you can write separate recursive functions for when you are in a word at the moment, and when you are not.

;; return a list of words from string s
(defun words (s)
   ; if s is empty
   (if (emptystr s) nil
         ; if first char is a "word char"
	 (if (word-char (char s 0))
            ; then go into "in word" mode
            (in-word (cdrstr s) (string (char s 0)) nil)
          ; else try next char
          (words (cdrstr s))
         )
   )
)
Note the indentation style, and the comment convention. Also note the repeated calls to the same function with the same arguments. A good optimizing compiler will avoid those duplicate calls, but on an interpreter if we cared about performance we might want to only call those once, storing results in local variables.
;; return a list of words from string s
(defun words (s)
   (if (emptystr s) nil
       (let ((c (char s 0)) (d (cdrstr s)))
	 (if (word-char c)
            (in-word d (string c) nil)
          (words d)
         )
       )
   )
)
It remains to be seen how to grab more letters in the current word.
;; already in a word w
(defun in-word (s w L)
   (if (emptystr s) (append L (list w))
       (let ((c (char s 0)) (d (cdrstr s)))
	 (if (word-char c)
           (in-word d (strcat w (string c)) L)
           (notin-word d (append L (list w)))
))))
The interesting thing here is that if c is a word-char we add it onto the current word and call recursively, but if c is not, we add the whole word onto the accumulated words list and call our evil twin to process the next character.
;; not in word w
(defun notin-word (s L)
   (if (emptystr s) L
       (let ((c (char s 0)) (d (cdrstr s)))
	 (if (word-char c) (in-word d (string c) L)
           (notin-word d L)
))))
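Assuming the helper functions defined above (emptystr, cdrstr, word-char, in-word, notin-word) are all loaded, a quick sanity check:

```lisp
(words "hello, world 42")   ; returns ("hello" "world" "42")
```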

Hash Tables

You probably need to know Common Lisp's hash table type. You can read more about them at the Common Lisp Cookbook Hashes Page.

(let ((tab (make-hash-table)))
   (if (null (gethash s tab))
       (setf (gethash s tab) 0)
       (setf (gethash s tab) (+ 1 (gethash s tab)))
   )
)
As far as walking through your hash table looking at all of its entries, you can use (maphash #'helperfunc tab) if you write a (helperfunc key value). Or you can use one of several iterator or loop methods given in the cookbook.
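As an aside, the lookup-or-initialize pattern in the example above can be written more compactly using gethash's optional default argument together with incf; note the :test #'equal so that string keys compare by contents:

```lisp
;; count occurrences without an explicit null check
(let ((tab (make-hash-table :test #'equal)))
   (incf (gethash "apple" tab 0))   ; 0 is the default when the key is absent
   (incf (gethash "apple" tab 0))
   (gethash "apple" tab))           ; returns 2
```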

Warning about Common Lisp Hash Tables

Common Lisp hash tables default to "eql" semantics when hashing and looking up keys. Two strings might look identical, but if they were allocated at different times, in different locations in memory, they will not hash to the same place. To hash reliably on strings, create the table with (make-hash-table :test #'equal), or convert the keys into symbols using (intern s).
(setq tab (make-hash-table))
(gethash "hi" tab)
NIL
(setf (gethash "hi" tab) 1)
1
(gethash "hi" tab)
nil
(setf (gethash (intern "bye") tab) 2)
2
(gethash (intern "bye") tab)
2

Sorting

From n-a-n-o's cmu common lisp tutorials we note that Common Lisp has a built-in sort function, (sort L f). Caveat: sort is destructive! So make a copy of the cons cell chain first, as in (sort (copy-list L) f).

One more kind of recursion

It is worth mentioning one more important kind of recursion: problems in which you can divide the problem in half each time, solving recursive subproblems of half the size. You will only have to subdivide in half (log n) times. Quick Sort and Binary Search are examples of algorithms that might employ this kind of recursion.

lecture #8 began here

Unicon

Reading Assignment

Read Chapter 2 of "Graphics Programming in Icon"
the clearest available description of Icon and Unicon's goal-directed expression evaluation. You can read it from the UA Icon website, from a local copy, or come pick up your free hard copy from Dr. J.
Visit unicon.org
Check out the Unicon book there.

Unicon Basics

procedure main()
   write("hello, world")
end
Save in "hello.icn". Compile with "unicon hello". Run with "./hello".

Now some more basics:

Variable declaration is optional
local, global, and static declarations are recommended in large programs or libraries.
Variables can hold any data type, and can be reassigned with values of different types
Like in Lisp. But this is very rare in practice.
type(x) returns a string type name ("list", "integer" etc)
You can write code that works on multiple types, like in Lisp.
arithmetic is normal
^ is an exponentiation operator. Integers are unlimited precision. Reals are C doubles.
Type conversion is often automatic
Runtime error when conversion won't work, except in explicit conversion functions, which fail instead.
Strings use double quotes and are escaped using \
indexes are 1-based; strings are immutable and atomic, not arrays of char; there is no char type
s[i] := "hello" works
it is really shorthand for s := s[1:i] || "hello" || s[i+1:0]
*s is a length operator, repl(s,i) is i concatenations of s
expressions in Icon can fail to produce a result
failure cascades to surrounding expressions
Built-in types include lists, tables, sets, csets, and records.
Arguably simpler to use than Common Lisp's
Classes and packages
Well-suited for large-scale apps
Easy I/O capabilities
2D, 3D, and network programming

  • What new features are introduced by these interesting functions:
    procedure fwords(s)
        t := table(0)
        L := wordsinfile(s)
        while s := pop(L) do t[s] +:= 1
        every k := key(t) do if t[k]=1 then delete(t,k)
        return sort(t)
    end
    procedure wordsinfile(s)
        f := open(s)
        L := []
        while line := read(f) do L |||:= wordsinline(line)
        close(f)
        return L
    end
    procedure wordsinline(s)
       alnum := &letters ++ &digits
       L := []
       s ? while tab(upto(alnum)) do put(L, map(tab(many(alnum))))
       return L
    end
    

    lecture #10 began here

    Generator Examples

    For further reading, see "Generators in Icon", by Griswold, Hanson, and Korb.
    a | b
    The simplest generator is alternation. Instead of saying
    x = 5 | x = 10
    
    you can just say x = (5|10). You might not want to hear it, but: yes this is shorter and more readable than ordinary programming languages, instead of adding power by being "weirder". Maybe read | as "then" instead of "or". So what does
      (1 | 2) + (x | y)
    
    do?
    i to j
    i to j by step
    The coolness here is that a traditional language's "for-loop" has been generalized not just into an iterator, but into an expression that can be smoothly blended into any surrounding expression context.
    !x
    All data structures in the language support the "generate" operator to produce their contents. Files generate their contents a line at a time. Consider
       s == !f
    
    find(s), upto(c), and bal(c1,c2,c3)
    These classic string pattern matching generators produce indices within a string. They have several optional parameters for string to examine, and start and end positions to consider. They are usually used in a string scanning environment where these parameters may be omitted. Of the three, bal() is seldom used and a bit trickier than the others. It generates positions containing characters in c1 (like upto) balanced with respect to c2 and c3. Note that if *c2 and *c3 are greater than 1, though, it does not distinguish different kinds of parentheses.
    seq(), key()
    For completeness sake, we list the remaining two "built-in" generators. seq() generates an infinite sequence of integers. key() generates the "keys" of a table or set.

    User-Defined Generators

    Generators are often a convenient way to write dynamic programming solutions.
    procedure fib()
       local u1, u2, f, i
       suspend 1|1
       u1 := u2 := 1
       repeat {
          f := u1 + u2
          suspend f
          u1 := u2
          u2 := f
          }
    end
    
    Given a record tree(data, ltree, rtree), what does the following procedure do?
    procedure walk(t)
       if /t then
          fail
       else {
          suspend walk(t.ltree | t.rtree)
          return t.data
       }
    end
    
    Compare that with a non-generator solution:
    procedure walk(t, p)
       if /t then fail
       walk(t.ltree, p)
       walk(t.rtree, p)
       p(t.data)
    end
    
    What does this procedure do?
    procedure leaves(t)
       if /t then fail
       else if /(t.ltree === t.rtree) then
          return t.data
       else {
          suspend leaves(t.ltree | t.rtree)
          }
    end
    

    Recursion and Backtracking

    Recursive backtracking examples, UT Longhorn-style.

    Unicon: Graphics

    Unicon has some of the world's easiest 2D graphics (open() mode "g"), inspired by the TRS-80 Extended Color BASIC graphics, as well as the X Window System. The 3D facilities (open() mode "gl") are also pretty darn simple. They are built atop (classic) OpenGL and have grown to emphasize the use of textures over time.

    Main concepts:

    1. window = canvas + context
    2. "attribute=value" strings
    3. pixels, coordinates, colors, fonts
    4. input processing and callback routines
    5. language level versus (GUI) class level

    Unicon: Networking

    Unicon has some of the world's easiest internet client and server facilities. There are basic TCP and UDP protocols accessed via open() mode "n" and "nu", and there are several higher level internet protocols such as HTTP and POP that are accessed via open() mode "m".

    Main concepts:

    1. slow, reliable and ordered (TCP) or fast (UDP)
    2. asynchronous, non-blocking I/O and timeouts
    3. dropped connections and widely varying delays
    4. multi-plexing and select()

    Unicon: Classes and OOP

    Unicon

    Here is a gentle syntax comparison, adapted from Hani Bani-Salameh.
    C++:
    class Example_Class {
    private:
       int x;
       int y;
    public:
       Example_Class() {
          x = y = 0;
       }
       ~Example_Class() { }
       int Add()
       {
          return x + y;
       }
    };

    Unicon:
    class Example_Class (x,y)
       method Add()
          return x + y
       end
    initially
       x := y := 0
    end
    
    A discussion of Unicon's classes and inheritance model ensued.

    lecture #11 began here

    Commentary on Homeworks

    Quiz #1

    Let's get it out of the way.

    Reading

    Please read selected parts of Programming with Unicon, especially chapters 8-12. The rest of the book is fair game for you, but you will receive no exam questions from it. In writing real programs including homework assignments, I recommend that you read or at least skim chapters 5-7.

    Discussion of HW#4

    There may be additional "pair programming" assignments in this class, but I promise no more programs where the computation has to do with pairing.

    lecture #12 began here

    Unicon Tips from Quiz #1

    procedures end with end
    not { } as in C/C++/Java. Same goes for classes, methods
    && is not an "and" operator
    & is an "and" operator
    a generator is only as generative as its surrounding expression demands
    if it's not driven by "every", it may well stop at its first result
    if it's already a generator, ! won't make it more so
    rather, it will generally mess it up
    Can't just start assigning elements of an empty list
    After L:=[], you will find that L[1] does not exist yet. Create the list with list(n), or put() or push() elements onto it, before you try to subscript them.

    An OOP Example

    class listable(L,T)
       method insert(k,value)
          /value := k
          T[k] := value
          put(L, value)
       end
       method lookup(k)
          return T[k]
       end
       method gen_in_order()
          suspend !L
       end
    initially(defaultvalue)
       L := [ ]
       T := table(defaultvalue)
    end
    
    So, this is a table, except it remembers the order in which its elements are inserted. Like Java, because we don't have operator overloading, we can't make it look exactly like a table...
    procedure main(argv)
       LT := listable(0)
       every s := !argv do
          LT.insert(s, LT.lookup(s)+1)
       every x := LT.gen_in_order() do
          write(x)
    end
    
    What is wrong with this picture?

    Unicon Inheritance

    Inheritance in Unicon has "closure-based" semantics. Instead of a kid being an instance of a parent with some additions, a kid is its own being, who pulls in fields and methods via transitive closure (depth-first search) of its superclasses. Closure-based semantics gives the cleanest resolution of multiple inheritance conflicts that I am aware of. Most of the time you do not notice or care.
    class fraction(numerator, denominator)
       #methods here
    initially
    end
    
    class inverse : fraction(denominator)
    initially
      numerator := 1
    end
    
    class sub : A : B(x)
    initially
       x := 0
       self.A.initially()	# calling parent method in overriding subclass method
       self.B.initially()	# self is implicit in most other contexts.
    end
    
    

    String Scanning

       s ? expr
    
    evaluates expr in a string scanning environment in which string s is analyzed (terminology: s is the subject string). While in a string scanning environment, string functions all have a default string, and a default position within the string at which they are to operate.

       s ? find(s2)
    
    searches for s2 within s and is a lot like find(s2, s, 1).

    You almost never use string scanning if you only have one string function to call, but rather, when you are breaking up a string into pieces with multiple functions. In this case, function tab(i) changes the position to i, and function move(i) moves the position by i characters. tab() and move() return the substring between the start position and where they change it to.

        s ? {
           if write(f, tab(find("//"))) then {
    	  move(2) # move past //
              write(&errout, "trimmed comment ", tab(0))
              }
           else write(&errout, "there was no comment")
           }
    
    Built-in scanning functions include:
    find(s)
    search for a string
    upto(c)
    search for a position at which any character in set c can be found
    match(s)
    if current position starts with s, return position after it
    any(c)
    if current character is in c, return position after it
    many(c)
    if current position starts with characters in c, return position after them
    bal(c1,c2,c3)
    like upto(), but only return positions at which string is "balanced" with respect to c2, c3. Tricky in one respect.
    Actually several of these are generators.

    Co-expressions

    lecture #13 began here

    Threads

    thread write(1 to 3)
    
    is equivalent to
     spawn( create write(1 to 3) )
    

    The usual problem with a thread is: you aren't waiting for it to be done, and you can't even tell when it finishes. Well, assign it to a variable and you can at least do that much.

    mythread := thread write(1 to 3)
    ...
    wait(mythread)
    
    waits for a thread to be done.

    Typically, a thread has some work (data structure) and an id passed into some function. After the thread is finished, the results will have to be incorporated back into the main computation somehow

    t1 := thread sumlist(2, [4,5,6])
    ...
    procedure sumlist(id, L)
       s := 0
       every s +:= !L
       #... can't easily just "return" the value
    end
    

    The classic way threads might communicate is: global variables! But these have race conditions. Alternatives include files or pipes or network connections (all slow), or an extra language feature, but first: how to avoid race conditions.

    global mtx
    mtx := mutex()
    ...
    critical mtx: expr
    
    is equivalent to
    lock(mtx)
    expr
    unlock(mtx)
    

    Another way to avoid race conditions in Unicon is to use a "mutex'ed" data structure, as in

    L := mutex([])
    

    There are also thread-based versions of the activate operator: four or eight of them:

    @>     send
    @>>    blocking send
    <@     receive
    <<@    blocking receive

    They follow this (weird) model:

    There is more to concurrency: condition variables, private channels... this was just your gentle introduction. See UTR14 for more.

    A Unicon Thread Story

    Real Life intrudes upon our tender classroom...

    Discussion of Sort Module

    The Icon Program Library sort module handles more exotic sorting needs than those of the built-in sort(). We have an example to consider, but we almost have to get some more core data types and control structures covered in order to appreciate it.

    Bits of Icon/Unicon Wisdom

    Things I love about Icon and Unicon

    Yeah, this list isn't complete...
    x1 < y < x2
    ranges the way I saw them back in math class
    lists and tables
    the most convenient data structures building blocks in any language
    !L === x and P(!L) and such
    the most convenient algorithms building blocks in any language
    open() and friends
    the most convenient graphics and network I/O in any language

    Things I hate about Icon and Unicon

    Run-time errors that have &null values because of typos
    compiler option -u helps but isn't a cure-all
    Run-time errors that have &null values because of surprise failure
    if's are needed to check for failure...in a large percent of expressions
    Computational accidents because of surprise generators
    some things were never meant to be backtracked-into.
    the language is slow
    from time to time I get help from students interested in fixing this
    the IDE is immature
    many Bothan spies died to bring you this IDE.

    OOP Lessons from the Unicon Class Libraries

    The Unicon distribution is basically Icon with an extensively modified VM, plus a uni/ directory that looks like
    3d/   guidemos/  iyacc/     Makefile   progs/  ulex/	unidoc/
    CVS/  ide/	 lib/	    native/    shell/  unicon/	util/
    gui/  ivib/	 makedefs   parser/    udb/    unidep/	xml/
    
    We can't cover all the libraries in a single lecture, but we can learn about objects from some of the highlights.

    lecture #14 began here

    Flex and Bison

    Our next "language" in this course is really two languages that were designed to work together.

    Reading Assignment: Flex

    Read Sections 3-6 of the Flex manual, Lexical Analysis With Flex.

    Regular Expressions

    The notation we use to precisely capture all the variations that a given category of token may take is called a "regular expression" (or, less formally, a "pattern"; that word is really vague, and there are lots of other notations for patterns besides regular expressions). Regular expressions are a shorthand notation for sets of strings. In order to even talk about "strings" you have to first define an alphabet, the set of characters which can appear.
    1. Epsilon (ε) is a regular expression denoting the set containing the empty string
    2. Any letter in the alphabet is also a regular expression denoting the set containing a one-letter string consisting of that letter.
    3. For regular expressions r and s,
               r | s
      is a regular expression denoting the union of r and s
    4. For regular expressions r and s,
               r s
      is a regular expression denoting the set of strings consisting of a member of r followed by a member of s
    5. For regular expression r,
               r*
      is a regular expression denoting the set of strings consisting of zero or more occurrences of r.
    6. You can parenthesize a regular expression to specify operator precedence (otherwise, alternation is like plus, concatenation is like times, and closure is like exponentiation)
    Although these operators are sufficient to describe all regular languages, in practice everybody uses extensions:

    lecture 22

    Some Regular Expression Examples

    In a previous lecture we saw regular expressions, the preferred notation for specifying patterns of characters that define token categories. The best way to get a feel for regular expressions is to see examples. Note that regular expressions form the basis for pattern matching in many UNIX tools such as grep, awk, perl, etc.

    What is the regular expression for each of the different lexical items that appear in C programs? How does this compare with another, possibly simpler programming language such as BASIC?
    operators
        BASIC: the characters themselves
        C: operators that are also regular expression operators need to be marked with double quotes or backslashes to indicate you mean the character, not the regular expression operator. Note several operators have a common prefix. The lexical analyzer needs to look ahead to tell whether an = is an assignment, or is followed by another =, for example.
    reserved words
        BASIC: the concatenation of characters; case insensitive
        C: reserved words are also matched by the regular expression for identifiers, so a disambiguating rule is needed.
    identifiers
        BASIC: no _; $ at ends of some; 2 significant letters!?; case insensitive
        C: [a-zA-Z_][a-zA-Z_0-9]*
    numbers
        BASIC: ints and reals, starting with [0-9]+
        C: 0x[0-9a-fA-F]+ etc.
    comments
        BASIC: REM.*
        C: C's comments are tricky regexp's
    strings
        BASIC: almost ".*"; no escapes
        C: escaped quotes
    what else?

    lecture 23

    lex(1) and flex(1)

    These programs generally take a lexical specification given in a .l file and create a corresponding C language lexical analyzer in a file named lex.yy.c. The lexical analyzer is then linked with the rest of your compiler.

    The C code generated by lex has the following public interface. Note the use of global variables instead of parameters, and the use of the prefix yy to distinguish scanner names from your program names. This prefix is also used in the YACC parser generator.

    FILE *yyin;	/* set this variable prior to calling yylex() */
    int yylex();	/* call this function once for each token */
    char yytext[];	/* yylex() writes the token's lexeme to an array */
                    /* note: with flex, I believe extern declarations must read
                       extern char *yytext;
                     */
    int yywrap();   /* called by lex when it hits end-of-file; see below */
    

    The .l file format consists of a mixture of lex syntax and C code fragments. The percent sign (%) is used to signify lex elements. The whole file is divided into three sections separated by %%:

       header
    %%
       body
    %%
       helper functions
    

    Lex/Flex Powerpoint

    What is a "lexical attribute"?

    A lexical attribute is a piece of information about a token. These typically include:
    category
    an integer code used to check syntax
    lexeme
    actual string contents of the token
    line, column, file
    where the lexeme occurs in source code
    value
    for literals, the binary data they represent

    Flex Header Section

    The header consists of C code fragments enclosed in %{ and %} as well as macro definitions consisting of a name and a regular expression denoted by that name. lex macros are invoked explicitly by enclosing the macro name in curly braces. Following are some example lex macros.
    letter		[a-zA-Z]
    digit		[0-9]
    ident		{letter}({letter}|{digit})*
    

    The body consists of a sequence of regular expressions for different token categories and other lexical entities. Each regular expression can have a C code fragment enclosed in curly braces that executes when that regular expression is matched. For most of the regular expressions this code fragment (also called a semantic action) consists of returning an integer that identifies the token category to the rest of the compiler, particularly for use by the parser to check syntax. Some typical regular expressions and semantic actions might include:

    " "		{ /* no-op, discard whitespace */ }
    {ident}		{ return IDENTIFIER; }
    "*"		{ return ASTERISK; }
    "."		{ return PERIOD; }
    
    You also need regular expressions for lexical errors such as unterminated character constants, or illegal characters.

    The helper functions in a lex file typically compute lexical attributes, such as the actual integer or string values denoted by literals. One helper function you have to write is yywrap(), which is called when lex hits end of file. If you just want lex to quit, have yywrap() return 1. If your yywrap() switches yyin to a different file and you want lex to continue processing, have yywrap() return 0. The lex or flex library (-ll or -lfl) has a default yywrap() function that returns 1, and flex has the directive %option noyywrap which allows you to skip writing this function.

    lecture 24

    A Short Comment on Lexing C Reals

    C float and double constants have to have at least one digit, either before or after the required decimal. This is a pain:
    ([0-9]+"."[0-9]* | [0-9]*"."[0-9]+) ...
    
    You may be happier with something like:
    ([0-9]*"."[0-9]*)    { return (strcmp(yytext,".")) ? REAL : PERIOD; }
    

    or
    ([0-9]*"."[0-9]*)    { return (strlen(yytext)>1) ? REAL : PERIOD; }
    

    You-all know and love C/C++'s ternary e1 ? e2 : e3 operator, don't ya? It's an if-then-else expression, very slick. Since flex allows more than one regular expression to match, and breaks ties by using the regular expression that appears first in the specification, perhaps the following is best:

    "."                { return PERIOD; }
    ([0-9]*"."[0-9]*)  { return REAL; }
    
    This is still not complete.
    After you add in optional "e" scientific exponent notation, what should it look like?
    If present, it is an E followed by an integer with an optional minus sign.
    Remember that there are optional suffixes F and L.
    E, F, and L are case insensitive (either upper or lower case) in real constants if present.
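    Putting the whole pattern into words, a hand-written checker in C might look like the following. This is only a sketch of the rules as stated above (decimal point required, exponent with an optional minus sign, optional case-insensitive F or L suffix); it is not what flex generates, and real C constants additionally allow a + in the exponent and an exponent with no decimal point:

```c
#include <ctype.h>

/* Recognize a real constant: digits with a required decimal point
   (at least one digit total), optional case-insensitive exponent
   E followed by an optionally negative integer, and an optional
   case-insensitive F or L suffix.  Hand-rolled sketch only. */
int is_c_real(const char *s)
{
    int digits = 0;
    while (isdigit((unsigned char)*s)) { s++; digits++; }
    if (*s != '.') return 0;          /* this sketch requires the '.' */
    s++;
    while (isdigit((unsigned char)*s)) { s++; digits++; }
    if (digits == 0) return 0;        /* "." alone is PERIOD, not REAL */
    if (*s == 'e' || *s == 'E') {     /* optional exponent */
        s++;
        if (*s == '-') s++;
        if (!isdigit((unsigned char)*s)) return 0;
        while (isdigit((unsigned char)*s)) s++;
    }
    if (*s == 'f' || *s == 'F' || *s == 'l' || *s == 'L') s++;
    return *s == '\0';
}
```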

    Lex extended regular expressions

    Lex further extends the regular expressions with several helpful operators. Lex's regular expressions include:
    c
    normal characters mean themselves
    \c
    backslash escapes remove the meaning from most operator characters. Inside character sets and quotes, backslash performs C-style escapes.
    "s"
    Double quotes mean to match the C string given as itself. This is particularly useful for multi-byte operators and may be more readable than using backslash multiple times.
    [s]
    This character set operator matches any one character among those in s.
    [^s]
    A negated-set matches any one character not among those in s.
    .
    The dot operator matches any one character except newline: [^\n]
    r*
    match r 0 or more times.
    r+
    match r 1 or more times.
    r?
    match r 0 or 1 time.
    r{m,n}
    match r between m and n times.
    r1r2
    concatenation. match r1 followed by r2
    r1|r2
    alternation. match r1 or r2
    (r)
    parentheses specify precedence but do not match anything
    r1/r2
    lookahead. match r1 when r2 follows, without consuming r2
    ^r
    match r only when it occurs at the beginning of a line
    r$
    match r only when it occurs at the end of a line

    Flex Manpage Examplefest

    To read a UNIX "man page", or manual page, you type "man command" where command is the UNIX program or library function you need information on. Read the man page for man to learn more advanced uses ("man man").

    It turns out the flex man page is intended to be pretty complete, enough so that we can draw our examples from it. Perhaps what you should figure out from these examples is that flex is actually... flexible. The first several examples use flex as a filter from standard input to standard output.

    Warning: Flex can be Arbitrary and Capricious!

    Perhaps because of a desire for brevity, the lex family of tools makes the same fatal, idiotic mistake as Python and FORTRAN: using whitespace as a significant part of the syntax! Consider when %{ and %} are needed in test1.l, test2.l, test3.l

    Toy compiler example

      /* scanner for a toy Pascal-like language */
    
      %{
      /* need this for the call to atof() below */
      #include <math.h>
      %}
    
      DIGIT    [0-9]
      ID       [a-z][a-z0-9]*
    
      %%
    
      {DIGIT}+    {
         printf("An integer: %s (%d)\n", yytext,
                atoi( yytext ) );
         }
    
      {DIGIT}+"."{DIGIT}*        {
         printf( "A float: %s (%g)\n", yytext,
         atof( yytext ) );
         }
    
      if|then|begin|end|procedure|function        {
         printf( "A keyword: %s\n", yytext );
         }
    
      {ID}        printf( "An identifier: %s\n", yytext );
    
      "+"|"-"|"*"|"/"   printf( "An operator: %s\n", yytext );
    
      "{"[^}\n]*"}"     /* eat up one-line comments */
    
      [ \t\n]+          /* eat up whitespace */
    
      .           printf( "Unrecognized character: %s\n", yytext );
    
      %%
    
      int main(int argc, char **argv )
      {
         ++argv, --argc;  /* skip over program name */
         if ( argc > 0 )
            yyin = fopen( argv[0], "r" );
         else
            yyin = stdin;
    
         yylex();
         return 0;
      }
    

    lecture 25

    HW6 status / discussion

    A large number of HW#6's did not run for the TA. If yours didn't run, please fix and resubmit or see Dr. J to get a clearer understanding of the problem.

    yyin

    Consider how yyin is used in the preceding toy compiler example, if you have not already done so.

    Warning: Lex and Flex Are Idiosyncratic!

    Examples of past student consultations:

    Doctor J, my program is sick:
    ...
    IDENT	[a-zA-Z_]+		/* this is an ident */
    ...
    
    C comments are allowed some places in Lex/Flex, but I guess not all. This one causes a cryptic error message where the macro is used.
    Doctor J, my program won't do the regular expression I wrote:
    ...
    [ \t\n]+		{ /* skip whitespace*/ }
    ...
    ^[ ]*[a-zA-Z_]+		{ return IDENT; }
    ...
    
    If the newline and whitespace are consumed by one big grab, the newline won't still be sitting around in the input buffer to match against ^ in this ident rule.

    Point: a language can be declarative, but if it is cryptic and/or gives poor error diagnostics, much of the claimed benefits of declarative paradigm are lost.

    Matching C-style Comments

    Will the following work for matching C comments? A student e-mail proposed:
    [ \t]*"/*".*"*/"[ \t]*\n
    
    What parts of this are good? Are there any flaws that you can identify?

    The use of square-bracket character sets in Flex

    A student once sent me an example regular expression for comments that read:
       COMMENT [/*][[^*/]*[*]*]]*[*/]
    
    This is actually trying to be much smarter than the previous example. One problem here is that square brackets are not parentheses: they do not nest, and they do not support concatenation or other regular expression operators. They mean exactly "match any one of these characters," or with ^, "match any one character that is not one of these characters." Note also that you can't use ^ as a "not" operator outside of square brackets: you can't write the expression for "stuff that isn't */" by saying (^ "*/").

    Does your assignment this semester need to detect anything similar to C style comments? If so, you should find or invent a working regular expression that is better than the "easy, wrong" one. Many different solutions are available around the Internet and in books on lex and yacc, but let's see what we can do. On a midterm exam, I am likely to ask you not for this regular expression, but for a regular expression that matches some pattern of comparable complexity.

    Danger Will Robinson:

    /\* ... \*/
    
    legal in classic regular expressions, but not in Flex, which uses / as a lookahead operator! Feel free to try
    \/\* ... \*\/
    

    But I prefer double-quoting over all those slashes. A famous non-solution:

    "/*".*"*/"
    
    and another, pathologically bad attempt:
    "/*"(.|"\n")*"*/"
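    POSIX regex matching is leftmost-longest, like flex's, so we can demonstrate from C why the famous non-solution overshoots, and check one well-known correct pattern from the lex literature. The helper comment_match_len() is made up for this demo:

```c
#include <regex.h>

/* Compile a POSIX extended regular expression and return the
   length of the first match in s, or -1 on no match.  Because
   matching is greedy/longest-match, ".*" between comment
   delimiters runs from the first opener to the LAST closer. */
long comment_match_len(const char *pattern, const char *s)
{
    regex_t re;
    regmatch_t m;
    long len = -1;
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    if (regexec(&re, s, 1, &m, 0) == 0)
        len = m.rm_eo - m.rm_so;
    regfree(&re);
    return len;
}
```

    On the input /*a*/ x = 1; /*b*/ the non-solution /\*.*\*/ matches the entire 18-character string, while the published pattern /\*([^*]|\*+[^*/])*\*+/ stops after the first comment (5 characters).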
    

    Flex End-of-file semantics

    yylex() returns integers. From the Flex manual, it returns 0 at end of file. HW#1 NOTE: originally the HW#1 spec said to return -1 on end of file. To do that, you would write a regular expression like
    <<EOF>>		{ return -1; }
    
    This would be compatible with C language tradition of using -1 to indicate EOF in functions such as fgetc(). However, I changed the main.c spec to say it would continue to ask for words/tokens as long as it is getting positive values returned, and it will not matter whether your yylex() function returns 0 or -1 to indicate end of file. Still, you should know about this EOF thing in case I make you do multiple files (and use yywrap()) later on.

    Lexical Attributes and Token Objects

    Besides the token's category, an integer returned by yylex(), the rest of a compiler or interpreter may need several pieces of information about a token in order to perform semantic analysis, code generation, and error handling. These are stored in an object instance of class Token, or in C, a struct. The fields are generally something like:
    struct token {
       int category;
       char *text;
       int   linenumber;
       char *filename;
       union literal value;
    };
    
    The union literal will hold computed values of integers, real numbers, and strings.

    Flex "States" (Start Conditions)

    Section 10 of the Flex Manual discusses start conditions, which allow you to specify a set of states and apply different regular expressions in those different states. State names are declared in the header section on lines beginning with %s or %x. %s states will also allow generic regular expressions while in that state. %x states will only fire regular expressions that are explicitly designated as being for that state.

    There is effectively an implicit global variable that remembers what state you are in. That variable is set using a macro named BEGIN() in the C code body, in response to seeing some regular expression that you want to mark the start of a state.

    ALL your regular expressions in the main section may optionally specify via <sc> what start condition(s) they belong to.

    lecture 26

    Scroll backwards a bit and review Start Conditions

    At least in Fall 2013 semester: if I haven't given you any really compelling examples of Start Conditions, and you haven't needed them for your homework, I am not going to put them on an examination.

    Chomsky Hierarchy

    Lexical Structure of Languages

    A vast majority of languages can be studied lexically and found to have the following kinds of token categories:

    In addition, almost all languages will have separators/whitespace that occur between tokens, and comments.

    As you may have seen from homeworks 1-2, regular expressions can't always handle real world lexical specifications. FORTRAN, for example, has lexical challenges such as having no reserved words. Consider the line

    DO 99 I = 1.10
    
    FORTRAN doesn't use spaces as separators. The keyword DO isn't a keyword, unless you change the period to a comma, in which case we can't be doing an assignment to a variable named "DO99I" any more...

    How many of you used "states" (a.k.a. "start conditions")? What online resources for flex have you found? Googling "lex manual" or "flex manual" gives great results.

    Syntax Analysis

    Lexical analysis was about what words occur in a given language. Syntax analysis is about how words combine. In natural language this would be about "phrases" and "sentences"; in a programming language it is how to express meaningful computations. If you could make up any three improvements to C++ syntax, what would they be? Some syntax is a lot more powerful or more readable for humans than others, so syntax design actually matters. And some syntax is a lot harder for the machine to parse. The next language (Bison/YACC) is all about syntax analysis. But first, some broader thoughts.

    Some Comments on Language Design

    Language Design Criteria

    "(programming) language design is compiler construction" - Wirth

    Syntax design considerations

    Context Free Grammars

    A context free grammar G has a set of terminal symbols (the alphabet), a set of nonterminal symbols, a designated start symbol, and a set of production rules. A context free grammar can be used to generate strings in the corresponding language as follows:
    let X = the start symbol s
    while there is some nonterminal Y in X do
       apply any one production rule using Y, e.g. Y -> ω
    
    When X consists only of terminal symbols, it is a string of the language denoted by the grammar. Each iteration of the loop is a derivation step. If an iteration has several nonterminals to choose from at some point, the rules of derivation would allow any of these to be applied. In practice, parsing algorithms tend to always choose the leftmost nonterminal, or the rightmost nonterminal, resulting in strings that are leftmost derivations or rightmost derivations.
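    The generation loop above can be sketched in C for a tiny hypothetical grammar S -> a S b | a b, always rewriting the leftmost nonterminal (the grammar and the function name derive are invented for this example):

```c
#include <stdio.h>
#include <string.h>

/* Leftmost derivation for the toy grammar S -> a S b | a b.
   Apply the recursive rule n-1 times, then the base rule:
   find the leftmost nonterminal, apply one production, and
   repeat until no nonterminals remain. */
void derive(char buf[64], int n)
{
    strcpy(buf, "S");                        /* X = the start symbol */
    char *p;
    while ((p = strchr(buf, 'S')) != NULL) { /* leftmost nonterminal */
        char tail[64];
        strcpy(tail, p + 1);                 /* save what follows S  */
        const char *rhs = (n-- > 1) ? "aSb" : "ab";  /* pick a rule  */
        sprintf(p, "%s%s", rhs, tail);       /* apply S -> rhs       */
    }
}
```

    Each pass through the loop is one derivation step; when the buffer contains only terminals, it is a string of the language a^n b^n.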

    lecture 27

    Context Free Grammar Examples

    OK, so how much of the C language grammar can we come up with in class today? Start with expressions, work on up to statements, and work there up to entire functions, and programs.

    YACC

    YACC ("yet another compiler compiler") is a popular tool which originated at AT&T Bell Labs. YACC takes a context free grammar as input, and generates a parser as output. Several independent, compatible implementations (AT&T yacc, Berkeley yacc, GNU Bison) for C exist, as well as many implementations for other popular languages.

    YACC files end in .y and take the form

    declarations
    %%
    grammar
    %%
    subroutines
    
    The declarations section defines the terminal symbols (tokens) and nonterminal symbols. The most useful declarations are:
    %token a
    declares terminal symbol a; YACC can generate a set of #defines that map these symbols onto integers, in a y.tab.h file. Note: don't #include your y.tab.h file from your grammar .y file; YACC generates the same definitions and declarations directly in the .c file, and including the .tab.h file will cause duplication errors.
    %start A
    specifies the start symbol for the grammar (defaults to nonterminal on left side of the first production rule).

    The grammar gives the production rules, interspersed with program code fragments called semantic actions that let the programmer do what's desired when the grammar productions are reduced. They follow the syntax

    A : body ;
    
    Where body is a sequence of 0 or more terminals, nonterminals, or semantic actions (code, in curly braces) separated by spaces. As a notational convenience, multiple production rules may be grouped together using the vertical bar (|).

    rttgram.y example

    A Little Peek Behind Lex and Yacc Magic

    Why? Because you should never trust a declarative language unless you trust its underlying math.

    lecture 28

    Ambiguity

    In normal English, ambiguity refers to a situation where the meaning is unclear, but in context free grammars, ambiguity refers to an unfortunate property of some grammars that there is more than one way to derive some input, starting from the start symbol. Often it is necessary or desirable to modify the grammar rules to eliminate the ambiguity.

    The simplest possible ambiguous CFG:

    S -> x
    S -> x
    
    Maybe you wouldn't write that, but it is pretty easy to do it accidentally:
    S -> A | B
    A -> w | x
    B -> x | y
    
    In this grammar, if the input is "x", the grammar says it is legal. But what is it, an A or a B?

    Conflicts in Shift-Reduce Parsing

    "Conflicts" occur when an ambiguity in the grammar creates a situation where the parser does not know which step to perform at a given point during parsing. There are two kinds of conflicts that occur.
    shift-reduce
    a shift reduce conflict occurs when the grammar indicates that different successful parses might occur with either a shift or a reduce at a given point during parsing. The vast majority of situations where this conflict occurs can be correctly resolved by shifting.
    reduce-reduce
    a reduce reduce conflict occurs when the parser has two or more handles at the same time on the top of the stack. Whatever choice the parser makes is just as likely to be wrong as not. In this case it is usually best to rewrite the grammar to eliminate the conflict, possibly by factoring.
    Example shift reduce conflict:
    S->if E then S
    S->if E then S else S
    

    Consider the sample input

    if E then if E then S1 else S2
    
    In many languages, nested "if" statements produce a situation where an "else" clause could legally belong to either "if". The usual rule attaches the else to the nearest (i.e. inner) if statement. This corresponds to choosing to shift the "else" on as part of the current (inner) if-statement being parsed, instead of finishing up that "if" with a reduce, and using the else for the earlier if which was unfinished and saved previously on the stack.

    Example reduce reduce conflict:

    (1)	S -> id LP plist RP
    (2)	S -> E GETS E
    (3)	plist -> plist, p
    (4)	plist -> p
    (5)	p -> id
    (6)	E -> id LP elist RP
    (7)	E -> id
    (8)	elist -> elist, E
    (9)	elist -> E
    
    By the time the stack holds ...id LP id
    the parser will not know which rule to use to reduce the id: (5) or (7).

    YACC error handling and recovery

    Improving YACC's Error Reporting

    yyerror(s) overrides the default error message, which usually just says either "syntax error" or "parse error", or "stack overflow".

    You can easily add information in your own yyerror() function, for example GCC emits messages that look like:

    goof.c:1: parse error before '}' token
    
    using a yyerror function that looks like
    void yyerror(char *s)
    {
       fprintf(stderr, "%s:%d: %s before '%s' token\n",
    	   yyfilename, yylineno, s, yytext);
    }
    

    Yacc/Bison syntax error reporting, cont'd

    You could instead use the error recovery mechanism to produce better messages. For example
    lbrace : LBRACE | { error_code=MISSING_LBRACE; } error ;
    
    where LBRACE is the token for an expected { character.
    This uses a global variable error_code to pass parse information to yyerror().

    Another related option is to call yyerror() explicitly with a better message string, and tell the parser to recover explicitly:

    package_declaration: PACKAGE_TK error
    	{ yyerror("Missing name"); yyerrok; } ;
    

    But, using error recovery to perform better error reporting runs against conventional wisdom that you should use error tokens very sparingly. What information from the parser determined we had an error in the first place? Can we use that information to produce a better error message?

    lecture 29

    Getting Lex and Yacc to Talk

    The main way that Lex and YACC communicate is by the parser calling yylex() once for each terminal symbol in the input sequence. The terminal symbol is indicated by the integer function return values returned by yylex().

    An extended example of this functioning can be built by expanding the earlier Toy compiler example Flex file for a subset of Pascal so that it talks to a similar toy Bison grammar.

    Getting Lex and Yacc to Talk ... More

    In addition, YACC uses a global variable named yylval, of type YYSTYPE, to collect lexical information from the scanner. Whatever is in this variable each time yylex() returns to the parser is copied over onto the top of a parser data structure called the "value stack" when the token is shifted onto the parse stack.

    The YACC Value Stack

    yacc/bison: The Calc Demo

    The first of these files includes a full handwritten yylex() in C, which the second file would replace via flex. A "token" must be returned for a newline character if one wishes the calculator to calculate at that point.

    lecture 30

    Here is another "calc" example, from [Louden]:

    E : E '+' T | E '-' T | T ;
    T : T '*' G | T '/' G | G ;
    G : F '^' G | F ;
    F : N | '(' E ')' ;
    N : D N | D ;
    D : '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9' ;
    

    Question: how would you extend this grammar to handle exponentiation (using '^')? How would you encode its right-associativity? Question: how would you modify this grammar to compute the values using the value stack? In particular, how would you compute $$ values for nonterminals D and N?

    Note: C version of yylex() around 32 lines; Flex version 17 lines.

    Using the Value Stack for More Than Just Integers

    You can either declare that struct token may appear in the %union, and put a mixture of struct node and struct token on the value stack, or you can allocate a "leaf" tree node, and point it at your struct token. Or you can use a tree type that allows tokens to include their lexical information directly in the tree nodes. If you have more than one %union type possible, be prepared to see type conflicts and to declare the types of all your nonterminals.

    Getting all this straight takes some time; you can plan on it. Your best bet is to draw pictures of how you want the trees to look, and then make the code match the pictures. No pictures == "Dr. J will ask to see your pictures and not be able to help if you can't describe your trees."

    Declaring value stack types for terminal and nonterminal symbols

    Unless you are going to use the default (integer) value stack, you will have to declare the types of the elements on the value stack. Actually, you do this by declaring which union member is to be used for each terminal and nonterminal in the grammar.

    Example: in a .y file we could add a %union declaration to the header section with a union member named treenode:

    %union {
      nodeptr treenode;
    }
    
    This will produce a compile error if you haven't declared a nodeptr type using a typedef, but that is another story. To declare that a nonterminal uses this union member, write something like:
    %type < treenode > function_definition
    
    Terminal symbols use %token to perform the corresponding declaration. If you had a second %union member (say struct token *tokenptr) you might write:
    %token < tokenptr > SEMICOL
    

    Mailbag

    What should main() look like?
    Probably it should live in its own .c file, such as main.c or myapp.c. It should #include include files or declare externs or provide prototypes to access any item from the generated lex or yacc .c files. It may need to include a global variable to hold the filename; most tools like this need to remember the filename they are working on in order to report errors.
    you said to modify our .l file as needed, but the assignment says it is a Bison assignment (a .y file), what up?
    Bison does not live in a vacuum, it is always used with C and/or Flex. Turn in everything needed to build your program.

    Hand-simulating an LR parser

    Suppose we simulate the "calc" parser on an example input. It uses the following algorithm. The details are sort of beyond the scope of this class; what you are supposed to get out of this is some intuition.
    ip = first symbol of input
    repeat {
       s = state on top of parse stack
       a = *ip
       case action[s,a] of {
          SHIFT s': { push(a); push(s') }
          REDUCE A -> β: {
             pop 2*|β| symbols; s' = new state on top
             push A
             push goto(s', A)
             }
          ACCEPT: return 0 /* success */
          ERROR: { error("syntax error", s, a); halt }
          }
       }
    

    LR Parsing Cliffhanger.

    OK, here comes some sample input! The grammar is:
    E : E '+' T | E '-' T | T ;
    T : T '*' G | T '/' G | G ;
    G : F '^' G | F ;
    F : NUM | '(' E ')' ;
    
    What we are really missing in order to actually simulate a shift-reduce parse of this are the parse tables and how they are calculated -- this is covered thoroughly in a number of compiler writing textbooks. By the way LR parsing (the magic that YACC does) is not the only or most human-friendly of parsing methods.

    discussion of parsing "(213*11^5)-8"

    One thing left implicit in the previous lecture was that lexical analysis and parsing are usually interleaved: it is not as if the whole array of tokens is constructed before parsing begins. Rather, yyparse() calls yylex() once every time it shifts, and lexical analysis is performed gradually. This mixes CPU operations and I/O operations in an attractive balance, although in practice the I/O has to be heavily buffered to get good performance. You can at least figure that you are starting with an array of characters.

    Now, let's see that parse again. The array of char looks like:
    (213*11^5)-8
    The parse stack is empty, yyparse() calls yylex() to read the first token

    Parse stack (top on left)    current token    remaining input
    (empty)                      '('              213*11^5)-8

    Shift or reduce ? -- shift. Note that you could reduce, even in this empty stack case, if the grammar had a production rule where there was some optional thing at the start.

    Parse stack (top on left)    current token    remaining input
    '('                          NUM(213)         *11^5)-8

    Shift or reduce ? -- shift. Can't reduce '('.

    Parse stack        current token    remaining input
    '(' NUM(213)       '*'              11^5)-8

    Shift or reduce ?? Before we can shift a '*' onto the stack, we have to have a T. We don't have one, so we have to reduce. What can we reduce? We can reduce NUM to an F.

    Parse stack        current token    remaining input
    '(' F              '*'              11^5)-8

    Shift or reduce ?? We still have to have a T and don't, so reduce again: F reduces to a G (the lookahead is '*', not '^'), and then G reduces to a T; the trace compresses these into one step.

    Parse stack        current token    remaining input
    '(' T              '*'              11^5)-8

    Shift or reduce ?? Shift the '*'

    Parse stack        current token    remaining input
    '(' T '*'          NUM(11)          ^5)-8

    Shift or reduce ??

    lecture 31

    Comments from Student Office-Hour Visits

    Debugging a Bison Program

    The power of lex and yacc (flex and bison) is that they are declarative: you don't have to supply the algorithm by which they work, you can treat it as if it is magic. Good luck debugging magic. Good luck using gdb to try and step through the generated parser. If "bison --verbose" generates enough information for you to debug your problem, great. If not, your best hope is to go into the .tab.c file that Bison generates, compile with YYDEBUG defined, and assign yydebug=1 at runtime. If you do, you will get a runtime trace of the shifts and the reduces. Between that and a trace of every token returned by yylex(), you can figure out what is going on, or get help with it.

    An Inconvenient Truth about YACC and Bison

    Did we mention that the parsing algorithm used by YACC and Bison (LALR) can only handle a subset of all legal context free grammars?

    lecture 32

    Extended Discussion of Parse Trees and Tree Traversals

    Semantics

    Semantics, as you may recall, is the study of what something means.

    Attributes

    It is tempting to use the heavily-overloaded term attributes when talking about semantic properties that a compiler or interpreter would know about a name in order to apply its meaning in terms of code. When we talk about lexical analysis we have lexical attributes, when we talk about syntax we have syntactic attributes (which can build on or make use of lexical attributes), and when we talk about semantics, we have semantic attributes (which can build on or make use of lexical and syntactic attributes). Cheesy example:
    double f(int n)
    {
       ...
    }
    
    In order for any code elsewhere in the program to use f correctly, it had better know what attributes? So for example, if the input included somewhere later in the program
        x = f('\007');
    

    Environment and State

    Environment maps source code names onto storage addresses (at compile time), while state maps storage addresses onto values (at runtime). Environment relies on binding rules and is used in code generation; state operations are loads/stores into memory, as well as allocations and deallocations. Environment is concerned with scope rules, state is concerned with things like the lifetimes of variables.

    name --(scope)--> declaration --(binding)--> address --(state)--> value
         \_________________(environment)______________/

    Scopes and Bindings

    Variables may be declared explicitly or implicitly in some languages

    Scope rules for each language determine how to go from names to declarations.

    Each use of a variable name must be associated with a declaration. This is generally done via a symbol table. In most compiled languages it happens at compile time, but interpreters will build and maintain a symbol table while the program runs.

    A few comments about Nested Blocks

    Louden gives examples of languages' variations in how they do nesting of blocks and variable declarations. Why do we care? Semantics has to map names to addresses, and it can be confusing, especially when the same name is "live" with different memory locations at the same time ... in different scopes.

    Runtime Memory Regions

    Operating systems vary in terms of how they organize program memory for runtime execution, but a typical scheme looks like this:

    code
    static data
    stack (grows down)
    heap (may grow up, from bottom of address space)

    The code section is usually read-only, and shared among multiple instances of a program. Dynamic loading may introduce multiple code regions, which may not be contiguous, and some of them may be shared by different programs. The static data area may consist of two sections, one for "initialized data" and one for uninitialized data (i.e., all zeros at the beginning). Some OS'es place the heap at the very end of the address space, with a big hole so either the stack or the heap may grow arbitrarily large. Other OS'es fix the stack size and place the heap above the stack and grow it down.

    Much CPU architecture has included sophisticated support for making the stack as fast as possible and, more generally, for making repeated and sequential memory accesses as fast as possible. This ideally fits C and Pascal (i.e., traditional "structured" imperative programming) and performs pathologically poorly on Lisp (functional) and OOP languages that exhibit poor locality of reference, exaggerating the already extreme speed differences between medium-level languages and very high level languages. Hardware that eschews caches in favor of "more cores" is not as biased.

    Symbol Tables

    Symbol tables are used to resolve names within name spaces. Symbol tables are generally organized hierarchically according to the scope rules of the language. Although initially concerned with simply storing the names of the variables that are visible in each scope, symbol tables take on additional roles in the remaining phases of the compiler. In semantic analysis, they store type information. And for code generation, they store memory addresses and sizes of variables.

    mktable(parent)
    creates a new symbol table, whose scope is local to (or inside) parent
    enter(table, symbolname, type, offset)
    insert a symbol into a table
    lookup(table, symbolname)
    lookup a symbol in a table; returns structure pointer including type and offset. lookup operations are often chained together progressively from most local scope on out to global scope.
    addwidth(table)
    sums the widths of all entries in the table. ("widths" = #bytes, sum of widths = #bytes needed for an "activation record" or "global data section"). Worry not about this method until you wish to implement code generation.
    enterproc(table, name, newtable)
    enters the local scope of the named procedure
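The operations above can be sketched as a small Java class. Everything here -- the class name, the Entry fields, the demo values -- is an illustrative assumption, not the interface of any particular compiler:

```java
import java.util.HashMap;
import java.util.Map;

// A minimal hierarchical symbol table following the operations above.
// Names and fields here are hypothetical, not a real compiler's API.
public class SymTable {
    static class Entry {
        String type; int offset;
        Entry(String t, int o) { type = t; offset = o; }
    }

    final SymTable parent;                          // enclosing scope; null for global
    final Map<String, Entry> entries = new HashMap<>();

    // mktable(parent): create a new scope nested inside parent
    public SymTable(SymTable parent) { this.parent = parent; }

    // enter(table, symbolname, type, offset)
    public void enter(String name, String type, int offset) {
        entries.put(name, new Entry(type, offset));
    }

    // lookup(table, symbolname): search this scope, then chain outward to global
    public Entry lookup(String name) {
        for (SymTable t = this; t != null; t = t.parent) {
            Entry e = t.entries.get(name);
            if (e != null) return e;
        }
        return null;                                // undeclared
    }

    public static void main(String[] args) {
        SymTable global = new SymTable(null);       // mktable(null)
        global.enter("x", "int", 0);
        SymTable local = new SymTable(global);      // mktable(global): inner scope
        local.enter("y", "double", 0);
        System.out.println(local.lookup("x").type); // found by chained lookup
        System.out.println(local.lookup("z"));      // null: undeclared
    }
}
```

Note how lookup() implements the "chained together progressively from most local scope on out to global scope" behavior described above.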

    Variable Reference Analysis

    The simplest use of a symbol table would check:
    1. that every name that is used has a declaration visible in some enclosing scope, and
    2. that no name is declared more than once in the same scope.

    Allocation and Variable Lifetimes

    Activation Records

    Activation records organize the stack, one record per method/function call.
    return value
    parameter
    ...
    parameter
    previous frame pointer (FP)
    saved registers
    ...
    FP-->saved PC
    local
    ...
    local
    temporaries
    SP-->...
    At any given instant, the live activation records form a chain and follow a stack discipline. Over the lifetime of the program, this information (if saved) would form a gigantic tree. If you remember prior execution up to the current point, you have a big tree whose rightmost edge is the chain of live activation records, while the non-rightmost tree nodes are an execution history of prior calls.

    Aliasing, and Dangling References

    How many kinds of aliasing can occur?

    "Modern" Runtime Systems

    The preceding discussion has been mainly about traditional languages such as C. Object-oriented programs might be much the same, only every activation record has an associated object instance; they need one extra "register" in the activation record. In practice, modern OO runtime systems have many more differences than this, and other more exotic language features imply substantial differences in runtime systems. Here are a few examples of features found in runtimes such as the Java Virtual Machine and .Net CLR.

    Supplemental Comments on Imperative Programming

    Imperative programming is programming a computer by means of explicit instructions. Assembler language uses imperative programming, as do C, C++, and most other popular languages.

    One way to think of imperative programming is that it is any programming in which the programmer determines the control flow of execution. This might be using goto's or loops and conditionals or function calls. It contrasts with declarative programming, where the programmer specifies what the program ought to do, but does not determine the control flow.

    Def: a program is structured if the flow of control through the program is evident from the syntactic structure of the program text. "evident" means single-entry/single-exit.

    Common constructs in imperative programming include: assignment, sequencing (one statement after another), conditionals (if/case), loops (while/for), and procedure calls.

    Assertions, invariants, preconditions, and postconditions

    The problem with imperative programming is: you know you told the computer to do something, but how do you know that you told it to do what you want? In particular, people write code that behaves differently than they intend all the time. We reason about program correctness by inserting logical assertions into our code; these may be annotations or actual checks at runtime that verify expected conditions are true. Curly brackets {expr} are often used to enclose assertions, especially among former Pascal programmers; another common convention is assert(expr), a macro provided by C's <assert.h>.

    A precondition is an assertion before a statement executes, that defines the expected state. It defines requirements that must be true in order for the statement to do what it intends. A postcondition is an assertion after a statement executes that describes what the statement has caused to become true. An invariant is an assertion of things that do not change during the execution of a statement. An invariant is particularly useful with loop statements.

    while x >= y do
       { x >= y if we get here }
       x := x - y
    
    Suppose {x >= 0 and y > 0} is true. Then we can further say { x >= y > 0 } inside the loop. After the assignment, a different assertion holds:
    { x >= 0 and y > 0}
    while  x >= y do
       { y > 0 and x >= y }
       x := x - y
       { x >= 0 and y > 0 }
    
    While these kinds of assertions can allow you to prove certain things about program behavior, they only allow you to prove that behavior corresponds to requirements if the requirements are defined in terms of formal logic. There is a certain difficulty in scaling this approach up to real-world software systems and requirements, but there is certainly a great need for every technique that helps programmers write correct programs.
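The annotated loop above translates directly into runtime checks. Here is a sketch in Java (whose assert statement plays the role of C's assert() macro); the class and method names are made up for illustration, and Java assertions must be enabled with the -ea flag:

```java
public class LoopInvariant {
    // Repeated subtraction computes x mod y. The assert statements encode
    // the precondition, invariant, and postconditions from the notes.
    // Run with: java -ea LoopInvariant  (assertions are off by default).
    static int remainder(int x, int y) {
        assert x >= 0 && y > 0 : "precondition";
        while (x >= y) {
            assert x >= y && y > 0 : "holds inside the loop body";
            x = x - y;
            assert x >= 0 && y > 0 : "holds after the assignment";
        }
        assert 0 <= x && x < y : "postcondition: x is the remainder";
        return x;
    }

    public static void main(String[] args) {
        System.out.println(remainder(17, 5));   // prints 2
    }
}
```

If any assertion were false at runtime, an AssertionError would pinpoint which logical claim about the loop was wrong.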

    Java

    One popular representative modern object-oriented language is Java.

    Reading Assignment

    Some Java Slides

    Compiling and Running Java Locally on Wormulon

    Add the following to your ~/.bashrc file. They specify the sizes of Java's heap memory region. By default Java asks for a size that fails on wormulon!
    alias java="java -Xmx100m -Xms10m"
    alias javac="javac -J-Xmx100m"
    
    These aliases should be placed in your ~/.profile or possibly ~/.bashrc file. You may have to "source" the file that you place them in for the current shell session to see those aliases, but subsequent logins should pick them up automatically, since shells autoload such commands.

    Once you have your aliases setup, compile with "javac hello.java" and run with "java hello"

    Example #0

    We looked at a hello.java that was specially tailored to remind you of features you would need for homework 1: random numbers from java.util and the command line arguments passed into main().

    Things to Learn About Java Today

    Java is an Almost-SmallTalk?

    A few languages (mainly SmallTalk) have chosen to be "pure OO", meaning that everything, down to basic integers and characters, is an object. Most languages don't go that far -- Java, for example, has built-in types like "int" and constructs like arrays, but then very quickly you are forced to use system classes, and encouraged to organize your own code with classes.

    So in Java, unlike C++, the question isn't whether you will use classes a lot. It is: how are you going to map your application domain onto a set of (built-in system, or new written-by-you) classes? For many problems this is a natural fit, but for other problems it is silly and awkward.

    When to OOP?

    When you use a language where OOP is optional, go OOP under two (2) circumstances:
    1. your application domain maps naturally onto a set of classes, or
    2. your problem is so large that you will have trouble wrapping your brain around the whole thing.
    In other words: OOP becomes more and more useful as your program size grows.

    Another Example of Bad OOP in Java

    HW#1 in Java
    Sure you can use Java to write recursive Lisp functions. But if your class is a set of unrelated functions that do not share state, it is pretty bad OOP.

    Java Concepts (and APIs) to Learn Today

    IO: the next steps

    Exception Basics

    JAR files

    The Java archive (JAR) file format bundles multiple files (usually .class files) into a single archive. JARs are really ZIP files, but the jar command-line program uses commands similar to the classic UNIX tar(1) command.

    Unlike C/C++, Java does not have a "linker" that resolves symbols at "link time" to produce an executable. Symbols are resolved at "load time" which is generally the first time that a class is needed/used, often during program startup/initialization. This can mean that Java programs are slower to start than native code executables, but it does provide a certain flexibility.

    Since Java does not have a linker, JAR files are the closest approximation that it has: a Jar archive can bundle a collection of .class files as one big file that can be run directly by the java VM (using the -jar option). To build a JAR that will run as a program, you specify the options "cfe", the name of which class' main() function to use at startup, and the set of class files:

    jar cfe foo.jar foo foo.class bar.class baz.class
    java -jar foo.jar
    
    The options cfe stand for "create" a "file" with an "entrypoint".

    Separate Compilation and Make

    You might have seen the world-famous and ultra-fabulous "make" tool already. If you already know it, awesome. In any case, "make" is an example of the declarative programming paradigm.

    Consider this example makefile:

    hello.jar: hello.class
    	jar cfe hello.jar hello hello.class
    
    run: hello.jar
    	java -jar hello.jar
    
    hello.class: hello.java
    	javac hello.java
    
    What it defines are build rules for building a set of files, and a dependency graph of files that combine to form a whole program.

    Enscript

    enscript(1) is a program that converts ASCII text files into PostScript. It has some basic options for readable formatting.
    enscript --color=1 -C -Ejava -1 -o hello.ps hello.java && ps2pdf hello.ps
    
    produces a PDF like this.

    CS 210 Java Example: Hamurabi

    This is a past semester's CS 210 homework assignment, to use Java to write the classic resource simulation program called Hammurabi, with local extensions described below.

    Hammurabi in a Nutshell

    Hammurabi, the Babylonian king, is a tyrant who wants to grow his population to the largest possible size in order to be the most powerful ruler on earth. In ancient mesopotamia there is a lot of fertile land due to the annual flooding, but there are no defendable borders and the only safety lies in numbers (of spears). To make more people, you have to grow more food, which means you have to plant more land, which takes more seed grain. And by the way, the harvest yield varies from year to year, ranging from 0 to enormous. But the more grain you store, the higher percentage of stored grain is lost each year (rats, corruption, whatever).

    Students were asked to modify an existing Java program to fill in the missing Java code to report on current population and grain and land holdings, and then ask Hamurabi each year:

    Hamurabi: the Java Code

    Sample code at http://www.roseindia.net/java/java-tips/oop/q-hammurabi/q-pr-hammurabi-1.shtml was given as a starting point; its open-source source files were locally copied at

    Required Addition

    As obtained from the internet, class Hammurabi is not object-oriented enough. It has no member variables or non-static methods. Students were asked to modify the simulation so that it supported computer-controlled enemy countries, and reported to the user on their progress each year.

    What to Learn About Java from the Hamurabi Code

    There is some substantially interesting code there. What Java can we learn from it?
    Code by delta (Δ refers to change)
    Whether you call it extension, modification, generalization, or filling in the blanks, lots of Java programs are written by modifying existing classes. Sometimes that means writing subclasses. How much inheritance have you done so far in your programming?
    Object creation and method invocation
    Have you gotten the basic OO syntax of Java yet? Is it any different from C++ so far? If so, how so?
    Wrapper Classes
    Java deals with its impurity by providing wrappers for non-class builtin types. Java programmers should know the basics of Integer, Double, Float, Short, Long, Character, Boolean, Void, and Byte. Start with the parse*() methods, e.g. Integer.parseInt(s)
    Did we say "No preprocessor"?
    Constant names get awkward:
    private final static int POUND_DEFINE_WAS_SO_COOL = 1;
    
    Getters and setters = lame-o-OO
    But I guess setters are the ones that really bug me. And I can live with them so long as they are controlled.
    Know how to (use) "swing"?
    javax.swing is a graphical user interface library. Most Java applications might be written using this class library, unless they are applets, or are written in JOGL or something like that.
    Graphical interface
    In order to run swing programs, you almost have to either install and run Java on a local computer, or run on Linux machines in the lab. It is possible to run swing and other graphic programs on wormulon, but only if you install an "X Window server" program on your local machine and have an SSH connection that does "X11 port forwarding". And that can be slow, especially if you are not on campus. Avoid using wormulon this way unless you have a good reason.
    Who/what is JOptionPane?
    Minimally you should know its showInputDialog() and showMessageDialog() methods.
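As a quick standalone illustration of the wrapper classes mentioned above (not code from the Hamurabi program), the parse*() static methods convert strings to primitives, and valueOf() gives you the boxed object:

```java
public class WrapperDemo {
    public static void main(String[] args) {
        // Each primitive type has a wrapper class with static parse*() methods
        int i = Integer.parseInt("42");
        double d = Double.parseDouble("3.5");
        boolean b = Boolean.parseBoolean("true");

        // valueOf() returns the wrapper object rather than the primitive
        Integer boxed = Integer.valueOf(i);

        System.out.println(i + d);            // prints 45.5
        System.out.println(b && boxed == 42); // prints true (auto-unboxing)
    }
}
```

Integer.parseInt() is exactly what you need to turn a command-line argument or a JOptionPane.showInputDialog() string into a number.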

    A Couple Items Gleaned from HW#1

    don't use an object instance to invoke a static method
    CLASS.mystaticmethod(), not instance.mystaticmethod()
    do use templated collection typenames in constructors (after "new")
    ArrayList<String> names = new ArrayList<String>();
    What else did you learn, or do you have questions about?

    HW#2

    An Exceptional Example: Using a Class to Make "Swing" Optional

    When I compiled and tried to run the hamurabi from roseindia.net on wormulon, I originally got:
    > java hamurabi
    Exception in thread "main" java.awt.HeadlessException: 
    No X11 DISPLAY variable was set, but this program performed an operation which requires it.
    ... long java runtime exception stack trace ...
    
    But back then with no X11, what was Dr. J to do? Options include:
    1. Rewrite the game code to just use the console, skip the GUI dialogs.
    2. Run locally, instead of running on the machine where we turn code in.
    3. Modify the game to ask whether a GUI is available, and use the console when no GUI will work.
    Option #3 has more options.
    1. Try and detect whether graphics are present, without using them, in order to avoid the exception in the example.
    2. Just go ahead and try to use graphics, and if they fail, handle the exception and enable the fallback.
    At first I checked if the DISPLAY environment variable was set; if it isn't, then we should use the console:
    if (System.getenv("DISPLAY") == null) // ... use console
    
    but that is not exactly portable -- on MS Windows no DISPLAY is needed. So a better solution is to use an exception handler to catch that fatal error we saw earlier, and revert to console IO:
    	use_swing = true;
    	try {
    	    JOptionPane.showMessageDialog(null,
    					  "Minister says we are swinging");
    	} catch (Exception e) {
    	    System.out.println("Minister says we are using the console.");
    	    use_swing = false;
    	}
    

    Using Exceptions in OO Design

    Last time we saw a try...catch statement that allows Java to gracefully recover from a runtime error and fall back to using the console when Swing is not available. Where to put this code?

    At this point, our object-oriented version of Hammurabi looks like the following picture:

    More About Inheritance

    OOP experts will tell you that there are different kinds of inheritance: abstract inheritance and concrete inheritance.
    abstract inheritance
    inheritance of a public interface, which is to say, a set of methods with matching/compatible signatures. Abstract inheritance is exactly that (sub)part of inheritance necessary for polymorphism to work.
    a signature
    is a function's prototype information: name, number and types of parameters, and return type
    concrete inheritance
    concrete inheritance consists of inheriting actual code.

    Interfaces

    Unlike C++, Java actually has an explicit construct for Interfaces. From the Java Tutorials we see:
    interface Bicycle {
        void changeCadence(int newValue);    //  wheel revolutions/minute
        void changeGear(int newValue);
        void speedUp(int increment);
        void applyBrakes(int decrement);
    }
    
    This contains no code. All it enables is that various classes can now be declared to implement the interface as follows:
    class ACMEBicycle implements Bicycle {
        // remainder of this class 
        // implemented as before
    }
    
    This lets you write code that takes parameters of type Bicycle. Such code will be inherently polymorphic, working with any class that implements the Bicycle interface.

    Concrete Inheritance

    Java has a limited, simple form of concrete inheritance. Suppose you have a nice generic bicycle class implemented:
    public class Bicycle {
        public int cadence, gear, speed;
        public Bicycle(int startCadence, int startSpeed, int startGear) {
            gear = startGear; cadence = startCadence; speed = startSpeed; }
        public void setCadence(int newValue) {  cadence = newValue; }
        public void setGear(int newValue)    {  gear = newValue;    }
        public void applyBrake(int decrement) { speed -= decrement; }
        public void speedUp(int increment)    { speed += increment; }
    }
    

    Thinking Object Orientedly

    Y'all have programmed in an object-oriented language such as C++ for awhile now; what does it mean to think object-orientedly?









    As a young computer scientist, I read and believed that object-orientation consisted of:

    encapsulation + polymorphism + inheritance
    Each of these terms is important to this course.
    encapsulation
    closely related to information hiding, this is the idea that access to a set of related data can be protected and controlled, so as to avoid bugs and ensure consistency between different bits of data. This concept has been mathematically expressed in the notion of an Abstract Data Type (ADT), which is a set of values and a set of rules (operations) for manipulating those values. In programming languages, it is provided by a class or module construct.
    polymorphism
    Literally meaning "many shapes" or more loosely "shape changing", this idea is that if you write an algorithm in terms of a set of abstract operations, that algorithm can work on different data types. It occurs in some languages as templates (C++), generics (Ada), interfaces (Java), by passing functions as parameters (C), or simply going with a flexible, dynamic type system (Lisp).
    inheritance
    By analogy to biological inheritance of traits or genes, inheritance is when you define a class in terms of an existing class.

    Encapsulation

    Write functions (a la functional programming) around collections of related data. By convention or language construct, hide/protect that (private) data behind a set of public interface functions.

    This is the single most important principle of OOP. It is more than just saying "class" a few times in each program. It is usually well-supported in any OO language. The potential abuse comes from the encumbrance of too much required syntax which distracts programmers from the actual problems they need to solve.

    Algorithms written to use an encapsulated object and access it only via its interface functions will not mind if you totally rewrite its innards to fix it, make it faster, etc.

    Polymorphism

    Algorithms written to use an encapsulated object and access it only via its interface functions will not mind if you totally substitute other types of objects, including unrelated objects that implement the same interface.

    Dynamic OOP languages usually support this well. Static OOP languages usually support polymorphism somewhat awkwardly, as in the case of C++ templates.

    Inheritance

    The major difference between OO languages and other languages with strong information hiding encapsulation is inheritance. Inheritance can mean: starting with generic code, and augmenting it gradually with special cases and extra details. There is abstract vs. concrete inheritance, and parent-centric vs. child-centric inheritance. There is multiple inheritance.

    The above concepts are important and useful. They are what object-oriented programming languages typically try to directly support. However, they do not tell the whole story, and programmers who stop there often write bad OO code.

    The best way to think object-orientedly is to think of the computer program as modeling some application domain. That model is the heart of the software design for any program you write, so thinking object-orientedly means thinking from a software engineering perspective: constructing the pieces the customer needs in order for the program to solve their problems.

    Java Inheritance Discussion, interrupted.

    For any number of customized, specialty bicycles, you might want to start by saying "they behave just like a regular bike, except ..." and then give some changes. In Java you declare such a subclass with the extends reserved word:
    public class MountainBike extends Bicycle {
        public int seatHeight; // subclass adds one field
        // overrides constructor, calls superclass constructor
        public MountainBike(int startHeight, int startCadence,
                            int startSpeed,  int startGear) {
            super(startCadence, startSpeed, startGear);
            seatHeight = startHeight;
        }   
        public void setHeight(int newValue) {    // subclass adds one method
            seatHeight = newValue;
        }   
    }
    

    Two ways to check whether your Bicycle b is a mountain bike

    1. MountainBike mb = (MountainBike)b;
       (a downcast: throws ClassCastException at runtime if b is not a MountainBike)
    2. if (b instanceof MountainBike) ...
       (tests the type without throwing)
    But note that usually if you were going to say:
    if (b instanceof MountainBike) b.doMountainyStuff()
    else if (b instanceof RacingBike) b.doRacingStuff()
    ...
    
    you'd be more object-oriented, and more efficient, to define a method doStuff and have each class override it, so you can just say
    b.doStuff()
    
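Here is a self-contained sketch of that refactoring. The class names (Bike, MtnBike, RoadBike) and the strings they return are hypothetical stand-ins, chosen so this compiles independently of the Bicycle classes above:

```java
// Replacing an instanceof chain with dynamic dispatch: each subclass
// overrides doStuff(), and the right version is chosen at runtime.
public class DispatchDemo {
    static class Bike {
        String doStuff() { return "just riding"; }
    }
    static class MtnBike extends Bike {
        @Override String doStuff() { return "doing mountainy stuff"; }
    }
    static class RoadBike extends Bike {
        @Override String doStuff() { return "doing racing stuff"; }
    }

    public static void main(String[] args) {
        Bike[] bikes = { new Bike(), new MtnBike(), new RoadBike() };
        for (Bike b : bikes)
            // no instanceof tests needed: dispatch picks the override
            System.out.println(b.doStuff());
    }
}
```

One virtual call replaces the whole if/else chain, and adding a new bike subclass requires no changes to the calling code.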

    OO Design Practice

    What class reorganization or addition is needed in order to meet your homework's requirements for (computer-controlled) players 2-4? Let us consider the design options.

    Java Trails Commentary

    Do the required online reading of the Trails Covering the Basics! Be sure you know about:
    JavaDoc
    Know what /** */ comments are for, and be able to give examples.
    JavaBeans
    This component technology seems to be famous or important. For what?
    applets
    What are applets, and how do I write one?
    NetBeans
    What is NetBeans good for?
    Java's byte vs. char types
    What is the difference? What's with those '\uffff'-style char literals?

    JavaDoc

    Who it is for: large scale software system builders.

    What it does: writes out a collection of webpages to help "navigate" your Java class libraries.

    Big success, inspired numerous copycats!!

    Writing Doc Comments [from Oracle documentation]

    A doc comment is written in HTML and must precede a class, field, constructor or method declaration. It is made up of two parts -- a description followed by block tags. In this example, the block tags are @param, @return, and @see.
    /**
     * Returns an Image object that can then be painted on the screen. 
     * The url argument must specify an absolute {@link URL}. The name
     * argument is a specifier that is relative to the url argument. 
     *
     * This method always returns immediately, whether or not the
     * image exists. When this applet attempts to draw the image on
     * the screen, the data will be loaded. The graphics primitives
     * that draw the image will incrementally paint on the screen.
     *
     * @param  url   an absolute URL giving the base location of the image
     * @param  name  the location of the image, relative to the url argument
     * @return       the image at the specified URL
     * @see          Image
     */
    public Image getImage(URL url, String name) {
        try {
            return getImage(new URL(url, name));
        } catch (MalformedURLException e) {
            return null;
        }
    }

    printf / Math

    Note the %n, which may write out \n, \r, or \r\n depending on which platform you are on. The Math class methods are static; the System.out methods are not.
    public class BasicMathDemo {
        public static void main(String[] args) {
            double a = -191.635, b = 43.74;
            int c = 16, d = 45;
            double degrees = 45.0, radians = Math.toRadians(degrees);
    
            System.out.printf("The absolute value of %.3f is %.3f%n", 
                              a, Math.abs(a));
    
            System.out.printf("The ceiling of %.2f is %.0f%n", 
                              b, Math.ceil(b));
    
            System.out.format("The cosine of %.1f degrees is %.4f%n",
                              degrees, Math.cos(radians));
    
        }
    }
    

    Arrays Example

    Have you seen this syntax enough to be familiar with it yet?
    int[] anArray;
    anArray = new int[10];
    
    Also, be sure you can recognize code like:
    int[] anArray = {100, 200, 300, 400, 500, 600, 700, 800, 900, 1000};
    
    Arrays actually are objects in Java, although special ones; among other things, they have (at least) one field: anArray.length gives the array's size.

    Strings versus arrays of char

    Strings really are not arrays of char. Consider this example:
    public class hello {
       public static void main(String[]args){
         String s = "Niagara. O roar again!"; 
         char c = s[9];   // compile error: array indexing does not work on a String
         System.out.println("10th char of "+s+" is "+c);
       }
    }
    
    You have to say s.charAt(9) instead of s[9].
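Here is a corrected version (renamed hello2 so it can coexist with the broken one):

```java
public class hello2 {
    public static void main(String[] args) {
        String s = "Niagara. O roar again!";
        char c = s.charAt(9);    // charAt(), not s[9]
        System.out.println("10th char of " + s + " is " + c);
    }
}
```

Index 9 of that string is the 'O', since indexing starts at 0.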

    lecture #8 began here

    More on Exceptions

    Three kinds:
    checked
    probably recoverable. catch-or-specify required
    error
    you can catch it, but you probably can't recover. problem outside the app.
    runtime
    you can catch it, but you probably can't recover. problem inside the app, i.e. a bug that needs to be fixed.
    Additional observations:
    try {
        out = new PrintWriter(new FileWriter("OutFile.txt"));
        for (int i = 0; i < SIZE; i++) {
            out.println("Value at: " + i + " = " + list.get(i));
        }
    } catch (FileNotFoundException e) {
        System.err.println("FileNotFoundException: " + e.getMessage());
        throw new SampleException(e);
    
    } catch (IOException e) {
        System.err.println("Caught IOException: " + e.getMessage());
    }
    
    By the way, even if you don't handle an exception (no "catch"), you can still use a try { } block with a finally clause to document that you know an exception may occur there. The finally clause executes at the end of a try block whether an exception is handled or not, which makes it a good place for cleanup code:
    static String readFirstLineFromFileWithFinallyBlock(String path)
    throws IOException {
        BufferedReader br = new BufferedReader(new FileReader(path));
        try {
            return br.readLine();
        } finally {
            if (br != null) br.close();
        }
    }
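Since Java 7, the same cleanup can often be written more simply with try-with-resources, which closes the resource automatically when the block exits. A minimal sketch (using a StringReader instead of a file so the example is self-contained):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

public class TryWithResourcesDemo {
    // The reader is declared in the try header, so it is closed
    // automatically whether or not readLine() throws.
    public static String firstLine(String text) throws IOException {
        try (BufferedReader br = new BufferedReader(new StringReader(text))) {
            return br.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(firstLine("first\nsecond"));   // prints "first"
    }
}
```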
    

    Concurrency

    Threads

    A thread is a computation, with a set of CPU registers and an execution stack on which to evaluate expressions, call methods, etc.

    In Java, threads can be created for any Runnable class, which must implement a public void method named run().

    public class HelloRunnable implements Runnable {
        
        public void run() {
            System.out.println("Hello from a thread!");
        }
        public static void main(String args[]) throws InterruptedException {
            Thread t;
            HelloRunnable r = new HelloRunnable();
            (t = new Thread(r)).start();
            // can use r to "talk" to the child thread via class variables...
            t.join();
        }
    }
    

    Easy Synchronization

    Synchronization means: forcing concurrent threads to take turns, and wait for each other to finish. Imagine trying to talk at the same time as someone you are with. In Java, declaring a method synchronized guarantees that only one thread at a time can execute it on a given object:
        public synchronized void increment() {
            c++;
        }
    

    Communication

    Threads are in the same address space, so they can "talk" by just storing values in variables that each other can see. Examples would be static variables, and class fields in instances that both threads know about (how would both threads know about an instance? Typically because the same object was passed to both of them when they were created).

    The main kicker is to avoid race conditions, where two threads get inconsistent information by writing to the same variable at the same time. How to avoid that? Synchronization.
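To make this concrete, here is a minimal sketch (the class and method names are my own): two threads capture the same Counter instance, and the synchronized keyword makes their increments take turns. Without synchronized, the two c++ operations could interleave and the final count would often come up short of 20000:

```java
public class SyncDemo {
    public static class Counter {
        private int c = 0;
        // synchronized: only one thread at a time may run this on a given Counter
        public synchronized void increment() { c++; }
        public synchronized int value() { return c; }
    }

    public static void main(String[] args) throws InterruptedException {
        final Counter counter = new Counter();   // shared instance
        Runnable work = new Runnable() {
            public void run() {
                for (int i = 0; i < 10000; i++) counter.increment();
            }
        };
        Thread t1 = new Thread(work), t2 = new Thread(work);
        t1.start(); t2.start();
        t1.join(); t2.join();
        System.out.println(counter.value());     // always 20000
    }
}
```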

    CLASSPATH

    The -cp command line argument (to java) or the CLASSPATH environment variable specifies a list of directories and/or .jar files in which to search for user class files. In large/complex Java applications, it is often very difficult to keep this straight.
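For example (the directory and jar names here are hypothetical):

```
java -cp classes:lib/utils.jar MyMain      # Unix: entries separated by ':'
java -cp "classes;lib\utils.jar" MyMain    # Windows: entries separated by ';'
```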

    Collections

    Compared with more dynamic languages, Java has to do a fair amount of work to provide full compile-time type safety and reasonable polymorphism. The organization of its "collections framework" reflects that challenge. It uses generic classes heavily to allow types like "collection of X", but is less convenient for heterogeneous "collections of mixed stuff".
    Interfaces
    There is a whole hierarchy of collection interfaces that the algorithms are coded against.
    Implementations
    A set of reusable data structures
    Algorithms
    Searching, sorting, etc.
    Per the Oracle docs, the typical declaration puts an interface type on the left and a concrete implementation on the right:

     abstracttype<elem> var = new concretetype<elem>(...);
    
    The actual Collection base interface mainly defines size(), isEmpty(), contains(o), and iterator(), plus the ability to convert to/from other collections and/or arrays. Implementations usually also have add(o) and remove(o) operations of some kind.
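A concrete instance of that declaration pattern, as a minimal sketch:

```java
import java.util.ArrayList;
import java.util.List;

public class DeclDemo {
    public static void main(String[] args) {
        // interface type on the left, concrete implementation on the right
        List<String> names = new ArrayList<String>();
        names.add("ada");
        names.add("grace");
        System.out.println(names.size());          // 2
        System.out.println(names.contains("ada")); // true
        System.out.println(names.isEmpty());       // false
    }
}
```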

    Iterating

    Iterable classes have an iterator() method that returns an Iterator object, which keeps track of a position in the original object and lets you walk through its elements. Mainly, Iterators provide a next() method to get the next element, and a hasNext() method to say whether any elements remain.
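For example, a minimal sketch that walks a list with an explicit Iterator (the for-each loop does the same thing behind the scenes):

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class IterDemo {
    public static void main(String[] args) {
        List<Integer> nums = Arrays.asList(1, 2, 3);
        Iterator<Integer> it = nums.iterator();
        int sum = 0;
        while (it.hasNext()) {   // any elements left?
            sum += it.next();    // fetch the next one and advance
        }
        System.out.println(sum); // 6
    }
}
```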

    Lists

    Ordered collections know how to: sort, shuffle, reverse, rotate, swap, replaceAll, fill, copy, binarySearch... kind of obviously related to Lisp lists, but several implementations are available with different performance strengths and weaknesses.
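Most of those operations are static methods on java.util.Collections; a minimal sketch:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class ListOps {
    public static void main(String[] args) {
        List<Integer> nums = new ArrayList<Integer>(Arrays.asList(3, 1, 2));
        Collections.sort(nums);
        System.out.println(nums);                              // [1, 2, 3]
        // binarySearch requires the list to be sorted ascending
        System.out.println(Collections.binarySearch(nums, 2)); // 1
        Collections.reverse(nums);
        System.out.println(nums);                              // [3, 2, 1]
    }
}
```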

    Maps

    Hash tables are one of the most important types in any "high level" language.

    Notice how this "word frequency counter" initializes its counts: for each word you first do an m.get(); if the result is null, the word is new and its count starts at 1; otherwise, you increment the count.

    import java.util.*;
    public class Freq {
        public static void main(String[] args) {
            Map<String, Integer> m = new HashMap<String, Integer>();
            // Initialize frequency table from command line
            for (String a : args) {
                Integer freq = m.get(a);
                m.put(a, (freq == null) ? 1 : freq + 1);
            }
            System.out.println(m.size() + " distinct words:");
            System.out.println(m);
        }
    }
    

    Introspection

    "to look inside oneself" -- really in programming languages, it is the ability of an object to describe itself at runtime. C++ has the concept of "runtime type information" which is similar. In Java, any object can be asked its getClass() method, which returns a Class object that can cough up its fields, methods, etc. Consider the following example from http://www.cs.grinnell.edu/~rebelsky/Courses/CS223/2004F/Handouts/introspection.html
    public static void summarize(Object o) throws Exception
    {
        Class c = o.getClass();
        System.out.println("Class: " + c.getName());
        Method[] methods = c.getMethods();
        System.out.println("  Methods: ");
        for (int i = 0; i < methods.length; i++) {
          System.out.print("    " + methods[i].toString());
          if (methods[i].getDeclaringClass() != c)
            System.out.println(" (inherited from " +
              methods[i].getDeclaringClass().getName() + ")");
          else
            System.out.println();
        }
      } // summarize(Object)
    

    JavaBeans

    Just so you all have heard a bit about them, JavaBeans are reusable software components. They are just classes that follow a few conventions.
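A hypothetical minimal bean illustrating the main conventions: a public no-argument constructor, private fields, and matching getX()/setX() accessor pairs (beans usually also implement Serializable):

```java
import java.io.Serializable;

public class PersonBean implements Serializable {
    private String name;
    private int age;

    public PersonBean() { }   // public no-arg constructor

    public String getName() { return name; }
    public void setName(String name) { this.name = name; }

    public int getAge() { return age; }
    public void setAge(int age) { this.age = age; }
}
```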

    Applets

    An Applet is a Java program that will run in a web browser.
    import javax.swing.JApplet;
    import javax.swing.SwingUtilities;
    import javax.swing.JLabel;
    
    public class HelloWorld extends JApplet {
        //Called when this applet is loaded into the browser.
        public void init() {
            //Execute a job on the event-dispatching thread; creating this applet's GUI.
            try {
                SwingUtilities.invokeAndWait(new Runnable() {
                    public void run() {
                        JLabel lbl = new JLabel("Hello World");
                        add(lbl);
                    }
                });
            } catch (Exception e) {
                System.err.println("createGUI didn't complete successfully");
            }
        }
    }
    
    In addition to the init() method, many applets will have start() and stop() methods to do any additional computation (such as launching/killing threads) other than responding to GUI clicks.

    To deploy an applet, compile the code and package it as a JAR file. Then in your web page you write

    <applet code=AppletClassName.class
            archive="JarFileName.jar"
            width=width height=height>
    </applet>
    

    Final Exam Review

    Review language paradigms
    Know what imperative, functional, declarative, object-oriented, and goal-directed languages are about.
    Lisp
    • What paradigm does Lisp represent?
    • Know what are atoms
    • Give the mathematical definition of Lisp lists
    • What are Lisp predicates?
    • Practice every kind of recursion that you were asked to learn.
    • What's the difference between functions and special forms?
    • What are the most common built-in functions and special forms?
    Flex
    • Know regular expressions
    • What are Flex's rules for deciding which rule to use when they overlap?
    • What is Flex's general syntax?
    • What is the public interface of Flex-generated lexical analyzers to programs such as Bison parsers?
    Bison
    • Know context free grammars, and common special cases.
    • What are Bison's rules for deciding which rule to use when they overlap?
    • what is more powerful about Bison than Flex?
    • What is Bison's public interface from a calling program?
    • What paradigm do Flex and Bison represent, and how pure an example of that paradigm are they?
    Unicon
    • Know Unicon's general syntax. What does a program look like?
    • What about Unicon is different from Java? Compare its OOP features.
    • Know Unicon's built-in types and rules for type checking.
    • How do strings and lists and tables work?
    • What is goal-directed expression evaluation?
    • Know what generators are, give simple examples.
    Java
    • Know Java's general syntax. What does a program look like?
    • What about Java is different from C++?
    • Know Java's built-in types and rules for type checking.
    • How do you write/create new types in Java?
    • Know basics of I/O, like how to open a named file and read from it.