05-01: Introduction to Parsing

Input: sequence of tokens from lexer; Output: parse tree of the program.

05-02: Context Free Grammars Part 1

A CFG consists of:

  • A set of terminals $T$;
  • A set of non-terminals $N$;
  • A start symbol $S$, with $S \in N$;
  • A set of productions $X \to Y_1 \dots Y_n$, where $X \in N$ and each $Y_i \in N \cup T \cup \{\varepsilon\}$.

Example:

$S \to (S)$, $S \to \varepsilon$; $N = \{S\}$, $T = \{(, )\}$

Let G be a context-free grammar with start symbol $S$. Then the language $L(G)$ of $G$ is: $\{ a_1 \dots a_n \mid \forall i.\ a_i \in T \ \&\ S \xrightarrow{*} a_1 \dots a_n \}$

Terminals ought to be tokens of the language.
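As a check against this definition, the string $(())$ is in $L(G)$ for the parenthesis grammar above, via the derivation (one production applied per step):

$S \to (S) \to ((S)) \to (())$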

05-02: Context Free Grammars Part 2

05-03: Derivations Part 1

  1. A derivation is a sequence of productions;
  2. A derivation can be drawn as a tree:
    • Start symbol is the tree’s root;
  3. For a production $X \to Y_1 \dots Y_n$, add children $Y_1 \dots Y_n$ to node $X$.
  4. A parse tree has:
    • Terminals at the leaves;
    • Non-terminals at the interior nodes.
  5. Properties:
    • An in-order traversal of the leaves is the original input;
    • The parse tree shows the association of operations; the input string does not.
  6. The left-most derivation:
    • At each step, replace the left-most non-terminal.
  7. The right-most derivation:
    • At each step, replace the right-most non-terminal.
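An illustration (the grammar here is a hypothetical one-operator example, not from the notes): for $E \to E + E \mid \text{int}$ and the string int + int,

Left-most: $E \to E + E \to \text{int} + E \to \text{int} + \text{int}$
Right-most: $E \to E + E \to E + \text{int} \to \text{int} + \text{int}$

Both derivations produce the same parse tree.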

05-03: Derivations Part 2

  1. A derivation defines a parse tree:
    • But one parse tree may have many derivations.

05-04: Ambiguity Part 1

A grammar is ambiguous if it has more than one parse tree for some string.
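The classic instance: with $E \to E + E \mid E * E \mid \text{int}$, the string int + int * int has two distinct left-most derivations, hence two parse trees:

$E \to E + E \to \text{int} + E \to \text{int} + E * E \to \text{int} + \text{int} * E \to \text{int} + \text{int} * \text{int}$
$E \to E * E \to E + E * E \to \text{int} + E * E \to \text{int} + \text{int} * E \to \text{int} + \text{int} * \text{int}$

The first groups the string as int + (int * int), the second as (int + int) * int.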

05-04: Ambiguity Part 3

06-01: Error Handling

  1. Error handler should:
    • Report errors accurately and clearly;
    • Recover from an error quickly;
    • Not slow down compilation of valid code.
  2. Panic mode:
    • When an error is detected:
    1. Discard tokens until one with a clear role (a synchronizing token) is found;
    2. Continue from there.
    • Synchronizing tokens are typically the statement or expression terminators.
  3. Error production:
    • Specify known common mistakes in the grammar;
    • Complicates the grammar.
  4. Error correction:
    • Try token insertions and deletions;
    • Exhaustive search.
    • Disadvantages:
    1. Hard to implement;
    2. Slows down parsing of correct programs;
    3. “Nearby” program is not necessarily “the intended” program.
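The panic-mode strategy in step 2 can be sketched in C. The `TOKEN` codes below are hypothetical (the notes leave the token set abstract); the synchronizing token here is the statement terminator `;`.

```c
/* Panic-mode recovery sketch; TOKEN codes are assumptions. */
typedef enum { TOK_INT, TOK_PLUS, TOK_SEMI, TOK_EOF } TOKEN;

/* On an error, discard tokens until a synchronizing token -- here the
   statement terminator ';' or end of input -- then resume after it. */
const TOKEN *panic_recover(const TOKEN *next) {
    while (*next != TOK_SEMI && *next != TOK_EOF)
        next++;                 /* token has no clear role: discard it */
    if (*next == TOK_SEMI)
        next++;                 /* continue parsing after the ';' */
    return next;
}
```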

06-02: Abstract Syntax Trees

  1. Abstract syntax trees:
    • Like parse trees but ignore some details;
    • Abbreviated as AST.
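A minimal AST sketch in C (the node shape and names are assumptions, just to contrast with a parse tree): unlike a parse tree, the AST drops details such as parentheses and single-child chains, keeping only operators and operands.

```c
#include <stdlib.h>

/* Hypothetical AST for integer expressions: only the essential structure. */
typedef enum { AST_INT, AST_ADD, AST_MUL } AstKind;

typedef struct Ast {
    AstKind kind;
    int value;              /* used only by AST_INT leaves */
    struct Ast *lhs, *rhs;  /* children of AST_ADD / AST_MUL */
} Ast;

Ast *ast_new(AstKind kind, int value, Ast *lhs, Ast *rhs) {
    Ast *n = malloc(sizeof *n);
    n->kind = kind; n->value = value; n->lhs = lhs; n->rhs = rhs;
    return n;
}

/* Evaluation never consults parentheses: 5 * (2 + 3) is simply
   MUL(5, ADD(2, 3)) -- the grouping survives only as tree shape. */
int ast_eval(const Ast *n) {
    if (n->kind == AST_INT) return n->value;
    int l = ast_eval(n->lhs), r = ast_eval(n->rhs);
    return n->kind == AST_ADD ? l + r : l * r;
}
```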

06-03: Recursive Descent Parsing

  1. The parse tree is constructed:
    • From the top;
    • From the left to right.
  2. Terminals are seen in order of appearance in the token stream.
  3. Match and backtrack on failure.

06-04: Recursive Descent Algorithm

Let the global next point to the next input token.

  1. Define boolean functions that check for a match of:
    • A given token terminal:
       bool term(TOKEN tok) { return *next++ == tok; }
      
    • The nth production of S:
       bool Sn() {...}
      
    • Try all productions of S:
       bool S() {...}
      
  2. For production $E \to T$:
     bool E1() { return T(); }
    
  3. For production $E \to T + E$:
     bool E2() { return T() && term(PLUS) && E(); }
    
  4. For all productions of E (with backtracking):
     bool E() { TOKEN *save = next; return (next = save, E1()) || (next = save, E2()); }
    
  5. Functions for non-terminal T:
     bool T1() { return term(INT); }
     bool T2() { return term(INT) && term(TIMES) && T(); }
     bool T3() { return term(OPEN) && E() && term(CLOSE); }
     bool T() { TOKEN *save = next; return (next = save, T1()) || (next = save, T2()) || (next = save, T3()); }
    
  6. To start the parser:
    • Initialize next to point to first token;
    • Invoke E().
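Putting the pieces together, here is a complete runnable sketch. The `TOKEN` codes and the final end-of-input check are assumptions; the notes leave both abstract.

```c
#include <stdbool.h>

/* Hypothetical token codes; END marks end of input. */
typedef enum { INT, PLUS, TIMES, OPEN, CLOSE, END } TOKEN;

static TOKEN *next;          /* global: points to the next input token */

static bool E(void);
static bool T(void);

/* Match one terminal and advance. */
static bool term(TOKEN tok) { return *next++ == tok; }

/* Productions of E:  E -> T  |  T + E  */
static bool E1(void) { return T(); }
static bool E2(void) { return T() && term(PLUS) && E(); }
static bool E(void) {
    TOKEN *save = next;                  /* backtrack point */
    return (next = save, E1())
        || (next = save, E2());
}

/* Productions of T:  T -> int  |  int * T  |  ( E )  */
static bool T1(void) { return term(INT); }
static bool T2(void) { return term(INT) && term(TIMES) && T(); }
static bool T3(void) { return term(OPEN) && E() && term(CLOSE); }
static bool T(void) {
    TOKEN *save = next;
    return (next = save, T1())
        || (next = save, T2())
        || (next = save, T3());
}

/* Start the parser: initialize next, invoke E(), require all input used. */
bool parse(TOKEN *tokens) {
    next = tokens;
    return E() && *next == END;
}
```

Requiring all input to be consumed makes the limitation of the next section visible: on int * int, `T()` commits to its first successful production `T1` and never reconsiders, so the `TIMES` token is left unconsumed and the parse fails even though the string is in the language.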

06-04-1: Recursive Descent Limitations

  1. The presented recursive descent algorithm is not general;
  2. It is sufficient for grammars where, for any non-terminal, at most one production can succeed;
  3. The example grammar can be rewritten to work with the presented algorithm:
    • By left factoring.
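Concretely, one left-factored form of the example grammar (the rewriting is standard; the non-terminal names $X$ and $Y$ are a choice):

$E \to T\ X$, with $X \to + E \mid \varepsilon$
$T \to \text{int}\ Y \mid ( E )$, with $Y \to * T \mid \varepsilon$

The common prefixes ($T$ in the two productions of $E$, int in two productions of $T$) have been factored out, so the alternatives of each non-terminal are distinguished by their first token and at most one production can succeed.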

06-05: Left Recursion Part 1