05-01: Introduction to Parsing
Input: sequence of tokens from lexer; Output: parse tree of the program.
05-02: Context Free Grammars Part 1
- Programming language have recursive structure;
- Contex-free grammar are a natural notation for this recursive structure.
A CFG consists of:
- A set of terminals ();
- A set of non-terminals ();
- A start symbol ;
- A set of productions: .
Example:
Let G be a contex-free grammar with start symbol S. Then the language L(G) of G is:
Terminals ought to be tokens of the language.
05-02: Context Free Grammars Part 2
- Many grammars generate the same language;
- Tools are sensitive to the grammar.
05-03: Derivations Part 1
- A derivation is a sequence of productions;
- A derivation can be drawn as a tree:
- Start symbol is the tree’s root;
- For a production add children to node X.
- A parse tree has:
- Terminals at the leaves;
- Non-terminals at the interior nodes.
- Properties:
- An in-order traversal of the leaves is the original input;
- The parse tree shows the association of operations, the input string does no.
- The left-most derivation:
- At each step, replace the left-most non-terminal.
- The right-most derivation do an opposite algorithm.
05-03: Derivations Part 2
- A derivation defines a parse tree:
- But one parse tree may have many derivations.
05-04: Ambiguity Part 1
A grammar is ambiguous if it has more than one parse tree for some string.
05-04: Ambiguity Part 3
- Impossible to convert automatically an ambiguous grammar to an unambiguous one;
- Used with care, ambiguity can simplify the grammar;
- Most tools allow precedence and associativity declarations to disambiguate grammars.
06-01: Error Handling
- Error handler should:
- Report errors accurately and clearly;
- Recover from an error quickly;
- Not slow down compilation of valid code.
- Panic mode:
- When an error is detected:
- Discard tokens until one with a clear role is found;
- Continue from there. - Looking for synchronizing tokens:
- Typically the statement or expression terminators.
- Error production:
- Specify known common mistakes in the grammar;
- Complicates the grammar.
- Error correction:
- Try token insertions and deletions;
- Exhaustive search.
- Disadvantages:
- Hard to implement;
- Slows down parsing of correct programs;
- “Nearby” program is not necessarily “the intended” program.
06-02: Abstract Syntax Trees
- Abstract syntax trees:
- Like parse trees but ignore some details;
- Abbreviated as AST.
06-03: Recursive Descent Parsing
- The parse tree is constructed:
- From the top;
- From the left to right.
- Terminals are seen in order of appearance in the token stream.
- Match and backtrack on failure.
06-04: Recursive Descent Algorithm
Let the global next point to the next input token.
- Define boolean function that check for a match of:
- A given token terminal:
bool term(TOKEN tok) { return *next++ == tok; }
- The nth production of S:
bool Sn() {...}
- Try all productions of S:
bool S() {...}
- A given token terminal:
- For production :
bool E1() { return T(); }
- For production :
bool E2() { return T() && term(PLUS) && E(); }
- For all productions of E (with backtracking):
bool E() { TOKEN *save = next; return (next = save, E1()) || (next = save, E2()); }
- Functions for non-terminal T:
bool T1() { return term(INT); } bool T2() { return term(INT) && term(TIMES) && T(); } bool T3() { return term(OPEN) && E() && term(CLOSE); } bool T() { TOKEN *save = next; return (next = save, T1()) || (next = save, T2()) || (next = save, T3()); }
- To start the parser:
- Initialize next to point to first token;
- Invoke E().
06-04-1: Recursive Descent Limitations
- Presented recursive descent algorithm is not general;
- Sufficient for grammars where for any non-terminal at most one production can succeed;
- The example grammar can be rewritten to work with the presented algorithm:
- By left factoring.
06-05: Left Recursion Part 1
- A left-recursive grammar has a non-terminal for some α.
- Recursive descent does not work with left-recursive grammars.
- Rewrite using right-recursion: , then eliminates left-recursion.