05-01: Introduction to Parsing

Input: sequence of tokens from lexer; Output: parse tree of the program.

05-02: Context Free Grammars Part 1

A CFG consists of:

  • A set of terminals $T$;
  • A set of non-terminals $N$;
  • A start symbol $S$, with $S \in N$;
  • A set of productions $X \to Y_1 \dots Y_n$, where $X \in N$ and each $Y_i \in N \cup T \cup \{\varepsilon\}$.

Example:

$S \to (S)$, $S \to \varepsilon$; $N = \{S\}$, $T = \{(, )\}$

Let G be a context-free grammar with start symbol $S$. Then the language $L(G)$ of $G$ is: $\{ a_1 \dots a_n \mid \forall i.\ a_i \in T \ \&\ S \xrightarrow{*} a_1 \dots a_n \}$

Terminals ought to be tokens of the language.
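As a check against this definition, the string $(())$ is in $L(G)$ for the parenthesis grammar above, via the derivation (one production applied per step):

$S \to (S) \to ((S)) \to (())$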

05-02: Context Free Grammars Part 2

05-03: Derivations Part 1

  1. A derivation is a sequence of productions;
  2. A derivation can be drawn as a tree:
    • Start symbol is the tree’s root;
  3. For a production $X \to Y_1 \dots Y_n$, add children $Y_1 \dots Y_n$ to node $X$.
  4. A parse tree has:
    • Terminals at the leaves;
    • Non-terminals at the interior nodes.
  5. Properties:
    • An in-order traversal of the leaves is the original input;
    • The parse tree shows the association of operations; the input string does not.
  6. The left-most derivation:
    • At each step, replace the left-most non-terminal.
  7. The right-most derivation:
    • At each step, replace the right-most non-terminal.
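An illustration (the grammar here is a hypothetical one-operator example, not from the notes): for $E \to E + E \mid \text{int}$ and the string int + int,

Left-most: $E \to E + E \to \text{int} + E \to \text{int} + \text{int}$
Right-most: $E \to E + E \to E + \text{int} \to \text{int} + \text{int}$

Both derivations produce the same parse tree.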

05-03: Derivations Part 2

  1. A derivation defines a parse tree:
    • But one parse tree may have many derivations.

05-04: Ambiguity Part 1

A grammar is ambiguous if it has more than one parse tree for some string.
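The classic instance: with $E \to E + E \mid E * E \mid \text{int}$, the string int + int * int has two distinct left-most derivations, hence two parse trees:

$E \to E + E \to \text{int} + E \to \text{int} + E * E \to \text{int} + \text{int} * E \to \text{int} + \text{int} * \text{int}$
$E \to E * E \to E + E * E \to \text{int} + E * E \to \text{int} + \text{int} * E \to \text{int} + \text{int} * \text{int}$

The first groups the string as int + (int * int), the second as (int + int) * int.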

05-04: Ambiguity Part 3

06-01: Error Handling

  1. Error handler should:
    • Report errors accurately and clearly;
    • Recover from an error quickly;
    • Not slow down compilation of valid code.
  2. Panic mode:
    • When an error is detected:
    1. Discard tokens until one with a clear role (a synchronizing token) is found;
    2. Continue from there.
    • Synchronizing tokens are typically the statement or expression terminators.
  3. Error production:
    • Specify known common mistakes in the grammar;
    • Complicates the grammar.
  4. Error correction:
    • Try token insertions and deletions;
    • Exhaustive search.
    • Disadvantages:
    1. Hard to implement;
    2. Slows down parsing of correct programs;
    3. “Nearby” program is not necessarily “the intended” program.
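The panic-mode strategy in step 2 can be sketched in C. The `TOKEN` codes below are hypothetical (the notes leave the token set abstract); the synchronizing token here is the statement terminator `;`.

```c
/* Panic-mode recovery sketch; TOKEN codes are assumptions. */
typedef enum { TOK_INT, TOK_PLUS, TOK_SEMI, TOK_EOF } TOKEN;

/* On an error, discard tokens until a synchronizing token -- here the
   statement terminator ';' or end of input -- then resume after it. */
const TOKEN *panic_recover(const TOKEN *next) {
    while (*next != TOK_SEMI && *next != TOK_EOF)
        next++;                 /* token has no clear role: discard it */
    if (*next == TOK_SEMI)
        next++;                 /* continue parsing after the ';' */
    return next;
}
```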

06-02: Abstract Syntax Trees

  1. Abstract syntax trees:
    • Like parse trees but ignore some details;
    • Abbreviated as AST.
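A minimal AST sketch in C (the node shape and names are assumptions, just to contrast with a parse tree): unlike a parse tree, the AST drops details such as parentheses and single-child chains, keeping only operators and operands.

```c
#include <stdlib.h>

/* Hypothetical AST for integer expressions: only the essential structure. */
typedef enum { AST_INT, AST_ADD, AST_MUL } AstKind;

typedef struct Ast {
    AstKind kind;
    int value;              /* used only by AST_INT leaves */
    struct Ast *lhs, *rhs;  /* children of AST_ADD / AST_MUL */
} Ast;

Ast *ast_new(AstKind kind, int value, Ast *lhs, Ast *rhs) {
    Ast *n = malloc(sizeof *n);
    n->kind = kind; n->value = value; n->lhs = lhs; n->rhs = rhs;
    return n;
}

/* Evaluation never consults parentheses: 5 * (2 + 3) is simply
   MUL(5, ADD(2, 3)) -- the grouping survives only as tree shape. */
int ast_eval(const Ast *n) {
    if (n->kind == AST_INT) return n->value;
    int l = ast_eval(n->lhs), r = ast_eval(n->rhs);
    return n->kind == AST_ADD ? l + r : l * r;
}
```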

06-03: Recursive Descent Parsing

  1. The parse tree is constructed:
    • From the top;
    • From the left to right.
  2. Terminals are seen in order of appearance in the token stream.
  3. Match and backtrack on failure.

06-04: Recursive Descent Algorithm

Let the global next point to the next input token.

  1. Define boolean functions that check for a match of:
    • A given token terminal:
       bool term(TOKEN tok) { return *next++ == tok; }
      
    • The nth production of S:
       bool Sn() {...}
      
    • Try all productions of S:
       bool S() {...}
      
  2. For production $E \to T$:
     bool E1() { return T(); }
    
  3. For production $E \to T + E$:
     bool E2() { return T() && term(PLUS) && E(); }
    
  4. For all productions of E (with backtracking):
     bool E() { TOKEN *save = next; return (next = save, E1()) || (next = save, E2()); }
    
  5. Functions for non-terminal T:
     bool T1() { return term(INT); }
     bool T2() { return term(INT) && term(TIMES) && T(); }
     bool T3() { return term(OPEN) && E() && term(CLOSE); }
     bool T() { TOKEN *save = next; return (next = save, T1()) || (next = save, T2()) || (next = save, T3()); }
    
  6. To start the parser:
    • Initialize next to point to first token;
    • Invoke E().
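Putting the pieces together, here is a complete runnable sketch. The `TOKEN` codes and the final end-of-input check are assumptions; the notes leave both abstract.

```c
#include <stdbool.h>

/* Hypothetical token codes; END marks end of input. */
typedef enum { INT, PLUS, TIMES, OPEN, CLOSE, END } TOKEN;

static TOKEN *next;          /* global: points to the next input token */

static bool E(void);
static bool T(void);

/* Match one terminal and advance. */
static bool term(TOKEN tok) { return *next++ == tok; }

/* Productions of E:  E -> T  |  T + E  */
static bool E1(void) { return T(); }
static bool E2(void) { return T() && term(PLUS) && E(); }
static bool E(void) {
    TOKEN *save = next;                  /* backtrack point */
    return (next = save, E1())
        || (next = save, E2());
}

/* Productions of T:  T -> int  |  int * T  |  ( E )  */
static bool T1(void) { return term(INT); }
static bool T2(void) { return term(INT) && term(TIMES) && T(); }
static bool T3(void) { return term(OPEN) && E() && term(CLOSE); }
static bool T(void) {
    TOKEN *save = next;
    return (next = save, T1())
        || (next = save, T2())
        || (next = save, T3());
}

/* Start the parser: initialize next, invoke E(), require all input used. */
bool parse(TOKEN *tokens) {
    next = tokens;
    return E() && *next == END;
}
```

Requiring all input to be consumed makes the limitation of the next section visible: on int * int, `T()` commits to its first successful production `T1` and never reconsiders, so the `TIMES` token is left unconsumed and the parse fails even though the string is in the language.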

06-04-1: Recursive Descent Limitations

  1. The presented recursive descent algorithm is not general;
  2. It is sufficient for grammars where, for any non-terminal, at most one production can succeed;
  3. The example grammar can be rewritten to work with the presented algorithm:
    • By left factoring.
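Concretely, one left-factored form of the example grammar (the rewriting is standard; the non-terminal names $X$ and $Y$ are a choice):

$E \to T\ X$, with $X \to + E \mid \varepsilon$
$T \to \text{int}\ Y \mid ( E )$, with $Y \to * T \mid \varepsilon$

The common prefixes ($T$ in the two productions of $E$, int in two productions of $T$) have been factored out, so the alternatives of each non-terminal are distinguished by their first token and at most one production can succeed.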

06-05: Left Recursion Part 1