07-01: Predictive Parsing Part 1

  1. Like recursive-descent but parser can “predict” which production to use:
    • By looking at the next few tokens (lookahead);
    • No backtracking.
  2. Predictive parsers accept LL(k) grammars(Left-to-right Left-most derivation k tokens lookahead);
  3. In recursive-descent:
    • At each step, many choices of production to use;
    • Backtracking used to undo bad choices.
  4. In LL(1):
    • At each step, only one choice of production.
  5. But can require left-factoring to eliminate the common prefix of multiple productions for one non terminal.

07-01: Predictive Parsing Part 2

  1. Algorithm to use parsing table:
    • Method similar to recursive descent, except:
      1. For the leftmost non-terminal S;
      2. We look at the next input token a;
      3. And choice the production shown at [S, a].
    • A stack records frontier of parse tree:
      1. Non-terminals that have yet to be expanded;
      2. Terminals that have yet to matched against the input;
      3. Top of stack = leftmost pending terminal or non-terminal.
    • Reject on reaching error state;
    • Accept on end of input & empty stack.
  2. Code:
    initialize stack = < S, $ > and next
    repeat
      case stack of
     < X, rest > :
       If T[X, *next] = Y1...Yn
         then stack <- < Y1...Yn rest >;
       else error();
     < t, rest > :
       If t == *next++
         then stack <- < rest >;
       else error();
    until stack == < >
    

07-02: First Sets

  1. Consider non-terminal A, production AαA \to α, & token t;
  2. T[A, t] = α, in two cases:
    • If αtβα \to\ast tβ:
      1. α can derive a t in the first position;
      2. We say that tFirst(α).t \isin First(α).
    • if AαA \to α and αεα \to\ast ε and SβAtδS \to\ast βAtδ:
      1. Useful if stack has A, input is t, and A cannot derive t;
      2. In this case only option is to get rid of A (by deriving ε):
        • Can work only if t can follow A in at least one derivation.
      3. We say tFollow(A).t \isin Follow(A).
  3. Definition: First(X)=t  X  tαε  X εFirst(X) = {t\ \vert\ X\ \to\ast\ tα} \cup {ε\ \vert\ X\ \to\ast ε}, X - arbitrary string.
  4. Algorithm sketch:
    • First(t) = {t}, t - terminal;
    • εFirst(X)ε \isin First(X):
      1. if XεX \to ε;
      2. if XA1...An and εFirst(Ai) for 1inX \to A_1 ... A_n\ and\ ε \isin First(A_i)\ for\ 1 \leq i \leq n.
    • First(α)First(X) if XA1..Anα and εFirst(Ai) for 1inFirst(α) \subseteq First(X)\ if\ X \to A_1 .. A_{nα}\ and\ ε \isin First(A_i)\ for\ 1 \leq i \leq n.

07-03: Follow Sets

  1. Definition: Follow(X)={t  SβXtδ}Follow(X) = \{t\ \vert\ S \to\ast βXtδ\}.
  2. Intuition:
    • if XABX \to AB then First(B) ⊆ Follow(A) and Follow(X) ⊆ Follow(B);
    • if BεB \to\ast ε then Follow(X)Follow(A)Follow(X) \subseteq Follow(A).
    • if S is the start symbol then $Follow(S)\$ \isin Follow(S).
  3. Algorithm sketch:
    • $Follow(S)\$ \isin Follow(S);
    • First(β){ε}Follow(X)First(β) - \{ε\} \subseteq Follow(X)
      1. For each production AαXβA \to αXβ.
    • Follow(A)Follow(X)Follow(A) \subseteq Follow(X)
      1. For each production AαXβA \to αXβ where εFirst(β)ε \isin First(β).

07-04: LL1 Parsing Tables

  1. Construct a parsing table T for CGG G:
  2. For each production AαA \to α in G do:
    • For each terminal tFirst(α)t \isin First(α) do:
      1. T[A,t]=αT[A, t] = α
    • If εFirst(α)ε \isin First(α), for each tFollow(A)t \isin Follow(A) do:
      1. T[A,t]=αT[A, t] = α
    • If εFirst(α)ε \isin First(α) and $Follow(A)\$ \isin Follow(A) do:
      1. T[A,$]=αT[A, \$] = α
  3. If any entry in the parsing table is multiply defined then G is not LL(1). A grammar is definitely not LL(1) if it is not left factored, is left recursive and ambiguous;
  4. Most programming languages GFGs are not LL(1).

07-05: Bottom-Up Parsing

  1. Bottom-up parsing is more general than (deterministic) top-down parsing
    • And just as efficient;
    • Builds on ideas in top-down parsing.
  2. Bottom-up is the preferred method;
  3. Bottom-up parsers don’t need left-factored grammars;
  4. Bottom up parsing reduces a string to the start symbol by inverting productions;
  5. The productions, read backwards, trace a rightmost derivation;
  6. A bottom-up parser traces a rightmost derivation in reverse.

07-06: Shift-Reduce Parsing

  1. Let αβω be a step of a bottom-up parse;
  2. Assume the next reduction is by XβX \to β;
  3. Then ω is a string of terminals;
  4. Idea: Split string into two substrings:
    • Right substring is as yet unexamined by parsing;
    • Left substring has terminals and non-terminals;
    • The dividing point is marked by a |.
  5. Shift: Move | one place to the right:
    • Shift a terminal to the left string (read a terminal).
  6. Reduce: Apply an inverse production at the right end of the left string:
    • If AxyA \to xy is a production, then Cbxy  ijk  CbA  ijkCbxy\ \vert\ ijk\ \to\ CbA\ \vert\ ijk.
  7. Left string can be implemented by a stack:
    • Top of the stack is the |.
  8. Shift pushes a terminal on the stack;
  9. Reduce:
    • pops symbols off of the stack;
    • pushes a non-terminal on the stack.
  10. In a given state, more than one action (shift or reduce) may lead to a valid parse;
  11. If it is legal to shift or reduce, there is a shift-reduce conflict;
  12. If it legal to reduce by two different productions, there is a reduce-reduce conflict.

08-01: Handles

  1. Intuition: Want to reduce only if the result can still be reduced to the start symbol:
    • Assume a rightmost derivation SαXωαβωS \to\ast αXω \to αβω;
    • Then αβ is a handle of αβω.
  2. Handles formalize the intuition:
    • A handle is a reduction that also allows further reductions back to the start symbol.
  3. We only want to reduce at handles;
  4. In shift-reduce parsing, handles appear only at the top of the stack, never inside;
  5. Handles are never to the left of the rightmost non-terminal:
    • Therefore, shift-reduce moves are sufficient; the | need never move left.
  6. Bottom-up parsing algorithms are based on recognizing handles.

08-02: Recognizing Handles

  1. α is a viable prefix if there is an ω such that αωα \vert ω is a state of a shift-reduce parser:
    • A viable prefix does not extend past the right end of the handle;
    • It’s a viable prefix because it is a prefix of the handle;
    • As long as a parser has viable prefixes on the stack no parsing error has been detected.
  2. For any grammar, the set of viable prefixes is a regular language;
  3. An item is a production with a ”.” somewhere on the rhs:
    • The only item for XεX \to ε is X>.X -> .
    • Items are ofter called LR(0) items.
  4. The stack may have many prefixes of rhs’s: Prefix(1) Prefix(2) … Prefix(n-1) Prefix(n)
  5. Let Prefix(i) be a prefix of rhs of X(i)α(i)X(i) \to α(i):
    • Prefix(i) will eventually reduce to X(i);
    • The missing part of α(i-1) start with X(i);
    • i.e X(i1)Prefix(i1)X(i)βX(i-1) \to Prefix(i-1)X(i)β for some β.
  6. Recursively, Prefix(k+1) … Prefix(n) eventually reduces to the missing part of α(k);
  7. Idea: To recognize viable prefixes, we must:
    • Recognize a sequence of partial rhs’s of productions, where
    • Each partial rhs can eventually reduce to part of the missing suffix of its predecessor.

08-03: Recognizing Viable Prefixes

  1. Add a dummy production SSS' \to S to G;
  2. The NFA states are the items of G:
    • Including the extra production.
  3. For item Eα.XβE \to α.Xβ add transition Eα.XβXEαX.βE \to α.Xβ \to_{X} E \to αX.β;
  4. For item Eα.XβE \to α.Xβ and production XγX \to γ add:
    • E>α.Xβ>εX>.γE -> α.Xβ* ->_{ε} X -> .γ
  5. Every state is an accepting state;
  6. Start state is S.SS' \to .S.

08-04: Valid Items

  1. Item Xβ.γX \to β.γ is valid for a viable prefix αβ if SαXωαβγωS' \to\ast αXω \to αβγω by a right-most derivation;
  2. After parsing αβ, the valid items are the possible tops of the stack of items;
  3. An item I is valid for a viable prefix α if the DFA recognizing viable prefixes terminates on input α in a state s containing I;
  4. The items in s describe what the top of item stack might be after reading input α.

08-05: SLR Parsing

  1. LR(0) Parsing: Assume:
    • stack contains α;
    • next input is t;
    • DFA on input α terminates in state s.
  2. Reduce by X>βX -> β if:
    • s contains item Xβ.X \to β.
  3. Shift if:
    • s contains item Xβ.tωX \to β.tω;
    • equivalent to saying s has a transition labeled t.
  4. LR(0) has a reduce/reduce conflict if:
    • Any state has two reduce items:
    1. Xβ.X \to β. and Xω.X \to ω.
  5. LR(0) has a shift/reduce conflict if:
    • Any state has a reduce item and a shift item:
    1. Xβ.X \to β. and Yω.tδY \to ω.tδ
  6. SLR Idea:
    • Reduce by XβX \to β if:
    1. s contains item Xβ.X \to β.
    2. tFollow(X)t \isin Follow(X)
  7. SLR Parsing Algorithm:
    • Let M be DFA for viable prefix of G;
    • Let  x1...xn$\vert\ x_1 ... x_n\$ be initial configuration;
    • Repeat until configuration is S|$:
    1. Let α  ωα\ \vert\ ω be current configuration;
    2. Run M on current stack α;
    3. if M rejects α, report parsing error:
      • Stack α is not a viable prefix.
    4. if M accepts α with item I, let a be next input:
      • Shift if X  β.aγ  IX\ \to\ β.aγ\ \isin\ I;
      • Reduce if Xβ. and aFollow(X)X \to β.\ and\ a \isin Follow(X);
      • Report parsing error if neither applies.
        • if there is a conflict in the last step, grammar is not SLR(k).

08-07: SLR Improvements

  1. Rerunning the viable prefix automaton on the stack at each step is wasteful;
  2. Remember the state of the automaton on each prefix of the stack;
  3. Change stack to contain pairs <Symbol, DFA state>;
  4. For a stack <sym(1), state(1)> … <sym(n), state(n)> state(n) is the final state of the DFA on sym(1) … sym(n);
  5. Detail: The bottom of the stack is <any, start> where:
    • any is any dummy symbol;
    • start is the start state of the DFA.
  6. Define goto[i, A] = j if state(i)Astate(j)state(i) \to_{A} state(j);
  7. Goto is just the transition function of the DFA:
    • one of two parsing tables.
  8. For each state s(i) and terminal a:
    • If s(i) has item Xα.aβX \to α.aβ and goto[i, a] = j then action[i, a] = shift j;
    • If s(i) has item Xα.X \to α. and aFollow(X)a \isin Follow(X) and XSX \neq S' then action[i,a]=reduceXαaction[i, a] = reduce X \to α;
    • If s(i) has item SS.S' \to S. then action[i, $] = accept;
    • Otherwise, action[i, a] = error.
  9. Algorithm:
    Let I = w$ be initial input
    Let j = 0
    Let DFA state 1 have item S' -> .S
    Let stack = <dummy, 1>
    repeat
     case action[top_state(stack), I[j]] of
       shift k: push <I[j++], k>
       reduce X -> A:
         pop |A| pairs,
         push <X, goto[top_state(stack), X]>
       accept: halt normally
       error: halt and report error
    
  10. Some common constructs are not SLR(1);
  11. LR(1) is more powerful:
    • Build lookahead into the items;
    • [T.intT,$][T \to .int*T,\$] means:
      1. After seeing TintTT \to_{int}\ast T reduce if lookahead is $.
    • More accurate than just using follow sets.