07-01: Predictive Parsing Part 1
- Like recursive-descent but parser can “predict” which production to use:
- By looking at the next few tokens (lookahead);
- No backtracking.
- Predictive parsers accept LL(k) grammars(Left-to-right Left-most derivation k tokens lookahead);
- In recursive-descent:
- At each step, many choices of production to use;
- Backtracking used to undo bad choices.
- In LL(1):
- At each step, only one choice of production.
- But can require left-factoring to eliminate the common prefix of multiple productions for one non terminal.
07-01: Predictive Parsing Part 2
- Algorithm to use parsing table:
- Method similar to recursive descent, except:
- For the leftmost non-terminal S;
- We look at the next input token a;
- And choice the production shown at [S, a].
- A stack records frontier of parse tree:
- Non-terminals that have yet to be expanded;
- Terminals that have yet to matched against the input;
- Top of stack = leftmost pending terminal or non-terminal.
- Reject on reaching error state;
- Accept on end of input & empty stack.
- Method similar to recursive descent, except:
- Code:
initialize stack = < S, $ > and next repeat case stack of < X, rest > : If T[X, *next] = Y1...Yn then stack <- < Y1...Yn rest >; else error(); < t, rest > : If t == *next++ then stack <- < rest >; else error(); until stack == < >
07-02: First Sets
- Consider non-terminal A, production , & token t;
- T[A, t] = α, in two cases:
- If :
- α can derive a t in the first position;
- We say that
- if and and :
- Useful if stack has A, input is t, and A cannot derive t;
- In this case only option is to get rid of A (by deriving ε):
- Can work only if t can follow A in at least one derivation.
- We say
- If :
- Definition: , X - arbitrary string.
- Algorithm sketch:
- First(t) = {t}, t - terminal;
- :
- if ;
- if .
- .
07-03: Follow Sets
- Definition: .
- Intuition:
- if then First(B) ⊆ Follow(A) and Follow(X) ⊆ Follow(B);
- if then .
- if S is the start symbol then .
- Algorithm sketch:
- ;
-
- For each production .
-
- For each production where .
07-04: LL1 Parsing Tables
- Construct a parsing table T for CGG G:
- For each production in G do:
- For each terminal do:
- If , for each do:
- If and do:
- For each terminal do:
- If any entry in the parsing table is multiply defined then G is not LL(1). A grammar is definitely not LL(1) if it is not left factored, is left recursive and ambiguous;
- Most programming languages GFGs are not LL(1).
07-05: Bottom-Up Parsing
- Bottom-up parsing is more general than (deterministic) top-down parsing
- And just as efficient;
- Builds on ideas in top-down parsing.
- Bottom-up is the preferred method;
- Bottom-up parsers don’t need left-factored grammars;
- Bottom up parsing reduces a string to the start symbol by inverting productions;
- The productions, read backwards, trace a rightmost derivation;
- A bottom-up parser traces a rightmost derivation in reverse.
07-06: Shift-Reduce Parsing
- Let αβω be a step of a bottom-up parse;
- Assume the next reduction is by ;
- Then ω is a string of terminals;
- Idea: Split string into two substrings:
- Right substring is as yet unexamined by parsing;
- Left substring has terminals and non-terminals;
- The dividing point is marked by a |.
- Shift: Move | one place to the right:
- Shift a terminal to the left string (read a terminal).
- Reduce: Apply an inverse production at the right end of the left string:
- If is a production, then .
- Left string can be implemented by a stack:
- Top of the stack is the |.
- Shift pushes a terminal on the stack;
- Reduce:
- pops symbols off of the stack;
- pushes a non-terminal on the stack.
- In a given state, more than one action (shift or reduce) may lead to a valid parse;
- If it is legal to shift or reduce, there is a shift-reduce conflict;
- If it legal to reduce by two different productions, there is a reduce-reduce conflict.
08-01: Handles
- Intuition: Want to reduce only if the result can still be reduced to the start symbol:
- Assume a rightmost derivation ;
- Then αβ is a handle of αβω.
- Handles formalize the intuition:
- A handle is a reduction that also allows further reductions back to the start symbol.
- We only want to reduce at handles;
- In shift-reduce parsing, handles appear only at the top of the stack, never inside;
- Handles are never to the left of the rightmost non-terminal:
- Therefore, shift-reduce moves are sufficient; the | need never move left.
- Bottom-up parsing algorithms are based on recognizing handles.
08-02: Recognizing Handles
- α is a viable prefix if there is an ω such that is a state of a shift-reduce parser:
- A viable prefix does not extend past the right end of the handle;
- It’s a viable prefix because it is a prefix of the handle;
- As long as a parser has viable prefixes on the stack no parsing error has been detected.
- For any grammar, the set of viable prefixes is a regular language;
- An item is a production with a ”.” somewhere on the rhs:
- The only item for is
- Items are ofter called LR(0) items.
- The stack may have many prefixes of rhs’s: Prefix(1) Prefix(2) … Prefix(n-1) Prefix(n)
- Let Prefix(i) be a prefix of rhs of :
- Prefix(i) will eventually reduce to X(i);
- The missing part of α(i-1) start with X(i);
- i.e for some β.
- Recursively, Prefix(k+1) … Prefix(n) eventually reduces to the missing part of α(k);
- Idea: To recognize viable prefixes, we must:
- Recognize a sequence of partial rhs’s of productions, where
- Each partial rhs can eventually reduce to part of the missing suffix of its predecessor.
08-03: Recognizing Viable Prefixes
- Add a dummy production to G;
- The NFA states are the items of G:
- Including the extra production.
- For item add transition ;
- For item and production add:
- Every state is an accepting state;
- Start state is .
08-04: Valid Items
- Item is valid for a viable prefix αβ if by a right-most derivation;
- After parsing αβ, the valid items are the possible tops of the stack of items;
- An item I is valid for a viable prefix α if the DFA recognizing viable prefixes terminates on input α in a state s containing I;
- The items in s describe what the top of item stack might be after reading input α.
08-05: SLR Parsing
- LR(0) Parsing: Assume:
- stack contains α;
- next input is t;
- DFA on input α terminates in state s.
- Reduce by if:
- s contains item
- Shift if:
- s contains item ;
- equivalent to saying s has a transition labeled t.
- LR(0) has a reduce/reduce conflict if:
- Any state has two reduce items:
- and
- LR(0) has a shift/reduce conflict if:
- Any state has a reduce item and a shift item:
- and
- SLR Idea:
- Reduce by if:
- s contains item
- SLR Parsing Algorithm:
- Let M be DFA for viable prefix of G;
- Let be initial configuration;
- Repeat until configuration is S|$:
- Let be current configuration;
- Run M on current stack α;
- if M rejects α, report parsing error:
- Stack α is not a viable prefix.
- if M accepts α with item I, let a be next input:
- Shift if ;
- Reduce if ;
- Report parsing error if neither applies.
- if there is a conflict in the last step, grammar is not SLR(k).
08-07: SLR Improvements
- Rerunning the viable prefix automaton on the stack at each step is wasteful;
- Remember the state of the automaton on each prefix of the stack;
- Change stack to contain pairs <Symbol, DFA state>;
- For a stack <sym(1), state(1)> … <sym(n), state(n)> state(n) is the final state of the DFA on sym(1) … sym(n);
- Detail: The bottom of the stack is <any, start> where:
- any is any dummy symbol;
- start is the start state of the DFA.
- Define goto[i, A] = j if ;
- Goto is just the transition function of the DFA:
- one of two parsing tables.
- For each state s(i) and terminal a:
- If s(i) has item and goto[i, a] = j then action[i, a] = shift j;
- If s(i) has item and and then ;
- If s(i) has item then action[i, $] = accept;
- Otherwise, action[i, a] = error.
- Algorithm:
Let I = w$ be initial input Let j = 0 Let DFA state 1 have item S' -> .S Let stack = <dummy, 1> repeat case action[top_state(stack), I[j]] of shift k: push <I[j++], k> reduce X -> A: pop |A| pairs, push <X, goto[top_state(stack), X]> accept: halt normally error: halt and report error
- Some common constructs are not SLR(1);
- LR(1) is more powerful:
- Build lookahead into the items;
- means:
- After seeing reduce if lookahead is $.
- More accurate than just using follow sets.