Hello, dear friend, you can consult us at any time if you have any questions, add WeChat: THEend8_
SCC 312 Compilation
A Reminder
• LL(1) :
• Left-to-right token processing
• Leftmost derivation
• Top-down parsing
• One lookahead token
Aims
• What is a recursive descent parser?
• How does it work? Why does it work?
• How to process a non-terminal
• Illustrated by a small worked example
• What kind of error reporting and recovery is possible?
Recursive Descent and LL(1)
Grammars
•We start with a top-down strategy for writing
parsers called recursive descent
•Also known as predictive recursive descent
•It is useful particularly for hand-generation of
a compiler from a grammar
Recursive Descent Parsers
• The parser is going to consist of a collection of
methods
• One for each non-terminal of the grammar
• Named after that non-terminal
• With its method body derived semi-automatically
from the grammar rule for that non-terminal
Motivation
• Consider the method
• It is called when the next token is the word “if”
• We are at the start of a sequence of tokens which
represents an if statement
• We want our method to find its way to the end of this
sequence, ready to look at the first token of whatever
follows the if statement
• ::= if then fi
Motivation
• So we want it to find its way over
• The token “if”
• The tokens representing
• The token “then”
• The tokens representing
• Etc.
• ::= if then
fi
Motivation
• This sequence is precisely specified by the
statement> grammar rule
• ::= if then fi
• To find our way over the tokens of ,
we must call the method corresponding to
, etc.
General Overview of the Parser
• The method for a particular non-terminal X is called when
the parser has decided that it wants to recognise an X
starting at this point in the input stream
• The method will “consume” the tokens making up the X, and
leave the parser ready to process the first terminal of the
next non-terminal in the input stream
• As a by-product, the method will also build the
appropriate piece of the parse tree, and whatever else is
required for X
General Overview of the Parser
• The parser has to know at each point what non-terminal it
wants to recognise
• It does this by being allowed to look at the next token in the
input stream
• That is, the first token of the non-terminal it is deciding to
recognise
General Overview of the Parser
• The parser needs to know all the possible first terminals or
tokens for each non-terminal
• To make this work, we see that there are restrictions on how
these sets of terminals overlap
• And these restrictions are the requirements for a grammar
to be LL(1)
Structure of the Parser
• We have a set of methods to recognise the various elements
of the grammar, one for each non-terminal
• There is a variable nextSymbol, which contains the next
token recognised by the lexical analysis phrase
Structure of the Parser
• The method corresponding to non-terminal X expects to find
in nextSymbol one of the tokens listed in FIRST(X)
• It expects to finish by leaving in nextSymbol one of the
tokens which is in FIRST(Y) for some Y which can appear
immediately after X
• That is, it is a token in the set FOLLOW(X), also known as
the Lookahead or Follow Set
The acceptTerminal Method
• We have a method
acceptTerminal (t):
if ( nextSymbol == t )
get next token from lexical
analyser into nextSymbol ;
else
report error ;
't' is the token we
now expect
Non-terminal
• We have one method for each non-terminal
• Suppose we have non-terminal X, and its grammar rule is
::= a …
• … then we have a method named X with body T(a).
• What could a be? In other words, what can appear on
the RHS of a production rule?
• There are four possibilities, shown next.
• ::= a
• a method X with body T(a).
void X ( … )
{
….
….
….
}
T(a)
Possibilities for a
Possibility General Example
1. A single
terminal
::= t ::= return
2. A single non-
terminal
::= ::=
3. A sequence of
terminals and
non-terminals
::= a1 a2 .. aN ::= if
then …
4. A set of
alternatives
::= a1 | a2 | .. | aN ::= |
| …
::= a First Possibility
• A single terminal
• ::= t
• ::= return
• If a is a single terminal t, then T(a) is
• acceptTerminal (t) ;
• if a is “return”, T(a) would be “acceptTerminal (returnSymbol) ;”
::= a Second possibility
• A single non-terminal
• ::=
• ::=
• If a is a single non-terminal NT, then T(a) is
• NT() ;
• That is, a call of the method associated with non-terminal Y
• if a is “ ”
T(a) would be “declarationList () ; ”
::= a Third Possibility
• A sequence of terminals and non-terminals
• ::= a1 a2 .. aN
• If a is a sequence of terminals and non-terminals a1 a2 ... an, then
T(a) is the sequence:
T(a1) ;
T(a2) ;
...
T(an) ;
::= a Third Possibility: example
• ::= if then …
• If a is “if then … ”,
T(a) would be
acceptTerminal (ifSymbol) ;
expression() ;
acceptTerminal (thenSymbol) ;
statement() ;
…
::= a Fourth Possibility
• A set of alternatives
• ::= a1 | a2 | .. | aN
• If a is a set of alternatives a1 | a2 | ... | an, then T(a) is
switch (nextSymbol)
{
case FIRST(a1) :
T(a1) ; break ;
case FIRST(a2) :
T(a2) ; break ;
...
case FIRST(an) :
T(an) ; break ;
} // end of switch
::= a Fourth Possibility
• ::= | | …
• If a is “ | | …”, then T(a) is
switch (nextSymbol)
{
case ifSymbol :
ifStatement() ; break ;
case whileSymbol :
whileStatement() ; break ;
...
} // end of switch
Dealing with Extended BNF
• Some versions of BNF are extended, so that {x} means zero or more
repetitions of x
• Then this would be transformed to:
• while (nextSymbol is in FIRST(x))
{
T(x) ;
} // end of while
• For an example, see “expression” and “term” later.
Dealing with Extended BNF
• Similarly [x] means zero or one occurrence of x in some extensions of
BNF
• Then this would be transformed to:
• if (nextSymbol is in FIRST(x))
{
T(x) ;
} // end of if
The main Method
• The main method, to get everything started, has the body:
• get first token from lexical analyser into nextSymbol ;
() ; (or whatever the distinguished symbol is)
acceptTerminal (eofSymbol) ;
report success ;
• We need to ensure there are no extraneous characters after the valid
string