Derivations and similarity
Let G be a CFG. A string w є L(G) may have many derivations,
corresponding to how we choose the rules to apply and how we choose
which symbol to expand.
- At a given step we may be able to expand two or more non-terminal symbols.
The order in which we expand the non-terminal symbols will determine the derivation,
i.e. different orders will result in different derivations, even if we choose same rules.
- At a given step for a given non-terminal symbol there may be several rules to choose.
Choosing different rules will result in different derivations.
Some derivations are inherently different,
others are not. To distinguish between these two cases,
we will define a precedence relation on the set of the derivations
of a given string, and similarity between derivations.
Definition: Let D and D’ be two derivations of some string in L(G)
D = x1 => x2 => … => xn
D’ = x’1 => x’2 => … => x’n
where xi, x’i є V* for i = 1, 2, …, n
x1, x’1 є V - ∑, xn, x’n є ∑*
We say that D precedes D’, written as D < D’, if N > 2 and there is an integer 1 < k < n,
such that:
a) xi = x’i for i ¹ k
b) xk-1 = x’k-1 = uAvBw, where u, v, w є V*, A, B є V - ∑
c) xk = uyvBw, where A → y є R
d) x’k = uAvzw, where B → z є R
e) xk+1 = x’k+1 = uyvzw
The difference is in the order of expanding non-terminal symbols.
Definition: two derivations are similar if one of them precedes the other.
The similarity relation is reflexive, symmetric and transitive, therefore it defines classes of equivalence.
Each derivation can be depicted using a derivation tree , also called a parse tree.
Example:
Consider the string (( )( )) and the grammar with rules given below:
Rule1: S → SS
Rule2: S → (S)
Rule3: S → ( )
One possible derivation is:
S => (S) by Rule1
=> (SS) by Rule2
=> (( )S) by Rule3
=> (( )( )) by Rule3
The process of derivation is depicted in the following parse tree
Each non-terminal symbol is expanded by applying a grammar rule that contains the symbol in its left- hand side. Its children are the symbols in the right-hand side of the rule.
Note: The order of applying the rules depends on the symbols to be expanded. At each tree level we may apply several different rules corresponding to the nodes to be expanded.
Derivations within the same class of similarity yield the same parse tree.
Within a given class, we have a derivation that is not preceded by any other derivation.
This is the left-most derivation.
There is a derivation that precedes all other derivations in a given class.
This is the right-most derivation.
Thus we can use either left-most or right-most derivation when building the parse tree.
Parse Trees
A parse tree for a string in L(G) is a tree where
- the root is the start symbol for G
- the interior nodes are the nonterminals of G
- the leaf nodes are the terminal symbols of G.
- the children of a node T (from left to right) correspond to the symbols on the
right hand side of some production for T in G.
Every terminal string generated by a grammar has a corresponding parse tree;
every valid parse tree represents a string generated by the grammar
(called the yield of the parse tree).
Example: Given the grammar G = (V, ∑, R, E), where
V = { E, D, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, +, -, *, /, (, ) },
∑ = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, +, -, *, /, (, ) },
and R contains the following rules:
1. E → D
2. E → ( E )
3. E → E + E
4. E → E - E
5. E → E * E
6. E → E / E
7. D → 0 | 1 | 2 | ... 9
find a parse tree for the string 1 + 2 * 3:
The parse tree is:
E
/ | \
/ | \
/ | \
/ | \
/ | \
E | \
/|\ | \
/ | \ | \
E | E | E
| | | | |
D | D | D
| | | | |
1 + 2 * 3
(from
http://www.cs.rochester.edu/u/nelson/courses/csc_173/grammars/parsetrees.html)
Ambiguity
Definition: A grammar G is ambiguous if there is a string є L(G)
with two different left-most (right-most) derivations.
Thus, if the grammar yields more than one parse tree for a given string,
the grammar is called ambiguous, and the corresponding language - ambiguous language.
Example:
Let S denote a statement, Exp - an expression.
Consider the following rules
for if statements (the words if, then, else are terminal symbols):
Rule1: If_statement ® if Exp then S else S
Rule2: If_statement ® if Exp then S
Consider now the statement
if a < b then if c < y then write(yes) else write(no);
Following the grammar rules above, there are two possible interpretations:
If a < b then
if c < y then write(yes)
else write(no);
If a < b then if c < y then write(yes)
else write(no);
Here is the parse tree for the first interpretation:
Here is the parse tree for the second interpretation:
The problem can be solved semantically by having a rule that states the
ELSE belongs to the last preceding IF for which there is not already
an associated ELSE.
We introduce a new non-terminal symbol S_withElse to stand for the “then”
part of an “if” statement that has an “else” part:
Rule1: IF_statement ® if Exp then S_withElse else S
Rule2: IF_statement ® if Exp then S
Rule3: S_withElse ® if Exp then S_withElse else S_withElse
Rule4: S_withElse ® S
The resulting grammar is unambiguous.