Lecture 9 (Parse Trees), 2/13/18
Concepts, definitions (in italics):
Example of CFG: simple arithmetic expressions with +, - and *
3 derivations of x - x + x (2 equivalent left+rightmost incorrect, 1 different correct)
Leftmost derivations, rightmost derivations
Parse Trees: internal nodes = variables, leaves = terminals, root = start variable; the fringe of the tree spells out the derived string
Example: 2 parse trees for arithmetic expression x – x + x
For every derivation, there is a corresponding parse tree
Carries the same information, except for the order of substitutions
Relation to lexing (numbers turn into NUM token)
Connection between parse trees and expression evaluation
Parsing: given string s and grammar G, constructing parse tree for s in G
This is bottom-up
Flattening a parse tree into a derivation
A derivation corresponds to a unique parse tree
A parse tree may correspond to multiple derivations, but there is exactly one rightmost, and one leftmost among them.
Applications of parse trees:
Natural Language Processing
A language L0 is context-free (CFL) iff there is some context-free grammar (CFG) G such that L0 = L(G)
Why are these grammars called context-free?
Two grammars are equivalent iff they generate the same language.
Ambiguous grammars – there are two different parse trees, or different leftmost derivations for the same string
The simple CFG for the language of arithmetic expressions is ambiguous; programming languages would use a different CFG for it which is not ambiguous.
Disambiguation: given an ambiguous CFG, finding an equivalent grammar that's not ambiguous
Example: Unambiguous grammar for arithmetic expressions with + and - (which only creates correct parse trees)
Inherently ambiguous CFL: all grammars for it are ambiguous
Example: an bn cm U an bm cm
If two grammars share variables, we can rename the variables for one or both of the grammars so the variables are disjoint.
Do 3 constructions
Given CFGs G1 and G2, we can construct a CFG G0 such that L(G0) = L(G1) o L(G2)
Given CFGs G1 and G2, we can construct a CFG G0 such that L(G0) = L(G1) U L(G2)
Given CFGs G1 and G2, we can construct a CFG G0 such that L(G0) = L(G1)*
Each of the constructions involves making the variables disjoint, combining together existing grammars, then creating a new start state and adding one or two new rules involving this state and the original start states of G1 and G2.
Theorem: The class of Regular Languages is a proper (or strict) subset of the class of Context Free Languages
Part 1 (every RL is CF): Proved by recursion (induction) that all regular languages are context free, using the above three constructions.
Part 2 (some CFLs are not regular): Proved by finding a CFG whose language is not regular (did in class last time)
Chapter 2.1 up to end of p. 108 (“chomsky normal form”)
Know new definitions, understand new concepts.
Given a CFG G and a derivation in G for some string w, construct a parse tree that corresponds to this derivation
Given a CFG G and a parse tree in G for some string w, construct a leftmost/rightmost deriv that corresponds to this parse tree
Given a string w and a CFG G, create a derivation/parse tree for w in G
Given a description of a language, show that it is context-free (i.e. create a CFG for it).
Show that a given grammar G is ambiguous (i.e. find a string in L(G) with more than one parse tree).
Given a RE, create an equivalent CFG (including homework problem)