Generating Functions, Context Free Grammars

Context Free Grammars

Given a context free grammar, one can, theoretically, derive a generating function that describes the number of derivations leading to words of length n. However, finding the generating function, and its coefficients, can be challenging, even for modest grammars. Furthermore, if the grammar is ambiguous, multippple derivations will lead to the same word. We really want an unambiguous grammar, so the generating function describes the number of words of length n in our language.

Associate a generating function with each nonterminal, and an equation with each rule. If q is one of the nonterminals, the function q(x) determines the number of words of length n that are generated by q. If s is the start symbol, s(x) provides the number of words of length n in the language. This is what we are looking for.

Let's run through an example. Terminals are capitalized as usual, and E is the empty string.

s → E | p s
p → A s B

The first production, s → E, allows for 1 derivation, giving a word of length 0. This is expressed by the sequence a0 = 1, or s(x) = 1.

The second production, s → ps, produces the equation s(x) = p(x)s(x). Take a moment to verify this. If a word has length n, p could produce a word of length k, and s must produce a word of length n-k. We are adding up the words of length k times the words of length n-k, as k runs from 0 to n, and this is precisely the nth coefficient on p(x)s(x).

By induction, the functions associated with terminals and nonterminals can be multiplied together to build the terms of our equations.

Put this all together to write the following equations for s and p.

s = 1 + ps

p = x2s

Substitute the second equation into the first.

s = x2s2 + 1

Strings can never have odd length, so replace x2 with y, and realize that the nth coefficient now describes strings of length 2n.

s = ys2 + 1

ys2 -s +1 = 0

s = ( 1 ± sqrt(1-4y) ) / 2y

The last step is an application of the quadratic formula.

Ok, we have a function, but what are the taylor coefficients? The problem is the square root - how do we expand sqrt(1-4y)?

Use the continuous binomial theorem to perform this expansion. In fact, sqrt(c+x) has already been worked out in detail. Replace c with 1 and x with -4y and get -(2n:n)/(2n-1).

Subtract this from 1 and divide by 2y. Remember that dividing by y shifts the series, so evaluating the formula at n actually produces an-1. This shift slides the 1 in our formula off the left end, into a-1, so we can forget about it. We only need evaluate -sqrt(1-4y)/2y. Plug n+1 into the formula, and an looks like this.

an = (2n+2:n+1) / 2×(2n+1)

an = (2n+2)×(2n+1)×(2n:n)/(n+1)2 / 2×(2n+1)

an = (2n+2)×(2n:n)/(n+1)2 / 2

an = (n+1)×(2n:n) / (n+1)2

an = (2n:n) / (n+1)

The first few terms, starting at a0, are as follows.

1, 1, 2, 5, 14, 42, 132, 429, 1430, 4862, 16796, 58786, 208012, …

Parentheses

The above grammar is more than just a contrived example. Replace the terminals A and B with ( and ) respectively, and the grammar cranks out strings of properly nested parentheses. Examples include (), ((())), (()(()))(), and so on. Furthermore, the grammar is unambiguous. You can prove this by induction on the length of the string. Thus each string has only one derivation, and the formula for an gives the number of strings of n pairs of parentheses, properly nested.