Generating Functions, Context Free Grammars

Context Free Grammars

Given a context free grammar, one can, theoretically, derive a generating function that describes the number of derivations leading to words of length n. However, finding the generating function, and its coefficients, can be challenging, even for modest grammars. Furthermore, if the grammar is ambiguous, multippple derivations will lead to the same word. We really want an unambiguous grammar, so the generating function describes the number of words of length n in our language.

Associate a generating function with each nonterminal, and an equation with each rule. If q is one of the nonterminals, the function q(x) determines the number of words of length n that are generated by q. If s is the start symbol, s(x) provides the number of words of length n in the language. This is what we are looking for.

Let's run through an example. Terminals are capitalized as usual, and E is the empty string.

s → E | p s
p → A s B

The first production, s → E, allows for 1 derivation, giving a word of length 0. This is expressed by the sequence a₀ = 1, or s(x) = 1.

The second production, s → ps, produces the equation s(x) = p(x)s(x). Take a moment to verify this. If a word has length n, p could produce a word of length k, and s must produce a word of length n-k. We are adding up the words of length k times the words of length n-k, as k runs from 0 to n, and this is precisely the n^th coefficient on p(x)s(x).

By induction, the functions associated with terminals and nonterminals can be multiplied together to build the terms of our equations.

Put this all together to write the following equations for s and p.

s = 1 + ps

p = x²s

Substitute the second equation into the first.

s = x²s² + 1

Strings can never have odd length, so replace x² with y, and realize that the n^th coefficient now describes strings of length 2n.

s = ys² + 1

ys² -s +1 = 0

s = ( 1 ± sqrt(1-4y) ) / 2y

The last step is an application of the quadratic formula.

Ok, we have a function, but what are the taylor coefficients? The problem is the square root - how do we expand sqrt(1-4y)?

Use the continuous binomial theorem to perform this expansion. In fact, sqrt(c+x) has already been worked out in detail. Replace c with 1 and x with -4y and get -(2n:n)/(2n-1).

Subtract this from 1 and divide by 2y. Remember that dividing by y shifts the series, so evaluating the formula at n actually produces a_n-1. This shift slides the 1 in our formula off the left end, into a_-1, so we can forget about it. We only need evaluate -sqrt(1-4y)/2y. Plug n+1 into the formula, and a_n looks like this.

a_n = (2n+2:n+1) / 2×(2n+1)

a_n = (2n+2)×(2n+1)×(2n:n)/(n+1)² / 2×(2n+1)

a_n = (2n+2)×(2n:n)/(n+1)² / 2

a_n = (n+1)×(2n:n) / (n+1)²

a_n = (2n:n) / (n+1)

The first few terms, starting at a₀, are as follows.

1, 1, 2, 5, 14, 42, 132, 429, 1430, 4862, 16796, 58786, 208012, …

Parentheses

The above grammar is more than just a contrived example. Replace the terminals A and B with ( and ) respectively, and the grammar cranks out strings of properly nested parentheses. Examples include (), ((())), (()(()))(), and so on. Furthermore, the grammar is unambiguous. You can prove this by induction on the length of the string. Thus each string has only one derivation, and the formula for a_n gives the number of strings of n pairs of parentheses, properly nested.