Context Free, Pumping Lemma

Pumping Lemma

Like regular languages, context free languages have pumping properties.

Given a context free language, convert its grammar to chomsky normal form. Let this grammar contain k nonterminals, including the start symbol.

Select a word in the language of length > n, where n = 2k, and find its derivation. Suppose a nonterminal never expands to form a string containing that nonterminal. Once s is expanded, we never see s again. If s becomes xy, the expansion of x never produces y, and the expansion of y never produces x. If s leads to xx, then x is never seen again. (Remember, each nonterminal becomes a terminal or two nonterminals.)

The depth of this derivation tree cannot exceed k+1, else a nonterminal is repeated. (The last step at level k+1 could turn nonterminals into terminals.) There are 2k possible paths, without repeating a nonterminal. The resulting word has length at most 2k, or n. Since our word is longer, we have reached a contradiction. There are nonterminals that repeat.

Start from the bottom and find a repeated instance of a nonterminal z. The final instance of z expands to a string that I will call w. The penultimate instance of z generates a string vwx. The word is now uvwxy. Expand the final instance of z as we did its predecessor, and the word becomes uvvwxxy. This can continue indefinitely, producing the pumping lemma.

Working backwards through the derivation, z is the first repeated nonterminal, hence the path from z to z has length at most k. Thus vwx has length at most n.

For any context free language, there is an integer n, such that every word longer than n can be written uvwxy, where vwx has length at most n, v or x is nontrivial, and uvjwxjy is in the language. Furthermore, a pumpable vwx can be drawn from any substring of length greater than n.

Examples

consider the strings of 0's whose lengths are perfect squares. Select n as above, and adjacent squares will eventually differ by more than n. We cannot copy one or two substrings, whose lengths add up to n or less, and find another word in the language. This is not a context free language.

The language 0n1n2n is also not context free, because we would be forced to pump 0's without 2's or 2's without 0's. Here is a grammar for the language, proving there are context sensitive languages that are not context free.

s | E | 0u12
u1 | 1 | 011v
v1 | 1v
v2 | w22
1w | w1
0w | 0u

Ogden's Lemma

This is a variation on the pumping lemma, described above. Recall that the grammar has k nonterminals, and a word longer than n, where n = 2k, can be pumped to create more words in the language. In this theorem we will designate n of the letters in our word as "distinguished". At least one of these letters sits between pumpable regions.

Start with s, which produces all n distinguished terminals. If the first step is s → xy, select either x or y, whichever leads to more distinguished terminals. Repeat this process k times and find a path of k+1 nonterminals, whence some nonterminal, call it z, repeats. Find a new word in the language by expanding the second instance of z as the first was expanded. In other words, uvwxy implies uvvwxxy, and at least one of our distinguished letters is contained in w, below z.

Since z is near the top of the derivation, rather than the bottom, it may generate a verry long substring. There is no bound on the length of vwx, as there was in the pumping lemma.