Languages, Pumping Lemma

Pumping Lemma

If a language is regular, there is an n corresponding to the number of states in the smallest fsm that accepts that language. If a word in the language is longer than n, then a state must repeat as that word is being analyzed. Write the word as uvw, where v begins and ends with the same state. Thus uvvw is also in the language. In general, uviw are all in the language. The substring v is "pumped" repeatedly, hence this is called the pumping lemma.

This acts as a tool for proving languages are, or are not, regular. For instance, the language 0(i2) is not regular, because for any n, we may consider 0(n2). Take any substring v of length n or less. Replace v with vv and the resulting string has length strictly between n2 and (n+1)2. This is not a word in the language. Thus the language, though well defined, is not regular.

The set of primes in base b is also irregular. (think of b as 10, if that makes it easier.) Assume there are n states in the fsm, and choose some prime p > bn. Locate the substring v within p and replace it with p copies of itself. Either u or w must be nonempty, or the p copies of v form a composite number, divisible by v. (If v is 0 and u and w are empty, then uvw = 0, which is not prime. We could have v = 0, or even 0000, but again, u or w is nonempty.) So go ahead, pump away.

Let q be uvpw, which represents a prime number via the pumping lemma. If x is a substring of q, let m(x) be the value of x, in its position. Don't move x, pad it with zeros on either side, and take its value. Thus m(37) in 1374 = 370. Now consider m(q) mod p. Since w is smaller than p, m(w) = w. After pumping, m(u) is multiplied by b to the p-1 power. Yet this is 1 by Fermat's little theorem. Thus m(u) is unchanged. What happens to m(v) when v is replicated p times? If there are 4 digits in v, m(v) is multiplied by a number that looks like 1000100010001…10001, with p ones present. A similar result holds when v has length k; there are k-1 zeros between each pair of ones. This number is bk, raised to the p, minus 1, divided by bk-1. Again, a number to the p power is unchanged mod p, so the numerator equals the denominator, and the entire quotient becomes 1. Hence m(v) is unchanged, even after we replicate v p times. Put this all together and m(p) = m(q). The new prime is larger than p, and divisible by p, which is a contradiction.

What about base 1, the language of 0p for all primes p? Given a value n, n composite numbers follow (n+1)!+1, so the pumping lemma fails here too.