Given a context free language, convert its grammar to chomsky normal form. Let this grammar contain k nonterminals, including the start symbol.
Select a word in the language of length > n, where n = 2k, and find its derivation. Suppose a nonterminal never expands to form a string containing that nonterminal. Once s is expanded, we never see s again. If s becomes xy, the expansion of x never produces y, and the expansion of y never produces x. If s leads to xx, then x is never seen again. (Remember, each nonterminal becomes a terminal or two nonterminals.)
The depth of this derivation tree cannot exceed k+1, else a nonterminal is repeated. (The last step at level k+1 could turn nonterminals into terminals.) There are 2k possible paths, without repeating a nonterminal. The resulting word has length at most 2k, or n. Since our word is longer, we have reached a contradiction. There are nonterminals that repeat.
Start from the bottom and find a repeated instance of a nonterminal z. The final instance of z expands to a string that I will call w. The penultimate instance of z generates a string vwx. The word is now uvwxy. Expand the final instance of z as we did its predecessor, and the word becomes uvvwxxy. This can continue indefinitely, producing the pumping lemma.
Working backwards through the derivation, z is the first repeated nonterminal, hence the path from z to z has length at most k. Thus vwx has length at most n.
For any context free language, there is an integer n, such that every word longer than n can be written uvwxy, where vwx has length at most n, v or x is nontrivial, and uvjwxjy is in the language. Furthermore, a pumpable vwx can be drawn from any substring of length greater than n.
The language 0n1n2n is also not context free, because we would be forced to pump 0's without 2's or 2's without 0's. Here is a grammar for the language, proving there are context sensitive languages that are not context free.
s | E | 0u12
u1 | 1 | 011v
v1 | 1v
v2 | w22
1w | w1
0w | 0u
Start with s, which produces all n distinguished terminals. If the first step is s → xy, select either x or y, whichever leads to more distinguished terminals. Repeat this process k times and find a path of k+1 nonterminals, whence some nonterminal, call it z, repeats. Find a new word in the language by expanding the second instance of z as the first was expanded. In other words, uvwxy implies uvvwxxy, and at least one of our distinguished letters is contained in w, below z.
Since z is near the top of the derivation, rather than the bottom, it may generate a verry long substring. There is no bound on the length of vwx, as there was in the pumping lemma.