Context Free, Canonical, Ambiguous

Canonical Form, Ambiguous

A word's derivation is in left canonical form if the intermediate steps are in sorted order, so that two rules that replace disjoint substrings are always applied in order, substituting the left most substring first. Right canonical derivations are possible as well.

The simplest example of a noncanonical derivation comes from the following grammar

s → uv
u → 0
v → 1

Start with s and apply the rules in the order given above. Since u is replaced first, then v, the derivation is left canonical, and not right canonical. If we replace v → 1, and then u → 0, the derivation becomes right canonical, and not left canonical.

A grammar is ambiguous if a word has two different left canonical derivations. A language is ambiguous if every grammar that defines that language is ambiguous.

In the above example there is but one left canonical derivation for the string 01, which happens to be the only word in the language. There is but one canonical derivation for every word in the language, the grammar is not ambiguous, and the language is not ambiguous.

Since each intermediate string produced by a regular grammar has precisely one nonterminal, all regular grammars are unambiguous, and all regular languages are unambiguous.

Many computer languages have brief moments of ambiguity. Consider the following code fragment.

if(this) if(that) foo; else bar;

A compiler uses the following two rules to "parse" this piece of code.

statement → if ( expression ) statement
statement → if ( expression ) statement else statement

So we have a choice; the else clause can be applied to the first if statement or the second. Either derivation is in left canonical form, so the grammar is ambiguous. In practice, compilers add special code to break the ambiguity, hence the fragment is interpreted this way.

if(this) { if(that) foo; else bar; }

But is the computer language ambiguous, or did we just select a bad grammar? Perhaps some other grammar would associate the else clause with the second if statement automatically. I believe the language itself is ambiguous, but I will not attempt to prove it here.

There are languages, somewhat contrived, that are known to be ambiguous. Unfortunately the proofs are usually long and tedious. If you would like to follow up on this, let me get you started. The language anbncmdm ∪ anbmcmdn is context free, and ambiguous. It's easy to create a context free grammar for this language; I'll leave that to you. The ambiguity arises from a word of the form anbncndn. (You may need to choose n larger than the number of rules in the grammar, or something like that.) Now the grammar, which claims to be unambiguous, must travel down one path or the other, producing words from the left side of the union or the right. Yet the grammar could take either path when m = n, giving two canonical derivations for this word.

More complex languages admit even more ambiguity. English has plenty to go around.

A nurse examined every patient, and Lucy too.

Lucy could be a doctor on staff, who also examined every patient, or Lucy could have received an exam from the nurse, even though she is not a patient at the present time. Unless you know Lucy, there is no way to resolve the ambiguity. But we usually do know Lucy, i.e. we understand the context of the conversation, and ambiguities are resolved in a flash, before we are even aware of them.