Turing Machines, The Connection to Grammars

The Connection to Grammars

Languages are defined by machines that accept them, and by grammars that generate them. Sometimes we can draw a connection between these two worlds. State machines recognize the languages produced by type 3 (regular) grammars, and nondeterministic push down machines recognize the languages produced by type 2 (context free) grammars. Take a moment to review the grammar types.

In this section we will complete the correspondence. nondeterministic bounded turing machines accept the languages of type 1 grammars, and unconstrained turing machines accept the languages of type 0 grammars.

Given a grammar and a candidate word, apply all productions to the start symbol and see if the word emerges. Failing this, consider all 2-step derivations, i.e. the application of two productions in sequence, and see if the word appears. Failing this, consider all 3-step derivations, and so on. This is a straightforward algorithm that is within the capabilities of a turing machine. Therefore a tm can recognize the language generated by any grammar.

If the grammar is context sensitive (type 1), the productions never make the intermediate strings shorter. If the candidate word w has length l, and there are k symbols in the grammar, there are at most k + k² + k³ + … k^l strings that can participate in the derivation of w. Call this number n. Each string can be assigned an index, the shortest derivation that will produce that string. For instance, the start symbol has index 0, requiring no productions whatsoever. No string can have index beyond n, because an intermediate string v is repeated somehwhere in the derivation, and we can excise the productions that carry v back to itself, giving a shorter index. Thus, if w is in the language, it has an index bounded by n. Apply all possible derivations of length n and see if this grammar generates w. After a finite (albeit large) number of operations the tm halts, and indicates success or failure. Thus the language is recursive.

The simplest implementation is nondeterministic. Copy the input word, apply n productions, compare against the original, and halt (with success) on a match. If the word is not generated, fail. The counter used to go from 1 to n takes up space proportional to the input word. The overhead is also linear, hence the tm that recognizes this language is bounded; in other words, an nbtm. Note that a btm is not likely to work, because, if there are p productions, we have to cycle through all pⁿ derivations, and that's a really big stack!

Before we map machines back to grammars, here's a quick grammar to generate a string of terminals, in this case the binary digits 0 and 1, then make a copy. The symbol # separates the two copies.

s → ^q₀w#$
w → w0 | w1 | c
c0 → 0cbd
c1 → 1cbe
c# → #h
d0 → 0d
d1 → 1d
e0 → 0e
e1 → 1e
d# → #d
e# → #e
d$ → f0$
e$ → f1$
0f → f0
1f → f1
#f → f#
cbf → c
h0 → 0h
h1 → 1h
h$ →

Given a tm, use the above grammar to generate two copies of an input word, separated by #, with a beginning marker ^ and the start state q₀ at the left. Include a production for each state transition, so q₃0 might become 1 q₇ (moving right), and 0q₅1 might become q₈00 (moving left), and so on. Additional productions on q_i# allow the head to venture into blank territory. If a final state q_f is reached, the productions: q_f → z, 0z | 1z | z0 | z1 → z, and ^z# → E, eat everything up to #, leaving only the input word as a string of terminals. To get rid of # you have to eat the left side, hence you have to reach a final state, which you can only do by applying transitions to the input word. The grammar generates the language that the tm accepts.

Next, build a context sensitive grammar for the language of an arbitrary btm. This involves overloaded nonterminals, which hold terminals and state information. For illustration, the terminals are once again 0 and 1.

Create a nonterminal for [00|01|10|11] cross every state in the btm, cross any other markers we may need. If q₀ is the start state, let s generate (^q₀00)w | (q₀11)w, and let w generate (00)w | (11)w | E. (The items in parentheses represent single nonterminals.) This produces two interleaved copies of the input word, with q₀ at the left. Transition productions are included as before, but they only modify the first binary digit in each nonterminal, leaving the second digit alone. If the btm is nondeterministic, convert all possible state transitions to productions. (Grammars are inherently nondeterministic.) There are no q_i# productions, which would enable the string to get longer. A final state q_f changes to y, which moves left, eating away the first digit and leaving only the second. At the left edge, the nonterminal has a "beginning of string" marker built in. This combines with y to produce z. Now z moves right, eating the first binary digit and leaving the second. When z encounters and "end of string" marker they vanish together, leaving a terminal string. This is a context sensitive grammar, and it generates only those words that are accepted by the btm.

Each grammar type has now been associated with a machine that accepts the corresponding language. The two characterizations, grammar or machine, are equivalent.

Grammar Type	Language	Machine
Type 0	RE	Turing Machine
Type 0.5	Recursive	Generalized Bounded Turing Machine
Type 1	Context Sensitive	Nondeterministic Bounded Turing Machine
Type 2	Context Free	Nondeterministic Push Down Machine
Type 2.5	???	Push Down Machine
Type 3	Regular	Finite State Machine