Turing Machines, The Connection to Grammars

The Connection to Grammars

Languages are defined by machines that accept them, and by grammars that generate them.  Sometimes we can draw a connection between these two worlds.  State machines recognize the languages produced by type 3 (regular) grammars, and nondeterministic push down machines recognize the languages produced by type 2 (context free) grammars.  Take a moment to review the grammar types.

In this section we will complete the correspondence.  nondeterministic bounded turing machines accept the languages of type 1 grammars, and unconstrained turing machines accept the languages of type 0 grammars.

Given a grammar and a candidate word, apply all productions to the start symbol and see if the word emerges.  Failing this, consider all 2-step derivations, i.e. the application of two productions in sequence, and see if the word appears.  Failing this, consider all 3-step derivations, and so on.  This is a straightforward algorithm that is within the capabilities of a turing machine.  Therefore a tm can recognize the language generated by any grammar.

If the grammar is context sensitive (type 1), the productions never make the intermediate strings shorter.  If the candidate word w has length l, and there are k symbols in the grammar, there are at most k + k2 + k3 + … kl strings that can participate in the derivation of w.  Call this number n.  Each string can be assigned an index, the shortest derivation that will produce that string.  For instance, the start symbol has index 0, requiring no productions whatsoever.  No string can have index beyond n, because an intermediate string v is repeated somehwhere in the derivation, and we can excise the productions that carry v back to itself, giving a shorter index.  Thus, if w is in the language, it has an index bounded by n.  Apply all possible derivations of length n and see if this grammar generates w.  After a finite (albeit large) number of operations the tm halts, and indicates success or failure.  Thus the language is recursive.

The simplest implementation is nondeterministic.  Copy the input word, apply n productions, compare against the original, and halt (with success) on a match.  If the word is not generated, fail.  The counter used to go from 1 to n takes up space proportional to the input word.  The overhead is also linear, hence the tm that recognizes this language is bounded; in other words, an nbtm.  Note that a btm is not likely to work, because, if there are p productions, we have to cycle through all pn derivations, and that's a really big stack!

Before we map machines back to grammars, here's a quick grammar to generate a string of terminals, in this case the binary digits 0 and 1, then make a copy.  The symbol # separates the two copies.

s → ^q0w#$
w → w0 | w1 | c
c0 → 0cbd
c1 → 1cbe
c# → #h
d0 → 0d
d1 → 1d
e0 → 0e
e1 → 1e
d# → #d
e# → #e
d$ → f0$
e$ → f1$
0f → f0
1f → f1
#f → f#
cbf → c
h0 → 0h
h1 → 1h
h$ →

Given a tm, use the above grammar to generate two copies of an input word, separated by #, with a beginning marker ^ and the start state q0 at the left.  Include a production for each state transition, so q30 might become 1 q7 (moving right), and 0q51 might become q800 (moving left), and so on.  Additional productions on qi# allow the head to venture into blank territory.  If a final state qf is reached, the productions: qf → z, 0z | 1z | z0 | z1 → z, and ^z# → E, eat everything up to #, leaving only the input word as a string of terminals.  To get rid of # you have to eat the left side, hence you have to reach a final state, which you can only do by applying transitions to the input word.  The grammar generates the language that the tm accepts.

Next, build a context sensitive grammar for the language of an arbitrary btm.  This involves overloaded nonterminals, which hold terminals and state information.  For illustration, the terminals are once again 0 and 1.

Create a nonterminal for [00|01|10|11] cross every state in the btm, cross any other markers we may need.  If q0 is the start state, let s generate (^q000)w | (q011)w, and let w generate (00)w | (11)w | E.  (The items in parentheses represent single nonterminals.)  This produces two interleaved copies of the input word, with q0 at the left.  Transition productions are included as before, but they only modify the first binary digit in each nonterminal, leaving the second digit alone.  If the btm is nondeterministic, convert all possible state transitions to productions.  (Grammars are inherently nondeterministic.)  There are no qi# productions, which would enable the string to get longer.  A final state qf changes to y, which moves left, eating away the first digit and leaving only the second.  At the left edge, the nonterminal has a "beginning of string" marker built in.  This combines with y to produce z.  Now z moves right, eating the first binary digit and leaving the second.  When z encounters and "end of string" marker they vanish together, leaving a terminal string.  This is a context sensitive grammar, and it generates only those words that are accepted by the btm.

Each grammar type has now been associated with a machine that accepts the corresponding language.  The two characterizations, grammar or machine, are equivalent.

Grammar TypeLanguageMachine
Type 0RETuring Machine
Type 0.5RecursiveGeneralized Bounded Turing Machine
Type 1Context SensitiveNondeterministic Bounded Turing Machine
Type 2Context FreeNondeterministic Push Down Machine
Type 2.5???Push Down Machine
Type 3RegularFinite State Machine