Languages, Regular Grammars, Machines, and Expressions

Regular Grammars, Machines, and Expressions

Finite state machines, regular grammars, and regular expressions all define the same languages, i.e. regular languages. To demonstrate equivalence, construct a regular grammar for each fsm, a regular expression for each regular grammar, and then an fsm for each regular expression. That completes the circle.

Given an fsm (assume it is deterministic), let's construct an equivalent grammar. Associate a nonterminal with each state, and the start symbol with the start state. Include x → E if the state corresponding to x is a success state. Include x → Ty if the input character T drives the fsm from state x into state y. Verify that this grammar generates a word iff the original fsm accepts the same word.

Given a grammar, combine the productions for each nonterminal to form a corresponding regular expression:

x → xB | xD | yC | zD | z | B | E

For now, the regular expressions so created contain terminals and nonterminals.

Using the distributive property, normalize these expressions so that each nonterminal appears only once on the right hand side.

x → x(B|D) | yC | z(D|E) | (B|E)

Next, eliminate direct recursion:

x → yC(B|D)* | z(D|E)(B|D)* | (B|E)(B|D)*

We could get in trouble with a rule like x → xC, but then, such a grammar is not grounded. The same words are generated if we ignore x and its pathological rule. So assume, hereinafter, that the grammar is grounded.

Now each nonterminal generates a regular expression consisting of terminals and other nonterminals.

Select any nonterminal, and replace it with its regular expression in every other regular expression that references it. Again, normalize using the distributive property, and eliminate recursion. As long as the grammar is grounded, all nonterminals can be eliminated. The expression corresponding to s is the language defined by the grammar. Thus regular grammars determine regular expressions.

Given a regular expression, construct a corresponding fsm using induction on the number of operators in the expression.

The expression E corresponds to a one-state fsm, whose start state is the final state.

The expression T, a singel terminal, corresponds to a two state machine, start on T goes to final.

If the expression is a repitition of another expression, delete the trailing '*', construct the fsm that determines the shorter expression, and add E transitions from its final states back to its start state.

When two expressions are concatenated, construct the two fsms using disjoint states, and add E transitions taking the final states of the first fsm to the start state of the second.

When an expression is the alternation of two shorter expressions, construct the two fsms using disjoint states, and add E transitions combining the start states and the final states of the two fsms.

Thus regular expressions determine fsms, and all three systems define the same languages, that is, regular languages.