Languages, Grammar Types

Grammar Types

Grammars are partitioned into four types, based on their complexity. A grammar is type 3, or regular, if every rule has one of the forms shown below. Here x and y are arbitrary nonterminals, T is an arbitrary terminal, and E is the empty string.

x → E | T | y | yT

Alternatively, we could replace the format x → yT with x → Ty, spinning terminals off to the left as we go. The language is essentially the same, except the words are reversed. Use whichever form is convenient, but use the same form throughout a given grammar.
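The reversal can be checked on a small example. The following is a minimal sketch, not part of the original text: the helper names `words` and `mirror`, the dictionary representation of a grammar, and the sample grammar s → a | sb are all assumptions chosen for illustration.

```python
from collections import deque

def words(grammar, start, max_len):
    """Return every word with at most max_len terminals derivable from start.

    grammar maps each nonterminal to a list of right sides (tuples of symbols).
    Any symbol that is not a key of grammar is treated as a terminal.
    Assumes a regular grammar, so a string holds at most one nonterminal.
    """
    found, seen = set(), {(start,)}
    queue = deque([(start,)])
    while queue:
        form = queue.popleft()
        # Locate the nonterminal, if one remains.
        idx = next((i for i, s in enumerate(form) if s in grammar), None)
        if idx is None:
            found.add("".join(form))
            continue
        for rhs in grammar[form[idx]]:
            new = form[:idx] + tuple(rhs) + form[idx + 1:]
            terminals = sum(1 for s in new if s not in grammar)
            if terminals <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return found

def mirror(grammar):
    """Reverse every right side, turning x -> yT into x -> Ty."""
    return {nt: [tuple(reversed(rhs)) for rhs in rules]
            for nt, rules in grammar.items()}

# Hypothetical example grammar: s -> a | sb
left = {"s": [("a",), ("s", "b")]}
original = words(left, "s", 3)
flipped = words(mirror(left), "s", 3)
print(sorted(original), sorted(flipped))
```

In this example each word produced by the mirrored grammar is the reverse of a word produced by the original, which is what the text above claims.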

A grammar is type 2, or context free, if the left side of each rule is a single nonterminal.

A grammar is type 1, or context sensitive, if the right side of each rule is at least as long as the left side. Intermediate strings never get shorter as you proceed through the derivation. The production s → E is tolerated, if the language includes the empty word.

Type 0 grammars are not constrained in any way.
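The four definitions can be applied mechanically to a list of rules. The sketch below is an illustration under stated assumptions: the function name `grammar_type`, the encoding of a rule as a (left, right) pair of tuples, and the choice to accept only the x → yT form of regular rule are mine, not the text's. It reports the most restrictive format that the rules satisfy as written; it says nothing about whether some other grammar for the same language has a higher type.

```python
def grammar_type(rules, nonterminals, start="s"):
    """Return 3, 2, 1, or 0: the most restrictive format all rules satisfy.

    rules is a list of (lhs, rhs) pairs, each side a tuple of symbols.
    nonterminals is the set of nonterminal symbols; start is the start symbol.
    """
    def single_nonterminal(lhs):
        return len(lhs) == 1 and lhs[0] in nonterminals

    def is_regular(lhs, rhs):
        # Allowed right sides: E, T, y, yT (this sketch uses the yT form only).
        kinds = [sym in nonterminals for sym in rhs]
        return single_nonterminal(lhs) and kinds in ([], [False], [True], [True, False])

    def is_context_free(lhs, rhs):
        return single_nonterminal(lhs)

    def is_context_sensitive(lhs, rhs):
        # Right side at least as long as the left, with s -> E tolerated.
        return len(rhs) >= len(lhs) or (lhs == (start,) and rhs == ())

    for level, test in ((3, is_regular), (2, is_context_free), (1, is_context_sensitive)):
        if all(test(lhs, rhs) for lhs, rhs in rules):
            return level
    return 0

# Hypothetical example grammars over the nonterminals s and x.
nts = {"s", "x"}
regular = [(("s",), ("s", "b")), (("s",), ("a",))]
context_free = [(("s",), ("a", "s", "b")), (("s",), ())]
context_sensitive = [(("s",), ("a", "s", "b")), (("a", "s"), ("s", "a"))]
unrestricted = [(("s", "x"), ("a",))]
print([grammar_type(g, nts) for g in (regular, context_free, context_sensitive, unrestricted)])
```

Checking the formats from most to least restrictive means the first test that passes for every rule determines the answer.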

By definition, a type x language can be defined by a type x grammar, but not a type x+1 grammar. We will eventually show that type x languages exist for x = 3, 2, 1, and 0. This hierarchy really does represent increasing levels of complexity.

Finite Languages are Regular

When a language consists of a finite set of words, a regular grammar defines that language.

In this construction, word_i,j is the j^th letter of the i^th word. The variables x_i,j are nonterminals that support the grammar. (In both cases, we start numbering at 0.) For each word_i in the language, construct a rule:

s → x_i,0

Thus the first step in the derivation selects the word. Subsequent steps spell out the word, as we shall see. Add the following rules, for each letter in each word.

x_i,j → word_i,j x_i,j+1
x_i,j → word_i,j (for the last letter in the i^th word)

The rule s → E is added if the language includes the empty word.
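The construction above can be sketched in a few lines. This is an illustration rather than a definitive implementation: the names `finite_language_grammar` and `derive_all`, the encoding of nonterminals as tuples (so they cannot collide with terminal letters), and the sample word list are assumptions made for the example.

```python
START = ("s",)  # the start symbol s, encoded as a tuple

def finite_language_grammar(words):
    """Build the regular grammar described above for a finite list of words.

    Nonterminal x_i,j is encoded as ("x", i, j); terminals are single letters.
    Each rule is a (lhs, rhs) pair, where rhs is a tuple of symbols.
    """
    rules = []
    for i, word in enumerate(words):
        if word == "":
            rules.append((START, ()))            # s -> E
            continue
        rules.append((START, (("x", i, 0),)))    # s -> x_i,0
        for j, letter in enumerate(word):
            if j == len(word) - 1:
                rules.append((("x", i, j), (letter,)))                   # last letter
            else:
                rules.append((("x", i, j), (letter, ("x", i, j + 1))))   # letter, then next
    return rules

def derive_all(rules, start=START):
    """Follow every derivation from start and return the set of words produced."""
    by_lhs = {}
    for lhs, rhs in rules:
        by_lhs.setdefault(lhs, []).append(rhs)
    results = set()
    stack = [("", start)]  # (letters emitted so far, current nonterminal)
    while stack:
        prefix, nonterminal = stack.pop()
        for rhs in by_lhs[nonterminal]:
            if rhs and isinstance(rhs[-1], tuple):
                # Right side ends in a nonterminal: emit any letter and continue.
                stack.append((prefix + "".join(rhs[:-1]), rhs[-1]))
            else:
                results.add(prefix + "".join(rhs))
    return results

# Hypothetical sample language, including the empty word.
language = ["cat", "car", "", "s"]
grammar = finite_language_grammar(language)
print(derive_all(grammar) == set(language))
```

Every rule generated here has the form x → y, x → T, x → Ty, or s → E, so the grammar is regular, and the number of rules grows with the total number of letters in the language.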

This is a theoretical result with little practical value, since natural (English) and artificial (Fortran) languages, although finite, are much too large for exhaustive enumeration. Besides, a simple list of "legal" strings ignores the structure of the language, which is necessary for comprehension, whether you are a compiler or a human brain.