Languages, An Introduction

Introduction

You may think of a language as English or French, or perhaps perl or java, but there is a formal definition that is much more general. It encompasses these languages, and other, abstract languages such as the prime numbers, or the valid proofs of the 4 color theorem.

Start with a finite set, which is called the alphabet. Consider all finite ordered strings, i.e. finite tuples, drawn from this alphabet. A language is any well defined subset of these strings. Each finite string in the language is called a word.

If the alphabet contains the 26 English letters, a language might be the set of words with no vowels. There is a formula, in logic, that says: "word in language iff there is no letter in word such that letter = A E I O or U." This formula reduces the list of all possible strings down to a subset, the strings with no vowels, which becomes a well defined language.

Here's another example. Let the alphabet contain 0 and 1, and describe the language of prime numbers. "Include(word) iff there are no strings p and q such that p×q = word, except where p or q is 1." Well it is not at all clear that p×q can be written as an expression in logic, in the context of set theory, but let me put that question aside for the moment.

We will see that languages fall into various classes, according to their complexity. Some languages can be parsed, i.e. interpreted, by a very simple state machine. Others require the human brain, or something comparable.

Languages also have many representations: machines that recognize them, expressions that describe them, and grammars that generate them.