CmSc 365 Theory of Computation


Grammars, Chomsky Hierarchy
Numerical Functions

Learning goals


  1. Grammars

  2. Grammars are language generators. They consist of an alphabet of terminal symbols, alphabet of non-terminal symbols, a starting symbol and rules. Each language, generated by some grammar, can be recognized by some automaton. Languages (and the corresponding grammars) can be classified according to the minimal automaton sufficient to recognize them. Such classification, known as Chomsky Hierarchy, has been defined by Noam Chomsky, a distinguished linguist with major contributions to linguistics.

    The Chomsky Hierarchy comprises four types of languages and their associated grammars and machines.

      Language Grammar Machine Example
    Type 3 Regular languages Regular grammars
    • Right-linear grammars
    • Left-linear grammars
    Finite-state automata a*
    Type 2 Context-free languages Context-free grammars Push-down automata anbn
    Type 1 Context-sensitive languages Context-sensitive grammars Linear-bound automata anbncn
    Type 0 Recursive languages
    Recursively enumerable languages
    Unrestricted grammars Turing machines any computable function

    The types of languages form a strict hierarchy:
    regular languages Ì context-free languages Ì context-sensitive languages Ì recursive languages Ì recursively enumerable languages.

    The distinction between languages can be seen by examining the structure of the grammar rules of their grammar, or the nature of the automata which can be used to identify them.

    • Type 3 - Regular Languages
    • As we have discussed, a regular language is one which can be represented by a regular grammar, described using a regular expression, or accepted using an FSA.

      There are two kinds of regular grammar:
      • Right-linear (right-regular), with rules of the form
        A → α B or A → α , where A and B are single non-terminal symbols, α is a terminal symbol
        Parse trees with these grammars are right-branching.

      • Left-linear (left-regular), with rules of the form
        A → B α or A → α Parse trees with these grammars are left-branching.

      Examples of regular languages are pattern matching languages (regular expressions).

    • Type 2 - Context-Free Languages
    • A Context-Free Grammar (CFG) is one whose production rules are of the form:

      A → α where A is any single non-terminal, and α is any combination of terminals and non-terminals.

      The minimal automaton that recognizes context-free languages is a push-down automaton. It uses stack when expanding the non-terminal symbols with the right-hand side of the corresponding grammar rule.

      Examples of CFLs are some simple programming languages.

    • Type 1 - Context-Sensitive Languages
    • Context-Sensitive grammars may have more than one symbol on the left-hand-side of their grammar rules, provided that at least one of them is a non-terminal and the number of symbols on the left-hand-side does not exceed the number of symbols on the right-hand-side.

      Their rules have the form:

      α A γα β γ where A is a single non-terminal symbol, and α β γ are any combination of terminals and non-terminals.

      Since we allow more than one symbol on the left-hand-side, we refer to those symbols other than the one we are replacing as the context of the replacement.

      The automaton which recognizes a context-sensitive language is called a linear-bounded automaton: an FSA with a memory to store symbols in a list.

      Since the number of the symbols on the left-hand side is always smaller or equal to the number of the symbols on the right-hand side, the length of each derivation string is increased when applying a grammar rule. This length is bound by the length of the input string. Thus a linear-bound automaton always needs a finite list as its store

      Examples of context-sensitive languages are most programming languages

    • Type 0 - Unrestricted (Free) Languages
    • Unrestricted grammars have no restrictions on their grammar rules, except that there must be at least one non-terminal on the left-hand-side. The rules have the form
      αβ where α and β are arbitrary strings of terminal and non-terminal symbols and α ¹ ε (the empty string)

      The type of automata which can recognize such a language is a Turing machine, with an infinitely long memory.

      Examples of unrestricted languages are almost all natural languages.

    Theorem: A language is generated by an unrestricted grammar if and only if it is recursively enumerable.

    Are all languages recursively enumerable? The answer is no. We have shown that there are languages that are not recursively enumerable. Such languages cannot be described a formal grammar. Each formal grammar has a finite description and therefore can be considered as a string. Thus, the set of all formal grammars is infinitely countable. The set of all languages over an alphabet is the power set of all strings over that alphabet. We have shown that power sets of infinite sets are not countable. Therefore there is no one-to-one match between grammars and languages.

  3. Numerical functions

  4. Recursive language - a language that can be decided by a Turing machine
    Recursive function - a function that can be computed by a Turing machine

    Why do we use the word "recursive"?
    It turns out that functions computable by a Turing machine can be represented by means of very simple, basic functions using composition and recursive definition.

    Three basic numerical functions, so simple that their computability is obvious:

    1. Zero function: matches a tuple to zero

    2. zk(n1,n2,…nk) = 0 for any k

    3. Identity function: matches a tuple to a number within the tuple:

    4. Id j,k (n1,n2,…nk) = nj, 0 < j £ k Example: Id 3,5 (1,3,5,7,9) = 5

    5. Successor function, defines the natural numbers:
      s(0) = 1
      s(n) = n+1

    Using these three functions we can define more complex functions.

    Examples:

    Addition:
    plus(m,0) = m
    plus(m,n+1) = s(plus(m,n))

    Multiplication:
    mult(m,0) = 0
    mult(m,n+1) = plus(m,mult(m,n))

    It can be proved that all computable functions can be obtained from these primitive functions and vice versa - all functions that can be obtained are computable.


Learning goals

  • Be able to decsribe Chomsky Hierarchy and the correspondence between types of grammars, types of languages and types of automata.
  • Be able to descrybe each type of grammar by the form of its rules


Back to Contents page

Created by Lydia Sinapova