A.2 Formal Language Theory Review

A.2.1 Strings

Let ∑ be a finite, non-empty set of symbols, called an alphabet.

A string (or word) over ∑ is a finite sequence of symbols put side-by-side.

For example, abacc is a string over {a, b, c}.

The length of string s over ∑, denoted by |s|, is the total number of occurrences of symbols in s.

For example, |abacc| = 5.

A string of length 0 is called the empty string or null string, denoted by ε or λ.

A.2.2 Binary Operation on Strings

Let x = a1 a2am and y = b1 b2bn be strings over ∑, of lengths m and n respectively. The concatenation of x and y, denoted by xy, or xoy is the string a1 a2 … am b1 b2 …bn .

A.2.3 Relationships Between Strings

A string x is called a prefix of a string z iff ∃y, [z = xy]. A string x is called a suffix of a string z iff ∃y, [z = yx]. A string x is called a substring of a string z iff ∃u ∃ v [z = uxv].

x is a proper prefix of z iff x is a prefix of z and xz. x is a proper suffix of z iff x is a suffix of z and xz. x is a proper substring of z iff x is a substring of z and xz.

* denotes the set of all strings over .

Theorem A.2.1 Let ∑ be an alphabet. (∑*, o, ε) forms a monoid, where οis concatenation and ε is the empty string.

 

This theorem means that strings as defined have associative operator οand a unique identity element ε.

Sometimes, especially when applying the formal language theory to programming languages, we use an alternative nomenclature.

Definition A.2.1 (Vocabulary) Vocabulary V is a finite, non-empty set of symbols:

 

V = v1, v2,

 

Definition A.2.2 (Strings) Strings are formed by concatenation operation ‘.’, for example, ‘a.b’, which can also be written as ‘ab’ If there is no chance of confusion between symbols ‘a’, ‘b’ and ‘ab’. In short, it means zero or more symbols from some vocabulary V, placed side-by-side in a series or sequence.

Definition A.2.3 (Length of a string) A string has length equal to number of symbols in it and isdenoted by / s /.

Definition A.2.4 (Set of strings) Let V1 denote the set of all the strings of length 1 from V, V2 denote the set of all the strings of length 2 from V, etc.

Then, V2 = V1.V1; similarly for V3, V4, etc.

Then V+ = V1V2 ∪… and V* = {ε} ∪ V1V2 ∪….

 

String Sets as Algebraic System

λ or ε is the identity element in the algebraic system: < V*, ., λ >, because for any sV*, s.λ = λ.s = s. Comparing with < N, +, 0 >, we find that the two systems are isomorphic. This result is important and forms basis of many of our later derivations.

A.2.4 Languages

In short, a Language L over vocabulary V is given by LV*. Usually, some “alphabet” VT (a set of Terminal symbols) is used.

We shall call the strings in a language a sentence. Although structurally they look similar, note that “string” and “sentence” carry different meanings. A string is any arbitrary member of the set V*, while a sentence has to be a member of a particular language L, which may be defined over V*, i.e. LV*. A language has some form of discipline built into it, as specified by its grammar (see below).

We have already seen that ∑* denotes the set of all strings over .

 

Inductive Definition of ∑*

  1. Base Clause: ε is in ∑*.
  2. Inductive Clause: If x is in ∑* and b ∈ ∑, then xb ∈ ∑*.
  3. Extremal Clause: ∑* includes only those strings which can be constructed by a finite number of applications of (1) and (2).

A language over ∑ is a subset of ∑*.

For example, let ∑= {a, b}.

Language L = {an bn | n ≥ 0} = λ, ab, aabb, aaabbb,

A.2.5 Binary Operation on Languages

Let X and Y be languages over ∑. The concatenation of X and Y, denoted by XY, is defined as

XY = {xy | xX and yY}.

For example,

 

X = {a, ab, bb, baa}, Y = {0, 11} and XY = {a0, a11, ab0, ab11, bb0, bb11, baa0, baa11}

 

Note that XYand YX are different in general.

A.2.6 Power Xi f a Language X

Inductive definition of Xiof a language X is:

  1. Base Clause: X 0 = {ε}
  2. Inductive Clause: Xi+1 = (Xi) X

For example,

 

Y = {0, 11}, Y2 = Y1 {0,11} = (Y0 {0, 11}){0, 11}) = ({λ}{0, 11}){0, 11} = {00, 011, 110, 1111}

 

A.2.7 Closure of a Language

Let X be a language over ∑. The closure of X, denoted by X*, is defined as follows:

 

X* = ∪i≥0 Xi =x0x1x2∪…

 

The positive closure of X, denoted by X+, is defined as follows:

 

X+ = ∪i1 Xi = x1x2∪…

 

Note that the set of all strings over ∑ is equal to the closure of ∑, where is viewed as a set of strings of length 1 rather than an alphabet. This is the reason we denote the set of all strings over ∑ by ∑*.

We often regard X* as a unary operation to X, called Kleene star.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.188.190.175