Let ∑ be a finite, non-empty set of symbols, called an alphabet.
A string (or word) over ∑ is a finite sequence of symbols put side-by-side.
For example, abacc is a string over {a, b, c}.
The length of string s over ∑, denoted by |s|, is the total number of occurrences of symbols in s.
For example, |abacc| = 5.
A string of length 0 is called the empty string or null string, denoted by ε or λ.
Let x = a1 a2 … am and y = b1 b2 … bn be strings over ∑, of lengths m and n respectively. The concatenation of x and y, denoted by xy, or xoy is the string a1 a2 … am b1 b2 …bn .
A string x is called a prefix of a string z iff ∃y, [z = xy]. A string x is called a suffix of a string z iff ∃y, [z = yx]. A string x is called a substring of a string z iff ∃u ∃ v [z = uxv].
x is a proper prefix of z iff x is a prefix of z and x ≠ z. x is a proper suffix of z iff x is a suffix of z and x ≠ z. x is a proper substring of z iff x is a substring of z and x ≠ z.
∑* denotes the set of all strings over ∑.
Theorem A.2.1 Let ∑ be an alphabet. (∑*, o, ε) forms a monoid, where οis concatenation and ε is the empty string.
This theorem means that strings as defined have associative operator οand a unique identity element ε.
Sometimes, especially when applying the formal language theory to programming languages, we use an alternative nomenclature.
Definition A.2.1 (Vocabulary) Vocabulary V is a finite, non-empty set of symbols:
Definition A.2.2 (Strings) Strings are formed by concatenation operation ‘.’, for example, ‘a.b’, which can also be written as ‘ab’ If there is no chance of confusion between symbols ‘a’, ‘b’ and ‘ab’. In short, it means zero or more symbols from some vocabulary V, placed side-by-side in a series or sequence.
Definition A.2.3 (Length of a string) A string has length equal to number of symbols in it and isdenoted by / s /.
Definition A.2.4 (Set of strings) Let V1 denote the set of all the strings of length 1 from V, V2 denote the set of all the strings of length 2 from V, etc.
Then, V2 = V1.V1; similarly for V3, V4, etc.
Then V+ = V1 ∪ V2 ∪… and V* = {ε} ∪ V1 ∪ V2 ∪….
λ or ε is the identity element in the algebraic system: < V*, ., λ >, because for any s ∈ V*, s.λ = λ.s = s. Comparing with < N, +, 0 >, we find that the two systems are isomorphic. This result is important and forms basis of many of our later derivations.
In short, a Language L over vocabulary V is given by L ⊂ V*. Usually, some “alphabet” VT (a set of Terminal symbols) is used.
We shall call the strings in a language a sentence. Although structurally they look similar, note that “string” and “sentence” carry different meanings. A string is any arbitrary member of the set V*, while a sentence has to be a member of a particular language L, which may be defined over V*, i.e. L ⊂ V*. A language has some form of discipline built into it, as specified by its grammar (see below).
We have already seen that ∑* denotes the set of all strings over ∑.
A language over ∑ is a subset of ∑*.
For example, let ∑= {a, b}.
Language L = {an bn | n ≥ 0} = λ, ab, aabb, aaabbb, …
Let X and Y be languages over ∑. The concatenation of X and Y, denoted by XY, is defined as
XY = {xy | x ∈ X and y ∈ Y}.
For example,
Note that XYand YX are different in general.
Inductive definition of Xiof a language X is:
For example,
Let X be a language over ∑. The closure of X, denoted by X*, is defined as follows:
The positive closure of X, denoted by X+, is defined as follows:
Note that the set of all strings over ∑ is equal to the closure of ∑, where ∑ is viewed as a set of strings of length 1 rather than an alphabet. This is the reason we denote the set of all strings over ∑ by ∑*.
We often regard X* as a unary operation to X, called Kleene star.
18.216.96.94