2.11 Context-Sensitivity and Semantics

Context-free grammars, by definition, cannot represent context in language. A classical example of context-sensitivity in English is “the first letter of a sentence must be capitalized.” A context-sensitive grammar8 for this property of English sentences is:

Three lines of context-sensitive grammar for capitalizing the first letter of a sentence.
Description

In a context-sensitive grammar, the left-hand side of a production rule is not limited to one non-terminal, as is the case in context-free grammars. In this example, the production rule Production rule reads: Double quotes, left angle bracket, article, right angle bracket, leads to A, vertical bar, An, vertical bar, The, double quotes. only applies in the context of <start> to the left of <article>; that is, the non-terminal <start> provides the context for the application of the rule.

The pattern to which the production rules of a context-sensitive grammar must adhere are less restrictive than that of a context-free grammar. The productions of a context-sensitive grammar may have more than one non-terminal on the left-hand side. Formally, a grammar is a context-sensitive grammar if and only if every production rule is in the form:

αXβαγβ

where X element of set V and alpha comma beta comma gamma element of set, left parenthesis, summation union V, right parenthesis, asterisk., and X can be replaced with γ only in the context of α to its left and β to its right. The strings α and β may be empty in the productions of a context-sensitive grammar, but Gamma not equal to element of set.. However, the rule Production rule reads: S leads to element of set. is permitted as long as S does not appear on the right-hand side of any production.

Context and semantics are often confused. Recall that semantics deals with the meaning of a sentence. Context can be used to validate or discern the meaning of a sentence. Context can be used in two ways:

  • Determine semantic validity. A classical example of context-sensitivity in programming languages is “a variable must be declared before it is used.” For instance, while the following C program is syntactically valid, context reveals that it is not semantically valid because the variable y is referenced, but never declared:

    A set of four lines of code in C. Line 1: I n t main, left parenthesis, right parenthesis, left curly brace. Line 2: i n t x semicolon. Line 3: y equals 1 semicolon. Line 4: Right curly brace.

    Even if all referenced variables are declared, context may still be necessary to identify type mismatches. For instance, consider the following C++ program:

    A set of eight lines of code in C plus plus.
    Description

    Again, while this program is syntactically correct, it is not semantically valid because of the assignment of the value of a variable of one type to a variable of a different type (line 6). We need methods of static semantics (i.e., before run-time) to address this problem. We can generate semantically invalid programs from a context-free grammar because the production rules of a context-free grammar always apply, regardless of the context in which a non-terminal on the left-hand side appears; hence, the rules are called context-free.

  • Disambiguate semantic validity. Another example of context-sensitivity in programming languages is the * operator in C. Its meaning is dependent upon the context in which it is used. It can be used (1) as the multiplication operator (e.g., x asterisk 3.); (2) as the pointer dereferencing operator (e.g., Asterisk p t r.); and (3) in the declaration of pointer types (e.g., i n t asterisk p t r.). Without context, the semantics of the expression x asterisk y. are ambiguous. If we see the declarations i n t x equals 1 comma x equals 2.; immediately preceding this expression, the meaning of the * is multiplication. However, if the statement Statement reads: type d e f i n t x.; precedes the expression x asterisk y., it declares a pointer to an int.

Formalisms, including context-sensitive grammars, for dealing with these and other issues of semantics in programming languages are not easily implementable. Context-free grammars lend themselves naturally to the implementation of parsers (as we see in Chapter 3); context-sensitive grammars do not and, therefore, are not helpful in parser implementation. Thus, while C, Python, and Scheme are context-sensitive languages, the parser for them is implemented using a context-free grammar.

A practical approach to modeling context in programming languages is to infuse context, where practically possible, into a context-free grammar—that is, to include additional production rules to help (brute-)force the syntax to imply the semantics.9 This approach involves designing the context-free production rules in such a way that they cannot generate a semantically invalid program. We used this approach previously to enforce proper operator precedence and associativity.

Applying this approach to capture more sophisticated semantic rules, including the requirement that variables must be declared prior to use, leads to an inordinately large number of production rules; consequently it is often unreasonable and impractical. For instance, consider the determination of whether a collection of items is a set (i.e., an unordered collection without duplicates). That determination requires context. In particular, to determine if an element disqualifies the collection from being a set, we must examine the other items in the collection (i.e., the context). If the universe from which the items in the collection are drawn is finite, we can simply enumerate all possible sets from that universe. Such an enumeration results in not only a context-free grammar, but also a regular grammar. However, that approach can involve a large number of production rules. A device called an attribute grammar is an extension to a context-free grammar that helps bridge the gap between content-free and context-sensitive grammars, while being practical for use in language implementation (Section 2.14).

While we encounter semantics of programming languages throughout this text, we briefly comment on formal semantics here. There are two types of semantics: static and dynamic. In general, in computing, these terms mean before and during run-time, respectively. An example of static semantics is the detection of the use of an undeclared variable or a type incompatibility (e.g., Statement reads: i n t x equals, double quotes, this is not an i n t, double quotes, semicolon.). Attribute grammars can be used for static semantics.

There are three approaches to dynamic semantics:operational, denotational, and axiomatic. Operational semantics involves discerning the meaning of a programming construct by exploring the effects of running a program using it. Since an interpreter for a programming language, through its implementation, implicitly specifies the semantics of the language it interprets, running a program through an interpreter is an avenue to explore the operational semantics of the expressions and statements within the program. (Building interpreters for programming languages with a variety of constructs and features is the primary focus of Chapters 1012.) Consider the English sentence “I chose wisely” which is in the past tense. If we replace the word “chose” with “chos,” the sentence has a lexics error because the substring “chos” is not lexically valid. However, if we replace the word “chose” with “choose,” the sentence is lexically, syntactically, and semantically valid, but in the present tense. Thus, the semantics of the sentence are valid, but unintended. Such a semantic error, like a run-time error in a program, is difficult to a detect.

Conceptual Exercises for Section 2.11

Exercise 2.11.1 Give an example of a property in programming languages (other than any of those given in the text) that is context-sensitive or, in other words, an example property that is not context-free.

Exercise 2.11.2 A context-sensitive grammar can express context that a context-free grammar cannot model. State what a context-free grammar can express that a regular grammar cannot model.

Exercise 2.11.3 We stated in this section that sometimes we can infuse context into a context-free grammar (often by adding more production rules) even though a context-free grammar has no provisions for representing context. Express the context-sensitive grammar given in Section 2.11 enforcing the capitalization of the first character of an English sentence using a context-free grammar.

Exercise 2.11.4 Define a context-free grammar for the language whose sentences correspond to sets of the elements a, b, and c. For instance, the sentences {a}, {a, b}, {a, b, c} are in the language, but the sentences {a, a}, {b, a, b}, and {a, b, c, a} are not.

8. Note that the use of the words -free and -sensitive in the names of formal grammars is inconsistent. The -free in context-free grammar indicates what such a grammar is unable to model—namely, context. In contrast, the -sensitive in context-sensitive grammar indicates what such a grammar can model.

9. Both approaches—use of context-sensitive grammar and use of a context-free grammar with many rules modeling the context—model context in a purely syntactic way (i.e., without ascribing meaning to the language). For instance, with a context-sensitive grammar or a context-free grammar with many rules to enforce semantic rules for C, it is impossible to generate a program referencing an undeclared variable, and a program referencing an undeclared variable would be syntactically invalid.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.133.158.36