Chapter 7. Tokens, Values, and Variables

 

There's nothing remarkable about it. All one has to do is hit the right keys at the right time and the instrument plays itself.

 
 --Johann Sebastian Bach

A program starts as a sequence of characters contained in a file—the source code. Interpreting those characters, according to the rules of a given language, is the job of the compiler, or interpreter. Some characters will represent the names of variables, others will be special keywords used by the language, still others will be operators or “punctuation” characters used to separate the other elements. All of these textual constructs form the lexical elements of the program. These lexical elements must be identified as keywords, comments, literals, variables, operators, or whatever else is appropriate for the given language. In this chapter we look at the basic lexical elements of a Java program, the literal values that can be expressed and the different kinds of variables that can hold those values.

Lexical Elements

One of the first phases of compilation is the scanning of the lexical elements into tokens. This phase ignores whitespace and comments that appear in the text—so the language must define what form whitespace and comments take. The remaining sequence of characters must then be parsed into tokens.

Character Set

Most programmers are familiar with source code that is prepared using one of two major families of character representations: ASCII and its variants (including Latin-1) and EBCDIC. Both character sets contain characters used in English and several other Western European languages.

The Java programming language, on the other hand, is written in a 16-bit encoding of Unicode. The Unicode standard originally supported a 16-bit character set, but has expanded to allow for up to 21-bit characters with a maximum value of 0x10ffff. The characters above the value 0x00ffff are termed the supplementary characters. Any particular 21-bit value is termed a code point. To allow all characters to be represented by 16-bit values, Unicode defines an encoding format called UTF-16, and this is how the Java programming language represents text. In UTF-16 all the values between 0x0000 and 0xffff map directly to Unicode characters. The supplementary characters are encoded by a pair of 16-bit values: The first value in the pair comes from the high-surrogates range, and the second comes from the low-surrogates range. Methods that want to work with individual code point values can either accept a UTF-16 encoded char[] of length two, or a single int that holds the code point directly. An individual char in a UTF-16 sequence is termed a code unit.
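The terminology above can be made concrete with a small sketch. The class name CodePointDemo and the code point 0x1D11E are chosen only for illustration; the methods used are the standard ones in Character and String.

```java
class CodePointDemo {
    // Encode a code point as UTF-16 code units: one char for values up to
    // 0xffff, a surrogate pair for supplementary characters.
    static char[] encode(int codePoint) {
        return Character.toChars(codePoint);
    }

    public static void main(String[] args) {
        char[] units = encode(0x1D11E);   // an arbitrary supplementary character
        String s = new String(units);

        System.out.println(units.length);                          // 2 code units
        System.out.println(Character.isHighSurrogate(units[0]));   // true
        System.out.println(Character.isLowSurrogate(units[1]));    // true
        System.out.println(s.length());                            // 2: counts code units
        System.out.println(s.codePointCount(0, s.length()));       // 1: counts code points
        System.out.println(Integer.toHexString(s.codePointAt(0))); // 1d11e
    }
}
```

Note that String.length counts code units, so a string holding one supplementary character reports a length of two.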

The first 256 characters of Unicode are the Latin-1 character set, and most of the first 128 characters of Latin-1 are equivalent to the 7-bit ASCII character set. Current environments read ASCII or Latin-1 files, converting them to Unicode on the fly.[1]

Few existing text editors support Unicode characters, so you can use the escape sequence \uxxxx to encode Unicode characters, where each x is a hexadecimal digit (0–9, and a–f or A–F to represent decimal values 10–15). This sequence can appear anywhere in code: not only in character and string constants but also in identifiers. More than one u may appear at the beginning; thus, the character இ can be written as \u0b87 or \uuu0b87.[2] Also note that if your editor does support Unicode characters (or a subset), you may need to tell your compiler if your source code contains any character that is not part of the default character encoding for your system, for example through a command-line option that names the source character set.
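As a brief sketch of Unicode escapes in use, the hypothetical class below writes the letter a as an escape inside an identifier and uses the multiple-u form inside a character literal. An escape is a backslash, one or more u characters, and four hexadecimal digits.

```java
class UnicodeEscapeDemo {
    // The escape below stands for the letter a, so the field is named abc.
    static int \u0061bc = 42;

    static int readField() {
        return abc;   // the same identifier, written without an escape
    }

    public static void main(String[] args) {
        char single = '\u0041';     // the letter A
        char repeated = '\uuu0041'; // extra u characters are permitted
        System.out.println(single == repeated);  // true
        System.out.println(readField());         // 42
    }
}
```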

Exercise 7.1: Just for fun, write a “Hello, World” program entirely using Unicode escape sequences.

Comments

Comments within source code exist for the convenience of human programmers. They play no part in the generation of code and so are ignored during scanning. There are three kinds of comments:

// comment

Characters from // to the end of the line are ignored.

/* comment */

All characters between /* and the next */ are ignored.

/** comment */

All characters between /** and the next */ are ignored.

These documentation comments come immediately before identifier declarations and are included in automatically generated documentation. These comments are described in Chapter 19.

Comments can include any valid Unicode character, such as yin-yang (\u262f), asterism (\u2042), interrobang (\u203d), won (\u20a9), scruple (\u2108), or a snowman (\u2603).[3]

Comments do not nest. The following tempting code does not compile:

/* Comment this out for now: not implemented
    /* Do some really neat stuff */
    universe.neatStuff();
*/

The first /* starts a comment; the very next */ ends it, leaving the code that follows to be parsed; and the invalid, stand-alone */ is a syntax error. The best way to remove blocks of code from programs is either to put a // at the beginning of each line or use if(false) like this:

if (false) {
    // invoke this method when it works
    dwim();
}

This technique requires that the code to be removed is complete enough to compile without error. In this case we assume that the dwim method is defined somewhere.

Tokens

The tokens of a language are its basic words. A parser breaks source code into tokens and then tries to figure out which statements, identifiers, and so forth make up the code. Whitespace (spaces, tabs, newlines, and form feeds) is not significant except to separate tokens or as the contents of character or string literals. You can take any valid code and replace any amount of intertoken whitespace (whitespace outside strings and characters) with a different amount of whitespace (but not none) without changing the meaning of the program.

Whitespace must be used to separate tokens that would otherwise constitute a single token. For example, in the statement

return 0;

you cannot drop the space between return and 0 because that would create

return0;

consisting of the single identifier return0. Use extra whitespace appropriately to make your code human-readable, even though the parser ignores it. Note that the parser treats comments as whitespace.

The tokenizer is a “greedy” tokenizer. It grabs as many characters as it can to build up the next token, not caring if this creates an invalid sequence of tokens. So because ++ is longer than +, the expression

j = i+++++i;    // INVALID

is interpreted as the invalid expression

j = i++ ++ +i;  // INVALID

instead of the valid

j = i++ + ++i;
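To see the valid form at work, here is a minimal sketch (the class and method names are illustrative only). The spaces force the scanner to produce the tokens i, ++, +, ++, and i.

```java
class GreedyTokenDemo {
    // Returns { j, i } after evaluating the valid, spaced-out expression.
    static int[] evaluate(int start) {
        int i = start;
        // i++ yields the old value of i, then ++i increments again and
        // yields the new value.
        int j = i++ + ++i;
        return new int[] { j, i };
    }

    public static void main(String[] args) {
        int[] result = evaluate(1);
        System.out.println("j = " + result[0]);  // j = 4  (1 + 3)
        System.out.println("i = " + result[1]);  // i = 3
    }
}
```

Even though the spaced form compiles, an expression like this is hard to read; it is shown here only to illustrate tokenization.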

Identifiers

Identifiers, used for names of declared entities such as variables, constants, and labels, must start with a letter, followed by letters, digits, or both. The terms letter and digit are broad in Unicode: If something is considered a letter or digit in a human language, you can probably use it in identifiers. “Letters” can come from Armenian, Korean, Gurmukhi, Georgian, Devanagari, and almost any other script written in the world today. Thus, not only is kitty a valid identifier, but so are names written in those other scripts.[4] Letters also include any currency symbol (such as $, ¥, and £) and connecting punctuation (such as _).

Any difference in characters within an identifier makes that identifier unique. Case is significant: A, a, á, À, Å, and so on are different identifiers. Characters that look the same, or nearly the same, can be confused. For example, the Latin capital letter n “N” and the Greek capital ν “Ν” look alike but are different characters (\u004e and \u039d, respectively). The only way to avoid confusion is to write each identifier in one language, and thus in one known set of characters, so that programmers trying to type the identifier will know whether you meant the Latin E or a look-alike such as the Greek Ε.[5]

Identifiers can be as long as you like, but use some taste. Identifiers that are too long are hard to use correctly and actually obscure your code.
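The rule for the shape of an identifier can be approximated in code. The sketch below uses the standard Character.isJavaIdentifierStart and Character.isJavaIdentifierPart methods; the class and method names are hypothetical, and the check does not exclude keywords or literals such as null.

```java
class IdentifierCheck {
    // True if s has the form of an identifier: a "letter" first, then
    // letters or digits. Keywords are not rejected by this check.
    static boolean looksLikeIdentifier(String s) {
        if (s.length() == 0)
            return false;
        int first = s.codePointAt(0);
        if (!Character.isJavaIdentifierStart(first))
            return false;
        for (int i = Character.charCount(first); i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (!Character.isJavaIdentifierPart(cp))
                return false;
            i += Character.charCount(cp);   // step over one code point
        }
        return true;
    }

    public static void main(String[] args) {
        String[] candidates = { "kitty", "$cost", "_tmp", "9lives", "a-b" };
        for (String c : candidates)
            System.out.println(c + ": " + looksLikeIdentifier(c));
    }
}
```

Working in code points rather than chars lets the check handle supplementary characters as well.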

Keywords

Language keywords cannot be used as identifiers because they have special meaning within the language. The following table lists the keywords (keywords marked with † are reserved but currently unused):

abstract    continue    for           new          switch
assert      default     goto†         package      synchronized
boolean     do          if            private      this
break       double      implements    protected    throw
byte        else        import        public       throws
case        enum        instanceof    return       transient
catch       extends     int           short        try
char        final       interface     static       void
class       finally     long          strictfp     volatile
const†      float       native        super        while

Although they appear to be keywords, null, true, and false are formally literals, just like the number 12, so they do not appear in the above table. However, you cannot use null, true, or false as identifiers, just as you cannot use 12 as an identifier. These words can be used as parts of identifiers, as in annulled, construe, and falsehood.

Types and Literals

Every expression has a type that determines what values the expression can produce. The type of an expression is determined by the types of values and variables used within that expression. Types are divided into the primitive types and the reference types.

The primitive data types are:

boolean

either true or false

char

16-bit Unicode UTF-16 code unit (unsigned)

byte

8-bit signed two's-complement integer

short

16-bit signed two's-complement integer

int

32-bit signed two's-complement integer

long

64-bit signed two's-complement integer

float

32-bit IEEE 754 floating-point number

double

64-bit IEEE 754 floating-point number

Each primitive data type has a corresponding class type in the java.lang package. These wrapper classes (Boolean, Character, Byte, Short, Integer, Long, Float, and Double) also define useful constants and methods. For example, most wrapper classes declare constants MIN_VALUE and MAX_VALUE that hold the minimum and maximum values for the associated primitive type.

The Float and Double classes also have NaN, NEGATIVE_INFINITY, and POSITIVE_INFINITY constants. Both also provide an isNaN method that tests whether a floating-point value is “Not a Number”—that is, whether it is the result of a floating-point expression that has no valid result, such as dividing zero by zero. The NaN value can be used to indicate an invalid floating-point value; this is similar to the use of null for object references that do not refer to anything. The wrapper classes are covered in detail in Chapter 8.
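A short sketch of these constants and of the NaN test follows. The class and method names are illustrative; the constants and Double.isNaN are the standard ones.

```java
class WrapperConstantsDemo {
    // A value is "invalid" in the sense used above if it is NaN.
    static boolean isInvalid(double d) {
        return Double.isNaN(d);
    }

    public static void main(String[] args) {
        System.out.println(Integer.MAX_VALUE);   // 2147483647
        System.out.println(Byte.MIN_VALUE);      // -128

        double nan = 0.0 / 0.0;                  // zero divided by zero
        System.out.println(isInvalid(nan));      // true
        System.out.println(nan == nan);          // false: NaN equals nothing
        System.out.println(1.0 / 0.0 == Double.POSITIVE_INFINITY);  // true
    }
}
```

Because NaN is not equal to any value, including itself, isNaN is the reliable way to test for it.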

There is no unsigned integer type. If you need to work with unsigned values originating outside your program, they must be stored in a larger signed type. For example, unsigned bytes produced by an analog-to-digital converter can be read into variables of type short.
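One way to do this, sketched below with hypothetical names, is to mask off the sign extension that occurs when a byte is widened, and store the result in a short.

```java
class UnsignedByteDemo {
    // Interpret the 8 bits of b as an unsigned value in the range 0..255.
    static short toUnsigned(byte b) {
        // b is widened to int with sign extension; the mask keeps only
        // the low 8 bits.
        return (short) (b & 0xFF);
    }

    public static void main(String[] args) {
        byte raw = (byte) 0xF0;                // bit pattern 1111 0000
        System.out.println(raw);               // -16 when read as signed
        System.out.println(toUnsigned(raw));   // 240 when read as unsigned
    }
}
```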

The reference types are class types, interface types, and array types. Variables of these types can refer to objects of the corresponding type.

Each type has literals, which are the way that constant values of that type are written. The next few subsections describe how literal (unnamed) constants for each type are specified.

Reference Literals

The only literal object reference is null. It can be used anywhere a reference is expected. Conventionally, null represents an invalid or uncreated object. It has no class, not even Object, but null can be assigned to any reference variable.

Boolean Literals

The boolean literals are true and false.

Character Literals

Character literals appear with single quotes: 'Q'. Any valid Unicode character can appear between the quotes. You can use \uxxxx for Unicode characters inside character literals just as you can elsewhere. Certain special characters can be represented by an escape sequence:

\n

newline (\u000A)

\t

tab (\u0009)

\b

backspace (\u0008)

\r

return (\u000D)

\f

form feed (\u000C)

\\

backslash itself (\u005C)

\'

single quote (\u0027)

\"

double quote (\u0022)

\ddd

a char by octal value, where each d is one of 0–7

Octal character constants can have three or fewer digits and cannot exceed \377 (\u00ff); for example, the character literal '\12' is the same as '\n'. Supplementary characters cannot be represented in a character literal.
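The following sketch (class name illustrative) compares a few character literals written with backslash escapes, octal escapes, and Unicode escapes.

```java
class CharEscapeDemo {
    // The numeric value of a char, widened to int.
    static int codeOf(char c) {
        return c;
    }

    public static void main(String[] args) {
        System.out.println('\12' == '\n');      // true: octal 12 is decimal 10
        System.out.println('\101' == 'A');      // true: octal 101 is decimal 65
        System.out.println('\u0041' == 'A');    // true
        System.out.println(codeOf('\t'));       // 9
        System.out.println(codeOf('\377'));     // 255, the largest octal escape
        System.out.println(codeOf('\\'));       // 92
    }
}
```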

Integer Literals

Integer constants are strings of octal, decimal, or hexadecimal digits. The start of a constant declares the number's base: A 0 (zero) starts an octal number (base 8); a 0x or 0X starts a hexadecimal number (base 16); and any other digit starts a decimal number (base 10). All the following numbers have the same value:

29 035 0x1D 0X1d

Integer constants are long if they end in L or l, such as 29L; L is preferred over l because l (lowercase L) can easily be confused with 1 (the digit one). Otherwise, integer constants are assumed to be of type int. If an int literal is directly assigned to a short, and its value is within the valid range for a short, the integer literal is treated as if it were a short literal. A similar allowance is made for integer literals assigned to byte variables. In all other cases you must explicitly cast when assigning int to short or byte (see “Explicit Type Casts” on page 219).
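These rules can be sketched as follows; the class and method names are hypothetical, and 40000 is just a convenient value outside the range of short.

```java
class IntegerLiteralDemo {
    // All four literals denote the same int value.
    static boolean sameValue() {
        return 29 == 035 && 035 == 0x1D && 0x1D == 0X1d;
    }

    // Assigning a non-constant int to a short requires an explicit cast.
    static short narrow(int value) {
        return (short) value;
    }

    public static void main(String[] args) {
        long big = 29L;      // long literal
        short s = 29;        // int literal within short range: no cast needed
        byte b = 0x1D;       // int literal within byte range: no cast needed
        // short bad = 40000;   // would not compile: out of range for short

        System.out.println(sameValue());        // true
        System.out.println(big + s + b);        // 87
        System.out.println(narrow(40000));      // -25536: high bits discarded
    }
}
```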

Floating-Point Literals

Floating-point constants are expressed in either decimal or hexadecimal form. The decimal form consists of a string of decimal digits with an optional decimal point, optionally followed by an exponent—the letter e or E, followed by an optionally signed integer. At least one digit must be present. All these literals denote the same floating-point number:

18.  1.8e1  .18E+2  180.0e-1

The hexadecimal form consists of 0x (or 0X), a string of hexadecimal digits with an optional hexadecimal point, followed by a mandatory binary exponent—the letter p or P, followed by an optionally signed integer. The binary exponent represents scaling by two raised to a power. All these literals also denote the same floating-point number (decimal 18.0):

0x12p0  0x1.2p4  0x.12P+8  0x120p-4

Floating-point constants are of type double unless they are specified with a trailing f or F, which makes them float constants, such as 18.0f. A trailing d or D specifies a double constant. There are two zeros: positive (0.0) and negative (-0.0). Positive and negative zero are considered equal when you use == but produce different results when used in some calculations. For example, if dividing by zero, the expression 1d/0d is +∞, whereas 1d/-0d is -∞. There are no literals to represent either infinity or NaN, only the symbolic constants defined in the Float and Double classes—see Chapter 8.

A double constant cannot be assigned directly to a float variable, even if the value of the double is within the valid float range. The only constants you may directly assign to float variables and fields are float constants.
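As a sketch of the points above (names are illustrative), the class below checks that the decimal and hexadecimal forms denote the same value, and shows the two zeros behaving differently under division.

```java
class FloatLiteralDemo {
    // Every literal here denotes the double value 18.0.
    static boolean allEighteen() {
        return 18. == 18.0 && 1.8e1 == 18.0 && .18E+2 == 18.0
            && 180.0e-1 == 18.0
            && 0x12p0 == 18.0 && 0x1.2p4 == 18.0
            && 0x.12P+8 == 18.0 && 0x120p-4 == 18.0;
    }

    static double divideOneBy(double divisor) {
        return 1d / divisor;
    }

    public static void main(String[] args) {
        float f = 18.0f;            // float literal
        float g = (float) 18.0;     // a double literal needs a cast
        // float h = 18.0;          // would not compile

        System.out.println(allEighteen());        // true
        System.out.println(f == g);               // true
        System.out.println(0.0 == -0.0);          // true
        System.out.println(divideOneBy(0.0));     // Infinity
        System.out.println(divideOneBy(-0.0));    // -Infinity
    }
}
```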

String Literals

String literals appear with double quotes: "along". Any character can be included in string literals, with the exception of newline and " (double quote). Newlines are not allowed in the middle of strings. If you want to embed a newline character in the string, use the escape sequence \n. To embed a double quote use the escape sequence \". A string literal references an object of type String. To learn more about strings, see Chapter 13.

Characters in strings can be specified with the octal digit syntax, but all three octal digits should be used to prevent accidents when an octal value is specified next to a valid octal digit in the string. For example, the string "\0116" is equivalent to "\t6", whereas the string "\116" is equivalent to "N".
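The octal pitfall is easy to check. In this small sketch (class name illustrative), an octal escape takes at most three digits, so a fourth digit is an ordinary character.

```java
class StringEscapeDemo {
    static int lengthOf(String s) {
        return s.length();
    }

    public static void main(String[] args) {
        // Octal escape 011 (a tab) followed by the digit 6.
        System.out.println("\0116".equals("\t6"));   // true
        System.out.println(lengthOf("\0116"));       // 2

        // Octal escape 116 is decimal 78, the letter N.
        System.out.println("\116".equals("N"));      // true

        // An embedded double quote and an embedded newline.
        System.out.println(lengthOf("\"\n"));        // 2
    }
}
```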

Class Literals

Every type (primitive or reference) has an associated instance of class Class that represents that type. These instances are often referred to as the class object for a given type. You can name the class object for a type directly by following the type name with ".class", as in

String.class
java.lang.String.class
java.util.Iterator.class
boolean.class

The first two of these class literals refer to the same instance of class Class because String and java.lang.String are two different names for the same type. The third class literal is a reference to the Class instance for the Iterator interface mentioned on page 129. The last is the Class instance that represents the primitive type boolean.

Since class Class is generic, the actual type of the class literal for a reference type T is Class<T>, while for primitive types it is Class<W> where W is the wrapper class for that primitive type. But note, for example, that boolean.class and Boolean.class are two different objects of type Class<Boolean>. Generic types are discussed in Chapter 11, and the class Class is discussed in Chapter 16.
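A minimal sketch of class literals, using a hypothetical helper method to compare class objects by identity:

```java
class ClassLiteralDemo {
    static boolean sameClassObject(Class<?> a, Class<?> b) {
        return a == b;
    }

    public static void main(String[] args) {
        Class<String> simple = String.class;
        Class<String> qualified = java.lang.String.class;
        Class<Boolean> primitive = boolean.class;   // typed with the wrapper class
        Class<Boolean> wrapper = Boolean.class;

        System.out.println(sameClassObject(simple, qualified));   // true
        System.out.println(sameClassObject(primitive, wrapper));  // false
        System.out.println(primitive.isPrimitive());              // true
        System.out.println(java.util.Iterator.class.isInterface()); // true
    }
}
```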

Exercise 7.2: Write a class that declares a field for each of the primitive numeric types, and try to assign values using the different literal forms. For example, try to assign 3.5f to an int field. Which literals can be used with which type of field? Try changing the magnitude of the values used to see if that affects things.

Variables

A variable is a storage location[6] —something that can hold a value—to which a value can be assigned. Variables include fields, local variables in a block of code, and parameters. A variable declaration states the identifier (name), type, and other attributes of a variable. The type part of a declaration specifies which kinds of values and behavior are supported by the declared entity. The other attributes of a variable include annotations and modifiers. Annotations can be applied to any variable declaration and are discussed in Chapter 15.

Field and Local Variable Declarations

Fields and local variables are declared in the same way. A declaration is broken into three parts: modifiers, followed by a type, followed by a list of identifiers. Each identifier can optionally have an initializer associated with it to give it an initial value.

There is no difference between variables declared in one declaration or in multiple declarations of the same type. For example:

float x, y;

is the same as

float x;
float y;

Any initializer is expressed as an assignment (with the = operator) of an expression of the appropriate type. For example:

float x = 3.14f, y = 2.81f;

is the same as the more readable

float x = 3.14f,
      y = 2.81f;

which in turn is the same as the preferred

float x = 3.14f;
float y = 2.81f;

Field variables are members of classes, or interfaces, and are declared within the body of that class or interface. Fields can be initialized with an initializer, within an initialization block, or within a constructor, but need not be initialized at all because they have default initial values, as discussed on page 44. Field initialization and the modifiers that can be applied to fields were discussed in Chapter 2.

Local variables can be declared anywhere within a block of statements, not just at the start of the block, and can be of primitive or reference type. As a special case, a local variable declaration is also permitted within the initialization section of a for loop—see “for” on page 236. A local variable must be assigned a value before it is used.[7] There is no default initialization value for local variables because failure to assign a starting value for one is usually a bug. The compiler will refuse to compile code that doesn't ensure that assignment takes place before a variable is used:

int x;     // uninitialized, can't use
int y = 2;
x = y * y; // now x has a value
int z = x; // okay, safe to use x

Local variables cease to exist when the flow of control reaches the end of the block in which they were declared—though any referenced object is subject to normal garbage collection rules.

Apart from annotations, the only modifier that can be applied to a local variable is final. This is required when the local variable will be accessed by a local or anonymous inner class—see also the discussion of final variables below.

Parameter Variables

Parameter variables are the parameters declared in methods, constructors, or catch blocks—see “try, catch, and finally” on page 286. A parameter declaration consists of an optional modifier, a type name, and a single identifier.

Parameters cannot have explicit initializers because they are implicitly initialized with the value of the argument passed when the method or constructor is invoked, or with a reference to the exception object caught in the catch block. Parameter variables cease to exist when the block in which they appear completes.

As with local variables, the only modifiers that can be applied to a parameter are annotations, or the final modifier.

final Variables

The final modifier declares that the value of the variable is set exactly once and will thereafter always have the same value—it is immutable. Any variable—fields, local variables, or parameters—can be declared final. Variables that are final must be initialized before they are used. This is typically done directly in the declaration:

final int id = nextID++;

You can defer the initialization of a final field or local variable. Such a final variable is called a blank final. A blank final field must be initialized within an initialization block or constructor (if it's an instance field) while a blank final local variable, like any local variable, must be initialized before it is used.

Blank final fields are useful when the value of the field is determined by a constructor argument:

class NamedObj {
    final String name;

    NamedObj(String name) {
        this.name = name;
    }
}

or when you must calculate the value in something more sophisticated than an initializer expression:

static final int[] numbers = numberList();
static final int maxNumber; // max value in numbers

static {
    int max = numbers[0];
    for (int num : numbers) {
        if (num > max)
            max = num;
    }
    maxNumber = max;
}

static int[] numberList() {
    // ...
}

The compiler will verify that all static final fields are initialized by the end of any static initializer blocks, and that non-static final fields are initialized by the end of all construction paths for an object. A compile-time error will occur if the compiler cannot determine that this happens.

Blank final local variables are useful when the value to be assigned to the variable is conditional on the value of other variables. As with all local variables, the compiler will ensure that a final local variable is initialized before it is used.
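For instance, a blank final local variable might be assigned in exactly one branch of a conditional, as in this sketch (the class, method, and labels are hypothetical):

```java
class BlankFinalDemo {
    static String describe(int n) {
        final String label;        // blank final: no initializer
        if (n < 0)
            label = "negative";
        else if (n == 0)
            label = "zero";
        else
            label = "positive";
        // label = "other";       // would not compile: label is already assigned
        return label;              // every path above assigned label exactly once
    }

    public static void main(String[] args) {
        System.out.println(describe(-5));   // negative
        System.out.println(describe(0));    // zero
        System.out.println(describe(7));    // positive
    }
}
```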

Local variables and parameters are usually declared final only when they will be accessed by a local, or anonymous inner, class—though some people advocate always making parameters final, both as a matter of style, and to avoid accidentally assigning a value to a parameter, when a field or other variable was intended. Issues regarding when you should, and should not, use final on fields were discussed on page 46.

Array Variables

Arrays provide ordered collections of elements. Components of an array can be primitive types or references to objects, including references to other arrays. Arrays themselves are objects and extend Object. The declaration

int[] ia = new int[3];

declares an array named ia that initially refers to an array of three int values.

Array dimensions are omitted in the type declaration of an array variable. The number of components in an array is determined when it is created using new, not when an array variable is declared. An array object's length is fixed at its creation and cannot be changed. Note that it is the length of the array object that is fixed. In the example, a new array of a different size could be assigned to the array variable ia at any time.

You access array elements by their position in the array. The first element of an array has index 0 (zero), and the last element has index length - 1. You access an element by using the name of the array and the index enclosed between [ and ]. In our example, the first element of the array is ia[0] and the last element of the array is ia[2]. Every index use is checked to ensure that it is within the proper range for that array, throwing an ArrayIndexOutOfBoundsException if the index is out of bounds.[8] The index expression must be of type int, which limits the maximum size of an array.

The length of an array is available from its length field (which is implicitly public and final). In our example, the following code would loop over the array, printing each value:

for (int i = 0; i < ia.length; i++)
    System.out.println(i + ": " + ia[i]);

An array with length zero is said to be an empty array. There is a big difference between a null array reference and a reference to an empty array—an empty array is a real object, it simply has no elements. Empty arrays are useful for returning from methods instead of returning null. If a method can return null, then users of the method must explicitly check the return value for null before using it. On the other hand, if the method returns an array that may be empty, no special checking is needed provided the user always uses the array length to check valid indices.
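As a sketch of this idiom, the hypothetical method below returns an empty array rather than null when it has nothing to report, so the caller's loop needs no special case.

```java
class EmptyArrayDemo {
    // Returns the positive values in data, or an empty array if there are none.
    static int[] positives(int[] data) {
        int count = 0;
        for (int v : data)
            if (v > 0)
                count++;

        int[] result = new int[count];   // length may be zero: still a real object
        int next = 0;
        for (int v : data)
            if (v > 0)
                result[next++] = v;
        return result;
    }

    public static void main(String[] args) {
        int[] none = positives(new int[] { -1, -2 });
        System.out.println(none.length);      // 0
        for (int v : none)                    // loop body never runs; no null check
            System.out.println(v);
    }
}
```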

If you prefer, you can put the array brackets after the variable name instead of after the type:

int ia[] = new int[3];

This code is equivalent to the original definition of ia. However, the first style is preferable because it places the type declaration entirely in one place.

Array Modifiers

The normal modifiers can be applied to array variables, depending on whether the array is a field or local variable. The important thing to remember is that the modifiers apply to the array variable, not to the elements of the array the variable references. An array variable that is declared final means that the array reference cannot be changed after initialization. It does not mean that array elements cannot be changed. There is no way to apply any modifiers (specifically final and volatile) to the elements of an array.
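The distinction can be seen in a short sketch (names are illustrative): the final reference cannot be reassigned, but the elements it refers to can still change.

```java
class FinalArrayDemo {
    static int[] modifyThroughFinal() {
        final int[] ia = { 1, 2, 3 };
        ia[0] = 99;              // allowed: elements are not final
        // ia = new int[3];      // would not compile: ia itself is final
        return ia;
    }

    public static void main(String[] args) {
        System.out.println(modifyThroughFinal()[0]);   // 99
    }
}
```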

Arrays of Arrays

You can have arrays of arrays. The code to declare and print a two-dimensional matrix, for example, might look like this:

float[][] mat = new float[4][4];
setupMatrix(mat);
for (int y = 0; y < mat.length; y++) {
    for (int x = 0; x < mat[y].length; x++)
        System.out.print(mat[y][x] + " ");
    System.out.println();
}

The first (left-most) dimension of an array must be specified when the array is created. Other dimensions can be left unspecified, to be filled in later. Specifying more than the first dimension is a shorthand for a nested set of new statements. Our new creation could have been written more explicitly as:

float[][] mat = new float[4][];
for (int y = 0; y < mat.length; y++)
    mat[y] = new float[4];

One advantage of arrays of arrays is that each nested array can have a different size. You can emulate a 4×4 matrix, but you can also create an array of four int arrays, each of which has a different length sufficient to hold its own data.
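A sketch of such a "ragged" array follows. The triangular shape and the names are chosen only for illustration: row r is given r + 1 elements.

```java
class RaggedArrayDemo {
    // Creates an array of rows in which row r has r + 1 elements.
    static int[][] triangle(int rows) {
        int[][] t = new int[rows][];     // only the first dimension is specified
        for (int r = 0; r < rows; r++)
            t[r] = new int[r + 1];       // each nested array gets its own length
        return t;
    }

    public static void main(String[] args) {
        int[][] t = triangle(4);
        for (int r = 0; r < t.length; r++)
            System.out.println("row " + r + " has length " + t[r].length);
    }
}
```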

Array Initialization

When an array is created, each element is set to the default initial value for its type: zero for the numeric types, '\u0000' for char, false for boolean, and null for reference types. When you declare an array of a reference type, you are really declaring an array of variables of that type. Consider the following code:

Attr[] attrs = new Attr[12];

for (int i = 0; i < attrs.length; i++)
    attrs[i] = new Attr(names[i], values[i]);

After the initial new of the array, attrs has a reference to an array of 12 variables that are initialized to null. The Attr objects themselves are created only when the loop is executed.

You can initialize arrays with comma-separated values inside braces following their declaration. The following array declaration creates and initializes an array:

String[] dangers = { "Lions", "Tigers", "Bears" };

The following code gives the same result:

String[] dangers = new String[3];

dangers[0] = "Lions";
dangers[1] = "Tigers";
dangers[2] = "Bears";

When you initialize an array within its declaration, you don't have to explicitly create the array using new—it is done implicitly for you by the system. The length of the array to create is determined by the number of initialization values given. You can use new explicitly if you prefer, but in that case you have to omit the array length, because again it is determined from the initializer list.

String[] dangers = new String[]{"Lions", "Tigers", "Bears"};

This form of array creation expression allows you to create and initialize an array anywhere. For example, you can create and initialize an array when you invoke a method:

printStrings(new String[] { "one", "two", "many" });

An unnamed array created with new in this way is called an anonymous array.

The last value in the initializer list is also allowed to have a comma after it. This is a convenience for multiline initializers so you can reorder, add, or remove values, without having to remember to add a comma to the old last line, or remove it from the new last line.

Arrays of arrays can be initialized by nesting array initializers. Here is a declaration that initializes an array to the top few rows of Pascal's triangle, with each row represented by its own array:

int[][] pascalsTriangle = {
            { 1 },
            { 1, 1 },
            { 1, 2, 1 },
            { 1, 3, 3, 1 },
            { 1, 4, 6, 4, 1 },
        };

Indices in an array of arrays work from the outermost inward. For example, in the above array, pascalsTriangle[0] refers to the int array that has one element, pascalsTriangle[1] refers to the int array that has two elements, and so forth.

For convenience, the System class provides an arraycopy method that allows you to assign the values from one array into another, instead of looping through each of the array elements—this is described in more detail in “Utility Methods” on page 665.
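A minimal sketch of System.arraycopy, whose arguments are the source array, the starting index in the source, the destination array, the starting index in the destination, and the number of elements to copy. The helper name is hypothetical.

```java
class ArrayCopyDemo {
    // Returns a new array holding the same values as src.
    static int[] duplicate(int[] src) {
        int[] dst = new int[src.length];
        System.arraycopy(src, 0, dst, 0, src.length);
        return dst;
    }

    public static void main(String[] args) {
        int[] original = { 1, 2, 3, 4 };
        int[] copy = duplicate(original);
        copy[0] = 99;                         // does not affect the original
        System.out.println(original[0]);      // 1
        System.out.println(copy[0]);          // 99
    }
}
```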

Arrays and Types

Arrays are implicit extensions of Object. Given a class X, classes Y and Z that extend X, and arrays of each, the class hierarchy looks something like this:

[Figure: class hierarchy for X, Y, Z, and their array types]

This class relationship allows polymorphism for arrays. You can assign an array to a variable of type Object and cast it back. An array of objects of type Y is usable wherever an array of objects of its supertype X is required. This seems natural but can require a run time check that is sometimes unexpected. An array of X can contain either Y or Z references, but an array of Y cannot contain references to X or Z objects. The following code would generate an ArrayStoreException at run time on either of its final two lines, which violate this rule:

Y[] yArray = new Y[3];      // a Y array
X[] xArray = yArray;        // valid: Y is assignable to X
xArray[0] = new Y();
xArray[2] = new X();        // INVALID: can't store X in Y[]
xArray[1] = new Z();        // INVALID: can't store Z in Y[]

If xArray were a reference to a real X[] object, it would be valid to store both an X and a Z object into it. But xArray actually refers to a Y[] object so it is not valid to store either an X reference or a Z reference in it. Such assignments are checked at run time if needed to ensure that no improper reference is stored into an array.
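The run-time check can be observed directly. In this sketch, X, Y, and Z are declared as empty nested classes purely for illustration, and a hypothetical helper reports whether a store was rejected.

```java
class ArrayStoreDemo {
    static class X {}
    static class Y extends X {}
    static class Z extends X {}

    // Tries to store value in array[0]; returns true if the store is rejected.
    static boolean storeFails(X[] array, X value) {
        try {
            array[0] = value;            // checked against the array's real type
            return false;
        } catch (ArrayStoreException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        X[] reallyY = new Y[3];          // valid: a Y[] is usable as an X[]
        X[] reallyX = new X[3];

        System.out.println(storeFails(reallyY, new Y()));   // false
        System.out.println(storeFails(reallyY, new X()));   // true
        System.out.println(storeFails(reallyY, new Z()));   // true
        System.out.println(storeFails(reallyX, new Z()));   // false
    }
}
```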

Like any other object, arrays are created and are subject to normal garbage collection mechanisms. They inherit all the methods of Object and additionally implement the Cloneable interface (see page 101) and the Serializable interface (see “Object Serialization” on page 549). Since arrays define no methods of their own, but just inherit those of Object, the equals method is always based on identity, not equivalence. The utility methods of the java.util.Arrays class—see “The Arrays Utility Class” on page 607—allow you to compare arrays for equivalence, and to calculate a hash code based on the contents of the array.
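The difference between identity and equivalence for arrays can be sketched as follows (the class and helper names are illustrative; Arrays.equals and Arrays.hashCode are the standard utility methods).

```java
import java.util.Arrays;

class ArrayEqualityDemo {
    // Equivalence: same length and equal elements in the same order.
    static boolean sameContents(int[] a, int[] b) {
        return Arrays.equals(a, b);
    }

    public static void main(String[] args) {
        int[] a = { 1, 2, 3 };
        int[] b = { 1, 2, 3 };
        int[] c = a.clone();                  // arrays implement Cloneable

        System.out.println(a.equals(b));      // false: inherited identity test
        System.out.println(sameContents(a, b));   // true
        System.out.println(c != a && sameContents(a, c));   // true
        System.out.println(Arrays.hashCode(a) == Arrays.hashCode(b));  // true
    }
}
```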

The major limitation on the “object-ness” of arrays is that they cannot be extended to add new methods. The following construct is not valid:

class ScaleVector extends double[] { // INVALID
    // ...
}

In a sense, arrays behave like final classes.

Exercise 7.3: Write a program that calculates Pascal's triangle to a depth of 12, storing each row of the triangle in an array of the appropriate length and putting each of the row arrays into an array of 12 int arrays. Design your solution so that the results are printed by a method that prints the array of arrays using the lengths of each array, not a constant 12. Now change the code to use a constant other than 12 without modifying your printing method.

The Meanings of Names

Identifiers give names to a range of things within our programs—types, variables, fields, methods, and so forth. When you use a particular name in your program, the compiler has to determine what that name refers to, so that it can decide if you are using the name correctly and so that it can generate the appropriate code. The rules for determining the meaning of a name trade off convenience with complexity. At one extreme the language could require that every name in a program be unique—this makes things simple for the compiler but makes life very inconvenient for the programmer. If names are interpreted based on the context in which they are used, the programmer gets the convenience of reusing names (such as always using the name i for a for loop counter), but the compiler has to be able to determine what each name means—and so does any human being reading the code.

Name management is achieved with two mechanisms. First, the namespace is partitioned to give different namespaces for different kinds of names. Second, scoping is used to control the visibility of names declared in one part of a program to other parts. Different namespaces allow you to give the same name to a method and a field (not that we recommend doing this), and scoping allows you to use the same name for all your for loop counters.

There are six different namespaces:

  • package names,

  • type names,

  • field names,

  • method names,

  • local variable names (including parameters), and

  • labels

When a name is used in a program, its context helps determine what kind of name it is. For example, in the expression x.f = 3, we know that f must be a field: it can't be a package, type, method, or label because we are assigning a value to it, and it can't be a local variable because we are accessing it as a member of x. We know that x must be a type name, or a field, or a local variable that is an object reference; exactly which one is determined by searching the enclosing scope for an appropriate declaration, as you will see.

The use of separate namespaces gives you greater flexibility when writing code (especially when combining code from different sources) but can be abused. Consider this pathological, but perfectly valid, piece of code:

package Reuse;
class Reuse {
    Reuse Reuse(Reuse Reuse) {
      Reuse:
        for (;;) {
            if (Reuse.Reuse(Reuse) == Reuse)
                break Reuse;
        }
        return Reuse;
    }
}

Every declaration of a name has a scope in which that name can be used. The exact rules differ depending on the kind of name—type name, member name, local variable, and so on. For example, the scope of a parameter in a method is the entire body of that method; the scope of a local variable is the block in which the local variable is declared; the scope of a loop variable declared in the initialization section of a for loop is the rest of that for loop.

A name cannot be used outside its scope—for example, one method in a class cannot refer to the parameter of another method. However, scopes also nest and an inner scope has access to all names declared in the outer scope before the inner scope is entered. For example, the body of a for loop can access the local variables of the method in which it was declared.
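The scope rules for parameters, local variables, and loop variables can be seen together in one short method. This is only a sketch; the class ScopeDemo and the method sumOfSquares are hypothetical names.

```java
public class ScopeDemo {
    static int sumOfSquares(int limit) {        // limit: in scope for the whole method body
        int total = 0;                          // total: in scope to the end of this block
        for (int i = 1; i <= limit; i++) {      // i: in scope for the rest of the for loop
            int square = i * i;                 // square: in scope only inside the loop body
            total += square;                    // the inner scope sees total and limit
        }
        // Neither i nor square can be named here; both are out of scope.
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sumOfSquares(3));    // prints 14
    }
}
```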

When a name that could be a variable is used, the meaning of the name is determined by searching the current and enclosing scopes for declarations of that name in the different namespaces. The search order is:

  1. Local variables declared in the code block, for loop, or as parameters to the catch clause of a try statement. Then local variables declared in any enclosing code block. This applies recursively up to the method containing the block, or until there is no enclosing block (as in the case of an initialization block).

  2. If the code is in a method or constructor, the parameters to the method or constructor.

  3. A field of the class or interface, including any accessible inherited fields.

  4. If the type is a nested type, a variable in the enclosing block or field of the enclosing class. If the type is a static nested type, only static fields of an enclosing type are searched. This search rule is applied successively to any enclosing blocks and types further out.

  5. A static field of a class, or interface, specifically declared in a static import statement.

  6. A static field of a class, or interface, declared in a static import on demand statement.

For method names, a process similar to the one for fields is followed, but it starts at step 3, searching for methods in the current class or interface. There are special rules for determining how members of a class are accessed, as you'll see in “Member Access” on page 223.

The order of searching determines which declaration will be found. This implies that names declared in outer scopes can be hidden by names declared in inner scopes. And that means, for example, that local variable names can hide class member names, that nested class members can hide enclosing instance members, and that locally declared class members can hide inherited class members—as you have already seen.[9]
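The effect of the search order can be sketched with one name declared at three levels. The classes HidingDemo, Base, and Derived are hypothetical and exist only to show which declaration the search finds first.

```java
public class HidingDemo {
    static class Base {
        String name = "Base field";
    }

    static class Derived extends Base {
        String name = "Derived field";        // hides the inherited field

        String viaLocal() {
            String name = "local variable";   // found first, in step 1
            return name;
        }

        String viaField() {
            return name;                      // no local declaration, so step 3 finds the field
        }

        String viaSuper() {
            return super.name;                // the hidden inherited field is still reachable
        }
    }

    public static void main(String[] args) {
        Derived d = new Derived();
        System.out.println(d.viaLocal());     // prints "local variable"
        System.out.println(d.viaField());     // prints "Derived field"
        System.out.println(d.viaSuper());     // prints "Base field"
    }
}
```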

Hiding is generally bad style because a human reading the code must check all levels of the hierarchy to determine which variable is being used. Yet hiding is permitted in order to make local code robust. If hiding outer variables were not allowed, adding a new field to a class or interface could break existing code in subtypes that used variables of the same name. Scoping is meant as protection for the system as a whole rather than as support for reusing identifier names.

To avoid confusion, hiding is not permitted in nested scopes within a code block. This means that a local variable in a method cannot have the same name as a parameter of that method; that a for loop variable cannot have the same name as a local variable or parameter; and that once there is a local variable called, say, über, you cannot create a new, different variable with the name über in a nested block.

{
    int über = 0;
    {
        int über = 2; // INVALID: already defined
        // ...
    }
}

However, you can have different (non-nested) for loops in the same block, or different (non-nested) blocks in the same method, that do declare variables with the same name.
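For contrast with the invalid example, here is a sketch of the permitted forms: two sibling for loops and two sibling blocks that reuse a name. The class ReuseNamesDemo and the method total are hypothetical.

```java
public class ReuseNamesDemo {
    static int total(int[] first, int[] second) {
        int sum = 0;
        for (int i = 0; i < first.length; i++)
            sum += first[i];
        for (int i = 0; i < second.length; i++)  // valid: the earlier i is out of scope
            sum += second[i];
        {
            int bonus = 1;
            sum += bonus;
        }
        {
            int bonus = 2;                       // valid: the blocks are not nested
            sum += bonus;
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(total(new int[] { 1, 2 }, new int[] { 3 })); // prints 9
    }
}
```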

If a name appears in a place where a type name is expected, then the different type scopes must be searched for that name. Type scopes are defined by packages. The search order is as follows:

  1. The current type including inherited types.

  2. A nested type of the current type.

  3. Explicitly named imported types.

  4. Other types declared in the same package.

  5. Implicitly named imported types.

Again, hiding of type names is possible, but a type can always be explicitly referred to by its fully qualified name, which includes package information, such as java.lang.String. Packages and type imports are discussed in Chapter 18.
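The following sketch shows a type name being hidden and then recovered through its fully qualified name. Declaring a nested type called String is deliberately bad style and is done here only for illustration; the class TypeHidingDemo and the method describe are hypothetical.

```java
public class TypeHidingDemo {
    // Inside TypeHidingDemo, the simple name String now means this nested type,
    // because nested types are searched before implicitly imported types.
    static class String {
        final int length;

        String(int length) {
            this.length = length;
        }
    }

    static java.lang.String describe() {
        String mine = new String(5);                       // the nested type
        java.lang.String text = "length " + mine.length;   // the fully qualified name
        return text;
    }

    // main must also use the fully qualified name to keep the standard signature.
    public static void main(java.lang.String[] args) {
        System.out.println(describe());                    // prints "length 5"
    }
}
```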

 

In order to make an apple pie from scratch, you must first create the universe.

 
 --Carl Sagan, Cosmos


[1] The Java programming language tracks the Unicode standard. See “Further Reading” on page 755 for reference information. The currently supported Unicode version is listed in the documentation of the Character class.

[2] There is a good reason to allow multiple u's. When translating a Unicode file into an ASCII file, you must translate Unicode characters that are outside the ASCII range into an escape sequence. Thus, you would translate இ into \u0b87. When translating back, you make the reverse substitution. But what if the original Unicode source had not contained இ but had used \u0b87 instead? Then the reverse translation would not result in the original source (to the parser, it would be equivalent, but possibly not to the reader of the code). The solution is to have the translator add an extra u when it encounters an existing \uxxxx, and have the reverse translator remove a u and, if there aren't any left, replace the escape sequence with its equivalent Unicode character.

[3] These characters are Comments, Comments, Comments, Comments, Comments , and Comments, respectively.

[4] These are the word “cat” or “kitty” in English, Serbo-Croatian, Russian, Persian, Tamil, and Japanese, respectively.

[5] One is a Cyrillic letter, the other is ASCII. Guess which is which and win a prize.

[6] Type variables are not storage locations and are excluded from this discussion. They apply only to generic type declarations and are discussed in Chapter 11.

[7] In technical terms there is a concept of a variable being “definitely assigned.” The compiler won't allow the use of a local variable unless it can determine that it has been definitely assigned a value.

[8] The range check can often be optimized away when, for example, it can be proved that a loop index variable is always within range, but you are guaranteed that an index will never be used if it is out of range.

[9] Technically, the term hiding is reserved for this last case—when an inherited member is hidden by a locally declared member—and the other situations are referred to as shadowing. This distinction is not significant for this book so we simply refer to “hiding.”
