Chapter 2. Lexical Elements

Java source code consists of words or symbols called lexical elements, or tokens. Java lexical elements include line terminators, whitespace, comments, keywords, identifiers, separators, operators, and literals. These words and symbols are composed of characters from the Unicode character set.

Unicode and ASCII

Maintained by the Unicode Consortium standards organization, Unicode is a universal character set whose first 128 characters are the same as those in the American Standard Code for Information Interchange (ASCII) character set. Unicode provides a unique number for each character, usable across all platforms, programs, and languages. Java SE 9 supports Unicode 8.0.0; Java SE 8 supports Unicode 6.2.0. You can find more information about the Unicode Standard in the online manual.

Tip

Java comments, identifiers, and string literals are not limited to ASCII characters. All other Java input elements are formed from ASCII characters.

The Unicode set version used by a specified version of the Java platform is documented in the Character class of the Java API. The Unicode Character Code Chart for scripts, symbols, and punctuation can be accessed at http://unicode.org/charts/.

Printable ASCII Characters

ASCII reserves code 32 (the space character) and codes 33–126 (letters, digits, punctuation marks, and a few others) for printable characters. Table 2-1 contains the decimal values followed by the corresponding ASCII characters for these codes.

Table 2-1. Printable ASCII characters

32 SP   48 0   64 @   80 P   96 `    112 p
33 !    49 1   65 A   81 Q   97 a    113 q
34 "    50 2   66 B   82 R   98 b    114 r
35 #    51 3   67 C   83 S   99 c    115 s
36 $    52 4   68 D   84 T   100 d   116 t
37 %    53 5   69 E   85 U   101 e   117 u
38 &    54 6   70 F   86 V   102 f   118 v
39 '    55 7   71 G   87 W   103 g   119 w
40 (    56 8   72 H   88 X   104 h   120 x
41 )    57 9   73 I   89 Y   105 i   121 y
42 *    58 :   74 J   90 Z   106 j   122 z
43 +    59 ;   75 K   91 [   107 k   123 {
44 ,    60 <   76 L   92 \   108 l   124 |
45 -    61 =   77 M   93 ]   109 m   125 }
46 .    62 >   78 N   94 ^   110 n   126 ~
47 /    63 ?   79 O   95 _   111 o
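
Because the first 128 Unicode characters match ASCII, the decimal values in Table 2-1 can be checked by casting between int and char. The following is a minimal sketch; the class name AsciiDemo and its method are illustrative only and are not part of the Java API:

```java
// Hypothetical helper class, for illustration only.
final class AsciiDemo {

    // Builds a string of the printable ASCII characters, codes 32 through 126.
    static String printableAscii() {
        StringBuilder sb = new StringBuilder();
        for (int code = 32; code <= 126; code++) {
            sb.append((char) code); // narrowing cast from int to char
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println((int) 'A');   // prints 65
        System.out.println((char) 97);   // prints a
        System.out.println(printableAscii());
    }
}
```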

Nonprintable ASCII Characters

ASCII reserves decimal numbers 0–31 and 127 for control characters. Table 2-2 contains the decimal values followed by the corresponding ASCII characters for these codes.

Table 2-2. Nonprintable ASCII characters

00 NUL   07 BEL   14 SO    21 NAK   28 FS
01 SOH   08 BS    15 SI    22 SYN   29 GS
02 STX   09 HT    16 DLE   23 ETB   30 RS
03 ETX   10 NL    17 DC1   24 CAN   31 US
04 EOT   11 VT    18 DC2   25 EM    127 DEL
05 ENQ   12 NP    19 DC3   26 SUB
06 ACK   13 CR    20 DC4   27 ESC

Tip

ASCII 10 is a newline or linefeed. ASCII 13 is a carriage return.

Compact Strings

The compact strings feature is an optimization that allows for a more space-efficient internal representation of strings: a string that contains only Latin-1 characters is stored with one byte per character rather than two. It is enabled by default in Java 9. If your application mainly uses UTF-16 strings, you can disable the feature with the -XX:-CompactStrings option.

Comments

A single-line comment begins with two forward slashes and ends immediately before the line terminator character:

// Default child's birth year
private Integer childsBirthYear = 1950;

A multiline comment begins with a forward slash immediately followed by an asterisk and ends with an asterisk immediately followed by a forward slash. The single asterisks in between provide a nice formatting convention; they are typically used, but are not required:

  /*
   * The average age of a woman giving birth in the
   * US in 2001 was 24.9 years old. Therefore,
   * we'll use the value of 25 years old as our
   * default.
   */
  private Integer mothersAgeGivingBirth = 25;

A Javadoc comment is processed by the Javadoc tool to generate API documentation in HTML format. A Javadoc comment must begin with a forward slash, immediately followed by two asterisks, and end with an asterisk immediately followed by a forward slash (Oracle’s documentation page provides more information on the Javadoc tool):

/**
 * Genitor birthdate predictor
 *
 * @author Robert J. Liguori
 * @author Gliesian, LLC.
 * @version 0.1.1 09-02-16
 * @since 0.1.0 09-01-16
 */
public class GenitorBirthdatePredictorBean {...}

In Java, comments cannot be nested:

/* This is /* not permissible */ in Java */

Keywords

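Table 2-3 contains the Java 9 keywords. Two of these, const and goto, are reserved but are not used by the Java language. The module-related words in the table (exports, module, provides, requires, to, uses, and with) are restricted keywords: they act as keywords only within a module declaration.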

Tip

Java keywords cannot be used as identifiers in a Java program.

Table 2-3. Java keywords

abstract   enum         module      synchronized
assert     exports      native      this
boolean    extends      new         throw
break      final        package     throws
byte       finally      private     to
case       float        protected   transient
catch      for          provides    try
char       goto         public      uses
class      if           requires    void
const      implements   return      volatile
continue   import       short       while
default    instanceof   static      with
do         int          strictfp    _
double     interface    super
else       long         switch

Tip

Sometimes true, false, and null literals are mistaken for keywords. They are not keywords; they are reserved literals.

Identifiers

A Java identifier is the name that a programmer gives to a class, method, variable, and so on.

Identifiers cannot have the same Unicode character sequence as any keyword, boolean, or null literal.

Java identifiers are made up of Java letters. A Java letter is a character for which Character.isJavaIdentifierStart(int) returns true. Java letters from the ASCII character set are limited to the dollar sign ($), upper- and lowercase letters, and the underscore symbol (_). Note that as of Java 9, (_) is a keyword and may not be used alone as an identifier.

Digits are also allowed in identifiers after the first character:

// Valid identifier examples
class GedcomBean {
  private File uploadedFile;  // uppercase and
                              // lowercase
  private File _file; // leading underscore
  private File $file; // leading $
  private File file1; // non-leading digit
}
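
The rules above can be checked at runtime with Character.isJavaIdentifierStart(int) and Character.isJavaIdentifierPart(int). The sketch below is illustrative: the class IdentifierCheck and the method looksLikeIdentifier are hypothetical names, and the check covers only the character rules, so it does not reject keywords or reserved literals:

```java
// Hypothetical helper class, for illustration only.
final class IdentifierCheck {

    // Returns true if s has the lexical shape of an identifier:
    // a Java letter first, followed by Java letters or digits.
    // Keywords and reserved literals (class, true, null, ...) are NOT rejected here.
    static boolean looksLikeIdentifier(String s) {
        if (s.isEmpty()) {
            return false;
        }
        int first = s.codePointAt(0);
        if (!Character.isJavaIdentifierStart(first)) {
            return false;
        }
        for (int i = Character.charCount(first); i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (!Character.isJavaIdentifierPart(cp)) {
                return false;
            }
            i += Character.charCount(cp);
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(looksLikeIdentifier("_file")); // prints true
        System.out.println(looksLikeIdentifier("file1")); // prints true
        System.out.println(looksLikeIdentifier("1file")); // prints false
    }
}
```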

See Chapter 1 for naming guidelines.

Separators

Several ASCII characters delimit program parts and are used as separators. (), { }, [ ], and < > are used in pairs:

() { } [ ] < > :: : ; , . ->

Table 2-4 cites nomenclature that can be used to reference the different types of bracket separators. The first names mentioned for each bracket are what is typically seen in the Java Language Specification.

Table 2-4. Java bracket separators
Brackets Nomenclature Usage

( )

Parentheses, curved brackets, oval brackets, and round brackets

Adjusts precedence in arithmetic expressions, encloses cast types, and surrounds set of method arguments

{ }

Braces, curly brackets, fancy brackets, squiggly brackets, and squirrelly brackets

Surrounds blocks of code and supports arrays

[ ]

Box brackets, closed brackets, and square brackets

Supports and initializes arrays

< >

Angle brackets, diamond brackets, and chevrons

Encloses generics

Guillemet characters (« »), a.k.a. angle quotes, are used to specify stereotypes in UML.

Operators

Operators perform operations on one, two, or three operands and return a result. Operator types in Java include assignment, arithmetic, comparison, bitwise, increment/decrement, and class/object. Table 2-5 contains the Java operators listed in precedence order (those with the highest precedence at the top of the table), along with a brief description of the operators and their associativity (left to right or right to left).

Table 2-5. Java operators
Precedence Operator Description Association

1

++,--

Postincrement, postdecrement

R → L

2

++,--

Preincrement, predecrement

R → L

+,-

Unary plus, unary minus

R → L

~

Bitwise complement

R → L

!

Boolean NOT

R → L

3

new

Create object

R → L

(type)

Type cast

R → L

4

*,/,%

Multiplication, division, remainder

L → R

5

+,-

Addition, subtraction

L → R

+

String concatenation

L → R

6

<<, >>, >>>

Left shift, right shift, unsigned right shift

L → R

7

<, <=, >, >=

Less than, less than or equal to, greater than, greater than or equal to

L → R

instanceof

Type comparison

L → R

8

==, !=

Value equality and inequality

L → R

==, !=

Reference equality and inequality

L → R

9

&

Boolean AND

L → R

&

Bitwise AND

L → R

10

^

Boolean exclusive OR (XOR)

L → R

^

Bitwise exclusive OR (XOR)

L → R

11

|

Boolean inclusive OR

L → R

|

Bitwise inclusive OR

L → R

12

&&

Logical AND (a.k.a. conditional AND)

L → R

13

||

Logical OR (a.k.a. conditional OR)

L → R

14

?:

Conditional ternary operator

R → L

15

=, +=, -=, *=, /=, %=, &=, ^=, |=, <<=, >>=, >>>=

Assignment operators

R → L
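
A few small expressions can make the effect of precedence and associativity concrete. The following is a sketch only; PrecedenceDemo and its method names are made up for this illustration:

```java
// Hypothetical demo class, for illustration only.
final class PrecedenceDemo {

    // Multiplication binds tighter than addition: 2 + (3 * 4).
    static int multiplicationFirst() {
        return 2 + 3 * 4;
    }

    // Parentheses override the default precedence: (2 + 3) * 4.
    static int parenthesesFirst() {
        return (2 + 3) * 4;
    }

    // Division associates left to right: (100 / 10) / 5.
    static int leftToRightDivision() {
        return 100 / 10 / 5;
    }

    // Assignment associates right to left: a = (b = 7).
    static int rightToLeftAssignment() {
        int a;
        int b;
        a = b = 7;
        return a + b;
    }

    // + is evaluated left to right: (1 + 2) is arithmetic, then "3" is concatenated.
    static String additionThenConcatenation() {
        return 1 + 2 + "3";
    }

    public static void main(String[] args) {
        System.out.println(multiplicationFirst());       // prints 14
        System.out.println(parenthesesFirst());          // prints 20
        System.out.println(leftToRightDivision());       // prints 2
        System.out.println(rightToLeftAssignment());     // prints 14
        System.out.println(additionThenConcatenation()); // prints 33
    }
}
```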

Literals

Literals are source code representations of values. As of Java SE 7, underscores are allowed in numeric literals to enhance the readability of the code. Underscores may be placed only between digits; they do not affect the value of the literal.

For more information on primitive type literals, see “Literals for Primitive Types” in Chapter 3.

Boolean Literals

Boolean literals are expressed as either true or false:

boolean isFullRelation = true;
boolean isHalfRelation = Boolean.valueOf(false); // unboxed
boolean isEndogamyPresent = false;

Character Literals

A character literal is either a single character or an escape sequence contained within single quotes. Line terminators are not allowed:

char charValue1 = 'a';
// An apostrophe
Character charValue2 = Character.valueOf('\'');

Integer Literals

Integer types (byte, short, int, and long) can be expressed in decimal, hexadecimal, octal, and binary. By default, integer literals are of type int:

int intValue1 = 34567, intValue2 = 1_000_000;

Decimal integers contain any number of ASCII digits, zero through nine, and represent positive numbers:

Integer integerValue1 = Integer.valueOf(100);

Prefixing the decimal with the unary negation operator can form a negative decimal:

public static final int INT_VALUE = -200;

Hexadecimal literals begin with 0x or 0X, followed by the ASCII digits 0 through 9 and the letters a through f (or A through F). Java is not case-sensitive when it comes to hexadecimal literals.

Hex numbers can represent positive and negative integers and zero:

int intValue3 = 0X64; // 100 decimal from hex

Octal literals begin with a zero followed by one or more ASCII digits zero through seven:

int intValue4 = 0144; // 100 decimal from octal

Binary literals are expressed using the prefix 0b or 0B followed by zeros and ones:

char msgValue1 = 0b01001111; // O
char msgValue2 = 0B01001011; // K
char msgValue3 = 0B0010_0001; // !

To define an integer as type long, suffix it with an ASCII letter L (preferred and more readable) or l:

long longValue = 100L;
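
To tie the four notations together, the sketch below writes the same value, 100, in each base and adds two edge cases: a hexadecimal literal that denotes a negative int, and a value that needs the L suffix. The class and constant names are illustrative only:

```java
// Hypothetical demo class, for illustration only.
final class IntegerLiteralDemo {

    static final int DECIMAL = 100;
    static final int HEX = 0x64;            // 6 * 16 + 4
    static final int OCTAL = 0144;          // 1 * 64 + 4 * 8 + 4
    static final int BINARY = 0b0110_0100;  // 64 + 32 + 4, with an underscore for readability

    // All 32 bits set: -1 in two's complement.
    static final int HEX_NEGATIVE = 0xFFFF_FFFF;

    // Too large for an int, so the L suffix is required.
    static final long LARGE = 3_000_000_000L;

    // True when all four notations denote the same value.
    static boolean allEqual() {
        return DECIMAL == HEX && HEX == OCTAL && OCTAL == BINARY;
    }

    public static void main(String[] args) {
        System.out.println(allEqual());   // prints true
        System.out.println(HEX_NEGATIVE); // prints -1
        System.out.println(LARGE);        // prints 3000000000
    }
}
```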

Floating-Point Literals

A valid floating-point literal requires at least one digit, in either the whole-number or the fractional part, together with at least one of the following: a decimal point, an exponent, or a type suffix. The exponent is prefaced by an e or E. A decimal point and fractional part are not required when an exponent or type suffix is present.

By default, a floating-point literal is of type double, a double-precision floating-point value of eight bytes. A float is four bytes. Type suffixes for doubles are d or D; suffixes for floats are f or F:

[whole-number].[fractional_part][e|E exp][f|F|d|D]

float floatValue1 = 9.15f, floatValue2 = 1_168f;
Float floatValue3 = Float.valueOf(20F); // new Float(20F) is deprecated as of Java 9
double doubleValue1 = 3.12;
Double doubleValue2 = Double.valueOf(1e058);
float expValue1 = 10.0e2f, expValue2=10.0E3f;
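
As a rough illustration of which parts of the general form may be omitted, each literal below has at least one digit plus a decimal point, an exponent, or a type suffix. The class and constant names are hypothetical:

```java
// Hypothetical demo class, for illustration only.
final class FloatingPointLiteralDemo {

    static final double WHOLE_AND_FRACTION = 1.5;
    static final double FRACTION_ONLY = .5;      // no whole-number part
    static final double WHOLE_WITH_POINT = 2.;   // no fractional part
    static final double EXPONENT_ONLY = 1e3;     // no decimal point; the type is still double
    static final double DOUBLE_SUFFIX_ONLY = 4d; // the suffix alone makes it floating point
    static final float FLOAT_SUFFIX_ONLY = 4f;

    public static void main(String[] args) {
        System.out.println(FRACTION_ONLY);      // prints 0.5
        System.out.println(WHOLE_WITH_POINT);   // prints 2.0
        System.out.println(EXPONENT_ONLY);      // prints 1000.0
        System.out.println(DOUBLE_SUFFIX_ONLY); // prints 4.0
    }
}
```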

String Literals

String literals contain zero or more characters, including escape sequences, enclosed in a set of double quotes. String literals cannot contain the Unicode escapes \u000a and \u000d for line terminators; use \n and \r instead. Strings are immutable:

String stringValue1 = new String("Valid literal.");
String stringValue2 = "Valid.\nOn new line.";
String stringValue3 = "Joins str" + "ings";
String stringValue4 = "\"Escape Sequences\"\r";

There is a pool of strings associated with class String. Initially, the pool is empty. Literal strings and string-valued constant expressions are interned in the pool and added to the pool only once.

The following example shows how literals are added to and used in the pool:

// Adds String "thisString" to the pool
String stringValue5 = "thisString";
// Uses String "thisString" from the pool
String stringValue6 = "thisString";

A string can be added to the pool (if it does not already exist in the pool) by calling the intern() method on the string. The intern() method returns a string, which is either a reference to the new string that was added to the pool or a reference to the existing string:

String stringValue7 = new String("thatString");
String stringValue8 = stringValue7.intern();
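
The effect of the pool can be observed by comparing references with == rather than equals(). The following sketch is for illustration only (comparing strings with == is not recommended in application code), and the class and method names are hypothetical:

```java
// Hypothetical demo class, for illustration only.
final class StringPoolDemo {

    // Two identical literals refer to the same pooled instance.
    static boolean literalsShareInstance() {
        String a = "thisString";
        String b = "thisString";
        return a == b;
    }

    // new String(...) creates a distinct object with equal contents.
    static boolean newStringIsDistinctButEqual() {
        String literal = "thatString";
        String copy = new String("thatString");
        return literal != copy && literal.equals(copy);
    }

    // intern() returns the pooled instance, which is the literal's instance.
    static boolean internReturnsPooledInstance() {
        String copy = new String("thatString");
        return copy.intern() == "thatString";
    }

    public static void main(String[] args) {
        System.out.println(literalsShareInstance());       // prints true
        System.out.println(newStringIsDistinctButEqual()); // prints true
        System.out.println(internReturnsPooledInstance()); // prints true
    }
}
```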

Null Literals

The null literal is of type null and can be applied to reference types. It does not apply to primitive types:

String n = null;

Escape Sequences

Table 2-6 provides the set of escape sequences in Java.

Table 2-6. Character and string literal escape sequences
Name              Sequence   Decimal   Unicode
Backspace         \b         8         \u0008
Horizontal tab    \t         9         \u0009
Line feed         \n         10        \u000A
Form feed         \f         12        \u000C
Carriage return   \r         13        \u000D
Double quote      \"         34        \u0022
Single quote      \'         39        \u0027
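
The decimal values in Table 2-6 can be confirmed by widening each escaped character literal to an int. This is a minimal sketch, and the class name EscapeSequenceDemo is an illustrative choice:

```java
// Hypothetical demo class, for illustration only.
final class EscapeSequenceDemo {

    // Decimal codes of the escape sequences, in the order
    // backspace, tab, line feed, form feed, carriage return,
    // double quote, single quote.
    static int[] codes() {
        return new int[] {'\b', '\t', '\n', '\f', '\r', '\"', '\''};
    }

    public static void main(String[] args) {
        // prints [8, 9, 10, 12, 13, 34, 39]
        System.out.println(java.util.Arrays.toString(codes()));
    }
}
```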

Different line terminators are used for different platforms to achieve a newline (see Table 2-7). The println() method, which includes a line break, is a better solution than hardcoding \n and \r when used appropriately.

Table 2-7. Newline variations
Operating system                                                     Newline
POSIX-compliant operating systems (e.g., Solaris, Linux) and macOS   LF (\n)
Microsoft Windows                                                    CR+LF (\r\n)
Mac OS up to version 9                                               CR (\r)
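
Besides println(), the platform line terminator is available through System.lineSeparator() and through the %n format specifier. The sketch below assumes nothing beyond the standard library; the class and method names are hypothetical:

```java
// Hypothetical demo class, for illustration only.
final class LineSeparatorDemo {

    // Joins two lines with the line terminator of the current platform.
    static String withLineSeparator() {
        return "first" + System.lineSeparator() + "second";
    }

    // %n expands to the same platform-specific line terminator.
    static String withFormatSpecifier() {
        return String.format("first%nsecond");
    }

    public static void main(String[] args) {
        // prints true
        System.out.println(withLineSeparator().equals(withFormatSpecifier()));
    }
}
```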

Unicode Currency Symbols

Unicode currency symbols are present in the range of \u20A0–\u20CF (8352–8399). See Table 2-8 for examples.

Table 2-8. Currency symbols within range
Name                Symbol   Decimal   Unicode
Franc sign          ₣        8355      \u20A3
Lira sign           ₤        8356      \u20A4
Mill sign           ₥        8357      \u20A5
Rupee sign          ₨        8360      \u20A8
Dong sign           ₫        8363      \u20AB
Euro sign           €        8364      \u20AC
Drachma sign        ₯        8367      \u20AF
German penny sign   ₰        8368      \u20B0

A number of currency symbols exist outside of the designated currency range. See Table 2-9 for examples.

Table 2-9. Currency symbols outside of range
Name                      Symbol   Decimal   Unicode
Dollar sign               $        36        \u0024
Cent sign                 ¢        162       \u00A2
Pound sign                £        163       \u00A3
Currency sign             ¤        164       \u00A4
Yen sign                  ¥        165       \u00A5
Latin small f with hook   ƒ        402       \u0192
Bengali rupee mark        ৲        2546      \u09F2
Bengali rupee sign        ৳        2547      \u09F3
Gujarati rupee sign       ૱        2801      \u0AF1
Tamil rupee sign          ௹        3065      \u0BF9
Thai symbol baht          ฿        3647      \u0E3F
Script capital M          ℳ        8499      \u2133
CJK unified ideograph 1   元       20803     \u5143
CJK unified ideograph 2   円       20870     \u5186
CJK unified ideograph 3   圆       22278     \u5706
CJK unified ideograph 4   圓       22291     \u5713
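
One way to test whether a character is classified by Unicode as a currency symbol is to compare Character.getType(int) against Character.CURRENCY_SYMBOL. The sketch below uses a hypothetical class and method name. Note that some characters used for currencies, such as the CJK ideographs, are classified as letters rather than currency symbols, so this check does not recognize them:

```java
// Hypothetical demo class, for illustration only.
final class CurrencySymbolDemo {

    // True if the code point has the Unicode general category "Symbol, currency".
    static boolean isCurrencySymbol(int codePoint) {
        return Character.getType(codePoint) == Character.CURRENCY_SYMBOL;
    }

    public static void main(String[] args) {
        System.out.println(isCurrencySymbol(0x20AC)); // euro sign: prints true
        System.out.println(isCurrencySymbol('$'));    // dollar sign: prints true
        System.out.println(isCurrencySymbol(0x5143)); // CJK ideograph: prints false
    }
}
```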
