Java source code consists of words or symbols called lexical elements, or tokens. Java lexical elements include line terminators, whitespace, comments, keywords, identifiers, separators, operators, and literals. These words and symbols are written using the Unicode character set.
Maintained by the Unicode Consortium standards organization, Unicode is a universal character set whose first 128 characters are the same as those in the American Standard Code for Information Interchange (ASCII) character set. Unicode provides a unique number for each character, usable across all platforms, programs, and languages. Java SE 9 supports Unicode 8.0.0, and Java SE 8 supports Unicode 6.2.0. You can find more information about the Unicode Standard in the online manual.
Java comments, identifiers, and string literals are not limited to ASCII characters. All other Java input elements are formed from ASCII characters.
The Unicode set version used by a specified version of the Java platform is documented in the Character class of the Java API. The Unicode Character Code Chart for scripts, symbols, and punctuation can be accessed at http://unicode.org/charts/.
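As a minimal sketch of the points above, the following program checks that the first 128 Unicode code points encode to the same values in US-ASCII, and it uses a Unicode escape inside both an identifier and a string literal. The class and member names are hypothetical and chosen only for illustration.

```java
import java.nio.charset.StandardCharsets;

final class UnicodeDemo {

    // An identifier and a string literal that both contain U+00E9 (e with acute),
    // written with a Unicode escape so the source file itself stays pure ASCII.
    static final String caf\u00e9 = "caf\u00e9";

    // Accessor with an ASCII-only name, for callers that prefer to avoid escapes.
    static String word() {
        return caf\u00e9;
    }

    // True if every code point 0..127 encodes to one US-ASCII byte of the same value.
    static boolean first128MatchAscii() {
        for (int cp = 0; cp < 128; cp++) {
            byte[] bytes = String.valueOf((char) cp).getBytes(StandardCharsets.US_ASCII);
            if (bytes.length != 1 || bytes[0] != cp) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(first128MatchAscii());
        System.out.println(word().length());
        System.out.println((int) word().charAt(3));
    }
}
```

Writing the non-ASCII character as an escape is a design choice for portability here; the same identifier could be typed directly if the source file encoding is known to the compiler.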
ASCII reserves code 32 (spaces) and codes 33–126 (letters, digits, punctuation marks, and a few others) for printable characters. Table 2-1 contains the decimal values followed by the corresponding ASCII characters for these codes.
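Dec | Char | Dec | Char | Dec | Char | Dec | Char | Dec | Char |
---|---|---|---|---|---|---|---|---|---|
32 | space | 33 | ! | 34 | " | 35 | # | 36 | $ |
37 | % | 38 | & | 39 | ' | 40 | ( | 41 | ) |
42 | * | 43 | + | 44 | , | 45 | - | 46 | . |
47 | / | 48 | 0 | 49 | 1 | 50 | 2 | 51 | 3 |
52 | 4 | 53 | 5 | 54 | 6 | 55 | 7 | 56 | 8 |
57 | 9 | 58 | : | 59 | ; | 60 | < | 61 | = |
62 | > | 63 | ? | 64 | @ | 65 | A | 66 | B |
67 | C | 68 | D | 69 | E | 70 | F | 71 | G |
72 | H | 73 | I | 74 | J | 75 | K | 76 | L |
77 | M | 78 | N | 79 | O | 80 | P | 81 | Q |
82 | R | 83 | S | 84 | T | 85 | U | 86 | V |
87 | W | 88 | X | 89 | Y | 90 | Z | 91 | [ |
92 | \ | 93 | ] | 94 | ^ | 95 | _ | 96 | ` |
97 | a | 98 | b | 99 | c | 100 | d | 101 | e |
102 | f | 103 | g | 104 | h | 105 | i | 106 | j |
107 | k | 108 | l | 109 | m | 110 | n | 111 | o |
112 | p | 113 | q | 114 | r | 115 | s | 116 | t |
117 | u | 118 | v | 119 | w | 120 | x | 121 | y |
122 | z | 123 | { | 124 | \| | 125 | } | 126 | ~ |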
ASCII reserves decimal numbers 0–31 and 127 for control characters. Table 2-2 contains the decimal values followed by the corresponding ASCII characters for these codes.
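Dec | Char | Dec | Char | Dec | Char |
---|---|---|---|---|---|
0 | NUL | 1 | SOH | 2 | STX |
3 | ETX | 4 | EOT | 5 | ENQ |
6 | ACK | 7 | BEL | 8 | BS |
9 | HT | 10 | LF | 11 | VT |
12 | FF | 13 | CR | 14 | SO |
15 | SI | 16 | DLE | 17 | DC1 |
18 | DC2 | 19 | DC3 | 20 | DC4 |
21 | NAK | 22 | SYN | 23 | ETB |
24 | CAN | 25 | EM | 26 | SUB |
27 | ESC | 28 | FS | 29 | GS |
30 | RS | 31 | US | 127 | DEL |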
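The two ASCII ranges described above can be checked from Java. This is a small sketch with hypothetical names; it relies on Character.isISOControl(int), which reports the control codes 0–31 and 127 (and, outside ASCII, 128–159).

```java
final class AsciiRanges {

    // Counts the ASCII codes (0..127) that Java reports as ISO control characters.
    static int countControl() {
        int count = 0;
        for (int code = 0; code <= 127; code++) {
            if (Character.isISOControl(code)) {
                count++;
            }
        }
        return count;
    }

    // True for code 32 (space) and codes 33..126, the printable ASCII range.
    static boolean isPrintableAscii(int code) {
        return code >= 32 && code <= 126;
    }

    public static void main(String[] args) {
        System.out.println("control codes:   " + countControl());
        System.out.println("printable codes: " + (128 - countControl()));
        System.out.println("'A' is code " + (int) 'A');
    }
}
```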
A single-line comment begins with two forward slashes and ends immediately before the line terminator character:
// Default child's birth year
private Integer childsBirthYear = 1950;
A multiline comment begins with a forward slash immediately followed by an asterisk and ends with an asterisk immediately followed by a forward slash. The single asterisks in between provide a nice formatting convention; they are typically used, but are not required:
/*
 * The average age of a woman giving birth in the
 * US in 2001 was 24.9 years old. Therefore,
 * we'll use the value of 25 years old as our
 * default.
 */
private Integer mothersAgeGivingBirth = 25;
A Javadoc comment is processed by the Javadoc tool to generate API documentation in HTML format. A Javadoc comment must begin with a forward slash, immediately followed by two asterisks, and end with an asterisk immediately followed by a forward slash (Oracle’s documentation page provides more information on the Javadoc tool):
/**
 * Genitor birthdate predictor
 *
 * @author Robert J. Liguori
 * @author Gliesian, LLC.
 * @version 0.1.1 09-02-16
 * @since 0.1.0 09-01-16
 */
public class GenitorBirthdatePredictorBean {...}
In Java, comments cannot be nested:
/* This is /* not permissible */ in Java */
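The three comment styles can be combined in one compilable unit. The sketch below uses a hypothetical helper class; it also shows that comment delimiters have no special meaning inside another comment, which is why a multiline comment ends at the first closing delimiter.

```java
/**
 * Hypothetical helper used only to illustrate the three comment styles.
 */
final class CommentStyles {

    /**
     * Returns the estimated birth year of a parent.
     *
     * @param childsBirthYear the child's birth year
     * @param ageAtBirth      the parent's age when the child was born
     * @return the parent's estimated birth year
     */
    static int parentBirthYear(int childsBirthYear, int ageAtBirth) {
        // A single-line comment may contain /* and */ without any effect.
        /* A multiline comment may contain // and even /* but it ends
           at the first closing delimiter, so it cannot be nested. */
        return childsBirthYear - ageAtBirth;
    }

    public static void main(String[] args) {
        System.out.println(parentBirthYear(1950, 25)); // prints 1925
    }
}
```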
Table 2-3 contains the Java 9 keywords. Two of these, the const and goto keywords, are reserved but are not used by the Java language.
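abstract | assert | boolean |
break | byte | case |
catch | char | class |
const | continue | default |
do | double | else |
enum | extends | final |
finally | float | for |
goto | if | implements |
import | instanceof | int |
interface | long | native |
new | package | private |
protected | public | return |
short | static | strictfp |
super | switch | synchronized |
this | throw | throws |
transient | try | void |
volatile | while | _ |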
Sometimes true, false, and null literals are mistaken for keywords. They are not keywords; they are reserved literals.
A Java identifier is the name that a programmer gives to a class, method, variable, and so on.
Identifiers cannot have the same Unicode character sequence as any keyword, boolean literal, or null literal.
Java identifiers are made up of Java letters. A Java letter is a character for which Character.isJavaIdentifierStart(int) returns true. Java letters from the ASCII character set are limited to the dollar sign ($), upper- and lowercase letters, and the underscore symbol (_). Note that as of Java 9, the underscore (_) is a keyword and may not be used alone as an identifier.
Digits are also allowed in identifiers after the first character:
// Valid identifier examples
class GedcomBean {
  private File uploadedFile; // uppercase and lowercase
  private File _file;        // leading underscore
  private File $file;        // leading $
  private File file1;        // non-leading digit
}
See Chapter 1 for naming guidelines.
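The identifier rules above can be approximated in code. The following is a sketch under stated assumptions: the class and method names are hypothetical, the character checks use Character.isJavaIdentifierStart(int) and Character.isJavaIdentifierPart(int), and the keyword test uses javax.lang.model.SourceVersion.isKeyword, which also reports the boolean and null literals.

```java
import javax.lang.model.SourceVersion;

final class IdentifierCheck {

    // True if every character is allowed at its position in a Java identifier.
    static boolean hasIdentifierShape(String name) {
        if (name.isEmpty() || !Character.isJavaIdentifierStart(name.codePointAt(0))) {
            return false;
        }
        for (int i = Character.charCount(name.codePointAt(0)); i < name.length(); ) {
            int cp = name.codePointAt(i);
            if (!Character.isJavaIdentifierPart(cp)) {
                return false;
            }
            i += Character.charCount(cp);
        }
        return true;
    }

    // True if the name has identifier shape and is not a keyword or reserved literal.
    static boolean isUsableIdentifier(String name) {
        return hasIdentifierShape(name) && !SourceVersion.isKeyword(name);
    }

    public static void main(String[] args) {
        for (String name : new String[] {"uploadedFile", "_file", "$file", "file1", "1file", "goto", "null"}) {
            System.out.println(name + " -> " + isUsableIdentifier(name));
        }
    }
}
```

Iterating by code point rather than by char is a deliberate choice so that supplementary characters are tested as whole characters.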
Several ASCII characters delimit program parts and are used as separators. ( ), { }, [ ], and < > are used in pairs:
( ) { } [ ] < > :: : ; , . ->
Table 2-4 cites nomenclature that can be used to reference the different types of bracket separators. The first names mentioned for each bracket are what is typically seen in the Java Language Specification.
Brackets | Nomenclature | Usage |
---|---|---|
( ) | Parentheses, curved brackets, oval brackets, and round brackets | Adjusts precedence in arithmetic expressions, encloses cast types, and surrounds set of method arguments |
{ } | Braces, curly brackets, fancy brackets, squiggly brackets, and squirrelly brackets | Surrounds blocks of code and supports arrays |
[ ] | Box brackets, closed brackets, and square brackets | Supports and initializes arrays |
< > | Angle brackets, diamond brackets, and chevrons | Encloses generics |
Guillemet characters (<< >>), a.k.a. angle quotes, are used to specify stereotypes in UML.
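To see the separators in context, here is a short sketch that uses each of the symbols listed above at least once. The class and method names are hypothetical; the trailing comments point out which symbol each line illustrates.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

final class SeparatorTour {

    static List<String> shout(String[] words) {               // ( ) [ ] < > { }
        Function<String, String> upper = String::toUpperCase; // :: method reference
        Function<String, String> bang = s -> s + "!";         // -> in a lambda
        List<String> result = new ArrayList<>();              // < > as the diamond
        for (String word : words) {                           // : in an enhanced for
            result.add(upper.andThen(bang).apply(word));      // . for member access
        }
        return result;                                        // ; ends a statement
    }

    public static void main(String[] args) {
        String[] words = {"braces", "brackets"};              // { } and , in an array initializer
        System.out.println(shout(words));
    }
}
```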
Operators perform operations on one, two, or three operands and return a result. Operator types in Java include assignment, arithmetic, comparison, bitwise, increment/decrement, and class/object. Table 2-5 contains the Java operators listed in precedence order (those with the highest precedence at the top of the table), along with a brief description of the operators and their associativity (left to right or right to left).
Precedence | Operator | Description | Association |
---|---|---|---|
1 | ++, -- | Postincrement, postdecrement | R → L |
2 | ++, -- | Preincrement, predecrement | R → L |
2 | +, - | Unary plus, unary minus | R → L |
2 | ~ | Bitwise complement | R → L |
2 | ! | Boolean NOT | R → L |
3 | new | Create object | R → L |
3 | (type) | Type cast | R → L |
4 | *, /, % | Multiplication, division, remainder | L → R |
5 | +, - | Addition, subtraction | L → R |
5 | + | String concatenation | L → R |
6 | <<, >>, >>> | Left shift, right shift, unsigned right shift | L → R |
7 | <, <=, >, >= | Less than, less than or equal to, greater than, greater than or equal to | L → R |
7 | instanceof | Type comparison | L → R |
8 | ==, != | Value equality and inequality | L → R |
8 | ==, != | Reference equality and inequality | L → R |
9 | & | Boolean AND | L → R |
9 | & | Bitwise AND | L → R |
10 | ^ | Boolean exclusive OR (XOR) | L → R |
10 | ^ | Bitwise exclusive OR (XOR) | L → R |
11 | \| | Boolean inclusive OR | L → R |
11 | \| | Bitwise inclusive OR | L → R |
12 | && | Logical AND (a.k.a. conditional AND) | L → R |
13 | \|\| | Logical OR (a.k.a. conditional OR) | L → R |
14 | ?: | Conditional ternary operator | R → L |
15 | =, +=, -=, *=, /=, %=, &=, ^=, \|=, <<=, >>=, >>>= | Assignment operators | R → L |
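A few expressions make precedence and associativity concrete. This is an illustrative sketch with hypothetical method names; each method isolates one rule so its result can be checked on its own.

```java
final class PrecedenceDemo {

    // Multiplication binds tighter than addition: 2 + (3 * 4).
    static int multiplyFirst() {
        return 2 + 3 * 4;
    }

    // Parentheses override the default precedence.
    static int parenthesesFirst() {
        return (2 + 3) * 4;
    }

    // + groups left to right: (1 + 2) is arithmetic, the rest is concatenation.
    static String concatLeftToRight() {
        return 1 + 2 + "a" + 1 + 2;
    }

    // Assignment groups right to left: a = (b = 5).
    static int chainedAssignment() {
        int a, b;
        a = b = 5;
        return a + b;
    }

    // Parsed as n < 0 ? "negative" : (n == 0 ? "zero" : "positive").
    static String sign(int n) {
        return n < 0 ? "negative" : n == 0 ? "zero" : "positive";
    }

    // >> keeps the sign bit, >>> shifts in zeros.
    static int signedShift() {
        return -8 >> 1;
    }

    static int unsignedShift() {
        return -8 >>> 28;
    }

    public static void main(String[] args) {
        System.out.println(multiplyFirst() + " " + parenthesesFirst());
        System.out.println(concatLeftToRight());
        System.out.println(chainedAssignment());
        System.out.println(sign(-3) + " " + sign(0) + " " + sign(7));
        System.out.println(signedShift() + " " + unsignedShift());
    }
}
```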
Literals are source code representations of values. As of Java SE 7, underscores are allowed in numeric literals to enhance readability of the code. The underscores may only be placed between digits and are ignored by the compiler.
For more information on primitive type literals, see “Literals for Primitive Types” in Chapter 3.
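The underscore rule can be illustrated with a few constants. The names below are hypothetical; the commented-out lines show placements that the compiler rejects because the underscore is not between two digits.

```java
final class UnderscoreLiterals {

    static final int MILLION = 1_000_000;
    static final long CARD_NUMBER = 1234_5678_9012_3456L;
    static final int HEX_MASK = 0xFF_FF;
    static final int BIT_PATTERN = 0b1010_1010;
    static final double PI_APPROX = 3.141_592;

    // Not allowed (each line would fail to compile):
    //   int a = _100;       // parsed as an identifier, not a literal
    //   int b = 100_;       // trailing underscore
    //   double c = 3_.14;   // underscore next to the decimal point
    //   long d = 100_L;     // underscore before the type suffix

    public static void main(String[] args) {
        System.out.println(MILLION);
        System.out.println(CARD_NUMBER);
        System.out.println(HEX_MASK);
        System.out.println(BIT_PATTERN);
        System.out.println(PI_APPROX);
    }
}
```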
Integer types (byte, short, int, and long) can be expressed in decimal, hexadecimal, octal, and binary. By default, integer literals are of type int:
int intValue1 = 34567, intValue2 = 1_000_000;
Decimal integers contain any number of ASCII digits, zero through nine, and represent positive numbers:
Integer integerValue1 = Integer.valueOf(100);
Prefixing the decimal with the unary minus operator forms a negative decimal:
public static final int INT_VALUE = -200;
Hexadecimal literals begin with 0x or 0X, followed by the ASCII digits 0 through 9 and the letters a through f (or A through F). Java is not case-sensitive when it comes to hexadecimal literals.
Hex numbers can represent positive and negative integers and zero:
int intValue3 = 0X64; // 100 decimal from hex
Octal literals begin with a zero followed by one or more ASCII digits zero through seven:
int intValue4 = 0144; // 100 decimal from octal
Binary literals are expressed using the prefix 0b or 0B followed by zeros and ones:
char msgValue1 = 0b01001111;  // O
char msgValue2 = 0B01001011;  // K
char msgValue3 = 0B0010_0001; // !
To define an integer as type long, suffix it with an ASCII letter L (preferred and more readable) or l:
long longValue = 100L;
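The four integer notations can be compared side by side. In this sketch (hypothetical names), the same value 100 is written in each base, a hexadecimal literal with the high bit set yields a negative int, and a value just beyond the int range shows why the L suffix is needed.

```java
final class IntegerLiteralBases {

    static final int DECIMAL = 100;
    static final int HEX = 0x64;
    static final int OCTAL = 0144;
    static final int BINARY = 0b0110_0100;

    // All 32 bits set: a hexadecimal literal can denote a negative int.
    static final int HEX_MINUS_ONE = 0xFFFF_FFFF;

    // One more than Integer.MAX_VALUE; without the L suffix this does not compile.
    static final long BEYOND_INT = 2_147_483_648L;

    // A binary literal that fits in a char can be assigned to one directly.
    static final char LETTER_O = 0b01001111;

    public static void main(String[] args) {
        System.out.println(DECIMAL + " " + HEX + " " + OCTAL + " " + BINARY);
        System.out.println(HEX_MINUS_ONE);
        System.out.println(BEYOND_INT);
        System.out.println(LETTER_O);
    }
}
```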
A valid floating-point literal consists of a whole number and/or a fractional part, together with a decimal point, an exponent, or a type suffix. An exponent, when present, is prefaced by an e or E. The fractional part and decimal point are not required when an exponent or type suffix is applied.
By default, a floating-point literal is of type double, a double-precision floating-point value of eight bytes. A float is four bytes. Type suffixes for doubles are d or D; suffixes for floats are f or F:
[whole-number].[fractional_part][e|E exp][f|F|d|D]
float floatValue1 = 9.15f, floatValue2 = 1_168f;
Float floatValue3 = new Float(20F);
double doubleValue1 = 3.12;
Double doubleValue2 = Double.valueOf(1e058);
float expValue1 = 10.0e2f, expValue2 = 10.0E3f;
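The optional parts of a floating-point literal can be tried out one at a time. The sketch below uses hypothetical names; it also shows that a float and a double written with the same digits are not necessarily equal, because each is rounded to its own precision.

```java
final class FloatingLiterals {

    static final double NO_FRACTION = 5.;     // whole number and decimal point only
    static final double NO_WHOLE = .5;        // fractional part only
    static final double EXPONENT_ONLY = 1e2;  // exponent, no decimal point
    static final double SUFFIX_ONLY = 5d;     // type suffix, no decimal point
    static final float FLOAT_SUFFIX = 5f;     // f makes the literal a float
    static final float FLOAT_EXPONENT = 10.0e2f;

    // 0.1f is rounded to float precision, so widening it does not give the double 0.1.
    static boolean floatAndDoubleDiffer() {
        return (double) 0.1f != 0.1;
    }

    public static void main(String[] args) {
        System.out.println(NO_FRACTION + " " + NO_WHOLE + " " + EXPONENT_ONLY);
        System.out.println(SUFFIX_ONLY + " " + FLOAT_SUFFIX + " " + FLOAT_EXPONENT);
        System.out.println(Float.BYTES + " " + Double.BYTES);
        System.out.println(floatAndDoubleDiffer());
    }
}
```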
String literals contain zero or more characters, including escape sequences, enclosed in a set of double quotes. String literals cannot contain Unicode \u000a and \u000d for line terminators; use \n and \r instead. Strings are immutable:
String stringValue1 = new String("Valid literal.");
String stringValue2 = "Valid.\nOn new line.";
String stringValue3 = "Joins str" + "ings";
String stringValue4 = "\"Escape Sequences\"\r";
There is a pool of strings associated with class String. Initially, the pool is empty. Literal strings and string-valued constant expressions are interned in the pool and added to the pool only once.
The following example shows how literals are added to and used in the pool:
// Adds String "thisString" to the pool
String stringValue5 = "thisString";
// Uses String "thisString" from the pool
String stringValue6 = "thisString";
A string can be added to the pool (if it does not already exist in the pool) by calling the intern() method on the string. The intern() method returns a string, which is either a reference to the new string that was added to the pool or a reference to the existing string:
String stringValue7 = new String("thatString");
String stringValue8 = stringValue7.intern();
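Reference comparison with == makes the pool behavior observable. The following sketch uses hypothetical method names; == is used on purpose here to compare references, whereas ordinary code should compare string contents with equals().

```java
final class StringPoolDemo {

    // Two identical literals refer to the same pooled instance.
    static boolean literalsShareOneInstance() {
        String a = "thisString";
        String b = "thisString";
        return a == b;
    }

    // A string-valued constant expression is interned at compile time.
    static boolean constantExpressionIsInterned() {
        return ("this" + "String") == "thisString";
    }

    // new String(...) creates a distinct object with equal contents.
    static boolean newStringIsSeparate() {
        String literal = "thatString";
        String copy = new String("thatString");
        return copy != literal && copy.equals(literal);
    }

    // intern() returns the pooled instance.
    static boolean internReturnsPooled() {
        String copy = new String("thatString");
        return copy.intern() == "thatString";
    }

    public static void main(String[] args) {
        System.out.println(literalsShareOneInstance());
        System.out.println(constantExpressionIsInterned());
        System.out.println(newStringIsSeparate());
        System.out.println(internReturnsPooled());
    }
}
```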
Table 2-6 provides the set of escape sequences in Java.
Name | Sequence | Decimal | Unicode |
---|---|---|---|
Backspace | \b | 8 | \u0008 |
Horizontal tab | \t | 9 | \u0009 |
Line feed | \n | 10 | \u000A |
Form feed | \f | 12 | \u000C |
Carriage return | \r | 13 | \u000D |
Double quote | \" | 34 | \u0022 |
Single quote | \' | 39 | \u0027 |
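The decimal values of the escape sequences can be confirmed by widening each character literal to an int. This is a small sketch with hypothetical names.

```java
final class EscapeValues {

    // Decimal values of the escape sequences, in the order
    // backspace, tab, line feed, form feed, carriage return, double quote, single quote.
    static int[] decimalValues() {
        return new int[] {'\b', '\t', '\n', '\f', '\r', '\"', '\''};
    }

    // Escaped quotes count as single characters inside a string literal.
    static int quotedLength() {
        return "\"quoted\"".length();
    }

    public static void main(String[] args) {
        for (int value : decimalValues()) {
            System.out.println(value);
        }
        System.out.println(quotedLength());
    }
}
```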
Different line terminators are used for different platforms to achieve a newline (see Table 2-7). The println() method, which includes a line break, is a better solution than hardcoding \n and \r when used appropriately.
Operating system | Newline |
---|---|
POSIX-compliant operating systems (e.g., Solaris, Linux) and macOS | LF (\n) |
Microsoft Windows | CR+LF (\r\n) |
Mac OS up to version 9 | CR (\r) |
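Besides println(), the platform line terminator can be obtained without hardcoding it. The sketch below (hypothetical names) uses System.lineSeparator() and the %n format specifier, both of which resolve to the terminator of the platform the program runs on.

```java
final class NewlineDemo {

    // %n expands to the platform line separator.
    static String formatted() {
        return String.format("first%nsecond");
    }

    // The same text built with System.lineSeparator().
    static String joined() {
        return String.join(System.lineSeparator(), "first", "second");
    }

    public static void main(String[] args) {
        System.out.println(formatted());
        System.out.println(formatted().equals(joined()));
    }
}
```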
Unicode currency symbols are present in the range of \u20A0–\u20CF (8352–8399). See Table 2-8 for examples.
Name | Symbol | Decimal | Unicode |
---|---|---|---|
Franc sign | ₣ | 8355 | \u20A3 |
Lira sign | ₤ | 8356 | \u20A4 |
Mill sign | ₥ | 8357 | \u20A5 |
Rupee sign | ₨ | 8360 | \u20A8 |
Dong sign | ₫ | 8363 | \u20AB |
Euro sign | € | 8364 | \u20AC |
Drachma sign | ₯ | 8367 | \u20AF |
German penny sign | ₰ | 8368 | \u20B0 |
A number of currency symbols exist outside of the designated currency range. See Table 2-9 for examples.
Name | Symbol | Decimal | Unicode |
---|---|---|---|
Dollar sign | $ | 36 | \u0024 |
Cent sign | ¢ | 162 | \u00A2 |
Pound sign | £ | 163 | \u00A3 |
Currency sign | ¤ | 164 | \u00A4 |
Yen sign | ¥ | 165 | \u00A5 |
Latin small f with hook | ƒ | 402 | \u0192 |
Bengali rupee mark | ৲ | 2546 | \u09F2 |
Bengali rupee sign | ৳ | 2547 | \u09F3 |
Gujarati rupee sign | ૱ | 2801 | \u0AF1 |
Tamil rupee sign | ௹ | 3065 | \u0BF9 |
Thai symbol baht | ฿ | 3647 | \u0E3F |
Script capital M | ℳ | 8499 | \u2133 |
CJK unified ideograph 1 | 元 | 20803 | \u5143 |
CJK unified ideograph 2 | 円 | 20870 | \u5186 |
CJK unified ideograph 3 | 圆 | 22278 | \u5706 |
CJK unified ideograph 4 | 圓 | 22291 | \u5713 |
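Java can report whether a code point is categorized as a currency symbol and whether it lies in the dedicated currency block. The sketch below uses hypothetical names; it relies on Character.getType(int) and Character.UnicodeBlock.of(int). Note that some characters used as currency signs, such as the Latin small f with hook, carry a different Unicode category.

```java
final class CurrencySymbolCheck {

    // True if the code point has the Unicode general category "currency symbol".
    static boolean isCurrencySymbol(int codePoint) {
        return Character.getType(codePoint) == Character.CURRENCY_SYMBOL;
    }

    // True if the code point lies in the dedicated Currency Symbols block.
    static boolean inCurrencyBlock(int codePoint) {
        return Character.UnicodeBlock.of(codePoint) == Character.UnicodeBlock.CURRENCY_SYMBOLS;
    }

    public static void main(String[] args) {
        int[] samples = {0x0024, 0x00A5, 0x0192, 0x20AC};
        for (int cp : samples) {
            System.out.println(Integer.toHexString(cp)
                    + " currency=" + isCurrencySymbol(cp)
                    + " inBlock=" + inCurrencyBlock(cp));
        }
    }
}
```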