The
Java bytecode specification allows a switch
statement to be compiled into one of two different bytecodes. One
compiled switch
type works as follows:
Given a particular value passed to the
switch
block to be compared, the passed value is successively compared against the value associated with each case statement in order. If, after testing all cases, no statements match, then the default label is matched. When acase
statement that matches is found, the body of that statement and all subsequentcase
bodies are executed (until one body exits theswitch
statement, or the last one is reached).
The operation of this switch
statement is
equivalent to holding an ordered collection of values that are
compared to the passed value, one after the other in order, until a
match is determined. This means that the time taken for the
switch
to find the case that matches depends on
how many case
statements there are and where in
the list the matched case is. If no cases match, and the
default
must be used, that always takes the
longest matching time.
The other switch
bytecode works for
switch
statements where the case values all lie in
a particular range (or can be made to lie in a particular range). It
works as follows:
Given a particular value passed to the
switch
block to be compared, the passed value is tested to see if it lies in the range. If it does not, thedefault
label is matched; otherwise, the offset of the case is calculated and the corresponding case is matched directly. The body of that matched label and all subsequentcase
bodies are executed (until one body exits theswitch
statement, or the last one is reached).
For this latter switch
bytecode, the time taken
for the switch
statement to match the
case
is constant. The time is not dependent on the
number of case
s in the switch
,
and if no case
s match, the time to carry out the
matching and go to the default
is still the same.
This switch
statement operates as an ordered
collection with switch
value first being checked
to see if it is a valid index into the ordered collection, and then
that value is used as the index to arrive immediately at the matched
location.
Clearly, the second type of switch
statement is
faster than the first. Sometimes compliers can add dummy
case
s to a switch
statement,
converting the first type of switch
into the
second (faster) kind. (A compiler is not obliged to use the second
type of switch
bytecode at all, but generally it
does if it can easily be used.) You can determine which
switch
a particular statement has been compiled
into using javap
, the disassembler available with
the JDK. Using the -c
option so that the code is
disassembled, examine the method that contains the
switch
statement. It contains either a
“tableswitch” bytecode identifier, or a
“lookupswitch” bytecode identifier. The
tableswitch
keyword is the identifier for the
faster (second) type of switch
.
If you identify a bottleneck that involves a
switch
statement, do not leave the decision to the
compiler. You are better off constructing switch
statements that use contiguous ranges of case
values, ideally by inserting dummy case
statements
to specify all the values in the range, or possibly by breaking up
the switch
into multiple
switch
es that each use contiguous ranges. You may
need to apply both of these optimizations as in the next example.
Our tuning.loop.SwitchTest
class provides a
repeated test on three methods with switch
statements, and one other array-access method for comparison. The
first method, switch1( )
, contains some
noncontiguous values for the case
s, with each
returning a particular integer value. The second method,
switch2( )
, converts the single
switch
statement in switch1( )
into four switch
statements, with some of those
four switch
statements containing extra dummy
case
s to make each switch
statement contain a contiguous set of case
s. This
second method, switch2( )
, is functionally
identical to switch1( )
.
The third method, switch3( )
, replaces the
case
s with a contiguous set of
case
s, integers 1 to 13. This method is not
directly comparable to the first two methods; it is present as a
control test. The fourth method, switch4( )
, is
functionally identical to switch3( )
, but uses an
array access instead of the switch
statement,
essentially doing in Java code what the compiler implicitly does in
bytecodes for switch3( )
. I run two sets of tests.
The first set passes in a different integer for each call to the
switch
es.
This means that most
of the time, the default
label is matched. The
second set of tests always passes in the integer 8 to the
switch
es. The results are shown in Table 7-2 for various VMs. “Varying” and
“constant” refer to the value passed to the
switch
statement. Tests labeled varying passed
different integers for each iteration of the test loop; tests labeled
constant passed the integer 8 for each iteration of the loop.
Table 7-2. Speedup Using Exception-Driven Loop Termination
1.2 |
1.3 |
HotSpot 1.0 |
HotSpot 2nd Run[a] |
1.1.6 | ||
---|---|---|---|---|---|---|
1 |
|
100% |
55% |
208% |
29% |
109% |
2 |
|
12% |
53% |
218% |
27% |
12% |
3 |
|
23% |
79% |
212% |
36% |
23% |
4 |
|
9% |
36% |
231% |
15% |
9% |
5 |
|
41% |
33% |
195% |
30% |
45% |
6 |
|
17% |
42% |
207% |
30% |
15% |
7 |
|
20% |
48% |
186% |
24% |
20% |
8 |
|
6% |
42% |
200% |
12% |
6% |
[a] HotSpot is tuned for long-lived server applications, and so applies its optimizations after the first run of the test indicates where the bottlenecks are. |
There is a big difference in optimizations gained depending on
whether the VM has a
JIT or uses HotSpot technology. The times are
all relative to the JDK 1.2 "switch1
varying” case. From the variation in timings, it is not clear
whether the HotSpot technology fails to compile the handcrafted
switch
in an optimal way, or whether it does
optimally compile all the switch
statements but
adds overheads that cancel some of the optimizations.
For the JIT results, the first and second lines of output show the
speedup you can get by recrafting the switch
statements. Here, both switch1( )
and
switch2( )
are using the
default
for most of the tests. In this situation,
switch1( )
requires 13 failed comparisons before
executing the default
statement. switch2()
, on the other hand, checks the value against the range of
each of its four switch
statements, then
immediately executes the default
statement.
The first and third lines of output show the worst-case comparison
between the two types of switch
statements. In
this test, switch1( )
almost always fails all its
comparison tests. On the other hand, switch3( )
,
with the contiguous range, is much faster than switch1()
( JIT cases only). This is exactly what is expected, as
the average case for switch1( )
here consists of
13 failed comparisons followed by a return
statement. The average case for switch3( )
in this
test is only a pair of checks followed by a return
statement. The two checks are that the integer is smaller than or
equal to 13, and larger than or equal to 1. Both checks fail in most
of the calls for this “varying” case.
Even when the case statement in switch1( )
is
always matched, the fifth and sixth lines show that switch2()
can be faster (though again, not with HotSpot). In this
test, the matched statement is about halfway down the list of
case
s in switch1( )
, so the
seven or so failed comparisons for switch1( )
compared to two range checks for switch2( )
translate into switch2( )
being more than twice as
fast as switch1( )
.
In each set of tests, switch2( )
, which is
functionally identical to switch1( )
, is faster.
The output for switch4( )
is included for
comparison, and it turns out to be faster than the functionally
identical switch3( )
, thus indicating that it is
worth considering dispensing with the switch
tests
completely when you can convert the switch
to an
array access. In this example, the switch
merely
returns an integer, so the conversion to an array access is feasible;
in general, it may be difficult to convert a set of body statements
into an array access and subsequent processing:
package tuning.loop; public class SwitchTest { //Use a default size for the loop of 1 million iterations static int SIZE = 10000000; public static void main(String args[]) { //Allow an argument to set the size of the loop. if (args.length != 0) SIZE = Integer.parseInt(args[0]); int result = 0; //run tests looking mostly for the default (switch //test uses many different values passed to it) long time = System.currentTimeMillis( ); for (int i = SIZE; i >=0 ; i--) result += switch1(i); System.out.println("Switch1 took " + (System.currentTimeMillis( )-time) + " millis to get " + result); //and the same code to test timings on switch2( ), //switch3( ) and switch4( ) ... //run tests using one particular passed value (8) result = 0; time = System.currentTimeMillis( ); for (int i = SIZE; i >=0 ; i--) result += switch1(8); System.out.println("Switch1 took " + (System.currentTimeMillis( )-time) + " millis to get " + result); //and the same code to test timings on switch2( ), //switch3( ) and switch4( ) ... } public static int switch1(int i) { //This is one big switch statement with 13 case statements //in no particular order. switch(i) { case 318: return 99; case 320: return 55; case 323: return -1; case 14: return 6; case 5: return 8; case 123456: return 12; case 7: return 15; case 8: return 29; case 9: return 11111; case 123457: return 12345; case 112233: return 6666; case 112235: return 9876; case 112237: return 12; default: return -1; } } public static int switch2(int i) { //In this method we break up the 13 case statements from //switch1( ) into four almost contiguous ranges. Then we //add in a few dummy cases so that the four ranges are //definitely contiguous. This should ensure that the compiler //will generate the more optimal tableswitch bytcodes switch(i) { case 318: return 99; case 319: break; //dummy case 320: return 55; case 321: break; //dummy case 322: break; //dummy case 323: return -1; } switch(i) { case 5: return 8; case 6: break; //dummy case 7: return 15; case 8: return 29; case 9: return 11111; case 10: break; //dummy case 11: break; //dummy case 12: break; //dummy case 13: break; //dummy case 14: return 6; } switch(i) { case 112233: return 6666; case 112234: break; //dummy case 112235: return 9876; case 112236: break; //dummy case 112237: return 12; } switch(i) { case 123456: return 12; case 123457: return 12345; default: return -1; } } public static int switch3(int i) { switch(i) { //13 contiguous case statements as a kind of fastest control case 1: return 99; case 2: return 55; case 3: return -1; case 4: return 6; case 5: return 8; case 6: return 12; case 7: return 15; case 8: return 29; case 9: return 11111; case 10: return 12345; case 11: return 6666; case 12: return 9876; case 13: return 12; default: return -1; } } final static int[] RETURNS = { 99, 55, -1, 6, 8, 12, 15, 29, 11111, 12345, 6666, 9876, 12 }; public static int switch4(int i) { //equivalent to switch3( ), but using an array lookup //instead of a switch statement. if (i < 1 || i > 13) return -1; else return RETURNS[i-1]; } }
3.134.90.44