How Accurate Are Calculations?

The accuracy when evaluating a result is referred to as the precision of an expression. The precision may be expressed either as the number of bits (such as 64 bits) or as the data type of the result (such as double precision, meaning the 64-bit floating-point format).

The precision of an expression

In Java, the precision of evaluating an operator depends on the types of its operands. Java looks at the types of the two operands of an operator and picks the biggest of what it sees, with double, float, and long in that order of preference. Both operands are then promoted to this type, and that is the type of the result. If neither operand is a double, float, or long, both operands (including byte, short, and char) are promoted to int, and that is the type of the result. This decision is made separately for each operator as the expression is evaluated, following the usual precedence and left-to-right order, so different parts of one expression can be computed at different precisions.
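One way to watch this rule at work is to let the compiler's overload resolution report the type it picked. The sketch below is a hypothetical helper (the class PromotionDemo and its typeOf overloads are illustrative names, not a library API): the overload that gets called is the one matching the static type of the argument expression.

```java
class PromotionDemo {
    // One overload per possible result type; the compiler chooses the one
    // that exactly matches the static type of the argument expression.
    static String typeOf(int x)    { return "int"; }
    static String typeOf(long x)   { return "long"; }
    static String typeOf(float x)  { return "float"; }
    static String typeOf(double x) { return "double"; }

    public static void main(String[] args) {
        byte b = 1;
        short s = 2;
        char c = 'A';
        System.out.println(typeOf(b + s));      // int: no double, float, or long present
        System.out.println(typeOf(c + b));      // int
        System.out.println(typeOf(1 + 2L));     // long
        System.out.println(typeOf(3L * 2.0f));  // float is preferred over long
        System.out.println(typeOf(2.0f / 4.0)); // double is preferred over float
    }
}
```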

A Java compiler follows this algorithm to compile each operation:

  • If either operand is a double, do the operation in double precision.

  • Otherwise, if either operand is a float, do the operation in single precision.

  • Otherwise, if either operand is a long, do the operation at long precision.

  • Otherwise, do the operation at 32-bit int precision.
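The four steps above translate almost directly into code. This is a sketch, assuming operand types are represented as primitive Class objects such as int.class; the class PromotionRules and the method resultType are hypothetical names, and only numeric operand types are considered (boolean and String operands follow different rules).

```java
class PromotionRules {
    // Mirrors the compiler's steps for a binary numeric operator.
    static Class<?> resultType(Class<?> left, Class<?> right) {
        if (left == double.class || right == double.class) return double.class;
        if (left == float.class  || right == float.class)  return float.class;
        if (left == long.class   || right == long.class)   return long.class;
        // byte, short, char, and int operands are all done in 32-bit int
        return int.class;
    }

    public static void main(String[] args) {
        System.out.println(resultType(int.class, long.class).getName());   // long
        System.out.println(resultType(long.class, float.class).getName()); // float
        System.out.println(resultType(char.class, short.class).getName()); // int
    }
}
```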

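In summary, each Java operation ends up with the type of the biggest, floatiest type (double, float, long) among its operands; otherwise it is a 32-bit int. Because this is decided operator by operator, an integer subexpression such as 1 / 2 is still computed in int (giving 0) even when a double appears elsewhere in the expression.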
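A small sketch of why the decision is made per operator, using a hypothetical class PerOperator: in 1 / 2 + 0.5 the division binds more tightly than the addition, so it is done first, with two int operands.

```java
class PerOperator {
    // 1 / 2 has two int operands, so integer division yields 0;
    // only then is 0 promoted to double for the addition.
    static double intDivisionFirst() {
        return 1 / 2 + 0.5;
    }

    // Making one operand of the division a double changes that operator's type.
    static double doubleDivisionFirst() {
        return 1 / 2.0 + 0.5;
    }

    public static void main(String[] args) {
        System.out.println(intDivisionFirst());    // 0.5
        System.out.println(doubleDivisionFirst()); // 1.0
    }
}
```

Writing 1 / 2.0 (or casting one operand to double) is the usual way to force the division itself into double precision.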

Limited significant figures in floating-point numbers

The precision tells you how many bits you get in your answer. But if your calculation involves floating point, you also have to be wary about the limited accuracy of the answer. Accuracy is not just the range of values of a type, but also (for floating-point types) the number of significant figures that can be stored. The type float can store some numbers exactly, but in general you can only count on about six to seven significant decimal digits. A long can hold any integer of up to 18 decimal digits (its maximum, 9223372036854775807, has 19), so when a long is implicitly or explicitly converted to a float, precision may be lost there too. Here's an example showing the loss of precision when a long is converted to a float and then cast back.

public class Inexact2 {
     public static void main(String[] args) {
          long  orig = 9000000000000000000L;
          float floatMe = orig;      // widening long to float rounds to the nearest float
          long now = (long) floatMe; // cast the float back into a long

          System.out.println("orig: " + orig);
          System.out.println(" now: " + now);
     }
}

The output is as follows:

orig: 9000000000000000000
 now: 9000000202358128640

As you can see, after being assigned to a float variable and retrieved again, the long agrees with the original only in its first seven significant digits; everything after that is lost. The truth is that a float with the value shown, 9000000202358128640, stands for every real value that rounds to it: the whole interval extending halfway to the nearest representable float on each side. This is true for every float and double value. That's what floating point means.
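To see that interval for the value above, the JDK methods Math.nextDown, Math.nextUp, and Math.ulp report the neighboring floats and the gap between them (Math.nextDown needs Java 8 or later). This is a sketch in a hypothetical class FloatNeighbors; the numbers in the comments follow from float's 24-bit significand at this magnitude, where the gap is 2^39.

```java
class FloatNeighbors {
    // The same long as in the example above, widened to float (rounds to nearest).
    static final float F = 9000000000000000000L;

    static long below() { return (long) Math.nextDown(F); } // next smaller float
    static long above() { return (long) Math.nextUp(F); }   // next larger float
    static long gap()   { return (long) Math.ulp(F); }      // spacing at this magnitude

    public static void main(String[] args) {
        System.out.println("below: " + below());  // 8999999652602314752
        System.out.println("value: " + (long) F); // 9000000202358128640
        System.out.println("above: " + above());  // 9000000752113942528
        System.out.println("gap:   " + gap());    // 549755813888
    }
}
```

The original long, 9000000000000000000, lies 202358128640 below the stored value. That is less than half the gap, so it rounds to this float, as does every other long within half a gap on either side.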

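The limitations of floating-point arithmetic apply to all programming languages. But people notice them a lot more in Java because, when Java prints a floating-point number, it shows as many digits as are needed to distinguish it from the neighboring values instead of rounding to six digits. C and C++ do round by default (printf's %f shows six digits after the decimal point, and C++ output streams show six significant digits), hiding the floating-point limitations from the unwary.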

JDK 1.5 introduced varargs, a notation for passing a variable number of arguments to a method. This was specifically intended to bring the much-loved (by some) C function printf() into Java. The printf() method formats output and can round numbers as they are printed. Chapter 17 has some examples of formatting numbers using printf.
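As a rough illustration of rounding on output, the sketch below uses the JDK methods String.format and PrintStream.printf; the class RoundOnPrint and its helper fixed are hypothetical names. Locale.ROOT is passed so that the decimal separator is always a period, whatever the default locale.

```java
import java.util.Locale;

class RoundOnPrint {
    // Formats x with the given number of digits after the decimal point.
    static String fixed(float x, int places) {
        return String.format(Locale.ROOT, "%." + places + "f", x);
    }

    public static void main(String[] args) {
        float total = 1.0000001F;
        System.out.println(total);                     // 1.0000001 (every digit needed)
        System.out.printf(Locale.ROOT, "%f%n", total); // 1.000000 (six places by default)
        System.out.println(fixed(total, 2));           // 1.00
    }
}
```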

Inaccuracies of floating-point numbers

Because floating-point numbers are (in general) approximations of the precise number, you must be wary about the accuracy of calculations. If this comes as a surprise to you, try this test program immediately, and thank your good fortune at having the chance to learn about it before you stumble over it as a difficult debugging problem.

public class Inexact {
     public static void main(String[] args) {
          float total = 0.0F;
          for (int i = 0; i < 10; i++) {
               total = total + 0.1F;   // add the float nearest to 0.1, ten times
          }

          if (total != 1.0F) {
               System.out.println("total is " + total);
          }
     }
}

Running it produces the following output:

total is 1.0000001

The value 0.1 cannot be represented exactly as a sum of powers of two, which is how floating-point numbers are stored internally; in binary it is an endlessly repeating fraction, so a float holds only an approximation. Summing ten approximations of 0.1 does not give exactly one. This is a limitation of floating-point numbers in all programming languages, not something unique to Java. You can even reproduce it on many pocket calculators that use floating point, unless the manufacturer has taken steps to round results and hide it.
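To see exactly which number a float holds, you can widen it to double (always exact) and pass it to the java.math.BigDecimal(double) constructor, which converts the binary value to decimal without rounding. A sketch, with ExactValue and exact as hypothetical names:

```java
import java.math.BigDecimal;

class ExactValue {
    // The float widens to double exactly, and BigDecimal(double) keeps every digit.
    static String exact(float f) {
        return new BigDecimal(f).toPlainString();
    }

    public static void main(String[] args) {
        System.out.println(exact(0.1F)); // 0.100000001490116119384765625

        float total = 0.0F;
        for (int i = 0; i < 10; i++) {
            total = total + 0.1F;
        }
        System.out.println(exact(total)); // 1.00000011920928955078125
    }
}
```

The stored 0.1F is already slightly too large, and the ten additions (each rounded to the nearest float) end one float step above 1.0.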

This is why you should never use a floating-point variable as a loop counter. A longer explanation of this thorny topic is in “What Every Computer Scientist Should Know About Floating-Point Arithmetic” by David Goldberg, in the March 1991 issue of ACM Computing Surveys (volume 23, number 1). Sun published the paper on its documentation site, docs.sun.com, and a web search for the title will find it.
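Here is a sketch of the loop-counter problem, with hypothetical class and method names. It uses double because with these particular values the double version visibly goes wrong: ten additions of 0.1 leave the counter at 0.9999999999999999, so a loop meant to run ten times runs eleven. (With float, the same loop happens to run ten times, which only shows how unpredictable this is.)

```java
class LoopCounter {
    // Meant to visit x = 0.0, 0.1, ..., 0.9 (ten passes), but accumulated
    // rounding leaves x just below 1.0 after ten additions.
    static int floatingCounterPasses() {
        int passes = 0;
        for (double x = 0.0; x < 1.0; x += 0.1) {
            passes++;
        }
        return passes;
    }

    // The counter is an exact int; any real value needed is computed as i / 10.0,
    // so no error carries over from one pass to the next.
    static int intCounterPasses() {
        int passes = 0;
        for (int i = 0; i < 10; i++) {
            passes++;
        }
        return passes;
    }

    public static void main(String[] args) {
        System.out.println(floatingCounterPasses()); // 11
        System.out.println(intCounterPasses());      // 10
    }
}
```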

Floating-point extension

A new keyword was added in JDK 1.2: strictfp. The keyword is applied in front of a class, interface, or method when you need strictly identical floating-point arithmetic on all your different platforms, and this matters more to you than faster, slightly more accurate results on some platforms. There are more details at java.sun.com/docs/books/jls/strictfp-changes.pdf.

Here's the wording for telling a method not to use the extended exponent range:

strictfp void doCalc (float x, float y) {
     // calculations here must give the same results on every JVM
}

Before Java 17, the JVM was free by default to use an extended exponent range for intermediate float and double results if the platform supported it, allowing slightly different and faster arithmetic results on PowerPC and x86 systems. The default (omitting the keyword) was the right choice unless you had a special reason for needing less accurate but more uniform answers. Since Java 17, all floating-point arithmetic is strict, so strictfp no longer has any effect and the compiler may warn that it is redundant.
