Floating-Point Numbers

The main consideration in using floating-point numbers is that many fractional decimal numbers can't be represented accurately using the 1s and 0s available on a digital computer. Nonterminating decimals like 1/3 or 1/7 can usually be represented to only about 7 digits of accuracy in a 32-bit format or about 15 digits in a 64-bit format. In my version of Microsoft Visual Basic, a 32-bit floating-point representation of 1/3 equals 0.33333330. It's accurate to 7 digits. This is accurate enough for most purposes but inaccurate enough to trick you sometimes.
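
As a rough illustration in Java, the sketch below computes 1/3 in 32-bit and 64-bit floating point. The class and method names are hypothetical and chosen only for this example. Java prints the shortest decimal that identifies each stored value, so the digits differ from the Visual Basic output quoted above, but the number of trustworthy digits is about the same.

```java
class OneThirdDemo {
    // 1/3 computed in 32-bit floating point (about 7 significant digits).
    static float singleThird() {
        return 1.0f / 3.0f;
    }

    // 1/3 computed in 64-bit floating point (about 15 to 16 significant digits).
    static double doubleThird() {
        return 1.0 / 3.0;
    }

    public static void main(String[] args) {
        System.out.println(singleThird());   // prints 0.33333334
        System.out.println(doubleThird());   // prints 0.3333333333333333

        // Widening the float exposes trailing digits that are not part of 1/3.
        System.out.println((double) singleThird());
    }
}
```

Neither variable holds exactly 1/3; the 64-bit version simply pushes the error further to the right.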

Following are a few specific guidelines for using floating-point numbers:

Avoid additions and subtractions on numbers that have greatly different magnitudes. With a 32-bit floating-point variable, 1,000,000.00 + 0.1 probably produces an answer of 1,000,000.00 because 32 bits don't give you enough significant digits to encompass the range between 1,000,000 and 0.1. Likewise, 5,000,000.02 - 5,000,000.01 is probably 0.0.
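
The effect can be sketched in Java with 32-bit float values. The magnitude at which a small addend disappears depends on the floating-point format, so the 100,000,000 used for the addition is an assumed value picked to make the loss unmistakable with Java's float type; the subtraction uses the numbers from the paragraph above. The class and method names are hypothetical.

```java
class MagnitudeDemo {
    // Adds a small value to a large one in 32-bit floating point.
    static float addSmallToLarge(float large, float small) {
        return large + small;
    }

    // Subtracts two nearly equal values in 32-bit floating point.
    static float subtractClose(float a, float b) {
        return a - b;
    }

    public static void main(String[] args) {
        // Assumed value: near 100,000,000 adjacent floats are 8 apart,
        // so adding 0.1 leaves the large number unchanged.
        System.out.println(addSmallToLarge(100000000.0f, 0.1f) == 100000000.0f);   // prints true

        // From the text: both literals are stored as 5,000,000.0,
        // so the difference vanishes.
        System.out.println(subtractClose(5000000.02f, 5000000.01f));   // prints 0.0
    }
}
```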

Cross-Reference

For algorithms books that describe ways to solve these problems, see "Additional Resources on Data Types" in Data Literacy.

Solutions? If you have to add a sequence of numbers that contains huge differences like this, sort the numbers first, and then add them starting with the smallest values. Likewise, if you need to sum an infinite series, start with the smallest term—essentially, sum the terms backwards. This doesn't eliminate round-off problems, but it minimizes them. Many algorithms books have suggestions for dealing with cases like this.
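
A minimal Java sketch of the sort-then-add advice follows. The test data (one huge value followed by 1,000 ones) and all names are assumptions made for illustration, and the sort order assumes no negative values; with mixed signs you would sort by magnitude instead.

```java
import java.util.Arrays;

class SortedSumDemo {
    // Adds the values in the order given, accumulating in 32-bit floating point.
    static float sumInOrder(float[] values) {
        float sum = 0.0f;
        for (float value : values) {
            sum += value;
        }
        return sum;
    }

    // Sorts a copy into ascending order, then adds starting with the smallest value.
    static float sumSmallestFirst(float[] values) {
        float[] sorted = values.clone();
        Arrays.sort(sorted);
        return sumInOrder(sorted);
    }

    // Hypothetical test data: one huge value followed by 1,000 small ones.
    static float[] sampleValues() {
        float[] values = new float[1001];
        Arrays.fill(values, 1.0f);
        values[0] = 100000000.0f;
        return values;
    }

    public static void main(String[] args) {
        float[] values = sampleValues();
        System.out.println(sumInOrder(values));         // prints 1.0E8
        System.out.println(sumSmallestFirst(values));   // prints 1.00001E8
    }
}
```

Added largest-first, each 1.0 is too small to register against the running total and all 1,000 of them are lost. Added smallest-first, the small values accumulate to 1,000 before they meet the large one, and the total comes out right.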

Avoid equality comparisons. Floating-point numbers that should be equal are not always equal. The main problem is that two different paths to the same number don't always lead to the same number. For example, 0.1 added 10 times rarely equals 1.0. The following example shows two variables, nominal and sum, that should be equal but aren't.

 

1 is equal to 2 for sufficiently large values of 1.
 --Anonymous

Example 12-2. Java Example of a Bad Comparison of Floating-Point Numbers

double nominal = 1.0;       // (1)
double sum = 0.0;

for ( int i = 0; i < 10; i++ ) {
   sum += 0.1;       // (2)
}

if ( nominal == sum ) {       // (3)
   System.out.println( "Numbers are the same." );
}
else {
   System.out.println( "Numbers are different." );
}

(1)The variable nominal is a 64-bit real.

(2)sum is 10*0.1. It should be 1.0.

(3)Here's the bad comparison.

As you can probably guess, the output from this program is

Numbers are different.

The line-by-line values of sum in the for loop look like this:

0.1
0.2
0.30000000000000004
0.4
0.5
0.6
0.7
0.7999999999999999
0.8999999999999999
0.9999999999999999

Thus, it's a good idea to find an alternative to using an equality comparison for floating-point numbers. One effective approach is to determine a range of accuracy that is acceptable and then use a boolean function to determine whether the values are close enough. Typically, you'd write an Equals() function that returns true if the values are close enough and false otherwise. In Java, such a function would look like this:

Cross-Reference

This example is proof of the maxim that there's an exception to every rule. Variables in this realistic example have digits in their names. For the rule against using digits in variable names, see Kinds of Names to Avoid.

Example 12-3. Java Example of a Routine to Compare Floating-Point Numbers

final double ACCEPTABLE_DELTA = 0.00001;
boolean Equals( double Term1, double Term2 ) {
   if ( Math.abs( Term1 - Term2 ) < ACCEPTABLE_DELTA ) {
      return true;
   }
   else {
      return false;
   }
}

If the code in the "bad comparison of floating-point numbers" example were converted so that this routine could be used for comparisons, the new comparison would look like this:

if ( Equals( nominal, sum ) ) ...

The output from the program when it uses this test is

Numbers are the same.

Depending on the demands of your application, it might be inappropriate to use a hard-coded value for ACCEPTABLE_DELTA. You might need to compute ACCEPTABLE_DELTA based on the size of the two numbers being compared.
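
One way to scale the tolerance to the numbers being compared is sketched below. It is an assumption about how you might do it rather than a prescribed routine: the names, the relative tolerance of 1e-9, and the absolute floor of 1e-12 are all hypothetical and would need tuning for a real application.

```java
class ToleranceCompare {
    // Returns true if a and b differ by no more than a tolerance scaled to
    // the larger of the two magnitudes. absoluteFloor keeps the test
    // meaningful when both values are close to zero.
    static boolean nearlyEqual(double a, double b,
                               double relativeTolerance, double absoluteFloor) {
        double scale = Math.max(Math.abs(a), Math.abs(b));
        double allowed = Math.max(absoluteFloor, relativeTolerance * scale);
        return Math.abs(a - b) <= allowed;
    }

    // Adds 0.1 ten times, as in the bad-comparison example.
    static double sumOfTenths() {
        double sum = 0.0;
        for (int i = 0; i < 10; i++) {
            sum += 0.1;
        }
        return sum;
    }

    public static void main(String[] args) {
        // The accumulated sum is within tolerance of 1.0.
        System.out.println(nearlyEqual(1.0, sumOfTenths(), 1e-9, 1e-12));   // prints true

        // Large values get a proportionally larger allowance.
        System.out.println(nearlyEqual(1.0e10, 1.0e10 + 0.001, 1e-9, 1e-12));   // prints true

        // The same absolute gap is a real difference between small values.
        System.out.println(nearlyEqual(1.0, 1.001, 1e-9, 1e-12));   // prints false
    }
}
```

A fixed delta of 0.00001 would have reported the two values near 10,000,000,000 as different even though they agree to 13 significant digits.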

Anticipate rounding errors. Rounding-error problems are no different from the problem of numbers with greatly different magnitudes. The same issue is involved, and many of the same techniques help to solve rounding problems. In addition, here are common specific solutions to rounding problems:

  • Change to a variable type that has greater precision. If you're using single-precision floating point, change to double-precision floating point, and so on.

  • Change to binary coded decimal (BCD) variables. The BCD scheme is typically slower and takes up more storage space, but it prevents many rounding errors. This is particularly valuable if the variables you're using represent dollars and cents or other quantities that must balance precisely.

    Cross-Reference

    Usually the performance impact of converting to BCD will be minimal. If you're concerned about the performance impact, see Summary of the Approach to Code Tuning.

  • Change from floating-point to integer variables. This is a roll-your-own approach to BCD variables. You will probably have to use 64-bit integers to get the precision you want. This technique requires you to keep track of the fractional part of your numbers yourself. Suppose you were originally keeping track of dollars using floating point with cents expressed as fractional parts of dollars. This is a normal way to handle dollars and cents. When you switch to integers, you have to keep track of cents using integers and of dollars using multiples of 100 cents. In other words, you multiply dollars by 100 and keep the cents in the 0-to-99 range of the variable. This might seem absurd at first glance, but it's an effective solution in terms of both speed and accuracy. You can make these manipulations easier by creating a DollarsAndCents class that hides the integer representation and supports the necessary numeric operations.
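
The DollarsAndCents class mentioned in the last item might look something like the following Java sketch. Only the class name comes from the text; the method names, the restriction to non-negative inputs, and the overflow checks are assumptions made for illustration.

```java
final class DollarsAndCents {
    private final long totalCents;   // entire amount held as a whole number of cents

    private DollarsAndCents(long totalCents) {
        this.totalCents = totalCents;
    }

    // Builds an amount from non-negative dollars and cents in the 0-to-99 range.
    static DollarsAndCents of(long dollars, int cents) {
        if (dollars < 0 || cents < 0 || cents > 99) {
            throw new IllegalArgumentException("dollars must be >= 0 and cents must be 0 to 99");
        }
        return new DollarsAndCents(Math.addExact(Math.multiplyExact(dollars, 100L), cents));
    }

    // Integer addition and subtraction are exact; the *Exact methods throw on overflow.
    DollarsAndCents plus(DollarsAndCents other) {
        return new DollarsAndCents(Math.addExact(totalCents, other.totalCents));
    }

    DollarsAndCents minus(DollarsAndCents other) {
        return new DollarsAndCents(Math.subtractExact(totalCents, other.totalCents));
    }

    long totalCents() {
        return totalCents;
    }

    // Adds ten dimes, the integer counterpart of adding 0.1 ten times.
    static DollarsAndCents tenDimes() {
        DollarsAndCents sum = DollarsAndCents.of(0, 0);
        for (int i = 0; i < 10; i++) {
            sum = sum.plus(DollarsAndCents.of(0, 10));
        }
        return sum;
    }

    @Override
    public String toString() {
        long magnitude = Math.abs(totalCents);
        String sign = totalCents < 0 ? "-" : "";
        return sign + (magnitude / 100) + "." + String.format("%02d", magnitude % 100);
    }

    public static void main(String[] args) {
        System.out.println(tenDimes());   // prints 1.00
        System.out.println(DollarsAndCents.of(5000000, 2)
                .minus(DollarsAndCents.of(5000000, 1)));   // prints 0.01
    }
}
```

Because the representation is hidden behind the class, callers never see the multiply-by-100 bookkeeping, and the two calculations that went wrong in floating point come out exact.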

Check language and library support for specific data types. Some languages, including Visual Basic, have data types such as Currency that specifically support data that is sensitive to rounding errors. If your language has a built-in data type that provides such functionality, use it!
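
Java has no Currency type of this kind, but its standard library offers java.math.BigDecimal, which stores decimal digits exactly. The sketch below repeats the add-0.1-ten-times experiment with it; the class and method names are hypothetical.

```java
import java.math.BigDecimal;

class DecimalSumDemo {
    // Adds 0.1 ten times using exact decimal arithmetic.
    static BigDecimal sumOfTenths() {
        // Construct from a String; new BigDecimal(0.1) would capture the
        // inexact binary value of the double literal instead.
        BigDecimal tenth = new BigDecimal("0.1");
        BigDecimal sum = BigDecimal.ZERO;
        for (int i = 0; i < 10; i++) {
            sum = sum.add(tenth);
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumOfTenths());   // prints 1.0

        // compareTo ignores scale, so 1.0 and 1 compare as equal;
        // equals() would treat them as different.
        System.out.println(sumOfTenths().compareTo(BigDecimal.ONE) == 0);   // prints true
    }
}
```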
