Numerical Accuracy in SAS Software (2/5)

The following bullet points describe the table above in more detail:

• Base 16 – uses digits 0-9 and letters A-F (to represent the values 10-15).

For example, to convert the decimal value 3000 to hexadecimal, you use the base 16 number system:

Base 16
Place value   268,435,456  ...  65,536  4096  256  16  1
Digit                   0  ...       0     0    B   B  8

3000 = 11 x 256 + 11 x 16 + 8 x 1, so the value 3000 is represented in hexadecimal as BB8.

• Base 2 – uses digits 0 and 1.

For example, to convert the decimal value 184 to binary, you use the base 2 number system:

Base 2
Place value   128  64  32  16  8  4  2  1
Digit           1   0   1   1  1  0  0  0

184 = 128 + 32 + 16 + 8, so the value 184 is represented in binary as 10111000.

• exponent bits – the number of bits reserved for storing the exponent, which determines the magnitude of the number that you can store. The number of exponent bits varies between operating systems. IEEE systems yield numbers of greater magnitude because they use more bits for the exponent.

• mantissa bits – the number of bits reserved for storing the mantissa, which determines the precision of the number. Because there are more bits reserved for the mantissa on mainframes, you can expect greater precision on a mainframe compared to a PC.

• round or truncate – the chosen conversion method used for handling two or more digits. Because there is room for only two hexadecimal digits in the mantissa, a convention must be adopted on how to handle more than two digits. One convention is to truncate the value at the length that can be stored. This convention is used by IBM Mainframe systems.

An alternative is to round the value based on the digits that cannot be stored, which is done on IEEE systems. There is no right or wrong way to handle this dilemma since neither convention results in an exact representation of the value.


In SAS, the LENGTH statement works by truncating the number of mantissa bits. For more information about the effects of truncated lengths, see “Using the TRUNC Function When Comparing Values” on page 79.

• bias – an offset used to enable both negative and positive exponents with the bias representing 0. If a bias is not used, an additional sign bit for the exponent must be allocated. For example, if a system uses a bias of 64, a characteristic with the value 66 represents an exponent of +2, whereas a characteristic of 61 represents an exponent of –3.
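
The base conversions and the bias arithmetic in the bullet points above can be sketched in a few lines of Python. Python is used here only for illustration and is not part of SAS. The helper names `to_base` and `exponent_from_characteristic` are hypothetical.

```python
DIGITS = "0123456789ABCDEF"

def to_base(value, base):
    """Convert a non-negative integer to a digit string by repeated division."""
    if value == 0:
        return "0"
    digits = []
    while value > 0:
        # The remainder is the next digit, starting from the right-most place.
        value, remainder = divmod(value, base)
        digits.append(DIGITS[remainder])
    return "".join(reversed(digits))

def exponent_from_characteristic(characteristic, bias):
    """Recover the signed exponent from a stored (biased) characteristic."""
    return characteristic - bias

print(to_base(3000, 16))                     # BB8
print(to_base(184, 2))                       # 10111000
print(exponent_from_characteristic(66, 64))  # 2
print(exponent_from_characteristic(61, 64))  # -3
```

The last two lines reproduce the bias example: with a bias of 64, the stored values 66 and 61 stand for the exponents +2 and –3.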

Floating-Point Representation Using the IEEE Standard

The IEEE standard for floating-point arithmetic is a technical standard for floating-point computation created by the Institute of Electrical and Electronics Engineers (IEEE). The standard defines how computers store numbers in floating-point representation. The IEEE standard for floating-point numbers is used by many operating systems, including Windows and UNIX.

Although the IEEE platforms use the same set of specifications, you might occasionally see varying results between platforms because of differences in compilers and math libraries. Also, because the IEEE standard allows some variation in how it is implemented, platforms might perform calculations differently even though they follow the same standard. Hosts might yield different results because the underlying instructions that each operating system uses to perform calculations are slightly different.

There is no standard method for performing computations. All operating systems attempt to compute numbers as accurately as possible. It is not uncommon to get slightly different results between operating systems whose floating-point representation components differ. For example, there are differences between the z/OS and Windows operating systems and between the z/OS and UNIX operating systems.

The IEEE standard for double-precision, floating-point numbers specifies an 11-bit exponent with a base of 2 and a bias of 1023. This gives it a much greater range of magnitude than the IBM mainframe representation, but at the cost of 3 fewer bits in the mantissa. The value of 1 represented by the IEEE standard is as follows:

3F F0 00 00 00 00 00 00
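
As a quick check outside SAS, the following Python sketch prints the eight bytes of an IEEE double with the most significant byte first. The function name `double_to_hex` is hypothetical, and the sketch assumes a Python build whose `float` is an IEEE 754 double, which is the case on common platforms.

```python
import struct

def double_to_hex(value):
    """Return the 8 bytes of an IEEE 754 double, most significant byte first."""
    raw = struct.pack(">d", value)  # ">d" = big-endian, 8-byte double
    return " ".join(f"{byte:02X}" for byte in raw)

print(double_to_hex(1.0))  # 3F F0 00 00 00 00 00 00
```

The leading hexadecimal digits 3FF are the sign bit (0) followed by the biased exponent 1023, which stands for an actual exponent of 0. All mantissa bits are 0.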

On Windows platforms, the processor performs computations in extended real precision. This means that instead of the 64 bits that are used to store numeric values in the basic format (1 sign bit, 11 bits for the exponent, and 52 bits for the mantissa), there are 16 additional bits: 12 additional bits for the mantissa and 4 additional bits for the exponent. Numeric values are not stored in 80 bits (10 bytes) since the maximum width for a numeric variable in SAS is 8 bytes. This simply means that the processor uses 80 bits to represent a numeric value before it is passed back to its 64-bit memory slot. Intermediate calculations might be done in 80 bits, which can affect part of the final answer.

On Windows, this allows the processor to hold numbers larger than the basic IEEE floating-point format used by operating systems such as UNIX. This is one reason why you might see slightly different values from operating systems that use the same IEEE standard.

Extended precision formats provide greater precision and more exponent range than the basic floating-point formats.

66 Chapter 4 • SAS Variables

Floating-Point Representation on Windows

Storage Format

The byte layout for a 64-bit, double-precision number on Windows is as follows:

Byte 1   S E E E E E E E
Byte 2   E E E E M M M M
Byte 3   M M M M M M M M
Byte 4   M M M M M M M M
Byte 5   M M M M M M M M
Byte 6   M M M M M M M M
Byte 7   M M M M M M M M
Byte 8   M M M M M M M M

This representation corresponds to bytes of data with each character being 1 bit, as follows:

• The S in byte 1 is the sign bit of the number. A value of 0 in the sign bit is used to represent positive numbers.

• The seven E characters in byte 1 and the four E characters in byte 2 are the 11 bits of the exponent, stored with the bias of 1023 added.

• The remaining M characters in bytes 2 through 8 represent the 52 bits of the mantissa. There is an implied radix point before the left-most bit of the mantissa. Therefore, the stored mantissa is always less than 1; the leading one-bit of the normalized value is implied and is not stored. The term radix point is used instead of decimal point because decimal point implies that you are working with decimal (base 10) numbers, which might not be the case. The radix point can be thought of as the generic form of decimal point.

The exponent has a base associated with it. Do not confuse this with the base in which the exponent is represented; the exponent is always represented in binary format, but the exponent is used to determine how many times the base should be multiplied by the mantissa.
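
The field layout described above can be sketched in Python by treating the eight bytes as one 64-bit integer and masking out each field. This is an illustration outside SAS; the function name `split_double` is hypothetical, and the sketch assumes that Python's `float` is an IEEE 754 double.

```python
import struct

def split_double(value):
    """Split a double into its sign bit, 11-bit biased exponent, and 52 mantissa bits."""
    # Reinterpret the 8 bytes of the double as an unsigned 64-bit integer.
    (bits,) = struct.unpack(">Q", struct.pack(">d", value))
    sign = bits >> 63                      # S: the left-most bit of byte 1
    biased_exponent = (bits >> 52) & 0x7FF # E: the next 11 bits
    mantissa = bits & ((1 << 52) - 1)      # M: the remaining 52 bits
    return sign, biased_exponent, mantissa

sign, biased_exponent, mantissa = split_double(1.5)
print(sign)                    # 0
print(biased_exponent - 1023)  # 0
print(f"{mantissa:013X}")      # 8000000000000
```

For 1.5 (binary 1.1), only the left-most mantissa bit is set. That bit has the place value 1/2, and the leading 1 is implied rather than stored.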

Conversion Example

This example shows the conversion process for the decimal value 255.75 to floating-point representation.

1. Use the base 2 number system to write out the value 255.75 in binary.

Note: Each bit in the mantissa represents a fraction whose numerator is 1 and whose denominator is a power of 2; that is, the mantissa is the sum of a series of fractions such as 1/2, 1/4, 1/8, and so on. Therefore, for any floating-point number to be represented exactly, you must be able to express it as such a sum.

Base 2
Place value   128  64  32  16  8  4  2  1  .  1/2  1/4
Bit             1   1   1   1  1  1  1  1  .    1    1

255.75 = 128 + 64 + 32 + 16 + 8 + 4 + 2 + 1 + 1 x 2^-1 + 1 x 2^-2

So, the value 255.75 is represented in binary format as 1111 1111.11

2. Move the radix point to the left until there is only one digit to the left of it. This process is called normalizing the value. Normalizing a value in scientific notation is the process by which the exponent is chosen so that the absolute value of the mantissa is at least one but less than the base (less than ten in decimal, less than two in binary). For this number, you move the radix point 7 places:

1.111 1111 11

Because the radix point was moved 7 places to the left, the exponent is now 7.

3. The bias is 1023, so add 7 to 1023 to get

1030

4. Convert the decimal value, 1030, to hexadecimal using the base 16 number system:

Base 16
Place value   268,435,456  ...  65,536  4096  256  16  1
Digit                   0  ...       0     0    4   0  6

1030 = 4 x 256 + 0 x 16 + 6 x 1, so the hexadecimal value is 406. The converted hexadecimal value for 1030 will be placed in the exponent portion of the final result.

5. Convert hexadecimal 406 to binary format:

4    0    6
0100 0000 0110

If the value that you are converting is negative, change the first bit to 1:

1100 0000 0110

This translates in hexadecimal to

C    0    6

6. In the normalized value from Step 2, delete the first digit and the radix point (the implied one-bit):

111111111

7. Break these up into nibbles (half bytes) so that you have

1111 1111 1

8. To have a complete nibble at the end, add enough zeros to complete 4 bits:

1111 1111 1000

9. Convert 1111 1111 1000 to its hexadecimal equivalent to get the mantissa portion:

1111 1111 1000
F    F    8


The final floating-point representation for 255.75 is

406F F800 0000 0000

The final floating-point representation for –255.75 is

C06F F800 0000 0000
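
The nine steps above can be retraced in Python as a check on the hand calculation. This is a minimal sketch outside SAS, not a general-purpose encoder: it assumes a nonzero, normal value that a double represents exactly, and the function name `encode_by_hand` is hypothetical.

```python
import struct

BIAS = 1023

def encode_by_hand(value):
    """Follow the conversion steps for a nonzero value that fits exactly in a double."""
    sign = 1 if value < 0 else 0
    magnitude = abs(value)
    # Step 2: normalize so that 1 <= magnitude < 2, counting the moves of the radix point.
    exponent = 0
    while magnitude >= 2:
        magnitude /= 2
        exponent += 1
    while magnitude < 1:
        magnitude *= 2
        exponent -= 1
    # Steps 3-5: add the bias and put the sign bit in front (12 bits = 3 hex digits).
    top12 = (sign << 11) | (exponent + BIAS)
    # Steps 6-9: drop the implied one-bit and keep the 52 fraction bits (13 hex digits).
    fraction = int((magnitude - 1) * 2**52)
    return f"{top12:03X}{fraction:013X}"

print(encode_by_hand(255.75))   # 406FF80000000000
print(encode_by_hand(-255.75))  # C06FF80000000000
# Compare with the bytes that the platform actually stores.
print(struct.pack(">d", 255.75).hex().upper())  # 406FF80000000000
```

Dividing or multiplying by 2 only changes the exponent of a binary floating-point value, so the normalization loop does not introduce rounding for values in the normal range.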

In this example, the starting decimal value, 255.75, conveniently converts to a finite binary value that can be represented without rounding in both binary and hexadecimal. The following section shows the conversion process for a decimal number that cannot be represented precisely in floating-point representation.

Accuracy on x64 Windows Processors

Consider this example:

data _null_;
   /* The same value, written with different numbers of trailing zeros */
   x=.500000000000000000000000;
   y=.500000000000000000000000000;
   if x=y then put 'equal';
   else put 'not equal';
run;

Log Output

not equal

Although these values appear to be alike, their internal representations differ slightly, because an IEEE double-precision number can reliably represent only about 15 significant decimal digits. Here is the floating-point representation of both variables using the HEX16. format:

x=3FE0000000000000
y=3FDFFFFFFFFFFFFF

When the number of significant digits is reduced to about 15 or fewer, the floating-point representation is the same and the values are equal.

data _null_;
   x=.5000000000000000;
   y=.500000000000000;
   if x=y then put 'equal';
   else put 'not equal';
   /* Show the internal 8-byte representation of each value */
   put x=hex16./
       y=hex16.;
run;

Log Output

equal
x=3FE0000000000000
y=3FE0000000000000

This issue pertains to floating-point representation on x64 processors. The routine used to compute the result is slightly different on Windows than on any other host (Linux, UNIX, AIX, and so on). The routine is written in x64 assembly to maximize performance. Changing the routine for Windows x64 could not only reduce performance but could also introduce unintended side effects or errors in other computations.
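
To see how close the two representations from the first example are, the following Python sketch decodes the HEX16. values shown above. It is an illustration outside SAS and does not reproduce the SAS routine on Windows x64. The function names `hex_to_double` and `double_to_bits` are hypothetical.

```python
import struct

def hex_to_double(hex_digits):
    """Decode 16 hexadecimal digits (most significant byte first) as a double."""
    return struct.unpack(">d", bytes.fromhex(hex_digits))[0]

def double_to_bits(value):
    """Return the 64-bit pattern of a double as an unsigned integer."""
    return struct.unpack(">Q", struct.pack(">d", value))[0]

x = hex_to_double("3FE0000000000000")
y = hex_to_double("3FDFFFFFFFFFFFFF")

print(x == 0.5)                               # True
print(double_to_bits(x) - double_to_bits(y))  # 1
print(x - y == 2.0 ** -54)                    # True
print(x == y)                                 # False
```

The two bit patterns are neighbors: y is the largest double below 0.5, and the difference between them is 2^-54. An exact comparison such as `x=y` still treats them as different values, which is why the first DATA step writes 'not equal'.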
