Numerical Accuracy in SAS Software (1/5)


   end;
   j=1;
   do k = 1 to 5;
      if indexc(old(k),from(j)) > 0 then do;
         new(k)=translate(old(k),to(j),from(j));
         j+1;
         if j=4 then j=1;
      end;
   end; /* closes the DO K loop */
   decrypt_num=input(cats(of new1-new5),5.);
   keep num encrypt_num decrypt_num;
run;

proc print;
run;

The following output shows the results of the PROC PRINT for Example 3:

Numerical Accuracy in SAS Software

Overview

In any number system, whether binary or decimal, there are limits to how precisely numbers can be represented, so approximations have to be made. For example, in the decimal number system, the fraction 1/3 cannot be represented exactly as a finite decimal value because its digits repeat infinitely (0.333...). On computers, which have finite precision, this number must be approximated. Numerical precision is the accuracy with which numbers are approximated or represented.
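The following sketch illustrates this outside SAS, in Python. It assumes that Python's float type is an IEEE 754 double, which holds on mainstream platforms and is the same 8-byte format that SAS uses on Windows and UNIX. The standard fractions module converts the stored approximation of 1/3 back into an exact ratio so that it can be compared with the true value.

```python
from fractions import Fraction

# Fraction(float) recovers the exact binary value that the double holds.
stored = Fraction(1 / 3)
exact = Fraction(1, 3)

print(stored == exact)               # False: 1/3 cannot be held exactly
print(stored.denominator == 2**54)   # True: the stored value is a binary fraction
print(float(exact - stored))         # the small, nonzero representation error
```

The error is tiny, but it is never zero. That is the sense in which the number is approximated.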

In computing, software applications are particularly susceptible to precision errors because of finite precision and hardware limitations. Computers are finite machines with finite storage capacity, so they cannot represent every number in an infinite set with perfect precision.

The problem is compounded by the fact that computers use a different number system from the one people use. Decimal infinite-precision arithmetic is the norm for human calculations, but computers use finite binary representations of values and finite-precision arithmetic. This representation has proven adequate for many calculations. Depending on the problem, however, you might need extended precision that is wider than what the hardware offers. In that case, representation and arithmetic are done mostly in software and are much slower than hardware arithmetic.
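To contrast hardware and software arithmetic, this Python sketch uses the standard decimal module as one example of software-based extended precision. It is an illustrative stand-in only. The 50-digit precision is an arbitrary choice for the example, and the hardware side assumes IEEE 754 doubles.

```python
from decimal import Decimal, localcontext

# Hardware arithmetic: binary doubles with a 53-bit significand.
hardware_third = 1 / 3
hardware_sum = 0.1 * 3

# Software arithmetic: decimal digits, at a precision chosen for this example.
with localcontext() as ctx:
    ctx.prec = 50
    software_third = Decimal(1) / Decimal(3)
    software_sum = Decimal("0.1") * 3

print(hardware_third)                  # 0.3333333333333333
print(hardware_sum == 0.3)             # False
print(software_third)                  # 50 significant digits, still not exactly 1/3
print(software_sum == Decimal("0.3"))  # True
```

Extended precision is still finite: the software result for 1/3 is merely a longer approximation. Decimal software arithmetic does, however, represent 0.1 and 0.3 exactly.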

60 Chapter 4 • SAS Variables

Furthermore, although computers allow the use of decimal numbers and decimal arithmetic through human-centric software interfaces, all numbers and data are eventually converted to binary format so that the computer can store and process them internally. It is in the conversion between these two number systems, from decimal to binary, that precision is affected and rounding errors are introduced.
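The rounding that the decimal-to-binary conversion introduces can be made visible in Python, assuming IEEE 754 doubles. When you construct a Decimal directly from a float, it copies the stored binary value exactly, digit for digit. The helper name stored_value is hypothetical.

```python
from decimal import Decimal

def stored_value(text: str) -> Decimal:
    """Return the exact value held after converting a decimal string to a double."""
    return Decimal(float(text))  # Decimal(float) performs no rounding of its own

for text in ["0.5", "0.1", "0.3"]:
    value = stored_value(text)
    status = "exact" if value == Decimal(text) else "rounded"
    print(f"{text:>4} -> {value} ({status})")
```

In this sketch 0.5 survives the conversion unchanged, while 0.1 and 0.3 come back as long binary-derived values that differ slightly from what was typed.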

Truncation in Binary Numbers

Just as there are decimal values with infinitely repeating representations, there are binary values with infinitely repeating representations. However, the numbers that are imprecise in decimal are not always the same ones that are imprecise in binary.

For example, the decimal value 1/10 has a finite decimal representation (0.1), but in binary it has an infinitely repeating representation. In binary, the value converts to

0.000110011001100110011...

where the pattern 0011 is repeated indefinitely. As a result, the value is rounded when it is stored on a computer.
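The repeating pattern can be generated by long division in base 2. You repeatedly double the fraction and record whether it reaches 1. This Python sketch uses exact fractions, so the digits are those of the true value 1/10 rather than of its rounded double. The function name binary_digits is illustrative.

```python
from fractions import Fraction

def binary_digits(value: Fraction, count: int) -> str:
    """Return the first `count` binary digits after the point, for 0 <= value < 1."""
    digits = []
    for _ in range(count):
        value *= 2              # shift one binary place to the left
        if value >= 1:
            digits.append("1")
            value -= 1          # keep only the fractional part
        else:
            digits.append("0")
    return "".join(digits)

print("0." + binary_digits(Fraction(1, 10), 21))  # 0.000110011001100110011
print("0." + binary_digits(Fraction(1, 2), 4))    # 0.1000 (terminates)
```

For 1/10 the remainder cycles through the same values forever, so no finite number of bits can hold it exactly.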

Performing calculations and comparisons on imprecise numbers in SAS can lead to unexpected results. Even the simplest calculations can lead to a wrong conclusion, because binary hardware arithmetic cannot always reproduce results that seem obvious in the decimal system.

For example, in decimal arithmetic, the expression (3 x 0.1) is expected to equal 0.3, so the difference between (3 x 0.1) and 0.3 must be 0. Because the decimal values 0.1 and 0.3 do not have exact binary representations, this equality does not hold in binary arithmetic. If you compute the difference between the two values in a SAS program, the result is not 0, as Example Code 4.6 on page 61 illustrates.

In the example, SAS sets the variables point_three and three_times_point_one to 0.3 and (3 x 0.1), respectively. It then compares the two values by subtracting one from the other and writing the result to the SAS log:

Example Code 4.6 Comparing Imprecise Values in SAS

data a;
   point_three=0.3;
   three_times_point_one=3 * 0.1;
   difference=point_three - three_times_point_one;
   put 'The difference is ' difference;
run;

Output 4.5 Log Output for Comparing Imprecise Values in SAS

The log output shows that 0.3 - (3 x 0.1) does not equal 0, as it does in decimal arithmetic. This is because the variable difference is the result of calculations performed on rounded values: 0.1 and 0.3 repeat infinitely in binary, so only rounded approximations of them are stored.
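The same comparison can be reproduced in Python, assuming IEEE 754 double arithmetic like the arithmetic SAS uses on Windows and UNIX. Python's output format differs from the SAS log's. The sketch nonetheless shows the underlying size of the discrepancy: exactly one unit in the last place at 0.3, which is 2 to the power -54.

```python
point_three = 0.3
three_times_point_one = 3 * 0.1
difference = point_three - three_times_point_one

print(three_times_point_one)            # 0.30000000000000004
print("The difference is", difference)  # The difference is -5.551115123125783e-17
print(difference == -2.0 ** -54)        # True: one unit in the last place at 0.3
```

The product 3 x 0.1 rounds to the double just above the double nearest 0.3, so the two stored values differ by the smallest step the format allows in that range.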


Many decimal fractions have binary equivalents that repeat infinitely, so be careful when you interpret results computed from decimal fractions. Some rational numbers do not present problems in either number system. For example, 1/2 has a finite representation in both decimal (0.5) and binary (0.1).
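A simple rule decides which fractions are safe: a fraction in lowest terms has a finite binary expansion exactly when its denominator is a power of 2. This Python sketch applies that rule with the standard fractions module. The function name exact_in_binary is hypothetical.

```python
from fractions import Fraction

def exact_in_binary(value: Fraction) -> bool:
    """True if the fraction has a finite binary expansion (can be stored exactly)."""
    d = value.denominator        # Fraction always keeps lowest terms, d >= 1
    return d & (d - 1) == 0      # true only for 1, 2, 4, 8, ...

for text in ["1/2", "3/8", "1/10", "1/3"]:
    print(text, exact_in_binary(Fraction(text)))
```

Here 1/2 and 3/8 are exact, while 1/10 (denominator 10 = 2 x 5) and 1/3 are not, matching the examples in the text.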

To better understand why a simple calculation such as this one can go wrong, or how a number can be out of range, it is important to understand in more detail how SAS stores binary numbers.

How SAS Stores Numeric Values

Maximum Integer Size

SAS stores all numeric values in 8 bytes of storage unless you specify otherwise. This does not mean that a value is limited to 8 digits, but rather that 8 bytes are allocated for storing the value. In the previous section, you learned how storing non-integer values (fractions) can lead to problems with precision. You can also encounter problems of magnitude, or range, when working with integers (whole numbers).

On any computer, there are limits to how large the absolute value of an integer can be. In SAS, this maximum integer value depends on two factors:

• the number of bytes that you explicitly specify for storing the variable (using the LENGTH statement)

• the operating environment on which SAS is running

If you have not explicitly specified the number of storage bytes, then SAS uses the default length of 8 bytes, and the maximum integer then depends solely on the operating system that you are using.

The following table lists the largest integer that can be reliably stored by a SAS variable in the mainframe, UNIX, and Windows operating environments.

Table 4.8 Largest Integer That Can Be Safely Stored in a Given Length

Length (bytes)  Largest Integer (z/OS)    Largest Integer (Windows/UNIX)
2               256                       not applicable
3               65,536                    8,192
4               16,777,216                2,097,152
5               4,294,967,296             536,870,912
6               1,099,511,627,776         137,438,953,472
7               281,474,976,710,656       35,184,372,088,832
8 (default)     72,057,594,037,927,936    9,007,199,254,740,992
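The values in Table 4.8 follow from the number of mantissa bits that fit in each length. The Python sketch below reproduces them under two assumptions, which are stated in its docstring. On IEEE hosts, a value has 1 sign bit, 11 exponent bits, and an implicit leading mantissa bit. On z/OS, it has 1 sign bit, 7 exponent bits, and no implicit bit. The function name largest_exact_integer is hypothetical. The formula gives 2 to the power (8 x length - 11) for IEEE hosts and 2 to the power (8 x length - 8) for z/OS.

```python
def largest_exact_integer(length: int, host: str) -> int:
    """Largest N such that every integer from 0 to N is stored exactly in `length` bytes.

    Assumed layouts:
      ieee (Windows/UNIX): 1 sign bit, 11 exponent bits, the rest mantissa,
                           plus one implicit leading 1 bit.
      ibm  (z/OS):         1 sign bit, 7 exponent bits, the rest mantissa,
                           with no implicit bit.
    """
    total_bits = 8 * length
    if host == "ieee":
        precision_bits = total_bits - 1 - 11 + 1  # + implicit leading bit
    elif host == "ibm":
        precision_bits = total_bits - 1 - 7
    else:
        raise ValueError(f"unknown host: {host}")
    return 2 ** precision_bits

for length in range(2, 9):
    ibm = f"{largest_exact_integer(length, 'ibm'):,}"
    # The minimum numeric length on Windows and UNIX is 3 bytes.
    ieee = f"{largest_exact_integer(length, 'ieee'):,}" if length >= 3 else "not applicable"
    print(length, ibm, ieee)
```

For example, a length of 7 on z/OS gives 2 to the power 48, which is 281,474,976,710,656.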

When viewing this table, consider the following points:


• The minimum length for a numeric SAS variable on Windows and UNIX operating systems is 3 bytes, and the maximum length is 8 bytes. On IBM mainframes, the minimum length for a numeric SAS variable is 2 bytes, and the maximum length is 8 bytes.

• As the length of the variable increases, so does the size of the integer that can be reliably represented.

• For any given variable length, the maximum integer varies by host. This is because mainframes use different specifications for storing floating-point numbers than UNIX and PC machines do.

• Always store real numbers (values with fractional parts) in the full 8 bytes of storage. If you want to save disk space by using the LENGTH statement to reduce the length of your variables, do so only for variables whose values are integers. When you reduce the length of a variable, make sure that its values are less than or equal to the largest integer allowed for that length.

For example, in the UNIX operating environment, if you know that the values of a numeric variable will always be integers between -8192 and 8192, then you can safely specify a length of 3 to store them:

data myData;
   length num 3;
   num=8000;
run;
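To see why 8,192 is the limit for a length of 3 on UNIX, the Python sketch below simulates shortening an IEEE 754 double. It keeps only the first bytes (the sign, the exponent, and the leading mantissa bits) and zero-fills the rest. Whether this exactly mirrors what SAS writes to disk is an assumption made for illustration. The function name store_with_length is hypothetical.

```python
import struct

def store_with_length(x: float, length: int) -> float:
    """Simulate storing an IEEE 754 double in `length` bytes by dropping low-order bytes.

    Assumption for illustration: shortening drops trailing mantissa bytes,
    which truncates the value rather than rounding it.
    """
    if not 3 <= length <= 8:
        raise ValueError("length must be between 3 and 8 on IEEE hosts")
    raw = struct.pack(">d", x)               # big-endian: sign and exponent first
    kept = raw[:length] + bytes(8 - length)  # zero-fill the dropped bytes
    return struct.unpack(">d", kept)[0]

for value in [8000.0, 8192.0, 8193.0, 8194.0, 0.1]:
    print(value, "->", store_with_length(value, 3))
```

In this simulation, 8000 and 8192 survive. 8193 comes back as 8192, and 8194 happens to survive, so integers above the limit are no longer guaranteed to be exact. 0.1 comes back as 0.0999908447265625, which is why the caution below applies to values with fractional parts.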

CAUTION:

Use the full 8 bytes to store variables that contain real numbers.

Floating-Point Representation

By default, SAS stores numeric values in 8 bytes. The way that numbers are stored, and the space available to store them, also affects numerical accuracy. Although there are various ways to store binary numbers internally, SAS uses floating-point representation to store numeric values. Floating-point representation supports a wide range of values (very large or very small numbers) with an adequate amount of numerical accuracy.

You might already be familiar with floating-point representation because it is similar to scientific notation. In both scientific notation and floating-point representation, each number is represented as a mantissa, a base, and an exponent:

$$987 = .987 \times 10^{3}$$

• The mantissa is the number that is multiplied by the base raised to the exponent. In the example, the mantissa is .987.

• The base is the number that is raised to a power. In the example, the base is 10.

• The exponent is the power to which the base is raised. In the example, the exponent is 3.

One major difference between scientific notation and floating-point representation is the base. In scientific notation, the base is 10. In floating-point representation, the base depends on the system: it is 2 in the IEEE format used by Windows and UNIX, and 16 on IBM mainframes.
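A base-2 counterpart of .987 x 10^3 can be obtained with Python's math.frexp, which splits a float into a mantissa m and an exponent e such that x = m x 2^e with 0.5 <= |m| < 1. The sketch assumes IEEE 754 doubles. Note that this normalization is a convention of frexp: the stored IEEE format instead keeps the mantissa in the form 1.xxx.

```python
import math

# Split 987 into a base-2 mantissa and exponent: 987 == mantissa * 2**exponent.
mantissa, exponent = math.frexp(987.0)

print(mantissa, exponent)          # 0.9638671875 10
print(mantissa * 2 ** exponent)    # 987.0
```

Here the mantissa is 987/1024 and the exponent is 10, just as .987 and 3 play those roles in base 10.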

The following figure shows the decimal value 987 written in the IEEE 754 binary floating-point format. Because 987 is an integer with only 10 significant binary digits, it fits exactly in the mantissa and no rounding is needed.


Figure: the decimal value 987 in IEEE 754 binary floating-point format, with the sign, exponent, and mantissa fields labeled.

To store binary floating-point numbers, computers use standard formats called interchange formats, or byte layouts. The byte layout is a standard way of grouping and ordering bit strings, from left to right, so that the parts of the floating-point number are represented in a standardized way. Each part of the floating-point value (sign, exponent, mantissa) is allotted a specific number of bits in the string and a specific position in the string. This allows for the exchange of floating-point data in an efficient and compact form.

Figure 4.1 on page 64 shows the byte layout for a double-precision binary floating-point number. This layout uses the first bit to encode the sign of the number, the next 11 bits to encode the exponent, and the final 52 bits to encode the mantissa. If the sign bit is 1, the number is negative; if the sign bit is 0, the number is positive.

Figure 4.1 Byte Layout for a Double-Precision Binary Floating-Point Number

sign (1 bit) | exponent (11 bits) | mantissa (52 bits), spanning bytes 1 through 8
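The layout in Figure 4.1 can be taken apart with Python's standard struct module, assuming IEEE 754 doubles. The sketch reads the 8 bytes as one 64-bit integer and masks out the 1-bit sign, the 11-bit exponent, and the 52-bit mantissa. It applies this to 987 and -987, which differ only in the sign bit. The function name double_fields is hypothetical, and the reconstruction formula covers normal (nonzero, finite) numbers only.

```python
import struct

def double_fields(x: float) -> tuple[int, int, int]:
    """Split an IEEE 754 double into (sign, biased exponent, 52-bit mantissa field)."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    sign = bits >> 63                    # first bit
    exponent = (bits >> 52) & 0x7FF      # next 11 bits
    mantissa = bits & ((1 << 52) - 1)    # final 52 bits
    return sign, exponent, mantissa

for x in (987.0, -987.0):
    sign, exponent, mantissa = double_fields(x)
    # Rebuild a normal number: implicit leading 1, exponent bias 1023.
    value = (-1) ** sign * (1 + mantissa / 2 ** 52) * 2 ** (exponent - 1023)
    print(struct.pack(">d", x).hex(), sign, exponent, f"{mantissa:052b}"[:12], value)
```

For 987, the sign is 0, the biased exponent is 1032 (9 + 1023), and the mantissa field begins 111011011, the bits of 987 after its leading 1. The full 8-byte pattern is 408ed80000000000 in hexadecimal.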

Different host computers can have different formats and specifications for floating-point representation. All platforms on which the SAS System runs use 8-byte floating-point representation.

Precision v. Magnitude

The magnitude of the values that can be represented (how large or small they can be) depends on the base and the number of bits allotted to the exponent. The precision is determined by the number of bits allotted to the mantissa. Whether an operating system truncates or rounds digits also affects errors in representation.

When you use the LENGTH statement to store a numeric variable in fewer than 8 bytes, SAS stores a truncated floating-point number, which has fewer mantissa bits. The following table shows some differences between the floating-point formats for the IBM mainframe and the IEEE standard. The IEEE standard is used by the Windows and UNIX operating systems.

Table 4.9 IBM and IEEE Standard for Floating-Point Formats

Specification      IBM Mainframe  IEEE Standard (Windows/UNIX)  Affects
Base               16             2                             magnitude
Exponent bits      7              11                            magnitude
Mantissa bits      56             52                            precision
Round or truncate  Truncate       Round                         precision
Bias for exponent  64             1023
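To see how the base and exponent bits in Table 4.9 set the magnitude, this Python sketch computes the largest finite value of each format with exact fractions. It assumes two conventions that the table does not state. On IEEE hosts, the all-ones exponent is reserved (for infinities and NaNs). On the IBM mainframe, the 56-bit hexadecimal fraction is kept below 1 and has no implicit bit.

```python
import sys
from fractions import Fraction

# IEEE (Windows/UNIX): base 2, 11 exponent bits, bias 1023, 52 stored mantissa
# bits plus an implicit 1. The largest usable biased exponent is 2**11 - 2.
ieee_max = (2 - Fraction(1, 2 ** 52)) * Fraction(2) ** (2 ** 11 - 2 - 1023)

# IBM mainframe: base 16, 7 exponent bits, bias 64, 56-bit (14 hex digit)
# fraction below 1. The largest exponent is 2**7 - 1 - 64 = 63.
ibm_max = (1 - Fraction(1, 16 ** 14)) * Fraction(16) ** (2 ** 7 - 1 - 64)

print(float(ieee_max))                         # 1.7976931348623157e+308
print(float(ieee_max) == sys.float_info.max)   # True
print(float(ibm_max))                          # roughly 7.2e+75
```

With 4 more exponent bits, the IEEE format reaches about 10^308, whereas the IBM format stops near 10^75 even though its base is larger.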
