end;
j=1;
do k = 1 to 5;
   if indexc(old(k),from(j)) > 0 then do;
      new(k)=translate(old(k),to(j),from(j));
      j+1;
      if j=4 then j=1;
   end;
end;
decrypt_num=input(cats(of new1-new5),5.);
keep num encrypt_num decrypt_num;
run;
proc print;
run;
The following output shows the results of the PROC PRINT for Example 3:
Numerical Accuracy in SAS Software
Overview
In any number system, whether it is binary or decimal, there are limits to how
precisely numbers can be represented. As a result, approximations have to be made. For
example, in the decimal number system, the fraction 1/3 cannot be perfectly represented
as a finite decimal value because it contains infinitely repeating digits (.333...). On
computers, because of finite precision, this number must be approximated. Numerical
precision is the accuracy with which numbers are approximated or represented.
In computing, software applications are particularly susceptible to numerical precision
errors due to finite precision and machine hardware limitations. Computers are finite
machines with finite storage capacity, so they cannot represent an infinite set of numbers
with perfect precision.
The problem is further compounded by the fact that computers use a different number
system than people do. Decimal infinite-precision arithmetic is the norm for human
calculations, but computers use finite binary representations of values and
finite-precision arithmetic. This representation has proven adequate for many
calculations. Yet, depending on the problem, you might need an extended precision that is
wider than what the hardware offers. In that case, representation and arithmetic are done
mostly in software and are much slower than hardware arithmetic.
60 Chapter 4 SAS Variables
Furthermore, although computers do allow the use of decimal numbers and decimal
arithmetic via human-centric software interfaces, all numbers and data are eventually
converted to binary format to be stored and processed by the computer internally. It is in
the conversion between these two number systems, from decimal to binary, that precision
is affected and rounding errors are introduced.
Truncation in Binary Numbers
Just as there are decimal values with infinitely repeating representations, there are also
binary values that have infinitely repeating representations. However, the numbers that
are imprecise in decimal are not always the same ones that are imprecise in binary.
For example, the decimal value 1/10 has a finite decimal representation (0.1), but in
binary it has an infinitely repeating representation. In binary, the value converts to
0.000110011001100110011 ...
where the pattern 0011 is repeated indefinitely. As a result, the value will be rounded
when stored on a computer.
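One way to see the rounded value is to write the stored bytes in hexadecimal. The following is a minimal sketch, assuming a host that uses the IEEE floating-point format (Windows or UNIX); the variable name tenth is chosen only for illustration. On such a host, the repeating 0011 pattern is expected to appear as a run of 9s in the hexadecimal digits, with the last digit rounded.

```sas
data _null_;
   tenth = 0.1;
   /* HEX16. writes the 8-byte stored value as 16 hexadecimal digits, */
   /* so the rounding of the repeating binary pattern becomes visible */
   put tenth= hex16.;
run;
```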
Performing calculations and comparisons on imprecise numbers in SAS can lead to
unexpected results. Even the simplest calculations can lead to a wrong conclusion.
Hardware cannot always match what might seem obvious and expected in the decimal
system.
For example, in decimal arithmetic, the expression (3 x 0.1) is expected to be equal to
0.3, so the difference between (3 x 0.1) and 0.3 must be 0. Because the decimal values
0.1 and 0.3 do not have exact binary representations, this equality does not hold in
binary arithmetic. If you compute the difference between the two values in a SAS
program, the result is not 0, as Example Code 4.6 on page 61 illustrates.
In the example, SAS sets the variables point_three and three_times_point_one to 0.3
and (3 x 0.1), respectively. It then compares the two values by subtracting one from the
other and writing the result to the SAS log:
Example Code 4.6 Comparing Imprecise Values in SAS
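data a;
   point_three=0.3;
   three_times_point_one= 3 * 0.1;
   /* In decimal arithmetic this difference would be exactly 0 */
   difference= point_three - three_times_point_one;
   /* Write the result to the SAS log */
   put 'The difference is ' difference;
run;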
Output 4.5 Log Output for Comparing Imprecise Values in SAS
The log output shows that 0.3 - (3 x 0.1) does not equal 0, as it does in decimal
arithmetic. This is because the variable difference is computed from rounded values,
that is, from finite approximations of infinitely repeating binary values.
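A common way to guard a comparison against this kind of residue is to round both operands to a unit that is meaningful for the data before comparing them. The sketch below uses the ROUND function for that purpose; the rounding unit 1e-9 is an assumption chosen for illustration, not a recommended value, and the appropriate unit depends on your data.

```sas
data _null_;
   point_three = 0.3;
   three_times_point_one = 3 * 0.1;

   /* Exact comparison of the two stored binary values */
   if point_three = three_times_point_one then put 'Exact comparison: equal';
   else put 'Exact comparison: not equal';

   /* Comparison after rounding both operands to the same unit */
   /* (1e-9 is an illustrative choice, not a recommendation)    */
   if round(point_three, 1e-9) = round(three_times_point_one, 1e-9)
      then put 'Rounded comparison: equal';
   else put 'Rounded comparison: not equal';
run;
```

Rounding both sides to the same unit keeps the comparison symmetric, which is usually easier to reason about than testing whether a difference is "small enough".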
There are many decimal fractions whose binary equivalents are infinitely repeating
binary numbers, so be careful when interpreting results from general rational numbers in
decimal. There are some rational numbers that do not present problems in either number
system. For example, 1/2 can be finitely represented in both the decimal and binary
systems.
To understand better why a simple calculation such as this one can go wrong, or how a
number can be out of range, it is important to understand in more detail how SAS stores
binary numbers.
How SAS Stores Numeric Values
Maximum Integer Size
SAS stores all numeric values in 8 bytes of storage unless you specify differently. This
does not mean that a value is limited to 8 digits, but rather that 8 bytes are allocated for
storing the value. In the previous section, you learned how storing non-integer values
(fractions) can lead to problems with precision. But you can also encounter problems of
magnitude, or range, when working with integers (whole numbers).
On any computer, there are limits to how large the absolute value of an integer can be. In
SAS, this maximum integer value depends on two factors:
- the number of bytes that you explicitly specify for storing the variable (using the
  LENGTH statement)
- the operating environment on which SAS is running
If you have not explicitly specified the number of storage bytes, then SAS uses the
default length of 8 bytes, and the maximum integer then depends solely on what
operating system you are using.
The following table lists the largest integer that can be reliably stored by a SAS variable
in the mainframe, UNIX, and Windows operating environments.
Table 4.8 Largest Integer That Can Be Safely Stored in a Given Length
Variable Length    Largest Integer               Largest Integer
(Bytes)            z/OS                          Windows/UNIX
2                  256                           not applicable
3                  65,536                        8,192
4                  16,777,216                    2,097,152
5                  4,294,967,296                 536,870,912
6                  1,099,511,627,776             137,438,953,472
7                  281,474,976,710,656           35,184,372,088,832
8 (default)        72,057,594,037,927,936        9,007,199,254,740,992
When viewing this table, consider the following points:
- The minimum length for a SAS variable on Windows and UNIX operating systems
  is 3 bytes, and the maximum length is 8 bytes. On IBM mainframes, the minimum
  length for a SAS variable is 2 bytes, and the maximum length is 8 bytes.
- As the length of the variable increases, so does the size of the integer that can be
  reliably represented.
- For any given variable length, the maximum integer varies by host. This is because
  mainframes have different specifications for storing floating-point numbers than
  UNIX and PC machines do.
- Always store real numbers in the full 8 bytes of storage. If you want to save disk
  space by using the LENGTH statement to reduce the length of your variables, you
  can do so but only for variables whose values are integers. When adjusting the length
  of variables, be sure that the values are less than or equal to the largest integer
  allowed for that specified length.
For example, in the UNIX operating environment, if you know that the value of your
numeric variables will always be integers between -8192 and 8192, then you can
safely specify a length of 3 to store the number:
data myData;
   length num 3;
   num=8000;
run;
CAUTION:
Use the full 8 bytes to store variables that contain real numbers.
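To see why the limits in Table 4.8 matter, you can store an integer that is one larger than the limit for a given length and then read it back. The following is a minimal sketch, assuming a Windows or UNIX host where a length of 3 holds integers up to 8,192; the data set name short and the value 8193 are chosen only for illustration. The shortened length applies when the value is written to the data set, so the check is done in a second DATA step.

```sas
/* Store an integer one larger than the Table 4.8 limit for length 3 */
data short;
   length num 3;
   num = 8193;
run;

/* Read the value back and compare it with what was assigned */
data _null_;
   set short;
   if num = 8193 then put 'Stored value matches: ' num;
   else put 'Stored value differs from 8193: ' num;
run;
```

If the value read back differs from the value assigned, the length is too short for the data, and a longer length (or the default of 8 bytes) is the safer choice.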
Floating-Point Representation
SAS stores numeric values in 8 bytes of data. The way that the numbers are stored and
the space available to store them also affects numerical accuracy. Although there are
various ways to store binary numbers internally, SAS uses floating-point representation
to store numeric values. Floating-point representation supports a wide range of values
(very large or very small numbers) with an adequate amount of numerical accuracy.
You might already be familiar with floating-point representation because it is similar to
scientific notation. In both scientific notation and floating-point representation, each
number is represented as a mantissa, a base, and an exponent.
987 = .987 x 10^3   (mantissa x base^exponent)

- The mantissa is the number that is being multiplied by the base. In the example, the
  mantissa is .987.
- The base is the number that is being raised to a power. In the example, the base is 10.
- The exponent is the power to which the base is raised. In the example, the exponent
  is 3.
One major difference between scientific notation and floating-point representation is that
in scientific notation, the base is 10. In floating-point representation, on most operating
systems, the base is either 2 or 16 depending on the system.
The following figure shows the decimal value 987 written in the IEEE 754 binary
floating-point format. Because it is a small value, no rounding is needed.
[Figure: 987 = 0 100 0100 0111 0110, with the sign, exponent, and mantissa fields labeled]
To store binary floating-point numbers, computers use standard formats called
interchange formats, or byte layouts. The byte layout is a standard way of grouping and
ordering bit strings, from left to right, so that the parts of the floating-point number are
represented in a standardized way. Each part of the floating-point value (sign, exponent,
mantissa) is allotted a specific number of bits in the string and a specific position in the
string. This allows for the exchange of floating-point data in an efficient and compact
form.
Figure 4.1 on page 64 shows the byte layout for a double-precision binary floating-
point number. This layout uses the first bit to encode the sign of the number, the next 11
bits to encode the exponent, and the final 52 bits to encode the mantissa. If the sign bit is
1, then the number is negative and if the sign bit is 0, the number is positive.
Figure 4.1 Byte Layout for a Double-Precision Binary Floating-Point Number
[Figure: 8 bytes (byte 1 through byte 8) holding, from left to right, the sign (1 bit),
the exponent (11 bits), and the mantissa (52 bits)]
Different host computers can have different formats and specifications for floating-point
representation. All platforms on which the SAS System runs use 8-byte floating-point
representation.
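The layout described above can be inspected from a DATA step by writing a value with the HEX16. format and splitting the hexadecimal digits into fields. The following is a sketch only, assuming an IEEE host (Windows or UNIX) and the exponent bias of 1023 listed for the IEEE standard; the value -6 and all variable names are chosen for illustration. Because -6 equals -1.5 x 2^2, the decoded sign bit should be 1 and the unbiased exponent should be 2 on such a host.

```sas
data _null_;
   x = -6;
   hex = put(x, hex16.);               /* 16 hex digits = 64 bits           */

   /* The first 3 hex digits hold the sign bit plus the 11 exponent bits */
   first12 = input(substr(hex, 1, 3), hex3.);
   sign = (first12 >= 2048);           /* 1 if the leading bit is set       */
   biased_exponent = first12 - 2048 * sign;
   exponent = biased_exponent - 1023;  /* remove the IEEE bias (assumed)    */

   /* The remaining 13 hex digits hold the 52 mantissa bits */
   mantissa_hex = substr(hex, 4);

   put hex= sign= biased_exponent= exponent= mantissa_hex=;
run;
```

This decoding does not apply to the IBM mainframe format, which uses a different base, exponent width, and bias.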
Precision v. Magnitude
The largest integer value that can be represented exactly (without rounding) depends on
the base and the number of bits that are allotted to the exponent. The precision is
determined by the number of bits that are allotted for the mantissa. Whether an operating
system truncates or rounds digits affects errors in representation.
When you use the LENGTH statement to shorten a numeric variable, SAS stores a
truncated floating-point number that has fewer mantissa bits.
The following table shows some differences between floating-point formats for the IBM
mainframe and the IEEE standard. The IEEE standard is used by the Windows and UNIX
operating systems.
Table 4.9 IBM and IEEE Standard for Floating-Point Formats
Specifications       IBM Mainframe    IEEE Standard (Windows/UNIX)    Affects
Base                 16               2                               magnitude
Exponent Bits        7                11                              magnitude
Mantissa Bits        56               52                              precision
Round or Truncate    Truncate         Round                           precision
Bias for Exponent    64               1023
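The effect of the limited number of mantissa bits can be observed at the largest integer listed for an 8-byte variable on Windows and UNIX, 9,007,199,254,740,992. The sketch below, which assumes an IEEE host and uses illustrative variable names, adds 1 to that value and checks whether the sum can still be distinguished from the original.

```sas
data _null_;
   big  = 9007199254740992;   /* largest 8-byte integer in Table 4.8 (Windows/UNIX) */
   next = big + 1;            /* needs one more mantissa bit than is available      */
   put big= 20. next= 20.;
   if big = next then put 'big and big + 1 compare as equal';
   else put 'big and big + 1 compare as different';
run;
```

If the two values compare as equal, the addition was absorbed by rounding, which is the behavior to expect once integers exceed the exactly representable range.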