Operating Environment Information
See the documentation for your operating environment for information about how
STIMER differs from FULLSTIMER in your operating environment. The
information that these options display varies depending on your operating
environment, so statistics that you see might differ from the ones shown.
Interpreting FULLSTIMER and STIMER Statistics
Several types of resource usage statistics are reported by the STIMER and
FULLSTIMER options, including real time (elapsed time) and CPU time. Real time
represents the clock time it took to execute a job or step; it is heavily dependent on the
capacity of the system and the current load. As more users share a particular resource,
less of that resource is available to you. CPU time represents the actual processing time
required by the CPU to execute the job, exclusive of capacity and load factors. If you
must wait longer for a resource, your CPU time does not increase, but your real time
increases. Do not use real time as the only criterion for the efficiency of your
program, because you cannot always control the capacity and load demands on your
system. CPU time is a more accurate measure of program efficiency, because it
decreases more predictably as you modify your program to become more efficient.
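As a minimal sketch of how these statistics are requested, the following statements turn on FULLSTIMER for one step and then return to STIMER. The libref Auto and the data set Auto.Survey are borrowed from the examples later in this chapter; the output data set Work.Sorted is a hypothetical name used only for illustration.

```sas
libname auto 'SAS-library';

/* Request the full set of resource statistics in the SAS log. */
options fullstimer;

/* Any step can be measured; this sort is only an illustration. */
proc sort data=auto.survey out=work.sorted;
   by seatbelt;
run;

/* Return to the shorter STIMER statistics. */
options nofullstimer stimer;
```

The statistics appear in the SAS log after the step. Comparing the real time and CPU time lines across several runs gives a rough sense of how much of the elapsed time is due to system load rather than to the program itself.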
The statistics reported by FULLSTIMER relate to the three critical computer resources:
I/O, memory, and CPU time. In most cases, reducing the use of any of these three
resources improves the throughput of a particular job and reduces the
real time used. However, there are exceptions, as described in the following
sections.
Techniques for Optimizing I/O
Overview of Techniques for Optimizing I/O
I/O is one of the most important factors for optimizing performance. Most SAS jobs
consist of repeated cycles of reading a particular set of data to perform various data
analysis and data manipulation tasks. To improve the performance of a SAS job, you
must reduce the number of times SAS accesses disk or tape devices.
To do this, you can modify your SAS programs to process only the necessary variables
and observations by:
using WHERE processing
using DROP and KEEP statements
using LENGTH statements
using the OBS= and FIRSTOBS= data set options
You can also modify your programs to reduce the number of times they process the data
internally by:
creating SAS data sets
using indexes
accessing data through SAS views
using engines efficiently
using PROC DATASETS when modifying variable attributes
storing numeric values as characters
using techniques to optimize memory usage
You can reduce the number of data accesses by processing more data each time a device
is accessed by:
setting the ALIGNSASIOFILES, BUFNO=, BUFSIZE=, CATCACHE=,
COMPRESS=, DATAPAGESIZE=, STRIPESIZE=, UBUFNO=, and UBUFSIZE=
system options
using the SASFILE global statement to open a SAS data set and allocate enough
buffers to hold the entire data set in memory
When using SAS DATA step views, you can improve performance by:
specifying the VBUFSIZE= system option
specifying the OBSBUF= data set option
Note: Sometimes you might be able to use more than one method, making your SAS job
even more efficient.
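As an illustration of the SASFILE global statement mentioned above, the following sketch loads a data set into memory once and then reads it in two steps. It assumes that the libref Auto is already assigned, that Auto.Survey fits in the memory available to the session, and that it contains the variable Seatbelt, as in the examples in this chapter.

```sas
/* Open the data set and read it into memory buffers. */
sasfile auto.survey load;

/* Both steps read the buffered copy. */
proc print data=auto.survey;
   where seatbelt='yes';
run;

proc freq data=auto.survey;
   tables seatbelt;
run;

/* Close the file and free the buffers. */
sasfile auto.survey close;
```

Whether this helps depends on how often the data set is read and how much memory it needs, so it is worth testing against a run without SASFILE.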
Using WHERE Processing
You might be able to use a WHERE statement in a procedure to perform the same task as
a DATA step with a subsetting IF statement. The WHERE statement can eliminate extra
DATA step processing when performing certain analyses because unneeded observations
are not processed.
For example, the following DATA step creates the data set Seatbelt. This data set
contains only those observations from the Auto.Survey data set for which the value of
Seatbelt is yes. The new data set is then printed.
libname auto 'SAS-library';

data seatbelt;
   set auto.survey;
   if seatbelt='yes';
run;

proc print data=seatbelt;
run;
However, you can get the same output from the PROC PRINT step without creating a
data set if you use a WHERE statement in the PRINT procedure, as in the following
example:
proc print data=auto.survey;
   where seatbelt='yes';
run;
The WHERE statement can save resources by reducing the number of times that you
process the data. In this example, you might be able to use less time and memory by
eliminating the DATA step. Also, you use less I/O because there is no intermediate data
set. Note that you cannot use a WHERE statement in a DATA step that reads raw data.
The extent of savings that you can achieve depends on many factors, including the size
of the data set. It is recommended that you test your programs to determine the most
efficient solution. For more information, see “Deciding Whether to Use a WHERE
Expression or a Subsetting IF Statement” on page 193.
198 Chapter 12 Optimizing System Performance
Using DROP and KEEP Statements
Another way to improve efficiency is to use DROP and KEEP statements to reduce the
size of your observations. When you create a temporary data set and include only the
variables that you need, you can reduce the number of I/O operations that are required to
process the data. For more information, see “DROP Statement” in SAS Statements:
Reference and “KEEP Statement” in SAS Statements: Reference.
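The following sketch shows one way the KEEP and DROP statements might be used. It assumes that the libref Auto is already assigned. Seatbelt comes from the earlier example; the variables Age and Comments, and the output data set names, are hypothetical and stand in for whatever your data contains.

```sas
/* KEEP: only the listed variables are written to the new data set. */
data work.belts;
   set auto.survey;
   keep seatbelt age;      /* Age is a hypothetical variable */
run;

/* DROP: every variable except the listed ones is written. */
data work.nocomments;
   set auto.survey;
   drop comments;          /* Comments is a hypothetical variable */
run;
```

Use whichever statement gives the shorter list. Later steps that read the smaller data sets move less data per observation.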
Using LENGTH Statements
You can also use LENGTH statements to reduce the size of your observations. When
you include only the necessary storage space for each variable, you can reduce the
number of I/O operations that are required to process the data. Before you change the
length of a numeric variable, however, see “Specifying Variable Lengths” on page 205.
For more information, see “LENGTH Statement” in SAS Statements: Reference.
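As a small illustration, the following sketch uses a LENGTH statement to store a two-character code in 2 bytes rather than the 8 bytes that list input assigns to a character variable by default. The data set name, the variable names, and the data lines are made up for the example.

```sas
data work.codes;
   /* Declare the length before the variable is first used. */
   length state $ 2;
   input state $ count;
   datalines;
NC 10
VA 12
;
run;
```

The example shortens only a character variable. Shortening numeric variables can lose precision, which is why the caution above applies before you change a numeric length.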
Using the OBS= and FIRSTOBS= Data Set Options
You can also use the OBS= and FIRSTOBS= data set options to reduce the number of
observations processed. When you create a temporary data set and include only the
necessary observations, you can reduce the number of I/O operations that are required to
process the data. See “FIRSTOBS= Data Set Option” in SAS Data Set Options:
Reference and “OBS= Data Set Option” in SAS Data Set Options: Reference for more
information.
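For example, the following sketch prints only observations 101 through 200 of Auto.Survey. It assumes that the libref Auto is already assigned; the observation numbers are arbitrary values chosen for illustration.

```sas
/* FIRSTOBS= is the first observation to process;
   OBS= is the last one, not a count. */
proc print data=auto.survey(firstobs=101 obs=200);
run;
```

Because OBS= names the last observation rather than the number of observations, this step processes 100 observations. Limiting a step this way is also a convenient way to test a program on part of a large data set.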
Creating SAS Data Sets
If you process the same raw data repeatedly, it is usually more efficient to create a SAS
data set. SAS can process SAS data sets more efficiently than it can process raw data
files.
Another consideration is whether you are using data sets created with previous
releases of SAS. If you frequently process data sets created with previous releases, it is
sometimes more efficient to convert them by re-creating them in the most
recent version of SAS. See Chapter 33, “SAS 9.4 Compatibility with SAS Files from
Earlier Releases,” on page 719 for more information.
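The first point can be sketched as follows: read the raw file once, store the result as a SAS data set, and let later steps read that data set. The file reference 'raw-data-file', the data set name Auto.Responses, and the input layout are placeholders, not values from this chapter.

```sas
libname auto 'SAS-library';

/* Read and parse the raw file one time only. */
data auto.responses;
   infile 'raw-data-file';      /* placeholder for the raw file path */
   input seatbelt $ age;        /* hypothetical record layout */
run;

/* Later steps read the SAS data set instead of the raw file. */
proc freq data=auto.responses;
   tables seatbelt;
run;

proc means data=auto.responses;
   var age;
run;
```

Each later step avoids re-reading and re-parsing the raw records, which is where the saving comes from.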
Using Indexes
An index is an optional file that you can create for a SAS data file to provide direct
access to specific observations. The index stores values in ascending value order for a
specific variable or variables and includes information as to the location of those values
within observations in the data file. In other words, an index enables you to locate an
observation by the value of the indexed variable.
Without an index, SAS accesses observations sequentially in the order in which they are
stored in the data file. With an index, SAS accesses the observation directly. Therefore,
by creating and using an index, you can access an observation faster.
In general, SAS can use an index to improve performance in these situations:
For WHERE processing, an index can provide faster and more efficient access to a
subset of data.
For BY processing, an index returns observations in the index order, which is in
ascending value order, without using the SORT procedure.
For the SET and MODIFY statements, the KEY= option enables you to specify an
index in a DATA step to retrieve particular observations in a data file.
Note: An index exists to improve performance. However, an index conserves some
resources at the expense of others. Therefore, you must consider costs associated
with creating, using, and maintaining an index. See “Understanding SAS Indexes”
on page 638 for more information about indexes and deciding whether to create one.
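As a sketch of the WHERE case, the following statements create a simple index on the variable Seatbelt and then subset on that variable. The example assumes that the libref Auto is already assigned and that Auto.Survey contains Seatbelt, as in the earlier examples. Whether SAS actually uses the index is decided at run time, so the MSGLEVEL=I system option is set to write index usage notes to the log.

```sas
/* Ask SAS to report index usage in the log. */
options msglevel=i;

/* Create a simple index on Seatbelt. */
proc datasets library=auto nolist;
   modify survey;
   index create seatbelt;
quit;

/* SAS can now consider the index for this WHERE expression. */
proc print data=auto.survey;
   where seatbelt='yes';
run;
```

An index on a variable with only a few distinct values might not be selected, or might not help, so check the log and compare timings before you keep it.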
Accessing Data through SAS Views
You can use the SQL procedure or a DATA step to create SAS views of your data. A
SAS view is a stored set of instructions that lets you subset your data with fewer statements.
Also, you can use a SAS view to group data from several data sets without creating a
new one, saving both processing time and disk space. For more information, see Chapter
27, “SAS Views,” on page 669 and the Base SAS Procedures Guide.
For information about optimizing system performance with SAS views, see “Setting
VBUFSIZE= and OBSBUF= for SAS DATA Step Views” on page 203.
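The following sketch creates the same subset twice, once as a DATA step view and once as a PROC SQL view. It assumes that the libref Auto is already assigned; the view names Work.Belted and Work.Belted_sql are hypothetical.

```sas
/* DATA step view: only the instructions are stored. */
data work.belted / view=work.belted;
   set auto.survey;
   where seatbelt='yes';
run;

/* PROC SQL view of the same subset. */
proc sql;
   create view work.belted_sql as
      select *
         from auto.survey
         where seatbelt='yes';
quit;

/* The subset is produced when the view is read. */
proc print data=work.belted;
run;
```

Neither view stores a copy of the data, so disk space is saved, but the subsetting runs again each time the view is read.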
Using Engines Efficiently
If you do not specify an engine in a LIBNAME statement, SAS must perform extra
processing steps in order to determine which engine to associate with the SAS library.
SAS must look at all of the files in the directory until it has enough information to
determine which engine to use. For example, the following statement is efficient because
it explicitly tells SAS to use a specific engine for the libref Fruits:
/* Engine specified. */
libname fruits v9 'SAS-library';
The following statement does not explicitly specify an engine. In the output, notice the
Note about mixed engine types that is generated:
/* Engine not specified. */
libname fruits 'SAS-library';
Log 12.3 SAS Log Output from the LIBNAME Statement
NOTE: Directory for library FRUITS contains files of mixed engine types.
NOTE: Libref FRUITS was successfully assigned as follows:
Engine: V9
Physical Name: SAS-library
z/OS Specifics
In the z/OS operating environment, you do not need to specify an engine for certain
types of libraries.
See Chapter 35, “SAS Engines,” on page 739 for more information about SAS engines.
Setting System Options to Improve I/O Performance
The following SAS system options can help you reduce the number of disk accesses that
are needed for SAS files, though they might increase memory usage and the SAS data
set size:
ALIGNSASIOFILES
A SAS data set consists of a header that is followed by one or more pages of data.
Normally, the header is 1K on Windows and 8K on UNIX. The
ALIGNSASIOFILES system option forces the header to be the same size as the data
pages so that the data pages are aligned to boundaries that allow for more efficient
I/O. The page size is set using the BUFSIZE= option.
For more information, see “ALIGNSASIOFILES System Option” in SAS System
Options: Reference and the SAS documentation for your operating environment.
BUFNO=
SAS uses the BUFNO= option to adjust the number of open page buffers when it
processes a SAS data set. Increasing this option's value can improve your
application's performance by allowing SAS to read more data with fewer passes;
however, your memory usage increases. Experiment with different values for this
option to determine the optimal value for your needs.
Note: You can also use the CBUFNO= system option to control the number of extra
page buffers to allocate for each open SAS catalog.
For more information, see “BUFNO= System Option” in SAS System Options:
Reference and the SAS documentation for your operating environment.
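As a starting point for that experimentation, the following sketch sets BUFNO= first as a system option and then as a data set option on a single input data set. It assumes that the libref Auto is already assigned. The value 20 and the output data set names are arbitrary choices for illustration, not recommendations.

```sas
/* System option: applies to data sets opened after this point. */
options bufno=20;

data work.copy1;
   set auto.survey;
run;

/* Data set option: applies only to this data set in this step. */
data work.copy2;
   set auto.survey(bufno=20);
run;
```

Running the same step with several BUFNO= values while FULLSTIMER is on lets you compare the time and memory statistics in the log.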
BUFSIZE=
When the BASE engine creates a data set, it uses the BUFSIZE= option to set the
permanent page size for the data set. The page size is the amount of data that can be
transferred for an I/O operation to one buffer. The default value for BUFSIZE= is
determined by your operating environment. Note that the default is set to optimize
the sequential access method. To improve performance for direct (random) access,
you should change the value for BUFSIZE=.
Whether you use your operating environment's default value or specify a value, the
engine always writes complete pages regardless of how full or empty those pages
are.
If you know that the total amount of data is going to be small, you can set a small
page size with the BUFSIZE= option, so that the total data set size remains small and
you minimize the amount of wasted space on a page. In contrast, if you know that
you are going to have many observations in a data set, you should optimize
BUFSIZE= so that as little overhead as possible is needed. Note that each page
requires some additional overhead.
Large data sets that are accessed sequentially benefit from larger page sizes because
larger pages reduce the number of system calls that are required to read the
data set. Note that because observations cannot span pages, typically there is unused
space on a page.
“Calculating Data Set Size” on page 206 discusses how to estimate data set size.
For more information, see “BUFSIZE= System Option” in SAS System Options:
Reference and the SAS documentation for your operating environment.
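The page size can also be set for a single data set with the BUFSIZE= data set option when the data set is created. The following sketch assumes that the libref Auto is already assigned; the 64K value and the data set name Work.Big are arbitrary choices for illustration.

```sas
/* Create the data set with a 64K page size. */
data work.big(bufsize=64k);
   set auto.survey;
run;

/* PROC CONTENTS reports the page size that was actually used. */
proc contents data=work.big;
run;
```

Because the page size is permanent for a data set, changing it later means re-creating the data set with a different BUFSIZE= value.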