Interpreting FULLSTIMER and STIMER Statistics

Operating Environment Information

See the documentation for your operating environment for information about how

STIMER differs from FULLSTIMER in your operating environment. The

information that these options display varies depending on your operating

environment, so statistics that you see might differ from the ones shown.

Interpreting FULLSTIMER and STIMER Statistics

Several types of resource usage statistics are reported by the STIMER and FULLSTIMER options, including real time (elapsed time) and CPU time. Real time represents the clock time it took to execute a job or step; it is heavily dependent on the capacity of the system and the current load. As more users share a particular resource, less of that resource is available to you. CPU time represents the actual processing time required by the CPU to execute the job, exclusive of capacity and load factors. If you must wait longer for a resource, your CPU time does not increase, but your real time does. It is not advisable to use real time as the only criterion for the efficiency of your program, because you cannot always control the capacity and load demands on your system. CPU time is a more reliable measure of program efficiency, because it decreases more predictably as you modify your program to become more efficient.
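As a minimal sketch of how these statistics can be collected, the following program turns on FULLSTIMER before running a step so that the log reports real time and CPU time for that step. The data set Work.Timing_Test and its contents are hypothetical, and the exact statistics that appear in the log depend on your operating environment.

```sas
/* Write the full set of resource statistics to the log. */
options fullstimer;

/* Hypothetical step to time: build a small test data set. */
data work.timing_test;
   do i = 1 to 1000000;
      x = ranuni(12345);
      output;
   end;
run;

/* Turn the extended statistics off again. */
options nofullstimer;
```

Comparing the CPU time in the log before and after a program change tends to give a more stable measure than comparing real time, which also reflects system load.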

The statistics reported by FULLSTIMER relate to the three critical computer resources: I/O, memory, and CPU time. Reducing the use of any of these three resources usually results in better throughput of a particular job and a reduction of real time used. However, there are exceptions, as described in the following sections.

Techniques for Optimizing I/O

Overview of Techniques for Optimizing I/O

I/O is one of the most important factors for optimizing performance. Most SAS jobs

consist of repeated cycles of reading a particular set of data to perform various data

analysis and data manipulation tasks. To improve the performance of a SAS job, you

must reduce the number of times SAS accesses disk or tape devices.

To do this, you can modify your SAS programs to process only the necessary variables

and observations by:

• using WHERE processing

• using DROP and KEEP statements

• using LENGTH statements

• using the OBS= and FIRSTOBS= data set options

You can also modify your programs to reduce the number of times that SAS processes the data internally by:

• creating SAS data sets

• using indexes

• accessing data through SAS views

• using engines efficiently

• using PROC DATASETS when modifying variable attributes

• storing numeric values as characters

• using techniques to optimize memory usage

You can reduce the number of data accesses by processing more data each time a device

is accessed by:

• setting the ALIGNSASIOFILES, BUFNO=, BUFSIZE=, CATCACHE=, COMPRESS=, DATAPAGESIZE=, STRIPESIZE=, UBUFNO=, and UBUFSIZE= system options

• using the SASFILE global statement to open a SAS data set and allocate enough

buffers to hold the entire data set in memory
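As an illustration of the SASFILE technique, the sketch below loads a data set into memory once and then reads it in two procedure steps. The libref Mylib and the data set Survey are hypothetical, and 'SAS-library' is a placeholder for a real path. The approach assumes that enough memory is available to hold the whole data set.

```sas
/* Hypothetical library and data set. */
libname mylib 'SAS-library';

/* Open the data set, allocate buffers, and read it into memory. */
sasfile mylib.survey load;

/* Both steps read the data from the buffers that SASFILE allocated. */
proc means data=mylib.survey;
run;

proc freq data=mylib.survey;
run;

/* Free the buffers and close the data set. */
sasfile mylib.survey close;
```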

When using SAS DATA step views, you can improve performance by:

• specifying the VBUFSIZE= system option

• specifying the OBSBUF= data set option

Note: Sometimes you might be able to use more than one method, making your SAS job

even more efficient.

Using WHERE Processing

You might be able to use a WHERE statement in a procedure to perform the same task as

a DATA step with a subsetting IF statement. The WHERE statement can eliminate extra

DATA step processing when performing certain analyses because unneeded observations

are not processed.

For example, the following DATA step creates the data set Seatbelt. This data set contains only those observations from the Auto.Survey data set for which the value of the variable Seatbelt is yes (the comparison is case sensitive). The new data set is then printed.

libname auto 'SAS-library';

/* Copy only the matching observations into a new data set. */
data seatbelt;
   set auto.survey;
   if seatbelt='yes';   /* subsetting IF */
run;

proc print data=seatbelt;
run;

However, you can get the same output from the PROC PRINT step without creating a

data set if you use a WHERE statement in the PRINT procedure, as in the following

example:

proc print data=auto.survey;
   where seatbelt='yes';
run;

The WHERE statement can save resources by reducing the number of times that you process the data. In this example, you might be able to use less time and memory by eliminating the DATA step. Also, you use less I/O because there is no intermediate data set. Note that you cannot use a WHERE statement in a DATA step that reads raw data.

The extent of savings that you can achieve depends on many factors, including the size

of the data set. It is recommended that you test your programs to determine the most

efficient solution. For more information, see “Deciding Whether to Use a WHERE

Expression or a Subsetting IF Statement” on page 193.
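Because a WHERE statement is not available when a DATA step reads raw data, a subsetting IF is the usual alternative in that case. The sketch below uses in-stream data in place of an external file; the variable names and values are hypothetical.

```sas
/* Raw data is read with INPUT, so the subset uses IF, not WHERE. */
data seatbelt;
   input id seatbelt $;
   if seatbelt='yes';
   datalines;
1 yes
2 no
3 yes
;
run;

proc print data=seatbelt;
run;
```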

198 Chapter 12 • Optimizing System Performance

Using DROP and KEEP Statements

Another way to improve efficiency is to use DROP and KEEP statements to reduce the

size of your observations. When you create a temporary data set and include only the

variables that you need, you can reduce the number of I/O operations that are required to

process the data. For more information, see “DROP Statement” in SAS Statements:

Reference and “KEEP Statement” in SAS Statements: Reference.
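The following sketch shows both statements on a small sample data set. All data set and variable names are hypothetical.

```sas
/* Sample data with four variables. */
data work.survey;
   input id seatbelt $ age comments $;
   datalines;
1 yes 34 none
2 no 51 late
3 yes 27 none
;
run;

/* KEEP statement: only ID and Seatbelt are written to the output data set. */
data work.belts;
   set work.survey;
   keep id seatbelt;
run;

/* DROP statement: the same result, stated as what to leave out. */
data work.belts2;
   set work.survey;
   drop age comments;
run;
```

Which statement to use is mostly a matter of which list is shorter and easier to maintain.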

Using LENGTH Statements

You can also use LENGTH statements to reduce the size of your observations. When

you include only the necessary storage space for each variable, you can reduce the

number of I/O operations that are required to process the data. Before you change the

length of a numeric variable, however, see “Specifying Variable Lengths” on page 205.

For more information, see “LENGTH Statement” in SAS Statements: Reference.
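As a small sketch, the step below declares character variables with only the storage that their values need. With list input and no LENGTH statement, each character variable would be given a length of 8 bytes. The data set and variable names are hypothetical, and the example deliberately leaves numeric lengths unchanged.

```sas
data work.responses;
   /* Declare lengths before the variables are first used. */
   length state $ 2 answer $ 3;
   input state $ answer $;
   datalines;
NC yes
CA no
;
run;

/* Check the stored lengths. */
proc contents data=work.responses;
run;
```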

Using the OBS= and FIRSTOBS= Data Set Options

You can also use the OBS= and FIRSTOBS= data set options to reduce the number of

observations processed. When you create a temporary data set and include only the

necessary observations, you can reduce the number of I/O operations that are required to

process the data. See “FIRSTOBS= Data Set Option” in SAS Data Set Options:

Reference and “OBS= Data Set Option” in SAS Data Set Options: Reference for more

information.
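The sketch below reads a slice of a data set. Note that OBS= names the last observation to process, not a count, so FIRSTOBS=101 with OBS=200 selects observations 101 through 200. The data set names are hypothetical.

```sas
/* Sample data: 1000 observations numbered 1 to 1000. */
data work.all;
   do n = 1 to 1000;
      output;
   end;
run;

/* Read observations 101 through 200 only. */
data work.sample;
   set work.all(firstobs=101 obs=200);
run;
```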

Creating SAS Data Sets

If you process the same raw data repeatedly, it is usually more efficient to create a SAS

data set. SAS can process SAS data sets more efficiently than it can process raw data

files.
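One way to apply this is to read the raw data a single time and point all later steps at the resulting SAS data set. The sketch below uses in-stream data to stand in for a raw data file; the names are hypothetical.

```sas
/* Read the raw data once and store it as a SAS data set. */
data work.survey;
   input id seatbelt $;
   datalines;
1 yes
2 no
3 yes
;
run;

/* Later steps read the SAS data set, not the raw data. */
proc freq data=work.survey;
   tables seatbelt;
run;

proc print data=work.survey;
run;
```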

Another consideration is whether you are using data sets created with previous releases of SAS. If you frequently process such data sets, it is sometimes more efficient to convert them by re-creating them in the most recent version of SAS. See Chapter 33, “SAS 9.4 Compatibility with SAS Files from Earlier Releases,” on page 719 for more information.

Using Indexes

An index is an optional file that you can create for a SAS data file to provide direct

access to specific observations. The index stores values in ascending value order for a

specific variable or variables and includes information as to the location of those values

within observations in the data file. In other words, an index enables you to locate an

observation by the value of the indexed variable.

Without an index, SAS accesses observations sequentially in the order in which they are

stored in the data file. With an index, SAS accesses the observation directly. Therefore,

by creating and using an index, you can access an observation faster.

In general, SAS can use an index to improve performance in these situations:

• For WHERE processing, an index can provide faster and more efficient access to a

subset of data.

• For BY processing, an index returns observations in the index order, which is in

ascending value order, without using the SORT procedure.

• For the SET and MODIFY statements, the KEY= option enables you to specify an

index in a DATA step to retrieve particular observations in a data file.

Note: An index exists to improve performance. However, an index conserves some

resources at the expense of others. Therefore, you must consider costs associated

with creating, using, and maintaining an index. See “Understanding SAS Indexes”

on page 638 for more information about indexes and deciding whether to create one.
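As a minimal sketch, the program below creates a simple index with PROC DATASETS and then runs a WHERE query that SAS might satisfy through the index. The data set and variable names are hypothetical. Setting MSGLEVEL=I asks SAS to write informational messages about index usage to the log; whether the index is actually chosen depends on the data and the query.

```sas
/* Sample data with a unique ID per observation. */
data work.survey;
   do id = 1 to 100000;
      score = mod(id, 7);
      output;
   end;
run;

/* Create a simple index on ID. */
proc datasets library=work nolist;
   modify survey;
   index create id;
quit;

/* Report index usage in the log. */
options msglevel=i;

/* WHERE processing: SAS can use the index to locate the observation. */
proc print data=work.survey;
   where id = 4321;
run;

/* Restore the default message level. */
options msglevel=n;
```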

Accessing Data through SAS Views

You can use the SQL procedure or a DATA step to create SAS views of your data. A SAS view stores the instructions for retrieving data rather than the data itself, so you can define a subset once and reuse it with fewer statements. Also, you can use a SAS view to group data from several data sets without creating a new one, saving both processing time and disk space. For more information, see Chapter 27, “SAS Views,” on page 669 and the Base SAS Procedures Guide.

For information about optimizing system performance with SAS views, see “Setting

VBUFSIZE= and OBSBUF= for SAS DATA Step Views” on page 203.
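The sketch below defines the same subset as a DATA step view and as a PROC SQL view. The data set, view, and variable names are hypothetical.

```sas
/* Sample data. */
data work.survey;
   input id seatbelt $;
   datalines;
1 yes
2 no
3 yes
;
run;

/* DATA step view: stores the instructions, not the subset itself. */
data work.belted / view=work.belted;
   set work.survey;
   where seatbelt='yes';
run;

/* PROC SQL view that defines the same subset. */
proc sql;
   create view work.belted_sql as
      select id, seatbelt
      from work.survey
      where seatbelt='yes';
quit;

/* The view is executed when a step reads it. */
proc print data=work.belted;
run;
```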

Using Engines Efficiently

If you do not specify an engine in a LIBNAME statement, SAS must perform extra

processing steps in order to determine which engine to associate with the SAS library.

SAS must look at all of the files in the directory until it has enough information to

determine which engine to use. For example, the following statement is efficient because

it explicitly tells SAS to use a specific engine for the libref Fruits:

/* Engine specified. */

libname fruits v9 'SAS-library';

The following statement does not explicitly specify an engine. In the log output, notice the note about mixed engine types that is generated:

/* Engine not specified. */

libname fruits 'SAS-library';

Log 12.3 SAS Log Output from the LIBNAME Statement

NOTE: Directory for library FRUITS contains files of mixed engine types.
NOTE: Libref FRUITS was successfully assigned as follows:
      Engine:        V9
      Physical Name: SAS-library

z/OS Specifics

In the z/OS operating environment, you do not need to specify an engine for certain

types of libraries.

See Chapter 35, “SAS Engines,” on page 739 for more information about SAS engines.

Setting System Options to Improve I/O Performance

The following SAS system options can help you reduce the number of disk accesses that

are needed for SAS files, though they might increase memory usage and the SAS data

set size:

ALIGNSASIOFILES

A SAS data set consists of a header that is followed by one or more pages of data.

Normally, the header is 1K on Windows and 8K on UNIX. The

ALIGNSASIOFILES system option forces the header to be the same size as the data

pages so that the data pages are aligned to boundaries that allow for more efficient

I/O. The page size is set using the BUFSIZE= option.

For more information, see “ALIGNSASIOFILES System Option” in SAS System

Options: Reference and the SAS documentation for your operating environment.
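A sketch of how this option might be used follows. It assumes that ALIGNSASIOFILES can be set in an OPTIONS statement at your site; if it cannot, it would be set at SAS invocation or in the configuration file instead. The 64K page size and the data set name are illustrative values only.

```sas
/* Display the current settings in the log. */
proc options option=(alignsasiofiles bufsize);
run;

/* Align data pages for data sets created from here on,
   and request a 64K page size (an illustrative value). */
options alignsasiofiles bufsize=64k;

/* Hypothetical data set created under the new settings. */
data work.aligned;
   do i = 1 to 100000;
      output;
   end;
run;
```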

BUFNO=

SAS uses the BUFNO= option to adjust the number of open page buffers when it

processes a SAS data set. Increasing this option's value can improve your

application's performance by allowing SAS to read more data with fewer passes;

however, your memory usage increases. Experiment with different values for this

option to determine the optimal value for your needs.

Note: You can also use the CBUFNO= system option to control the number of extra

page buffers to allocate for each open SAS catalog.

For more information, see “BUFNO= System Option” in SAS System Options:

Reference and the SAS documentation for your operating environment.
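One way to experiment is to read the same data set with different buffer counts while FULLSTIMER is on, and then compare the statistics in the log. The values 10 and 20 and the data set name are hypothetical starting points, not recommendations.

```sas
options fullstimer;

/* Remember the current setting so that it can be restored. */
%let old_bufno = %sysfunc(getoption(bufno));

/* Hypothetical test data. */
data work.big;
   do i = 1 to 1000000;
      output;
   end;
run;

/* BUFNO= as a data set option: applies to this read only. */
data _null_;
   set work.big(bufno=10);
run;

/* BUFNO= as a system option: applies to data sets opened afterward. */
options bufno=20;

data _null_;
   set work.big;
run;

/* Restore the original settings. */
options bufno=&old_bufno nofullstimer;
```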

BUFSIZE=

When the BASE engine creates a data set, it uses the BUFSIZE= option to set the

permanent page size for the data set. The page size is the amount of data that can be

transferred for an I/O operation to one buffer. The default value for BUFSIZE= is

determined by your operating environment. Note that the default is set to optimize

the sequential access method. To improve performance for direct (random) access,

you should change the value for BUFSIZE=.

Whether you use your operating environment's default value or specify a value, the

engine always writes complete pages regardless of how full or empty those pages

are.

If you know that the total amount of data is going to be small, you can set a small

page size with the BUFSIZE= option, so that the total data set size remains small and

you minimize the amount of wasted space on a page. In contrast, if you know that

you are going to have many observations in a data set, you should optimize

BUFSIZE= so that as little overhead as possible is needed. Note that each page

requires some additional overhead.

Large data sets that are accessed sequentially benefit from larger page sizes because a larger page size reduces the number of system calls that are required to read the data set. Note that because observations cannot span pages, typically there is unused space on a page.

“Calculating Data Set Size” on page 206 discusses how to estimate data set size.

For more information, see “BUFSIZE= System Option” in SAS System Options:

Reference and the SAS documentation for your operating environment.
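BUFSIZE= can also be given as a data set option when a data set is created. The sketch below sets a page size for one data set and then uses PROC CONTENTS to inspect the result. The 64K value and the data set name are hypothetical; a suitable value depends on your data and operating environment.

```sas
/* Set the permanent page size for this data set when it is created. */
data work.big(bufsize=64k);
   do i = 1 to 1000000;
      x = i * 2;
      output;
   end;
run;

/* PROC CONTENTS reports the page size and the number of pages. */
proc contents data=work.big;
run;
```

Creating the same data set with two or three different page sizes and comparing the page counts gives a rough picture of how much space is left unused on each page.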
