Controlling Page Size and the Number of Buffers

Measuring I/O

Improvement in I/O can come at the cost of increased memory consumption. In order to understand the relationship between I/O and memory, it is helpful to know when data is copied to a buffer and where I/O is measured. When you create a SAS data set using a DATA step, the following actions occur:
  1. SAS copies a page of data from the input data set to a buffer in memory.
  2. One observation at a time is loaded from the buffer into the program data vector.
  3. Each observation is written from the PDV to an output buffer.
  4. The contents of the output buffer are written to the disk when the buffer is full.
The process for reading external files is similar. However, each record is first read from the system buffer into the single-record input buffer before it is parsed and read into the program data vector.
In both cases, I/O is measured when input data is copied to the buffer in memory and when it is copied from the output buffer to the output data set.
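One way to observe the resource usage of a step is to turn on the FULLSTIMER system option before running it. The following is a minimal sketch, not a prescribed method. It assumes that the Company library is already assigned and that Company.Order_fact exists; Work.Order_copy is a hypothetical output name. The statistics that appear in the log depend on your operating environment.

```sas
/* Request detailed resource statistics in the SAS log. */
options fullstimer;

/* Pages of Company.Order_fact are read into input buffers,          */
/* observations pass through the program data vector one at a time, */
/* and full output buffers are written to Work.Order_copy.          */
data work.order_copy;        /* hypothetical output data set */
   set company.order_fact;   /* assumed to exist             */
run;

/* Return to the shorter default list of statistics. */
options nofullstimer;
```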

Page Size

Think of a buffer as a container in memory that holds exactly one page of data. A page is described as follows:
  • It is the unit of data transfer between the storage device and memory.
  • It is fixed in size when the data set is created, either to a default value or to a user-specified value.
A larger page size can reduce execution time by reducing the number of times SAS has to read from or write to the storage medium. However, the improvement in execution time comes at the cost of increased memory consumption.

Reporting Page Size

You can use the CONTENTS procedure or the CONTENTS statement in the DATASETS procedure to report the page size and the number of pages.
Partial PROC CONTENTS Output
proc contents 
     data=company.order_fact;
run;
The total number of bytes that a data file occupies equals the page size multiplied by the number of pages. For example, the page size for Company.Order_fact is 8192 and the number of pages is 9423. Therefore, the data file occupies 77,193,216 bytes.
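The calculation above can be checked with a short DATA step. This is only an illustration of the arithmetic; the variable names are arbitrary, and the two input values are the ones quoted for Company.Order_fact.

```sas
data _null_;
   page_size   = 8192;    /* page size reported by PROC CONTENTS       */
   n_pages     = 9423;    /* number of pages reported by PROC CONTENTS */
   total_bytes = page_size * n_pages;
   /* Write the result to the log with thousands separators. */
   put total_bytes= comma12.;
run;
```

The value written to the log should be 77,193,216.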
Note: Information that is available from PROC CONTENTS depends on the operating environment.
Note: In uncompressed data files, there is a 40-byte overhead (in a 64-bit operating environment) or a 24-byte overhead (in a 32-bit operating environment) per page plus a 1-bit per observation overhead (rounded up to the nearest byte), used to denote an observation's status as deleted or not deleted. You can learn about the structure of uncompressed and compressed data files in Controlling Data Storage Space.

Using the BUFSIZE= Option

To select a default page size, SAS uses an algorithm that is based on observation length, engine, and operating environment. The default page size is optimal for most SAS activities, especially on computers that support multiple SAS jobs concurrently. However, in some cases, choosing a page size or buffer size that is larger than the default can shorten execution time by reducing the number of times that SAS must read from or write to the storage medium.
You can use the BUFSIZE= system option or data set option to control the page size of an output SAS data set. The new buffer size is a permanent attribute of the data set. After it is specified, it is used whenever the data set is processed.
General form, BUFSIZE= option:
BUFSIZE= MIN | MAX | n;
Here is an explanation of the syntax:
MIN
sets the page size to the smallest possible number in your operating environment.
MAX
sets the page size to the maximum possible number in your operating environment.
n
specifies the page size in bytes. For example, a value of 8 specifies a page size of 8 bytes, and a value of 4K specifies a page size of 4096 bytes. The default is 0, which causes SAS to use the optimal page size for the operating environment.
CAUTION:
MIN might cause unexpected results and should be avoided. Use BUFSIZE=0 to reset the buffer page size to the default value in your operating environment.
Note: The syntax that is shown here applies to the OPTIONS statement. On the command line or in a configuration file, the syntax is specific to your operating environment. For details, see the SAS documentation for your operating environment.
Only certain page size or buffer size values are valid for each operating environment. If you request an invalid value for your operating environment, SAS automatically rounds up to the next valid page size or buffer size. BUFSIZE=0 is interpreted as a request for the default page size or buffer size.
In the following program, the BUFSIZE= system option specifies a page size of 30720 bytes.
options bufsize=30720;
filename orders 'c:\orders.dat';
data company.orders_fact;
   infile orders;
   <more SAS code>
run;
Before you change the default page size, it is important to consider the access pattern for the data as well as the I/O transfer rate of the underlying hardware. In some cases, increasing the page size might degrade performance, particularly when the data is processed using direct (random) access.
Note: The default value for BUFSIZE= is determined by your operating environment and is set to optimize sequential access. To improve performance for direct access, you should change the value for BUFSIZE=. For the default setting and possible settings for direct access, see the BUFSIZE= system option in the SAS documentation for your operating environment.
Note: You can override the BUFSIZE= system option by using the BUFSIZE= data set option.
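As a sketch of how the data set option takes precedence over the system option, consider the following program. It assumes that Company.Order_fact exists; the output data set names and the value 8192 are chosen only for illustration, and SAS might round the value up if it is not valid in your operating environment.

```sas
/* System option: applies to output data sets created after this point. */
options bufsize=30720;

/* Created with the page size requested by the system option. */
data work.orders_sysopt;               /* hypothetical name */
   set company.order_fact;
run;

/* The data set option overrides the system option for this data set only. */
data work.orders_dsopt (bufsize=8192); /* hypothetical name */
   set company.order_fact;
run;

/* Report the page size that was actually used. */
proc contents data=work.orders_dsopt;
run;

/* Restore the default page size for the operating environment. */
options bufsize=0;
```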
CAUTION:
If you use the COPY procedure to copy a data set to a library that is accessed via a different engine, the original page size or buffer size is not necessarily retained.

Using the BUFNO= Option

You can use the BUFNO= system or data set option to control the number of buffers that are available for reading or writing a SAS data set. By increasing the number of buffers, you can control how many pages of data are loaded into memory with each I/O transfer.
Note: Increasing the number of buffers might not affect performance under the Windows and UNIX operating environments, especially when you work with large data sets. By default, the Windows and UNIX operating environments read one buffer at a time. Under the Windows environment, you can override this default by turning on the SGIO system option when you invoke SAS. For details about the SGIO system option, see the SAS documentation for the Windows operating environment.
The following techniques might help minimize I/O consumption:
  • When you work with a small data set, allocate as many buffers as there are pages in the data set so that the entire data set can be loaded into memory. This technique is most effective if you read the same observations several times during processing.
  • Under the z/OS operating environment, increase the number of buffers that are allocated, rather than the size of each buffer, as the size of the data set increases.
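The first technique might look like the following sketch. It assumes a hypothetical data set, Work.Small_lookup, that occupies 8 pages (as reported by PROC CONTENTS), so that BUFNO=8 lets the whole data set fit in the buffers. The step draws 100 observations at random with replacement, which means that the same pages are likely to be needed repeatedly.

```sas
data work.sampled;                      /* hypothetical output name */
   do i = 1 to 100;
      /* Pick a random observation number from 1 to total_obs. */
      obs_num = ceil(rand('uniform') * total_obs);
      /* Direct access: read only the chosen observation.      */
      /* BUFNO=8 matches the assumed number of pages.          */
      set work.small_lookup (bufno=8) point=obs_num nobs=total_obs;
      output;
   end;
   stop;      /* required with POINT= to end the DATA step */
   drop i;
run;
```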
General form, BUFNO= option:
BUFNO= MIN | MAX | n;
Here is an explanation of the syntax:
MIN
causes SAS to use the minimum optimal value for the operating environment. This is the default.
MAX
sets the number of buffers to the maximum possible number in your operating environment, up to the largest four-byte, signed integer, which is 2³¹-1, or approximately 2 billion.
n
specifies the number of buffers to be allocated.
Note: The recommended maximum for this option is 10.
Note: The syntax that is shown here applies to the OPTIONS statement. On the command line or in a configuration file, the syntax is specific to your operating environment. For details, see the SAS documentation for your operating environment.
In the following program, the BUFNO= system option specifies that 4 buffers are available.
options bufno=4;
filename orders 'c:\orders.dat';
data company.orders_fact;
   infile orders;
   <more SAS code>
run;
proc print data=company.orders_fact;
run;
The buffer number is not a permanent attribute of the data set and is valid only for the current step or SAS session.
Figure 20.1 Current SAS Session
Note: You can override the BUFNO= system option by using the BUFNO= data set option.
Note: In SAS 9 and later, the BUFNO= option has no effect on thread-enabled procedures under the z/OS operating environment.
The product of BUFNO= and BUFSIZE=, rather than the specific value of either option, determines how much data can be transferred in one I/O operation. Increasing the value of either option increases the amount of data that can be transferred in one I/O operation.
BUFSIZE    BUFNO    Bytes Transferred in One I/O Operation
6144       2        12,288
6144       10       61,440
30,720     2        61,440
30,720     10       307,200
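The products for these four combinations can be reproduced with a short DATA step. This is only a sketch of the arithmetic; Work.Transfer and the variable names are hypothetical.

```sas
data work.transfer;
   do bufsize = 6144, 30720;
      do bufno = 2, 10;
         /* Bytes that can be moved in one I/O operation. */
         bytes_per_io = bufsize * bufno;
         output;
      end;
   end;
run;

proc print data=work.transfer noobs;
   format bufsize bytes_per_io comma9.;
run;
```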
The number of buffers and the buffer size have a minimal effect on CPU usage.

Comparative Example: Using the BUFSIZE= Option and the BUFNO= Option

Settings for the Examples

Suppose you want to compare the resource usage when a data set is read using different buffer sizes and a varying number of buffers. The following sample programs compare settings for the BUFNO= option and the BUFSIZE= option.
You can use these samples as models for creating benchmark programs in your own environment. Your results might vary depending on the structure of your data, your operating environment, and the resources that are available at your site.
Note: 6144 bytes is the default page size under the z/OS operating environment.

Programming Techniques

1 BUFSIZE=6144, BUFNO=2
This program reads the data set Retail.Order_fact and creates the data set Work.Orders. The BUFSIZE= option specifies that Work.Orders is created with a buffer size of 6144 bytes. The BUFNO= option specifies that 2 pages of data are loaded into memory with each I/O transfer.
data work.orders (bufsize=6144 bufno=2);
   set retail.order_fact;
run;
2 BUFSIZE=6144, BUFNO=5
This program reads the data set Retail.Order_fact and creates the data set Work.Orders. The BUFSIZE= option specifies that Work.Orders is created with a buffer size of 6144 bytes. The BUFNO= option specifies that 5 pages of data are loaded into memory with each I/O transfer.
data work.orders (bufsize=6144 bufno=5);
   set retail.order_fact;
run;
3 BUFSIZE=6144, BUFNO=10
This program reads the data set Retail.Order_fact and creates the data set Work.Orders. The BUFSIZE= option specifies that Work.Orders is created with a buffer size of 6144 bytes. The BUFNO= option specifies that 10 pages of data are loaded into memory with each I/O transfer.
data work.orders (bufsize=6144 bufno=10);
   set retail.order_fact;
run;
4 BUFSIZE=12288, BUFNO=2
This program reads the data set Retail.Order_fact and creates the data set Work.Orders. The BUFSIZE= option specifies that Work.Orders is created with a buffer size of 12288 bytes. The BUFNO= option specifies that 2 pages of data are loaded into memory with each I/O transfer.
data work.orders (bufsize=12288 bufno=2);
   set retail.order_fact;
run;
5 BUFSIZE=12288, BUFNO=5
This program reads the data set Retail.Order_fact and creates the data set Work.Orders. The BUFSIZE= option specifies that Work.Orders is created with a buffer size of 12288 bytes. The BUFNO= option specifies that 5 pages of data are loaded into memory with each I/O transfer.
data work.orders (bufsize=12288 bufno=5);
   set retail.order_fact;
run;
6 BUFSIZE=12288, BUFNO=10
This program reads the data set Retail.Order_fact and creates the data set Work.Orders. The BUFSIZE= option specifies that Work.Orders is created with a buffer size of 12288 bytes. The BUFNO= option specifies that 10 pages of data are loaded into memory with each I/O transfer.
data work.orders (bufsize=12288 bufno=10);
   set retail.order_fact;
run;
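The six programs differ only in the values of BUFSIZE= and BUFNO=, so one way to run them as a batch is a small macro loop. The following is a sketch under stated assumptions: the macro name buffer_bench and its parameters are hypothetical, the Retail library is assumed to be assigned, and Work.Orders is overwritten on every iteration. FULLSTIMER is turned on so that each step writes its resource statistics to the log.

```sas
options fullstimer;

%macro buffer_bench(sizes=6144 12288, counts=2 5 10);
   %local i j size count;
   /* Loop over every combination of buffer size and buffer count. */
   %do i = 1 %to %sysfunc(countw(&sizes, %str( )));
      %let size = %scan(&sizes, &i, %str( ));
      %do j = 1 %to %sysfunc(countw(&counts, %str( )));
         %let count = %scan(&counts, &j, %str( ));
         /* Label the run in the log so that statistics can be matched. */
         %put NOTE: Benchmark run with BUFSIZE=&size BUFNO=&count;
         data work.orders (bufsize=&size bufno=&count);
            set retail.order_fact;
         run;
      %end;
   %end;
%mend buffer_bench;

%buffer_bench()
```

Running each combination several times and comparing averages is usually more informative than a single pass, because results can vary with system load.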