Storing Data in SAS Data Sets

Overview

In many cases, it is best practice to store data in SAS data sets. You can optimize performance if you know when to create a SAS data set and when to read data directly from an external file.
Before viewing the comparative example that illustrates different techniques for reading from a SAS data set versus from an external file, consider the following advantages of storing data in SAS data sets.
When you use SAS to repeatedly analyze or manipulate any particular group of data, it is more efficient to create a SAS data set than to read the raw data each time. Although SAS data sets can be larger than external files and can require more disk space, reading from SAS data sets saves CPU time that is associated with reading a raw data file.
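The read-once, reuse-many pattern described above might look like the following minimal sketch. The file path, the library path, and the data set name Retail.Cust_Demo are hypothetical, and only three of the fields are read to keep the example short.

```sas
/* Hypothetical locations: replace both paths with your own. */
filename rawdata '/data/raw/customer.dat';
libname retail '/data/sas/retail';

/* Read the raw file once and store the result as a permanent data set. */
/* Retail.Cust_Demo is an illustrative name, not one used by the source. */
data retail.cust_demo;
   infile rawdata;
   input @1  Customer_ID 12.
         @13 Country     $2.
         @15 Gender      $1.;
run;

/* Later steps read the data set directly, without re-parsing the raw file. */
proc freq data=retail.cust_demo;
   tables Country Gender;
run;

proc print data=retail.cust_demo(obs=5);
run;
```

Each additional procedure or DATA step that uses Retail.Cust_Demo avoids the cost of applying informats to the raw fields again.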
Here are other reasons for storing data in SAS data sets, rather than external files:
  • When the data is already in a SAS data set, you can use a SAS procedure on the data without further conversion.
  • SAS data sets are self-documenting.
The descriptor portion of a SAS data set documents data set properties, including the following:
  • data set labels
  • variable labels
  • variable formats
  • informats
  • variable names
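One way to inspect the descriptor portion is PROC CONTENTS. The sketch below assumes that the Retail libref is already assigned and that Retail.Customer exists, as in the comparative example later in this section.

```sas
/* List the descriptor portion: data set label, variable names, */
/* types, lengths, formats, informats, and variable labels.     */
proc contents data=retail.customer;
run;

/* VARNUM lists the variables in creation order rather than alphabetically. */
proc contents data=retail.customer varnum;
run;
```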
Note: Create a temporary SAS data set if the data set is used for intermediate tasks such as merging and is needed in the current SAS session only. Also create a temporary SAS data set when the external file on which the data set is based might change between SAS sessions.
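As a sketch of the intermediate-task case in the note, the following steps keep sorted copies and the merged result in the Work library, which SAS clears at the end of the session. Retail.Orders is a hypothetical second data set, assumed here to share the key variable Customer_ID with Retail.Customer.

```sas
/* Sorted copies are intermediate results, so they are written to Work. */
proc sort data=retail.customer out=work.cust_sorted;
   by Customer_ID;
run;

/* Retail.Orders is a hypothetical data set used only for illustration. */
proc sort data=retail.orders out=work.orders_sorted;
   by Customer_ID;
run;

/* Keep only the customers that appear in both inputs. */
data work.cust_orders;
   merge work.cust_sorted(in=inCust)
         work.orders_sorted(in=inOrd);
   by Customer_ID;
   if inCust and inOrd;
run;
```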

Comparative Example: Reading a SAS Data Set Versus an External File

Overview

Suppose you want to create a SAS data set that contains a large number of variables. One way to accomplish this task is to read an external file that is referenced by the fileref Rawdata. Another way to accomplish this is to read the same data from an existing SAS data set named Retail.Customer.
The following sample programs compare two techniques. You can use these samples as models for creating benchmark programs in your own environment. Your results might vary depending on the structure of your data, your operating environment, and the resources that are available at your site.

Programming Techniques

1 Reading from an External File
In this program, the INPUT statement reads fields of data from an external file that is referenced by the fileref Rawdata and creates 12 variables. For benchmarking purposes, the DATA statement specifies the _NULL_ argument so that you can measure the resources used to read data separately from the resources used to write data.
data _null_;
   infile rawdata;
   input @1   Customer_ID         12.
         @13  Country             $2.
         @15  Gender              $1.
         @16  Personal_ID        $15.
         @31  Customer_Name      $40.
         @71  Customer_FirstName $20.
         @91  Customer_LastName  $30.
         @121 Birth_Date       date9.
         @130 Customer_Address   $45.
         @175 Street_ID           12.
         @199 Street_Number       $8.
         @207 Customer_Type_ID     8.;
run;
2 Reading from a SAS Data Set
In this program, the SET statement reads data directly from an existing SAS data set. As in the previous program, the DATA statement uses _NULL_ instead of naming a data set.
data _null_;
   set retail.customer;
run;
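To compare the two techniques in your own environment, one option is to turn on the FULLSTIMER system option before you submit each program, so that the SAS log reports detailed resource statistics for every step. This is a minimal sketch that assumes the Retail libref is already assigned. Which statistics appear in the log depends on your operating environment.

```sas
/* Write detailed resource statistics for each step to the SAS log. */
options fullstimer;

/* Submit program 1 (reading from the external file) here, unchanged. */

/* Program 2: reading from the SAS data set. */
data _null_;
   set retail.customer;
run;

/* Restore the default level of log detail. */
options nofullstimer;
```

Run each program several times and compare typical values rather than a single run, because caching and system load can affect the results.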

General Recommendations

  • To save CPU resources, read a SAS data set instead of an external file.
  • To reduce I/O operations, read a SAS data set instead of an external file. Savings in I/O operations are largely dependent on the block size of the external file and the page size of the SAS data set.
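Because I/O savings depend partly on the page size of the SAS data set, it can help to check that value and, if needed, test a different one. The sketch below is illustrative only. The output name Work.Customer_64k and the 64k value are assumptions, and a larger page size is not guaranteed to reduce I/O for your data.

```sas
/* The Engine/Host Dependent Information section of the output */
/* reports the data set page size and the number of pages.     */
proc contents data=retail.customer;
run;

/* Create a copy with a different page size by using the BUFSIZE= */
/* data set option. The 64k value is only an example.             */
data work.customer_64k(bufsize=64k);
   set retail.customer;
run;
```

Benchmark a read of each version before adopting a new page size.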