Eliminating Unnecessary Passes through the Data

Best practices specify that you should eliminate unnecessary passes through the data. To minimize I/O operations and CPU time, avoid reading or writing data more than necessary.

Comparative Example: Creating Multiple Subsets of a SAS Data Set

Overview

Suppose you want to create five subsets of data from the data set Retail.Customer. You need a subset for each of five countries.
The following sample programs compare two techniques. You can use these samples as models for creating benchmark programs in your own environment. Your results might vary depending on the structure of your data, your operating environment, and the resources that are available at your site.

Programming Techniques

1 Multiple DATA Steps
This program includes multiple DATA steps and subsequently reads data five times from the same Retail.Customer data set. Individual subsetting IF statements appear in five separate DATA steps.
data retail.UnitedStates;
   set retail.customer;
   if country='US';
run;

data retail.France;
   set retail.customer;
   if country='FR';
run;

data retail.Italy;
   set retail.customer;
   if country='IT';
run;

data retail.Germany;
   set retail.customer;
   if country='DE';
run;

data retail.Spain;
   set retail.customer;
   if country='ES';
run;
2 A Single DATA Step
This program uses only one DATA step to create five output data sets. The data set Retail.Customer is read only once. Also, IF-THEN/ELSE statements are used to conditionally output data to specific data sets.
data retail.UnitedStates 
     retail.France 
     retail.Italy 
     retail.Germany 
     retail.Spain;
   set retail.customer;
   if country='US' then output retail.UnitedStates;
   else if country='FR' then output retail.France;
   else if country='IT' then output retail.Italy;
   else if country='DE' then output retail.Germany;
   else if country='ES' then output retail.Spain;
run;

General Recommendations

When creating multiple subsets from a SAS data set, use a single DATA step with IF-THEN/ELSE IF logic to output to appropriate data sets.

Using the SORT Procedure with a WHERE Statement to Create Sorted Subsets

It is good programming practice to take advantage of the SORT procedure's ability to sort and subset in the same PROC step. This is more efficient than using two separate steps to accomplish this—a DATA step to subset followed by a procedure step that sorts.

Comparative Example: Creating a Sorted Subset of a SAS Data Set

Overview

Suppose you want to create a sorted subset of a SAS data set named Retail.Customer. You want only data for customers in the United States, France, Italy, Germany, and Spain.
The following sample programs compare two techniques. You can use these samples as models for creating benchmark programs in your own environment. Your results might vary depending on the structure of your data, your operating environment, and the resources that are available at your site.

Programming Techniques

1 A DATA Step and PROC SORT
This program has two steps. The first step creates a SAS data set by subsetting observations based on the value of the variable Country. The second step sorts the data according to the values for each country. Passing through all the data once and the subset again increases I/O and CPU operations.
data retail.CountrySubset;
   set retail.customer;
   where country in('US','FR','IT','DE','ES'),
run;

proc sort data=retail.CountrySubset;
   by country;
run;
2 PROC SORT with a WHERE Statement
In one step, this program sorts data and selects only those observations that meet the conditions of the WHERE statement. Processing only one data set once saves CPU and I/O resources.
Note that if this program did not create a second data set named Retail.CountrySubset, it would write over the data set named Retail.Customer with only part of the data.
proc sort data=retail.customer out=retail.CountrySubset;
   by country;
   where country in('US','FR','IT','DE','ES'),
run;

General Recommendations

  • When you need to process a subset of data with a procedure, use a WHERE statement in the procedure instead of creating a subset of data and reading that data with the procedure.
  • Write one program step that both sorts and subsets. This approach can take less programmer time and debugging time than writing separate program steps that subset and sort.

Using the DATASETS Procedure to Modify Variable Attributes

Use PROC DATASETS instead of a DATA step to modify data attributes. The DATASETS procedure uses fewer resources than the DATA step because it processes only the descriptor portion of the data set, not the data portion. PROC DATASETS retains the sort flag, as well as indexes.
Note: You cannot use the DATASETS procedure to modify the type, length, or position of variables because these attributes directly affect the data portion of the data set. To perform these operations, use the DATA step.

Comparative Example: Changing the Variable Attributes of a SAS Data Set

Overview

Suppose you want to change the variable attributes in Retail.NewCustomer to make them consistent with those in the Retail.Customer data set. The data set Retail.NewCustomer contains 89954 observations and 12 variables.
The following table shows the variable names and formats in each SAS data set.
SAS Data Set
Variable Name
Variable Format
Retail.Customer
Country
$COUNTRY.
Retail.Customer
Birth_Date
DATE9.
Retail.NewCustomer
Country_ID
$COUNTRY.
Retail.NewCustomer
Birth_Date
MMDDYYP10
The following sample programs compare two techniques. You can use these samples as models for creating benchmark programs in your own environment. Your results might vary depending on the structure of your data, your operating environment, and the resources that are available at your site.

Programming Techniques

1 A DATA Step
This program uses a DATA step with a RENAME statement and a FORMAT statement to modify attributes for the variables Country_ID and Birth_Date.
data retail.newcustomer;
   set retail.newcustomer;
   rename Country_ID=country;
   format birth_date date9.;
run;
2 PROC DATASETS
This program uses PROC DATASETS to modify the names and formats of the variables Country_ID and Birth_Date.
proc datasets lib=retail nolist;
   modify newcustomer;
   rename Country_ID=country;
   format birth_date date9.;
quit;

General Recommendations

  • To save significant resources, use the DATASETS procedure with the NOLIST option instead of a DATA step to change the attributes of a SAS data set.
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.119.136.84