Executing Only Necessary Statements

Overview

When you execute the minimum number of statements in the most efficient order, you minimize the hardware resources that SAS uses. The resources that are affected include disk usage, memory usage, and CPU usage.
Here are two techniques to keep in mind:
  • Subset your data as soon as is logically possible.
  • Process your data conditionally by using the most appropriate technique.

Positioning of the Subsetting IF Statement

To subset your data based on a newly derived or computed variable, use the subsetting IF statement in a DATA step in order to process only those observations that meet a specified condition.
The subsetting IF statement causes the DATA step to continue processing only those observations that meet the condition of the expression that is specified in the subsetting IF statement. The resulting SAS data set or data sets contain a subset of the original external file or SAS data set.
Position the subsetting IF statement in the program so that it checks the subsetting condition as soon as it is logically possible. As a result, unnecessary statements do not execute. When the subsetting condition is false, no further statements are processed for that observation.
Also, remember to subset data before performing calculations and to minimize the use of function calls or arithmetic operators. Unnecessary processing of unwanted observations results in higher expenditure of hardware resources.

Comparative Example: Creating a Subset of Data

Overview

Suppose you want to create a subset of data, calculate six new variables, and conditionally output data by reading from the SAS data set Retail.Order_fact. The output data set should contain new variables for the following:
  • the month of the order
  • the elapsed time between the order date and the delivery date
  • the profit, based on the retail price, discount, and unit price
  • total profit
  • total discount
  • total wait time
The subset of data that includes only orders for the month of December is approximately 10% of the data.
You can accomplish this task by using a subsetting IF statement. Placement of this statement in the DATA step can affect the efficiency of the DATA step in terms of CPU time and real time.
The following sample programs compare two techniques. You can use these samples as models for creating benchmark programs in your own environment. Your results might vary depending on the structure of your data, your operating environment, and the resources that are available at your site.

Programming Techniques

1 A Subsetting IF Statement at the Bottom
This program calculates six new variables before the subsetting IF statement selects only observations whose values for Month are 12.
data profit;
   retain TotalProfit TotalDiscount TotalWait Count 0;    
   set retail.order_fact;
   MonthOfOrder=month(order_date);
   WaitTime=sum(delivery_date,-order_date);
   if discount gt . then
      CalcProfit=sum((total_retail_price*discount),-costprice_per_unit)
                 *quantity;
   else CalcProfit=sum(total_retail_price,-costprice_per_unit)
                 *quantity;
   TotalProfit=sum(totalprofit,calcprofit);
   TotalDiscount=sum(totaldiscount,discount);
   TotalWait=sum(totalwait,waittime);
   Count+1;    
   if monthoforder=12;
run;
2 A Subsetting IF Statement near the Top
In this program, the subsetting IF statement is positioned immediately after the value for MonthofOrder has been calculated. If the value is not 12, then no further statements are processed for that observation. In this program, calculations are performed on a smaller number of observations, which results in greater program efficiency.
data profit;
   retain TotalProfit TotalDiscount TotalWait Count 0;  
   set retail.order_fact;
   MonthOfOrder=month(order_date);
   if monthoforder=12;
   WaitTime=sum(delivery_date,-order_date);
   if discount gt . then
      CalcProfit=sum((total_retail_price*discount),-costprice_per_unit)
                 *quantity;
   else CalcProfit=sum(total_retail_price,-costprice_per_unit)
                 *quantity;
   TotalProfit=sum(totalprofit,calcprofit);
   TotalDiscount=sum(totaldiscount,discount);
   TotalWait=sum(totalwait,waittime);
   Count+1;
run;

General Recommendations

Position the subsetting IF statement in a DATA step as soon as is logically possible in order to save the most resources.

Using Conditional Logic Efficiently

You can use conditional logic to change how SAS processes selected observations. Two techniques—IF-THEN/ELSE statements and SELECT statements—can be used interchangeably and perform comparably. Based on the characteristics of your data and depending on your environment, one of these techniques might give you better performance. Choose a technique that conserves your programming time and makes the program easiest to read.
Note: The number of conditions that are tested and the type of variable or variables that are tested affect CPU resources.
For best practices, follow these guidelines for writing efficient IF/THEN or SELECT statement logic:
  • When using IF/THEN statements for mutually exclusive conditions, use the ELSE IF statement rather than an IF statement for all conditions except the first.
  • Check the most frequently occurring condition first, and continue checking conditions in descending order of frequency.
  • When you execute multiple statements based on a condition, put the statements in a DO group.
Before writing conditional logic, determine the distribution of your data values. You can use the following procedures:
  • FREQ procedure to examine the distribution of the data values
  • GCHART or GPLOT procedure to display the distribution graphically
  • UNIVARIATE procedure to examine distribution statistics and to display the information graphically

Comparative Example: Creating Variables Conditionally Using DO Groups

Overview

Suppose you want to calculate an adjusted profit based on the values of the variable Order_Type in the data set Retail.Order_fact. For retail sales, which are represented by the value 1, the adjusted profit should be calculated as 105% of profit. For catalog sales, which are represented by the value 2, the adjusted profit should be calculated as 103% of profit. For Internet sales, which are represented by the value 3, the adjusted profit should be equal to profit.
The following table shows that the values for the variable Order_Type are not uniformly distributed.
Creating Variables Conditionally Using DO Groups
The following table shows that the values for the variable Discount also are not uniformly distributed.
Creating Variables Conditionally Using DO Groups
The following sample programs compare two techniques. You can use these samples as models for creating benchmark programs in your own environment. Your results might vary depending on the structure of your data, your operating environment, and the resources that are available at your site.

Programming Techniques

1 IF-THEN/ELSE Statements
This program uses IF-THEN/ELSE statements with DO groups to conditionally execute multiple statements that calculate an adjusted profit. Conditions are checked in descending order of frequency.
data retail.order_info_1;
   set retail.order_fact;
   if order_type=1 then 
      do;                     /* Retail Sale */
         Float=delivery_date-order_date;
         RevenueQuarter=qtr(order_date);
         AveragePrice=total_retail_price/quantity;
         if discount=. then NetPrice=total_retail_price;
         else NetPrice=total_retail_price-discount;
         Profit=netPrice-(quantity*costprice_per_unit)*1.05;
      end;
   else if order_type=2 then
      do;                    /* Catalog Sale */
         Float=delivery_date-order_date;
         RevenueQuarter=qtr(order_date);
         AveragePrice=total_retail_price/quantity;
         if discount=. then NetPrice=total_retail_price;
         else NetPrice=total_retail_price-discount;
         Profit=netprice-(quantity*costprice_per_unit)*1.03;
      end;
   else 
      do;                    /* Internet Sale */
         Float=delivery_date-order_Date;
         RevenueQuarter=qtr(order_date);
         AveragePrice=total_retail_price/quantity;
         if discount=. then NetPrice=total_retail_price;
         else NetPrice=total_retail_price-discount;
         Profit=netprice-(quantity*costprice_per_unit);
      end;
run;
2 SELECT Statements
This program uses SELECT and WHEN statements with DO groups to conditionally execute multiple statements that calculate an adjusted profit. Conditions are checked in descending order of frequency.
data retail.order_info_2;
   set retail.order_fact;
   select(order_type);
      when (1)
          do;                                    /* Retail Sale */
            Float=delivery_date-order_date;
            RevenueQuarter=qtr(order_date);
            AveragePrice=total_retail_price/quantity;
            if discount=. then NetPrice=total_retail_price;
            else NetPrice=total_retail_price-discount;
            Profit=netprice-(quantity*costprice_per_unit)*1.05;
         end;
      when (2)
          do;                                   /* Catalog Sale */
            Float=delivery_date-order_date;
            RevenueQuarter=qtr(order_date);
            AveragePrice=total_retail_price/quantity;
            if discount=. then NetPrice=total_retail_price;
            else NetPrice=total_retail_price-discount;
            Profit=netprice-(quantity*costprice_per_unit)*1.03;
         end;
      otherwise
         do;                                  /* Internet Sale */
            Float=delivery_date-order_date;
            RevenueQuarter=qtr(order_date);
            AveragePrice=total_retail_price/quantity;
            if discount=. then NetPrice=total_retail_price;
            else NetPrice=total_retail_price-discount;
            Profit=netprice-(quantity*costprice_per_unit);
         end;
   end;
run;

General Recommendations

  • Check the most frequently occurring condition first, and continue checking conditions in descending order of frequency, regardless of whether you use IF-THEN/ELSE or SELECT and WHEN statements.
  • When you execute multiple statements based on a condition, put the statements in a DO group.

Comparative Example: Creating Variables Conditionally When Calling Functions

Overview

Suppose you want to create a report that includes a new variable that is based on the value of an existing variable in the SAS data set Retail.Order_fact. Values for the new Month variable are extracted from the existing variable Order_Date by using the MONTH function.
The following table shows that the values for Month are fairly evenly distributed.
Creating Variables Conditionally When Calling Functions
The following sample programs compare several techniques. You can use these samples as models for creating benchmark programs in your own environment. Your results might vary depending on the structure of your data, your operating environment, and the resources that are available at your site.

Programming Techniques

1 Parallel IF Statements
This program calls the MONTH function 12 times. With these non-exclusive cases, each IF statement executes for each observation that is read from Retail.Order_Fact. This is the least efficient approach.
data retail.orders;
   set retail.order_fact;
   if month(order_date)=1 then Month='Jan';
   if month(order_date)=2 then Month='Feb';
   if month(order_date)=3 then Month='Mar';
   if month(order_date)=6 then Month='Jun';
   if month(order_date)=7 then Month='Jul';
   if month(order_date)=8 then Month='Aug';
   if month(order_date)=9 then Month='Sep';
   if month(order_date)=10 then Month='Oct';
   if month(order_date)=11 then Month='Nov';
   if month(order_date)=12 then Month='Dec';
run;
2 ELSE IF Statements, Many Function References
This program uses ELSE IF statements that call the function MONTH. Once the true condition is found, subsequent ELSE IF statements are not executed. This is more efficient than using parallel IF statements, but the MONTH function is executed many times.
data retail.orders;
   set retail.order_fact;
   if month(order_date)=1 then Month='Jan';
   else if month(order_date)=2 then Month='Feb';
   else if month(order_date)=3 then Month='Mar';
   else if month(order_date)=4 then Month='Apr';
   else if month(order_date)=5 then Month='May';
   else if month(order_date)=6 then Month='Jun';
   else if month(order_date)=7 then Month='Jul';
   else if month(order_date)=10 then Month='Oct';
   else if month(order_date)=11 then Month='Nov';
   else if month(order_date)=12 then Month='Dec';
run;
3 ELSE IF Statements, One Function Reference
This program uses the MONTH function to find the value of Order_Date, but only once. The MONTH function is called immediately after reading the data set and before any IF-THEN/ELSE statements execute. This is efficient.
data retail.orders(drop=mon);
   set retail.order_fact;
   mon=month(order_date);
   if mon=1 then Month='Jan';
   else if mon=2 then Month='Feb';
   else if mon=3 then Month='Mar';
   else if mon=4 then Month='Apr';
   else if mon=5 then Month='May';
   else if mon=6 then Month='Jun';
   else if mon=7 then Month='Jul';
   else if mon=8 then Month='Aug';
   else if mon=9 then Month='Sep';
   else if mon=10 then Month='Oct';
   else if mon=11 then Month='Nov';
   else if mon=12 then Month='Dec';
run;
4 SELECT Group
In this program, the SELECT statement calls the MONTH function only once, before WHEN statements execute and assign values for Month. This is efficient.
data retail.orders;
   set retail.order_fact;
   select(month(order_date));
      when (1) Month='Jan';
      when (2) Month='Feb';
      when (3) Month='Mar';
      when (4) Month='Apr';
      when (5) Month='May';
      when (6) Month='Jun';
      when (7) Month='Jul';
      when (8) Month='Aug';
      when (11) Month='Nov';
      when (12) Month='Dec';
      otherwise;
   end;
run;

General Recommendations

  • Avoid using parallel IF statements, which use the most resources and are the least efficient way to conditionally execute statements.
  • Use IF-THEN/ELSE statements and SELECT blocks to be more efficient.
  • To significantly reduce the amount of resources used, write programs that call a function only once instead of repetitively using the same function in many statements. SAS functions are convenient, but they can be expensive in terms of CPU resources.

Using DO Groups Efficiently

You can conditionally execute only necessary statements by placing them in DO groups that are associated with IF-THEN/ELSE statements or with SELECT/WHEN statements. Groups of statements execute only when a particular condition is true.
When using a DO group with IF-THEN/ELSE statements, add DO after the THEN clause, and add an END statement after all of the statements that you want executed as a group.
data orders;
   set company.orders;
   if order_type = 1 then 
      do;
      <multiple executable statements here>
      end;
   else if order_type = 2 then 
      do;
      <multiple executable statements here>
       end;
   else if order_type = 3 then 
       do;
       <multiple executable statements here>
       end;
run;
Note: Use an IF-THEN DO group when you create multiple variables based on a condition.
When using a DO group with SELECT/WHEN statements, add DO after the WHEN condition, and add an END statement after all of the statements that you want executed as a group. Use an OTHERWISE statement to specify the statements that you want executed if no WHEN condition is met.
data orders;
   set company.orders;
   select (order_type);
      when (1) 
         do;
         <multiple executable statements here>
         end;
      when (2) 
         do;
         <multiple executable statements here>
         end;
      when (3) 
         do;
         <multiple executable statements here>
         end;
      otherwise;
   end;
run;
Remember that IF-THEN/ELSE and SELECT/WHEN logic require no intervening statements between the IF and the ELSE conditions or between the SELECT and the WHEN conditions.

Comparative Example: Creating Data in DO Groups

Overview

Suppose you want to identify which customer groups are Club Members, Club Gold Members, or Internet/Catalog members, based on data from the data set Retail.Customer_hybrid. You also want to identify the nature of customer activity as inactive, low activity, medium activity, or high activity.
The following table shows the distribution of values for Customer_Type_ID.
Creating Data in DO Groups
The following sample programs compare several techniques. You can use these samples as models for creating benchmark programs in your own environment. Your results might vary depending on the structure of your data, your operating environment, and the resources that are available at your site.

Programming Techniques

1 Parallel IF Statements
This program creates a permanent SAS data set named Retail.Customers by reading the Retail.Customer_hybrid data set. Serial IF statements are used to populate the variables Customer_Group and Customer_Activity.
data retail.customers;
   length Customer_Group $ 26 Customer_Activity $ 15;
   set retail.customer_hybrid;
   if substr(put(customer_type_ID,4.),1,2)='10' then
      customer_group='Orion Club members';
   if substr(put(customer_type_ID,4.),1,2)='20' then
      customer_group='Orion Club Gold members';
   if substr(put(customer_type_ID,4.),1,2)='30' then
      customer_group='Internet/Catalog Customers';
   if substr(put(customer_type_ID,4.),1,2) in ('10', '20') and
      substr(put(customer_type_ID,4.),3,2)='10' then
         customer_activity='inactive';
   if substr(put(customer_type_ID,4.),1,2) in ('10', '20') and
      substr(put(customer_type_ID,4.),3,2)='20' then
         customer_activity='low activity';
   if substr(put(customer_type_ID,4.),1,2) in ('10', '20') and
      substr(put(customer_type_ID,4.),3,2)='30' then
         customer_activity='medium activity';
   if substr(put(customer_type_ID,4.),1,2) in ('10', '20') and
      substr(put(customer_type_ID,4.),3,2)='40' then
         customer_activity='high activity';
run;
2 SELECT, IF/SELECT Statements
This program creates a permanent SAS data set named Retail.Customers by reading the Retail.Customer_hybrid data set. SELECT/WHEN logic and SELECT/WHEN statements in an IF/THEN DO group populate the variables Customer_Group and Customer_Activity. If the value of the first two digits of Customer_Type_ID is 10, 20, or 30, then Customer_Group is populated. If the value of the first two digits of Customer_Type_ID is 10 or 20, then Customer_Activity is populated by reading the last two digits of Customer_Type_ID.
data retail.customers;
   length Customer_Group $ 26 Customer_Activity $ 15;
   set retail.customer_hybrid;
      select(substr(put(customer_type_ID,4.),1,2));
         when ('10') customer_group='Orion Club members';
         when ('20') customer_group='Orion Club Gold members';
         when ('30') customer_group='Internet/Catalog Customers';
         otherwise;
      end;
   if substr(put(customer_type_ID,4.),1,2) in ('10', '20') then
      do;
         select(substr(put(customer_type_ID,4.),3,2));
            when ('10') customer_activity='inactive';
            when ('20') customer_activity='low activity';
            when ('30') customer_activity='medium activity';
            when ('40') customer_activity='high activity';
            otherwise;
         end;
      end;
run;
3 Nested SELECT Statements
This program creates a permanent SAS data set named Retail.Customers by reading the Retail.Customer_hybrid data set. Nested SELECT statements are used to populate the variables Customer_Group and Customer_Activity.
data retail.customers;
   length Customer_Group $ 26 Customer_Activity $ 15;
   set retail.customer_hybrid;
   select(substr(put(customer_type_ID,4.),1,2));
      when ('10')
         do;
            customer_group='Orion Club members';
            select(substr(put(customer_type_ID,4.),3,2));
               when ('10') customer_activity='inactive';
               when ('20') customer_activity='low activity';
               when ('30') customer_activity='medium activity';
               when ('40') customer_activity='high activity';
               otherwise;
            end;
         end;
      when ('20')
         do;
            customer_group='Orion Club Gold members';
            select(substr(put(customer_type_ID,4.),3,2));
               when ('10') customer_activity='inactive';
               when ('20') customer_activity='low activity';
               when ('30') customer_activity='medium activity';
               when ('40') customer_activity='high activity';
               otherwise;
            end;
         end;
      when ('30') customer_group='Internet/Catalog Customers';
      otherwise;
   end;
run;
4 IF-THEN/ELSE IF Statements with a Link
This program creates a permanent SAS data set named Retail.Customers by reading the Retail.Customer_hybrid data set. IF-THEN/ELSE IF statements are used with a link to populate the variables Customer_Group and Customer_Activity.
data retail.customers;
   length Customer_Group $ 26 Customer_Activity $ 15;
   set retail.customer_hybrid;
   if substr(put(customer_type_ID,4.),1,2)='10' then
      do;
         customer_group='Orion Club members';
         link activity;
      end;
   else if substr(put(customer_type_ID,4.),1,2)='20' then
      do;
         customer_group='Orion Club Gold members';
         link activity;
      end;
   else if substr(put(customer_type_ID,4.),1,2)='30' then
      customer_group='Internet/Catalog Customers';
   return;
   activity:
   if substr(put(customer_type_ID,4.),3,2)='10' then
      customer_activity='inactive';
   else if substr(put(customer_type_ID,4.),3,2)='20' then
      customer_activity='low activity';
   else if substr(put(customer_type_ID,4.),3,2)='30' then
      customer_activity='medium activity';
   else if substr(put(customer_type_ID,4.),3,2)='40' then
      customer_activity='high activity';
   return;
run;

General Recommendations

  • Avoid parallel IF statements because they use extra resources.
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.118.198.83