Summary

Text Summary

Reviewing Terminology

You can review definitions of terms that are important in this chapter. You can also review diagrams and descriptions of the various relationships between input sources for a table lookup operation.

Working with Lookup Values Outside of SAS Data Sets

You can use the IF-THEN/ELSE statement in the DATA step to combine data from a base table with lookup values that are not stored in a SAS data set. You can also use the FORMAT procedure or the ARRAY statement to combine data from a base table with lookup values that are not stored in a SAS data set.

Combining Data with the DATA Step Match-Merge

You can use the MERGE statement in the DATA step to combine data from multiple data sets as long as the input data sets have a common variable. You can merge more than two data sets that lack a common variable in multiple DATA steps if each input data set contains at least one variable that is also in at least one other input data set.

Using PROC SQL to Join Data

You can also use PROC SQL to join data from multiple tables if there is no single column that is common to all input tables. If you create a new table with the results of an inner join in a PROC SQL step, the results can be very similar to the results of a DATA step match-merge.

Comparing DATA Step Match-Merges and PROC SQL Joins

It is possible to create identical results with a basic DATA step match-merge and a PROC SQL join. However, there are significant differences between these two methods, as well as advantages and disadvantages to each. In some cases, such as when there is a one-to-one or a one-to-many match on values of the BY variables in the input data sets, these two methods produce identical results. In other cases, such as when there is a many-to-many match on values of the BY variables, or if there are nonmatching values of the BY variables, these two methods produce different results. These differences reflect the fact that the processing is different for a DATA step match-merge and a PROC SQL join. Even if you are working with many-to-many matches or nonmatching data, it is possible to use other DATA step techniques such as multiple SET statements to create results that are identical to the results that a PROC SQL step creates.

Combining Summary Data and Detail Data

In order to perform tasks such as calculating percentages based on individual values from a data set based on a summary statistic of the data, you need to combine summary data and detail data. One way to create a summary data set is to use PROC MEANS. Once you have a summary data set, you can use multiple SET statements to combine the summary data with the detail data in the original data set. It is also possible to create summary data with a sum statement and to combine it with detail data in one DATA step.

Using an Index to Combine Data

You can use an index to combine data from matching observations in multiple data sets if the index is built on variables that are common to all input data sets. Especially if one of the input data sets is very large, an index can improve the efficiency of the merge. You use the KEY= option in a SET statement in conjunction with another SET statement to use an index to combine data. You can use the _IORC_ variable to prevent unmatched data from being included in the output data set.

Using a Transaction Data Set

Sometimes, you might want to update the data in one data set with data that is stored in another data set. You use the UPDATE statement to update a master data set with a transaction data set. The UPDATE statement replaces values in the master data set with values from the transaction data set for each observations with a matching value of the BY variable.

Sample Programs

Combining Data with the IF-THEN/ELSE Statement

data mylib.employees_new;
   set mylib.employees;
   if IDnum=1001 then Birthdate='01JAN1963'd;
   else if IDnum=1002 then Birthdate='08AUG1946'd;
   else if IDnum=1003 then Birthdate='23MAR1950'd;
   else if IDnum=1004 then Birthdate='17JUN1973'd;
run;

Combining Data with the ARRAY Statement

data mylib.employees_new;
   array birthdates{1001:1004} _temporary_ ('01JAN1963'd
         '08AUG1946'd '23MAR1950'd '17JUN1973'd);
   set mylib.employees;
   Birthdate=birthdates(IDnum);
run;

Combining Data with the FORMAT Procedure

proc format;
   value birthdate 1001 = '01JAN1963'
                    1002 = '08AUG1946'
                    1003 = '23MAR1950'
                    1004 = '17JUN1973';
run;

data mylib.employees_new;
   set mylib.employees;
   Birthdate=input(put(IDnum,birthdate.),date9.);
run;

Performing a DATA Step Match-Merge

proc sort data=sasuser.expenses out=expenses;
   by flightid date;
run;

proc sort data=sasuser.revenue out=revenue;
   by flightid date;
run;

data revexpns (drop=rev1st revbusiness revecon
     expenses);
   merge expenses(in=e) revenue(in=r);
   by flightid date;
   if e and r;
   Profit=sum(rev1st, revbusiness, revecon,
          -expenses);
run;

proc sort data=revexpns;
   by dest;
run;

proc sort data=sasuser.acities out=acities;
   by code;
run;

data sasuser.alldata;
   merge revexpns(in=r) acities
         (in=a rename=(code=dest)
         keep=city name code);
   by dest;
   if r and a;
run;

Performing a PROC SQL Join

proc sql;
   create table sqljoin as
   select revenue.flightid,
          revenue.date format=date9.,
          revenue.origin, revenue.dest,
          sum(revenue.rev1st,
              revenue.revbusiness,
               revenue.revecon)
          -expenses.expenses as Profit,
          acities.city, acities.name
   from sasuser.expenses, sasuser.revenue,
        sasuser.acities
   where expenses.flightid=revenue.flightid
         and expenses.date=revenue.date
         and acities.code=revenue.dest
   order by revenue.dest, revenue.flightid,
            revenue.date;
quit;

Combining Summary Data and Detail Data

proc means data=sasuser.monthsum noprint;
   var revcargo;
   output out=sasuser.summary sum=Cargosum;
run;

data sasuser.percent1;
   if _n_=1 then set sasuser.summary
                     (keep=cargosum);
   set sasuser.monthsum
       (keep=salemon revcargo);
   PctRev=revcargo/cargosum;
run;

data sasuser.percent2(drop=totalrev); 
   if _n_=1 then do until(lastobs); 
      set sasuser.monthsum(keep=revcargo) 
          end=lastobs; 
      totalrev+revcargo; 
   end; 
   set sasuser.monthsum (keep=salemon revcargo);
   PctRev=revcargo/totalrev;
run;

Using an Index to Combine Data

data work.profit work.errors;
   set sasuser.dnunder;
   set sasuser.sale2000(keep=routeid
       flightid date rev1st revbusiness
       revecon revcargo)key=flightdate;
   if _iorc_=0 then do;
      Profit=sum(rev1st, revbusiness, revecon,
             revcargo, -expenses);
      output work.profit;
   end;
   else do;
      _error_=0;
      output work.errors;
   end;
run;

Using a Transaction Data Set

proc sort data=mylib.empmaster;
   by empid;
run;

proc sort data=mylib.empchanges;
   by empid;
run;

data mylib.empmaster;
   update mylib.empmaster mylib.empchanges;
   by empid;
run;

Points to Remember

  • In a DATA step match-merge, you can use the RENAME= option to give identical names to variables in input data sets if those variables contain the same data and have the same type and length.
  • You use the OUTPUT statement and the NOPRINT option with the MEANS procedure to route the statistics to an output data set and suppress the default report.
  • The automatic variable _N_ tracks how many times a DATA step has iterated. The _N_ variable is useful when combining data from a summary data set with data from a larger detail data set.
  • When you use the UPDATE statement, both data sets must be sorted by or have indexes based on the BY variable.
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.143.235.23