Add rows to a table so that all possible combinations of specific columns are present in each BY group in the output table. The columns that are not used to define the combinations will have missing values.
This example is similar to Example 8.9 in that it fills out BY groups so that there is the same number of rows in each BY group and each BY group has the same combinations of specific variables. It differs from Example 8.9 in that the columns that define the combinations are character columns and not columns with sequential numeric values. Sequential numeric values as in Example 8.9 are straightforward to program with arrays and the DATA step while working with character values adds some complexity to the coding.
Featured Step | PROC FREQ and PROC SQL |
Featured Step Options and Statements | PROC FREQ: TABLES statement with NOPRINT, OUT=, and SPARSE options PROC SQL: Left join |
Table WOODPECKERS records the presence of six species of woodpeckers at four tract locations on seven dates. Only tract MV28 was visited on all seven dates.
Value "H" means thebird was heard. Value "N" means the bird was not seen or heard. Value "S" means the bird was observed.
WOODPECKERS red_ Obs visit_date tract bellied sapsucker downy hairy flicker pileated 1 03/15/2009 JN72 H N S S N N 2 03/15/2009 MV28 N N S N N N 3 03/20/2009 JN72 S N S H N H 4 03/20/2009 KT05 N S S N N N 5 03/20/2009 MV28 S N H H N S 6 03/25/2009 JN72 N H S S N N 7 03/25/2009 MV28 N S S S N H 8 03/30/2009 JN72 H S N H N N 9 03/30/2009 KT05 N N H S N H 10 03/30/2009 LT83 H S S S N H 11 03/30/2009 MV28 S N S N N S 12 04/04/2009 LT83 N N S S H N 13 04/09/2009 JN72 S N S S N S 14 04/09/2009 LT83 N N S H N N 15 04/09/2009 MV28 S S S S N H 16 04/14/2009 LT83 N N S S S H 17 04/14/2009 MV28 H N S S H S
Output 8.10 FULLBIRDSURVEY TableExample 8.10 FULLBIRDSURVEY Table Created with PROC SQL red_ Obs visit_date tract bellied sapsucker downy hairy flicker pileated 1 03/15/2009 JN72 H N S S N N 2 03/15/2009 KT05 3 03/15/2009 LT83 4 03/15/2009 MV28 N N S N N N 5 03/20/2009 JN72 S N S H N H 6 03/20/2009 KT05 N S S N N N 7 03/20/2009 LT83 8 03/20/2009 MV28 S N H H N S 9 03/25/2009 JN72 N H S S N N 10 03/25/2009 KT05 11 03/25/2009 LT83 12 03/25/2009 MV28 N S S S N H 13 03/30/2009 JN72 H S N H N N 14 03/30/2009 KT05 N N H S N H 15 03/30/2009 LT83 H S S S N H 16 03/30/2009 MV28 S N S N N S 17 04/04/2009 JN72 18 04/04/2009 KT05 19 04/04/2009 LT83 N N S S H N 20 04/04/2009 MV28 21 04/09/2009 JN72 S N S S N S 22 04/09/2009 KT05 23 04/09/2009 LT83 N N S H N N 24 04/09/2009 MV28 S S S S N H 25 04/14/2009 JN72 26 04/14/2009 KT05 27 04/14/2009 LT83 N N S S S H 28 04/14/2009 MV28 H N S S H S |
The following program uses PROC FREQ to determine all the possible combinations of specific variables, and then combine the data set that contains the combinations with the input data set by using PROC SQL. The resulting output table has the same number of observations in each BY group.
Table WOODPECKERS records observations on the presence of six species of woodpeckers in four tracts on seven dates. Observations were made in tract MV28 for all seven dates while observations were made on five of the seven dates in tract JN72, two of the seven dates in tract KT05, and four of the seven dates in tract LT83.
The goal of the PROC SQL step is to add rows to the table so that each tract has seven rows, one for each of the seven observation dates.
By adding the SPARSE option to the TABLES statement, the PROC FREQ step creates a data set that has all possible combinations of VISIT_DATE and TRACT even when the combination does not exist in the input data set. With seven values of VISIT_DATE and the four values of TRACT in the data set, PROC FREQ creates an output data set with 28 observations.
Observations that are unique to ALLVISITS have missing values in output table FULLBIRDSUVEY for columns found only in WOODPECKERS.
Create a table of all possible combinations of VISIT_DATE and TRACT. Suppress printing the table. Save the table in data set ALLVISITS. Keep only the variables that are needed to define all the possible combinations. Include all possible combinations of VISIT_DATE and TRACT even when a combination does not occur in the input data set.
Create table FULLBIRDSURVEY. Perform a left join of ALLVISITS and WOODPECKERS. Put ALLVISITS on the left side of the join so that all rows from ALLVISITS are present in FULLBIRDSURVEY whether or not a row has a match in WOODPECKERS. Assign aliases to both tables. Specify how to match the rows.
proc freq data=woodpeckers; tables visit_date*tract / noprint out=allvisits(keep=visit_date tract) sparse; run; proc sql; create table fullbirdsurvey as select * from allvisits v left join woodpeckers w on v.visit_date=w.visit_date and v.tract=w.tract; quit;
13.59.18.83