Choosing between the DATA Step and PROC SQL

As illustrated earlier, you can sometimes use either the DATA step or PROC SQL to combine data sets. A good understanding of your input data sets and the results you want to obtain will help you determine which tool is the better one to use to combine your data sets or tables. An additional important consideration is which method is easier for you to code and support.

In general, PROC SQL requires more computer resources than the equivalent DATA step. However, you might find coding PROC SQL in certain situations to be much simpler. What you lose in computer resources you might gain in your time to write and support the code.

The DATA step and PROC SQL follow different processes when combining data sets. It is important to understand how the two methods process the input data sets because the results of each method can produce different results.

Many of the examples in this book show how to use both tools. Discussion of efficiency and applicability of both methods is included.

Table 1.2 lists some advantages and disadvantages of the DATA step.

Table 1.2. Advantages and Disadvantages of Using the DATA Step
AdvantagesDisadvantages
There is no limit to the number of input data sets, other than memory.

Multiple data sets can be created in one DATA step.

With the SAS language in the DATA step, complex logic can be programmed that can include arrays and DO loops, and options in the SET, MERGE, UPDATE, and MODIFY statements.

Multiple BY variables enable lookups that depend on more than one variable.
Data sets must be sorted by or indexed on the BY variables prior to match-merging.

When match-merging, the BY variables must be present in all data sets, and the names of the BY variables must match exactly.

An exact match on the key values must be found. Sometimes inexact matches can be made if additional programming logic is added.

Table 1.3 lists some advantages and disadvantages of PROC SQL.

Table 1.3. Advantages and Disadvantages of Using PROC SQL
AdvantagesDisadvantages
Tables do not have to be sorted or indexed, but an index might improve performance.

Multiple tables can be joined in one step without having common columns in all tables.

You can create tables, views, or query reports with the combined data.

Combining tables when variable values do not match exactly can often be much easier to accomplish than with a DATA step.

Matching tables based on the values of variables with different names does not require renaming of the variables as it does in the DATA step.
Complex logic can be difficult to code.

PROC SQL might require more resources for a simple join than a DATA step that uses the MERGE statement. For example, even though input tables do not have to be sorted, PROC SQL might perform a sort behind the scenes, which could require more resources than PROC SORT and a DATA step.

PROC SQL can create only one table from a query.

The maximum number of tables that can be joined at one time is 256.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.147.53.168