10.3. Sorting and Performance

Sorting data in the SQL procedure, as in other parts of the SAS System, is a CPU and memory-intensive operation. When sufficient amounts of CPU and memory resources are available, the process is usually successful. But if either of these resources is in short supply or simply not available, the sort step is doomed for failure. The first order of business for SAS users is to minimize the number of sorts in their programs. By keeping a few simple guidelines in mind, problems can be minimized.

CPU-related bottlenecks can occur if sorts are performed on disk as opposed to in memory. Because most disks are slower than physical memory, this presents an important performance issue. The most logical and efficient place to perform sorts is in memory. If the sort requires more space than can fit in available memory, the sort must be performed on disk. The objective is to determine how much space a sort will require as well as where the sort will be performed before the sort is executed.

10.3.1. User-Specified Sorting (SORTPGM= System Options)

You can control what sort utility the SAS System uses when performing sorts. By specifying the SORTPGM= system option, you can direct the SAS System to use the best possible sort utility for the environment in question. The SORTPGM= system options are displayed in the following table.

SORTPGM= System Options
Sort OptionPurpose
BESTThe BEST option uses the sort utility best suited to the data.
HOSTThe HOST option tells the SAS System to use the host sort utility available on your host computer. This option may be the most efficient for large tables containing many rows of data.
SASThe SAS option tells the SAS System to use the sort utility supplied with the SAS System.

The next example illustrates using the SORTPGM= option to select the sort utility most suited to the data. Both options use the name that is specified in the SORTNAME= option.

OPTIONS SORTPGM=BEST;

OPTIONS SORTPGM=HOST;

10.3.2. Automatic Sorting

Using the SELECT DISTINCT clause invokes an internal sort to remove duplicate rows. The single exception is when an index exists. The index is then used to eliminate the duplicate rows.

The results of a grouped query are automatically sorted using the grouping columns. When the SELECT clause contains only the columns listed in the GROUP BY clause along with any summary functions, then the duplicates in each group based on the grouping columns are removed as soon as any defined summary functions are performed. If additional columns then appear in the SELECT clause the rows are not collapsed and therefore duplicates are not removed.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.117.231.15