Generating a Cartesian Product

The most basic type of join combines data from two tables that are specified in the FROM clause of a SELECT statement. When you specify multiple tables in the FROM clause but do not include a WHERE statement to subset data, PROC SQL returns the Cartesian product of the tables. In a Cartesian product, each row in the first table is combined with every row in the second table. The following example illustrates a Cartesian product where table One and table Two are joined using a FROM clause.
proc sql;
   select *;
      from certadv.one, certadv.two
;
quit;
The output shown below displays all possible combinations of each row in table One with all rows in table Two. Note that each table has a column named X, and both of these columns appear in the output. A Cartesian product includes all columns from the source tables. Columns that have common names are not overlaid.
Output 3.1 Cartesian Product of Tables Certadv.One and Certadv.Two
Cartesian Product of Tables One and Two
In most cases, generating all possible combinations of rows from multiple tables does not yield useful results, so a Cartesian product is rarely the query outcome that you want. For example, in the Cartesian product of two tables that contain employee information, each row of output might contain information about two different employees. Usually, you want your join to return only a subset of rows from the tables.
The size of a Cartesian product can also be problematic. The number of rows in a Cartesian product is equal to the product of the number of rows in the contributing tables.
Tables One and Two, used in the preceding example, contain three rows each. The number of rows in the Cartesian product of tables One and Two is calculated as follows:
3 x 3 = 9 rows
Joining small tables such as One and Two results in a relatively small Cartesian product. However, the Cartesian product of large tables can be huge and can require a large amount of system resources for processing.
For example, joining two tables of 1,000 rows each results in output of the following size:
1,000 x 1,000 = 1,000,000 rows
When you run a query that involves a Cartesian product that cannot be optimized, PROC SQL writes the following warning message to the SAS log.
Log 3.1 SAS Log
NOTE: The execution of this query involves performing one or more Cartesian product joins that cannot be optimized.
Although you will not often choose to create a query that returns a Cartesian product, it is important to understand how a Cartesian product is generated. In all types of joins, PROC SQL generates a Cartesian product first, and then eliminates rows that do not meet any subsetting criteria that you have specified.
Note: In many cases, PROC SQL can optimize the processing of a join, thereby minimizing the resources that are required to generate a Cartesian product.
Last updated: October 16, 2019
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.22.77.117