The most basic type of join combines data from two tables
that are specified in the FROM clause of a SELECT statement. When
you specify multiple tables in the FROM clause but do not include
a WHERE statement to subset data, PROC SQL returns the Cartesian product
of the tables. In a Cartesian product, each row in the first table
is combined with every row in the second table. The following example
illustrates a Cartesian product where table One and table Two are
joined using a FROM clause.
proc sql;
select *;
from certadv.one, certadv.two
;
quit;
The output shown below
displays all possible combinations of each row in table One with all
rows in table Two. Note that each table has a column named X, and
both of these columns appear in the output. A Cartesian product includes
all columns from the source tables. Columns that have common names
are not overlaid.
Output 3.1 Cartesian Product of Tables Certadv.One and Certadv.Two
In most cases, generating
all possible combinations of rows from multiple tables does not yield
useful results, so a Cartesian product is rarely the query outcome
that you want. For example, in the Cartesian product of two tables
that contain employee information, each row of output might contain
information about two different employees. Usually, you want your
join to return only a subset of rows from the tables.
The size of a Cartesian
product can also be problematic. The number of rows in a Cartesian
product is equal to the product of the number of rows in the contributing
tables.
Tables One and Two,
used in the preceding example, contain three rows each. The number
of rows in the Cartesian product of tables One and Two is calculated
as follows:
3 x 3 = 9 rows
Joining small tables
such as One and Two results in a relatively small Cartesian product.
However, the Cartesian product of large tables can be huge and can
require a large amount of system resources for processing.
For example, joining
two tables of 1,000 rows each results in output of the following size:
1,000 x 1,000 = 1,000,000 rows
When you run a query
that involves a Cartesian product that cannot be optimized, PROC SQL
writes the following warning message to the SAS log.
Log 3.1 SAS Log
NOTE: The execution of this query involves performing one or more Cartesian product joins that cannot be optimized.
Although you will not
often choose to create a query that returns a Cartesian product, it
is important to understand how a Cartesian product is generated. In
all types of joins, PROC SQL generates a Cartesian product first,
and then eliminates rows that do not meet any subsetting criteria
that you have specified.
Note: In many cases, PROC SQL can
optimize the processing of a join, thereby minimizing the resources
that are required to generate a Cartesian product.