Querying Multiple Tables

Overview

This topic deals with the more complex task of extracting data from two or more tables.
Previously, you learned how to write a PROC SQL step to query a single table. Suppose you now want to examine data that is stored in two tables. PROC SQL enables you to combine tables horizontally, in other words, to combine rows of data.
Table a, table b
In SQL terminology, combining tables horizontally is called joining tables. Joins do not alter the original tables.
Suppose you want to create a report that displays the following information for employees of a company: employee identification number, last name, original salary, and new salary. There is no single table that contains all of these columns, so you must join the two tables Sasuser.Salcomps and Sasuser.Newsals. In your query, you want to select four columns, two from the first table and two from the second table. You also need to ensure that the rows that you join belong to the same employee. To check this, you want to match employee identification numbers for rows that you merge and to select only the rows that match.
SAS table Salcomps and Newsals
This type of join is known as an inner join. An inner join returns a result set for all of the rows in a table that have one or more matching rows in another table.
Note: For more information about PROC SQL joins, see Combining Tables Horizontally Using PROC SQL.
You can write a PROC SQL step to combine tables. To join two tables for a query, you can use a PROC SQL step such as the one below. This step uses the SELECT statement to join data from the tables Salcomps and Newsals. Both of these tables are stored in a SAS library to which the libref Sasuser has been assigned.
proc sql;
   select salcomps.empid,lastname,
          newsals.salary,newsalary
      from sasuser.salcomps,sasuser.newsals
      where salcomps.empid=newsals.empid
      order by lastname;
We examine each clause of this PROC SQL step.

Specifying Columns That Appear in Multiple Tables

When you join two or more tables, list the columns that you want to select from both tables in the SELECT clause. Separate all column names with commas.
If the tables that you are querying contain same-named columns and you want to list one of these columns in the SELECT clause, you must specify a table name as a prefix for that column. Specifying a table-name prefix with a column that only exists in one table is syntactically acceptable.
Note: Prefixing a table name to a column name is called qualifying the column name.
The following PROC SQL step joins the two tables Sasuser.Salcomps and Sasuser.Newsals, both of which contain columns named EmpID. To tell PROC SQL where to read the column EmpID, the SELECT clause specifies the table name Salcomps as a prefix for Empid. The Newsals prefix for Salary is not required, but it is correct syntax and it identifies the source table for this column.
proc sql;
   select salcomps.empid,lastname,
          newsals.salary,newsalary
      from sasuser.salcomps,sasuser.newsals
 where salcomps.empid=newsals.empid
order by lastname;

Specifying Multiple Table Names

When you join multiple tables in a PROC SQL query, you specify each table name in the FROM clause, as shown below:
proc sql;
   select salcomps.empid,lastname,
          newsals.salary,newsalary
      from sasuser.salcomps,sasuser.newsals
      where salcomps.empid=newsals.empid
      order by lastname;
As in the SELECT clause, you separate names in the FROM clause (in this case, table names) with commas.

Specifying a Join Condition

As in a query on a single table, the WHERE clause in the SELECT statement selects rows from two or more tables, based on a condition. When you join multiple tables, ensure that the WHERE clause specifies columns with data whose values match. If none of the values match, then zero rows are returned. Also, the columns in the join condition must be of the same type. The SQL procedure does not attempt to convert data types.
In the following example, the WHERE clause selects only rows in which the value for EmpID in Sasuser.Salcomps matches the value for EmpID in Sasuser.Newsals. Qualified column names must be used in the WHERE clause to specify each of the two EmpID columns.
proc sql;
   select salcomps.empid,lastname,
          newsals.salary,newsalary
      from sasuser.salcomps,sasuser.newsals
      where salcomps.empid=newsals.empid
      order by lastname;
The output is shown, in part, below.
Tables joined by empid
Note: In the table Sasuser.Newsals, the Salary column has the label Employee Salary, as shown in this output.
CAUTION:
If you join tables that do not contain one or more columns with tables that do not have matching data values, several unexpected results might occur. Either you might produce a large amount of data or you might produce all possible row combinations.

Ordering Rows

As in PROC SQL steps that query just one table, the ORDER BY clause specifies which column(s) should be used to sort rows in the output. In the following query, the rows are sorted by LastName:
proc sql;
  select salcomps.empid,lastname,
         newsals.salary,newsalary
     from sasuser.salcomps,sasuser.newsals
     where salcomps.empid=newsals.empid
     order by lastname;
Rows Sorted by LastName
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.15.229.111