Summary

Text Summary

Understanding Joins

A PROC SQL join is a query that combines tables horizontally (side by side) by combining rows. The two main types of joins are inner joins and outer joins.

Generating a Cartesian Product

When you specify multiple tables in the FROM clause but do not include a WHERE statement to subset data, PROC SQL returns the Cartesian product of the tables. In a Cartesian product, each row in the first table is combined with every row in the second table. In all types of joins, PROC SQL generates a Cartesian product first, and then eliminates rows that do not meet any subsetting criteria that you have specified.

Using Inner Joins

An inner join combines and displays the rows from the first table that match rows from the second table, based on the matching criteria (also known as join conditions) that are specified in the WHERE clause. When the tables that are being joined contain a column with a common name, you might want to eliminate the duplicate column from results or specify a column alias to rename one of the duplicate columns. To refer to tables in an inner join, or in any PROC SQL step, you can specify a temporary name called a table alias.

Using Outer Joins

An outer join combines and displays all rows that match across tables, based on the specified matching criteria (also known as join conditions), plus some or all of the rows that do not match. There are three types of outer joins:
  • A left outer join retrieves all rows that match across tables, based on the specified matching criteria (join conditions), plus nonmatching rows from the left table (the first table specified in the FROM clause).
  • A right outer join retrieves all rows that match across tables, based on the specified matching criteria (join conditions), plus nonmatching rows from the right table (the second table specified in the FROM clause).
  • A full outer join retrieves both matching rows and nonmatching rows from both tables.

Creating an Inner Join with Outer Join-Style Syntax

If you want to use a consistent syntax for all joins, you can write an inner join using the same style of syntax as used for an outer join.

Comparing SQL Joins and DATA Step Match-Merges

DATA step match-merges and PROC SQL joins can produce the same results, although there are important differences between these two techniques.
  • When all the values of the BY variable (column) match and there are no duplicate BY variables, you can use a PROC SQL inner join.
  • When only some of the values of the BY variable match, you can use a PROC SQL full outer join. To overlay common columns, you must use the COALESCE function with the PROC SQL join.

Using In-Line Views

An in-line view is a subquery that appears in a FROM clause. An in-line view selects data from one or more tables to produce a temporary (or virtual) table that the outer query uses to select data for output. You can reference an in-line view with other views or tables, reference multiple tables in an in-line view, and assign an alias to an in-line view.

Joining Multiple Tables and Views

When you perform a complex query that combines more than two tables or views, you might need to choose between several techniques.

Sample Programs

Combining Tables By Using an Inner Join

proc sql outobs=15;
title 'New York Employees';
   select substr(firstname,1,1) || '. ' || lastname 
          as Name,
          jobcode,
          int((today() - dateofbirth)/365.25)
          as Age
      from sasuser.payrollmaster as p,
           sasuser.staffmaster as s
      where p.empid =
            s.empid
            and state='NY'
      order by 2, 3;
quit;

Combining Tables By Using a Left Outer Join

proc sql outobs=20;
title 'All March Flights';
   select m.date,
          m.flightnumber 
	         label='Flight Number',
          m.destination
          label='Left',
          f.destination
          label='Right',
          delay
          label='Delay in Minutes'
      from sasuser.marchflights as m 
           left join 
           sasuser.flightdelays as f
      on m.date=f.date
         and m.flightnumber=
             f.flightnumber
      order by delay;
quit;

Overlaying Common Columns in a Full Outer Join

 proc sql;
    select coalesce(p.empid, s.empid) 
           as ID, firstname, lastname, gender
       from sasuser.payrollmaster as p 
	               full join 
            sasuser.staffmaster as s
       on p.empid = s.empid
       order by id;
quit;

Joining Tables By Using a Subquery and an In-Line View

 proc sql;
    select empid
       from sasuser.supervisors as m,
          (select substr(jobcode,1,2) as JobCategory,
                  state
             from sasuser.staffmaster as s,
                  sasuser.payrollmaster as p
             where s.empid=p.empid and s.empid in
                (select empid
                   from sasuser.flightschedule
                   where date='04mar2000'd
                         and destination='CPH')) as c
       where  m.jobcategory=c.jobcategory
              and m.state=c.state;
quit;

Points to Remember

  • In most cases, generating all possible combinations of rows from multiple tables does not yield useful results, so a Cartesian product is rarely the query outcome that you want.
  • The maximum number of tables that you can combine in a single inner join depends on your version of SAS. If the join involves views, it is the number of tables that underlie the views, not the number of views, that counts toward the limit. An outer join can be performed on only two tables or views at a time.
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.144.97.126