In this chapter, I discuss a number of problems that you may encounter when dealing with data, along with possible solutions. I suggest that you try to find a solution before reading the solution in the text. Also, I should mention that there are usually many different ways of solving a given problem. In fact, you may very well be able to find a more efficient solution than the one given. The main purpose of these problems and solutions is to give you some food for thought.
Before beginning, let us note that many of the upcoming solutions involve the use of subqueries. We discussed subqueries in Chapter 6, but let us review quickly here.
Access SQL permits the use of SELECT
statements within other SELECT
statements (as well as in other
statements, such as INSERT
INTO
statements). The internal, or nested,
SELECT
statement is referred to as a
subquery.
Note that you may include a nested SELECT
statement within a main SELECT
statement only if the internal SELECT
statement returns at most one
record. To illustrate, consider the main SQL
statement:
SELECT Hour, (SELECT Count(Interval) FROM StartTimes WHERE (StartTime <= Hour)) FROM Hours
Here, the internal SQL statement:
SELECT Count(Interval) FROM StartTimes WHERE (StartTime <= Hour)
returns at most a single record, because it returns a Count. Note
also that the WHERE
clause in the
internal SQL statement refers to the Hour field
that is part of the main SQL, thus linking the return value of the
internal statement to the current record in the HOURS table.
The computation of running sums is a common operation. To illustrate, consider Table 19-1, which contains the duration (in hours, say) for various events.
For each event, we want to compute the sum of all the durations of the events that precede that event. This sum is a running sum.
One approach is to use the Cartesian product of the table with itself. In this way, we can access all records whose Event number precedes that of a given record. For instance, for the record with Event number 5, we need access to the records with Event numbers 1 through 4. The Cartesian product provides us with these records.
Here is the SQL statement that does the job:
SELECT Running.Event, Sum(RunningCopy.Duration) AS StartTime FROM Running, Running AS RunningCopy WHERE (RunningCopy.Event < Running.Event) GROUP BY Running.Event
The FROM
clause creates the
Cartesian product of the table with itself. The WHERE
clause restricts the records to
those for which:
RunningCopy.Event < Running.Event
that is, to the records that provide information about the
records preceding each record in Table 19-1. Finally, we
GROUP
BY
Event and compute the sum of the
durations.
The problem is that Cartesian products are very inefficient and use a lot of resources. (If Table 19-1 has 100,000 rows, then the Cartesian product has 100,000 x 100,000 = 10,000,000,000 rows!)
A more efficient solution is to use a nested SELECT
statement, that is, to use a
SELECT
statement within the main
SELECT
statement. Recall that
this is permitted in Access SQL, provided that the internal SELECT
statement returns at most
one record.
In the following SQL statement, note the use of table aliases, which are needed because we must refer to Table 19-1 in two contexts:
SELECT R1.Event, (SELECT SUM(R2.Duration) FROM Running As R2 WHERE R2.Event < R1.Event) AS StartTime FROM Running As R1
The internal SQL statement:
SELECT SUM(R2.Duration) FROM Running As R2 WHERE R2.Event < R1.Event
returns the sum of the duration for all events
preceding the current event, which is denoted
by R1.Event
.
Example 19-1
shows a VBA procedure to execute this SQL statement. The
DoCmd object is used in VBA to run an Access
action. Thus, the line DoCmd.OpenQuery
opens a query in
Access.
Private Sub RunningSumSQL( ) Dim db As Database Set db = CurrentDb Dim qry As QueryDef Dim sSQL As String On Error Resume Next db.QueryDefs.Delete "temp" On Error GoTo 0 sSQL = "SELECT R1.Event," & _ " (SELECT SUM(R2.Duration)" & _ " FROM Running As R2" & _ " WHERE R2.Event < R1.Event)" & _ " AS StartTime" & _ " FROM Running As R1" Set qry = db.CreateQueryDef("temp", sSQL) DoCmd.OpenQuery qry.Name End Sub
Another approach is to use DAO, which provides a very simple solution in this case. It creates a permanent result table, whereas the previous solution creates a select query. Example 19-2 shows the DAO code performing the same operation. The results are placed in Table 19-1.
Private Sub RunningSumDAO( ) Dim db As Database Dim rs As Recordset Dim lRunningSum As Long Set db = CurrentDb lRunningSum = 0 Set rs = db.OpenRecordset("SELECT * FROM Running ORDER BY Event") Do While Not rs.EOF rs.Edit rs!RunningSum = lRunningSum rs.Update lRunningSum = lRunningSum + rs!Duration rs.MoveNext Loop rs.Close End Sub
In Table 19-2, the rows denote intervals of time. The problem is determining, for each hour of the day, the number of intervals that contain this hour.
Interval | StartTime | EndTime |
1 | 4:00:00 PM | 7:00:00 PM |
2 | 5:00:00 PM | 9:00:00 PM |
3 | 2:00:00 PM | 6:00:00 PM |
4 | 8:00:00 PM | 11:59:00 PM |
5 | 12:00:00 PM | 4:00:00 PM |
For this, we also use an HOURS table (see Table 19-3).
Hours |
12:00:00 PM |
1:00:00 PM |
2:00:00 PM |
3:00:00 PM |
4:00:00 PM |
5:00:00 PM |
6:00:00 PM |
7:00:00 PM |
8:00:00 PM |
9:00:00 PM |
10:00:00 PM |
11:00:00 PM |
11:59:00 PM |
This problem can be solved using a nested SELECT
statement (that is, a subquery).
Here is the code:
Private Sub OverlappingIntervals( ) Dim db As Database Set db = CurrentDb Dim qry As QueryDef Dim sSQL As String On Error Resume Next db.QueryDefs.Delete "temp" On Error GoTo 0 sSQL = "SELECT Hours.Hour," & _ " (SELECT Count(Interval) AS CountOfIntervals" & _ " FROM Overlap" & _ " WHERE (StartTime <= Hours.Hour) And" & _ " (Hours.Hour < EndTime))" & _ " FROM Hours" Set qry = db.CreateQueryDef("temp", sSQL) DoCmd.OpenQuery qry.Name End Sub
We invite you to create a DAO solution. The problem in the next section illustrates the overlapping intervals technique.
A company employs workers and supervisors. During a typical day, each worker and supervisor works one or more shifts, which consist of consecutive hours. Table 19-4 shows a typical day (from 12 noon to 12 midnight).
EmpID | EmpType | StartHour | EndHour |
1 | Super | 12:00:00 PM | 5:59:00 PM |
2 | Super | 6:00:00 PM | 11:59:00 PM |
3 | Super | 4:00:00 PM | 8:59:00 PM |
4 | Worker | 4:00:00 PM | 6:59:00 PM |
5 | Worker | 5:00:00 PM | 8:59:00 PM |
6 | Worker | 2:00:00 PM | 5:59:00 PM |
7 | Worker | 8:00:00 PM | 11:59:00 PM |
8 | Worker | 12:00:00 PM | 3:59:00 PM |
We want to compute the maximum number of workers that each supervisor must supervise at one time.
This problem can be handled in a two-step process. First, we count the number of workers in each hour. Consider the following SQL statement:
SELECT Hours.Hour, (SELECT Count(EmpType) FROM SuperLoad WHERE (Starthour <= Hours.Hour) And (Hours.Hour < EndHour) And (EmpType='Worker')) AS CountOfWorkers FROM Hours
This, again, uses a subquery that returns a single record giving the number of workers that are working during a given hour.
Using this SQL statement, we make a query named qry1
, so it can be used in the next step.
See Table 19-5 for
the result of this query.
Hours | CountOfWorkers |
12:00:00 PM | 1 |
1:00:00 PM | 1 |
2:00:00 PM | 2 |
3:00:00 PM | 2 |
4:00:00 PM | 2 |
5:00:00 PM | 3 |
6:00:00 PM | 2 |
7:00:00 PM | 1 |
8:00:00 PM | 2 |
9:00:00 PM | 1 |
10:00:00 PM | 1 |
11:00:00 PM | 1 |
The next step is computing the supervisor load as the maximum number of workers in each supervisor’s shift. Do this by using the name of the query from the previous step in the following SQL statement:
sSQL2 = "SELECT SuperLoad.EmpID, SuperLoad.EmpType," & _ " (SELECT Max(CountOfWorkers) AS WorkerLoad" & _ " FROM [" & qry1.Name & "]" & _ " WHERE ((Hours.Hour >= StartHour) And (Hours.Hour < Endhour)))" & _ " FROM SuperLoad" & _ " WHERE SuperLoad.EmpType = 'Super'"
The results are shown in Table 19-6.
The complete code for this solution is in Example 19-3.
Private Sub SupervisorLoad( ) Dim db As Database Set db = CurrentDb Dim qry1 As QueryDef Dim qry2 As QueryDef Dim sSQL1 As String Dim sSQL2 As String On Error Resume Next db.QueryDefs.Delete "temp1" db.QueryDefs.Delete "temp2" On Error GoTo 0 sSQL1 = "SELECT Hours.Hour," & _ " (SELECT Count(EmpType) FROM SuperLoad" & _ " WHERE (Starthour <= Hours.Hour) And (Hours.Hour < EndHour)" & _ " And (EmpType='Worker'))" & _ " AS CountOfWorkers" & _ " FROM Hours" Set qry1 = db.CreateQueryDef("temp1", sSQL1) ' Uncomment to see how this step looks 'DoCmd.OpenQuery qry1.Name sSQL2 = "SELECT SuperLoad.EmpID, SuperLoad.EmpType," & _ " (SELECT Max(CountOfWorkers) AS WorkerLoad" & _ " FROM [" & qry1.Name & "]" & _ " WHERE ((Hours.Hour >= StartHour) And (Hours.Hour < Endhour)))" & _ " FROM SuperLoad" & _ " WHERE SuperLoad.EmpType = 'Super'" Set qry2 = db.CreateQueryDef("temp2", sSQL2) DoCmd.OpenQuery qry2.Name End Sub
Imagine a conference where your task is assigning conference rooms to attendees. Table 19-7 shows the preregistered attendees along with corresponding room numbers.
Name | Room |
_default | 15 |
_default | 14 |
_default | 13 |
_default | 12 |
Bach | 123 |
Beethoven | 231 |
Mozart | 455 |
Chopin | 455 |
Elgar | 231 |
Gluck | 123 |
Liszt | 455 |
Note that the table contains several default choices. If an individual is not in the table, then you want to assign one of the default rooms to that individual. Moreover, to avoid overcrowding, you want to assign the default room numbers randomly. How do you do this?
This problem can be solved in a variety of ways, one of which
provides a nice use of both subqueries and the UNION
statement. First, consider the SQL
statement:
sSQL1 = "SELECT Room FROM Assignment WHERE (Name = [Enter Name])"
Recall that [Enter
name]
is a parameter. When the query is
run, the user will be prompted for a name, which will be substituted
for [Enter
name]
. This statement will return the
record associated with a given name if it is in the table;
otherwise, it will return the empty recordset.
Now consider the statement:
sSQL2 = "SELECT Room FROM Assignment WHERE (Name = '_default') AND ([Enter Name] NOT IN (SELECT Name FROM Assignment))"
The clause:
[Enter Name] NOT IN (SELECT Name FROM Assignment)
returns TRUE
if and only if
the name entered by the user is not in the
table. Hence, the clause sSQL2
can be rewritten based on two cases, name in table:
sSQL2 = "SELECT Room FROM Assignment WHERE (Name = '_default') AND FALSE"
and name not in table:
sSQL1 = "SELECT Room FROM Assignment WHERE (Name = '_default') AND TRUE"
This simplifies further to name in table:
sSQL2 = "SELECT Room FROM Assignment WHERE FALSE"
and name not in table:
sSQL1 = "SELECT Room FROM Assignment WHERE (Name = '_default')"
Thus, this statement returns the empty recordset if the name is in the table and the default records if the name is not in the table.
Now we take the union:
sSQL3 = sSQL1 & " UNION " & sSQL2
This SQL statement will return the room number for a name if the name is in the table; otherwise, it returns the default records.
Now, all we need to do is return a random record! Note that this will work in either case, because if the name is in the table, there is only one record, so a randomly chosen record must be that record.
The complete code is shown in Example 19-4.
Private Sub AssignmentWithDefault( ) Dim db As Database Set db = CurrentDb Dim sName As String Dim qry1 As QueryDef Dim rs As Recordset Dim sSQL1 As String Dim sSQL2 As String Dim sSQL3 As String Dim lRandom As Long Dim lcRecords As Long On Error Resume Next db.QueryDefs.Delete "temp1" On Error GoTo 0 sSQL1 = "SELECT Room FROM Assignment" & _ " WHERE (Name = [Enter Name])" sSQL2 = "SELECT Room FROM Assignment" & _ " WHERE (Name = '_default') AND ([Enter Name] NOT IN (SELECT Name FROM Assignment))" sSQL3 = sSQL1 & " UNION " & sSQL2 Set qry1 = db.CreateQueryDef("temp1", sSQL3) sName = InputBox("Enter name") qry1.Parameters(0) = sName ' To see the results ''DoCmd.OpenQuery qry1.Name Set rs = qry1.OpenRecordset ' Populate and get recordcount rs.MoveLast lcRecords = rs.RecordCount ' Random record Randomize Timer ' lRandom is between 0 and lcRecords-1 lRandom = Int(lcRecords * Rnd) rs.MoveFirst rs.Move lRandom MsgBox "Room for " & sName & " is " & rs!Room End Sub
Here is a simple time-to-completion problem. Table 19-8 shows the status of widget production for your company. At various stages in the production process, the workers enter a record into the table indicating the remaining time to completion for the widget.
WidgetID | TimeToCompletion |
1 | 5 |
1 | 3 |
1 | 2 |
1 | 1 |
2 | 6 |
2 | 3 |
2 | 0 |
3 | 8 |
3 | 7 |
3 | 6 |
3 | 4 |
4 | 9 |
4 | 4 |
4 | 2 |
4 | 0 |
We want to identify those widgets that are not yet completed.
The next SQL statement does the job. Note the use of the
NOT
IN
form of subquery, which is discussed in
Section 6.7.10 in
Chapter 6.
SELECT DISTINCT WidgetID FROM Widgets As W1 WHERE 0 NOT IN (SELECT TimeToCompletion FROM Widgets As W2 WHERE W2.WidgetID = W1.WidgetID)
Let’s make the time-to-completion problem more complicated. Imagine again that you are keeping track of the status of widget production for your company. Each widget is composed of four modules, each of which is assembled separately. Table 19-9 shows some sample data.
WidgetID | ModuleID | TimeToCompletion |
1 | 1 | 0 |
1 | 2 | 1 |
1 | 3 | 2 |
1 | 4 | 1 |
2 | 1 | 1 |
2 | 2 | 2 |
2 | 3 | 3 |
2 | 4 | 4 |
3 | 1 | 0 |
3 | 2 | 5 |
3 | 3 | 4 |
3 | 4 | 0 |
4 | 1 | 0 |
4 | 2 | 1 |
4 | 3 | 1 |
4 | 4 | 2 |
We are trying to determine the widgets in which Module 1 is the only completed module—that is, where TimeToCompletion is equal to zero for Module 1, but not for any of the other modules in the widget. Thus, for our table, Widgets 1 and 4 qualify.
Consider the following SQL statement:
SELECT WidgetID FROM Widgets AS W1 WHERE (TimeToCompletion = 0) AND 0 NOT IN (SELECT TimeToCompletion FROM Widgets AS W2 WHERE (W2.WidgetID=W1.WidgetID) AND (W2.ModuleID <> 1))
The subquery selects, for a particular widget, all TimeToCompletions for all modules except the first module. We can then test to see if that set of TimeToCompletions contains a zero.
Example 19-5 shows the complete code.
Private Sub TimeToCompletion( ) Dim db As Database Set db = CurrentDb Dim qry1 As QueryDef Dim sSQL1 As String On Error Resume Next db.QueryDefs.Delete "temp1" On Error GoTo 0 sSQL1 = "SELECT WidgetID FROM Widgets AS W1" & _ " WHERE (TimeToCompletion = 0) AND" & _ " 0 NOT IN" & _ " (SELECT TimeToCompletion FROM Widgets AS W2" & _ " WHERE (W2.WidgetID=W1.WidgetID) AND (W2.ModuleID <> 1))" Set qry1 = db.CreateQueryDef("temp1", sSQL1) DoCmd.OpenQuery qry1.Name End Sub
Let’s make the time-to-completion problem even more involved. Suppose each module is composed of several parts. Periodically, the workers involved with a particular part will make an entry into a database table, as shown in Table 19-10.
WidgetID | ModuleID | PartID | TimeToCompletion |
1 | 1 | 1 | 3 |
1 | 1 | 1 | 4 |
1 | 1 | 1 | 5 |
1 | 1 | 2 | 2 |
1 | 1 | 2 | 4 |
1 | 1 | 2 | 1 |
1 | 2 | 1 | 6 |
1 | 2 | 1 | 5 |
1 | 2 | 1 | 3 |
1 | 2 | 2 | 7 |
1 | 2 | 2 | 4 |
1 | 2 | 2 | 3 |
1 | 2 | 3 | 4 |
1 | 2 | 3 | 5 |
1 | 2 | 3 | 6 |
1 | 3 | 1 | 8 |
1 | 3 | 1 | 5 |
1 | 3 | 2 | 2 |
1 | 3 | 2 | 4 |
We want to compute the time to completion for each part, module, and widget. Note that there may be several entries for a given part. The time to complete a given part is the minimum of the times in these rows.
Let’s take a step-by-step approach to the solution. Later, we can present a more elegant, but less readable, solution.
First, we create an SQL statement that returns only those rows of the table that, for each widget/module, have the smallest part TimeToCompletion. We can do this in two steps. The first SQL statement selects the TimeToCompletion field for all records in Widgets2 that have a given WidgetID, ModuleID, and PartID.
' Times to completion for given WidgetID/ModuleID/PartID sSQL1 = "SELECT TimeToCompletion FROM Widgets2 AS W2" & _ " WHERE (W2.WidgetID = W1.WidgetID)" & _ " And (W2.ModuleID = W1.ModuleID)" & _ " And (W2.PartID = W1.PartID)"
The second SQL statement returns all records whose TimeToCompletion is less than or equal to all records returned in the first SQL statement—that is, all records for the given WidgetID, ModuleID, and PartID:
' Those records that have minimum time to completion for each part sSQL2 = "SELECT WidgetID, ModuleID, PartID," & _ " TimeToCompletion AS TimeToFinishPart FROM Widgets2 AS W1" & _ " WHERE TimeToCompletion <= ALL (" & sSQL1 & ")"
An alternative approach is to use a single nested SELECT
statement:
sSQL2 = "SELECT DISTINCT WidgetID, ModuleID, PartID," & _ " (SELECT MIN(TimeToCompletion)" & _ " FROM Widgets2 as W2 WHERE" & _ " (W2.WidgetID = W1.WidgetID) And" & _ " (W2.ModuleID = W1.ModuleID) And" & _ " (W2.PartID = W1.PartID))" & _ " AS TimeToFinishPart" & _ " FROM Widgets2 AS W1"
Running this query
Set qry1 = db.CreateQueryDef("temp1", sSQL2) DoCmd.OpenQuery qry1.Name
will result in Table 19-11.
WidgetID | ModuleID | PartID | TimeToFinishPart |
1 | 1 | 1 | 3 |
1 | 1 | 2 | 1 |
1 | 2 | 1 | 3 |
1 | 2 | 2 | 3 |
1 | 2 | 3 | 4 |
1 | 3 | 1 | 5 |
1 | 3 | 2 | 2 |
Using this query, it is simple to get the time to completion for each module:
' Time to finish each module sSQL3 = "SELECT WidgetID, ModuleID," & _ " Max(TimeToFinishPart) AS TimeToFinishModule FROM " & qry1.Name & _ " GROUP BY WidgetID, ModuleID" Set qry2 = db.CreateQueryDef("temp2", sSQL3) ' Show it DoCmd.OpenQuery qry2.Name
This query results in Table 19-12.
Finally, we can compute the time to completion for each widget:
' Time to finish each Widget sSQL4 = "SELECT WidgetID," & _ " Max(TimeToFinishModule) AS TimeToFinishWidget FROM " & qry2.Name & _ " GROUP BY WidgetID" Set qry3 = db.CreateQueryDef("temp3", sSQL4)
This results in Table 19-13.
It is possible to get the time to completion in a single SQL statement, although I definitely do not recommend doing so. The result may be more elegant, but it is also harder to read. For instance, for modules, we have:
sSQL1 = "SELECT DISTINCT WidgetID, ModuleID," & _ " TimeToCompletion FROM Widgets2 AS W1" & _ " WHERE TimeToCompletion =" & _ " (SELECT MAX(TimeToCompletion) FROM Widgets2 As W2" & _ " WHERE TimeToCompletion =" & _ " (SELECT MIN(TimeToCompletion) FROM Widgets2 AS W3" & _ " WHERE (W3.WidgetID = W2.WidgetID)" & _ " And (W3.ModuleID = W2.ModuleID)" & _ " And (W3.PartID = W2.PartID)" & _ " Group BY W3.WidgetID, W3.ModuleID, W3.PartID)" & _ " AND (W2.WidgetID = W1.WidgetID) And (W2.ModuleID = W1.ModuleID)" & _ " GROUP BY W2.WidgetID, W2.ModuleID)"
Digesting this SQL statement will probably take time, and I hope it will make you reconsider using such a statement in your own applications.
Imagine a database of personal statistics with two tables (Table 19-14 and Table 19-15).
Notice that Table 19-15 has one row per statistic. Thus, the data for an individual person is arranged vertically. Notice also that some data is missing. For instance, there is no data at all for Liszt.
EmpID | StatType | Value |
1 | Age | 45 |
1 | Height | 63 |
1 | Weight | 150 |
2 | Age | 46 |
2 | Height | 67 |
3 | Age | 35 |
3 | Weight | 135 |
4 | Age | 44 |
5 | Height | 76 |
Now, we want to view the data horizontally, as in Table 19-16.
EmpID | Name | Age | Height | Weight |
1 | Beethoven | 45 | 63 | 150 |
2 | Chopin | 46 | 67 | |
3 | Mozart | 35 | 135 | |
4 | Schubert | 44 | ||
5 | Brahms | 76 | ||
6 | Liszt |
One solution is given by the following SQL statement:
SELECT DISTINCT Composers.EmpID, Name, (SELECT Value FROM ComposerData As T2 WHERE (T2.StatType='Age') And (T2.EmpID=Composers.EmpID)) As Age, (SELECT Value FROM ComposerData As T2 WHERE (T2.StatType='Height') And (T2.EmpID=Composers.EmpID)) As Height, (SELECT Value FROM ComposerData As T2 WHERE (T2.StatType='Weight') And (T2.EmpID=Composers.EmpID)) As Weight FROM (Composers INNER JOIN ComposerData ON Composers.EmpID=ComposerData.EmpID)
Here, we have multiple SELECT
subquery statements within the main
SELECT
clause. For instance, the
clause:
(SELECT Value FROM ComposerData As T2 WHERE (T2.StatType='Age') And (T2.EmpID=Composers.EmpID)) As Age,
selects the age for the person selected by the main SELECT
clause.
As the number of statistics grows, this SQL statement becomes more complex. Example 19-6 shows an alternative solution using DAO that does not require adjusting when additional statistics are added.
Private Sub VerticalToHorizontal2( ) Dim db As Database Set db = CurrentDb Dim rsEmp As Recordset Dim rsData As Recordset Dim rsHor As Recordset Set rsEmp = db.OpenRecordset("Composers") Set rsHor = db.OpenRecordset("ComposersOutput") Do While Not rsEmp.EOF Set rsData = db.OpenRecordset( _ "SELECT * FROM ComposerData WHERE EmpID = " & rsEmp!EmpID) rsHor.AddNew rsHor!EmpID = rsEmp!EmpID rsHor!Name = rsEmp!Name Do While Not rsData.EOF rsHor.Fields(rsData!StatType).Value = rsData!Value rsData.MoveNext Loop rsHor.Update rsEmp.MoveNext Loop rsEmp.Close rsData.Close rsHor.Close End Sub
Table 19-17 presents programmers and their language skills. Table 19-18 specifies the language requirements for a number of different jobs. We want to display a list of the jobs and their respective qualified programmers.
Name | Language |
Blaise Pascal | VB |
Blaise Pascal | C++ |
Blaise Pascal | Access |
Blaise Pascal | Excel |
Gauss | VB |
Gauss | Access |
Gauss | Delphi |
Gauss | SQL Server |
Smith | C++ |
Von Neuman | VB |
Von Neuman | C++ |
Wordsworth | Delphi |
Wordsworth | C++ |
Wordsworth | Word |
One solution is given by the following SQL statement:
SELECT ProgrammingJobs.JobID, Programmers.Name FROM Programmers INNER JOIN ProgrammingJobs ON Programmers.Language = ProgrammingJobs.Language GROUP BY ProgrammingJobs.JobID, Programmers.Name HAVING Count(Programmers.Language)= (SELECT Count([Language]) FROM ProgrammingJobs AS PJ WHERE PJ.JobID=ProgrammingJobs.JobID)
We begin with an INNER
JOIN
of the two tables on the
Language field. For each job/programmer pair,
this INNER
JOIN
creates a set of records of the
form:
JobID X - Language 1 - ProgrammerName Y JobID X - Language 2 - ProgrammerName Y JobID X - Language 3 - ProgrammerName Y . . .
where the job requires the language, and the programmer is skilled in that language.
Now, for each job/programmer pair, we need to ensure that the
number of such records is the same as the number of languages
required by that job. This is accomplished by grouping the records
by job/programmer pair and then using a HAVING
clause that compares a count of
those records with the count of languages for that job. The
resulting table is Table
19-19.
A common problem is determining when two sets are equal, that is, when they have the same elements. Consider Table 19-20, which shows five sets and their members. To simplify this as much as possible, we simply number the sets and assume they contain numbers themselves. We want to get a list of which sets are equal.
This problem has an elegant solution using a single SQL statement. While, in general, SQL does not permit us to compare two sets directly, as in:
(SELECT Members FROM Equality WHERE Set=1) = (SELECT Members FROM Equality WHERE _ Set=2)
it will accept such clauses if the two SELECT
statements return a single
value.
Consider now the SQL statement:
SELECT Equality.Set, E2.Set FROM Equality INNER JOIN Equality AS E2 ON (Equality.Member = E2.Member) And (Equality.Set < E2.Set) GROUP BY Equality.Set, E2.Set HAVING ((SELECT Count(Member) FROM Equality As E3 WHERE E3.Set=Equality.Set) = (SELECT Count(Member) FROM Equality As E3 WHERE E3.Set=E2.Set)) AND (Count(Equality.Set) = (SELECT Count(Member) FROM Equality As E3 WHERE E3.Set=E2.Set))
The INNER
JOIN
is on the clause:
(Equality.Member = E2.Member) And (Equality.Set < E2.Set)
The important part of this clause is the first part. It states that we want all set pairs that have a common member. The second part prevents returning duplicate set pairs. For instance, if sets 1 and 2 both contain the number 3, we don’t want to return both pairs [(1,2) and (2,1)].
To illustrate further, since the number 3 is in sets 1, 2, 3, and 4, the records returned for the member 3 are as follows:
(1,2) (from member 3) (1,3) (from member 3) (1,4) (from member 3) (2,3) (from member 3) (2,4) (from member 3) (3,4) (from member 3)
If it were not for the clause Equality.Set
<
E2.Set
, we would also be getting (1,1),
(2,2), ... (4,4), as well as (2,1), (3,1), and so on.
Now we ask the question, “How many times will a given set pair appear?” A given set pair, say (1,2), will appear as many times as there are common elements between the two sets. That is, it will appear as many times as the size of the intersection of the two sets.
So if we GROUP
BY
set pair, we can examine these
intersections and restrict the returns using a HAVING
clause. The HAVING
clause we want says that the two
sets are equal. But two sets A and B are equal if the sizes of A, B,
and the intersection of A and B are all the same! The clause:
(SELECT Count(Member) FROM Equality As E3 WHERE E3.Set=Equality.Set) = (SELECT Count(Member) FROM Equality As E3 WHERE E3.Set=E2.Set)
says that, for a given set pair (Equality.Set
, E2.Set
) from the main SELECT
clause, the size of Equality.Set
is equal to the size of
E2.Set
. The clause:
Count(Equality.Set) = (SELECT Count(Member) FROM Equality As E3 WHERE E3.Set=E2.Set)
says that the size of the intersection of Equality.Set
and E2.Set
is the same as the size of E2.Set
. That’s it.
18.222.117.157