Chapter 19. Some Common Data Manipulation Problems

In this chapter, I discuss a number of problems that you may encounter when dealing with data, along with possible solutions. I suggest that you try to find a solution before reading the solution in the text. Also, I should mention that there are usually many different ways of solving a given problem. In fact, you may very well be able to find a more efficient solution than the one given. The main purpose of these problems and solutions is to give you some food for thought.

Before beginning, let us note that many of the upcoming solutions involve the use of subqueries. We discussed subqueries in Chapter 6, but let us review quickly here.

Access SQL permits the use of SELECT statements within other SELECT statements (as well as in other statements, such as INSERT INTO statements). The internal, or nested, SELECT statement is referred to as a subquery.

Note that you may include a nested SELECT statement within a main SELECT statement only if the internal SELECT statement returns at most one record. To illustrate, consider the main SQL statement:

SELECT Hour,
 (SELECT Count(Interval) FROM StartTimes WHERE (StartTime <= Hour))
 FROM Hours

Here, the internal SQL statement:

SELECT Count(Interval) FROM StartTimes WHERE (StartTime <= Hour)

returns at most a single record, because it returns a Count. Note also that the WHERE clause in the internal SQL statement refers to the Hour field that is part of the main SQL, thus linking the return value of the internal statement to the current record in the HOURS table.

Running Sums

The computation of running sums is a common operation. To illustrate, consider Table 19-1, which contains the duration (in hours, say) for various events.

Table 19-1. A running sum

Event

Duration

1

1

2

5

3

6

4

3

5

4

6

1

7

8

8

2

For each event, we want to compute the sum of all the durations of the events that precede that event. This sum is a running sum.

Solution

One approach is to use the Cartesian product of the table with itself. In this way, we can access all records whose Event number precedes that of a given record. For instance, for the record with Event number 5, we need access to the records with Event numbers 1 through 4. The Cartesian product provides us with these records.

Here is the SQL statement that does the job:

SELECT Running.Event, Sum(RunningCopy.Duration) AS StartTime
FROM Running, Running AS RunningCopy
WHERE (RunningCopy.Event < Running.Event)
GROUP BY Running.Event

The FROM clause creates the Cartesian product of the table with itself. The WHERE clause restricts the records to those for which:

RunningCopy.Event < Running.Event

that is, to the records that provide information about the records preceding each record in Table 19-1. Finally, we GROUP BY Event and compute the sum of the durations.

The problem is that Cartesian products are very inefficient and use a lot of resources. (If Table 19-1 has 100,000 rows, then the Cartesian product has 100,000 x 100,000 = 10,000,000,000 rows!)

A more efficient solution is to use a nested SELECT statement, that is, to use a SELECT statement within the main SELECT statement. Recall that this is permitted in Access SQL, provided that the internal SELECT statement returns at most one record.

In the following SQL statement, note the use of table aliases, which are needed because we must refer to Table 19-1 in two contexts:

SELECT R1.Event,
(SELECT SUM(R2.Duration) FROM Running As R2 WHERE R2.Event < R1.Event)
   AS StartTime
FROM Running As R1

The internal SQL statement:

SELECT SUM(R2.Duration) FROM Running As R2 WHERE R2.Event < R1.Event

returns the sum of the duration for all events preceding the current event, which is denoted by R1.Event.

Example 19-1 shows a VBA procedure to execute this SQL statement. The DoCmd object is used in VBA to run an Access action. Thus, the line DoCmd.OpenQuery opens a query in Access.

Example 19-1. Calculating running sums using nested SQL statements
Private Sub RunningSumSQL(  )

Dim db As Database
Set db = CurrentDb

Dim qry As QueryDef
Dim sSQL As String

On Error Resume Next
db.QueryDefs.Delete "temp"
On Error GoTo 0

sSQL = "SELECT R1.Event," & _
" (SELECT SUM(R2.Duration)" & _
" FROM Running As R2" & _
" WHERE R2.Event < R1.Event)" & _
" AS StartTime" & _
" FROM Running As R1"

Set qry = db.CreateQueryDef("temp", sSQL)

DoCmd.OpenQuery qry.Name

End Sub

Another approach is to use DAO, which provides a very simple solution in this case. It creates a permanent result table, whereas the previous solution creates a select query. Example 19-2 shows the DAO code performing the same operation. The results are placed in Table 19-1.

Example 19-2. Calculating a running sum using DAO
Private Sub RunningSumDAO(  )

Dim db As Database
Dim rs As Recordset
Dim lRunningSum As Long

Set db = CurrentDb

lRunningSum = 0

Set rs = db.OpenRecordset("SELECT * FROM Running ORDER BY Event")
Do While Not rs.EOF
   rs.Edit
   rs!RunningSum = lRunningSum
   rs.Update
   lRunningSum = lRunningSum + rs!Duration
   rs.MoveNext
Loop

rs.Close

End Sub

Overlapping Intervals I

In Table 19-2, the rows denote intervals of time. The problem is determining, for each hour of the day, the number of intervals that contain this hour.

Table 19-2. Overlap table: Rows denote time intervals

Interval

StartTime

EndTime

1

4:00:00 PM

7:00:00 PM

2

5:00:00 PM

9:00:00 PM

3

2:00:00 PM

6:00:00 PM

4

8:00:00 PM

11:59:00 PM

5

12:00:00 PM

4:00:00 PM

For this, we also use an HOURS table (see Table 19-3).

Table 19-3. Hours table

Hours

12:00:00 PM

1:00:00 PM

2:00:00 PM

3:00:00 PM

4:00:00 PM

5:00:00 PM

6:00:00 PM

7:00:00 PM

8:00:00 PM

9:00:00 PM

10:00:00 PM

11:00:00 PM

11:59:00 PM

Solution

This problem can be solved using a nested SELECT statement (that is, a subquery). Here is the code:

Private Sub OverlappingIntervals(  )

Dim db As Database
Set db = CurrentDb

Dim qry As QueryDef
Dim sSQL As String

On Error Resume Next
db.QueryDefs.Delete "temp"
On Error GoTo 0

sSQL = "SELECT Hours.Hour," & _
" (SELECT Count(Interval) AS CountOfIntervals" & _
" FROM Overlap" & _
" WHERE (StartTime <= Hours.Hour) And" & _
" (Hours.Hour < EndTime))" & _
" FROM Hours"

Set qry = db.CreateQueryDef("temp", sSQL)

DoCmd.OpenQuery qry.Name

End Sub

We invite you to create a DAO solution. The problem in the next section illustrates the overlapping intervals technique.

Overlapping Intervals II

A company employs workers and supervisors. During a typical day, each worker and supervisor works one or more shifts, which consist of consecutive hours. Table 19-4 shows a typical day (from 12 noon to 12 midnight).

Table 19-4. Superload table: Hours worked by supervisors and workers

EmpID

EmpType

StartHour

EndHour

1

Super

12:00:00 PM

5:59:00 PM

2

Super

6:00:00 PM

11:59:00 PM

3

Super

4:00:00 PM

8:59:00 PM

4

Worker

4:00:00 PM

6:59:00 PM

5

Worker

5:00:00 PM

8:59:00 PM

6

Worker

2:00:00 PM

5:59:00 PM

7

Worker

8:00:00 PM

11:59:00 PM

8

Worker

12:00:00 PM

3:59:00 PM

We want to compute the maximum number of workers that each supervisor must supervise at one time.

Solution

This problem can be handled in a two-step process. First, we count the number of workers in each hour. Consider the following SQL statement:

SELECT Hours.Hour,
(SELECT Count(EmpType) FROM SuperLoad
 WHERE (Starthour <= Hours.Hour) And (Hours.Hour < EndHour)
   And (EmpType='Worker')) AS CountOfWorkers
FROM Hours

This, again, uses a subquery that returns a single record giving the number of workers that are working during a given hour.

Using this SQL statement, we make a query named qry1, so it can be used in the next step. See Table 19-5 for the result of this query.

Table 19-5. Number of workers working at a particular hour

Hours

CountOfWorkers

12:00:00 PM

1

1:00:00 PM

1

2:00:00 PM

2

3:00:00 PM

2

4:00:00 PM

2

5:00:00 PM

3

6:00:00 PM

2

7:00:00 PM

1

8:00:00 PM

2

9:00:00 PM

1

10:00:00 PM

1

11:00:00 PM

1

The next step is computing the supervisor load as the maximum number of workers in each supervisor’s shift. Do this by using the name of the query from the previous step in the following SQL statement:

sSQL2 = "SELECT SuperLoad.EmpID, SuperLoad.EmpType," & _
" (SELECT Max(CountOfWorkers) AS WorkerLoad" & _
" FROM [" & qry1.Name & "]" & _
" WHERE ((Hours.Hour >= StartHour) And (Hours.Hour < Endhour)))" & _
" FROM SuperLoad" & _
" WHERE SuperLoad.EmpType = 'Super'"

The results are shown in Table 19-6.

Table 19-6. Maximum number of workers per supervisor

EmpID

EmpType

WorkerLoad

1

Super

3

2

Super

2

3

Super

3

The complete code for this solution is in Example 19-3.

Example 19-3. Calculating the maximum number of workers per supervisor
Private Sub SupervisorLoad(  )

Dim db As Database
Set db = CurrentDb

Dim qry1 As QueryDef
Dim qry2 As QueryDef
Dim sSQL1 As String
Dim sSQL2 As String

On Error Resume Next
db.QueryDefs.Delete "temp1"
db.QueryDefs.Delete "temp2"
On Error GoTo 0

sSQL1 = "SELECT Hours.Hour," & _
" (SELECT Count(EmpType) FROM SuperLoad" & _
" WHERE (Starthour <= Hours.Hour) And (Hours.Hour < EndHour)" & _
" And (EmpType='Worker'))" & _
" AS CountOfWorkers" & _
" FROM Hours"

Set qry1 = db.CreateQueryDef("temp1", sSQL1)

' Uncomment to see how this step looks
'DoCmd.OpenQuery qry1.Name

sSQL2 = "SELECT SuperLoad.EmpID, SuperLoad.EmpType," & _
" (SELECT Max(CountOfWorkers) AS WorkerLoad" & _
" FROM [" & qry1.Name & "]" & _
" WHERE ((Hours.Hour >= StartHour) And (Hours.Hour < Endhour)))" & _
" FROM SuperLoad" & _
" WHERE SuperLoad.EmpType = 'Super'"

Set qry2 = db.CreateQueryDef("temp2", sSQL2)

DoCmd.OpenQuery qry2.Name

End Sub

Making Assignments with Default

Imagine a conference where your task is assigning conference rooms to attendees. Table 19-7 shows the preregistered attendees along with corresponding room numbers.

Table 19-7. Assignment table

Name

Room

_default

15

_default

14

_default

13

_default

12

Bach

123

Beethoven

231

Mozart

455

Chopin

455

Elgar

231

Gluck

123

Liszt

455

Note that the table contains several default choices. If an individual is not in the table, then you want to assign one of the default rooms to that individual. Moreover, to avoid overcrowding, you want to assign the default room numbers randomly. How do you do this?

Solution

This problem can be solved in a variety of ways, one of which provides a nice use of both subqueries and the UNION statement. First, consider the SQL statement:

sSQL1 = "SELECT Room FROM Assignment WHERE (Name = [Enter Name])"

Recall that [Enter name] is a parameter. When the query is run, the user will be prompted for a name, which will be substituted for [Enter name]. This statement will return the record associated with a given name if it is in the table; otherwise, it will return the empty recordset.

Now consider the statement:

sSQL2 = "SELECT Room FROM Assignment
         WHERE (Name = '_default') AND
         ([Enter Name] NOT IN (SELECT Name FROM Assignment))"

The clause:

[Enter Name] NOT IN (SELECT Name FROM Assignment)

returns TRUE if and only if the name entered by the user is not in the table. Hence, the clause sSQL2 can be rewritten based on two cases, name in table:

sSQL2 = "SELECT Room FROM Assignment WHERE (Name = '_default') AND FALSE"

and name not in table:

sSQL1 = "SELECT Room FROM Assignment WHERE (Name = '_default') AND TRUE"

This simplifies further to name in table:

sSQL2 = "SELECT Room FROM Assignment WHERE FALSE"

and name not in table:

sSQL1 = "SELECT Room FROM Assignment WHERE (Name = '_default')"

Thus, this statement returns the empty recordset if the name is in the table and the default records if the name is not in the table.

Now we take the union:

sSQL3 = sSQL1 & " UNION " & sSQL2

This SQL statement will return the room number for a name if the name is in the table; otherwise, it returns the default records.

Now, all we need to do is return a random record! Note that this will work in either case, because if the name is in the table, there is only one record, so a randomly chosen record must be that record.

The complete code is shown in Example 19-4.

Example 19-4. Handling preregistered and default room assignments
Private Sub AssignmentWithDefault(  )

Dim db As Database
Set db = CurrentDb

Dim sName As String
Dim qry1 As QueryDef
Dim rs As Recordset
Dim sSQL1 As String
Dim sSQL2 As String
Dim sSQL3 As String

Dim lRandom As Long
Dim lcRecords As Long

On Error Resume Next
db.QueryDefs.Delete "temp1"
On Error GoTo 0

sSQL1 = "SELECT Room FROM Assignment" & _
" WHERE (Name = [Enter Name])"

sSQL2 = "SELECT Room FROM Assignment" & _
" WHERE (Name = '_default') AND ([Enter Name] NOT IN (SELECT Name FROM Assignment))"

sSQL3 = sSQL1 & " UNION " & sSQL2

Set qry1 = db.CreateQueryDef("temp1", sSQL3)

sName = InputBox("Enter name")
qry1.Parameters(0) = sName

' To see the results
''DoCmd.OpenQuery qry1.Name

Set rs = qry1.OpenRecordset

' Populate and get recordcount
rs.MoveLast
lcRecords = rs.RecordCount

' Random record
Randomize Timer
' lRandom is between 0 and lcRecords-1
lRandom = Int(lcRecords * Rnd)

rs.MoveFirst
rs.Move lRandom

MsgBox "Room for " & sName & " is " & rs!Room

End Sub

Time to Completion I

Here is a simple time-to-completion problem. Table 19-8 shows the status of widget production for your company. At various stages in the production process, the workers enter a record into the table indicating the remaining time to completion for the widget.

Table 19-8. Widgets table: Time to completion for widgets

WidgetID

TimeToCompletion

1

5

1

3

1

2

1

1

2

6

2

3

2

0

3

8

3

7

3

6

3

4

4

9

4

4

4

2

4

0

We want to identify those widgets that are not yet completed.

Solution

The next SQL statement does the job. Note the use of the NOT IN form of subquery, which is discussed in Section 6.7.10 in Chapter 6.

SELECT DISTINCT WidgetID FROM Widgets As W1
WHERE 0 NOT IN
(SELECT TimeToCompletion FROM Widgets As W2
WHERE W2.WidgetID = W1.WidgetID)

Time to Completion II

Let’s make the time-to-completion problem more complicated. Imagine again that you are keeping track of the status of widget production for your company. Each widget is composed of four modules, each of which is assembled separately. Table 19-9 shows some sample data.

Table 19-9. Widgets table: Time to completion of multimodule widgets

WidgetID

ModuleID

TimeToCompletion

1

1

0

1

2

1

1

3

2

1

4

1

2

1

1

2

2

2

2

3

3

2

4

4

3

1

0

3

2

5

3

3

4

3

4

0

4

1

0

4

2

1

4

3

1

4

4

2

We are trying to determine the widgets in which Module 1 is the only completed module—that is, where TimeToCompletion is equal to zero for Module 1, but not for any of the other modules in the widget. Thus, for our table, Widgets 1 and 4 qualify.

Solution

Consider the following SQL statement:

SELECT WidgetID 
FROM Widgets AS W1
WHERE (TimeToCompletion = 0) AND
0 NOT IN
(SELECT TimeToCompletion FROM Widgets AS W2 
WHERE (W2.WidgetID=W1.WidgetID) AND (W2.ModuleID <> 1))

The subquery selects, for a particular widget, all TimeToCompletions for all modules except the first module. We can then test to see if that set of TimeToCompletions contains a zero.

Example 19-5 shows the complete code.

Example 19-5. TimeToCompletion example
Private Sub TimeToCompletion(  )

Dim db As Database
Set db = CurrentDb

Dim qry1 As QueryDef
Dim sSQL1 As String

On Error Resume Next
db.QueryDefs.Delete "temp1"
On Error GoTo 0

sSQL1 = "SELECT WidgetID FROM Widgets AS W1" & _
" WHERE (TimeToCompletion = 0) AND" & _
" 0 NOT IN" & _
" (SELECT TimeToCompletion FROM Widgets AS W2" & _
" WHERE (W2.WidgetID=W1.WidgetID) AND (W2.ModuleID <> 1))"

Set qry1 = db.CreateQueryDef("temp1", sSQL1)

DoCmd.OpenQuery qry1.Name

End Sub

Time to Completion III—A MaxMin Problem

Let’s make the time-to-completion problem even more involved. Suppose each module is composed of several parts. Periodically, the workers involved with a particular part will make an entry into a database table, as shown in Table 19-10.

Table 19-10. Widgets table: Time to completion of a widget whose modules consist of multiple parts

WidgetID

ModuleID

PartID

TimeToCompletion

1

1

1

3

1

1

1

4

1

1

1

5

1

1

2

2

1

1

2

4

1

1

2

1

1

2

1

6

1

2

1

5

1

2

1

3

1

2

2

7

1

2

2

4

1

2

2

3

1

2

3

4

1

2

3

5

1

2

3

6

1

3

1

8

1

3

1

5

1

3

2

2

1

3

2

4

We want to compute the time to completion for each part, module, and widget. Note that there may be several entries for a given part. The time to complete a given part is the minimum of the times in these rows.

Solution 1

Let’s take a step-by-step approach to the solution. Later, we can present a more elegant, but less readable, solution.

First, we create an SQL statement that returns only those rows of the table that, for each widget/module, have the smallest part TimeToCompletion. We can do this in two steps. The first SQL statement selects the TimeToCompletion field for all records in Widgets2 that have a given WidgetID, ModuleID, and PartID.

' Times to completion for given WidgetID/ModuleID/PartID
sSQL1 = "SELECT TimeToCompletion FROM Widgets2 AS W2" & _
" WHERE (W2.WidgetID = W1.WidgetID)" & _
" And (W2.ModuleID = W1.ModuleID)" & _
" And (W2.PartID = W1.PartID)"

The second SQL statement returns all records whose TimeToCompletion is less than or equal to all records returned in the first SQL statement—that is, all records for the given WidgetID, ModuleID, and PartID:

' Those records that have minimum time to completion for each part
sSQL2 = "SELECT WidgetID, ModuleID, PartID," & _
" TimeToCompletion AS TimeToFinishPart FROM Widgets2 AS W1" & _
" WHERE TimeToCompletion <= ALL (" & sSQL1 & ")"

An alternative approach is to use a single nested SELECT statement:

sSQL2 = "SELECT DISTINCT WidgetID, ModuleID, PartID," & _
" (SELECT MIN(TimeToCompletion)" & _
" FROM Widgets2 as W2 WHERE" & _
" (W2.WidgetID = W1.WidgetID) And" & _
" (W2.ModuleID = W1.ModuleID) And" & _
" (W2.PartID = W1.PartID))" & _
" AS TimeToFinishPart" & _
" FROM Widgets2 AS W1"

Running this query

Set qry1 = db.CreateQueryDef("temp1", sSQL2)
DoCmd.OpenQuery qry1.Name

will result in Table 19-11.

Table 19-11. Results table for qry1

WidgetID

ModuleID

PartID

TimeToFinishPart

1

1

1

3

1

1

2

1

1

2

1

3

1

2

2

3

1

2

3

4

1

3

1

5

1

3

2

2

Using this query, it is simple to get the time to completion for each module:

' Time to finish each module
sSQL3 = "SELECT WidgetID, ModuleID," & _
" Max(TimeToFinishPart) AS TimeToFinishModule FROM " & qry1.Name & _
" GROUP BY WidgetID, ModuleID"

Set qry2 = db.CreateQueryDef("temp2", sSQL3)

' Show it
DoCmd.OpenQuery qry2.Name

This query results in Table 19-12.

Table 19-12. Results table for qry2

WidgetID

ModuleID

TimeToFinishModule

1

1

3

1

2

4

1

3

5

Finally, we can compute the time to completion for each widget:

' Time to finish each Widget
sSQL4 = "SELECT WidgetID," & _
" Max(TimeToFinishModule) AS TimeToFinishWidget FROM " & qry2.Name & _
" GROUP BY WidgetID"

Set qry3 = db.CreateQueryDef("temp3", sSQL4)

This results in Table 19-13.

Table 19-13. Results table for qry3

WidgetID

TimeToFinishWidget

1

5

Solution 2

It is possible to get the time to completion in a single SQL statement, although I definitely do not recommend doing so. The result may be more elegant, but it is also harder to read. For instance, for modules, we have:

sSQL1 = "SELECT DISTINCT WidgetID, ModuleID," & _
" TimeToCompletion FROM Widgets2 AS W1" & _
" WHERE TimeToCompletion =" & _
"    (SELECT MAX(TimeToCompletion) FROM Widgets2 As W2" & _
"     WHERE TimeToCompletion =" & _
"       (SELECT MIN(TimeToCompletion) FROM Widgets2 AS W3" & _
"       WHERE (W3.WidgetID = W2.WidgetID)" & _
"              And (W3.ModuleID = W2.ModuleID)" & _
"              And (W3.PartID = W2.PartID)" & _
"       Group BY W3.WidgetID, W3.ModuleID, W3.PartID)" & _
"     AND (W2.WidgetID = W1.WidgetID) And (W2.ModuleID = W1.ModuleID)" & _
"     GROUP BY W2.WidgetID, W2.ModuleID)"

Digesting this SQL statement will probably take time, and I hope it will make you reconsider using such a statement in your own applications.

Vertical to Horizontal

Imagine a database of personal statistics with two tables (Table 19-14 and Table 19-15).

Table 19-14. Composers table

EmpID

Name

1

Beethoven

2

Chopin

3

Mozart

4

Schubert

5

Brahms

6

Liszt

Notice that Table 19-15 has one row per statistic. Thus, the data for an individual person is arranged vertically. Notice also that some data is missing. For instance, there is no data at all for Liszt.

Table 19-15. ComposersData table

EmpID

StatType

Value

1

Age

45

1

Height

63

1

Weight

150

2

Age

46

2

Height

67

3

Age

35

3

Weight

135

4

Age

44

5

Height

76

Now, we want to view the data horizontally, as in Table 19-16.

Table 19-16. Combination of Table 19-14 and Table 19-15

EmpID

Name

Age

Height

Weight

1

Beethoven

45

63

150

2

Chopin

46

67

 

3

Mozart

35

 

135

4

Schubert

44

  

5

Brahms

 

76

 

6

Liszt

   

Solution

One solution is given by the following SQL statement:

SELECT DISTINCT Composers.EmpID, Name,
 (SELECT Value FROM ComposerData As T2 WHERE
   (T2.StatType='Age') And (T2.EmpID=Composers.EmpID)) As Age,
 (SELECT Value FROM ComposerData As T2 WHERE
   (T2.StatType='Height') And (T2.EmpID=Composers.EmpID)) As Height,
 (SELECT Value FROM ComposerData As T2 WHERE
   (T2.StatType='Weight') And (T2.EmpID=Composers.EmpID)) As Weight
FROM (Composers INNER JOIN ComposerData
ON Composers.EmpID=ComposerData.EmpID)

Here, we have multiple SELECT subquery statements within the main SELECT clause. For instance, the clause:

(SELECT Value FROM ComposerData As T2 WHERE
   (T2.StatType='Age') And (T2.EmpID=Composers.EmpID)) As Age,

selects the age for the person selected by the main SELECT clause.

As the number of statistics grows, this SQL statement becomes more complex. Example 19-6 shows an alternative solution using DAO that does not require adjusting when additional statistics are added.

Example 19-6. VerticalToHorizontal example
Private Sub VerticalToHorizontal2(  )

Dim db As Database
Set db = CurrentDb

Dim rsEmp As Recordset
Dim rsData As Recordset
Dim rsHor As Recordset

Set rsEmp = db.OpenRecordset("Composers")
Set rsHor = db.OpenRecordset("ComposersOutput")

Do While Not rsEmp.EOF

   Set rsData = db.OpenRecordset( _
      "SELECT * FROM ComposerData WHERE EmpID = " & rsEmp!EmpID)
   rsHor.AddNew
   rsHor!EmpID = rsEmp!EmpID
   rsHor!Name = rsEmp!Name
   Do While Not rsData.EOF
      rsHor.Fields(rsData!StatType).Value = rsData!Value
      rsData.MoveNext
   Loop
   rsHor.Update

   rsEmp.MoveNext
Loop

rsEmp.Close
rsData.Close
rsHor.Close

End Sub

A Matching Problem

Table 19-17 presents programmers and their language skills. Table 19-18 specifies the language requirements for a number of different jobs. We want to display a list of the jobs and their respective qualified programmers.

Table 19-17. Programmers table: Programmers and their language skills

Name

Language

Blaise Pascal

VB

Blaise Pascal

C++

Blaise Pascal

Access

Blaise Pascal

Excel

Gauss

VB

Gauss

Access

Gauss

Delphi

Gauss

SQL Server

Smith

C++

Von Neuman

VB

Von Neuman

C++

Wordsworth

Delphi

Wordsworth

C++

Wordsworth

Word

Table 19-18. ProgrammingJobs table

JobID

Language

1

VB

1

Access

2

C++

3

C++

3

SQL Server

4

Delphi

5

VB

5

Pascal

Solution

One solution is given by the following SQL statement:

SELECT ProgrammingJobs.JobID, Programmers.Name
FROM Programmers INNER JOIN ProgrammingJobs
ON Programmers.Language = ProgrammingJobs.Language
GROUP BY ProgrammingJobs.JobID, Programmers.Name
HAVING Count(Programmers.Language)=
(SELECT Count([Language]) FROM ProgrammingJobs AS PJ
WHERE PJ.JobID=ProgrammingJobs.JobID)

We begin with an INNER JOIN of the two tables on the Language field. For each job/programmer pair, this INNER JOIN creates a set of records of the form:

JobID X - Language 1 - ProgrammerName Y
JobID X - Language 2 - ProgrammerName Y
JobID X - Language 3 - ProgrammerName Y 
. . .

where the job requires the language, and the programmer is skilled in that language.

Now, for each job/programmer pair, we need to ensure that the number of such records is the same as the number of languages required by that job. This is accomplished by grouping the records by job/programmer pair and then using a HAVING clause that compares a count of those records with the count of languages for that job. The resulting table is Table 19-19.

Table 19-19. Jobs and programmers qualified for these jobs

JobID

Name

1

Blaise Pascal

1

Gauss

2

Blaise Pascal

2

Smith

2

Von Neuman

2

Wordsworth

4

Gauss

4

Wordsworth

Equality of Sets

A common problem is determining when two sets are equal, that is, when they have the same elements. Consider Table 19-20, which shows five sets and their members. To simplify this as much as possible, we simply number the sets and assume they contain numbers themselves. We want to get a list of which sets are equal.

Table 19-20. Equality

Set

Member

1

1

1

2

1

3

2

1

2

2

2

3

3

1

3

2

3

3

3

4

4

1

4

2

4

3

4

4

5

1

5

2

5

8

Solution

This problem has an elegant solution using a single SQL statement. While, in general, SQL does not permit us to compare two sets directly, as in:

(SELECT Members FROM Equality WHERE Set=1) = (SELECT Members FROM Equality WHERE _
  Set=2)

it will accept such clauses if the two SELECT statements return a single value.

Consider now the SQL statement:

SELECT Equality.Set, E2.Set
FROM Equality INNER JOIN Equality AS E2 ON
 (Equality.Member = E2.Member) And (Equality.Set < E2.Set)
GROUP BY Equality.Set, E2.Set
HAVING 
((SELECT Count(Member) FROM Equality As E3 WHERE E3.Set=Equality.Set) = 
 (SELECT Count(Member) FROM Equality As E3 WHERE E3.Set=E2.Set))
AND
(Count(Equality.Set) = 
 (SELECT Count(Member) FROM Equality As E3 WHERE E3.Set=E2.Set))

The INNER JOIN is on the clause:

(Equality.Member = E2.Member) And (Equality.Set < E2.Set)

The important part of this clause is the first part. It states that we want all set pairs that have a common member. The second part prevents returning duplicate set pairs. For instance, if sets 1 and 2 both contain the number 3, we don’t want to return both pairs [(1,2) and (2,1)].

To illustrate further, since the number 3 is in sets 1, 2, 3, and 4, the records returned for the member 3 are as follows:

(1,2) (from member 3)
(1,3) (from member 3)
(1,4) (from member 3)
(2,3) (from member 3)
(2,4) (from member 3)
(3,4) (from member 3)

If it were not for the clause Equality.Set < E2.Set, we would also be getting (1,1), (2,2), ... (4,4), as well as (2,1), (3,1), and so on.

Now we ask the question, “How many times will a given set pair appear?” A given set pair, say (1,2), will appear as many times as there are common elements between the two sets. That is, it will appear as many times as the size of the intersection of the two sets.

So if we GROUP BY set pair, we can examine these intersections and restrict the returns using a HAVING clause. The HAVING clause we want says that the two sets are equal. But two sets A and B are equal if the sizes of A, B, and the intersection of A and B are all the same! The clause:

(SELECT Count(Member) FROM Equality As E3 WHERE E3.Set=Equality.Set) = 
 (SELECT Count(Member) FROM Equality As E3 WHERE E3.Set=E2.Set)

says that, for a given set pair (Equality.Set, E2.Set) from the main SELECT clause, the size of Equality.Set is equal to the size of E2.Set. The clause:

Count(Equality.Set) = 
 (SELECT Count(Member) FROM Equality As E3 WHERE E3.Set=E2.Set)

says that the size of the intersection of Equality.Set and E2.Set is the same as the size of E2.Set. That’s it.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.222.117.157