© Clare Churcher 2016

Clare Churcher, Beginning SQL Queries, 10.1007/978-1-4842-1955-3_11

11. How to Tackle a Query

Clare Churcher

(1)Studio B Productions, Great Neck, New York, USA

In the previous chapters, we saw different ways to express a query. We looked at the process approach, which describes how tables and data could be manipulated to produce the required result. These queries are expressed using keywords describing operations such as INNER JOIN and INTERSECTION. We also looked at how to express queries in terms of the outcome approach, which describes the criteria that the resulting data must satisfy rather than the process for retrieving the result.

However, sometimes when I am presented with a complicated natural language description of a query, it is not uncommon to find that my mind goes blank. I have a lot of ammunition at hand, but for a moment or two, have no idea which weapons to choose.

Usually, it is just a matter of being confident and relaxed. Large, complicated queries can always be broken down into a series of smaller, simpler queries that can be combined later. This chapter describes how to do just that.

Understanding the Data

It may sound like stating the obvious, but you can’t retrieve information from a database without understanding where all the different elements of data are stored and how the relevant tables are interrelated. Most of the time you will be querying a database designed by someone else, and probably maintained and altered over time by various people. As well as understanding the tables and relationships that have been implemented, it is also necessary to have a feel for the underlying real-world scenario. You also must be alert to the unfortunate reality that the database may have been badly designed. This might mean that you are not able to retrieve the required information accurately. We will consider this problem of working against bad design a bit more in Chapter 12.

Determine the Relationships Between Tables

The best way to get an overview of the implementation of a database is to look at a schematic of the relationships between the tables. Most database management software provides a way of viewing the fields in the tables and the foreign key relationships between the tables. Figures 11-1 and 11-2 show the foreign key relationship diagrams for our club database as depicted by SQL Server and Microsoft Access.

A158240_2_En_11_Fig1_HTML.jpg
Figure 11-1. A database diagram from SQL Server
A158240_2_En_11_Fig2_HTML.jpg
Figure 11-2. A relationship diagram from Microsoft Access

On the surface, the diagrams in Figures 11-1 and 11-2 look a bit different, but they represent exactly the same database. The Access schematic in Figure 11-2 displays an additional copy of the Member and Team tables. The two copies of the Member table arise from the self relationship between members (that is, a member can coach other members). The additional copy of the Team table is because of the two relationships between Member and Team: a member can be the manager of a team, and a member can belong to a team. These relationships are depicted in the SQL Server diagram in Figure 11-1 by showing two lines between the tables so that the tables are not shown twice. The different diagrammatic representations are just a quirk of the different management systems. Both schematics represent the same set of tables and relationships.

The lines in the two diagrams in Figures 11-1 and 11-2 represent the foreign keys that were set up when the tables were created. For example, the statement for creating the Member table contains two foreign key constraints:

CREATE TABLE Member(
MemberID Int PRIMARY KEY,
LastName CHAR(20),
FirstName CHAR (20),
MemberType CHAR (20) FOREIGN KEY REFERENCES Type,
Phone CHAR (20),
Handicap INT,
JoinDate DATETIME,
Coach INT FOREIGN KEY REFERENCES Member,
Team CHAR (20),
Gender CHAR (1));

Recall from Chapter 1 that this line of code:

MemberType CHAR (20) FOREIGN KEY REFERENCES Type

means that if there is a value in the MemberType field then that value must exist in the primary key field in the Type table. A line representing this foreign key relationship between the Member table and the Type table can be seen in Figures 11-1 and 11-2.

This line of code:

Coach INT FOREIGN KEY REFERENCES Member

means that the values in the Coach field must already exist in the primary key field in the Member table; that is, there is a self relationship on the Member table. This relationship is expressed in Figure 11-1 with the loop connecting the Member table to itself. In Figure 11-2 the relationship is depicted by displaying a second copy of the Member table.

Real World Versus Implementation

The database diagrams in the previous section represent how the database has been implemented and in particular which foreign keys have been set up. When the database is first set up, the design will be based on a conceptual data model that describes how the tables for a particular problem are interrelated. A number of methods exist for representing a data model, such as entity-relationship (ER) diagrams and the UML class diagrams we have been using in this book. Figure 11-3 shows the class diagram for the golf club. Refer back to Chapter 1 if you need a refresher on how to interpret the lines and numbers.

A158240_2_En_11_Fig3_HTML.jpg
Figure 11-3. Class diagram representing the conceptual model

The class diagram in Figure 11-3 does not display foreign key fields in the classes. You can see this by comparing the Entry table in Figure 11-3 with those in the two earlier database diagrams. The foreign keys MemberID and TourID are missing as attributes in the class diagram. Foreign keys are simply a way of representing the relationship between classes if we choose to implement the data model in a relational database. If we decide to implement it in an object-oriented database we might not need foreign key fields at all.

A class diagram with well-labelled relationships gives us a much greater understanding of the real-world situation than do the implementation diagrams in Figures 11-1 and 11-2. Have a look at the relationships between Member and Team to see what I mean.

The database diagrams presented by relational database software show you the foreign keys that have actually been set up. These may not tell the whole story. The developer may not have implemented the relationship for coaching (for example) with a foreign key constraint on the Coach field. He or she may have overlooked the requirement or may have decided to enforce the constraint that a coach must be an existing member some other way (with a trigger or via the interface). However, even if there is no foreign key constraint on the Coach field in the Member table, we still need to understand that members coach other members if we want to design reliable queries about coaching.

In some cases, the implemented database may not have much in common with an accurate data model. For example, if the golf club database contained separate tables for members, coaches, and managers or one of the relationships between the Member and Team tables was not implemented, then the database diagram and the data model would look quite different. The likelihood of getting reliable information would be low. Chapter 12 looks at problems like this, although short of a major redesign there is sometimes not much you can do.

What Tables Are Involved?

Once we have an understanding of the tables in the database and how they are related (conceptually as well as by the existence of foreign keys), we can look at which tables you will need in order to extract the subset of data required. Consider a query like “Find all the men who have entered a Leeston tournament.” This sentence contains a few key words. Nouns are often a clue to what tables or fields we are going to need. Verbs often help us find relationships. Let’s look at the nouns. “Tournament” is a big clue, and we have a Tournament table, so that is a start. The word “men” is another noun in the query description. We don’t have a Men table, but we do have a Member table with a Gender field.

It is fairly clear then that the Member and Tournament tables are going to play a part in our query. Now we need to get a feel for how these two tables are related. Figure 11-4 shows the part of the SQL Server database diagram containing these two tables. We see that that they are not directly related, but rather are connected via the Entry table. That makes sense, because the verb “enter” is in our query description.

A158240_2_En_11_Fig4_HTML.jpg
Figure 11-4. Part of the database diagram showing the Member and Tournament tables

So, it looks like at least three tables will be involved in our query: Member, Tournament, and Entry. We then use our understanding of the relational operators to determine how these tables could be combined. Do we need a join or a union, or some combination of these and other relational operators? We’ll look at ways to help decide on the appropriate operations in later sections in this chapter.

Look at Some Data Values

Requests for information from a database are usually couched in rather informal and imprecise natural language. Even a simple request, such as “Find all the men who have entered a Leeston tournament,” has a few things we need to clarify. Having a look at the actual data in the tables can sometimes help.

Our query does not actually “find” the men, but rather returns some information about them. Looking at the data values in the table will help us decide what information might be helpful. Presumably, the questioner would like to see the names of the men. Do we need the IDs as well? We will need IDs if we want to distinguish two members with the same name.

It may not always be clear what some of the words in the question refer to. What is a “Leeston” tournament? Is Leeston the name of a tournament, a type of tournament, or a location? Looking at a few rows of the Tournament table can help us. We see that the TourName field has the value “Leeston” here and there. Sometimes it might not be so easy to determine what imprecise words in the query description refer to. It may be necessary to talk to the developer or users to get a better understanding of what information they are trying to retrieve.

How do we determine which members are men? Fortunately, the Member table has a Gender column, and it looks like we want values of M. Is selecting rows with values of M going to be enough? Might there be some rows that have m or Male as the values? We’ll look at how to deal with issues of inconsistent data in the next chapter. For now, let’s assume that men are denoted by M.

For the simple query in this example, we now have a more precise description. It is something like “Retrieve the MemberID, LastName, and FirstName of the men (Gender = 'M') who have entered the tournament where TourName = 'Leeston'.”

You might think of some other particulars that need clearing up. It is often a good idea to ask why this information is required. Do we just want to find which men have ever been to Leeston (because we want to ask one of them some questions about the golf course), or do we want to know how many times our male club members have entered Leeston tournaments (because we are interested in how popular the tournament is with the members of the club)? These questions can have different answers, as you will see in the “Retain the Appropriate Columns” section coming up soon.

Big Picture Method

My first attempt at a query is seldom elegant or complete. For a query such as “Find all the men who have entered a Leeston tournament,” there are two ways I might tackle it, depending how my muses are working. One way is the big picture. I do this if I have a bit of an idea of how to combine the tables. I will cover another tactic in the section “No Idea Where to Start?”, which I use when I have no idea where to start!

In the big picture method, I like to combine all the tables I’ll need and retain all the columns, so I can see what is happening. I usually find it easiest to have an SQL window of some sort open so I can try small queries to see if the intermediate results look promising for answering the overall question.

Let’s look at the big picture approach to the query “Find all the men who have entered the Leeston tournament.” We decided we needed three tables: Member, Entry, and Tournament. These tables are all connected by foreign keys, and this often suggests that joins will be useful. If it isn’t clear to you that a join is what is required for the query, then resort to the methods in the “No Idea Where to Start?” section later in this chapter.

Combine the Tables

Let’s assume that we think joining tables looks like a promising approach for the query about men entering the Leeston tournament. You don’t have to do everything at once. Start slowly with some small queries to see how things shape up.

To carry out a join, we need to find the fields on which to join. Review Chapter 3 if you need to refresh your understanding of join-compatible fields. The Entry table is critical to this query, as it connects the Member and Tournament tables. The Entry table has a foreign key field labeled TourID, which we can join with the primary key of the Tournament table. Do that much first.

SELECT * FROM
Tournament t INNER JOIN Entry e ON t.TourID = e.TourID;

Figure 11-5 shows a few rows of the resulting virtual table.

A158240_2_En_11_Fig5_HTML.jpg
Figure 11-5. Part of the result of joining the Tournament and Entry tables

The result shown in Figure 11-5 is certainly helpful. We can see the entries and the names of the corresponding tournaments. We can see from the first two rows that members 118 and 228 have entered a Leeston tournament. Now we need to find out whether 118, 228, and other members entering the tournament are men and find their names. We can get this additional information by joining the virtual table in Figure 11-5 to the Member table on the MemberID fields:

SELECT * FROM
(Tournament t INNER JOIN Entry e ON t.TourID=e.TourID)
INNER JOIN Member m ON m.MemberID = e.MemberID;

Figure 11-6 shows the result. I haven’t included all the columns in Figure 11-6 because there are a lot of them. You will see shortly why I like to leave all the columns in as long as possible.

A158240_2_En_11_Fig6_HTML.jpg
Figure 11-6. Part of the result of joining the Tournament, Entry, and Member tables (just some columns)

The virtual table shown in Figure 11-6 has all the information we need to find the required data. The first two rows show that members 118 and 228 are women. The row for member 286 (with the circles) looks more promising. How do we amend the query to find the appropriate subset of rows and columns?

Find the Subset of Rows

From Figure 11-6 we can see that the rows that we want to retain from the result of the join are where the Gender field has the value M and the TourName field has the value Leeston. We can select these rows by adding an appropriate WHERE clause to the previous query:

SELECT * FROM
(Entry e INNER JOIN Tournament t ON t.TourID=e.TourID)
INNER JOIN Member m ON m.MemberID = e.MemberID
WHERE m.Gender = 'M' AND t.TourName = 'Leeston';

Figure 11-7 shows just some of the columns from the result of the query above. It has four rows: three for Robert Pollard and one for William Taylor.

A158240_2_En_11_Fig7_HTML.jpg
Figure 11-7. Men who have entered Leeston tournaments (just some columns)

Why do we have three rows for Robert Pollard? The rows are identical except for the value of the Year field. Robert has entered the Leeston tournament in three different years. We can see this quite clearly from Figure 11-6 because we have left the Year column in the output. Had we retained only the name columns, we might initially be a bit puzzled at having Robert Pollard repeated three times. What we do about the repetition of Robert Pollard depends on understanding the initial question a bit more clearly, as you will see in the next section.

Retain the Appropriate Columns

We have the appropriate subset of rows from our large join. Now we need to retain just the columns we require by amending the SELECT clause, which is currently returning all the columns (SELECT *). This is not always as simple as it might sound. The three rows for Robert Pollard give us a bit of a clue that things may not be as straightforward. We have two possibilities.

If we only want to know who has entered the tournament in any year, then we want just the distinct names Robert Pollard and William Taylor and perhaps their ID numbers. Amending the SELECT clause as in the following query will provide that outcome:

SELECT DISTINCT m.MemberID, m.LastName, m.FirstName
FROM ...

If the objective of the question is to find out how often men enter Leeston tournaments, then we want to retain all the entries. In that case, it might be useful to retain the year as well to distinguish the rows as in the following:

SELECT m.MemberID, m.LastName, m.FirstName, e.Year
FROM ...

Consider an Intermediate View

The SQL for the joining the Entry, Member, and Tournament tables is likely to be the basis of many queries about entries in tournaments. For example, the following questions will all require a join of the Member, Entry, and Tournament tables:

  • Do junior members enter open tournaments?

  • Which tournaments did William Taylor enter in 2015?

  • What is the average number of Social tournaments that members entered in 2013?

As we are likely to use this large join many times, it can be convenient to make a view. A view is an instruction for how to create a temporary table that we can use in other queries. The following is a first attempt at the SQL for creating a view that retains all the fields from the joins:

--First Attempt (unsuccessful)
CREATE VIEW AllTournamentInfo AS
SELECT * FROM
(Entry e INNER JOIN Tournament t ON t.TourID=e.TourID)
INNER JOIN Member m ON m.MemberID = e.MemberID;

As it stands, this query will not run in most versions of SQL. This is because the view would have fields with the same name; for example, there will be two fields called MemberID: one from the Entry table and one from the Member table.

When you create a view, all the field names must be distinct. The view will not use the aliases to differentiate the columns in the resulting table. The * in the SELECT clause needs to be altered to list all the field names. We need to either include just one of the fields with duplicated names (MemberID and TourID) or rename those that are duplicated (e.g., SELECT m.MemberID AS MMember, e.MemberID AS EMember). This is a bit tedious, but if you are creating a view that you are likely to use many times, it is worth the effort.

Once we have the view AllTournamentInfo, it can be used in the same way as any other table in our queries. To find the names of men who have entered a Leeston tournament, we can use the view as shown here:

SELECT DISTINCT LastName, FirstName
FROM AllTournamentInfo
WHERE Gender = 'M' AND TourName = 'Leeston';

Spotting Keywords in Questions

The big picture approach assumes that we have decided how to combine the tables that will contribute to the query. Sometimes, it will be obvious that, for example, certain tables need to be joined. Other times, it may not be at all clear initially. In this section, we will look at some keywords that often appear in questions and that can provide a clue about which relational operations are needed. If none of these help, remember that we still have the “No Idea Where to Start?” section coming up!

And, Both, Also

And and also are words that can be misleading when it comes to interpreting queries, and we will consider this further in the next chapter. In this section, we will look at queries that have the idea of two conditions needing to be met simultaneously. Queries that require two conditions to be met fall into two categories: those that can be carried out with a simple WHERE clause containing a Boolean AND operator , and those that require an intersection or self join.

To decide if a query really needs two conditions to be met, I usually look at a natural language statement and see if I can reword it with the word both connecting the conditions. Consider these examples:

  • Find the junior boys. (Both a male and a junior? Yes.)

  • Find those members who entered tournaments 24 and 38. (Both tournaments? Yes.)

  • Find the women and children. (Both a female and a child? No.)

The last query is the one that can sometimes trip people up. Although it contains the word and, the common interpretation of “women and children” doesn’t mean someone who is both a female and a child (that is a girl). Rather, the phrase means anyone who is either a female or a child (especially when populating lifeboats).

The diagram in Figure 11-8 is a useful way to visualize whether the natural language word both really means both or either. The circles represent the two sets: woman and children. Figure 11-8a shows the union (only one condition must be satisfied) and Figure 11-8b the intersection (both conditions must be satisfied).

A158240_2_En_11_Fig8_HTML.gif
Figure 11-8. Visualizing whether a union or an intersection is needed

When two conditions must be met, we are looking at the intersection of two groups of data, as in Figure 11-8b. This doesn’t necessarily mean we must use the INTERSECT keyword. I find the following question helpful in deciding what to do next:

Do I need to look at more than one row to decide if both conditions are satisfied?

Consider the query to find junior boys. This is going to need the Member table. Can we look at a single row and determine if the member is both a junior and a boy? We can see in Figure 11-9 that both pieces of information are available in a single row.

A158240_2_En_11_Fig9_HTML.jpg
Figure 11-9. Information about membership type and gender are available in a single row

In this situation, we can use a simple SELECT operation with the Boolean AND to check for both conditions, as discussed in Chapter 2:

SELECT * FROM Member m
WHERE m.Gender = 'M' AND m.MemberType = 'Junior';

Now consider a different type of query. What about finding the members who have entered both tournaments 24 and 36? To do this, we need to look at the Entry table (probably joined with the Member table if we want the names). As we can see in Figure 11-10, we cannot check that a member, e.g., member 228, has entered both tournaments by looking at a single row.

A158240_2_En_11_Fig10_HTML.jpg
Figure 11-10. We need to investigate more than one row to check both tournaments

Where we need to satisfy both of two conditions and we need to look at more than one row in the table, we can either use a self join (discussed in Chapter 5) or an intersection (discussed in Chapter 7).

If we use the self join then the query is:

SELECT DISTINCT e1.MemberID
FROM Entry e1 INNER JOIN Entry e2 ON e1.MemberID = e2.MemberID
WHERE e1.TourID = 24 AND e2.TourID = 36;

A query producing the same output but using the INTERSECT keyword is:

SELECT MemberID FROM Entry WHERE TourID = 24
INTERSECT
SELECT MemberID FROM Entry WHERE TourID = 36;

Not, Never

Here are some examples of queries involving the words not or never:

  • Find the members who are not seniors.

  • Find members who are not in a team.

  • Find members who have never entered a tournament.

Often when people see not in a description of a query, they immediately think of using a Boolean NOTor a <> operator in a WHERE clause. This is fine for some queries, but will fail for others. As in the previous section, I find the following test helpful to understand the category of the query.

Do I need to look at more than one row to decide if a condition is not true?

For the first two queries in the preceding bulleted list, we can look at a single row in the Member table and decide whether that member satisfies the condition. In the first query, the condition in the WHERE clause would be NOT MemberType = 'Senior' or MemberType <> 'Senior'. To find members who are not in a team, we want the Team field to be empty, so a clause like WHERE Team IS NULL would do the trick.

To find the members who have never entered a tournament, what tables do we need? We are certainly going to need the Entry table. We can decide if a member has entered a tournament by finding just one row with his or her value of MemberID. To see if he or she has not entered a tournament, we need to look at every row in the Entry table. We also must look at the Member table, because those members who have not entered a tournament will not appear in the Entry table at all.

In situations like this, it can be helpful to think in terms of sets as in Figure 11-11.

A158240_2_En_11_Fig11_HTML.jpg
Figure 11-11. Finding members who have not entered tournaments by considering sets

In Chapter 7 we looked at how to represent the difference between two sets by using the process approach and the keyword EXCEPT. The following query will return the IDs of members who have not entered a tournament:

SELECT MemberID FROM Member
EXCEPT
SELECT MemberID FROM Entry;

If we think in terms of the outcome approach we can describe the criteria for returning a particular MemberID. The following query is an example of using NOT IN to find the IDs of members who have never entered a tournament:

SELECT m.MemberID FROM Member m
WHERE m.MemberID NOT IN
    (SELECT e.MemberID FROM Entry e);

Chapter 7 has many examples of how to use nested queries such as this one.

All, Every

Wherever you see the words all or every in a description of a query you should immediately think of the division operator . Here are some examples of such queries:

  • Find members who have entered every open tournament.

  • Has anyone coached all the juniors?

Examples of the SQL to carry out these types of query are explained in detail in Chapter 7.

No Idea Where to Start?

Now let’s look at the case where we have a good understanding of the intention of the natural language query and have an idea of which tables are involved. We’ve checked for some key words, but still feel confused. Now what? This is not uncommon (it happens to me regularly), so just relax.

When I have no idea where to start, I forget all about set operations and SQL. I stop thinking about tables, foreign keys, joins, and so on. Instead, I open the tables I think will be needed to answer the question and look at some of the data. I try to find examples that should be retrieved by the query. Then I try to write down the conditions that make that particular data acceptable.

This is the outcome approach describing what conditions the rows returned by the query should obey. It is a great way to proceed if you are having trouble deciding on the operations that could be involved in manipulating the tables (the process approach).

Let’s try a query that stumped me a bit when I first thought of it: “Which teams have a coach as their manager?” The steps described here can really help.

Find Some Helpful Tables

Let’s look at the key words in the query “Which teams have a coach as a manager?” We have the nouns “team,” “coach,” and “manager.” We have a table called Team, and Coach and Manager are fields in the Member and Team tables, respectively. So the Team and Member tables look like a good place to start.

Try to Answer the Question by Hand

Next, take a look at the data in the tables and see how you would decide if a team had a coach as a manager. Figure 11-12 shows some relevant columns of the two tables. Can you find a team that satisfies the condition?

A158240_2_En_11_Fig12_HTML.jpg
Figure 11-12. How do we tell if a team has a coach as a manager?

We can find the IDs of the two team managers easily enough. They are the values in the Manager column of the Team table (239 and 153). Now, how do we check if these members are coaches? Looking at the Member table, we see that the coaches are in the Coach column. We need to check if either of our two managers appears in the Coach column. Member 153 does appear in the Coach column, so (TeamB) is managed by a coach.

Write Down a Description of the Retrieved Result

Figure 11-12 illustrates how we determined that TeamB has a coach as its manager. We now need to write a description of the logic that leads to that conclusion. This is where I like to use my fingers to point to the relevant rows to make it easier to describe the query, as in Figure 11-13.

A158240_2_En_11_Fig13_HTML.jpg
Figure 11-13. Naming the rows to help describe the required data

We are going to check every team to decide if it should be retrieved. In Figure 11-13 this is represented by the finger labeled t, which will visit each row in turn. We can describe whether the current row meets the criteria as follows:

I’ll write out the TeamName from row t in the Team table, if there exists a row m in the Member table where the value of coach m.Coach is the same as the manager of the team t.Manager.

We can now translate this almost directly into SQL using a nested query (discussed in Chapter 4). One possible query would be:

SELECT t.TeamName FROM Team t
WHERE EXISTS
   (SELECT * FROM Member m WHERE m.Coach = t.Manager);

Are There Alternatives?

First attempts at queries aren’t necessarily the most elegant. After all, we are following this route because we were stumped in the first place. This may not be a problem for the execution of the query, as the optimizer will likely find an efficient process. However, an inelegant SQL statement might be difficult for you and others to understand at a later time. Following the technique of solving the query by hand and describing the conditions often helps you understand what you are trying to do. That often makes the query seem much easier than you first thought.

Having made a first attempt at the query described in the previous section, we might realize that we could have thought of it this way: “The manager just has to be in the set of coaches.” We can easily find the IDs of coaches with the query:

SELECT m.Coach FROM Member m;

We can then use that in a nested query, as shown here:

SELECT t.TeamName FROM Team t
WHERE t.Manager IN
   (SELECT m.Coach FROM Member m);

For me, the preceding query is simpler and easier to understand than the earlier one even though they have equivalent results.

We could have phrased the condition illustrated in Figure 11-13 like this:

If I have rows t in the Team table and m in the Member table then I’ll write out the TeamName from row t in the Team table, if t.Manager = m.Coach

Here is the preceding sentence translated into SQL:

SELECT t.TeamName FROM Team t, Member m
WHERE t.Manager = m.Coach;

The preceding query can be restated as a join:

SELECT t.TeamName
FROM Team t INNER JOIN Member m ON t.Manager = m.Coach;

Personally, I don’t find the join particularly intuitive for this query. I doubt if someone else looking at the query would quickly understand its purpose.

Given there are several options for phrasing this query, it can be useful to check their relative efficiencies (as discussed in Chapter 10) if you think that might be important (unlikely in this case). If we add a DISTINCT phrase in the SELECT clause for the join queries then all four alternatives will produce the same result. For SQL Server 2012, each of the queries had the same execution plan, so they were all carried out in exactly the same way under the hood.

Checking Queries

We’ve written a query, run it, and retrieved some results. Is all well and good? Not necessarily. Just as first attempts at a query may not be elegant, neither may they be correct. Mistakes might arise from simple errors in the query syntax. These are usually easy to spot and correct. However, errors that result from subtle misunderstandings of the question or of the data can be more difficult to find.

I can’t offer a foolproof way of checking that your query is correct, but I can give you some ideas for catching potential errors. Basically, they boil down to checking that you do not have extra, incorrect rows in your result and checking that you aren’t missing any rows. In this section, we will look at ways to spot that your query might have a problem. In the next chapter, we will look at some of the common mistakes that might be behind the errors.

Check a Row That Should Be Returned

It is a good idea to have a rough idea of how many rows should be returned by your query: none, one, a few, or lots. If you get a surprising number then that can be a clue that something could be wrong. Next, take a look at your data and determine one record or row that should be returned by the query. In our example about teams with managers as coaches, we can check through the tables and find a team that satisfies the query. In Figure 11-13, we see that TeamB satisfies the conditions, so check that this team is in the output.

Remember that some queries may quite legitimately have no output. For example, it’s perfectly reasonable that, with the data we have at any particular time, no teams are managed by a coach. However, your query must work in all situations. If it is at all possible, make a copy of the tables, alter the data so that a row meets the condition, and check that it is returned correctly.

Check a Row That Should Not Be Returned

Similar to checking for a row that should be returned, look through the data and find a team that doesn’t have a coach as a manger. TeamA’s manager (member 239) does not appear as a coach in the Member table, so make sure that team is not included in your output. Once again, it is a good idea to use some dummy data to check this if the real data does not cover all eventualities.

Check Boundary Conditions

If a query has any sort of numeric comparison , then as well as checking for example data that should be returned and that which shouldn’t, we should also check the edge cases. Consider a query where we want to find people who have been members of our club for more than ten years. To be certain of the correctness, we need to check three possibilities:

  • Make sure no record is returned for someone with less than 10 years of membership (for example, 8 years of membership).

  • Make sure that someone who has belonged to the club for 12 years does get his record retrieved.

  • Check for someone who has been a member for exactly 10 years.

The last boundary condition is always tricky. It comes down to an interpretation of the natural language question. Does “more than ten years” include people who joined in the season exactly ten years ago? Well, it probably does, given that a single season covers a whole year. With numerical comparisons of this sort the decision is whether we use > or >= in the select condition. It is important to check with users if there is any doubt about the intention of the query.

Finding data in the tables that are exactly on the boundaries is not always easy. However, it is usually possible to change the numeric value in your query to match the data. Find a particular member and change the value you are checking against in the query to match their years of membership. If Harry joined 16 years ago, change the query to compare with the value 16 and see if Harry is included (or not) as you expect.

Another important boundary condition, especially for aggregates and counts (covered in Chapter 8), is the value 0. Consider a query such as “Find members who have entered fewer than six tournaments.” Doing a grouped by count on the Entry table will return some rows for sure, and we can check for those who have less than, more than, or exactly six entries. However, what about members who have never entered a tournament? They won’t appear in the Entry table at all and will be missing from the results. So, whenever aggregates are involved, always check for what happens for a count of 0. For example, does your query return members who have entered no tournaments?

Check Null Values

Be aware that some of the values you are checking against may be nulls (discussed in Chapter 2). How does your query about team managers cope with the situation where the Manager field is null? Try it out on some dummy data and see. What do we expect (or want) to happen if there is a null in the JoinDate field when we run the query about length of membership?

Summary

The first rule about starting a query is to not panic. The next rule is to take small steps and look at the intermediate output to see if what you have done so far is helping you. Retain as many columns as possible in the initial queries so you can check that you understand what is happening.

Figure 11-14 gives a summary of some of the steps you can take when first starting out on a query. The diagram doesn’t cover the whole process, but you should be able to make a reasonable start with these steps. Refer to the relevant chapters for more help.

A158240_2_En_11_Fig14_HTML.gif
Figure 11-14. Some steps to help you get started on a tricky query
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.15.31.206