SQL (Structured Query Language), commonly pronounced "sequel," is the language relational database management systems such as Access use to perform their various tasks. To tell Access to perform any kind of query, you must convey your instructions in SQL. Don't panic; the truth is that you have already been building and using SQL statements without knowing it.
Here, you will discover the role that SQL plays in your dealings with Access and learn how to understand the SQL statements generated when building queries. You will also explore some of the advanced actions you can take with SQL statements, allowing you to accomplish actions that go beyond the Access user interface. The basics you learn here will lay the foundation for your ability to perform the advanced techniques you will encounter throughout thisbook.
A major reason your exposure to SQL is limited is that Access is more user-friendly than most people give it credit for being. The fact is that Access performs a majority of its actions in user-friendly environments that hide the real grunt work that goes on behind the scenes.
For a demonstration of this, follow these steps:
In Design view, build the query you see in Figure B-1. In this relatively simple query, you are asking for the sum of revenue by period.
Go up to the Design tab on the application ribbon and select View
As you can see in Figure B-2, while you were busy designing your query in Design view, Access was diligently creating the SQL statement that will allow the query to run. This example shows that with the user-friendly interface provided by Access, you don't necessarily need to know the SQL behind each query. The question now becomes this: If you can run queries just fine without knowing SQL, why bother to learn it?
Admittedly, the convenient query interface provided by Access does make it a bit tempting to go through life not really understanding SQL. However, if you want to harness the real power of data analysis with Access, it is important to understand the fundamentals of SQL. Throughout this appendix, you will get a solid understanding of SQL as well as insights into some techniques that leverage it to enhance your data analysis.
The SELECT
statement, the cornerstone of SQL, enables you to retrieve records from a dataset. The basic syntax of a SELECT
statement is:
SELECT column_name(s) FROM table_name
The SELECT
statement is most often used with a FROM
clause. The FROM
clause indentifies the table or tables that make up the source for the data.
Try this:
Start a new query in Design view.
Go to the Design tab on the application ribbon and select View
In the SQL view, type the SELECT
statement shown in Figure B-3, and then run the query by selecting Run in the Design tab of the ribbon.
Congratulations! You have just written your first query manually.
You may notice that the SQL statement automatically created by Access in Figure B-2 has a semicolon at the end of it. This semicolon is not required for Access to run the query. The semicolon is a standard way to end a SQL statement and is required by some database programs. However, it is not necessary to end your SQL statements with a semicolon in Access, as Access will automatically add it when the query compiles.
You can retrieve specific columns from your dataset by explicitly defining the columns in your SELECT
statement, as follows:
SELECT AccountManagerID, FullName,[Email Address] FROM Dim_AccountManagers
Any column in your database that has a name that includes spaces or a non-alphanumeric character must be enclosed within brackets ([ ]
) in your SQL statement. For example, the SQL statement selecting data from a column called Email Address would be referred to as [Email Address]
.
You can use the WHERE
clause in a SELECT
statement to filter your dataset and conditionally select specific records. The WHERE
clause is always used in combination with an operator such as: =
(equal), <>
(not equal), >
(greater than), <
(less than), >=
(greater than or equal to), <=
(less than or equal to), BETWEEN
(within general range).
The following SQL statement retrieves only those employees whose last name is Winston:
SELECT AccountManagerID, [Last Name], [First Name] FROM Dim_AccountManagers WHERE [Last Name] = "Winston"
And this SQL statement retrieves only those employees whose hire date is later than May 16, 2007:
SELECT AccountManagerID, [Last Name], [First Name] FROM Dim_AccountManagers WHERE HireDate > #5/16/2007#
Notice in the preceding two examples that the word Winston is wrapped in quotes ("Winston"
) and the date 5/16/2004 is wrapped in the number signs (#5/16/2007#
). When referring to a text value in a SQL statement, you must place quotes around the value, while referring to a date requires you use the numbers signs.
You will often need to build queries that require two or more related tables be joined to achieve the desired results. For example, you may want to join an employee table to a transaction table in order create a report that contains both transaction details and information on the employees who logged those transactions. The type of join you use determines the records outputted.
An inner join operation tells Access to select only those records from both tables that have matching values. Records with values in the joined field that do not appear in both tables are omitted from the query results. Figure B-4 represents the inner join operation visually.
Figure B.4. An inner join operation only selects the records that match values in both tables. The arrows point to the records included in the results.
The following SQL statement selects only those records where the employee numbers in the AccountManagerID field are in both the Dim_AccountManagers table and the Dim_Territory table.
SELECT Region, Market, Dim_AccountManagers.AccountManagerID, FullName FROM Dim_AccountManagers INNER JOIN Dim_Territory ON Dim_AccountManagers.AccountManagerID = Dim_Territory.AccountManagerID
An outer join operation tells Access to select all the records from one table and only the records from a second table with matching values in the joined field. There are two types of outer joins: left joins and right joins.
A left join operation (sometimes called an outer left join) tells Access to select all the records from the first table regardless of matching and only those records from the second table that have matching values in the joined field. Figure B-5 represents the left join operation visually.
Figure B.5. A left join operation selects all records from the first table and only those records from the second table with matching values in both tables. The arrows point to the records included in the results.
This SQL statement selects all records from the Dim_AccountManagers table and only those records in the Dim_Territory table where values for the AccountManagerID field exist in the Dim_AccountManagers table.
SELECT Region, Market, Dim_AccountManagers.AccountManagerID, FullName FROM Dim_AccountManagers LEFT JOIN Dim_Territory ON Dim_AccountManagers.AccountManagerID = Dim_Territory.AccountManagerID
A right join operation (sometimes called an outer right join) tells Access to select all the records from the second table regardless of matching, and only those records from the first table that have matching values in the joined field (see Figure B-6).
Figure B.6. A right join operation selects all records from the second table and only those records from the first table with matching values in both tables. The arrows point to the records that are included in the results.
This SQL statement selects all records from the Dim_Territory table and only those records in the Dim_AccountManagers table where values for the AccountManagerID field exist in the Dim_Territory table.
SELECT Region, Market, Dim_AccountManagers.AccountManagerID, FullName FROM Dim_AccountManagers RIGHT JOIN Dim_Territory ON Dim_AccountManagers.AccountManagerID = Dim_Territory.AccountManagerID
Notice that in the preceding join statements, table names are listed before each column name separated by a dot (for example, Dim_AccountManager.AccountManagerID). When you are building a SQL statement for a query that utilizes multiple tables, it is generally a good practice to refer to the table names as well as field names in order to avoid confusion and errors. Access does this for all queries automatically.
You will soon realize that the SQL language itself is a quite versatile, allowing you to go far beyond basic SELECT, FROM, WHERE
statements. In this section, you will explore some of the advanced actions you can accomplishwith SQL.
By itself, the Like
operator is no different from the equal (=
) operator. For instance, these two SQL statements will return the same number of records:
SELECT AccountManagerID, [Last Name], [First Name] FROM Dim_AccountManagers WHERE [Last Name] = "Winston" SELECT AccountManagerID, [Last Name], [First Name] FROM Dim_AccountManagers WHERE [Last Name] Like "Winston"
The Like
operator is typically used with wildcard characters to expand the scope of your search to include any record that matches a pattern. The wildcard characters valid in Access are shown in Table B-1.
Table B.1. Wildcard Characters Used with the Like Operator
WILDCARD CHARACTERS | DESCRIPTION | PURPOSE |
---|---|---|
* | Asterisk | Represents any number and type characters |
? | Question mark | Represents any single character |
# | Pound or hash symbol | Represents any single digit |
[] | Brackets | Allow you to pass a single character or an array of characters to the |
[!] | The brackets with an embedded exclamation point | Allow you to pass a single character or an array of characters to the |
Listed in Table B-2 are some example SQL statements that use the Like
operator to select different records from the same table column.
The DISTINCT
predicate enables you to retrieve only unique values from the selected fields in your dataset. For example, the following SQL statement selects only unique job titles from the Dim_AccountManagers table, resulting in six records:
Table B.2. Selection Methods using the Like Operator
WILDCARD CHARACTER(S) USED | SQL STATEMENT EXAMPLE | RESULT |
---|---|---|
* | SELECT Field1 FROM Table1 WHERE Field1 Like "A*" | Selects all records where Field1 starts with the letter "A" |
* | SELECT Field1 FROM Table1 WHERE Field1 Like "*A*" | Selects all records where Field1 includes the letter "A" |
? | SELECT Field1 FROM Table1 WHERE Field1 Like "???" | Selects all records where the length of Field1 is three characters long |
? | SELECT Field1 FROM Table1 WHERE Field1 Like "B??" | Selects all records where Field1 is a three-letter word that starts with "B" |
# | SELECT Field1 FROM Table1 WHERE Field1 Like "###" | Selects all records where Field1 is a number that is exactly three digits long |
# | SELECT Field1 FROM Table1 WHERE Field1 Like "A#A" | Selects all records where the value in Field1 is a three-character value that starts with "A," contains one digit, and ends with "A" |
#, * | SELECT Field1 FROM Table1 WHERE Field1 Like "A#*" | Selects all records where Field1 begins with "A" any digit length. |
[], * | SELECT Field1 FROM Table1 WHERE Field1 Like "*[$%!*/]*" | Selects all records where Field1 includes any one of the special characters shown in the SQL statement |
[!], * | SELECT Field1 FROM Table1 WHERE Field1 Like "*[!a-z]*" | Selects all records where the value of Field1 is not a a string consisting of only characters from a-z |
[!], * | SELECT Field1 FROM Table1 WHERE Field1 Like "*[!0-9]*" | Selects all records where the value of Field1 is a text value |
SELECT DISTINCT AccountManagerID FROM Dim_AccountManagers
If your SQL statement selects more than one field, the combination of values from all fields must be unique for a given record to be included in the results.
Using SELECT DISTINCT
is different from using GROUP BY
or an aggregate query. There is no grouping going on here; Access is simply running through the records and retrieving the unique values. To see how GROUP BY
compares to SELECT DISTINCT
, read the following sections.
If you require that the entire row be unique, you could use the DISTINCTROW
predicate. The DISTINCTROW
predicate enables you to retrieve only those records for which the entire row is unique. That is to say, the combination of all values in the selected fields does not match any other record in the returned dataset. You would use the DISTINCTROW
predicate just as you would in a SELECT DISTINCT
clause.
SELECT DISTINCTROW AccountManagerID FROM Dim_AccountManagers
The GROUP BY
clause makes is possible to aggregate records in your dataset by column values. When you create an aggregate query in Design view, you are essentially using the GROUP BY
clause. The following SQL statement groups the Market field and gives you the count of states in each market:
SELECT Market, Count(State) FROM Dim_Territory GROUP BY Market
When you are using the GROUP BY
clause, you cannot specify criteria using the WHERE
clause. Instead, you need to use the HAVING
clause. This SQL statement groups the Market field and only gives you the count of states in the Dallas market:
SELECT Market, Count(State) FROM Dim_Territory GROUP BY Market HAVING Market = "Dallas"
The ORDER BY
clause enables you to sort data by a specified field. The default sort order is ascending; therefore, sorting your fields in ascending order requires no explicit instruction. The following SQL statement sorts the resulting records in by Last Name ascending, then First Name ascending:
SELECT AccountManagerID, [Last Name], [First Name] FROM Dim_AccountManagers ORDER BY [Last Name], [First Name]
To sort in descending order, you must use the DESC
reserved word after each column you want sorted in descending order. The following SQL statement sorts the resulting records in by Last Name descending, then First Name ascending:
SELECT AccountManagerID, [Last Name], [First Name] FROM Dim_AccountManagers ORDER BY [Last Name] DESC, [First Name]
The AS
clause enables you to assign aliases to your columns and tables. There are generally two reasons you would want to use aliases: Either you want to make column or table names shorter and easier to read, or you are working with multiple instances of the same table and you need a way to refer to one instance or the other.
The following SQL statement groups the Market field and gives you the count of states in each market. In addition, the alias State Count
has been given to the column containing the count of states by including the AS
clause.
SELECT Market, Count(State) AS [State Count] FROM Dim_Territory GROUP BY Market HAVING Market = "Dallas"
When you run a SELECT
query, you are retrieving all records that meet your definitions and criteria. When you run the SELECT TOP
statement, or a top values query, you are telling Access to filter your returned dataset to show only a specific number of records.
To get a clear understanding of what the SELECT TOP
statement does, follow these steps:
Build the aggregate query shown in Figure B-7.
Right-click the grey area above the white query grid and then select Properties. This activates the Property Sheet dialog box shown in Figure B-8. In the Property Sheet dialog, change the Top Values property to 25.
As you can see in Figure B-9, after you run this query, only the customers that fall into the top 25 by sum of revenue are returned. If you want the bottom 25 customers, simply change the sort order of the LineTotal field to ascending.
The SELECT TOP
statement is easy to spot. This is the same query used to run the results in Figure B-9.
SELECT TOP 25 Customer_Name, Sum(LineTotal) AS SumOfLineTotal FROM Dim_Customers INNER JOIN Dim_Transactions ON Dim_Customers.CustomerID = Dim_Transactions.CustomerID GROUP BY Customer_Name ORDER BY Sum(LineTotal) DESC
Bear in mind that you don't have to be working with totals or currency to use a top values query. In the following SQL statement, you are returning the ten account managers that have the earliest hire date in the company, effectively producing a seniority report:
SELECT Top 10 AccountManagerID, [Last Name], [First Name] FROM Dim_AccountManagers ORDER BY HireDate ASC
The SELECT TOP PERCENT
statement works in exactly the same way as SELECT TOP
except the records returned in a SELECT TOP PERCENT
statement represent the Nth percent of total records rather than the Nth number of records. For example, the following SQL statement will return the top 25 percent of records by revenue:
SELECT TOP 25 PERCENT Customer_Name, Sum(LineTotal) AS SumOfLineTotal FROM Dim_Customers INNER JOIN Dim_Transactions ON Dim_Customers.CustomerID = Dim_Transactions.CustomerID GROUP BY Customer_Name ORDER BY Sum(LineTotal) DESC
Keep in mind that SELECT TOP PERCENT
statements only give you the top or bottom percent of the total number of records in the returned dataset, not the percent of the total value in your records. For example, the preceding SQL statement does not give you only those records that make up 25 percent of the total value in the LineTotal field. It gives you the top 25 percent of total records in the queried dataset.
You may not have thought about it before, but when you build an action query, you are building a SQL statement that is specific to that action. These SQL statements make it possible for you to go beyond just selecting records.
Make-Table queries use the SELECT...INTO
statement to make a hard-coded table that contains the results of your query. The following example first selects account manager number, last name, and first name and then creates a new table called Employees:
SELECT AccountManagerID, [Last Name], [First Name] INTO Employees FROM Dim_AccountManagers
Append queries use the INSERT INTO
statement to insert new rows into a specified table. The following example will insert new rows into the Employees table from the Dim_AccountManagers table:
INSERT INTO Employees (AccountManagerID, [Last Name], [First Name]) SELECT AccountManagerID, [Last Name], [First Name] FROM Dim_AccountManagers
The TRANSFORM
statement allows the creation of a Crosstab dataset that displays data in a compact view. The TRANSFORM
statement requires three main components to work:
The field to be aggregated
The SELECT
statement that determines the row content for the crosstab
The field that makes up the column of the crosstab (the "pivot field")
The syntax is as follows:
TRANSFORM Aggregated_Field SELECT Field1, Field2 FROM Table1 GROUP BY Field1, Field2 PIVOT Pivot_Field
For example, the following statement creates a crosstab that shows region and market on the rows and products on the columns, with revenue in the center of the crosstab.
TRANSFORM Sum(Revenue) AS SumOfRevenue SELECT Region, Market FROM PvTblFeed GROUP BY Region, Market PIVOT Product_Description
SQL specific queries are essentially action queries that cannot be run through Access' query grid. These queries must be run either in SQL view or via code (macro or VBA). Several types of SQL Specific queries perform a specific action. This section introduces a few of these queries, focusing on those that you can use in Access to shape and configure data tables.
The UNION
operator is used to merge two compatible SQL statements to produce one read-only dataset. For example, the following Select statement produces a dataset (Figure B-10) that shows revenue by region and market.
SELECT Region, Market, Sum(Revenue) AS Sales FROM PvTblFeed GROUP BY Region, Market
A second Select statement produces a separate dataset (Figure B-11) that shows total revenue by region.
SELECT Region, "Total" AS [Market], Sum(Revenue) AS Sales FROM PvTblFeed GROUP BY Region
The idea is to bring these two datasets together to create an analysis that will show detail and totals all in one table. The UNION
operator is ideal for this type of work, merging the results of the two Select statements. To use the UNION
operator, simply start a new query in SQL view and enter the following syntax:
SELECT Region, Market, Sum(Revenue) AS Sales
FROM PvTblFeed
GROUP BY Region, Market
UNION
SELECT Region, "Total" AS [Market], Sum(Revenue) AS Sales
FROM PvTblFeed
GROUP BY Region
As you can see, the preceding statement is nothing more than the two SQL statements brought together with a Union
operator. When the two are merged (Figure B-12), the result is a dataset that shows both details and totals in one table!
When a union query is run, Access matches the columns from both datasets by their position in the SELECT
statement. That means two things: your SELECT
statements must have the same number of columns, and the columns in both statements should, in most cases, be in the same order.
Often in your analytical processes, you will need to create a temporary table in order to group, manipulate, or simply hold data. The CREATE TABLE
statement allows you to do just that with one SQL specific query.
Unlike a Make-Table query, the CREATE TABLE
statement is designed to create only the structure or schema of a table. No records are ever returned with a CREATE TABLE
statement. This statement allows you to strategically create an empty table at any point in your analytical process.
The basic syntax for a CREATE TABLE
statement is as follows:
CREATE TABLE TableName
(<Field1Name> Type(<Field Size>), <Field2Name> Type(<Field Size>))
To use the CREATE TABLE
statement, simply start a new query in SQL view and define the structure for your table.
In the following example, a new table called TempLog is created with three fields. The first field is a Text field that can accept 50 characters, the second field is a Text field that can accept 150 characters, and the third field is a Date field.
CREATE TABLE TempLog ([User] Text(50), [Description] Text, [LogDate] Date)
You will notice that in the preceding example, no field size is specified for the second text column. If the field size is omitted, Access will use the default field size specified for the database.
The ALTER TABLE
statement provides some additional methods of altering the structure of a table behind the scenes. There are several clauses you can use with the ALTER TABLE
statement, four of which are quite useful in Access data analysis: ADD, ALTER COLUMN, DROP COLUMN
, and ADD CONSTRAINT
.
The ALTER TABLE
statement, along with its various clauses, is used much less frequently than the SQL statements mentioned earlier in this appendix. However, the ALTER TABLE
statement comes in handy when your analytical processes require you to change the structure of tables on the fly, helping you avoid any manual manipulations that may have to be done.
It should be noted that there is no way to undo any actions performed using an ALTER TABLE
statement. This fact obviously calls for some caution when using these statements.
As the name implies, the ADD
clause enables you to add a column to an existing table. The basic syntax is as follows:
ALTER TABLE <TableName> ADD <ColumnName> Type(<Field Size>)
To use the ADD
statement, simply start a new query in SQL view and define the structure for your new column. For instance, running the example statement shown here creates a new column called SupervisorPhone, which is added to a table called TempLog.
ALTER TABLE TempLog ADD SupervisorPhone Text(10)
When using the ALTER COLUMN
clause, you specify an existing column in an existing table to work edit. You primarily use this clause to change the data type and field size of a given column. The basic syntax is as follows:
ALTER TABLE <TableName> ALTER COLUMN <ColumnName> Type(<Field Size>)
To use the ALTER COLUMN
statement, simply start a new query in SQL view and define changes for the column in question. For instance, the example statement shown here changes the field size of the SupervisorPhone
field.
ALTER TABLE TempLog ALTER COLUMN SupervisorPhone Text(13)
The DROP COLUMN
clause enables you to delete a given column from an existing table. The basic syntax is as follows:
ALTER TABLE <TableName>
DROP COLUMN <ColumnName>
To use the DROP COLUMN
statement, simply start a new query in SQL view and define which column you want to delete. For instance, running the example statement shown here deletes the column called SupervisorPhone
from the TempLog table.
ALTER TABLE TempLog DROP COLUMN SupervisorPhone
For many analysts, Access serves as an easy-to-use ETL (Extract, Transform, Load) tool. That is, Access allows us to extract data from many sources, then reformat and cleanse that data into consolidated tables. Many analysts also automate ETL processes with the use of macros that fire a series of queries. This works quite well in most cases.
There are, however, instances where an ETL process requires primary keys be added to temporary tables in order to keep data normalized during processing. In these situations, most people do one of two things. They stop the macro in the middle of processing to manually add the required primary keys. Or they create a permanent table solely for the purpose of holding a table where the primary keys are already set.
There is a third option. The ADD CONSTRAINT
clause allows you to dynamically create the primary keys. The basic syntax is as follows:
ALTER TABLE <TableName> ADD CONSTRAINT CONSTRAINTNAME PRIMARY KEY (<Field Name>)
To use the ADD CONSTRAINT
clause, simply start a new query in SQL view and define the new primary key you are implementing. For instance, the example statement shown here applies a compound key to three fields in the TempLog table.
ALTER TABLE TempLog ADD CONSTRAINT CONSTRAINTNAME PRIMARY KEY (User, Description, LogDate)
18.119.253.168