Clare Churcher

Beginning SQL Queries

From Novice to Professional

2nd ed. 2016

Clare Churcher

Studio B Productions, Great Neck, New York, USA

Any source code or other supplementary material referenced by the author in this text is available to readers at www.apress.com . For detailed information about how to locate your book’s source code, go to www.apress.com/source-code/ .

ISBN 978-1-4842-1954-6

e-ISBN 978-1-4842-1955-3

DOI 10.1007/978-1-4842-1955-3

Library of Congress Control Number: 2016944320

© Clare Churcher 2016

Beginning SQL Queries

Managing Director: Welmoed Spahr

Lead Editor: Jonathan Gennick

Technical Reviewer: George Anderson

Editorial Board: Steve Anglin, Pramila Balen, Louise Corrigan, Jonathan Gennick, Robert Hutchinson, Celestin Suresh John, Nikhil Karkal, James Markham, Susan McDermott, Matthew Moodie, Ben Renow-Clarke, Gwenan Spearing

Coordinating Editor: Jill Balzano

Copy Editor: April Rondeau

Compositor: SPi Global

Indexer: SPi Global

Artist: SPi Global

For information on translations, please e-mail [email protected] , or visit www.apress.com .

Apress and friends of ED books may be purchased in bulk for academic, corporate, or promotional use. eBook versions and licenses are also available for most titles. For more information, reference our Special Bulk Sales–eBook Licensing web page at www.apress.com/bulk-sales .

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

Trademarked names, logos, and images may appear in this book. Rather than use a trademark symbol with every occurrence of a trademarked name, logo, or image we use the names, logos, and images only in an editorial fashion and to the benefit of the trademark owner, with no intention of infringement of the trademark. The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights.

While the advice and information in this book are believed to be true and accurate at the date of publication, neither the authors nor the editors nor the publisher can accept any legal responsibility for any errors or omissions that may be made. The publisher makes no warranty, express or implied, with respect to the material contained herein.

Printed on acid-free paper

Distributed to the book trade worldwide by Springer Science+Business Media New York, 233 Spring Street, 6th Floor, New York, NY 10013. Phone 1-800-SPRINGER, fax (201) 348-4505, e-mail [email protected], or visit www.springer.com. Apress Media, LLC is a California LLC and the sole member (owner) is Springer Science + Business Media Finance Inc (SSBM Finance Inc). SSBM Finance Inc is a Delaware corporation.

To Mark and Ali

Introduction

Overview

The syntax of SQL is quite easy to learn. A few basic ideas and a handful of keywords allow you to tackle a huge range of queries. However, many users often find themselves completely stumped when faced with a particular problem. It isn’t really a great deal of help for someone to say “this is how I would do it.” What you need is a variety of ways to get started on a tricky problem. Once you have made a start on a query, you need to be able to check, amend, and refine your solution until you have what you need.

Two-Pronged Approach

Throughout the book I have approached different types of queries from two directions. The two approaches have their roots in the formal relational algebra and calculus. In the body of the book I have kept the descriptions non-mathematical, however, Appendix 2 provides an introduction to the formal notation for those keen to understand the underlying theory. The first approach, which I’ve called the process approach , looks at how tables need to be manipulated in order to retrieve the subset of data required. You will find explanations of the different types of operations that can be performed on tables; e.g., joins, intersections, selections. Explanations are provided to help you decide which of these might be useful in particular situations. Once you understand what operations are needed, translating them into SQL is relatively straightforward.

The second approach is what I use when I just can’t figure out what operations will give me the required results. This approach, which I’ve called the outcome approach , lets you describe what an expected row in your result might be like i.e., what conditions it must obey. By looking at the data, it is surprisingly easy to develop a semi-formal description of what a “correct” retrieved row would be like (and by implication, how you would recognize an “incorrect” row). Translating this semi-formal description into a working query is straightforward.

I am always surprised at which approach my students take when confronting a new problem. Some will instantly see the operations that are needed, and others will find the outcome approach more intuitive. The choice of approach changes from query to query, from person to person, and (I suspect) from day to day. Having more than one way to get started means you are less likely to get completely baffled by a new problem.

Who Is This Book For?

This book is for anyone who has a well-designed relational database and needs to extract information from it. You might have noticed in the previous sentence that the database must be “well designed.” I can’t overemphasize this point. If your database is badly designed, then it will not be able to store accurate and consistent data, and so the information your queries retrieve will always be prone to inaccuracies. If you are looking to design a database from scratch, you should read my first book Beginning Database Design .” 1 The final chapter in this book will outline a few common design problems you are likely to come across and give some advice about how to mitigate the impact or correct the problem.

Objective of This Book

In this book you will be introduced to all the main techniques and keywords needed to create SQL queries. You will learn about joins, intersections, unions, differences, selection of rows, and projection of columns. You will see how to implement these ideas in different ways using simple and nested queries, and you will be introduced to a variety of techniques for aggregating and summarizing data, including the use of window functions. You will also learn how you can investigate and improve the efficiency of your queries.

Most important of all, you will learn different ways to get started on a troublesome problem. There are almost always several different ways to express a query, and my objective is that for any particular situation I will provide you with a method of attack that matches your psyche and mood (just kidding).

New in the Second Edition

I have added a chapter on window functions describing the functionality these recently introduced concepts give to aggregating and summarizing data.

An appendix that provides an easily understood introduction to formal relational concepts and notation is also included.

Acknowledgments

First of all, many, many thanks to my husband, Neville, for reading every chapter and providing so many valuable suggestions. I would like to acknowledge one of my readers, Scott Lawley, who has given me helpful feedback and suggested the terms process approach and outcome approach as being friendlier than algebra and calculus . Thank you to my editor, Jonathan Gennick, for making this second edition possible, and to Jill Balzano for her excellent coordination. Thanks also to my employer, Tai Poutini Polytechnic, for its support.

Contents

  1. Chapter 1:​ Relational Database Overview
    1. Introducing Database Tables
      1. Attributes
      2. The Primary Key
      3. Inserting and Updating Rows in a Table
      4. Designing Appropriate Tables
    2. Introducing Data Models
    3. Retrieving Information from a Database
      1. Process Approach
      2. Outcome Approach
      3. Why We Consider Two Approaches
    4. Summary
  2. Chapter 2:​ Simple Queries on One Table
    1. Subsets of Rows and Columns
    2. Using Aliases
    3. Saving Queries
    4. Specifying Conditions for Selecting Rows
      1. Comparison Operators
      2. Logical Operators
    5. Dealing with Nulls
      1. Finding Nulls
      2. Comparisons Involving Null Values
    6. Managing Duplicates
    7. Ordering Output
    8. Performing Simple Counts
    9. Avoiding Common Mistakes
      1. Incorrectly Using a WHERE Clause to Answer Questions with the Word “both”
      2. Incorrectly Using a WHERE Clause to Answer Questions with the Word “not”
    10. Summary
  3. Chapter 3:​ A First Look at Joins
    1. The Process Approach to Joins
      1. Cartesian Product
      2. Inner Join
    2. Outcome Approach to Joins
    3. Extending Join Queries
      1. A Process Approach
      2. Order of Operations
      3. An Outcome Approach
      4. Expressing Joins Through Diagrammatic Interfaces
    4. Other Types of Joins
      1. Outer Joins
    5. Summary
  4. Chapter 4:​ Subqueries
    1. IN Keyword
    2. Using IN with Subqueries
    3. Being Careful with NOT and <>
    4. EXISTS Keyword
    5. Different Types of Subqueries
      1. Inner Queries Returning a Single Value
      2. Inner Queries Returning a Set of Values
      3. Inner Queries Checking for Existence
    6. Using Subqueries for Updating
    7. Summary
      1. Examples of Different Types of Subqueries
      2. Examples of Different Uses for Subqueries
  5. Chapter 5:​ Self Joins
    1. Self Relationships
      1. Creating a Self Join
      2. Queries Involving a Self Join
      3. An Outcome Approach to Self Joins
    2. Questions Involving “Both”
      1. An Outcome Approach to Questions Involving “Both”
      2. A Process Approach to Questions Involving “Both”
    3. Summary
      1. Self Relationships
      2. Questions Involving the Word “Both”
  6. Chapter 6:​ Multiple Relationships Between Tables
    1. Two Relationships Between the Same Tables
    2. Extracting Information from Multiple Relationships
      1. Process Approach
      2. Outcome Approach
    3. Business Rules
    4. Summary
  7. Chapter 7:​ Set Operations
    1. Overview of Basic Set Operations
    2. Union-Compatible Tables
      1. Ensuring Union Compatibility
    3. Union
      1. Selecting the Appropriate Columns
      2. Uses of Union
      3. Union and Full Outer Joins
    4. Intersection
      1. Uses of Intersection
      2. The Importance of Projecting Appropriate Columns
      3. Managing Without the INTERSECT Keyword
    5. Difference
      1. Uses of Difference
      2. Managing Without the EXCEPT Keyword
    6. Division
      1. Projecting Appropriate Columns
      2. SQL for Division
    7. Summary
      1. Union
      2. Intersection
      3. Difference
      4. Division
  8. Chapter 8:​ Aggregate Operations
    1. Simple Aggregate Functions
      1. The COUNT() Function
      2. The AVG() Function
      3. The ROUND() Function
      4. Other Aggregate Functions
    2. Grouping
      1. Filtering the Result of an Aggregate Query
      2. Using Aggregates to Perform Division Operations
    3. Nested Queries and Aggregates
    4. Summary
  9. Chapter 9:​ Window Functions
    1. Simple Aggregates
    2. Partitions
    3. Order By Clause
      1. Cumulative Aggregates
      2. Ranking
      3. Combining Ordering with Partitions
    4. Framing
    5. Summary
      1. OVER()
      2. OVER(PARTITION BY <…>)
      3. OVER(ORDER BY <…>)
      4. OVER(PARTITION BY <…> ORDER BY <…>)
      5. OVER(ROWS BETWEEN <…> AND <…>)
  10. Chapter 10:​ Efficiency Considerations
    1. What Happens to a Query
    2. Finding a Record
      1. Storing Records in Order
      2. Clustered Index
      3. Non-Clustered Indexes
      4. Clustered Index on a Compound Key
      5. Updating Indexes
      6. Covering Indexes
      7. Selectivity of Indexes
    3. Join Techniques
      1. Nested Loops
      2. Merge Join
      3. Different SQL Expressions for Joins
    4. Summary
      1. Primary Key
      2. Foreign Keys
      3. WHERE Conditions
      4. ORDER BY, GROUP BY, and DISTINCT
      5. Use the Tools
  11. Chapter 11:​ How to Tackle a Query
    1. Understanding the Data
      1. Determine the Relationships Between Tables
      2. Real World Versus Implementation
      3. What Tables Are Involved?​
      4. Look at Some Data Values
    2. Big Picture Method
      1. Combine the Tables
      2. Find the Subset of Rows
      3. Retain the Appropriate Columns
      4. Consider an Intermediate View
    3. Spotting Keywords in Questions
      1. And, Both, Also
      2. Not, Never
      3. All, Every
    4. No Idea Where to Start?​
      1. Find Some Helpful Tables
      2. Try to Answer the Question by Hand
      3. Write Down a Description of the Retrieved Result
      4. Are There Alternatives?​
    5. Checking Queries
      1. Check a Row That Should Be Returned
      2. Check a Row That Should Not Be Returned
      3. Check Boundary Conditions
      4. Check Null Values
    6. Summary
  12. Chapter 12:​ Common Problems
    1. Poor Database Design
      1. Data That Is Not Normalized
      2. Tables with No Primary Key
      3. Tables with Missing Foreign Keys
      4. Similar Data in Two Tables
      5. Inappropriate Types
    2. Problems with Data Values
      1. Unexpected Nulls
      2. Incorrect or Inconsistent Spelling
      3. Extraneous Characters in Text Fields
      4. Inconsistent Case in Text Fields
    3. Diagnosing Problems
      1. Check Parts of Nested Queries Independently
      2. Understand How the Tables Are Being Combined
      3. Remove Extra WHERE Clauses
      4. Retain All the Columns
      5. Check Underlying Queries in Aggregates
    4. Common Symptoms
      1. No Rows Are Returned
      2. Rows Are Missing
      3. More Rows Than There Should Be
      4. Incorrect Statistics or Aggregates
      5. The Order Is Wrong
    5. Common Typos and Syntax Problems
    6. Summary
  13. Appendix 1:​ Example Database
  14. Appendix 2:​ Relational Notation
    1. Introduction
      1. Relations, Tuples, and Attributes
      2. SQL, Algebra, and Calculus
    2. Relational Algebra:​ Specifying the Operations
      1. Select
      2. Project
      3. Combining Select and Project
      4. Cartesian Product
      5. Inner Join
      6. Union, Difference, and Intersection
      7. Division
    3. Relational Calculus:​ Specifying the Outcome
      1. Simple Calculus Expressions
      2. Free and Bound Variables
      3. Existential Quantifier and SQL
      4. Universal Quantifier and SQL
    4. An Example
      1. Algebra
      2. Calculus
      3. Conclusion
  15. Index

About the Author and About the Technical Reviewer

About the Author

A158240_2_En_BookFrontmatter_Figb_HTML.jpg

Clare Churcher was a senior academic at Lincoln University, Christchurch, New Zealand, for twenty years and won a teaching award for her contribution to developing and delivering several undergraduate and postgraduate courses, including the analysis and design of databases. Following her time at Lincoln, she spent two years as a business analyst at Orion Health Software. She is currently developing graduate-level software courses for Tai Poutini Polytechnic, Christchurch, New Zealand.

About the Technical Reviewer

A158240_2_En_BookFrontmatter_Figc_HTML.jpg

George AndersonJr. is a working database administrator with nearly a decade of SQL experience. He credits his exposure to SQL, both at work and through the SQL community, with providing him many great opportunities for learning, growing, and networking. When not protecting data and writing code, George enjoys reading, playing golf very poorly, and spending time with his family.

Footnotes

1 Clare Churcher, Beginning Database Design: From Novice to Professional (New York: Apress, 2012).

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.219.111.195