Preface

SQL is the lingua franca of the data professional. At the same time it doesn’t always get the attention it deserves compared to the hot tool du jour. As result of the second it’s common to find people who use SQL frequently but rarely or never go beyond the simplest queries, often enough because they believe that’s all there is.

This book shows how much SQL can do, expanding users’ tool boxes. By the end of the book you will have seen how SQL can be used for statistical analysis; to do reporting in a manner similar to BI tool; to match text data; to perform sophisticated analysis on date data - and much more.

The first edition of SQL Cookbook has been a popular choice as the second book on SQL - the book people read after they learn the basics - since its original release. It has many strengths, such as its wide range of topics and its friendly style.

However, computing is known to move fast, even when it comes to something as mature as SQL, which has roots going back to the 1970s. While this new edition doesn’t cover brand new language features, an important change is that features that were novel at the time of the first edition, and found in some implementations and not in others, are now stabilised and standardised. As a result, we have a lot more scope for developing standard solutions than was possible earlier.

There are two key examples that are important to highlight. Common Table Expressions (CTEs), including recursive CTEs, were available in a couple of implementations at the time the first edition was released, but are now available in all five. They were introduced to solve some practical limitations of SQL, some of which can be seen directly in these recipes. A new appendix on recursive CTEs in this edition underlines their importance and explains their relevance.

Window functions were also new enough at the time of the first edition’s release that they weren’t available in every implemenation. They were also new enough that a special appendix was written to explain them, which remains. Now, however, window functions are in all implementations in this book. They are also in every other SQL implementation that we’re aware of, although there are so many out databases out there, it’s impossible to guarantee there isn’t one that neglects window functions and/ or CTEs.

In addition to the standardising queries where possible, we’ve brought in new material into the Working With Numbers and Working with Strings sections. The material in the Working With Numbers section unlocks new data analysis applications in recipes such as the Median Absolute Deviation and Benford’s Law. In the Strings chapter we have a new recipe to help match data by the sound of the text, and have moved material on Regular Expressions to this chapter from the Odds ‘n Ends chapter.

Who This Book Is For

This book is meant to be for any SQL user who wants to take their queries further. In terms of ability, it’s meant for someone who knows at least some SQL - you might have read Alan Bealieu’s Learning SQL for example, and ideally you’ve had to write queries on data in the wild to answer a real-life problem.

Other than those loose parameters, this is a book for all SQL users - DBAs, Data Scientists, Data Visualisation folk, BI people etc etc. Some of these users may never or rarely access databases directly, but use their data visualization, BI or statistical tool to query and fetch data. The emphasis is on practical queries that can solve real world problems. Where a small amount of theory appears, it’s there to directly support the practical elements.

What’s Missing from This Book

This is a practical book, chiefly about using SQL to understand data. It doesn’t cover theoretical aspects of databases, database design or the theory behind SQL except where needed to explain specific recipes or techniques.

It also doesn’t cover extensions to databases to handle data types such as XML and JSON. There are other resources available for those specialist topics.

Platform and Version

SQL is a moving target. Vendors are constantly pumping new features and functionality into their products. Thus you should know up front which versions of the various platforms were used in the preparation of this text:

  • DB2 11.5

  • Oracle Database 19 c

  • PostgreSQL 12

  • SQL Server 2017

  • MySQL 8.0

Tables Used in This Book

The majority of the examples in this book involve the use of two tables, EMP and DEPT. The EMP table is a simple 14-row table with only numeric, string, and date fields. The DEPT table is a simple four-row table with only numeric and string fields. These tables appear in many old database texts, and the many-to-one relationship between departments and employees is well understood.

All but a very few solutions in this book run against these tables. Nowhere do we tweak the example data to set up a solution that you would be unlikely to have a chance of implementing in the real world, as some books do.

The contents of EMP and DEPT are shown below, respectively:

	select * from emp;

	EMPNO ENAME  JOB        MGR HIREDATE     SAL COMM  DEPTNO
	----- ------ --------- ---- ----------- ---- ---- -------
	 7369 SMITH  CLERK     7902 17-DEC-1980  800           20
	 7499 ALLEN  SALESMAN  7698 20-FEB-1981 1600  300      30
	 7521 WARD   SALESMAN  7698 22-FEB-1981 1250  500      30
	 7566 JONES  MANAGER   7839 02-APR-1981 2975           20
	 7654 MARTIN SALESMAN  7698 28-SEP-1981 1250 1400      30
	 7698 BLAKE  MANAGER   7839 01-MAY-1981 2850           30
	 7782 CLARK  MANAGER   7839 09-JUN-1981 2450           10
	 7788 SCOTT  ANALYST   7566 09-DEC-1982 3000           20
	 7839 KING   PRESIDENT      17-NOV-1981 5000           10
	 7844 TURNER SALESMAN  7698 08-SEP-1981 1500    0      30
	 7876 ADAMS  CLERK     7788 12-JAN-1983 1100           20
	 7900 JAMES  CLERK     7698 03-DEC-1981  950           30
	 7902 FORD   ANALYST   7566 03-DEC-1981 3000           20
	 7934 MILLER CLERK     7782 23-JAN-1982 1300           10


	select * from dept;

	DEPTNO DNAME          LOC
	------ -------------- ---------
	    10 ACCOUNTING     NEW YORK
	    20 RESEARCH       DALLAS
	    30 SALES          CHICAGO
	    40 OPERATIONS     BOSTON

Additionally, you will find four pivot tables used in this book; T1, T10, T100, and T500. Because these tables exist only to facilitate pivots, we didn’t give them clever names. The number following the “T” in each of the pivot tables signifies the number of rows in each table starting from 1. For example, the values for T1 and T10:

	select id from t1;

	        ID
	----------
	         1

	select id from t10;

	        ID
	----------
	         1
	         2
	         3
	         4
	         5
	         6
	         7
	         8
	         9
	        10

The pivot tables are a useful shortcut when we need to create a series of rows to facilitate a query.

As an aside, some vendors allow partial SELECT statements. For example, you can have SELECT without a FROM clause. Sometimes in this book we will use a support table, T1, with a single row, rather than using partial queries for clarity. This is similar in usage to Oracle’s DUAL table, but by using the T1 table, we do the same thing in a standardized way across all the implementations we are looking at.

Any other tables are specific to particular recipes and chapters, and will be introduced in the text when appropriate.

Conventions Used in This Book

We use a number of typographical and coding conventions in this book. Take time to become familiar with them. Doing so will enhance your understanding of the text. Coding conventions in particular are important, because we can’t repeat them for each recipe in the book. Instead, we list the important conventions here.

Typographical Conventions

The following typographical conventions are used in this book:

UPPERCASE

Used to indicate SQL keywords within text

lowercase

Used for all queries in code examples. Other languages such as C and JAVA use lowercase for most keywords and we find it far more readable than uppercase. Thus all queries will be lowercase.

Constant width bold

Indicates user input in examples showing an interaction.

Tip

Indicates a tip, suggestion, or general note.

Warning

Indicates a warning or caution.

Coding Conventions

Our preference for case in SQL statements is to always use lowercase, for both keywords and user-specified identifiers. For example:

	select empno, ename
	  from emp;

Your preference may be otherwise. For example, many prefer to uppercase SQL keywords. Use whatever coding style you prefer, or whatever your project requires.

Despite the use of lowercase in code examples, we consistently use uppercase for SQL keywords and identifiers in the text. We do this to make those items stand out as something other than regular prose. For example:

The preceding query represents a SELECT against the EMP table.

While this book covers databases from five different vendors, we’ve decided to use one format for all the output:

	 EMPNO ENAME
	 ----- ------
	  7369 SMITH
	  7499 ALLEN
	…

Many solutions make use of inline views, or subqueries in the FROM clause. The ANSI SQL standard requires that such views be given table aliases. (Oracle is the only vendor that lets you get away without specifying such aliases.) Thus, our solutions use aliases such as x and y to identify the result sets from inline views:

	select job, sal
	  from (select job, max(sal) sal
	          from emp
	        group by job)x;

Notice the letter X following the final, closing parenthesis. That letter X becomes the name of the “table” returned by the subquery in the FROM clause. While column aliases are a valuable tool for writing self-documenting code, aliases on inline views (for most recipes in this book) are simply formalities. They are typically given trivial names such as X, Y, Z, TMP1, and TMP2. In cases where a better alias might provide more understanding, we do so.

You will notice that the SQL in the SOLUTION section of the recipes is typically numbered, for example:

	1 select ename
	2     from emp
	3  where deptno = 10

The number is not part of the syntax; it is just reference parts of the query by number in the discussion section.

2nd Edition Acknowledgements

A bunch of great people have helped with this second edition. Thanks to Jess Haberman, Virginia Wilson, Kate Galloway and Gary O’Brien at O’Reilly. Thanks to Nicholas Adams for repeatedly saving the day in Atlas. Many thanks to the tech reviewers - Alan Beaulieu, Scott Haines and Thomas Nield.

Finally, many thanks to my family - Clare, Maya and Leda - for graciously bearing losing me to another book for a while.

  • Robert de Graaf

1st Edition Acknowledgements

This book would not exist without all the support I’ve received from a great many people. I would like to thank my mother, Connie, to whom this book is dedicated. Without your hard work and sacrifice I would not be where I am today. Thank you for everything, Mom. I am thankful and appreciative of everything you’ve done for my brother and me. I have been blessed to have you as my mother.

To my brother, Joe: every time I came home from Baltimore to take a break from writing, you were there to remind me how great things are when we’re not working, and how I should finish writing so I can get back to the more important things in life. You’re a good man and I respect you. I am extremely proud of you, and proud to call you my brother.

To my wonderful fiancee, Georgia: Without your support I would not have made it through all 600-plus pages of this book. You were here sharing this experience with me, day after day. I know it was just as hard on you as it was on me. I spent all day working and all night writing, but you were great through it all. You were understanding and supportive and I am forever grateful. Thank you. I love you.

To my future in-laws: to my mother-in-law and father-in-law, Kiki and George. Thank you for your support throughout this whole experience. You always made me feel at home whenever I took a break and came to visit, and you made sure Georgia and I were always well fed. To my sister-in-laws, Anna and Kathy, it was always fun coming home and hanging out with you guys, giving Georgia and I a much needed break from the book and from Baltimore.

To my editor Jonathan Gennick, without whom this book would not exist. Jonathan, you deserve a tremendous amount of credit for this book. You went above and beyond what an editor would normally do and for that you deserve much thanks. From supplying recipes, to tons of rewrites, to keeping things humorous despite oncoming deadlines, I could not have done it without you. I am grateful to have had you as my editor and grateful for the opportunity you have given me. An experienced DBA and author yourself, it was a pleasure to work with someone of your technical level and expertise. I can’t imagine there are too many editors out there that can, if they decided to, stop editing and work practically anywhere as a database administrator (DBA); Jonathan can. Being a DBA certainly gives you an edge as an editor as you usually know what I want to say even when I’m having trouble expressing it. O’Reilly is lucky to have you on staff and I am lucky to have you as an editor.

I would like to thank Ales Spetic and Jonathan Gennick for Transact-SQL Cookbook. Isaac Newton famously said, “If I have seen a little further it is by standing on the shoulders of giants.” In the acknowledgments section of the Transact-SQL Cookbook, Ales Spetic wrote something that is a testament to this famous quote and I feel should be in every SQL book. I include it here:

I hope that this book will complement the exiting opuses of outstanding authors like Joe Celko, David Rozenshtein, Anatoly Abramovich, Eugine Berger, Iztik Ben-Gan, Richard Snodgrass, and others. I spent many nights studying their work, and I learned almost everything I know from their books. As I am writing these lines, I’m aware that for every night I spent discovering their secrets, they must have spent 10 nights putting their knowledge into a consistent and readable form. It is an honor to be able to give something back to the SQL community.

I would like to thank Sanjay Mishra for his excellent Mastering Oracle SQL book, and also for putting me in touch with Jonathan. If not for Sanjay, I may have never been in touch with Jonathan and never would have written this book. Amazing how a simple email can change your life. I would like to thank David Rozenshtein, especially, for his Essence of SQL book, which provided me with a solid understanding of how to think and problem solve in sets/SQL. I would like to thank David Rozenshtein, Anatoly Abramovich, and Eugene Birger for their book Optimizing Transact-SQL, from which I learned many of the advanced SQL techniques I use today.

I would like to thank the whole team at Wireless Generation, a great company with great people. A big thank you to all of the people who took the time to review, critique, or offer advice to help me complete this book: Jesse Davis, Joel Patterson, Philip Zee, Kevin Marshall, Doug Daniels, Otis Gospodnetic, Ken Gunn, John Stewart, Jim Abramson, Adam Mayer, Susan Lau, Alexis Le-Quoc, and Paul Feuer. I would like to thank Maggie Ho for her careful review of my work and extremely useful feedback regarding the window function refresher. I would like to thank Chuck Van Buren and Gillian Gutenberg for their great advice about running. Early morning workouts helped me clear my mind and unwind. I don’t think I would have been able to finish this book without getting out a bit. I would like to thank Steve Kang and Chad Levinson for putting up with all my incessant talk about different SQL techniques on the nights when all they wanted was to head to Union Square to get a beer and a burger at Heartland Brewery after a long day of work. I would like to thank Aaron Boyd for all his support, kind words, and, most importantly, good advice. Aaron is honest, hardworking, and a very straightforward guy; people like him make a company better. I would like to thank Olivier Pomel for his support and help in writing this book, in particular for the DB2 solution for creating delimited lists from rows. Olivier contributed that solution without even having a DB2 system to test it with! I explained to him how the WITH clause worked, and minutes later he came up with the solution you see in this book.

Jonah Harris and David Rozenshtein also provided helpful technical review feedback on the manuscript. And Arun Marathe, Nuno Pinto do Souto, and Andrew Odewahn weighed in on the outline and choice of recipes while this book was in its formative stages. Thanks, very much, to all of you.

I want to thank John Haydu and the MODEL clause development team at Oracle Corporation for taking the time to review the MODEL clause article I wrote for O’Reilly, and for ultimately giving me a better understanding of how that clause works. I would like to thank Tom Kyte of Oracle Corporation for allowing me to adapt his TO_BASE function into a SQL-only solution. Bruno Denuit of Microsoft answered questions I had regarding the functionality of the window functions introduced in SQL Server 2005. Simon Riggs of PostgreSQL kept me up to date about new SQL features in PostgreSQL (very big thanks: Simon, by knowing what was coming out and when, I was able to incorporate some new SQL features such as the ever-so-cool GENERATE_SERIES function, which I think made for more elegant solutions compared to pivot tables).

Last but certainly not least, I’d like to thank Kay Young. When you are talented and passionate about what you do, it is great to be able to work with people who are likewise as talented and passionate. Many of the recipes you see in this text have come from working with Kay and coming up with SQL solutions for everyday problems at Wireless Generation. I want to thank you and let you know I absolutely appreciate all the help you given me throughout all of this; from advice, to grammar corrections, to code, you played an integral role in the writing of this book. It’s been great working with you, and Wireless Generation is a better company because you are there.

—Anthony Molinaro

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.222.179.186