Chapter 1. SQL Crash Course

What is a Database?

Let’s start with the basics. A database is a place to store data in an organized way. There are many ways to organize data, and as a result, there are many databases to choose from. The two categories that databases fall into are SQL and NoSQL.

SQL

SQL is short for Structured Query Language. Imagine you have an app that remembers all of your friend’s birthdays. SQL is the most popular language you would use to talk to the app.

English: “Hey app. When is my husband’s birthday?”

SQL: SELECT * FROM birthdays WHERE person = 'husband';

SQL databases are often called relational databases because they are made up of relations, which are more commonly referred to as tables. Many tables connected to each other make up a database. Figure 1-1 shows a picture of a relation in an SQL database.

basic sql terms
Figure 1-1. A relation (also known as a table) in an SQL database

The main thing to note about SQL databases is that they require predefined schemas. You can think of a schema as the way that data in a database is organized or structured. Let’s say you’d like to create a table. Before loading any data into the table, the structure of the table must first be decided on, including things like what columns are in the table, whether those columns hold integer or decimal values, etc.

There comes a time though, when data cannot be organized in such a structured way. Your data may have varying fields or need to be spread across multiple machines. That’s where NoSQL comes in.

NoSQL

NoSQL stands for “not only SQL”. It will not be covered in detail in this book, but I wanted to point it out because the term has grown a lot in popularity in the 2010s and it’s important to understand there are ways to store data beyond just tables.

NoSQL databases are often referred to as non-relational databases and they come in all shapes and sizes. Their main characteristic is that they have dynamic schemas, meaning the schema doesn’t have to be locked in upfront.

The most popular NoSQL database is MongoDB, which is more specifically a document database. Figure 1-2 shows a picture of how data is stored in MongoDB, a NoSQL database. You’ll notice that the data is no longer in a structured table and the number of fields (similar to a column) varies for each document (similar to a row).

basic nosql terms
Figure 1-2. A collection (a variant of a table) in MongoDB, a NoSQL database

That all said, the focus of this book is on SQL databases. Even with the introduction of NoSQL, most companies still store the majority of their data in tables in relational databases.

Database Management Systems (DBMS)

You may have heard terms like PostgreSQL or SQLite, and be wondering to yourself how they are different from SQL. They are two types of Database Management Systems (DBMS), which is software used to work with a database.

This includes things like figuring out how to import data and organize it, as well as things like managing how users or other programs access the data. A Relational Database Management System (RDBMS) is software that is specifically for relational databases, or databases made up of tables.

Each RDBMS has a different implementation of SQL, meaning that the syntax slightly varies from software to software. For example, this is how you would output 10 rows of data in five different RDBMS’s.

MySQL / PostgreSQL / SQLite:

SELECT * FROM birthdays LIMIT 10;

Microsoft SQL Server:

SELECT TOP 10 * FROM birthdays;

Oracle:

SELECT * FROM birthdays WHERE ROWNUM <= 10;

This book covers SQL basics along with the nuances of five popular database management systems: MySQL, PostgreSQL, SQLite, Microsoft SQL Server and Oracle. The first three are open source, meaning they are free for anyone to use, and the latter two are proprietary, meaning they are owned by a company and cost money to use. Table 1-1 details out the differences between the RDBMS’s.

Table 1-1. RDBMS comparison table
Owner RDBMS Highlights

Open Source

MySQL

- Popular open source RDBMS

- Often used alongside web development languages like HTML / CSS / Javascript

- Acquired by Oracle, though still open source

Open Source

PostgreSQL

- Quickly growing in popularity

- Often used alongside open source technologies like Docker and Kubernetes

- Efficient and great for large datasets

Open Source

SQLite

- World’s most used database engine

- Common on iOS and Android platforms

- Lightweight and great for a small database

Microsoft

Microsoft SQL Server

- Popular proprietary RDBMS

- Often used alongside other Microsoft products including Microsoft Azure and the .NET framework

- Common on the Windows platform

Oracle

Oracle

- Popular proprietary RDBMS

- Often used at large corporations given the amount of features, tools and support available

Installation instructions and code snippets for each RDBMS can be found in the RDBMS Software section.

An SQL Query

A common acronym in the SQL world is CRUD, which stands for Create, Read, Update and Delete. These are the four major operations that are available within a database.

SQL Statements

People who have read and write access to a database are able to perform all four operations. They can create and delete tables, update data in tables and read data from tables. In other words, they have all the power.

They write SQL statements, which is general SQL code that can be written to perform any of the CRUD operations. These people often have titles like Database Administrator (DBA) or Database Engineer.

SQL Queries

People who have read access to a database are only able to perform the read operation, meaning they can look at data in tables.

They write SQL queries, which are a more specific type of SQL statement. The code is used for finding and displaying data, otherwise known as “reading” data. This action is often called querying tables. These people often have titles like Data Analyst or Data Scientist.

The next two sections are a quick start guide for writing SQL queries, since it is the most common type of SQL code that you’ll see. More details on creating and updating tables can be found in the Creating Databases chapter.

The SELECT Statement

The most basic SQL query is:

SELECT * FROM table;

which says, show me all of the data within the table - all of the columns and all of the rows.

While SQL is case-insensitive (SELECT and select are equivalent), you’ll notice that some words are capitalized and others are not.

  • The capitalized words in the query are called keywords, meaning that SQL has reserved them to perform some sort of operation on the data. They do not have to be capitalized, but it is recommended for readability sake.

  • All other words are lowercase. This include table names, column names, etc.

Let’s go back to this query:

SELECT * FROM table;

Let’s say that instead of returning all of the data in its current state, I want to:

  • Filter the data

  • Sort the data

This is where you would modify the SELECT statement to include a few more clauses (a type of keyword) and the result would look something like this:

SELECT *
FROM table
WHERE column1 > 100
ORDER BY column2;

More details on all of the clauses can be found in the Querying Tables chapter, but the main thing to note is this - the clauses must always be in the same order.

The classic mnemonic to remember the order of the clauses is:

Sweaty feet will give horrible odors

If you don’t want to think about sweaty feet each time you write a query, here’s one that I made up:

Start Fridays with grandma’s homemade oatmeal

Order of Execution

While this is something that’s typically not taught in a beginner SQL course, I’m including it here because it’s a common question I received when I taught SQL to students coming from a Python coding background.

A sensible assumption would be that the order that you write the clauses is the same order that the computer executes the clauses, but that is not the case. After a query is run, this is the order that the computer works through the data.

  1. FROM

  2. WHERE

  3. GROUP BY

  4. HAVING

  5. SELECT

  6. ORDER BY

It may look like a completely different order, but actually the only change is that the SELECT has been moved to the fifth position. The high level takeaway here is that SQL works in this order:

  1. Gathers all of the data with the FROM

  2. Filters rows of data with the WHERE

  3. Groups rows together with the GROUP BY

  4. Filters within those groups with the HAVING

  5. Specifies columns to display with the SELECT

  6. Rearranges the results with the ORDER BY

A Data Model

I’d like to spend the final section of the crash course going over a simple data model and point out some terms that you’ll often hear in fun SQL conversations around the office.

A data model is a visualization that summarizes how all of the tables in a database are related to one another, along with some details about each table. Figure 1-3 is a simple data model of a student grades database. Table 1-2 lists out the technical terms that describe what’s happening in the data model.

a simple data model
Figure 1-3. A data model of student grades
Table 1-2. Terms used to describe what’s in a data model
Term Definition Example

Database

A database is a place to store data in an organized way.

This data model shows all of the data in the student grades database.

Table

A table is made up of rows and columns. In the data model, they are represented by rectangles.

There are two tables in the student grades database: Students and Grades.

Column

Within each table, there is a list of columns. These are sometimes referred to more formally as attributes or more casually as fields.

In the Students table, the columns are student_id, student_name and date_of_birth.

Primary Key

A primary key uniquely identifies each row of data in a table. A primary key can be made up of one or more columns in a table. In a data model, it is flagged as pk or with a key icon.

In the Students table, the primary key is the student_id column, meaning that the student_id is different for each row of data.

Foreign Key

A foreign key in a table refers to a primary key in another table. The two tables can be linked together by the common column. A table can have multiple foreign keys. In a data model, it is flagged as fk.

In the Grades table, student_id is a foreign key, meaning that the values in that column match up with values in the corresponding primary key in the Students table.

Relationships

A relationship describes how the rows in one table map to the rows in another table. In a data model, it is represented by a line with symbols at the end points. Common types are one-to-one and one-to-many relationships.

In this data model, the two tables have a one-to-many relationship represented by the fork. One student can have many grades, or one row of the Students table maps to multiple rows in the Grades table.

More details on the terms in this table can be found in the Creating Databases: Database Design section of the book.

You might be wondering why we’re spending so much time reading a data model instead of writing SQL code already! The reason is because you’ll often be writing queries that link up a number of tables, so it’s a good idea to first get familiar with the data model to know how they all connect. You may want to print out the data models that you frequently work with - both for easy reference and easy desk decor.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.218.184.214