This section briefly introduces the authors, the coverage of this book, the technical skills you'll need to get started, and the hardware and software requirements required to complete all of the included activities and exercises.
Understanding and finding patterns in data has become one of the most important ways to improve business decisions. If you know the basics of SQL, but don't know how to use it to gain business insights from data, this book is for you.
SQL for Data Analytics covers everything you need to progress from simply knowing basic SQL to telling stories and identifying trends in data. You'll be able to start exploring your data by identifying patterns and unlocking deeper insights. You'll also gain experience working with different types of data in SQL, including time series, geospatial, and text data. Finally, you'll learn how to become productive with SQL with the help of profiling and automation to gain insights faster.
By the end of the book, you'll able to use SQL in everyday business scenarios efficiently and look at data with the critical eye of an analytics professional.
Upom Malik is a data scientist who has worked in the technology industry for over 6 years. He has a master's degree in chemical engineering from Cornell University and a bachelor's degree in biochemistry from Duke University. He uses SQL and other tools to solve interesting challenges in finance, energy, and consumer technologies. While working on analytical problems, he has lived out of a suitcase and spent the last year as a digital nomad. Outside of work, he likes to read, hike the trails of the Northeastern United States, and savor ramen bowls from around the world.
Matt Goldwasser is a lead data scientist at T. Rowe Price. He enjoys demystifying data science for business stakeholders and deploying production machine learning solutions. Matt has been using SQL to perform data analytics in the financial industry for the last 8 years. He has a bachelor's degree in mechanical and aerospace engineering from Cornell University. In his spare time, he enjoys teaching his infant son data science.
Benjamin Johnston is a senior data scientist for one of the world's leading data-driven medtech companies and is involved in the development of innovative digital solutions throughout the entire product development pathway, from problem definition to solution research and development, through to final deployment. He is currently completing his PhD in machine learning, specializing in image processing and deep convolutional neural networks. He has more than 10 years' experience in medical device design and development, working in a variety of technical roles, and holds first-class honors bachelor's degrees in both engineering and medical science from the University of Sydney, Australia.
By the end of this book, you will be able to:
If you're a database engineer looking to transition into analytics, or a backend engineer who wants to develop a deeper understanding of production data, you will find this book useful. This book is also ideal for data scientists or business analysts who want to improve their data analytics skills using SQL. Knowledge of basic SQL and database concepts will aid in understanding the concepts covered in this book.
SQL for Data Analysis perfectly balances theory and practical exercises and provides a hands-on approach to analyzing data. It focuses on providing practical instruction for both SQL and statistical analysis so that you can better understand your data. The book takes away the crumbs and focuses on being practical. It contains multiple activities that use real-life business scenarios for you to practice and apply your new skills in a highly relevant context.
For the optimal experience, we recommend the following hardware configuration:
We also recommend that you have the following software installed in advance:
Code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles are shown as follows:
"It is worth noting here that the formatting can look a little messy for the copy command, because it does not allow for commands with new lines. A simple way around this is to create a view containing your data before the copy command and then drop the view after your copy command has finished."
A block of code is set as follows:
CREATE TEMP VIEW customers_sample AS (
SELECT *
FROM customers
LIMIT 5
);
copy customers_sample TO 'my_file.csv' WITH CSV HEADER
DROP VIEW customers_sample;
Each great journey begins with a humble step, and our upcoming adventure in the land of data wrangling is no exception. Before we can do awesome things with data, we need to be prepared with the most productive environment. In this short section, we shall see how to do that.
Installing on Windows:
Download the PostgreSQL version 10 installer via https://www.postgresql.org/download/windows/ and follow the prompts.
Installing on Linux:
You can install PostgreSQL on Ubuntu or Debian Linux via the command line using:
sudo apt-get install postgresl-11
Installing on macOS:
Download the PostgreSQL version 10 installer via https://www.postgresql.org/download/macosx/ and follow the prompts.
Installing Python on Windows:
Installing Python on Linux:
To install Python on Linux, you have a couple of good options:
sudo apt-get update
sudo apt-get install python3.7
Installing Python on macOS:
Similar to Linux, you have a couple of methods for installing Python on a Mac. To install Python on macOS X, do the following:
Installing Git on Windows or macOS X:
Git for Windows/Mac can be downloaded and installed via https://git-scm.com/. However, for an improved user experience, it is recommended that you install Git through an advanced client such as GitKraken (https://www.gitkraken.com/).
Installing Git on Linux:
Git can be easily installed via the command line:
sudo apt-get install git
If you prefer a graphical user interface, GitKraken (https://www.gitkraken.com/) is also available for Linux.
The vast majority of exercises in this book use a sample database, sqlda, which contains fabricated data for a fictional electric vehicle company called ZoomZoom. To install the database on PostgreSQL, copy the data.dump file from the Datasets folder in the GitHub repository of the book (https://github.com/TrainingByPackt/SQL-for-Data-Analytics/tree/master/Datasets). Then, load the data.dump file from a command line using the command:
psql < data.dump
Here, psql is the postgreSQL client.
Commands and statements can be executed via a *.sql file from the command line using the command:
psql < commands.sql
Alternatively, they can be executed via the SQL interpreter:
database=#
The code bundle for this book is also hosted on GitHub at https://github.com/TrainingByPackt/SQL-for-Data-Analytics. We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!
You can download the graphic bundle for the book from here: https://github.com/TrainingByPackt/SQL-for-Data-Analytics/blob/master/Graphic%20Bundle/Graphic%20Bundle_ColorImages.pdf.
3.12.164.101