0%

Book Description

Take your first steps to become a fully qualified data analyst by learning how to explore large relational datasets

Key Features

  • Explore a variety of statistical techniques to analyze your data
  • Integrate your SQL pipelines with other analytics technologies
  • Perform advanced analytics such as geospatial and text analysis

Book Description

Understanding and finding patterns in data has become one of the most important ways to improve business decisions. If you know the basics of SQL, but don't know how to use it to gain the most effective business insights from data, this book is for you.

SQL for Data Analytics helps you build the skills to move beyond basic SQL and instead learn to spot patterns and explain the logic hidden in data. You'll discover how to explore and understand data by identifying trends and unlocking deeper insights. You'll also gain experience working with different types of data in SQL, including time-series, geospatial, and text data. Finally, you'll learn how to increase your productivity with the help of profiling and automation.

By the end of this book, you'll be able to use SQL in everyday business scenarios efficiently and look at data with the critical eye of an analytics professional.

What you will learn

  • Perform advanced statistical calculations using the WINDOW function
  • Use SQL queries and subqueries to prepare data for analysis
  • Import and export data using a text file and psql
  • Apply special SQL clauses and functions to generate descriptive statistics
  • Analyze special data types in SQL, including geospatial data and time data
  • Optimize queries to improve their performance for faster results
  • Debug queries that won't run
  • Use SQL to summarize and identify patterns in data

Who this book is for

If you're a database engineer looking to transition into analytics, or a backend engineer who wants to develop a deeper understanding of production data, you will find this book useful. This book is also ideal for data scientists or business analysts who want to improve their data analytics skills using SQL. Knowledge of basic SQL and database concepts will aid in understanding the concepts covered in this book.

Table of Contents

  1. Preface
    1. About the Book
      1. About the Authors
      2. Learning Objectives
      3. Audience
      4. Approach
      5. Hardware Requirements
      6. Software Requirements
      7. Conventions
      8. Installation and Setup
      9. Installing PostgreSQL 10.9
      10. Installing Python
      11. Installing Git
      12. Loading the Sample Databases
      13. Running SQL Files
      14. Additional Resources
  2. Chapter 1
  3. Understanding and Describing Data
    1. Introduction
    2. The World of Data
      1. Types of Data
      2. Data Analytics and Statistics
      3. Types of Statistics
      4. Activity 1: Classifying a New Dataset
    3. Methods of Descriptive Statistics
      1. Univariate Analysis
      2. Data Frequency Distribution
      3. Exercise 1: Creating a Histogram
      4. Exercise 2: Calculating the Quartiles for Add-on Sales
      5. Central Tendency
      6. Exercise 3: Calculating the Central Tendency of Add-on Sales
      7. Dispersion
      8. Exercise 4: Dispersion of Add-on Sales
      9. Bivariate Analysis
      10. Scatterplots
      11. Exercise 5: Calculating the Pearson Correlation Coefficient for Two Variables
      12. Activity 2: Exploring Dealership Sales Data
      13. Working with Missing Data
    4. Statistical Significance Testing
      1. Common Statistical Significance Tests
    5. Summary
  4. Chapter 2
  5. The Basics of SQL for Analytics
    1. Introduction
    2. Relational Databases and SQL
      1. Advantages and Disadvantages of SQL Databases
    3. Basic Data Types of SQL
      1. Numeric
      2. Character
      3. Boolean
      4. Datetime
      5. Data Structures: JSON and Arrays
    4. Reading Tables: The SELECT Query
      1. Basic Anatomy and Working of a SELECT Query
      2. Basic Keywords in a SELECT Query
      3. Exercise 6: Querying the Salespeople Table Using Basic Keywords in a SELECT Query
      4. Activity 3: Querying the customers Table Using Basic Keywords in a SELECT Query
    5. Creating Tables
      1. Creating Blank Tables
      2. Exercise 7: Creating a Table in SQL
      3. Creating Tables with SELECT
    6. Updating Tables
      1. Adding and Removing Columns
      2. Adding New Data
      3. Updating Existing Rows
      4. Exercise 8: Updating Tables in SQL
    7. Deleting Data and Tables
      1. Deleting Values from a Row
      2. Deleting Rows from a Table
      3. Deleting Tables
      4. Exercise 9: Unnecessary Reference Table
      5. Activity 4: Marketing Operations
    8. SQL and Analytics
    9. Summary
  6. Chapter 3
  7. SQL for Data Preparation
    1. Introduction
    2. Assembling Data
      1. Connecting Tables Using JOIN
      2. Types of Joins
      3. Exercise 10: Using Joins to Analyze Sales Dealership
      4. Subqueries
      5. Unions
      6. Exercise 11: Generating an Elite Customer Party Guest List using UNION
      7. Common Table Expressions
    3. Transforming Data
      1. Exercise 12: Using the CASE WHEN Function to Get Regional Lists
      2. Activity 5: Building a Sales Model Using SQL Techniques
    4. Summary
  8. Chapter 4
  9. Aggregate Functions for Data Analysis
    1. Introduction
    2. Aggregate Functions
      1. Exercise 13: Using Aggregate Functions to Analyze Data
    3. Aggregate Functions with GROUP BY
      1. GROUP BY
      2. Multiple Column GROUP BY
      3. Exercise 14: Calculating the Cost by Product Type Using GROUP BY
      4. Grouping Sets
      5. Ordered Set Aggregates
    4. The HAVING Clause
      1. Exercise 15: Calculating and Displaying Data Using the HAVING Clause
    5. Using Aggregates to Clean Data and Examine Data Quality
      1. Finding Missing Values with GROUP BY
      2. Measuring Data Quality with Aggregates
      3. Activity 6: Analyzing Sales Data Using Aggregate Functions
    6. Summary
  10. Chapter 5
  11. Window Functions for Data Analysis
    1. Introduction
    2. Window Functions
      1. The Basics of Window Functions
      2. Exercise 16: Analyzing Customer Data Fill Rates over Time
      3. The WINDOW Keyword
    3. Statistics with Window Functions
      1. Exercise 17: Rank Order of Hiring
      2. Window Frame
      3. Exercise 18: Team Lunch Motivation
      4. Activity 7: Analyzing Sales Using Window Frames and Window Functions
    4. Summary
  12. Chapter 6
  13. Importing and Exporting Data
    1. Introduction
    2. The COPY Command
      1. Getting Started with COPY
      2. Copying Data with psql
      3. Configuring COPY and copy
      4. Using COPY and copy to Bulk Upload Data to Your Database
      5. Exercise 19: Exporting Data to a File for Further Processing in Excel
    3. Using R with Our Database
      1. Why Use R?
      2. Getting Started with R
    4. Using Python with Our Database
      1. Why Use Python?
      2. Getting Started with Python
      3. Exercise 20: Exporting Data from a Database within Python
      4. Improving Postgres Access in Python with SQLAlchemy and Pandas
      5. What is SQLAlchemy?
      6. Using Python with Jupyter Notebooks
      7. Reading and Writing to our Database with Pandas
      8. Performing Data Visualization with Pandas
      9. Exercise 21: Reading Data and Visualizing Data in Python
      10. Writing Data to the Database Using Python
      11. Improving Python Write Speed with COPY
      12. Reading and Writing CSV Files with Python
    5. Best Practices for Importing and Exporting Data
      1. Going Password-Less
      2. Activity 8: Using an External Dataset to Discover Sales Trends
    6. Summary
  14. Chapter 7
  15. Analytics Using Complex Data Types
    1. Introduction
    2. Date and Time Data Types for Analysis
      1. Starting with the Date Type
      2. Transforming Date Types
      3. Intervals
      4. Exercise 22: Analytics with Time Series Data
    3. Performing Geospatial Analysis in Postgres
      1. Latitude and Longitude
      2. Representing Latitude and Longitude in Postgres
      3. Exercise 23: Geospatial Analysis
    4. Using Array Data Types in Postgres
      1. Starting with Arrays
    5. Using JSON Data Types in Postgres
      1. JSONB: Pre-Parsed JSON
      2. Accessing Data from a JSON or JSONB Field
      3. Creating and Modifying Data in a JSONB Field
      4. Exercise 24: Searching through JSONB
    6. Text Analytics Using Postgres
      1. Tokenizing Text
      2. Exercise 25: Performing Text Analytics
      3. Performing Text Search
      4. Optimizing Text Search on Postgres
      5. Activity 9: Sales Search and Analysis
    7. Summary
  16. Chapter 8
  17. Performant SQL
    1. Introduction
    2. Database Scanning Methods
      1. Query Planning
      2. Scanning and Sequential Scans
      3. Exercise 26: Interpreting the Query Planner
      4. Activity 10: Query Planning
      5. Index Scanning
      6. The B-tree Index
      7. Exercise 27: Creating an Index Scan
      8. Activity 11: Implementing Index Scans
      9. Hash Index
      10. Exercise 28: Generating Several Hash Indexes to Investigate Performance
      11. Activity 12: Implementing Hash Indexes
      12. Effective Index Use
    3. Performant Joins
      1. Exercise 29: Determining the Use of Inner Joins
      2. Activity 13: Implementing Joins
    4. Functions and Triggers
      1. Function Definitions
      2. Exercise 30: Creating Functions without Arguments
      3. Activity 14: Defining a Maximum Sale Function
      4. Exercise 31: Creating Functions with Arguments Using a Single Function
      5. Activity 15: Creating Functions with Arguments
      6. Triggers
      7. Exercise 32: Creating Triggers to Update Fields
      8. Activity 16: Creating a Trigger to Track Average Purchases
      9. Killing Queries
      10. Exercise 33: Canceling a Long Query
      11. Activity 17: Terminating a Long Query
    5. Summary
  18. Chapter 9
  19. Using SQL to Uncover the Truth – a Case Study
    1. Introduction
    2. Case Study
      1. Scientific Method
      2. Exercise 34: Preliminary Data Collection Using SQL Techniques
      3. Exercise 35: Extracting the Sales Information
      4. Activity 18: Quantifying the Sales Drop
      5. Exercise 36: Launch Timing Analysis
      6. Activity 19: Analyzing the Difference in the Sales Price Hypothesis
      7. Exercise 37: Analyzing Sales Growth by Email Opening Rate
      8. Exercise 38: Analyzing the Performance of the Email Marketing Campaign
      9. Conclusions
      10. In-Field Testing
    3. Summary
  20. Appendix
    1. Chapter 1: Understanding and Describing Data
      1. Activity 1: Classifying a New Dataset
      2. Activity 2: Exploring Dealership Sales Data
    2. Chapter 2: The Basics of SQL for Analytics
      1. Activity 3: Querying the customers Table Using Basic Keywords in a SELECT Query
      2. Activity 4: Marketing Operations
    3. Chapter 3: SQL for Data Preparation
      1. Activity 5: Building a Sales Model Using SQL Techniques
    4. Chapter 4: Aggregate Functions for Data Analysis
      1. Activity 6: Analyzing Sales Data Using Aggregate Functions
    5. Chapter 5: Window Functions for Data Analysis
      1. Activity 7: Analyzing Sales Using Window Frames and Window Functions
    6. Chapter 6: Importing and Exporting Data
      1. Activity 8: Using an External Dataset to Discover Sales Trends
    7. Chapter 7: Analytics Using Complex Data Types
      1. Activity 9: Sales Search and Analysis
    8. Chapter 8: Performant SQL
      1. Activity 10: Query Planning
      2. Activity 11: Implementing Index Scans
      3. Activity 12: Implementing Hash Indexes
      4. Activity 13: Implementing Joins
      5. Activity 14: Defining a Maximum Sale Function
      6. Activity 15: Creating Functions with Arguments
      7. Activity 16: Creating a Trigger to Track Average Purchases
      8. Activity 17: Terminating a Long Query
    9. Chapter 9: Using SQL to Uncover the Truth – a Case Study
      1. Activity 18: Quantifying the Sales Drop
      2. Activity 19: Analyzing the Difference in the Sales Price Hypothesis
3.19.31.73