Book Description

With the explosion of computing power, thanks to analytic databases and cloud data warehouses, SQL has become an even more powerful and flexible tool for the savvy analyst or data scientist. This practical book reveals hidden ways to get the most out of your SQL workflow.

You'll learn how to use both common and exotic SQL features such as joins, window functions, subqueries, and regular expressions in innovative ways, as well as how to combine SQL techniques to accomplish your goals faster, with more understandable code. If you work with SQL databases, this is a must-have reference.

SQL for Data Analysis covers useful applications such as:

  • Cohort analysis
  • Text analysis
  • Anomaly detection
  • Time series analysis
  • Experiment analysis
  • Creating complex data sets for further exploration in statistical and visualization tools
  • And more

Table of Contents

  1. 1. Analysis with SQL
    1. 1.1 What is data analysis?
    2. 1.2 Why SQL
      1. 1.2.0 What is SQL?
      2. 1.2.1 Benefits of SQL
      3. 1.2.2 SQL vs. R or Python
      4. 1.2.3 SQL as part of the analysis workflow
    3. 1.3 Database Types and How to Work with Them
      1. 1.3.1 Row-store databases
      2. 1.3.2 Column-store databases
      3. 1.3.3 Other flavors of data infrastructure
    4. 1.4 Conclusion
  2. 2. Preparing Data for Analysis
    1. 2.0 Types of Data
      1. 2.0.1 Database data types
      2. 2.0.1 Structured vs. Unstructured
      3. 2.0.2 First-party, Third-party, and Cloud Vendor data
      4. 2.0.3 Sparse data
      5. 2.0.4 Quantitative vs. qualitative data
      6. 2.0.5 Categorical vs. continuous
    2. 2.1 Profiling: Distributions
      1. 2.1.1 Histograms and frequencies
      2. 2.1.3 Binning
      3. 2.1.2 N-tiles
    3. 2.2 Profiling: Data Quality
      1. 2.2.1 Detecting duplicates
      2. 2.2.2 Deduplication with GROUP BY and DISTINCT
      3. 2.2.3 Missing data
    4. 2.4 Data cleaning
      1. 2.4.1 CASE transformations
      2. 2.4.2 Dealing with nulls: COALESCE, NULLIF, NVL
      3. 2.4.3 Casting and type conversions
    5. 2.3 Shaping Data
      1. 2.3.1 For which output: BI, Visualization, statistics, ML
      2. 2.3.2 Pivoting with CASE statements
      3. 2.3.3 Unpivot with UNION statements
      4. 2.3.4 PIVOT and UNPIVOT
    6. 2.4 Conclusion