Table of Contents


Section 1: Fundamentals Of Amazon Athena

Chapter 1: Your First Query

Technical requirements

What is Amazon Athena?

Use cases

Separation of storage and compute

Obtaining and preparing sample data

Running your first query

Creating your first table

Running your first analytics queries


Chapter 2: Introduction to Amazon Athena

Technical requirements

Getting to know Amazon Athena

Understanding the "serverless" trend

Beyond "serverless" with "fully managed" offerings

Key features

What is Presto?

Understanding scale and latency

TableScan performance

Memory-bound operations

Writing results

Metering and billing

Additional costs

File formats affect cost and performance

Cost controls

Connecting and securing

Determining when to use Amazon Athena

Ad hoc analytics

Adding analytics features to your application

Serverless ETL pipeline

Other use cases


Further reading

Chapter 3: Key Features, Query Types, and Functions

Technical requirements

Running ETL queries



Running approximate queries

Organizing workloads with WorkGroups and saved queries

Using Athena's APIs


Section 2: Building and Connecting to Your Data Lake

Chapter 4: Metastores, Data Sources, and Data Lakes

Technical requirements

What is a metastore?

Data sources, connectors, and catalogs

Databases and schemas


What is a data source?

S3 data sources

Other data sources

Registering S3 datasets in your metastore

Using Athena CREATE TABLE statements

Using Athena's Create Table wizard

Using the AWS Glue console

Using AWS Glue Crawlers

Discovering your datasets on S3 using AWS Glue Crawlers

How do AWS Glue Crawlers work?

AWS Glue Crawler best practices for Athena

Designing a data lake architecture

Stages of data

Transforming data using Athena


Further reading

Chapter 5: Securing Your Data

Technical requirements

General best practices to protect your data on AWS

Separating permissions based on IAM users, roles, or even accounts

Least privilege for IAM users, roles, and accounts

Rotating IAM user credentials frequently

Blocking public access on S3 buckets

Enabling data and metadata encryption and enforcing it

Ensuring that auditing is enabled

Good intentions cannot replace good mechanisms

Encrypting your data and metadata in Glue Data Catalog

Encrypting your data

Encrypting your metadata in Glue Data Catalog

Enabling coarse-grained access controls with IAM resource policies for data on S3

Enabling FGACs with Lake Formation for data on S3

Auditing with CloudTrail and S3 access logs

Auditing with AWS CloudTrail

Auditing with S3 server access logs


Further reading

Chapter 6: AWS Glue and AWS Lake Formation

Technical requirements

What AWS Glue and AWS Lake Formation can do for you

Securing your data lake with Lake Formation

What AWS Lake Formation governed tables can do for you


Further reading

Section 3: Using Amazon Athena

Chapter 7: Ad Hoc Analytics

Technical requirements

Understanding the ad hoc analytics hype

Building an ad hoc analytics strategy

Choosing your storage

Sharing data

Selecting query engines

Deploying to customers

Using QuickSight with Athena

Getting sample data

Setting up QuickSight

Using Jupyter Notebooks with Athena


Matplotlib and Seaborn

SciPy and NumPy

Using our notebook to explore


Chapter 8: Querying Unstructured and Semi-Structured Data

Technical requirements

Why isn't all data structured to begin with?

Querying JSON data

Reading our customer's dataset

Parsing JSON fields

Other considerations when reading JSON

Querying comma-separated value and tab-separated value data

Querying arbitrary log data

Doing full log scans on S3

Reading application log data


Further reading

Chapter 9: Serverless ETL Pipelines

Technical requirements

Understanding the uses of ETL

ETL for integration

ETL for aggregation

ETL for modularization

ETL for performance

Deciding whether to ETL or query in place

Designing ETL queries for Athena

Don't forget about performance

Begin with integration points

Use an orchestrator

Using Lambda as an orchestrator

Creating an ETL function

Coding the ETL function

Testing your ETL function

Triggering ETL queries with S3 notifications


Chapter 10: Building Applications with Amazon Athena

Technical requirements

Connecting to Athena


Which one should I use?

Best practices for connecting to Athena

Idempotency tokens

Query tracking

Securing your application

Credential management

Network safety

Optimizing for performance and cost

Workload isolation

Application monitoring

CTAS for large result sets


Chapter 11: Operational Excellence – Monitoring, Optimization, and Troubleshooting

Technical requirements

Monitoring Athena to ensure queries run smoothly

Optimizing for cost and performance

Troubleshooting failing queries


Further reading

Section 4: Advanced Topics

Chapter 12: Athena Query Federation

Technical requirements

What is Query Federation?

Athena Query Federation features

How Athena Connectors work

Using Lambda for big data

Federating queries across VPCs

Using pre-built Connectors

Building a custom connector

Setting up your development environment

Writing your connector code


Chapter 13: Athena UDFs and ML

Technical requirements

What are UDFs?

Writing a new UDF

Setting up your development environment

Writing your UDF code

Building your UDF code

Deploying your UDF code

Using your UDF

Using built-in ML UDFs

Pre-setup requirements

Setting up your SageMaker notebook

Using our notebook to train a model

Using our trained model in an Athena UDF


Chapter 14: Lake Formation – Advanced Topics

Reinforcing your data perimeter with Lake Formation

Establishing a data perimeter

Shared responsibility security model

How Lake Formation can help

Understanding the benefits of governed tables

ACID transactions on S3-backed tables


Further reading

Other Books You May Enjoy
