Serverless Analytics with Amazon Athena
by Anthony Virtuoso, Mert Turkay Hocanin, Aaron Wishnick, Rahul Pathak
Foreword
Contributors
About the authors
About the reviewers
Preface
Section 1: Fundamentals Of Amazon Athena
Chapter 1: Your First Query
Chapter 2: Introduction to Amazon Athena
Chapter 3: Key Features, Query Types, and Functions
Section 2: Building and Connecting to Your Data Lake
Chapter 4: Metastores, Data Sources, and Data Lakes
Chapter 5: Securing Your Data
Chapter 6: AWS Glue and AWS Lake Formation
Section 3: Using Amazon Athena
Chapter 7: Ad Hoc Analytics
Chapter 8: Querying Unstructured and Semi-Structured Data
Chapter 9: Serverless ETL Pipelines
Chapter 10: Building Applications with Amazon Athena
Chapter 11: Operational Excellence – Monitoring, Optimization, and Troubleshooting
Section 4: Advanced Topics
Chapter 12: Athena Query Federation
Chapter 13: Athena UDFs and ML
Chapter 14: Lake Formation – Advanced Topics
Other Books You May Enjoy
Table of Contents
Preface
Section 1: Fundamentals Of Amazon Athena
Chapter 1: Your First Query
Technical requirements
What is Amazon Athena?
Use cases
Separation of storage and compute
Obtaining and preparing sample data
Running your first query
Creating your first table
Running your first analytics queries
Summary
Chapter 2: Introduction to Amazon Athena
Technical requirements
Getting to know Amazon Athena
Understanding the "serverless" trend
Beyond "serverless" with 'fully managed' offerings
Key features
What is Presto?
Understanding scale and latency
TableScan performance
Memory-bound operations
Writing results
Metering and billing
Additional costs
File formats affect cost and performance
Cost controls
Connecting and securing
Determining when to use Amazon Athena
Ad hoc analytics
Adding analytics features to your application
Serverless ETL pipeline
Other use cases
Summary
Further reading
Chapter 3: Key Features, Query Types, and Functions
Technical requirements
Running ETL queries
Using CREATE-TABLE-AS-SELECT
Using INSERT-INTO
Running approximate queries
Organizing workloads with WorkGroups and saved queries
Using Athena's APIs
Summary
Section 2: Building and Connecting to Your Data Lake
Chapter 4: Metastores, Data Sources, and Data Lakes
Technical requirements
What is a metastore?
Data sources, connectors, and catalogs
Databases and schemas
Tables/datasets
What is a data source?
S3 data sources
Other data sources
Registering S3 datasets in your metastore
Using Athena CREATE TABLE statements
Using Athena's Create Table wizard
Using the AWS Glue console
Using AWS Glue Crawlers
Discovering your datasets on S3 using AWS Glue Crawlers
How do AWS Glue Crawlers work?
AWS Glue Crawler best practices for Athena
Designing a data lake architecture
Stages of data
Transforming data using Athena
Summary
Further reading
Chapter 5: Securing Your Data
Technical requirements
General best practices to protect your data on AWS
Separating permissions based on IAM users, roles, or even accounts
Least privilege for IAM users, roles, and accounts
Rotating IAM user credentials frequently
Blocking public access on S3 buckets
Enabling data and metadata encryption and enforcing it
Ensuring that auditing is enabled
Good intentions cannot replace good mechanisms
Encrypting your data and metadata in Glue Data Catalog
Encrypting your data
Encrypting your metadata in Glue Data Catalog
Enabling coarse-grained access controls with IAM resource policies for data on S3
Enabling FGACs with Lake Formation for data on S3
Auditing with CloudTrail and S3 access logs
Auditing with AWS CloudTrail
Auditing with S3 server access logs
Summary
Further reading
Chapter 6: AWS Glue and AWS Lake Formation
Technical requirements
What AWS Glue and AWS Lake Formation can do for you
Securing your data lake with Lake Formation
What AWS Lake Formation governed tables can do for you
Summary
Further reading
Section 3: Using Amazon Athena
Chapter 7: Ad Hoc Analytics
Technical requirements
Understanding the ad hoc analytics hype
Building an ad hoc analytics strategy
Choosing your storage
Sharing data
Selecting query engines
Deploying to customers
Using QuickSight with Athena
Getting sample data
Setting up QuickSight
Using Jupyter Notebooks with Athena
pandas
Matplotlib and Seaborn
SciPy and NumPy
Using our notebook to explore
Summary
Chapter 8: Querying Unstructured and Semi-Structured Data
Technical requirements
Why isn't all data structured to begin with?
Querying JSON data
Reading our customer's dataset
Parsing JSON fields
Other considerations when reading JSON
Querying comma-separated value and tab-separated value data
Querying arbitrary log data
Doing full log scans on S3
Reading application log data
Summary
Further reading
Chapter 9: Serverless ETL Pipelines
Technical requirements
Understanding the uses of ETL
ETL for integration
ETL for aggregation
ETL for modularization
ETL for performance
Deciding whether to ETL or query in place
Designing ETL queries for Athena
Don't forget about performance
Begin with integration points
Use an orchestrator
Using Lambda as an orchestrator
Creating an ETL function
Coding the ETL function
Testing your ETL function
Triggering ETL queries with S3 notifications
Summary
Chapter 10: Building Applications with Amazon Athena
Technical requirements
Connecting to Athena
JDBC and ODBC
Which one should I use?
Best practices for connecting to Athena
Idempotency tokens
Query tracking
Securing your application
Credential management
Network safety
Optimizing for performance and cost
Workload isolation
Application monitoring
CTAS for large result sets
Summary
Chapter 11: Operational Excellence – Monitoring, Optimization, and Troubleshooting
Technical requirements
Monitoring Athena to ensure queries run smoothly
Optimizing for cost and performance
Troubleshooting failing queries
Summary
Further reading
Section 4: Advanced Topics
Chapter 12: Athena Query Federation
Technical requirements
What is Query Federation?
Athena Query Federation features
How Athena Connectors work
Using Lambda for big data
Federating queries across VPCs
Using pre-built Connectors
Building a custom connector
Setting up your development environment
Writing your connector code
Summary
Chapter 13: Athena UDFs and ML
Technical requirements
What are UDFs?
Writing a new UDF
Setting up your development environment
Writing your UDF code
Building your UDF code
Deploying your UDF code
Using your UDF
Using built-in ML UDFs
Pre-setup requirements
Setting up your SageMaker notebook
Using our notebook to train a model
Using our trained model in an Athena UDF
Summary
Chapter 14: Lake Formation – Advanced Topics
Reinforcing your data perimeter with Lake Formation
Establishing a data perimeter
Shared responsibility security model
How Lake Formation can help
Understanding the benefits of governed tables
ACID transactions on S3-backed tables
Summary
Further reading
Other Books You May Enjoy