Getting Started with Impala
by John Russell

Table of Contents
Introduction
Who Is This Book For?
Conventions Used in This Book
Using Code Examples
Safari® Books Online
How to Contact Us
Acknowledgments
1. Why Impala?
Impala’s Place in the Big Data Ecosystem
Flexibility for Your Big Data Workflow
High-Performance Analytics
Exploratory Business Intelligence
2. Getting Up and Running with Impala
Installation
Connecting to Impala
Your First Impala Queries
3. Impala for the Database Developer
The SQL Language
Standard SQL
Limited DML
No Transactions
Numbers
Recent Additions
Big Data Considerations
Billions and Billions of Rows
HDFS Block Size
Parquet Files: The Biggest Blocks of All
How Impala Is Like a Data Warehouse
Physical and Logical Data Layouts
The HDFS Storage Model
Distributed Queries
Normalized and Denormalized Data
File Formats
Text File Format
Parquet File Format
Getting File Format Information
Switching File Formats
Aggregation
4. Common Developer Tasks for Impala
Getting Data into an Impala Table
INSERT Statement
LOAD DATA Statement
External Tables
Figuring Out Where Impala Data Resides
Manually Loading Data Files into HDFS
Hive
Sqoop
Kite
Porting SQL Code to Impala
Using Impala from a JDBC or ODBC Application
JDBC
ODBC
Using Impala with a Scripting Language
Running Impala SQL Statements from Scripts
Variable Substitution
Saving Query Results
The impyla Package for Python Scripting
Optimizing Impala Performance
Optimizing Query Performance
Optimizing Memory Usage
Working with Partitioned Tables
Finding the Ideal Granularity
Inserting into Partitioned Tables
Adding and Loading New Partitions
Writing User-Defined Functions
Collaborating with Your Administrators
Designing for Security
Understanding Resource Management
Helping to Plan for Performance (Stats, HDFS Caching)
Understanding Cluster Topology
Always Close Your Queries
5. Tutorials and Deep Dives
Tutorial: From Unix Data File to Impala Table
Tutorial: Queries Without a Table
Tutorial: The Journey of a Billion Rows
Generating a Billion Rows of CSV Data
Normalizing the Original Data
Converting to Parquet Format
Making a Partitioned Table
Next Steps
Deep Dive: Joins and the Role of Statistics
Creating a Million-Row Table to Join With
Loading Data and Computing Stats
Reviewing the EXPLAIN Plan
Trying a Real Query
The Story So Far
Final Join Query with 1B x 1M Rows
Anti-Pattern: A Million Little Pieces
Tutorial: Across the Fourth Dimension
TIMESTAMP Data Type
Format Strings for Dates and Times
Working with Individual Date and Time Fields
Date and Time Arithmetic
Let’s Solve the Y2K Problem
More Fun with Dates
Tutorial: Verbose and Quiet impala-shell Output
Tutorial: When Schemas Evolve
Numbers Versus Strings
Dealing with Out-of-Range Integers
Tutorial: Levels of Abstraction
String Formatting
Temperature Conversion
Colophon
Copyright