Enhanced aggregation
by Dayong Du
Apache Hive Essentials
Hive offers enhanced aggregation by using the GROUPING SETS, CUBE, and ROLLUP keywords.
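
As a rough illustration of how the three keywords relate, the HiveQL sketch below runs the same aggregation three ways. The sales table and its columns (region, product, amount) are hypothetical names chosen for this example and do not come from the book.

```sql
-- Hypothetical table, for illustration only:
--   sales(region STRING, product STRING, amount DOUBLE)

-- GROUPING SETS: name exactly which groupings to compute in one query.
-- Here: per (region, product), per region, and a grand total.
SELECT region, product, SUM(amount) AS total
FROM sales
GROUP BY region, product
GROUPING SETS ((region, product), region, ());

-- ROLLUP: hierarchical subtotals from left to right. Equivalent to
-- GROUPING SETS ((region, product), region, ()).
SELECT region, product, SUM(amount) AS total
FROM sales
GROUP BY region, product WITH ROLLUP;

-- CUBE: every combination of the grouping columns. Equivalent to
-- GROUPING SETS ((region, product), region, product, ()).
SELECT region, product, SUM(amount) AS total
FROM sales
GROUP BY region, product WITH CUBE;
```

In each result set, a column that is not part of the grouping for a given row is returned as NULL, so subtotal and grand-total rows can be told apart from detail rows only if the underlying data has no NULLs in those columns.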