In-Memory Analytics with Apache Arrow
by Matthew Topol, Wes McKinney
Foreword
Acknowledgments
Contributors
About the author
About the reviewers
Preface
Table of Contents
Preface
Section 1: Overview of What Arrow Is, its Capabilities, Benefits, and Goals
Chapter 1: Getting Started with Apache Arrow
Technical requirements
Understanding the Arrow format and specifications
Why does Arrow use a columnar in-memory format?
Learning the terminology and physical memory layout
Quick summary of physical layouts, or TL;DR
How to speak Arrow
Arrow format versioning and stability
Would you download a library? Of course!
Setting up your shooting range
Using pyarrow for Python
C++ for the 1337 coders
Go Arrow go!
Summary
References
Chapter 2: Working with Key Arrow Specifications
Technical requirements
Playing with data, wherever it might be!
Working with Arrow tables
Accessing data files with pyarrow
Accessing data files with Arrow in C++
pandas firing Arrow
Putting pandas in your quiver
Making pandas run fast
Keeping pandas from running wild
Sharing is caring… especially when it's your memory
Diving into memory management
Managing buffers for performance
Crossing the boundaries
Summary
Chapter 3: Data Science with Apache Arrow
Technical requirements
ODBC takes an Arrow to the knee
Lost in translation
SPARKing new ideas on Jupyter
Understanding the integration
Everyone gets a containerized development environment!
SPARKing joy with Arrow and PySpark
Interactive charting powered by Arrow
Stretching workflows onto Elasticsearch
Indexing the data
Summary
Section 2: Interoperability with Arrow: pandas, Parquet, Flight, and Datasets
Chapter 4: Format and Memory Handling
Technical requirements
Storage versus runtime in-memory versus message-passing formats
Long-term storage formats
In-memory runtime formats
Message-passing formats
Summing up
Passing your Arrows around
What is this sorcery?!
Producing and consuming Arrows
Learning about memory cartography
The base case
Parquet versus CSV
Mapping data into memory
Too long; didn't read (TL;DR) – Computers are magic
Summary
Chapter 5: Crossing the Language Barrier with the Arrow C Data API
Technical requirements
Using the Arrow C data interface
The ArrowSchema structure
The ArrowArray structure
Example use cases
Using the C Data API to export Arrow-formatted data
Importing Arrow data with Python
Exporting Arrow data with the C Data API from Python to Go
Streaming across the C Data API
Streaming record batches from Python to Go
Other use cases
Some exercises
Summary
Chapter 6: Leveraging the Arrow Compute APIs
Technical requirements
Letting Arrow do the work for you
Input shaping
Value casting
Types of functions
Executing compute functions
Using the C++ compute library
Using the compute library in Python
Picking the right tools
Adding a constant value to an array
Summary
Chapter 7: Using the Arrow Datasets API
Technical requirements
Querying multifile datasets
Creating a sample dataset
Discovering dataset fragments
Filtering data programmatically
Expressing yourself – a quick detour
Using expressions for filtering data
Deriving and renaming columns (projecting)
Using the Datasets API in Python
Creating our sample dataset
Discovering the dataset
Using different file formats
Filtering and projecting columns with Python
Streaming results
Working with partitioned datasets
Summary
Chapter 8: Exploring Apache Arrow Flight RPC
Technical requirements
The basics and complications of gRPC
Building modern APIs for data
Efficiency and streaming are important
Arrow Flight's building blocks
Horizontal scalability with Arrow Flight
Adding your business logic to Flight
Other bells and whistles
Understanding the Flight Protocol Buffer definitions
Using Flight, choose your language!
Building a Python Flight Server
Building a Go Flight server
What is Flight SQL?
Setting up a performance test
Running the performance test
Flight SQL, the new kid on the block
Summary
Section 3: Real-World Examples, Use Cases, and Future Development
Chapter 9: Powered by Apache Arrow
Swimming in data with Dremio Sonar
Clarifying Dremio Sonar's architecture
The library of the Gods…of data analysis
Spicing up your ML workflows
Bringing the AI engine to where the data lives
Arrow in the browser using JavaScript
Gaining a little perspective
Taking flight with Falcon
Summary
Chapter 10: How to Leave Your Mark on Arrow
Technical requirements
Contributing to open source projects
Communication is key
You don't necessarily have to contribute code
There are a lot of reasons why you should contribute!
Preparing your first pull request
Navigating JIRA
Setting up Git
Orienting yourself in the code base
Building the Arrow libraries
Creating the PR
Understanding the CI configuration
Development using Archery
Find your interest and expand on it
Getting that sweet, sweet approval
Finishing up with style!
C++ styling
Python code styling
Go code styling
Summary
Chapter 11: Future Development and Plans
Examining Flight SQL (redux)
Why Flight SQL?
Defining the Flight SQL protocol
Firing a Ballista using Data(Fusion)
What about Spark?
Looking at Ballista's development roadmap
Building a cross-language compute serialization
Why Substrait?
Working with Substrait serialization
Getting involved with Substrait development
Final words
Why subscribe?
Other Books You May Enjoy
Packt is searching for authors like you
Share Your Thoughts