The HDFStore class is the pandas abstraction responsible for dealing with HDF5 data. We will demonstrate this functionality using random data and a temporary file. These are the steps to do so:
Give the HDFStore constructor the path to a temporary file and create a store:
store = pd.io.pytables.HDFStore(tmpf.name)
print(store)
The preceding code snippet prints the file path of the store and its contents, which are empty at the moment:
<class 'pandas.io.pytables.HDFStore'>
File path: /var/folders/k_/xx_xz6xj0hx627654s3vld440000gn/T/tmpfmwPPB
Empty
HDFStore has a dict-like interface, meaning that we can store values, for instance a pandas DataFrame, under a corresponding lookup key. Store a DataFrame containing random data in HDFStore as follows:
store['df'] = df
print(store)
Now the store contains data, as illustrated in the following output:
<class 'pandas.io.pytables.HDFStore'>
File path: /var/folders/k_/xx_xz6xj0hx627654s3vld440000gn/T/tmpfwyLIN
/df frame (shape->[365,4])
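The dict-like behavior goes a little further than assignment. The following is a minimal sketch, assuming that PyTables is installed and using the top-level pd.HDFStore name for the same class. The file name demo.h5, the extra key small, and the column labels are hypothetical and serve only as illustration:

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Assumption: PyTables (the "tables" package) is installed; HDFStore needs it.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "demo.h5")  # hypothetical file name
    store = pd.HDFStore(path)

    # Two hypothetical keys, each mapped to its own DataFrame.
    store["df"] = pd.DataFrame(np.random.randn(365, 4), columns=list("abcd"))
    store["small"] = pd.DataFrame({"x": [1, 2, 3]})

    keys = sorted(store.keys())  # keys are reported with a leading slash
    has_df = "df" in store       # membership test, as with a dict
    n_items = len(store)         # number of stored objects

    store.close()

print(keys, has_df, n_items)
```

Note that keys() reports each key with a leading slash, which reflects the group path inside the HDF5 file.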
We can access the DataFrame in three ways: with the get() method, with a dict-like lookup, or with dotted access. Let's try this out:
print("Get", store.get('df').shape)
print("Lookup", store['df'].shape)
print("Dotted", store.df.shape)
The shape of the DataFrame is the same for all three access methods:
Get (365, 4)
Lookup (365, 4)
Dotted (365, 4)
We can delete an item in the store by calling the remove() method or with the del operator. An item can be removed only once; trying to remove a key that is no longer in the store raises a KeyError. Delete the DataFrame from the store:
del store['df']
print("After del ", store)
The store is now empty again:
After del <class 'pandas.io.pytables.HDFStore'>
File path: /var/folders/k_/xx_xz6xj0hx627654s3vld440000gn/T/tmpR6j_K5
Empty
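To see what happens when we try to remove the same item twice, consider the following sketch. It assumes that PyTables is installed; the file name demo.h5 and the small DataFrame are hypothetical:

```python
import os
import tempfile

import pandas as pd

# Assumption: PyTables is installed.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "demo.h5")  # hypothetical file name
    store = pd.HDFStore(path)
    store["df"] = pd.DataFrame({"x": [1.0, 2.0]})

    store.remove("df")  # same effect as: del store["df"]

    try:
        store.remove("df")  # the key is already gone
        second_removal = "succeeded"
    except KeyError:
        second_removal = "KeyError"

    remaining = store.keys()
    store.close()

print(second_removal, remaining)
```

If a key may or may not be present, we can guard the removal with a membership test such as 'df' in store.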
The is_open attribute indicates whether the store is open or not. The store can be closed with the close() method. Close the store and check that it is closed:
print("Before close", store.is_open)
store.close()
print("After close", store.is_open)
Once closed, the store is no longer open, as confirmed by the following output:
Before close True
After close False
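As an alternative to calling close() explicitly, HDFStore can be used as a context manager, which closes the store when the with block ends. This is only a sketch under the assumption that PyTables is installed; the file name demo.h5 and the data are hypothetical:

```python
import os
import tempfile

import pandas as pd

# Assumption: PyTables is installed.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "demo.h5")  # hypothetical file name

    # The store is closed automatically when the with block ends.
    with pd.HDFStore(path) as store:
        store["df"] = pd.DataFrame({"x": [1.0, 2.0, 3.0]})
        open_inside = store.is_open
    open_after = store.is_open

    # Reopen the same file read-only to check that the data was written.
    with pd.HDFStore(path, mode="r") as store:
        shape = store["df"].shape

print(open_inside, open_after, shape)
```

The context manager form also closes the store if an exception is raised inside the block, which is harder to guarantee with a manual close() call.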
pandas also provides a DataFrame to_hdf() method and a top-level read_hdf() function to write and read HDF data. Call the to_hdf() method and read the data back:
df.to_hdf(tmpf.name, key='data', format='table')
print(pd.read_hdf(tmpf.name, 'data', where=['index>363']))
The arguments of the reading and writing API are a file path, an identifier for the group in the store, and an optional format string. The format can be either fixed or table. The fixed format is faster, but it does not support appending or searching. The table format corresponds to a PyTables Table structure and allows searching and selection. We get the following values for the query on the DataFrame:
            0         1         2         3
364  0.753342  0.381158  1.289753  0.673181

[1 rows x 4 columns]
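The difference between the two formats can be illustrated with the following sketch. It assumes that PyTables is installed; the file name demo.h5, the keys table_data and fixed_data, and the column labels are hypothetical. The exact exception type raised for a query on a fixed-format group is also an assumption, so the code catches both TypeError and ValueError:

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Assumption: PyTables is installed.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "demo.h5")  # hypothetical file name
    df = pd.DataFrame(np.random.randn(365, 4), columns=list("abcd"))

    # Table format: write the first 200 rows, then append the rest.
    df.iloc[:200].to_hdf(path, key="table_data", format="table")
    df.iloc[200:].to_hdf(path, key="table_data", format="table", append=True)

    # Table format supports queries on the index.
    tail = pd.read_hdf(path, key="table_data", where="index > 363")

    # Fixed format: the group must be read in its entirety.
    df.to_hdf(path, key="fixed_data", format="fixed")
    try:
        pd.read_hdf(path, key="fixed_data", where="index > 363")
        fixed_query = "succeeded"
    except (TypeError, ValueError) as exc:
        # Assumption: the exact exception type may vary between versions.
        fixed_query = type(exc).__name__

print(tail.index.tolist(), tail.shape, fixed_query)
```

In short, fixed is a reasonable choice for data that is written once and always read in full, while table is needed as soon as we want to append rows or select a subset.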
The pd_hdf.py file in this book's code bundle contains the following code:
import numpy as np
import pandas as pd
from tempfile import NamedTemporaryFile

# 365 rows and 4 columns of random data
np.random.seed(42)
a = np.random.randn(365, 4)

# Create a store backed by a temporary file
tmpf = NamedTemporaryFile()
store = pd.io.pytables.HDFStore(tmpf.name)
print(store)

# Store a DataFrame under the key 'df'
df = pd.DataFrame(a)
store['df'] = df
print(store)

# Three ways to access the stored DataFrame
print("Get", store.get('df').shape)
print("Lookup", store['df'].shape)
print("Dotted", store.df.shape)

# Delete the DataFrame from the store
del store['df']
print("After del ", store)

# Close the store
print("Before close", store.is_open)
store.close()
print("After close", store.is_open)

# Write in table format and query on the index when reading
df.to_hdf(tmpf.name, key='data', format='table')
print(pd.read_hdf(tmpf.name, 'data', where=['index>363']))