Reading and writing pandas DataFrames to HDF5 stores

The HDFStore class is the pandas abstraction responsible for dealing with HDF5 data. We will demonstrate this functionality with random data and a temporary file, using the following steps:

Give the HDFStore constructor the path to a temporary file and create a store:

store = pd.io.pytables.HDFStore(tmpf.name)
print store

The preceding code snippet prints the file path of the store and its contents, which are empty at the moment:

<class 'pandas.io.pytables.HDFStore'>
File path: /var/folders/k_/xx_xz6xj0hx627654s3vld440000gn/T/tmpfmwPPB
Empty

HDFStore has a dict-like interface, so we can store a value, such as a pandas DataFrame, under a corresponding lookup key. Store a DataFrame containing random data in the HDFStore as follows:

store['df'] = df
print store

Now the store contains data as illustrated in the following output:

<class 'pandas.io.pytables.HDFStore'>
File path: /var/folders/k_/xx_xz6xj0hx627654s3vld440000gn/T/tmpfwyLIN
/df            frame        (shape->[365,4])
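
Because the interface is dict-like, we can also inspect a store with keys(), the in operator, and len(). The following is a minimal sketch and not part of the book's code bundle. It assumes Python 3 syntax and a recent pandas with PyTables installed, and the file name dict_demo.h5 inside a temporary directory is a hypothetical stand-in for tmpf.name:

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Assumption: a throwaway file in a temporary directory stands in for tmpf.name.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "dict_demo.h5")  # hypothetical file name
    store = pd.HDFStore(path)
    store["df"] = pd.DataFrame(np.random.randn(365, 4))

    keys = store.keys()      # stored keys are reported as absolute paths
    has_df = "df" in store   # membership test by lookup key
    count = len(store)       # number of stored objects

    store.close()

print("Keys", keys)
print("Contains df", has_df)
print("Items", count)
```

The keys are reported with a leading slash, which matches the /df entry in the printed store.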

We can access the DataFrame in three ways: with the get() method, a dict-like lookup, or dotted access. So let's try this out:

print "Get", store.get('df').shape
print "Lookup", store['df'].shape
print "Dotted", store.df.shape

The shape of the DataFrame is the same for all three access methods:

Get (365, 4)
Lookup (365, 4)
Dotted (365, 4)

We can delete an item from the store by calling the remove() method or with the del operator. An item can be removed only once; removing a key that no longer exists raises a KeyError. Delete the DataFrame from the store:

del store['df']
print "After del\n", store

The store is now empty again:

After del
<class 'pandas.io.pytables.HDFStore'>
File path: /var/folders/k_/xx_xz6xj0hx627654s3vld440000gn/T/tmpR6j_K5
Empty
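
The remove() method and the remove-only-once behavior can be sketched as follows. This is a hedged illustration rather than the book's code: it assumes Python 3 syntax and a recent pandas with PyTables installed, and remove_demo.h5 is a hypothetical file name:

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Assumption: a throwaway file in a temporary directory stands in for tmpf.name.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "remove_demo.h5")  # hypothetical file name
    store = pd.HDFStore(path)
    store["df"] = pd.DataFrame(np.random.randn(365, 4))

    # remove() has the same effect as: del store['df']
    store.remove("df")
    removed = "df" not in store

    # A second removal fails because the key no longer exists.
    try:
        store.remove("df")
        second_removal_failed = False
    except KeyError:
        second_removal_failed = True

    store.close()

print("Removed", removed)
print("Second removal failed", second_removal_failed)
```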

The is_open attribute indicates whether the store is open or not. The store can be closed with the close() method. Close the store and check that it is closed:

print "Before close", store.is_open
store.close()
print "After close", store.is_open

The following output confirms that the store is open before the call to close() and closed afterwards:

Before close True
After close False
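
As an alternative to calling close() explicitly, an HDFStore can be used as a context manager, which closes the store when the block ends. The following is a minimal sketch under the same assumptions as before (Python 3 syntax, a recent pandas with PyTables installed, and a hypothetical file name context_demo.h5):

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Assumption: a throwaway file in a temporary directory stands in for tmpf.name.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "context_demo.h5")  # hypothetical file name

    # The store is closed automatically when the with block exits.
    with pd.HDFStore(path) as store:
        store["df"] = pd.DataFrame(np.random.randn(365, 4))
        open_inside = store.is_open

    open_after = store.is_open

print("Inside block", open_inside)
print("After block", open_after)
```

This form also closes the store if an exception is raised inside the block.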

pandas also provides a DataFrame to_hdf() method and a top-level read_hdf() function to write and read HDF data, respectively. Call the to_hdf() method and then read the data back:

df.to_hdf(tmpf.name, 'data', format='table')
print pd.read_hdf(tmpf.name, 'data', where=['index>363'])

The arguments of the reading and writing API are a file path, an identifier for the group in the store, and an optional format string. The format is either fixed or table. The fixed format is faster, but does not support appending or searching. The table format corresponds to a PyTables Table structure and allows searching and selection. Here, the where argument selects the rows with an index greater than 363, so the query on the DataFrame returns the following values:

            0         1         2         3
364  0.753342  0.381158  1.289753  0.673181

[1 rows x 4 columns]
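
The difference between the two formats can be sketched as follows. This is an illustration under stated assumptions, not the book's code: it uses Python 3 syntax and a recent pandas with PyTables installed, the file name format_demo.h5, the key fixed_data, and the string column names are hypothetical, and the exact exception type raised for a query on a fixed-format store may vary between pandas versions:

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Assumption: string column names chosen for this sketch only.
df = pd.DataFrame(np.random.randn(365, 4), columns=list("abcd"))

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "format_demo.h5")  # hypothetical file name

    # Table format: the data can be written in pieces and queried.
    df.iloc[:300].to_hdf(path, key="data", format="table")
    df.iloc[300:].to_hdf(path, key="data", format="table", append=True)
    tail = pd.read_hdf(path, key="data", where="index > 363")

    # Fixed format: the store must be read in its entirety,
    # so passing a where clause raises an error.
    df.to_hdf(path, key="fixed_data", format="fixed")
    try:
        pd.read_hdf(path, key="fixed_data", where="index > 363")
        fixed_query_failed = False
    except (TypeError, ValueError):
        fixed_query_failed = True

print("Rows selected from table", len(tail))
print("Query on fixed format failed", fixed_query_failed)
```

The fixed format remains a reasonable choice when a DataFrame is always written and read as a whole.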

The pd_hdf.py file in this book's code bundle contains the following code:

import numpy as np
import pandas as pd
from tempfile import NamedTemporaryFile

# Generate reproducible random data
np.random.seed(42)
a = np.random.randn(365, 4)

# Create a store backed by a temporary file
tmpf = NamedTemporaryFile()
store = pd.io.pytables.HDFStore(tmpf.name)
print store

# Store a DataFrame under the key 'df'
df = pd.DataFrame(a)
store['df'] = df
print store

# Access the DataFrame in three equivalent ways
print "Get", store.get('df').shape
print "Lookup", store['df'].shape
print "Dotted", store.df.shape

# Delete the DataFrame from the store
del store['df']
print "After del\n", store

# Close the store and check its state
print "Before close", store.is_open
store.close()
print "After close", store.is_open

# Write with to_hdf() and query with read_hdf()
df.to_hdf(tmpf.name, 'data', format='table')
print pd.read_hdf(tmpf.name, 'data', where=['index>363'])