Chapter 5. Retrieving, Processing, and Storing Data

Data can be found everywhere, in all shapes and forms. We can get it from the Web, by e-mail and FTP, or create it ourselves in a lab experiment or marketing poll. An exhaustive overview of how to acquire data in various formats would require many more pages than we have available. Sometimes we need to store data before we can analyze it, or after we are done with our analysis, so this chapter also discusses storing data. Chapter 8, Working with Databases, gives information about various databases (relational and NoSQL) and related APIs. The following is a list of the topics that we are going to cover in this chapter:

Writing CSV files with NumPy and pandas
The binary .npy and pickle formats
Reading and writing to Excel with pandas
JSON
REST web services
Parsing RSS feeds
Scraping the Web
Parsing HTML
Storing data with PyTables
HDF5 pandas I/O

Writing CSV files with NumPy and pandas

In the previous chapters, we learned about reading CSV files. Writing CSV files is just as straightforward, but uses different functions and methods. Let's first generate some data to be stored in the CSV format. Seed the random generator, generate a 3 x 4 NumPy array, and set one of the array values to NaN:

import numpy as np

np.random.seed(42)

a = np.random.randn(3, 4)
a[2][2] = np.nan  # mark one value as missing
print(a)

This code will print the array as follows:

[[ 0.49671415 -0.1382643   0.64768854  1.52302986]
 [-0.23415337 -0.23413696  1.57921282  0.76743473]
 [-0.46947439  0.54256004         nan -0.46572975]]

The NumPy savetxt() function is the counterpart of the NumPy loadtxt() function and can save arrays in delimited file formats such as CSV. Save the array we created with the following function call:

np.savetxt('np.csv', a, fmt='%.2f', delimiter=',', header=" #1,  #2,  #3,  #4")

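In the preceding function call, we specified the name of the file to be saved, the array, an optional format, a delimiter (the default is space), and an optional header. By default, savetxt() prefixes the header with the comment marker "# ", so that loadtxt() skips the line when the file is read back.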

Note

The format parameter is documented at http://docs.python.org/2/library/string.html#format-specification-mini-language.
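As a small illustration of what the format string does, the following sketch applies the same '%.2f' specifier to a few values with Python's % operator, which, as far as we know, is how savetxt() applies fmt to each row. The values are copied from the array printed earlier; the names values, formatted, and padded are only illustrative.

```python
# Illustrative values taken from the array printed above.
values = [0.49671415, -0.1382643, float('nan')]

# '%.2f' means fixed-point notation with two digits after the decimal point.
formatted = ['%.2f' % v for v in values]
print(formatted)  # ['0.50', '-0.14', 'nan']

# A minimum field width can be added: '%8.2f' right-aligns in eight characters.
padded = '%8.2f' % values[0]
print(repr(padded))  # '    0.50'
```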

View the np.csv file we created with the cat command (cat np.csv) or an editor, such as Notepad on Windows. The contents of the file should be displayed as follows:

#  #1,  #2,  #3,  #4
0.50,-0.14,0.65,1.52
-0.23,-0.23,1.58,0.77
-0.47,0.54,nan,-0.47
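To check that the file can be read back, here is a minimal round-trip sketch. It assumes the same seeded array as above and writes to a temporary directory (our choice, not part of the book's example) rather than the working directory; the names path and b are illustrative.

```python
import os
import tempfile

import numpy as np

np.random.seed(42)
a = np.random.randn(3, 4)
a[2][2] = np.nan

# Write to a fresh temporary directory instead of the working directory.
path = os.path.join(tempfile.mkdtemp(), 'np.csv')
np.savetxt(path, a, fmt='%.2f', delimiter=',', header=" #1,  #2,  #3,  #4")

# loadtxt() treats lines starting with '#' as comments by default,
# so the header line is skipped.
b = np.loadtxt(path, delimiter=',')

print(b.shape)            # (3, 4)
print(np.isnan(b[2, 2]))  # True
# Only two decimals were written, so b matches a after rounding.
print(np.allclose(b, np.round(a, 2), equal_nan=True))  # True
```

Note that the round trip is lossy: the '%.2f' format keeps only two decimals, so the loaded array equals the original only after rounding.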

Create a pandas DataFrame from the random values array:

df = pd.DataFrame(a)
print(df)

As you can observe, pandas automatically assigns default integer labels to the columns and rows of our data:

          0         1         2         3
0  0.496714 -0.138264  0.647689  1.523030
1 -0.234153 -0.234137  1.579213  0.767435
2 -0.469474  0.542560       NaN -0.465730

Write a DataFrame to a CSV file with the pandas to_csv() method as follows:

df.to_csv('pd.csv', float_format='%.2f', na_rep="NAN!")

We gave this method the name of the file, an optional format string analogous to the format parameter of the NumPy savetxt() function, and an optional string that represents NaN. By default, the row index is written as the first column, which is why the header line starts with a comma. View the pd.csv file to see the following:

,0,1,2,3
0,0.50,-0.14,0.65,1.52
1,-0.23,-0.23,1.58,0.77
2,-0.47,0.54,NAN!,-0.47
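Reading this file back needs a little care because of the custom NaN marker and the index column. The following is a minimal sketch, assuming the same DataFrame as above; the temporary directory and the name restored are our own choices for illustration.

```python
import os
import tempfile

import numpy as np
import pandas as pd

np.random.seed(42)
a = np.random.randn(3, 4)
a[2][2] = np.nan
df = pd.DataFrame(a)

# Write to a fresh temporary directory instead of the working directory.
path = os.path.join(tempfile.mkdtemp(), 'pd.csv')
df.to_csv(path, float_format='%.2f', na_rep="NAN!")

# index_col=0 restores the row labels from the first column;
# na_values tells the parser to treat our custom marker as missing.
restored = pd.read_csv(path, index_col=0, na_values=["NAN!"])

print(restored)
print(restored.isnull().sum().sum())  # 1
```

The column labels come back as the strings '0' to '3' rather than integers, because read_csv() takes them from the header line of the file.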

Take a look at the code in the writing_csv.py file in this book's code bundle:

import numpy as np
import pandas as pd

np.random.seed(42)

# Generate a 3 x 4 array and mark one value as missing
a = np.random.randn(3, 4)
a[2][2] = np.nan
print(a)

# Write the array to CSV with NumPy
np.savetxt('np.csv', a, fmt='%.2f', delimiter=',', header=" #1,  #2,  #3,  #4")

# Write the same data to CSV with pandas
df = pd.DataFrame(a)
print(df)
df.to_csv('pd.csv', float_format='%.2f', na_rep="NAN!")
