How to do it...

Read in the crime HDF5 dataset, set REPORTED_DATE as the index, and then sort the index to speed up the date-based selections in the rest of the recipe:

>>> import pandas as pd
>>> crime_sort = pd.read_hdf('data/crime.h5', 'crime') \
                   .set_index('REPORTED_DATE') \
                   .sort_index()
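Because data/crime.h5 is not included here, the following is a minimal sketch of the same set_index/sort_index pattern on a small synthetic table. The shuffled timestamps and the OFFENSE_TYPE_ID column are hypothetical stand-ins for the real data. Sorting produces a monotonic DatetimeIndex, which lets pandas locate date-slice boundaries with a binary search (searchsorted) instead of scanning every row.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the crime table: ten reports whose
# timestamps arrive in shuffled order.
rng = np.random.default_rng(0)
dates = pd.date_range('2012-01-02 12:00', periods=10, freq='D')
crime = pd.DataFrame({
    'REPORTED_DATE': dates[rng.permutation(len(dates))],
    'OFFENSE_TYPE_ID': ['theft'] * len(dates),  # hypothetical column
})

# Same chain as the recipe, wrapped in parentheses instead of backslashes.
crime_sort = (crime
              .set_index('REPORTED_DATE')
              .sort_index())

# A sorted DatetimeIndex is monotonic, so date slices can be resolved
# with a binary search rather than a full scan.
print(crime_sort.index.is_monotonic_increasing)
three_days = crime_sort.loc['2012-01-05':'2012-01-07']
print(len(three_days))
```

Parentheses are an alternative to trailing backslashes for splitting a method chain across lines; both are valid Python.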

To count the number of crimes per week, we need to form a group for each week. The resample method takes a DateOffset object or an offset alias string and returns an object that is ready to perform an action on every group. This object is very similar to the one produced by calling the groupby method:

>>> crime_sort.resample('W')
DatetimeIndexResampler [freq=<Week: weekday=6>, axis=0, closed=right, label=right, convention=start, base=0]

The offset alias W tells pandas that we want to group by week. Not much happened in the preceding step: pandas simply validated our offset and returned an object that is ready to perform an action on each week as a group. Several methods can be chained after resample to return data. Let's chain the size method to count the number of weekly crimes:

>>> weekly_crimes = crime_sort.resample('W').size()
>>> weekly_crimes.head()
REPORTED_DATE
2012-01-08     877
2012-01-15    1071
2012-01-22     991
2012-01-29     988
2012-02-05     888
Freq: W-SUN, dtype: int64
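To see that the resample object behaves like a groupby object, here is a sketch on synthetic data: 21 hypothetical daily reports starting Monday, January 2, 2012, with a made-up victims column. Calling resample alone computes nothing; chaining size counts rows per week, while selecting a column and chaining sum aggregates it per week, just as with groupby.

```python
import pandas as pd

# Hypothetical reports: one per day at noon, Monday 2012-01-02 through
# Sunday 2012-01-22 (three full Monday-to-Sunday weeks).
idx = pd.date_range('2012-01-02 12:00', periods=21, freq='D')
reports = pd.DataFrame({'victims': range(21)}, index=idx)  # hypothetical column

resampler = reports.resample('W')   # lazy: no aggregation happens yet

weekly_size = resampler.size()                # rows per week, like groupby().size()
weekly_victims = resampler['victims'].sum()   # per-week column total

print(weekly_size)
print(weekly_victims)
```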

We now have the weekly crime count as a Series whose index increments one week at a time. A few default behaviors are very important to understand. Sunday is chosen as the last day of the week, and it is also the date used to label each element of the resulting Series. For instance, the first index value, January 8, 2012, is a Sunday, so the 877 crimes are those reported during the week ending on the 8th. The following week, Monday, January 9th through Sunday, January 15th, recorded 1,071 crimes. Let's run some sanity checks to make sure that the resampling does exactly this:

>>> len(crime_sort.loc[:'2012-1-8'])
877

>>> len(crime_sort.loc['2012-1-9':'2012-1-15'])
1071
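The same sanity check can be reproduced on a tiny synthetic Series to highlight the boundary behavior. The three timestamps below are hypothetical: a report late on Sunday, January 8, one just after midnight on Monday, January 9, and one on Wednesday, January 11. The late-Sunday report still lands in the week labeled 2012-01-08, because the Sunday-ending bin covers the whole of that Sunday, while the other two fall in the week labeled 2012-01-15.

```python
import pandas as pd

# Hypothetical report times straddling the Sunday/Monday boundary.
times = pd.to_datetime(['2012-01-08 23:30',   # late Sunday
                        '2012-01-09 00:15',   # early Monday
                        '2012-01-11 09:00'])  # Wednesday
s = pd.Series(1, index=times)

weekly = s.resample('W').size()
print(weekly)

# Mirror the recipe's checks: slicing by calendar dates gives the same counts.
first_week = len(s.loc[:'2012-01-08'])
second_week = len(s.loc['2012-01-09':'2012-01-15'])
print(first_week, second_week)
```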

Let's use an anchored offset to end the week on a day other than Sunday:

>>> crime_sort.resample('W-THU').size().head()
REPORTED_DATE
2012-01-05     462
2012-01-12    1116
2012-01-19     924
2012-01-26    1061
2012-02-02     926
Freq: W-THU, dtype: int64
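The following sketch on 14 hypothetical daily reports (Monday, January 2 through Sunday, January 15, 2012) shows how an anchored offset moves the bin edges: every label is a Thursday, so the first bin covers only Monday through Thursday and the last covers only Friday through Sunday. It also checks that the plain W alias is the same as W-SUN.

```python
import pandas as pd

# Hypothetical daily reports at noon, Monday 2012-01-02 .. Sunday 2012-01-15.
s = pd.Series(1, index=pd.date_range('2012-01-02 12:00', periods=14, freq='D'))

thu = s.resample('W-THU').size()
print(thu)                       # bins end (and are labeled) on Thursdays
print(thu.index.day_name())

# The plain 'W' alias is anchored on Sunday, i.e. the same as 'W-SUN'.
same = s.resample('W').size().equals(s.resample('W-SUN').size())
print(same)
```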

Nearly all the functionality of resample can be reproduced with the groupby method. The only difference is that you must pass the offset to a pd.Grouper object:

>>> weekly_crimes_gby = crime_sort.groupby(pd.Grouper(freq='W')) \
                                  .size()
>>> weekly_crimes_gby.head()
REPORTED_DATE
2012-01-08     877
2012-01-15    1071
2012-01-22     991
2012-01-29     988
2012-02-05     888
Freq: W-SUN, dtype: int64

>>> weekly_crimes.equals(weekly_crimes_gby)
True
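Here is a sketch of the same equivalence on synthetic data, using the Thursday anchor from earlier. The 30 daily timestamps and the n column are hypothetical. pd.Grouper also accepts a key argument, which groups by a datetime column instead of the index; this is shown on a copy of the data where the index (named REPORTED_DATE here only to match the recipe) has been moved back into a column.

```python
import pandas as pd

# Hypothetical daily reports at noon for January 2012 (from Monday the 2nd).
idx = pd.date_range('2012-01-02 12:00', periods=30, freq='D')
df = pd.DataFrame({'n': 1}, index=idx)

by_resample = df.resample('W-THU').size()
by_grouper = df.groupby(pd.Grouper(freq='W-THU')).size()

# With key=, Grouper bins a datetime column rather than the index.
as_column = df.rename_axis('REPORTED_DATE').reset_index()
by_column = as_column.groupby(pd.Grouper(key='REPORTED_DATE', freq='W-THU')).size()

print(by_resample.equals(by_grouper), by_resample.equals(by_column))
```

Series.equals compares values and index values but not index names, which is why the column-based result still compares equal.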
