How to do it...

  1. Read in the crime HDF5 dataset, set REPORTED_DATE as the index, and then sort the index to improve performance for the rest of the recipe:
>>> import pandas as pd
>>> crime_sort = pd.read_hdf('data/crime.h5', 'crime') \
...                .set_index('REPORTED_DATE') \
...                .sort_index()
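The effect of sorting can be seen on a small stand-in dataset. The sketch below is only illustrative: the four rows and the OFFENSE column are hypothetical, not taken from the crime file. It shows that sort_index produces a monotonic index, which lets date-string slices select a contiguous block of rows.

```python
import pandas as pd

# Hypothetical stand-in for the crime data: four reports in shuffled order.
crime = pd.DataFrame({
    'REPORTED_DATE': pd.to_datetime(['2012-01-10', '2012-01-03',
                                     '2012-01-15', '2012-01-08']),
    'OFFENSE': ['theft', 'assault', 'burglary', 'theft'],
})

unsorted_demo = crime.set_index('REPORTED_DATE')
sorted_demo = unsorted_demo.sort_index()

# The shuffled index is not monotonic; the sorted one is.
print(unsorted_demo.index.is_monotonic_increasing)
print(sorted_demo.index.is_monotonic_increasing)

# On the sorted index, a date-string slice picks out a contiguous block.
first_week = sorted_demo.loc[:'2012-01-08']
print(len(first_week))
```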
  1. In order to count the number of crimes per week, we need to form a group for each week. The resample method takes a DateOffset object or alias and returns an object ready to perform an action on all groups. The object returned from the resample method is very similar to the object produced after calling the groupby method:
>>> crime_sort.resample('W')
DatetimeIndexResampler [freq=<Week: weekday=6>, axis=0, closed=right, label=right, convention=start, base=0]
  1. The offset alias, W, tells pandas that we want to group by week. Not much happened in the preceding step: pandas simply validated our offset and returned an object that is ready to perform an action on each week as a group. Several methods can be chained after resample to return data. Let's chain the size method to count the number of weekly crimes:
>>> weekly_crimes = crime_sort.resample('W').size()
>>> weekly_crimes.head()
REPORTED_DATE
2012-01-08     877
2012-01-15    1071
2012-01-22     991
2012-01-29     988
2012-02-05     888
Freq: W-SUN, dtype: int64
  1. We now have the weekly crime count as a Series whose index increments one week at a time. Two defaults are important to understand: Sunday is chosen as the last day of the week, and that Sunday is also the date used to label each element in the resulting Series. For instance, the first index value, January 8, 2012, is a Sunday. There were 877 crimes reported during the week ending on the 8th. The week of Monday, January 9th to Sunday, January 15th recorded 1,071 crimes. Let's do some sanity checks to ensure that our resampling is doing exactly this:
>>> len(crime_sort.loc[:'2012-1-8'])
877

>>> len(crime_sort.loc['2012-1-9':'2012-1-15'])
1071
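The same sanity check can be automated for every week rather than just the first two. The following is a minimal sketch on hypothetical data (one made-up report per day for the first 15 days of 2012, with an invented OFFENSE_ID column). It assumes the defaults described above: weeks close on Sunday and are labeled by that Sunday.

```python
import pandas as pd

# Hypothetical data: one report per day, starting Sunday, January 1, 2012.
dates = pd.date_range('2012-01-01', periods=15, freq='D')
demo = pd.DataFrame({'OFFENSE_ID': range(15)}, index=dates)

weekly = demo.resample('W').size()

# Every label should be the Sunday that closes its week.
assert all(ts.day_name() == 'Sunday' for ts in weekly.index)

# A manual Monday-to-Sunday slice should match each resampled count.
for label, count in weekly.items():
    week_start = label - pd.Timedelta(days=6)
    assert len(demo.loc[week_start:label]) == count

print(weekly)
```

Because the data starts on a Sunday, the first bin holds a single day, which makes the right-closed, right-labeled behavior easy to spot.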
  1. Let's use an anchored offset to end the week on a day other than Sunday:
>>> crime_sort.resample('W-THU').size().head()
REPORTED_DATE
2012-01-05     462
2012-01-12    1116
2012-01-19     924
2012-01-26    1061
2012-02-02     926
Freq: W-THU, dtype: int64
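To see how the anchor moves the bin edges, here is a small sketch on hypothetical daily data (the OFFENSE_ID column and the 15-day range are made up for illustration). With W-THU, every label should fall on a Thursday, and the first and last bins may be partial weeks.

```python
import pandas as pd

# Hypothetical data: one report per day for January 1-15, 2012.
dates = pd.date_range('2012-01-01', periods=15, freq='D')
demo = pd.DataFrame({'OFFENSE_ID': range(15)}, index=dates)

# Weeks now run Friday through Thursday and are labeled by the Thursday.
thursday_weeks = demo.resample('W-THU').size()
print(thursday_weeks)

# Collect the distinct weekday names used as labels.
label_days = list(thursday_weeks.index.day_name().unique())
print(label_days)
```

The counts still sum to the total number of rows; only the grouping boundaries change.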
  1. Nearly all the functionality of resample may be reproduced by the groupby method. The only difference is that you must pass the offset in the pd.Grouper object:
>>> weekly_crimes_gby = crime_sort.groupby(pd.Grouper(freq='W')) \
...                               .size()
>>> weekly_crimes_gby.head()
REPORTED_DATE
2012-01-08     877
2012-01-15    1071
2012-01-22     991
2012-01-29     988
2012-02-05     888
Freq: W-SUN, dtype: int64

>>> weekly_crimes.equals(weekly_crimes_gby)
True
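One practical difference is worth sketching: pd.Grouper also accepts a key parameter, so the dates can stay in an ordinary column instead of the index. The example below uses hypothetical data (15 made-up daily reports with an invented OFFENSE_ID column) and is a rough illustration rather than part of the recipe. It compares the two approaches on the same rows.

```python
import pandas as pd

# Hypothetical data: one report per day, indexed by REPORTED_DATE.
dates = pd.date_range('2012-01-01', periods=15, freq='D',
                      name='REPORTED_DATE')
demo = pd.DataFrame({'OFFENSE_ID': range(15)}, index=dates)

# Approach 1: resample on the DatetimeIndex.
via_resample = demo.resample('W').size()

# Approach 2: groupby with a Grouper that points at a date column.
demo_col = demo.reset_index()
via_groupby = demo_col.groupby(
    pd.Grouper(key='REPORTED_DATE', freq='W')).size()

# Both approaches should yield the same weekly labels and counts.
print(via_resample.index.equals(via_groupby.index))
print(via_resample.tolist() == via_groupby.tolist())
```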