How to do it...

  1. Read in the Denver crime dataset, place the REPORTED_DATE in the index, and sort it:
>>> crime_sort = pd.read_hdf('data/crime.h5', 'crime') 
.set_index('REPORTED_DATE')
.sort_index()
  1. Find the last full month of data:
>>> crime_sort.index.max()
Timestamp('2017-09-29 06:16:00')
  1. As we don't quite have all of September's data, let's drop it from our dataset:
>>> crime_sort = crime_sort[:'2017-8']
>>> crime_sort.index.max()
Timestamp('2017-08-31 23:52:00')
  1. Let's count the number of crimes and traffic accidents for every month:
>>> all_data = crime_sort.groupby([pd.Grouper(freq='M'),
'OFFENSE_CATEGORY_ID']).size()
>>> all_data.head()
REPORTED_DATE OFFENSE_CATEGORY_ID 2012-01-31 aggravated-assault 113 all-other-crimes 124 arson 5 auto-theft 275 burglary 343 dtype: int64
  1. Although the merge_asof function can work with the index, it will be easier to just reset it:
>>> all_data = all_data.sort_values().reset_index(name='Total')
>>> all_data.head()
  1. Let's get the current month's crime count and make a new column to represent the goal:
>>> goal = all_data[all_data['REPORTED_DATE'] == '2017-8-31'] 
.reset_index(drop=True)
>>> goal['Total_Goal'] = goal['Total'].mul(.8).astype(int)
>>> goal.head()
  1. Now use the merge_asof function to find the last time a monthly crime total was less than the column Total_Goal for each offense category:
>>> pd.merge_asof(goal, all_data, left_on='Total_Goal',
right_on='Total', by='OFFENSE_CATEGORY_ID',
suffixes=('_Current', '_Last'))

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.139.97.40