How to do it...

Read in the Denver crime dataset, place the REPORTED_DATE in the index, and sort it:

>>> crime_sort = pd.read_hdf('data/crime.h5', 'crime') 
               .set_index('REPORTED_DATE') 
               .sort_index()

Find the last full month of data:

>>> crime_sort.index.max()
Timestamp('2017-09-29 06:16:00')

As we don't quite have all of September's data, let's drop it from our dataset:

>>> crime_sort = crime_sort[:'2017-8']
>>> crime_sort.index.max()
Timestamp('2017-08-31 23:52:00')

Let's count the number of crimes and traffic accidents for every month:

>>> all_data = crime_sort.groupby([pd.Grouper(freq='M'),
                                   'OFFENSE_CATEGORY_ID']).size()
>>> all_data.head()
REPORTED_DATE  OFFENSE_CATEGORY_ID
2012-01-31     aggravated-assault     113
               all-other-crimes       124
               arson                    5
               auto-theft             275
               burglary               343
dtype: int64

Although the merge_asof function can work with the index, it will be easier to just reset it:

>>> all_data = all_data.sort_values().reset_index(name='Total')
>>> all_data.head()

Let's get the current month's crime count and make a new column to represent the goal:

>>> goal = all_data[all_data['REPORTED_DATE'] == '2017-8-31'] 
                   .reset_index(drop=True)
>>> goal['Total_Goal'] = goal['Total'].mul(.8).astype(int)
>>> goal.head()

Now use the merge_asof function to find the last time a monthly crime total was less than the column Total_Goal for each offense category:

>>> pd.merge_asof(goal, all_data, left_on='Total_Goal',
                  right_on='Total', by='OFFENSE_CATEGORY_ID',
                  suffixes=('_Current', '_Last'))

Table of Contents for How to do it...

Create new playlist

Sign In

Sign Up

Table of Contents for
How to do it...