- Read in the Denver crime HDF5 file, place the REPORTED_DATE column in the index, and sort the index:
>>> import pandas as pd
>>> crime_sort = pd.read_hdf('data/crime.h5', 'crime') \
                   .set_index('REPORTED_DATE') \
                   .sort_index()
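For readers without the HDF5 file at hand, the same set-index-then-sort pattern can be sketched on a tiny frame. The `toy` DataFrame and its three rows are hypothetical stand-ins, not the Denver data; only the column names are borrowed from the recipe.

```python
import pandas as pd

# Hypothetical stand-in for the crime data: three rows, deliberately out of order.
toy = pd.DataFrame({
    'REPORTED_DATE': pd.to_datetime(
        ['2016-03-02 14:00', '2014-07-19 08:30', '2015-01-05 23:15']),
    'IS_CRIME': [1, 0, 1],
    'IS_TRAFFIC': [0, 1, 0],
})

# Move the datetime column into the index, then sort chronologically.
toy_sort = toy.set_index('REPORTED_DATE').sort_index()

print(type(toy_sort.index).__name__)
print(toy_sort.index.is_monotonic_increasing)
print(toy_sort)
```

Because REPORTED_DATE already holds datetimes, `set_index` produces a DatetimeIndex directly; no extra conversion step is needed.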
- The DatetimeIndex itself has many of the same attributes and methods as a pandas Timestamp. Let's take a look at some that they have in common:
>>> common_attrs = set(dir(crime_sort.index)) & \
                   set(dir(pd.Timestamp))
>>> print([attr for attr in common_attrs if attr[0] != '_'])
['to_pydatetime', 'normalize', 'day', 'dayofyear', 'freq', 'ceil',
'microsecond', 'tzinfo', 'weekday_name', 'min', 'quarter', 'month',
'tz_convert', 'tz_localize', 'is_month_start', 'nanosecond', 'tz',
'to_datetime', 'dayofweek', 'year', 'date', 'resolution', 'is_quarter_end',
'weekofyear', 'is_quarter_start', 'max', 'is_year_end', 'week', 'round',
'strftime', 'offset', 'second', 'is_leap_year', 'is_year_start',
'is_month_end', 'to_period', 'minute', 'weekday', 'hour', 'freqstr',
'floor', 'time', 'to_julian_date', 'days_in_month', 'daysinmonth']
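To make the overlap concrete, here is a minimal sketch that calls a few of the shared attributes and methods on both a single Timestamp and a DatetimeIndex. The dates are made up for illustration; the point is that the scalar returns one value while the index returns one value per element.

```python
import pandas as pd

ts = pd.Timestamp('2016-03-02 14:45')
idx = pd.DatetimeIndex(['2016-03-02 14:45', '2016-12-31 23:10'])

# Shared attributes: a scalar for the Timestamp, an array-like for the index.
print(ts.year, ts.hour, ts.is_year_end)
print(idx.year, idx.hour, idx.is_year_end)

# Shared methods behave the same way: floor each value down to the hour.
print(ts.floor('h'))
print(idx.floor('h'))
```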
- We can then use the index to find weekday names, similarly to what was done in step 2 of the preceding recipe:
>>> crime_sort.index.weekday_name.value_counts()
Monday       70024
Friday       69621
Wednesday    69538
Thursday     69287
Tuesday      68394
Saturday     58834
Sunday       55213
Name: REPORTED_DATE, dtype: int64
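The `weekday_name` attribute was removed in pandas 1.0; on newer versions the `day_name()` method gives the same labels. The sketch below shows that variant on a small, made-up index (the four dates are illustrative, not taken from the crime data).

```python
import pandas as pd

# Hypothetical dates: two Mondays, one Tuesday, one Sunday.
idx = pd.DatetimeIndex(['2017-01-02', '2017-01-03', '2017-01-09', '2017-01-08'])

# day_name() returns an Index of weekday names; value_counts tallies them.
counts = idx.day_name().value_counts()
print(counts)
```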
- Somewhat surprisingly, the groupby method can accept a function as its grouping argument. pandas calls this function once for each index label (here, each Timestamp) and uses the return values to form the groups. Let's see this in action by grouping with a function that turns each index value into a weekday name and then sums the number of crimes and traffic accidents separately:
>>> crime_sort.groupby(lambda x: x.weekday_name) \
              [['IS_CRIME', 'IS_TRAFFIC']].sum()
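As a rough illustration of grouping by a function of the index, the following sketch uses a hypothetical four-row frame and `day_name()`, which replaces `weekday_name` on pandas 1.0 and later. The function receives each index label as a Timestamp, one at a time.

```python
import pandas as pd

# Hypothetical data: two Mondays, one Tuesday, one Sunday.
toy = pd.DataFrame(
    {'IS_CRIME': [1, 0, 1, 1], 'IS_TRAFFIC': [0, 1, 0, 1]},
    index=pd.DatetimeIndex(['2017-01-02 09:00', '2017-01-03 17:30',
                            '2017-01-09 22:15', '2017-01-08 03:00']),
)

# The lambda is applied to every index label; equal return values share a group.
by_day = toy.groupby(lambda ts: ts.day_name())[['IS_CRIME', 'IS_TRAFFIC']].sum()
print(by_day)
```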
- You can pass a list of functions to group by both the hour of day (rounded to the nearest two hours) and the year, and then reshape the table to make it more readable:
>>> funcs = [lambda x: x.round('2h').hour, lambda x: x.year]
>>> cr_group = crime_sort.groupby(funcs) \
                         [['IS_CRIME', 'IS_TRAFFIC']].sum()
>>> cr_final = cr_group.unstack()
>>> cr_final.style.highlight_max(color='lightgrey')
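The two-function grouping and the reshape can be traced on a small example. This is only a sketch: the `toy` frame and its timestamps are hypothetical, chosen so that none of them falls exactly halfway between two-hour marks.

```python
import pandas as pd

# Hypothetical data spanning two years and a few hours of the day.
toy = pd.DataFrame(
    {'IS_CRIME': [1, 1, 0, 1], 'IS_TRAFFIC': [0, 0, 1, 1]},
    index=pd.DatetimeIndex(['2015-05-01 08:10', '2015-06-11 08:40',
                            '2016-05-01 13:20', '2016-07-04 12:30']),
)

# First function: hour rounded to the nearest 2h; second function: year.
funcs = [lambda ts: ts.round('2h').hour, lambda ts: ts.year]
grouped = toy.groupby(funcs)[['IS_CRIME', 'IS_TRAFFIC']].sum()

# unstack() pivots the innermost index level (year) into the columns.
wide = grouped.unstack()
print(grouped)
print(wide)
```

Hour and year combinations that never occur show up as NaN after the reshape; passing `fill_value=0` to `unstack` is one way to show them as zeros instead.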