How it works...

All DataFrame columns containing Timestamps have access to numerous other attributes and methods with the dt accessor. In fact, all of these methods and attributes available from the dt accessor are also available directly from a single Timestamp object.
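As a minimal sketch of that relationship, the following compares the dt accessor on a Series with the same attributes on one Timestamp. The three dates are made up for illustration; only the REPORTED_DATE name comes from the recipe.

```python
import pandas as pd

# Hypothetical sample data; only the column name mirrors the recipe.
dates = pd.Series(
    pd.to_datetime(["2017-01-02 08:30", "2017-06-15 23:10", "2016-12-31 12:00"]),
    name="REPORTED_DATE",
)

# Through the dt accessor, each attribute is computed for every row
# and comes back as a Series.
years = dates.dt.year
day_of_year = dates.dt.dayofyear

# The same attribute names exist on a single Timestamp and return scalars.
first = dates.iloc[0]
print(years.tolist(), day_of_year.tolist())
print(first.year, first.dayofyear)
```

The accessor is simply the vectorized form: dates.dt.year gives one value per row, while first.year gives one value for one Timestamp.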

In step 2, we use the dt accessor, which only works on a Series, to extract the weekday name and simply count the occurrences. Before making a plot in step 3, we manually rearrange the order of the index with the reindex method, which, in its most basic use case, accepts a list containing the desired order. This task could have also been accomplished with the .loc indexer like this:

>>> wd_counts.loc[days]
Monday       70024
Tuesday      68394
Wednesday    69538
Thursday     69287
Friday       69621
Saturday     58834
Sunday       55213
Name: REPORTED_DATE, dtype: int64

The reindex method is actually more performant than .loc and has many parameters for more diverse situations. We then use the weekday_name attribute of the dt accessor to retrieve the name of each day of the week, and count the occurrences before making a horizontal bar plot. Note that weekday_name was removed in pandas 1.0; in current versions, the dt.day_name() method returns the same names.
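One practical difference between the two approaches shows up when a label is missing. The sketch below uses four made-up dates, so several weekdays never occur, and uses dt.day_name() on the assumption of a current pandas version.

```python
import pandas as pd

# Hypothetical dates; 2017-01-02 is a Monday.
dates = pd.Series(pd.to_datetime(
    ["2017-01-02", "2017-01-02", "2017-01-03", "2017-01-08"]))

# day_name() is the current spelling of the older weekday_name attribute.
wd_counts = dates.dt.day_name().value_counts()

days = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]

# reindex tolerates labels that are absent from the index and can fill them.
ordered = wd_counts.reindex(days, fill_value=0)

# In recent pandas versions, .loc with a list raises KeyError when any label
# is missing, so here it is restricted to the labels that actually exist.
present = [d for d in days if d in wd_counts.index]
ordered_loc = wd_counts.loc[present]

print(ordered)
print(ordered_loc)
```

With the full crime dataset every weekday occurs, so both approaches return the same Series; the fill_value parameter is one of the extra options that reindex offers.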

In step 4, we do a very similar procedure, and retrieve the year using the dt accessor again, and then count the occurrences with the value_counts method. In this instance, we use sort_index over reindex, as years will naturally sort in the desired order.
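A small sketch of why sort_index is enough here, using four invented dates: value_counts orders its result by frequency, and sorting the index restores chronological order.

```python
import pandas as pd

# Hypothetical dates spanning three years.
dates = pd.Series(pd.to_datetime(
    ["2014-03-01", "2012-07-04", "2014-11-20", "2013-01-15"]))

# value_counts orders by frequency, so 2014 (two rows) comes first.
year_counts = dates.dt.year.value_counts()

# The years are integers, so sorting the index gives chronological order.
year_counts_sorted = year_counts.sort_index()
print(year_counts_sorted)
```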

The goal of the recipe is to group by both weekday and year together, and this is exactly what we do in step 5. The groupby method is very flexible and can form groups in multiple ways. In this recipe, we pass it two Series, year and weekday, and each unique combination of their values forms a group. We then chain the size method to it, which returns a single value for each group: its length.
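The grouping can be sketched on a handful of made-up rows. The rename calls are an addition for readability, not something the recipe requires; they give the two index levels distinct names.

```python
import pandas as pd

# Hypothetical data; 2016-01-04 and 2017-01-02 are both Mondays.
crime = pd.DataFrame({"REPORTED_DATE": pd.to_datetime(
    ["2016-01-04", "2016-01-11", "2016-01-05", "2017-01-02", "2017-01-03"])})

weekday = crime["REPORTED_DATE"].dt.day_name()
year = crime["REPORTED_DATE"].dt.year

# Passing a list of two Series groups by every (year, weekday) combination;
# size() then returns the number of rows in each group.
crime_wd_y = crime.groupby(
    [year.rename("year"), weekday.rename("weekday")]).size()
print(crime_wd_y)
```

The result is a Series with a two-level MultiIndex, one entry per combination that actually occurs in the data.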

After step 5, our Series is long with only a single column of data, which makes it difficult to make comparisons by year and weekday. To ease the readability, we pivot the weekday level into horizontal column names with unstack.
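To illustrate the reshaping on its own, the sketch below builds a small MultiIndex Series by hand (the counts are invented) and pivots the weekday level into columns.

```python
import pandas as pd

# Hypothetical counts indexed by (year, weekday), as groupby(...).size() returns.
index = pd.MultiIndex.from_tuples(
    [(2016, "Monday"), (2016, "Tuesday"), (2017, "Monday"), (2017, "Tuesday")],
    names=["year", "weekday"])
counts = pd.Series([2, 1, 1, 1], index=index)

# unstack moves the named index level into the columns, producing a
# DataFrame with one row per year and one column per weekday.
table = counts.unstack("weekday")
print(table)
```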

In step 7, we use boolean indexing to select only the crimes in 2017 and then use dayofyear from the dt accessor again to find the total elapsed days from the beginning of the year. The maximum of this Series should tell us how many days we have data for in 2017.
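A minimal sketch of this step, assuming a tiny made-up DataFrame whose latest 2017 record falls on September 29:

```python
import pandas as pd

# Hypothetical data; the last 2017 record is dated September 29.
crime = pd.DataFrame({"REPORTED_DATE": pd.to_datetime(
    ["2016-12-30", "2017-01-01", "2017-05-10", "2017-09-29"])})

# Boolean Series that is True only for rows reported in 2017.
criteria = crime["REPORTED_DATE"].dt.year == 2017

# Day of the year for the 2017 rows; the maximum is the last day with data.
last_day = crime.loc[criteria, "REPORTED_DATE"].dt.dayofyear.max()
print(last_day)
```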

Step 8 is quite complex. We first create a boolean Series by testing whether each crime was committed on or before the 272nd day of the year with crime['REPORTED_DATE'].dt.dayofyear.le(272). From here, we again use the flexible groupby method to form groups by the previously calculated year Series. Because the mean of a boolean Series is the fraction of True values, the mean method then gives the share of crimes committed on or before the 272nd day for each year.
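The same chain can be tried on a few invented rows. The key point is that grouping a boolean Series and taking the mean yields the fraction of True values per group.

```python
import pandas as pd

# Hypothetical data: two records in 2015 and four in 2016.
crime = pd.DataFrame({"REPORTED_DATE": pd.to_datetime(
    ["2015-03-01", "2015-11-15", "2016-02-10", "2016-06-01",
     "2016-09-01", "2016-12-25"])})

year = crime["REPORTED_DATE"].dt.year

# True where the record falls on or before the 272nd day of its year.
on_or_before = crime["REPORTED_DATE"].dt.dayofyear.le(272)

# Group the boolean Series by year; the mean is the share of True values.
share = on_or_before.groupby(year).mean()
print(share)
```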

The .loc indexer selects the entire 2017 row of data in step 9. We adjust this row by dividing by the median percentage found in step 8.
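As a rough sketch of the adjustment, the example below scales up a partial-year row in a made-up table. Both the counts and the median_share value are hypothetical stand-ins, not the recipe's figures.

```python
import pandas as pd

# Hypothetical weekday counts per year; 2017 covers only part of the year.
crime_table = pd.DataFrame(
    {"Monday": [100, 120, 75], "Tuesday": [90, 110, 60]},
    index=[2015, 2016, 2017])

# Hypothetical stand-in for the median share found in step 8.
median_share = 0.75

# Select the 2017 row with .loc, scale it up to a full-year estimate,
# and write the rounded result back into the same row.
crime_table.loc[2017] = (
    crime_table.loc[2017] / median_share).round().astype(int)
print(crime_table)
```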

Many crime visualizations are done with heatmaps, and one is done here in step 10 with the help of the seaborn visualization library. The cmap parameter takes the string name of one of the several dozen available matplotlib colormaps (http://bit.ly/2yJZOvt).

In step 12, we create a crime rate per 100k residents by dividing by the population of that year. This is actually a fairly tricky operation. Normally, when you divide one DataFrame by another, they align on their columns and index. However, in this step, crime_table has no columns in common with denver_pop, so no values will align if we try to divide them. To work around this, we create the den_100k Series with the squeeze method. We still can't simply divide these two objects because, by default, division between a DataFrame and a Series aligns the columns of the DataFrame with the index of the Series, like this:

>>> crime_table / den_100k

We need the index of the DataFrame to align with the index of the Series, and to do this, we use the div method, which allows us to change the direction of alignment with the axis parameter. A heatmap of the adjusted crime rate is plotted in step 13.
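The difference between the two alignments can be sketched as follows. The counts and population figures are invented; only the names crime_table, denver_pop, and den_100k follow the recipe.

```python
import pandas as pd

# Hypothetical counts: one row per year, one column per weekday.
crime_table = pd.DataFrame(
    {"Monday": [1000, 1200], "Tuesday": [900, 1100]},
    index=[2016, 2017])

# Hypothetical population figures, indexed by year.
denver_pop = pd.DataFrame({"Population": [500000, 600000]},
                          index=[2016, 2017])

# squeeze turns the one-column DataFrame into a Series (units of 100k).
den_100k = denver_pop.div(100000).squeeze()

# Default alignment: DataFrame columns against Series index. No labels
# match, so every value is NaN (pandas may also warn that the mixed
# string and integer labels cannot be sorted).
misaligned = crime_table / den_100k

# axis='index' aligns the DataFrame index with the Series index instead.
rate = crime_table.div(den_100k, axis="index")
print(rate)
```

With axis='index', each row of crime_table is divided by the population value for the same year, which is the per-100k rate the step is after.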
