We have already worked with matplotlib: first in Chapter 3, Functions, and later in Chapter 8, Simulation with Classes and Inheritance. Luckily, pandas has a built-in interface to matplotlib, which makes visualization easy and intuitive. But first, we need to prepare the Notebook to display charts:
%matplotlib inline
import matplotlib.pyplot as plt
plt.style.use('fivethirtyeight')
Now, let's plot a histogram of total casualties. We could do that with matplotlib directly, but pandas has a simple built-in interface. For better or worse, we still have to set the labels and title through the standard matplotlib interface, as follows:
# ~mask keeps the rows that the previously defined mask does not flag
data.loc[~mask, 'killed total'].hist(bins=20, figsize=(10, 10))
plt.suptitle('Histogram, overall casualties per battle')
plt.xlabel("killed")
plt.ylabel("frequency")
plt.tight_layout();
We will get the following result:
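For comparison, here is a minimal sketch of the same kind of histogram drawn with matplotlib directly, without the pandas shortcut. The `killed` series is a hypothetical stand-in for `data.loc[~mask, 'killed total']`, and its values are made up purely for illustration:

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for data.loc[~mask, 'killed total'];
# these numbers are invented for illustration only.
killed = pd.Series(
    [120, 450, 30, 2200, 870, 15, 640, 5100, 310, 95],
    name="killed total",
)

# Create the figure and axes explicitly instead of letting pandas do it
fig, ax = plt.subplots(figsize=(10, 10))

# ax.hist returns the bin counts, the bin edges, and the drawn patches
counts, bin_edges, _ = ax.hist(killed.dropna(), bins=20)

ax.set_title("Histogram, overall casualties per battle")
ax.set_xlabel("killed")
ax.set_ylabel("frequency")
fig.tight_layout()
```

The pandas `.hist()` method wraps roughly these steps, which is why labels and titles are still set through matplotlib in either case.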
We now have a general understanding of the data, but some pieces are still missing. To drill down and understand the properties of significant subsets (for example, the number of events in a period of time, or the average casualties for specific fronts and operations), we need to aggregate our data.
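As a rough preview of what such aggregation can look like, the sketch below groups a tiny, made-up dataframe. The `battles` frame, its `front` and `start` columns, and all of its values are assumptions for illustration, not the actual dataset:

```python
import pandas as pd

# Hypothetical toy data; column names and values are invented
battles = pd.DataFrame({
    "front": ["A", "A", "B", "B", "B"],
    "start": pd.to_datetime([
        "2001-03-01", "2002-07-15", "2003-01-10",
        "2003-05-20", "2003-11-02",
    ]),
    "killed total": [1000, 3000, 500, 700, 300],
})

# Average casualties for each front
avg_by_front = battles.groupby("front")["killed total"].mean()

# Number of events that started in each calendar year
events_per_year = battles.groupby(battles["start"].dt.year).size()

print(avg_by_front)
print(events_per_year)
```

Both results are ordinary pandas Series, so they can be plotted with the same `.plot()` and `.hist()` interface used above.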