How it works...

In step 2, the cut function places each value of the DIST column into one of five bins. The bins are defined by a sequence of six numbers marking the edges; you always need one more edge than the number of bins. Alternatively, you can pass the bins parameter an integer, which automatically creates that number of equal-width bins. NumPy provides negative and positive infinity objects (-np.inf and np.inf), and using them as the outermost edges ensures that every value is placed in a bin. Any value that falls outside the bin edges is set to missing rather than placed in a bin.
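The three binning behaviors described above can be sketched as follows. This is a minimal illustration, not the recipe's own code: the distance values are made up, and the edge at 500 is an assumption (the text only implies edges at 200, 1,000, and 2,000).

```python
import numpy as np
import pandas as pd

# Hypothetical distances in miles, standing in for the DIST column
dist = pd.Series([50, 150, 450, 999, 1500, 2500], name="DIST")

# Six edges define five bins; infinite outer edges catch every value
bins = [-np.inf, 200, 500, 1000, 2000, np.inf]
cuts = pd.cut(dist, bins=bins)

# With finite outer edges, out-of-range values (here 2500) become NaN
bounded = pd.cut(dist, bins=[0, 200, 500, 1000, 2000])

# An integer asks for that many equal-width bins instead
equal_width = pd.cut(dist, bins=5)

print(cuts)
print(bounded.isna().sum())
```

Note that by default each bin includes its right edge and excludes its left edge, so a distance of exactly 200 falls in the first bin.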

The cuts variable is now a Series of five ordered categories. It has all the normal Series methods, and in step 3, the value_counts method is used to get a sense of its distribution.
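As a rough sketch of what this looks like, the snippet below builds a small categorical Series and counts the values in each bin. The distances and the exact bin edges are assumed for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical distances; three of them fall under 200 miles
dist = pd.Series([50, 150, 450, 999, 1500, 2500, 120], name="DIST")
bins = [-np.inf, 200, 500, 1000, 2000, np.inf]
cuts = pd.cut(dist, bins=bins)

# The result is an ordered categorical Series
print(cuts.dtype)
print(cuts.cat.ordered)

# value_counts sorts the bins from most to least frequent
counts = cuts.value_counts()
print(counts)
```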

Notably, the groupby method accepts more than column names. You can also pass it a Series or array whose values define the groups, which means the grouping values do not have to be a column of the current DataFrame. Here, we group by the values in the cuts variable, which pandas aligns with the rows of the DataFrame by index. For each group, we find the percentage of flights per airline with value_counts by setting normalize to True.
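Grouping by an external Series might look like the sketch below. The tiny flights DataFrame, its airline codes, and the bin edges are all hypothetical stand-ins for the recipe's data; observed=True is an added choice here so that empty bins are left out of the result.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the flights dataset
flights = pd.DataFrame({
    "AIRLINE": ["OO", "OO", "AA", "AA", "AA", "OO"],
    "DIST": [120, 150, 1500, 1200, 180, 90],
})

# cuts is a separate Series that shares the DataFrame's index
bins = [-np.inf, 200, 500, 1000, 2000, np.inf]
cuts = pd.cut(flights["DIST"], bins=bins)

# Group by the external Series, then get each airline's share per bin
shares = (
    flights.groupby(cuts, observed=True)["AIRLINE"]
    .value_counts(normalize=True)
)
print(shares)
```

Within each distance bin the shares sum to 1, so they can be read directly as the fraction of flights each airline operates at that range.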

Some interesting insights can be drawn from the full result. SkyWest is the leading airline for flights under 200 miles but has no flights over 2,000 miles. In contrast, American Airlines ranks only fifth for flights under 200 miles but has by far the most flights between 1,000 and 2,000 miles.
