How it works...

Our goal is to determine how membership is distributed among the five largest data science meetup groups in Houston over time. To do this, we need the total membership of each group at every point in time since it began. We have the exact date and time when each person joined each group. In step 2, we group by week (offset alias W) and by meetup group, and use the size method to return the number of sign-ups for each group in each week.
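The weekly grouping can be sketched as follows. This is a minimal illustration, not the recipe's own code: the signups DataFrame, its join_date index, the group column, and the group labels A and B are all made-up stand-ins for the real meetup data.

```python
import pandas as pd

# Hypothetical sign-up log: one row per person joining a group.
# The column names and values are assumptions made for this sketch.
signups = pd.DataFrame(
    {
        "join_date": pd.to_datetime(
            [
                "2017-01-02 09:00",
                "2017-01-03 18:30",
                "2017-01-04 12:00",
                "2017-01-10 08:15",
                "2017-01-11 20:45",
            ]
        ),
        "group": ["A", "A", "B", "A", "B"],
    }
).set_index("join_date")

# Group by calendar week (offset alias W, weeks ending on Sunday) and by
# meetup group, then count the rows in each (week, group) combination.
weekly = signups.groupby([pd.Grouper(freq="W"), "group"]).size()
print(weekly)
```

The result is a Series with a two-level index (week, group). Each week is labeled by the Sunday that ends it.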

The resulting Series is not well suited to plotting with pandas. Each meetup group needs its own column, so we reshape the group index level into columns. We set the fill_value option to zero so that groups with no new sign-ups during a particular week get a zero instead of a missing value.
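One way to do this reshaping is the unstack method, shown here on a small hand-built Series. This is a sketch under assumptions: the index level names join_date and group, and the group labels A and B, are invented for illustration.

```python
import pandas as pd

# Hypothetical weekly sign-up counts with a (week, group) MultiIndex.
# Group B has no entry for the second week.
idx = pd.MultiIndex.from_tuples(
    [
        (pd.Timestamp("2017-01-08"), "A"),
        (pd.Timestamp("2017-01-08"), "B"),
        (pd.Timestamp("2017-01-15"), "A"),
    ],
    names=["join_date", "group"],
)
weekly = pd.Series([2, 1, 1], index=idx)

# Pivot the group level into columns; missing (week, group) pairs become 0
# rather than NaN.
wide = weekly.unstack("group", fill_value=0)
print(wide)
```

Without fill_value, the missing combination would appear as NaN and the integer counts would be converted to floats.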

We need the total number of members each week, and the cumsum method in step 4 provides this. We could create our stacked area plot directly after this step, which would be a nice way to visualize the raw total membership. In step 5, we find each group's share of the total membership across all groups by dividing each value by its row total. When a DataFrame is divided by a Series, pandas by default aligns the Series index with the DataFrame's columns, so we cannot use the division operator here. Instead, we must use the div method, which lets us change the axis of alignment to the index.
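The running total and the row-wise division can be sketched like this. The wide DataFrame and its values are made up for illustration, and the variable names are not taken from the recipe.

```python
import pandas as pd

# Hypothetical weekly sign-ups, one column per meetup group.
wide = pd.DataFrame(
    {"A": [2, 1, 0], "B": [1, 0, 3]},
    index=pd.to_datetime(["2017-01-08", "2017-01-15", "2017-01-22"]),
)

# Running total of members in each group.
total = wide.cumsum()

# Total membership across all groups for each week (one value per row).
row_total = total.sum(axis="columns")

# Divide each row by its own total. axis="index" makes pandas match the
# Series index against the DataFrame's row labels instead of its columns.
pct = total.div(row_total, axis="index")
print(pct)
```

Each row of pct sums to 1, which is the form a 100% stacked area plot expects.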

The data is now well suited to a stacked area plot, which we create in step 6. Notice that pandas allows you to set the axis limits with a datetime string. This will not work if done directly in matplotlib using the ax.set_xlim method. The starting date for the plot is moved forward a couple of years because the Houston R Users group began much earlier than any of the other groups.
