The stack() function

When stacking, a set of column labels gets converted into an index level. To explore stacking further, let's use a DataFrame that has a MultiIndex on both the row index and the column index:

# Sum and mean of Sales and Quantity for every (Category, ShipMode) pair
multi_df = (sales_data[["Sales", "Quantity", "Category", "ShipMode"]]
            .groupby(["Category", "ShipMode"])
            .agg(["sum", "mean"]))
multi_df

The following will be the output:

Hierarchical data for stacking and unstacking
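The sales_data DataFrame is defined earlier in the book and is not reproduced here. The following sketch assumes a tiny, made-up stand-in containing only the four columns used above, so that the shape of multi_df can be checked: two row-index levels (Category, ShipMode) and two column levels (the measure and the statistic). The values are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for sales_data: made-up values, only the four columns used above.
sales_data = pd.DataFrame({
    "Category": ["Furniture", "Furniture", "Technology", "Technology", "Office Supplies"],
    "ShipMode": ["First Class", "Standard Class", "First Class", "First Class", "Standard Class"],
    "Sales": [100.0, 250.0, 400.0, 600.0, 30.0],
    "Quantity": [2, 5, 1, 3, 10],
})

# Passing "sum"/"mean" as strings gives the same column labels as np.sum/np.mean
# and avoids the FutureWarning recent pandas versions emit for NumPy callables in agg().
multi_df = (
    sales_data[["Sales", "Quantity", "Category", "ShipMode"]]
    .groupby(["Category", "ShipMode"])
    .agg(["sum", "mean"])
)

print(list(multi_df.index.names))  # ['Category', 'ShipMode']
print(multi_df.columns.tolist())   # [('Sales', 'sum'), ('Sales', 'mean'), ('Quantity', 'sum'), ('Quantity', 'mean')]
```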

Applying stack() makes a wide DataFrame longer. Let's apply stack() to the preceding DataFrame. The labels of the last (innermost) column level are moved into the row MultiIndex as a new innermost level:

multi_df.stack()

The following will be the output:

Result of stacking
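A minimal, self-contained sketch of the default stack() call: it builds a small hypothetical frame shaped like multi_df (made-up numbers) and checks that the innermost column level (sum/mean) becomes a third row-index level while the outer level (Sales/Quantity) remains as the columns. Row and column order can differ between pandas versions, so the checks use label lookups rather than positions.

```python
import pandas as pd

# Hypothetical frame shaped like multi_df: rows (Category, ShipMode),
# columns (measure, statistic). The numbers are made up.
rows = pd.MultiIndex.from_tuples(
    [("Furniture", "First Class"), ("Technology", "First Class")],
    names=["Category", "ShipMode"],
)
cols = pd.MultiIndex.from_tuples(
    [("Sales", "sum"), ("Sales", "mean"), ("Quantity", "sum"), ("Quantity", "mean")]
)
multi_df = pd.DataFrame(
    [[100.0, 100.0, 2.0, 2.0], [1000.0, 500.0, 4.0, 2.0]], index=rows, columns=cols
)

# Default level=-1: the innermost column level moves into the row index.
stacked = multi_df.stack()

print(stacked.shape)            # (4, 2): 2 rows x 2 statistics, 2 measures left as columns
print(stacked.index.nlevels)    # 3
print(sorted(stacked.columns))  # ['Quantity', 'Sales']
print(stacked.loc[("Technology", "First Class", "sum"), "Sales"])  # 1000.0
```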

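The stack() function accepts a level argument that selects which column level to stack. Its default, level=-1, refers to the innermost column level, which for this two-level column index is level 1. Let's try stacking at level 0 instead: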

multi_df.stack(level=0)

The following will be the output:

Stacking using the level parameter
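The same hypothetical frame can be used to see what level=0 changes: this time the outer column level (Sales/Quantity) moves into the row index, and the inner level (sum/mean) is what remains as the columns. The data is made up for illustration.

```python
import pandas as pd

# Hypothetical frame shaped like multi_df (made-up numbers).
rows = pd.MultiIndex.from_tuples(
    [("Furniture", "First Class"), ("Technology", "First Class")],
    names=["Category", "ShipMode"],
)
cols = pd.MultiIndex.from_tuples(
    [("Sales", "sum"), ("Sales", "mean"), ("Quantity", "sum"), ("Quantity", "mean")]
)
multi_df = pd.DataFrame(
    [[100.0, 100.0, 2.0, 2.0], [1000.0, 500.0, 4.0, 2.0]], index=rows, columns=cols
)

# level=0 stacks the outer column level; the inner level stays as columns.
stacked0 = multi_df.stack(level=0)

print(stacked0.index.nlevels)    # 3
print(sorted(stacked0.columns))  # ['mean', 'sum']
print(stacked0.loc[("Technology", "First Class", "Sales"), "mean"])  # 500.0
```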

Level names can be used instead of level numbers. To stack several levels at once, pass a list of level names or level numbers to the level argument; however, the list cannot mix level names and level numbers:

multi_df.stack(level=[0, 1])

The following will be the output:

Stacking multiple levels at once
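The groupby result above leaves its column levels unnamed, so the sketch below assumes the hypothetical names Measure and Stat in order to compare stacking by name with stacking by number. Because every column level is moved into the row index, no columns remain and pandas returns a Series.

```python
import pandas as pd

rows = pd.MultiIndex.from_tuples(
    [("Furniture", "First Class"), ("Technology", "First Class")],
    names=["Category", "ShipMode"],
)
# "Measure" and "Stat" are hypothetical level names added for this example.
cols = pd.MultiIndex.from_tuples(
    [("Sales", "sum"), ("Sales", "mean"), ("Quantity", "sum"), ("Quantity", "mean")],
    names=["Measure", "Stat"],
)
multi_df = pd.DataFrame(
    [[100.0, 100.0, 2.0, 2.0], [1000.0, 500.0, 4.0, 2.0]], index=rows, columns=cols
)

by_number = multi_df.stack(level=[0, 1])
by_name = multi_df.stack(level=["Measure", "Stat"])

# Both column levels are now row-index levels, so the result is a Series.
print(type(by_number).__name__)     # Series
print(list(by_number.index.names))  # ['Category', 'ShipMode', 'Measure', 'Stat']
print(by_number.equals(by_name))    # True
```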

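Let's examine the index after stacking. The index attribute returns the MultiIndex, which exposes the unique values of each level (levels), the integer positions that map each row to those values (codes, called labels in pandas versions before 0.24), and the name of each level (names):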

multi_df.stack(level=0).index

The following will be the output:

Index properties after stacking
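A small sketch of reading those MultiIndex attributes programmatically, using a hypothetical frame shaped like multi_df with unnamed column levels (as groupby produces). The new innermost level has no name because the stacked column level had none.

```python
import pandas as pd

# Hypothetical frame shaped like multi_df; column levels are unnamed.
rows = pd.MultiIndex.from_tuples(
    [("Furniture", "First Class"), ("Technology", "First Class")],
    names=["Category", "ShipMode"],
)
cols = pd.MultiIndex.from_tuples(
    [("Sales", "sum"), ("Sales", "mean"), ("Quantity", "sum"), ("Quantity", "mean")]
)
multi_df = pd.DataFrame(
    [[100.0, 100.0, 2.0, 2.0], [1000.0, 500.0, 4.0, 2.0]], index=rows, columns=cols
)

idx = multi_df.stack(level=0).index

print(idx.nlevels)                            # 3
print(list(idx.names))                        # ['Category', 'ShipMode', None]
print(sorted(idx.levels[2]))                  # ['Quantity', 'Sales']
print(len(idx.codes))                         # 3: one integer code array per level
print(sorted(set(idx.get_level_values(2))))   # ['Quantity', 'Sales']: per-row labels
```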

Stacking can introduce missing values when a combination of index label and column label has no value. Consider the following DataFrame:

multicol = pd.MultiIndex.from_tuples([('Male', 'M'),
                                      ('Female', 'F')])
missing_info = pd.DataFrame([[20, None], [34, 78]],
                            index=['ClassA', 'ClassB'],
                            columns=multicol)
missing_info

The following will be the output:

Handling missing values when stacking
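To see where the missing values come from, the following sketch compares the (outer, inner) column pairs that actually exist in missing_info with every pair that stacking the inner level can produce. After stacking, each inner label (M, F) becomes a row and each outer label (Male, Female) a column, so pairs such as ('Male', 'F') have no source value and surface as NaN.

```python
import pandas as pd

multicol = pd.MultiIndex.from_tuples([("Male", "M"), ("Female", "F")])
missing_info = pd.DataFrame(
    [[20, None], [34, 78]], index=["ClassA", "ClassB"], columns=multicol
)

# Column pairs present in the data versus every outer x inner combination.
existing = set(missing_info.columns)
possible = {
    (outer, inner)
    for outer in missing_info.columns.levels[0]
    for inner in missing_info.columns.levels[1]
}

print(sorted(possible - existing))      # [('Female', 'M'), ('Male', 'F')]
print(missing_info.isna().sum().sum())  # 1: ClassA has no Female/F value
```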

The stack() function has a dropna parameter, set to True by default, that drops the rows in which every value is missing after stacking. Passing dropna=False keeps those rows:

missing_info.stack(dropna=False)

The following will be the output:

dropna set to False when stacking

By default, stack() drops the rows in which all values are missing, as shown here:

missing_info.stack()

The following will be the output:

Dropping NAs by default when stacking
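To make explicit which row dropna=True removes, the sketch below rebuilds the full stacked layout with reindex() and then drops rows whose values are all missing with DataFrame.dropna(how="all"). It deliberately avoids passing dropna to stack(), because the newer stack() implementation that pandas 2.1 made available through future_stack=True requires dropna to be left unspecified; reindexing gives the same full layout whichever implementation runs.

```python
import pandas as pd

multicol = pd.MultiIndex.from_tuples([("Male", "M"), ("Female", "F")])
missing_info = pd.DataFrame(
    [[20, None], [34, 78]], index=["ClassA", "ClassB"], columns=multicol
)

stacked = missing_info.stack()

# Reindex to every (class, inner label) pair so the all-missing row is present
# whether or not this pandas version's stack() already dropped it.
full_index = pd.MultiIndex.from_product(
    [missing_info.index, missing_info.columns.levels[1]]
)
full = stacked.reindex(full_index)

# Dropping rows where every value is NaN keeps the same rows as dropna=True.
dropped = full.dropna(how="all")

print(len(full))                               # 4
print(full.loc[("ClassA", "F")].isna().all())  # True
print(len(dropped))                            # 3
print(("ClassA", "F") in dropped.index)        # False
```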