Masking

Now, both loc and plain square brackets accept masks. A mask can be a Series, a NumPy array, or a plain list of Boolean values whose length equals the number of rows in the dataframe. When given such a collection, pandas interprets it as a mask: essentially, a specification of which rows to return. For example, we can use our third column, z, as a mask to filter on. Because z holds a True value in only one row, a dataframe of one row will be returned:

>>> df[df['z']]
   x  y     z  new_column
1  2  b  True          -1
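To make the three mask types concrete, here is a minimal sketch showing that a list, a NumPy array, and a Series select the same rows. The dataframe here is a small stand-in we assume for illustration (the names mask_list, mask_array, and mask_series are ours); it is not the exact df built earlier in the chapter.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data, assumed for illustration only.
df = pd.DataFrame({
    "x": [1, 2, 3],
    "y": ["a", "b", "c"],
    "z": [False, True, False],
})

# The same mask expressed as three different collections.
mask_list = [False, True, False]
mask_array = np.array(mask_list)
mask_series = pd.Series(mask_list, index=df.index)

by_list = df[mask_list]            # plain square brackets, list mask
by_array = df[mask_array]          # plain square brackets, array mask
by_series = df.loc[mask_series]    # loc, Series mask

# All three selections contain only the row labeled 1.
print(by_list.equals(by_array) and by_array.equals(by_series))
```

A Series mask is matched to the dataframe by index label, while a list or array is matched by position, which is why the length must equal the number of rows.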

This is a very important technique, which we'll be using all the time! Such a mask can be generated with any comparison or logical operation, for example, an equality check. Take a look: here, we create a mask by checking whether the values in column x are equal to 2:

>>> mask = df['x'] == 2
>>> mask
0    False
1     True
2    False
Name: x, dtype: bool
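Equality is only one option. As a rough sketch, masks can also be combined with the element-wise operators & (and), | (or), and ~ (not). The dataframe below is again an assumed stand-in, and the variable names are ours. Note the parentheses around each comparison: & and | bind more tightly than == or >= in Python.

```python
import pandas as pd

# Hypothetical stand-in data, assumed for illustration only.
df = pd.DataFrame({
    "x": [1, 2, 3],
    "y": ["a", "b", "c"],
    "z": [False, True, False],
})

# Rows where x is at least 2 AND y is not 'c'.
both = (df["x"] >= 2) & (df["y"] != "c")

# Rows where x equals 1 OR z is True.
either = (df["x"] == 1) | df["z"]

# Rows where z is NOT True.
negated = ~df["z"]

print(df[both])
print(df[either])
print(df[negated])
```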

This mask can now be used to filter rows in our dataframe, or in any other dataframe with the same index. Only the second row will be retrieved, as only the second value in the masking series is True:

>>> df.loc[mask, 'y']
1    b
Name: y, dtype: object
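The point about reusing a mask on another dataframe can be sketched as follows. Both dataframes here are hypothetical (other and its score column are names we made up); the only requirement is that they share the same index.

```python
import pandas as pd

# Hypothetical stand-in data, assumed for illustration only.
df = pd.DataFrame({"x": [1, 2, 3], "y": ["a", "b", "c"]})

# A second, made-up dataframe that shares df's index.
other = pd.DataFrame({"score": [10.0, 20.0, 30.0]}, index=df.index)

# Build the mask from df, then apply it to other.
mask = df["x"] == 2
selected = other.loc[mask, "score"]

print(selected)
```

Because the mask is a Series, loc matches it to other by index label rather than by position, so the row labeled 1 is selected.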