Permutation and random sampling

Two more operations are worth knowing: permutation and random sampling. Let's examine how we can perform them using NumPy together with the pandas library.

With NumPy's numpy.random.permutation() function, we can randomly reorder (permute) the rows of a dataframe, or select a random subset of them. Let's understand this with an example:

import numpy as np
import pandas as pd
# Build a 10x8 dataframe holding the integers 0 through 79
dat = np.arange(80).reshape(10, 8)
df = pd.DataFrame(dat)
df

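The output of the preceding code is a dataframe with 10 rows and 8 columns containing the integers 0 through 79 in order, so row 0 holds 0 to 7, row 1 holds 8 to 15, and so on.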

Next, we call the np.random.permutation() function. Given an integer, the length of the axis we want to permute, it returns an array of integers indicating the new ordering:

sampler = np.random.permutation(10)
sampler

The output of the preceding code looks like the following; the exact order varies from run to run because the permutation is random:

array([1, 5, 3, 6, 2, 4, 9, 0, 7, 8])
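Because the ordering is random, rerunning the code produces a different array each time. A minimal sketch of getting a repeatable permutation, assuming NumPy's Generator interface (np.random.default_rng) and an arbitrary seed of 42, neither of which is used in the example above:

```python
import numpy as np

# Assumption: 42 is an arbitrary seed; any fixed integer gives a repeatable result.
rng = np.random.default_rng(42)

# Passing an integer n permutes the integers 0 through n - 1.
sampler = rng.permutation(10)

# Whatever the order, every position from 0 to 9 appears exactly once.
print(sorted(sampler.tolist()))
```

Creating a second generator with the same seed and calling permutation(10) again yields the same ordering, which is useful when results need to be reproduced.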

The preceding output array is then passed to the take() method from the pandas library, which selects rows by integer position (as iloc does). Check the following example for clarification:

df.take(sampler)

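The output of the preceding code is the same dataframe with its rows reordered to match sampler, so the index column now reads 1, 5, 3, 6, 2, 4, 9, 0, 7, 8.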

It is essential that you understand the output. Note that our sampler array contains array([1, 5, 3, 6, 2, 4, 9, 0, 7, 8]). Each item is the zero-based position of a row in the original dataframe. So, from the original dataframe, take() pulls the row at position 1 (the second row), then the row at position 5 (the sixth row), then the row at position 3 (the fourth row), and so on. Compare this with the original dataframe output and it will make more sense.
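To check how the sampler array maps to rows, and to touch on the random sampling half of the topic, here is a small sketch. It hard-codes the sampler array printed earlier so the result is repeatable. The names shuffled, subset, and subset_alt, the sample size of 3, and the seed 0 are illustrative choices, and DataFrame.sample() is a pandas method that the text above does not use.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(80).reshape(10, 8))

# The permutation printed earlier, hard-coded so the example is repeatable.
sampler = np.array([1, 5, 3, 6, 2, 4, 9, 0, 7, 8])

# take() selects rows by integer position and keeps their original labels.
shuffled = df.take(sampler)
print(shuffled.index.tolist())    # [1, 5, 3, 6, 2, 4, 9, 0, 7, 8]
print(shuffled.iloc[0].tolist())  # [8, 9, 10, 11, 12, 13, 14, 15]

# Random sampling without replacement: keep only the first k positions
# of a fresh permutation (k = 3 is an arbitrary choice).
subset = df.take(np.random.permutation(len(df))[:3])

# pandas can also sample directly; random_state=0 is an arbitrary seed.
subset_alt = df.sample(n=3, random_state=0)
```

The first row of shuffled holds 8 through 15, which is the row at position 1 of the original dataframe. Slicing a permutation guarantees that no row is picked twice, which is what sampling without replacement requires.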
