Random undersampling

In this method, you randomly discard samples from the majority class until its size matches that of the minority classes you would like to predict. Because real samples are thrown away, the resulting model can end up over-predicting the underrepresented classes. In the generative paradigm, undersampling can likewise bias the model toward over-representing minority classes.
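To make the idea concrete, here is a minimal NumPy-only sketch of what random undersampling does, assuming the goal is to shrink every class to the size of the smallest one. The helper name `random_undersample` and the toy arrays are hypothetical and exist only for illustration; the library class used later in this section does this work for you.

```python
import numpy as np


def random_undersample(X, y, seed=0):
    """Randomly drop samples so every class matches the smallest class size.

    Hypothetical helper for illustration; not part of any library.
    """
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    n_keep = counts.min()

    kept = []
    for label in classes:
        label_idx = np.flatnonzero(y == label)
        # Sample without replacement so no row is duplicated
        kept.append(rng.choice(label_idx, size=n_keep, replace=False))

    kept = np.sort(np.concatenate(kept))
    return X[kept], y[kept], kept


# Toy data: 2 samples of class 0 and 8 samples of class 1
X_toy = np.arange(20).reshape(10, 2)
y_toy = np.array([0, 0, 1, 1, 1, 1, 1, 1, 1, 1])

X_bal, y_bal, kept_idx = random_undersample(X_toy, y_toy)
print(np.bincount(y_bal))  # [2 2]
```

Both minority samples are always kept, and two of the eight majority samples are chosen at random, so the result is balanced at the cost of discarding six rows.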

We have a simple code example that we will go through in this section:

Import all of the classes needed for random undersampling:

import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from imblearn.under_sampling import RandomUnderSampler

Use scikit-learn to generate a dataset to demonstrate the random undersampling:

# Generate the dataset
# (class weights are assumed to be 0.15/0.85 so that they sum to 1)
X, y = make_classification(n_classes=2, class_sep=2,
                           weights=[0.15, 0.85],
                           n_informative=3, n_redundant=1, flip_y=0,
                           n_features=20, n_clusters_per_class=2,
                           n_samples=1000, random_state=10)
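Before resampling, it can help to confirm how imbalanced the generated labels actually are. The sketch below regenerates a dataset with the recipe's settings, except that the class weights are assumed to be `[0.15, 0.85]` so that they sum to 1, and counts the samples per class with `np.bincount`. The names `X_check`, `y_check`, and `class_counts` are illustrative only.

```python
import numpy as np
from sklearn.datasets import make_classification

# Same settings as the recipe, but with weights assumed to sum to 1
X_check, y_check = make_classification(
    n_classes=2, class_sep=2, weights=[0.15, 0.85],
    n_informative=3, n_redundant=1, flip_y=0,
    n_features=20, n_clusters_per_class=2,
    n_samples=1000, random_state=10)

# Count how many samples fall into each class label
class_counts = np.bincount(y_check)
for label, count in enumerate(class_counts):
    print(f"class {label}: {count} samples ({count / len(y_check):.0%})")
```

Class 0 should come out as the clear minority, which is the imbalance the undersampler is meant to correct.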

Instantiate a Principal Component Analysis (PCA) object, then fit it and transform the data:

pca = PCA(n_components=3)
X_vis = pca.fit_transform(X)

Use the RandomUnderSampler class to resample the data, then project the result with the same fitted PCA:

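# Apply the random under-sampling
# (recent imbalanced-learn releases use fit_resample and sample_indices_
# in place of fit_sample and return_indices)
rus = RandomUnderSampler()
X_resampled, y_resampled = rus.fit_resample(X, y)
idx_resampled = rus.sample_indices_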
X_res_vis = pca.transform(X_resampled)

Create the basic plot for showing the new balanced data:

fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)

idx_samples_removed = np.setdiff1d(np.arange(X_vis.shape[0]),
                                   idx_resampled)

idx_class_0 = y_resampled == 0
plt.scatter(X_res_vis[idx_class_0, 0], X_res_vis[idx_class_0, 1],
            alpha=.8, label='Class #0')
plt.scatter(X_res_vis[~idx_class_0, 0], X_res_vis[~idx_class_0, 1],
            alpha=.8, label='Class #1')
plt.scatter(X_vis[idx_samples_removed, 0], X_vis[idx_samples_removed, 1],
            alpha=.8, label='Removed samples')

Add some additional parameters to clean up the plot:

# Tidy up the axes and spines
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
ax.get_xaxis().tick_bottom()
ax.get_yaxis().tick_left()
ax.spines['left'].set_position(('outward', 10))
ax.spines['bottom'].set_position(('outward', 10))
ax.set_xlim([-6, 6])
ax.set_ylim([-6, 6])

plt.title('Under-sampling using random under-sampling')
plt.legend()
plt.tight_layout()
plt.show()

Create a Dockerfile and install the imbalanced-learn package:

FROM base_image

RUN pip install -U imbalanced-learn

ADD demo.py /demo.py

Create a run file:

#!/bin/bash
nvidia-docker build -t ch2 .

xhost +
docker run -it \
   --runtime=nvidia \
   --rm \
   -e DISPLAY=$DISPLAY \
   -v /tmp/.X11-unix:/tmp/.X11-unix \
   ch2 python demo.py

Run the code by issuing this command at the Terminal:

sudo ./run.sh

Running this code opens a scatter plot showing the two balanced classes alongside the samples that were removed.
