Random undersampling

In this method, you randomly discard samples from the majority class until its size matches that of the minority class or classes you would like to predict. Because the resampled data no longer reflects the original class proportions, a model trained on it can over-predict the underrepresented classes. In the generative paradigm, undersampling can likewise bias the model to over-represent minority classes.
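
To make the idea concrete before turning to a library, here is a minimal sketch of random undersampling written with only the Python standard library. The function name `random_undersample`, the `seed` parameter, and the toy data are assumptions made for illustration; this is not how imbalanced-learn implements it internally.

```python
import random
from collections import Counter


def random_undersample(X, y, seed=0):
    """Randomly drop samples so every class is as small as the rarest class.

    X is a list of feature rows and y the matching list of labels.
    Returns the kept rows, their labels, and the sorted indices that were kept.
    """
    rng = random.Random(seed)
    target = min(Counter(y).values())  # size of the rarest class

    kept = []
    for label in sorted(set(y)):
        indices = [i for i, value in enumerate(y) if value == label]
        # Sample without replacement; the rarest class keeps all its samples.
        kept.extend(rng.sample(indices, target))

    kept.sort()
    return [X[i] for i in kept], [y[i] for i in kept], kept


# Hypothetical toy data: 3 samples of class 0 and 17 samples of class 1.
y = [0] * 3 + [1] * 17
X = [[float(i)] for i in range(len(y))]

X_res, y_res, kept = random_undersample(X, y)
print(Counter(y))      # imbalanced class counts
print(Counter(y_res))  # balanced class counts
```

Note that the discarded majority samples are simply thrown away, so which ones survive depends on the random seed.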

We have a simple code example that we will go through in this section:

  1. Import all of the classes needed for random undersampling:
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from imblearn.under_sampling import RandomUnderSampler
  1. Use scikit-learn to generate a dataset to demonstrate the random undersampling:
# Generate the dataset
X, y = make_classification(n_classes=2, class_sep=2,
                           weights=[0.15, 0.85],  # class proportions; should sum to 1
                           n_informative=3, n_redundant=1, flip_y=0,
                           n_features=20, n_clusters_per_class=2,
                           n_samples=1000, random_state=10)
  1. Instantiate a Principal Component Analysis (PCA) object, then fit it to the data and transform the data for visualization:
pca = PCA(n_components=3)
X_vis = pca.fit_transform(X)
  1. Use the RandomUnderSampler class to resample the data, then project the resampled data with the same fitted PCA:
# Apply the random under-sampling
rus = RandomUnderSampler()
X_resampled, y_resampled = rus.fit_resample(X, y)
# sample_indices_ holds the indices of the original samples that were kept
idx_resampled = rus.sample_indices_
X_res_vis = pca.transform(X_resampled)
  1. Create the basic plot for showing the new balanced data:
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)

idx_samples_removed = np.setdiff1d(np.arange(X_vis.shape[0]),
                                   idx_resampled)

idx_class_0 = y_resampled == 0
plt.scatter(X_res_vis[idx_class_0, 0], X_res_vis[idx_class_0, 1],
            alpha=.8, label='Class #0')
plt.scatter(X_res_vis[~idx_class_0, 0], X_res_vis[~idx_class_0, 1],
            alpha=.8, label='Class #1')
plt.scatter(X_vis[idx_samples_removed, 0], X_vis[idx_samples_removed, 1],
            alpha=.8, label='Removed samples')
  1. Add some additional parameters to clean up the plot:
# make nice plotting
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
ax.get_xaxis().tick_bottom()
ax.get_yaxis().tick_left()
ax.spines['left'].set_position(('outward', 10))
ax.spines['bottom'].set_position(('outward', 10))
ax.set_xlim([-6, 6])
ax.set_ylim([-6, 6])

plt.title('Under-sampling using random under-sampling')
plt.legend()
plt.tight_layout()
plt.show()
  1. Create a Dockerfile that installs the imbalanced-learn package:
FROM base_image

RUN pip install -U imbalanced-learn

ADD demo.py /demo.py
  1. Create a run file named run.sh:
#!/bin/bash
nvidia-docker build -t ch2 .

xhost +
docker run -it \
  --runtime=nvidia \
  --rm \
  -e DISPLAY=$DISPLAY \
  -v /tmp/.X11-unix:/tmp/.X11-unix \
  ch2 python demo.py
  1. Run the code by issuing this command at the Terminal:
sudo ./run.sh
  1. Following are the results from running this code:
