Building datasets from base features

The first step is to build our train, validation, and test datasets from the base features. To do this, we load the previously extracted base features and labels from disk:

import joblib 
import numpy as np 

# Load the base features and labels extracted earlier 
features = joblib.load('base_features.pkl') 
labels = joblib.load('dataset_labels.pkl') 
# Pair each feature array with its label in an (N, 2) object array 
data = np.array(list(zip(features, labels)), dtype=object) 
features.shape, labels.shape 

((30500, 64, 64, 3), (30500,))

We will now randomly shuffle the data and split it at the 60% and 80% marks to create our train, validation, and test datasets (a 60/20/20 split):

np.random.shuffle(data) 
train, validate, test = np.split(data, [int(.6*len(data)),int(.8*len(data))]) 
train.shape, validate.shape, test.shape 

((18300, 2), (6100, 2), (6100, 2))
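
Each row of these arrays is a (feature, label) pair, so the images and labels can be separated again whenever we need them. The following is a minimal sketch assuming the arrays created above; the names train_imgs and train_labels are illustrative and not part of the original code:

# Illustrative only: recover the image and label columns from the stacked pairs 
train_imgs = np.stack(train[:, 0])        # shape (18300, 64, 64, 3) 
train_labels = train[:, 1].astype(int)    # shape (18300,) 
train_imgs.shape, train_labels.shape 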

Finally, we can also check the per-class distribution in each of these datasets using the following snippet:

from collections import Counter 
print('Train:', Counter(item[1] for item in train), 
      '\nValidate:', Counter(item[1] for item in validate), 
      '\nTest:', Counter(item[1] for item in test)) 

Train: Counter({9: 2448, 2: 2423, 0: 2378, 5: 2366, 8: 2140, 7: 2033, 4: 2020, 3: 1753, 1: 542, 6: 197})
Validate: Counter({0: 802, 5: 799, 2: 774, 9: 744, 8: 721, 7: 705, 4: 688, 3: 616, 1: 183, 6: 68})
Test: Counter({0: 813, 9: 808, 2: 750, 8: 750, 5: 745, 7: 735, 4: 697, 3: 543, 1: 188, 6: 71})

Thus, we can see that the per-class distribution of data points is consistent across the train, validation, and test datasets, even though the classes themselves are imbalanced (classes 1 and 6 have far fewer samples than the others).
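
If we want to confirm this numerically, one option is to compare the per-class fractions in each split. This is a quick illustrative check, not part of the original workflow, and it reuses the train, validate, and test arrays from above:

# Illustrative check: per-class fraction of samples in each split 
for name, split in [('Train', train), ('Validate', validate), ('Test', test)]: 
    counts = Counter(item[1] for item in split) 
    print(name, {cls: round(cnt / len(split), 3) for cls, cnt in sorted(counts.items())}) 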
