Building datasets from base features

The first step is to build our train, validation, and test datasets from the base features. To do this, we load the previously extracted base features and labels from disk:

import joblib 
import numpy as np 

# Load the base features and labels extracted earlier 
features = joblib.load('base_features.pkl') 
labels = joblib.load('dataset_labels.pkl') 
# Pair each feature array with its label in an (N, 2) object array 
data = np.array(list(zip(features, labels)), dtype=object) 
features.shape, labels.shape 

((30500, 64, 64, 3), (30500,))

We will now randomly shuffle the data and split it at the 60% and 80% marks to create our train, validation, and test datasets (a 60/20/20 split):

np.random.shuffle(data) 
train, validate, test = np.split(data, [int(.6*len(data)),int(.8*len(data))]) 
train.shape, validate.shape, test.shape 

((18300, 2), (6100, 2), (6100, 2))
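
Each row of these arrays is a (feature, label) pair, so the images and labels can be separated again whenever we need them. The following is a minimal sketch assuming the arrays created above; the names train_imgs and train_labels are illustrative and not part of the original code:

# Illustrative only: recover the image and label columns from the stacked pairs 
train_imgs = np.stack(train[:, 0])        # shape (18300, 64, 64, 3) 
train_labels = train[:, 1].astype(int)    # shape (18300,) 
train_imgs.shape, train_labels.shape 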

Finally, we can also check the per-class distribution in each of these datasets using the following snippet:

from collections import Counter 
print('Train:', Counter(item[1] for item in train), 
      '\nValidate:', Counter(item[1] for item in validate), 
      '\nTest:', Counter(item[1] for item in test)) 

Train: Counter({9: 2448, 2: 2423, 0: 2378, 5: 2366, 8: 2140, 7: 2033, 4: 2020, 3: 1753, 1: 542, 6: 197})
Validate: Counter({0: 802, 5: 799, 2: 774, 9: 744, 8: 721, 7: 705, 4: 688, 3: 616, 1: 183, 6: 68})
Test: Counter({0: 813, 9: 808, 2: 750, 8: 750, 5: 745, 7: 735, 4: 697, 3: 543, 1: 188, 6: 71})

Thus, we can see that the per-class distribution of data points is consistent across the train, validation, and test datasets, even though the classes themselves are imbalanced (classes 1 and 6 have far fewer samples than the others).
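
If we want to confirm this numerically, one option is to compare the per-class fractions in each split. This is a quick illustrative check, not part of the original workflow, and it reuses the train, validate, and test arrays from above:

# Illustrative check: per-class fraction of samples in each split 
for name, split in [('Train', train), ('Validate', validate), ('Test', test)]: 
    counts = Counter(item[1] for item in split) 
    print(name, {cls: round(cnt / len(split), 3) for cls, cnt in sorted(counts.items())}) 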
