Training an image classifier using Extremely Random Forests

We will use Extremely Random Forests (ERFs), also known as extremely randomized trees, to train our image classifier. An object recognition system uses an image classifier to sort images into known categories. ERFs are popular in machine learning because of their speed and accuracy. We construct a set of decision trees from our image signatures, and the trained forest assigns a category by combining the predictions of the individual trees. You can learn more about random forests at https://www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm. You can learn about ERFs at http://www.montefiore.ulg.ac.be/~ernst/uploads/news/id63/extremely-randomized-trees.pdf.
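
As a rough illustration of the idea, the following sketch fits scikit-learn's ExtraTreesClassifier on synthetic data. The two Gaussian clusters, the class indices 0 and 1, and the eight-dimensional vectors are made-up stand-ins for real image signatures; only the classifier settings are taken from this recipe.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier

# Synthetic stand-in for image signatures (an assumption for illustration):
# two well-separated clusters of 8-dimensional vectors
rng = np.random.RandomState(0)
X_class0 = rng.normal(loc=0.0, scale=0.5, size=(20, 8))
X_class1 = rng.normal(loc=3.0, scale=0.5, size=(20, 8))
X = np.vstack([X_class0, X_class1])
y = np.array([0] * 20 + [1] * 20)

# Same settings as the recipe: 100 randomized trees, each at most 16 levels deep
clf = ExtraTreesClassifier(n_estimators=100, max_depth=16, random_state=0)
clf.fit(X, y)

# The fitted forest exposes its individual trees
n_trees = len(clf.estimators_)

# Classify one point at the center of each cluster
predictions = clf.predict(np.array([[0.0] * 8, [3.0] * 8]))
```

Each tree chooses its split thresholds at random, so a single tree is a weak model; the forest becomes useful once the predictions of many trees are combined.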

How to do it…

  1. Create a new Python file, and import the following packages:
    import argparse
    try:
        import cPickle as pickle  # Python 2
    except ImportError:
        import pickle  # Python 3
    
    import numpy as np
    from sklearn.ensemble import ExtraTreesClassifier
    from sklearn import preprocessing
  2. Define an argument parser:
    def build_arg_parser():
        parser = argparse.ArgumentParser(description='Trains the classifier')
        parser.add_argument("--feature-map-file", dest="feature_map_file", required=True,
                help="Input pickle file containing the feature map")
        parser.add_argument("--model-file", dest="model_file", required=False,
                help="Output file where the trained model will be stored")
        return parser
  3. Define a class to handle ERF training. We will use a label encoder to encode our training labels:
    class ERFTrainer(object):
        def __init__(self, X, label_words):
            self.le = preprocessing.LabelEncoder()  
            self.clf = ExtraTreesClassifier(n_estimators=100, max_depth=16, random_state=0)
  4. Encode the labels and train the classifier:
            y = self.encode_labels(label_words)
            self.clf.fit(np.asarray(X), y)
  5. Define a method of the class to encode the labels:
        def encode_labels(self, label_words):
            self.le.fit(label_words)
            return np.array(self.le.transform(label_words), dtype=np.float32)
  6. Define a method of the class to classify an unknown datapoint:
        def classify(self, X):
            label_nums = self.clf.predict(np.asarray(X))
            label_words = self.le.inverse_transform([int(x) for x in label_nums])
            return label_words
  7. Define the main function and parse the input arguments:
    if __name__=='__main__':
        args = build_arg_parser().parse_args()
        feature_map_file = args.feature_map_file
        model_file = args.model_file
  8. Load the feature map that we created in the previous recipe:
        # Load the feature map
        with open(feature_map_file, 'rb') as f:
            feature_map = pickle.load(f)
  9. Extract the feature vectors:
        # Extract feature vectors and the labels
        label_words = [x['object_class'] for x in feature_map]
        dim_size = feature_map[0]['feature_vector'].shape[1]  
        X = [np.reshape(x['feature_vector'], (dim_size,)) for x in feature_map]
  10. Train the ERF on the training data:
        # Train the Extremely Random Forests classifier
        erf = ERFTrainer(X, label_words) 
  11. Save the trained ERF model, as follows:
        if args.model_file:
            with open(args.model_file, 'wb') as f:
                pickle.dump(erf, f)
  12. The full code is given in the trainer.py file that is provided to you. Run it as follows:
    $ python trainer.py --feature-map-file feature_map.pkl --model-file erf.pkl
    

    This will generate a file called erf.pkl. We will use this file in the next recipe.
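
The label encoding and pickling steps can also be tried in isolation. The sketch below is a minimal example under assumed data: the 'car' and 'plane' labels and the two-dimensional feature vectors are hypothetical, and the model is serialized to an in-memory byte string rather than to erf.pkl.

```python
import pickle

import numpy as np
from sklearn import preprocessing
from sklearn.ensemble import ExtraTreesClassifier

# Hypothetical training data: word labels and toy feature vectors
label_words = ['car', 'plane', 'car', 'plane']
X = np.array([[0.0, 0.0], [5.0, 5.0], [0.2, 0.2], [5.2, 5.2]])

# LabelEncoder maps each distinct word to an integer (sorted order)
le = preprocessing.LabelEncoder()
y = le.fit_transform(label_words)

clf = ExtraTreesClassifier(n_estimators=10, random_state=0)
clf.fit(X, y)

# Pickle produces bytes, which is why files holding pickles
# should be opened in binary mode ('wb' to write, 'rb' to read)
blob = pickle.dumps((le, clf))
le_restored, clf_restored = pickle.loads(blob)

# Predict numeric labels with the restored model, then map them back to words
label_nums = clf_restored.predict(np.array([[0.1, 0.1], [5.1, 5.1]]))
predicted = le_restored.inverse_transform(label_nums)
```

Keeping the encoder together with the classifier, as ERFTrainer does, means the word labels can still be recovered after the model is reloaded.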
