Normalizing with Shogun

The shogun::CRescaleFeatures class in the Shogun library implements min-max normalization (or rescaling). We can reuse an object of this class to scale different data with the same learned statistics. This is useful when we train a machine learning algorithm on data with rescaling applied and then use the algorithm for predictions on new data: to make the algorithm work as expected, we have to rescale the new data in the same way as we did during training, as follows:

#include <shogun/base/some.h>
#include <shogun/features/DenseFeatures.h>
#include <shogun/preprocessor/RescaleFeatures.h>
...
auto features = shogun::some<shogun::CDenseFeatures<DataType>>(inputs);
...
auto scaler = shogun::wrap(new shogun::CRescaleFeatures());
scaler->fit(features);       // learn statistics - the min and max values
scaler->transform(features); // apply scaling to the features

To learn the statistics, we use the fit() method, and to modify the features, we use the transform() method of the CRescaleFeatures class.
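Because the scaler object keeps the statistics learned by fit(), we can later apply the same transformation to unseen data without refitting. The following sketch assumes a hypothetical new_inputs matrix that has the same feature layout as the training data:

// new_inputs is a hypothetical SGMatrix<DataType> holding unseen samples,
// laid out the same way as the training data
auto new_features = shogun::some<shogun::CDenseFeatures<DataType>>(new_inputs);
// reuse the min and max values learned from the training data - do not call fit() again
scaler->transform(new_features);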

We can print the updated features with the display_vector() method of the SGVector class, as follows:

auto features_matrix = features->get_feature_matrix();
// n is the number of samples in the dataset
for (int i = 0; i < n; ++i) {
  std::cout << "Sample idx " << i << " ";
  features_matrix.get_column(i).display_vector();
}
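As a quick sanity check (a sketch, not part of the original example), we can verify that the transformed values lie in the [0, 1] range, which is what min-max normalization produces. The snippet below assumes the raw data pointer and size members of SGMatrix and requires the <algorithm> header:

#include <algorithm> // for std::minmax_element
...
// scan all stored feature values and report the overall min and max
auto total = features_matrix.num_rows * features_matrix.num_cols;
auto min_max = std::minmax_element(features_matrix.matrix, features_matrix.matrix + total);
std::cout << "min = " << *min_max.first << ", max = " << *min_max.second << std::endl;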

Some algorithms in the Shogun library can perform normalization of input data as an internal step of their implementation, so we should read the documentation to determine if manual normalization is required.
