Reading CSV files with the Shogun library

The Shogun library can also read CSV files, but it too interprets them as numerical matrices only. So, to load a CSV file as a dataset with the Shogun library, we need to preprocess it and replace string values with numeric ones, as we did in an earlier section. We can then load the CSV file with the Iris dataset into a matrix object and use this matrix to initialize the Shogun dataset objects used by machine learning algorithms. First of all, we need to include the required headers and define the helper types, as follows:

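#include <shogun/base/init.h>
#include <shogun/base/some.h>
#include <shogun/features/DenseFeatures.h>
#include <shogun/io/CSVFile.h>
#include <shogun/io/File.h>
#include <shogun/labels/MulticlassLabels.h>
#include <shogun/lib/SGMatrix.h>

#include <iostream>

using namespace shogun;
using DataType = float64_t;
using Matrix = shogun::SGMatrix<DataType>;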
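
The preprocessing mentioned at the start of this section (replacing string values with numeric ones) can be sketched with the standard library alone. This is a minimal illustration, not the code from the earlier section: the encode_labels name is hypothetical, and the sketch assumes that the class name is the last field on every line, as in the Iris file.

```cpp
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper (not part of Shogun): replaces the trailing class name
// on each CSV line with a numeric index assigned in order of first appearance.
std::string encode_labels(const std::string& csv_text) {
  std::map<std::string, int> class_ids;
  std::istringstream in(csv_text);
  std::ostringstream out;
  std::string line;
  while (std::getline(in, line)) {
    if (line.empty()) continue;  // skip blank lines
    const std::string::size_type pos = line.rfind(',');
    if (pos == std::string::npos) continue;  // no label field on this line
    const std::string name = line.substr(pos + 1);
    // emplace keeps the existing index if this class name was already seen
    const int next_id = static_cast<int>(class_ids.size());
    const int id = class_ids.emplace(name, next_id).first->second;
    out << line.substr(0, pos + 1) << id << '\n';
  }
  return out.str();
}
```

The returned text contains only numeric fields, so it could be written to a file such as iris_fix.csv and passed to a numeric CSV parser.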

Then, we define the shogun::CCSVFile object to parse the dataset file. The initialized shogun::CCSVFile object is used for loading values into a matrix object, as illustrated in the following code snippet:

auto csv_file = shogun::some<shogun::CCSVFile>("iris_fix.csv");
Matrix data;
data.load(csv_file);

To use this data in machine learning algorithms, we need to split this matrix object into two parts: one containing the training samples and the other containing the labels. The Shogun CSV parser loads matrices in column-major order. So, to make the matrix look like the original file, we need to transpose it, as illustrated in the following code snippet:

Matrix::transpose_matrix(data.matrix, data.num_rows, data.num_cols);
Matrix inputs = data.submatrix(0, data.num_cols - 1); // make a view
inputs = inputs.clone(); // copy exact data
Matrix outputs = data.submatrix(data.num_cols - 1, data.num_cols); // make a view
outputs = outputs.clone(); // copy exact data
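
To see why the transpose makes the column slicing work, the following sketch mimics the same steps on a plain buffer. It is not the Shogun API: the ColMajorMatrix type and the transpose and copy_columns functions are hypothetical stand-ins, and the sketch assumes that each line of the file ends up as one matrix column after loading, which is what the text above implies.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a column-major matrix; not shogun::SGMatrix.
struct ColMajorMatrix {
  std::size_t rows;
  std::size_t cols;
  std::vector<double> values;  // element (r, c) lives at values[c * rows + r]

  double at(std::size_t r, std::size_t c) const { return values[c * rows + r]; }
};

// Returns a transposed copy: element (r, c) of the input becomes (c, r).
ColMajorMatrix transpose(const ColMajorMatrix& m) {
  ColMajorMatrix t{m.cols, m.rows, std::vector<double>(m.values.size())};
  for (std::size_t c = 0; c < m.cols; ++c)
    for (std::size_t r = 0; r < m.rows; ++r)
      t.values[r * t.rows + c] = m.values[c * m.rows + r];
  return t;
}

// Copies columns [first, last) into a new matrix. In column-major storage
// a range of columns is one contiguous block of the buffer.
ColMajorMatrix copy_columns(const ColMajorMatrix& m, std::size_t first,
                            std::size_t last) {
  ColMajorMatrix out{m.rows, last - first, std::vector<double>()};
  out.values.assign(m.values.begin() + first * m.rows,
                    m.values.begin() + last * m.rows);
  return out;
}
```

After the transpose, every feature (and the label) occupies its own column, so taking all columns but the last gives the samples and taking the last column gives the labels.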

Now, we have our training data in the inputs matrix object and labels in the outputs matrix object. To be able to use the inputs object in the Shogun algorithms, we need to transpose it back, because Shogun algorithms expect that training samples are placed in matrix columns. To do this, we run the following code:

Matrix::transpose_matrix(inputs.matrix, inputs.num_rows, inputs.num_cols);
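
As a cross-check of the layout that results from these steps, the sketch below splits a column-major buffer directly, without any transposes. It is an illustration only and does not use Shogun: split_features_labels is a hypothetical name, and the sketch assumes that each sample occupies one column and that the label is the last entry of that column.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper, not part of Shogun.
// `values` holds consecutive columns of `num_rows` entries each (column-major);
// every column is one sample and its last entry is the label.
// Returns (features, labels); features keep one sample per column.
std::pair<std::vector<double>, std::vector<double>>
split_features_labels(const std::vector<double>& values, std::size_t num_rows) {
  std::vector<double> features;
  std::vector<double> labels;
  if (num_rows == 0) return std::make_pair(features, labels);
  const std::size_t num_samples = values.size() / num_rows;
  for (std::size_t s = 0; s < num_samples; ++s) {
    const double* column = values.data() + s * num_rows;
    // All entries except the last one are feature values.
    features.insert(features.end(), column, column + num_rows - 1);
    labels.push_back(column[num_rows - 1]);
  }
  return std::make_pair(features, labels);
}
```

The feature buffer produced here has one sample per column, which is the arrangement the text says Shogun algorithms expect.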

We can use these matrices for initializing the shogun::CDenseFeatures and the shogun::CMulticlassLabels objects, which we can eventually use for the training of machine learning algorithms. To do this, we run the following code:

auto features = shogun::some<shogun::CDenseFeatures<DataType>>(inputs);
auto labels =
    shogun::wrap(new shogun::CMulticlassLabels(outputs.get_column(0)));

After initializing these objects, we can print some statistics about the training data, as follows:

std::cout << "samples num = " << features->get_num_vectors() << "\n"
          << "features num = " << features->get_num_features() << std::endl;
auto features_matrix = features->get_feature_matrix();
// Show first 5 samples
for (int i = 0; i < 5; ++i) {
  std::cout << "Sample idx " << i << " ";
  features_matrix.get_column(i).display_vector();
}
std::cout << "labels num = " << labels->get_num_labels() << std::endl;