Reading CSV files with the Shark-ML library

Many machine learning frameworks already provide routines for reading CSV files into their internal representations. The following code sample shows how to load a CSV file into a ClassificationDataset object with the Shark-ML library. The CSV parser in this library assumes that every value in a file is numeric, so it cannot read the original Iris dataset file we used in the previous example. In the previous section, however, we fixed this problem by replacing the string values with numeric ones, so we can use our new file, iris_fix.csv.
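Because the parser expects numeric values only, it can be useful to check a file before loading it. The following is a minimal sketch that uses only the C++ standard library. The helper names isNumericField and firstNonNumericLine are hypothetical and are not part of Shark-ML, and the sketch assumes a plain separator-delimited file without quoted fields or a header row:

```cpp
#include <cstddef>
#include <cstdlib>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper (not part of Shark-ML): returns true when the whole
// field parses as a floating-point number.
bool isNumericField(const std::string& field) {
  if (field.empty()) return false;
  const char* begin = field.c_str();
  char* end = nullptr;
  std::strtod(begin, &end);
  // strtod stops at the first character it cannot consume, so the field is
  // numeric only if the whole string was consumed.
  return end == begin + field.size();
}

// Hypothetical helper: returns the 1-based number of the first line that
// contains a non-numeric field, or 0 when every field is numeric.
std::size_t firstNonNumericLine(std::istream& in, char separator = ',') {
  std::string line;
  std::size_t lineNo = 0;
  while (std::getline(in, line)) {
    ++lineNo;
    // Tolerate Windows line endings.
    if (!line.empty() && line.back() == '\r') line.pop_back();
    if (line.empty()) continue;  // skip blank lines
    std::stringstream fields(line);
    std::string field;
    while (std::getline(fields, field, separator)) {
      if (!isNumericField(field)) return lineNo;
    }
  }
  return 0;
}
```

Passing a std::ifstream opened on the original Iris file to firstNonNumericLine() should report the first line, because its last column holds a class name, whereas a file such as iris_fix.csv is expected to yield 0.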

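To read a CSV file with the Shark-ML library, we have to include the corresponding headers, as follows:

#include <shark/Data/Csv.h>
#include <shark/Data/Dataset.h>
// Standard headers for std::cout and std::vector, used in the snippets below
#include <iostream>
#include <vector>
using namespace shark;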

We can use the importCSV() function to load the CSV data from a file into a ClassificationDataset object. Notice that the function's last argument specifies which column in the dataset contains the labels, as illustrated in the following code snippet:

ClassificationDataset dataset;
importCSV(dataset, "iris_fix.csv", LAST_COLUMN);

Then, we can use this object in the machine learning algorithms provided by the Shark-ML library. We can also print some statistics about the imported dataset, as follows:

// Number of distinct class labels in the dataset
std::size_t classes = numberOfClasses(dataset);
std::cout << "Number of classes " << classes << std::endl;
// Number of samples that belong to each class
std::vector<std::size_t> sizes = classSizes(dataset);
std::cout << "Class sizes: " << std::endl;
for (auto cs : sizes) {
  std::cout << cs << std::endl;
}
// Number of features in each sample
std::size_t dim = inputDimension(dataset);
std::cout << "Input dimension " << dim << std::endl;
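To make the meaning of these statistics concrete, the following sketch computes comparable values directly from CSV text with the standard library alone. It is an illustration only and not how Shark-ML implements these functions. The DatasetSummary type and the summarize function are hypothetical names, and the sketch assumes that the last column holds non-negative integer labels starting at 0 and that every field is numeric:

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical summary type for this sketch; it is not a Shark-ML class.
struct DatasetSummary {
  std::size_t inputDimension = 0;       // feature columns per row
  std::vector<std::size_t> classSizes;  // samples per label, indexed by label
};

// Reads comma-separated rows whose last column is the class label.
// Assumes labels are non-negative integers; std::stod throws on a field
// that does not start with a number.
DatasetSummary summarize(std::istream& in) {
  DatasetSummary summary;
  std::string line;
  while (std::getline(in, line)) {
    if (line.empty()) continue;  // skip blank lines
    std::vector<double> values;
    std::stringstream fields(line);
    std::string field;
    while (std::getline(fields, field, ',')) {
      values.push_back(std::stod(field));
    }
    if (values.empty()) continue;
    // The last value is the label; grow the counter table when a new,
    // larger label appears.
    auto label = static_cast<std::size_t>(values.back());
    if (label >= summary.classSizes.size()) {
      summary.classSizes.resize(label + 1, 0);
    }
    ++summary.classSizes[label];
    // Every column except the label is an input feature.
    summary.inputDimension = values.size() - 1;
  }
  return summary;
}
```

In this sketch, the number of classes corresponds to summary.classSizes.size(), each entry of classSizes counts the rows with that label, and inputDimension is the number of columns left after the label is removed.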