Ensembles with Shark-ML

The Shark-ML library provides only one ensemble learning algorithm, the random forest, and it can be trained only for classification tasks. For this set of samples, we will therefore use the Breast Cancer Wisconsin (Diagnostic) dataset, available at https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+(Diagnostic). It is taken from Dua, D. and Graff, C. (2019) UCI Machine Learning Repository, Irvine, CA: University of California, School of Information and Computer Science: http://archive.ics.uci.edu/ml

There are 569 instances in this dataset, and each instance has 32 attributes: the ID, the diagnosis, and 30 real-valued input features. The diagnosis can take one of two values: M (malignant) or B (benign). The 30 input features are derived from 10 real-valued measurements computed for each cell nucleus (the mean, standard error, and worst value of each measurement), as follows:

  • Radius (mean of the distances from the center to points on the perimeter)
  • Texture (standard deviation of grayscale values)
  • Perimeter
  • Area
  • Smoothness (local variation in radius lengths)
  • Compactness
  • Concavity (severity of concave portions of the contour)
  • Concave points (number of concave portions of the contour)
  • Symmetry
  • Fractal dimension ("coastline approximation" - 1)

This dataset can be used for a binary classification task.
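
To make the record layout described above concrete, the following is a minimal sketch of how one line of the dataset file could be split into an ID, a class label, and the 30 input features. It uses only the C++ standard library and is not part of the Shark-ML API. The names WdbcSample, kNumFeatures, and parse_wdbc_line are hypothetical, and the sketch assumes a comma-separated record in the order ID, diagnosis, features, with an assumed label encoding of 1 for M and 0 for B.

```cpp
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical container for one dataset record (not a Shark-ML type).
struct WdbcSample {
  std::string id;                // first attribute: the sample ID
  unsigned int label = 0;        // assumed encoding: 1 = malignant (M), 0 = benign (B)
  std::vector<double> features;  // the 30 real-valued input features
};

constexpr std::size_t kNumFeatures = 30;

// Parses one comma-separated record: ID, diagnosis, then 30 numeric values.
// Throws std::invalid_argument if the record does not match that layout.
WdbcSample parse_wdbc_line(const std::string& line) {
  std::istringstream in(line);
  std::string field;
  WdbcSample sample;

  if (!std::getline(in, sample.id, ',')) {
    throw std::invalid_argument("missing ID");
  }
  if (!std::getline(in, field, ',')) {
    throw std::invalid_argument("missing diagnosis");
  }
  if (field == "M") {
    sample.label = 1;
  } else if (field == "B") {
    sample.label = 0;
  } else {
    throw std::invalid_argument("unknown diagnosis: " + field);
  }

  // Every remaining field is treated as one numeric feature value.
  while (std::getline(in, field, ',')) {
    sample.features.push_back(std::stod(field));
  }
  if (sample.features.size() != kNumFeatures) {
    throw std::invalid_argument("expected 30 feature values");
  }
  return sample;
}
```

Calling parse_wdbc_line on each line of the data file yields binary labels and fixed-length feature vectors, which is the form a classifier expects. The ID is kept only for bookkeeping and should not be used as an input feature.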