Using decision trees to diagnose breast cancer

Now that we have built our first decision tree, it's time to turn our attention to a real dataset: the Breast Cancer Wisconsin dataset (https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+(Diagnostic)).

This dataset is a direct result of medical imaging research and is considered a classic today. It was created from digitized images of benign (non-cancerous) and malignant (cancerous) tissue samples. Unfortunately, I wasn't able to find any public-domain examples from the original study, but the images look similar to the following screenshot:

The goal of the research was to classify each tissue sample as either benign or malignant (a binary classification task).
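As a quick way to see the two classes, the following sketch loads the copy of this dataset that ships with scikit-learn (`sklearn.datasets.load_breast_cancer`). Using scikit-learn's bundled copy rather than downloading from the UCI page is an assumption made for convenience. The sketch also counts how many samples carry each label.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer

# Load the Wisconsin Diagnostic Breast Cancer data bundled with scikit-learn
data = load_breast_cancer()

# In this copy, label 0 means 'malignant' and label 1 means 'benign'
print(data.target_names)

# Count how many samples belong to each of the two classes
labels, counts = np.unique(data.target, return_counts=True)
for label, count in zip(labels, counts):
    print(data.target_names[label], count)
```

scikit-learn's documentation for this dataset lists 212 malignant and 357 benign samples, so the classes are somewhat imbalanced. That is worth keeping in mind when judging a classifier's accuracy.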

To make the classification task feasible, the researchers performed feature extraction on the images, as we did in Chapter 4, Representing Data and Engineering Features. They went through a total of 569 images and extracted 30 different features that described the characteristics of the cell nuclei present in the images, including the following:

  • Cell nucleus texture (represented by the standard deviation of the grayscale values)
  • Cell nucleus size (calculated as the mean of distances from the center to points on the perimeter)
  • Cell nucleus smoothness (local variation in radius lengths)
  • Cell nucleus compactness
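To connect the list above to actual columns, here is a minimal sketch that again assumes scikit-learn's bundled copy of the data. It inspects the feature matrix and picks out the columns matching the bullet points. The scikit-learn column names (for example `"mean radius"`) come from that copy and are not taken from the original study. According to the dataset's documentation, the 30 features are 10 nucleus measurements, each summarized three ways: the mean, the standard error, and the "worst" value (the mean of the three largest values).

```python
from sklearn.datasets import load_breast_cancer

data = load_breast_cancer()
X = data.data  # one row per image, one column per extracted feature
feature_names = list(data.feature_names)

print(X.shape)  # 569 images x 30 features

# The 10 underlying measurements, taken from the "mean ..." columns
measurements = [name for name in feature_names if name.startswith("mean ")]
print(measurements)

# Columns corresponding to texture, size (radius), smoothness, and compactness
for name in ["mean texture", "mean radius", "mean smoothness", "mean compactness"]:
    col = feature_names.index(name)
    print(f"{name}: min={X[:, col].min():.3f}, max={X[:, col].max():.3f}")
```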

Let's put our newly gained knowledge to good use and build a decision tree to do the classification!
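One way this could look is the minimal sketch below, which uses scikit-learn's `DecisionTreeClassifier` on a held-out split. The library choice, the 80/20 stratified split, `max_depth=3`, and `random_state=42` are all illustrative assumptions and not necessarily the settings used in the rest of this chapter.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

data = load_breast_cancer()

# Hold out 20% of the samples for testing, preserving the class ratio
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.2, random_state=42, stratify=data.target
)

# A shallow tree: max_depth=3 is an arbitrary choice that keeps it readable
tree = DecisionTreeClassifier(max_depth=3, random_state=42)
tree.fit(X_train, y_train)

train_acc = tree.score(X_train, y_train)
test_acc = tree.score(X_test, y_test)
print(f"train accuracy: {train_acc:.3f}")
print(f"test accuracy:  {test_acc:.3f}")
```

Comparing training and test accuracy is a simple first check for overfitting. A large gap would suggest the tree has memorized the training samples rather than learned a rule that generalizes.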
