Breast cancer detection

The breast is made up of a set of glands and adipose tissue, and lies between the skin and the chest wall. It is not a single gland but a set of glandular structures, called lobules, which are joined together to form lobes. Each breast has 15 to 20 lobes. Milk travels from the lobules to the nipple through small tubes called milk ducts.

Breast cancer is a potentially serious disease if it goes undetected and untreated for a long time. It is caused by the uncontrolled multiplication of cells in the mammary gland that have transformed into malignant cells. Such cells can detach from the tissue in which they originated, invade the surrounding tissues, and eventually spread to other organs of the body. In theory, cancer can arise from any type of breast tissue, but the most common forms originate in glandular cells or in the cells that form the walls of the ducts.

The objective of this example is to classify each sample as benign or malignant. To do this, we will use the Breast Cancer dataset (Wisconsin Breast Cancer database). The data comes from the UCI Machine Learning Repository; samples arrive periodically as Dr. Wolberg reports his clinical cases, so the database reflects this chronological grouping of the data. The grouping information itself has been removed from the data. Each variable except the first was converted into 11 primitive numerical attributes, with values ranging from 0 through 10.

To get the data, we draw on the large collection of data available in the UCI Machine Learning Repository at: http://archive.ics.uci.edu/ml.

The data frame contains 699 observations of 11 variables:

  • Id: Sample code number

  • Cl.thickness: Clump thickness

  • Cell.size: Uniformity of cell size

  • Cell.shape: Uniformity of cell shape

  • Marg.adhesion: Marginal adhesion

  • Epith.c.size: Single epithelial cell size

  • Bare.nuclei: Bare nuclei

  • Bl.cromatin: Bland chromatin

  • Normal.nucleoli: Normal nucleoli

  • Mitoses: Mitoses

  • Class: Class (0 = benign, 1 = malignant)
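The column layout above can be sketched in code. The following is a minimal illustration in Python, using only the standard library, of how comma-separated rows might be mapped onto these 11 variables. Several things here are assumptions rather than facts from the text: the helper name parse_rows is hypothetical, the raw file is assumed to be comma-separated with columns in the order listed, missing values are assumed to be marked with "?", and the raw class codes are assumed to be 2 and 4, recoded to the 0/1 convention shown in the list. The two sample rows are made up for illustration and are not real records.

```python
import csv
import io

# Column names as listed above; the order is assumed to match the raw file.
COLUMNS = [
    "Id", "Cl.thickness", "Cell.size", "Cell.shape", "Marg.adhesion",
    "Epith.c.size", "Bare.nuclei", "Bl.cromatin", "Normal.nucleoli",
    "Mitoses", "Class",
]


def parse_rows(text, missing="?", class_codes=None):
    """Parse comma-separated text into a list of dicts keyed by COLUMNS.

    Assumptions: missing values are marked with `missing`, and the raw
    class codes in `class_codes` are recoded to 0 (benign) / 1 (malignant).
    """
    if class_codes is None:
        class_codes = {"2": 0, "4": 1}  # assumed raw coding
    rows = []
    for fields in csv.reader(io.StringIO(text)):
        if not fields:
            continue  # skip blank lines
        if len(fields) != len(COLUMNS):
            raise ValueError(f"expected {len(COLUMNS)} fields, got {len(fields)}")
        row = {}
        for name, value in zip(COLUMNS, fields):
            value = value.strip()
            if name == "Id":
                row[name] = value  # keep the sample code as a string
            elif name == "Class":
                row[name] = class_codes[value]
            else:
                # A missing measurement becomes None; everything else is an int.
                row[name] = None if value == missing else int(value)
        rows.append(row)
    return rows


# Made-up sample rows for illustration only (not real records).
SAMPLE = "1001,5,1,1,1,2,1,3,1,1,2\n1002,8,10,10,8,7,?,9,7,1,4\n"

rows = parse_rows(SAMPLE)
# Keep only rows without missing measurements.
complete = [r for r in rows if None not in r.values()]
print(len(rows), "rows parsed,", len(complete), "complete")
```

Dropping incomplete rows is only one way to handle missing measurements; imputing a value is another, and the choice depends on how many rows are affected.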

As we said before, the essential steps of a project are listed here:

  1. Get the data
  2. Prepare the data
  3. Train the model
  4. Score and evaluate the model

So the first thing to do is retrieve the data.
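The four steps listed above can be sketched end to end. What follows is a minimal illustration in plain Python, not the model this chapter builds: a single sigmoid unit trained by stochastic gradient descent stands in for the network, the function names (get_data, prepare, train, evaluate) are hypothetical, and the twelve three-feature samples are made-up toy values rather than records from the dataset.

```python
import math
import random


def get_data():
    """Step 1: return (features, label) pairs. Made-up toy values, not real records."""
    benign = [([1, 1, 2], 0), ([2, 1, 1], 0), ([3, 2, 1], 0),
              ([1, 2, 2], 0), ([2, 2, 1], 0), ([1, 1, 1], 0)]
    malignant = [([8, 9, 7], 1), ([10, 8, 9], 1), ([7, 7, 8], 1),
                 ([9, 10, 10], 1), ([8, 7, 9], 1), ([10, 10, 8], 1)]
    return benign + malignant


def prepare(data, test_fraction=0.25, seed=0):
    """Step 2: divide features by 10 and split into train and test sets."""
    scaled = [([v / 10.0 for v in x], y) for x, y in data]
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    rng.shuffle(scaled)
    n_test = max(1, int(len(scaled) * test_fraction))
    return scaled[n_test:], scaled[:n_test]


def predict_proba(weights, bias, x):
    """Sigmoid of the weighted sum: estimated probability of class 1."""
    z = bias + sum(w * v for w, v in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))


def train(train_set, epochs=500, lr=0.5):
    """Step 3: fit a single sigmoid unit by stochastic gradient descent on log loss."""
    n_features = len(train_set[0][0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        for x, y in train_set:
            error = predict_proba(weights, bias, x) - y
            weights = [w - lr * error * v for w, v in zip(weights, x)]
            bias -= lr * error
    return weights, bias


def evaluate(model, test_set):
    """Step 4: fraction of test samples classified correctly at a 0.5 threshold."""
    weights, bias = model
    correct = sum(
        (predict_proba(weights, bias, x) >= 0.5) == bool(y) for x, y in test_set
    )
    return correct / len(test_set)


train_set, test_set = prepare(get_data())
model = train(train_set)
accuracy = evaluate(model, test_set)
print(f"test accuracy: {accuracy:.2f}")
```

Keeping each step in its own function means the data source or the model can be swapped without touching the rest of the pipeline. Accuracy on a held-out test set is used here only because it is the simplest score; for a medical classification task, measures that separate false negatives from false positives are usually more informative.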
