The Doctor Will See You Now

So far, we have used deep networks for image, text, and time series processing. While most of our examples were interesting and relevant, they weren't enterprise-grade. Now, we'll tackle an enterprise-grade problem: medical diagnosis. We call it enterprise-grade because medical data has attributes one does not typically deal with outside large enterprises, namely proprietary data formats, large native sizes, inconvenient class data, and atypical features.

In this chapter, we will cover the following topics:

  • Medical imaging files and their peculiarities
  • Dealing with large image files
  • Extracting class data from typical medical files
  • Applying networks "pre-trained" with non-medical data
  • Scaling training to accommodate the data volumes typical of medical data

Obtaining medical data is a challenge on its own, so we'll piggyback on a popular site all readers should become familiar with: Kaggle. While a good number of medical datasets are freely available, most require an involved sign-up process to even access them. Many are only publicized in specific sub-communities of the medical image processing field, and most have bespoke submission procedures. Kaggle is probably the most standardized source for a significant medical imaging dataset, as well as for non-medical ones you can try your hand at. We'll focus specifically on Kaggle's Diabetic Retinopathy Detection challenge.

The dataset has a training set and a blind test set. The training set is, of course, used to train our network. The test set is used to generate predictions from the trained network, which we then submit on the Kaggle website.

As the data is quite large (32 GB for the training set and 49 GB for the test set), each set is divided into multiple ZIP files of about 8 GB each.

The test set here is blind: we don't know its labels. This keeps the submissions of test set results from trained networks fair.

As far as the training set goes, its labels are present in the trainLabels.csv file.
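As a quick sanity check before any training, it can help to read the labels file and see how the classes are distributed. The following is a minimal sketch, not the book's own code. It assumes that trainLabels.csv has a header row with two columns named image and level, and that level is an integer class label; check these names against your copy of the file. The helper names load_labels and class_counts are made up for illustration.

```python
import csv
from collections import Counter


def load_labels(lines):
    """Map each image name to its integer class label.

    `lines` is any iterable of CSV text lines, such as an open file.
    ASSUMPTION: the header row names the columns "image" and "level".
    """
    reader = csv.DictReader(lines)
    return {row["image"]: int(row["level"]) for row in reader}


def class_counts(labels):
    """Count how many images fall into each class label."""
    return Counter(labels.values())


if __name__ == "__main__":
    # ASSUMPTION: trainLabels.csv sits in the current working directory.
    with open("trainLabels.csv", newline="") as f:
        labels = load_labels(f)
    for level, count in sorted(class_counts(labels).items()):
        print(f"class {level}: {count} images")
```

If the counts turn out to be heavily skewed toward one class, that is worth knowing early, since it affects how you split the data and how you read accuracy numbers later.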
