Visualizing the human body using imaging studies is a vital part of a patient’s care journey. The generation, storage, and interpretation of these images are widely known as medical imaging. There are multiple types of medical images that address different diagnostic needs of patients. Some of the common types include X-ray, ultrasound, computerized tomography (CT), and magnetic resonance imaging (MRI). Each of these types of imaging studies needs specific equipment, lab setup, and trained professionals who can operate the equipment. Radiologists are medical professionals who specialize in diagnosing medical conditions using medical images. A radiologist also performs imaging studies using specialized equipment and is able to generate and interpret radiology reports. The rate of growth in imaging studies has continued to rise since the early 2000s. However, since these studies are costly, it is important to exercise caution when prescribing them and they should only be performed when really necessary. Moreover, ordering unnecessary imaging studies will add to individuals’ workloads, and may cause radiologists to get burned out.
ML can help radiologists become more efficient in the interpretation of medical images. Computer vision models can interpret the information in medical images and support radiologists by helping them make clinical decisions faster and more accurately. In addition, radiology equipment such as MRI machines and CT scanners are also getting smarter through the use of ML. Models embedded directly onto these devices allow them to perform better and triage image quality locally even before a radiologist looks at it. This pattern is common in a large category of medical devices, such as glucometers and electrocardiogram (ECG), that utilize the Internet of Things (IoT) to run ML-based inference either locally or on data aggregated from these devices in a central data store.
In this chapter, we will get an understanding of medical devices and how smart connected devices are revolutionizing healthcare. We will then dive into the details of the medical imaging system components. We will look at the different components in the workflow and understand the role that each of those components plays in the overall workflow. We will also look at how ML can optimize medical device workflows. Finally, we will use Amazon SageMaker to build a medical image classification model. These topics are shown in the following sections:
The following are the technical requirements that you need to complete before building the example implementation at the end of this chapter:
Note
At the step where you need to choose the notebook instance type, please select ml.p2.xlarge.
$cd Sagemaker
$git clone https://github.com/PacktPublishing/Applied-Machine-Learning-for-Healthcare-and-Life-Sciences-using-AWS.git
You should now see a folder named Applied-Machine-Learning-for-Healthcare-and-Life-Sciences-using-AWS.
Note
If you have already cloned the repository in a previous exercise, you should already have this folder. You do not need to do step 6 again.
Once you have completed these steps, you should be all set to execute the steps in the example implementation in the last section of this chapter.
The medical device industry comprises a wide range of organizations that do research, development manufacturing, and distribute devices and associated technologies that help prevent, diagnose, and treat medical conditions. It’s important to note that the term medical device has wide applicability. It includes things you normally get over the counter, such as a Band-Aid or a testing kit, but it also includes highly specialized instruments, such as CT scanners and respirators. The key differentiator for a medical device is that it achieves its purpose due to its physical structure instead of a chemical reaction.
The US medical devices industry accounts for billions of dollars in exports, which has continued to rise over the years. In the US, the Food and Drug Administration (FDA) Center for Medical Devices and Radiological Health (CDRH) is the federal regulating agency that authorizes the use of medical devices. It also monitors adverse events associated with medical device use and alerts consumers and medical professionals as needed. The process of regulating a medical device depends on its classification. Let’s look at the different classes of medical devices as defined by the FDA.
Due to the wide applicability of the term, medical devices are categorized into different classes based on the risk they pose. There are three classes for a medical device:
All medical devices must be registered and listed with the FDA. To make it easy for consumers to find the appropriate classification of a medical device, the FDA provides a classification database (https://www.accessdata.fda.gov/scripts/cdrh/cfdocs/cfpcd/classification.cfm) to search using the part number or the name of the device. To learn more about the classification of a medical device, visit the following link: https://www.fda.gov/medical-devices/overview-device-regulation/classify-your-medical-device.
In the US, the CDRH determines the approval process for the medical device, depending on the classification it belongs to. The process may include steps such as the submission of premarket approval and clinical evidence via a clinical trial that has evidence that the device is safe to be used on humans.
Let us now look at a special category of medical devices that involve computer software, also known as software as a medical device (SaMD).
Software forms an integral part of a modern medical device. The software could be a part of the medical device. It may also be related to the support, maintenance, or manufacturing of the medical device itself. A third category of software related to medical devices is known as SaMD. SaMD is a type of software intended to be used for medical purposes without being part of the hardware of the medical device. It provides a wide-range of applications for medical technology, from off-the-self solutions to custom-built solutions for specific scenarios.
The FDA has been working over the last few years to develop or modernize regulatory standards and the approval process to cater to the needs of the digital health market enabled via the use of SaMD. In 2017, the FDA released new guidelines for evaluating SaMD applications that propose an accelerated review of digital health products. The revised framework for the approval of SaMD proposes a regulatory development kit (RDK) containing tools to clarify the requirements at each stage of the approval process and provide templates and guidance for frequently asked questions. It also proposes a precertification program to accelerate the approval of SaMD for organizations that meet certain criteria of excellence. To learn more about the clinical evaluation of SaMDs, visit the following link: https://www.fda.gov/regulatory-information/search-fda-guidance-documents/software-medical-device-samd-clinical-evaluation.
In this section, we were introduced to medical devices and how they are regulated for use. One of the common uses of medical devices is in the radiology department. This department is responsible for generating, storing, and interpreting different modalities of radiology images. Let’s understand the different components of a radiology imaging system and the role they play in the imaging workflow.
The need for medical imaging in a patient’s care journey depends on the specialty that is treating them. For example, a cardiology department may need a medical image-driven diagnosis at a different point in time in patient care from the gastroenterology department. This makes standardizing the medical imaging workflow really difficult. However, there are some standard components that are common across all medical imaging systems. These components work in sync with each other to create a workflow specific to the specialty in question. Let us look at these components in more detail:
Now that we understand the key components of the medical imaging system and the various associated medical devices, let us dive into how ML can be applied to such workflows.
Unlike traditional software, ML models evolve over time and improve as they interact with real-world data. Also, due to the probabilistic nature of the models, it is likely that the output of these models will change as the statistics behind the data shift. This poses a challenge in applying these models for regulated medical workflows because the medical decision-making process needs to be consistent and supported by the same evidence over and over again. Moreover, the results of an ML model aiding in a clinical decision-making process need to be explainable. In other words, we cannot treat the model as a “black box”; we need to understand its inner workings and explain its behavior in specific scenarios.
In spite of these challenges, the FDA recognizes that AI/ML has the potential to transform healthcare due to its ability to derive insights from vast amounts of data generated in healthcare practice every day. In 2019, the FDA published a discussion paper (https://www.fda.gov/media/122535/download) that proposes a regulatory framework for modifications to AI/ML-based SaMD. This is a welcome initiative as it recognizes the need for modernization in medical devices and associated software. Let us now look at a few examples of how ML can be applied to medical devices and radiology imaging:
Now that we have an understanding of the medical device industry, its workflows, and the different components of a radiology imaging system, let us now use SageMaker training to train an image classification model for diagnosing pneumonia. Before we do that, we will revise the basics of SageMaker training so we can better understand the implementation steps.
In Chapter 5, you learned about SageMaker Studio and how to process data and train a model using SageMaker Data Wrangler and SageMaker Studio notebooks, respectively. Studio notebooks are great for experimentation on smaller datasets and testing your training scripts locally before running them on the full dataset. While SageMaker Studio notebooks provide a choice of GPU-powered accelerated computing, it is sometimes more cost effective to run the training as a job outside the notebook. It is also an architectural best practice to decouple the development environment from the training environment so they can scale independently from each other. This is where SageMaker training comes in. Let us now understand the basics of SageMaker training.
SageMaker provides options to scale your training job in a managed environment decoupled from your development environment (such as SageMaker Studio notebooks). This is done using a container-based architecture and involves orchestrating batch jobs in containers on SageMaker ML compute instances. Let us look at a diagram to understand how training jobs are run on SageMaker:
Figure 6.1 – SageMaker training architecture
As shown in Figure 6.1, SageMaker training is a container-enabled environment that allows you to run training in a managed training environment. The user provides training data that is stored on S3. In addition, depending on the way you run training (covered in the next section), you may need to provide a Docker URI from the container registry and a training script. The user also provides the SageMaker ML instance type and the count of instances where this training should be run. Once you provide these attributes, SageMaker training downloads the training container image and the code along with the training data on the SageMaker ML instance of choice. It then executes the training following the training script provided. You can monitor the execution of the training job using the status of the training job. During execution, SageMaker outputs the logs from the training job on CloudWatch Logs. Once the model is trained, it is stored on S3 at a predefined path. It can then be downloaded for deployment on your own environment or using SageMaker inference.
Let us now look at various ways in which you can run training on SageMaker.
SageMaker provides multiple options when it comes to running training jobs. Here is a summary of the options:
Once the training is over, the trained model is stored on S3.
This option is ideal for highly specialized training jobs that may need you to modify the underlying environment or install specialized libraries. You have control over how you build your container and the format of the input and output. At this point, SageMaker is acting as an orchestrator for your job that you can execute on an instance of your choice.
Now that you have a good understanding of SageMaker training, let us use it to train a computer vision model for medical image classification.
One common application of ML in medical imaging is for classifying images into different categories. These categories can consist of different types of diseases determined by visual biomarkers. It can also recognize anomalies in a broad group of images and flag the ones that need further investigation by a radiologist. Having a classifier automate the task of prescreening the images reduces the burden on radiologists, who can concentrate on more complex and nuanced cases requiring expert intervention. In this exercise, we will train a model on SageMaker to recognize signs of pneumonia in a chest X-ray. Let us begin by acquiring the dataset.
The Chest X-Ray Images (Pneumonia) dataset is available on the Kaggle website here: https://www.kaggle.com/datasets/paultimothymooney/chest-xray-pneumonia?resource=download.
The dataset consists of 5,863 chest X-ray images in JPEG format organized into three subfolders (train, test, and val). The images are organized into folders named NORMAL and PNEUMONIA:
data/
test/
NORMAL/
PNEUMONIA/
train/
NORMAL/
PNEUMONIA/
val/
NORMAL/
PNEUMONIA/
zip -r chest_xray_train.zip train val
After this step, you should have the following directory structure on your notebook instance:
medical_image_classification.ipynb
scripts
ag_model.py
data
test
NORMAL
PNEUMONIA
You are now ready to begin training a medical image classification model on Amazon SageMaker.
Building, training, and evaluating the model
To train the image classification model, we will be using the AutoGluon framework on SageMaker. AutoGluon is an AutoML framework that is easy to use and extend. It provides the ability to train models using tabular, imaging, and text data. To learn more about AutoGluon, refer to the guide here: https://auto.gluon.ai/stable/index.html.
We will be using a prebuilt AutoGluon container to run training on SageMaker. We will then download the trained model to the notebook instance and run predictions on the model using a test dataset. This demonstrates the portability that SageMaker provides by allowing you to train on AWS but run inference locally.
Open the medical_image_classification.ipynb notebook and follow the steps in the notebook to complete the exercise.
At the end of the exercise, you will be able to see the confusion matrix shown in Figure 6.2:
Figure 6.2 – Sample confusion matrix for the medical image classifier
You will also see a ROC curve, as shown in Figure 6.3:
Figure 6.3 – Sample ROC curve for the medical image classifier
This concludes our exercise. Please make sure you stop or delete your SageMaker resources to avoid incurring charges as described at the following link: https://sagemaker-workshop.com/cleanup/sagemaker.html.
In this chapter, we gained an understanding of the medical device industry and the regulatory aspects of the industry designed to ensure the safe usage of these devices among patients. We also looked at the radiology image workflow and the various components involved in the system to make it work. We saw how ML models, when applied to medical devices, can improve the overall health of the population and prevent serious medical events. In the final sections of the chapter, we got an introduction to SageMaker training and went through an implementation exercise to train an ML model to identify a normal chest X-ray compared to one displaying pneumonia.
In Chapter 7, Applying Machine Learning to Genomics, we will look at how ML technology is transforming the world of clinical research. We will understand the role of genomic sequencing in precision medicine and look at examples that demonstrate why it’s important to consider genomic data in clinical diagnosis.
18.216.38.221