Steve Kommrusch; Louis-Noël Pouchet Colorado State University, Fort Collins, CO, United States
Biomedical informatics, as an emerging field, has been attracting talent from artificial intelligence and machine learning because of its unique opportunities and challenges. Fast-growing biomedical and healthcare data span multiple scales, ranging from molecules, cells, and individuals to populations, and connect various entities in healthcare systems such as providers, pharma, and payers with increasing bandwidth, depth, and resolution. These data are becoming an enabling resource for scientific knowledge discovery and clinical decision making. Meanwhile, the sheer volume and complexity of the data present major barriers to their translation into effective clinical actions. In particular, biomedical data often feature large volumes, high dimensionality, class imbalance, heterogeneous sources, noise, incompleteness, and rich contexts, which challenge the direct and immediate success of existing machine learning and optimization methods. For instance, deep learning methods have made notable advances for biomedical informatics needs, especially in processing brain-imaging data and making neuroscience discoveries, although their utility for more types of data in more biomedical informatics use cases still awaits further assessment and development. Therefore, there is a compelling demand for novel algorithms, including machine learning, data mining, and optimization, that specifically tackle the unique challenges associated with biomedical and healthcare data and allow decision-makers and stakeholders to better interpret and exploit the data [1–5].
One of the challenges of using machine learning techniques with medical data is the frequent dearth of source image data for training. In this chapter, we introduce automated lung cancer diagnosis as a representative example, where nodule images need to be classified as suspicious or benign. In this work we propose an automatic synthetic lung nodule image generator. Our 3D shape generator is designed to augment the variety of 3D images. Our proposed system takes root in autoencoder techniques, and we provide extensive experimental characterization that demonstrates its ability to produce quality synthetic images.
Autoencoder; Machine learning; Lung cancer; 3D image
This work was supported in part by the U.S. National Science Foundation award CCF-1750399.
Worldwide in 2017, lung cancer remained the leading cause of cancer deaths [6]. Computer aided diagnosis, where a software tool analyzes the patient's medical imaging results to suggest a possible diagnosis, is a promising direction: from an input low-resolution 3D CT scan, image processing techniques can be used to classify nodules in the lung scan as potentially cancerous or benign. But such systems require quality 3D training images to ensure the classifiers are adequately trained with sufficient generality. Cancerous lung nodule detection still suffers from a dearth of training images which hampers the ability to effectively automate and improve the analysis of CT scans for cancer risks [7]. In this work, we propose to address this problem by automatically generating synthetic 3D images of nodules, to augment the training dataset of such systems with meaningful (yet computer-generated) lung nodule images.
Li et al. showed how to analyze nodules using features computed from the 3D images (such as volume, degree of compactness, and irregularity) [8]. These computed features are then used as inputs to a nodule classification algorithm. 2D lung nodule image generation has been investigated using generative adversarial networks (GANs) [9], reaching sufficient quality to be classified by radiologists as actual CT scan images. In our work, we aim to generate 3D lung nodule images which match the feature statistics of actual nodules as determined by an analysis program. We propose a new system inspired by autoencoders and extensively evaluate its generative capabilities. Specifically, we introduce LuNG: a synthetic lung nodule generator, which is a neural network trained to generate new examples of 3D shapes that fit within a broad learned category.
Our work is aimed at creating synthetic images in cases where input images are difficult to get. For example, the Adaptive Lung Nodule Screening Benchmark (ALNSB) from the NSF Center for Domain-Specific Computing uses a flow that leverages compressive sensing to reconstruct images from low-dose CT scans. These images are slightly different than those built from filtered backprojection, a technique which has more samples readily available (such as LIDC/IDRI [10]). To evaluate our results, we integrate our work with the ALNSB system [11] that automatically processes a low-dose 3D CT scan, reconstructs a higher-resolution image, isolates all nodules in the 3D image, computes features on them, and classifies each nodule as benign or suspicious. We use original patient data to train LuNG, and then use it to generate synthetic nodules that are processed by ALNSB. We create a network which optimizes 3 metrics: (i) increase the percentage of generated images accepted by the nodule analyzer; (ii) increase the variation of the generated output images relative to the limited seed images; and (iii) decrease the error of the seed images with themselves when input to the autoencoder. We make the following contributions:
Improving automated CT lung nodule classification techniques and 3D image generation are areas that are receiving significant research attention.
Recently, Valente et al. provided a good overview of the requirements for CADe (Computer Aided Detection) systems in medical radiology and evaluated the status of recent approaches [7]. Our aim is to provide a tool that can improve the results of such CADe systems, both increasing the true positive rate (sensitivity) and decreasing the false positive rate of CADe classifiers, by making more nodules available for analysis and training. The survey by Valente et al. discusses in detail the preprocessing, segmentation, and nodule detection steps, which are similar to those used in the ALNSB nodule analyzer/classifier that we used in this project.
Li et al. provide an excellent overview of recent approaches to 3D shape generation in their paper “GRASS: generative recursive autoencoders for shape structures” [12]. While we do not explore the design of an autoencoder with convolutional and deconvolutional layers, the same image generation quality metrics that we present could be used to evaluate such designs. Similar tradeoffs between overfitting and low error rates with seed images would have to be considered when setting the depth of the network and the number of feature maps in the convolutional layers.
Durugkar et al. describe the challenges of training GANs well and discuss the advantages of multiple generative networks trained with multiple adversaries to improve the quality of images generated [13]. LuNG explored using multiple networks during image feedback experiments. Larsen et al. [14] present a system that combines an autoencoder with a GAN; this could be a basis for future work introducing GAN methodologies into the LuNG system while preserving our goal of generating shapes similar to existing seed shapes.
To begin, guided training is used in which each nodule is modified to create 15 additional training samples. We call the initial nodule set, of which we were provided 51 samples, the “seed” nodules, examples of which are shown in Fig. 11.2. The “base” nodules include image reflections and offsets to create 15 modified samples per seed nodule for a total of 816 samples. Fig. 11.1 shows the general structure of the LuNG system. The base nodules are used to train an autoencoder neural network with 3 latent feature neurons in the bottleneck layer. Images output by the neural network pass through a reconnection algorithm to guarantee that viable fully connected nodules are being generated for analysis by the nodule analyzer. A nodule analyzer program then extracts relevant 3D features from the nodules and prunes away nodules which can be rejected as not interesting for classification (definitely not a cancerous nodule). The analyzer has range checking on features which include nodule size, elongation, and surface area to volume ratio. The original CT scan nodule candidates pass through these checks before creating our 51 seed nodules, and the LuNG generated images are processed with the same criteria. The “analyzer accepted images” are the final output of LuNG and can be used to augment the image training set for a classifier. We use the ALNSB [11] nodule analyzer and classifier code for the LuNG project, but similar analyzers compute similar 3D features to aid in classification. The support vector machine is an example classifier to which LuNG can provide augmented data. Such augmented data is helpful in overcoming drawbacks in current lung nodule classification work [7]. To evaluate generated nodules, we develop a statistical distance metric similar to the Mahalanobis distance. Given the set of nodules output by LuNG, we explore adding them to the autoencoder training set to improve the generality of the generator. 
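The sample count above (51 seeds, 15 modified copies each, 816 base nodules) can be illustrated with a minimal sketch. It assumes a 40 × 40 × 20 voxel grid (one possible factorization of 32,000 voxels), represents a nodule as a set of occupied voxel coordinates, and picks one hypothetical combination of reflections and offsets; the chapter does not state which reflections and offsets were actually used, and the function names are illustrative.

```python
from itertools import product

# Assumed grid: 40 x 40 x 20 = 32,000 voxels (the factorization is an assumption).
SHAPE = (40, 40, 20)

def reflect(voxels, axes):
    """Mirror a set of (x, y, z) voxels along every axis flagged True in `axes`."""
    return frozenset(
        tuple(SHAPE[k] - 1 - c if axes[k] else c for k, c in enumerate(v))
        for v in voxels
    )

def shift(voxels, offset):
    """Translate voxels by `offset`, dropping any voxel that leaves the grid."""
    moved = (tuple(c + d for c, d in zip(v, offset)) for v in voxels)
    return frozenset(m for m in moved if all(0 <= c < n for c, n in zip(m, SHAPE)))

def base_samples(seed):
    """Return the seed plus 15 variants: 8 reflection combinations x 2 offsets."""
    offsets = [(0, 0, 0), (1, 0, 0)]  # illustrative offsets only
    return [
        shift(reflect(seed, axes), off)
        for axes in product((False, True), repeat=3)
        for off in offsets
    ]

# Toy seed nodule with four occupied voxels (not patient data).
seed = frozenset({(20, 20, 10), (21, 20, 10), (20, 22, 10), (20, 20, 11)})
samples = base_samples(seed)
print(len(samples), 51 * len(samples))
```

The first entry of `samples` is the unmodified seed, so each seed contributes 16 base nodules and 51 seeds give the 816 samples quoted in the text.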
We also used the Score to evaluate various nodule reconnection options and network hyperparameters.
We chose an autoencoder architecture for LuNG because it can be split into an encoder network and a decoder network for different use models. The encoder (or “feature network”) is the portion of the autoencoder that takes a 32,000 voxel image as input and outputs 3 bottleneck feature neuron values between −1 and 1. The decoder (or “generator network”) is the portion of the autoencoder that takes 3 feature neuron values between −1 and 1 as input and generates a 32,000 voxel output image. Thus, given 2 seed nodules, one can use the feature network to find their latent space coordinates and then step from one nodule to the other with inputs to the generator network. We also feed uniform random values from −1 to 1 to the 3 feature neurons to generate random nodules, which we evaluate for augmenting the classifier training set.
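To make the encoder/decoder split concrete, the following sketch shows only the interfaces: a feature network mapping 32,000 voxels to 3 values and a generator network mapping 3 values back to 32,000 voxels. It is not the LuNG network. The single linear layer per side, the tanh and sigmoid activations, and the random untrained weights are all assumptions made for illustration, and the code uses plain Python rather than PyTorch so that it runs without extra dependencies.

```python
import math
import random

N_VOXELS = 32_000  # image size given in the text
N_LATENT = 3       # bottleneck feature neurons given in the text

def make_layer(n_in, n_out, rng):
    """Random, untrained weights; a stand-in for trained parameters."""
    scale = 1.0 / math.sqrt(n_in)
    return [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]

def encode(image, w_enc):
    """Feature network: voxels -> 3 latent values, bounded by tanh."""
    return [math.tanh(sum(w * x for w, x in zip(row, image))) for row in w_enc]

def decode(latent, w_dec):
    """Generator network: 3 latent values -> voxel intensities via a sigmoid."""
    return [
        1.0 / (1.0 + math.exp(-sum(w * z for w, z in zip(row, latent))))
        for row in w_dec
    ]

rng = random.Random(0)
w_enc = make_layer(N_VOXELS, N_LATENT, rng)
w_dec = make_layer(N_LATENT, N_VOXELS, rng)

# Random generation: draw each latent value uniformly from [-1, 1].
latent = [rng.uniform(-1.0, 1.0) for _ in range(N_LATENT)]
image = decode(latent, w_dec)   # generator network used on its own
code = encode(image, w_enc)     # feature network used on its own
print(len(image), len(code))
```

Keeping the two halves as separate functions mirrors the two use models in the text: `encode` locates an existing nodule in latent space, while `decode` produces an image from any latent point.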
While the LuNG use model relies on having both an encoder and decoder network as provided by autoencoder training, future work could merge our technique with a generative adversarial network to enhance the generator or test whether convolution/deconvolution layers can help improve our overall quality metrics [12].
Metrics for Scoring Images
Our goals for LuNG are to generate images that have a high acceptance rate for the analyzer and a high variation relative to the seed images while minimizing the error of the network when a seed image is reproduced. We track the acceptance rate simply as the percentage of randomly generated nodules that are accepted by the analyzer. As a metric of variation, we compute a feature distance, FtDist, based on the 12 3D image features used in the analyzer. To track how well the distribution of output images matches the seed image means, we compute FtMMSE, which is based on the distribution means. The ability of the network to reproduce a given seed image is tracked with the mean squared error of the image output voxels, as is typical for autoencoder image training.
FtDist has some similarity to the Mahalanobis distance, but it averages, over all the accepted images, the distance from each image to the closest seed image in the 12-dimensional analyzer feature space. As FtDist increases, the network is generating images that are less similar to specific samples in the seed images, hence it is a metric we want to increase with LuNG. Given an accepted set of n images Y and a set of 51 seed images S, where $y_i$ denotes the value of feature i for an image y and $\sigma_i$ denotes the standard deviation of feature i within S,

$$\mathrm{FtDist} = \frac{1}{n} \sum_{y \in Y} \min_{s \in S} \sqrt{\sum_{i=1}^{12} \left( \frac{y_i - s_i}{\sigma_i} \right)^2}$$
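The nearest-seed feature distance described above can be sketched as follows. This is a minimal illustration rather than the analyzer code: it assumes a Euclidean distance on features scaled by the per-feature standard deviation of the seed set, uses the population standard deviation (the text does not say which estimator is used), and works on a toy example with 2 features instead of 12. The function name `ft_dist` is illustrative.

```python
import math
import statistics

def ft_dist(accepted, seeds):
    """Mean over accepted images of the sigma-scaled distance to the nearest seed."""
    n_features = len(seeds[0])
    # Standard deviation of each feature within the seed set S
    # (population form is an assumption; it must be nonzero for every feature).
    sigma = [statistics.pstdev([s[i] for s in seeds]) for i in range(n_features)]

    def scaled_distance(y, s):
        return math.sqrt(
            sum(((yi - si) / sd) ** 2 for yi, si, sd in zip(y, s, sigma))
        )

    nearest = [min(scaled_distance(y, s) for s in seeds) for y in accepted]
    return sum(nearest) / len(accepted)

# Toy data with 2 features per image (the analyzer uses 12).
seeds = [[0.0, 0.0], [2.0, 4.0]]      # per-feature sigma is [1.0, 2.0]
accepted = [[1.0, 0.0], [2.0, 2.0]]   # each lies one scaled unit from its nearest seed
result = ft_dist(accepted, seeds)
print(result)
```

In the toy data each accepted image sits exactly one scaled unit from its closest seed, so the average is 1.0; a generator that merely copies seed images would drive the value toward 0.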
FtMMSE tracks how closely the means of the 12 3D features agree between the accepted set of images X and the seed images S. As FtMMSE increases, the network is generating images whose averages fall increasingly outside the typical seed image distribution, hence it is a metric we want to decrease with LuNG. It is computed over the 12 features from $\mu_{S_i}$, the mean of feature i in the set of seed images, and $\mu_{X_i}$, the mean of feature i in the final set of accepted images.
Score is our composite network scoring metric used to compare different networks, hyperparameters, feedback options, and reconnection options. In addition to FtDist and FtMMSE, it uses AC, which is the fraction of generated images that the analyzer accepted, and MSE, which is the traditional mean squared error that results when the autoencoder is used to regenerate the 51 seed nodule images.
Score increases as FtDist or AC increase and decreases when FtMMSE or MSE increase. The constants in the Score equation are based on qualitative assessments of network results; for example, offsetting MSE by 0.1 means that MSE values below 0.1 don't override the contribution of other components, which aligns mathematically with the qualitative observation that an MSE of 0.1 yielded images that were visually acceptable in comparison with the seed images.
By using the trained encoder network to find the latent feature coordinates for seed nodules 2 and 4, Fig. 11.3 shows 6 steps between these nodules. The top and bottom nodules in the image can be seen to accurately reproduce seed nodules 2 and 4 from Fig. 11.2. The 4 intermediate nodules are novel images from the generator which can be used to improve an automated classifier system.
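The stepping procedure can be sketched as a linear interpolation between two latent points. The sketch below assumes evenly spaced steps along a straight line in the 3-dimensional latent space, which the text does not state explicitly, and the latent coordinates are hypothetical values rather than the encoded positions of seed nodules 2 and 4.

```python
def latent_steps(z_start, z_end, n_steps=6):
    """Return n_steps latent points from z_start to z_end, endpoints included."""
    if n_steps < 2:
        raise ValueError("need at least two steps to include both endpoints")
    return [
        [a + (b - a) * k / (n_steps - 1) for a, b in zip(z_start, z_end)]
        for k in range(n_steps)
    ]

# Hypothetical latent coordinates for two seed nodules (not values from the chapter).
z_seed_a = [-0.6, 0.2, 0.9]
z_seed_b = [0.4, -0.8, 0.1]

path = latent_steps(z_seed_a, z_seed_b)
for point in path:
    print([round(v, 3) for v in point])
```

With 6 steps, the first and last points are the two seed coordinates and the 4 interior points are the inputs that would be passed to the generator network to obtain novel nodules.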
In addition to stepping through latent feature space values, we use our Score metric to evaluate a full nodule generation system that uses uniform random values between −1 and 1 as inputs to the generator network and then sends the images through the reconnection algorithm and the analyzer, which decides whether they are accepted for analysis. Fig. 11.4 shows the components of the score for the final parameter analysis we did on the network. Note that the MSE metric (mean squared error of the network on the training set) continues to decrease with larger networks, but the maximum Score we measure occurs with 3 bottleneck latent feature neurons.
Our Score metric is used to evaluate our system as a whole, and it can be used to explore variations in autoencoder training. We tested multiple methods of using accepted images to augment the autoencoder training set, but ultimately found that such images resulted in confirmation bias that diminished the total system score, as shown in Fig. 11.5. Exploring such system options resulted in a system with a Score that incorporates knowledge of actual nodule shapes from the seed nodule set, expert domain knowledge as encoded in the analyzer acceptance criteria, and machine learning researcher knowledge as represented by network size and system features for LuNG.
As an example of the nodule metrics we use to evaluate network architectures and interface options, we analyze the 12 3D features computed by the nodule analyzer (ALNSB). Table 11.1 shows that when 400 novel random images are generated by LuNG, the mean feature value for all 12 3D features stays within 30% of that of the seed nodules. (When image feedback is used in the training set, the mean of the generated images tends to be further from the seed nodule mean for any given feature; hence our conclusion that confirmation bias harms the Score for systems that used image feedback.) Based on this alignment, we plot SVM distance values for 1000 nodules and the 51 seed nodules in Fig. 11.6. After the support vectors are applied, nodules closer to the positive than the negative centroid are classified as suspicious. The results show that LuNG generated nodules augment the available nodules for analysis well, including providing many nodules which are near the existing boundary and can be useful for improving the sensitivity of the classifier. For example, by having a trained radiologist classify a subset of 100 of the 1000 generated nodules which are near the current classifier boundary or on the “cancerous” side of the boundary, a more balanced and varied classifier training set could be produced, which would improve classification.
Table 11.1
Analyzer feature | Ratio |
---|---|
2D area | 1.1 |
2D max(xL,yL) | 1.0 |
2D perimeter | 1.2 |
2D area/peri² | 0.8 |
3D volume | 1.3 |
3D rad/MeanSqDis | 1.0 |
min(xL,yL)/max(xL,yL) | 1.0 |
minl/maxl | 1.0 |
surface area³/volume² | 1.2 |
mean breadth | 1.3 |
euler3D | 1.1 |
maskTem area/peri² | 1.0 |
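The within-30% statement can be checked directly against the ratios listed in Table 11.1. The short sketch below copies the table values and tests each one against a 0.7 to 1.3 band; reading "within 30%" as that band around 1.0 is an assumption, and the variable names are illustrative.

```python
# Ratios copied from Table 11.1, one entry per analyzer feature.
ratios = {
    "2D area": 1.1,
    "2D max(xL,yL)": 1.0,
    "2D perimeter": 1.2,
    "2D area/peri^2": 0.8,
    "3D volume": 1.3,
    "3D rad/MeanSqDis": 1.0,
    "min(xL,yL)/max(xL,yL)": 1.0,
    "minl/maxl": 1.0,
    "surface area^3/volume^2": 1.2,
    "mean breadth": 1.3,
    "euler3D": 1.1,
    "maskTem area/peri^2": 1.0,
}

LOW, HIGH = 0.7, 1.3  # "within 30%" read as a band around a ratio of 1.0

outside = [name for name, r in ratios.items() if not LOW <= r <= HIGH]
print(len(ratios), outside)
```

All 12 ratios fall inside the band, with 3D volume and mean breadth sitting at its upper edge.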
To produce quality image classifiers, machine learning requires a large set of training images. This poses a challenge for application areas where such training sets are rare if they exist, such as for computer-aided diagnosis of cancerous lung nodules.
In this work we developed LuNG, a lung nodule image generator, allowing us to augment the training dataset of image classifiers with meaningful (yet computer-generated) lung nodule images. Specifically, we have developed an autoencoder-based system that learns to produce 3D images that resemble the original training set while adequately covering the feature space. Our tool, LuNG, was developed using PyTorch and is fully implemented. We have shown that the 3D nodules generated by this process visually and numerically align well with the general image space presented by the limited set of seed images.