Chapter 11

Biomedical Image Analytics: Automated Lung Cancer Diagnosis

Steve Kommrusch; Louis-Noël Pouchet    Colorado State University, Fort Collins, CO, United States

Abstract

Biomedical informatics as an emerging field has been attracting talent from artificial intelligence and machine learning for its unique opportunities and challenges. Fast-growing biomedical and healthcare data encompass multiple scales ranging from molecules, cells, and individuals to populations, and connect various entities in healthcare systems such as providers, pharma, and payers with increasing bandwidth, depth, and resolution. These data are becoming an enabling resource to harness for scientific knowledge discovery and clinical decision making. Meanwhile, the sheer volume and complexity of the data present major barriers toward their translation into effective clinical actions. In particular, biomedical data often feature large volumes, high dimensionality, imbalance between classes, heterogeneous sources, noise, incompleteness, and rich contexts, all of which challenge the direct and immediate success of existing machine learning and optimization methods. For instance, deep learning methods have made notable advances for biomedical informatics needs, especially in processing brain-imaging data and enabling neuroscience discoveries, although their utility for more types of data in more biomedical informatics use-cases still awaits further assessment and development. Therefore, there is a compelling demand for novel algorithms, including machine learning, data mining, and optimization, that specifically tackle the unique challenges associated with biomedical and healthcare data and allow decision-makers and stakeholders to better interpret and exploit the data [1–5].

One of the challenges of using machine learning techniques with medical data is the frequent dearth of source image data for training. In this chapter, we introduce automated lung cancer diagnosis as a representative example, where nodule images must be classified as suspicious or benign. In this work we propose an automatic synthetic lung nodule image generator, whose 3D shape generator is designed to augment the variety of 3D images available for training. Our proposed system is rooted in autoencoder techniques, and we provide an extensive experimental characterization that demonstrates its ability to produce quality synthetic images.

Keywords

Autoencoder; Machine learning; Lung cancer; 3D image

Acknowledgements

This work was supported in part by the U.S. National Science Foundation award CCF-1750399.

11.1 Introduction

Worldwide in 2017, lung cancer remained the leading cause of cancer deaths [6]. Computer-aided diagnosis, where a software tool analyzes the patient's medical imaging results to suggest a possible diagnosis, is a promising direction: from an input low-resolution 3D CT scan, image processing techniques can be used to classify nodules in the lung scan as potentially cancerous or benign. But such systems require quality 3D training images to ensure the classifiers are adequately trained with sufficient generality. Cancerous lung nodule detection still suffers from a dearth of training images, which hampers the ability to effectively automate and improve the analysis of CT scans for cancer risks [7]. In this work, we propose to address this problem by automatically generating synthetic 3D images of nodules, to augment the training dataset of such systems with meaningful (yet computer-generated) lung nodule images.

Li et al. showed how to analyze nodules using features computed from the 3D images (such as volume, degree of compactness and irregularity, etc.) [8]. These computed features are then used as inputs to a nodule classification algorithm. 2D lung nodule image generation has been investigated using generative adversarial networks (GANs) [9], reaching sufficient quality to be classified by radiologists as actual CT scan images. In our work, we aim to generate 3D lung nodule images that match the feature statistics of actual nodules, as determined by an analysis program. We propose a new system inspired by autoencoders, and extensively evaluate its generative capabilities. Precisely, we introduce LuNG: a synthetic lung nodule generator, which is a neural network trained to generate new examples of 3D shapes that fit within a broad learned category.

Our work is aimed at creating synthetic images in cases where input images are difficult to obtain. For example, the Adaptive Lung Nodule Screening Benchmark (ALNSB) from the NSF Center for Domain-Specific Computing uses a flow that leverages compressive sensing to reconstruct images from low-dose CT scans. These images are slightly different from those built with filtered backprojection, a technique for which more samples are readily available (such as LIDC/IDRI [10]). To evaluate our results, we integrate our work with the ALNSB system [11], which automatically processes a low-dose 3D CT scan, reconstructs a higher-resolution image, isolates all nodules in the 3D image, computes features on them, and classifies each nodule as benign or suspicious. We use original patient data to train LuNG, and then use it to generate synthetic nodules that are processed by ALNSB. We create a network that optimizes 3 metrics: (i) increasing the percentage of generated images accepted by the nodule analyzer; (ii) increasing the variation of the generated output images relative to the limited seed images; and (iii) decreasing the reconstruction error when the seed images themselves are input to the autoencoder. We make the following contributions:

  •  A new 3D image generation system, which can create synthetic images that resemble (in terms of features) the training images. The system is fully implemented and automated.
  •  Novel metrics, which allow for numerical evaluation of 3D image generation aligned with qualitative goals related to lung nodule generation.
  •  An extensive evaluation of this system to generate 3D images of lung nodules, and its use within an existing computer-aided diagnosis benchmark application.
  •  The evaluation of iterative training techniques coupled with the ALNSB nodule classifier software, to further refine the quality of the image generator.

11.2 Related Work

Improving automated CT lung nodule classification techniques and 3D image generation are areas that are receiving significant research attention.

Recently, Valente et al. provided a good overview of the requirements for CADe (Computer-Aided Detection) systems in medical radiology, and they evaluate the status of recent approaches [7]. Our aim is to provide a tool that can improve the results of such CADe systems, both increasing the true positive rate (sensitivity) and decreasing the false positive rate of CADe classifiers, by enlarging the set of nodules available for analysis and training. Their survey discusses in detail the preprocessing, segmentation, and nodule detection steps, which are similar to those used in the ALNSB nodule analyzer/classifier used in this project.

Li et al. provide an excellent overview of recent approaches to 3D shape generation in their paper “GRASS: generative recursive autoencoders for shape structures” [12]. While we do not explore the design of an autoencoder with convolutional and deconvolutional layers, the image generation quality metrics that we present could be used to evaluate such designs. Similar tradeoffs between overfitting and low error rates on seed images would have to be considered when setting the depth of the network and the number of feature maps in the convolutional layers.

Durugkar et al. describe the challenges of training GANs well and discuss the advantages of multiple generative networks trained with multiple adversaries to improve the quality of generated images [13]. LuNG explored using multiple networks during image feedback experiments. Larsen et al. [14] describe a system that combines an autoencoder with a GAN, which could be a basis for future work introducing GAN methodologies into the LuNG system while preserving our goal of generating shapes similar to existing seed shapes.

11.3 Methodology

To begin, guided training is used in which each nodule is modified to create 15 additional training samples. We call the initial nodule set, of which we were provided 51 samples, the “seed” nodules; examples are shown in Fig. 11.2. The “base” nodules add image reflections and offsets to create 15 modified samples per seed nodule, for a total of 816 samples (the 51 originals plus their variants). Fig. 11.1 shows the general structure of the LuNG system. The base nodules are used to train an autoencoder neural network with 3 latent feature neurons in the bottleneck layer. Images output by the neural network pass through a reconnection algorithm to guarantee that viable, fully connected nodules are generated for analysis. A nodule analyzer program then extracts relevant 3D features from the nodules and prunes away nodules that can be rejected as not interesting for classification (definitely not a cancerous nodule). The analyzer performs range checking on features that include nodule size, elongation, and surface-area-to-volume ratio. The original CT scan nodule candidates pass through these checks before creating our 51 seed nodules, and the LuNG-generated images are processed with the same criteria. The “analyzer accepted images” are the final output of LuNG and can be used to augment the image training set for a classifier. We use the ALNSB [11] nodule analyzer and classifier code for the LuNG project, but similar analyzers compute similar 3D features to aid in classification. The support vector machine is an example classifier to which LuNG can provide augmented data; such data helps overcome drawbacks in current lung nodule classification work [7].

To evaluate generated nodules, we develop a statistical distance metric similar to the Mahalanobis distance. Given the set of nodules output by LuNG, we explore adding them to the autoencoder training set to improve the generality of the generator. We also use our composite Score metric, defined below, to evaluate various nodule reconnection options and network hyperparameters.

Figure 11.1 Interaction between the autoencoder, nodule analyzer, and support vector machine. Analyzer-accepted images are suitable for augmenting the training dataset of the classifier.
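The chapter does not enumerate which 15 reflection/offset combinations form the base set; the NumPy sketch below illustrates one plausible scheme (3 reflections, 6 one-voxel offsets, and 6 reflected offsets), assuming 40 × 40 × 20 voxel grids so that the 32,000-voxel image size matches. The exact grid dimensions and variant choices are assumptions.

import numpy as np

def augment_seed(nodule):
    """Return the seed nodule plus 15 modified copies (16 total), built
    from axis reflections and one-voxel offsets. Circular shifts via
    np.roll stand in for whatever offset scheme LuNG actually uses."""
    variants = [nodule]
    for axis in (0, 1, 2):                          # 3 reflections
        variants.append(np.flip(nodule, axis=axis))
    for axis in (0, 1, 2):                          # 6 one-voxel offsets
        for shift in (-1, 1):
            variants.append(np.roll(nodule, shift, axis=axis))
    for axis in (0, 1, 2):                          # 6 reflected offsets
        for shift in (-1, 1):
            variants.append(np.roll(np.flip(nodule, axis=axis), shift, axis=axis))
    return variants

# Stand-in seed set: 51 nodules as 40 x 40 x 20 voxel grids (32,000 voxels).
seeds = [np.random.rand(40, 40, 20) for _ in range(51)]
base = [v for s in seeds for v in augment_seed(s)]
assert len(base) == 816                             # 51 seeds x 16 samples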

We chose an autoencoder architecture for LuNG because it can be split into an encoder and a decoder network for different use models. The encoder (or “feature network”) is the portion of the autoencoder that takes a 32,000-voxel image as input and outputs 3 bottleneck feature neuron values between −1 and 1. The decoder (or “generator network”) is the portion that takes 3 feature neuron values between −1 and 1 as input and generates a 32,000-voxel output image. Thus, given 2 seed nodules, one can use the feature network to find their latent space coordinates and then step from one nodule to the other through inputs to the generator network. To generate random nodules for augmenting the classifier training set, we drive each of the 3 feature neurons with a uniform random value between −1 and 1.
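For concreteness, here is a minimal PyTorch sketch of such an encoder/decoder pair. Only the 32,000-voxel input size and the 3-neuron bottleneck come from the text; the hidden-layer width, activation functions, and training-step details are illustrative assumptions.

import torch
import torch.nn as nn

VOXELS = 32_000  # flattened 3D nodule image

class LuNGAutoencoder(nn.Module):
    """Autoencoder with a 3-neuron tanh bottleneck, so the latent
    features lie in [-1, 1]; hidden size is an assumption."""
    def __init__(self, hidden=256):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(VOXELS, hidden), nn.Tanh(),
            nn.Linear(hidden, 3), nn.Tanh())          # latent in [-1, 1]
        self.decoder = nn.Sequential(
            nn.Linear(3, hidden), nn.Tanh(),
            nn.Linear(hidden, VOXELS), nn.Sigmoid())  # voxels in [0, 1]

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = LuNGAutoencoder()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()                    # voxel-wise MSE, as in the chapter
batch = torch.rand(8, VOXELS)             # stand-in batch of base nodules
optimizer.zero_grad()
loss = loss_fn(model(batch), batch)       # reconstruction error
loss.backward()
optimizer.step()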

While the LuNG use model relies on having both an encoder and decoder network as provided by autoencoder training, future work could merge our technique with a generative adversarial network to enhance the generator or test whether convolution/deconvolution layers can help improve our overall quality metrics [12].

Metrics for Scoring Images  Our goals for LuNG are to generate images that have a high acceptance rate for the analyzer and a high variation relative to the seed images, while minimizing the error of the network when a seed image is reproduced. We track the acceptance rate simply as the percentage of randomly generated nodules that are accepted by the analyzer. For a metric of variation, we compute a feature distance FtDist based on the 12 3D image features used in the analyzer. To track how well the distribution of output images matches the seed image means, we compute FtMMSE based on the distribution means. The ability of the network to reproduce a given seed image is tracked with the mean squared error of the output image voxels, as is typical for autoencoder image training.

FtDist is similar to a Mahalanobis distance, but it averages, over all accepted images, the distance from each image to the closest seed image in the 12-dimensional analyzer feature space. As FtDist increases, the network is generating images that are less similar to specific samples in the seed images; hence it is a metric we want to increase with LuNG. Given an accepted set of n images Y and the set of 51 seed images S, and letting y_i denote the value of feature i for an image and σ_{S_i} the standard deviation of feature i within S,

$$\mathrm{FtDist} = \frac{1}{n}\sum_{y \in Y}\,\min_{s \in S}\,\sqrt{\sum_{i=1}^{12}\left(\frac{y_i - s_i}{\sigma_{S_i}}\right)^2}.$$

FtMMSE tracks how closely the means of the 12 3D features of the accepted set of images Y match those of the seed images S. As FtMMSE increases, the average generated image drifts further outside the typical seed image distribution; hence it is a metric we want to decrease with LuNG. Given that μ_{S_i} is the mean of feature i in the set of seed images and μ_{Y_i} is the mean of feature i in the final set of accepted images,

$$\mathrm{FtMMSE} = \frac{1}{12}\sum_{i=1}^{12}\left(\mu_{Y_i} - \mu_{S_i}\right)^2.$$

Score is our composite network scoring metric, used to compare different networks, hyperparameters, feedback options, and reconnection options. In addition to FtDist and FtMMSE, it uses AC, the fraction of generated images that the analyzer accepted, and MSE, the traditional mean squared error obtained when the autoencoder regenerates the 51 seed nodule images:

$$\mathrm{Score} = \mathrm{FtDist}\cdot\frac{1}{(\mathrm{FtMMSE}+0.1)\,(\mathrm{MSE}+0.1)\,(1-\mathrm{AC})}.$$

Score increases as FtDist or AC increase, and decreases when FtMMSE or MSE increase. The constants in the equation are based on qualitative assessments of network results; for example, using MSE + 0.1 means that MSE values below 0.1 do not override the contribution of the other components, which aligns with our qualitative observation that an MSE of 0.1 yielded visually acceptable images in comparison with the seed images.
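A minimal NumPy sketch of these three metrics, following the formulas as reconstructed above; the feature matrices are stand-ins, and AC < 1 is assumed so the denominator stays positive.

import numpy as np

def lung_metrics(Y, S, mse, ac):
    """Compute FtDist, FtMMSE, and the composite Score.
    Y: 12 analyzer features per accepted generated image (n x 12).
    S: 12 analyzer features per seed image (51 x 12).
    mse: autoencoder mean squared error on the seed images.
    ac: fraction of generated images accepted by the analyzer (< 1)."""
    sigma = S.std(axis=0)                 # per-feature std-dev over the seeds
    # Distance from each generated image to its closest seed image,
    # with each feature scaled by the seed standard deviation.
    dist = np.sqrt((((Y[:, None, :] - S[None, :, :]) / sigma) ** 2).sum(axis=2))
    ft_dist = dist.min(axis=1).mean()     # mean closest-seed distance
    ft_mmse = ((Y.mean(axis=0) - S.mean(axis=0)) ** 2).mean()
    score = ft_dist / ((ft_mmse + 0.1) * (mse + 0.1) * (1.0 - ac))
    return ft_dist, ft_mmse, score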

11.4 Experiments

Using the trained encoder network to find the latent feature coordinates of seed nodules 2 and 4, we generate the 6 interpolation steps between these nodules shown in Fig. 11.3. The top and bottom nodules in the image can be seen to accurately reproduce seed nodules 2 and 4 from Fig. 11.2. The 4 intermediate nodules are novel images from the generator that can be used to improve an automated classifier system.
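A minimal sketch of this latent-space walk, assuming the LuNGAutoencoder sketch given earlier and seed nodules supplied as flattened voxel tensors:

import torch

def interpolate(model, seed_a, seed_b, steps=6):
    """Decode evenly spaced points on the latent-space segment between
    two seed nodules (cf. Fig. 11.3). The endpoints reproduce the seeds;
    the interior points are novel generated nodules."""
    with torch.no_grad():
        z_a = model.encoder(seed_a)       # latent coordinates of one seed
        z_b = model.encoder(seed_b)       # latent coordinates of the other
        return [model.decoder(z_a + t * (z_b - z_a))
                for t in torch.linspace(0.0, 1.0, steps)]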

Figure 11.2 Six of the 51 original seed nodules, showing the middle 8 2D slices of each 3D CT-scan image.
Figure 11.3 Generated images from 6 steps through the latent feature space, from seed nodule 2 to seed nodule 4.

In addition to stepping through latent feature space values, we use our Score metric to evaluate the full nodule generation system, in which uniform random values between −1 and 1 are used as inputs to the generator network and the resulting images pass through the reconnection algorithm and the analyzer, which accepts images for analysis. Fig. 11.4 shows the components of the score for the final parameter analysis we performed on the network. Note that the MSE metric (mean squared error of the network on the training set) continues to decrease with larger networks, but the maximum Score occurs with 3 bottleneck latent feature neurons.
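A sketch of this random generation step, where analyzer_accepts is a hypothetical stand-in for the reconnection algorithm plus the ALNSB feature screening:

import torch

def generate_candidates(model, analyzer_accepts, n=400):
    """Sample uniform latent codes in [-1, 1]^3, decode them, and keep
    the images that pass the acceptance checks; returns the accepted
    images and the acceptance rate AC."""
    with torch.no_grad():
        z = torch.rand(n, 3) * 2.0 - 1.0          # uniform in [-1, 1]
        images = model.decoder(z)
    accepted = [img for img in images if analyzer_accepts(img)]
    return accepted, len(accepted) / n            # images and AC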

Figure 11.4 The 4 components used to compute the network Score. The component values are scaled as shown so that they can all be plotted on the same scale.

Our Score metric evaluates our system as a whole, and it can be used to explore variations to autoencoder training. We tested multiple methods of using accepted images to augment the autoencoder training set, but ultimately found that such images introduced a confirmation bias that diminished the total system score, as shown in Fig. 11.5. Exploring these system options resulted in a final Score that incorporates knowledge of actual nodule shapes from the seed nodule set, expert domain knowledge as encoded in the analyzer acceptance criteria, and machine learning researcher knowledge as represented by the network size and system features of LuNG.

Figure 11.5 Comparison between a network trained on the 816 base images with no analyzer feedback for 150,000 iterations, and a network trained for 25,000 iterations on the base images, then for 25,000 iterations with 302 generated nodules added, then for 25,000 iterations with a different 199 generated nodules added, and finally for 75,000 iterations on the base images alone.

As an example of the nodule metrics we use to evaluate network architectures and interface options, we analyze the 12 3D features computed by the nodule analyzer (ALNSB). Table 11.1 shows that when 400 novel random images are generated by LuNG, the mean value of every one of the 12 3D features stays within 30% of the seed nodules. (When image feedback is used in the training set, the means of the generated images tend to lie further from the seed nodule means for any given feature; hence our conclusion that confirmation bias harms the Score of systems that use image feedback.) Based on this alignment, we plot SVM distance values for 1000 generated nodules and the 51 seed nodules in Fig. 11.6. After the support vectors are applied, nodules closer to the positive than to the negative centroid are classified as suspicious. The results show that LuNG-generated nodules augment the available nodules for analysis well, including providing many nodules that are near the existing decision boundary and can be useful for improving the sensitivity of the classifier. For example, by having a trained radiologist classify a subset of 100 of the 1000 generated nodules that lie near the current classifier boundary or on the “cancerous” side of it, a more balanced and varied classifier training set could be produced, which would improve classification.
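The chapter does not specify ALNSB's SVM configuration, so the following sketch uses scikit-learn's SVC as a stand-in, with hypothetical placeholder features and labels, to show how boundary-near generated nodules could be selected for radiologist review:

import numpy as np
from sklearn.svm import SVC

# Hypothetical stand-ins: 12 analyzer features per nodule, with labels
# (1 = suspicious, 0 = benign) for the seed set. Real features would come
# from ALNSB and real labels from radiologist review.
rng = np.random.default_rng(0)
X_seed, y_seed = rng.random((51, 12)), rng.integers(0, 2, 51)
X_gen = rng.random((1000, 12))            # features of generated nodules

clf = SVC(kernel="linear").fit(X_seed, y_seed)
margin = clf.decision_function(X_gen)     # signed distance to the boundary
# Generated nodules near or past the suspicious side of the boundary are
# the most informative candidates to send to a radiologist for labeling.
candidates = X_gen[margin > -0.5]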

Table 11.1

Feature mean ratios for 400 generated vs. 51 seed nodules

Analyzer feature               Ratio
2D area                        1.1
2D max(xL, yL)                 1.0
2D perimeter                   1.2
2D area/peri²                  0.8
3D volume                      1.3
3D rad/MeanSqDis               1.0
min(xL, yL)/max(xL, yL)        1.0
minl/maxl                      1.0
surface area³/volume²          1.2
mean breadth                   1.3
euler3D                        1.1
maskTem area/peri²             1.0
Figure 11.6 Support vector machine (SVM) coordinates for 1000 generated and 51 seed nodules.

11.5 Conclusion

To produce quality image classifiers, machine learning requires a large set of training images. This poses a challenge for application areas where such training sets are rare, if they exist at all, as is the case for computer-aided diagnosis of cancerous lung nodules.

In this work we developed LuNG, a lung nodule image generator, allowing us to augment the training dataset of image classifiers with meaningful (yet computer-generated) lung nodule images. Specifically, we have developed an autoencoder-based system that learns to produce 3D images that resemble the original training set while adequately covering the feature space. Our tool, LuNG, was developed using PyTorch and is fully implemented. We have shown that the 3D nodules generated by this process align well, both visually and numerically, with the general image space presented by the limited set of seed images.

References

[1] S. Huang, J. Zhou, Z. Wang, Q. Ling, Y. Shen, Biomedical informatics with optimization and machine learning, 2016.

[2] A. Samareh, Y. Jin, Z. Wang, X. Chang, S. Huang, Predicting depression severity by multi-modal feature engineering and fusion, arXiv preprint arXiv:1711.11155; 2017.

[3] A. Samareh, Y. Jin, Z. Wang, X. Chang, S. Huang, Detect depression from communication: how computer vision, signal processing, and sentiment analysis join forces, IISE Transactions on Healthcare Systems Engineering 2018:1–42.

[4] M. Sun, I.M. Baytas, L. Zhan, Z. Wang, J. Zhou, Subspace network: deep multi-task censored regression for modeling neurodegenerative diseases, arXiv preprint arXiv:1802.06516; 2018.

[5] M. Karimi, D. Wu, Z. Wang, Y. Shen, DeepAffinity: interpretable deep learning of compound-protein affinity through unified recurrent and convolutional neural networks, arXiv preprint arXiv:1806.07537; 2018.

[6] R.L. Siegel, K.D. Miller, A. Jemal, Cancer statistics, 2017, CA: A Cancer Journal for Clinicians 2017;67(1):7–30, doi:10.3322/caac.21387.

[7] I.R.S. Valente, P.C. Cortez, E.C. Neto, J.M. Soares, V.H.C. de Albuquerque, J.M.R.S. Tavares, Automatic 3D pulmonary nodule detection in CT images, Computer Methods and Programs in Biomedicine 2016;124(C):91–107, doi:10.1016/j.cmpb.2015.10.006.

[8] Q. Li, F. Li, K. Doi, Computerized detection of lung nodules in thin-section CT images by use of selective enhancement filters and an automated rule-based classifier, Academic Radiology 2008;15(2):165–175.

[9] M.J.M. Chuquicusma, S. Hussein, J. Burt, U. Bagci, How to fool radiologists with generative adversarial networks? A visual turing test for lung cancer diagnosis, ArXiv e-prints arXiv:1710.09762; 2017.

[10] J. Rong, P. Gao, W. Liu, Y. Zhang, T. Liu, H. Lu, Computer simulation of low-dose CT with clinical lung image database: a preliminary study, in: Society of Photo-Optical Instrumentation Engineers (SPIE) Conference Series, vol. 10132, 2017, 101322U, doi:10.1117/12.2253973.

[11] S. Shen, P. Rawat, L.N. Pouchet, W. Hsu, Lung nodule detection C benchmark, URL: https://github.com/cdsc-github/Lung-Nodule-Detection-C-Benchmark; 2015.

[12] J. Li, K. Xu, S. Chaudhuri, E. Yumer, H. Zhang, L. Guibas, GRASS: generative recursive autoencoders for shape structures, ACM Transactions on Graphics (Proceedings of SIGGRAPH 2017) 2017;36(4).

[13] I. Durugkar, I. Gemp, S. Mahadevan, Generative multi-adversarial networks, ArXiv e-prints arXiv:1611.01673; 2016.

[14] A. Boesen Lindbo Larsen, S. Kaae Sønderby, H. Larochelle, O. Winther, Autoencoding beyond pixels using a learned similarity metric, ArXiv e-prints arXiv:1512.09300; 2015.
