Chapter 12

Computational Methods in CryoElectron Microscopy 3D Structure Reconstruction

FA ZHANG, XIAOHUA WAN, and ZHIYONG LIU

12.1 Introduction

Electron cryomicroscopy (cryoEM) is a rapidly emerging tool in structural biology for three-dimensional (3D) structure determination of macromolecular complexes. Single-particle cryoEM allows structural elucidation of macromolecular assemblies at subnanometer resolution [1, 2], and, in some cases, at atomic resolution [3]. Electron cryotomography (ET) allows structural studies of complex speciments at near-molecular resolution as well as visualization of macromolecular complexes in their native cellular context [4]. The integrative combination of these cryoEM modalities with other high-resolution approaches, such as X-ray crystallography or nuclear magnetic resonance (NMR), is expected to provide a comprehensive description of the cellular function in molecular detail [5].

In ET, a series of projection images where structural features from different layers of the 3D structure of the specimen are first superposed along the direction of the electron beam. In general, those images are obtained by tilting the specimen around tilt axes [6]. The 3D structure of the sample can then be derived from those projection images, by means of tomographic reconstruction algorithms [7]. Because of physical limitations of microscopes, the angular tilt range is limited and, as a result, tomographic tilt series have a wedge of missing data corresponding to the uncovered angular range. Moreover, biological material is very sensitive to radiation, so specimens must be imaged at very low electron doses, which makes projection images extremely noisy. In high-resolution structural studies, this poor signal-to-noise ratio (SNR) is in the order of only 0.1 [8]. As a consequence, high-resolution electron tomography requires a method of 3D reconstruction from projections able to deal with limited angle conditions and extremely low SNR of the projection images.

In terms of computation, there are several main problems in ET reconstruction: the extremely low SNR of the images, alignment and classification of the images, 3D reconstruction under limited angle conditions, and post-processing and interpretation of the reconstruction results [8]. Also, because of the poor SNR in projection images, the size of tomograms and the number of projectionimages increase for high-resolution studies, and the computation time for 3D reconstruction also increases, which becomes a bottleneck for routine applicaitons of ET.

Since the demands of huge computational costs and resources derive from the computational complexity of the recontruction algorithms and the large size and number of the projection images involved, this chapter addresses the 3D reconstruction algorithm and its multilevel parallel strategy on GPU platform. We present an adaptive simulatneous algebraic reconstruction technique (ASART) for imcomplete data and noisy conditons. Specifically, we develop three key techniques—modified multilevel access scheme (MMAS), adaptive adjustment of relaxation (AAR) parameters, and column sum substitution (CSS) technique, to improve the reconstruction quality and speed of the reconstruction process. Also, we present a multilevel parallel strategy for iterative reconstruction algorithm in ET on multi-GPUs, and we develop an asynchronous comunication scheme and data structure named blob-ELLR to accelerate reconstruction processing and reduce memory requirements.

The remainder of the chapter is organized as follows Section 12.2 reviews icterative 3D reconstruciton methods for ET. Section 12.3 focuses on the ASART algorithm. Section 12.4 presents a multilevel parallel strategy for iterative reconstruction algorithm. Section 12.5 shows and analyzes the experimental results. Section 12.6 concludes the chapter with a summary.

12.2 Iterative image reconstruction methods

In ET, the projection images are acquired from a specimen through so-called single-axis tilt geometry. The specimen is tilted over a range, typically from −60° (or −70°) to +60° (or +70°) due to physical limitations of microscopes, with small tilt increments (1° or 2°). An image of the same object area is recorded at each tilt angle, and then the 3D reconstruction of the specimen can be obtained from a set of projection images by reconstruction methods.

Several reconstruction methods for ET have been proposed. Weighted backprojection (WBP) has been one of the most popular methods in the field of 3D reconstruction of ET, due to its algorithmic simplicity and computational efficiency [9]. The major disadvantage of WBP, however, is that the results may be strongly affected by limited-angle data and noisy conditions [10]. Series expansion methods (i.e., iterative methods) constitute one of the main alternatives to WBP in 3D reconstruction of ET, owing to their good performance in handling incomplete and noisy data. Several traditional iterative methods, such as algebraic reconstruction techniques (ARTs) [11, 12], simultaneous iterative reconstruction technique(SIRT) [13], component averaging methods (CAVs) [10], block-iterative CAV (BICAV) [10], and simultaneous algebraic reconstruction technique (SART) [14] have all been utilized to 3D reconstruction of ET. ART enjoys a rapid convergence but exhibits a very noisy salt-and-pepper image characteristic of reconstructed images. SIRT and CAV, on the contrary, produce fairly smooth reconstructed images, but still require a large number of iterations for convergence. SART is characterized by better robustness than ART under noise, and its convergence speed is reported to be faster than that of SIRT and CAV [14].

In this section, we give a brief overview of blob-based iterative reconstruction methods and describe the simultaneous algebraic reconstruction technique (SART).

12.2.1 Blob-Based Iterative Reconstruction Methods

Iterative methods are based on the series expansion approach [15], in which 3D volume c12-math-0001 is represented as a linear combination of a limited set of known and fixed basis functions c12-math-0002, with appropriate coefficients c12-math-0003, that is

12.1 c12-math-0004

where (c12-math-0005) is a spherical coordinate, and c12-math-0006 is the total number of the unknown variables c12-math-0007. In 3D reconstruction, the basis functions used to represent the object greatly influences the reconstructed results. During the 1990s, spherically symmetric volume elements (called blobs) were thoroughly investigated and, as a consequence, the conclusion that blobs yield better reconstructions than the traditional voxels has been drawn in 3D reconstruction [16]. The use of blob basis functions provides iterative methods with better resolution–noise performance than voxel basis functions because of the overlapping nature of their rotational symmetric basis functions. Thus, we consider the blob basis instead of the traditional voxel one. The blob basis discussed in this chapter is constructed using generalized Kaiser–Bessel (KB) window functions

12.2 c12-math-0008

where c12-math-0009(c12-math-0010) denotes the modified Bessel function of the first kind of order m, a is the radius of the blob, and c12-math-0011 is a nonnegative real number controlling the shape of the blob. The choice of the parameters c12-math-0012, and c12-math-0013 will influence the quality of blob-based reconstructions. The basis functions that developed by Matej and Lewitt are used for the choice of the parameters in our algorithm (i.e., c12-math-0014, c12-math-0015, c12-math-0016).

In 3D ET, the model of the image formation process is expressed by the following linear system

12.3 c12-math-0017

where c12-math-0018 denotes the c12-math-0019th measured image of c12-math-0020 is the dimension of c12-math-0021, where c12-math-0022 is the number of projection angles and c12-math-0023 the number of projection values per view; and c12-math-0024 is a weighting factor representing the contribution of the c12-math-0025th basis function to the c12-math-0026th projection. Under such a model, the element c12-math-0027 can be calculated according to the projected procedure as follows:

12.4 c12-math-0028

12.5 c12-math-0029

where c12-math-0030 is the projected value of the pixel c12-math-0031 at an angle c12-math-0032; c12-math-0033 is defined as a sparse matrix with c12-math-0034 rows and c12-math-0035 columns where c12-math-0036 is the element of c12-math-0037. In general, the storage demand of the weighted matrix c12-math-0038 rapidly increases as the size and number of projection images increase. For example, when the size of images is c12-math-0039, the storage demand of the matrix approaches 3.5 GB.

Under those assumptions, the image reconstruction problem can be modeled as the inverse problem of estimating the c12-math-0040 values from the c12-math-0041 values by solving the system of linear equations given byEquation (12.3). This problem is usually resolved by means of iterative methods.

12.2.2 Simultaneous Algebraic Reconstruction Technique (SART)

The SART algorithm is a basic iterative method designed to solve the linear system in image reconstruction. Typically, the algorithm begins with an arbitrary c12-math-0042 and then begins to iterate until convergence [18]. The arbitrary initial value may greatly deviate from the true value. So the number of iterations can be very large. In order to accelerate the process of convergence, SART further adopts a backprojection technique (BPT), a simple WBP without the weighting functions, to estimate the first approximation c12-math-0043[19]. BPT is a simple reconstruction method where the gray level of a pixel can be considered as the weighted average of the projections for all the possible rays passing through the pixel [7, 20]. Consequently, the initial solution c12-math-0044 is defined by as follows:

12.6 c12-math-0045

Then, in an iterative process, SART is described by the following expression

12.7 c12-math-0046

12.8 c12-math-0047

where c12-math-0048 is the number of iterations, c12-math-0049 is the relaxation parameter, c12-math-0050 is the fixed value (in general, 0 < c12-math-0051 < 2), c12-math-0052 is the number of projections per view, and c12-math-0053 = c12-math-0054 + c12-math-0055 denotes the c12-math-0056th equation of the system; c12-math-0057 is the number of all views obtained c12-math-0058 = (c12-math-0059 is the index of the view, and c12-math-0060 is the next iterative value obtained by updating c12-math-0061. SART adopts a view-by-view strategy; that is, an approximation is updated simultaneously by all the projections of each view.

As mentioned above, the convergence speed of SART is higher than that of SIRT and CAV; the convergence process of SART is still slow and it is difficult to acquire satisfactory results:

1. SART adopts a sequential accessing scheme (SAS) to update views and thus converges slowly owing to the high correlation between consecutive views [21]. Several methods [e.g., random access scheme (RAS), multilevel access scheme (MAS), weighted distance scheme (WDS)] have been proposed and evaluated to minimize this correlation in the tomography field [21–23]. These methods are adopted mostly where projection views are distributed uniformly among 180°, cannot be directly used in 3D reconstruction of ET.
2. It has been pointed out that careful selections for the relaxation parameters can lead to the better qualities of reconstructions [24, 25]. However, SART employs a constant relaxation parameter c12-math-0062 [since c12-math-0063 is fixed and c12-math-0064 is invariable during the reconstruction procedure in each iteration, as shown in Eq. (12.7). With the constant relaxation parameter, the pixels with large gray level c12-math-0065 have the same back-projection of the discrepancies as the pixels with small c12-math-0066 as seen in Eq. (12.7). So the gray levels of the pixels should be considered for the selection of the relaxation parameters.
3. Computing the column sum for each view is time-consuming and storing it requires massive amount of memory.

12.3 Adaptive simultaneous algebraic reconstruction technique (ASART)

To generate high-quality reconstructions with improved computational speed, we have developed a technique known more popularly by its acronym, ASART. The key techniques developed in ASART include a modified multilevel access scheme (MMAS) to arrange the order of projection data, an adaptive adjustment of relaxation parameters (AAR) to correct the discrepancy between actual and computed projections, and a column sum substitution (CSS) to reduce the memory requirement and computation time.

12.3.1 Modified Multilevel Access Scheme (MMAS)

The SART method adopts SAS to order the views so that there is a high correlation between consecutive views. The convergence can be significantly facilitated if the views are ordered to maximize their orthogonality [21]. Toward this goal, a multilevel access scheme (MAS) is adopted to substantially decrease the correlated error between the consecutive views. A detailed comparison between MAS and SAS shows that MAS yields the most efficient reconstruction [26]. Applications in different situations have shown that MAS can obtain promising results [27, 28].

The MAS method is based on the fact that two views of c12-math-0067 apart are minimally correlated, and the third view is set to the angle halving the former two views to minimize the correlation of three views [21]. The MAS ordering applies to any number of views but works best if the number of views is a power of 2. Suppose that the views are indexed as 0, 1, …, c12-math-0068, where c12-math-0069 is the number of views. In MAS, views can be organized in a total of c12-math-0070 levels where c12-math-0071 is expressed by

12.9 c12-math-0072

In level c12-math-0073, view 0 (c12-math-0074) and view c12-math-0075/2 (90c12-math-0076) with a maximum orthogonality are accessed first. Then, in level c12-math-0077, there are two views: 45c12-math-0078 and 135c12-math-0079 (or whose indices are c12-math-0080/4 and c12-math-0081/4). Furthermore, in level c12-math-0082, the indices of views are respectively c12-math-0083/8, c12-math-0084/8, c12-math-0085/8, and c12-math-0086/8. In every level, the views with the odd indices and the following ones with even indices are orthogonal.

As described above, MAS is adapted for the complete projection views distributed uniformly from 0c12-math-0087 to 180c12-math-0088. However, the projection views of ET are incomplete and limited in a certain range. We cannot always find two projections whose views are 90c12-math-0089 apart. If MAS is directly used to arrange the order of the views in ET, some views to which there are no views perpendicular will be left. If these views that remain unprocessed by MAS are then arranged by SAS, there will be a high correlation between the consecutive views. We propose a modified MAS (MMAS) to arrange the order of projections in ASART. Only a series of projections evenly distributed across the whole angle is considered as ET projections. For example, the tilt angle ranges from c12-math-0090 to +60c12-math-0091 with a small tilt increment of 2c12-math-0092. In MMAS, we adopt the range c12-math-0093 (c12-math-0094) of tilt angles as the selected factor instead of c12-math-0095 since the range is 120c12-math-0096 rather than 180c12-math-0097. Note that the range for the tilt angles can still vary in different situations. In the first level, we choose view c12-math-0098 and view 0c12-math-0099 between whom the angle is the half of the range (but not 90c12-math-0100). The two views that halve the angles between the first two are in the second level (i.e., view c12-math-0101 and view 30c12-math-0102). In such a scheme,views in one level will halve (or almost) the views in all previous levels. Figure 12.1 shows the situation for the example by labeling the order of access in our MMAS scheme on the corresponding view angles. The proposed scheme clearly covers all angular regions evenly over time as shown in Figure 12.1. Table 12.1 summarizes the results of every level for this example. Note that in any level, the computed value of views may not be included in the projected views or may have already been accessed. If so, we search both sides of the computed value until the closest unused value is found and then put it into the sequence. Before the iterative process of ASART is carried out, the order of the overall views is arranged according to MMAS as discussed above.

Figure 12.1 The access orders of the first three levels for the example. The left bold numbers on the left denote the index of access, and c12-math-0103 denotes the number of the level.

c12f001

Table 12.1 Access Orders for c12-math-0104 Projections Ranging from c12-math-0105 to +60c12-math-0106 (c12-math-0107)

c12-math-0108 c12-math-0109, 0c12-math-0110
c12-math-0111 c12-math-0112, +30c12-math-0113
c12-math-0114 c12-math-0115, +44c12-math-0116
c12-math-0117 c12-math-0118, +6c12-math-0119, +36c12-math-0120, +22c12-math-0121, +52c12-math-0122
c12-math-0123 c12-math-0124, +2c12-math-0125, +26c12-math-0126, +18c12-math-0127, +48c12-math-0128, c12-math-0129, +10c12-math-0130,
c12-math-0131, +40c12-math-0132, +24c12-math-0133, +56c12-math-0134
c12-math-0135 c12-math-0136, +4c12-math-0137, +32c12-math-0138, +12c12-math-0139, +42c12-math-0140, +8c12-math-0141,
c12-math-0142, +38c12-math-0143, +20c12-math-0144, +54c12-math-0145, +16c12-math-0146, +34c12-math-0147,
c12-math-0148, +28c12-math-0149, c12-math-0150, +50c12-math-0151, +58c12-math-0152

12.3.2 Adaptive Adjustment of Relaxation Parameters (AARPs)

In SART, the gray level c12-math-0153 is corrected by the relaxation parameters c12-math-0154 and the discrepancies between the actual projections c12-math-0155 and the computed projections c12-math-0156 in each iteration. As shown in Equation (12.7 ), the relaxation parameters c12-math-0157 are determined only by the weight c12-math-0158 and the fixed value c12-math-0159 in the iterative procedure. The convergence process can be faster if the relaxation parameters are adjusted as a function of the number of iterations. The relaxation parameters are determined in such a way that they decrease as the number of iterations increases [19]. In SART, the relaxation parameters are determined only by the weight c12-math-0160 while the gray level c12-math-0161 is ignored. Thus the pixels with large gray levels will have the same backprojection of the discrepancies as the pixels with small gray levels.

In fact, the pixels with different gray levels make different contributions to the discrepancies. In ASART, a data-driven adjustment of relaxation parameters (AAR) is applied during the reconstruction procedure. In AAR, the relaxation parameters are determined according to the gray levels as well as the weights as shown in the following equation:

12.10 c12-math-0162

With this approach, the relaxation parameter for each pixel is adjusted according to its gray level obtained in the previous iteration. Note that c12-math-0163 represents the geometry contribution of the c12-math-0164 pixel only to the c12-math-0165 ray integral. According to Eq. (12.10), the contribution of the c12-math-0166 pixel to the c12-math-0167 ray integral includes both the geometry contribution c12-math-0168 and the contribution of the gray level c12-math-0169 of the c12-math-0170 pixel. This is different from SART, where the gray level c12-math-0171 of the c12-math-0172 pixel is not considered as shown in Equation (12.7).

To illustrate intuitively how the convergence process can be accelerated by AAR, we consider a case with only two variables c12-math-0173 and c12-math-0174 satisfying the following equations:

12.11 c12-math-0175

The computational procedures for locating the solution from an initial guess (c12-math-0176=1.5, c12-math-0177=1.3) are displayed in Figure 12.2. The line in red denotes the procedure with the constant relaxation parameter by Eq. (12.7), and the line in blue describes the process with the adaptive relaxation parameter by Eq. (12.10). It is seen that the solution obtained by Eq. (12.10) is closer to the true solution than that obtained by Eq. (12.7) after the same number of iterations.

Figure 12.2 Illustration of the computational procedures from an initial guess.

c12f002

Note that in fact we don't need to recompute the sum of c12-math-0178 in Eq. (12.10) since it is equal to c12-math-0179, which has been computed in the process of reprojection. Extra computation is needed only for the multiplication of c12-math-0180 and c12-math-0181, but it is not time-consuming compared with the total computation.

12.3.3 Column Sum Substitution Technique (CSS)

We define c12-math-0182 as the reciprocal of the sum of the c12-math-0183th column in each view by

12.12 c12-math-0184

Although the column sums remain unchanged in each iteration, large memory space is needed to store each column sum for a SART reconstruction of large images. In fact, it has been proved that c12-math-0185 has no effect on the final solution to which SART converges [29]. In another iterative method called BICAV (block-iterative component averaging) [10], the relaxation parameters are generated by the following equation

12.13 c12-math-0186

where c12-math-0187 denotes the number of times that components c12-math-0188 of the volume contributes with nonzero value to the equations in the c12-math-0189th block. This is called oblique projection in BICAV. SART, on the other hand, is based on orthogonal projections. It has been proved that oblique projections allow iterative methods to have higher convergence speeds, especially during the early stages of iterations [10].

Accordingly, we propose that c12-math-0190 is replaced by a scalar c12-math-0191 denoting the maximum number of the nonzero weight c12-math-0192 in each view. In this way, the memory requirement and computation time are reduced, and the high qualities of the results are obtained because of this oblique projections. In the blob (if the radius c12-math-0193=2), the maximum number of the rays that affect every pixel on its reconstructed gray level in each view is 4. The relaxation parameter c12-math-0194 with our CSS technique can be expressed by

12.14 c12-math-0195

This modification can conserve runtime because the algorithm doesn't need to compute each column sum c12-math-0196 for each view. The modification can also reduce the memory requirement by using the scalar c12-math-0197 instead of the vector c12-math-0198, especially when the size of the image is very large.

With the improvements mentioned above, ASART is formulated as follows:

12.15 c12-math-0199

12.4 Multilevel parallel strategy for iterative reconstruction algorithm

The reconstruction time of blob-based iterative methods is a major challenge in ET because of the large reconstructed data volume. So parallel computing on multiple GPUs is becoming paramount to satisfying the computational requirement. In this section, we present a multilevel parallel strategy for blob-based iterative reconstruction and implement it on the OpenMP-CUDA architecture.

12.4.1 Coarse-Grained Parallel Scheme Using OpenMP

In the first level of the multilevel parallel scheme, a coarse-grained parallelization is straightforward in line with the properties of ET reconstruction. The single-tilt axis geometry allows data decomposition into slabs of slices orthogonal to the tilt axis. For this decomposition, the number of slabs equals the number of GPUs, and each GPU reconstructs its own slab. Consequently, the 3D reconstruction problem can be decomposed into a set of 3D slab reconstruction subproblems. However, the slabs are interdependent because of the overlapping nature of blobs. Therefore, each GPU has to receive a slab that is composed of its corresponding own slices and additional redundant slices reconstructed in neighbor slabs. The number of redundant slices depends on the blob extension. In a slab, its own slices are reconstructed by the corresponding GPU and require information provided by the redundant slices from the neighbor GPUs. During the process of 3DET reconstruction, each GPU has to communicate with other GPUs for the additional redundant slices.

We have implemented the 3DET reconstruction based on the architecture in which a CPU controls several GPUs and the GPUs share the memory. We adopt two GPUs in the different platforms to implement the blob-based reconstruction. Thus the first-level parallel strategy makes use of two GPUs to perform the coarse-grained parallelization of the reconstruction. As shown in Figure 12.3, the 3D volume data are halved into two slabs, and each slab contains its own slices and a redundant slice. According to the shape of the blob adopted (the blob radius is 2 in our experiments), only one redundant slice is included in each slab. Each slab is assigned to and reconstructed on each individual GPU in parallel. A shared-memory parallel programming scheme (OpenMP) is employed to fork two threads to control the separated GPU. Each slab is reconstructed on each individual GPU by each parallel thread. Consequently, the partial results attained by GPUs are combined to complete the final result of the 3D reconstruction. Certainly, the parallel strategy can be applied on GPU clusters. In a GPU cluster, the number of slabs equals the number of GPUs for the decomposition described above.

Figure 12.3 Coarse-grained parallel scheme using blob. 3D volume is decomposed into slabs of slices. The number of slabs equals the number of GPUs.

c12f003

12.4.2 Fine-Grained Parallel Scheme Using CUDA

In the second level of the multilevel parallel scheme, 3D reconstruction of one slab, as a fine-grained parallelization, is implemented on each GPU using CUDA. In the 3D reconstruction of a slab, the generic iterative process is described as follows:

  • Initiation—compute the wight matrix c12-math-0200 and create an initial value for c12-math-0201 by BPT.
  • Reprojection—estimate the computed projection data c12-math-0202 basis of the on the current approximation c12-math-0203.
  • Comparison—compare and compute the discrepancy c12-math-0204 between the real experimental and calculated projections.
  • Backprojection—backproject the comparison results over the image space and obtain c12-math-0205.
  • Refinement—update the current approximation c12-math-0206 by incorporating the weighted backprojection c12-math-0207.

The interdependence among neighbor slabs implies that, after computation of either the reprojection or backprojection for a given slab, there must be a proper exchange of information between neighboring GPUs.

12.4.3 Asynchronous Communication Scheme

As described above for the multilevel parallel scheme, there must be two communications between neighbor GPUs during one iterative reconstruction process: (1) to exchange the computed projections of the redundant slices after the reprojection process and (2) to exchange the reconstructed pixels of the redundant slices after the backprojection process. CUDA provides a synchronous communication scheme [cudaThreadSynchronize(#)] to accommodate the communication between GPUs. With the synchronous communication scheme, GPUs must sit idle while data are exchanged, which has a negative impact on the performance of the reconstruction in ET.

In the 3DET reconstruction on clusters, the use of blobs as basis functions involves significant additional difficulties in the parallelization and necessitates substantial communications among the processors. In any parallelization project where communication between nodes is involved, latency hiding becomes an issue [30]. An effective strategy stands for overlapping communication and computing so as to keep the processor busy while waiting for the communications to be completed [10]. In this study, a latency hiding strategy has been devised to deal with the communications efficiently. To minimize the idle time on the GPUs, we also present a latency hiding strategy using an asynchronous communication scheme in which different streams are used to perform GPU execution and CPU-GPU memory access asynchronously. The communication scheme splits GPU execution and memory copies into separate streams. Execution in one stream can be performed at the same time as a memory copy from another. In one slab reconstruction, reprojection of the redundant slices, memory copies and backprojection of the redundant slices are performed in one stream. The executions (i.e., reprojection and backprojection) of the slab's own slices are performed in the other stream. This can be extremely useful for reducing the idle time of GPUs.

12.4.4 Blob-ELLR Format with Symmetric Optimization Techniques

In the parallel blob-based iterative reconstruction, another problem is the lack of memory on GPUs for the sparse weighted matrix. Several data structures have been developed to store sparse matrices. Compressed row storage (CRS) and ELLPACK are the most two extended formats to store the sparse matrix on CPUs [31]. ELLPACK-R (ELLR), a variant of the ELLPACK format, has been proved to outperform the most efficient formats for storing the sparse matrix data structure on GPUs. ELLR consists of two arrays, c12-math-0208[#] and c12-math-0209[#] of dimension c12-math-0210, and an additional c12-math-0211-dimensional integer array called c12-math-0212[#] is included in order to store the actual number of nonzeros in each row [33]. With the size and number of projection images increasing, the memory demand of the sparse weighted matrix rapidly increases. The weighted matrix involved is too large to load into most of GPUs because of the limited available memory, even with the ELLR data structure.

Vazquez et al. proposed a matrix approach and exploited several geometry-related symmetry relationships to reduce the weighted matrix involved in the WBP reconstruction method [32]. In our work, we present a data structure named blob-ELLR and exploit several geometric symmetry relationships to reduce the weighted matrix involved in iterative reconstruction methods. The blob-ELLR data structure decreases the memory requirement and then accelerates the speed of ET reconstruction on GPUs. Compared with the matrix approach [32], our matrix blob-ELLR is adopted to store the weighted matrix c12-math-0215 instead of the transpose of the one involved in the original ELLR. As shown in Figure 12.4 (a), the maximum number of rays related to each pixel is four because of the blob radius (viz., c12-math-0216 = 2). To store the matrix c12-math-0217, the blob-ELLR includes two 2D arrays: one float c12-math-0218[#] to save the entries, and the other integer c12-math-0219[#] to save the columns of every entry (see Fig. 12.4b). Both arrays are of dimension c12-math-0220, where c12-math-0221 is the number of columns of c12-math-0222 and c12-math-0223 is the maximum number of nonzeros in the columns (c12-math-0224 is the number of the projection angles). Because the percentage of zeros is low in the blob-ELLR, it is not necessary to store the actual number of nonzeros in each column. Thus the additional integer array c12-math-0225[#] is not included in the blob-ELLR.

Figure 12.4 Blob-ELLR format with symmetric optimization techniques are exploited to reduce the storage space of c12-math-0213 to almost c12-math-0214th of the original size.

c12f004

Although the blob-ELLR without symmetric techniques can reduce the storage of the sparse matrix c12-math-0226, the number of c12-math-0227 is rather large, especially when the number of c12-math-0228 increases rapidly. The optimization takes advantage of the symmetry relationships as follows:

Symmetry 1. Assume that the c12-math-0229th column elements of the matrix c12-math-0230 in each view are c12-math-0231. The relationship among the adjacent column elements is

12.16 c12-math-0232

So, only c12-math-0233 need to be stored in the blob-ELLR, whereas the other elements are easily computed from c12-math-0234. This scheme can reduce the storage spaces of c12-math-0235 and c12-math-0236 to 25%.

Symmetry 2. Assume that a point c12-math-0237 of a slice is projected to a point c12-math-0238 [c12-math-0239] in the projection image corresponding to the tilt angle c12-math-0240 and projectc12-math-0241 is shown as follows:

12.17 c12-math-0242

It is easy to see that the point (c12-math-0243) of a slice is then projected to a point c12-math-0244 (c12-math-0245) in the same tilt angle c12-math-0246. The weighted value of the point (c12-math-0247) can be computed according to that of the point (c12-math-0248). Therefore, it is not necessary to store the weighted value of almost half of the points in the matrix c12-math-0249 so that the space requirements for c12-math-0250 and c12-math-0251 are further reduced by nearly 50%.

Symmetry 3. In general, the tilt angles used in ET are halved by c12-math-0252. Under this condition, a point (c12-math-0253) with a tilt angle c12-math-0254 is projected to a point c12-math-0255 (c12-math-0256). Therefore, the projection coefficients are shared with the projection of the point (c12-math-0257) with the tilt angle c12-math-0258. This further reduces the storage spaces of c12-math-0259 and c12-math-0260 by nearly 50% again.

With the three symmetric optimizations mentioned above, the size of the storage for two arrays in the blob-ELLR is almost c12-math-0261 reducing to nearly c12-math-0262 of the original size.

12.5 Experimental results and discussion

The objective of this study is to improve the quality and efficiency of 3D ET reconstruction. In this chapter, a reconstruction method ASART and a multilevel parallel strategy are presented. Two experiments have been carried out to evaluate the proposed methods, respectively. To evaluate the performance of ASART, we compare the reconstruction qualities of ASART with those of WBP and SART. To evaluate the performance of the multilevel parallel strategy, two different experimental datasets are reconstructed on Fermi platforms.

12.5.1 Materials and Computing Resources

All the experimental data in our work are the caveolae from the procine aorta endothelial (PAE) cell, collected by FEI Tecnai 20 in China National Key Laboratory of Biomacromolecules [34]. Two different experimental datasetsare used (denoted by small-sized and large-sized) with 112 images of c12-math-0265 pixels, and 119 images of c12-math-0266 pixels, to reconstruct tomograms of c12-math-0267 and c12-math-0268, respectively.

The first experiment is carried out on a machine running Ubuntu 9.10 32-bit with an Inter Core 2 Q8200 at 2.33 GHz and 4 GB of DDR2 memory. The second experiment is carried out on Fermi platform. The Fermi platform is composed of the same CPU with GT200, and two NVIDIA Tesla C2050 cards. NVIDIA Tesla C2050 adopts the Fermi architecture and contains 14 SMs of 32 SPs (i.e., 448 SPs) at 1.15 GHz, with 3 GB of memory.

12.5.2 Experimental Results of 3D Data Reconstruction

To evaluate the performance of ASART, we have performed the 3D reconstructions of two datasets of PAE cell using WBP, SART, and ASART, respectively. We use a popular software IMOD [35] to perform the reconstructions of WBP, and adopt the relaxation factor c12-math-0269 with 0.2 and perform the reconstructions with different numbers of iterations.

Montages showing one c12-math-0270 section of the volume reconstructed with the large datasets are presented in Figure 12.5. It clear that the quality of the image obtained by ASART is superior to the images obtained by WBP and SART. Since this dataset has a large noise level and no gold markers, the results of WBP and SART are so blurred that we can't catch the legible bilayer outlines of caveolae membrane structure, while the structure–phospholipid bilayer can be clearly observed in the result of ASART, as shown the red rectangles in Figure 12.5.

Figure 12.5 In the large dataset (c12-math-0263), one of slices along the c12-math-0264 axis of the reconstructions of the caveolae by WBP (a), 50 iterations of SART (b), and 50 iterations of ASART (c). Caveolae membrane structures (indicated by the red rectangles) can be clearly observed in the result of ASART (c).

c12f005

Also, we adopt a projection error c12-math-0271 criterion for the comparison of WBP, SART, and ASART in a practical situation. c12-math-0272 measures the mean discrepancy between the ray integral c12-math-0273 and its calculated value c12-math-0274, defined as

12.18 c12-math-0275

where c12-math-0276, and c12-math-0277 is a c12-math-0278-dimensional vector whose c12-math-0279th component is c12-math-0280. In the 3D data reconstruction, the qualities of the reconstructions, with SART and ASART, respectively, have been evaluated using the measure c12-math-0281. In general, a smaller projection error indicates a better reconstruction. The curves of the measure c12-math-0282 versus the number of iterations are presented in Figure 12.6. As shown in that Figure 12.6, c12-math-0283 of the reconstructions with WBP remain constant, with the value of 20.35 and 23.94 in the small and large datasets, respectively. In the two different experiments, c12-math-0284 of the reconstructions with ASART is smaller than that with SART in the same iterations. For the medium and large datasets, the values of c12-math-0285 become smaller than that of WBP in 10 iterations. Consequently, it is shown that ASART has the advantage in dealing with extremely noisy conditions, and can obtain results better than those with WBP and SART.

Figure 12.6 Plots of the projection errors c12-math-0286 versus the iteration numbers for the reconstructions of the caveolae using WBP, SART, and ASART at low resolution (a) and high resolution (b).

c12f006

12.5.3 Experimental Results of GPU Platform

To evaluate the multilevel parallel strategy, we have performed two sets of experiments to estimate the performance of an asynchronous communication scheme and the blob-ELLR data structure, respectively.

First, to estimate the performance of the asynchronous communication scheme, we implemented the blob-based iterative reconstruction algorithm (ASART) using the two experimental datasets on the Fermi platform. All the reconstructions adopt two methods separately: multi-GPUs with the synchronous communication scheme (viz., GPU+syn), and multi-GPUs with the asynchronous communication scheme (viz., GPU+asyn). In the experiments, the blob-ELLR developed in our work is used to storage the weighted matrix in the reconstruction. Figure 12.7 shows the speedups of the two communication schemes for different reconstructions iterative numbers (i.e., 1, 5, 10, 25) using the two datasets on the Tesla C2050 versus CPU. As shown in Figure 12.7, the speedups of GPU+asyn are larger than those of GPU+syn for two datasets. The asynchronous communication scheme exhibits excellent acceleration factors, reaching up to c12-math-0287 and c12-math-0288 for 25 iterations, on two datasets, respectively. In general, the asynchronous communication scheme provides better performance than the synchronous scheme for the reason of asynchronous overlap of communications and computations.

Figure 12.7 Speedup factors of different iterations shown by both implementations on the Tesla C2050 and CPU.

c12f007

Further, to evaluate the bolb-ELLR data structure, we implemented and compared the blob-based iterative reconstruction method using standard matrix, ELLR matrix, and blob-ELLR matrix on the Fermi platform, respectively. Figure 12.8 compares the speedups against the method using standard matrix. To ensure a clear and brief description, we show the results of only two datasets for one iteration. Because of the limited memory, the ELLR matrix method for large dataset cannot be implemented on the Tesla C2050s. It is clearly seen that the blob-ELLR matrix method yields better speedups than the ELLR matrix method on the Tesla C2050s. Figure 12.9 compares the speedup factors of different methods on the Tesla C2050s over the standard method on the CPU for one iteration. It is clear that the blob-ELLR matrix method can reduce the memory requirement of the weighted matrix and yield the best performance compared with the ELLR matrix and the standard methods on GPU platforms.

Figure 12.8 Speedup factors of the different matrix methods (ELLR and blob-ELLR) for one iteration over the standard method on the Tesla C2050s.

c12f008

Figure 12.9 Speedup factors of the different matrix methods (standard, ELLR, and blob-ELLR) for one iteration on the Tesla C2050s.

c12f009

12.6 Summary

In this study, we present an adaptive simulatneous algebraic reconstruction technique, named ASART, to obtain high-quality reconstructions, and developed a multilevel parallel strategy for a blob-based reconstruction algorithm on GPU platform. The contributions of ASART include a scheme MMAS for the organization and sequence of access to projections, a strategy AAR for parameter adjustment to correct the discrepancy between real projections and computed ones in each iteration, and a strategy CSS to process the weights for each view to reduce memory requirement and computation time. Experimental results show than ASART can obtain high-quality 3D reconstruction of ET under the condition of noisy and incomplete data. In the multilevel parallel strategy, the asynchronous communication scheme is used to minimize the idle GPU time. The blob-ELLR structure only needs nearly c12-math-0289 of the storage space in comparison with the ELLR storage structure and yields significant acceleration compared to the standard and ELLR matrix methods.

Acknowledgments

We would like to thank Professor Fei Sun in Institute of Biophysics, Chinese Academy of Sciences for providing the experimental datasets. This work supported by Chinese Academy of Sciences knowledge innovation key project (KGCX1-YW-13), and National Natural Science Foundation of China grant 61232001, 61202210 and 61103139.

References

1. Frank J, Three-Dimensional Electron Microscopy of Macromolecular Assemblies. Visualization of Biological Molecules in Their Native StateNature, Oxford Univ. Press, New York, 2006.

2. Henderson R, Realising the potential of electron cryomicroscopy, Q. Rev. Biophys. 37:3–13 (2004).

3. Zhou ZH, Towards atomic resolution structural determination by singleparticle cryo-electron microscopy realising the potential of electron cryomicroscopy, Curr. Opin. Struct. Biol. 18:218–228 (2008).

4. Lucic U, Foerster F, Baumeister W, Structural studies by electron tomography: From cells to molecules, Annu. Rev. Biochem. 74:833–865 (2004).

5. Robinson CV, Sali A, Baumeister W, The molecular sociology of the cell, Nature 450:973–982 (2007).

6. Marabini R, Rietzel E, Schroeder E, Herman G, Carazo J, Three-dimensional reconstruction from reduced sets of very noisy images acquired following a single-axis tilt schema: Application of a new three-dimensional reconstruction algorithm and objective comparison with weighted backprojection. The molecular sociology of the cell, J. Struct. Biol. 120:363–371 (1997).

7. Herman GT, Fundamentals of Computerized Tomography: Image Reconstruction from Projections, 2nd ed., Springer Press, London, 2009.

8. Fernandez JJ, Sorzano C, Marabini R, Carazo J, Image processing and 3-D reconstruction in electron microscopy, IEEE Signal Process. Mag. 23(3):84–94 (2006).

9. Radermacher M, Weighted back-projection methods, in Electron Tomography: Methods for Three-Dimensional Visualization of Structures in the Cell, 2nd ed., Springer, New York, 2006, Chap. 8.

10. Fernandez JJ, Lawrence A, Roca J, Garca J, Ellisman M, Carazo J, High performance electron tomography of complex biological specimens, J. Struct. Biol. 138:6–20 (2002).

11. Marabini R, Herman G, Carazo J, 3D reconstruction in electron microscopy using ART with smooth spherically symmetric volume elements (blobs), Ultramicroscopy 72:53–65 (1998).

12. Bilbao-Castro JR, Marabini R, Sorzano C, Garcia I, Carazo J, Fernandez JJ, Exploiting desktop supercomputing for three-dimensional electron microscopy reconstructions using ART with blobs, Ultramicroscopy 165:19–26 (2009).

13. Sorzano C, Marabini R, Boisset N, Rietzel E, Schroder R, Herman G, Carazo J, The effect of overabundant projection directions on 3D reconstruction algorithms, J. Struct. Biol. 133(2–3):108–118 (2001).

14. Castano-Diez D, Mueller H, Frangakis A, Implementation and performance evaluation of reconstruction algorithms on graphics processors, J. Struct. Biol. 157:288–295 (2007).

15. Censor Y, Finite series-expansion reconstruction methods, Proc. IEEE 71(3):409–419 (1983).

16. Lewitt RM, Alternative to voxels for image representation in iterative reconstruction algorithms, Phys. Med. Biol. 37:705–716 (1992).

17. Matej S, Lewitt RM, Efficient 3D grids for image-reconstruction using spherically-symmetrical volume elements, IEEE Trans. Nucl. Sci. 42:1361–1370 (1995).

18. Andersen AH, Kak AC, Simultaneous algebraic reconstruction technique (SART): A superior implementation of the ART algorithm, Ultrasonic Imag. 6:81–94 (1984).

19. Herman GT, Meyer L, Algebraic reconstruction can be made computationally efficient, IEEE Trans. Med. Imag. 12(3):600–609 (1993).

20. Frank J, Electron Tomography: Methods for Three-Dimensional Visualization of Structures in the Cell, 2nd ed., Springer, New York, 2006.

21. Guan H, Gordon R, A projection access order for speedy convergence of ART (algebraic reconstruction technique): A multilevel scheme for computed tomography, Phy. Med. Biol. 39:2005–2022 (1994).

22. Mueller K, Yagel R, Cornhill JF, The weighted-distance scheme: A globally optimizing projection ordering method for ART, IEEE Trans. Med. Imag. 16(2):223–230 (2002).

23. Kazantsev I, Matej S, Lewitt RM, Optimal ordering of projections using permutation matrices and angles between projection subspaces, Electron. Notes Discrete Math. 20:205–216 (2005).

24. Wenkai L, Adaptive algebraic reconstruction technique, Med. Phys. 31:3222–3230 (2004).

25. Wan XH, Zhang F, Liu ZY, Modified simultaneous algebraic reconstruction technique and its parallelization in cryo-electron tomography, Proc. Int. Conf. Parallel and Distributed Systems 2009, pp. 384–390.

26. Guan H, Gordon R, Computed tomography using algebraic reconstruction techniques (ARTs) with different projection access schemes: A comparison study under practical situations, Phys. Med. Biol. 41:1727–1743 (1996).

27. Guan H, Gordon R, Zhu Y, Combining various projection access schemes with the algebraic reconstruction technique for low-contrast detection in computed tomography, Phys. Med. Biol. 43:2413–2421 (1998).

28. Guan H, Yin FF, Zhu Y, Kim JH, Adaptive portal CT reconstruction: A simulation study, Med. Phys. 27:2209–2214 (2000).

29. Jiang M, Wang G, Convergence studies on iterative algorithms for image reconstruction, IEEE Trans. Med. Imag. 22:569–575 (2002).

30. Fernandez JJ, Carazo J, Garca I, Three-dimensional reconstruction of cellular structures by electron microscope tomography and parallel computing, J. Parallel Distrib. Comput. 64:285–300 (2004).

31. Bisseling RH, Parallel Scientific Computation, Oxford Univ. Press, New York, 2004.

32. Vazquez F, Garzon EM, Fernandez JJ, A new approach for sparse matrix vector product on NVIDIA GPUs, Concurr. Comput.: Pract. Exper. 23:815–826 (2011).

33. Vazquez F, Garzon EM, Fernandez JJ, A matrix approach to tomographic reconstruction and its implementation on GPUs, J. Struct. Biol. 170:146–151 (2010).

34. Sun S, Zhang K, Xu W, Wang G, Chen J, Sun F, 3D structural investigation of caveolae from porcine aorta endothelial cell by electron tomography, Progress Biochem. Biophys. 36(6):729–735 (2009).

35. Kremer JR, Mastronarde DN, McIntosh JR, Computer visualization of three-dimensional image data using IMOD, J. Struct. Biol. 116:71–76 (1996).

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.22.51.241