Chapter 21

Conclusion and outlook

Abstract

This chapter summarizes the main parts of the book. It then concludes the book by offering an outlook on how parallel programming will continue to contribute to new innovations in science and technology.

Keywords

computational thinking; parallel patterns; golden age of computing; self-driving cars; individualized medicine

You made it! We have arrived at the finish line. In this final chapter, we will briefly review the learning goals that you have achieved through this book. Instead of drawing a conclusion, we will offer our vision for the future of massively parallel computing and how its advancements will impact the future course of science and technology.

21.1 Goals Revisited

As we stated in the Introduction, our primary goal is to teach you, the reader, how to program massively parallel processors. We promised that it would become easy once you develop the right intuition and go about it the right way. In particular, we promised to focus on computational thinking skills that would enable you to think about problems in ways that are amenable to parallel computing.

We delivered on these promises through four steps. In the first step, Chapters 2-4, Data parallel computing, Scalable parallel execution, and Memory and data locality, introduce the essential concepts of parallel computing and CUDA C. Chapter 5, Performance considerations, introduces the key performance considerations in developing massively parallel code in CUDA. These chapters also introduce the pertinent computer architecture concepts needed to understand the hardware limitations that must be addressed in high-performance parallel programming. With this knowledge, developers can write their parallel code with confidence and reason about the relative merits of alternative threading arrangements, loop structures, and coding styles.

The second step is to introduce six major parallel patterns (see chapters: Parallel patterns: convolution, Parallel patterns: prefix sum, Parallel patterns—parallel histogram computation, Parallel patterns: sparse matrix computation, Parallel patterns: merge sort, and Parallel patterns: graph search) that have been proven useful in introducing parallelism into many applications. These chapters cover the concepts behind the most useful patterns of parallel computation. Each pattern is illustrated with concrete code examples. Each pattern is also used to introduce important techniques for overcoming frequently encountered performance obstacles in parallel programming.

The third step is to reinforce the knowledge with high-level thinking in parallel programming. The first part is an introduction to dynamic parallelism (see chapter 13: CUDA dynamic parallelism), which allows parallel programmers to more easily address complex parallel algorithms with dynamically varying workloads that arise in many real-world applications. The second part consists of three detailed application case studies (see chapters 14, 15, and 16: Application case study—non-Cartesian magnetic resonance imaging, Application case study—molecular visualization and analysis, and Application case study—machine learning) that show how the parallel programming techniques presented in this book can be applied to real applications. The third part is a chapter dedicated to computational thinking skills (see chapter 17: Parallel programming and computational thinking) that help the reader generalize the concepts learned in the previous chapters into the high-level thinking required to tackle a new problem. With these insights, high-performance parallel programming becomes a well-structured thought process, rather than a black art.

The fourth step is to expose the reader to related parallel programming activities. Chapter 18, Programming a heterogeneous computing cluster, presents the basic skills required to program an HPC cluster using MPI and CUDA C. Chapter 19, Parallel programming with OpenACC, is an introduction to parallel programming using OpenACC, where the compiler does most of the detailed heavy lifting. While this approach relieves the programmer of writing detailed kernel code and data transfer code, a reader equipped with the skills covered in this book is in a much better position to give the compiler good directions. Chapter 20, More on CUDA and GPU Computing, provides further insight and wraps up some loose ends left from earlier in the book. To help you branch out to other programming models, we further introduce OpenCL (Appendix A), Thrust (Appendix B), CUDA FORTRAN (Appendix C), and C++ AMP (Appendix D). In each case, we explain how the programming model/language relates to CUDA and how you can apply the skills you learned based on CUDA to these models/languages.

We hope that you have enjoyed the book and agree with us that you are now well equipped for programming massively parallel computing systems.

21.2 Future Outlook

Since the introduction of the first CUDA-enabled GPU, the G80, in 2007, the capability of GPUs as massively parallel computing devices has improved by an amazing 12× in computing throughput and 8× in memory bandwidth. These advancements have stimulated tremendous progress in science, engineering, finance, and big data analytics. For example, as we have seen in Chapter 16, Application case study—machine learning, GPUs have ignited a revolution in deep learning from very large data sets, with applications in image recognition, speech recognition, and video analytics.

Since the first edition of this book in 2010, the field of parallel computing has also advanced at an amazing pace. The spectrum of problems that can be solved with scalable algorithms has broadened significantly. While the use of GPUs was initially concentrated on regular, dense matrix computation and Monte Carlo methods, their use has quickly expanded into sparse methods, graph computation, and adaptive refinement methods. In many areas, there has also been fast advancement in algorithms. Some of the algorithms presented in the parallel pattern chapters represent significant recent advancements.

It is only natural for some of us to wonder if we have reached the end of the fast advancement in parallel computing. From all indications, the answer is a definite no. We are only at the beginning of the parallel computing revolution. The amazing advancement in computing in the past three decades has triggered a paradigm shift in the industry. The major innovations used to be driven by physical instruments assisted by computing devices. They are now driven by computing assisted by physical instruments.

For example, the semiconductor industry used to rely on advancement in physical light sources, assisted by computing methods that enforce design rules, in its push to reduce device feature sizes in the manufacturing process. Today, the advancement in physical light sources has practically stopped. The advancement in feature size reduction is primarily driven by lithography masks that are computationally designed to orchestrate the interference of light waves and produce extremely precise etching patterns on the chips.

For another example, two decades ago, GPS revolutionized the way we drive. GPS is primarily based on satellite signal sensing assisted by computing methods that determine the shortest path between two locations, using algorithms similar to the one we showed in Chapter 12, Parallel patterns—graph search. Today, the most exciting revolution in the automobile industry is self-driving cars, which are primarily based on machine-learning methods assisted by physical sensors.

For yet another example, MRI and PET revolutionized medicine in the past two decades. These technologies are primarily based on electromagnetic and light sensors assisted by computational image reconstruction methods. They allow doctors to see the pathology inside human bodies without surgery. Today, the field of medicine is going through the revolution of individualized medicine, which is primarily driven by computational genomics methods assisted by sequencing sensors.

The same kind of paradigm shift has been taking place in many other areas. Computing has become the primary driving force for virtually all exciting innovations in our society. This has created an insatiable demand for faster computing systems. As we discussed in Chapter 1, Introduction, parallel computing is the only viable approach to the growth of computing performance. This powerful demand will continue to motivate the industry to innovate and create more powerful parallel computing devices.

In conclusion, we are at the dawn of a golden age of computing. The industry will continue to recruit and reward highly skilled parallel programmers. Your work will make a real difference in the field of your choice.

Enjoy the ride!
