Get to grips with various performance improvement techniques such as concurrency, lock-free programming, atomic operations, parallelism, and memory management

Key Features

  • Understand the limitations of modern CPUs and their performance impact
  • Find out how you can avoid writing inefficient code and get the best optimizations from the compiler
  • Learn the tradeoffs and costs of writing high-performance programs

Book Description

The great free lunch of "performance taking care of itself" is over. Until recently, programs got faster by themselves as CPUs were upgraded, but that doesn't happen anymore. The clock frequency of new processors has almost peaked, and new architectures bring only small improvements to existing programs. Processors do get larger and more powerful, but most of this new power is consumed by the increased number of processing cores and other "extra" computing units. To write efficient software, you now have to know how to make good use of the available computing resources, and this book will teach you how to do that.

The book covers all the major aspects of writing efficient programs, such as using CPU resources and memory efficiently, avoiding unnecessary computations, measuring performance, and putting concurrency and multithreading to good use. You'll also learn about compiler optimizations and how to use the programming language (C++) more efficiently. Finally, you'll understand how design decisions impact performance.

By the end of this book, you'll not only have enough knowledge of processors and compilers to write efficient programs, but you'll also be able to understand which techniques to use and what to measure while improving performance. At its core, this book is about learning how to learn.

What you will learn

  • Discover how to use the hardware computing resources in your programs effectively
  • Understand the relationship between memory order and memory barriers
  • Familiarize yourself with the performance implications of different data structures and organizations
  • Assess the performance impact of concurrent memory access and how to minimize it
  • Discover when to use and when not to use lock-free programming techniques
  • Explore different ways to improve the effectiveness of compiler optimizations
  • Design APIs for concurrent data structures and high-performance data structures to avoid inefficiencies

Who this book is for

This book is for experienced developers and programmers who work on performance-critical projects and want to learn different techniques to improve the performance of their code. Programmers in the algorithmic trading, gaming, bioinformatics, computational genomics, or computational fluid dynamics communities can learn various techniques from this book and apply them in their own domains.

Although this book uses the C++ language, the concepts it demonstrates can be easily applied to other compiled languages such as C, Java, Rust, and Go.

Table of Contents

  1. The Art of Writing Efficient Programs
  2. Contributors
  3. About the author
  4. About the reviewer
  5. Preface
    1. Who is this book for?
    2. What this book covers
    3. To get the most out of this book
    4. Download the example code files
    5. Download the color images
    6. Conventions used
    7. Get in touch
    8. Share Your Thoughts
  6. Section 1 – Performance Fundamentals
  7. Chapter 1: Introduction to Performance and Concurrency
    1. Why focus on performance?
    2. Why performance matters
    3. What is performance?
    4. Performance as throughput
    5. Performance as power consumption
    6. Performance for real-time applications
    7. Performance as dependent on context
    8. Evaluating, estimating, and predicting performance
    9. Learning about high performance
    10. Summary
    11. Questions
  8. Chapter 2: Performance Measurements
    1. Technical requirements
    2. Performance measurements by example
    3. Performance benchmarking
    4. C++ chrono timers
    5. High-resolution timers
    6. Performance profiling
    7. The perf profiler
    8. Detailed profiling with perf
    9. The Google Performance profiler
    10. Profiling with call graphs
    11. Optimization and inlining
    12. Practical profiling
    13. Micro-benchmarking
    14. Basics of micro-benchmarking
    15. Micro-benchmarking and compiler optimizations
    16. Google Benchmark
    17. Micro-benchmarks are lies
    18. Summary
    19. Questions
  9. Chapter 3: CPU Architecture, Resources, and Performance
    1. Technical requirements
    2. The performance begins with the CPU
    3. Probing performance with micro-benchmarks
    4. Visualizing instruction-level parallelism
    5. Data dependencies and pipelining
    6. Pipelining and branches
    7. Branch prediction
    8. Profiling for branch mispredictions
    9. Speculative execution
    10. Optimization of complex conditions
    11. Branchless computing
    12. Loop unrolling
    13. Branchless selection
    14. Branchless computing examples
    15. Summary
    16. Questions
  10. Chapter 4: Memory Architecture and Performance
    1. Technical requirements
    2. The performance begins with the CPU but does not end there
    3. Measuring memory access speed
    4. Memory architecture
    5. Measuring memory and cache speeds
    6. The speed of memory: the numbers
    7. The speed of random memory access
    8. The speed of sequential memory access
    9. Memory performance optimizations in hardware
    10. Optimizing memory performance
    11. Memory-efficient data structures
    12. Profiling memory performance
    13. Optimizing algorithms for memory performance
    14. The ghost in the machine
    15. What is Spectre?
    16. Spectre by example
    17. Spectre, unleashed
    18. Summary
    19. Questions
  11. Chapter 5: Threads, Memory, and Concurrency
    1. Technical requirements
    2. Understanding threads and concurrency
    3. What is a thread?
    4. Symmetric multi-threading
    5. Threads and memory
    6. Memory-bound programs and concurrency
    7. Understanding the cost of memory synchronization
    8. Why data sharing is expensive
    9. Learning about concurrency and order
    10. The need for order
    11. Memory order and memory barriers
    12. Memory order in C++
    13. Memory model
    14. Summary
    15. Questions
  12. Section 2 – Advanced Concurrency
  13. Chapter 6: Concurrency and Performance
    1. Technical requirements
    2. What is needed to use concurrency effectively?
    3. Locks, alternatives, and their performance
    4. Lock-based, lock-free, and wait-free programs
    5. Different locks for different problems
    6. Lock-based versus lock-free, what is the real difference?
    7. Building blocks for concurrent programming
    8. The basics of concurrent data structures
    9. Counters and accumulators
    10. Publishing protocol
    11. Smart pointers for concurrent programming
    12. Summary
    13. Questions
  14. Chapter 7: Data Structures for Concurrency
    1. Technical requirements
    2. What is a thread-safe data structure?
    3. The best kind of thread safety
    4. The real thread safety
    5. The thread-safe stack
    6. Interface design for thread safety
    7. Performance of mutex-guarded data structures
    8. Performance requirements for different uses
    9. Stack performance in detail
    10. Performance estimates for synchronization schemes
    11. Lock-free stack
    12. The thread-safe queue
    13. Lock-free queue
    14. Non-sequentially consistent data structures
    15. Memory management for concurrent data structures
    16. The thread-safe list
    17. Lock-free list
    18. Summary
    19. Questions
  15. Chapter 8: Concurrency in C++
    1. Technical requirements
    2. Concurrency support in C++11
    3. Concurrency support in C++17
    4. Concurrency support in C++20
    5. The foundations of coroutines
    6. Coroutine C++ syntax
    7. Coroutine examples
    8. Summary
    9. Questions
  16. Section 3 – Designing and Coding High-Performance Programs
  17. Chapter 9: High-Performance C++
    1. Technical requirements
    2. What is the efficiency of a programming language?
    3. Unnecessary copying
    4. Copying and argument passing
    5. Copying as an implementation technique
    6. Copying to store data
    7. Copying of return values
    8. Using pointers to avoid copying
    9. How to avoid unnecessary copying
    10. Inefficient memory management
    11. Unnecessary memory allocations
    12. Memory management in concurrent programs
    13. Avoiding memory fragmentation
    14. Optimization of conditional execution
    15. Summary
    16. Questions
  18. Chapter 10: Compiler Optimizations in C++
    1. Technical requirements
    2. Compilers optimizing code
    3. Basics of compiler optimizations
    4. Function inlining
    5. What does the compiler really know?
    6. Lifting knowledge from runtime to compile time
    7. Summary
    8. Questions
  19. Chapter 11: Undefined Behavior and Performance
    1. Technical requirements
    2. What is undefined behavior?
    3. Why have undefined behavior?
    4. Undefined behavior and C++ optimization
    5. Using undefined behavior for efficient design
    6. Summary
    7. Questions
  20. Chapter 12: Design for Performance
    1. Technical requirements
    2. Interaction between the design and performance
    3. Design for performance
    4. The minimum information principle
    5. The maximum information principle
    6. API design considerations
    7. API design for concurrency
    8. Copying and sending data
    9. Design for optimal data access
    10. Performance trade-offs
    11. Interface design
    12. Component design
    13. Errors and undefined behavior
    14. Making informed design decisions
    15. Summary
    16. Questions
  21. Assessments
    1. Chapter 1:
    2. Chapter 2:
    3. Chapter 3:
    4. Chapter 4:
    5. Chapter 5:
    6. Chapter 6:
    7. Chapter 7:
    8. Chapter 8:
    9. Chapter 9:
    10. Chapter 10:
    11. Chapter 11:
    12. Chapter 12:
    13. Why subscribe?
  22. Other Books You May Enjoy
    1. Packt is searching for authors like you
    2. Share Your Thoughts