Table of Contents
CHAPTER ORGANIZATION AND OVERVIEW
1.2 TOWARD AUTOMATING PARALLEL PROGRAMMING
1.4 PARALLEL COMPUTING DESIGN CONSIDERATIONS
1.5 PARALLEL ALGORITHMS AND PARALLEL ARCHITECTURES
1.6 RELATING PARALLEL ALGORITHM AND PARALLEL ARCHITECTURE
1.7 IMPLEMENTATION OF ALGORITHMS: A TWO-SIDED PROBLEM
1.8 MEASURING BENEFITS OF PARALLEL COMPUTING
1.9 AMDAHL’S LAW FOR MULTIPROCESSOR SYSTEMS
1.11 APPLICATIONS OF PARALLEL COMPUTING
Chapter 2 Enhancing Uniprocessor Performance
2.2 INCREASING PROCESSOR CLOCK FREQUENCY
2.3 PARALLELIZING ALU STRUCTURE
2.6 VERY LONG INSTRUCTION WORD (VLIW) PROCESSORS
2.7 INSTRUCTION-LEVEL PARALLELISM (ILP) AND SUPERSCALAR PROCESSORS
3.3 SHARED-MEMORY MULTIPROCESSORS (UNIFORM MEMORY ACCESS [UMA])
3.4 DISTRIBUTED-MEMORY MULTIPROCESSOR (NONUNIFORM MEMORY ACCESS [NUMA])
3.11 COMMUNICATION BETWEEN PARALLEL PROCESSORS
3.12 SUMMARY OF PARALLEL ARCHITECTURES
Chapter 4 Shared-Memory Multiprocessors
4.2 CACHE COHERENCE AND MEMORY CONSISTENCY
4.3 SYNCHRONIZATION AND MUTUAL EXCLUSION
Chapter 5 Interconnection Networks
5.2 CLASSIFICATION OF INTERCONNECTION NETWORKS BY LOGICAL TOPOLOGIES
5.3 INTERCONNECTION NETWORK SWITCH ARCHITECTURE
Chapter 6 Concurrency Platforms
6.5 COMPUTE UNIFIED DEVICE ARCHITECTURE (CUDA)
Chapter 7 Ad Hoc Techniques for Parallel Algorithms
7.2 DEFINING ALGORITHM VARIABLES
7.3 INDEPENDENT LOOP SCHEDULING
7.5 LOOP SPREADING FOR SIMPLE DEPENDENT LOOPS
7.8 DIVIDE-AND-CONQUER (RECURSIVE PARTITIONING) STRATEGIES
Chapter 8 Nonserial–Parallel Algorithms
8.2 COMPARING DAG AND DCG ALGORITHMS
8.3 PARALLELIZING NSPA ALGORITHMS REPRESENTED BY A DAG
8.4 FORMAL TECHNIQUE FOR ANALYZING NSPAs
8.5 DETECTING CYCLES IN THE ALGORITHM
8.6 EXTRACTING SERIAL AND PARALLEL ALGORITHM PERFORMANCE PARAMETERS
8.8 PERFORMANCE OF SERIAL AND PARALLEL ALGORITHMS ON PARALLEL COMPUTERS
Chapter 9 z-Transform Analysis
9.3 THE 1-D FIR DIGITAL FILTER ALGORITHM
9.4 SOFTWARE AND HARDWARE IMPLEMENTATIONS OF THE z-TRANSFORM
9.5 DESIGN 1: USING HORNER’S RULE FOR BROADCAST INPUT AND PIPELINED OUTPUT
9.6 DESIGN 2: PIPELINED INPUT AND BROADCAST OUTPUT
9.7 DESIGN 3: PIPELINED INPUT AND OUTPUT
Chapter 10 Dependence Graph Analysis
10.2 THE 1-D FIR DIGITAL FILTER ALGORITHM
10.3 THE DEPENDENCE GRAPH OF AN ALGORITHM
10.4 DERIVING THE DEPENDENCE GRAPH FOR AN ALGORITHM
10.5 THE SCHEDULING FUNCTION FOR THE 1-D FIR FILTER
10.6 NODE PROJECTION OPERATION
10.7 NONLINEAR PROJECTION OPERATION
10.8 SOFTWARE AND HARDWARE IMPLEMENTATIONS OF THE DAG TECHNIQUE
Chapter 11 Computational Geometry Analysis
11.2 MATRIX MULTIPLICATION ALGORITHM
11.3 THE 3-D DEPENDENCE GRAPH AND COMPUTATION DOMAIN
11.4 THE FACETS AND VERTICES OF
11.5 THE DEPENDENCE MATRICES OF THE ALGORITHM VARIABLES
11.6 NULLSPACE OF DEPENDENCE MATRIX: THE BROADCAST SUBDOMAIN B
11.7 DESIGN SPACE EXPLORATION: CHOICE OF BROADCASTING VERSUS PIPELINING VARIABLES
11.9 PROJECTION OPERATION USING THE LINEAR PROJECTION OPERATOR
11.10 EFFECT OF PROJECTION OPERATION ON DATA
11.11 THE RESULTING MULTITHREADED/MULTIPROCESSOR ARCHITECTURE
11.12 SUMMARY OF WORK DONE IN THIS CHAPTER
Chapter 12 Case Study: One-Dimensional IIR Digital Filters
12.2 THE 1-D IIR DIGITAL FILTER ALGORITHM
12.3 THE IIR FILTER DEPENDENCE GRAPH
12.4 z-DOMAIN ANALYSIS OF 1-D IIR DIGITAL FILTER ALGORITHM
Chapter 13 Case Study: Two- and Three-Dimensional Digital Filters
13.2 LINE AND FRAME WRAPAROUND PROBLEMS
Chapter 14 Case Study: Multirate Decimators and Interpolators
14.3 DECIMATOR DEPENDENCE GRAPH
14.5 DECIMATOR DAG FOR s1 = [1 0]
14.6 DECIMATOR DAG FOR s2 = [1 −1]
14.7 DECIMATOR DAG FOR s3 = [1 1]
14.8 POLYPHASE DECIMATOR IMPLEMENTATIONS
14.10 INTERPOLATOR DEPENDENCE GRAPH
14.12 INTERPOLATOR DAG FOR s1 = [1 0]
14.13 INTERPOLATOR DAG FOR s2 = [1 −1]
14.14 INTERPOLATOR DAG FOR s3 = [1 1]
14.15 POLYPHASE INTERPOLATOR IMPLEMENTATIONS
Chapter 15 Case Study: Pattern Matching
15.2 EXPRESSING THE ALGORITHM AS A REGULAR ITERATIVE ALGORITHM (RIA)
15.3 OBTAINING THE ALGORITHM DEPENDENCE GRAPH
15.6 DESIGN 1: DESIGN SPACE EXPLORATION WHEN s = [1 1]t
15.7 DESIGN 2: DESIGN SPACE EXPLORATION WHEN s = [1 −1]t
15.8 DESIGN 3: DESIGN SPACE EXPLORATION WHEN s = [1 0]t
Chapter 16 Case Study: Motion Estimation for Video Compression
16.3 DATA BUFFERING REQUIREMENTS
16.5 HIERARCHICAL FORMULATION OF MOTION ESTIMATION
16.6 HARDWARE DESIGN OF THE HIERARCHY BLOCKS
Chapter 17 Case Study: Multiplication over GF(2m)
17.2 THE MULTIPLICATION ALGORITHM IN GF(2m)
17.3 EXPRESSING FIELD MULTIPLICATION AS AN RIA
17.4 FIELD MULTIPLICATION DEPENDENCE GRAPH
17.7 DESIGN 1: USING d1 = [1 0]t
17.8 DESIGN 2: USING d2 = [1 1]t
17.9 DESIGN 3: USING d3 = [1 −1]t
17.10 APPLICATIONS OF FINITE FIELD MULTIPLIERS
Chapter 18 Case Study: Polynomial Division over GF(2)
18.2 THE POLYNOMIAL DIVISION ALGORITHM
18.3 THE LFSR DEPENDENCE GRAPH
18.6 DESIGN 1: DESIGN SPACE EXPLORATION WHEN s1 = [1 −1]
18.7 DESIGN 2: DESIGN SPACE EXPLORATION WHEN s2 = [1 0]
18.8 DESIGN 3: DESIGN SPACE EXPLORATION WHEN s3 = [1 −0.5]
18.9 COMPARING THE THREE DESIGNS
Chapter 19 The Fast Fourier Transform
19.3 PIPELINE RADIX-2 DECIMATION-IN-TIME FFT PROCESSOR
19.4 DECIMATION-IN-FREQUENCY FFT
19.5 PIPELINE RADIX-2 DECIMATION-IN-FREQUENCY FFT PROCESSOR
Chapter 20 Solving Systems of Linear Equations
20.2 SPECIAL MATRIX STRUCTURES
20.3 FORWARD SUBSTITUTION (DIRECT TECHNIQUE)
20.5 MATRIX TRIANGULARIZATION ALGORITHM
20.6 SUCCESSIVE OVER RELAXATION (SOR) (ITERATIVE TECHNIQUE)
Chapter 21 Solving Partial Differential Equations Using Finite Difference Method