Cover

Table of Contents

Cover

Table of Contents

Title page

Copyright page

Dedication

Preface

ABOUT THIS BOOK

CHAPTER ORGANIZATION AND OVERVIEW

ACKNOWLEDGMENTS

COMMENTS AND SUGGESTIONS

List of Acronyms

Chapter 1 Introduction

1.1 INTRODUCTION

1.2 TOWARD AUTOMATING PARALLEL PROGRAMMING

1.3 ALGORITHMS

1.4 PARALLEL COMPUTING DESIGN CONSIDERATIONS

1.5 PARALLEL ALGORITHMS AND PARALLEL ARCHITECTURES

1.6 RELATING PARALLEL ALGORITHM AND PARALLEL ARCHITECTURE

1.7 IMPLEMENTATION OF ALGORITHMS: A TWO-SIDED PROBLEM

1.8 MEASURING BENEFITS OF PARALLEL COMPUTING

1.9 AMDAHL’S LAW FOR MULTIPROCESSOR SYSTEMS

1.10 GUSTAFSON–BARSIS’S LAW

1.11 APPLICATIONS OF PARALLEL COMPUTING

Chapter 2 Enhancing Uniprocessor Performance

2.1 INTRODUCTION

2.2 INCREASING PROCESSOR CLOCK FREQUENCY

2.3 PARALLELIZING ALU STRUCTURE

2.4 USING MEMORY HIERARCHY

2.5 PIPELINING

2.6 VERY LONG INSTRUCTION WORD (VLIW) PROCESSORS

2.7 INSTRUCTION-LEVEL PARALLELISM (ILP) AND SUPERSCALAR PROCESSORS

2.8 MULTITHREADED PROCESSOR

Chapter 3 Parallel Computers

3.1 INTRODUCTION

3.2 PARALLEL COMPUTING

3.3 SHARED-MEMORY MULTIPROCESSORS (UNIFORM MEMORY ACCESS [UMA])

3.4 DISTRIBUTED-MEMORY MULTIPROCESSOR (NONUNIFORM MEMORY ACCESS [NUMA])

3.5 SIMD PROCESSORS

3.6 SYSTOLIC PROCESSORS

3.7 CLUSTER COMPUTING

3.8 GRID (CLOUD) COMPUTING

3.9 MULTICORE SYSTEMS

3.10 SM

3.11 COMMUNICATION BETWEEN PARALLEL PROCESSORS

3.12 SUMMARY OF PARALLEL ARCHITECTURES

Chapter 4 Shared-Memory Multiprocessors

4.1 INTRODUCTION

4.2 CACHE COHERENCE AND MEMORY CONSISTENCY

4.3 SYNCHRONIZATION AND MUTUAL EXCLUSION

Chapter 5 Interconnection Networks

5.1 INTRODUCTION

5.2 CLASSIFICATION OF INTERCONNECTION NETWORKS BY LOGICAL TOPOLOGIES

5.3 INTERCONNECTION NETWORK SWITCH ARCHITECTURE

Chapter 6 Concurrency Platforms

6.1 INTRODUCTION

6.2 CONCURRENCY PLATFORMS

6.3 CILK++

6.4 OpenMP

6.5 COMPUTE UNIFIED DEVICE ARCHITECTURE (CUDA)

Chapter 7 Ad Hoc Techniques for Parallel Algorithms

7.1 INTRODUCTION

7.2 DEFINING ALGORITHM VARIABLES

7.3 INDEPENDENT LOOP SCHEDULING

7.4 DEPENDENT LOOPS

7.5 LOOP SPREADING FOR SIMPLE DEPENDENT LOOPS

7.6 LOOP UNROLLING

7.7 PROBLEM PARTITIONING

7.8 DIVIDE-AND-CONQUER (RECURSIVE PARTITIONING) STRATEGIES

7.9 PIPELINING

Chapter 8 Nonserial–Parallel Algorithms

8.1 INTRODUCTION

8.2 COMPARING DAG AND DCG ALGORITHMS

8.3 PARALLELIZING NSPA ALGORITHMS REPRESENTED BY A DAG

8.4 FORMAL TECHNIQUE FOR ANALYZING NSPAs

8.5 DETECTING CYCLES IN THE ALGORITHM

8.6 EXTRACTING SERIAL AND PARALLEL ALGORITHM PERFORMANCE PARAMETERS

8.7 USEFUL THEOREMS

8.8 PERFORMANCE OF SERIAL AND PARALLEL ALGORITHMS ON PARALLEL COMPUTERS

Chapter 9 z-Transform Analysis

9.1 INTRODUCTION

9.2 DEFINITION OF z-TRANSFORM

9.3 THE 1-D FIR DIGITAL FILTER ALGORITHM

9.4 SOFTWARE AND HARDWARE IMPLEMENTATIONS OF THE z-TRANSFORM

9.5 DESIGN 1: USING HORNER’S RULE FOR BROADCAST INPUT AND PIPELINED OUTPUT

9.6 DESIGN 2: PIPELINED INPUT AND BROADCAST OUTPUT

9.7 DESIGN 3: PIPELINED INPUT AND OUTPUT

Chapter 10 Dependence Graph Analysis

10.1 INTRODUCTION

10.2 THE 1-D FIR DIGITAL FILTER ALGORITHM

10.3 THE DEPENDENCE GRAPH OF AN ALGORITHM

10.4 DERIVING THE DEPENDENCE GRAPH FOR AN ALGORITHM

10.5 THE SCHEDULING FUNCTION FOR THE 1-D FIR FILTER

10.6 NODE PROJECTION OPERATION

10.7 NONLINEAR PROJECTION OPERATION

10.8 SOFTWARE AND HARDWARE IMPLEMENTATIONS OF THE DAG TECHNIQUE

Chapter 11 Computational Geometry Analysis

11.1 INTRODUCTION

11.2 MATRIX MULTIPLICATION ALGORITHM

11.3 THE 3-D DEPENDENCE GRAPH AND COMPUTATION DOMAIN x1D49F_EuclidMathOne-Bold_11n_000100

11.4 THE FACETS AND VERTICES OF x1D49F_EuclidMathOne-Bold_11n_000100

11.5 THE DEPENDENCE MATRICES OF THE ALGORITHM VARIABLES

11.6 NULLSPACE OF DEPENDENCE MATRIX: THE BROADCAST SUBDOMAIN B

11.7 DESIGN SPACE EXPLORATION: CHOICE OF BROADCASTING VERSUS PIPELINING VARIABLES

11.8 DATA SCHEDULING

11.9 PROJECTION OPERATION USING THE LINEAR PROJECTION OPERATOR

11.10 EFFECT OF PROJECTION OPERATION ON DATA

11.11 THE RESULTING MULTITHREADED/MULTIPROCESSOR ARCHITECTURE

11.12 SUMMARY OF WORK DONE IN THIS CHAPTER

Chapter 12 Case Study: One-Dimensional IIR Digital Filters

12.1 INTRODUCTION

12.2 THE 1-D IIR DIGITAL FILTER ALGORITHM

12.3 THE IIR FILTER DEPENDENCE GRAPH

12.4 z-DOMAIN ANALYSIS OF 1-D IIR DIGITAL FILTER ALGORITHM

Chapter 13 Case Study: Two- and Three-Dimensional Digital Filters

13.1 INTRODUCTION

13.2 LINE AND FRAME WRAPAROUND PROBLEMS

13.3 2-D RECURSIVE FILTERS

13.4 3-D DIGITAL FILTERS

Chapter 14 Case Study: Multirate Decimators and Interpolators

14.1 INTRODUCTION

14.2 DECIMATOR STRUCTURES

14.3 DECIMATOR DEPENDENCE GRAPH

14.4 DECIMATOR SCHEDULING

14.5 DECIMATOR DAG FOR s1 = [1 0]

14.6 DECIMATOR DAG FOR s2 = [1 −1]

14.7 DECIMATOR DAG FOR s3 = [1 1]

14.8 POLYPHASE DECIMATOR IMPLEMENTATIONS

14.9 INTERPOLATOR STRUCTURES

14.10 INTERPOLATOR DEPENDENCE GRAPH

14.11 INTERPOLATOR SCHEDULING

14.12 INTERPOLATOR DAG FOR s1 = [1 0]

14.13 INTERPOLATOR DAG FOR s2 = [1 −1]

14.14 INTERPOLATOR DAG FOR s3 = [1 1]

14.15 POLYPHASE INTERPOLATOR IMPLEMENTATIONS

Chapter 15 Case Study: Pattern Matching

15.1 INTRODUCTION

15.2 EXPRESSING THE ALGORITHM AS A REGULAR ITERATIVE ALGORITHM (RIA)

15.3 OBTAINING THE ALGORITHM DEPENDENCE GRAPH

15.4 DATA SCHEDULING

15.5 DAG NODE PROJECTION

15.6 DESIGN 1: DESIGN SPACE EXPLORATION WHEN s = [1 1]t

15.7 DESIGN 2: DESIGN SPACE EXPLORATION WHEN s = [1 −1]t

15.8 DESIGN 3: DESIGN SPACE EXPLORATION WHEN s = [1 0]t

Chapter 16 Case Study: Motion Estimation for Video Compression

16.1 INTRODUCTION

16.2 FBMAS

16.3 DATA BUFFERING REQUIREMENTS

16.4 FORMULATION OF THE FBMA

16.5 HIERARCHICAL FORMULATION OF MOTION ESTIMATION

16.6 HARDWARE DESIGN OF THE HIERARCHY BLOCKS

Chapter 17 Case Study: Multiplication over GF(2m)

17.1 INTRODUCTION

17.2 THE MULTIPLICATION ALGORITHM IN GF(2m)

17.3 EXPRESSING FIELD MULTIPLICATION AS AN RIA

17.4 FIELD MULTIPLICATION DEPENDENCE GRAPH

17.5 DATA SCHEDULING

17.6 DAG NODE PROJECTION

17.7 DESIGN 1: USING d1 = [1 0]t

17.8 DESIGN 2: USING d2 = [1 1]t

17.9 DESIGN 3: USING d3 = [1 −1]t

17.10 APPLICATIONS OF FINITE FIELD MULTIPLIERS

Chapter 18 Case Study: Polynomial Division over GF(2)

18.1 INTRODUCTION

18.2 THE POLYNOMIAL DIVISION ALGORITHM

18.3 THE LFSR DEPENDENCE GRAPH

18.4 DATA SCHEDULING

18.5 DAG NODE PROJECTION

18.6 DESIGN 1: DESIGN SPACE EXPLORATION WHEN s1 = [1 −1]

18.7 DESIGN 2: DESIGN SPACE EXPLORATION WHEN s2 = [1 0]

18.8 DESIGN 3: DESIGN SPACE EXPLORATION WHEN s3 = [1 −0.5]

18.9 COMPARING THE THREE DESIGNS

Chapter 19 The Fast Fourier Transform

19.1 INTRODUCTION

19.2 DECIMATION-IN-TIME FFT

19.3 PIPELINE RADIX-2 DECIMATION-IN-TIME FFT PROCESSOR

19.4 DECIMATION-IN-FREQUENCY FFT

19.5 PIPELINE RADIX-2 DECIMATION-IN-FREQUENCY FFT PROCESSOR

Chapter 20 Solving Systems of Linear Equations

20.1 INTRODUCTION

20.2 SPECIAL MATRIX STRUCTURES

20.3 FORWARD SUBSTITUTION (DIRECT TECHNIQUE)

20.4 BACK SUBSTITUTION

20.5 MATRIX TRIANGULARIZATION ALGORITHM

20.6 SUCCESSIVE OVER RELAXATION (SOR) (ITERATIVE TECHNIQUE)

Chapter 21 Solving Partial Differential Equations Using Finite Difference Method

21.1 INTRODUCTION

21.2 FDM FOR 1-D SYSTEMS

References

Index

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.142.135.4