affinity, 7, 19, 20, 21, 23, 35, 49, 51–54, 60, 89, 125
    upc_forall with continue as affinity argument, 53
    upc_forall with expression as affinity argument, 53
    upc_forall with integer as affinity argument, 52
    upc_forall with shared address as affinity argument, 52
ANSI C, see ISO C
backtracking, 62
barrier synchronization, see upc_barrier
block size, see data layout
blocking factor, see data layout
bulk transfers, see string functions
cache coherence, 117
collective function, 73
continue, 53
data decomposition, see domain decomposition
data layout, 25, 51
    *layout qualifier, 28
    three-dimensional cells, 60
    blocked array, 25, 27, 33, 35, 37, 50
    default data distribution, see round-robin data distribution
    default block size, 86
    indefinite block size, 27, 83, 86, 88
    multidimensional array, 54, 60, 77, 84
    round-robin fashion, 6, 50, 79
deadlock, 105
distributed memory multiprocessors, see DMs
distributed shared memory, see DSM
domain decomposition, 49
dynamic memory allocation, 21
embarrassingly parallel, 67
examples
    bakery algorithm, 110, 112, 113
    dining philosophers, 106
    heat conduction, 56, 58, 61, 62, 74, 80, 84, 93, 95, 129, 132
    matrix-vector multiplication, 26, 29, 50, 52, 53, 81
    parallel binary tree, 133
    pointer representation, 41
    sparse matrix compression, 87
    temperature converter, 4, 6–8, 10, 11, 13
    vector addition, 25
    wavelet transform, 90
iterations, 59
ITPACK format, 86
locality, see affinity
lock, 9
malloc(), 82
memory access order, see memory consistency
memory allocation
    dynamically allocated, 73
    statically allocated, 61
memory consistency, 9, 108, 125
message-passing interface, see MPI
MPI, 19
nonblocking lock, see upc_lock_attempt
nonuniform memory accesses, see NUMAs
NP problem, 62
NPB benchmarks, 61
NUMAs, 117
OpenMP, 19
parallel vector processors, see PVPs
performance, 115
    compilers, 122
    optimization, 123
    overhead, 120
    run-time systems, 122
pointers, 33
    *shared, 34
    casting private to shared, 38
    casting shared pointer to private, 38
    casting shared to private, 43
    casting shared to shared, 39
    data representation, 37, 40, 42
    handling functions, 42
    private pointer to private, 14, 22, 33
    private pointer to shared, 12, 22, 23, 34, 43, 46, 77, 79, 82, 83
    shared pointer to lock, 104
    shared pointer to private, 22, 34
    shared pointer to shared, 14, 34, 82, 84
prefetching, 131
relaxed, see upc_relaxed.h
remote accesses aggregation, 131
shared arrays, 23
SIMD, 117
single program, multiple data model, see SPMD model
sparse matrix, 86
strict, see upc_strict.h
struct, see structures
structures, 23
symmetric multiprocessors, see SMPs
synchronization phase, 95
system architecture, 116
dynamic number, 30
static number, 30
UMAs, 117
uniform memory accesses, see UMAs
upc_all_alloc, 73, 74, 77, 86, 89
upc_barrier, 8, 11, 59, 78, 92, 96, 99, 126
upc_forall, 11, 25, 44, 49, 51–53, 59, 89, 103, 104, 125
    affinity argument, see affinity
upc_free, 89
upc_global_alloc, 78–80, 86, 89
upc_global_lock_alloc, 100, 104, 108
upc_local_alloc, see upc_alloc
upc_lock_attempt, 100, 106, 108
upc_lock_free, 100
upc_memcpy, 127
upc_memget, 127
upc_relaxed.h, 108, 109, 111, 125
upc_resetphase, 142
upc_strict.h, 108, 109, 111, 125
von Neumann, 17
UPC: Distributed Shared Memory Programming, by Tarek El-Ghazawi, William Carlson, Thomas Sterling, and Katherine Yelick
Copyright © 2005 John Wiley & Sons, Inc.