Did you know?
In Amdahl’s law, there is an ambiguity as to what should constitute the
execution time of the sequential execution of a program (T1). The choices include:
1. The same parallel program used in parallel execution, but it runs with a single
thread on a single processor.
2. The sequential version of the program.
3. The best sequential version of the program.
The first choice is incorrect: the sequential execution should not include
parallel overheads, such as thread creation, calls to parallel libraries, etc. For ex-
ample, when a program containing OpenMP directives is compiled with an OpenMP
compiler, it includes calls to OpenMP libraries even when it runs on a single thread.
However, using thread-safe libraries, which incur the overheads of entering and ex-
iting critical sections, for sequential execution may be acceptable if they are the only
libraries available.
The second choice, using the sequential version of the program, is more accurate
since parallel overheads are not included, but care must still be taken. A potential
pitfall of this choice is whether the sequential version of the program is a realistic
implementation for sequential execution.
The correct choice is the last option, which uses the best sequential version of
the program, both algorithm-wise and code-wise, to measure T1. For example, some
algorithms are developed for the purpose of extracting parallelism, but they may not
be the best choice for a sequential implementation. The second aspect is
whether the code is the best sequential code given the algorithm. For example, some
aggressive compiler optimizations that benefit sequential execution are often not
performed when the program is compiled for parallel execution, but they can produce
better performing sequential code when they are applied.
2.2 Parallel Programming Models
A programming model is an abstraction of the hardware that programmers can assume. It deter-
mines how easily programmers can express their algorithms as computation tasks that the hard-
ware and compiler understand, and how efficiently those tasks can be executed by the hardware.
For non-parallel systems, the sequential programming model has worked very well in hiding the
hardware details from programmers, allowing them to express their algorithms intuitively
as a sequence of steps executed one after another. In contrast, for multiprocessor systems,
a programming model that hides hardware details from the programmer
while simultaneously achieving high efficiency remains elusive.
There are two major parallel programming models in wide use: the shared memory
and message passing programming models. The message passing model is more popular
for large machines consisting of hundreds to thousands of processor cores, while the
shared memory model is more popular for smaller machines. There are also
other programming models; we defer their discussion to the later part of the chapter.
Let us define a parallel task as a unit of computation that can be executed independently from
others. Parallel tasks can run on different processors or cores. Figure 2.3 illustrates the features
of the programming models. In the shared memory model, the abstraction is that parallel tasks
executed by separate threads or processes can access any location of the memory. Hence, they com-