8.3 Categories of Multiprocessors

Multiprocessors in general-purpose computing have a long and rich history. Embedded multiprocessors have been widely deployed for several decades. The range of embedded multiprocessor implementations is also impressively broad: multiprocessing has been used both in relatively low-performance systems and to achieve very high levels of real-time performance at very low energy levels.

Shared memory vs. message passing

There are two major types of multiprocessor architectures as illustrated in Figure 8.3:

Shared memory systems have a pool of processors (P1, P2, etc.) that can read and write a collection of memories (M1, M2, etc.).

Message passing systems have a pool of processors that can send messages to each other. Each processor has its own local memory.

image

Figure 8.3 The two major multiprocessor architectures.

Both shared memory and message passing machines use an interconnection network; the details of these networks may vary considerably. These two types are functionally equivalent—we can turn a program written for one style of machine into an equivalent program for the other style. We may choose to build one or the other based on a variety of considerations: performance, cost, and so on.
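
The functional equivalence of the two styles can be illustrated with a small sketch. The following Python code is not taken from any particular machine; it uses threads to stand in for processors, and the function names `sum_shared_memory` and `sum_message_passing` are hypothetical. The first version has workers update a single shared location under a lock; the second has workers keep their data local and send their partial results as messages through a queue, which plays the role of the interconnection network.

```python
import queue
import threading


def sum_shared_memory(data, n_workers=2):
    """Workers add partial sums into one shared location guarded by a lock."""
    total = [0]                      # the shared memory location
    lock = threading.Lock()

    def worker(chunk):
        partial = sum(chunk)
        with lock:                   # serialize access to the shared location
            total[0] += partial

    threads = [threading.Thread(target=worker, args=(data[i::n_workers],))
               for i in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total[0]


def sum_message_passing(data, n_workers=2):
    """Workers keep data local and send one message with their partial sum."""
    mailbox = queue.Queue()          # stands in for the interconnection network

    def worker(chunk):
        mailbox.put(sum(chunk))      # send the local result as a message

    threads = [threading.Thread(target=worker, args=(data[i::n_workers],))
               for i in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # The receiver combines exactly one message per worker.
    return sum(mailbox.get() for _ in range(n_workers))
```

Both functions compute the same result for the same input; what differs is where the data lives and how the partial results are combined, which is where the performance and cost considerations enter.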

System-on-chip vs. distributed

The shared memory vs. message passing distinction doesn’t tell us everything we would like to know about a multiprocessor. The physical organization of the processing elements and memory plays a large role in determining the characteristics of the system. In Chapter 4 we saw single-chip microcontrollers that include the processor, memory, and I/O devices. A multiprocessor system-on-chip (MPSoC) [Wol08B] is a system-on-chip with multiple processing elements. We use the term distributed system, in contrast, for a multiprocessor in which the processing elements are physically separated. In general, the networks used for MPSoCs are fast and provide low-latency communication between the processing elements. The networks for distributed systems have higher latencies than are possible on a single chip, but many embedded systems require us to use multiple chips that may be physically very far apart. The differences in latency between MPSoCs and distributed systems influence the programming techniques used for each.

MPSoCs

Shared memory systems are very common in single-chip embedded multiprocessors. Shared memory multiprocessors show up in low-cost systems such as CD players as we will see in Section 8.7. They also appear in higher-cost, high-performance systems such as cell phones, with the TI DaVinci being a widely used example. Shared memory systems offer relatively fast access to shared memory.

The next example introduces a multiprocessor system-on-chip for embedded computing, the ARM MPCore.

Example 8.1 ARM MPCore

The ARM MPCore architecture is a symmetric multiprocessor. An MPCore can have up to four CPUs. Interrupts are distributed among the processors by a distributed interrupt system. Consistency between the caches on the CPUs is maintained by a snooping cache controller.

image

The level 1 caches are managed by a snooping cache unit, which maintains the consistency of the caches in the multiprocessor. The snooping unit uses a MESI-style cache coherency protocol that categorizes each cache line as modified, exclusive, shared, or invalid. Each CPU’s snooping unit watches writes from other processors. If a write modifies a location held in this CPU’s level 1 cache, the snoop unit updates the locally cached value.
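
The four line states can be made concrete with a small state-transition sketch. The code below models the generic textbook MESI protocol for a single cache line as seen by one CPU; it is an illustration, not the MPCore implementation. The event names (`local_read`, `local_write`, `snoop_read`, `snoop_write`) and the `others_have_copy` parameter are hypothetical. In particular, the sketch assumes the common variant in which a snooped write invalidates the local copy, whereas the description above has the snoop unit changing the locally cached value, so the `snoop_write` case should be read as an assumption.

```python
from enum import Enum


class State(Enum):
    MODIFIED = "M"    # only copy, differs from memory
    EXCLUSIVE = "E"   # only copy, matches memory
    SHARED = "S"      # other caches may also hold the line
    INVALID = "I"     # line holds no valid data


def next_state(state, event, others_have_copy=False):
    """Return the next MESI state of one cache line (invalidate-based sketch)."""
    if event == "local_read":
        if state is State.INVALID:
            # A read miss loads the line; its state depends on other caches.
            return State.SHARED if others_have_copy else State.EXCLUSIVE
        return state
    if event == "local_write":
        # After a local write this cache holds the only up-to-date copy.
        return State.MODIFIED
    if event == "snoop_read":
        # Another CPU read the line, so this copy can no longer be exclusive.
        if state in (State.MODIFIED, State.EXCLUSIVE):
            return State.SHARED
        return state
    if event == "snoop_write":
        # Assumption: another CPU's write invalidates the local copy.
        return State.INVALID
    raise ValueError(f"unknown event: {event}")
```

For example, a line that starts invalid becomes exclusive on a local read when no other cache holds it, becomes modified on a local write, and drops to shared when another CPU's read is snooped.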

A distributed interrupt controller processes interrupts for the MPCore cluster. The interrupt distributor masks and prioritizes interrupts as in standard interrupt systems. In addition to a priority, an interrupt source also identifies the set of CPUs that can handle the interrupt. The interrupt distributor sends each CPU its highest-priority pending interrupt. Two different models can be used for distributing interrupts: in the first, taking the interrupt clears the pending flag for that interrupt on all CPUs; in the second, taking the interrupt clears the pending flag only for the CPU that takes it.
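
The two distribution models can be contrasted with a minimal behavioral sketch. The class below is a hypothetical model, not the MPCore register interface: the class name `InterruptDistributor`, its method names, and the convention that a lower number means a higher priority are all assumptions made for illustration, and masking is omitted.

```python
class InterruptDistributor:
    """Toy model of an interrupt distributor with per-CPU pending flags."""

    def __init__(self, n_cpus, clear_on_all=True):
        # clear_on_all selects which of the two distribution models is used.
        self.clear_on_all = clear_on_all
        # pending[cpu] maps interrupt id -> priority (lower = more urgent,
        # an assumption of this sketch).
        self.pending = [dict() for _ in range(n_cpus)]

    def raise_interrupt(self, irq, priority, targets):
        """Mark irq pending on every CPU in the source's target set."""
        for cpu in targets:
            self.pending[cpu][irq] = priority

    def highest_pending(self, cpu):
        """Return the highest-priority pending interrupt for cpu, or None."""
        flags = self.pending[cpu]
        if not flags:
            return None
        return min(flags, key=lambda irq: (flags[irq], irq))

    def take(self, cpu):
        """cpu takes its highest-priority interrupt; clear pending flags."""
        irq = self.highest_pending(cpu)
        if irq is None:
            return None
        if self.clear_on_all:
            for flags in self.pending:        # model 1: clear on all CPUs
                flags.pop(irq, None)
        else:
            del self.pending[cpu][irq]        # model 2: clear on this CPU only
        return irq
```

Under the first model an interrupt targeted at several CPUs is handled once, by whichever CPU takes it first; under the second, every targeted CPU eventually sees it.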

Vector operations are performed on a coprocessor, which provides standard IEEE 754 floating-point operations as well as fast implementations of several operations. Hardware divide and square root operators can execute in parallel with other arithmetic units.

The control coprocessor provides several control functions: system control and configuration; management and configuration of the cache; management and configuration of the memory management unit; and system performance monitoring. The performance monitoring unit can count cycles, interrupts, instruction and data cache metrics, stalls, TLB misses, branch statistics, and external memory requests.

Distributed systems

Message passing is widely used in distributed embedded systems. The most notable example of a safety-critical real-time distributed embedded system is found in the automobile. The dominant interconnection network used in cars is the Controller Area Network (CAN) bus, which was introduced by Bosch in 1986. Today, several hundred million CAN bus nodes are sold every year. A CAN network consists of a set of electronic control units (ECUs) connected by the CAN bus; the ECUs pass messages to each other using the CAN protocol. The CAN bus is used for safety-critical operations such as antilock braking. It is also used in less-critical applications such as passenger-related devices. CAN is not a high-performance network when compared to some scientific multiprocessors; it typically runs at up to 1 Mbit/sec. However, that transmission rate is high enough to support a large network of devices in a car. CAN is well suited to the strict requirements of automotive electronics: reliability, low power consumption, low weight, and low cost. We will describe the details of the CAN bus in Section 8.4.

In distributed systems, the network is fairly lightweight. A car network, for example, typically provides a few Mbit/sec of bandwidth. However, the computations are organized so that each processor has to send only a relatively small amount of data to other processors to do the system’s work.
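
A quick bandwidth budget shows why a lightweight network can be enough. The sketch below computes the fraction of a bus's bit rate consumed by a set of periodic messages. The function name `bus_utilization` is hypothetical, and the message set in the usage note (20 messages of 128 bits each, including protocol overhead, sent 100 times per second) is invented purely for illustration; it is not a measurement of any real vehicle network.

```python
def bus_utilization(messages, bit_rate):
    """Fraction of the bus bit rate used by periodic messages.

    messages: iterable of (bits_per_frame, frames_per_second) pairs,
              where bits_per_frame includes any protocol overhead.
    bit_rate: bus capacity in bits per second.
    """
    load = sum(bits * rate for bits, rate in messages)  # offered bits/sec
    return load / bit_rate
```

With the hypothetical message set above, `bus_utilization([(128, 100)] * 20, 1_000_000)` gives a load of 256,000 bits/sec, or about a quarter of a 1 Mbit/sec bus, which leaves headroom even though the network is slow by multiprocessor standards.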
