Introduction

Designing efficient parallel programming paradigms is one of the most critical challenges in the development of many-core processors. The efficiency of a programming paradigm largely depends on its communication scheme, because communication is the basis of cooperation among cores. There are generally two types of paradigms: shared memory and message passing. The shared memory paradigm uses cache coherence protocols to maintain a coherent memory view, and these protocols in turn determine the traffic characteristics. Without hardware support, coherent collective communications, such as multicast and reduction, easily become system bottlenecks. For message passing paradigms, conventional software implementations of message passing interface (MPI) functions cause large latencies. Since most functions are built from a few primitives, hardware implementations of those primitives can efficiently scale up performance. In addition, the MPI protocol leverages two communication modes, buffered and synchronous: the buffered mode is efficient when receiving buffers are large, while the synchronous mode is appropriate when buffers are limited. Adaptively adjusting the communication mode on the basis of buffer status can provide robust, high performance. On the basis of this analysis, this part delves into the co-design of network-on-chip (NoC) and high-level programming paradigms in three chapters.

Chapter 8 studies hardware support for collective communications in cache coherence protocols. Support for multicast communication in NoCs has achieved substantial throughput gains and power savings. This chapter explores support for reduction communications. As a case study, we focus on the acknowledgment (ACK) messages that must be collected in a directory protocol before a cache line may be upgraded to or installed in the modified state. This chapter makes two primary contributions: an efficient framework to support the reduction of ACK packets, and a novel balanced, adaptive multicast (BAM) routing algorithm. The message combination framework complements several multicast algorithms. By combining ACKs during transmission, this framework not only reduces packet latency for low-to-medium network loads but also improves the network saturation throughput with little overhead. The balanced buffer resource configuration of BAM results in additional saturation throughput improvements.
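The idea of combining ACKs during transmission can be illustrated with a minimal sketch. It is not the framework of Chapter 8; the class `AckCombiner`, its methods, and the per-transaction table layout are hypothetical names introduced here only for illustration. The sketch assumes each router knows how many ACK packets to expect for a transaction and forwards a single combined ACK, carrying the accumulated count, once all of them have arrived.

```python
from dataclasses import dataclass, field


@dataclass
class AckCombiner:
    """Hypothetical per-router table that merges ACKs of one transaction."""
    # txn id -> number of ACK packets this router waits for (assumed known)
    expected: dict = field(default_factory=dict)
    # txn id -> (packets seen so far, ACK count accumulated so far)
    pending: dict = field(default_factory=dict)

    def expect(self, txn, branches):
        """Register a transaction whose ACKs return over `branches` inputs."""
        self.expected[txn] = branches
        self.pending[txn] = (0, 0)

    def on_ack(self, txn, count=1):
        """Absorb one ACK packet that represents `count` acknowledgments.

        Returns None while ACKs are still outstanding, or the combined
        count when the last expected packet arrives.
        """
        seen, total = self.pending[txn]
        seen, total = seen + 1, total + count
        if seen < self.expected[txn]:
            self.pending[txn] = (seen, total)
            return None
        # Last packet: release the table entry and forward one combined ACK.
        del self.pending[txn]
        del self.expected[txn]
        return total


combiner = AckCombiner()
combiner.expect("inv-42", branches=3)
first = combiner.on_ack("inv-42")             # held at the router
second = combiner.on_ack("inv-42", count=2)   # already combined upstream
combined = combiner.on_ack("inv-42")          # forwarded as one packet
```

In this sketch three incoming packets leave the router as one, which is the source of the traffic reduction; a real design must also bound the table size and handle ACKs that cannot be held.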

Chapter 9 presents a NoC design that optimizes the well-known parallel programming model, MPI, to boost applications by exploiting hardware features available in NoC-based many-core architectures. Conventional MPI functions are normally implemented in software because of their size and complexity, which results in large communication latencies. We propose a novel hardware implementation of three basic MPI primitives. The premise is that all other MPI functions can be efficiently built from these primitives. The design includes two important features: a customized NoC that incorporates virtual buses, and an optimized MPI unit that efficiently executes MPI-related transactions. The proposed designs effectively boost the performance of MPI communication functions.

Chapter 10 explores the design of MPI communication protocols over NoCs. We advocate a hardware-supported communication mechanism that uses a protocol-adaptive approach to adjust to varying NoC configurations (e.g., number of buffers) and workload behavior (e.g., number of messages). This chapter proposes the adaptive communication mechanism (ADCM), a hybrid protocol that behaves like buffered communication when sufficient buffer space is available at the receiver and like a synchronous protocol when receiver buffers are limited. ADCM adapts dynamically by choosing the communication protocol on a per-request basis, using a local estimate of recent buffer utilization. ADCM attempts to combine the advantages of both the buffered and the synchronous communication modes to achieve enhanced throughput and performance. The proposed communication mechanism can be effectively used in future NoC designs.
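The per-request decision can be sketched as follows. This is an illustration under stated assumptions, not the ADCM design itself: the class `AdaptiveModeSelector`, the exponential moving average used as the "local estimate of recent buffer utilization", and the values of `threshold` and `alpha` are all hypothetical choices made for this example.

```python
class AdaptiveModeSelector:
    """Hypothetical sender-side selector between two communication modes."""

    def __init__(self, capacity, threshold=0.75, alpha=0.5):
        self.capacity = capacity    # receiver buffer slots (assumed known)
        self.threshold = threshold  # assumed switch-over utilization
        self.alpha = alpha          # assumed weight of the newest sample
        self.estimate = 0.0         # running estimate of utilization

    def observe(self, occupied):
        """Fold one occupancy sample into the moving-average estimate."""
        sample = occupied / self.capacity
        self.estimate = self.alpha * sample + (1 - self.alpha) * self.estimate

    def choose_mode(self):
        """Pick the mode for the next request from the current estimate."""
        if self.estimate < self.threshold:
            return "buffered"
        return "synchronous"


selector = AdaptiveModeSelector(capacity=8)
modes = []
for occupied in (2, 8, 8):          # receiver buffers fill up over time
    selector.observe(occupied)
    modes.append(selector.choose_mode())
# modes == ["buffered", "buffered", "synchronous"]
```

With these assumed parameters the estimate rises from 0.125 to 0.5625 to 0.78125, so the selector keeps the buffered mode until sustained pressure on the receiver pushes the estimate past the threshold. Smoothing the samples avoids switching modes on a single transient spike.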
