Chapter 11

Conclusions and future work

Abstract

This chapter first summarizes the content of this book: on the basis of the communication-centric cross-layer optimization method, the book explores the network-on-chip design space in a bottom-up, coherent, and uniform fashion, from low-level router, buffer, and topology implementations, to network-level routing and flow control designs, to co-optimizations of the network-on-chip and high-level programming paradigms. Then, we discuss promising research directions for future work.

Keywords

Communication-centric cross-layer optimization

Future work

11.1 Conclusions

The advancement of semiconductor technology and the severe design challenges facing single-core processors are together driving computer architecture rapidly into the many-core era. Although the community has already made significant breakthroughs, the development of efficient many-core processors still faces challenges at several layers: high-level parallel programming paradigms, the intermediate-level communication structure, and low-level logic implementations. Communication-centric cross-layer optimizations not only increase the performance of the communication layer, but also efficiently mitigate the challenges in both the programming paradigm layer and the logic implementation layer. On the basis of this insight, this book has explored the network-on-chip (NoC) design space in a bottom-up, coherent, and uniform fashion, from low-level router, buffer, and topology implementations, to network-level routing and flow control designs, to co-optimizations of the NoC and high-level programming paradigms.

The main content of this book was presented in Part II, on logic implementations, Part III, on routing and flow control, and Part IV, on programming paradigms. Part I contains the Prologue and Part V contains the Epilogue. Part II, consisting of Chapters 2 to 4, tackled the logic implementations of the NoC router architecture, buffer structure, and topology. More specifically, in Chapter 2, we designed a single-cycle router with wing channels to reduce the communication latency. The wing channels inspect the switch allocation results and forward incoming packets to free ports immediately, achieving a single-cycle per-hop delay. In addition, the packets traversing wing channels fill the free time slots of the crossbar, which improves the network throughput. Chapter 3 first introduced a dynamically allocated virtual channel (VC) design to share buffers among VCs of the same port, and then designed a hierarchical bit-line buffer structure to share buffers among different ports. Both the dynamically allocated VC and the hierarchical bit-line buffer adaptively avoid network congestion according to network traffic and buffer occupancy. Chapter 4 presented a hierarchical topology which combines the packet-switched network with the transaction-based bus structure. The proposed virtual bus on-chip network dynamically configures point-to-point links of conventional NoCs into virtual bus structures. This topology efficiently supports both unicast and multicast/broadcast communications.

Part III, including Chapters 5 to 7, shifted the attention to a higher level of abstraction, the routing and flow control of the NoCs. On the basis of a holistic approach, Chapter 5 delved into the design of routing algorithms for workload consolidation. The proposed destination-based selection strategy achieves both high adaptivity and dynamic isolation for multiple concurrent applications. Chapter 6 explored efficient flow control mechanisms to maximize the utilization of limited buffer resources for fully adaptive routing algorithms. It presented two novel flow control designs. First, whole packet forwarding (WPF) reallocates a nonempty VC if the VC has enough free buffers for an entire packet. We proved that WPF does not induce deadlock, and that it is an important extension to several deadlock avoidance theories. Second, we extended Duato's theory to apply aggressive VC reallocation on escape VCs without deadlock. Chapter 7 continued our exploration of deadlock-free flow control in torus NoCs. The flit bubble flow control theory presented there achieves deadlock freedom by maintaining one free flit-size buffer slot inside the ring. Its two implementations support both high frequencies and efficient buffer utilization.
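The WPF reallocation condition summarized above can be illustrated with a minimal Python sketch. The `VirtualChannel` class, its fields, and both function names are hypothetical and introduced only for illustration. The baseline rule (reallocate a VC only when it is empty) is an assumption about conventional conservative reallocation, and the sketch omits any further conditions that the full design in Chapter 6 may impose.

```python
from dataclasses import dataclass


@dataclass
class VirtualChannel:
    """Hypothetical model of one VC buffer, counted in flit slots."""
    depth: int         # total flit slots in the VC buffer
    occupied: int = 0  # slots currently holding flits of earlier packets

    @property
    def free_slots(self) -> int:
        return self.depth - self.occupied


def can_reallocate_conservative(vc: VirtualChannel) -> bool:
    # Assumed baseline: a VC is given to a new packet only once it is empty.
    return vc.occupied == 0


def can_reallocate_wpf(vc: VirtualChannel, packet_len: int) -> bool:
    # WPF condition as stated in the text: a nonempty VC may be reallocated
    # if its free slots can hold the entire incoming packet.
    return vc.occupied == 0 or vc.free_slots >= packet_len
```

For example, a five-slot VC that still holds one flit is rejected by the assumed baseline rule, but under the WPF condition it can accept a packet of up to four flits.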

Part IV covered co-optimizations of the NoC and programming paradigms in three chapters, Chapters 8 to 10. In Chapter 8, we optimized the NoC design for shared memory programming paradigms. We provided hardware implementations of collective communications, including multicast and reduction operations, for cache coherence protocols to prevent these communications from becoming system bottlenecks. In Chapter 9, we customized the NoC for message passing programming paradigms. The NoC designed there provides special, low-cost hardware implementations of message passing interface (MPI) communication primitives. Since most other MPI functions can be built upon these hardware-implemented primitives, this design efficiently improves the performance of MPI communication. Chapter 10 studied support for adaptive MPI communication protocols in NoCs. The proposed adaptive communication mechanism combines the advantages of both buffered and synchronous communication modes to improve throughput and latency; it performs similarly to the buffered mode when large free receiving buffers are available, and it changes to the synchronous mode when buffers are limited.
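The mode switch of the adaptive communication mechanism can be sketched roughly as follows. The function name, its parameters, and the simple "does the message fit" threshold are assumptions made for illustration; the criterion actually used in Chapter 10 may differ.

```python
def select_mode(message_flits: int, free_receive_slots: int) -> str:
    """Pick a communication mode from the receiver's free buffer space.

    Hypothetical rule: use the buffered mode when the whole message fits
    into the free receiving buffer, otherwise fall back to the synchronous
    mode, in which the sender waits for the receiver.
    """
    if message_flits <= free_receive_slots:
        return "buffered"
    return "synchronous"
```

A sender using such a rule needs a reasonably current view of the receiver's buffer status, which is why the timeliness of end-point buffer information matters for this kind of mechanism.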

In summary, in this book we have applied the communication-centric cross-layer method in a bottom-up fashion. The research presented here has addressed a multitude of pressing concerns spanning a wide spectrum of design topics. In the lower logic implementation layer, the exploration of low-latency router architectures was followed by the design of efficient dynamic VC structures. The study of this layer ended with an NoC topology enhanced with virtual bus structures. The exploration of the intermediate network routing and flow control layer first focused on routing algorithms for workload consolidation, and then delved into flow control designs for fully adaptive routing and deadlock-free torus NoCs. For the co-design of the NoC and the upper programming paradigm layer, both the mainstream shared memory paradigm and message passing paradigms were addressed by providing customized and special communication hardware.

11.2 Future work

The content of this book opens several interesting avenues for future research. The low-latency router architectures with wing channels proposed in Chapter 2 can be enhanced with priority arbitration to support critical packets or traffic flows more efficiently; reserving express wing channels for critical traffic can mitigate the performance bottlenecks caused by communication latency. The dynamic VC structures proposed in Chapter 3 can be leveraged to design the buffer structure in the network interface. The virtual bus on-chip network in Chapter 4 can be extended to support power gating techniques that shut down all routers, with the reconfigurable bus links acting as the backup connected network.

The idea of the destination-based selection strategy in Chapter 5 can be used in the design of an injection control mechanism. If the packet destination is integrated into the injection control procedure, the network can maintain the performance in a more robust fashion. The WPF theory presented in Chapter 6 can be leveraged to enhance the turn-model-based routing algorithm by allowing the short packets to cross the prohibited turns. Combining the flit bubble flow control in Chapter 7 and Duato's theory can result in the design of efficient fully adaptive routing algorithms for torus NoCs.

The message combination framework presented in Chapter 8 can be extended to support more general cache coherence protocols, where the requesting node collects the acknowledgments. The customized hardware implementation for MPI primitives studied in Chapter 9 can be improved by integrating the special support for latency-critical short messages. The adaptive communication mechanism proposed in Chapter 10 may be improved with novel mechanisms which deliver timelier end-point buffer status.

Although this book offers a thorough exploration of the NoC design space, several emerging technology trends and techniques point to numerous new topics for future research in the NoC field. In addition to reducing the dynamic power consumption, reducing the static power consumption is becoming more and more important. Power gating the network components, such as the routers, buffers, allocators, or crossbars, is the general way to reduce the static power consumption. Two important issues here are the effect of the wake-up delay and the need to maintain network connectivity. The deadlock-free flow control mechanisms presented in Chapters 6 and 7 can be used to provide deadlock freedom for partially powered down networks.

The synchronization procedure easily becomes a system bottleneck for many-core processors with shared memory programming paradigms. Exploiting the NoC structure to design a fast communication structure for synchronization signals can mitigate this challenge. After about 20 years of development, hardware transactional memory is becoming a reality. Providing customized NoC features for transactional memory programming paradigms is also an interesting research direction. The heterogeneous architecture, including both CPUs and graphics processing units, is a widely accepted method to address power consumption problems. The latency-critical CPUs and throughput-oriented graphics processing units have different requirements for the communication structure. Deploying isolation mechanisms in the NoCs to support both kinds of communication is important for the efficiency of heterogeneous architectures.

In general, using communication-centric cross-layer methods to integrate low-level circuit and logic knowledge and high-level programming paradigm and application knowledge into NoC designs will provide lasting vigor to the NoC and computer architecture communities; research into NoCs offers countless opportunities and challenges.
