Daniel Kusswurm

Modern Parallel Programming with C++ and Assembly Language

X86 SIMD Development Using AVX, AVX2, and AVX-512

Daniel Kusswurm
Geneva, IL, USA
ISBN 978-1-4842-7917-5 e-ISBN 978-1-4842-7918-2
© Daniel Kusswurm 2022
This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Apress imprint is published by the registered company APress Media, LLC part of Springer Nature.

The registered company address is: 1 New York Plaza, New York, NY 10004, U.S.A.

Introduction

SIMD (single instruction multiple data) is a parallel computing technology that simultaneously executes the same processor operation using multiple data items. For example, a SIMD-capable processor can carry out an arithmetic operation using several elements of a floating-point array concurrently. Programs often use SIMD operations to accelerate the performance of computationally intense algorithms in machine learning, image processing, audio/video encoding and decoding, data mining, and computer graphics.

Since the late 1990s, both AMD and Intel have incorporated various SIMD instruction set extensions into their respective x86 processors. The most recent x86 SIMD instruction set extensions are called AVX (Advanced Vector Extensions), AVX2, and AVX-512. These SIMD resources facilitate arithmetic and other data processing operations using multiple elements in a 128-, 256-, or 512-bit wide processor register (most standard x86 arithmetic operations are carried out using scalar values in an 8-, 16-, 32-, or 64-bit wide register).
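The register widths above translate directly into the number of values a single SIMD operation can process. The following minimal sketch makes that arithmetic concrete; the helper name `NumSimdElements` is hypothetical and is not taken from the book's source code, and the checks assume the usual x86 sizes of 4 bytes for `float` and 8 bytes for `double`.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: number of elements of type T that fit in a SIMD
// register that is register_bits wide (128, 256, or 512 for x86 SIMD).
template <typename T>
constexpr std::size_t NumSimdElements(std::size_t register_bits)
{
    return register_bits / (sizeof(T) * 8);
}

// Compile-time checks, assuming 4-byte float and 8-byte double.
static_assert(NumSimdElements<float>(128) == 4, "128-bit register holds 4 floats");
static_assert(NumSimdElements<float>(256) == 8, "256-bit register holds 8 floats");
static_assert(NumSimdElements<double>(512) == 8, "512-bit register holds 8 doubles");
static_assert(NumSimdElements<std::int16_t>(256) == 16, "256-bit register holds 16 int16 values");
```

A 256-bit register therefore holds eight single-precision values, whereas a scalar floating-point operation works on one value at a time.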

Despite the incorporation of advanced SIMD capabilities in modern x86 processors, high-level language compilers are sometimes unable to fully exploit these resources. To optimally utilize the SIMD capabilities of a modern x86 processor, a software developer must occasionally write SIMD code that explicitly employs the AVX, AVX2, or AVX-512 instruction sets. A software developer can use either C++ SIMD intrinsic functions or assembly language programming to accomplish this. A C++ SIMD intrinsic function is code that looks like an ordinary C++ function but is handled differently by the compiler. More specifically, the compiler directly translates a C++ SIMD intrinsic function into one or more assembly language instructions without the overhead of a normal function (or subroutine) call.
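To give a feel for what explicit SIMD code looks like, here is a minimal sketch that adds two arrays of single-precision values, first with an ordinary scalar loop and then with AVX intrinsics from `<immintrin.h>`. The function names `AddArraysScalar` and `AddArraysAvx` are hypothetical and are not taken from the book's source code. The `target("avx")` attribute is a GCC/Clang feature that is assumed here so the function compiles without a global `-mavx` switch.

```cpp
#include <cstddef>
#include <immintrin.h>

// Scalar reference: one addition per loop iteration.
void AddArraysScalar(float* z, const float* x, const float* y, std::size_t n)
{
    for (std::size_t i = 0; i < n; i++)
        z[i] = x[i] + y[i];
}

// AVX version: eight single-precision additions per loop iteration.
#if defined(__GNUC__)
__attribute__((target("avx")))   // assumption: GCC/Clang; enables AVX for this function only
#endif
void AddArraysAvx(float* z, const float* x, const float* y, std::size_t n)
{
    const std::size_t num_simd_elements = 8;   // 256 bits / 32 bits per float
    std::size_t i = 0;

    // Process full groups of eight elements using 256-bit wide operands.
    for (; i + num_simd_elements <= n; i += num_simd_elements)
    {
        __m256 x_vals = _mm256_loadu_ps(&x[i]);          // unaligned load of 8 floats
        __m256 y_vals = _mm256_loadu_ps(&y[i]);
        __m256 z_vals = _mm256_add_ps(x_vals, y_vals);   // 8 additions at once
        _mm256_storeu_ps(&z[i], z_vals);                 // unaligned store of 8 floats
    }

    // Handle any leftover elements with scalar code.
    for (; i < n; i++)
        z[i] = x[i] + y[i];
}
```

Each intrinsic call in the main loop typically corresponds to a single AVX instruction rather than a function call. A caller should confirm that the processor and operating system support AVX before invoking `AddArraysAvx`.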

Before continuing, a couple of caveats are warranted. First, the SIMD programming techniques described in this book are not appropriate for every “slow” algorithm or function. Both C++ SIMD intrinsic function use and assembly language code development should be regarded as specialized programming tools that can significantly accelerate the performance of an algorithm or function when judiciously employed. However, explicit SIMD coding usually requires extra effort during initial development and possibly when performing future maintenance. Second, SIMD parallelism is different from other types of parallel computing you may have encountered. For example, the task-level parallelism of an application that exploits multiple processor cores or threads to accelerate the performance of an algorithm is different from SIMD parallelism. Task-level parallelism and SIMD parallelism are not mutually exclusive; they are frequently utilized together. The focus of this book is x86 SIMD parallelism and software development, specifically the computational resources of AVX, AVX2, and AVX-512.

Modern Parallel Programming with C++ and Assembly Language

Modern Parallel Programming with C++ and Assembly Language is an instructional text that explains x86 SIMD programming using both C++ intrinsic functions and assembly language. The content and organization of this book are designed to help you quickly understand and exploit the computational resources of AVX, AVX2, and AVX-512. This book also contains an abundance of source code that is structured to accelerate learning and comprehension of essential SIMD programming concepts and algorithms. After reading this book, you will be able to code performance-enhanced AVX, AVX2, and AVX-512 functions and algorithms using either C++ SIMD intrinsic functions or x86-64 assembly language.

Target Audience

The target audience for Modern Parallel Programming with C++ and Assembly Language is software developers including
  • Software developers who are creating new programs for x86 platforms and want to learn how to code performance-enhancing SIMD algorithms using AVX, AVX2, or AVX-512

  • Software developers who need to learn how to write x86 SIMD functions to accelerate the performance of existing code using C++ SIMD intrinsic functions or x86-64 assembly language functions

  • Software developers, computer science/engineering students, or hobbyists who want to learn about or need to gain a better understanding of x86 SIMD architectures and the AVX, AVX2, and AVX-512 instruction sets

Readers of this book should have some previous programming experience with modern C++ (i.e., ISO C++11 or later). Some familiarity with Microsoft’s Visual Studio and/or the GNU toolchain will also be helpful.

Content Overview

The primary objective of this book is to help you learn x86 SIMD programming using C++ SIMD intrinsic functions and x86-64 assembly language. The book’s chapters and content are structured to achieve this goal. Here is a brief overview of what you can expect to learn.

Chapter 1 discusses SIMD fundamentals including data types, basic arithmetic, and common data manipulation operations. It also includes a brief historical overview of x86 SIMD technologies including AVX, AVX2, and AVX-512.

Chapters 2 and 3 explain AVX arithmetic and other essential operations using C++ SIMD intrinsic functions. These chapters cover both integer and floating-point operands. The source code examples presented in these (and subsequent) chapters are packaged as working programs, which means that you can run, modify, or otherwise experiment with the code to enhance your learning experience.

Chapters 4, 5, and 6 cover AVX2 using C++ SIMD intrinsic functions. In these chapters, you will learn how to code practical SIMD algorithms including image processing functions, matrix operations, and signal processing algorithms. You will also learn how to perform SIMD fused-multiply-add (FMA) arithmetic.

Chapters 7 and 8 describe AVX-512 integer and floating-point operations using C++ SIMD intrinsic functions. These chapters also highlight how to take advantage of AVX-512’s wider operands to improve algorithm performance.

Chapter 9 covers supplemental x86 SIMD programming techniques. This chapter explains how to programmatically detect whether the target processor and its operating system support the AVX, AVX2, or AVX-512 instruction sets. It also describes how to utilize SIMD versions of common C++ library functions.
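As a rough illustration of the kind of runtime check mentioned above, the following sketch queries SIMD support with the GCC/Clang builtin `__builtin_cpu_supports`. This is an assumption made for brevity and is not necessarily the technique the book uses; other compilers, such as Visual C++, need a different mechanism. The names `SimdSupport`, `QuerySimdSupport`, and `BestSimdLevel` are hypothetical. Whether the builtin also accounts for operating system support should be confirmed against your compiler's documentation.

```cpp
#include <string>

// Hypothetical container for the detection results.
struct SimdSupport
{
    bool avx;
    bool avx2;
    bool avx512f;   // AVX-512 Foundation subset
};

// Queries instruction set support on the machine that runs the program.
// Reports no support when the GCC/Clang builtins are unavailable.
SimdSupport QuerySimdSupport()
{
    SimdSupport s = { false, false, false };
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    s.avx = __builtin_cpu_supports("avx") != 0;
    s.avx2 = __builtin_cpu_supports("avx2") != 0;
    s.avx512f = __builtin_cpu_supports("avx512f") != 0;
#endif
    return s;
}

// Returns the name of the widest supported instruction set extension.
std::string BestSimdLevel(const SimdSupport& s)
{
    if (s.avx512f)
        return "AVX-512";
    if (s.avx2)
        return "AVX2";
    if (s.avx)
        return "AVX";
    return "none";
}
```

A program can call `QuerySimdSupport()` once at startup and use the result to select between, for example, a scalar function and an AVX2 function.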

Chapter 10 explains x86-64 processor architecture including data types, register sets, memory addressing modes, and condition codes. The purpose of this chapter is to provide you with a solid foundation for the subsequent chapters on x86-64 SIMD assembly language programming.

Chapters 11 and 12 cover the basics of x86-64 assembly language programming. In these chapters, you will learn how to perform scalar integer and floating-point arithmetic. You will also learn about other essential assembly language programming topics including for-loops, compare operations, data conversions, and function calling conventions.

Chapters 13 and 14 explain AVX arithmetic and other operations using x86-64 assembly language. These chapters also illustrate how to code x86-64 assembly language functions that perform operations using arrays and matrices.

Chapters 15 and 16 demonstrate AVX2 and x86-64 assembly language programming. In these chapters, you will learn how to code x86-64 assembly language functions that perform image processing operations, matrix calculations, and signal processing algorithms using the AVX2 instruction set.

Chapters 17 and 18 focus on developing x86-64 assembly language code using the AVX-512 instruction set.

Chapter 19 discusses some usage guidelines and optimization techniques for both C++ SIMD intrinsic functions and assembly language code development.

Appendix A describes how to download and set up the source code. It also includes some basic instructions for using Visual Studio and the GNU toolchain. Appendix B contains a list of references and resources that you can consult for additional information about x86 SIMD programming and the AVX, AVX2, and AVX-512 instruction sets.

Source Code

The source code published in this book is available on GitHub at https://github.com/Apress/modern-parallel-programming-cpp-assembly.

Caution

The sole purpose of the source code is to elucidate programming topics that are directly related to the content of this book. Minimal attention is given to essential software engineering concerns such as robust error handling, security risks, numerical stability, rounding errors, or ill-conditioned functions. You are responsible for addressing these concerns should you decide to use any of the source code in your own programs.

The C++ SIMD source code examples (Chapters 2–9) can be built using either Visual Studio (version 2019 or later, any edition) on Windows or GNU C++ (version 8.3 or later) on Linux. The x86-64 assembly language source code examples (Chapters 11–18) require Visual Studio and Windows. If you are contemplating the use of x86-64 assembly language with Linux, you can still benefit from this book since most of the x86-AVX instruction explanations are OS independent (developing assembly language code that runs on both Windows and Linux is challenging due to differences between the various development tools and runtime calling conventions). To execute the source code, you must use a computer with a processor that supports AVX, AVX2, or AVX-512. You must also use a recent 64-bit operating system that supports these instruction sets. Compatible 64-bit operating systems include (but are not limited to) Windows 10 (version 1903 or later), Windows 11, Debian (version 9 or later), and Ubuntu (version 18.04 LTS or later). Appendix A contains additional information about the source code and software development tools.

Additional Resources

An extensive set of x86-related SIMD programming documentation is available from both AMD and Intel. Appendix B lists several important resources that both aspiring and experienced SIMD programmers will find useful. Of all the resources listed in Appendix B, two stand out.

The Intel Intrinsics Guide website (https://software.intel.com/sites/landingpage/IntrinsicsGuide) is an indispensable online reference for information regarding x86 C++ SIMD intrinsic functions and data types. This site documents the C++ SIMD intrinsic functions that are supported by the Intel C++ compiler. Most of these functions can also be used in programs that are developed using either Visual C++ or GNU C++. Another valuable programming resource is Volume 2 of the reference manual entitled Intel 64 and IA-32 Architectures Software Developer’s Manual, Combined Volumes: 1, 2A, 2B, 2C, 2D, 3A, 3B, 3C, 3D, and 4 (www.intel.com/content/www/us/en/developer/articles/technical/intel-sdm.html). Volume 2 contains comprehensive information for every AVX, AVX2, and AVX-512 processor instruction including detailed operational descriptions, lists of valid operands, affected status flags, and potential exceptions. You are strongly encouraged to consult this reference manual when developing your own x86 SIMD code to verify correct instruction usage.

Acknowledgments

The production of a motion picture and the publication of a book are somewhat analogous. Movie trailers extol the performances of the lead actors. The front cover of a book trumpets the authors’ names. Actors and authors ultimately receive public acclamation for their efforts. It is, however, impossible to produce a movie or publish a book without the dedication, expertise, and creativity of a professional behind-the-scenes team. This book is no exception.

I would like to thank the talented editorial team at Apress including Steve Anglin, Mark Powers, and Jim Markham for their efforts and contributions. I would also like to thank the entire production staff at Apress. Michael Kinsner warrants applause and a thank you for his comprehensive technical review and constructive comments. Ed Kusswurm merits kudos for reviewing each chapter and offering helpful suggestions. I accept full responsibility for any remaining imperfections.

Thanks to my professional colleagues for their support and encouragement. Finally, I would like to recognize parental nodes Armin (RIP) and Mary along with sibling nodes Mary, Tom, Ed, and John for their inspiration during the writing of this book.

About the Author
Daniel Kusswurm has over 35 years of professional experience as a software developer, computer scientist, and author. During his career, he has developed innovative software for medical devices, scientific instruments, and image processing applications. On many of these projects, he successfully employed C++ intrinsic functions, x86 assembly language, and SIMD programming techniques to significantly improve the performance of computationally intense algorithms or solve unique programming challenges. His educational background includes a BS in electrical engineering technology from Northern Illinois University along with an MS and PhD in computer science from DePaul University. Daniel Kusswurm is also the author of Modern X86 Assembly Language Programming (ISBN: 978-1484200650), Modern X86 Assembly Language Programming, Second Edition (ISBN: 978-1484240625), and Modern Arm Assembly Language Programming (ISBN: 978-1484262665), all published by Apress.

About the Technical Reviewer
Mike Kinsner is a principal engineer at Intel developing languages and parallel programming models for a variety of computer architectures. He has recently been one of the architects of Data Parallel C++. He started his career at Altera working on high-level synthesis for field-programmable gate arrays and still contributes to spatial programming models and compilers. Mike is a representative within the Khronos Group standards organization, where he works on the SYCL and OpenCL open industry standards for parallel programming. Mike holds a PhD in computer engineering from McMaster University and recently coauthored the industry’s first book on SYCL and Data Parallel C++.