IV. Mathematics and Computation
by Matthew Scarpino
Programming the Cell Processor: For Games, Graphics, and Computation
Copyright
Dedication
Foreword
Preface
Acknowledgments
About the Author
1. Introducing the Cell Processor
1.1. Background of the Cell Processor
History of the Cell
Potential of the Cell Processor for Scientific Computing
1.2. The Cell Architecture: An Overview
The Memory Interface Controller (MIC)
The PowerPC Processor Element (PPE)
The Synergistic Processor Element (SPE)
The Element Interconnect Bus (EIB)
The Input/Output Interface (IOIF)
1.3. The Cell Broadband Engine Software Development Kit (SDK)
1.4. Conclusion
I. The Software Development Tools
2. The Cell Software Development Kit (SDK)
2.1. Installing Fedora Core on the PlayStation 3
Obtaining the Fedora Core Image for the Cell Processor
Obtaining the Bootloader for Linux/PS3
Installing Linux onto the PS3
2.2. Installing the Cell Software Development Kit
Downloading the Cell SDK Files
Preparing for SDK Installation
Understanding and Executing the cellsdk Script
Optional SDK Packages
2.3. SDK Software Licensing
IBM’s International License Agreement for Early Release of Programs (ILAR)
GNU General Public License (GPL)
Lesser GNU General Public License (LGPL)
2.4. Exploring the SDK Installation
SDK on a Cell System
Executables in /usr/bin
Additional Tools in /opt/cell/sdk
SDK on an x86 System
Executables and Libraries in /opt/cell/toolchain
Simulator Files in /opt/ibm/systemsim-cell
2.5. Connecting to a Remote Cell Processor System
Send Commands to the Cell System with PuTTY
Transfer Files to the Cell System with WinSCP
2.6. Conclusion
3. Building Applications for the Cell Processor
3.1. Software Development for the Cell Processor
Building Applications for the PowerPC Processor Unit (PPU)
The PPU Preprocessor, ppu-cpp
The PPU Compiler, ppu-gcc
The PPU Assembler, ppu-as
The PPU Linker, ppu-ld
The Short Way: Building PPU Applications with ppu-gcc
Building Applications for the Synergistic Processor Unit (SPU)
3.2. Make and Makefiles
Anatomy of a Makefile
Makefile Dependency Lines
Shell Lines
Makefile Variables and Comments
A Simple Makefile Example
Advanced Makefile Development
Phony Targets
Automatic Variables
Pattern Rules and Built-In Rules
An Advanced Makefile
3.3. Conclusion
4. Debugging and Simulating Applications
4.1. Debugging Cell Applications
Debugging SPU Applications with spu-gdb
Breakpoints and Watchpoints
Reading Processor Information
Controlling Application Execution
An Example Debug Session
4.2. The IBM Full-System Simulator for the Cell Broadband Engine (SystemSim)
SystemSim Configuration
Starting and Running SystemSim
The SystemSim Command Window
The SystemSim Console Window
Compiling and Running Simulated Applications
Building the Example Application
Transferring the Application to the Simulated Cell
Running the Application
SPU Statistics and Checkpoints
SPU Statistics and Profiling Commands
SPU Checkpoints
SystemSim Trigger Events and Trigger Actions
Trigger Events
Trigger Actions
Associating Trigger Actions with Trigger Events
SystemSim Emitters
Emitter Event Configuration
Coding Emitter Readers Pt 1: EMIT_DATA
Coding Emitter Readers Pt 2: The Emitter API
4.3. Conclusion
5. The Cell SDK Integrated Development Environment
5.1. Eclipse and the C/C++ Development Tooling
Installing Eclipse
Installing the CDT
Installing the Cell IDE
5.2. Managing an SPU Project with the Cell IDE
Creating a Cell Project
Adding and Editing Source Code
Project Configuration
Building Projects with the Cell IDE
5.3. Running Executables with the Cell IDE
Running Executables on a Remote Cell System
Running Executables on a Cell Simulator
Debugging Executables with the Cell IDE
5.4. Conclusion
II. The PowerPC Processor Element (PPE)
6. Introducing the PowerPC Processor Unit (PPU)
6.1. Programming the PowerPC Processor Unit
PPU Datatypes
PPU Bit Ordering
PPU Libraries
PPU Intrinsics
PPU Timing and the Time Base Register
6.2. The PPU: A Bird’s-Eye View
The PPU and the PowerPC
The PPU’s Functional Units
The PPU’s Register File
6.3. PPU Instruction Processing
1. Instruction Cache (Four Stages)
2. Branch Prediction (Four Stages)
3. Instruction Buffer (Two Stages) and Dispatch
4. Microcode Translation (11 Stages)
5. Decode (Three Stages)
6. Issue (Three Stages)
6.4. Configuring the Pipeline
Controlling the Instruction Cache
Preventing Branch Misses
Removing Microcoded Instructions
Improving Chances of Dual Issue
6.5. PPU Dual-Threaded Operation
PPU Multithreading Example
PPU Intrinsics for Dual-Threaded Applications
6.6. PPU Memory Access: The Load Store Unit (LSU)
The Data Cache (DCache) and Memory Synchronization
The PowerPC Processor Storage Subsystem (PPSS)
6.7. PPU Address Translation: The Memory Management Unit (MMU)
Virtual Memory and Segments
Pages and the Translation Lookaside Buffer (TLB)
Preventing TLB Misses by Using Hugepages
6.8. Conclusion
7. The SPE Runtime Management Library (libspe)
7.1. The Big Picture
A Basic Example of the SPE Runtime Management Library
Building the Application from the Command Line
Building the Application with the Cell SDK IDE
7.2. The SPE Management Process
1. Examine System Configuration (Optional)
2. Embed the SPE Executable into the PPU Application
Compile-Time Embedding
Runtime Embedding
3. Create a Context for Each SPE
4. Create an Event Handler and Register Events (Optional)
5. Load the Program Handle into the Context and Run the Executable
6. Wait for Events and Analyze Stop Information (Optional)
7. Deallocate Data Structures
7.3. Linux Pthreads and libspe
Linux Pthreads
IBM Code Conventions
7.4. Gang Contexts and Affinity
7.5. Direct SPE Access
The SPU File System (SPUFS)
Direct SPE Access in libspe
7.6. Conclusion
8. SIMD Programming on the PPU, Part 1: Vector Libraries and Functions
8.1. Introduction to Vectors and PPU Vector Processing
PPU Vector Datatypes
PPU Floating-Point Values: Graphics Rounding Mode and Java Mode
PPU Vector Registers
8.2. Vector Function Libraries
AltiVec
SIMD Math
MASSV
8.3. SIMD Functions for the PPU
Load and Store Functions
Addition/Subtraction Functions
Multiplication/Division Functions
Conversion, Packing, and Splatting Functions
Permutation and Shifting Functions
Basic Unary Instructions
Logic Functions
Vector Comparison, Part 1: Vector Return Value
Vector Comparison, Part 2: Scalar Return Value
Exponent/Logarithm Functions
Trigonometric Functions
Floating-Point Analysis Functions
8.4. Conclusion
9. SIMD Programming on the PPU, Part 2: Methods and Algorithms
9.1. From Scalars to Vectors
Accessing Unaligned Memory and Unrolling Loops
vec_perm and vec_lvsl
Unaligned Vectorized Addition
SOA Versus AOS
9.2. Vectorizing Data Transfer and String Manipulation
libmotovec and libfreevec
memcpy
strcmp
9.3. Vectorized Insertion Sort
Insertion Sort
Intervector Sort
Intravector Sort
9.4. Conclusion
III. The Synergistic Processor Element (SPE)
10. Introducing the Synergistic Processor Unit (SPU)
10.1. The Synergistic Processor Unit
SPU Functional Units
SPU User Registers
10.2. SPU Datatypes and Floating-Point Processing
SPU Scalar Datatypes
SPU Vector Datatypes
Floating-Point Processing on the SPU
10.3. SPU Libraries in the SDK
C/C++ Standard Libraries on the SPU
SPU Intrinsics and Additional Libraries
10.4. The SPU Local Store
Memory Synchronization
10.5. SPU Initialization and Loading
SPU Initialization and Stack Operation
SPU Executable Loading and Spulets
10.6. SPU Dynamic Allocation and the Heap
10.7. The SPU Instruction Pipeline
Prefetch and Buffering
Branch Processing and Prediction
Decode
Issue
10.8. Conclusion
11. SIMD Programming on the SPU
11.1. SPU Vector Intrinsics Versus PPU Vector Intrinsics
Similarities Between PPU and SPU SIMD Coding
Differences Between PPU and SPU SIMD Coding
Comparing the SPU and PPU Vector Functions
11.2. The SIMD Math and MASSV Libraries
SIMD Math Library
MASSV
11.3. The SPU Decrementer
11.4. SPU Vector Functions
SPU Vector/Scalar Functions
SPU Addition/Subtraction Functions
SPU Multiplication/Division Functions
SPU Shuffle/Select Functions
SPU Compare/Count Functions
SPU Shift/Rotate Functions
SPU Rotation Functions
SPU Shift-Right Functions
SPU Basic Unary Functions
SPU Logical Functions
Exponent/Logarithm Functions
Trigonometric Functions
Floating-Point Analysis Functions
11.5. Common SPU Tasks
Processing Unaligned Data
Converting from AOS to SOA and Back
11.6. Accessing the SPU’s FPSCR
Detecting Errors with the FPSCR
Configuring the Double-Precision Rounding Mode with the FPSCR
11.7. Conclusion
12. SPU Communication, Part 1: Direct Memory Access (DMA)
12.1. The Element Interconnect Bus (EIB) and the Memory Flow Controller (MFC)
The Element Interconnect Bus (EIB)
The Memory Flow Controller (MFC)
The Scholar-Butler Analogy
External Access to the MFC
12.2. Introducing DMA
12.3. Tag Groups and DMA
Checking for DMA Completion
Checking the MFC Command Queue
Ordering Transfers in a Tag Group
12.4. Multibuffered DMA
Double Buffering
12.5. DMA Request Lists
DMA List Elements
DMA List Functions
12.6. PPU-SPU and SPU-SPU DMA Transfers
PPU-Initiated DMA
DMA Between SPUs
12.7. Atomic DMA and the Synchronization Library
Atomic DMA Functions
Synchronization Library
Atomic Operations
Mutexes
Reader/Writer Locks
Condition Variables
Completion Variables
The Cashier Problem
12.8. Conclusion
13. SPU Communication, Part 2: Events, Signals, Mailboxes
13.1. SPE Channels and the Memory Flow Controller
The Scholar-Butler Analogy and Channels
SPU Channels and Channel Functions
PPU Access to MFC Registers
13.2. Events and Interrupts
Step 1: Select Events of Interest
Step 2: Recognize Events as They Occur
Waiting
Polling
Interrupt Handling
Step 3: Acknowledge Events
PPE Event Handling
13.3. Mailboxes
SPU Mailbox Communication
Mailbox Write
Mailbox Read
Mailbox Event Processing
PPU Mailbox Communication
SPU-SPU Mailbox Communication
13.4. Signal Communication
Signal Notification Channels and Read Operations
Sending Signals from an SPE
Signal Notification Modes and Many-to-One Communication
PPU Signaling
Signals and SPE Synchronization
13.5. Multiprocessor Synchronization
Multiprocessor DMA Ordering
MFC Multisource Synchronization
13.6. Conclusion
14. Advanced SPU Topics: Overlays, Software Caching, and SPU Isolation
14.1. SPU Overlays
Overlays and the GNU Linker Script
Overlay Code
14.2. SPU Software Cache
Configuring the Cache
Accessing the Cache
Safe Software Cache Functions
Unsafe Software Cache Functions
Heapsort and the Software Cache
Cache Statistics
14.3. SPU Security and Isolation
SPU Security Tools
Signing and Verifying SPU Executables
Keys and Certificates
Building Secure Applications
Libspe and Secure Contexts
Application Encryption
The SPU Isolation Library
Functions in the SPU Isolation Library
Communicating with an Isolated SPU
14.4. Conclusion
15. SPU Assembly Language
15.1. Why Learn SPU Assembly?
15.2. Specific Intrinsics and Assembly-Coded Applications
Specific Intrinsics
Introducing the SPU Assembly Language
Creating Sections in Assembly
A Simple Assembly File
Building an Assembly-Coded Application
Debugging an Assembly-Coded Application
15.3. SPU Load and Store Instructions
SPU Addressing Modes
Load/Store Instructions
Load Immediate Instructions
15.4. SPU Shuffle and Select Instructions
Byte Shuffling and Shuffle Mask Creation
Bit Selection and Selection Mask Creation
15.5. SPU Arithmetic and Logic Instructions
SPU Addition and Subtraction Instructions
SPU Multiplication Instructions
SPU Logic Instructions
15.6. SPU Compare, Branch, and Halt Instructions
SPU Compare Instructions
SPU Branch Instructions
SPU Hint-for-Branch Instructions
SPU Halt Instructions
15.7. SPU Channel and Control Instructions
15.8. SPU Shift and Rotate Instructions
SPU Shift Instructions
SPU Rotate Instructions
15.9. SPU Counting and Conversion Instructions
15.10. Assembly Language and Function Calls
Writing an Assembly-Coded Function
Declaring the Function in Assembly
Managing the Stack
Assembly-Coded Function Example
Calling C/C++ Functions from Assembly Code
15.11. Assembly and the SPU Dual-Pipeline Architecture
15.12. Conclusion
IV. Mathematics and Computation
16. Vectors and Matrices
16.1. The Vector Library
Vector Products and Lengths
Graphic and Miscellaneous Vector Functions
16.2. The Matrix Library: 4x4 Matrices
Basic Matrix Operations
Projection Matrices
Orthogonal Projection
Perspective Projection
Vector Rotation
Rotating Coordinates in 2D and 3D
Quaternions and the Matrix Library
16.3. The Large Matrix Library
Basic Large Matrix Operations
Linear Equation Solution
16.4. The Basic Linear Algebra Subprograms (BLAS) Library
16.5. Multiprocessor Matrix Multiplication
Running the Application
The PPU Creates the Matrix and SPU Contexts
The SPUs Receive Information and Load Matrix Blocks
The SPUs Process the Matrix Blocks and Store Results
16.6. Conclusion
17. The Fast Fourier Transform (FFT)
17.1. Introducing Frequency Analysis and the Discrete Fourier Transform
The Time Domain and the Frequency Domain
Signals and Sampling
The Discrete Fourier Transform
Frequencies of Interest
The Single-Frequency Vector
The DFT Equation
A DFT Example
DFT Computation and the FFT
17.2. Introducing the Fast Fourier Transform
The Stretching Property
The Shifting Property
The Addition Property
The Two- and Four-Point Fourier Transform
17.3. The Example FFT Library (libfft_example)
The One-Dimensional FFT
The Two-Dimensional FFT
17.4. The FFT Library (libfft and libfft_spu)
The FFT Library for the PPU
The FFT Library for the SPU
17.5. Conclusion
18. Multiprecision Processing and Monte Carlo Methods
18.1. Multiprecision Mathematics Library (libmpm)
Multiprecision General Arithmetic/Logic Functions
Multiprecision Division and Modular Operations
Modular Exponentiation
Public Key Cryptography and RSA
18.2. The Monte Carlo Library (libmc_rand)
Generating Pseudo- and Quasi-Random Numbers
Feedback Shift Registers (FSRs) and Pseudo-Random Numbers
Quasi-Random Numbers and the Sobol Generator
Transforming the Distribution of Number Sequences
The Monte Carlo Transformation Methods
The Monte Carlo Transformation Functions
18.3. Conclusion
V. Graphics and Games
19. Programming the Frame Buffer: Linux and the PlayStation 3
19.1. Graphical Displays, Linux Devices, and the Frame Buffer
Display Monitors and Linux Configuration
Display Speed and Frame Rate
Linux Devices and the Frame Buffer
Character and Block Devices
The Linux Frame Buffer
19.2. I/O Control (ioctl) Instructions
Linux Frame Buffer I/O Control
PlayStation 3 Frame Buffer I/O Control
19.3. Drawing the Frame Buffer
19.4. Conclusion
20. OpenGL on the Cell: Gallium and Mesa
20.1. OpenGL, Mesa, and Gallium
OpenGL: Past, Present, and Future
Architecture Review Board, Extensions, and Difficulties
The Khronos Group and OpenGL 3.0
Mesa
Gallium
20.2. Acquiring and Building Mesa/Gallium
Downloading the Mesa/Gallium Source Code
Building the Mesa/Gallium Libraries
20.3. The OpenGL Utility Toolkit (GLUT)
Creating Windows with GLUT
20.4. A Gentle Introduction to OpenGL, Part 1: Creating the Viewing Region
OpenGL Datatypes
Defining the OpenGL Viewing Region
Orthographic Projections in 2D and 3D
Perspective Projections
Example Perspective Projection
20.5. A Gentle Introduction to OpenGL, Part 2: Vertices, Colors, Normals, and Vertex Buffer Objects
OpenGL Vertices and Shapes
Defining Vertex Color
OpenGL Normal Vectors
OpenGL Vertex Buffer Objects
20.6. Conclusion
21. Building Games with Ogre3D
21.1. Introducing Ogre
Downloading and Building the Ogre Libraries
Installing FreeImage
Installing Ogre
Building an Ogre Application
21.2. The Basics of Ogre Development
The Root and Ogre Plug-ins
Ogre Plug-ins
The Root Class
The SceneManager and the Camera
The Viewport
21.3. Ogre Resources: Meshes, Skeletons, and Materials
Ogre Meshes
Ogre Skeletons
Ogre Materials
Accessing Resources in Applications
21.4. Managing the Scene: Entities, Nodes, and Lighting
Meshes and Entities
Nodes and the SceneManager
Adding Lights to the Scene
21.5. Moving the Ninja: User Input, Animation, and Frame Listening
Responding to User Input
Animation
The FrameListener
21.6. Conclusion
22. Packaging Graphics with COLLADA
22.1. Introducing COLLADA
22.2. COLLADA’s Digital Asset Exchange (DAE) Format
The <asset> Element
The <library_geometries> Element
The <source> Subelement
The <vertices> Subelement
COLLADA Shapes
The <library_controllers> Element
The <skin> Subelement
The <morph> Subelement
The <library_materials> Element
The <technique_hint> Subelement
The <setparam> Subelement
22.3. The COLLADA Application Programming Interface (API)
Installing the COLLADA Libraries
Basic Objects of the COLLADA API
The DAE Class
The Runtime Database and the DAE Elements
The COLLADA Document Object Model (DOM)
22.4. Conclusion
Epilogue
VI. Appendices
A. Understanding ELF Files
A.1. ELF Object Files
The ELF Header
The Section Headers
Symbol and String Tables
Relocation Tables
A.2. ELF Executable Files
Dynamic Linking
A.3. ELF Libraries
Static Libraries
Shared Libraries
A.4. SPU-ELF and CESOF Files
SPU-ELF Files and the TOE Section
The CESOF Format
Creating the Complete PPU Executable
A.5. Accessing ELF Files in Code
A.6. Conclusion
B. Updating the PS3 Add-On Packages and Installing a New Linux Kernel
B.1. Add-On Packages for the PlayStation 3
The PS3 Add-On Packages
Installing the PS3 Add-On Packages
B.2. Configuring and Installing a New Linux Kernel
Accessing the Kernel Source Code
Configuring and Building the Linux Kernel
Configuring the PS3 to Boot the New Kernel
C. The Accelerated Library Framework (ALF)
C.1. Introduction to ALF
libspe and ALF
Accelerator Memory Buffers
C.2. ALF Applications on the Host (PPU)
Initializing the ALF Environment
alf_init
alf_query_system_info
alf_num_instances_set
Task Descriptors
Customizing the Task Descriptor
Adding a Task Context to the Task Descriptor
Creating Tasks
ALF Work Blocks
Creating Work Blocks
Adding Parameter Contexts to a Work Block
Work Blocks and Data Transfer Lists
Adding the Work Block to the Task
Launching and Ending the Task
C.3. ALF Applications on the Accelerator (SPU)
The Five ALF Accelerator Stages
Implementing ALF Accelerator Stages with Functions
Stage 1: Setup Task Context
Stage 2: Create Input Data Transfer Lists
Stage 3: Process Computational Kernel
Stage 4: Create Output Data Transfer Lists
Stage 5: Merge Task Context
Kernel API Export Definition Section
Accelerator Environment Functions
C.4. ALF Example Applications
ALF Example 1: Text Transfer and Display
ALF Example 2: Matrix Addition and Subtraction
Partitioning Data on the Host
Partitioning Data on the Accelerator
C.5. ALF Task Dependency and Event Processing
ALF Task Dependency
ALF Events
C.6. Conclusion
D. SPU Instruction Set Reference
E. A Brief Introduction to Tcl
E.1. Introducing Tcl
E.2. Higher-Level Tcl
Tcl Conditional Statement: if...elseif...else
Tcl List Processing
Tcl Arrays
Loop Iteration: for and foreach
E.3. Procedure Declarations
E.4. Conclusion
Part IV. Mathematics and Computation