Multicore DSP
by Naim Dahnoun
Cover
Title Page
Preface
Acknowledgements
Foreword
About the Companion Website
1 Introduction to DSP
1.1 Introduction
1.2 Multicore processors
1.3 Key applications of high‐performance multicore devices
1.4 FPGAs, Multicore DSPs, GPUs and Multicore CPUs
1.5 Challenges faced for programming a multicore processor
1.6 Texas Instruments DSP roadmap
1.7 Conclusion
References
2 The TMS320C66x architecture overview
2.1 Overview
2.2 The CPU
2.3 Single instruction, multiple data (SIMD) instructions
2.4 The KeyStone memory
2.5 Peripherals
2.6 Conclusion
References
3 Software development tools and the TMS320C6678 EVM
3.1 Introduction
3.2 Software development tools
3.3 Hardware development tools
3.4 Laboratory experiments based on the C6678 EVM: introduction to Code Composer Studio (CCS)
3.5 Loading different applications to different cores
3.6 Conclusion
References
4 Numerical issues
4.1 Introduction
4.2 Fixed‐ and floating‐point representations
4.3 Dynamic range and accuracy
4.4 Laboratory exercise
4.5 Conclusion
References
5 Software optimisation
5.1 Introduction
5.2 Hindrance to software scalability for a multicore processor
5.3 Single‐core code optimisation procedure
5.4 Interfacing C with intrinsics, linear assembly and assembly
5.5 Assembly optimisation
5.6 Software pipelining
5.7 Linear assembly
5.8 Avoiding memory banks
5.9 Optimisation using the tools
5.10 Laboratory experiments
5.11 Conclusion
References
6 The TMS320C66x interrupts
6.1 Introduction
6.2 The interrupt controller
6.3 Laboratory experiment
6.4 Conclusion
References
7 Real‐time operating system: TI‐RTOS
7.1 Introduction
7.2 TI‐RTOS
7.3 Real‐time scheduling
7.4 Dynamic memory management
7.5 Laboratory experiments
7.6 Conclusion
References
8 Enhanced Direct Memory Access (EDMA3) controller
8.1 Introduction
8.2 Type of DMAs available
8.3 EDMA controllers architecture
8.4 Parameter RAM (PaRAM)
8.5 Transfer synchronisation dimensions
8.6 Simple EDMA transfer
8.7 Chaining EDMA transfers
8.8 Linked EDMAs
8.9 Laboratory experiments
8.10 Conclusion
References
9 Inter‐Processor Communication (IPC)
9.1 Introduction
9.2 Texas Instruments IPC
9.3 Notify module
9.4 MessageQ
9.5 ListMP module
9.6 GateMP module
9.7 Multi‐processor Memory Allocation: HeapBufMP, HeapMemMP and HeapMultiBufMP
9.8 Transport mechanisms for the IPC
9.9 Laboratory experiments with KeyStone I
9.10 Laboratory experiments with KeyStone II
9.11 Conclusion
References
10 Single and multicore debugging
10.1 Introduction
10.2 Software and hardware debugging
10.3 Debug architecture
10.4 Advanced Event Triggering
10.5 Unified Instrumentation Architecture
10.6 Debugging with the System Analyzer tools
10.7 Instrumentation with TI‐RTOS and CCS
10.8 Laboratory sessions
10.9 Conclusion
References
11 Bootloader for KeyStone I and KeyStone II
11.1 Introduction
11.2 How to start the boot process
11.3 The boot process
11.4 ROM Bootloader (RBL)
11.5 Boot process
11.6 Laboratory experiment 1
11.7 Laboratory experiment 2
11.8 TFTP boot with a host‐mounted Network File System (NFS) server – NFS booting
11.9 Conclusion
References
12 Introduction to OpenMP
12.1 Introduction to OpenMP
12.2 Directive formats
12.3 Forking region
12.4 Work‐sharing constructs
12.5 Environment variables and library functions
12.6 Synchronisation constructs
12.7 OpenMP accelerator model
12.8 Laboratory experiments
12.9 Conclusion
References
13 Introduction to OpenCL for the KeyStone II
13.1 Introduction
13.2 Operation of OpenCL
13.3 Command queue
13.4 Kernel declaration
13.5 How do the kernels access data?
13.6 OpenCL memory model for the KeyStone
13.7 Synchronisation
13.8 Basic debugging profiling
13.9 OpenMP dispatch from OpenCL
13.10 Building the OpenCL project
13.11 Laboratory experiments
13.12 Conclusion
References
14 Multicore Navigator
14.1 Introduction
14.2 Navigator architecture
14.3 Complete functionality of the Navigator
14.4 Laboratory experiment
14.5 Conclusion
References
15 FIR filter implementation
15.1 Introduction
15.2 Properties of an FIR filter
15.3 Design procedure
15.4 Laboratory experiments
15.5 Conclusion
References
16 IIR filter implementation
16.1 Introduction
16.2 Design procedure
16.3 Coefficients calculation
16.4 IIR filter implementation
16.5 Laboratory experiment
16.6 Conclusion
Reference
17 Adaptive filter implementation
17.1 Introduction
17.2 Mean square error
17.3 Least mean square
17.4 Implementation of an adaptive filter using the LMS algorithm
17.5 Implementation using linear assembly
17.6 Implementation in C language with compiler switches
17.7 Laboratory experiment
17.8 Conclusion
References
18 FFT implementation
18.1 Introduction
18.2 FFT algorithm
18.3 FFT implementation
18.4 Laboratory experiment
18.5 Conclusion
References
19 Hough transform
19.1 Introduction
19.2 Theory
19.3 Limits of r and θ
19.4 Hough transform implementation
19.5 Laboratory experiment
19.6 Conclusion
References
20 Stereo vision implementation
20.1 Introduction
20.2 Algorithm for performing depth calculation
20.3 Cost functions
20.4 Implementation
20.5 Conclusion
References
Index
End User License Agreement
Index
Symbols
.D unit
.L unit
.M unit
.S unit
#pragma DATA_ALIGN
#pragma DATA_SECTION
#pragma MUST_ITERATE
#pragma omp
#pragma omp atomic
#pragma omp barrier
#pragma omp critical
#pragma omp declare target
#pragma omp end declare target
#pragma omp for
#pragma omp for nowait
#pragma omp master
#pragma omp parallel
#pragma omp parallel copyin
#pragma omp parallel default
#pragma omp parallel firstprivate
#pragma omp parallel for
#pragma omp parallel for reduction
#pragma omp parallel if
#pragma omp parallel num_threads
#pragma omp parallel private
#pragma omp parallel sections
#pragma omp section
#pragma omp single
#pragma omp target
#pragma omp target data
#pragma omp target update
#pragma omp task
#pragma omp taskwait
#pragma omp threadprivate
#pragma UNROLL
a
A – synchronisation
AB – synchronisation
ACCELERATOR
Accessing the event combiner
Accumulation
Accumulator packet data structure processors
Adaptive filters
ADAS
Address cross paths
Advance debugging using the diagnostic feature
Advanced Event Triggering
Advanced Event Triggering logic
Algorithm for performing depth calculation
Aliasing
Alignment
Altera
AMD
Amdahl’s law
Analogue‐to‐digital filter design
Application‐specific integrated circuits (ASICs)
Architecture
ARM
Assembler
Assembly code
Assembly optimisation
Atomic
Avoiding memory banks
b
Barrier
Basic debugging profiling
Benchmark
Bilinear transform (BZT) method
Blackman
BOOT
Boot process
Bootloader
Bootloader initialization after power‐on reset
Bootloader initialization process after hard or soft reset
breakpoint
Building the OpenCL project
c
C compiler options
Cache
Cascade
Cascade structure
Chaining
Chaining EDMA transfers
Channel
Channel options parameter (OPT)
Channel priority
Chip‐level interrupt controller
Chip‐level interrupt controllers (CICs or CpIntcs)
Choosing
CIC0 event inputs
CIC controller for the 66AK2H14/12
CIC controllers for the TMS320C6678
CIC register offsets
CICs (CIC0, CIC1, CIC2 and CIC3)
cl_mem_flags
Clause descriptions
Clauses
Clock
Clock functions
Clock, periodic
Code Composer Studio (CCS)
Coefficient
Coefficients calculation
Command queue
Command‐queue properties
Compiler
Complete functionality of the Navigator
Components dependency
Compute devices
Compute units
Condition registers
Configuration example for HeapMultiBuf
Constant memory
Context/platform
Control registers
Cost functions
CPU
Create a message queue
Creating a buffer
Creating a command queue
Creating a dependency graph
Creating a GateMP instance
Creating the boot parameter table
Creating the boot table
Critical
Cross paths
CUDA
d
Data cross paths
Data memory
Data memory access
Data path
Deadlock
Debug
Debug architecture
Debugging
Debugging with the System Analyzer tools
Dependency graph
Dequeue priority
Descriptors
Design procedure
Direct form structure
Direct memory access
Direct structure
Directive formats
Discrete Fourier transform
Disparity
DMA
DMA Controller
Double‐word access
Dynamic memory management
Dynamic power
Dynamic range and accuracy
e
EDMA3 channel controller (EDMA3CC)
EDMA3 transfer controller (EDMA3TC)
EDMA controllers architecture
EDMA prioritisation
enabler functionality
Enhanced Direct Memory Access (EDMA) Controller
Enqueueing a kernel
Entering a GateMP
Environment variables and library functions
Evaluation module (EVM)
Event combiner
Event loggers
Event management (resource sharing and job load balancing)
Event trace
Event with a callback function
Events
Exploiting the periodicity and symmetry of the twiddle factors
f
Fast Fourier transform (FFT)
FFT algorithm
field‐programmable gate array (FPGA)
Filter
Filter coefficients
Fixed‐ and floating‐point representations
Fixed‐point arithmetic
Floating‐point arithmetic
Forking region
Fourier
Fourier series
Fourier transform
Fractional numbers
Freescale
Frequency response of an FIR filter
Frequently used windows
FTP
Functional units
Functions for the ListMP module
g
GateMP
GateMP module
General purpose input–output (GPIO)
Generating assembly code
Global work‐item ID
Gordon Moore
GPU
GPU‐accelerated
Graphic processor
h
Hamming
Hand optimisation of the dotp function using linear assembly
Hanning
Hardware development tools
Hardware interrupts (Hwis)
Heap
Heap allocation
HeapBuf
HeapMem
HeapMin
HeapMP
HeapMultiBuf
HeapMultiBuf_Params
Heterogeneous
Hibernation
High‐performance
High‐performance computing (HPC)
Hindrance to software scalability for a multicore processor
Host
Host Interrupt Map Registers
Host interrupt mapping for the CIC0 viewed with the CCS
Host packet descriptors
Host‐side tooling
Hough
Hough transform
How do the kernels access data?
How GateMP is used
How to configure the semaphores
How to start the boot process
How to use the IBL
HPI
Hwi
Hwi hook functions
i
IDE
Idle functions
Illustration of work item and workgroups
Impulse invariant method
Infrastructure PKDMA
Initialising a GateMP parameter structure
Initialization stage for the KeyStone I
Initialization stage for the KeyStone II
Instruction‐level parallelism (ILP)
Instrumentation with TI‐RTOS and CCS
integrated development environment
integrated development environment (IDE)
Inter‐processor
Interfacing C and assembly
Interfacing C with intrinsics
Intermediate bootloader
Internal timer
Interrupt Channel Map Registers
Interrupt controller
Interrupt distributor module
Interrupt response procedure
Interrupt sources and priority
Intrinsics
k
Kernel
Kernel declaration
Key applications of high‐performance multicore devices
KeyStone I
KeyStone II
KeyStone memory
l
Least mean square (LMS)
Leaving a Gate
Linear assembly
Linear assembly and assembly
Linear phase structures
Link RAM
Linked EDMAs
Linker
Linux
ListMP module
Local L2 memory for all TMS320C6678 cores
Local memory
Local work‐item ID
Logging events with Log_write() functions
LogSnapshot APIs for logging state information
Loop unrolling
Lucent
m
Main MessageQ functions
Main TI family of embedded processors
mantissa
Maps
Maskable
Matlab
Mean square error (MSE)
Memory leak
Memory location of the CIC0 and CIC1 for the TMS320C6678
Memory protection and extension
Memory structure, including the MPAX for KeyStone
Memory throughput
Message priority
Message priority settings
MessageQ
MessageQ protocol
Monolithic packet descriptor
Moonshot
Multicore
Multicore processor
Multicore Shared Memory Controller
Multicore support
n
Navigator
Navigator architecture
NCC
Normalised cross correlation (NCC)
Notify
Notify module
Nvidia
o
omp for
omp master
omp parallel – parallel region construct
omp sections
omp single
omp task
Open Event Machine
Open Multimedia Application Platform (OMAP)
OpenCL
OpenCL memory model for the KeyStone
OpenCL platform model
OpenCV
OpenEM
OpenMP
OpenMP accelerator model
OpenMP dispatch from OpenCL
OpenMP for the ARM code
OpenMP for the kernel code
OpenMP loop scheduling
Operands
Operation of OpenCL
Optimisation
Optimisation using the tools
Out‐of‐order execution
Overall functionality of the interrupt mechanism
p
Parallel instructions
Parameter RAM (PaRAM)
PayPal
Performance comparison
Peripherals
Phase linearity of an FIR filter
Ping‐pong
Pipelining
PKDMA receive side
PKDMA transmit side
Pole‐zero placement approach
Power consumption
Predefined software events and metadata
Printing the U‐Boot environment
Priority illustration
Private memory
Profiling
Program control unit
Properties of an FIR filter
q
Quality of service
Queue Manager
Queue Manager Subsystem
Queue peek registers
r
Race condition
Real‐time scheduling
Realisation structure
Rectangular
Register file A and file B
Registers
Removing the NOPs
Reset
Resource allocation
ROM Bootloader (RBL)
RTOS Analyzer
s
Scheduling table
Second bootloader for the KeyStone II
Second‐level bootloader
Semaphore_pend
Semaphore_post
Semaphores
Serial port
Set the connection between the host (PC) and the KeyStone
Setting an Hwi
Setting up the memory regions for the descriptors
Signed integer
Simple EDMA transfer
Simplified memory structure for KeyStone
Single‐core code optimisation procedure
single‐shot functions
Software and hardware debugging
Software development tools
Software instrumentation APIs
Software interrupts (Swis)
Software pipelining
Software‐pipelining procedure
Special numbers for the 32‐bit and 64‐bit floating‐point formats
Specifications
Splitting the DFT into two DFTs
Stack allocation
Standard trace
Static power
stereo
stereo vision
Sum of absolute differences (SAD)
Sum of squared differences (SSD)
Sunway
supercomputer
superscaling
Supported OpenMP device constructs
symmetric multiprocessing (SMP)
synchronisation
Synchronisation between the writer and the reader
Synchronisation constructs
Synchronization
SYS/BIOS event capture and transport
Sysbios
System (transfer controller) priority
System Analyzer
System Interrupt Status Indexed Set Register
System‐on‐chip (SoC)
System trace
t
Target‐side coding with UIA APIs and the XDCtools
Target‐side tooling
Task
Task hook functions
Texas Instruments DSPs
Texas Instruments IPC
TFTP boot with a host‐mounted Network File System (NFS) server – NFS booting
The boot configuration format
The boot configuration table
The boot process
The list of functions that can be used by GateMP
The PKDMA
Theory
Thread synchronisation
TI‐RTOS
Timer functions
TMS320C54x
TMS320C66x control registers
Top 10 supercomputers
Trace
Transfer synchronisation dimensions
Transistor
Transport mechanisms for the IPC
Transports
Trigger source priority
Type of DMAs available
Types of gate protection
u
U‐Boot
Unified Breakpoint Manager
Unified Instrumentation Architecture
Universal Asynchronous Receiver/Transmitter (UART)
UNROLL
Unsigned integer
User event
Using RTOS Object Viewer
Using the help for U‐Boot
Using the RTOS Analyzer and the System Analyzer
v
Various interrupts available
VMware
w
wait_group_events
Waiting for one command or all commands to finish
Window method
Word access
Work‐sharing constructs
Workgroups
Writing linear assembly
x
x86
XDCtools
z
Zero‐mean normalised cross correlation (ZNCC)
Zero‐mean sum of absolute differences (ZSAD)
Zero‐mean sum of squared differences (ZSSD)