Index
Page numbers with “f” denote figures; “t” tables.
A
Acceptable timeliness,
19–20
ACTIVE, variable
runtime behavior of program,
68f
ACTIVATE/PRECHARGE command,
266
Adaptive differential pulse code modulation (ADPCM),
525
Adaptive Multi-Rate (AMR) codec,
423
Advanced Mobile Phone Systems (AMPS),
423
Algorithm complexity,
Aliasing,
Analog signal processing (ASP),
vs. digital signals,
Analog systems,
Analog-to-digital conversion (ADC),
3–6,
for signal processing,
5f
Analog-to-digital converter,
data plotted over time,
8f
Antenna systems, multiple input multiple output (MIMO),
87
API, multithreading disable,
328t
Apodization coefficients,
506
Application specific integrated circuits (ASICs),
31,
103,
107–108
Arithmetic processing unit (APU),
38
ASCII numbers,
advantages/disadvantages,
170
Asymmetric multiprocessing (AMP),
300,
361
ATCA-9100 from Radisys,
529f
Audio/speech signal,
Auto-vectorizing compiler technology,
177–179
Matlab, Labview and FFTW-like generator suites,
178
Matlab and native compiled code,
178–179
Azimuthal resolution,
496,
499
B
Barrier_Wait command,
305
Beamforming MIMO-OFDM system
baseband representation of,
90f
Bilinear Transform technique,
124
Block processing model,
46
C
C, caller procedure,
222f
C programming language
finite impulse response (FIR) filter,
171
fractional types and saturation,
172–173
with intrinsics and pragmas,
170–175
standard C integral types,
171
Cache line, set associativity by,
273f
Caller assembly code
Calling conventions,
191t
generated code for function,
193f
Carrier frequency offset (CFO),
603
HLS design tool flow,
138f
Synthesis RTL design flow,
136f
Catch-all algorithms,
120
Cdma2000-1xEVDO systems,
423
Channel matrix coefficients,
77–78
Chip-level arbitration and switching system (CLASS),
278,
363
C-like programming language,
71
CMP.EQ instruction,
68–69
Code division multiple access (CDMA),
75–76,
423
additional optimization configurations,
185
analyzing compiled code,
185
basic C optimization techniques
basic compiler configuration,
183–184
compiler optimization,
183
DSP architecture, background,
186–187
intrinsics to leverage DSP features,
189–191
loop transformations,
200
count information, communicating,
196–197
restrict/pointer aliasing,
196
unaligned accesses, use of,
199
vendor DSP libraries,
200
CodeWarrior Development Studio,
374
CodeWarrior plug-ins,
381
Common channel signalling system 7 (CCSS7),
526
Compiler loop optimization, loop unrolling,
230
Component off the Shelf (COTS),
533–534
Computed tomography (CT),
493
Customer premises equipment (CPE),
523
Cycle accurate simulators (CAS),
166
Cyclic prefix (CP) padding,
602–603
D
Data dependence analysis,
235
DDR controller, to memory connection,
264f
logical bank interleaving,
266f,
267
Delay compensation mechanism,
548–551
Delay phase
Design process of DSPs
algorithm development and validation,
339–340
block diagram of general system design flow,
338f
concept and specification phase,
338–340
development tool flow,
348
factory and field test,
343
graphical user interface (GUI),
345
high level system design and performance engineering,
341–342
Integrated Development Environment (IDE),
345–348,
346f
real-time analysis of system,
347–348
software development,
342
software performance engineering (SPE),
341–342
specification process,
339
standards and guidelines for algorithm,
340
system build, integration, and test,
342–343
system configuration tools,
348
Destructive interferences,
496
Digital data, type of,
Digital loop carrier (DLC),
524
Digital signal processing (DSP) system,
3–4,
4f
changeability,
expandability,
reliability,
repeatability,
size, weight, and power,
analog signal processing (ASP),
analog-to-digital conversion (ADC),
computer,
definition of,
digital,
digital-to-analog conversion (DAC),
measuring power consumption,
246–249
motor control systems,
10–11
output,
processing,
refrigeration compressors,
11
sampling errors,
sampling frequency,
signal source,
application algorithms,
337
architectural features of,
45f
challenges in application development,
337
communication mechanism,
336
development environments,
336
software development using,
335–336
Digital-to-analog conversion (DAC),
10f,
116
Direct memory access (DMA),
276
Discrete DDR3 memory chip’s rows/columns
Discrete Fourier transform (DFT),
119
DSP acceleration decisions
computational complexity,
41
signal processing algorithm parallelism,
41
DSP algorithms
overflow and saturation,
126
DSP applications
profiling and determining hot spots,
57f
DSP code optimization,
56f
DSP core
32-bit multiplication,
189
high-level architectural comparison of,
186
DSP IDE, main components of,
52f
DSP operation systems,
292
connected to network,
309f
memory management
virtual memory and memory protection,
306
synchronization primitives,
304–305
networking
inter-processor communication,
306–309
processes, threads and interrupts,
294–298
blocking vs. non-blocking jobs,
312
multicore considerations,
313
preemptable vs. non-preemptable scheduling,
312
priority inheritance,
328
DSP RTOS component architecture,
53f
DSP Software Code Optimization,
281
DSP software development,
51–52
DSP system
computing the channels,
49f,
50f
top eight to ten performance intensive algorithms,
58f
DSP VoIP framework differentiators,
551–569
DTMF detection and transmission,
551–557
DSP-based embedded system,
43
DTMF detection and transmission,
551–557
Dual data rate (DDR),
see DDR
Dual inline memory module (DIMM),
262
Dual tone multi frequency,
552
Dynamic host configuration protocol,
536–537
E
Echo in telephone networks,
532–533
Eclipse based development environment,
381–382
Electrical attenuation,
524
Embedded digital signal processing,
337
lifecycle using DSP,
30–34
acceleration decisions,
41–46
basics and architecture,
44–46
code tuning and optimization,
53–54
digital signal processors,
35–40
general purpose processors (GPPs),
33
performance, calculating,
48–51
signal processing solution,
40–41
software programmable,
32–33
model of sensors and actuators,
25f
Enhanced Full Rate (EFR),
423
Ethernet switch subsystem,
303f
F
Fast Fourier transform (FFT),
47
Field programmable gate arrays (FPGAs),
31,
77
Filter frequency response, low pass,
121f
Finite impulse response (FIR),
118
re-written with intrinsic,
173f
using SPE intrinsics,
39f
signal flow graph for FIR filter,
45f
Flex-Sphere, block diagram of,
82f
Flex-sphere tree traversal,
80
FPGA resource utilization,
85t,
86t
Freescale MSC8156 series,
110
Freescale StarCore CodeWarrior
Freescale StarCore DSPs,
170f
Freescale StarCore SC3850 DSP architecture,
65–66
Frequency division duplexing (FDD) mode,
92
G
General purpose processor (GPP),
31
Generated assembly code vs. example loop,
197
Get_Upper/Lower intrinsics,
39
Global System for Mobile Communications,
423
Graphic configuration tool,
331f
H
Hall effect IC voltage,
248f
Hamming window frequency response,
123f
Hardware acceleration in DSP systems,
443
Hardware/software continuum, DSP,
97–99
application driven design,
111
application specific integrated circuits (ASICs),
103,
107–108
embedded cores, general purpose,
109–110
algorithm suitability,
105
ASICs, advantages of,
108
computational throughput and power,
105
fixed point vs. floating point,
105–106
software programmable digital signal processing,
108–109
High level synthesis (HLS)
matrix multiplication design,
145–149
for complex DSP applications,
133–134
high-level design tools,
135
low-density parity-check (LDPC) codes
verification artifacts,
134
pipeline of processing arrays (PPA),
138–140
user specified constraints,
134
interface constraint,
134
High speed serial interface (HSSI),
258
Hilbert transformation,
517
Host control application,
541
Host processor baseboard (PDK),
528f
HRPD (high rate packet data) system,
423
Hybrid automatic repeat request (HARQ),
603
I
Infinite impulse response (IIR),
122–123
Inheritance algorithm,
329
Instruction set simulator (ISS),
166
Integrated circuit technology,
337
default perspectives,
386
Interactive voice response (IVR),
526
Internal components/functions,
535
International Telecommunication Union (ITU),
423
Internet engineering task force (IETF),
527
Inter-procedural optimizations,
170
Interrupt priority level (IPL),
253
Interrupt service routine (ISR),
296
IPSec implementations,
310
J
Japanese-TACS (JTACS),
423
Job characteristics,
323t
Joint Test Action Group (JTAG),
25
K
L
Level Control Unit (LCU) modules,
563–567
Log likelihood ratio (LLR),
143,
603
Long term evolution (LTE) systems,
423–425
advance baseband hardware co-processors,
425
barriers and locks for multi-core synchronization,
427f,
442–443
CRC generation and insertion,
425–426
deadlock prevention and data protection,
441–442
DL physical layer processing,
437f
eNodeB physical layer,
425
eNodeB shared data uplink processing chain,
452f
hardware acceleration,
443,
464
layer mapping and pre-coding,
431–433
multi-core digital signal processors,
438–441
physical resource-block mapping module,
433–442
point to point message posting,
8f,
428f,
431
rate matching and hybrid ARQ functionality,
426–427
shared memory space and CACHE coherency,
428–429
sub-frame pipelining,
444f
system components and design,
434–438
triggering of sequential and parallel processes,
443
24-bit CRC (CRC24B) insertion,
425–426
UL symbol level processing,
Loops
Low-density parity-check (LDPC) codes,
141–142
M
Magnetic resonance imaging (MRI),
493
MATLAB remez function,
121
Maximum-likelihood (ML) detector,
78
Maximum ratio combining (MRC) vector,
89–90
system software functionalities,
535–541
Media processing element,
549
Medical devices, DSP for
Memory layout optimization,
231–240
data alignment’s rippling effects,
238–239
loop optimizations, for performance,
238
vectorization and dynamic code-compute ratio,
233–236
Memory management
memory protection OS,
306
virtual memory and memory protection,
306
Memory management unit (MMU),
294
arrays format, structure of,
238f
auto-vectorizing compiler technology,
238
compiler flags/flag mining,
218–219
size/performance tradeoffs, target ISA,
219–221
data structure, unit memory stride,
237f
example data structure,
236f
memory layout optimization,
231–240
data alignment’s rippling effects,
238–239
loop optimizations, for performance,
238
vectorization and dynamic code-compute ratio,
233–236
Memory optimizations,
232
MessageHandler() function,
260
Microcontroller solutions,
34f
Minimum mean squared error (MMSE),
603
Modified real-valued decomposition (M-RVD),
79,
84–85
Modulo addressing mode,
70
Motion JPEG application, using MSC8144 DSP
discrete cosine transform (DCT),
370
inter-core communication,
373
Minimum Coded Units (MCUs),
369–370
Media Gateway for a voice over IP (VoIP) system,
364
memory system components,
363
address generation units (AGU),
512
data arithmetic logic units (DALU),
512
MSC8156 block diagram,
300f
Multi Instruction and Multi Data model,
391
Multicore communication application programming interface (MCAPI),
309
Multicore processing models,
363–367
build and link the application for,
389–403
compiler configuration for application,
393–399
executing and debugging application,
403–411
linker configuration for application,
400–403
multiple-single-cores software model,
364–366
project editing options,
390f
set-up launch configuration,
406,
407f
target configuration and verification,
411,
412f
variable length execution sets (VLES),
411
Multimedia Broadcast Multicast Services (MBMS),
425
Multiple input multiple output (MIMO)
general characteristics of an application,
366
Multiply-accumulate operations per second (MMACS),
363
N
Network coprocessor (NETCP) peripheral,
303
New project, creating
Nonrecurring engineering (NRE) costs,
32
Nordic Mobile Telephone Systems (NMT),
423
Nyquist theorem,
reconstructed waveform,
7f
O
On chip emulator (OCE),
257
Optical channel (OC),
525
Optimization process, basic flow of,
170f
code placement, flexibility,
162
excluding non-related events,
165
runtime library code,
166
simulated measurement,
166
performance measurement, methods
performance counter-based measurement,
164
profiler-based measurement,
164–165
time-based measurement,
164
multicore/multidevice environment, execution,
163–165
test harness inputs, outputs, and correctness checking,
159–161
true system behaviors, modeling,
162–163
Orthogonal frequency division multiplexing (OFDM),
76,
87–88
P
Packet accelerator (PA),
303
Packet-switching technology,
546
Partial Euclidean Distances (PEDs),
78–79
Peak to average power ratio (PAPR),
601
Performance accurate simulators (PACC),
166
Personal digital assistants (PDAs),
11
PICO, pipelined LDPC decoder architecture,
144f
system level design flow,
139f
Pipeline of processing arrays (PPA),
138–140
Pipelined System Generator block diagram,
84f
Plain Old Telephone Service (POTS),
523
Pointer aliasing, illustration of,
196
Porting guidelines, multicore processing models,
367–379
Power architecture code,
220
Power consumption, software optimization,
11–12,
242
algorithmic optimization
compiler optimization levels,
280–281
during application runtime,
259–261
core component utilization,
250f
memory accesses, reducing power consumption,
261–262
compiler cache optimizations,
275–276
data transitions/power consumption,
270
explanation of locality,
271
optimizing memory software data organization,
267
optimizing power by timing,
265–266
optimizing with interleaving,
265–266
set-associativity, explanation of,
272–273
SRAM/cache data flow optimization,
268
SRAM power consumption and parallelization,
269–270
write back vs. write through caches,
274
eliminating recursion
low-power code sequences,
286
Freescale’s MSC815x low power modes,
253–254
Texas Instruments C6000 low power modes,
256–261
using Hall Sensor type IC,
247
voltage regulator module (VRM) power supply controller ICs,
247–249
peripheral/communication utilization,
276–286
interrupt processing,
280
speed grades and bus width,
279
time based processing,
280
STOP/WAIT instructions,
253
Power consumption savings
Power estimation module,
40
Power optimization techniques for DSP,
287t–288t
PPA architecture template,
139f
Private branch exchanges (PBX),
526
Processing elements (PE),
605
Processing node, implementation of,
152f
Processing with respect to shared channel data (PUSCH),
436
Processor solutions, general purpose of,
33f
Programmable DSP architectures,
337
predicated execution,
67–69
programmable DSP space,
64
SIMD operations, use of,
65–67
Freescale StarCore SC3850 DSP architecture,
65–66
memory architectures,
70–71
Public switched telephone network (PSTN),
523,
525
Pulse code modulation (PCM),
525
Pulse repetition frequency (PRF),
504
Pulsed wave approach,
501f
Q
QR decomposition system,
153f
Quality of service (QoS),
59
R
Real-time development tools,
336
Real-time environments,
557
efficient execution/execution environment,
19–20
centralized resource allocation/management,
23
multi-processor systems,
22–23
recovering from failures,
22
resource management,
19–20
event characteristics,
19
execution environment, characteristics,
18
multi-processor system,
22
vs. time-shared systems,
16–17
usefulness of results,
293f
RF demodulation methods,
517f
RMA, see Rate monotonic analysis (RMA)
Rotating, implementation of,
151f
RTL code generation,
137f
Run to completion procedure,
316
S
Schnorr-Euchner (SE) ordering,
80
Serial Rapid I/O (SRIO),
276
Signal processing engine (SPE)
Signal processing solution,
42f
Signal processing system,
Signal transmissions
voltage,
Signaling System 7 (SS7),
526
Single instruction multiple data (SIMD)
architecture processing engine,
36
Single-carrier frequency division multiple access (SC-FDMA),
603
SoC level memory configuration,
270
Software defined radio (SDR),
599–609
functional architecture of base station,
601–605
Software development using DSPs,
335–336
Software interrupts (ISR),
297,
312
Software performance engineering (SPE),
65
initial performance estimates,
573
measurement error, reducing,
581–583
tracking and reporting the metrics,
575–581
Space time codes (STC),
87
Stages of DSP development process,
358–360
StarCore cores
fractional and integer operations,
173
StarCore DSPs
full bus usage with quad-word move,
194
State-of-the-art smartphones,
599–600
STATUS, variable
runtime behavior of program,
68f
Subscriber loop carrier (SLC),
524
Switched circuit network,
527
Symmetric multiprocessing (SMP) model,
361
Synchronous interface (CPRI),
607
Synchronous optical network (SONET),
525
System architecture,
150f
System Generator
System implementation,
61
T
Task control block (TCB),
298
Telephone networks, echo in,
527
3rd generation partnership project (3GPP),
75–76,
423
32-Bit embedded power architecture device,
219
Threading characteristics,
313t
Time division multiplexed (TDM) link,
525
Time slot interchange (TSI) devices,
530
TIPC (Inter process communication protocol),
307
TI’s KeyStone architecture,
303
TMS320C5500 assembler language,
127
TMS320C6000 Optimizing Compiler,
188
Total Access Communication Systems (TACS),
423
initialization process,
377
Kernel Awareness plug-in module,
374
SDOS operating system,
376
U
UCC Ethernet Controller (UEC),
541
Unbounded priority inversion,
327f
The Unified Instrumentation Architecture (UIA),
105–106,
329
Universal Mobile Telecommunications System (UMTS),
32,
423
User interface functions,
13
V
Variable length execution set (VLES),
268–269
Vectoring, implementation of,
151f
Very Long Instruction Word (VLIW),
13–14
Voice activity detector,
543
software architecture of,
533f
Voice-band data (VBD) mode,
545–546
“Voice-to-voice” codec,
551
VoIP applications, DSP role in,
528–532
delay compensation mechanism,
548–551
system software functionalities,
535–541
migration to IP based transport,
526–528
Voltage ID (VID) parameters,
255
Voltage regulator modules (VRMs),
246,
255,
268
W
Wideband CDMA (WCDMA) system,
423
channel quantization,
93t
WiMAX Frequency Division Duplexing (FDD) mode,
87
WiMAX system, beamforming of,
94f
Wireless baseband software on multi-core,
448
adoption of advanced multi-core embedded platforms,
448
migrating from single-core to multi-core SoCs,
461–472
modular software design,
449
Wireless communications applications
code division multiple access (CDMA),
75–76
field programmable gate arrays (FPGAs),
77
flex-sphere detector,
79–81
tree traversal for,
79–81
modified real-valued decomposition (M-RVD),
84–85
modified real-valued decomposition (M-RVD) ordering,
81–82
multiple antenna (MIMO) system,
76f,
77–79
SDR handset detector, FPGA implementation of,
82–84
configurable design,
83–84
simulation results,
86–87
third generation networks (3GPP),
75–76
WiMAX, beamforming for,
87–99
computational requirements and performance,
91–93
WARPLab, experiments,
94–97
Xilinx FPGA implementation,
85–86
Wireless Open Access Research Platform (WARP),
94
Write-back cache scheme,
274
X
Xilinx Blockset/Memory,
151
Xilinx System Generator implementation of Flex-Sphere detector,
88f