Index

Page numbers with “f” denote figures; “t” tables.

A

Acceptable timeliness, 19–20
ACTIVE, variable
runtime behavior of program, 68f
ACTIVATE commands, 265–266
ACTIVATE/PRECHARGE command, 266
Adaptive differential pulse code modulation (ADPCM), 525
Adaptive Multi-Rate (AMR) codec, 423
ADI Blackfin DSP, 170
Advanced Mobile Phone Systems (AMPS), 423
Alamouti scheme, 99
beamforming system, 99f
Algorithm complexity, 9
Aliasing, 7
Amdahl’s law, 232
A-Mode, 501, 501f
Analog signal processing (ASP), 3
vs. digital signals, 5
Analog systems, 3
Analog-to-digital conversion (ADC), 3–6, 8
for signal processing, 5f
Analog-to-digital converter, 7
data plotted over time, 8f
Antenna systems, multiple input multiple output (MIMO), 87
Antennas, 83, 602–603
API, multithreading disable, 328t
Apodization coefficients, 506
Application specific integrated circuits (ASICs), 31, 103, 107–108
Arithmetic processing unit (APU), 38
ARM instruction set, 68
ASCII numbers, 1
Assembly, caller procedure, 223f, 225f
Assembly language, 169–170
advantages/disadvantages, 170
DSP kernels, 169
Asymmetric multiprocessing (AMP), 300, 361
AMP style sharing, 301f
ATCA-9100 from Radisys, 529f
Audio/speech signal, 2
Auto-vectorizing compiler technology, 177–179
MATLAB, LabVIEW and FFTW-like generator suites, 178
MATLAB and native compiled code, 178–179
silicon emulation, 179
Axial direction, 495
Azimuthal direction, 495
Azimuthal resolution, 496, 499

B

Barrier_Wait command, 305
Beamforming, 505–507, 512f
Beamforming MIMO-OFDM system
baseband representation of, 90f
BER plots comparing, 89f
Bilinear Transform technique, 124
Blackman window, 122
Block processing model, 46
of DSP, 47f
Blood velocity, 503
B-Mode, 501, 502f
B-Mode image, 498, 516f
Boot process, 536

C

C, caller procedure, 222f
C language, 336
code, 234
C programming language
custom types, 174
finite impulse response (FIR) filter, 171
floating point, 173
fractional types and saturation, 172–173
function pragmas, 174
intrinsic functions, 172–174
with intrinsics and pragmas, 170–175
pragmas, 174–175
standard C integral types, 171
statement pragmas, 174–175
variable pragmas, 175
Cache line, set associativity by, 273f
Callee routines, 223–224
Caller assembly code
user_calling_convention, 226, 227f, 229f
Calling conventions, 191t
configuration of, 175f
generated code for function, 193f
invoking, 175f
Carrier frequency offset (CFO), 603
Catapult C, 145–149
HLS design process, 136
HLS design tool flow, 138f
Synthesis RTL design flow, 136f
Catch-all algorithms, 120
Cdma2000-1xEVDO systems, 423
Central Office, 524f
Channel matrix coefficients, 77–78
ChConfig, 543
Chip-level arbitration and switching system (CLASS), 278, 363
C-like programming language, 71
CMP.EQ instruction, 68–69
Code division multiple access (CDMA), 75–76, 423
Code optimization, 182, 579–580
additional optimization configurations, 185
analyzing compiled code, 185
basic C optimization techniques
data types, 188
basic compiler configuration, 183–184
endianness, 183
memory model, 184
target architecture, 183
cache accesses, 199
compiler optimization, 183
development tools, using, 183–185
DSP architecture, background, 186–187
resources, 186–187
enabling optimizations, 184–185
inline small functions, 199–200
intrinsics to leverage DSP features, 189–191
calling conventions, 190–191
functions, 190–191
loop transformations, 200
loop unrolling, 200–201
loops, 196–197
count information, communicating, 196–197
hardware, 197–198
memory contention, 199
multisampling, 201–202
pointers/memory access, 194–196
ensuring alignment, 194
restrict/pointer aliasing, 196
unaligned accesses, use of, 199
using profiler, 185
vendor DSP libraries, 200
CodeWarrior Development Studio, 374
CodeWarrior IDE, 381, 382f
CodeWarrior plug-ins, 381
Codewords (CW), 602–603
Color Doppler, 502, 502f
Common channel signalling system 7 (CCSS7), 526
Communication buses, 58f
Compiler loop optimization, loop unrolling, 230
Component off the Shelf (COTS), 533–534
Computer tomography (CT), 493
Control traffic, 540–541
CPU, 580t
CPU load, 51
CPU speed, 48
Customer premises equipment (CPE), 523
Cycle accurate simulators (CAS), 166
Cyclic prefix (CP) padding, 602–603

D

Data ALU (DALU), 187
Data buses, 129f
Data dependence analysis, 235
DDR controller, to memory connection, 264f
DDR memory, 64-bit, 266f
logical bank interleaving, 266f, 267
Delay compensation mechanism, 548–551
Delay phase
receive, 506f
transmit, 506f
Design process of DSPs
algorithm development and validation, 339–340
block diagram of general system design flow, 338f
challenges for, 343–344
concept and specification phase, 338–340
data visualization, 347
debugging, 347–348
development tool flow, 348
DMA function, 350
factory and field test, 343
generic data flow example, 348–354
graphical user interface (GUI), 345
high level system design and performance engineering, 341–342
Integrated Development Environment (IDE), 345–348, 346f
modeling tools, 344
real-time, 345
real-time analysis of system, 347–348
software development, 342
software performance engineering (SPE), 341–342
specification process, 339
standards and guidelines for algorithm, 340
system build, integration, and test, 342–343
system configuration tools, 348
system-level, 344
toolboxes, 344–345
Destructive interferences, 496
DHCP boot, 537f
Digital data, type of, 1
Digital loop carrier (DLC), 524
Digital signal processing (DSP) system, 3–4, 4f
advantages of, 2–3
changeability, 3
expandability, 3
reliability, 3
repeatability, 3
size, weight, and power, 3
algorithms, 13
analog signal processing (ASP), 3
analog-to-digital conversion (ADC), 3
applications for, 10–11
high performance, 13–14
low cost, 10–11
power efficient, 11–14
computer, 4
definition of, 1
digital, 1
digital-to-analog conversion (DAC), 4, 9
GSM voice codec, 280
measuring power consumption, 246–249
motor control systems, 10–11
Nyquist criteria, 6–9
output, 4
processing, 2
processor, 3–4
refrigeration compressors, 11
sampling errors, 5
sampling frequency, 5
signal, 1–2
signal source, 3
Digital signal processors (DSP), 337, 505, 571–572
algorithms, 336
application algorithms, 337
architectural features of, 45f
challenges in application development, 337
code build tools, 354–358
communication mechanism, 336
design process, 337–343
development environments, 336
early, 336
evaluation module, 358
generic data flow example, 348–354
host development tools, 345–348
phases of development, 336, 336f
software development using, 335–336
starter kit, 358
Digital-to-analog conversion (DAC), 9, 10f, 116
Direct memory access (DMA), 276
three-dimensional, 277f
Discrete DDR3 memory chip’s rows/columns
basic drawing of, 263f
Discrete Fourier transform (DFT), 119
Doppler angle, 504
Doppler effects, 501–504
A-Mode, 501, 501f
B-Mode, 501, 502f
color Doppler, 502, 502f
M-Mode, 501, 501f
power Doppler, 502, 502f
spectral Doppler, 502–503, 503f
DSP acceleration decisions
computational complexity, 41
data locality, 41–43
signal processing algorithm parallelism, 41
DSP algorithms
aliasing, 116
applications of, 113–114
basic system, 116–119
block filtering, 128
circular buffers, 130
convolution, 119
correlation, 120
filtering, 118–119
FIR filter, 118–119
FIR filter, design, 120–121
Parks-McClellan algorithm, 120–121
frequency analysis, 119–124
IIR filter, 119
implementation, 124–126
FIR filter, 128
number format, 125
overflow and saturation, 126
MAC instruction, 128
on-chip RAM, 127–128
program/data buses, 128–129
system issues, 130
systems and signals, 114–116
windowing, 120–121
zero overhead looping, 129–130
DSP applications
profiling and determining hot spots, 57f
DSP architectures, 46, 124–125
DSP code optimization, 56f
DSP core
32-bit multiplication, 189
example intrinsic, 188
high-level architectural comparison of, 186
DSP Daughter Card, 529f
DSP design tool, 140
DSP development process, 55f, 59–61
DSP IDE, main components of, 52f
DSP kernel, 161
DSP operating systems, 292
connected to host, 310f
connected to network, 309f
memory management
barrier, 305f
memory allocation, 305–306
virtual memory and memory protection, 306
multicore considerations, 298–305
peripherals sharing, 302–305
synchronization primitives, 304–305
networking
inter-processor communication, 306–309
internetworking, 309–310
OS fundamentals, 292–293
processes, threads and interrupts, 294–298
real-time constraints, 293–298
scheduling, 310–329
blocking vs. non-blocking jobs, 312
cooperative scheduling, 312–313
deadline monotonic, 323
disabling, 328
dynamic priority, 323–325
multicore considerations, 313
offline scheduling, 314–320
offline vs. online, 325
online scheduling, 321
preemptable vs. non-preemptable scheduling, 312
priority ceiling, 329
priority inheritance, 328
priority inversion, 325–329
rate monotonic, 321–323
reference model, 311–312
static priority, 321–323
types of, 313
software interrupt, 297–298
tools support for, 329–331
DSP processor, 36f, 48–49
DSP RTOS component architecture, 53f
DSP SoCs, 57–59
advanced, 58f
visibility, 60f
DSP Software Code Optimization, 281
DSP software development, 51–52
DSP starter kit, 59
DSP system
basic, 117f
basic I/O for, 131f
computing the channels, 49f, 50f
evaluation board, 59f
top eight to ten performance intensive algorithms, 58f
DSP VoIP framework differentiators, 551–569
DTMF detection and transmission, 551–557
Goertzel filters, 557–569
notch filters, 561–562
peak filters, 560–561
power estimation module, 562–563
sections, 552–557
DSP-based embedded system, 43
DSPFWAPI, 533–535
DTMF detection and transmission, 551–557
DTMF frequency allocation, 550f, 554t
Double data rate (DDR), see DDR
Dual inline memory module (DIMM), 262
Dual tone multi frequency, 552
Dynamic host configuration protocol, 536–537

E

Echo cancelling, 547–548
Echo in telephone networks, 532–533
Echo processing, 515–520
Echo source, 532f
Eclipse based development environment, 381–382
EDF algorithm, 323, 324f
Electrical attenuation, 524
Embedded C, 176
Embedded digital signal processing, 337
Embedded systems, 6, 23–26, 29–30
C++ for, 176–177
characteristics of, 26
components, 24f
DSP, 31
DSP solution, 32f
lifecycle using DSP, 30–34
acceleration decisions, 41–46
basics and architecture, 44–46
code tuning and optimization, 53–54
development flow, 54–61
digital signal processors, 35–40
FPGA solutions, 34–40, 35f
general purpose processors (GPPs), 33
hardware components, 31
hardware gates, 31–32
input/output options, 48
microcontrollers, 33–34
models of, 46–53
needs of system, 30–31
performance, calculating, 48–51
product design, 30–31
signal processing solution, 40–41
SoC, 60f
software, 51–53
software programmable, 32–33
model of sensors and actuators, 25f
reactive systems, 25–26
real-time systems, 20
sequence enumeration, 589–595
system requirements, 587–597
Enea’s LINX, 307
Enhanced Full Rate (EFR), 423
Envelope detection, 515
Ethernet frames, 302–303
Ethernet switch subsystem, 303f

F

F# (f-number), 496
Fast Fourier transform (FFT), 47
Field programmable gate arrays (FPGAs), 31, 77
Filter frequency response, low pass, 121f
Filtering, 113
Finite impulse response (FIR), 118
filter, 171
C code, 172f
with intrinsic, 173f
re-written with intrinsic, 173f
using SPE intrinsics, 39f
FIR diagram, 45–46
signal flow graph for FIR filter, 45f
FIR filter, basic, 38f
Flex-Sphere, block diagram of, 82f
Flex-sphere tree traversal, 80
FORTRAN routines, 178
4G technologies, 76
FPGA-based system, 104
FPGA resource utilization, 85t, 86t
FPGA solutions, 35f
Freescale DSP cores, 187
Freescale MSC8156 series, 110
Freescale StarCore CodeWarrior
compiler, 171t, 172t, 174t
IDE, 249
Freescale StarCore DSPs, 170f
Freescale StarCore SC3850 DSP architecture, 65–66
Freescale’s MSC8157, 536, 605
Freescale’s SmartDSP OS, 300–301
Frequency division duplexing (FDD) mode, 92
Frequency shift, 504
Fresnel region, 495

G

Gantt chart, 147f
for loop unrolling, 148f
GateMutexPri module, 328
Gateway, 527
Gaussian processes, 504
General purpose processor (GPP), 31
Generated assembly code vs. example loop, 197
Get_Upper/Lower intrinsics, 39
Global System for Mobile Communications, 423
GNU GCC compiler, 218
Goertzel filters, 557–569
Graphic configuration tool, 331f
GSM voice codec, 280

H

Hall effect IC voltage, 248f
Hamming window frequency response, 123f
Hardware acceleration in DSP systems, 443
Hardware/software continuum, DSP, 97–99
application driven design, 111
application specific integrated circuits (ASICs), 103, 107–108
architectures, 110–111
embedded cores, general purpose, 109–110
FPGA, in embedded design, 104–107
algorithm suitability, 105
ASICs, advantages of, 108
computational throughput and power, 105
fixed point vs. floating point, 105–106
implementation challenges, 106–107
software programmable digital signal processing, 108–109
HDTV, 29–30
High level synthesis (HLS)
abstraction, 133
benefits of derive, 134–135
Catapult C, 135–141
matrix multiplication design, 145–149
for complex DSP applications, 133–134
high-level design tools, 135
language, 133–134
low-density parity-check (LDPC) codes
using PICO, 141–144, 144f
objective of, 134
analysis feedback, 134
RTL implementation, 134
verification artifacts, 134
PICO C-Synthesis, 138–140
pipeline of processing arrays (PPA), 138–140
RTL module, 137
System Generator, 140–141
QR decomposition design, 149–154
user specified constraints, 134
design hierarchy, 134
interface constraint, 134
memory architecture, 134
performance, 134
target hardware, 134
High level systems, 504–515
High speed serial interface (HSSI), 258
Hilbert transformation, 517
Host control application, 541
Host processor baseboard (PDK), 528f
HRPD (high rate packet data) system, 423
HSPA NodeB, 603
Hybrid automatic repeat request (HARQ), 603

I

IEEE rounding modes, 72
Imaging modes, 501–503
Impulse response, 115f
IMT-2000 initiative, 423
Infinite impulse response (IIR), 122–123
filters, 118
Inheritance algorithm, 329
Instruction set simulator (ISS), 166
Integrated circuit technology, 337
Integrated development environments (IDE), 337, 345–348, 346f, 381–382
default perspectives, 386
project panel, 386
Interactive voice response (IVR), 526
Internal components/functions, 535
International Telecommunication Union (ITU), 423
Internet engineering task force (IETF), 527
Inter-procedural optimizations, 170
Interrupt priority level (IPL), 253
Interrupt service routine (ISR), 296
IP based transport, 526–528
IP protection, 219
IP/Ethernet, 527
IPSec implementations, 310
ISDN, 523
ITU-T V.8, 544

J

Japanese-TACS (JTACS), 423
Job characteristics, 323t
Job parameters, 315t
Joint Test Action Group (JTAG), 25
JTAG connection, 330

K

K MRC blocks, 91

L

Legacy equipment, 545–551
Level control unit, 564
Level Control Unit (LCU) modules, 563–567
Linear interpolator, 564
Linux, 307
Log likelihood ratio (LLR), 143, 603
Log viewer, 330f
Long term evolution (LTE) systems, 423–425
advance baseband hardware co-processors, 425
architecture, 424f, 425–446
barriers and locks for multi-core synchronization, 427f, 442–443
bit scrambling, 428–433
channel coding, 426–427
code block segmentation, 425–426
CRC generation and insertion, 425–426
creating set of jobs, 10
data modulation, 428–431
deadlock prevention and data protection, 441–442
DL physical layer processing, 437f
downlink channel, 425–443, 426f
dynamic scheduling, 445–446
eNodeB physical layer, 425
eNodeB shared data uplink processing chain, 452f
hardware acceleration, 443, 464
inter-core communication, 443–446
layer mapping and pre-coding, 431–433
load balancing, 442–443
multi-core digital signal processors, 438–441
OFDMA symbol generation, 433–434
parallelism and pipelining, 435f, 442–443
physical resource-block mapping module, 433–442
point to point message posting, 8f, 428f, 431
rate matching and hybrid ARQ functionality, 426–427
shared memory space and CACHE coherency, 428–429
static scheduling, 434f
sub-frame pipelining, 444f
system components and design, 434–438
triggering of sequential and parallel processes, 443
24 bit CRC (CRC24B) insertion, 425–426
UL chain processing, 440
UL symbol level processing, 6
Loops
dependence analysis, 234, 235f
unrolling, 230
vectorization of, 233f
Low-density parity-check (LDPC) codes, 141–142
LTE eNodeB, 602–603

M

MAC address, 302–303
MAC instructions, 64
Magnetic resonance imaging (MRI), 493
MAPLE accelerator, 258
MATLAB functions, 124
MATLAB remez function, 121
Maximum-likelihood (ML) detector, 78
Maximum ratio combining (MRC) vector, 89–90
Media Channel, 534
Media gateway, 532–541
controller, 528
system software functionalities, 535–541
TDM to IP processing path, 541–545
Media processing element, 549
Medical devices, DSP for
beamforming, 505–507
Doppler effects, 503–504
echo processing, 515–520
high level systems, 504–515
imaging modes, 501–503
medical imaging, 493–494
medical ultrasound, 494
ultrasound, 494–499
Medical imaging, 493–494
Medical ultrasound, 494
images, 505
Memory layout optimization, 231–240
arrays of data structures, 236–238
data alignment’s rippling effects, 238–239
data types selection, 239–240
loop optimizations, for performance, 238
optimization efforts, 232–233
overview of, 232
pointer aliasing in C, 235–236
vectorization and dynamic code-compute ratio, 233–236
Memory management
barrier, 305f
memory allocation, 305–306
memory protection OS, 306
virtual memory and memory protection, 306
Memory management unit (MMU), 294
Memory optimization, 217–218
arrays format, structure of, 238f
auto-vectorizing compiler technology, 238
code size, 218–231
ABI, tuning, 221–226
compiler flags/flag mining, 218–219
compiling code, 226–231
size/performance tradeoffs, target ISA, 219–221
data structure, unit memory stride, 237f
example data structure, 236f
flag mining, 218
kernels, performance, 239–240
memory layout optimization, 231–240
arrays of data structures, 236–238
data alignment’s rippling effects, 238–239
data types selection, 239–240
loop optimizations, for performance, 238
optimization efforts, 232–233
overview of, 232
pointer aliasing in C, 235–236
vectorization and dynamic code-compute ratio, 233–236
restrict keyword, 236f
see also Memory layout optimization
Memory optimizations, 232
MessageHandler() function, 260
MEX file format, 179
Microcontroller, 36
Microcontroller solutions, 34f
Microprocessors (uP), 33
Min Finder, 83
Minimum mean squared error (MMSE), 603
MJPEG code, 257
M-Mode, 501, 501f
Mobile terminal, 599
Modified real-valued decomposition (M-RVD), 79, 84–85
ordering, 81–82
Modulo addressing mode, 70
Modulo scheduling, 67
Moore’s law, 24–25
Motion JPEG application, using MSC8144 DSP
AC coefficients, 371
design considerations, 372–373
discrete cosine transform (DCT), 370
Huffman coding, 372
inter-core communication, 373
JPEG encoding process, 369–372, 370f
Minimum Coded Units (MCUs), 369–370
output video stream, 373
quantization step, 371
run-length coding (RLC), 371–372
scheduling, 372–373
zig-zag reordering, 371
MPC5554, 36, 37f
MSC8144, 541
block diagram, 362f
Media Gateway for a voice over IP (VoIP) system, 364
memory system components, 363
MSC8156, 272, 509–511, 520
address generation units (AGU), 512
data arithmetic logic units (DALU), 512
MSC8156 block diagram, 300f
MSC8156ADS board, 260
MSC8156’s Maple, 299
MSC8157 device, 37
MSC815x series DSPs, 278–279
Multi Instruction and Multi Data model, 391
Multicore communication application programming interface (MCAPI), 309
Multicore processing models, 363–367
application memory map, 391–393
breakpoints, 408–409, 409f
build and link the application for, 389–403
Code Coverage view, 419–421, 420f, 421f
CodeWarrior connection, 404, 416–417
compiler configuration for application, 393–399
considerations, 364t
creating new connections, 404–405
Critical code menu, 418–419, 418f, 419f
debugger actions, 406–411
DPU workflow, 414, 414f
DSP (SDOS) operating system, 389–391, 391f
executing and debugging application, 403–411
hardware breakpoints, 409–410, 410f
linker configuration for application, 400–403
MMU configuration tool, 411, 413f
motion JPEG application, 369–373
multiple-single-cores software model, 364–366
Performance view, 421–422, 422f
porting guidelines, 367–379
project editing options, 390f
Register view, 408
set-up launch configuration, 406, 407f
software analysis setup, 414–417, 415f
target configuration and verification, 411, 412f
Trace submenu, 417–418
tracing and profiling, 414–422, 417f
true-multiple-cores model, 366–367, 373–379
variable length instructions sets (VLES), 411
VTB location, 415–416, 416f
Multimedia Broadcast Multicast Services (MBMS), 425
Multiple input multiple output (MIMO)
antenna systems, 87
model, 78
techniques, 76
Multiple-single-cores software model, 364–366, 365f
advantages, 364–365, 365t
disadvantages, 365, 366t
general characteristics of an application, 366
Multiply-accumulate (MAC), 34, 128, 281–282
instruction, 269
Multiply-accumulate operations per second (MMACS), 363

N

Network coprocessor (NETCP) peripheral, 303
Network protocols, 539
New project, creating
demo, 383–386
Import dialogue, 386, 387f
project settings, 386
wizard, 383, 384f, 385f
workspace, 382, 383f
Nonrecurring engineering (NRE) costs, 32
NOP test, 249
Nordic Mobile Telephone Systems (NMT), 423
Notch filters, 561–562
Nyquist frequency, 7, 118
Nyquist limit, 504
Nyquist theorem, 6
reconstructed waveform, 7f
signal sample, 7f

O

OFDM system, 97, 109–110
Off-chip memory, 337
On chip emulator (OCE), 257
On-chip memory, 337
Optical channel (OC), 525
Optimization process, basic flow of, 170f
Optimizing DSP software, 157–158
build tools, protecting, 161–162
code placement, flexibility, 162
DSP kernel, isolating, 161–162
measurement, measuring, 165–168
excluding non-related events, 165
hardware measurement, 166–167
interrupts, 165
profiling results, 167–168
results, interpret, 168
runtime library code, 166
simulated measurement, 166
performance measurement, methods
hardware timers, 164
performance counter-based measurement, 164
profiler-based measurement, 164–165
time-based measurement, 164
system effects, 163
multicore/multidevice environment, execution, 163–165
RTOS overhead, 163
test harness inputs, outputs, and correctness checking, 159–161
true system behaviors, modeling, 162–163
cache effects, 162
memory latency, 163
writing, test harness, 158–161
Orthogonal frequency division multiplexing (OFDM), 76, 87–88

P

Packet accelerator (PA), 303
Packet-switching technology, 546
Parks-McClellan algorithm, 120–121
Partial Euclidean Distances (PEDs), 78–79
Partition, 601–602
PCM encoding, 547
Peak filters, 560–561
Peak to average power ratio (PAPR), 601
Performance accurate simulators (PACC), 166
Personal digital assistants (PDAs), 11
Phase distortions, 546–548
PHYSICAL banks, 263
Physical layer (PHY), 76
PICO, pipelined LDPC decoder architecture, 144f
PICO C-Synthesis, 138
system level design flow, 139f
Pipeline of processing arrays (PPA), 138–140
Pipelined System Generator block diagram, 84f
Plain Old Telephone Service (POTS), 523
Pointer aliasing, illustration of, 196
Porting guidelines, multicore processing models, 367–379
design considerations, 367–369
POSIX-style signal, 319
Power architecture code, 220
Power architecture cores, 219–220
Power consumption, software optimization, 11–12, 242
algorithmic optimization
compiler optimization levels, 280–281
eliminating recursion, 284–286
instruction packing, 281
loop unrolling, 281–282
software pipelining, 282–284
application’s, profiling, 249–251
average power, 245
cellular phone, 243, 252
clock and voltage control, 255–261
during application runtime, 259–261
at application start up, 258–259
in low power modes, 256–261
clock rate, 244
core component utilization, 250f
current flow, 244
data flow, 261–276
DDR overview, 262–264
memory accesses, reducing power consumption, 261–262
DDR data flow, 264–276
array merging, 275
cache coherency functions, 274–275
cache utilization, 270
compiler cache optimizations, 275–276
data transitions/power consumption, 270
DDR burst accesses, 267–268
DDR configuration, 267
explanation of locality, 271
interchanging, 275
memory layout for cache, 273–274
optimizing memory software data organization, 267
optimizing power by timing, 265–266
optimizing with interleaving, 265–266
set-associativity, explanation of, 272–273
SoC memory layout, 270
SRAM/cache data flow optimization, 268
SRAM power consumption and parallelization, 269–270
write back vs. write through caches, 274
eliminating recursion
low-power code sequences, 286
hardware support, 251–255
clock gating, 252
Freescale’s MSC815x low power modes, 253–254
low power modes, 251–252
power gating, 252
Texas Instruments C6000 low power modes, 256–261
leakage consumption, 248
measurement, 246–249
using ammeter, 246–247, 247f
using Hall Sensor type IC, 247
voltage regulator module (VRM) power supply controller ICs, 247–249
minimizing, 251–255
peripheral/communication utilization, 276–286
coprocessors, 278
to core communication, 279–280
DMA of data vs. CPU, 277–280
interrupt processing, 280
polling, 279–280
speed grades and bus width, 279
system bus configuration, 278–279
time based processing, 280
static vs. dynamic, 244–246
STOP/WAIT instructions, 253
understanding, 243–246
Power consumption savings
in PD modes, 261f
Power Doppler, 502, 502f
Power estimation module, 40
Power optimization techniques for DSP, 287t–288t
PPA architecture template, 139f
Precedence graph, 315f
PRECHARGE, 264
Priority inversion, 326f
Private branch exchanges (PBX), 526
Procedure inlining, 230, 231f
Processing elements (PE), 605
Processing node, implementation of, 152f
Processing with respect to shared channel data (PUSCH), 436
Processor, 605
Processor clock cycles, 220–221
Processor solutions, general purpose of, 33f
Programmable DSP architectures, 337
C data operations, 71–73
features of, 66f
DSP core/ISA, 63–69
DSP kernels, 65
predicated execution, 67–69
programmable DSP space, 64
SIMD operations, use of, 65–67
Freescale StarCore SC3850 DSP architecture, 65–66
memory architectures, 70–71
access sizes, 70–71
alignment issues, 71
Public switched telephone network (PSTN), 523, 525
architecture, 524f
Pulse code modulation (PCM), 525
Pulse repetition frequency (PRF), 504
Pulsed wave approach, 501f
PUSH/POP style, 226
PWM switching, 11

Q

QR decomposition system, 153f
Quality of service (QoS), 59
QUICC Engine, 364, 372, 377–378

R

RAM ports, 110–111
Rate monotonic method, 321–322, 322t
Real-time development tools, 336
Real-time environments, 557
Real-time systems, 1–2, 5–6, 19–22
definition of, 15
DSP systems, 17–18
efficient execution/execution environment, 19–20
centralized resource allocation/management, 23
challenges, 20–22
initialization, 22–23
load distribution, 23
multi-processor systems, 22–23
processor interfaces, 23
recovering from failures, 22
resource management, 19–20
response time, 21–22
event characteristics, 19
execution environment, characteristics, 18
hard, 17–18
inputs and outputs, 16f
multi-processor system, 22
soft and hard, 16
vs. time-shared systems, 16–17
usefulness of results, 293f
Real-world signals, 1–2
Recursion, cost, 285
Resource elements (RE), 602–603
RF demodulation methods, 517f
RMA, see Rate monotonic analysis (RMA)
ROM boot code, 536
Rotating, implementation of, 151f
RPTB instruction, 129
RTL code generation, 137f
RTL design flow, 136f
RTL implementation, 133–134
RTOS overhead, 163
RTOSes, 314
RTTI functionality, 177
Run to completion procedure, 316
RX Beamforming, 504–505

S

SC3850 core, 170
SC3400 DSP, 363
SC3850 prefetch, 268–269
Scan conversion, 520
Scan lines, 499, 499f
Schnorr-Euchner (SE) ordering, 80
Serial Rapid I/O (SRIO), 276
Signal, 1–2
circular buffers of, 66f
Signal processing engine (SPE)
architecture of, 37f
DSP capabilities, 39f
Signal processing solution, 42f
Signal processing system, 4
Signal transmissions
diagram of, 98f
voltage, 8
Signaling System 7 (SS7), 526
Single instruction multiple data (SIMD)
architecture processing engine, 36
capability, 43
extensions, 64
functionality, 38
vector, 70–71
hardware, 234
Single-carrier frequency division multiple access (SC-FDMA), 603
SmartDSP Operating System (SDOS), 296, 314, 317, 328–329, 374, 383
motion JPEG demo, 257f
SoC level memory configuration, 270
Software architecture, 607–609
control plane, 608f
Software defined radio (SDR), 599–609
functional architecture of base station, 601–605
joint architecture, 604–605
LTE eNodeB, 602–603
partition, 601–602
processor, 605
UMTS and HSPA NodeB, 603
software architecture, 607–609
Software development team, 575–576
Software development using DSPs, 335–336
Software interrupts (ISR), 297, 312
Software performance engineering (SPE), 65
assessment, 573
initial performance estimates, 573
measurement error, reducing, 581–583
project description, 571–583
tracking and reporting the metrics, 575–581
Software pipelining, 229–230, 282
Source code, 69f
Space time codes (STC), 87
Spectral Doppler, 502–503, 503f
Spectrum analysis, 113
Speech coding algorithms, 529–530
SRAM memory, 261–262, 268
SRIO port, 279
Stages of DSP development process, 358–360
StarCore, 52, 298
StarCore cores
fractional and integer operations, 173
StarCore DSPs
full bus usage with quad-word move, 194
StarCore processors, 191
State-of-the-art smartphones, 599–600
STATUS, variable
runtime behavior of program, 68f
Subscriber loop carrier (SLC), 524
SWI, stack, 299f
Switched circuit network, 527
Symmetric multiprocessing (SMP) model, 361
Synchronous interface (CPRI), 607
Synchronous optical network (SONET), 525
SYS/BIOS, 298, 303, 324, 329
System architecture, 150f
System Generator
MATLAB M-code, 140
System implementation, 61

T

T1 frame format, 526f
Task control block (TCB), 298
Taylor approximation, 495, 497–498
T-carrier, 525
TCP/IP stack, 309
TDM interface, 316
TDM-IP channel, 534f, 543–544
TDM-IP media gateway, 528–531
TDM to IP path, 540f
Teager-Kaiser algorithm, 563–564
Telephone networks, echo in, 527
3rd generation partnership project (3GPP), 75–76, 423
32-Bit embedded power architecture device, 219
Threading characteristics, 313t
3G Media Gateway, 527
3GPP WCDMA, 603
Time division multiplexed (TDM) link, 525
Time slot interchange (TSI) devices, 530
TIPC (Transparent Inter-Process Communication protocol), 307
TI’s KeyStone architecture, 303
TK loops, 567–569
TMS320C5500 assembler language, 127
TMS320C6000 Optimizing Compiler, 188
Total Access Communication Systems (TACS), 423
Transducers, 500f
True-multiple-cores model, 366–367, 367f
advantages, 367, 368t
CodeWarrior IDE, 374
data input/output process, 378–379
disadvantages, 367, 368t
implementation of, 373–379
initialization process, 377
inter-core communication, 377–378, 377f
Kernel Awareness plug-in module, 374
master-slave approach, 373, 374f
scheduler functionality, 375–376, 375f
SDOS operating system, 376
serialization, 378–379
WAIT state, 376

U

UCC Ethernet Controller (UEC), 541
Ultrasound, 494–499
design use, 507–515
Ultrasound imaging, 493
Doppler effects in, 503
limitation of, 494
Ultrasound system, 505f
Ultrasound transducers, 499–500
Unbounded priority inversion, 327f
The Unified Instrumentation Architecture (UIA), 105–106, 329
Universal Mobile Telecommunications System (UMTS), 32, 423
User interface functions, 13
User’s source code, 220

V

Variable length execution set (VLES), 268–269
Vectoring, implementation of, 151f
Verilog HDL (VHDL), 106, 135
Very Long Instruction Word (VLIW), 13–14
ALUs, 70–71
architecture, 64
Virtual circuits, 525
Voice activity detector, 543
Voice codec, 547
Voice processing, 529–530
software architecture of, 533f
Voice-band data (VBD) mode, 545–546
“Voice-to-voice” codec, 551
VoIP applications, DSP role in, 528–532
framework, 531–532
framework differentiators, 545–551
delay compensation mechanism, 548–551
legacy equipment, 545–551
phase distortions, 546–548
media gateway, 532–541, 545–546
system software functionalities, 535–541
TDM to IP processing path, 541–545
TDM-IP media gateway, 528–531
VoIP domain, 523–528
migration to IP based transport, 526–528
wired TDM telecom network, 523–526
Voltage ID (VID) parameters, 255
Voltage regulator modules (VRMs), 246, 255, 268

W

WARP nodes, 97
WARPLab setup, 96, 96f, 97f
experiment setup, 97f
Watchdog timer, 538f, 539
Waveform generators (WV), 601–602
WCDMA transmitter, 608f
Wideband CDMA (WCDMA) system, 423
WiMAX codebooks, 89–90, 92–93
channel quantization, 93t
WiMAX Frequency Division Duplexing (FDD) mode, 87
WiMAX standard, 91–92
WiMAX system, beamforming of, 94f
Wired TDM telecom network, 523–526
Wireless baseband software on multi-core, 448
adoption of advanced multi-core embedded platforms, 448
advantages, 459
Agile practices, 449
blocks and modules, 452–455
considerations, 459–461
DMA copy vs. MEMCPY, 490
migrating from single-core to multi-core SoCs, 461–472
modular software design, 449
P4080, example, 457–458, 472–484
parameters, 488
process principles, 449–451
quality principles, 449–451
refactoring, 450
reuse of software, 450–451
single core application, 455–457
software tools, 451
tips and tricks, 484–490
Wireless communications applications
code division multiple access (CDMA), 75–76
field programmable gate arrays (FPGAs), 77
flex-sphere detector, 79–81
tree traversal for, 79–81
4G technologies, 76
modified real-valued decomposition (M-RVD), 84–85
timing analysis, 85
modified real-valued decomposition (M-RVD) ordering, 81–82
multiple antenna (MIMO) system, 76f, 77–79
SDR handset detector, FPGA implementation of, 82–84
configurable design, 83–84
modulation order, 83–84
number of antennas, 83
PED computations, 82
simulation results, 86–87
third generation networks (3GPP), 75–76
WiMAX, beamforming for, 87–99
computational requirements and performance, 91–93
experiment setup, 97–99
WARPLab, experiments, 94–97
WARPLab framework, 94–97
wideband systems, 87–91
Xilinx FPGA implementation, 85–86
Wireless Open Access Research Platform (WARP), 94
with radio board, 95f
Write-back cache scheme, 274

X

Xilinx blockset, 140
Xilinx Blockset/Memory, 151
Xilinx System Generator implementation of Flex-Sphere detector, 88f