Abort mode, 23, 26t
Abort signal, 462
Absolute function, 254
Access permission
memory management units, 510–512
memory protection units, 470–474
page-table-based, 512
ADC instruction, 54, 93, 222, 573–574
ADD instruction, 54, 93, 166, 574–575
Address, 49
Address relocation, 493
Addressing modes
multiple-register, 65t
single-register, 63t–64t, 96
stack operations, 70t
ADR instruction, 78, 575–576
Advanced Microcontroller Bus Architecture bus. See AMBA bus
Aliasing, pointer, 127–130
ALIGN, 624
AMBA bus
development of, 8
protocol for, 8–9
AND instruction, 55, 94, 576
Application programmer interface, 131–132
Application programming interface, 369
Applications, 15
AREA, 624–625
AREA directive, 159
Argument registers, 121t, 172
Arithmetic instructions
barrel shift used with, 55
definition of, 53–54
description of, 80–81
examples of, 54–55
Arithmetic logic unit
barrel shifter and, 51f
data processing instructions processed in, 51
description of, 20
ARM assembler
directives, 624–631
expressions, 623–624
labels, 622
overview of, 620–621
variables, 621–622
ARM assembly code
bit-fields. See Bit-fields
conditional execution, 180–183
digital signal processing vs., 269
efficient switches, 197–200
instruction scheduling. See Instruction scheduling
register allocation. See Register allocation
ARM instruction
conditional execution of, 6
encodings, 637–638
ARM processor(s)
applications of, 15
architectures, 647–649
coprocessors attached to, 36–37
cores, 647–649
description of, 3
design philosophy of, 5–6, 15–16
development of, 3
embedded systems. See Embedded systems
exceptions handling, 318–319
family of, 38–44
functions of, 7
future of, 549
instruction set architecture. See Instruction set architecture
load-store architecture of, 19–20, 106t
modes
changing of, 25
characteristics of, 26t
description of, 23, 318–319
naming convention, 647
nomenclature of, 37–38
operating systems for, 14–15
specialized, 43
variants of, 41t
ARM7 core
attributes of, 40t
family of, 40–41
pipeline for, 31, 32f
read-allocate policy, 422
ARM7EJ-S, 40, 41t
ARM7TDMI
description of, 40, 41t
digital signal processing on, 270–272
instruction cycle timings, 653–654
ARM9 core
attributes of, 40t
family of, 42
pipeline length in, 31
read-allocate policy, 422
ARM9E
digital signal processing on, 275–277
instruction cycle timings, 656–657
Newton-Raphson division routines on, 217
ARM9TDMI
description of, 164–165
digital signal processing on, 272–274
instruction cycle timings, 654–655
unsigned 64-bit by 64-bit multiply with 128-bit result, 210
ARM10 core
attributes of, 40t
family of, 42
pipeline length in, 31–32
read-allocate policy, 422
ARM10E
digital signal processing on, 277–278
instruction cycle timings, 658–659
ARM11 core
attributes of, 40t
family of, 43
instruction cycle timings, 661–665
ARM720T, 41t
ARM740T, 463, 467
ARM920T, 41t
ARM922T, 41t
ARM926EJ-S, 41t, 42
ARM940T, 41t, 42, 463
ARM946E-S, 41t, 42, 467
ARM966E-S, 41t
ARM1020E, 41t, 42
ARM1022E, 41t
ARM1026EJ-S, 41t, 42
ARM1136J-S, 41t
ARM1136JF-S, 41t
ARM High Performance Bus, 8
ARM instruction set. See Instruction set
ARM Peripheral Bus, 8
ARM Procedure Call Standard, 122
ARM1 prototype, 3
armasm, 158, 620
armcc, 105–106, 151
arm-elf-gcc, 105–106
ARM-Thumb interworking, 90–92
ARM-Thumb Procedure Call Standard
argument passing, 123f
description of, 70, 72, 120
function of, 122
ARMv1, 39t
ARMv2, 39t
ARMv2a, 39t
ARMv3, 39t
ARMv3M, 39t
ARMv4
architecture of, 106
description of, 39t
integer normalization on, 213–215
ARMv4T, 39t
ARMv5
architecture of, 106, 106t
integer normalization on, 212–213
ARMv5E
description of, 79
extensions, 79–82
multiply instructions, 81–82
description of, 39t
ARMv5TEJ, 39t
ARMv5TE, 130t
ARMv6
architecture of, 550
complex arithmetic support, 554–555
cryptographic multiplication extensions, 559
description of, 39t
exception processing, 560, 562t
implementations, 563
mixed-endianness support, 560
most significant word multiplies, 558–559
multiprocessing synchronization primitives, 560–562
packing instructions, 554
reverse instructions in, 561f
saturation instructions, 555–556
single instruction multiple data arithmetic operations, 550–554
sum of absolute differences instructions, 556–557
Ascending stack, 70
.ascii, 632
.asciz, 632
ASR instruction, 94, 577–578
Assembly code
looping constructs. See Loop(s)
names allocated to variables, 172
writing of, 158–163
Atomic operation, 72


B instruction, 577
Background regions, for memory protection units, 464–465
Backward branch, 59
.balign, 632
Banked registers, 23–26
Barrel shifter
arithmetic instructions with, 55
arithmetic logic unit and, 51f
data processing instructions that do not use, 51
description of, 51
operations, 52t
syntax for, 53t
Base address register, 61
Base-two exponentiation, 244–245
Base-two logarithm, 242–244
BIC instruction, 55–56, 94, 577–578
Big-endian mode, 137, 138t
Biquads, 295–296
Bit permutations
description of, 249t, 249–250
examples of, 251–252
macros, 250–251
Bit population count, 252–253
Bit reversal, 249t
Bit spread, 249t
Bitbuffer, 193
Bit-fields
description of, 133–136
fixed-width bit-field packing and unpacking, 191–192
variable-width packing, 192–194
variable-width unpacking, 195–197
BKPT instruction, 578
BL instruction, 578
Block finite impulse response filters, 282–294
Block memory copy, 68
Block-floating algorithms, 149
Block-floating representation of digital signal, 263
BLX instruction, 90–91, 579
BNE instruction, 69
Boot code, 13–14
Booting, 13
Bootloader, 368, 377
Branch exchange, 60
Branch exchange with link, 60
Branch instructions
conditional, 92
description of, 58–60
variations of, 92–93
Branch prediction, 32
Bus
architecture levels of, 8
characteristics of, 8
function of, 7
schematic diagram of, 7f
Bus master, 8
Bus slaves, 8
BX instruction, 90–91, 579–580
BXJ instruction, 579–580
.byte, 632
Byte reversal, 249t


C code
data types
function argument, 111–112
local variable, 107–110
overview of, 105–107
signed, 112–113
unsigned, 112–113
loops
with fixed number of iterations, 113–116
unrolling, 117–120
with variable number of iterations, 116–117
optimization of, 104–105
overview of, 104–105
portability issues, 153–154
C compilers
bit-fields, 133–136
datatype mappings, 107t
description of, 104–105
function calls, 122–127
inline assembly, 149–153
inline functions, 149–153
pointer aliasing, 127–130
register allocation, 120–122
structure arrangement, 130–133
unaligned data, 136–140
Cache
architecture of, 408–417
cleaning of, 438–443
coprocessor 15 and, 423
D-cache, cleaning of
description of, 423, 428
in Intel XScale SA-110 and Intel StrongARM cores, 435–438
procedural methods for, 428t, 428–431
test-clean command for, 428t, 434–435
way and set index addressing for, 428t, 431–434
definition of, 403, 457
description of, 9–10, 34–35
direct-mapped, 410–411
efficiency measurements, 417
flushing of, 423–427, 438–443
fully associative, 414
hit rate for, 417
improvements using, 406–407
initializing of, 465–466
logical, 406, 407f, 458
main memory and, relationship between, 410–412
memory management units and, 406–408, 512–513
miss rate for, 417, 443
performance of, 456–457
physical, 406, 407f, 458
primary, 405
region attributes, 474–477
secondary, 405
self-modifying code, 424
set associativity, 412–416, 458
simple, 408, 409f
size of, 408
split, 408, 424, 458
status bits in, 408–409
unified, 408, 458
write buffer used with, 403, 416–417, 457
writeback, 418–419
Cache bit, 474
Cache controller
description of, 409–410
replacement policy of, 419
Cache lines
definition of, 408, 457
eviction, 410, 419
replacement policies, 419–422
Cache lockdown
definition of, 443
by incrementing the way index, 445–449
Intel XScale SA-110, 453–456
lock bits for, 450–453
locking code and data, 444–445
method of, 445t
Cache policies
allocation policy on a cache miss, 422
cache line replacement policies, 419–422
description of, 418
write policy, 418–419
Cache-tag, 457–458
CDP instruction, 580
Checksums, 107–108
Circular buffers, 141, 177
CISC, 4f
CLZ instruction, 214, 580
CMN comparison instruction, 56, 94, 580–581
CMP comparison instruction, 56–57, 94, 582–583
CN, 625
Coalescing, 417
.code, 632
CODE16, 625
CODE32, 625
Command line interpreter, 369
Common object file format, 370
Common subexpression elimination, 127
Comparison instructions, 56–57
Compilers, 65
Complex instruction set computer. See CISC
Condition codes, 571–572
Condition field, 82
Condition flags, 27–29, 82, 181
Conditional branch instruction, 92
Conditional execution, 6, 29, 29t, 82–84, 180–183
Conditional instructions, 170
Content addressable memory, 414
Context switch
description of, 396–398, 486
page table activation, 497
Controllers
cache
description of, 409–410
replacement policy of, 419
function of, 7
Coprocessors
description of, 36–37
instructions, 76–77
system control, 77
Coprocessor 15
access permissions, 470t, 471f
cache and, 423
description of, 77
instruction syntax, 77–78
memory management unit configuration and, 513–515
Core extensions
cache memory, 34–35
coprocessors, 36–37
description of, 34, 44
function of, 19
memory management, 35–36
tightly coupled memory, 35, 36f
cos, 245
Count leading zeros
description of, 215–216
instruction, 80
Count trailing zeros, 215–216
Counted loops
decremented, 183–184
types of, 190–191
unrolled, 184–187
CP, 625
CP15:c7, 432t
CPS instruction, 581–582
CPY instruction, 582
Cryptographic multiplication extensions, 559
Current program status register
banked registers, 23–26
condition flags, 27–29
conditional execution, 29, 29t
description of, 21–23, 40t
fields of, 22
instruction sets, 26–27, 27t
interrupt masks, 27
processor modes, 23
saving of, 26
state instruction sets, 26–27
Cycle counter, 163
Cyclic redundancy check, 107


DATA, 625–626
Data types
C code
function argument, 111–112
local variable, 107–110
overview of, 105–107
signed, 112–113
unsigned, 112–113
Data, unaligned
description of, 136–140
handling of, 201–203
Data abort, 318t, 321
Data abort vector, 33
Data bus, 19
Data encryption standard permutation, 249t
Data pointers, 154
Data processing instructions
arithmetic instructions, 53–55
barrel shifter. See Barrel shifter
comparison instructions, 56–57
logical instructions, 55–56
move instructions, 50
multiply instructions, 57–58
Thumb instruction set, 93–95
Data streaming, 410
D-cache cleaning
description of, 423, 428
in Intel XScale SA-110 and Intel StrongARM cores, 435–438
procedural methods for, 428t, 428–431
test-clean command for, 428t, 434–435
way and set index addressing for, 428t, 431–434
DCB, 626
DCD, 626
DCI, 626
DCQ, 626
DCW, 626
Decimation-in-time radix-2 butterfly, 304
Decode, 164
Decremented counted loops, 183–184
Defines, 339
Descending stack, 70
Device driver, 369, 398–400
Diagnostics, 13
Digital signal processing
on ARMv6
complex arithmetic support, 554–555
cryptographic multiplication extensions, 559
dual 16-bit multiply instructions, 557–558
most significant word multiplies, 558–559
packing instructions, 554
saturation instructions, 555–556
single instruction multiple data arithmetic operations, 550–554
sum of absolute differences instructions, 556–557
applications of, 259
on ARM9E, 275–277
on ARM10E, 277–278
on ARM7TDMI, 270–272
on ARM9TDMI, 272–274
description of, 259–260
discrete Fourier transform
definition of, 303
fast Fourier transform
benchmarks, 314t
description of, 303–304
radix-2, 304–305
radix-4, 305–313
function of, 303
finite impulse response filters
block, 282–294
definition of, 280
description of, 280–281
fixed-point representation of signals
addition of, 265–266
description of, 262–263
division of, 267
multiplication of, 266–267
operating on values stored in, 264
square root of, 267–268
subtraction of, 265–266
summary of, 268
floating-point representation of signals, 262, 268
infinite impulse response filters, 294–302
on Intel XScale, 278–280
load-store intensive, 259
multiply, 259
representation of digital signal
block-floating, 263
description of, 260
floating-point, 262, 268
logarithmic, 263
selection of, 260–263
summary of, 268–269
on StrongARM, 274–275
Digital signal processor, 6
Direct-mapped cache, 410–411
Disable_lower_priority routine, 362
Discrete Fourier transform
definition of, 303
fast Fourier transform
benchmarks, 314t
description of, 303–304
radix-2, 304–305
radix-4, 305–313
function of, 303
Division
conversion into multiplies, 143–145
description of, 216–217
in fixed-point representation, 267
Newton-Raphson
applications of, 223–224
on ARM9E, 217
description of, 223–225
fractional values
initial estimate for, 231
iteration accuracy, 232
overview of, 230
theory of, 231
integer normalization for, 212
Q15 fixed-point division by, 233–235
Q31 fixed-point division by, 235–237
unsigned 32/32-bit divide by, 225–230
overview of, 140–142
repeated unsigned division with remainder, 142–143
signed
by a constant, 147–149
description of, 237–238
trial subtraction
description of, 217–218
nonrestoring, 218
restoring, 218
unsigned 64/31-bit divide by, 222–223
unsigned 32-bit/15-bit divide by, 220–222
unsigned 32-bit/32-bit divide by, 218–220
unsigned
by a constant, 145–147
by Newton-Raphson division. See Division, Newton-Raphson
repeated, with remainder, 142–143
by trial subtraction. See Division, trial subtraction
Domains
access to, 541–542
fast context switch extension use of, 518–519
memory management units, 510–512
Double-precision integer multiplication
description of, 208
long long multiplication, 208–209
signed 64-bit by 64-bit multiply with 128-bit result, 211–212
unsigned 64-bit by 64-bit multiply with 128-bit result, 209–210
DRAM, 11
DSL modems, 15
Dual 16-bit multiply instructions, 557–558
Dynamic predictor, 661–662
Dynamic random access memory. See DRAM
Dynamic task, 382


ELSE, 626
.else, 632
Embedded operating systems
ARM processors. See ARM processors
components of, 381–383
description of, 381
device driver framework, 383
hardware, 6–12, 16
initialization, 382
initialization code, 12–14
instruction set for, 6
memory. See Memory
memory handling, 382
nonpreemptive, 382
peripherals, 11–12
round-robin algorithm, 383
scheduler, 383
schematic diagram of, 7f
simple little operating system
context switch, 396–398
device driver framework, 398–400
directory layout, 384–385
exceptions handling
description of, 389
IRQ exception, 393–394
reset exception, 390
SWI exception, 390–393
initialization, 385–389
interrupts, 389
memory model, 389
overview of, 383–384
periodic timer, 388
scheduler, 394–396
service routines, 384
software, 12–16
Embedded trace macrocell, 42
EmbeddedICE macrocell, 38
END, 626
.end, 633
END directive, 159
Endian reversal, 248–249
Endianness, 137, 154
.endif, 633
.endm, 633
ENTRY, 626
enum, 132
EOR instruction, 55, 94, 583
.equ, 633
EQU (alias *), 626–627
.err, 633
Eviction, 410, 419
Exception handling
ARM processor, 318–319
description of, 317–318
fast interrupt request, 326–329
interrupt request, 326–329
link register offsets, 322–324
prioritizing, 321–322
simple little operating system
description of, 389
IRQ exception, 393–394
reset exception, 390
SWI exception, 390–393
vector table, 319–320
Executable and linking format, 370
.exitm, 633
Exponentiation, base-two, 244–245
EXPORT (alias GLOBAL), 627
EXPORT directive, 159


Fast context switch extension
definition of, 515
domains used by, 518–519
features of, 515–516
hints for, 519–520
page tables used by, 518–519
schematic diagram of, 517f
virtual addresses modified by, 516
Fast Fourier transform
benchmarks, 314t
description of, 303–304
radix-2, 304–305
radix-4, 305–313
Fast interrupt mode, 23, 26t
Fast interrupt request
description of, 23, 27, 318t, 321–322
exceptions, 326–329
Fast interrupt request vector, 34
Fetch, 164
FIELD (alias #), 627
Filters
benchmarks for, 314t
finite impulse response
block, 282–294
definition of, 280
description of, 280–281
infinite impulse response, 294–302
Finite impulse response filter
benchmarks for, 314t
block, 282–294
definition of, 280
description of, 280–281
Firmware
ARM Firmware Suite, 370–371
definition of, 367–368
description of, 13
execution flow, 368t
implementation of, 368t, 368–369
interactive functions, 369
RedBoot, 371–372
Fixed kernel memory, 500
Fixed mapping, 499
Fixed-point algorithm, 149
Fixed-point representation of digital signal
addition of, 265–266
description of, 262–263
division of, 267
multiplication of, 266–267
operating on values stored in, 264
saturating, 263
square root of, 267–268
subtraction of, 265–266
summary of, 268
Fixed-width bit-field packing and unpacking, 191–192
Flags, 22, 571–572
Flash ROM, 11
Flash ROM filing system, 369
Floating point, 149
Floating point accelerator, 149
Floating-point representation of digital signal, 262, 268
Flushing of cache, 423–427, 438–443
Forward branch, 59
Four-register rule, 122
Four-way set associativity, 413f, 414, 415f
Fractional value division, by Newton-Raphson iteration
initial estimate for, 231
iteration accuracy, 232
overview of, 230
theory of, 231
Fully associative cache, 414
Function arguments, 111–112
Function call overhead, 125
Function calls, 122–127


GBLA, 627
GBLL, 627
GBLS, 627
gcc compiler, 111–112
General scratch register, 121t
General variable register, 121t
.global, 633
GNU assembler
directives, 632–635
quick reference for, 631–635


µHAL, 370–371
Hardware abstraction layer, 369–370
Harvard architecture, 35f, 408
Hash function, 200, 214
Headroom, of fixed-point representation, 264
High code density, 5
Hit rate, 417
Huffman codes, 191
.hword, 633


.if, 633
if statements, 181–182
.ifdef, 633
.ifndef, 634
Immediate postindex, 63, 64t
Immediates, 571
IMPORT, 627, 628
IMPORT directive, 161
Impulse response filters
benchmarks for, 314t
block, 282–294
definition of, 280
description of, 280–281
infinite, 294–302
.include, 634
INCLUDE (alias GET), 628
Index methods, 61–63, 63t–64t
Infinite impulse response filters, 294–302
INFO (alias !), 628
Initialization code, 12–14
Inline assembly, 149–153
Inline barrel shifter, 6
Inline functions, 149–153
Instruction(s)
AND, 55, 94, 576
ADC, 54, 93, 222, 573–574
ADD, 54, 93, 166, 574–575
arithmetic
barrel shift used with, 55
definition of, 53–54
description of, 80–81
examples of, 54–55
B, 577
BKPT, 578
BL, 578
BLX, 90–91, 579
BNE, 69
branch
conditional, 92
description of, 58–60
variations of, 92–93
CDP, 580
CLZ, 214, 580
CMN, 56, 94, 580–581
conditional, 170
conditional branch, 92
count leading zeros, 80
CPY, 582
data processing
arithmetic instructions, 53–55
barrel shifter. See Barrel shifter
comparison instructions, 56–57
logical instructions, 55–56
move instructions, 50
multiply instructions, 57–58
Thumb instruction set, 93–95
dual 16-bit multiply, 557–558
EOR, 55, 94, 583
LDMIA, 66, 67f, 97
LDR, 60, 63, 64t, 78, 96, 106t, 164, 319, 586–589
LDRB, 60, 96, 106t
LDRD, 106t
LDRH, 60, 96, 106t, 109
LDRSB, 60, 96, 106t
LDRSH, 60, 96, 106t
logical, 55–56
LSL, 94, 589
MCR, 590
MCRR, 590
MRC, 592
MRRC, 592
MRS, 75–76, 592
multiply, 57–58
NEG, 94, 595
NOP, 595
ORR, 55, 94, 595–596
PKH, 596
POP, 70, 98, 597
program status registers, 75–76
PUSH, 70, 98, 597
QADD, 81, 597–599
QDADD, 81, 597–599
QDSUB, 81, 597–599
QSUB, 81, 597–599
reverse subtract, 54
RFE, 600
ROR, 94, 600
RSC, 54, 601
SADD, 601–603
Saturation, 81t
SBC, 54, 94, 603
scheduling of
description of, 30, 163–167
load instructions, 167–171
SHADD, 604–605
single-register load-store
addressing modes, 61–63, 96
description of, 61–63
Thumb instruction set, 96–97
SMLA, 605–607
SMLAL, 57–58
SMLALxy, 82t
SMLAWy, 82t
SMLAxy, 82t
SMLS, 605–607
SMMLA, 607
SMMLS, 607
SMMUL, 607
SMUA, 608–609
SMUL, 608–609
SMULL, 57–58
SMULWy, 82t
SMULxy, 82t
SMUS, 608–609
SRS, 609
SSAT, 609
SSUB, 609–610
STC, 610
STRB, 60, 96, 106t
STRD, 106t
STRH, 60, 64t, 96, 106t
SUB, 54, 94, 615–616
sum of absolute differences, 556–557
Swap, 72–73
SWI, 99, 616
SWPB, 72
SXTA, 617–618
TEQ, 56, 618
TST, 56, 94, 618–619
UADD, 619
UHADD, 619
UHSUB, 619
UMAAL, 619
UMLAL, 57–58, 620
UMULL, 57–58, 620
undefined, 318t, 321
UQADD, 620
UQSUB, 620
USAD, 620
USAT, 620
USUB, 620
UXT, 620
UXTA, 620
Instruction cycle timings
ARM11, 661–665
ARM9E, 656–657
ARM10E, 658–659
ARM7TDMI, 653–654
ARM9TDMI, 654–655
Intel XScale, 659–660
StrongARM1, 655–656
tables, 651–653
Instruction set architecture
definition of, 37
evolution of, 38
revisions of, 37–38, 39t
Instruction set
ARM, 26, 27t
branch instructions, 58–60
characteristics of, 6
conditional execution, 82–84
coprocessor, 76–77
data processing instructions
arithmetic instructions, 53–55
barrel shifter. See Barrel shifter
comparison instructions, 56–57
logical instructions, 55–56
move instructions, 50
multiply instructions, 57–58
description of, 26, 47–50, 48t–49t
Jazelle, 26–27, 27t
loading constants, 78–79
load-store instructions
multiple-register transfer. See Multiple-register transfer
single-register load-store addressing modes, 61–63
single-register transfer, 60–61
swap instruction, 72–73
program status register instructions, 75–76
16-bit, 6
software interrupt instruction, 73–75
Thumb
ARM-Thumb interworking, 90–92
branch instructions, 92–93
code density, 87, 88f
data processing instructions, 93–95
decoding, 88f, 639–641
description of, 26, 27t
encodings, 638–644
list of, 89t
load and store offsets, 132t
multiple-register load-store instructions, 97–98
overview of, 87–89
register usage, 89–90
single-register load-store instructions, 96–97
software interrupt instruction, 99
stack instructions, 98–99
Integers
double-precision multiplication
description of, 208
long long multiplication, 208–209
signed 64-bit by 64-bit multiply with 128-bit result, 211–212
unsigned 64-bit by 64-bit multiply with 128-bit result, 209–210
normalization of
on ARMv4, 213–215
on ARMv5 and above, 212–213
description of, 212
overflow of, 265
Intel XScale
D-cache cleaning in, 435–438
digital signal processing on, 278–280
instruction cycle timings, 659–660
Intel XScale SA-110, 453–456
Interrupts
assigning of, 324–325
description of, 33, 317
software, 324
Interrupt controller registers, 349t
Interrupt controllers, 12
Interrupt handler
nested, 325, 333, 336–342
nonnested, 333–336
prioritized direct, 333, 356–359
prioritized group, 333, 359–363
prioritized simple, 333, 346–352
prioritized standard, 333, 352–356
reentrant, 333, 342–346
Interrupt handling schemes, 317
Interrupt latency, 325–326
Interrupt masks, 27
Interrupt request
assigning of, 324
description of, 318t, 322
exceptions, 326–329
stack design and implementation, 329–333
Interrupt request mode, 23–24, 26t, 27
Interrupt request vector, 33
Interrupt stack, 343
Inverted logical relations, 183
.irp, 634


J bit, 22
Jazelle, 26–27, 27t
JTAG, 38


KEEP, 629


L1 translation table base address, 503–504
Latency, 30
LCLA, 629
LCLL, 629
LCLS, 629
LDC instruction, 583–584
LDM instruction, 65, 164, 584–586
LDMIA instruction, 66, 67f, 97
LDR instruction, 60, 63, 64t, 78, 96, 106t, 164, 319, 586–589
LDRB instruction, 60, 96, 106t
LDRD instruction, 106t
LDRH instruction, 60, 96, 106t, 109
LDRSB instruction, 60, 96, 106t
LDRSH instruction, 60, 96, 106t
Least recently used, 422
Left shifts, saturation of, 253–254
Level 1 page table entry, 501–503
Level 2 page table entry, 504–505
Link register
description of, 22, 121t
offsets, 322–324
Little-endian mode, 137, 138t
Load instructions scheduling
overview of, 167–168
by preloading, 168–169
by unrolling, 169–171
Loading constants, 78–79
Load-store architecture, 5, 19–20
Load-store instructions
multiple-register transfer. See Multiple-register transfer
single-register load-store
description of, 61–63
Thumb instruction set, 96–97
single-register transfer, 60–61
swap instruction, 72–73
Local variable data types, 107–110
Locality of reference, 407, 457
Lock bits, for cache lockdown, 450–453
Logarithm
base-two, 242–244
calculation of, 242f
Logarithmic indexing, 190–191
Logarithmic representation of digital signal, 263
Logical cache, 406, 407f, 458
Logical instructions, 55–56
Long long multiplication, 208–209
Loop(s)
counted
decremented, 183–184
types of, 190–191
unrolled, 184–187
with fixed number of iterations, 113–116
nested
example of, 176
multiple, 187–190
unrolling, 117–120, 184–187
with variable number of iterations, 116–117
writing for, 120
Loop counter, 114–115
Loop overhead, 118–119
LS1, 165
LS2, 165
LSL instruction, 94, 589
LSR instruction, 94, 589–590
LTORG, 629


Machine independent layer, 370
MACRO, 629
.macro, 634
MACRO directive, 202
MAP (alias ^), 630
MCR instruction, 590
MCRR instruction, 590
Memory
cache. See Cache
content addressable, 414
description of, 9
dynamic random access. See DRAM
fetching instructions for, 10t
hierarchy of, 9–10, 404f
main
cache and, relationship between, 410–412
description of, 405
management of, 35–36
nonprotected, 35
random access. See RAM
read-only. See ROM
remapping of, 14, 14f
secondary, 405
size of, 10
static random access. See SRAM
synchronous dynamic random access. See DRAM
tightly coupled, 35, 36f, 405
types of, 10–11
width of, 10
Memory controllers, 11
Memory management units
access permission, 510–512
ARM, 501
attributes of, 492–493, 493t
caches, 512–513
coprocessor 15 and, 513–515
definition of, 491
description of, 35–36, 406–408, 462
domains, 510–512
fast context switch extension
definition of, 515
domains used by, 518–519
features of, 515–516
hints for, 519–520
page tables used by, 518–519
schematic diagram of, 517f
virtual addresses modified by, 516
functions of, 491
multitasking and, 497–499
page tables
activation of, 497
architecture of, 501–502
context switch activation of, 497
definition of, 495
L1 translation table base address, 503–504
types of, 502t
regions, 492
simple little operating system, 545
tasks in, 493
translation lookaside buffer
CP15:c7 commands, 509t, 509–510
definition of, 506
functions of, 506
hit, 506
lockdown registers, 510t
miss, 506
operations, 509–510
single-step page table walk, 507–508
two-step page table walk, 508–509
write buffer, 512–513
Memory protection units
access permission for, 470–474
description of, 35, 461–462
initializing of
access permission, 470–474
cache attributes, 474–477
demonstration of, 481–482, 485–486
enabling of regions, 477–478
region size and location, 466–470
write buffer attributes, 474–477
protected regions
access permission for, 470–474
assigning of, 479–481
background regions, 464–465
configuring of, 482–485
enabling of, 477–478
governing rules for, 463–464
initializing of, 482–485
location of, 466–470
overlapping regions, 464
size of, 466–470
sample demonstration of
context switch, 486
description of, 478
initializing, 481–482
memory map for assigning regions, 479–481
mpuSLOS, 487
system requirements, 479
MEND, 629
MEXIT, 629
Miss rate, 417
Mixed-endianness support, 560
MLA multiply instruction, 57–58, 590–591
mmuSLOS, 492
Modified virtual address, 516
Most significant word multiplies, 558–559
MOV instruction, 94, 591–592
Move instructions, 50
mpuSLOS, 487
MRC instruction, 592
MRRC instruction, 592
MRS instruction, 75–76, 592
MSR instruction, 75–76, 592–593
MUL multiply instruction, 57–58, 94, 593–594
Multiple-register transfer
description of, 63
stack operations, 70–72
Thumb instruction set, 97–98
Multiplication
double-precision integer
signed 64-bit by 64-bit multiply with 128-bit result, 211–212
unsigned 64-bit by 64-bit multiply with 128-bit result, 209–210
repeated divisions converted into, 143–145
Multiply instructions, 57–58
Multiply-accumulate unit, 20
Multiprocessing synchronization primitives, 560–562
Multitasking, 497–499
MVN instruction, 94, 594–595


NEG instruction, 94, 595
Negative indexing, 190
Nested interrupt handler, 325, 333, 336–342
Nested loops
example of, 176
multiple, 187–190
Network order, 192
Newton-Raphson iteration
division by
applications of, 223–224
on ARM9E, 217
description of, 223–225
fractional values
initial estimate for, 231
iteration accuracy, 232
overview of, 230
theory of, 231
integer normalization for, 212
Q15 fixed-point, 233–235
Q31 fixed-point, 235–237
unsigned 32/32-bit, 225–230
square root by, 240–250
NOFP, 630
Nonnested interrupt handler, 333–336
Nonprivileged mode, 23
Nonprotected memory, 35
NOP instruction, 595
Normalization, integer
on ARMv4, 213–215
on ARMv5 and above, 212–213
description of, 212


One-cycle interlock, 166, 166f
Operating systems, 14–15
OPT, 630
Optional expressions, 570
ORR instruction, 55, 94, 595–596


Packing
fixed-width bit-field, 191–192
of variable-width bitstreams, 192–194
Page(s)
definition of, 494
regions defined using, 495–497
Page frame
definition of, 494
mapping pages to, 496f
Page size, 505–506
Page table(s)
access permission, 512
activation of, 497
architecture of, 501–502
context switch activation of, 497
definition of, 495
demonstration of, in virtual memory system
activation of, 539–540
data structures, 525–529
defining of, 525
filling of, with translations, 531–538
initializing of, in memory, 529–531
locating of, 525
fast context switch extension use of, 518–519
L1 translation table base address, 503–504
types of, 502t
Page table control block, 527
Page table entry
definition of, 495
Level 1, 501–503
Level 2, 504–505
page size selection, 505–506
Page table walk
single-step, 507–508
two-step, 508–509
Periodic interrupt, 382
Peripheral component interconnect bus, 8
Peripherals
description of, 11
function of, 7
interrupt controllers, 12
memory controllers, 11
Permutations
description of, 249t, 249–250
examples of, 251–252
macros, 250–251
description of, 249t
Physical addresses, 492
Physical cache, 406, 407f, 458
Pipeline
definition of, 29
description of, 4
executing characteristics, 31–32
filling of, 30
five-stage, 31f
schematic diagram of, 30f
six-stage, 31f
three-stage, 30, 30f
Pipeline bubble, 166
Pipeline flush, 167
Pipeline hazard, 165
Pipeline interlock, 165, 208
PKH instruction, 596
Platform operating systems, 14
PLD instruction, 596–597
Pointer aliasing, 127–130
Polling, 382–383
POP instruction, 70, 98, 597
Postindex, 62–63
Prefetch abort, 318t, 322
Prefetch abort vector, 33
Preindex, 62–63, 96
Preindex with writeback, 62
Primitives
definition of, 207
double-precision integer multiplication
description of, 208
long long multiplication, 208–209
signed 64-bit by 64-bit multiply with 128-bit result, 211–212
unsigned 64-bit by 64-bit multiply with 128-bit result, 209–210
multiprocessing synchronization, 560–562
permutations, 250t
Prioritized direct interrupt handler, 333, 356–359
Prioritized group interrupt handler, 333, 359–363
Prioritized simple interrupt handler, 333, 346–352
Prioritized standard interrupt handler, 333, 352–356
Priority mask table, 352
Privileged mode, 23
Process control block, 385
Profiler, 163
Profiling, 163
Program status registers
decode, 645
instructions, 75–76
schematic diagram of, 23f
Protected regions, for memory protection units
access permission for, 470–474
assigning of, 479–481
background regions, 464–465
configuring of, 482–485
enabling of, 477–478
governing rules for, 463–464
initializing of, 482–485
location of, 466–470
overlapping regions, 464
size of, 466–470
Pseudoinstructions, 78–79
Pseudorandom numbers, 255
Pseudorandom replacement, 419, 458
PUSH instruction, 70, 98, 597


Q representation, 264
Q15 fixed-point division, by Newton-Raphson division, 233–235
Q31 fixed-point division, by Newton-Raphson division, 235–237
QADD instruction, 81, 597–599
QDADD instruction, 81, 597–599
QDSUB instruction, 81, 597–599
QSUB instruction, 81, 597–599


Race condition, 342
Radix-2 fast Fourier transform, 304–305
Radix-4 fast Fourier transform, 305–313
RAM
description of, 11
dynamic, 11
Random number generation, 255
Rd, 20
Read-allocate, 422
Read-write-allocate, 422
Real-time operating systems, 14
RedBoot, 371–372
Reduced instruction set computer design. See RISC design
Reentrant interrupt handler, 333, 342–346
Register(s)
argument, 172
banked, 23–26
function of, 4–5
general-purpose, 21–22
link
description of, 22, 121t
offsets, 322–324
maximizing of, 177–180
names, 570–571
program status
decode, 645
instructions, 75–76
schematic diagram of, 23f
special-purpose, 22
Thumb, 89–90
types of, 22
in user mode, 21f, 21–22
Register allocation
C compilers, 120–122
description of, 171
maximizing the available registers, 177–180
allocation to register numbers, 171–175
more than 14 local variables, 175–177
Register file, 20, 405
Register numbers, 171–175
Register postindex, 63, 64t
Register set, 24f
Repeated divisions converted into multiplications, 143–145
Repeated unsigned division with remainder, 142–143
.rept, 634
.req, 634
Reset exception, 390
Reset vector, 33, 385
Return stack, 662
REV instruction, 599–600
Reverse subtract instruction, 54
RFE instruction, 600
Right shift, rounded, 254, 264
RISC design
CISC vs., 4f
philosophy of, 4–5
RLIST, 630–631
Rm, 20
ROM
description of, 10
flash, 11
ROR instruction, 94, 600
Round-robin algorithm, 383
Round-robin replacement, 419
ROUT, 631
RSB instruction, 54, 600–601
RSC instruction, 54, 601


SADD instruction, 601–603
Sandstone
code structure, 373–378
description of, 372
directory layout of, 372–373, 373f
execution flow, 373t
hardware initialization, 375, 377
remap memory, 375–377
reset exception, 374
Saturated arithmetic, 80–81
absolute, 254
ARMv6, 555–556
function of, 253
left shift, 253–254
32 bits to 16 bits, 253
32-bit addition and subtraction, 254
Saturation instructions, 81t
SBC instruction, 54, 94, 603
SC100, 43
Scaled register postindex, 63
Scheduler, 394–396
Scheduling of instructions
description of, 30, 163–167
load instructions
overview of, 167–168
by preloading, 168–169
by unrolling, 169–171
.section, 634
SEL instruction, 603–604
.set, 635
Set associativity
description of, 412–414
four-way, 413f, 414, 415f
increasing of, 414–416
Set index, 412
Set of defines, 339
SETA, 631
SETEND instruction, 604
SETL, 631
SETS, 631
SHADD instruction, 604–605
Shift operations, 572–573
Signed 64-bit by 64-bit multiply with 128-bit result, 211–212
Signed data type, 112–113
Signed division by a constant, 147–149
Simple cache, 408, 409f
Simple little operating system
context switch, 396–398
device driver framework, 398–400
directory layout, 384–385
exceptions handling
description of, 389
IRQ exception, 393–394
reset exception, 390
SWI exception, 390–393
initialization, 385–389
interrupts, 389
memory management unit, 545
memory model, 389
memory protection units, 487
mmuSLOS, 545
mpuSLOS, 487
overview of, 383–384
periodic timer, 388
scheduler, 394–396
service routines, 384
sin, 245
Single instruction multiple data arithmetic operations, 550–554
Single issue multiple data processing, 178
Single-register load-store instructions
addressing modes, 61–63, 96
description of, 61–63
Thumb instruction set, 96–97
Single-register transfer, 60–61
SMLA instruction, 605–607
SMLAL multiply instruction, 57–58
SMLALxy instruction, 82t
SMLAWy instruction, 82t
SMLAxy instruction, 82t
SMLS instruction, 605–607
SMMLA instruction, 607
SMMLS instruction, 607
SMMUL instruction, 607
SMUA instruction, 608–609
SMUL instruction, 608–609
SMULL instruction, 57–58
SMULWy instruction, 82t
SMULxy instruction, 82t
SMUS instruction, 608–609
Software, 12–16
Software interrupt exception, 321
Software interrupt instruction
ARM, 73–75
Thumb, 99
Software Interrupt vector, 33
.space, 635
SPACE (alias %), 631
Spatial locality, 408
Spilled variables, 120
Split cache, 408, 424, 458
Square root
description of, 238
fixed-point representation signal, 267–268
by Newton-Raphson iteration, 240–250
by trial subtraction, 238–239
SRAM, 11
SRS instruction, 609
SSAT instruction, 609
SSUB instruction, 609–610
Stack base, 72
Stack frame, 338, 341
Stack instructions
ARM, 70–72
Thumb, 98–99
Stack limit, 72
Stack operations, 70–72
Stack overflow, 329
Stack overflow error, 72
Stack pointer, 72, 121t
Static predictor, 661
Static random access memory. See SRAM
Static task, 382
Status bits, 408–409
STC instruction, 610
STM instruction, 65, 610–612
STMED instruction, 71
STMIA instruction, 97
STMIB instruction, 68
STR instruction, 60, 96, 106t, 612–615
STRB instruction, 60, 96, 106t
STRD instruction, 106t
STRH instruction, 60, 64t, 96, 106t
StrongARM
description of, 43
digital signal processing on, 274–275
StrongARM1 instruction cycle timings, 655–656
SUB instruction, 54, 94, 615–616
Subroutine, 160
Subtraction. See Trial subtraction
Sum of absolute differences instructions, 556–557
Supervisor mode, 23, 26t
Supervisor mode stack, 332
Swap instruction, 72–73
Swapped out variables, 120
SWI exception, 390–393
SWI instruction, 99, 616
Switches
on a general value x, 199–200
efficient, 197–200
function of, 197
on the range of 0 ≤ x ≤ N, 197–199
SWP instruction, 72, 616–617
SWPB instruction, 72
SXT instruction, 617–618
SXTA instruction, 617–618
Synthesizable, 38
System control coprocessor, 77
System mode, 23–24, 26t
System-on-chip architecture, 560


TEQ comparison instruction, 56, 618
Test-clean command, for D-cache cleaning, 428t, 434–435
addition, 254
subtraction, 254
32-bit interrupt controller register, 350f
32-bit/32-bit divide, unsigned
by Newton-Raphson divide, 225–230
by trial subtraction, 218–220
32-bit/15-bit divide by trial subtraction, 220–222
Thrashing
definition of, 411, 412f
ways for reducing, 412
Thumb-2, 565
Thumb instruction set
ARM-Thumb interworking, 90–92
branch instructions, 92–93
code density, 87, 88f
data processing instructions, 93–95
decoding, 88f, 639–641
description of, 26, 27t
encodings, 638–644
list of, 89t
load and store offsets, 132t
multiple-register load-store instructions, 97–98
overview of, 87–89
register usage, 89–90
single-register load-store instructions, 96–97
software interrupt instruction, 99
stack instructions, 98–99
Tightly coupled memory, 35, 36f, 405
Trailing zeros, counting of, 215–216
Transcendental functions
base-two exponentiation, 244–245
base-two logarithm, 242–244
description of, 241–242
trigonometric operations, 245–248
Translation lookaside buffer
CP15:c7 commands, 509t, 509–510
definition of, 506
functions of, 506
hit, 506
lockdown registers, 510t
miss, 506
operations, 509–510
single-step page table walk, 507–508
two-step page table walk, 508–509
Trial subtraction, division by
description of, 217–218
nonrestoring, 218
restoring, 218
unsigned 64/31-bit divide by, 222–223
unsigned 32-bit/15-bit divide by, 220–222
unsigned 32-bit/32-bit divide by, 218–220
Trigonometric operations, 245–248
Truncation error, 228
TrustZone, 563–565
TST comparison instruction, 56, 94, 618–619


UADD instruction, 619
UHADD instruction, 619
UHSUB instruction, 619
UMAAL instruction, 619
UMLAL multiply instruction, 57–58, 620
UMULL multiply instruction, 57–58, 620
Unaligned data
description of, 136–140
handling of, 201–203
Undefined instruction, 318t, 321
Undefined instruction vector, 33
Undefined mode, 23, 26t
Underflow error, 72
Unified cache, 408
Unique identification number, 398
Unknown_condition routine, 362
Unpacking
fixed-width bit-field, 191–192
variable-width bitstreams, 195–197
Unrolled counted loops, 184–187
Unrolling
load instructions scheduling by, 169–171
Unsigned 64-bit by 64-bit multiply with 128-bit result, 209–210
Unsigned 64/31-bit divide, by trial subtraction, 222–223
Unsigned 32-bit/32-bit divide
by Newton-Raphson divide, 225–230
by trial subtraction, 218–220
Unsigned 32-bit/15-bit divide, by trial subtraction, 220–222
Unsigned data type, 112–113
Unsigned division
by a constant, 145–147
repeated, with remainder, 142–143
UQADD instruction, 620
UQSUB instruction, 620
USAD instruction, 620
USAT instruction, 620
User mode, 23–24, 26t
User mode stack, 332
USMLAL macro, 211
USUB instruction, 620
UXT instruction, 620
UXTA instruction, 620


Variables, 171–175
Variable-width bitstream packing, 192–194
Variable-width bitstream unpacking, 195–197
Vector floating point accelerator, 149
Vector floating-point, 37
Vector interrupt controller, 12
Vector interrupt controller PL190 based interrupt service routine, 333, 363–364
Vector table, 33t, 33–34, 319–320
Veneer, 90
VIC PL190 based interrupt service routine, 333, 363–364
Victim, 419, 458
Victim reset value, 445
Virtual address, 516
Virtual addresses, 492
Virtual memory system
components of, 495f
definition of, 491
demonstration of
context switch procedure, 544
fixed system software regions, 521–522
memory management unit initialization
activation of page table, 539–540
assigning of domain access, 541–542
overview of, 529
page tables filled with translations, 531–538
page tables initialized in memory, 529–531
overview of, 520–521
page tables
activation of, 539–540
data structures, 525–529
defining of, 525
filling of, with translations, 531–538
initializing of, in memory, 529–531
locating of, 525
region data structures, 525–529
regions in physical memory, 522–525
virtual memory maps, 522, 524f
fixed mapping in, 499–500
mechanism of, 493–495
memory organization in, 499–501
modified, 516
task mapping in, 494f
task switching, 499
volatile, 154
Von Neumann architecture, 34, 34f, 408


Way and set index addressing, for D-cache cleaning, 428t, 431–434
Ways, 412
WEND, 631
WHILE, 631
.word, 635
Write buffer
description of, 403, 416–417
initializing of, 465–466
memory management units, 512–513
region attributes, 474–477
Write collapsing, 417
Write combining, 417
Write merging, 417
Writeback, 418–419
Writethrough, 418


XScale, 43


Zeros
count leading, 215–216
count trailing, 215–216
Zero-wait-state memory, 164
z-transform, 295
