Numerics
2-3 search trees 698
A
A* algorithm 513–521
heuristics to balance real-time data 520–521
usefulness of 517–520
abstraction 7–8
access violation 713
accuracy 719, 721
acyclic graphs 492, 494–495
adaptive simulated annealing 603
add method 103–104, 131, 156, 160, 167, 705
ADT (abstract data structure) 3, 20, 71, 146, 177, 211
agglomerative hierarchical clustering 170
Airflow 449
AIS (artificial immune systems) 592, 669
algorithms 12, 20, 33
aliases 681
allKeys method 194
allKeysInBranch method 210
alpha organisms 640
Anaconda distribution 60
ancestors, of nodes 696
annealing 595
ant colony optimization 669
Apache HBase 135
API (Application Programming Interface) 16, 19, 26
APX-complete problems 661
areDisjoint method 154, 157, 160–161
Array structure 702
arrays 4, 672–673, 690–692, 702
associative 114–115
sorted 116–117
unsorted 115
associative arrays 114–115
associative properties 702
asynchronous call handling 226–227
autocomplete 215–216
average-case hardness 561
AVL trees 698
B
B-trees 4, 698
bags 705
balanced trees 696, 702
balancedness 696
Bellman-Ford’s algorithm 513
Bézier curves 550–555
BFS (Breadth First Search) 495, 497–500, 521, 534
best route to deliver parcel 506
Dijkstra algorithm vs. 508–509
optimizing delivery routes 495–497
reconstructing path to target 500–501
biconnected graphs 531
big-O notation 682–688
algorithms and performance 682–683
examples of 687–688
notation 685–687
order of magnitude 684
RAM model 683
binary decision problems 717
binary search
multidimensional 269–270
sorted arrays and 116–117
unidimensional 268
binary trees 696
bipartite graphs 526–528
blocks 678–679
Bloom filters 110, 119–120
alternatives to implementing dictionaries 113–114
applications for 132–135
caching 132–133
crawlers 134
distributed databases and file systems 135
IO fetchers 134
routers 133
spell checkers 134
as randomized algorithms 137–138
associative arrays 114–115
concrete data structures 115–120
binary search trees 118–119
Bloom filters 119–120
hash tables 117–118
sorted arrays and binary search 116–117
unsorted arrays 115
dictionary problem 111–112
false negatives/false positives 136–137
implementating 122–132
checking keys 128–130
constructor 127–128
estimating accuracy 132
finding where key is stored 126
reading and writing bits 124–125
storing keys 130–132
using Bloom filters 122–123
improved variants 144–146
combining Bloom filters 144
compressed Bloom filters 145–146
layered Bloom filters 144–145
scalable Bloom filters 146
constructor 138
running time 138
storing an element 138–139
precision estimation 139–143
Bloomier filters 144
borrowFromSibling method 357
bound algorithm 537
bounding envelope 325
boxplot 61
branch algorithm 537
branching factor 55–57, 68
finding optimal 56–57
memory vs. 57
need for d-ary heaps 55–56
running time 56
break keyword 678
bridge node 204
brute force algorithm 9
BST class 106
BSTs (binary search trees) 73, 118–119, 182, 265, 276–281, 283–284, 286–287, 295–296, 337, 696, 698, 702
_bubble_up method 62–64
bubbleUp method 29–32, 34–35, 41–42, 251
bug tracking 17–18
buildTable method 53
Burstsort 214
C
cache hit 234
cache miss 234
Cache-Aside policy 245
caching
Bloom filters 132–133
NN search 262
camelCase 672
canopy clustering 403, 449, 482
applying 454–455
centroid initialization with 469–471
MapReduce 471–475
parallelizing 467–469
Cassandra 135
CDNs (Content Delivery Networks) 133
CDT (concrete data structure) 20, 146
chaining 117, 495, 700
child nodes 695
children array 332
chromosomeInitializer method 653
circulation-demand problem 667
circumscribed circle 555
classification algorithm 402, 405, 451
classification problems 716–721
classification metrics 718–721
accuracy 718–719
precision and recall 719–721
decision problems 717
Las Vegas algorithms 717
Monte Carlo algorithms 717–718
classifyPoints method 408–409, 415, 417, 464, 474
closest hub problem 376–380
centralized application 380–385
complex decisions 383–385
filtering points 381–382
distributed application 386–391
issues handling HTTP communication 387–389
keeping inventory in sync 389–390
lessons learned 390–391
issues with 379–380
sketching solution 377–379
cluster_points method 416
clustering 259, 402, 447
canopy 452–454
applying 454–455
centroid initialization with 469–471
MapReduce 471–475
parallelizing 467–469
DBSCAN 418–426
directly vs. density-reachable 419
implementation 422–424
pros and cons with 424–426
disjoint sets 170–171
evaluation metrics 442–446
k-means 405–418
boosting with k-d trees 414–416
dimensionality 412–413
issues with 410–412
performance analysis 413–414
NN search 398–399
OPTICS 426–442
algorithm 428–432
clustering 433–436
definitions 427–428
Hierarchical clustering 436–440
performance analysis 441–442
parallel 482
MapReduce 455–463
parallelization 449–455
types of 403–405
types of learning 402
collisions 699
color reduction 391–393
complete balanced trees 58, 696
complete graphs 526–528, 616
complete trees 696
completely symmetrical graphs 616
compressed Bloom filters 145–146
ComputeFrequencies method 50–51
concrete data structures 22–28, 115–120
binary search trees 118–119
Bloom filters 119–120
choosing the right data structure 23–24
comparing performance 23
d-ary heaps 27–28
hash tables 117–118
heaps 24–25
min-heaps and max-heaps 26–27
sorted arrays and binary search 116–117
unsorted arrays 115
concurrency 248–249, 253–254
ConcurrentHashMap class 249
ConcurrentHeap class 249
conditional instructions 674–676
else-if variant 675
switch instruction 675–676
connected components 169
connected graphs 492–493, 531
constructKdTree method 292
containers 704–707
bags 705
comparative analysis for 707
queues 706–707
stacks 705–706
contains method 41, 45–46, 82, 129, 131, 135, 138, 144–145, 175, 721
Contains operation 45, 91
contains(x) method 131
continue keyword 678
convex hull 552
core data structures 690–703
arrays 691–692
comparative analysis of 702–703
hash tables 698–702
conflicts resolution in hashing 699–701
hashing 698–699
performance 702
storing key-value pairs 698
linked lists 692–695
trees 695–698
core points 419, 423
cost functions 634
cProfile package 60
crawlers 134
create_encoding method 67
create_frequency_table method 67
Creating the structure operation 116–117
crossing number 541
finding 541–542
heuristics for 560–568
rectilinear 543–545
crossing-number optimization 592
crossover 640, 670
crossover chance 646
curse of dimensionality 320, 367
cycles, in graphs 494
D
d-ary heaps 27–28, 55–56
DAGs (directed acyclic graphs) 494
data compression 50–55
data model 20
data structures 5–12
algorithms vs. 5–6
concrete 22–28, 115–120
binary search trees 118–119
Bloom filters 119–120
choosing the right data structure 23–24
comparing performance 23
d-ary heaps 27–28
hash tables 117–118
heaps 24–25
min-heaps and max-heaps 26–27
sorted arrays and binary search 116–117
unsorted arrays 115
core 690–703
arrays 691–692
comparative analysis of 702–703
hash tables 698–702
linked lists 692–695
trees 695–698
defining 2–4
describing 5
Mars colony food supply example 7
abstraction 7–8
algorithms 9–10
writing your own implementations of 6
DataFrame 61
DBSCAN (density-based spatial clustering of applications with noise) 418–426
directly vs. density-reachable 419
implementation 422–424
MapReduce 475–481
pros and cons with 424–426
dbscan method 422, 479
deadlock 250
decision problems 717
default arguments 680
delete method 92, 125, 198, 352–353, 355–356, 358, 488, 697
Delete-Minimum operation 23
DeLiClu algorithm 6
dense graphs 521
destination vertex 486
DFS (Depth First Search) 169, 495–506, 521
best route to deliver parcel 506
optimizing delivery routes 495–497
queues vs. stacks 505–506
dfs method 502, 504–505
dictionary 111, 114, 117, 175, 707
Dijkstra algorithm 49, 507–513, 521, 523
analysis 510–511
BFS vs. 508–509
implementation 509
shortest route for deliveries 511–513
directed graphs 492
directly reachable points 419
disjoint sets 172, 479
applications for 168–171
clustering 170–171
connected components 169
Kruskal’s algorithm 169–170
unification 171
data structure API 153–154
distinct subsets problem 148–151
heuristics to improve running time 162–168
implementing balancing and path compression 166–168
path compression 164–166
naïve solution 154–158
reasoning on solutions 151–152
tree-like structure 159–162
disjointSet method 155, 169
distance function 337, 513, 515
distributed databases and file systems 135
division method 699
DL (deep learning) 401
doAction function 672
DoS (Denial of Service) 133
double hashing function 126
doubly-linked lists 693, 702
DS (data structures) 2–3, 6, 19, 690
dynamic arrays 690–691
dynamic programming 10, 537, 710
dynamic properties 702
E
EA (evolutionary algorithms) 668
edge intersections 545–556
Bézier curves 550–552
intersections between quadratic Bézier curves 552–555
polylines 549–550
straight-line segments 545–549
vertex-vertex and edge-vertex intersections 555–556
edgeIntersections method 565
edges 486
elbow method 447
else-if variant 675
embeddings 523–528
complete and bipartite graphs 526–528
definitions 525–526
gradient descent for 585–589
different criterion 586–588
implementation 588–589
simulated annealing 615–623
force-directed drawing 618–623
minimum edge crossing 615–617
enabling recursion 706
epistasis 651
eventual consistency 220
evictOneEntry method 252
exact element search 333
Extensible Indexing 395
F
f function 680, 685, 687, 700–701, 710, 715
factorial function 712
false Boolean value 82, 118, 123, 137, 145, 156–158, 161, 168, 174, 180, 184–186, 190, 203, 717–718, 720–721
false-biased algorithm 138, 718, 720–721
Fibonacci heap 243
FIFO (First In, First Out) 241, 706
Find-Minimum operation 23
findClosestEntryInNodesList function 356
finding outliers 403
findMin method 295–298, 300–301, 316
findPartition method 153, 157–162, 164–165, 167–168
findSiblingToMergeTo function 356
findSplitIndex method 349
fitness method 634
flat clustering 404
fNN method 382, 384
FNs (false negatives) 720
fnv1 (Fowler-Noll-Vo hashing) 122, 126
for loops 676–677
forbidden graphs 526
force-directed graph drawing 586, 590, 592, 618
forceDirectedEmbedding method 588–589
FPs (false positives) 144, 720
_frequency_table_to_heap method 61–62, 64, 67
functions 679–681
overloading and default arguments 680
tuples 680–681
G
g function 685, 715
GA (genetic algorithms) 631
GCD (greatest common divisor) 524, 678
gcd function 680
genetic algorithms 592, 625–670
chromosomes 631–633
crossover 645–648
fitness 634–635
inspired by nature 627–631
maximum flow 665–667
minimum vertex cover 661–665
applications of 662–663
implementing algorithm 663–665
mutations 648–650
natural selection 635–639
population 633–634
protein folding 667–668
selecting individuals for mating 639–645
template 650
traveling salesman problem 652–660
chromosomes 653
crossover 654–655
fitness 653
initialization 653
mutations 653–654
optimizing routes of whole fleet 660
results and parameters tuning 656–660
when they work best 651–652
get method 238, 251–253
getPassThroughEdge function 208
getSize() method 238
globally optimal choices 625
gradient descent 470, 558, 560, 574, 592, 624, 651, 669
applications of 581–585
embeddings 585–589
different criterion 586–588
implementation 588–589
geometrical interpretation 576–579
heuristics for crossing number 560–568
math of 575–579
optimization 568–574
cost functions 568–570
optimizing random sampling 571–574
step functions and local minima 571
problems with 579–581
simulated annealing vs. 603–604
when applicable 579
Graph class 490
graphs 521–557
A* algorithm 513–521
heuristics to balance real-time data 520–521
usefulness of 517–520
as algebraic types 489–490
BFS and DFS 495–506
best route to deliver parcel 506
BFS 497–500
DFS 502–505
optimizing delivery routes 495–497
queues vs. stacks 505–506
reconstructing path to target 500–501
connected components 169
defining 486–491
Dijkstra algorithm 49, 507–513
analysis 510–511
BFS vs. 508–509
implementation 509
shortest route for deliveries 511–513
edge intersections 545–556
Bézier curves 550–552
intersections between quadratic Bézier curves 552–555
polylines 549–550
straight-line segments 545–549
vertex-vertex and edge-vertex intersections 555–556
embeddings 523–528
complete and bipartite graphs 526–528
definitions 525–526
gradient descent 590
applications of 581–585
for embeddings 585–589
geometrical interpretation 576–579
heuristics for crossing number 560–568
math of 575–579
optimization 568–574
problems with 579–581
when appliable 579
implementing 487–489
Kruskal’s algorithm 169–170
non-planar graphs 539–545
finding crossing number 541–542
rectilinear crossing number 543–545
planar graphs 528–539
efficient algorithms 538–539
improving performance 536–537
Kuratowski’s theorem 529–530
planarity testing 530–535
Prim’s algorithm 49–50
properties of 491–495
acyclic 494–495
connected 492–493
undirected 492
pseudo-code 490–491
greedy algorithm 8
gridShard method 479
H
H function 121
Hamming distance 213
hard clustering 404
hash function 699
hash set 120, 138
hash tables 117–118, 120, 690, 698–702
conflicts resolution in hashing 699–701
hashing 698–699
performance 702
storing key-value pairs 698
hashing 698
HashMap 45, 166, 249
HashSets 144
HashTables 144
hasItemX() method 382
_heap_to_tree method 61–62, 67
_heapify method 63–64, 66–67
heaps 20, 25, 702, 712
d-ary heaps 27–28
implementing 28–46
bubbleUp method 29–32
contains method 44–45
dealing with duplicates 41–42
full code 46
heapify method 42–44
insert method 35–37
performance 45–46
pushDown method 33–35
top method 37–40
update method 40
max-heaps 26–27
min-heaps 26–27
height-balancedness 696
height, of trees 696
heuristics
for crossing number 560–568
to balance real-time data 520–521
to improve running time 162–168
implementing balancing and path compression 166–168
path compression 164–166
HFile 135
Hierarchical clustering 404
_highest_priority_child method 64–65
HighestPriorityChild method 64
Hill climbing 560, 573, 593
Hill descent 573, 593, 600
Hit ratio 133
Huffman codes 50–55
huffman method 53
I
if-then-else statement 674–675, 678
in keyword 677
indentation 678–679
Individual class 633–634
induced subgraphs 537, 539
Info class 166–167
initialization 451, 463
initialSet 155
initPopulation method 634, 653
insert method 20, 23, 35, 45–46, 58–59, 61–64, 72, 84–85, 91, 96, 115, 117, 120, 131, 138–139, 144–145, 175, 187–189, 192, 196, 204–206, 282, 287, 290, 293, 298, 316, 318, 333, 339, 341, 343–345, 350, 352–353, 356, 369, 378, 488, 697
k-d trees 287–288
radix tries 204–206
SS-trees 337–344
split nodes 348–352
variance, means, and projections 345–348
tries 187–189
insertion sort 32
integers 127
invariants 20
inverse Ackermann function 166, 172
IO fetchers 134
isPassThrough function 208
iterate operations 705
Iterative MapReduce 465
J
Jordan curve 526–527
JsGraphs library 568, 613, 621, 656
K
k-d trees 318
boosting k-means with 414–416
cycling through dimensions 275–282
balance 282
binary search trees 276–281
invariants 281
limits and possible improvements 316–318
methods 282–316
balanced tree 289–292
insert method 287–288
NN search 301–310
region search 310–316
remove method 293–301
search method 284–286
overcoming flaws of 322
k-dimensional spaces 268–271
modeling 2-D partitions with data structure 270–271
multidimensional binary search 269–270
unidimensional binary search 268
k-means 404, 418
boosting with k-d trees 414–416
dimensionality 412–413
issues with 410–412
MapReduce 463–475
centroid initialization with canopy clustering 469–471
MapReduce canopy clustering 471–475
parallelizing canopy clustering 467–469
parallelizing 451–452
performance analysis 413–414
KdNode class 284–285, 288
KdTree class 267, 284–285, 344, 415
KdTree::search(target) method 285
keysStartingWith method 194, 210
radix tries 209–211
tries 193–195
keysWithPrefix method 195–197
Kruskal’s algorithm 169–170
Kubernetes 449
Kuratowski’s theorem 529–530
L
Las Vegas algorithms 137, 146, 716–717
LBFs (layered Bloom filters) 144–145
leaves 695–696
Levenshtein distance 211, 213
LFU (least frequently used) 133, 238–244
choosing 240
LRU vs. 240–242
performance 243
problems with 243–244
LIFO (Last In, First Out) 705
linear regression 583–585, 669
linearithmic time 292, 606
linked lists 4, 690, 692–695
list’s nodes 695
local minimum 569
local optimization 573
locality of reference 214
locality properties 702
locally-optimal choice 625
locks 249–250
acquiring 250–252
read locks 252–253
reentrant locks 252
logistic regression 669
radix tries 209
tries 192–193
loops 676–678
break and continue keywords 678
for loops 676–677
while loops 677–678
loss function 583
LRU (least recently used) cache 256
applications for 254–255
computing things twice 219–222
LFU vs. 240–242
performance 238
policies 244–245
remembering values 222–228
asynchronous call handling 226–227
description and API 224
fresh data 224–226
marking cache values as 227–228
synchronization 245–254
concurrency 248–249, 253–254
locks 249–250, 253
temporal ordering 232–238
M
Manhattan Distance 506
Map class 53, 167
MapReduce model 455–463
DBSCAN 475–481
k-means 463–475
centroid initialization with canopy clustering 469–471
MapReduce canopy clustering 471–475
parallelizing canopy clustering 467–469
mapping, then reducing 459–462
problem 455–459
market segmentation 403
Mars colony food supply example 7
abstraction 7–8
algorithms 9–10
Matplotlib 60
max method 26, 72, 90–91, 681, 697
max-heaps 26–27
maximum flow 625, 665, 667
maxSize element 128
maxTolerance method 127–128
MDRQ (multidimensional range queries) 395
Membership category 405
memetic algorithm 668
memoization 537, 708, 710
memory locality 25
memory management 706
mergeChildren method 358
meta-heuristic 595
metrics functions 718
MFU (Most Frequently Used) 239
min method 26, 72, 90–91, 296, 681, 697
min_max method 681
min-heaps 26–27, 48
Min/Max operation 91, 119
mini-batch backpropagation 632
minimum bounding circle 555
minimum crossing number 560
minimum edge crossing 615
minimum rectilinear crossing number 560
minimum vertex cover 661–665
applications of 662–663
implementing genetic algorithm 663–665
minVarianceSplit method 352
ML (machine learning) 401, 720
Monte Carlo algorithms 137, 146, 603, 656, 716–718
MRU (Most Recently Used) 133
MSD (Most Significant Digit) 214
MST (minimum spanning tree) 49, 169, 606
multi-indexing 70–71
multidimensional data indexing 318
cycling through dimensions 275–282
balance 282
binary search trees 276–281
invariants 281
limits and possible improvements 316–318
methods 282–316
balanced tree 289–292
insert method 287–288
NN search 301–310
region search 310–316
remove method 293–301
search method 284–286
multigraphs 487
multiplication method 699
Murmur hashing 122, 126
mutations 640, 648–650, 653–654, 670
mutual recursion 715
N
natural selection 628, 635–639
nearestNeighbor method 316, 360, 362, 369
negative gradient 576
NN (nearest neighbors) search 272, 399
centralized application 380–385
complex decisions 383–385
filtering points 381–382
closest hub problem 376–380
issues with 379–380
sketching solution 377–379
clustering 398–399
color reduction 391–393
description and API 266–267
distributed application 386–391
issues handling HTTP communication 387–389
keeping inventory in sync 389–390
lessons learned 390–391
k-d trees 301–310
k-dimensional spaces 268–271
Higher dimensions 269–270
modeling 2-D partitions with data structure 270–271
unidimensional binary search 268
particle interaction 393–394
search problem 260–261
similarity search trees 374
solutions 261–266
caching 262
data structure choice 264–266
precomputing 261–262
simplifying problem 262–263
nNearestNeighbor method 306
node.left 91
node.right 91
nodeDistance function 362
non-planar graphs 539–545
finding crossing number 541–542
rectilinear crossing number 543–545
NP class 403, 661
NP-complete problem 541, 605
NP-hard problem 560
null inbound flow 666
null outbound flow 666
O
open addressing 117, 700
operating system 133
OPTICS (ordering points to identify the clustering structure) 426–442
algorithm 428–432
clustering 433–436
definitions 427–428
Hierarchical clustering 436–440
performance analysis 441–442
opticsCluster method 435–436
optimization 568–574
cost functions 568–570
genetic algorithms 670
chromosomes 631–633
crossover 645–648
fitness 634–635
inspired by nature 627–631
maximum flow 665–667
minimum vertex cover 661–665
mutations 648–650
natural selection 635–639
population 633–634
protein folding 667–668
selecting individuals for mating 639–645
template 650
traveling salesman problem 652–660
when it works best 651–652
gradient descent 590
applications of 581–585
for embeddings 585–589
geometrical interpretation 576–579
heuristics for crossing number 560–568
math of 575–579
problems with 579–581
when applicable 579
random sampling 571–574
simulated annealing 623
embeddings 615–623
function of 598–600
gradient descent vs. 603–604
implementation 597–598
short-range vs. long-range transitions 601–602
traveling salesman problem 604–615
variants 602–603
step functions and local minima 571
optimization problems 568
Optional class 107–108
Optional::map 108
order function 379
order properties 702
outliers 418
overloading 680
P
Pandas library 60–61
parallel clustering 482
MapReduce 455–463
DBSCAN 475–481
k-means 463–475
mapping, then reducing 459–462
problem 455–459
parallelization 449–455
canopy clustering 452–455
parallel vs. distributed computing 450–451
parallelizing k-means 451–452
parallelization 449–455
canopy clustering 452–455
parallel vs. distributed computing 450–451
parallelizing k-means 451–452
parent node 695
parentIndex 30
parentsMap 160
particle interaction 393–394
particle swarm optimization 669
partition method 291–292
partitioning clustering 404
partitionsMap 160
pass-through nodes 198
path compression 164, 166
Patricia trees 198
PCB (printed circuit boards) 523, 615
perfect hashing 700, 702
perfectly balanced tree 696
phenotypes 631
pigeonhole principle 699–700
planar embedding 525
planar graphs 525, 528–539
efficient algorithms 538–539
improving performance 536–537
Kuratowski’s theorem 529–530
planarity testing 530–535
planarity testing 530–535
points array 291, 332, 378
pointsInRegion operation 316, 369
polylines 549–550
positive predictive value 719
PQ (priority queue) 19–20, 705
precision 719, 721
predecessor method 697
Prev/Next operation 119
Prim’s algorithm 49–50
priority queues 68, 707
branching factor analysis 55–57
branching factor vs. memory 57
finding optimal branching factor 56–57
need for d-ary heaps 55–56
running time 56
concrete data structures 22–28
choosing the right data structure 23–24
comparing performance 23
d-ary heaps 27–28
heaps 24–25
min-heaps and max-heaps 26–27
containers as 704–707
bags 705
comparative analysis for 707
queues 706–707
stacks 705–706
Dijkstra algorithm 49
finding k largest elements 47–48
choosing the right data structure 47
coding 48
min-heaps 48
function of 21–22
generalizing FIFO 22
Huffman codes 50–55
implementing heaps 28–46
bubbleUp method 29–32
contains method 44–45
dealing with duplicates 41–42
full code 46
heapify method 42–44
insert method 35–37
performance 45–46
pushDown method 33–35
top method 37–40
update method 40
performance analysis 58–68
choosing best branching factor 67–68
heapify method running time growth 66–67
interpreting results 61–65
profiling 59–61
Prim’s algorithm 49–50
priority handling 17–18
sorted lists 19
PriorityQueue class 40, 72, 242, 249
profiling 59
program counters 712
protected method 153
protein folding 668
pseudo-code 671–681
arrays 672–673
blocks and indentation 678–679
conditional instructions 674–676
else-if 675
switch 675–676
functions 679–681
overloading and default arguments 680
tuples 680–681
loops 676–678
break and continue keywords 678
for loops 676–677
while loops 677–678
variables and basics 671–672
pStats.Stats function 61
_push_down method 62–66
pushDown method 33–35, 39–40, 42–44, 64
Q
quantum annealing 592
query method 416
queues 706–707
priority 68
branching factor analysis 55–57
concrete data structures 22–28
containers as 704–707
Dijkstra algorithm 49
finding k largest elements 47–48
function of 21–22
generalizing FIFO 22
Huffman codes 50–55
implementing heaps 28–46
performance analysis 58–68
Prim’s algorithm 49–50
priority handling 17–18
sorted lists 19
stacks vs. 505–506
R
R-tree 267
radix tries (compact prefix trees) 197–211
insert method 204–206
keysStartingWith method 209–211
longestPrefix method 209
nodes and edges 199–202
remove method 207
search method 202–204
RadixSort 174
RAM model 683
random permutation 611–612
random sampling algorithm 560, 564, 571, 651
random swaps 611–612
Random Swaps operation 611–612
randomized treaps 91–97
applications of 96–97
performance analysis and profiling 97–108
expected height 97–100
profiling height 100–103
profiling memory usage 106–108
profiling running time 104–106
RandomizedTreap class 104, 106
randomStep method 597, 601, 612, 614
rcn (rectilinear crossing number) 543, 615
re-centering 405, 451, 463
reachability plot 432
Read-Through policy 245
readBit method 124–125, 130
recall 719, 721
recursion 708–715
mutual recursion 715
simple 709–712
tail recursion 712–714
red-black trees 698
RedBox class 672
ReentrantReadWriteLock class 249
reflexive property 118
Refresh-Ahead policy 245
region search
k-d trees 310–316
SS-trees 363
registers 671
regression algorithms 402
Remove an entry operation 116–117, 119
remove method 45–46, 72, 87, 89, 91, 101, 104–105, 115, 175, 186, 208, 282, 316, 318, 369
k-d trees 293–301
radix tries 207
tries 189–192
root 84, 167, 695–696
rotations 75–80
Roulette wheel selection 642
routers 133
S
samePartition method 158
scalable Bloom filters 146
scaling up 112
SCC (strongly connect component) 493
scipy.spatial module 416
search method 72, 87, 187–188, 190, 192, 196, 205, 209, 287–288, 290, 293, 298, 316, 333, 335, 353, 369, 697
k-d trees 284–286
radix tries 202–204
tries 183–187
searchClosestLeaf method 353
searchNode method 186, 190, 193–194, 204
searchNodeWithPrefix method 210
searchParentLeaf method 340–341
segmentation faults 712–713
Selection Sort algorithm 47
sensitivity 719
sentiment analysis 220
set method 117, 127, 151, 157, 234, 250–252
sharding 220
Shop class 378, 385
shop.order method 380
siblings 357, 695–696
side effects 2
simple graph 487, 492
simulated annealing 592, 623–624, 651
embeddings 615–623
force-directed drawing 618–623
minimum edge crossing 615–617
function of 598–600
gradient descent vs. 603–604
implementation 597–598
short-range vs. long-range transitions 601–602
traveling salesman problem 604–615
adjacent vs. random swaps 613–614
applications of 614–615
exact vs. approximated solutions 606
pruning domain 608–609
state transitions 609–613
visualizing cost 607–608
variants 602–603
single-link chain effect 419
singly-linked lists 693, 702
SLA (service-level agreement) 221
SLC (single-linkage clustering) 419
snake_case 672
soft clustering 404
sorted arrays 116–117
Sorted List operation 119
sorted lists 19, 119
SortedPriorityQueue class 72
source vertex 486
Space needed operation 489
spell checkers 174–177, 211–213
Bloom filters 134
compression 176
description and API 177
split coordinate 281
split method 348, 351
splitPoints function 350
SS (similarity search) trees 374
approximated similarity search 364–367
complex example 321–322
delete method 352–359
insert method 337–344
split nodes 348–352
variance, means, and projections 345–348
NN search 359–363
overcoming k-d trees flaws 322
R-trees 322–329
B-trees 323–324
inserting points in 326–328
search 328–329
region search 363
SS+-trees 367–373
improved split heuristic 370–371
mitigating hyper-sphere limitations 369
reducing overlap 371–373
SS-trees vs. 367–368
SsNode::intersectsPoint method 336
SsSphere class 332
SSTable 135
SsTree::delete method 353
SsTree::insert method 352
stack frame 712
stacks 505–506, 705–706
static arrays 690–691
step function 440, 683
stochastic backpropagation 632
StoreFile 135
straight-line drawing 557
straight-line segments 545–549
string search 217
applications for 211–216
autocomplete 215–216
spell checkers 211–213
string similarity 213–214
string sorting 214
T9 215
radix tries 197–211
insert method 204–206
keysStartingWith method 209–211
longestPrefix method 209
nodes and edges 199–202
remove method 207
search method 202–204
spell checkers 174–177
compression 176
description and API 177
tries 177–197
binary search vs. 180–183
insert method 187–189
keysStartingWith method 193–195
longestPrefix method 192–193
remove method 189–192
search method 183–187
when to use 195–197
string similarity 213–214
string sorting 214
subscribed hypersphere 412
successor method 697
switch method 675
symbol table 114
symmetric property 118
symmetric quadratic Bézier curves 552
synchronization
concurrency 248–249, 253–254
locks 249–250
acquiring lock 250–252
read locks 252–253
reentrant locks 252
T
T9 215
tail recursion 711, 713
template method 381
TF-IDF (term frequency–inverse document frequency) 116
thread-safe data structures 247
TN (true negatives) 719
top method 20–21, 37, 39, 45–46, 58–59, 61–66, 89–91, 94, 705
topological sorting 495
TP (true positives) 718–719
transitive property 118
Treap class 75–76, 78, 82, 89
treaps 109
description and API 71–72
design questions 80–81
insert method 82–85
max method 90–91
min method 90–91
multi-indexing 70–71
peek method 89–90
performance analysis and profiling 97–108
expected height 97–100
profiling height 100–103
profiling memory usage 106–108
profiling running time 104–106
randomized treaps 91–97
applications of 96–97
remove method 87–89
rotations 75–80
search method 81–82
top method 89–90
update method 89–90
TreeNode 51–53
trees 690, 695
binary search trees 118–119, 276–281, 696–698
k-d trees 318
boosting k-means with 414–416
cycling through dimensions 275–282
limits and possible improvements 316–318
methods 282–316
overcoming flaws of 322
SS-trees 374
approximated similarity search 364–367
complex example 321–322
delete method 352–359
insert method 337–344
NN search 359–363
overcoming k-d trees flaws 322
R-trees 322–329
region search 363
SS+-trees 367–373
Trémaux trees 539
tries (prefix trees) 177–197
binary search vs. 180–183
insert method 187–189
keysStartingWith method 193–195
longestPrefix method 192–193
remove method 189–192
search method 183–187
when to use 195–197
triggering condition 33
true Boolean value 30, 82, 87, 117–118, 120–121, 123, 134–135, 144, 156–158, 161, 168, 174, 180, 184–186, 190, 203, 337, 355, 436, 497, 509, 513, 532, 535, 538, 549, 565, 672, 717–721
true/false Boolean value 268, 488, 718
TSP (traveling salesman problem) 592, 605, 625, 652
genetic algorithms 652–660
chromosomes 653
crossover 654–655
fitness 653
initialization 653
mutations 653–654
optimizing routes of whole fleet 660
results and parameters tuning 656–660
simulated annealing 604–615
adjacent vs. random swaps 613–614
applications of 614–615
exact vs. approximated solutions 606
pruning domain 608–609
state transitions 609–613
visualizing cost 607–608
TST (ternary search trie) 197
tuples 167, 680–681
type-coercion 709
U
undirected graphs 492
unification 171
unique properties 702
unsorted arrays 115
unsorted list 58
update method 20, 40, 42, 89–90
updateBoundingEnvelope method 347
updatePriority method 45–46, 91, 251
updateQueue method 432
V
validate method 565
value type 117
variables 671–672
vector data structure 673
vertex cover 625, 661, 670
Vertex delete operation 489
Vertex insert operation 489
vertex-vertex intersections 555–556
vertices 486
VMAs (virtual memory areas) 97
W
weight-balancedness 696
while loops 677–678
worst-case performance 561
Write-Ahead policy 244
Write-Around policy 245
Write-Back policy 244
Write-Behind policy 244
Write-Through policy 244
writeBit method 125, 131