Index

Numerics

2-3 search trees 698

A

A* algorithm 513–521

heuristics to balance real-time data 520–521

usefulness of 517–520

abstraction 7–8

access violation 713

accuracy 719, 721

acyclic graphs 492, 494–495

adaptive simulated annealing 603

add method 103–104, 131, 156, 160, 167, 705

ADT (abstract data type) 3, 20, 71, 146, 177, 211

agglomerative hierarchical clustering 170

Airflow 449

AIS (artificial immune systems) 592, 669

algorithms 12, 20, 33

aliases 681

allKeys method 194

allKeysInBranch method 210

alpha organisms 640

Anaconda distribution 60

ancestors, of nodes 696

annealing 595

ant colony optimization 669

Apache HBase 135

API (Application Programming Interface) 16, 19, 26

APX-complete problems 661

areDisjoint method 154, 157, 160–161

Array structure 702

arrays 4, 672–673, 690–692, 702

associative 114–115

sorted 116–117

unsorted 115

associative arrays 114–115

associative properties 702

asynchronous call handling 226–227

autocomplete 215–216

average-case hardness 561

AVL trees 698

B

B-trees 4, 698

bags 705

balanced trees 696, 702

balancedness 696

Bellman-Ford’s algorithm 513

Bézier curves 550–555

BFS (Breadth First Search) 495, 497–500, 521, 534

best route to deliver parcel 506

Dijkstra algorithm vs. 508–509

optimizing delivery routes 495–497

reconstructing path to target 500–501

biconnected graphs 531

big-O notation 682–688

algorithms and performance 682–683

examples of 687–688

notation 685–687

order of magnitude 684

RAM model 683

binary decision problems 717

binary search

multidimensional 269–270

sorted arrays and 116–117

unidimensional 268

binary trees 696

bipartite graphs 526–528

blocks 678–679

Bloom filters 110, 119–120

alternatives to implementing dictionaries 113–114

applications for 132–135

caching 132–133

crawlers 134

distributed databases and file systems 135

IO fetchers 134

routers 133

spell checkers 134

as randomized algorithms 137–138

associative arrays 114–115

concrete data structures 115–120

binary search trees 118–119

Bloom filters 119–120

hash tables 117–118

sorted arrays and binary search 116–117

unsorted arrays 115

dictionary problem 111–112

false negatives/false positives 136–137

implementing 122–132

checking keys 128–130

constructor 127–128

estimating accuracy 132

finding where key is stored 126

reading and writing bits 124–125

storing keys 130–132

using Bloom filters 122–123

improved variants 144–146

combining Bloom filters 144

compressed Bloom filters 145–146

layered Bloom filters 144–145

scalable Bloom filters 146

performance analysis

constructor 138

running time 138

storing an element 138–139

precision estimation 139–143

Bloomier filters 144

borrowFromSibling method 357

bound algorithm 537

bounding envelope 325

boxplot 61

branch algorithm 537

branching factor 55–57, 68

finding optimal 56–57

memory vs. 57

need for d-ary heaps 55–56

running time 56

break keyword 678

bridge node 204

brute force algorithm 9

BST class 106

BSTs (binary search trees) 73, 118–119, 182, 265, 276–281, 283–284, 286–287, 295–296, 337, 696, 698, 702

_bubble_up method 62–64

bubbleUp method 29–32, 34–35, 41–42, 251

bug tracking 17–18

buildTable method 53

Burstsort 214

C

cache hit 234

cache miss 234

Cache-Aside policy 245

caching

Bloom filters 132–133

NN search 262

camelCase 672

canopy clustering 403, 449, 482

applying 454–455

centroid initialization with 469–471

MapReduce 471–475

parallelizing 467–469

Cassandra 135

CDNs (Content Delivery Networks) 133

CDT (concrete data structure) 20, 146

chaining 117, 495, 700

child nodes 695

children array 332

chromosomeInitializer method 653

circulation-demand problem 667

circumscribed circle 555

classification algorithm 402, 405, 451

classification problems 716–721

classification metrics 718–721

accuracy 718–719

precision and recall 719–721

decision problems 717

Las Vegas algorithms 717

Monte Carlo algorithms 717–718

classifyPoints method 408–409, 415, 417, 464, 474

closest hub problem 376–380

centralized application 380–385

complex decisions 383–385

filtering points 381–382

distributed application 386–391

issues handling HTTP communication 387–389

keeping inventory in sync 389–390

lessons learned 390–391

issues with 379–380

sketching solution 377–379

cluster_points method 416

clustering 259, 402, 447

canopy 452–454

applying 454–455

centroid initialization with 469–471

MapReduce 471–475

parallelizing 467–469

DBSCAN 418–426

directly vs. density-reachable 419

implementation 422–424

pros and cons with 424–426

disjoint sets 170–171

evaluation metrics 442–446

k-means 405–418

boosting with k-d trees 414–416

dimensionality 412–413

issues with 410–412

performance analysis 413–414

NN search 398–399

OPTICS 426–442

algorithm 428–432

clustering 433–436

definitions 427–428

hierarchical clustering 436–440

performance analysis 441–442

parallel 482

MapReduce 455–463

parallelization 449–455

types of 403–405

types of learning 402

collisions 699

color reduction 391–393

complete balanced trees 58, 696

complete graphs 526–528, 616

complete trees 696

completely symmetrical graphs 616

compressed Bloom filters 145–146

ComputeFrequencies method 50–51

concrete data structures 22–28, 115–120

binary search trees 118–119

Bloom filters 119–120

choosing the right data structure 23–24

comparing performance 23

d-ary heaps 27–28

hash tables 117–118

heaps 24–25

min-heaps and max-heaps 26–27

sorted arrays and binary search 116–117

unsorted arrays 115

concurrency 248–249, 253–254

ConcurrentHashMap class 249

ConcurrentHeap class 249

conditional instructions 674–676

else-if variant 675

switch instruction 675–676

connected components 169

connected graphs 492–493, 531

constructKdTree method 292

containers 704–707

bags 705

comparative analysis for 707

queues 706–707

stacks 705–706

contains method 41, 45–46, 82, 129, 131, 135, 138, 144–145, 175, 721

Contains operation 45, 91

contains(x) method 131

continue keyword 678

convex hull 552

core data structures 690–703

arrays 691–692

comparative analysis of 702–703

hash tables 698–702

conflicts resolution in hashing 699–701

hashing 698–699

performance 702

storing key-value pairs 698

linked lists 692–695

trees 695–698

core points 419, 423

cost functions 634

cProfile package 60

crawlers 134

create_encoding method 67

create_frequency_table method 67

Creating the structure operation 116–117

crossing number 541

finding 541–542

heuristics for 560–568

rectilinear 543–545

crossing-number optimization 592

crossover 640, 670

crossover chance 646

curse of dimensionality 320, 367

cycles, in graphs 494

D

d-ary heaps 27–28, 55–56

DAGs (directed acyclic graphs) 494

data compression 50–55

data model 20

data structures 5–12

algorithms vs. 5–6

concrete 22–28, 115–120

binary search trees 118–119

Bloom filters 119–120

choosing the right data structure 23–24

comparing performance 23

d-ary heaps 27–28

hash tables 117–118

heaps 24–25

min-heaps and max-heaps 26–27

sorted arrays and binary search 116–117

unsorted arrays 115

core 690–703

arrays 691–692

comparative analysis of 702–703

hash tables 698–702

linked lists 692–695

trees 695–698

defining 2–4

describing 5

Mars colony food supply example 7

abstraction 7–8

algorithms 9–10

writing your own implementations of 6

DataFrame 61

DBSCAN (density-based spatial clustering of applications with noise) 418–426

directly vs. density-reachable 419

implementation 422–424

MapReduce 475–481

pros and cons with 424–426

dbscan method 422, 479

deadlock 250

decision problems 717

default arguments 680

delete method 92, 125, 198, 352–353, 355–356, 358, 488, 697

Delete-Minimum operation 23

DeLiClu algorithm 6

dense graphs 521

destination vertex 486

DFS (Depth First Search) 169, 495–506, 521

best route to deliver parcel 506

optimizing delivery routes 495–497

queues vs. stacks 505–506

dfs method 502, 504–505

dictionary 111, 114, 117, 175, 707

Dijkstra algorithm 49, 507–513, 521, 523

analysis 510–511

BFS vs. 508–509

implementation 509

shortest route for deliveries 511–513

directed graphs 492

directly reachable points 419

disjoint sets 172, 479

applications for 168–171

clustering 170–171

connected components 169

Kruskal’s algorithm 169–170

unification 171

data structure API 153–154

distinct subsets problem 148–151

heuristics to improve running time 162–168

implementing balancing and path compression 166–168

path compression 164–166

naïve solution 154–158

reasoning on solutions 151–152

tree-like structure 159–162

disjointSet method 155, 169

distance function 337, 513, 515

distributed databases and file systems 135

division method 699

DL (deep learning) 401

doAction function 672

DoS (Denial of Service) 133

double hashing function 126

doubly-linked lists 693, 702

DS (data structures) 2–3, 6, 19, 690

dynamic arrays 690–691

dynamic programming 10, 537, 710

dynamic properties 702

E

EA (evolutionary algorithms) 668

edge intersections 545–556

Bézier curves 550–552

intersections between quadratic Bézier curves 552–555

polylines 549–550

straight-line segments 545–549

vertex-vertex and edge-vertex intersections 555–556

edgeIntersections method 565

edges 486

elbow method 447

else-if variant 675

embeddings 523–528

complete and bipartite graphs 526–528

definitions 525–526

gradient descent for 585–589

different criterion 586–588

implementation 588–589

simulated annealing 615–623

force-directed drawing 618–623

minimum edge crossing 615–617

enabling recursion 706

epistasis 651

eventual consistency 220

evictOneEntry method 252

exact element search 333

Extensible Indexing 395

F

f function 680, 685, 687, 700–701, 710, 715

factorial function 712

false Boolean value 82, 118, 123, 137, 145, 156–158, 161, 168, 174, 180, 184–186, 190, 203, 717–718, 720–721

false-biased algorithm 138, 718, 720–721

Fibonacci heap 243

FIFO (First In, First Out) 241, 706

Find-Minimum operation 23

findClosestEntryInNodesList function 356

finding outliers 403

findMin method 295–298, 300–301, 316

findPartition method 153, 157–162, 164–165, 167–168

findSiblingToMergeTo function 356

findSplitIndex method 349

fitness method 634

flat clustering 404

fNN method 382, 384

FNs (false negatives) 720

fnv1 (Fowler-Noll-Vo hashing) 122, 126

for loops 676–677

forbidden graphs 526

force-directed graph drawing 586, 590, 592, 618

forceDirectedEmbedding method 588–589

FPs (false positives) 144, 720

_frequency_table_to_heap method 61–62, 64, 67

functions 679–681

overloading and default arguments 680

tuples 680–681

G

g function 685, 715

GA (genetic algorithms) 631

GCD (greatest common divisor) 524, 678

gcd function 680

genetic algorithms 592, 625–670

chromosomes 631–633

crossover 645–648

fitness 634–635

inspired by nature 627–631

maximum flow 665–667

minimum vertex cover 661–665

applications of 662–663

implementing algorithm 663–665

mutations 648–650

natural selection 635–639

population 633–634

protein folding 667–668

selecting individuals for mating 639–645

template 650

traveling salesman problem 652–660

chromosomes 653

crossover 654–655

fitness 653

initialization 653

mutations 653–654

optimizing routes of whole fleet 660

results and parameters tuning 656–660

when they work best 651–652

get method 238, 251–253

getPassThroughEdge function 208

getSize() method 238

globally optimal choices 625

gradient descent 470, 558, 560, 574, 592, 624, 651, 669

applications of 581–585

embeddings 585–589

different criterion 586–588

implementation 588–589

geometrical interpretation 576–579

heuristics for crossing number 560–568

math of 575–579

optimization 568–574

cost functions 568–570

optimizing random sampling 571–574

step functions and local minima 571

problems with 579–581

simulated annealing vs. 603–604

when applicable 579

Graph class 490

graphs 521–557

A* algorithm 513–521

heuristics to balance real-time data 520–521

usefulness of 517–520

as algebraic types 489–490

BFS and DFS 495–506

best route to deliver parcel 506

BFS 497–500

DFS 502–505

optimizing delivery routes 495–497

queues vs. stacks 505–506

reconstructing path to target 500–501

connected components 169

defining 486–491

Dijkstra algorithm 49, 507–513

analysis 510–511

BFS vs. 508–509

implementation 509

shortest route for deliveries 511–513

edge intersections 545–556

Bézier curves 550–552

intersections between quadratic Bézier curves 552–555

polylines 549–550

straight-line segments 545–549

vertex-vertex and edge-vertex intersections 555–556

embeddings 523–528

complete and bipartite graphs 526–528

definitions 525–526

gradient descent 590

applications of 581–585

for embeddings 585–589

geometrical interpretation 576–579

heuristics for crossing number 560–568

math of 575–579

optimization 568–574

problems with 579–581

when applicable 579

implementing 487–489

Kruskal’s algorithm 169–170

non-planar graphs 539–545

finding crossing number 541–542

rectilinear crossing number 543–545

planar graphs 528–539

efficient algorithms 538–539

improving performance 536–537

Kuratowski’s theorem 529–530

planarity testing 530–535

Prim’s algorithm 49–50

properties of 491–495

acyclic 494–495

connected 492–493

undirected 492

pseudo-code 490–491

greedy algorithm 8

gridShard method 479

H

H function 121

Hamming distance 213

hard clustering 404

hash function 699

hash set 120, 138

hash tables 117–118, 120, 690, 698–702

conflicts resolution in hashing 699–701

hashing 698–699

performance 702

storing key-value pairs 698

hashing 698

HashMap 45, 166, 249

HashSets 144

HashTables 144

hasItemX() method 382

_heap_to_tree method 61–62, 67

_heapify method 63–64, 66–67

heaps 20, 25, 702, 712

d-ary heaps 27–28

implementing 28–46

bubbleUp method 29–32

contains method 44–45

dealing with duplicates 41–42

full code 46

heapify method 42–44

insert method 35–37

performance 45–46

pushDown method 33–35

top method 37–40

update method 40

max-heaps 26–27

min-heaps 26–27

height-balancedness 696

height, of trees 696

heuristics

for crossing number 560–568

to balance real-time data 520–521

to improve running time 162–168

implementing balancing and path compression 166–168

path compression 164–166

HFile 135

hierarchical clustering 404

_highest_priority_child method 64–65

HighestPriorityChild method 64

hill climbing 560, 573, 593

hill descent 573, 593, 600

hit ratio 133

Huffman codes 50–55

huffman method 53

I

if-then-else statement 674–675, 678

in keyword 677

indentation 678–679

Individual class 633–634

induced subgraphs 537, 539

Info class 166–167

initialization 451, 463

initialSet 155

initPopulation method 634, 653

insert method 20, 23, 35, 45–46, 58–59, 61–64, 72, 84–85, 91, 96, 115, 117, 120, 131, 138–139, 144–145, 175, 187–189, 192, 196, 204–206, 282, 287, 290, 293, 298, 316, 318, 333, 339, 341, 343–345, 350, 352–353, 356, 369, 378, 488, 697

k-d trees 287–288

radix tries 204–206

SS-trees 337–344

split nodes 348–352

variance, means, and projections 345–348

tries 187–189

insertion sort 32

integers 127

invariants 20

inverse Ackermann function 166, 172

IO fetchers 134

isPassThrough function 208

iterate operations 705

Iterative MapReduce 465

J

Jordan curve 526–527

JsGraphs library 568, 613, 621, 656

K

k-d trees 318

boosting k-means with 414–416

cycling through dimensions 275–282

balance 282

binary search trees 276–281

invariants 281

limits and possible improvements 316–318

methods 282–316

balanced tree 289–292

insert method 287–288

NN search 301–310

region search 310–316

remove method 293–301

search method 284–286

overcoming flaws of 322

k-dimensional spaces 268–271

modeling 2-D partitions with data structure 270–271

multidimensional binary search 269–270

unidimensional binary search 268

k-means 404, 418

boosting with k-d trees 414–416

dimensionality 412–413

issues with 410–412

MapReduce 463–475

centroid initialization with canopy clustering 469–471

MapReduce canopy clustering 471–475

parallelizing canopy clustering 467–469

parallelizing 451–452

performance analysis 413–414

KdNode class 284–285, 288

KdTree class 267, 284–285, 344, 415

KdTree::search(target) method 285

keysStartingWith method 194, 210

radix tries 209–211

tries 193–195

keysWithPrefix method 195–197

Kruskal’s algorithm 169–170

Kubernetes 449

Kuratowski’s theorem 529–530

L

Las Vegas algorithms 137, 146, 716–717

LBFs (layered Bloom filters) 144–145

leaves 695–696

Levenshtein distance 211, 213

LFU (least frequently used) 133, 238–244

choosing 240

LRU vs. 240–242

performance 243

problems with 243–244

LIFO (Last In, First Out) 705

linear regression 583–585, 669

linearithmic time 292, 606

linked lists 4, 690, 692–695

list’s nodes 695

local minimum 569

local optimization 573

locality of reference 214

locality properties 702

locally-optimal choice 625

locks 249–250

acquiring 250–252

read locks 252–253

reentrant locks 252

logistic regression 669

longestPrefix method

radix tries 209

tries 192–193

loops 676–678

break and continue keywords 678

for loops 676–677

while loops 677–678

loss function 583

LRU (least recently used) cache 256

applications for 254–255

computing things twice 219–222

LFU vs. 240–242

performance 238

policies 244–245

remembering values 222–228

asynchronous call handling 226–227

description and API 224

fresh data 224–226

marking cache values as 227–228

synchronization 245–254

concurrency 248–249, 253–254

locks 249–250, 253

temporal ordering 232–238

M

Manhattan distance 506

Map class 53, 167

MapReduce model 455–463

DBSCAN 475–481

k-means 463–475

centroid initialization with canopy clustering 469–471

MapReduce canopy clustering 471–475

parallelizing canopy clustering 467–469

mapping, then reducing 459–462

problem 455–459

market segmentation 403

Mars colony food supply example 7

abstraction 7–8

algorithms 9–10

Matplotlib 60

max method 26, 72, 90–91, 681, 697

max-heaps 26–27

maximum flow 625, 665, 667

maxSize element 128

maxTolerance method 127–128

MDRQ (multidimensional range queries) 395

Membership category 405

memetic algorithm 668

memoization 537, 708, 710

memory locality 25

memory management 706

mergeChildren method 358

meta-heuristic 595

metrics functions 718

MFU (Most Frequently Used) 239

min method 26, 72, 90–91, 296, 681, 697

min_max method 681

min-heaps 26–27, 48

Min/Max operation 91, 119

mini-batch backpropagation 632

minimum bounding circle 555

minimum crossing number 560

minimum edge crossing 615

minimum rectilinear crossing number 560

minimum vertex cover 661–665

applications of 662–663

implementing genetic algorithm 663–665

minVarianceSplit method 352

ML (machine learning) 401, 720

Monte Carlo algorithms 137, 146, 603, 656, 716–718

MRU (Most Recently Used) 133

MSD (Most Significant Digit) 214

MST (minimum spanning tree) 49, 169, 606

multi-indexing 70–71

multidimensional data indexing 318

cycling through dimensions 275–282

balance 282

binary search trees 276–281

invariants 281

limits and possible improvements 316–318

methods 282–316

balanced tree 289–292

insert method 287–288

NN search 301–310

region search 310–316

remove method 293–301

search method 284–286

multigraphs 487

multiplication method 699

Murmur hashing 122, 126

mutations 640, 648–650, 653–654, 670

mutual recursion 715

N

natural selection 628, 635–639

nearestNeighbor method 316, 360, 362, 369

negative gradient 576

NN (nearest neighbors) search 272, 399

centralized application 380–385

complex decisions 383–385

filtering points 381–382

closest hub problem 376–380

issues with 379–380

sketching solution 377–379

clustering 398–399

color reduction 391–393

description and API 266–267

distributed application 386–391

issues handling HTTP communication 387–389

keeping inventory in sync 389–390

lessons learned 390–391

k-d trees 301–310

k-dimensional spaces 268–271

higher dimensions 269–270

modeling 2-D partitions with data structure 270–271

unidimensional binary search 268

particle interaction 393–394

search problem 260–261

similarity search trees 374

solutions 261–266

caching 262

data structure choice 264–266

precomputing 261–262

simplifying problem 262–263

nNearestNeighbor method 306

node.left 91

node.right 91

nodeDistance function 362

non-planar graphs 539–545

finding crossing number 541–542

rectilinear crossing number 543–545

NP class 403, 661

NP-complete problem 541, 605

NP-hard problem 560

null inbound flow 666

null outbound flow 666

O

open addressing 117, 700

operating system 133

OPTICS (ordering points to identify the clustering structure) 426–442

algorithm 428–432

clustering 433–436

definitions 427–428

hierarchical clustering 436–440

performance analysis 441–442

opticsCluster method 435–436

optimization 568–574

cost functions 568–570

genetic algorithms 670

chromosomes 631–633

crossover 645–648

fitness 634–635

inspired by nature 627–631

maximum flow 665–667

minimum vertex cover 661–665

mutations 648–650

natural selection 635–639

population 633–634

protein folding 667–668

selecting individuals for mating 639–645

template 650

traveling salesman problem 652–660

when it works best 651–652

gradient descent 590

applications of 581–585

for embeddings 585–589

geometrical interpretation 576–579

heuristics for crossing number 560–568

math of 575–579

problems with 579–581

when applicable 579

random sampling 571–574

simulated annealing 623

embeddings 615–623

function of 598–600

gradient descent vs. 603–604

implementation 597–598

short-range vs. long-range transitions 601–602

traveling salesman problem 604–615

variants 602–603

step functions and local minima 571

optimization problems 568

Optional class 107–108

Optional::map 108

order function 379

order properties 702

outliers 418

overloading 680

P

Pandas library 60–61

parallel clustering 482

MapReduce 455–463

DBSCAN 475–481

k-means 463–475

mapping, then reducing 459–462

problem 455–459

parallelization 449–455

canopy clustering 452–455

parallel vs. distributed computing 450–451

parallelizing k-means 451–452

parallelization 449–455

canopy clustering 452–455

parallel vs. distributed computing 450–451

parallelizing k-means 451–452

parent node 695

parentIndex 30

parentsMap 160

particle interaction 393–394

particle swarm optimization 669

partition method 291–292

partitioning clustering 404

partitionsMap 160

pass-through nodes 198

path compression 164, 166

Patricia trees 198

PCB (printed circuit boards) 523, 615

perfect hashing 700, 702

perfectly balanced tree 696

phenotypes 631

pigeonhole principle 699–700

planar embedding 525

planar graphs 525, 528–539

efficient algorithms 538–539

improving performance 536–537

Kuratowski’s theorem 529–530

planarity testing 530–535

planarity testing 530–535

points array 291, 332, 378

pointsInRegion operation 316, 369

polylines 549–550

positive predictive value 719

PQ (priority queue) 19–20, 705

precision 719, 721

predecessor method 697

Prev/Next operation 119

Prim’s algorithm 49–50

priority queues 68, 707

branching factor analysis 55–57

branching factor vs. memory 57

finding optimal branching factor 56–57

need for d-ary heaps 55–56

running time 56

concrete data structures 22–28

choosing the right data structure 23–24

comparing performance 23

d-ary heaps 27–28

heaps 24–25

min-heaps and max-heaps 26–27

containers as 704–707

bags 705

comparative analysis for 707

queues 706–707

stacks 705–706

Dijkstra algorithm 49

finding k largest elements 47–48

choosing the right data structure 47

coding 48

min-heaps 48

function of 21–22

generalizing FIFO 22

Huffman codes 50–55

implementing heaps 28–46

bubbleUp method 29–32

contains method 44–45

dealing with duplicates 41–42

full code 46

heapify method 42–44

insert method 35–37

performance 45–46

pushDown method 33–35

top method 37–40

update method 40

performance analysis 58–68

choosing best branching factor 67–68

heapify method running time growth 66–67

interpreting results 61–65

profiling 59–61

Prim’s algorithm 49–50

priority handling 17–18

sorted lists 19

PriorityQueue class 40, 72, 242, 249

profiling 59

program counters 712

protected method 153

protein folding 668

pseudo-code 671–681

arrays 672–673

blocks and indentation 678–679

conditional instructions 674–676

else-if 675

switch 675–676

functions 679–681

overloading and default arguments 680

tuples 680–681

loops 676–678

break and continue keywords 678

for loops 676–677

while loops 677–678

variables and basics 671–672

pstats.Stats function 61

_push_down method 62–66

pushDown method 33–35, 39–40, 42–44, 64

Q

quantum annealing 592

query method 416

queues 706–707

priority 68

branching factor analysis 55–57

concrete data structures 22–28

containers as 704–707

Dijkstra algorithm 49

finding k largest elements 47–48

function of 21–22

generalizing FIFO 22

Huffman codes 50–55

implementing heaps 28–46

performance analysis 58–68

Prim’s algorithm 49–50

priority handling 17–18

sorted lists 19

stacks vs. 505–506

R

R-tree 267

radix tries (compact prefix trees) 197–211

insert method 204–206

keysStartingWith method 209–211

longestPrefix method 209

nodes and edges 199–202

remove method 207

search method 202–204

RadixSort 174

RAM model 683

random permutation 611–612

random sampling algorithm 560, 564, 571, 651

random swaps 611–612

Random Swaps operation 611–612

randomized treaps 91–97

applications of 96–97

performance analysis and profiling 97–108

expected height 97–100

profiling height 100–103

profiling memory usage 106–108

profiling running time 104–106

RandomizedTreap class 104, 106

randomStep method 597, 601, 612, 614

rcn (rectilinear crossing number) 543, 615

re-centering 405, 451, 463

reachability plot 432

Read-Through policy 245

readBit method 124–125, 130

recall 719, 721

recursion 708–715

mutual recursion 715

simple 709–712

tail recursion 712–714

red-black trees 698

RedBox class 672

ReentrantReadWriteLock class 249

reflexive property 118

Refresh-Ahead policy 245

region search

k-d trees 310–316

SS-trees 363

registers 671

regression algorithms 402

Remove an entry operation 116–117, 119

remove method 45–46, 72, 87, 89, 91, 101, 104–105, 115, 175, 186, 208, 282, 316, 318, 369

k-d trees 293–301

radix tries 207

tries 189–192

root 84, 167, 695–696

rotations 75–80

roulette wheel selection 642

routers 133

S

samePartition method 158

scalable Bloom filters 146

scaling up 112

SCC (strongly connected component) 493

scipy.spatial module 416

search method 72, 87, 187–188, 190, 192, 196, 205, 209, 287–288, 290, 293, 298, 316, 333, 335, 353, 369, 697

k-d trees 284–286

radix tries 202–204

tries 183–187

searchClosestLeaf method 353

searchNode method 186, 190, 193–194, 204

searchNodeWithPrefix method 210

searchParentLeaf method 340–341

segmentation faults 712–713

Selection Sort algorithm 47

sensitivity 719

sentiment analysis 220

set method 117, 127, 151, 157, 234, 250–252

sharding 220

Shop class 378, 385

shop.order method 380

siblings 357, 695–696

side effects 2

simple graph 487, 492

simulated annealing 592, 623–624, 651

embeddings 615–623

force-directed drawing 618–623

minimum edge crossing 615–617

function of 598–600

gradient descent vs. 603–604

implementation 597–598

short-range vs. long-range transitions 601–602

traveling salesman problem 604–615

adjacent vs. random swaps 613–614

applications of 614–615

exact vs. approximated solutions 606

pruning domain 608–609

state transitions 609–613

visualizing cost 607–608

variants 602–603

single-link chain effect 419

singly-linked lists 693, 702

SLA (service-level agreement) 221

SLC (single-linkage clustering) 419

snake_case 672

soft clustering 404

sorted arrays 116–117

Sorted List operation 119

sorted lists 19, 119

SortedPriorityQueue class 72

source vertex 486

Space needed operation 489

spell checkers 174–177, 211–213

Bloom filters 134

compression 176

description and API 177

split coordinate 281

split method 348, 351

splitPoints function 350

SS (similarity search) trees 374

approximated similarity search 364–367

complex example 321–322

delete method 352–359

insert method 337–344

split nodes 348–352

variance, means, and projections 345–348

NN search 359–363

overcoming k-d trees flaws 322

R-trees 322–329

B-trees 323–324

inserting points in 326–328

search 328–329

region search 363

SS+-trees 367–373

improved split heuristic 370–371

mitigating hyper-sphere limitations 369

reducing overlap 371–373

SS-trees vs. 367–368

SsNode::intersectsPoint method 336

SsSphere class 332

SSTable 135

SsTree::delete method 353

SsTree::insert method 352

stack frame 712

stacks 505–506, 705–706

static arrays 690–691

step function 440, 683

stochastic backpropagation 632

StoreFile 135

straight-line drawing 557

straight-line segments 545–549

string search 217

applications for 211–216

autocomplete 215–216

spell checkers 211–213

string similarity 213–214

string sorting 214

T9 215

radix tries 197–211

insert method 204–206

keysStartingWith method 209–211

longestPrefix method 209

nodes and edges 199–202

remove method 207

search method 202–204

spell checkers 174–177

compression 176

description and API 177

tries 177–197

binary search vs. 180–183

insert method 187–189

keysStartingWith method 193–195

longestPrefix method 192–193

remove method 189–192

search method 183–187

when to use 195–197

string similarity 213–214

string sorting 214

inscribed hypersphere 412

successor method 697

switch instruction 675

symbol table 114

symmetric property 118

symmetric quadratic Bézier curves 552

synchronization

concurrency 248–249, 253–254

locks 249–250

acquiring lock 250–252

read locks 252–253

reentrant locks 252

T

T9 215

tail recursion 711, 713

template method 381

TF-IDF (term frequency–inverse document frequency) 116

thread-safe data structures 247

TN (true negatives) 719

top method 20–21, 37, 39, 45–46, 58–59, 61–66, 89–91, 94, 705

topological sorting 495

TP (true positives) 718–719

transitive property 118

Treap class 75–76, 78, 82, 89

treaps 109

description and API 71–72

design questions 80–81

insert method 82–85

max method 90–91

min method 90–91

multi-indexing 70–71

peek method 89–90

performance analysis and profiling 97–108

expected height 97–100

profiling height 100–103

profiling memory usage 106–108

profiling running time 104–106

randomized treaps 91–97

applications of 96–97

remove method 87–89

rotations 75–80

search method 81–82

top method 89–90

update method 89–90

TreeNode 51–53

trees 690, 695

binary search trees 118–119, 276–281, 696–698

k-d trees 318

boosting k-means with 414–416

cycling through dimensions 275–282

limits and possible improvements 316–318

methods 282–316

overcoming flaws of 322

SS-trees 374

approximated similarity search 364–367

complex example 321–322

delete method 352–359

insert method 337–344

NN search 359–363

overcoming k-d trees flaws 322

R-trees 322–329

region search 363

SS+-trees 367–373

Trémaux trees 539

tries (prefix trees) 177–197

binary search vs. 180–183

insert method 187–189

keysStartingWith method 193–195

longestPrefix method 192–193

remove method 189–192

search method 183–187

when to use 195–197

triggering condition 33

true Boolean value 30, 82, 87, 117–118, 120–121, 123, 134–135, 144, 156–158, 161, 168, 174, 180, 184–186, 190, 203, 337, 355, 436, 497, 509, 513, 532, 535, 538, 549, 565, 672, 717–721

true/false Boolean value 268, 488, 718

TSP (traveling salesman problem) 592, 605, 625, 652

genetic algorithms 652–660

chromosomes 653

crossover 654–655

fitness 653

initialization 653

mutations 653–654

optimizing routes of whole fleet 660

results and parameters tuning 656–660

simulated annealing 604–615

adjacent vs. random swaps 613–614

applications of 614–615

exact vs. approximated solutions 606

pruning domain 608–609

state transitions 609–613

visualizing cost 607–608

TST (ternary search trie) 197

tuples 167, 680–681

type-coercion 709

U

undirected graphs 492

unification 171

unique properties 702

unsorted arrays 115

unsorted list 58

update method 20, 40, 42, 89–90

updateBoundingEnvelope method 347

updatePriority method 45–46, 91, 251

updateQueue method 432

V

validate method 565

value type 117

variables 671–672

vector data structure 673

vertex cover 625, 661, 670

Vertex delete operation 489

Vertex insert operation 489

vertex-vertex intersections 555–556

vertices 486

VMAs (virtual memory areas) 97

W

weight-balancedness 696

while loops 677–678

worst-case performance 561

Write-Ahead policy 244

Write-Around policy 245

Write-Back policy 244

Write-Behind policy 244

Write-Through policy 244

writeBit method 125, 131
