Index

A

abstract data types (ADT)

exploring 43

queues 47, 49

stacks 45-47

tree 50

vector 43, 44

abstract data types (ADT), stacks

practical example 47

time complexity 47

abstract data types (ADT), vector

time complexity 44

actionable rules

examples 176

activation functions 262

hyperbolic tangent (tanh) function 268, 269

ReLU activation function 266

sigmoid function 264, 265

softmax function 269, 270

step function 264

adjacency list

constructing 126

Advanced Encryption Standard (AES) 424

advanced lossless compression formats 404

GZIP compression 405

LZO compression 404

Snappy compression 404

advanced sequential modeling techniques

evolution 346, 347

AI spring 252

AI winter 252

algorithm 4, 9

approximate algorithm 23

coding phase 6

compute-intensive algorithms 10

data-intensive algorithms 10

design phase 5

deterministic algorithm 22

development environment 7

exact algorithm 23

explainability 23

performance analysis 12

performance, estimating 15

phases 5, 6

randomized algorithm 22

selecting 21

validating 22

algorithm design techniques

compute dimension 12

data dimension 10-12

algorithmic ethics 467

bias and discrimination 467

consideration 468, 469

privacy 467

problems 468

solution factors 469

algorithmic ethics, solution factors

inconclusive evidence, considering 470

misguided evidence 470

traceability 470

unfair outcomes 471

algorithmic solutions

challenges 458

unexpected disruption 458, 459

algorithmic strategies 87

divide-and-conquer strategy 87

dynamic programming strategy 90

greedy algorithms 92

Amazon Web Services (AWS) 7, 218, 439

Amdahl’s law 444

deriving 445-447

Apache Spark

divide-and-conquer strategy, applying 88-90

large-scale algorithms, processing with 454, 455

reference link 88

using, in cloud computing 452

Apache Spark architecture

cluster manager 453

driver program 452

executors 453

worker node 453

append()

elements, adding with 29

elements, removing with 30

approximate algorithm 21, 23

apriori algorithm 178

limitation 179

AP systems 397

Arithmetic Logic Units (ALUs) 448

association analysis algorithms 178

apriori algorithm 178

apriori algorithm, limitation 179

FP-growth algorithm 178

FP-growth algorithm, using 183-185

FP-tree 179-182

frequent pattern growth (FP-growth) algorithm 179

frequent patterns, mining 182, 183

Association for Computing Machinery (ACM) 24

association rules mining 172, 175

actionable rules 176

association analysis algorithms 178

confidence 177, 178

inexplicable rules 176

lift 178

ranking rules 176

support measure 177

trivial rules 175

types of rules 175

asymmetric encryption 424

blockchain 428, 429

cryptography 428, 429

public key infrastructure (PKI) 427, 428

SSL/TLS handshaking algorithm 425-427

attention mechanism 346, 353

challenges 357

in neural network 353

key aspects 355

overview 356

autoencoder 346

coding 348

environment, setting up 349

exploring 347, 348

reconstruction phase 348

training phase 348

autoencoder, environment

compilation 349

data preparation 349

model architecture 349

prediction 350

training phase 350

visualization 350, 351

average pooling 280

B

backend engines, Keras 271

backpropagation through time (BPTT) 325, 326

bag-of-words-based (BoW-based) 289

betweenness centrality 124

bias model

reducing 471, 472

Bidirectional Encoder Representations from Transformers (BERT) 366

bidirectional RNNs 329, 359

big data 86

Big O notation 16-18, 43

functions 31

binary classifier 201

binary search 72, 73

performance analysis 73

binary tree 51

black box algorithm 460

versus white box algorithm 460

black swan event

challenges and opportunities 473

defining 472

practical application 473

blockchain 428-430

characteristics 428

breadth-first search (BFS) 116, 126

adjacency list, constructing 126

breadth-first search (BFS), algorithm implementation 127

code implementation 127, 128

initialization 127

loop 127

specific searches, using 129, 130

brute-force strategy

using 94-97

bubble sort algorithm 57

logic 57-59

optimizing 59, 60

performance analysis 60

bubble sort algorithm, time complexity

best case 60

worst case 60

C

Caesar cipher 414, 415

candidate cell state 334

candidate-generation phase 179

CAP theorem 394

applying 405, 406

AP systems 397

availability 395

CA systems 397

connecting 395

consistency 395

CP systems 398

partition tolerance 395

presenting 395, 396

case conversion 294

CA systems 397

centrality measures 119

betweenness 120

closeness 121

degree 119, 120

eigenvector centrality 121

fairness 121

centrality metrics

calculating, with Python 122

graph, crafting 122

graph, visualizing 123

libraries and data, setting 122

Central Processing Units (CPUs) 439

Certification Authority (CA) 427

cipher 411

designing 414

suite 411

text 411

classification algorithms 193, 233, 234

feature engineering, with data processing pipeline 236, 237

historical dataset, exploring 235, 236

regression algorithms 234

regressors challenge, presenting 235

regressors challenge, problem statement 235

Classification And Regression Tree (CART) algorithm 93

classifiers 193

versus regressors 193

classifiers challenge 194

evaluation 200

feature engineering, with data processing pipeline 195

feature normalization 199

logistic regression, using 226, 227

Naive Bayes theorem, using 233

problem statement 194, 195

Random Forest algorithm, using 222, 223

SVM algorithm, using 229

classifiers phase

specifying 211, 212

cleaning data 294

case conversion 294

lemmatization 296, 297

numbers, handling 295

punctuation removal 294

stemming 296, 297

stopword removal 296

white space removal 296

closeness 121

closeness centrality 124

Cloud and Algorithmic Scalability

elasticity 87

cloud computing

Apache Spark, using 452

large-scale algorithms, using 455

Cloud for distributed model training

advantages 455

clustering algorithms 149

Cosine distance 153, 154

Euclidean distance 151

Manhattan distance 152

quantify similarities 150

clusters

application 166

creating, with DBSCAN in Python 164, 165

evaluating 166

clusters, government use cases

crime-hotspot analysis 166

demographic social analysis 166

clusters, market research

customer categorization 167

market segmentation 166

targeted advertisements 167

collaborative filtering engines 374, 376

issues 377

compilation 349

complexity theory 12

components, attention mechanism

contextual relevance 355

prioritized focus 355

symbol efficiency 355

compression algorithms

using 406

computational ethics 467

compute-intensive algorithms 10

Compute Unified Device Architecture (CUDA) 448

data locality 451, 452

GPU architectures, in parallel computing 447

parallel processing, in LLMs 449, 450

confidence 177, 178

confusion matrix 201, 202

precision 202

recall 202

constant time (O(1)) complexity 18, 19

content-based recommendation engines 374, 375

unstructured documents, determining 375, 376

contextual relevance 355

context vector (c2) 357

Converging Iterations 13

convolutional neural networks (CNNs) 279

convolution 280

pooling 280

Cosine distance 153, 154

cost function 225

CP systems 398

crime-hotspot analysis 166

Cross-Industry Standard Process for Data Mining (CRISP-DM) 144, 211, 430, 471

reference link 144

cryptanalysis 411

cryptographic hash function

application 422

characteristics 418, 419

implementing 419

implementing, with MD5 420, 421

implementing, with SHA 421, 422

using 418, 419

cryptographic techniques

asymmetric encryption 424

symmetric encryption 423

types 417

cryptography 410, 428-430

CUDA Deep Neural Network (cuDNN) library 271

Cyclic Redundancy Check (CRC) 413

D

data algorithms 394

CAP theorem 394

CAP theorem, connecting 395

data compression, connecting 395

distributed environment storage 394, 395

data categorization 413, 414

data cleaning

with Python 297-299

data compression

connecting 395

data compression algorithm

decoding 398

lossless compression techniques 398

DataFrame 39

subset, creating 39

using 38

data-intensive algorithms 10

data management in AWS 405

benefits, quantifying 406, 407

CAP theorem, applying 405, 406

compression algorithms, using 406

data preparation 349

data processing pipeline 195

feature engineering 236, 237

data representation

for sequential models 317

decision tree classification algorithm 212-215

strengths and weaknesses 216

use cases 216

decoder 352

decoding phase 347, 348

decryption 411

deep learning technology stack 271

deep model

architecture 367-369

using, to create LLMs 367

deep neural network 256

degree 119, 120

degree centrality 123

degree of suspicion (DOS) 137-139

Deletion Operations 42

demographic social analysis 166

dendrogram 161

Density-based spatial clustering of applications with noise (DBSCAN) 163

used, for creating clusters in Python 164, 165

depth-first search (DFS) 126, 130-132

designed algorithm

concerns 80

correctness 81

performance 82-86

scalability 86

deterministic algorithm 22

dictionaries 33-35

need for 37

dictionary-based compression LZ77 403

example 403

versus Huffman 403

digital certificate 427

dimensionality reduction 167

feature aggregation 167

feature selection 167

directed graph (DiGraph) 113, 277

distributed computing 454

distributed environment storage 394

Distributed Ledger Technology (DLT) 428

Distributed Shared Memory (DSM) 442

Diverging Iterations 13

divide-and-conquer strategy 87

applying, to Apache Spark 88-90

domain validation 427

downsampling 280

performing 280

driver machine 452

dynamic programming strategy 90

characteristics 91

components 91

conditions 91

E

ego-centered network 114

applications 115

basics 114

one-hop neighbors 114

two-hop neighbors 114

Eigen 271

eigenvector centrality 121, 124

Elastic Compute Cloud (EC2) 439

Elastic Load Balancing (ELB) 439-442

encoding phase 347, 348

encryption 411

End-Of-Sentence (<EOS>) 352

ensemble boosting

versus Random Forest algorithm 221

ensemble methods 217

gradient boosting, implementing with XGBoost algorithm 218, 220

Euclidean distance 151

event types

dependent 230

independent 230

mutually exclusive 230

exact algorithm 23

explainability

of algorithm 23

F

fairness 121

False Positive Rate (FPR) 207

fault tolerance 440

feature engineering 195

data, importing 196

dataset, into testing portion 199

dataset, into training portion 199

features and label, specifying 199

feature selection 196, 197

one-hot encoding 197, 198

with data processing pipeline 236, 237

feature normalization 199

feature selection 196, 197

Feature Vector 1 284

Feature Vector 2 284

feedforward neural networks (FFNNs) 362

filter 280

filter phase 179

First In, First Out (FIFO) principle 130

First-In, Last-Out (FILO) principle 45

fixed-length code 402

Flat Iterations 13

forget gate 333

fraud detection with deep learning, case study 283

methodology 283-287

fraud detection with SNA 133

fraud (F) 133-135

simple fraud analytics, conducting 135, 136

watchtower fraud analytics methodology, presenting 136

fraud (F) 133

frequent pattern growth (FP-growth) algorithm 179

full tree 52

Functional API 272

functional model

selecting 276

functional requirements 6

G

Garbage-in, Garbage-out (GIGO) 470

gating mechanism 330

Generative Adversarial Networks (GANs) 281

Generative Pre-trained Transformer (GPT) 366

global explainability strategy 461

Google Cloud Platform (GCP) 7

gradient boosting

implementing, with XGBoost algorithm 218, 220

gradient boost regression algorithm 243

using, for regressors challenge 244

gradient descent

defining 260-262

graphs

ego-centered network 114

representations 112, 113

types 113, 114

graph theory 112

graph traversals 125

depth-first search (DFS) 130-132

greedy algorithms 92

characteristics 92

usage, conditions 92, 93

using 98, 99

GRU 329, 330

hidden cell, updating 331, 332

running, for multiple timesteps 332

update gate 330, 331

update gate, implementing 331

GZIP compression 405

H

hidden cell

updating 331, 332

hierarchical clustering 161, 162

algorithm, coding 162

historical database 74-76

historical dataset

exploring 235, 236

homophily principle 133

Huffman

versus dictionary-based compression LZ77 403

example 399

implementing, in Python 399-402

Huffman tree 399

human brain

axon 251

dendrites 251

synapse 251

hybrid recommendation engines 374, 378

recommendations, generating 380

recommendation system, evolving 381

reference vectors, generating 379

similarity matrix, generating 379

hybrid recommendation systems 382

double-edged sword of social influence 382, 383

hyperbolic tangent (tanh) function 268, 269

hyperparameters

defining 272

I

information bottleneck 353

in-memory processing 454

inner loop 60

Insertion Operations 41

insertion sort algorithm 61

performance analysis 62

intercluster distance 150

interpolation search 73, 74

performance analysis 74

Interquartile Range (IQR) method 156

intracluster distance 150

Inverse Document Frequency (IDF) 300

itemset 174

K

Keras 270

architecture 271

backend engines 270

hyperparameters 272

low-level layers, of deep learning stack 271

Microsoft Cognitive Toolkit (CNTK) 271

model, defining 272-276

reference link 270

TensorFlow 271

Theano 271

kernel 280

k-means clustering algorithm 155

coding 158, 160

hierarchical clustering 161

initialization 155

limitation 160

logic 155

running 156, 157

stop condition 157

L

labeled data 193

Language models (LMs) 364

Large Language Models (LLMs) 345, 346, 364, 365

deep model, using 367

wide model, using 367

large-scale algorithms 438

characterizing 439

fault tolerance 440

parallelism 440

performant infrastructure, characterizing 439

processing, with Apache Spark 454, 455

using, in Cloud computing 455

Last In, First Out (LIFO) principle 45, 130

layered deep learning architectures 255

intuition, developing for hidden layers 256, 257

mathematical foundation of neural networks 258, 259

optimal number of hidden layers 257

Leaky ReLU activation function 267

default value 267

parametric ReLU 268

randomized ReLU 268

lemmatization 296, 297

lift 178

linear discriminant analysis (LDA) 167

linear programming 103

capacity, planning with 104-107

constraints, specifying 104

objective function, defining 104

problem, formulating 104

linear regression 237

gradient boost regression algorithm 243

gradient boost regression algorithm, using for regressors challenge 244

multiple regression 240, 241

regression tree algorithm 242

regressors, evaluating 239, 240

simple linear regression 238

usage 242

using, for regressors challenge 241

weaknesses 242

linear search 71, 72

performance analysis 72

linear time (O(n)) complexity 19

lists 26

modifying 29

time complexity 31

using 27

load balancing

combining, with elasticity 441, 442

fault tolerance 440

local explainability strategy 461

Local Interpretable Model-Agnostic Explanations (LIME) 24, 462

usage 463-466

logarithmic time (O(log n)) complexity 20, 21

logistic function 225

logistic regression 223

assumption 224

cost function 225

for classifiers challenge 226, 227

loss function 225

need for 226

relationship, establishing 224

Long Short-Term Memory (LSTM) 332, 333, 364

candidate cell state 334

cell state 333

forget gate 333

hidden state 333

memory state, calculating 335

output gate 335, 336

sequential models, coding 337

update gate 334, 335

working, with multiple timesteps 337

loss function 225

lossless compression techniques 398

advanced lossless compression formats 404

LSTM, sequential models

data, preparing 339

incorrect predictions, viewing 343

model, creating 339, 341

model, training 342

LZO compression 404

M

machine learning algorithm 460

machine learning explainability 461

strategies, presenting 461, 462

Machine Learning (ML) 148, 252

machine learning model, security concerns 430

data encryption 432-435

masquerading, avoidance 432

MITM attacks 430, 431

model encryption 432-435

Manhattan distance 152

Man-in-the-Middle (MITM) attacks 430, 431

preventing 431, 432

many-to-many sequence models 316, 317

many-to-one sequence models 315, 316

market basket analysis 174

market segmentation 166

Matplotlib 9

reference link 9

matrices 43

Matrix factorization methods 382

matrix operations 42

max pooling 281

MD5 420, 421

and SHA, selecting between 422

using 421

Mean Absolute Error (MAE) 390

Mean Squared Error (MSE) 242

Measure Of Similarity (MOS) 284

memorization 369

memory state

calculating 335

merge sort algorithm 63

pseudocode overview 64

Python implementation 64

merging phase 63

Microsoft Cognitive Toolkit (CNTK) 271

reference link 271

Miles per Gallon (MPG) 235

model architecture 349

Modified National Institute of Standards and Technology (MNIST) 348

reference link 348

multilayer neural network 255

hidden layer 256

input layer 255

output layer 256

multilayer perceptron 255

multiple regression 240, 241

multi-resource processing

strategizing 442-444

N

Naive Bayes algorithm 230

Naive Bayes' theorem 230

addition rules for OR events 232

for classifiers challenge 233

general multiplication rule 232

multiplication rules for AND events 231

probabilities, calculating 231

National Research Council (NRC) 410

natural language processing (NLP) 55, 289, 290, 312, 346, 455

applications 309

exploring 365

negative outcomes

scoring 136

network analysis 112

network analysis theory 115

centrality measures 119

centrality metrics, calculating with Python 122

shortest path 116

Social Network Analysis (SNA) 125

networkx Python package

reference link 113

neural network 249, 252

activation function 260

anatomy 259

attention mechanism 353

basic idea 353

convolutional neural networks (CNNs) 279

cost function 259

evolution 250, 361, 362

example 354

fraud detection case study 283

Generative Adversarial Networks (GANs) 281

history 250, 251

input data 260

intuition 254, 255

layered deep learning architectures 255

layers 259

loss function 259

optimizer 259

output 363

perceptrons 252

Python code 362, 363

tools and frameworks 270

training 259

transformer architecture 362

types 279

weights 260

NLP terminology 290

corpus 290

Named Entity Recognition (NER) 291

normalization 291

stemming and lemmatization 291

stop words 291

tokenization 291

Non-Deterministic Polynomial (NP) 83

versus NP-complete 85

versus NP-hard 85

versus polynomial (P) 85

non-fraud (NF) 133

non-functional requirements 6

NP-complete 84

NP-hard 85

versus polynomial (P) 85

NumPy 9

reference link 9

O

one-hop neighbors 114

one-to-many sequence models 313-315

characteristics 314

ordered tree 52

outer loop 60

output gate 335, 336

overfitting 209-211

P

PageRank algorithm

implementing 100-103

presenting 100

problem, defining 100

pandas 9

reference link 9

Series-based data structures 38

parallel computing

GPU architectures 447

limitations 444

Parallelism 440

partitions 452

passes 60

perceptrons 252

perfect tree 52

performance estimation, algorithm

average case 16

best case 15

worst case 16

performant infrastructure, characterizing for large-scale algorithm

elasticity 439

Personally Identifiable Information (PII) 411

Pip Installs Python 7

plain text 411

polynomial algorithm 82

polynomial (P) 83

pooling 280

advantages 280

average pooling 281

max pooling 281

practical application area 383

Amazon recommendation system 384

data-driven recommendation 383

precision 202, 203

precision trade-off 204-208

principal component analysis (PCA) 167-171

association rules mining 172

examples 173

limitations 172

market basket analysis 173, 174

Public Key Infrastructure (PKI) 427, 428

punctuation removal 294

PyPI 7

Python 7

data cleaning 297-299

Huffman coding, implementing 399-402

reference link 7

used, for calculating centrality metrics 122

variables, swapping 56

Python built-in data types

DataFrames, using 38

dictionaries 33, 34

exploring 26

lists 26

lists, using 27

matrices 42

series, using 38

set 35-37

time complexity, of dictionaries 35

tuples 31, 32

Python built-in data types, lists

elements, adding with append() 29

elements, removing with pop() 30

iteration 29

list indexing 27

list slicing 27, 28

modifying 29

negative indexing 28

nesting 28

range function 30

time complexity 31

Python built-in data types, matrices

Big O notation 43

matrices 43

matrix operations 42

Python Notebook

using 9

Python packages 7, 8

Q

quadratic time (O(n^2)) complexity 19, 20

queues 47, 49

time complexity 49

usage 49

R

Random Forest algorithm 220, 221

using, for classifiers challenge 222, 223

versus ensemble boosting 221

randomized algorithm 22

range function 30

ranking rules 176

recall 202, 203

recall trade-off 204-208

Receiver Operating Characteristic (ROC) curve 207

recommendation engine

correlation 389

creating 385

data, loading 385, 386

data, merging 386

descriptive analysis 387

framework, setting up 385

model, evaluating 390

movies, correlating 389

structuring 387

testing 388, 389

user feedback, retraining 390

recommendation system 374

cold start problem 381

collaborative filtering engines 376

content-based recommendation engines 375

data sparsity problem 382

hybrid recommendation engines 378

limitations 381

metadata requisites 382

types 374

Recurrent Neural Network (RNN) 318, 352

activation function 320-322

architecture 318

Backpropagation through time (BPTT) 325, 326

limitations 327-329

loss, computing 324, 325

output, calculating for each timestep 324

predicting with 326, 327

training, at first timestep 320

training, for whole sequence 322, 323

recursive function 91

regression algorithm 234, 245

regression tree algorithm 242

using, for regressors challenge 243

regressors 193

evaluating 239, 240

versus classifiers 193

regressors challenge

gradient boost regression algorithm, using 244

linear regression, using 241

presenting 235

problem statement 235

regression tree algorithm, using 243

regular RNNs 360

ReLU activation function 266

Leaky ReLU 267

Resilient Distributed Datasets (RDDs) 452

restaurant review sentiment analysis, case study 306

dataset, loading 306, 307

libraries, importing 306, 307

results, analyzing 308

text data, converting into numerical features 307

text data, preprocessing 307

RNNs, architecture

input variable, characteristics 319, 320

memory cell and hidden state 318, 319

RNNs, for whole sequence

weight parameter matrix, combining 323, 324

Root Mean Square Error (RMSE) 240, 390

Rotation 13 (ROT13) 416

S

scalability 86, 87

scaling 199

scikit-learn 9

reference link 9

SciPy ecosystem 8

SciPy ecosystem, packages

Matplotlib 9

NumPy 9

pandas 9

scikit-learn 9

searching algorithm 70

binary search 72

linear search 71, 72

Secure Hashing Algorithm (SHA) 421, 422

Secure Sockets Layer (SSL)/Transport Layer Security (TLS) 425

security requirements 411

data sensitivity 413

entities, identifying 412

security goals, establishing 412

Selection Operations 41

selection sort algorithm 68, 69

performance analysis 69

self-attention 357

attention weights 358

bidirectional RNNs 359

regular RNNs 360

thought vector 360

training, versus inference 360, 361

sensitive data 413

Sequence-to-Sequence (Seq2Seq) models 316, 345, 346, 351

encoder 352

information bottleneck 353

thought vector 352

tokens 352

writer 352

Sequential API 272

sequential data 312, 313

types 312

types, scenarios 312, 313

sequential data, types

Spatial-Temporal Data 312

Textual Data 312

Time Series Data 312

sequential model

coding 337

data representation 317

dataset, loading 338, 339

selecting 276

types 313

sequential model, types

many-to-many sequential models 316, 317

many-to-one sequential models 315, 316

one-to-many sequence models 313-315

series

using 38

sets 35, 37

need for 37

time complexity analysis 37

shell sort algorithm 66, 68

performance analysis 68

shortest path 116

density 118

neighborhood, creating 116

triangles 117

Siamese neural networks 283

using 283-287

sigmoid function 225, 264, 265

similarity scores

interpreting 305

simple fraud analytics

conducting 135, 136

simple graph 113

simple linear regression 238

simple neural network 256

Snappy compression 404

Social Network Analysis (SNA) 125

softmax function 269

sort algorithm

selecting 70

sorting algorithms 56

bubble sort algorithm 57

insertion sort algorithm 61

merge sort algorithm 63

shell sort algorithm 66, 68

variables, swapping in Python 56

space complexity analysis 13

Spatial-Temporal Data 312

Special Interest Group on Knowledge Discovery (SIGKDD) 24

Speedup 445

splitting phase 63

stacks 45-47

practical example 47

time complexity 47

usage 49

stemming 296, 297

step function 264

stopword removal 296

subset, of DataFrame

column selection 40

row selection 40, 41

time complexity analysis, for sets 41, 42

substitution-based ciphers

Caesar cipher 414, 415

cryptanalysis 416

presenting 414

ROT13 416

supercomputers 442

supervised machine learning 188, 189

classifiers, versus regressors 193

conditions, enabling 192

formulating 189-192

support measure 177

Support Vector Machines (SVMs) 187

support vectors 228

SVM algorithm 227-229

for classifiers challenge 229

Naive Bayes algorithm 230

symbol efficiency 355

symmetric encryption

advantages 424

coding 423, 424

issues 424

using 423

T

targeted advertisements 167

Tay Twitter AI bot

failure 459

TensorFlow 271, 276, 277

3D tensor 277

basic concepts, presenting 277, 278

matrix 277

rank 277

scalar 277

Tensor mathematics 278, 279

tensors 277

URL 271

vector 277

Tensor mathematics 278, 279

tensors 271

Term Document Matrix (TDM) 299

Term Frequency-Inverse Document Frequency (TF-IDF) 300

summary 302

using 300, 301

Term Frequency (TF) 300

test document 284

Testing with Concept Activation Vectors (TCAV) 462

text preprocessing in NLP 292

cleaning data 294

tokenization 292, 293

text preprocessing techniques, of NLP

language modeling 291

machine translation 291

sentiment analysis 291

word embeddings 291

Textual Data 312

Theano 271

reference link 271

thought vector 352, 360

time complexity analysis 13-15

Time Series Data 312

TLS handshake 425

tokenization 292, 293

tokens 438

top-secret data 413

transfer learning 281

examples 282

using 281

transformer architecture 346, 361

Transmission Control Protocol/Internet Protocol (TCP/IP) 413

transactional data 174

transposition 417

transposition ciphers 417

Traveling Salesperson Problem (TSP) 79

brute-force strategy, using 94-97

greedy algorithm, using 98, 99

solving 93, 94

strategies, comparing 99

tree 50

practical examples 52

terminology 50, 51

types 51, 52

true document 284

True Positive Rate (TPR) 207

tuples 31, 32

time complexity 32, 33

two-hop neighbors 114

U

undirected graph 113

unidirectional RNNs 328, 329

Universal Language Model Fine-Tuning (ULMFiT) 365

unlabeled data 193

unsupervised learning 143, 144

for marketing segmentation 149

practical examples 149

research trends 148, 149

unsupervised learning, in data-mining lifecycle 144, 145

business 145

data 146

data preparation 146

deployment 147, 148

evaluation 147

modeling 146, 147

update gate 330-335

implementing 331

V

variable-length code 402

variables

swapping, in Python 56

variety 10

vector

time complexity 44

velocity 10

Virtual Private Cloud (VPC) 412

volume 10

W

watchtower fraud analytics methodology

degree of suspicion (DOS) 137-139

negative outcomes, scoring 136

presenting 136

weakest link

significance 410, 411

weather prediction

example 245-248

weighted graph 113

white box algorithm 460

versus black box algorithm 460

white space removal 296

wide model

architecture 367-369

using, to create LLMs 367

Word2Vec

advantages 305

disadvantages 306

used, for implementing word embedding 303, 304

Word2Vec() functions

min_count 304

sentences 304

size 304

window 304

workers 304

word embedding 291, 302

implementing, with Word2Vec 303, 304

similarity scores, interpreting 305

writer 352

X

XGBoost algorithm

used, for implementing gradient boosting 218
