Chanchal Chatterjee

Adaptive Machine Learning Algorithms with Python

Solve Data Analytics and Machine Learning Problems on Edge Devices

Chanchal Chatterjee
San Jose, CA, USA
ISBN 978-1-4842-8016-4
e-ISBN 978-1-4842-8017-1
© Chanchal Chatterjee 2022
Apress Standard
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Apress imprint is published by the registered company APress Media, LLC part of Springer Nature.

The registered company address is: 1 New York Plaza, New York, NY 10004, U.S.A.

I dedicate this book to my father, Basudev Chatterjee, and all my teachers and mentors who have guided and inspired me.

Preface

This book presents several categories of streaming data problems that have significant value in machine learning, data visualization, and data analytics. It offers many adaptive algorithms for solving these problems on streaming data vectors or matrices. Complex neural network-based applications are commonplace and computing power is growing exponentially, so why do we need adaptive computation?

Adaptive algorithms are critical in environments where the data volume is large, data has high dimensions, data is time-varying and has changing underlying statistics, and we do not have sufficient storage, computing power, and bandwidth to process the data with low latency. One such environment is computation on edge devices.

Due to the rapid proliferation of billions of devices at the cellular edge and the exponential growth of machine learning and data analytics applications on these devices, there is an urgent need to manage the following on these devices:
  • Power usage for computation at scale

  • Non-stationarity of inputs and drift of the incoming data

  • Latency of computation on devices

  • Memories and bandwidth of devices

A 2021 Gartner report on edge computing [Stratus Technologies, 2021] suggests that the growth of device-based computation, propelled by the adoption of cloud and 5G, will require us to prioritize and build a framework for edge computation.

These environments impose the following constraints:
  • The data cannot be batched; it must be used as it arrives. We have a streaming sequence of vectors or matrices as inputs to the algorithms.

  • The data changes with time. In other words, the data is non-stationary, causing significant drift of input features, so that machine learning models become less effective over time.

  • The data volume and dimensionality are large, and we do not have the device memory, bandwidth, or power to store the data or upload it to the cloud.

In these applications, we use adaptive algorithms to manage the device’s power, memory, and bandwidth so that we can maintain the accuracy of the pretrained models. Some use cases are the following:
  1. Calculate feature drift of incoming data and detect training-serving skew [Kaz et al. 2021] ahead of time.

  2. Adapt to incoming data drift and calculate features that best fit the data.

  3. Calculate anomalies in incoming data so that good, clean data is used by the models.

  4. Compress incoming data into features for use in new model creation.
In Chapter 8, I present solutions to these problems using adaptive algorithms, illustrated with real-world data.

Detecting Feature Drift

Consider an example in which real-time data [Vinicius Souza et al. 2020] exhibits a gradual drift of features. In real-time environments, there are changes in the underlying distribution of the observed data. These changes in the statistical properties of the data are called drift. When the changes in statistical properties are smooth, the drift is called gradual. Figure 1 shows a slow change in the magnitude of the components of multivariate data over time, illustrating gradual drift.

It’s important to detect these drift components early in the process so that we can update the machine learning model and maintain its performance. Note that the baseline statistics of the data are not known ahead of time.
Figure 1

Data components show gradual drift over time

I used an adaptive algorithm (see Chapter 5) to compute the principal components [principal component analysis, Wikipedia] of the data and derive a metric from them. Figure 2 shows that the metric does not converge to its expected statistical value (ideally 1) but instead diverges towards 0. This detects the feature drift quickly so that the edge device can update the machine learning model.
Figure 2

An adaptive principal component-based metric detects drift in data early in the real-time process

The downward slope of the detection metric in the graph indicates the gradual drift of the features.
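The exact metric is derived in Chapter 5; as a rough sketch of the idea (my own simplification, not the book's algorithm), one can adaptively estimate the principal eigenvector sample by sample and compare it against a baseline eigenvector fitted on an early reference window. The cosine-style metric stays near 1 while the data is stationary and falls when the features drift. All data, step sizes, and names below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)

def principal_eigvec(X):
    """Batch principal eigenvector of the reference window (baseline)."""
    C = X.T @ X / len(X)
    return np.linalg.eigh(C)[1][:, -1]   # eigh sorts eigenvalues ascending

def drift_metric(stream, w_ref, eta):
    """Adaptive eigenvector estimate, compared against the baseline.
    The metric |w_k . w_ref| / ||w_k|| is near 1 without drift."""
    w = w_ref.copy()
    metric = []
    for x in stream:
        y = x @ w
        w = w + eta * (y * x - (y * y) * w)   # adaptive eigenvector step
        metric.append(abs(w @ w_ref) / np.linalg.norm(w))
    return np.array(metric)

# Synthetic stream: the dominant variance direction drifts from axis 0
# to axis 1 halfway through (a stand-in for real drifting data).
phase1 = rng.standard_normal((1000, 2)) * np.array([3.0, 0.5])
phase2 = rng.standard_normal((1000, 2)) * np.array([0.5, 3.0])
w_ref = principal_eigvec(phase1)
m = drift_metric(np.vstack([phase1, phase2]), w_ref, eta=0.005)
# m stays near 1 during phase1 and slopes down towards 0 in phase2
```

The downward slope of `m` after the change plays the role of the detection signal described above.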

Adapting to Drift

In another type of drift, the data changes its statistical properties abruptly. Figure 3 shows simulated multi-dimensional data that abruptly changes to a different underlying statistic after 500 samples.
Figure 3

Simulated data that abruptly changes statistical properties after 500 samples

The adaptive algorithms help us adapt to this abrupt change and recalculate the underlying statistics, in this case the first two principal eigenvectors of the data (see Chapter 6). The ideal values of the metrics in Figure 4 are 1. As the data changes abruptly after 500 samples, the values fall and then quickly recover back to 1.
Figure 4

Principal eigenvectors adapt to abruptly changing data
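The recovery behavior in Figure 4 can be sketched with a constant-step adaptive eigenvector estimate (an illustrative simulation, not the book's Chapter 6 algorithm): because the step size never decays to zero, the alignment with the new principal eigenvector dips right after the change and then climbs back toward 1. The dimensions, scales, and step size here are assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)

def oja_step(w, x, eta):
    """One adaptive step: w <- w + eta * (x x^T w - (w^T x x^T w) w)."""
    y = x @ w
    return w + eta * (y * x - (y * y) * w)

d = 3
scales_before = np.array([3.0, 0.5, 0.5])  # dominant variance on axis 0
scales_after = np.array([0.5, 3.0, 0.5])   # abruptly shifts to axis 1

w = rng.standard_normal(d)
w /= np.linalg.norm(w)
align_after_change = []
for k in range(1000):
    scales = scales_before if k < 500 else scales_after
    x = rng.standard_normal(d) * scales    # sample from current regime
    w = oja_step(w, x, eta=0.005)
    if k >= 500:
        # alignment with the NEW principal eigenvector (axis 1)
        align_after_change.append(abs(w[1]) / np.linalg.norm(w))
# alignment starts low right after the change and recovers toward 1
```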

My Approach

Adaptive Algorithms and Best Practices

In this book, I offer over 50 examples of adaptive algorithms to solve real-world problems. I also offer best practices to select the right algorithms for different use cases. I formulate these problems as matrix computations, where the underlying matrices are unknown. I assume that the entire data is not available at once. Instead, I have a sequence of random matrices or vectors from which I compute the matrix functions without knowing the matrices. The matrix functions are computed adaptively as each sample is presented to the algorithms.
  1. My algorithms process each incoming sample x_k such that at any instant k, all of the currently available data is taken into consideration.

  2. For each sample, the algorithm estimates the desired matrix functions, and its estimates have known statistical properties.

  3. The algorithms have the ability to adapt to variations in the underlying systems so that for each sample x_k I obtain the current state of the process and not an average over all samples (i.e., I can handle both stationary and non-stationary data).
Problems with Conventional Non-Adaptive Approaches

The conventional approach for evaluating matrix functions requires the computation of the matrix after collecting all samples and then the application of a numerical procedure. There are two problems with this approach.
  1. The dimension of the samples may be large, so even if all the samples are available, performing the matrix algebra may be difficult or may require a prohibitively large amount of computational resources and memory.

  2. The matrix functions evaluated by conventional schemes cannot adapt to small changes in the data (e.g., a few incoming samples). If the matrix functions are estimated by conventional methods from K (finite) samples, then for each additional sample, all of the computation must be repeated.
These deficiencies make the conventional schemes inefficient for real-time applications.

Computationally Simple

My approach is to use computationally simple adaptive algorithms. For example, given a sequence of random vectors {x_k}, a well-known algorithm for principal eigenvector evaluation uses the update rule w_{k+1} = w_k + η (x_k x_kᵀ w_k − (w_kᵀ x_k x_kᵀ w_k) w_k), where η is a small positive constant. In this algorithm, each sample x_k requires only simple matrix-vector multiplications, yet the vector w_k converges quickly to the principal eigenvector of the data correlation matrix. Clearly, this can be easily implemented on device CPUs with low memory and power usage.
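As a concrete sketch, the update rule above fits in a few lines of NumPy. The synthetic correlation matrix, dimension, step size, and sample count below are illustrative assumptions, not values from the book:

```python
import numpy as np

rng = np.random.default_rng(0)

# Build a synthetic correlation matrix with a known principal
# eigenvector (eigenvalues and dimension are illustrative choices).
d = 5
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))
C = Q @ np.diag([5.0, 2.0, 1.0, 0.5, 0.2]) @ Q.T
true_v = Q[:, 0]              # principal eigenvector of C
L = np.linalg.cholesky(C)     # to draw samples with covariance C

def update(w, x, eta):
    """One step of w <- w + eta * (x x^T w - (w^T x x^T w) w)."""
    y = x @ w                 # scalar projection x^T w
    return w + eta * (y * x - (y * y) * w)

w = rng.standard_normal(d)
w /= np.linalg.norm(w)
for _ in range(10000):        # streaming samples, one at a time
    x = L @ rng.standard_normal(d)
    w = update(w, x, eta=0.005)

# |cos| between the estimate and the true principal eigenvector
alignment = abs(w @ true_v) / np.linalg.norm(w)   # approaches 1
```

Note that each step touches only the current sample and the current estimate, which is exactly why the memory and power footprint stays small.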

The objective of this book is to present a variety of neuromorphic [neuromorphic engineering, Wikipedia] adaptive algorithms [Wikipedia] for matrix algebra problems. Neuromorphic algorithms mimic the physics of biological neural systems using networks in which activations propagate from neuron to neuron in a cascading chain.

Matrix Functions I Solve

The matrix algebra functions that I compute adaptively include
  • Normalized mean, median

  • LU decomposition (square root), inverse square root

  • Eigenvector/eigenvalue decomposition (EVD)

  • Generalized EVD

  • Singular value decomposition (SVD)

  • Generalized SVD

For each matrix function, I will discuss practical use cases in machine learning and data analytics and support them with experimental results.

Common Methodology to Derive Adaptive Algorithms

Another contribution of this book is the presentation of a common methodology for deriving each adaptive algorithm. For each matrix function and every adaptive algorithm, I present a scalar unconstrained objective function J(W; A) whose minimizer W* is the desired matrix function of A. From this objective function J(W; A), I derive the adaptive algorithm, such as W_{k+1} = W_k − η ∇_W J(W_k; A), by using standard techniques of optimization (for example, gradient descent). I then speed up these adaptive algorithms by using statistical methods. Note that this helps practitioners create new adaptive algorithms for their use cases.
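To illustrate the methodology on the simplest possible case (my own example, not one from the book): for the streaming mean, take J(w) = E‖x − w‖², whose minimizer is w* = E[x]. A stochastic gradient step on a single sample x_k gives the adaptive rule w_{k+1} = w_k + η (x_k − w_k). The data and step size below are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(1)

def adaptive_mean(stream, eta):
    """Stochastic gradient descent on J(w) = E[||x - w||^2],
    one sample at a time: w <- w + eta * (x - w)."""
    w = np.zeros(stream.shape[1])
    for x in stream:
        w = w + eta * (x - w)     # gradient step on the current sample
    return w

true_mean = np.array([2.0, -1.0, 0.5])
stream = true_mean + 0.1 * rng.standard_normal((5000, 3))
est = adaptive_mean(stream, eta=0.05)   # tracks E[x] without storing data
```

The same recipe, objective function, gradient, then a per-sample update, carries over to the harder matrix functions listed earlier.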

In summary, the book starts with a common framework for deriving adaptive algorithms and then applies the framework to each category of streaming data problems, ranging from the adaptive median to principal components and discriminant analysis. I present practical problems in each category and derive the algorithms to solve them. The final chapter solves critical edge computation problems for time-varying, non-stationary data with minimal compute, memory, latency, and bandwidth. I also provide the code [Chatterjee GitHub] for all algorithms discussed here.

GitHub

All simulation and implementation code, organized by chapter, is published in a public GitHub repository:

https://github.com/cchatterj0/AdaptiveMLAlgorithms

The GitHub page contains the following items:
  • Python code for all chapters starting with Chapter 2

  • MATLAB simulation code in a separate directory

  • Data for all implementation code

  • Proofs of convergence for some of the algorithms

Acknowledgments

I want to thank my professor and mentor Vwani Roychowdhury for guiding me through my Ph.D. thesis, where I first created much of the research presented in this book. Vwani taught me how to research, write, and present this material in the many papers we wrote together. He also inspired me to continue this research and eventually write this book. I could not have done it without his inspiration, help, and guidance. I sincerely thank Vwani for being my teacher and mentor.

Table of Contents
About the Author
Chanchal Chatterjee
, Ph.D., has held several leadership roles in machine learning, deep learning, and real-time analytics. He is currently leading machine learning and artificial intelligence at Google Cloud Platform, California, USA. Previously, he was the Chief Architect of the EMC CTO Office, where he led end-to-end deep learning and machine learning solutions for data centers, smart buildings, and smart manufacturing for leading customers. Chanchal has received several awards, including an Outstanding Paper Award from the IEEE Neural Network Council for adaptive learning algorithms, recommended by MIT professor Marvin Minsky. Chanchal founded two tech startups between 2008 and 2013. Chanchal has 29 granted or pending patents and over 30 publications. Chanchal received M.S. and Ph.D. degrees in Electrical and Computer Engineering from Purdue University.
 
About the Technical Reviewer
Joos Korstanje
is a data scientist with over five years of industry experience in developing machine learning tools, a large part of which are forecasting models. He currently works at Disneyland Paris where he develops machine learning for a variety of tools.
 