If the coecient = 0, then it means that the input has no impact on the model that is, 0*x = 0.
This is an important factor because sometimes, especially in the regularization methods, the
coecients are shrunk to zero.
Classification
Classification is required when we are dealing with a discrete output. So, in simple words, whenever you want an answer drawn from a finite or fixed set of results, you require classification. For instance, a web application that detects spam emails faces two possible outcomes: any incoming email is either authentic or spam. This type of classification is also known as binary classification.
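A minimal sketch of such a binary spam classifier, assuming scikit-learn is available; the toy emails and their labels are invented purely for illustration:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Toy corpus: 1 = spam, 0 = authentic (labels are illustrative only)
emails = ["win a free prize now", "meeting agenda for monday",
          "free money click here", "project status update"]
labels = [1, 0, 1, 0]

# Turn raw text into word-count features, then fit a classifier
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(emails)
clf = MultinomialNB().fit(X, labels)

# The prediction is always one of the two classes: spam or authentic
print(clf.predict(vectorizer.transform(["free prize money"])))  # expected: [1]
```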
[Figure: Architecture of a convolutional neural network. An input image passes through convolution and pooling + ReLU layers (feature learning), is then flattened and fed to a fully connected layer with softmax (classification), producing outputs such as bicycle, car, truck, and van.]
Multi-label classification takes other useful factors into account and can be used for audio
and text analysis, image grouping, and user segmentation.
Classification problems deal with an “observed” set of values. Depending upon the inputs, a classification model performs a prediction among multiple outcomes. While studying classification, the following jargon is used extensively.
Classifier: The classification algorithm which maps a given input to a certain grouping.
Classification model: A model that attempts to extract an outcome from the given training data. It estimates the labels of fresh data.
Feature: An atomic measurable metric from a given phenomenon (which is being observed).
Multi-class classification: An extension of binary classification. In this type of classification, a sample is assigned exactly one label (target). For instance, a sport can either be basketball or baseball, but not both in the same time interval.
Multi-label classification: Classification in which samples are assigned multiple target labels. For instance, a website can have a blog about machine learning, IoT, and AI in the same time interval (see the sketch after this list).
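The difference between the two is easiest to see in how the labels are encoded. A sketch assuming scikit-learn; the sports and blog topics echo the examples above:

```python
from sklearn.preprocessing import LabelEncoder, MultiLabelBinarizer

# Multi-class: each sample carries exactly one label
sports = ["basketball", "baseball", "basketball"]
print(LabelEncoder().fit_transform(sports))        # [1 0 1]

# Multi-label: a sample may carry several labels in the same time interval
topics = [["machine learning", "AI"], ["IoT"], ["AI", "IoT"]]
print(MultiLabelBinarizer().fit_transform(topics))
# [[1 0 1]
#  [0 1 0]
#  [1 1 0]]  (columns: AI, IoT, machine learning)
```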
To build any classification model, you would first have to initialize a classifier. Then you would train that classifier, and lastly, you would generate results for the observed values of x to estimate or predict the label y.
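These three steps map directly onto the fit/predict pattern of scikit-learn's estimators; a sketch using a decision tree (any classifier would do) on the bundled iris dataset:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)          # observed values of x and labels y
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = DecisionTreeClassifier()             # 1. initialize a classifier
clf.fit(X_train, y_train)                  # 2. train the classifier
print(clf.predict(X_test[:5]))             # 3. estimate labels y for unseen x
```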
To understand further, let’s look over some classification algorithms.
K-nearest Neighbor (KNN)
KNN is a powerful and multi-purpose classifier which is useful for benchmarking the efficiency of more intricate classifiers such as SVM and ANN. At first glance, KNN sounds too simplistic in comparison to other ML algorithms; however, in reality, it is robust enough to beat many renowned classifiers. Perhaps this is why it is used so heavily in genetics, data compression, and forecasting. So what exactly is KNN? To explain it, we need certain notations and definitions.
We have a variable “x” which represents our feature. It is also known as a predictor
attribute.
We have a variable “y” which represents the target to be predicted. It is also known as the class or label.
As KNN performs classification, it belongs to supervised learning. This means it addresses volumes of data which contain training observations. These observations include the values of x and y, and KNN has to reflect their relationship clearly. Technically, the objective is to learn a function h: X → Y so that for any unseen observation x, h(x) can readily produce a prediction for the appropriate output y.
KNN also falls under the categories of non-parametric and instance-based. The term non-parametric refers to the fact that it makes no explicit assumption about the form of the function h. Thus, it does not commit the common mistake of mismodeling the distribution of the data. For instance, if we pick a Gaussian learning model despite having most of our data in non-Gaussian form, the algorithm can be expected to fail to achieve accuracy.
Instance-based refers to the algorithm not explicitly learning a model. So how does it learn? Well, it memorizes the training instances—also called knowledge—to work in the prediction pipeline. Additionally, this means that the algorithm only consults the training instances when it responds to a relevant query.
It is important to understand that KNN's minimal training phase comes at a price: it consumes a significant amount of memory, because the complete dataset must be stored, and it imposes processing constraints at test time, since every observation requires a run through the complete dataset. Obviously, such constraints are negative, as quicker responses are desirable.
Now let’s understand the working of KNN. KNN basically creates a majority vote among the instances most similar to a given “unseen” observation. Similarity is defined through a metric: distance, the interval between any two points of data. One of the common formulas for this is the Euclidean distance.
d(x, x') = \sqrt{(x_1 - x_1')^2 + (x_2 - x_2')^2 + \cdots + (x_n - x_n')^2}
However, it is not a hard and fast rule to choose Euclidean. Considering the nature of the
problem, you may also choose Chebyshev, Hamming, or Manhattan distance.
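All four metrics are available off the shelf; a sketch assuming SciPy (cityblock is SciPy's name for the Manhattan distance):

```python
from scipy.spatial.distance import euclidean, chebyshev, cityblock, hamming

a, b = [1, 2, 3], [4, 6, 3]
print(euclidean(a, b))   # sqrt(3**2 + 4**2 + 0**2) = 5.0
print(chebyshev(a, b))   # max(3, 4, 0) = 4
print(cityblock(a, b))   # 3 + 4 + 0 = 7 (Manhattan)
print(hamming(a, b))     # fraction of differing positions = 2/3
```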
To understand further, suppose you have a positive integer, represented by K, an unseen observation x, and a similarity metric d. A KNN classifier then works in two steps.
First, it goes through the whole dataset and computes d between x and every training observation; the K points of the training data nearest to x form the set A. Bear in mind that K is commonly given an odd value to avoid a tie.
Second, the classifier estimates each class’s conditional probability, i.e., the fraction of points in the set A carrying each class label. Remember that I(x) serves as an indicator function: it generates a value of 1 whenever its argument is true and a value of 0 otherwise.
P(y = j \mid X = x) = \frac{1}{K} \sum_{i \in A} I(y^{(i)} = j)
Lastly, the input x is assigned to the class with the highest probability.
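Putting the whole procedure together, a from-scratch sketch assuming NumPy, with Euclidean distance and the hypothetical helper name knn_predict:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    # Compute the distance d between x and every training observation
    dists = np.sqrt(((X_train - x) ** 2).sum(axis=1))
    # The K points nearest to x form the set A
    A = np.argsort(dists)[:k]
    # Majority vote: assign x to the class with the highest fraction in A
    return Counter(y_train[A]).most_common(1)[0][0]

X_train = np.array([[1, 1], [1, 2], [5, 5], [6, 5]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([1.5, 1.5])))  # 0
```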
Flash Question
What are the pros and cons of using K-nearest neighbor (KNN)?
Quick Challenge
Improve K-nearest neighbor (KNN) by using weighted voting.
Random Forest
Random forest is one of the most common and useful ML algorithms that fall into the category of supervised learning. It is used in both regression and classification problems. The name of the algorithm matches its working—it actually generates a forest of trees. In the real world, a forest with a greater number of trees looks better; similarly, in this algorithm, more trees are seen as a measure of better accuracy.
The generated forest is random and is composed of decision-tree components. These trees are often trained through the bagging method, which is said to boost the overall result. To explain simply, random forest creates and combines decision trees to get robust and precise forecasting.
It is an adjustable and easy ML algorithm known for generating accurate results, even
without the use of hyper-parameter tuning.
Random forest is not too dissimilar to a decision tree and contains similar hyperparameters. Conveniently, it comes as a ready-made classifier class (scikit-learn's RandomForestClassifier, for example). Additionally, the algorithm infuses more randomness into its working as the trees grow: instead of searching for the most important feature when a node is split, as other algorithms do, it looks for the best feature within a random collection of features. As a result, more diversity is acquired in the model.
So remember that random forest only processes a random subset of the features while attempting to split a node. Even more randomness in the trees can be achieved by the use of random thresholds: these are applied to each feature, unlike plain decision trees, where the focus is on finding the most suitable thresholds.
To understand the random forest, suppose we have a person, Will. Will is looking to go on a vacation, which requires him to finalize his destinations. Thus, he begins asking people for advice and turns to his friend Jacob. Jacob asks Will about his past vacation history, particularly the places that he liked the most. This approach is a classic example of a decision tree, because Jacob generated a set of conditions to reach a decision based on Will's input.
However, Will is not satisfied with Jacob's answers. He wants more suggestions and recommendations. Therefore, he contacts more friends and asks for their advice. Will's other friends also use Jacob's approach and ask Will a list of questions to provide the best advice. After going through the suggestions of all his friends, Will picks the most repeated location and finalizes it for his vacation. This complete approach is exactly what we do in random forests: each friend acts as a decision tree, and the final destination is the majority vote.
In a random forest, it is not too complex to calculate the relative importance of each feature to a prediction. Python's ML library scikit-learn provides a productive tool for this purpose, which calculates the importance of features by considering the tree nodes that employ a feature and how much they decrease impurity across the forest. After training, a score is generated for every feature, and the scores are scaled so that all the importances sum to one.
Feature importance is valuable because it can assist you in deciding to drop a feature: a feature is dropped when it makes no contribution to the prediction. Bear in mind that in ML, more features may lead to an over-fitting issue.
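A sketch of that workflow, assuming scikit-learn and its bundled iris dataset:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

data = load_iris()
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(data.data, data.target)

# Scores are scaled so the importances sum to one; features with scores
# near zero are candidates for dropping
for name, score in zip(data.feature_names, forest.feature_importances_):
    print(f"{name}: {score:.3f}")
```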
To make the model faster or to boost the prediction process, the random forest exposes hyperparameters. Let's go over some of the hyperparameters from scikit-learn; a sketch wiring them together follows this list.
The “n_jobs” hyperparameter informs the engine about the processor limit for usage. A value of “-1” points to no limit, while a value of 1 means that no more than one processor is allowed for use.
The “random_state”hyperparameter is able to transform the output of the model into a
replicable output. The model is bound to generate identical results if it is fed with iden-
tical training data, hyperparameters, and a fixed random_state value.
The “oob_score” hyperparameter enables what is basically a cross-validation method: out-of-bag sampling, which uses the roughly one-third of the data left out of each tree's bootstrap sample to evaluate performance.
The “n_estimators” hyperparameter refers to the total number of trees which the algorithm is going to create before averaging predictions or taking the maximum vote. Usually, a bigger number of trees slows the performance but brings stability to predictions.
The “max_features” hyperparameter caps the number of features which the algorithm is allowed to consider when splitting a node.
Lastly, there is the “min_samples_leaf” hyperparameter. It sets the minimum number of samples required at a leaf, so an internal node is split only if each resulting leaf keeps at least that many samples.
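A sketch wiring these hyperparameters together; the values are illustrative, not recommendations:

```python
from sklearn.ensemble import RandomForestClassifier

forest = RandomForestClassifier(
    n_estimators=200,      # trees built before votes/averages are taken
    max_features="sqrt",   # features considered at each split
    min_samples_leaf=2,    # minimum samples allowed in a leaf
    oob_score=True,        # evaluate on the out-of-bag samples
    n_jobs=-1,             # -1 = no processor limit
    random_state=42,       # fixed seed for replicable output
)
# After forest.fit(X, y), forest.oob_score_ holds the out-of-bag accuracy.
```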
To create a random forest, you would have to go through the following steps.
1. From the given total of “m” features, pick any “k” features randomly.
2. Among the “k” features, calculate the node “d” using the best split point.
3. Using the best split, create daughter nodes from the node.
4. Keep performing the above steps until the requirement for the number of nodes “l” is met.
5. Generate the forest by repeating the above steps to produce “n” tree samples.
For executing predictions, consider the following pseudocode (a from-scratch sketch of both listings follows).
1. Take the “test features” and run them through the set of rules of every (randomly generated) decision tree to predict a result, and save it. This result is referred to as a target.
2. Count the votes for all the predicted targets.
3. The predicted target with the highest number of votes is the final prediction of the algorithm.
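A compact sketch of both listings, assuming NumPy and delegating the split-point search of steps 2 and 3 to scikit-learn's DecisionTreeClassifier; build_forest and forest_predict are hypothetical names:

```python
import numpy as np
from collections import Counter
from sklearn.tree import DecisionTreeClassifier

def build_forest(X, y, n_trees=25, k=2, seed=0):
    rng = np.random.default_rng(seed)
    forest = []
    for _ in range(n_trees):
        rows = rng.integers(0, len(X), size=len(X))           # bootstrap (bagging) sample
        cols = rng.choice(X.shape[1], size=k, replace=False)  # pick k of the m features
        tree = DecisionTreeClassifier().fit(X[rows][:, cols], y[rows])
        forest.append((tree, cols))                           # repeat to get n trees
    return forest

def forest_predict(forest, x):
    # Run the test features through every tree's rules, save each target,
    # count the votes, and return the most-voted target
    votes = [tree.predict(x[cols].reshape(1, -1))[0] for tree, cols in forest]
    return Counter(votes).most_common(1)[0][0]
```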
Flash Question
What are the benefits of using Random Forest?
Convolutional Neural Networks
Convolutional neural networks are commonly used for the categorization of images, i.e., what they are able to “see.” They cluster images by common factors and finally recognize the objects within the scenes. CNNs are used for the identification of diseases (tumors), street signs, individuals, faces, and other visual information.
For hand-written information, CNNs use OCR, or optical character recognition, which digitizes textual information. As a result, it facilitates natural language processing for the transcription of symbols. CNNs are also useful for working with sound once its visuals are provided via a spectrogram. In recent times, CNNs have been highly valuable for graph convolutional networks and text analytics.
CNN is one of the core success factors behind the empowerment of ML in real-world applications, aiding computer vision in drones, medical diagnoses, robotics, security, and self-driving cars.
The meaning of convolutional is “to roll together.” Mathematically, a convolution is an integral which measures the overlap between any two functions as one passes over the other. In other words, convolution is an approach in which two functions are mixed through multiplication.
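A one-dimensional sketch of this mixing, assuming NumPy; as the kernel rolls over the signal, the overlapping values are multiplied and summed:

```python
import numpy as np

signal = np.array([0, 1, 2, 3, 4], dtype=float)
kernel = np.array([1, 0, -1], dtype=float)

# At each position, multiply the overlapping entries and sum them
print(np.convolve(signal, kernel, mode="valid"))  # [2. 2. 2.]
```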
CNNs process images as tensors. For those who do not know, tensors are just multidimensional matrices which store numbers in their content. To understand, suppose a scalar value—just a simple number, like [5]. A vector can store multiple values, like [2, 3, 4]. A matrix is a rectangular grid which possesses rows and columns—just like a normal spreadsheet.
Geometrically speaking, a scalar has zero dimensions. A vector is a line with one dimension. A matrix, on the other hand, lies on the two-dimensional plane, and a stack of matrices forms a cube of three dimensions. Now, if each element of those matrices holds a feature map, then we have one more dimension: the fourth. A 2 × 2 matrix looks like this.
[4 3]
[9 6]
A tensor contains dimensions which go beyond the two-dimensional plane. Tensors are generated from arrays which are nested in other arrays, and this nesting may continue indefinitely. CNNs use 4-D tensors. The dimensionality of a tensor is also referred to as its order; for example, a tensor with an order of 3 occupies three dimensions.
For images, height and width are basic concepts—everyone gets them. However, the depth part is a little tricky. Depth represents the encoding of color. For instance, RGB encoding generates an image which is three layers deep, and each of these layers is referred to as a channel. Convolution then generates a stack of feature maps on top. This means that in a CNN, images are not mapped to the two-dimensional plane. Instead, they exist as four-dimensional tensors.
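The progression from scalar to 4-D tensor is easy to see in NumPy; the 32 × 32 image size and batch of 10 below are arbitrary:

```python
import numpy as np

scalar = np.array(5)                 # order 0: a single number
vector = np.array([2, 3, 4])         # order 1: a line of values
matrix = np.array([[4, 3], [9, 6]])  # order 2: rows and columns
image  = np.zeros((32, 32, 3))       # order 3: height x width x RGB channels
batch  = np.zeros((10, 32, 32, 3))   # order 4: what a CNN actually consumes

for t in (scalar, vector, matrix, image, batch):
    print(t.ndim, t.shape)
```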
As you might have understood by now, CNNs do not perceive images in the same way as humans do. Hence, it is necessary to understand how images are processed through a convolutional network.