If the coecient = 0, then it means that the input has no impact on the model that is, 0*x = 0.
This is an important factor because sometimes, especially in the regularization methods, the
coecients are shrunk to zero.
Classification
Classification is required when we are dealing with a discrete output. So, in simple words, whenever you want an answer drawn from a finite or fixed set of results, you require classification. For instance, a web application that detects spam emails faces two possible outcomes: any incoming email is either authentic or spam. This type of classification is also known as binary classification.
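A minimal sketch of such a binary spam classifier, assuming scikit-learn is available; the toy emails and their labels are invented purely for illustration:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Toy corpus: 1 = spam, 0 = authentic (labels are illustrative only)
emails = ["win a free prize now", "meeting agenda for monday",
          "free money click here", "project status update"]
labels = [1, 0, 1, 0]

# Turn raw text into word-count features, then fit a classifier
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(emails)
clf = MultinomialNB().fit(X, labels)

# The prediction is always one of the two classes: spam or authentic
print(clf.predict(vectorizer.transform(["free prize money"])))  # expected: [1]
```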
[Figure: Architecture of a convolutional neural network. An input image passes through convolution and pooling + ReLU layers (feature learning), is then flattened and fed to a fully connected layer with softmax (classification), producing outputs such as bicycle, car, truck, and van.]
Multi-label classification takes other useful factors into account and can be used for audio
and text analysis, image grouping, and user segmentation.
Classification problems deal with an “observed” set of values. Depending upon the inputs, a classification model performs a prediction among multiple outcomes. While studying classification, the following jargon is used extensively.
Classifier: The classification algorithm which maps a given input to a certain grouping.
Classification model: A model that attempts to extract an outcome from the given training data. It estimates the labels of fresh data.
Feature: An atomic measurable metric from a given phenomenon (which is being observed).
Multi-class classification: An extension of binary classification. In this type of classification, a sample is assigned exactly one label (target). For instance, a sport can either be basketball or baseball, but not both in the same time interval.
Multi-label classification: Classification in which samples are assigned multiple target labels. For instance, a website can have a blog about machine learning, IoT, and AI in the same time interval (see the sketch after this list).
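The difference between the two is easiest to see in how the labels are encoded. A sketch assuming scikit-learn; the sports and blog topics echo the examples above:

```python
from sklearn.preprocessing import LabelEncoder, MultiLabelBinarizer

# Multi-class: each sample carries exactly one label
sports = ["basketball", "baseball", "basketball"]
print(LabelEncoder().fit_transform(sports))        # [1 0 1]

# Multi-label: a sample may carry several labels in the same time interval
topics = [["machine learning", "AI"], ["IoT"], ["AI", "IoT"]]
print(MultiLabelBinarizer().fit_transform(topics))
# [[1 0 1]
#  [0 1 0]
#  [1 1 0]]  (columns: AI, IoT, machine learning)
```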
To build any classification model, you would first have to initialize a classifier. Then you would train that classifier, and lastly, you would generate results for the observed values of x to estimate or predict the label y.
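These three steps map directly onto the fit/predict pattern of scikit-learn's estimators; a sketch using a decision tree (any classifier would do) on the bundled iris dataset:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)          # observed values of x and labels y
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = DecisionTreeClassifier()             # 1. initialize a classifier
clf.fit(X_train, y_train)                  # 2. train the classifier
print(clf.predict(X_test[:5]))             # 3. estimate labels y for unseen x
```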
To understand further, let’s look over some classification algorithms.
K-nearest Neighbor (KNN)
KNN is a powerful and multi-purpose classifier which is useful for benchmarking the efficiency of more intricate classifiers such as SVM and ANN. At first glance, KNN sounds too simplistic in comparison to other ML algorithms; however, in reality, it is robust enough to beat many renowned classifiers. Perhaps this is why it is used so heavily in genetics, data compression, and forecasting. So what exactly is KNN? To explain it, we need certain notations and definitions.
We have a variable “x” which represents our feature. It is also known as a predictor
attribute.
We have a variable “y” which represents the target to be predicted. It is also known as the class or label.
As KNN performs classification, it belongs to supervised learning. This means it addresses volumes of data which contain training observations. These observations include the values of x and y, and KNN has to reflect their relationship clearly. Technically, the objective is to learn a function h: X → Y so that for any unseen observation x, h(x) can readily produce a prediction for the appropriate output y.
KNN also falls under the categories of non-parametric and instance-based. The term non-parametric refers to the fact that it makes no explicit assumption about the form of the function h. Thus, it does not commit the common mistake of mismodeling the distribution of the data. For instance, if we pick a Gaussian learning model despite having most of our data in non-Gaussian form, the algorithm can be expected to fail to achieve accuracy.
Instance-based refers to the algorithm not explicitly learning a model. So how does it learn? Well, it memorizes the training instances—also called knowledge—to work in the prediction pipeline. Additionally, this means that the algorithm only consults the training instances when it responds to a relevant query.
It is important to understand that KNN's minimal training phase comes at a price: it consumes a significant amount of memory, because the complete dataset must be stored, and it imposes processing constraints at test time, since every observation requires a run through the complete dataset. Obviously, such constraints are negative, as quicker responses are desirable.
Now let’s understand the working of KNN. KNN basically creates a majority vote among the instances most similar to a given “unseen” observation. Similarity is defined through a metric: distance, the interval between any two points of data. One of the common formulas for this is the Euclidean distance.
d(x, x') = \sqrt{(x_1 - x_1')^2 + (x_2 - x_2')^2 + \cdots + (x_n - x_n')^2}
However, it is not a hard and fast rule to choose Euclidean. Considering the nature of the
problem, you may also choose Chebyshev, Hamming, or Manhattan distance.
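All four metrics are available off the shelf; a sketch assuming SciPy (cityblock is SciPy's name for the Manhattan distance):

```python
from scipy.spatial.distance import euclidean, chebyshev, cityblock, hamming

a, b = [1, 2, 3], [4, 6, 3]
print(euclidean(a, b))   # sqrt(3**2 + 4**2 + 0**2) = 5.0
print(chebyshev(a, b))   # max(3, 4, 0) = 4
print(cityblock(a, b))   # 3 + 4 + 0 = 7 (Manhattan)
print(hamming(a, b))     # fraction of differing positions = 2/3
```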
To understand further, suppose you have a positive integer, represented by K, an unseen observation x, and a similarity metric d. A KNN classifier then works in two steps.
First, it goes through the whole dataset and computes d between x and every training observation; the K points of the training data nearest to x form the set A. Bear in mind that K is commonly given an odd value to avoid a tie.
Second, the classifier estimates each class’s conditional probability, i.e., the fraction of points in the set A carrying each class label. Remember that I(x) serves as an indicator function: it generates a value of 1 whenever its argument is true and a value of 0 otherwise.
P(y = j \mid X = x) = \frac{1}{K} \sum_{i \in A} I(y^{(i)} = j)
Lastly, the input x is assigned to the class with the highest probability.
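Putting the whole procedure together, a from-scratch sketch assuming NumPy, with Euclidean distance and the hypothetical helper name knn_predict:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    # Compute the distance d between x and every training observation
    dists = np.sqrt(((X_train - x) ** 2).sum(axis=1))
    # The K points nearest to x form the set A
    A = np.argsort(dists)[:k]
    # Majority vote: assign x to the class with the highest fraction in A
    return Counter(y_train[A]).most_common(1)[0][0]

X_train = np.array([[1, 1], [1, 2], [5, 5], [6, 5]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([1.5, 1.5])))  # 0
```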
Flash Question
What are the pros and cons of using K-nearest neighbor (KNN)?
Quick Challenge
Improve K-nearest neighbor (KNN) by using weighted voting.
Random Forest
Random forest is one of the most common and useful ML algorithms that fall into the category of supervised learning. It is used in both regression and classification problems. The name of the algorithm matches its working—it actually generates a forest of trees. In the real world, a forest with a greater number of trees looks better; similarly, in this algorithm, more trees are seen as a measure of better accuracy.
The generated forest is random and is composed of decision-tree components. These trees are often trained through the bagging method, which is said to boost the overall result. To explain simply, random forest creates and combines decision trees to get robust and precise forecasting.
It is an adjustable and easy ML algorithm known for generating accurate results, even
without the use of hyper-parameter tuning.
Random forest is not too dissimilar to a decision tree and contains similar hyperparameters. Conveniently, it comes as a ready-made classifier class (scikit-learn's RandomForestClassifier, for example). Additionally, the algorithm infuses more randomness into its working as the trees grow: instead of searching for the most important feature when a node is split, as other algorithms do, it looks for the best feature within a random collection of features. As a result, more diversity is acquired in the model.
So remember that random forest only processes a random subset of the features while attempting to split a node. Even more randomness in the trees can be achieved by the use of random thresholds: these are applied to each feature, unlike plain decision trees, where the focus is on finding the most suitable thresholds.
To understand the random forest, suppose we have a person, Will. Will is looking to go on a vacation, which requires him to finalize his destinations. Thus, he begins asking people for advice and turns to his friend Jacob. Jacob asks Will about his past vacation history, particularly the places that he liked the most. This approach is a classic example of a decision tree, because Jacob generated a set of conditions to reach a decision based on Will's input.
However, Will is not satisfied with Jacob's answers. He wants more suggestions and recommendations. Therefore, he contacts more friends and asks for their advice. Will's other friends also use Jacob's approach and ask Will a list of questions to provide the best advice. After going through the suggestions of all his friends, Will picks the most repeated location and finalizes it for his vacation. This complete approach is exactly what we do in random forests: each friend acts as a decision tree, and the final destination is the majority vote.
In a random forest, it is not too complex to calculate the relative importance of each feature to a prediction. Python's ML library scikit-learn provides a productive tool for this purpose, which calculates the importance of features by considering the tree nodes that employ a feature and how much they decrease impurity across the forest. After training, a score is generated for every feature, and the scores are scaled so that all the importances sum to one.
Feature importance is valuable because it can assist you in deciding to drop a feature: a feature is dropped when it makes no contribution to the prediction. Bear in mind that in ML, more features may lead to an over-fitting issue.
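A sketch of that workflow, assuming scikit-learn and its bundled iris dataset:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

data = load_iris()
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(data.data, data.target)

# Scores are scaled so the importances sum to one; features with scores
# near zero are candidates for dropping
for name, score in zip(data.feature_names, forest.feature_importances_):
    print(f"{name}: {score:.3f}")
```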
To make the model faster or to boost the prediction process, the random forest exposes hyperparameters. Let's go over some of the hyperparameters from scikit-learn; a sketch wiring them together follows this list.
The “n_jobs” hyperparameter informs the engine about the processor limit for usage. A value of “-1” points to no limit, while a value of 1 means that no more than one processor is allowed for use.
The “random_state”hyperparameter is able to transform the output of the model into a
replicable output. The model is bound to generate identical results if it is fed with iden-
tical training data, hyperparameters, and a fixed random_state value.
The “oob_score” hyperparameter enables what is basically a cross-validation method: out-of-bag sampling, which uses the roughly one-third of the data left out of each tree's bootstrap sample to evaluate performance.
The “n_estimators” hyperparameter refers to the total number of trees which the algorithm is going to create before averaging predictions or taking the maximum vote. Usually, a bigger number of trees slows the performance but brings stability to predictions.
The “max_features” hyperparameter caps the number of features which the algorithm is allowed to consider when splitting a node.
Lastly, there is the “min_samples_leaf” hyperparameter. It sets the minimum number of samples required at a leaf, so an internal node is split only if each resulting leaf keeps at least that many samples.
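A sketch wiring these hyperparameters together; the values are illustrative, not recommendations:

```python
from sklearn.ensemble import RandomForestClassifier

forest = RandomForestClassifier(
    n_estimators=200,      # trees built before votes/averages are taken
    max_features="sqrt",   # features considered at each split
    min_samples_leaf=2,    # minimum samples allowed in a leaf
    oob_score=True,        # evaluate on the out-of-bag samples
    n_jobs=-1,             # -1 = no processor limit
    random_state=42,       # fixed seed for replicable output
)
# After forest.fit(X, y), forest.oob_score_ holds the out-of-bag accuracy.
```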
To create a random forest, you would have to go through the following steps.
1. From the given total of “m” features, pick any “k” features randomly.
2. Among the “k” features, calculate the node “d” using the best split point.
3. Using the best split, create daughter nodes from the node.
4. Keep performing the above steps until the requirement for the number of nodes “l” is met.
5. Generate the forest by repeating the above steps to produce “n” tree samples.
For executing predictions, consider the following pseudocode (a from-scratch sketch of both listings follows).
1. Take the “test features” and run them through the set of rules of every (randomly generated) decision tree to predict a result, and save it. This result is referred to as a target.
2. Count the votes for all the predicted targets.
3. The predicted target with the highest number of votes is the final prediction of the algorithm.
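A compact sketch of both listings, assuming NumPy and delegating the split-point search of steps 2 and 3 to scikit-learn's DecisionTreeClassifier; build_forest and forest_predict are hypothetical names:

```python
import numpy as np
from collections import Counter
from sklearn.tree import DecisionTreeClassifier

def build_forest(X, y, n_trees=25, k=2, seed=0):
    rng = np.random.default_rng(seed)
    forest = []
    for _ in range(n_trees):
        rows = rng.integers(0, len(X), size=len(X))           # bootstrap (bagging) sample
        cols = rng.choice(X.shape[1], size=k, replace=False)  # pick k of the m features
        tree = DecisionTreeClassifier().fit(X[rows][:, cols], y[rows])
        forest.append((tree, cols))                           # repeat to get n trees
    return forest

def forest_predict(forest, x):
    # Run the test features through every tree's rules, save each target,
    # count the votes, and return the most-voted target
    votes = [tree.predict(x[cols].reshape(1, -1))[0] for tree, cols in forest]
    return Counter(votes).most_common(1)[0][0]
```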
Flash Question
What are the benefits of using Random Forest?
Convolutional Neural Networks
Convolutional neural networks are commonly used for the categorization of images, i.e., what they are able to “see.” They cluster images by common factors and finally recognize the objects within the scenes. CNNs are used for the identification of diseases (tumors), street signs, individuals, faces, and other visual information.
For hand-written information, CNNs use OCR, or optical character recognition, which digitizes textual information. As a result, it facilitates natural language processing for the transcription of symbols. CNNs are also useful for working with sound once its visuals are provided via a spectrogram. In recent times, CNNs have been highly valuable for graph convolutional networks and text analytics.
CNN is one of the core success factors behind the empowerment of ML in real-world applications, aiding computer vision in drones, medical diagnoses, robotics, security, and self-driving cars.
The meaning of convolutional is “to roll together.” Mathematically, a convolution is an integral which measures the overlap between any two functions as one passes over the other. In other words, convolution is an approach in which two functions are mixed through multiplication.
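A one-dimensional sketch of this mixing, assuming NumPy; as the kernel rolls over the signal, the overlapping values are multiplied and summed:

```python
import numpy as np

signal = np.array([0, 1, 2, 3, 4], dtype=float)
kernel = np.array([1, 0, -1], dtype=float)

# At each position, multiply the overlapping entries and sum them
print(np.convolve(signal, kernel, mode="valid"))  # [2. 2. 2.]
```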
CNNs process images as tensors. For those who do not know, tensors are just multidimensional matrices which store numbers in their content. To understand, suppose a scalar value—just a simple number, like [5]. A vector can store multiple values, like [2, 3, 4]. A matrix is a rectangular grid which possesses rows and columns—just like a normal spreadsheet.
Geometrically speaking, a scalar has zero dimensions. A vector is a line with one dimension. A matrix, on the other hand, lies on the two-dimensional plane, and a stack of matrices forms a cube of three dimensions. Now, if each element of those matrices holds a feature map, then we have one more dimension: the fourth. A 2 × 2 matrix looks like this.
[4 3]
[9 6]
A tensor contains dimensions which go beyond the two-dimensional plane. Tensors are generated from arrays which are nested in other arrays, and this nesting may continue indefinitely. CNNs use 4-D tensors. The dimensionality of a tensor is also referred to as its order; for example, a tensor with an order of 3 occupies three dimensions.
For images, height and width are basic concepts—everyone gets them. However, the depth part is a little tricky. Depth represents the encoding of color. For instance, RGB encoding generates an image which is three layers deep, and each of these layers is referred to as a channel. Convolution then generates a stack of feature maps on top. This means that in a CNN, images are not mapped to the two-dimensional plane. Instead, they exist as four-dimensional tensors.
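The progression from scalar to 4-D tensor is easy to see in NumPy; the 32 × 32 image size and batch of 10 below are arbitrary:

```python
import numpy as np

scalar = np.array(5)                 # order 0: a single number
vector = np.array([2, 3, 4])         # order 1: a line of values
matrix = np.array([[4, 3], [9, 6]])  # order 2: rows and columns
image  = np.zeros((32, 32, 3))       # order 3: height x width x RGB channels
batch  = np.zeros((10, 32, 32, 3))   # order 4: what a CNN actually consumes

for t in (scalar, vector, matrix, image, batch):
    print(t.ndim, t.shape)
```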
As you might have understood by now, CNNs do not perceive images in the same way as humans do. Hence, it is necessary to understand how images are processed through a convolutional network.