Normalizing data

Data normalization is a crucial preprocessing step in machine learning. In general, data normalization is a process that transforms data measured on different scales to a common scale. Feature values in a dataset can have very different scales: for example, height can be given in centimeters, with small values, while income can take large values. This difference has a significant impact on many machine learning algorithms. For example, if the values of one feature are several times larger than the values of the other features, this feature will dominate the others in classification algorithms based on the Euclidean distance. Some algorithms have a strong requirement for normalization of input data; an example of such an algorithm is the Support Vector Machine (SVM) algorithm. Neural networks also usually require normalized input data. Data normalization also has an impact on optimization algorithms. For example, optimizers based on the gradient descent (GD) approach can converge much more quickly if all features have the same scale.
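The effect of scale on the Euclidean distance can be illustrated with a minimal sketch. The three samples and their values are hypothetical, chosen only to show how a feature with large values can dominate the distance; the code uses `math.dist` from the Python standard library (Python 3.8 or later).

```python
import math

# Hypothetical samples: (height in centimeters, income in currency units).
a = (170.0, 30000.0)
b = (190.0, 30000.0)  # differs from a by 20 cm, a large difference in height
c = (170.0, 30100.0)  # differs from a by 100 units, a tiny difference in income

dist_ab = math.dist(a, b)
dist_ac = math.dist(a, c)

# Without normalization, c appears farther from a than b does,
# purely because income is measured on a much larger scale.
print(dist_ab, dist_ac)
```

After normalization, both features would contribute to the distance on comparable terms.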

There are several methods of normalization; from our point of view, the most popular are standardization, min-max normalization, and mean normalization.

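Standardization is a process of making data have a zero mean and a standard deviation equal to 1. The formula for the standardized vector is $x' = \frac{x - \bar{x}}{\sigma}$, where $x$ is the original vector, $\bar{x}$ is the average value of $x$ calculated with the formula $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$, and $\sigma$ is the standard deviation of $x$ calculated with the formula $\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2}$.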
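As a minimal sketch of standardization, the following pure-Python function subtracts the mean from each value and divides by the standard deviation. The function name `standardize` and the sample heights are hypothetical, and the sketch assumes the population standard deviation (dividing by the number of values rather than by one less).

```python
import math

def standardize(values):
    """Return values rescaled to zero mean and unit standard deviation."""
    n = len(values)
    mean = sum(values) / n
    # Population standard deviation: divide by n, not n - 1 (an assumption).
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if std == 0:
        raise ValueError("cannot standardize a constant feature")
    return [(v - mean) / std for v in values]

# Hypothetical feature values: heights in centimeters.
heights_cm = [150.0, 160.0, 170.0, 180.0, 190.0]
standardized = standardize(heights_cm)
print(standardized)
```

A constant feature has zero standard deviation, so the sketch raises an error rather than dividing by zero; other policies, such as returning all zeros, are equally reasonable.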

Min-max normalization, or rescaling, is a process of making data fit the range [0, 1]. We can do rescaling with the following formula: $x' = \frac{x - \min(x)}{\max(x) - \min(x)}$
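Min-max rescaling can be sketched in a few lines: subtract the minimum from each value and divide by the difference between the maximum and the minimum. The function name `min_max_rescale` and the income values are hypothetical.

```python
def min_max_rescale(values):
    """Return values linearly mapped onto the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot rescale a constant feature")
    # The minimum maps to 0 and the maximum maps to 1.
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical feature values: incomes.
incomes = [20000.0, 35000.0, 50000.0, 80000.0]
rescaled = min_max_rescale(incomes)
print(rescaled)
```

Because the result depends only on the minimum and the maximum, a single outlier can compress all other values into a narrow part of the range.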

Mean normalization is used to fit data into the range [-1, 1] so that its mean becomes zero. We can use the following formula to do mean normalization: $x' = \frac{x - \bar{x}}{\max(x) - \min(x)}$, where $\bar{x}$ is the average value of $x$.
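Mean normalization differs from min-max rescaling only in what is subtracted: the mean instead of the minimum. The following is a minimal sketch; the function name `mean_normalize` and the sample values are hypothetical.

```python
def mean_normalize(values):
    """Return values centered on zero and scaled by the value range."""
    mean = sum(values) / len(values)
    span = max(values) - min(values)
    if span == 0:
        raise ValueError("cannot normalize a constant feature")
    # Every result lies in [-1, 1] because |v - mean| never exceeds the span.
    return [(v - mean) / span for v in values]

# Hypothetical feature values.
values = [1.0, 2.0, 3.0, 6.0]
normalized = mean_normalize(values)
print(normalized)
```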

Consider how we can implement these normalization techniques and which machine learning framework functions can be used to calculate them.

We assume that each row of the data matrix is one training sample, and that each column holds the values of one feature across the samples.
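Under this row-per-sample layout, normalization is applied to each column (feature) independently. The following sketch standardizes the columns of a small matrix stored as a list of rows, using only the Python standard library rather than any particular machine learning framework. The function name `standardize_columns`, the sample data, and the choice to map a constant column to zeros are all assumptions made for illustration.

```python
import math

def standardize_columns(samples):
    """Standardize each feature column of a row-per-sample matrix."""
    columns = list(zip(*samples))  # transpose: one tuple per feature
    scaled_columns = []
    for col in columns:
        n = len(col)
        mean = sum(col) / n
        std = math.sqrt(sum((v - mean) ** 2 for v in col) / n)
        if std == 0:
            # Assumption: a constant feature is mapped to all zeros.
            scaled_columns.append([0.0] * n)
        else:
            scaled_columns.append([(v - mean) / std for v in col])
    # Transpose back so that each row is again one sample.
    return [list(row) for row in zip(*scaled_columns)]

# Hypothetical samples: [height in centimeters, income].
samples = [
    [170.0, 30000.0],
    [180.0, 60000.0],
    [160.0, 45000.0],
]
scaled = standardize_columns(samples)
print(scaled)
```

The statistics are computed per column, so the height and income features end up on the same scale regardless of their original units.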
