Simple linear regression

Simple linear regression (SLR) fits a straight line to data that shows a linear trend, applying feature scaling first if required. Feature scaling is a method used to balance the effects of attributes measured on very different scales. Machine learning models are mathematical in nature, so before training a model on the data, we need to apply a few preprocessing steps to make sure the predictions it makes are not biased.

For example, suppose the dataset contains three attributes: age, salary, and item_purchased (0/1). We know that the age group likely to visit shops is roughly between 10 and 70, while salaries can range from 10,000 to 100,000 or higher. When making a prediction, we want to take both parameters into consideration, so that we know which age group with what salary is most likely to purchase the product. However, if we train the model without scaling age and salary to the same level, the effect of age will be overshadowed by salary, simply because of the large numeric difference between them. To make sure this does not happen, we apply feature scaling to the dataset to balance the attributes out.
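As a minimal sketch of this idea, min-max scaling (one common feature-scaling method; the sample values here are illustrative) maps each attribute into the same [0, 1] range so that neither attribute dominates:

```python
def min_max_scale(values):
    """Map a list of numbers into the [0, 1] range (min-max scaling)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

# Illustrative raw data: ages span tens, salaries span tens of thousands.
ages = [32, 26, 70]
salaries = [70_000, 40_000, 100_000]

scaled_ages = min_max_scale(ages)
scaled_salaries = min_max_scale(salaries)
# After scaling, both attributes lie between 0 and 1, so the large
# numeric values of salary no longer overshadow age.
```

Standardization (subtracting the mean and dividing by the standard deviation) is an equally common alternative; which one to use depends on the model and the data.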

Another required step is data encoding, using a one-hot encoder. For example, if the dataset has a country attribute, this is a categorical value with, let's say, three categories: Russia, USA, and UK. These words do not make sense to a mathematical model. Using a one-hot encoder, we transform the dataset so that its columns read (id, age, salary, Russia, USA, UK, item_purchased). Now, every customer from Russia has the number 1 under the column named Russia, and the number 0 under the USA and UK columns.
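One-hot encoding can be sketched in a few lines of plain Python (libraries such as scikit-learn's OneHotEncoder or pandas' get_dummies do the same job in practice; the records below are illustrative):

```python
def one_hot(rows, column, categories):
    """Replace a categorical column with one 0/1 column per category."""
    encoded = []
    for row in rows:
        # Copy every attribute except the categorical one...
        new_row = {k: v for k, v in row.items() if k != column}
        # ...then add a 0/1 indicator column for each category.
        for cat in categories:
            new_row[cat] = 1 if row[column] == cat else 0
        encoded.append(new_row)
    return encoded

customers = [
    {"id": 1, "country": "USA", "purchased": 1},
    {"id": 2, "country": "Russia", "purchased": 1},
]
encoded = one_hot(customers, "country", ["Russia", "USA", "UK"])
# encoded[1] -> {"id": 2, "purchased": 1, "Russia": 1, "USA": 0, "UK": 0}
```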

As an example, let's say the data initially looks as follows:

| ID | Country | Age | Salary | Purchased |
|----|---------|-----|--------|-----------|
| 1  | USA     | 32  | 70 K   | 1         |
| 2  | Russia  | 26  | 40 K   | 1         |
| 3  | UK      | 32  | 80 K   | 0         |

After performing the data transformations, we would get the following dataset:

| ID | Russia | USA | UK | Age | Salary | Purchased |
|----|--------|-----|----|-----|--------|-----------|
| 1  | 0      | 1   | 0  | 0.5 | 0.7    | 1         |
| 2  | 1      | 0   | 0  | 0.4 | 0.4    | 1         |
| 3  | 0      | 0   | 1  | 0.5 | 0.8    | 0         |

It can be seen that the resulting dataset is purely numeric, so we can now give it to our regression model to learn from and then make predictions.

It should be noted that the input variables that help to make the prediction are called independent variables. In the preceding example, country, age, and salary are the independent variables. The output variable that defines the prediction is called the dependent variable, which is the Purchased column in our case.
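To show what "learning" means for SLR, here is a minimal ordinary least squares fit of one independent variable against the dependent variable (the numbers are the illustrative scaled values from the table above; a real model would use all independent variables and, for a 0/1 outcome, typically a classification model):

```python
def fit_slr(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Closed-form OLS: slope = covariance(x, y) / variance(x).
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
        / sum((x - mean_x) ** 2 for x in xs)
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Scaled salary (independent) against Purchased (dependent).
salary_scaled = [0.7, 0.4, 0.8]
purchased = [1, 1, 0]
intercept, slope = fit_slr(salary_scaled, purchased)

# Predict for a new customer with a scaled salary of 0.5.
prediction = intercept + slope * 0.5
```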
