How does the regression model work?

Our objective is to train a machine learning model on the dataset and then ask the model to make predictions in order to establish the salary that should be given to an employee based on their years of experience.

The example that we are considering is based on an Excel sheet. Basically, we have data from a company where we have a salary structure based on years of experience. We want our machine learning model to derive the correlation between the years of experience and the salary given. From the derived correlation, we want the model to provide future predictions and specify the modeled salary. The machine does this through simple linear regression. In simple linear regression, various lines are drawn through the given scattered data (trend lines). The idea of the trend line is it should best-fit (cut across) all the scattered data. After that, the best trend line is chosen by computing the modeled differences. This can be further explained as follows:

Continuing with the same example, let's take the case of an employee "e" who is earning a salary of 100,000 after 10 years of experience in their actual job. According to the model, however, the employee should be earning a little less than what he is actually earning, as shown by the green  + and the line beneath the green + is actually less than the line followed by the organization (the modeled salary). The green dotted line represents the difference between the actual salary and the modeled salary (~=80K). It is given by yi -yi^, where yi is actual salary and yi^ is the mode.

SLR draws all possible trend lines through your data, then computes the sum (y-y^)2 for the whole line. It then finds the minimum of the computed squares. The line with the minimum sum of the squares is considered to be the one that would best fit the data. This method is called the least squares method or the Euclidean distance methodThe least squares method is a form of mathematical regression analysis that finds the line of best fit for a dataset, providing a visual demonstration of the relationship between the data points.

The following screenshot represents the various prediction lines drawn by a regression model:

Based on the sum of squares method, the best fitting line is chosen, as shown here:

Basically, the data points plotted are not in a line, but the the actual dots are plotted symmetrically either side of the straight line, as shown here:

The following section represents the code to implement SLR:

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.137.192.3