Assessing linear regression models

We'll once again use the lm() function to fit linear regression models to our data. For both of our data sets, we'll want to use all the input features that remain in our respective data frames. R provides us with a shorthand to write formulas that include all the columns of a data frame as features, excluding the one chosen as the output. This is done using a single period, as the following code snippets show:

> machine_model1 <- lm(PRP ~ ., data = machine_train)
> cars_model1 <- lm(Price ~ ., data = cars_train)

Training a linear regression model may be a one-line affair once we have all our data prepared, but the important work comes straight after, when we study our model in order to determine how well we did. Fortunately, we can instantly obtain some important information about our model using the summary() function. The output of this function for our CPU data set is shown here:

> summary(machine_model1)

Call:
lm(formula = PRP ~ ., data = machine_train)

Residuals:
    Min      1Q  Median      3Q     Max 
-199.29  -24.15    6.91   26.26  377.47 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept) -5.963e+01  8.861e+00  -6.730 2.43e-10 ***
MYCT         5.210e-02  1.885e-02   2.764 0.006335 ** 
MMIN         1.543e-02  2.025e-03   7.621 1.62e-12 ***
MMAX         5.852e-03  6.867e-04   8.522 7.68e-15 ***
CACH         5.311e-01  1.494e-01   3.555 0.000488 ***
CHMIN        7.761e-02  1.055e+00   0.074 0.941450    
CHMAX        1.498e+00  2.304e-01   6.504 8.20e-10 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 61.31 on 172 degrees of freedom
Multiple R-squared:  0.874,	Adjusted R-squared:  0.8696 
F-statistic: 198.8 on 6 and 172 DF,  p-value: < 2.2e-16

Following a repeat of the call we made to the lm() function itself, the information provided by the summary() function is organized into three distinct sections. The first section is a summary of the model residuals, which are the errors that our model makes on the observations in the data on which it was trained. The second section is a table containing the estimated values of the model coefficients along with the results of their significance tests. The final few lines display overall performance metrics for our model. If we repeat the same process on our cars data set, we will notice the following line in our model summary:

Coefficients: (1 not defined because of singularities)

This occurs because we still have a feature whose effect on the output cannot be distinguished from the effects of other features due to underlying dependencies between them. This phenomenon is known as aliasing. The alias() function shows the features we need to remove from the model:

> alias(cars_model1)
Model :
Price ~ Mileage + Cylinder + Doors + Cruise + Sound + Leather + 
    Buick + Cadillac + Chevy + Pontiac + Saab + Saturn + convertible + hatchback + sedan

Complete :
       (Intercept) Mileage Cylinder Doors Cruise Sound
Saturn  1           0       0        0     0      0   
       Leather Buick Cadillac Chevy Pontiac Saab convertible
Saturn  0      -1    -1       -1    -1      -1    0         
       hatchback sedan
Saturn  0         0   

As we can see, the problematic feature is Saturn, so we will remove it and retrain the model. To exclude a feature from a linear regression model, we add it to the formula after the period, prefixed with a minus sign:

> cars_model2 <- lm(Price ~ . - Saturn, data = cars_train)
> summary(cars_model2)

Call:
lm(formula = Price ~ . - Saturn, data = cars_train)

Residuals:
    Min      1Q  Median      3Q     Max 
-9324.8 -1606.7   150.5  1444.6 13461.0 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  -954.1919  1071.2553  -0.891  0.37340    
Mileage        -0.1877     0.0137 -13.693  < 2e-16 ***
Cylinder     3640.5417   123.5788  29.459  < 2e-16 ***
Doors        1552.4008   284.3939   5.459 6.77e-08 ***
Cruise        330.0989   324.8880   1.016  0.30998    
Sound         388.4549   256.3885   1.515  0.13022    
Leather       851.3683   274.5213   3.101  0.00201 ** 
Buick        1104.4670   595.0681   1.856  0.06389 .  
Cadillac    13288.4889   673.6959  19.725  < 2e-16 ***
Chevy        -553.1553   468.0745  -1.182  0.23772    
Pontiac     -1450.8865   524.9950  -2.764  0.00587 ** 
Saab        12199.2093   600.4454  20.317  < 2e-16 ***
convertible 11270.4878   597.5162  18.862  < 2e-16 ***
hatchback   -6375.4970   669.6840  -9.520  < 2e-16 ***
sedan       -4441.9152   490.8347  -9.050  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2947 on 669 degrees of freedom
Multiple R-squared:  0.912,	Adjusted R-squared:  0.9101 
F-statistic: 495.1 on 14 and 669 DF,  p-value: < 2.2e-16

Residual analysis

A residual is simply the error our model makes for a particular observation. Put differently, it is the difference between the actual value of the output and our prediction:

$$e_i = y_i - \hat{y}_i$$

Analyzing residuals is very important when building a good regression model, as residuals reveal various aspects of our model, from violated assumptions and the quality of the fit to other problems, such as outliers. To understand the metrics in the residual summary, imagine ordering the residuals from smallest to largest. Besides the minimum and maximum values that occur at the extremes of this sequence, the summary shows the first and third quartiles, which are the values located one quarter and three quarters of the way along this sequence, respectively. The median is the value in the middle of the sequence. The interquartile range is the portion of the sequence between the first and third quartiles and, by definition, it contains half of the data.

Looking first at the residual summary from our CPU model, it is interesting to note that the first and third quartiles are quite small in value compared to the minimum and maximum values. This is a first indication that there might be a few points with a large residual error. In an ideal scenario, our residuals will have a median of zero and small values for the quartiles. We can reproduce the residual summary shown by the summary() function by noting that the model produced by lm() has a residuals attribute:

> summary(cars_model2$residuals)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
-9325.0 -1607.0   150.5     0.0  1445.0 13460.0 
> mean(cars_train$Price)
[1] 21320.2

Note that in the preceding example for our cars model, we need to compare the value of the residuals against the average value of the output variable in order to get a sense of whether the residuals are large or not. The previous results show that the average selling price of a car in our training data is around $21k, and 50 percent of our predictions are roughly within ±$1.6k of the correct value, which seems fairly reasonable. The residuals for our CPU model are all much smaller in absolute value because the values of the output variable for that model, namely the published relative performance, are much smaller than the values of Price in the cars model.

In linear regression, we assume that the irreducible errors in our model are randomly distributed with a normal distribution. A diagnostic plot, known as the Quantile-Quantile plot (Q-Q plot), is useful in helping us visually gauge the extent to which this assumption holds. The key idea behind this plot is that we can compare two distributions by comparing the values at their quantiles. The quantiles of a distribution are cut points that divide the range of a random variable into consecutive intervals of equal probability; for example, quartiles are 4-quantiles because they split up a distribution into four equally probable parts. If the two distributions are the same, then a plot of the quantiles of one against the quantiles of the other should produce the line y = x. To check whether our residuals are normally distributed, we can plot their quantiles against those of a theoretical normal distribution and see how close to a straight line we land.

Tip

There are many other ways to check whether the model residuals are normally distributed. A good place to look is the nortest R package, which implements a number of well-known tests for normality, including the Anderson-Darling test and the Lilliefors test. In addition, the stats package contains the shapiro.test() function for performing the Shapiro-Wilk normality test.
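As a minimal sketch of how such a check might look (the last two tests assume the nortest package has been installed), we could apply these tests directly to the residuals of our CPU model:

> shapiro.test(machine_model1$residuals)
> library(nortest)
> ad.test(machine_model1$residuals)     # Anderson-Darling test
> lillie.test(machine_model1$residuals) # Lilliefors test

In each case, a small p-value would be evidence against the hypothesis that the residuals are normally distributed.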

The following code generates Q-Q plots for our two data sets:

> par(mfrow = c(2, 1))
> machine_residuals <- machine_model1$residuals
> qqnorm(machine_residuals, main = "Normal Q-Q Plot for CPU data set")
> qqline(machine_residuals)
> cars_residuals <- cars_model2$residuals
> qqnorm(cars_residuals, main = "Normal Q-Q Plot for Cars data set")
> qqline(cars_residuals)

The following diagram displays the Q-Q plots:

[Figure: Normal Q-Q plots for the CPU and cars data sets]

The residuals from both models seem to lie reasonably close to the theoretical quantiles of a normal distribution, although the fit isn't perfect, as is typical with most real-world data. A second very useful diagnostic plot for a linear regression is the so-called residual plot. This is a plot of the residuals against the corresponding fitted values for the observations in the training data; in other words, it is a plot of the pairs $(\hat{y}_i, e_i)$. There are two important properties of the residual plot that interest us in particular. Firstly, we would like to confirm our assumption of constant variance by checking that the residuals are not systematically larger on average over one range of fitted values and smaller over another. Secondly, we should verify that there isn't some sort of pattern in the residuals. If a pattern is observed, it may be an indication that the underlying model is nonlinear in terms of the features involved or that there are additional features missing from our model that we have not included. In fact, one way of discovering new features that might be useful for our model is to look for features that are correlated with our model's residuals.

[Figure: Residual plots for the CPU and cars models]

Both plots show a slight pattern of decreasing residuals in the left part of the graph. Slightly more worrying is the fact that the variance of the residuals seems to be a little higher for higher values of both output variables, which could indicate that the errors are not homoscedastic. This is more pronounced in the second plot, for the cars data set. In the preceding two residual plots, we have also labeled some of the larger residuals (in absolute magnitude); we'll see shortly that these are potential candidates for outliers. Another way to obtain a residual plot is to use the plot() function on the model produced by the lm() function itself. This generates four diagnostic plots, including the residual plot and the Q-Q plot.
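For reference, residual plots like the ones shown above can be produced along the following lines; this is only a sketch and does not reproduce the labeling of the largest residuals:

> par(mfrow = c(2, 1))
> plot(machine_model1$fitted.values, machine_model1$residuals,
       xlab = "Fitted Values", ylab = "Residuals",
       main = "Residual Plot for CPU data set")
> abline(h = 0, lty = 2)
> plot(cars_model2$fitted.values, cars_model2$residuals,
       xlab = "Fitted Values", ylab = "Residuals",
       main = "Residual Plot for Cars data set")
> abline(h = 0, lty = 2)

The which argument of plot() can also be used to select individual diagnostic plots; for example, plot(machine_model1, which = 1) displays just the residual plot.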

Significance tests for linear regression

After scrutinizing the residual summaries, the next thing we should focus on is the table of coefficients that our models have produced. Here, every estimated coefficient is accompanied by an additional set of numbers, as well as a number of stars or a dot at the end. At first, this may seem confusing because of the barrage of numbers, but there is a good reason why all this information is included. When we collect measurements on some data and specify a set of features to build a linear regression model, it is often the case that one or more of these features are not actually related to the output we are trying to predict. Of course, this is something we are generally not aware of beforehand when we are collecting the data. Ideally, we would want our model to not only find the best values for the coefficients that correspond to the features that our output does actually depend on, but also tell us which of the features we don't need.

One possible approach for determining whether a particular feature is needed in our model is to train two models instead of one. The second model has all the features of the first model, excluding the specific feature whose significance we are trying to ascertain. We can then test whether the two models are different by looking at their distributions of residuals. This is essentially what R does for all of the features that we have specified in each model. For each coefficient, a significance test is carried out under the null hypothesis that its corresponding feature is unrelated to the output variable. Specifically, for each coefficient, we consider a linear model with all the other features included, except the feature that corresponds to this coefficient. Then, we test whether adding this particular feature to the model significantly changes the distribution of residual errors, which would be evidence of a linear relationship between this feature and the output, and hence that its coefficient should be nonzero. R's lm() function automatically runs these tests for us.

Note

In statistics, a confidence interval combines a point estimate with the precision of that estimate. This is done by specifying an interval in which the true value of the parameter that is being estimated is expected to lie under a certain degree of confidence. A 95 percent confidence interval for a parameter essentially tells us that if we were to collect 100 samples of data from the same experiment and construct a 95 percent confidence interval for the estimated parameter in each sample, the real value of the target parameter would lie within its corresponding confidence interval for 95 of these data samples. Confidence intervals that are constructed for point estimates with high variance, such as when the estimate is being made with very few data points, will tend to define a wider interval for the same degree of confidence than estimates made with low variance.
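As a quick illustration of this idea, R provides the confint() function, which computes confidence intervals for the coefficients of a fitted lm model (output omitted here):

> confint(machine_model1, level = 0.95)
> confint(machine_model1, "MYCT")

A coefficient whose 95 percent confidence interval excludes zero is precisely one whose p-value in the summary table falls below 0.05.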

Let's look at a snapshot of the summary output for the CPU model, which shows the coefficients for the intercept and the MYCT feature:

              Estimate Std. Error t value Pr(>|t|)    
(Intercept) -5.963e+01  8.861e+00  -6.730 2.43e-10 ***
MYCT         5.210e-02  1.885e-02   2.764 0.006335 ** 

Focusing on the MYCT feature for the moment, the first number in its row is the estimate of its coefficient, and this number is roughly 0.05 (5.210e-02). The standard error is the standard deviation of this estimate, and this is given next as 0.01885. We can gauge our confidence as to whether the value of our coefficient is really zero (indicating no linear relationship for this feature) by counting the number of standard errors between zero and our coefficient estimate. To do this, we can divide our coefficient estimate by our standard error, and this is precisely the definition of the t-value, the third value in the row:

> (q <- 5.210e-02 / 1.885e-02)
[1] 2.763926

So, our MYCT coefficient is almost three standard errors away from zero, which is a fairly good indicator that this coefficient is unlikely to be zero. The larger the absolute value of the t-value, the stronger the evidence that our feature belongs in the linear model with a nonzero coefficient. We can convert this t-value into a probability of observing a t-value at least this large in absolute terms if the coefficient were really zero. This probability is obtained from Student's t-distribution and is known as the p-value. For the MYCT feature, this probability is 0.006335, which is small. We can obtain this value for ourselves using the pt() function:

> pt(q, df = 172, lower.tail = F) * 2
[1] 0.006333496

The pt() function is the distribution function for the t-distribution, which is symmetric. To understand why our p-value is computed this way, note that we are interested in the probability of the absolute value of the t-value being larger than the value we computed. To obtain this, we first obtain the probability of the upper or right tail of the t-distribution and multiply this by two in order to include the lower tail as well. Working with basic distribution functions is a very important skill in R, and we have included examples in our online tutorial chapter on R if this example seems overly difficult. The t-distribution is parameterized by the degrees of freedom.
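The same computation can be repeated for every coefficient at once by extracting the coefficient matrix from the summary object; the following sketch should reproduce the Pr(>|t|) column of the table (output omitted):

> machine_coefs <- summary(machine_model1)$coefficients
> t_values <- machine_coefs[, "Estimate"] / machine_coefs[, "Std. Error"]
> 2 * pt(abs(t_values), df = 172, lower.tail = F)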

Note

The number of degrees of freedom is essentially the number of values that we can freely change when calculating a particular statistic, such as a coefficient estimate. In our linear regression context, this amounts to the number of observations in our training data minus the number of parameters in the model (the number of regression coefficients, including the intercept). For our CPU model, this number is 179 – 7 = 172. For the cars model, where we have more data points, this number is 669. The name comes from its relation to the number of independent dimensions or pieces of information that are applied as input to a system, and hence the extent to which the system can be freely configured without violating any constraints on the input.
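We can check these numbers directly, as a model trained with lm() stores its residual degrees of freedom in the df.residual attribute:

> machine_model1$df.residual
[1] 172
> cars_model2$df.residual
[1] 669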

As a general rule of thumb, we would like our p-values to be less than 0.05, which is the same as saying that we would like to have 95 percent confidence intervals for our coefficient estimates that do not include zero. The number of stars next to each coefficient provides us with a quick visual aid for what the confidence level is, and a single star corresponds to our 95 percent rule of thumb while two stars represent a 99 percent confidence interval. Consequently, every coefficient in our model summary that does not have any stars corresponds to a feature that we are not confident we should include in our model using our rule of thumb. In the CPU model, the CHMIN feature is the only feature that is suspect, with the other p-values being very small. The situation is different with the cars model. Here, we have four features that are suspect as well as the intercept.
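To list the suspect features of the cars model programmatically, we can filter the coefficient table by p-value, as in this short sketch:

> cars_coefs <- summary(cars_model2)$coefficients
> rownames(cars_coefs)[cars_coefs[, "Pr(>|t|)"] > 0.05]
[1] "(Intercept)" "Cruise"      "Sound"       "Buick"       "Chevy"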

It is important to properly understand the interpretation of p-values in the context of our linear regression model. Firstly, we cannot and should not compare p-values against each other in order to gauge which feature is the most important. Secondly, a high p-value does not necessarily indicate that there is no linear relationship between a feature and the output; it only suggests that, in the presence of all the other model features, this feature does not provide any new information about the output variable. Finally, we should always remember that the 95 percent rule of thumb is not infallible and is only really useful when the number of features, and hence coefficients, is not very large. At a 5 percent significance level, if we have 1,000 features in our model that are actually unrelated to the output, we can expect around 50 of them to appear significant purely by chance. Consequently, linear regression coefficient significance tests aren't as useful for problems in high dimensions.

The final test of significance appears on the very last line of the summary of the lm() output. This line provides us with the F statistic, which gets its name from the F test, a test of whether the variances of two (ideally normal) distributions are significantly different. The F statistic in this case assesses whether the variance of the residuals from a model in which all the coefficients are zero is significantly different from the variance of the residuals from our trained model.

Put differently, the F test will tell us whether the trained model explains some of the variance in the output, in which case we know that at least one of the coefficients must be nonzero. Although it doesn't tell us which coefficients are nonzero, the F test assesses the significance of the coefficients jointly and so doesn't suffer from the same multiple-comparisons problem as the t-tests on the individual coefficients. The summary shows a tiny p-value for this test, so we know that at least one of our coefficients is nonzero. We can reproduce the F test that was run using the anova() function, whose name stands for analysis of variance. This test compares the null model, which is the model built with just an intercept and none of the features, with our trained model. We'll show this here for the CPU data set:

> machine_model_null <- lm(PRP ~ 1, data = machine_train)
> anova(machine_model_null, machine_model1)
Analysis of Variance Table

Model 1: PRP ~ 1
Model 2: PRP ~ MYCT + MMIN + MMAX + CACH + CHMIN + CHMAX
  Res.Df     RSS Df Sum of Sq      F    Pr(>F)    
1    178 5130399                                  
2    172  646479  6   4483919 198.83 < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Note that the formula of the null model is PRP ~ 1, where the 1 represents the intercept.
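To demystify the F statistic a little, the following sketch reproduces it by hand from the two residual sums of squares in the anova table, along with its p-value from the pf() distribution function (output omitted):

> rss_null <- sum(machine_model_null$residuals ^ 2)
> rss_full <- sum(machine_model1$residuals ^ 2)
> (f_stat <- ((rss_null - rss_full) / 6) / (rss_full / 172))
> pf(f_stat, df1 = 6, df2 = 172, lower.tail = F)

The first value should come out at roughly 198.83, matching the anova table, and the second should be vanishingly small, in line with the reported Pr(>F) of less than 2.2e-16.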

Performance metrics for linear regression

The final details in our summary are concerned with the performance of the model as a whole and the degree to which the linear model fits the data. To understand how we assess a linear regression fit, we should first point out that the training criterion of the linear regression model is to minimize the MSE on the data. In other words, fitting a linear model to a set of data points amounts to finding the line whose slope and position minimize the sum (or average) of the squared distances from these points. As we refer to the error between a data point and its predicted value on the line as a residual, we can define the Residual Sum of Squares (RSS) as the sum of all the squared residuals:

$$\mathrm{RSS} = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2$$

In other words, the RSS is just the Sum of Squared Errors (SSE), so we can relate it to the MSE, with which we are already familiar, via this simple equation:

$$\mathrm{MSE} = \frac{\mathrm{RSS}}{n}$$

Beyond certain historic reasons, RSS is an important metric to be aware of because it is related to another important metric, known as the RSE, which we will talk about next. For this, we'll need to first build up an intuition about what happens when we train linear regression models. If we run our simple linear regression experiment with artificial data a number of times, each time changing the random seed so that we get a different random sample, we'll see that we will get a number of regression lines that are likely to be very close to the true population line, just as our single run showed us. This illustrates the fact that linear models are characterized by low variance in general. Of course, the unknown function we are trying to approximate may very well be nonlinear and as a result, even the population regression line is not likely to be a good fit to the data for nonlinear functions. This is because the linearity assumption is very strict, and consequently, linear regression is a method with high bias.

We define a metric known as the Residual Standard Error (RSE), which estimates the standard deviation of our model's errors about the target function. That is to say, it measures roughly how far away from the population regression line, on average, our model's predictions will be. The RSE is measured in the units of the output variable and is an absolute value; consequently, it needs to be compared against the values of y in order to gauge whether it is high or not for a particular sample. The general RSE for a model with k input features is computed as follows:

$$\mathrm{RSE} = \sqrt{\frac{\mathrm{RSS}}{n - k - 1}}$$

For simple linear regression, where k = 1, this reduces to:

$$\mathrm{RSE} = \sqrt{\frac{\mathrm{RSS}}{n - 2}}$$

We can compute the RSE for our two models using the preceding formula, as follows:

> n_machine <- nrow(machine_train)
> k_machine <- length(machine_model1$coefficients) - 1
> sqrt(sum(machine_model1$residuals ^ 2) / (n_machine - k_machine - 1))
[1] 61.30743

> n_cars <- nrow(cars_train)
> k_cars <- length(cars_model2$coefficients) - 1
> sqrt(sum(cars_model2$residuals ^ 2) / (n_cars - k_cars - 1))
[1] 2946.98

To interpret the RSE values for our two models, we need to compare them with the mean of our output variables:

> mean(machine_train$PRP)
[1] 109.4804
> mean(cars_train$Price)
[1] 21320.2

Note that the RSE of 61.3 for the CPU model is quite small compared to the RSE of the cars model, which is roughly 2947. When we look at these numbers in terms of how close they are to the means of their respective output variables, however, we learn that it is actually the cars model RSE that indicates a better fit.

Now, although the RSE is useful as an absolute value in that one can compare it to the mean of the output variable, we often want a relative value that we can use to compare across different training scenarios. To this end, when evaluating the fit of linear regression models, we often also look at the R2 statistic. In the summary, this is denoted as multiple R-squared. Before we provide the equation, we'll first present the notion of the Total Sum of Squares (TSS). The total sum of squares is proportional to the total variance in the output variable, and is designed to measure the amount of variability intrinsic to this variable before we perform our regression. The formula for TSS is:

$$\mathrm{TSS} = \sum_{i=1}^{n} \left(y_i - \bar{y}\right)^2$$

The idea behind the R2 statistic is that if a linear regression model is a close fit to the true population model, it should be able to capture all of the variance in the output. In fact, we often describe the R2 statistic as the proportion of the output variance that is explained by the regression. When we apply our regression model to the training data, the errors it makes on the observations are the residuals, and the RSS is essentially proportional to the variance that remains between our predictions and the true values of the output. Consequently, we can define the R2 statistic, the amount of variance in our output y that our linear regression model explains, as the difference between our starting variance (TSS) and our remaining variance (RSS), relative to our starting variance (TSS). As a formula, this is nothing other than:

$$R^2 = \frac{\mathrm{TSS} - \mathrm{RSS}}{\mathrm{TSS}} = 1 - \frac{\mathrm{RSS}}{\mathrm{TSS}}$$

From this equation, we can see that R2 ranges between 0 and 1. A value close to 1 is indicative of a good fit, as it means that most of the variance of the output variable has been explained by the regression model. A low value, on the other hand, means that a substantial amount of variance remains in the model's errors, a sign that our model is not a good fit. Let's see how the R2 statistic can be computed manually for our two models:

compute_rsquared <- function(x, y) {
     rss <- sum((x - y) ^ 2)
     tss <- sum((y - mean(y)) ^ 2)
     return(1 - (rss / tss))
 }
 
> compute_rsquared(machine_model1$fitted.values, machine_train$PRP)
[1] 0.8739904
> compute_rsquared(cars_model2$fitted.values, cars_train$Price)
[1] 0.9119826

We used the fitted.values attribute of the model trained by lm(), which contains the predictions the model makes on the training data. Both values are quite high, with the cars model again indicating a slightly better fit. We've now seen two important metrics for assessing a linear regression model, namely the RSE and the R2 statistic. At this point, we might wonder whether there is a more general measure of the linear relationship between two variables that we could also apply to our case. From statistics, we might recall that the notion of correlation describes exactly that.

The correlation between two random variables, X and Y, is given by:

$$\mathrm{Cor}(X, Y) = \frac{\mathrm{Cov}(X, Y)}{\sigma_X \, \sigma_Y}$$

It turns out that in the case of simple regression, the square of the correlation between the output variable and the input feature is the same as the R2 statistic, a result that further bolsters the importance of the latter as a useful metric.
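We can verify this equivalence with a quick sketch that fits a simple regression of PRP on a single feature; the choice of MMAX here is purely illustrative:

> simple_model <- lm(PRP ~ MMAX, data = machine_train)
> cor(machine_train$MMAX, machine_train$PRP) ^ 2
> summary(simple_model)$r.squared

The two values printed should be identical.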

Comparing different regression models

The R2 statistic can be very useful when we want to compare two different regression models that have been trained on the same set of input features. Often, however, we want to compare two models that don't have the same number of input features. For example, during the process of feature selection, we may want to know whether including a particular feature in our model is a good idea. One of the limitations of the R2 statistic is that it tends to be higher for models with more input features.

The adjusted R2 attempts to correct the fact that R2 always tends to be higher for models with more input features and hence is susceptible to overfitting. The adjusted R2 is generally lower than R2 itself, as we can verify by checking the values in our model summaries. The formula for the adjusted R2 is:

$$R^2_{adj} = 1 - \frac{(1 - R^2)(n - 1)}{n - k - 1}$$

Here, n and k are defined as they were for the RSE: the number of observations and the number of input features, respectively. Now, let's implement this function in R and compute the adjusted R2 for our two models:

compute_adjusted_rsquared <- function(x, y, k) {
     n <- length(y)
     R2 <- compute_rsquared(x, y)
     return(1 - ((1 - R2) * (n - 1) / (n - k - 1)))
 }

> compute_adjusted_rsquared(machine_model1$fitted.values, 
                            machine_train$PRP, k_machine)
[1] 0.8695947
> compute_adjusted_rsquared(cars_model2$fitted.values, 
                            cars_train$Price, k_cars)
[1] 0.9101407

Note

There are several other commonly used metrics of performance designed to compare models with a different number of features. The Akaike Information Criterion (AIC) uses an information-theoretic approach to assess the relative quality of a model by balancing model complexity and accuracy. For linear regression models trained by minimizing the squared error, the AIC is proportional to another well-known statistic, Mallows' Cp, so these can be used interchangeably. A third metric is the Bayesian Information Criterion (BIC), which tends to penalize models with more variables more heavily than the previous two metrics.
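As a brief sketch, both criteria can be computed for a fitted lm model with the AIC() and BIC() functions from the stats package (output omitted):

> AIC(machine_model1)
> BIC(machine_model1)
> AIC(cars_model2)
> BIC(cars_model2)

Lower values indicate a better balance between fit and complexity, but the numbers are only meaningful when comparing models fit to the same data set.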

Test set performance

So far, we've looked at the performance of our models in terms of the training data. This is important in order to gauge whether a linear model can fit the data well, but doesn't give us a good sense of predictive accuracy over unseen data. For this, we turn to our test data sets. To use our model to make predictions, we can use the predict() function. This is a general function in R that many packages extend. With models trained with lm(), we simply need to provide the model and a data frame with the observations that we want to predict:

> machine_model1_predictions <- predict(machine_model1, 
                                        machine_test)
> cars_model2_predictions <- predict(cars_model2, cars_test)

Next, we'll define our own function for computing the MSE:

compute_mse <- function(predictions, actual) { 
     mean( (predictions - actual) ^ 2 ) 
}
> compute_mse(machine_model1$fitted.values, machine_train$PRP)
[1] 3611.616
> compute_mse(machine_model1_predictions, machine_test$PRP)
[1] 2814.048
> 
> compute_mse(cars_model2$fitted.values, cars_train$Price)
[1] 8494240
> compute_mse(cars_model2_predictions, cars_test$Price)
[1] 7180150

For each model, we've used our compute_mse() function to return the training and test MSE. It happens that in this case both test MSE values are smaller than the train MSE values. Whether the test MSE is slightly larger or smaller than the train MSE is not particularly important. The important issue is that the test MSE is not significantly larger than the train MSE as this would indicate that our model is overfitting the data. Note that, especially for the CPU model, the number of observations in the original data set is very small and this has resulted in a test set size that is also very small. Consequently, we should be conservative with our confidence in the accuracy of these estimates for the predictive performance of our models on unseen data, because predictions made using a small test set size will have a higher variance.
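Because the MSE is expressed in squared units of the output variable, it is often easier to interpret its square root, the root mean squared error (RMSE), which is directly comparable to the RSE values we computed earlier; a minimal sketch:

> sqrt(compute_mse(machine_model1_predictions, machine_test$PRP))
> sqrt(compute_mse(cars_model2_predictions, cars_test$Price))

These come out at roughly 53 for the CPU model and 2680 for the cars model, which are in the same ballpark as the training RSE values of 61.3 and 2947, respectively.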
