One way to build a performance-prediction model is multiple-variable linear regression. A linear estimation should only include explanatory variables that have minimal linear dependence among themselves. As we have just seen, our explanatory variables are more or less independent of each other, which is good news. The bad news is that these variables individually also have low correlation with the dependent variable, TRS.
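The independence claim can be verified with a quick correlation check among the explanatory variables. The following is a minimal sketch; it fabricates a small stand-in for d_filt (whose real columns come from the earlier filtering step), assuming TRS sits in the last column as in the regression below:

```r
# Sketch: checking pairwise linear dependence among explanatory variables.
# d_filt is a fabricated stand-in here, purely for illustration.
set.seed(42)
d_filt <- data.frame(
  Cash.Assets.Y.1 = rnorm(100),
  P.E.Y.1         = rnorm(100),
  TRS             = rnorm(100)
)
m <- ncol(d_filt)
corr <- cor(d_filt[, -m])   # correlations among explanatory variables only
round(corr, 2)
# Rule of thumb: flag pairs with |r| above ~0.7 as potentially collinear
high <- which(abs(corr) > 0.7 & upper.tri(corr), arr.ind = TRUE)
```

If the high matrix has no rows, no pair of explanatory variables is strongly collinear, and all of them may enter the regression together.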
To find the best linear estimation, we may choose from several stepwise methods. One option is to start with all variables and ask R to drop, step by step, the one with the lowest significance (the backward method). Under another widely used method, R starts with a single variable and enters, step by step, the one with the highest additional explanatory power (the forward method). Here, we picked the backward method, as the forward one could not end with a significant model:
library(MASS)
vars <- colnames(d_filt)
m <- length(vars)
lin_formula <- paste(vars[m], paste(vars[-m], collapse = " + "), sep = " ~ ")
fit <- lm(formula = lin_formula, data = d_filt)
fit <- stepAIC(object = fit, direction = "backward", k = 4)
summary(fit)

Coefficients:
                                    Estimate Std. Error t value Pr(>|t|)
(Intercept)                          6.77884    1.11533   6.078  1.4e-09 ***
Cash.Assets.Y.1                     -0.08757    0.03186  -2.749 0.006022 **
Net.Fixed.Assets.to.Tot.Assets.Y.1   0.07153    0.01997   3.583 0.000346 ***
R.D.Net.Sales.Y.1                    0.30689    0.07888   3.891 0.000102 ***
P.E.Y.1                             -0.09746    0.02944  -3.311 0.000943 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 19.63 on 2591 degrees of freedom
Multiple R-squared:  0.01598,   Adjusted R-squared:  0.01446
F-statistic: 10.52 on 4 and 2591 DF,  p-value: 1.879e-08
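For comparison, the forward variant mentioned above would start from an intercept-only model and require an explicit search scope. The following is a hypothetical sketch on fabricated data (the real call would use d_filt and its actual column names), not the book's own code:

```r
# Hypothetical sketch of forward selection with stepAIC: start from the
# intercept-only model and let R add one variable at a time.
library(MASS)
set.seed(1)
d_filt <- data.frame(x1 = rnorm(50), x2 = rnorm(50), TRS = rnorm(50))
null_fit <- lm(TRS ~ 1, data = d_filt)          # intercept-only starting point
fit_fwd <- stepAIC(null_fit,
                   scope = list(lower = ~ 1, upper = ~ x1 + x2),
                   direction = "forward", k = 4, trace = FALSE)
summary(fit_fwd)
```

Note that k = 4 penalizes each added parameter more heavily than the AIC default of k = 2, which keeps weakly significant variables out of the final model.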
The backward method ended with an R-squared of only 1.6 percent, meaning that the regression cannot explain more than 1.6 percent of the total variance of TRS. In other words, the model's predictive power is extremely poor. Notice that this poor performance stems from the weak (linear) relationship between the explanatory variables and TRS. If you have variables with a stronger relationship, your linear regressions will show better results. With an R-squared above 50 percent, you are very likely to be able to build a good stock-selection strategy: buy shares that have high values for the significant explanatory variables with a positive sign in the model and low values for those with a negative sign. As we cannot use this method here, we have to follow a different logic.