Separating investment targets

An alternative method to build an investment strategy is to isolate good investment targets and check what they have in common. A good way to find similarities among stocks that performed well is to create groups based on the TRS values and compare the low- and high-performer clusters. As a first step, consider the following code, which builds a hierarchical clustering on the total return column (column 19 of d):

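library(stats)        # provides dist(), hclust() and kmeans(); attached by default
library(matrixStats)
# Hierarchical clustering on the total return column (column 19 of d)
h_clust <- hclust(dist(d[, 19]))
plot(h_clust, labels = FALSE, xlab = "")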

The following dendrogram is the output for the preceding code:

[Figure: dendrogram of the hierarchical clustering]
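The cluster counts read off a dendrogram can also be checked numerically. The following is a minimal sketch, assuming a synthetic vector trs as a stand-in for the total return column (it is not the book's data); cutree() from the stats package cuts the tree into a requested number of groups so that the group sizes can be compared.

```r
# Synthetic stand-in for the total-return column; NOT the book's data.
set.seed(42)
trs <- c(rnorm(60,  mean = -17, sd = 5),
         rnorm(100, mean = 9,   sd = 4),
         rnorm(30,  mean = 48,  sd = 8))

h_clust <- hclust(dist(trs))   # complete linkage is the default method

# Cut the same tree into 3 and into 7 groups and compare the group sizes.
groups_3 <- cutree(h_clust, k = 3)
groups_7 <- cutree(h_clust, k = 7)
print(table(groups_3))
print(table(groups_7))

# Cross-tabulate to see which of the 3 groups are split further at k = 7.
print(table(groups_3, groups_7))
```

Because both cuts come from the same tree, every one of the seven groups lies entirely inside one of the three groups, which makes the cross-tabulation easy to read.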

Based on the dendrogram, three clusters separate very well, but to cut the biggest of them into two subgroups, we may need to increase the number of clusters to seven. To keep the overview, we should keep the number of clusters as low as possible, so first, we will try to create only three clusters using the k-means method:

# k-means starts from random centers, so the cluster numbering can differ between runs
k_clust <- kmeans(d[, 19], 3)
K_means_results <- cbind(k_clust$centers, k_clust$size)
colnames(K_means_results) <- c("Cluster center", "Cluster size")
K_means_results

Our results are pretty encouraging. Our three clusters have roughly 1,000 to 4,000 elements each, and we can very clearly identify the overperformers, the underperformers, and the mid-range performers:

  Cluster center Cluster size
1       9.405869         3972
2      48.067540          962
3     -16.627188         2264
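In the result above, the cluster numbers carry no order: cluster 2 holds the highest returns and cluster 3 the lowest. The following is a minimal sketch, on synthetic data and with an arbitrary seed, of how a run could be made repeatable and how the clusters could be relabeled by the rank of their centers. The names trs, rank_of_center, and performance are illustrative assumptions, not part of the original code.

```r
# Synthetic stand-in for the total-return column; NOT the book's data.
set.seed(123)                      # arbitrary seed, only for repeatability
trs <- c(rnorm(200, -17, 6), rnorm(400, 9, 5), rnorm(100, 48, 9))

# nstart = 25 tries 25 random starts and keeps the best solution.
k_clust <- kmeans(trs, centers = 3, nstart = 25)

# Rank the centers so that 1 = lowest mean return and 3 = highest.
rank_of_center <- rank(k_clust$centers[, 1])

# Map every observation's cluster number to the rank of its center.
performance <- factor(rank_of_center[k_clust$cluster],
                      levels = 1:3,
                      labels = c("Underperformers", "Midrange", "Overperformers"))

print(table(performance))
print(tapply(trs, performance, mean))
```

With ordered labels, later tables can be read without looking up which cluster number belongs to which performance group.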

Next, we have to check whether the average ratio values differ significantly among these three groups. For this, we will use an ANOVA table. This statistical tool compares the variation between the group averages with the variation within the individual groups. If the classification is valid, you will find large differences among the group averages but smaller differences among firms within the same cluster:

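# k_clust$cluster is numeric here, hence Df = 1 in the output; factor() would give a one-way ANOVA
for (i in c(3, 4, 6, 10, 12, 14, 16, 17)) {
  print(colnames(d)[i]); print(summary(aov(d[, i] ~ k_clust$cluster, d)))
}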

Output:

[1] "Cash.Assets.Y.1"
                  Df  Sum Sq Mean Sq F value Pr(>F)    
k_clust$cluster    1    7491    7491   41.94  1e-10 ***
Residuals       7195 1285207     179                   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
1 observation deleted due to missingness
[1] "Net.Fixed.Assets.to.Tot.Assets.Y.1"
                  Df  Sum Sq Mean Sq F value   Pr(>F)    
k_clust$cluster    1   19994   19994   40.26 2.36e-10 ***
Residuals       7106 3529208     497                     
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
90 observations deleted due to missingness
[1] "P.CF.5Yr.Avg.Y.1"
                  Df   Sum Sq Mean Sq F value Pr(>F)
k_clust$cluster    1    24236   24236     1.2  0.273
Residuals       4741 95772378   20201               
2455 observations deleted due to missingness
[1] "Asset.Turnover.Y.1"
                  Df Sum Sq Mean Sq F value  Pr(>F)    
k_clust$cluster    1      7   6.759   11.64 0.00065 ***
Residuals       7115   4133   0.581                    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
81 observations deleted due to missingness
[1] "OI...Net.Sales.Y.1"
                  Df  Sum Sq Mean Sq F value  Pr(>F)   
k_clust$cluster    1    1461  1461.4   10.12 0.00147 **
Residuals       7196 1038800   144.4                   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
[1] "LTD.Capital.Y.1"
                  Df  Sum Sq Mean Sq F value Pr(>F)  
k_clust$cluster    1    1575  1574.6   4.134 0.0421 *
Residuals       7196 2740845   380.9                 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
[1] "Market.Cap.Y.1"
                  Df    Sum Sq   Mean Sq F value Pr(>F)
k_clust$cluster    1 1.386e+08 138616578   2.543  0.111
Residuals       7196 3.922e+11  54501888               
[1] "P.E.Y.1"
                  Df  Sum Sq Mean Sq F value  Pr(>F)   
k_clust$cluster    1    1735  1735.3   8.665 0.00325 **
Residuals       7196 1441046   200.3                   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
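In the output above, the cluster term has only one degree of freedom. That happens because kmeans() returns the cluster labels as numbers, so aov() fits a single slope on the label instead of one mean per group. The sketch below, on synthetic data with hypothetical names (cluster, ratio), contrasts the two specifications; wrapping the labels in factor() is the usual way to request a one-way ANOVA with one mean per group.

```r
# Synthetic example; NOT the book's data.
set.seed(1)
cluster <- rep(1:3, each = 50)     # numeric labels, as kmeans() returns them

# Group 2 has a different mean; groups 1 and 3 share the same mean.
ratio <- rnorm(150, mean = c(10, 15, 10)[cluster], sd = 3)

fit_numeric <- aov(ratio ~ cluster)           # one slope on the label: Df = 1
fit_factor  <- aov(ratio ~ factor(cluster))   # one mean per group:     Df = 2

print(summary(fit_numeric))
print(summary(fit_factor))
```

The numeric version can only detect a trend along the arbitrary label order, whereas the factor version tests whether any of the group means differ.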

In the output, R marks significance with one or more asterisks (*) after the F test probabilities (Pr(>F)); the more asterisks, the smaller the p-value. From the previous output, six of the eight variables show significant differences across clusters at the 5% level. To see the average values per cluster, you need to type the following code:

# Summary statistics per column: mean, number of non-missing values, standard deviation
f <- function(x) c(mean = mean(x, na.rm = TRUE), N = length(x[!is.na(x)]), sd = sd(x, na.rm = TRUE))
cols <- c(19, 3, 4, 6, 10, 12, 14, 16, 17)
output <- aggregate(d[cols], list(k_clust$cluster), f)
rownames(output) <- output[, 1]; output[, 1] <- NULL
output <- t(output)
# Order the clusters by mean total return (first row), lowest first
output <- output[, order(output[1, ])]
# Append the same statistics for the whole sample
output <- cbind(output, as.vector(apply(d[cols], 2, f)))
colnames(output) <- c("Underperformers", "Midrange", "Overperformers", "Total")
options(scipen = 999)
print(round(output, 3))

The output is as follows. Each variable has three rows (mean, number of non-missing elements, and standard deviation), which is why the table is so long.

 

                                        Underperformers Midrange Overperformers    Total
Total.Return.YTD..I..mean                       -16.627    9.406         48.068    6.385
Total.Return.YTD..I..N                         2264.000 3972.000        962.000 7198.000
Total.Return.YTD..I..sd                          12.588    8.499         17.154   23.083
Cash.Assets.Y.1.mean                             15.580   13.112         12.978   13.870
Cash.Assets.Y.1.N                              2263.000 3972.000        962.000 7197.000
Cash.Assets.Y.1.sd                               14.092   12.874         13.522   13.403
Net.Fixed.Assets.to.Tot.Assets.Y.1.mean          26.932   29.756         31.971   29.160
Net.Fixed.Assets.to.Tot.Assets.Y.1.N           2252.000 3899.000        957.000 7108.000
Net.Fixed.Assets.to.Tot.Assets.Y.1.sd            21.561   22.469         23.204   22.347
P.CF.5Yr.Avg.Y.1.mean                            18.754   19.460         28.723   20.274
P.CF.5Yr.Avg.Y.1.N                             1366.000 2856.000        521.000 4743.000
P.CF.5Yr.Avg.Y.1.sd                              57.309  132.399        281.563  142.133
Asset.Turnover.Y.1.mean                           1.132    1.063          1.052    1.083
Asset.Turnover.Y.1.N                           2237.000 3941.000        939.000 7117.000
Asset.Turnover.Y.1.sd                             0.758    0.783          0.679    0.763
OI...Net.Sales.Y.1.mean                          13.774   14.704         15.018   14.453
OI...Net.Sales.Y.1.N                           2264.000 3972.000        962.000 7198.000
OI...Net.Sales.Y.1.sd                            11.385   12.211         12.626   12.023
LTD.Capital.Y.1.mean                             17.287   20.399         17.209   18.994
LTD.Capital.Y.1.N                              2264.000 3972.000        962.000 7198.000
LTD.Capital.Y.1.sd                               18.860   19.785         19.504   19.521
P.E.Y.1.mean                                     20.806   19.793         19.455   20.067
P.E.Y.1.N                                      2264.000 3972.000        962.000 7198.000
P.E.Y.1.sd                                       14.646   13.702         14.782   14.159
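The row names in this table (for example, Cash.Assets.Y.1.mean) follow from how aggregate() stores a vector-valued summary. The sketch below uses a small synthetic data frame with hypothetical names (toy, grp, res) to show the mechanism: each summarized column becomes a small matrix, and converting the result to a plain matrix flattens those into one column per statistic.

```r
# Small synthetic data frame; NOT the book's data.
set.seed(3)
toy <- data.frame(ret = rnorm(12), ratio = c(rnorm(11, 10), NA))
grp <- rep(1:3, times = 4)         # three groups of four rows each

# Same idea as the summary function in the text: mean, count, sd.
f <- function(x) c(mean = mean(x, na.rm = TRUE),
                   N    = sum(!is.na(x)),
                   sd   = sd(x, na.rm = TRUE))

res <- aggregate(toy, list(cluster = grp), f)

# Each summarized column is a matrix with columns mean, N and sd.
print(dim(res$ratio))

# as.matrix() flattens these into columns such as "ratio.mean".
print(colnames(as.matrix(res[-1])))
```

Transposing that flattened matrix, as the code in the text does with t(), turns the statistic columns into the rows seen in the table.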

As we saw in the preceding ANOVA table, six of the eight financial ratios differ significantly among the three groups. This method helps to find even nonlinear connections, which correlation coefficients can miss. A good example of this is Cash.Assets: the overperformers and the mid-range group show very similar values, but the underperformers hold noticeably more (probably unused) cash. This means that a cash/asset ratio above a certain level hints that the given share is not a good investment. We find the same pattern with asset turnover.
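To illustrate why comparing group means can reveal what a correlation coefficient hides, here is a minimal sketch on synthetic data. It assumes a hypothetical ratio that depends on the return in a U-shaped way, and for simplicity it forms the groups with fixed cut-offs through cut() rather than with k-means.

```r
# Synthetic example; NOT the book's data.
set.seed(7)
n <- 300
ret   <- runif(n, min = -30, max = 30)        # hypothetical returns
ratio <- (ret / 10)^2 + rnorm(n, sd = 0.5)    # U-shaped dependence on return

# The linear correlation is weak despite the clear dependence.
print(cor(ret, ratio))

# Group by return (fixed cut-offs here) and compare the mean ratio per group.
group <- cut(ret, breaks = c(-Inf, -10, 10, Inf),
             labels = c("Underperformers", "Midrange", "Overperformers"))
print(tapply(ratio, group, mean))

# group is a factor, so this is a one-way ANOVA across the three groups.
print(summary(aov(ratio ~ group)))
```

In this construction the two outer groups have a much higher mean ratio than the middle group, so the group comparison picks up a dependence that a single correlation number would largely miss.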

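The 5-year average of Price/Cash flow (P/CF) is another example of a pattern that a plain correlation check could hide. In the table, the underperformers and the mid-range group have similar, low values (18.754 and 19.460), while the overperformers stand out with a much higher one (28.723). Note, however, that the within-group standard deviations are very large and that the difference was not significant in the ANOVA output above, so this pattern should be treated with caution.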

Based on these results, the best investment targets may combine a lower cash ratio and lower financial leverage (LT debt / capital) with a higher fixed asset ratio and P/CF ratio, while P/E and asset turnover are close to average. In short, the best firms use their current capital efficiently: they have average asset turnover and not too much free cash. They have further room to increase their leverage, and their good cash flow growth outlook is reflected in the higher P/CF ratio. Before testing this selection method, we should check whether we can refine it, either by adding more exact rules to separate potential investments or by simplifying it by removing some of these criteria.
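As a rough illustration of how such criteria could be turned into a screening rule, the sketch below filters a toy data frame. The column names follow the source, but the values are made up, and the cut-offs are a hypothetical choice: the full-sample means from the Total column of the per-cluster summary table. The names stocks and candidate are assumptions for this example only.

```r
# Toy data frame; column names follow the source, the values are made up.
stocks <- data.frame(
  Cash.Assets.Y.1                    = c(10.2, 25.0, 12.5,  8.0),
  LTD.Capital.Y.1                    = c(15.0, 30.0, 17.5, 22.0),
  Net.Fixed.Assets.to.Tot.Assets.Y.1 = c(35.0, 20.0, 31.0, 40.0),
  P.CF.5Yr.Avg.Y.1                   = c(25.0, 15.0,   NA, 30.0)
)

# Hypothetical cut-offs: the full-sample ("Total") means of each ratio.
candidate <- with(stocks,
  Cash.Assets.Y.1 < 13.870 &
  LTD.Capital.Y.1 < 18.994 &
  Net.Fixed.Assets.to.Tot.Assets.Y.1 > 29.160 &
  P.CF.5Yr.Avg.Y.1 > 20.274)

# A missing ratio can yield NA; treat such rows as "not a candidate" here.
candidate[is.na(candidate)] <- FALSE

print(stocks[candidate, ])
```

Only the first toy stock passes all four conditions. Treating missing ratios as failures is one possible convention; P/CF in particular is missing for many firms in the source data, so a real screen would need an explicit decision on this point.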
