Obtaining the 99% quantile

We can find the cutoff for the 99% mark using the quantile() function. Using the same sample data, we could use:

quantile(pct$hitpct, probs = 0.99, na.rm = TRUE)

This would have corresponding output:

99%: 0.470588235294118

So, a hit percentage of about 47% is the cutoff for the 99% level of the data. Given that the 75th percentile (the third quartile) was at 28% (as in the preceding hitpct graphic), there is quite a range of performance within that last quarter of data points; that is, there are some great baseball players.
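To compare the two cutoffs side by side, quantile() accepts a vector of probabilities. The following is a minimal sketch on a small made-up data frame; toy_pct and its values are hypothetical stand-ins for pct, not the data used in this chapter:

```r
# Hypothetical stand-in for pct: ten made-up hit percentages plus one missing value
toy_pct <- data.frame(
  hitpct = c(0.00, 0.10, 0.20, 0.25, 0.30, 0.35, 0.40, 0.45, 0.50, 1.00, NA)
)

# Request the 75th and 99th percentiles in one call; na.rm = TRUE drops the NA first
cutoffs <- quantile(toy_pct$hitpct, probs = c(0.75, 0.99), na.rm = TRUE)
print(cutoffs)
```

The result is a named numeric vector, so an individual cutoff can be picked out by name, for example cutoffs["99%"].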

We could get a list of those players in the top 1% of the hit percentage using:

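# filter() comes from the dplyr package (assumed to be loaded already)
top_players <- filter(pct, hitpct > 0.47)
# order() sorts in ascending order by default
top_players <- top_players[order(top_players$hitpct), ]
head(top_players)
nrow(top_players)

The row count returned by nrow() is:

198

Because order() sorts in ascending order by default, head() shows the lowest hit percentages within this group. If the players are instead arranged by hit percentage in descending order, then the players with perfect hit ratios are displayed first, but they all had under 10 at-bats.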
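A descending sort combined with a minimum number of at-bats keeps such small-sample players from dominating the list. Here is a minimal sketch in base R, assuming a hypothetical at-bats column named AB and an illustrative threshold of 10; neither is confirmed by the data used in this chapter:

```r
# Hypothetical stand-in for top_players; AB (at-bats) is an assumed column name
toy_top <- data.frame(
  player = c("a", "b", "c", "d"),
  AB     = c(2, 450, 5, 300),
  hitpct = c(1.00, 0.48, 1.00, 0.50)
)

# Sort by hit percentage, highest first
by_pct <- toy_top[order(toy_top$hitpct, decreasing = TRUE), ]

# Keep only players with at least min_ab at-bats (threshold chosen for illustration)
min_ab <- 10
regulars <- by_pct[by_pct$AB >= min_ab, ]
print(regulars)
```

In this toy data, the two players with a perfect ratio have only a handful of at-bats, so they sort to the top of by_pct but drop out of regulars.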

We can see the data points as follows:

So, we have about 200 (198) players above the 99% cutoff, which is roughly 1% of the players in our dataset. That 1% covers hit percentages from 47% all the way up to 100%, more than half of the possible range. I did not think the data would be that lopsided.
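One way to sanity-check that share is to compute the fraction of rows above the cutoff directly. The sketch below uses evenly spaced made-up values; toy_hitpct is hypothetical and not the data used in this chapter:

```r
# Hypothetical data: 1,000 evenly spaced hit percentages between 0 and 1
toy_hitpct <- seq(0, 1, length.out = 1000)

# Find the 99% cutoff, then the share of values strictly above it
cutoff <- quantile(toy_hitpct, probs = 0.99)
share_above <- mean(toy_hitpct > cutoff)
print(share_above)
```

Taking mean() of a logical vector gives the proportion of TRUE values, so share_above should come out close to 0.01 whenever the cutoff really is the 99% quantile.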
