We can find the cutoff for the 99th percentile using the quantile() function. Using the same sample data, we could use:
quantile(pct$hitpct, probs = 0.99, na.rm = TRUE)
This would have corresponding output:
99%: 0.470588235294118
So, a hit percentage of 47% is the cutoff for the 99th percentile of the data. Given that the 75th percentile was at 28% (as in the preceding hitpct graphic), there is quite a range of performance within that top quarter of data points; that is, there are some great baseball players.
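To compare the two cutoffs side by side, quantile() accepts a vector of probabilities. The following is only a sketch: the pct data frame here is a synthetic stand-in (random values, not the baseball data), so the numbers it prints will not match the 28% and 47% figures above.

```r
# Hypothetical stand-in for the pct data frame; the values are synthetic.
set.seed(42)
pct <- data.frame(hitpct = rbeta(20000, 2, 8))

# Request several cutoffs in one call; na.rm = TRUE drops missing values first.
cutoffs <- quantile(pct$hitpct, probs = c(0.75, 0.99), na.rm = TRUE)
print(cutoffs)

# The result is a named numeric vector, so individual cutoffs can be
# pulled out by name.
gap <- cutoffs[["99%"]] - cutoffs[["75%"]]
print(gap)
```

The gap between the two values gives a rough measure of how stretched out the top quarter of the distribution is.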
We could get a list of the players above that cutoff, that is, the top 1% by hit percentage, using:
top_players <- filter(pct, hitpct > 0.47)
# sort so that the highest hit percentages come first
top_players <- top_players[order(top_players$hitpct, decreasing = TRUE), ]
head(top_players)
nrow(top_players)
198
We can see the data points as follows:
So, we have about 200 (198) players above the 99th-percentile cutoff, meaning that the top 1% of players alone spans hit percentages from 47% upward, while the 75th percentile sits at just 28%. I did not think the data would be that lopsided.
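As a sanity check, the share of rows above the cutoff can be computed directly instead of hard-coding 0.47. This is a minimal sketch under stated assumptions: pct is again a synthetic stand-in rather than the real data, and base R subsetting is used in place of filter() so that no extra package is required.

```r
# Hypothetical stand-in for the pct data frame; the values are synthetic.
set.seed(1)
pct <- data.frame(hitpct = runif(10000))

# Compute the 99th-percentile cutoff rather than typing it in by hand.
cutoff <- unname(quantile(pct$hitpct, probs = 0.99, na.rm = TRUE))

# Keep rows strictly above the cutoff, guarding against missing values.
keep <- !is.na(pct$hitpct) & pct$hitpct > cutoff
top_players <- pct[keep, , drop = FALSE]

# Sort so that the highest hit percentages come first.
top_players <- top_players[order(top_players$hitpct, decreasing = TRUE), , drop = FALSE]

# Fraction of the dataset above the cutoff; this should be close to 0.01.
share <- nrow(top_players) / nrow(pct)
print(share)
```

If the share comes out near 1%, the row count is consistent with a 99th-percentile cutoff, as it is with 198 players in our case.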