Near-zero variance, tested by the nearZeroVar function in the R package caret, is used to identify variables that have little or no variance. Consider a set of 10,000 numbers with only three distinct values, one of which accounts for almost every observation. Such a variable may add very little value to an algorithm. To use the nearZeroVar function, first install the caret package in RStudio (which we set up in Chapter 3, The Analytics Toolkit), for example with install.packages("caret"), and then load it with library(caret). The following R session shows nearZeroVar applied to a small data frame:
    > library(caret)
    Loading required package: lattice
    Loading required package: ggplot2
    Need help getting started? Try the cookbook for R: http://www.cookbook-r.com/Graphs/
    > repeated <- c(rep(100, 9999), 10) # 9999 values are 100 and the last value is 10
    > random <- sample(100, 10000, TRUE) # 10,000 random values from 1 to 100
    > data <- data.frame(random = random, repeated = repeated)
    > nearZeroVar(data)
    [1] 2
    > names(data)[nearZeroVar(data)]
    [1] "repeated"
As the output shows, nearZeroVar correctly flagged column 2, repeated, in which 9,999 of the 10,000 values are identical, and left the uniformly sampled random column alone.
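To make the detection criteria concrete, here is a minimal base-R sketch that recomputes the two statistics the caret documentation describes for nearZeroVar: the frequency ratio (count of the most common value divided by the count of the second most common) and the percentage of unique values relative to the number of samples. A column is treated as near-zero variance when it has a single value, or when its frequency ratio exceeds freqCut and its percentage of unique values is at most uniqueCut; the sketch uses caret's documented defaults, freqCut = 95/5 and uniqueCut = 10. The helper nzv_metrics is hypothetical and written only for illustration. It is not caret's own implementation, it assumes there are no missing values, and the set.seed call is an addition so that the run is reproducible.

```r
# Hypothetical helper (not part of caret): a simplified re-creation of the
# near-zero-variance checks, assuming x contains no missing values.
nzv_metrics <- function(x, freq_cut = 95 / 5, unique_cut = 10) {
  counts <- sort(table(x), decreasing = TRUE)  # value frequencies, most common first
  n_unique <- length(counts)
  # Most common count divided by second most common count (0 if only one value)
  freq_ratio <- if (n_unique > 1) counts[[1]] / counts[[2]] else 0
  # Number of distinct values as a percentage of the number of samples
  percent_unique <- 100 * n_unique / length(x)
  zero_var <- n_unique <= 1
  data.frame(
    freqRatio = freq_ratio,
    percentUnique = percent_unique,
    zeroVar = zero_var,
    nzv = zero_var || (freq_ratio > freq_cut && percent_unique <= unique_cut)
  )
}

set.seed(42)  # assumption: added only to make the random column reproducible
repeated <- c(rep(100, 9999), 10)
random <- sample(100, 10000, replace = TRUE)
data <- data.frame(random = random, repeated = repeated)

# One row of metrics per column of the data frame
metrics <- do.call(rbind, lapply(data, nzv_metrics))
rownames(metrics) <- names(data)
print(metrics)

flagged <- rownames(metrics)[metrics$nzv]
print(flagged)

# Three evenly spread values: few unique values, but no dominant value,
# so the frequency-ratio test fails and the column is not flagged
three_even <- nzv_metrics(rep(1:3, length.out = 10000))
print(three_even)
```

For repeated, the frequency ratio is 9999 / 1 and only 0.02% of the values are unique, so both tests fire. The last example shows why a dominant value matters: three distinct values alone do not trigger the check. To compare against the library itself, nearZeroVar(data, saveMetrics = TRUE) returns a data frame with freqRatio, percentUnique, zeroVar and nzv columns.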