The near-zero variance function

The near-zero variance check, available through the nearZeroVar function in the R package caret, is used to identify variables that have little or no variance. Consider a set of 10,000 numbers that contains only two or three distinct values, nearly all of which are the same. Such a variable may add very little value to an algorithm. To use the nearZeroVar function, first install the caret package in RStudio (which we set up in Chapter 3, The Analytics Toolkit). The following code demonstrates the effect of nearZeroVar:

> library(caret)
Loading required package: lattice
Loading required package: ggplot2
Need help getting started? Try the cookbook for R: http://www.cookbook-r.com/Graphs/

> repeated <- c(rep(100, 9999), 10) # 9999 values are 100 and the last value is 10

> random <- sample(100, 10000, replace = TRUE) # 10,000 random values from 1 to 100

> data <- data.frame(random = random, repeated = repeated)

> nearZeroVar(data)
[1] 2

> names(data)[nearZeroVar(data)]
[1] "repeated"

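As the output shows, nearZeroVar returns the column index 2, and names(data) confirms that the flagged column is repeated. The random column, whose 100 distinct values occur with roughly equal frequency, is not flagged.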
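To see why one column is flagged and the other is not, it helps to look at the two statistics nearZeroVar relies on: the frequency ratio (count of the most common value divided by the count of the second most common) and the percentage of unique values. With its default cut-offs (freqCut = 95/5 and uniqueCut = 10), caret flags a variable when the frequency ratio exceeds 19 and the percentage of unique values is 10 or less, or when the variable has a single value. Calling nearZeroVar(data, saveMetrics = TRUE) returns these statistics per column. The sketch below recomputes them in base R so the logic is visible; nzv_metrics is a hypothetical helper written for illustration, not part of caret, and it may differ from caret's implementation in edge cases.

```r
# Hypothetical helper (not part of caret): a minimal base-R sketch of the two
# statistics behind nearZeroVar, using its default cut-offs as assumptions.
nzv_metrics <- function(x, freqCut = 95 / 5, uniqueCut = 10) {
  counts <- sort(table(x), decreasing = TRUE)   # table() drops NA by default
  # Most common value's count divided by the second most common's count
  freq_ratio <- if (length(counts) > 1) counts[[1]] / counts[[2]] else 0
  # Distinct values as a percentage of the number of observations
  pct_unique <- 100 * length(counts) / length(x)
  zero_var <- length(counts) <= 1               # constant (or all-NA) column
  list(freqRatio = freq_ratio,
       percentUnique = pct_unique,
       nzv = zero_var || (freq_ratio > freqCut && pct_unique <= uniqueCut))
}

set.seed(42)                                  # reproducible random sample
repeated <- c(rep(100, 9999), 10)             # one dominant value
random <- sample(100, 10000, replace = TRUE)  # 100 roughly equally frequent values
three_even <- rep(1:3, length.out = 10000)    # three distinct, evenly spread values

str(nzv_metrics(repeated))    # freqRatio 9999, percentUnique 0.02, nzv TRUE
str(nzv_metrics(random))      # freqRatio well below 19, percentUnique 1, nzv FALSE
str(nzv_metrics(three_even))  # percentUnique 0.03, nzv FALSE
```

The three_even case shows that having only a few distinct values is not enough on its own: a variable is flagged only when one value also dominates the frequency count, as it does in repeated.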
