An alternative to using the na.rm argument in R functions is to remove NA values from the dataset before passing it to the analysis functions. This way, the missing values are removed from the data once and for all, so that they won't cause any problems at later stages of the analysis. For this, we can use either the na.omit or the na.exclude function:
```
> na.omit(c(1:5, NA))
[1] 1 2 3 4 5
attr(,"na.action")
[1] 6
attr(,"class")
[1] "omit"
> na.exclude(c(1:5, NA))
[1] 1 2 3 4 5
attr(,"na.action")
[1] 6
attr(,"class")
[1] "exclude"
```
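The following minimal sketch puts the two approaches side by side. The vector x is made up for illustration and is not from the text. Skipping the NA with na.rm in every call gives the same result as dropping it once with na.omit. The dropped position is still recorded in the na.action attribute.

```r
# Made-up example vector with one missing value (for illustration only)
x <- c(2, 4, NA, 6)

# Option 1: keep the NA and ask each function to skip it
m1 <- mean(x, na.rm = TRUE)

# Option 2: drop the NA once, up front; later calls need no na.rm
x_clean <- na.omit(x)
m2 <- mean(x_clean)
s2 <- sum(x_clean)

print(c(m1, m2))  # both approaches give 4

# The position of the dropped element (3) is recorded in the
# "na.action" attribute; only that attribute's class differs
print(class(attr(na.omit(x), "na.action")))     # "omit"
print(class(attr(na.exclude(x), "na.action")))  # "exclude"
```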
The only difference between these two functions is the class of the na.action attribute of the returned R object: omit and exclude, respectively. This minor difference matters only when modelling. When a model is fitted with na.exclude, the residuals and predictions are padded with NA for the removed cases. With na.omit, those elements are simply dropped from the vector:
```
> x <- rnorm(10); y <- rnorm(10)
> x[1] <- NA; y[2] <- NA
> exclude <- lm(y ~ x, na.action = "na.exclude")
> omit <- lm(y ~ x, na.action = "na.omit")
> residuals(exclude)
    1     2     3     4     5     6     7     8     9    10
   NA    NA -0.89 -0.98  1.45 -0.23  3.11 -0.23 -1.04 -1.20
> residuals(omit)
    3     4     5     6     7     8     9    10
-0.89 -0.98  1.45 -0.23  3.11 -0.23 -1.04 -1.20
```
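The console session above does not set a random seed, so its residuals change on every run. The sketch below fixes an arbitrary seed, chosen only for reproducibility. Instead of residual values, it checks vector lengths, because that is where the two na.action choices actually differ. It also shows why the padding is handy: the na.exclude residuals line up with the original rows and can be stored right next to the data. The name dat is hypothetical.

```r
set.seed(42)  # arbitrary seed, only for reproducibility
x <- rnorm(10); y <- rnorm(10)
x[1] <- NA; y[2] <- NA

exclude <- lm(y ~ x, na.action = na.exclude)
omit    <- lm(y ~ x, na.action = na.omit)

# na.exclude pads residuals and predictions back to the original length
print(length(residuals(exclude)))  # 10
print(length(predict(exclude)))    # 10

# na.omit returns only the values for the rows actually used in the fit
print(length(residuals(omit)))     # 8

# Because the padded residuals match the input rows, they can be stored
# next to the original data (hypothetical data frame name: dat)
dat <- data.frame(x, y, res = residuals(exclude))
print(head(dat, 3))
```

With the omit model, the same data.frame() call would fail. Its residual vector has only 8 elements for the 10 rows.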
An important thing to note is that for tabular data, such as a matrix or a data.frame, these functions remove the whole row if it contains at least one missing value. For a quick demo, let's create a matrix with 3 rows and 3 columns holding the values 1 to 9, then replace all values divisible by 4 with NA:
```
> m <- matrix(1:9, 3)
> m[which(m %% 4 == 0, arr.ind = TRUE)] <- NA
> m
     [,1] [,2] [,3]
[1,]    1   NA    7
[2,]    2    5   NA
[3,]    3    6    9
> na.omit(m)
     [,1] [,2] [,3]
[1,]    3    6    9
attr(,"na.action")
[1] 1 2
attr(,"class")
[1] "omit"
```
As seen here, we can find the row numbers of the removed cases in the na.action attribute.
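The same row-wise behaviour applies to a data.frame. The small data frame below is made up for illustration, with a missing value in two different rows. The sketch uses the na.action attribute to pull the dropped rows back out of the original data. complete.cases() gives the same row selection as a logical vector.

```r
# Made-up data frame: row 2 has an NA in a, row 3 has an NA in b
df <- data.frame(a = c(1, NA, 3, 4),
                 b = c("x", "y", NA, "z"))

clean <- na.omit(df)
print(clean)  # only rows 1 and 4 remain

# The na.action attribute holds the indices of the dropped rows
# (named by row name); strip its class before using it as an index
dropped_rows <- as.integer(attr(clean, "na.action"))
print(df[dropped_rows, ])  # the removed cases, rows 2 and 3

# complete.cases() flags the rows without any missing value
keep <- complete.cases(df)
print(keep)  # TRUE FALSE FALSE TRUE
```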