Getting rid of missing data

An alternative to using the na.rm argument of R functions is to remove NA values from the dataset before passing it to the analysis functions. This way the missing values are dropped from the data once (provided we store the result), so they won't cause any problems at later stages of the analysis. For this, we can use either the na.omit or the na.exclude function:

> na.omit(c(1:5, NA))
[1] 1 2 3 4 5
attr(,"na.action")
[1] 6
attr(,"class")
[1] "omit"
> na.exclude(c(1:5, NA))
[1] 1 2 3 4 5
attr(,"na.action")
[1] 6
attr(,"class")
[1] "exclude"
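To relate this to the na.rm argument, here is a minimal sketch using a made-up vector `x`: dropping NA first and then calling mean gives the same result as mean(x, na.rm = TRUE), and the na.action attribute records which positions were removed. Note that na.omit returns a new object; the original vector is left untouched unless we reassign it.

```r
# Made-up vector with two missing values
x <- c(2, NA, 4, NA, 6)

# Option 1: let the function skip NA itself
m1 <- mean(x, na.rm = TRUE)

# Option 2: drop NA first, then call the function without na.rm
x_clean <- na.omit(x)
m2 <- mean(x_clean)

# Both approaches average 2, 4 and 6
stopifnot(isTRUE(all.equal(m1, m2)))

# Positions of the removed elements are kept in the na.action attribute
print(as.integer(attr(x_clean, "na.action")))  # [1] 2 4

# na.omit did not modify x in place: it still has both NA values
print(sum(is.na(x)))  # [1] 2
```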

The only difference between these two functions is the class of the na.action attribute of the returned R object, which is omit and exclude, respectively. This minor difference matters only when modelling: with na.exclude, residuals and predictions contain NA at the positions of the removed cases, while with na.omit those elements are simply dropped from the vector:

> x <- rnorm(10); y <- rnorm(10)
> x[1] <- NA; y[2] <- NA
> exclude <- lm(y ~ x, na.action = "na.exclude")
> omit <- lm(y ~ x, na.action = "na.omit")
> residuals(exclude)
    1     2     3     4     5     6     7     8     9    10 
   NA    NA -0.89 -0.98  1.45 -0.23  3.11 -0.23 -1.04 -1.20 

> residuals(omit)
    3     4     5     6     7     8     9    10 
-0.89 -0.98  1.45 -0.23  3.11 -0.23 -1.04 -1.20
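A practical consequence of this padding is that, with na.exclude, residuals and fitted values have one element per row of the original data, so they can be stored right next to it. The sketch below is illustrative rather than the book's own code: it assumes a hypothetical data frame `d` with the same NA pattern as above and calls set.seed so the result is reproducible.

```r
# Hypothetical, reproducible toy data with the same NA layout as above
set.seed(42)
d <- data.frame(x = rnorm(10), y = rnorm(10))
d$x[1] <- NA
d$y[2] <- NA

fit_exclude <- lm(y ~ x, data = d, na.action = na.exclude)
fit_omit    <- lm(y ~ x, data = d, na.action = na.omit)

# na.exclude pads residuals and fitted values with NA for the dropped rows
length(residuals(fit_exclude))  # 10
length(fitted(fit_exclude))     # 10
length(residuals(fit_omit))     # 8

# This assignment works only because the length matches nrow(d);
# the same line with fit_omit would fail (8 values for 10 rows)
d$resid <- residuals(fit_exclude)
```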

An important thing to note: in the case of tabular data, such as a matrix or data.frame, these functions remove the whole row if it contains at least one missing value. For a quick demo, let's create a matrix with 3 rows and 3 columns holding the values 1 to 9, and replace all values divisible by 4 with NA:

> m <- matrix(1:9, 3)
> m[which(m %% 4 == 0, arr.ind = TRUE)] <- NA
> m
     [,1] [,2] [,3]
[1,]    1   NA    7
[2,]    2    5   NA
[3,]    3    6    9
> na.omit(m)
     [,1] [,2] [,3]
[1,]    3    6    9
attr(,"na.action")
[1] 1 2
attr(,"class")
[1] "omit"

As seen here, we can find the row numbers of the removed cases in the na.action attribute.
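The same row-wise behaviour applies to a data.frame. In this sketch (the `scores` data frame is made up for illustration), each of two rows has a single NA in a different column, and both rows are dropped entirely. The kept rows match those flagged by complete.cases, and the removed row numbers can be read with stats::na.action(), which returns the na.action attribute.

```r
# Made-up data frame: row 2 lacks a score, row 3 lacks a group
scores <- data.frame(
  id    = 1:4,
  score = c(10, NA, 30, 40),
  group = c("a", "b", NA, "b")
)

clean <- na.omit(scores)
nrow(clean)  # 2: rows 2 and 3 are removed as a whole

# na.omit keeps exactly the complete cases
stopifnot(identical(clean$id, scores$id[complete.cases(scores)]))

# Row numbers of the removed cases, stored in the na.action attribute
as.integer(na.action(clean))  # 2 3
```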
