Missing data

Datasets are also prone to missing information. This is very common in datasets that rely on user input, for example, surveys in which respondents do not fill in every field. In other cases, the data might not be available at all, or restrictions may have prevented it from being included in the dataset.

Many techniques have been devised to fill in missing values. They range from simple procedures, such as using the mean or median of a column, to more advanced methods developed in fields such as survey statistics.
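As a minimal sketch of the simplest procedures, the Python snippet below fills the missing entries of one column with the mean, median, or mode of its observed values. It assumes missing values are represented as `None` and uses only the standard library's `statistics` module; the function name `impute_central` and the sample columns are hypothetical, chosen only for illustration.

```python
from statistics import mean, median, mode


def impute_central(values, how="mean"):
    """Return a copy of `values` with every None replaced by the mean,
    median, or mode of the observed (non-None) entries."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    measures = {"mean": mean, "median": median, "mode": mode}
    fill = measures[how](observed)  # one summary value computed from observed data
    return [fill if v is None else v for v in values]


ages = [23, None, 31, 27, None, 40]     # hypothetical numeric column
answers = ["yes", None, "no", "yes"]    # hypothetical categorical column
print(impute_central(ages, "mean"))     # [23, 30.25, 31, 27, 30.25, 40]
print(impute_central(ages, "median"))   # [23, 29.0, 31, 27, 29.0, 40]
print(impute_central(answers, "mode"))  # ['yes', 'yes', 'no', 'yes']
```

The mean suits roughly symmetric numeric columns, the median is less sensitive to outliers, and the mode is the natural choice for categorical fields such as survey answers.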

A few common methods to impute (that is, fill in) missing values are as follows:

  • Imputation using statistical measures of central tendency: This means using the mean, median, or mode of the observed values in a column to fill in that column's missing values.
  • Imputation using statistical models: This means using a statistical modelling method, such as regression, to build a predictive model in which the column being filled in is the outcome variable and the other columns serve as predictors. We can then use predict() on the fitted model to fill in the missing entries.
  • Using data imputation methods such as KNN imputation: KNN (k-nearest neighbors) is not a clustering technique but a distance-based method that fills in missing data using the points closest to the incomplete record in a multidimensional space. In other words, KNN finds other rows of data that are similar to the current row and fills in the missing value from them, for example, by averaging their values.
  • Hot deck imputation: This involves filling in the missing value with the value from a similar record that is randomly selected from the same dataset. One such method is last observation carried forward (LOCF), in which the records are sorted (for example, by time) and the value in the record immediately preceding the missing entry is used as the imputed value. Although hot deck imputation is widely used, it can introduce bias, for example, when a large number of missing values are filled in with the same donor value.
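Regression-based imputation can be sketched in a few lines. The hypothetical function `regression_impute` below fits a least-squares line with a single predictor column `x` on the rows where the outcome column `y` is observed, then predicts `y` for the rows where it is missing. This is a simplifying assumption: in R one would more typically fit a model with lm() and call predict() on it, and real datasets usually call for several predictors. The `hours` and `score` columns are made-up sample data.

```python
def regression_impute(x, y):
    """Return a copy of y in which None entries are replaced by predictions
    from the least-squares line y = intercept + slope * x, fitted only on
    rows where y is observed."""
    pairs = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    if len(pairs) < 2:
        raise ValueError("need at least two observed rows to fit a line")
    n = len(pairs)
    mean_x = sum(xi for xi, _ in pairs) / n
    mean_y = sum(yi for _, yi in pairs) / n
    sxx = sum((xi - mean_x) ** 2 for xi, _ in pairs)
    if sxx == 0:
        raise ValueError("predictor is constant on the observed rows")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in pairs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Keep observed values; predict only where y is missing.
    return [intercept + slope * xi if yi is None else yi
            for xi, yi in zip(x, y)]


hours = [1, 2, 3, 4, 5]                 # hypothetical predictor column
score = [2.0, 4.0, None, 8.0, 10.0]     # hypothetical outcome column with a gap
print(regression_impute(hours, score))  # [2.0, 4.0, 6.0, 8.0, 10.0]
```

Because every imputed value lies exactly on the fitted line, this approach tends to understate the natural scatter in the data.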
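KNN imputation can be illustrated with a small, dependency-free sketch. The hypothetical function `knn_impute` treats each row as a list of numbers with `None` for missing entries. For each gap it measures Euclidean distance to every other row over the columns observed in both rows, keeps only donors that have the missing column observed, and averages that column over the `k` nearest donors. These choices (plain Euclidean distance, unweighted averaging, no feature scaling) are assumptions for illustration; practical implementations usually standardize columns first so that no single column dominates the distance.

```python
import math


def knn_impute(rows, k=2):
    """Return a copy of rows with each None replaced by the mean of that
    column's values in the k nearest donor rows."""
    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []  # (distance, donor value) pairs
            for m, other in enumerate(rows):
                # A donor must be a different row with column j observed.
                if m == i or other[j] is None:
                    continue
                # Compare only on columns observed in both rows.
                shared = [(a, b) for a, b in zip(row, other)
                          if a is not None and b is not None]
                if not shared:
                    continue
                dist = math.sqrt(sum((a - b) ** 2 for a, b in shared))
                candidates.append((dist, other[j]))
            if not candidates:
                raise ValueError(f"no donor rows for row {i}, column {j}")
            candidates.sort(key=lambda pair: pair[0])
            nearest = candidates[:k]
            filled[i][j] = sum(v for _, v in nearest) / len(nearest)
    return filled


data = [
    [1.0, 2.0, 3.0],
    [1.1, 2.1, None],   # hypothetical row with a missing third value
    [5.0, 6.0, 7.0],
    [1.2, 1.9, 4.0],
]
print(knn_impute(data, k=2)[1])  # the two nearest rows give (3.0 + 4.0) / 2
```

Donors are always taken from the original rows, so values imputed earlier in the loop are never reused as donors.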
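Hot deck imputation and LOCF can both be sketched over a list of records. In the hypothetical `random_hot_deck`, a "similar" record is assumed to mean one with the same value in a matching field (`match_on`), and the donor is drawn at random from those records. The hypothetical `locf` sorts the records by `sort_key` and carries the last observed value forward into each gap. The `survey` records and their `id`, `region`, and `income` fields are made-up sample data.

```python
import random


def random_hot_deck(records, target, match_on, seed=None):
    """Fill missing `target` values with the value from a randomly chosen
    donor record that has the same `match_on` value."""
    rng = random.Random(seed)
    result = [dict(r) for r in records]
    for rec in result:
        if rec[target] is not None:
            continue
        # Donors come from the original records, never from imputed ones.
        donors = [r[target] for r in records
                  if r[target] is not None and r[match_on] == rec[match_on]]
        if donors:  # with no matching donor, the value stays None
            rec[target] = rng.choice(donors)
    return result


def locf(records, target, sort_key):
    """Last observation carried forward: sort the records, then replace each
    missing `target` value with the most recent observed value before it."""
    ordered = sorted((dict(r) for r in records), key=lambda r: r[sort_key])
    last = None
    for rec in ordered:
        if rec[target] is None:
            rec[target] = last  # stays None if nothing was observed yet
        else:
            last = rec[target]
    return ordered


survey = [
    {"id": 1, "region": "north", "income": 42},
    {"id": 2, "region": "south", "income": 35},
    {"id": 3, "region": "north", "income": None},
    {"id": 4, "region": "south", "income": None},
    {"id": 5, "region": "north", "income": 48},
]
print(random_hot_deck(survey, "income", "region", seed=0))
print([r["income"] for r in locf(survey, "income", "id")])  # [42, 35, 35, 35, 48]
```

In the LOCF output both gaps receive the same value, 35, which illustrates how filling many missing entries from one donor can bias the result.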

Packages implementing missing data imputation in R include the following:
