To some extent, we touched upon model-based dimension reduction in Chapter 4, Regression with Automobile Data, Logistic regression, where we implemented regression modelling and, in the process, reduced the data dimension by applying the Akaike Information Criterion (AIC). Like AIC, the Bayesian Information Criterion (BIC) can also be used to reduce data dimensions. As far as the model-based method is concerned, there are two approaches:
The problem discussed in this chapter is a supervised classification problem, where the dependent variable is default or no default. The logistic regression method, as discussed in Chapter 4, Regression with Automobile Data, Logistic regression, uses a stepwise dimension reduction procedure to remove unwanted variables from the model. The same exercise can be carried out on the dataset discussed in this chapter.
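As a rough illustration of how a criterion-driven stepwise procedure removes variables, the following Python sketch implements backward elimination: starting from the full model, it repeatedly drops the one variable whose removal lowers AIC the most and stops when no removal helps. It uses the standard definitions AIC = 2k - 2 ln(L) and BIC = k ln(n) - 2 ln(L), where k is the number of estimated parameters, n the number of observations, and L the maximized likelihood. The model-fitting step is abstracted behind a hypothetical `score` callable, and the variable names and log-likelihood values are invented purely for illustration; they do not come from the chapter's dataset.

```python
import math
from typing import Callable, Dict, FrozenSet, Iterable, Tuple


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike Information Criterion: 2k - 2 ln(L). Lower is better."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """Bayesian Information Criterion: k ln(n) - 2 ln(L). Lower is better."""
    return n_params * math.log(n_obs) - 2 * log_likelihood


def backward_stepwise(
    features: Iterable[str],
    score: Callable[[FrozenSet[str]], float],
) -> Tuple[FrozenSet[str], float]:
    """Greedy backward elimination that minimizes `score` (for example AIC)."""
    current = frozenset(features)
    best = score(current)
    while current:
        # Score every model obtained by dropping exactly one variable.
        candidates = [(score(current - {f}), f) for f in sorted(current)]
        cand_score, cand_feature = min(candidates)
        if cand_score >= best:
            break  # no single removal improves the criterion
        current = current - {cand_feature}
        best = cand_score
    return current, best


# Hypothetical log-likelihoods for each subset of three made-up predictors.
# In practice these would come from fitting a logistic regression per subset.
LOGLIK: Dict[FrozenSet[str], float] = {
    frozenset({"income", "age", "noise"}): -50.0,
    frozenset({"income", "age"}): -50.2,
    frozenset({"income", "noise"}): -58.0,
    frozenset({"age", "noise"}): -60.0,
    frozenset({"income"}): -57.0,
    frozenset({"age"}): -59.5,
    frozenset({"noise"}): -70.0,
    frozenset(): -71.0,
}


def aic_of(subset: FrozenSet[str]) -> float:
    # One parameter per predictor plus one for the intercept.
    return aic(LOGLIK[subset], len(subset) + 1)


selected, best_aic = backward_stepwise(["income", "age", "noise"], aic_of)
print(sorted(selected), round(best_aic, 1))
```

With these made-up values, dropping the uninformative variable lowers AIC, while dropping either of the remaining two raises it, so the search stops with two predictors. Replacing `aic_of` with a BIC-based scorer would penalize extra parameters more heavily whenever ln(n) exceeds 2.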
Apart from the standard methods of data dimension reduction, there are some less prominent methods that can be considered, such as the missing value estimation method. In a large dataset with many dimensions, sparsity is a common problem. Before applying any formal dimensionality reduction process, we can calculate the percentage of missing values for each variable and drop many variables on that basis. The threshold, that is, the maximum missing percentage above which a variable is dropped, has to be decided by the analyst.
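The missing value percentage calculation can be sketched in a few lines of Python. This is a minimal illustration only: it assumes the data is held as a list of dictionaries with `None` marking a missing value, and the column names, rows, and the 50 percent threshold are all made up for the example.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Row = Dict[str, Optional[float]]


def missing_percentage(rows: List[Row], columns: Sequence[str]) -> Dict[str, float]:
    """Return the percentage of missing (None) values for each column."""
    n = len(rows)
    if n == 0:
        return {c: 0.0 for c in columns}
    return {
        c: 100.0 * sum(1 for r in rows if r.get(c) is None) / n
        for c in columns
    }


def drop_sparse_columns(
    rows: List[Row], columns: Sequence[str], max_missing_pct: float
) -> Tuple[List[Row], List[str]]:
    """Keep only columns whose missing percentage is at most the threshold."""
    pct = missing_percentage(rows, columns)
    keep = [c for c in columns if pct[c] <= max_missing_pct]
    reduced = [{c: r.get(c) for c in keep} for r in rows]
    return reduced, keep


# Made-up rows: "bonus" is missing in 3 of 4 rows, "age" in 1 of 4.
data: List[Row] = [
    {"limit": 2000.0, "age": 34.0, "bonus": None},
    {"limit": 5000.0, "age": None, "bonus": None},
    {"limit": 1500.0, "age": 51.0, "bonus": 120.0},
    {"limit": 3000.0, "age": 28.0, "bonus": None},
]
cols = ["limit", "age", "bonus"]

pct_missing = missing_percentage(data, cols)
reduced_data, kept_cols = drop_sparse_columns(data, cols, max_missing_pct=50.0)
print(pct_missing)
print(kept_cols)
```

The threshold is passed in as a parameter rather than fixed in the code, since the cut-off is left to the analyst's judgment.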