Partitioning samples

There are numerous ways to partition data, but the caTools package is one of the most useful. Its sample.split function uses random numbers to split a sample while keeping, in each resulting subsample, the same proportion of bads and goods as in the original dataset.
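To make the idea behind sample.split concrete, here is a minimal base-R sketch of a stratified split. It is not the caTools implementation, which may handle rounding differently; the helper name stratified_split and the toy target y are hypothetical, assuming a binary target coded 1 for bad and 0 for good.

```r
# Hypothetical helper, not part of caTools: a base-R illustration of a
# stratified split that returns a logical index, as sample.split does.
stratified_split <- function(y, split_ratio = 0.7) {
  index <- logical(length(y))                 # start with every row FALSE
  for (rows in split(seq_along(y), y)) {      # row positions of each class
    n_keep <- round(length(rows) * split_ratio)
    # sample.int() draws positions safely even when a class has one row
    chosen <- rows[sample.int(length(rows), n_keep)]
    index[chosen] <- TRUE                     # flag rows for the first sample
  }
  index
}

set.seed(1234)
y <- c(rep(1, 100), rep(0, 900))              # toy target: 100 bads, 900 goods
index <- stratified_split(y, split_ratio = 0.7)
mean(y[index])    # share of bads among TRUE rows
mean(y[!index])   # share of bads among FALSE rows
```

Because 70% of each class is drawn separately, both subsamples keep the original 10% share of bads: 70 of the 700 TRUE rows and 30 of the 300 FALSE rows.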

Because sample.split relies on random numbers, it is good practice to fix a seed first so that the results can be reproduced:

set.seed(1234)
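A quick way to see why fixing the seed matters: re-seeding with the same value reproduces exactly the same random draws. The object names below are illustrative only.

```r
# Draw three random positions out of 10, twice, resetting the same seed
set.seed(1234)
first_draw <- sample.int(10, 3)
set.seed(1234)
second_draw <- sample.int(10, 3)
identical(first_draw, second_draw)   # TRUE: same seed, same draws
```

The exact values drawn can still differ across R versions, since R 3.6.0 changed the default sampling algorithm used by sample().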

Then, use the sample.split function:

library(caTools)
index = sample.split(Model_database$Default, SplitRatio = .70)

Here the function receives two arguments: the target variable and the split ratio, that is, the proportion of observations flagged as TRUE for the first sample (70% in our case).

It returns a logical vector with one TRUE or FALSE value per observation, which can then be used to split the dataset into the two desired samples:

train <- subset(Model_database, index)
test <- subset(Model_database, !index)
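Once the two subsamples exist, it is worth checking that they form a clean partition and that the share of defaults is preserved. The sketch below uses a hypothetical toy data frame in place of Model_database and builds a stand-in logical index with base R (70% TRUE within each Default class), so it runs without caTools.

```r
set.seed(1234)
# Hypothetical stand-in for Model_database: 1,000 rows, 10% defaults
toy <- data.frame(id = 1:1000, Default = c(rep(1, 100), rep(0, 900)))

# Stand-in for sample.split output: within each Default class, flag 70% TRUE
index <- as.logical(ave(toy$Default, toy$Default, FUN = function(g)
  seq_along(g) %in% sample.int(length(g), round(0.7 * length(g)))))

train <- subset(toy, index)
test  <- subset(toy, !index)

# Sanity checks: every row lands in exactly one subsample
stopifnot(nrow(train) + nrow(test) == nrow(toy))
stopifnot(length(intersect(train$id, test$id)) == 0)

# Default rate in each subsample (both should match the original 10%)
prop.table(table(train$Default))
prop.table(table(test$Default))
```

With 100 defaults among 1,000 rows, this split places 70 defaults in the 700-row training sample and 30 in the 300-row test sample, so both show a 10% default rate.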
