Partitioning samples

Although there are many ways to partition data, the caTools package is one of the most convenient. It provides a function called sample.split, which splits a sample at random while keeping the proportion of bads and goods found in the original dataset the same in each of the resulting samples.
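To make the idea of a proportion-preserving split concrete, here is a minimal base-R sketch. It is not the caTools implementation: `stratified_index` is a hypothetical helper written for illustration, and the toy target vector is made up. It simply draws the same fraction of rows from each class.

```r
# Minimal base-R sketch of a stratified split (not the caTools code).
# `stratified_index` is a hypothetical helper introduced for illustration.
stratified_index <- function(y, ratio) {
  index <- logical(length(y))              # starts as all FALSE
  for (cls in unique(y)) {
    pos <- which(y == cls)                 # positions of this class
    n_train <- round(ratio * length(pos))  # rows to mark TRUE for this class
    # Sample positions by index so a one-element class is handled correctly
    chosen <- pos[sample.int(length(pos), n_train)]
    index[chosen] <- TRUE
  }
  index
}

set.seed(1234)
# Toy target: 20 bads (1) and 80 goods (0); made-up data
default <- c(rep(1, 20), rep(0, 80))
index <- stratified_index(default, 0.70)

# Cross-tabulate class against the TRUE/FALSE index
table(default, index)
```

Because the draw is made within each class, both groups contribute 70% of their rows to the TRUE side, so the share of bads is the same on both sides of the split.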

Because sample.split relies on random numbers, it is convenient to fix a seed so that the results can be replicated:

set.seed(1234)

Then, use the sample.split function:

library(caTools)
index <- sample.split(Model_database$Default, SplitRatio = .70)

Here the function takes two arguments: the target variable and the partition size, that is, the share of observations assigned to the first sample (in our case, 70%).

It returns an index with one value per observation, either TRUE or FALSE, which can be used to split the dataset into the two desired samples:

train <- subset(Model_database, index == TRUE)
test <- subset(Model_database, index == FALSE)
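A quick way to check that the split behaved as described is to compare the class shares in each sample. The sketch below assumes caTools is installed and uses a made-up data frame, `toy`, as a stand-in for Model_database; the column `x` is likewise hypothetical.

```r
library(caTools)

set.seed(1234)
# Toy stand-in for Model_database: 100 bads (1) and 900 goods (0); made-up data
toy <- data.frame(Default = c(rep(1, 100), rep(0, 900)),
                  x = rnorm(1000))

index <- sample.split(toy$Default, SplitRatio = 0.70)
train <- subset(toy, index == TRUE)
test <- subset(toy, index == FALSE)

# Share of each class in the full data and in each sample
prop.table(table(toy$Default))
prop.table(table(train$Default))
prop.table(table(test$Default))
```

If the split preserved the proportions, the three tables should show roughly the same share of bads, and the train sample should hold about 70% of the rows.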
