Execute the following steps to split the dataset into training and test sets.
- Import the train_test_split function from scikit-learn:
from sklearn.model_selection import train_test_split
- Split the data into training and test sets (rows are shuffled by default; random_state makes the split reproducible):
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
- Split the data into training and test sets without shuffling, so that the last 20% of the rows form the test set:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=False)
- Split the data into training and test sets with stratification, so that both sets keep the class proportions of y:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, stratify=y, random_state=42)
- Verify that the class proportions of the target are preserved in both sets:
y_train.value_counts(normalize=True)
y_test.value_counts(normalize=True)
In both sets, the percentage of defaults is ~22.12%.
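The steps above can be tried end to end with the following minimal sketch. It does not use the dataset from this section: the features, the target name "default", and the roughly 22% positive rate are synthetic stand-ins chosen only for illustration, so the printed proportions will differ from the ~22.12% reported above.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (an assumption, not the section's dataset)
rng = np.random.default_rng(42)
n_rows = 1000
X = pd.DataFrame({
    "feature_1": rng.normal(size=n_rows),
    "feature_2": rng.normal(size=n_rows),
})
# Binary target with roughly 22% positives, mimicking a default flag
y = pd.Series((rng.random(n_rows) < 0.22).astype(int), name="default")

# Stratified 80/20 split: each set keeps the class proportions of y
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# For a 0/1 target, the mean equals the share of the positive class
train_share = y_train.mean()
test_share = y_test.mean()
print(f"Share of positives in training set: {train_share:.4f}")
print(f"Share of positives in test set:     {test_share:.4f}")
```

With stratify=y, the two printed shares should agree up to rounding, because the number of rows per class in each set must be an integer.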