Basic train-test split

To split your data once into a training set and a test set, use sklearn.model_selection.train_test_split. Its shuffle parameter, which defaults to True, randomizes the selection of observations; set random_state to make the split reproducible. For classification problems, the stratify parameter ensures that the training and test sets contain approximately the same share of each class. A basic call looks as follows:

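from sklearn.model_selection import train_test_split

data = list(range(1, 11))  # assumed: ten observations, inferred from the output below
train_test_split(data, train_size=.8)
[[8, 7, 4, 10, 1, 3, 5, 2], [6, 9]]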

In this case, we train a model using all data except observations 6 and 9, which are held out to generate predictions and measure the errors against the known labels. This method is useful for quick evaluation but is sensitive to the particular split, so the standard error of the test error estimate will be higher than for an estimate averaged over several splits.
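
The random_state and stratify parameters described above can be illustrated with a minimal sketch. The toy data here (100 observations with an 80/20 class imbalance) and all variable names are assumptions made for illustration; they are not taken from the original example.

```python
from collections import Counter

from sklearn.model_selection import train_test_split

# Hypothetical toy data: 100 observations, 80 of class 0 and 20 of class 1.
X = list(range(100))
y = [0] * 80 + [1] * 20

# random_state fixes the shuffle; stratify=y preserves the class shares.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.8, random_state=42, stratify=y
)

# Repeating the call with the same random_state reproduces the same split.
X_train_again, X_test_again, _, _ = train_test_split(
    X, y, train_size=0.8, random_state=42, stratify=y
)

print("Same test set on repeat:", X_test == X_test_again)
print("Class counts in train:", Counter(y_train))
print("Class counts in test:", Counter(y_test))
```

With stratification, both subsets should keep roughly the original 80/20 class mix, and a fixed random_state should make the shuffled split repeatable across runs.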
