Training sets and corpus creation

After the data preparation step, the resulting dataset serves as the training corpus. The corpus is generally split into three chunks: a training set, a validation set, and a testing set.

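The training set is the chunk of data that you use to train one or more machine learning algorithms. The validation set is the chunk of data that you use to evaluate the models while you are still developing them, for example to tune hyperparameters or to choose among candidate models. Finally, the testing set is the chunk of data that you hold back until the end and use to assess the performance of a fully trained classifier on examples it has never seen.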
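
The three-way split described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name split_corpus, the 70/15/15 proportions, and the fixed seed are hypothetical choices made for the example, not values prescribed by the text.

```python
import random


def split_corpus(samples, train_frac=0.7, val_frac=0.15, seed=0):
    """Shuffle a corpus and cut it into training, validation, and testing sets.

    The fractions and the seed are illustrative assumptions; whatever is
    left after the training and validation sets becomes the testing set.
    """
    if train_frac <= 0 or val_frac < 0 or train_frac + val_frac >= 1:
        raise ValueError("fractions must leave room for a testing set")

    shuffled = list(samples)               # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible

    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)

    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


# Toy corpus of 100 integer "samples" standing in for prepared data.
corpus = list(range(100))
train_set, val_set, test_set = split_corpus(corpus)
print(len(train_set), len(val_set), len(test_set))
```

Shuffling before slicing keeps any ordering in the prepared data from leaking into one chunk, and the three slices never overlap, so the testing set stays unseen during training and validation.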
