But enough with the theory. Let's do some coding!
It might be a good idea to pace ourselves. For our very first SVM, we should probably focus on a simple dataset, perhaps a binary classification task.
A cool feature of scikit-learn's datasets module that I haven't told you about is that it can generate random datasets of controlled size and complexity. A few notable generator functions are as follows:
- datasets.make_classification([n_samples, ...]): This function generates a random n-class classification problem, where we can specify the number of samples, the number of features, and the number of classes
- datasets.make_regression([n_samples, ...]): This function generates a random regression problem
- datasets.make_blobs([n_samples, n_features, ...]): This function generates a number of Gaussian blobs we can use for clustering
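To get a feel for these generators, here is a minimal sketch that calls each one and inspects what comes back. The sample counts, feature counts, number of centers, and the random_state value are arbitrary choices for illustration, not values prescribed by the text.

```python
from sklearn import datasets

# Every generator returns a feature matrix X and a target vector y.
# All sizes and the seed below are illustrative assumptions.
X_clf, y_clf = datasets.make_classification(
    n_samples=100, n_features=4, random_state=0)
X_reg, y_reg = datasets.make_regression(
    n_samples=100, n_features=4, random_state=0)
X_blob, y_blob = datasets.make_blobs(
    n_samples=100, n_features=2, centers=3, random_state=0)

# Classification targets are integer class labels
print(X_clf.shape, y_clf.shape, set(y_clf))
# Regression targets are continuous values
print(X_reg.shape, y_reg.shape, y_reg.dtype)
# Blob targets are the index of the blob each sample was drawn from
print(X_blob.shape, y_blob.shape, set(y_blob))
```

Fixing random_state makes the "random" data reproducible, which is handy when you want to compare results across runs.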
This means that we can use make_classification to build a custom dataset for a binary classification task.
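As a rough sketch of what that could look like, the following generates 100 samples with two features and two classes. The specific values (100 samples, two informative features, no redundant features, and the seed 42) are assumptions chosen so the data is easy to plot in 2D; feel free to change them.

```python
from sklearn import datasets

# Illustrative settings: two features keep the data easy to visualize,
# and n_classes=2 makes this a binary classification task.
X, y = datasets.make_classification(
    n_samples=100,     # number of data points
    n_features=2,      # total number of features
    n_informative=2,   # both features carry class information
    n_redundant=0,     # no features that are linear combinations of others
    n_classes=2,       # binary task
    random_state=42,   # arbitrary seed for reproducibility
)

print(X.shape)  # one row per sample, one column per feature
print(y.shape)  # one label per sample
print(set(y))   # the distinct class labels
```

Here X holds the feature values and y holds the corresponding class label (0 or 1) for each sample, which is the format scikit-learn classifiers expect.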