Choosing between linear and RBF kernels

Of course, linear separability is the rule of thumb when choosing which kernel to start with. However, most of the time this is very difficult to determine, unless you have sufficient prior knowledge of the dataset, or its features are of low dimension (one to three), in which case the data can be plotted and inspected.

Some general prior knowledge we do have includes the following: text data is often linearly separable, while data generated from the XOR function is not.
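To see the XOR case concretely, here is a minimal sketch (synthetic data; all sizes are illustrative) that fits scikit-learn's SVC with both kernels. The linear kernel hovers near chance on XOR-labeled points, while the RBF kernel captures the structure:

```python
import numpy as np
from sklearn.svm import SVC

# Synthetic XOR data: the label is 1 when exactly one coordinate is positive
rng = np.random.RandomState(42)
X = rng.uniform(-1, 1, size=(500, 2))
y = np.logical_xor(X[:, 0] > 0, X[:, 1] > 0).astype(int)

for kernel in ('linear', 'rbf'):
    clf = SVC(kernel=kernel).fit(X, y)
    print(f'{kernel:>6} kernel training accuracy: {clf.score(X, y):.3f}')
# Expect roughly 0.5-0.6 for linear and close to 1.0 for rbf
```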

Now, let's look at the following three scenarios where a linear kernel is favored over the RBF kernel:

Scenario 1: Both the number of features and the number of instances are large (more than 10⁴ or 10⁵). Since the feature space is already of high enough dimension, the additional implicit features produced by the RBF transformation will not provide any performance improvement, but will increase computational expense. Some datasets in the UCI Machine Learning Repository, such as large sparse text collections, are of this type; a sketch of this setting follows.
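In the following rough sketch, a synthetic sparse matrix stands in for, say, a large bag-of-words corpus (all sizes are hypothetical). A linear model trained with LinearSVC handles data that is large in both dimensions, where SVC(kernel='rbf') would be prohibitively slow:

```python
import numpy as np
from scipy import sparse
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

rng = np.random.RandomState(42)
n_samples, n_features = 100_000, 50_000   # large in both dimensions

# Sparse features, with labels that are linear in them by construction
X = sparse.random(n_samples, n_features, density=0.001,
                  format='csr', random_state=rng)
w = sparse.random(1, n_features, density=0.01,
                  random_state=rng).toarray().ravel()
y = (X @ w > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
clf = LinearSVC().fit(X_train, y_train)   # liblinear scales roughly linearly
print(f'Test accuracy: {clf.score(X_test, y_test):.3f}')
```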

Scenario 2: The number of features is noticeably large compared to the number of training samples. Apart from the reasons stated in Scenario 1, the RBF kernel is significantly more prone to overfitting in this regime. Such a scenario occurs, for example, in bioinformatics data, where thousands of gene-expression features may describe only a few dozen samples; a comparison is sketched below.
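A quick way to check this on your own data is to compare cross-validated scores of the two kernels. The following sketch (synthetic data; sizes are hypothetical) has 100 times more features than samples; results vary by dataset, but the RBF kernel typically gains nothing here while carrying a higher risk of overfitting:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Far more features than samples
X, y = make_classification(n_samples=100, n_features=10_000,
                           n_informative=20, random_state=42)

for kernel in ('linear', 'rbf'):
    scores = cross_val_score(SVC(kernel=kernel), X, y, cv=5)
    print(f'{kernel:>6} kernel mean CV accuracy: {scores.mean():.3f}')
```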

Scenario 3: The number of instances is significantly large compared to the number of features. For a dataset of low dimension, the RBF kernel will, in general, boost performance by mapping it to a higher-dimensional space. However, because its training complexity grows roughly quadratically to cubically with the number of samples, it usually becomes inefficient on a training set with more than 10⁶ or 10⁷ samples. A comparison on a small low-dimensional, nonlinear dataset is sketched below.
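The following sketch uses scikit-learn's make_circles as a stand-in for a low-dimensional, nonlinearly separable dataset (sizes are illustrative): with plenty of instances and only two features, the RBF kernel clearly outperforms the linear one:

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Two concentric circles: low-dimensional and not linearly separable
X, y = make_circles(n_samples=5_000, noise=0.1, factor=0.4, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

for kernel in ('linear', 'rbf'):
    clf = SVC(kernel=kernel).fit(X_train, y_train)
    print(f'{kernel:>6} kernel test accuracy: {clf.score(X_test, y_test):.3f}')
```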

Aside from these three scenarios, RBF is ordinarily the first choice.

The rules for choosing between linear and RBF kernels can be summarized as follows:

- Prior knowledge: if the data is known to be linearly separable, start with linear; if it is known not to be (as with XOR-style data), start with RBF.
- Visualizable data of low dimension (one to three): plot it, and start with linear if it looks linearly separable, with RBF otherwise.
- Both the number of features and the number of instances are large: linear first.
- The number of features is noticeably large compared to the number of instances: linear first.
- The number of instances is significantly large compared to the number of features: linear first.
- All other cases: RBF first.

Once again, first choice means that we can begin with this option; it does not mean that it is the only option moving forward.
