Data heterogeneity

Webster's dictionary defines heterogeneity as the quality or state of consisting of dissimilar or diverse elements. For us, this means that the feature vectors include features of many different kinds, for example a mix of numeric and categorical values, or numeric values measured on very different scales. If this applies to our application, then a different learning algorithm may suit the task better. Some learning algorithms also require that our data is scaled to fit within certain ranges, such as [0, 1] or [-1, 1]. As we get into learning algorithms that use distance functions as their basis, such as nearest neighbor and support vector methods, you will see that they are exceptionally sensitive to this, because a feature with a large numeric range can dominate the distance calculation. On the other hand, tree-based algorithms (decision trees, and so on) handle this phenomenon quite well.
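To make the sensitivity of distance-based methods concrete, here is a minimal sketch in plain Python. The data (age in years, income in dollars) and the helper names `min_max_scale`, `euclidean`, and `nearest` are invented for illustration and do not come from any particular library. The sketch rescales each feature into [0, 1] and checks whether the nearest neighbor of the first row changes.

```python
import math

def min_max_scale(column, low=0.0, high=1.0):
    """Linearly rescale a list of numbers into the range [low, high]."""
    c_min, c_max = min(column), max(column)
    if c_max == c_min:
        # A constant feature carries no information; map it to the lower bound.
        return [low for _ in column]
    span = c_max - c_min
    return [low + (v - c_min) * (high - low) / span for v in column]

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest(rows, query_index):
    """Index of the row closest to rows[query_index], excluding itself."""
    query = rows[query_index]
    others = [i for i in range(len(rows)) if i != query_index]
    return min(others, key=lambda i: euclidean(rows[i], query))

# Hypothetical feature vectors: (age in years, income in dollars).
rows = [
    (25, 50_000),
    (60, 51_000),  # very different age, similar income
    (27, 54_000),  # similar age, somewhat different income
    (45, 90_000),
]

# Scale each feature (column) independently into [0, 1].
ages = min_max_scale([r[0] for r in rows])
incomes = min_max_scale([r[1] for r in rows])
scaled_rows = list(zip(ages, incomes))

raw_neighbor = nearest(rows, 0)            # income dominates the distance
scaled_neighbor = nearest(scaled_rows, 0)  # both features contribute

print("nearest to row 0 (raw):   ", raw_neighbor)
print("nearest to row 0 (scaled):", scaled_neighbor)
```

On the raw data, the income column dominates the distance simply because its numbers are larger, so the 60-year-old in row 1 looks closest to the 25-year-old in row 0. After scaling, the 27-year-old in row 2 becomes the nearest neighbor. A decision tree, by contrast, splits on one feature at a time using thresholds, which is why this kind of rescaling matters far less for tree-based methods.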

We will end this discussion by saying that we should always start with the least complex algorithm that is appropriate for the task, and ensure that our data is collected and prepared correctly. From there, we can experiment with different learning algorithms and tune them to see which one works best for our situation. Make no mistake, tuning algorithms is often not a simple task, and it can end up consuming far more time than we have available. Always ensure that an appropriate amount of data is available first!
