Text classifying techniques

Classification is concerned with taking a specific document and determining which of several predefined document groups it belongs to. There are two basic techniques for classifying text:

  • Rule-based
  • Supervised Machine Learning

Rule-based classification uses a combination of words and other attributes organized around expert-crafted rules. Such rules can be very effective, but creating them is a time-consuming process.
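
As a rough illustration of the rule-based idea, the following Python sketch assigns a document to the category whose keyword list it matches most often. The categories, the keywords, and the function name classify_by_rules are hypothetical and chosen only for this example; real rule sets are typically much richer and may use attributes other than single words.

```python
import re

# Hypothetical expert-written rules: each category maps to a set of keywords.
RULES = {
    "sports": {"match", "team", "score", "league"},
    "finance": {"stock", "market", "shares", "dividend"},
}


def classify_by_rules(text, rules=RULES, default="unknown"):
    """Return the category whose keywords occur most often in the text."""
    # Lowercase the text and keep the distinct alphabetic words.
    words = set(re.findall(r"[a-z]+", text.lower()))
    best_label, best_hits = default, 0
    for label, keywords in rules.items():
        hits = len(words & keywords)  # number of distinct keywords present
        if hits > best_hits:
            best_label, best_hits = label, hits
    return best_label


print(classify_by_rules("The team won the match with a late score"))
print(classify_by_rules("Shares fell as the stock market closed"))
```

A document that matches no keyword falls back to the default label, which shows why such rules need continual expert attention as the vocabulary of the documents changes.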

Supervised Machine Learning (SML) uses a collection of annotated training documents to create a model. The model is normally called the classifier. There are many different machine learning techniques, including Naive Bayes, Support-Vector Machine (SVM), and k-nearest neighbor.
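
To make the train-then-classify workflow concrete, here is a minimal Python sketch of a multinomial Naive Bayes classifier with add-one smoothing, written with the standard library only. The four training sentences, their labels, and the function names (tokenize, train, classify) are assumptions made for this illustration; a practical system would use a far larger annotated corpus and an established library.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def train(labeled_docs):
    """Build a Naive Bayes model from (text, label) pairs."""
    doc_counts = Counter()              # number of documents per label
    word_counts = defaultdict(Counter)  # word frequencies per label
    vocab = set()
    for text, label in labeled_docs:
        doc_counts[label] += 1
        for word in tokenize(text):
            word_counts[label][word] += 1
            vocab.add(word)
    return doc_counts, word_counts, vocab


def classify(model, text):
    """Return the label with the highest log-probability for the text."""
    doc_counts, word_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label in doc_counts:
        # Start from the log prior of the label.
        score = math.log(doc_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word in vocab:  # ignore words never seen in training
                # Add-one (Laplace) smoothing avoids zero probabilities.
                count = word_counts[label][word] + 1
                score += math.log(count / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical annotated training documents.
TRAINING = [
    ("the team won the match", "sports"),
    ("a late goal decided the league game", "sports"),
    ("the stock market fell sharply", "finance"),
    ("investors sold shares after the dividend cut", "finance"),
]

model = train(TRAINING)
print(classify(model, "the market and shares dropped"))
```

The object returned by train plays the role of the classifier: it is built once from the annotated documents and then reused for every new document.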

We are not concerned with how these approaches work, but the interested reader will find many sources that expand upon these and other techniques.
