Implementing the classifier

In the earlier parts of the chapter, we sketched out a dummy Classifier type that does nothing. Let's make it do something now:

type Classifier struct {
	corpus *corpus.Corpus // maps words to integer IDs

	tfidfs [MAXCLASS]*tfidf.TFIDF // one entry per class
	totals [MAXCLASS]float64      // one entry per class

	ready bool
	sync.Mutex // embedded mutex
}

This definition introduces a few new things. Let's walk through them one by one:

  • We'll start with the corpus.Corpus type.
  • This is a type imported from the corpus package, which is a subpackage of the NLP library for Go, lingo.
  • To install lingo, simply run go get -u github.com/chewxy/lingo/....
  • To use the corpus package, simply import it like so: import "github.com/chewxy/lingo/corpus".
Bear in mind that in the near future, the package will change to github.com/go-nlp/lingo. If you are reading this after January 2019, use the new address.

A corpus.Corpus object simply maps from a word to an integer. The reason for doing this is twofold:

  • It saves memory: A []int uses considerably less memory than a []string. Once a corpus has been converted to IDs, the memory for the strings can be freed. This provides an alternative to string interning.
  • String interning is fickle: String interning is a technique in which exactly one copy of each distinct string exists in the entire program's memory. This turns out to be harder than expected for most tasks. Integers provide a more stable interning procedure.
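To make the word-to-integer idea concrete, here is a minimal sketch of such a mapping. The Vocab type and its Add and Encode methods are hypothetical names invented for illustration; they are not the API of lingo's corpus.Corpus, which may work differently.

```go
package main

import "fmt"

// Vocab is a hypothetical stand-in for a word-to-ID mapping.
// It is NOT the lingo corpus.Corpus API.
type Vocab struct {
	ids   map[string]int // word -> ID
	words []string       // ID -> word
}

// NewVocab returns an empty vocabulary.
func NewVocab() *Vocab {
	return &Vocab{ids: make(map[string]int)}
}

// Add returns the ID of word, assigning the next free ID if the word is new.
func (v *Vocab) Add(word string) int {
	if id, ok := v.ids[word]; ok {
		return id
	}
	id := len(v.words)
	v.ids[word] = id
	v.words = append(v.words, word)
	return id
}

// Encode converts a tokenized document into a slice of IDs.
func (v *Vocab) Encode(tokens []string) []int {
	out := make([]int, len(tokens))
	for i, t := range tokens {
		out[i] = v.Add(t)
	}
	return out
}

func main() {
	v := NewVocab()
	doc := []string{"buy", "cheap", "pills", "buy", "now"}
	fmt.Println(v.Encode(doc)) // prints [0 1 2 0 3]
}
```

Each distinct word is stored once inside the vocabulary, and every document is then just a []int. Repeated words, such as "buy" above, share a single ID.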

Next come two fields that are arrays: tfidfs [MAXCLASS]*tfidf.TFIDF and totals [MAXCLASS]float64. This is a good point to talk about the Class type.
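As a rough illustration of why a constant such as MAXCLASS works as an array length, the sketch below assumes that Class is a small integer type whose last constant counts the classes. The Ham and Spam names, the counts type, and the observe method are hypothetical and exist only for this example; the chapter's own Class definition may differ.

```go
package main

import "fmt"

// Class is a hypothetical label type; the chapter's definition may differ.
type Class byte

const (
	Ham Class = iota // hypothetical class 0
	Spam             // hypothetical class 1
	MAXCLASS         // sentinel: the number of classes, used as an array length
)

// counts keeps one running total per class, mirroring the shape of the
// totals field in Classifier.
type counts struct {
	totals [MAXCLASS]float64
}

// observe adds n to the running total for the given class.
func (c *counts) observe(class Class, n float64) {
	c.totals[class] += n
}

func main() {
	var c counts
	c.observe(Ham, 3)
	c.observe(Spam, 5)
	c.observe(Spam, 2)
	fmt.Println(c.totals[Ham], c.totals[Spam], len(c.totals)) // prints 3 7 2
}
```

Because the array length is a compile-time constant, adding a class before MAXCLASS in the const block grows every per-class array automatically, and a class value can be used directly as the index.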
