Alternative class design

Here, we imagine an alternative design of Class:

type Class string

const (
	Ham  Class = "Ham"
	Spam Class = "Spam"
)

With this change, we will have to update the definition of Classifier:

type Classifier struct {
	corpus *corpus.Corpus

	// A string Class cannot index an array, so these fields become maps.
	tfidfs map[Class]*tfidf.TFIDF
	totals map[Class]float64

	ready bool
	sync.Mutex
}

Consider now the steps required to get the total for class Ham:

  1. The string key has to be hashed.
  2. The hash is used to look up the bucket where the entry for Ham is stored.
  3. The bucket is dereferenced (an extra indirection), the matching key is found, and its value is read and returned to the caller.
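To make those steps concrete, here is a minimal sketch of tallying totals with the string-based Class. The `tally` helper is hypothetical and only illustrates the map access pattern; it is not the book's Classifier code.

```go
package main

import "fmt"

// Class is the string-based alternative design.
type Class string

const (
	Ham  Class = "Ham"
	Spam Class = "Spam"
)

// tally is a hypothetical helper that counts labels per class in a map.
// Unlike an array, the map must be allocated with make before it is
// written to; assigning into a nil map panics.
func tally(labels []Class) map[Class]float64 {
	totals := make(map[Class]float64)
	for _, c := range labels {
		// Each access hashes the string key, locates its bucket,
		// compares keys there, and then reads or updates the value.
		totals[c]++
	}
	return totals
}

func main() {
	totals := tally([]Class{Ham, Spam, Ham, Ham})
	fmt.Println(totals[Ham], totals[Spam]) // prints "3 1"
}
```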

Compare this with the steps required to get the total for class Ham under the original, numeric design of Class:

  • Since Ham is a number, the location of its entry in the totals array can be computed directly from that number, and the value is read and returned to the caller.
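For contrast, the same tally with a numeric Class might look like the sketch below. It assumes Class is a `byte` enumerated with `iota`, plus a hypothetical sentinel `MAXCLASS` that counts the classes and sizes the array; `tally` is again a hypothetical helper.

```go
package main

import "fmt"

// Class as a small integer type, enumerated with iota.
// MAXCLASS is a hypothetical sentinel equal to the number of classes.
type Class byte

const (
	Ham Class = iota
	Spam
	MAXCLASS
)

// tally is a hypothetical helper that counts labels per class in an array.
// The zero value of the array is ready to use; no allocation is needed.
func tally(labels []Class) [MAXCLASS]float64 {
	var totals [MAXCLASS]float64
	for _, c := range labels {
		// The element sits at base + int(c)*8 bytes (a float64 is 8 bytes),
		// so no hashing or bucket search is needed, only a bounds check.
		totals[c]++
	}
	return totals
}

func main() {
	totals := tally([]Class{Ham, Spam, Ham, Ham})
	fmt.Println(totals[Ham], totals[Spam]) // prints "3 1"
}
```

Because `MAXCLASS` is a constant, the array's size is fixed at compile time, which is what lets the index be turned straight into an address.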

By giving the type Class a numeric definition with constant values, and storing totals in an array, we skip the hashing and bucket-lookup steps. The performance improvement this yields is very slight. In this project it is mostly negligible, and it only starts to matter once the data grows large enough that these lookups are performed a great many times.
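One rough way to see the difference is to time repeated lookups both ways. The sketch below is a crude wall-clock comparison, not a rigorous benchmark (`go test -bench` is the usual tool for that); `StrClass`, `NumClass`, `numClasses`, `sumMap`, and `sumArray` are hypothetical names chosen so both designs fit in one program. Timings will vary by machine and Go version.

```go
package main

import (
	"fmt"
	"time"
)

// Hypothetical stand-ins for the two designs.
type StrClass string
type NumClass byte

const (
	StrHam  StrClass = "Ham"
	StrSpam StrClass = "Spam"
)

const (
	NumHam NumClass = iota
	NumSpam
	numClasses // hypothetical sentinel: number of classes
)

// sumMap performs n map lookups, cycling through keys.
func sumMap(totals map[StrClass]float64, keys []StrClass, n int) float64 {
	var s float64
	for i := 0; i < n; i++ {
		s += totals[keys[i%len(keys)]] // hash, bucket, key compare, load
	}
	return s
}

// sumArray performs n array lookups, cycling through keys.
func sumArray(totals *[numClasses]float64, keys []NumClass, n int) float64 {
	var s float64
	for i := 0; i < n; i++ {
		s += totals[keys[i%len(keys)]] // direct indexed load
	}
	return s
}

func main() {
	const n = 10000000
	m := map[StrClass]float64{StrHam: 1, StrSpam: 2}
	a := [numClasses]float64{NumHam: 1, NumSpam: 2}
	strKeys := []StrClass{StrHam, StrSpam}
	numKeys := []NumClass{NumHam, NumSpam}

	start := time.Now()
	sm := sumMap(m, strKeys, n)
	mapTime := time.Since(start)

	start = time.Now()
	sa := sumArray(&a, numKeys, n)
	arrTime := time.Since(start)

	// The sums are printed so the results of both loops are used.
	fmt.Println("map:  ", sm, mapTime)
	fmt.Println("array:", sa, arrTime)
}
```

Both loops compute the same sum; only the lookup mechanism differs, so the gap between the two timings approximates the cost of the skipped steps.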

The aim of this section on the Class design is to instill a sense of mechanical sympathy. If you understand how the machine works, you can design very fast machine learning algorithms.

All that said, one assumption underpins this entire exercise: this is a main package. If you're designing a package that will be reused on different datasets, the tradeoffs are significantly different. In software engineering, overgeneralizing a package often leads to leaky abstractions that are hard to debug. It is better to write slightly more concrete, purpose-built data structures.
