Preprocessing a single word 

The p.single method processes a single word. It returns the word's ID, along with a boolean that says whether the word should be added to the list of words that make up the tweet. It is defined as follows:

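    func (p *processor) single(a string) (wordID int, ok bool) {
        word := strings.ToLower(a)
        // Stop words carry no signal: no ID, and not added to the tweet.
        if _, ok = stopwords[word]; ok {
            return -1, false
        }
        // Hashtags, mentions and URLs are each collapsed into one placeholder token.
        if strings.HasPrefix(word, "#") {
            return p.corpus.Add(hashtag), true
        }
        if strings.HasPrefix(word, "@") {
            return p.corpus.Add(mention), true
        }
        if strings.HasPrefix(word, "http://") {
            return p.corpus.Add(url), true
        }
        // Retweet markers get an ID but are not added to the tweet's words.
        if isRT(word) {
            return p.corpus.Add(retweet), false
        }
        return p.corpus.Add(word), true
    }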
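To see how the branches interact, here is a self-contained sketch that runs a copy of single against minimal stand-ins. The corpus type and its Add method, the stopword set, the placeholder token values for hashtag, mention, url and retweet, and isRT are all hypothetical versions written only for this example; the real definitions live elsewhere in the project and may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// Hypothetical placeholder tokens, each standing in for a whole class of words.
const (
	hashtag = "<HASHTAG>"
	mention = "<MENTION>"
	url     = "<URL>"
	retweet = "<RETWEET>"
)

// Hypothetical stopword set; a real list would be much longer.
var stopwords = map[string]struct{}{
	"the": {}, "a": {}, "and": {}, "is": {},
}

// corpus is a minimal stand-in that assigns each distinct word a stable integer ID.
type corpus struct {
	ids   map[string]int
	words []string
}

func newCorpus() *corpus { return &corpus{ids: make(map[string]int)} }

// Add returns the ID of w, assigning the next free ID if w has not been seen.
func (c *corpus) Add(w string) int {
	if id, ok := c.ids[w]; ok {
		return id
	}
	id := len(c.words)
	c.ids[w] = id
	c.words = append(c.words, w)
	return id
}

// isRT is a hypothetical retweet detector: it only recognizes the bare token "rt".
func isRT(word string) bool { return word == "rt" }

type processor struct{ corpus *corpus }

func (p *processor) single(a string) (wordID int, ok bool) {
	word := strings.ToLower(a)
	if _, ok = stopwords[word]; ok {
		return -1, false
	}
	if strings.HasPrefix(word, "#") {
		return p.corpus.Add(hashtag), true
	}
	if strings.HasPrefix(word, "@") {
		return p.corpus.Add(mention), true
	}
	if strings.HasPrefix(word, "http://") {
		return p.corpus.Add(url), true
	}
	if isRT(word) {
		return p.corpus.Add(retweet), false
	}
	return p.corpus.Add(word), true
}

func main() {
	p := &processor{corpus: newCorpus()}
	for _, w := range []string{"RT", "The", "#golang", "@gopher", "http://example.com", "Café", "café"} {
		id, keep := p.single(w)
		fmt.Printf("%-22q id=%2d keep=%v\n", w, id, keep)
	}
}
```

Running it, "The" comes back as -1 and is dropped, "RT" receives an ID but is not kept, and "Café" and "café" share the same ID because of the lowercasing step. Every hashtag maps to the same placeholder ID, so "#golang" and "#gophers" would be indistinguishable to later stages.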

We start by making the word lowercase. This makes words such as café and Café equivalent.
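A quick check with the standard library's strings.ToLower shows both the effect and its limit: it folds case, including for accented capitals such as É, but it leaves the accent itself alone. The helper name sameAfterLower is hypothetical and exists only for this illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// sameAfterLower reports whether two words become identical once both are lowercased.
func sameAfterLower(a, b string) bool {
	return strings.ToLower(a) == strings.ToLower(b)
}

func main() {
	fmt.Println(sameAfterLower("Café", "café")) // only the case differs
	fmt.Println(sameAfterLower("CAFÉ", "café")) // ToLower maps É to é
	fmt.Println(sameAfterLower("café", "cafe")) // é and e are different runes
}
```

The first two comparisons are true and the last is false, which is exactly the gap that lowercasing alone cannot close.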

Speaking of café, what would happen if two tweets mention a café, but one user writes café and the other writes cafe? Assume, of course, that both refer to the same thing. Lowercasing does not help here, because é and e are different characters. We'd need some form of normalization to tell us that they're the same.
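One way to get that normalization is to strip diacritics after lowercasing. The usual Go route is the golang.org/x/text module: decompose with norm.NFD, remove nonspacing marks, then recompose with norm.NFC. To stay within the standard library, the sketch below swaps in a small hand-written table covering a few common Latin accented letters. The table and the normalize function are assumptions made for illustration; any character outside the table passes through unchanged.

```go
package main

import (
	"fmt"
	"strings"
)

// foldTable maps a few lowercase accented Latin letters to their unaccented base letter.
// It is deliberately small; a fuller solution would use Unicode decomposition instead.
var foldTable = map[rune]rune{
	'á': 'a', 'à': 'a', 'â': 'a', 'ä': 'a',
	'é': 'e', 'è': 'e', 'ê': 'e', 'ë': 'e',
	'í': 'i', 'ì': 'i', 'î': 'i', 'ï': 'i',
	'ó': 'o', 'ò': 'o', 'ô': 'o', 'ö': 'o',
	'ú': 'u', 'ù': 'u', 'û': 'u', 'ü': 'u',
	'ç': 'c', 'ñ': 'n',
}

// normalize lowercases a word, then replaces every accented letter found in foldTable.
func normalize(word string) string {
	return strings.Map(func(r rune) rune {
		if base, ok := foldTable[r]; ok {
			return base
		}
		return r
	}, strings.ToLower(word))
}

func main() {
	for _, w := range []string{"café", "cafe", "Café", "CAFÉ", "naïve"} {
		fmt.Printf("%s -> %s\n", w, normalize(w))
	}
}
```

Because lowercasing runs first, the table only needs lowercase entries. Text that spells é as a plain e followed by a combining acute accent (U+0301) is not caught by this table, which is one reason the decomposition-based approach is preferable in practice.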

