Content-based filtering

As a musician himself, Tim Westergren had spent years on the road listening to other talented musicians, wondering why they could never get ahead. Their music was good, just as good as anything you might hear on the radio, and yet they never caught their big break. He imagined it must be because their music never got in front of enough of the right people.

Tim eventually quit his job as a musician and took another job as a composer for movie scores. It was there that he began to think of each piece of music as having a distinct structure that could be decomposed into constituent parts—a form of musical DNA.

After giving it some thought, he began to consider creating a company around this idea of building a musical genome. He ran the concept by one of his friends, who had previously created and sold a company. The friend loved Tim's idea. So much so, in fact, that he began helping him to write a business plan and gather the initial funding round for the project. It was a go.

Over the next several years, they employed a small army of musicians who meticulously codified almost 400 distinct musical features for over a million pieces of music. Each feature was rated on a 0 to 5 point scale by hand (or maybe by ear is a better way to say it). Each three- or four-minute song took nearly a half hour to classify.

The features included things such as how gravelly the lead singer's voice was or how many beats per minute the tempo was. It took nearly a year for their first prototype to be completed. Built entirely in Excel using a VBA macro, it took nearly four minutes just to return a single recommendation. But in the end, it worked, and it worked well.

That company is now known as Pandora, and chances are you've either heard of it or used its products, as it has millions of daily users around the world. It's without a doubt a triumphant example of content-based filtering.

Rather than treating each song as a single indivisible unit, as collaborative filtering does, content-based filtering turns the songs into feature vectors that can be compared using our friend cosine similarity.
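To make the comparison concrete, here is a minimal sketch of cosine similarity applied to song feature vectors. The three features and their 0-to-5 ratings are invented for illustration; they are not taken from the actual musical genome, which used almost 400 features per song.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an all-zero vector has no direction to compare
    return dot / (norm_a * norm_b)

# Hypothetical ratings on a 0-to-5 scale for three made-up features:
# [vocal gravel, acoustic instrumentation, tempo]
song_x = [4.0, 1.0, 3.5]
song_y = [4.5, 1.0, 3.0]
song_z = [0.5, 5.0, 1.0]

sim_xy = cosine_similarity(song_x, song_y)  # similar profiles, close to 1
sim_xz = cosine_similarity(song_x, song_z)  # dissimilar profiles, much lower
print(f"x vs y: {sim_xy:.3f}, x vs z: {sim_xz:.3f}")
```

Because every rating is non-negative, the similarity here always falls between 0 and 1, with values near 1 indicating songs whose feature profiles point in nearly the same direction.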

Another benefit is that listeners, not just songs, can be decomposed into feature vectors. Each listener's taste profile becomes a vector in the same space, so that measurements can be made between their taste profiles and the songs themselves.
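One simple way to place a listener in the same space as the songs is to average the feature vectors of the songs they like and then rank the remaining songs against that average. The sketch below assumes this averaging approach purely for illustration; the text does not say how Pandora actually builds taste profiles, and the catalog, titles, and ratings are all hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def taste_profile(liked_vectors):
    """Average the liked songs feature by feature (an assumed, simple profile)."""
    n = len(liked_vectors)
    return [sum(column) / n for column in zip(*liked_vectors)]

def recommend(profile, catalog, exclude=()):
    """Rank catalog songs by similarity to the profile, most similar first."""
    scored = [
        (title, cosine_similarity(profile, vector))
        for title, vector in catalog.items()
        if title not in exclude
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical catalog: each song rated 0 to 5 on three made-up features
catalog = {
    "song_a": [4.0, 1.0, 3.5],
    "song_b": [4.5, 1.0, 3.0],
    "song_c": [0.5, 5.0, 1.0],
    "song_d": [4.0, 1.5, 3.0],
}

liked = ["song_a", "song_b"]
profile = taste_profile([catalog[title] for title in liked])
ranking = recommend(profile, catalog, exclude=liked)

for title, score in ranking:
    print(f"{title}: {score:.3f}")
```

Note that popularity never enters the ranking: a song that nobody has played yet can still come out on top if its features sit close to the listener's profile.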

For Tim Westergren, this was the magic: rather than relying on the popularity of the music, as so many recommendation systems do, this system made its recommendations based on inherent structural similarity. Maybe you've never heard of song X, but if you like song Y, then you should like song X because it's genetically almost identical. That's content-based filtering.
