Chapter 6. AI for content curation and community building

This chapter covers

  • Using recommender systems to suggest engaging content and products
  • Understanding the two approaches to recommender systems: content-based and community-based
  • Understanding the drawbacks of algorithmic recommendations
  • Case study: using recommender systems to save $1 billion in churn

Recommender systems are the workhorse behind today’s “personalized” experiences, and a fundamental tool to help consumers navigate huge catalogs of media, products, and clothes. Anytime you click a related offer from Amazon or check out a suggested movie from Netflix, these companies are taking advantage of recommender systems to drive user engagement and top-line growth. Without them, navigating the vastness of products and digital content available on the internet would be simply impossible.

In the case study at the end of the chapter, you’ll see why Netflix believes that its recommender system has been saving the company more than $1 billion each year since 2015.

6.1 The curse of choice

Have you ever entered a large mall to shop for clothes and felt disoriented? It’s hard to find the T-shirt of your dreams when there are hundreds to choose from, with different colors, fabrics, brands, and prices. The internet has no real estate limitations, and therefore you potentially have to choose from not hundreds, but tens of thousands of T-shirts. Or movies. Or songs. Or news articles. Or dishwasher tablets. Frustration is around the corner, and frustrated customers are not good for business.

The internet is so brimming with choices that we need a way to help people make them. In chapter 3, we tried to group users according to their tastes and target them with offers that match those tastes, using a set of techniques called unsupervised learning. However, that is still not enough to reach the marketing singularity: the holy grail of one-to-one communication, whereby each customer gets fully personalized offers and recommendations for maximum satisfaction. This is the topic of this chapter.

How can you help people make choices? If we look at great shop assistants, they do one or both of two things:

  • They know their catalog. If you like a pair of jeans, they’ll be able to suggest others that are similar and quickly direct you toward the perfect choice.
  • They know you. If you are a returning customer, they have learned about your tastes and can suggest items you’re likely to like: “You must try these jeans; they’re so trendy right now and they really match your style.”

Throughout this chapter, you’ll see how to build algorithms that can do both of these things, even better than humans. These algorithms, called recommender systems, produce tailored recommendations for customers at scale.

6.2 Driving engagement with recommender systems

Thanks to the AI-based features we built in the previous chapters, FutureHouse has been drawing more and more users. The free estimate of the sale price and automated listings have attracted many new sellers too. All this success has brought a new challenge: so many properties are listed for sale now that it’s hard for buyers to find the perfect home. We need to find a way to highlight the properties that match their tastes, and make it easier for them to find the home of their dreams. The solution is clear: we need to build a Recommended Houses for You feature that learns the taste of each user and preselects houses for them.

Let’s try to think about how FutureHouse could decide which homes to show to each user. How would an experienced human approach this problem? A real estate agent may look at one or more houses that the customer has liked and propose others that are somehow similar (for example, in the same neighborhood or with the same number of rooms).

How do we translate this approach to the world of the internet? An obvious idea would be to keep track of the homes that each customer visits on the site, and suggest that they check out similar properties to the ones they have already found. For example, if a prospective buyer has already looked at several three-bedroom homes close to train stations, it’s likely that they would be interested in others that share the same attributes.

How do we define the meaning of similar, though? If we had a clear idea of how to measure similarity, we could just explain it to a computer. However, our sense of what is similar is intuitive, and we struggle to spell it out as explicit rules. By now, you’re familiar with the fact that this is a perfect starting point for machine learning.

In chapter 2, we introduced the concept of features, measurable aspects of each individual home that can be used as inputs for ML algorithms. More specifically, the groundwork we did to build the price-prediction model suggested that the number of rooms, the square footage, and the distance to public transportation were some of the most important factors to take into account. As we did for price prediction, we can lay down the feature values for each home in an Excel spreadsheet, as shown in figure 6.1.

Figure 6.1 An Excel spreadsheet is great for visualizing the features of homes. Each row is a different property, and each column is a feature.

Let’s start simple and consider a single feature of the homes: square footage. It’s obvious that a 1,000 sq. ft. home and a 1,100 sq. ft. home are more similar to each other than either is to a 12,000 sq. ft. home. Therefore, our algorithm should recommend the 1,000 sq. ft. home to a user who has visited the 1,100 sq. ft. home, and avoid surfacing the 12,000 sq. ft. villa. In ML terms, this is formalized by the concept of distance between the feature values: 1,100 - 1,000 = 100 is much smaller than 12,000 - 1,100 = 10,900, so the algorithm can figure out that the first two homes are more similar to each other.

As you’ve seen in chapter 2, most real-world ML applications don’t use a single feature, but tens or even hundreds. Let’s now step up our game a notch and consider a second feature: the number of rooms. If we drew a circle for each home in a (very small) dataset of homes, the diagram would look like figure 6.2.

Figure 6.2 A representation of houses based on two features: number of rooms and square footage

Let’s imagine the experience of a new user we’ll call Lucie. Lucie signs up on FutureHouse and starts browsing around. She immediately clicks one of the houses and spends a lot of time on its page, looking through the pictures, reading the descriptions, considering the reviews, and so on. We can assume that this is a clear indication that Lucie likes that house. If we take the plot in figure 6.2 and turn the dot corresponding to Lucie’s favorite house into a star, we get figure 6.3.

Figure 6.3 Lucie really likes one of the homes, highlighted with a star.

Our task is to help her find other houses that match her tastes: houses that are similar to the one she just spent a lot of time looking at. Even just by staring at figure 6.3, it seems intuitive that three other homes are quite similar to Lucie’s favorite. The other four houses seem quite different: the two on the bottom left are too small, and the ones on the top right are probably too big.

Now we need to translate this reasoning into computing terms so that we can automate it and help thousands of people like Lucie at the same time. We’ll again use the notion of distance that we introduced earlier. Imagine using a ruler to measure the distance between the points (that is, homes) printed in figure 6.3: this is exactly what a computer does, as shown in figure 6.4.

Figure 6.4 A computer takes each pair of homes and measures the straight-line distance between them.

After we’ve measured the distance from the starred home to all of the other ones in the dataset, we can ask the computer to sort them from shortest to longest, and pick the top two or three to show Lucie.
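The measure-sort-pick procedure above fits in a few lines of Python. This is a minimal sketch under stated assumptions: the five homes and their (rooms, square footage) values are invented, and `most_similar` is a hypothetical helper, not part of any library. It uses `math.dist` (Python 3.8+) for the straight-line distance.

```python
import math

# Made-up dataset: home id -> (number of rooms, square footage)
homes = {
    "h1": (3, 1400),
    "h2": (3, 1500),
    "h3": (4, 1600),
    "h4": (1, 500),
    "h5": (6, 4000),
}


def most_similar(favorite_id, catalog, k=3):
    """Return the k homes closest to the favorite one, nearest first."""
    favorite = catalog[favorite_id]
    scored = [
        (math.dist(favorite, features), home_id)  # straight-line distance
        for home_id, features in catalog.items()
        if home_id != favorite_id  # never recommend the home she already likes
    ]
    scored.sort()  # shortest distance first
    return [home_id for _, home_id in scored[:k]]


print(most_similar("h1", homes, k=2))  # ['h2', 'h3']
```

Because square footage is measured in hundreds or thousands while room counts are single digits, the raw distance here is driven almost entirely by square footage; in practice, features are usually rescaled before distances are measured.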

Of course, we’re still considering only the square footage and number of rooms, so our notion of similarity is rough and incomplete: we’re ignoring many other important house features including the location, year of construction, and more. Luckily, it turns out that the algorithm doesn’t really change when we add all of the other features back; it just becomes harder to draw it on paper and conceptualize it for us poor, limited humans. Engineers can easily build distance algorithms that work with all the features that we used in chapter 2 for price prediction, and a computer can effortlessly compute them.
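With more features, one practical wrinkle appears: features live on very different scales, so square footage would swamp everything else. A common fix, assumed here rather than taken from this chapter, is to standardize each feature column (subtract its mean, divide by its standard deviation) before measuring distance. The feature columns, their values, and the helper names below are hypothetical.

```python
import math
from statistics import mean, pstdev

# Hypothetical features per home:
# [rooms, square footage, miles to train station, year built]
rows = {
    "h1": [3, 1400, 0.3, 1995],
    "h2": [3, 1500, 2.5, 1960],
    "h3": [4, 1600, 0.4, 2001],
    "h4": [1, 500, 0.2, 2015],
}


def standardize(table):
    """Rescale every feature column to mean 0 and standard deviation 1."""
    columns = list(zip(*table.values()))
    # Fall back to 1.0 for constant columns to avoid dividing by zero.
    stats = [(mean(col), pstdev(col) or 1.0) for col in columns]
    return {
        home_id: [(x - m) / s for x, (m, s) in zip(values, stats)]
        for home_id, values in table.items()
    }


def nearest(favorite_id, table, k=1):
    """Rank homes by distance to the favorite after standardization."""
    scaled = standardize(table)
    favorite = scaled[favorite_id]
    ranked = sorted(
        (math.dist(favorite, vec), home_id)
        for home_id, vec in scaled.items()
        if home_id != favorite_id
    )
    return [home_id for _, home_id in ranked[:k]]


print(nearest("h1", rows))  # ['h3']
```

On the raw values, h2 would look closest to h1 because the 100 sq. ft. gap dominates everything; after standardization, h3 wins, since h2 is far from the train station and 35 years older.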

What we have just described is the simplest recommender system ever, but it’s still effective, and this basic concept is easy to deploy in a variety of business situations. Take the page that shows details and pictures of each home. It’s easy for engineers to add a sidebar with links to the three or four homes that are most similar to the one the user is browsing, and enjoy the additional traffic.

More-sophisticated models also keep track of how the tastes of the users evolve over time, by considering all the homes that they have ever interacted with. Even better, we could group choices based on the category of the items and on the time of the year, instead of treating all preferences the same. For example, an e-commerce site would choose to recommend your kids’ favorite brand of pencils during the back-to-school season, but recognize that the very same recommendations are not useful during the summer.
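One simple way to let recent behavior count more than old behavior is to summarize a user’s history as a recency-weighted average of the homes they interacted with. This is a sketch under assumptions: the exponential decay, the 30-day half-life, and the name `taste_profile` are illustrative choices, not a method prescribed by the chapter.

```python
def taste_profile(interactions, today, half_life_days=30.0):
    """Summarize a user's taste as a recency-weighted average feature vector.

    interactions: list of (day_of_interaction, feature_vector) pairs.
    An interaction's weight halves every `half_life_days` (a hypothetical setting).
    """
    if not interactions:
        raise ValueError("need at least one interaction")
    n_features = len(interactions[0][1])
    totals = [0.0] * n_features
    total_weight = 0.0
    for day, features in interactions:
        weight = 0.5 ** ((today - day) / half_life_days)  # exponential decay
        for i, x in enumerate(features):
            totals[i] += weight * x
        total_weight += weight
    return [t / total_weight for t in totals]


# Two months ago she browsed small flats; today she looks at bigger homes.
# Features are (rooms, square footage).
history = [(0, [2, 900]), (60, [4, 1800])]
print(taste_profile(history, today=60))  # [3.6, 1620.0]
```

The resulting vector can stand in for the single starred home when measuring distances, so recent behavior pulls recommendations toward larger homes. Seasonal effects like the back-to-school example would need extra logic, such as separate profiles per category or per season.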

Notice that Lucie’s recommender system was looking only at the features of Lucie’s favorite home and completely ignored who she is and what other users in the community have done. This is also true for the more sophisticated examples we made: we’re still focusing only on the content for now. This is why this family of recommender systems is called content-based.

6.2.1 Content-based systems beyond simple features

While our fictional FutureHouse examples have served us well so far, you’re likely wondering how recommender systems can be adapted to other types of items, such as the clothes in your favorite e-shop. When talking about homes, we were lucky to have a set of descriptive features that are also reasonably easy to collect. Again, those are the same features that we used in chapter 2 to predict the sale price of the home.

However, many organizations deploy recommender systems on catalogs of items for which it’s much harder to select features. Recall the example at the start of this chapter: finding the best T-shirt for you. What kind of features could we use?

  • Predominant color (black, white, and so forth)
  • Fabric (cotton, synthetic)
  • Neckline (V-neck, crew neck)
  • Fit (slim, regular)
  • Sleeves (long, short, three-quarter)

If buying a T-shirt were that easy, many fashion designers would be out of a job. The reality is that it’s hard to come up with criteria that capture the style of a T-shirt, as with any piece of clothing, or in general with any visual media.

Luckily, we already solved the same challenge in chapter 4, when we realized the shortcomings of conventional machine learning and introduced advanced models based on deep learning, a family of algorithms that automatically extracts features from complex data sources. These models were capable of transforming an image into a small set of numbers called embeddings, which represent high-level characteristics of the image in a compact form.

Let’s recall what we said about deep neural networks when they’re trained to recognize faces. When you train a face-recognition algorithm, it learns to recognize high-level characteristics of faces and transform each image it’s fed into that series of numbers called embeddings. These embeddings represent the presence (or absence) of certain important facial characteristics in the image; for instance, whether the person in the image has a pointy nose, large ears, and so on.

What happens if you do the same with T-shirts? Well, you’ll magically have a way to transform the picture of that T-shirt from millions of pixels to a few hundred numbers that express some of its high-level style characteristics. These may be the presence of vertical stripes, horizontal stripes, prints of different objects, sentences, logos, and more. While we intuitively know that all these characteristics are relevant to describing a T-shirt, we have no practical way of measuring them by hand. A deep learning algorithm can do that automatically, saving the day for us all.

Once we have transformed an image into its embeddings using a deep neural network, we suddenly find ourselves in the same situation we faced with houses: we can transform all the images into a set of points in space, and use the similarity criterion to spot T-shirts that are similar to the user’s favorites. In practice, embeddings have hundreds or thousands of dimensions, to capture the nuances of a T-shirt’s style. To visualize the concept, we drew a 2-D representation of the embeddings of three T-shirts in figure 6.5.

Figure 6.5 Embeddings of three T-shirts represented on a 2-D plane. T-shirts A and B are similar, so their embeddings are close. T-shirt C is different from A and B, so it’s far from both.

Imagine now that a user clicks on the product page for T-shirt A and spends time looking at it, checking the available sizes, colors, and materials. We can interpret this as a manifestation of interest in that T-shirt’s style, and recommend another one that has an embedding close to it; for instance, T-shirt B. We’ll avoid suggesting T-shirt C, as we know that its embedding is far from that of the T-shirt that the user liked, and therefore it probably won’t match their taste.
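The “close embeddings” test can be sketched in a few lines. This assumes cosine similarity as the closeness measure (a common choice for embeddings; the straight-line distance used for homes would also work), and the 4-dimensional vectors for T-shirts A, B, and C are made up to mimic figure 6.5. The function names are hypothetical.

```python
import math

# Made-up 4-dimensional embeddings; real ones have hundreds of dimensions.
EMBEDDINGS = {
    "A": [0.9, 0.1, 0.0, 0.3],
    "B": [0.8, 0.2, 0.1, 0.3],
    "C": [0.0, 0.9, 0.8, 0.1],
}


def cosine_similarity(u, v):
    """1.0 for vectors pointing the same way, near 0 for unrelated ones."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def recommend_similar(liked, catalog):
    """Rank every other T-shirt by similarity to the liked one, most similar first."""
    others = [item for item in catalog if item != liked]
    return sorted(
        others,
        key=lambda item: cosine_similarity(catalog[liked], catalog[item]),
        reverse=True,
    )


print(recommend_similar("A", EMBEDDINGS))  # ['B', 'C']
```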

This is again the magic of deep learning and of embeddings: neural networks found a way to “understand” images and transform them into a small set of numbers that actually make sense (for a computer, at least). If you think about it, images are not the only applications where you’ve seen the power of embeddings. In chapter 5, you saw how powerful embeddings can be when dealing with another complicated kind of data: text.

The same approach works with text as well. Suppose we are trying to suggest news articles to a user on a website, and we want to find a piece of news that is similar to the ones the user spends most of their time on. We can transform the words in each article into its embedding and place our newly vectorized articles in space, as shown in figure 6.6.

Figure 6.6 Deep learning can translate each news article into its embedding, which is a compact mathematical description that can be used to compare them.

Once we have a spatial representation of the news articles, we can let the concept of similarity guide us in recommending news to a user. If a user reads article A, we know that they’re probably going to be interested in article B, because their embeddings are close and therefore their content is too. This is the magic of deep learning and embeddings: they allow us to transform even complex data sources like articles and images into numerical representations that carry meaning, applying the same techniques we used for houses at the beginning of the chapter.
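A learned text embedding model doesn’t fit in a few lines, so the sketch below swaps in a much cruder stand-in, word-count (bag-of-words) vectors, just to show the retrieval step: vectorize each article, then rank by cosine similarity. The headlines and function names are hypothetical. Unlike real embeddings, word counts miss synonyms (“rates” versus “borrowing costs”), which is exactly why deep learning helps here.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercased word counts: a crude stand-in for a learned text embedding."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(c1, c2):
    """Cosine similarity between two word-count vectors."""
    dot = sum(c1[w] * c2[w] for w in c1.keys() & c2.keys())
    norm1 = math.sqrt(sum(v * v for v in c1.values()))
    norm2 = math.sqrt(sum(v * v for v in c2.values()))
    return dot / (norm1 * norm2) if norm1 and norm2 else 0.0


articles = {
    "A": "Central bank raises interest rates to fight inflation",
    "B": "Inflation pushes the central bank toward higher interest rates",
    "C": "Local team wins the championship after overtime thriller",
}


def similar_articles(read_id, corpus):
    """Rank the other articles by similarity to the one the user just read."""
    vectors = {k: bag_of_words(v) for k, v in corpus.items()}
    others = [k for k in corpus if k != read_id]
    return sorted(others, key=lambda k: cosine(vectors[read_id], vectors[k]), reverse=True)


print(similar_articles("A", articles))  # ['B', 'C']
```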

6.2.2 The limitations of features and similarity

From what we have said so far, you might think that the concept of similarity (together with some deep learning magic) can solve every recommendation problem, from news articles to household products. This section will help you build your intuition about why, in some cases, similarity isn’t the best strategy. Let’s start with another classic example that we’re all familiar with: recommending movies to watch.

Let’s tackle this new problem by following the same approach that we used to recommend homes to potential buyers. The first step is to find the features that the algorithm can use to describe movies. Most people come up with a list that looks like this:

  • Director
  • Year of release
  • Genre (Comedy, Drama, Animation, Romance)
  • Top three lead actors

If we go ahead with this list of features, we’ll soon learn that the recommendations produced by the algorithm are uninteresting at best, and downright ridiculous at worst. Throughout their careers, both actors and directors have worked on all sorts of movies: even if you liked Titanic, it doesn’t mean that you would enjoy The Wolf of Wall Street just because it stars Leonardo DiCaprio.

We can try to improve our model by turning to the power of deep learning. We could feed all the movie dialogue to a natural language model that can produce additional features capturing the topic and mood of the movie. This would likely work well: the love-themed dialogue and worried discussion of the captain would add more nuance to the recommendations, potentially producing other forbidden-love stories and disasters. However, your future recommendations would likely linger in the love-and-disaster category for a while, which you might find less than alluring.

Sadly, no matter how sophisticated we get with feature building, we’re left with a fundamental limitation: our recommendation engine is still based on similarity. This means that the model can suggest only items that match the past choices of the customer. This is bound to bore them to death: how many fantasy Angelina Jolie flicks or cheesy dialogues can you withstand before canceling your Netflix subscription? Thus, the Achilles’ heel of content-based recommendations is that their outputs can get bland and predictable. While this is not great for entertainment or fashion shopping, it’s a great fit for some other industries; say, drug recommendations to doctors.

Unsurprisingly, the best way to add a human touch to recommendations is to bring humans into the mix. By using a community of users who consume and rate the catalog, we can build more nuanced recommender systems. This is what we’re going to explore in the next section.

6.3 The wisdom of crowds: collaborative filtering

Just to recap, the recommender systems we have explored so far are commonly called content-based, as they use descriptive attributes of items the user has liked in the past to find and recommend similar ones in a catalog. However, this approach breaks down in two ways: it might be hard to express meaningful features, and the resulting recommendations can be predictable and boring.

Let’s stop for a second and think about how we tackle this problem in real life. People ask their friends for recommendations all the time: books, restaurants, movies, and so on. Over time, most of us have learned that some friends share our tastes in music or food, and thus we eagerly listen to their recommendations because we anticipate that the preferences we shared in the past will also extend to the future.

AI can scale this human tradition to much larger groups by taking advantage of a community of users who are all expressing preferences about items in a shared catalog. The kind of AI algorithms that build on this concept are called collaborative filtering algorithms: a social approach to recommender systems that does pretty much the same thing we do in real life with our friends.

Because we can rely on the taste and preference histories of similar users, the role of a collaborative model is to match users who share the same tastes, so that each can be offered items the others enjoyed. In real life, we would start sharing our preference histories and notice that they match up pretty nicely: “Oh, you also like Toy Story and Titanic! Have you watched . . . ?” Collaborative models do the same thing, matching users with similar tastes, but at a much larger scale.

Just as in content-based models, you’ll have to collect the preferences of the users. Let’s jump back to our example of a real estate platform and look at example data to understand the intuition behind collaborative models. Table 6.1 shows the preferences of three imaginary users of the platform: Alice, Bob, and Jane.

Table 6.1 Home preferences of Alice, Bob, and Jane

| Home                                             | Alice    | Bob      | Jane     |
|--------------------------------------------------|----------|----------|----------|
| Home A: One-bedroom in the Dogpatch neighborhood | ?        | Like     | Not like |
| Home B: Three-bedroom in the Financial District  | Like     | Not like | Like     |
| Home C: Studio in the Mission district           | Not like | ?        | Not like |
| Home D: One-bedroom in the Mission district      | Not like | Like     | Not like |
| Home E: Three-bedroom in the Marina neighborhood | Like     | ?        | Like     |

Each row of the table shows what Alice, Bob and Jane think of each of the five homes in our (admittedly small) catalog. Just as in content-based models, there are many ways to collect this preference data. The most obvious way is just to ask them; for example, with a five-star widget in the home description page. Many savvy organizations go beyond that and keep track of more detailed information, such as how much time users spend reading the description or scrolling through the images. If Bob has seen the listing of a house 10 times over the last two days and has spent five minutes on each session, it’s a pretty good indicator that he’s really interested.
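The chapter doesn’t prescribe a formula for turning browsing behavior into preferences. The function below is one hypothetical way to do it: `interest_score` is a made-up name, and the log scaling is an illustrative design choice that keeps a listing opened 50 times from counting ten times more than one opened 5 times.

```python
import math


def interest_score(views, total_seconds):
    """Turn raw browsing activity on one listing into an implicit interest score.

    Log scaling is a hypothetical design choice: more activity always means a
    higher score, but with diminishing returns. Zero activity scores 0.
    """
    minutes = total_seconds / 60
    return math.log1p(views) * math.log1p(minutes)


bob = interest_score(views=10, total_seconds=10 * 5 * 60)  # 10 sessions of 5 minutes
casual = interest_score(views=1, total_seconds=20)         # one 20-second glance
print(round(bob, 2), round(casual, 2))
```

A threshold on such a score (also an assumption, to be tuned on real data) could then fill in the “Like” cells of a table like table 6.1 for users who never click a star widget.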

The example dataset includes homes that Alice and Bob didn’t manage to look at (represented with a question mark in table 6.1). Alice never found Home A, and Bob didn’t see Homes C and E. As you’ve learned, the goal of the recommender system is to select which of these “missing” homes they’re most likely to be interested in, so that we can place them at the top of their search results and help them find new houses that they’ll potentially buy.

By looking at the table, we can already tell that Alice and Jane have similar interests: both liked Homes B and E and didn’t like Homes C and D. On the other hand, Bob seems to have completely different taste from both Jane and Alice, as he disagreed with each of them on every house they both rated.

When producing recommendations, a collaborative filtering system would pair up Alice and Jane as two people with similar tastes, and wouldn’t show Home A to Alice, as it didn’t fit Jane’s preferences. As for Bob, we know that he’s looking for something completely different from Alice and Jane, so we’d suggest that he check out Home C, which neither Alice nor Jane liked, and we wouldn’t propose Home E, which both of them liked.
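The reasoning about Alice, Bob, and Jane can be reproduced with a small user-based collaborative filtering sketch. The similarity measure (average agreement on co-rated homes) and the similarity-weighted average are simple illustrative choices, not the only or standard ones; production systems typically use measures such as cosine or Pearson similarity over far more users. The function names are hypothetical.

```python
# Table 6.1 encoded as +1 (Like) and -1 (Not like); homes a user never saw are absent.
RATINGS = {
    "Alice": {"B": 1, "C": -1, "D": -1, "E": 1},
    "Bob": {"A": 1, "B": -1, "D": 1},
    "Jane": {"A": -1, "B": 1, "C": -1, "D": -1, "E": 1},
}


def user_similarity(u, v):
    """Average agreement on homes both users rated: +1 always agree, -1 always disagree."""
    common = RATINGS[u].keys() & RATINGS[v].keys()
    if not common:
        return 0.0
    return sum(RATINGS[u][h] * RATINGS[v][h] for h in common) / len(common)


def predict(user, home):
    """Similarity-weighted average of the other users' ratings of `home`.

    Negative similarity flips a rating: if someone who always disagrees with
    you disliked a home, that counts as evidence that you would like it.
    """
    numerator = denominator = 0.0
    for other, their_ratings in RATINGS.items():
        if other == user or home not in their_ratings:
            continue
        sim = user_similarity(user, other)
        numerator += sim * their_ratings[home]
        denominator += abs(sim)
    return numerator / denominator if denominator else 0.0


for user, home in [("Alice", "A"), ("Bob", "C"), ("Bob", "E")]:
    print(user, home, predict(user, home))
# Alice A -1.0
# Bob C 1.0
# Bob E -1.0
```

Negative predictions mean “don’t show it” and positive ones mean “recommend it,” matching the conclusions drawn from the table: no Home A for Alice, Home C but not Home E for Bob.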

In other words, the idea behind collaborative filtering is to ignore the similarity between items, and focus on the similarity between users, based on their interactions with items and their preferences. Notice that these models ignore users’ features, and rely solely on the ratings given to the homes browsed. This means that in our toy example, Alice and Jane aren’t paired based on common social features like gender or age, but are matched because of their taste in homes.

The real magic of collaborative systems is that they don’t need to know anything at all about the items or the users: as long as there are users who rate things, we can figure out which users have similar tastes and recommend to each of them what their like-minded peers enjoyed. The same model works with homes just as well as with movies, as long as the community casts enough ratings. This also makes collaborative filtering systems easier to integrate into existing platforms, because you don’t have to fish around for additional data about each item (say, the movie’s director), and can instead relax while users are busy letting you know what they think. Even better, since these recommendations are based on the real-world preferences of other humans, they can be novel and surprising, just like those you would get from a friend.

Collaborative filtering systems work well only when there are many more users than items in the catalog. Otherwise, it’s hard to find groups of users with common tastes: large catalogs in which many items have been rated only once, or never, don’t play well with this approach. For instance, let’s assume the extreme case in which your real estate website has just Bob, Jane, and Alice as users, but 500 houses are listed for sale. It’s highly unlikely that our three lonely users have seen the same houses, which means that we can’t use the clever trick of taste-matching that we’ve described.
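A back-of-the-envelope calculation shows how unlikely that overlap is. Assume, hypothetically, that each user independently looks at 10 listings picked at random. A given listing is seen by one user with probability k/n, so by two users with probability (k/n)², and summing over all n listings gives k²/n shared listings on average. The function name is made up for illustration.

```python
def expected_shared_homes(n_homes, homes_seen_per_user):
    """Expected number of listings two users have both seen.

    Assumes each user independently views a uniformly random set of
    `homes_seen_per_user` listings. Each listing is seen by one user with
    probability k/n, so by both with probability (k/n)**2; summing over
    all n listings gives k**2 / n.
    """
    k = homes_seen_per_user
    return k * k / n_homes


print(expected_shared_homes(500, 10))  # 0.2
print(expected_shared_homes(20, 10))   # 5.0
```

With 500 listings, two users share a fifth of a listing on average: usually nothing at all to compare. With the same browsing habits on a 20-home catalog, they would share five.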

An example of a company that is perfectly poised to exploit collaborative filtering is Netflix, which has over a hundred million users interacting with a catalog of just a few thousand titles. At the other end of the spectrum, put yourself in Pinterest’s shoes: while it also has hundreds of millions of users, the content available for those users to interact with is basically the whole web: hundreds of billions of images. In Pinterest’s case, the best approach is using content-based recommendations. This is indeed what the company has done: in January 2017, the company announced that its new deep learning system providing content-based recommendations of pins increased users’ engagement by 30% overnight.

Another important point is that collaborative filtering works when you have enough user-item interactions to build your model. If you are a young company with a small history of interactions, you may have to ditch this approach altogether. In this case, all you can do is start off with a content-based system so you can start providing recommendations without a large database of existing ratings. As the user base grows, and the ratings start coming in, you can introduce a collaborative component to provide more unique recommendations. As you’ll see in the examples, best-in-class organizations actually mix the two approaches, providing a broad range of recommendations that can appeal to most of their user base.
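The chapter doesn’t say how the two approaches are mixed. One simple possibility, sketched here purely as an assumption, is a weighted blend whose collaborative weight grows as ratings accumulate, so a young platform leans on content-based scores and gradually shifts toward collaborative ones. `hybrid_score` and its `ramp` constant are hypothetical.

```python
def hybrid_score(content_score, collaborative_score, n_ratings, ramp=50):
    """Blend a content-based and a collaborative score for one item.

    The collaborative weight grows with the number of ratings available:
    0 with no ratings, 0.5 at n_ratings == ramp, approaching 1 after that.
    `ramp` is a hypothetical tuning constant.
    """
    w = n_ratings / (n_ratings + ramp)
    return (1 - w) * content_score + w * collaborative_score


for n in (0, 50, 5000):
    print(n, round(hybrid_score(0.8, 0.2, n), 3))
```

With no ratings, the blend equals the content-based score; with thousands of ratings, it is almost entirely collaborative.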

6.4 Recommendations gone wrong

Many of the AI-based tasks that we have discussed so far have straightforward definitions of performance, which we usually express in terms of accuracy to compare models in different situations. For example, we have talked about the accuracy in predicting the sale price of a home, or in classifying dogs and cats in pictures. One could easily adopt the same approach for recommender systems, for example by setting aside a test set of ratings and measuring the accuracy in predicting them. However, this simplistic approach ignores the complex reality of human tastes and preferences, and the effect they have on business performance.
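Here is what that simplistic held-out evaluation looks like in code, as a minimal sketch: the table 6.1 ratings are reused as data, `train_test_split` and `accuracy` are hypothetical helper names, and the “always Like” baseline is a deliberately naive model. A real evaluation would fit the model on `train` only; this baseline ignores it.

```python
import random

# Table 6.1 as (user, home, rating) triples: +1 = Like, -1 = Not like.
RATINGS = [
    ("Alice", "B", 1), ("Alice", "C", -1), ("Alice", "D", -1), ("Alice", "E", 1),
    ("Bob", "A", 1), ("Bob", "B", -1), ("Bob", "D", 1),
    ("Jane", "A", -1), ("Jane", "B", 1), ("Jane", "C", -1), ("Jane", "D", -1), ("Jane", "E", 1),
]


def train_test_split(ratings, test_fraction=0.2, seed=42):
    """Shuffle the ratings and set aside a fraction of them as a test set."""
    shuffled = list(ratings)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def accuracy(predict, test_set):
    """Fraction of held-out ratings that `predict(user, home)` gets right."""
    hits = sum(1 for user, home, rating in test_set if predict(user, home) == rating)
    return hits / len(test_set)


def always_like(user, home):
    """Naive baseline: predict Like for everything."""
    return 1


train, test = train_test_split(RATINGS)
print(len(train), len(test), accuracy(always_like, test))
```

A single number like this says nothing about how jarring one bad suggestion is for a user, which is the gap the rest of this section explores.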

The first problem involves user experience. Automated recommendations work well most of the time, but sometimes we’re left wondering how a ridiculous suggestion came about. Screen real estate is limited, and so is the attention of your users; you don’t want to waste it by showing useless products.

As you’ve seen, most state-of-the-art recommender systems operate on a variety of signals, based both on item features and community ratings. Because human tastes are fairly unpredictable, just one off-base suggestion might have a strong effect on customer trust, making the recommender system drastically less effective at driving business value. Depending on the specifics of the community, recommender systems might mistake a trivial feature (like the release year of a movie) for an important one, leading to surprising (and disappointing) recommendations. Also, a strategy can work well for a product category but be completely useless for others: it may be nice to see T-shirts similar to ones I bought, but don’t fill my screen with air conditioners similar to my recent purchase (figure 6.7 shows a report from the trenches).

Figure 6.7 An Amazon user isn’t particularly impressed with the recommendations he received.
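One blunt guard against the air-conditioner problem is a post-processing rule that drops recommendations from categories people rarely buy twice in a row. This is a hypothetical sketch: the category list (beyond the air conditioners from the example) and the function name are assumptions, and real systems might instead learn repurchase patterns per category.

```python
# Hypothetical set of categories that people rarely buy again right after a purchase.
ONE_OFF_CATEGORIES = {"air conditioner", "dishwasher", "mattress"}


def filter_recommendations(ranked, recently_bought):
    """Drop items from one-off categories the user has just bought from.

    ranked: list of (item_id, category) pairs, already sorted by the recommender.
    recently_bought: set of categories the user purchased from recently.
    """
    blocked = recently_bought & ONE_OFF_CATEGORIES
    return [(item, category) for item, category in ranked if category not in blocked]


ranked = [("ac-2", "air conditioner"), ("tee-7", "t-shirt"), ("ac-3", "air conditioner")]
print(filter_recommendations(ranked, {"air conditioner", "t-shirt"}))
# [('tee-7', 't-shirt')]
```

T-shirts survive the filter even though the user just bought one, because repeat purchases are normal in that category.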

While user experience is an important consideration, it would be foolish to ignore the enormous impact that recommender systems have in shaping public opinion and discourse. Much, if not all, of the media that we consume (whether news, videos, or social media posts) has at some point been filtered by a recommender system. This means that this technology is uniquely positioned to influence our collective world view. We’ll keep the conversation strictly about the business implications of this technology for now, and explore the broader impacts on society in chapter 10.

6.4.1 The recommender system dream

Let’s end this section with some food for thought. Let’s assume that as of now, 20% of Amazon recommendations convert to a purchase. Suppose that one of Amazon’s data scientists finds a clever way to improve its recommender algorithm so much that now 100% of the recommendations turn into purchases. Why should Amazon wait for you to go online and shop when it already knows exactly what you want to buy? The easiest solution would be to directly ship stuff to your home, with a nice postcard saying “You’re welcome.”

Of course, a 100% conversion rate seems a bit too much. Yet, the concept is still valid: theoretically, a threshold exists at which product recommendations become so effective that it will take users less effort to return what they don’t like or don’t need than to shop proactively. In such a scenario, Amazon could simply ship us 10 products, and we would ship back what the model got wrong.

This may sound futuristic, but fashion companies are already experimenting with this model. Think about it: improving a single piece of technology can lead to total disruption of business models and customer purchase habits. This is the power of recommender systems and many other AI technologies.

6.5 Case study: Netflix saves $1 billion a year

When Netflix was founded in 1997, it started operating a subscription model for shipping physical DVDs to households, who then returned the media through the mail after watching the movie. Over the following decade, large-scale availability of fast internet access and personal devices allowed the company to transition to digital distribution of content. With a single monthly subscription fee, customers can watch as many movies and TV series as they like, using smartphones, tablets, laptops, or smart TVs.

Convenience and lower prices compared to physical media or cable TV made Netflix one of the fastest-growing companies in recent years. From 2005 to 2018, revenues grew from $682 million to almost $16 billion, and the share price climbed by almost 150 times, going from $2 per share at the beginning of 2005 to almost $300 at the end of 2018. In 2018, the company had 150 million subscribers in more than 190 countries.

The main driver of such explosive growth has certainly been its novel concept of internet-based TV with seemingly unlimited choices. However, new opportunities brought with them new, unprecedented problems. Humans are generally bad at making decisions when confronted with too many choices, and picking a TV show to watch is no exception. Consumer research reports that the typical Netflix user will tune out after only 60 seconds spent choosing a new title to watch.

To face this new challenge, the company heavily invested in recommender systems to help users find compelling content within the extensive Netflix catalog.

6.5.1 Netflix’s recommender system

While the concept of internet TV popularized by Netflix brought along new challenges, it also brought new opportunities to collect data. Thanks to the on-demand infrastructure, Netflix has access to vast amounts of data about what each user does: what they watch, where and when they log in, and so on.

The most important way in which Netflix’s recommendation system interacts with customers is the home page they visit right after logging in (see figure 6.8). While the interface changes depending on the type of device, the starting point is a collection of about 40 rows representing different movie categories, and up to 75 items per row. To decide how to populate this screen real estate, Netflix feeds all the data it collects into a group of algorithms, each specialized in a different recommendation task. Together, these algorithms make up the Netflix recommender system.

Figure 6.8 The composition of the Netflix home page

At the core of the system is an algorithm called the Personalized Video Ranker (PVR). The PVR, based on the large amount of data that Netflix collects on users’ viewing habits, is used to estimate the likelihood that a user will watch movies in all the different movie categories (for example, Thriller or Drama). The Netflix home page also has other special rows that have their own specific recommender algorithm. One example is the Top Picks row, which suggests the best content selected across all categories. This algorithm is called the Top N Video Ranker. While the PVR is used to find the best movies within a specific subset of the catalog, the Top N ranker looks through the entire catalog.
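Netflix hasn’t published its rankers in this form, so the following is only an illustration of the difference in scope between the two kinds of rows. It assumes a single hypothetical per-user score (the estimated probability of watching each title) and made-up titles; `category_row` and `top_picks_row` are invented names. The default of 75 items echoes the row length mentioned earlier.

```python
# Hypothetical per-user scores: title -> (category, estimated probability of watching).
SCORES = {
    "Title 1": ("Thriller", 0.62),
    "Title 2": ("Thriller", 0.35),
    "Title 3": ("Drama", 0.80),
    "Title 4": ("Drama", 0.41),
    "Title 5": ("Comedy", 0.55),
}


def category_row(scores, category, n=75):
    """PVR-style row: rank only the titles inside one category."""
    in_category = [(p, title) for title, (cat, p) in scores.items() if cat == category]
    return [title for _, title in sorted(in_category, reverse=True)[:n]]


def top_picks_row(scores, n=3):
    """Top N-style row: rank titles across the entire catalog."""
    everything = [(p, title) for title, (_, p) in scores.items()]
    return [title for _, title in sorted(everything, reverse=True)[:n]]


print(category_row(SCORES, "Thriller"))  # ['Title 1', 'Title 2']
print(top_picks_row(SCORES))             # ['Title 3', 'Title 1', 'Title 5']
```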

Another special row is for Trending Now content, where Netflix combines personalized recommendations with signals coming from temporal trends. Specifically, the company has found two types of trends that affect users’ behavior. The first is recurring trends, like Christmas or Valentine’s Day. The second is shorter-term events that generate sudden interest, like elections or natural disasters. Even here, Netflix doesn’t show the same row to every user, but mixes signals from the overall trend with personalized items.
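One simple way to mix a shared trend signal with personal taste is a weighted blend of the two scores. The sketch below assumes exactly that: the weight `alpha`, the scores, and the titles are all hypothetical, and this is an illustration rather than a description of the actual Trending Now algorithm.

```python
def trending_row(personal, trend, alpha=0.5, n=3):
    """Rank titles by a blend of a personal score and a global trend score.

    personal, trend: dicts mapping title -> score in [0, 1].
    alpha: weight given to the personal score (hypothetical parameter).
    """
    titles = personal.keys() & trend.keys()  # titles that have both signals
    blended = {t: alpha * personal[t] + (1 - alpha) * trend[t] for t in titles}
    return sorted(blended, key=blended.__getitem__, reverse=True)[:n]


# Hypothetical trend strength, shared by all users.
trend = {"Storm Documentary": 0.95, "Election Night": 0.7, "Holiday Romance": 0.1}
# Hypothetical personal scores for two different users.
user_a = {"Storm Documentary": 0.2, "Election Night": 0.9, "Holiday Romance": 0.6}
user_b = {"Storm Documentary": 0.9, "Election Night": 0.1, "Holiday Romance": 0.8}

print(trending_row(user_a, trend))  # ['Election Night', 'Storm Documentary', 'Holiday Romance']
print(trending_row(user_b, trend))  # ['Storm Documentary', 'Holiday Romance', 'Election Night']
```

Both users see the same trending titles, but in a different order, because the personal half of the blend differs.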

Another distinctive row is Continue Watching. While all other rows focus on content the user has never watched, this row is focused on content that the user has started but never finished. The Continue Watching Ranker selects the titles with the highest probability of being finished, using these application-specific features:

  • Time elapsed since the last view
  • Device
  • Point of abandonment (mid-program versus beginning or end)
  • Other titles watched since the last view
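The four features listed above can feed a scoring model. The following sketch assumes a logistic score with hand-picked, hypothetical weights and an arbitrary definition of “mid-program” (between 10% and 90% of the runtime); a real ranker would learn such weights from viewing data.

```python
import math
from dataclasses import dataclass


@dataclass
class PartialView:
    title: str
    days_since_last_view: float   # time elapsed since the last view
    device: str                   # e.g. "tv" or "mobile"
    fraction_watched: float       # point of abandonment, from 0.0 to 1.0
    titles_watched_since: int     # other titles watched since the last view


# Hypothetical, hand-picked weights; a production ranker would learn them from data.
WEIGHTS = {
    "bias": 1.0,
    "days": -0.15,         # the longer ago, the less likely a resume
    "mid_program": 1.2,    # stopping mid-program suggests an interruption
    "tv": 0.3,             # assumed small boost for the living-room device
    "others_since": -0.4,  # each title watched since then lowers the odds
}


def resume_probability(view: PartialView) -> float:
    """Logistic score of how likely the user is to finish this title."""
    mid = 1.0 if 0.1 < view.fraction_watched < 0.9 else 0.0
    z = (WEIGHTS["bias"]
         + WEIGHTS["days"] * view.days_since_last_view
         + WEIGHTS["mid_program"] * mid
         + WEIGHTS["tv"] * (view.device == "tv")
         + WEIGHTS["others_since"] * view.titles_watched_since)
    return 1.0 / (1.0 + math.exp(-z))


def continue_watching_row(views, n=10):
    """Order partially watched titles by estimated probability of finishing."""
    return sorted(views, key=resume_probability, reverse=True)[:n]


views = [
    PartialView("Old Documentary", 30, "mobile", 0.05, 5),
    PartialView("Current Series", 1, "tv", 0.5, 0),
]
print([v.title for v in continue_watching_row(views)])  # ['Current Series', 'Old Documentary']
```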

All the algorithms we have seen so far rely on collaborative filtering techniques to pick their suggestions--inferring the taste of a user by looking at the choices of other users who have similar viewing patterns (and therefore similar taste). The Because You Watched rows are an exception: they offer content similar to what the user has enjoyed before. This task is performed by the Video-Video Similarity algorithm, a content-based recommender system. This algorithm doesn’t consider the user’s taste: its output is a ranking of how similar the rest of the Netflix catalog is to one of the user’s picks. However, the choice of which title to feed into the Video-Video Similarity algorithm is personalized based on the user’s taste.
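The difference between the two families of algorithms fits in a few lines of plain Python. In this sketch, the watch history and the genre features are hypothetical, and cosine similarity between users (for collaborative filtering) and between title feature vectors (for content-based similarity) is just one common choice, not necessarily what Netflix uses.

```python
import math

# Hypothetical watch history: rows are users, columns are titles 0-3 (1 = watched).
WATCH = [
    [1, 1, 0, 0],  # user 0
    [1, 1, 1, 0],  # user 1
    [0, 0, 1, 1],  # user 2
]
# Hypothetical content features per title: [action, drama, comedy].
FEATURES = [
    [1, 0, 0],  # title 0
    [1, 1, 0],  # title 1
    [0, 1, 0],  # title 2
    [0, 0, 1],  # title 3
]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def collaborative_scores(watch, user):
    """Collaborative filtering: score unseen titles by what similar users watched."""
    scores = {}
    for title in range(len(watch[user])):
        if watch[user][title]:
            continue  # skip titles the user has already watched
        scores[title] = sum(
            cosine(watch[user], watch[other]) * watch[other][title]
            for other in range(len(watch)) if other != user
        )
    return scores


def similar_titles(features, title):
    """Content-based: other titles ordered by feature similarity to `title`."""
    others = [t for t in range(len(features)) if t != title]
    return sorted(others, key=lambda t: cosine(features[title], features[t]), reverse=True)


print(collaborative_scores(WATCH, 0))  # title 2 scores higher than title 3
print(similar_titles(FEATURES, 1))     # [0, 2, 3]
```

User 0 has watched the same titles as user 1, so title 2, which user 1 also watched, gets the higher collaborative score. The content-based function, instead, never looks at who watched what: it only compares the features of one title with those of the others.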

On top of this series of algorithms, in 2015 Netflix introduced a Page Generation algorithm to decide which rows to show a specific user, and in which order, in different situations. Table 6.2 summarizes the algorithms that make up the Netflix recommender system.

Table 6.2 Algorithms composing the Netflix recommender system

Algorithm | Use | Criterion
Personalized Video Ranker | Given a movie category, select the movies that a user is most likely to watch within that category. | Collaborative filtering
Top N Video Ranker | Among all the movies in the catalog, choose the best ones for a specific user. | Collaborative filtering
Trending Now | Based on various temporal trends, pick the movies that are in line with the trend and that the user is most likely to watch. | Collaborative filtering
Continue Watching | Given a collection of movies that a user has started but didn’t finish, pick the ones that they are most likely to resume. | Collaborative filtering
Video-Video Similarity | Given a movie, find the movies that are most similar. | Content-based (video similarity)
Page Generation | Select which rows to show for a specific user and in which order. | Collaborative filtering

6.5.2 Recommendations and user experience

The Netflix recommender system is the primary way customers interact with the service, and as such it has evolved along with the platform itself. During the years when Netflix shipped physical DVDs to customers, the watch/rate/watch-again feedback cycle that drove customer engagement was much slower than today. Customers received new content only once a week, and the weekly shipment had better include something to entertain them on Friday night.

With online streaming, the rules of the game changed: because users can start and stop watching any title at any time, recommendations can be more fluid. At the same time, extensive user research has found that the most important driver of retention in the streaming era is the amount of time customers spend watching content, and the goals of Netflix’s recommender system have evolved accordingly. Critics of this (gradual) change argue that it skews the system toward “good-enough” content whose sole purpose is to keep subscribers hooked for multiple hours a day, while riskier suggestions, which carry a higher chance of disappointing the viewer, are underweighted.

Any algorithm used by hundreds of millions of people worldwide can’t be exempt from criticism, and the Netflix recommender system is no exception. While egregiously wrong recommendations often make the rounds on social media, subtler mispredictions also reveal unsolved problems in the domain. Users frequently complain that the model can’t keep track of multiple independent “moods” and media consumption tendencies. A typical example is a Friday-night trash-TV binge affecting recommendations for the following two months.

6.5.3 The business value of recommendations

In 2015, Netflix Chief Product Officer Neil Hunt and Vice President of Product Innovation Carlos Gomez-Uribe wrote “The Netflix Recommender System: Algorithms, Business Value, and Innovation” ( https://dl.acm.org/citation.cfm?id=2843948 ). They reported that the Netflix recommender system was responsible for 80% of the hours streamed on the platform, with the remaining 20% coming from the search functionality. However, because users often search for titles that are not in the catalog, a share of that 20% becomes a recommendation problem.

When evaluating the effectiveness of algorithms, Netflix relies on specific metrics. Two of the most important ones are the effective catalog size (ECS) and the take-rate.

The ECS is a metric that describes how spread out viewing is across the items in the catalog. The metric has a mathematical formulation, but the main idea behind it is rather simple: if most viewing comes from a single video, the ECS is close to 1; if all videos generate the same amount of viewing, it is close to the number of videos in the catalog; and if the catalog has some movies that are rarely seen and some that are more popular, the ECS is somewhere in between. The higher this value, the more evenly viewing is spread across the whole catalog, which results in higher customer satisfaction and a better return on the money that Netflix spent either to acquire the rights to distribute the media or to produce it. Netflix’s goal is therefore to maximize this value. To test the efficacy of the recommender system, Netflix tried serving content to users following these two approaches:

  • Using a popularity metric--Starting with the most popular movie in the catalog and gradually adding other popular movies
  • Using the personalized system--Starting with the movie ranked first by the PVR score and adding other movies in the order dictated by the PVR algorithm

Using the personalized system resulted in an increase of the ECS of 400%.
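The text leaves out the formula for the ECS. One formulation that reproduces the behavior described above (1 when a single video takes all viewing, the catalog size when viewing is perfectly even) is ECS(N) = 2 × Σ i·p_i − 1, summing over i from 1 to N, where p_i is the share of total viewing hours of the i-th most watched of the N videos. We believe this matches the definition used by Gomez-Uribe and Hunt, but the sketch below treats it as an assumption.

```python
def effective_catalog_size(hours):
    """Effective catalog size, given viewing hours per video.

    Assumes ECS = 2 * sum(i * p_i) - 1, where p_i is the share of total
    viewing of the i-th most watched video (i starts at 1).
    """
    total = sum(hours)
    if total <= 0:
        raise ValueError("total viewing must be positive")
    # Shares of viewing, sorted from most to least watched video.
    shares = sorted((h / total for h in hours), reverse=True)
    return 2 * sum(rank * p for rank, p in enumerate(shares, start=1)) - 1


print(effective_catalog_size([100, 0, 0, 0]))  # all viewing on one video -> 1.0
print(effective_catalog_size([5, 5, 5, 5]))    # perfectly even viewing -> 4.0
print(effective_catalog_size([70, 20, 10]))    # somewhere in between
```

Because the shares are weighted by their rank, spreading viewing toward less popular titles pushes the value up, which is why a system that surfaces more of the catalog raises the ECS.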

The other key engagement metric used by Netflix is the take-rate, defined as the fraction of recommendations offered that result in a play. A take-rate of 1 means that every recommendation is played, while a value close to 0 means that recommendations are almost never picked. Netflix tried proposing a series of recommendations to users following two approaches:

  • Showing the first N most popular movies
  • Showing the N movies that, according to the recommender system, were the best pick for the user

Comparing the two approaches, Netflix found that the personalized system delivered substantial improvements over the “most popular” approach: the take-rate of the “best fit” movie was almost four times that of the “most popular” one, with performance decreasing for movies ranked as less fitting.
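Measuring take-rate only requires logging, for each recommendation shown, the position where it appeared and whether it was played. Here is a minimal sketch, assuming a hypothetical log of (rank, played) pairs; the numbers are made up and don’t reproduce Netflix’s results.

```python
from collections import defaultdict


def take_rate_by_rank(impressions):
    """Take-rate per recommendation slot.

    impressions: iterable of (rank, played) pairs, one per recommendation shown;
    played is True if the user started playing that title.
    """
    shown = defaultdict(int)
    plays = defaultdict(int)
    for rank, played in impressions:
        shown[rank] += 1
        plays[rank] += int(played)
    return {rank: plays[rank] / shown[rank] for rank in sorted(shown)}


# Hypothetical logs for the two policies (not Netflix data).
popular_log = [(1, True), (1, False), (1, False), (1, False),
               (2, False), (2, False), (2, True), (2, False)]
personal_log = [(1, True), (1, True), (1, True), (1, False),
                (2, True), (2, False), (2, True), (2, False)]

print(take_rate_by_rank(popular_log))   # {1: 0.25, 2: 0.25}
print(take_rate_by_rank(personal_log))  # {1: 0.75, 2: 0.5}
```

Computing the metric per rank, rather than as a single average, is what lets you see how performance changes as you move down the list.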

Improvements in recommendations, and in these metrics, are tightly linked to improved business performance. For a purely subscription-based service like Netflix, the three key metrics to watch are the acquisition rate of new members, the member cancellation rate (churn rate), and the rate at which former customers come back. Hunt and Gomez-Uribe report that Netflix’s efforts on personalization and recommendation have reduced churn rates by several percentage points. Overall, they estimate that the combined effort of all of their recommender algorithms had been saving Netflix more than $1 billion per year, at a time when overall revenues stood below $7 billion.
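To see how a few points of churn translate into money, you can simulate a member base month by month. The sketch below is deliberately simplified: it assumes a fixed monthly churn rate and price, ignores new and returning members, and uses hypothetical inputs. It is not how Hunt and Gomez-Uribe arrived at their estimate.

```python
def revenue_over_period(subscribers, monthly_churn, monthly_price, months=12):
    """Revenue from an existing member base that shrinks by a fixed churn rate.

    Ignores new sign-ups and returning members to isolate the effect of churn.
    """
    revenue = 0.0
    active = float(subscribers)
    for _ in range(months):
        revenue += active * monthly_price  # members billed this month
        active *= 1.0 - monthly_churn      # members who cancel before next month
    return revenue


def revenue_saved(subscribers, churn_before, churn_after, monthly_price, months=12):
    """Extra revenue retained when monthly churn drops from churn_before to churn_after."""
    return (revenue_over_period(subscribers, churn_after, monthly_price, months)
            - revenue_over_period(subscribers, churn_before, monthly_price, months))


# Hypothetical inputs chosen only for illustration.
print(round(revenue_saved(1_000_000, 0.05, 0.04, 10.0)))
```

Because churn compounds month after month, even a one-point reduction keeps a growing share of members billed for the rest of the year.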

6.5.4 Case questions

  • Is a larger product catalog unequivocally better for consumers? And for the business?
  • How did the recommendation system evolve as Netflix moved from physical media to online streaming and, finally, to original productions?
  • Besides recommendations, what are other important ways Netflix can use its data advantage when competing with incumbent distribution platforms (for example, TV, theaters, physical media)?

6.5.5 Case discussion

Netflix is probably one of the companies that has shaken the entertainment industry the most. Its success was one of the main reasons behind the failure of the DVD rental giant Blockbuster, and its move into original content poses a threat to content producers like TV networks and film studios.

Netflix started off as an internet-based DVD rental service, but the company really took off when it pioneered the concept of internet TV. The main alternatives to internet TV at the time were linear broadcast and cable systems. While those two follow a predetermined schedule, internet TV puts users in the driver’s seat, allowing them to pick whatever content they want, whenever they want.

The power of choice seems to be a sure selling point for any business. However, as Hunt and Gomez-Uribe point out, people turn out not to be particularly good at choosing, especially among a vast pool of options. What’s the point of having a large collection of movies if users can’t find anything they like? This is true not only for the entertainment industry. Think of any e-commerce business: what’s the value of a large brand’s catalog if users are not buying anything?

The main problem here is that while a company’s offering can potentially be limitless, people have only so much time to browse it. In the case of Netflix, the company reports that most users will give up on their movie search if they don’t find anything within 60 to 90 seconds. This means that, without some way of guiding users, adding items to a catalog may make it even harder for them to find something they like, which is counterproductive from both a user experience and a business standpoint. This is the value of recommender systems: making sure that users are able to find value in your offering through a data-driven approach to suggestions.

One of the key assets of the internet that Netflix has exploited is the opportunity to collect data and monitor user preferences. Think for a second about the difference between the DVD rental version of Netflix and the internet TV one. In the first case, after a DVD was shipped, the company had just one way of knowing whether users liked it: the rating they gave it (if they even bothered to leave one). With the streaming service, Netflix can record much richer information, such as the following:

  • When the user watched the movie (time, day)
  • On which device
  • Whether the user watched it in a single session or paused it several times

Moreover, Netflix can test different versions of recommendation algorithms and measure their performance according to specific metrics. The abundance of data is a key asset that allowed Netflix not only to drive user engagement, but also to measure its performance and tune its product accordingly.

Summary

  • Recommender systems offer personalized recommendations to customers, allowing them to navigate a larger catalog of products or services, and increasing engagement.
  • Content-based recommender systems use the past history of users (the pages they visited, the products they bought, and so forth) to recommend similar items from the catalog.
  • Collaborative filtering is another approach that finds users within the community that have similar tastes, and shares recommendations between them.
  • Recommendation systems are the cornerstone of modern e-commerce and media distribution platforms, as you saw in the Netflix case study.