Chapter 2

Predictive Analytics in the Wild

IN THIS CHAPTER

Identifying some common use cases

Implementing recommender systems

Improving targeted marketing

Optimizing customer experience by personalization

Predictive analytics sounds like a fancy name, but we use much the same process naturally in our daily decision-making. Sometimes it happens so fast that most of us don't even recognize when we’re doing it. We call that process “intuition” or “gut instinct”: In essence, it’s quickly analyzing a situation to predict an outcome — and then making a decision.

When a new problem calls for decision-making, natural gut instinct works most like predictive analytics when you’ve already had some experience solving a similar problem. Everyone relies on individual experience, so different people solve the same problem or handle the same situation with different degrees of success.

You’d expect the person with the most experience to make the best decisions, on average, over the long run. In fact, that is the most likely outcome for simple problems with relatively few influencing factors. For more complex problems, however, many external factors influence the final result.

A hypothetical example is getting to work on time on Friday morning: You wake up in the morning 15 minutes later than you normally do. You predict — using data gathered from experience — that traffic is lighter on Friday morning than during the rest of the week. You know some general factors that influence traffic congestion:

  • How many commuters are going to work at the same time
  • Whether popular events (such as baseball games) are scheduled in the area you’re driving through
  • Emerging events like car accidents and bad weather

Of course, you may have considered the unusual events (outliers) but disregarded them as part of your normal decision-making. Over the long run, you’ll make a better decision about local traffic conditions than a person who just moved to the area. The net effect of that better decision mounts up: Congratulations — you’ve gained an extra hour of sleep every month.

But such competitive advantages don’t last forever. As other commuters realize this pattern, they’ll begin to take advantage of it as well — and also sleep in for an extra 15 minutes. Your returns from analyzing the Friday traffic eventually start to diminish if you don't continually optimize your get-to-work-on-Fridays model.

A model built with predictive analytics can handle far more than the few variables (influencing factors) that a human can process. A predictive model built with decision trees can find patterns across as many independent variables as it can access, and may lead to the discovery that a certain variable is more influential than you initially thought. If you're a robot and can follow the rules of the decision tree, you can probably shave even more time off the commute.
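
If you’re curious what such a model looks like in practice, here’s a minimal sketch in Python using scikit-learn. The commute data, feature names, and values are all made up purely for illustration; they aren’t from any real study.

```python
# A minimal sketch: fitting a decision tree on made-up commute data to predict
# whether you'll arrive on time. All features and values are hypothetical.
from sklearn.tree import DecisionTreeClassifier, export_text

# Columns: day_of_week (0=Mon .. 4=Fri), minutes_late_leaving, big_event_nearby (0/1)
X = [
    [0, 0, 0], [1, 10, 0], [2, 0, 1], [3, 15, 0], [4, 15, 0],
    [4, 20, 1], [0, 15, 0], [2, 5, 0], [4, 10, 0], [1, 0, 1],
]
y = [1, 1, 0, 0, 1, 0, 0, 1, 1, 0]  # 1 = arrived on time, 0 = late

model = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# Inspect the learned rules; an influential variable shows up near the root.
print(export_text(model, feature_names=["day_of_week", "minutes_late_leaving", "big_event_nearby"]))

# Predict a new morning: Friday, left 15 minutes late, no big event nearby.
print(model.predict([[4, 15, 0]]))
```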

More complex problems lead, of course, to more complex analysis. Many factors contribute to the final decision, besides (and beyond) what the specific, immediate problem is asking for. A good example is predicting whether a stock will go up or down. At the core of the problem is a simple question: Will the stock go up or down? A simple answer is hard to get because the stock market is so fluid and dynamic. The influencers that affect a particular stock price are potentially unlimited in number.

Some influencers are logical; some are illogical. Some can't be predicted with any accuracy. Even so, Nassim Taleb operates a hedge fund that bets on black swans — events that are very unlikely to happen, but when they do happen, the rewards can be tremendous. In his book The Black Swan, he says that he only has to be right once in a decade. For most of us, that investment strategy probably wouldn’t work; the amount of capital required to start would have to be substantially more than most of us make, because it would keep diminishing while waiting for the major event to happen.

After the market closes, news reporters and analysts will try to explain the move with one reason or another. Was it a macro event (say, the whole stock market going up or down) or a smaller, company-specific event (say, the company released some bad news or someone tweeted negatively about its products)? Either way, be careful not to read too much into such factors; they can also be used to explain when the exact opposite result happened. Building an accurate model to predict a stock movement is still very challenging.

Predicting the direction of a stock consistently has a rigid outcome: Either you make money or you lose money. But the market isn't rigid: What holds true one day may not hold true the very next day. Fortunately, most predictive modeling tasks aren't quite as complicated as predicting a stock's move upward or downward on a given trading day. Predictive analytics is more commonly used to find insights into nearly everything from marketing to law enforcement:

  • People’s buying patterns
  • Pricing of goods and services
  • Large-scale future events such as weather patterns
  • Unusual and suspicious activities

These are just a few (highly publicized) examples of predictive analytics. The potential applications are endless.

Online Marketing and Retail

Companies that have successfully used predictive analytics to improve their sales and marketing include Target Corporation, Amazon, and Netflix. Recent reports by Gartner, IBM, Sloan, and Accenture all suggest that many executives use data and predictive analytics to drive sales.

Recommender systems

You’ve probably already encountered one of the major outgrowths of predictive analytics: recommender systems. These systems try to predict your interests (for example, what you want to buy or watch) and give you recommendations. They do this by matching your preferences against items and against other, like-minded people, using statistics and machine-learning algorithms.

If you're an online cruiser, you often see prompts like these on web pages:

  • People You May Know …
  • People Who Viewed This Item Also Viewed …
  • People Who Viewed This Item Bought …
  • Recommended Based on Your Browsing History …
  • Customers Who Bought This Item Also Bought …

These are examples of recommendation systems that were made mainstream by companies like Amazon, Netflix, and LinkedIn.

Obviously, these systems weren't created only for the user’s convenience — although that reason is definitely one part of the picture. No, recommender systems were created to maximize company profits. They attempt to personalize shopping on the Internet, with an algorithm serving as the salesperson. They were designed to sell, up-sell, cross-sell, keep you engaged, and keep you coming back. The goal is to turn each personalized shopper into a repeat customer. (The sidebar “The personal touch” explores one of the successful techniques.)

Personalized shopping on the Internet

A software recommender system is like an online salesperson who tries to replicate the personal process we experienced at the trade shows. What’s different about a recommender system is that it’s data-driven. It makes recommendations in volume, with some subtlety (even stealth), with a dash of unconventional wisdom, and without a feeling of bias. When a customer buys a product, or shows interest in one (say, by viewing it), the system automatically recommends a product or service that it considers highly relevant to that customer. The goal is to generate more sales — sales that wouldn’t happen if the recommendation(s) weren’t given.

Amazon is a very successful example of implementing a recommender system; their success story highlights its importance. When you browse for an item on the Amazon website, you always find some variation on the theme of related items — “Customers who viewed this also viewed” or “Customers who bought items in your recent history also bought.”

This highly effective technique is considered one of Amazon’s “killer” features — and a big reason for their huge success as the dominant online marketplace. Amazon brilliantly adapted a successful offline technique practiced by salespeople — and perfected it for the online world.

Amazon popularized recommender systems for e-commerce. Their successful example has made recommender systems so popular and important in e-commerce that other companies are following suit.

Implementing a Recommender System

There are three main approaches to creating a recommender system: collaborative filtering, content-based filtering, and a combination of both called the hybrid approach. The collaborative filtering approach uses the collective actions of the user community to predict an individual user’s future behavior. The content-based approach attempts to match a particular user’s preferences to an item without regard to other users’ opinions. There are challenges to both the collaborative and content-based filtering approaches, which the hybrid approach attempts to solve.

Collaborative filtering

Collaborative filtering focuses on user and item characteristics based on the actions of the community. It can group users with similar interests or tastes, using classification algorithms such as k-nearest neighbor — k-NN for short (see Chapter 6 for more on k-NN). It can compute the similarity between items or users, using similarity measures such as cosine similarity (discussed in the next section).

The general concept is to find groups of people who like the same things: If person A likes X, then person B will also like X. For example: If Tiffany likes watching Frozen, then her neighbor (person with similar taste) Victoria will also like watching Frozen.

Collaborative filtering algorithms generally require

  • A community of users to generate data
  • Creating a database of interests for items by users
  • Formulas that can compute the similarity between items or users
  • Algorithms that can match users with similar interests

Collaborative filtering uses two approaches: item-based and user-based.

Item-based collaborative filtering

One of Amazon’s recommender systems uses item-based collaborative filtering — drawing on the company’s huge inventory of products to suggest related items when a user views a single item on the website.

tip You know you’re looking at an item-based collaborative filtering system (or, often, a content-based system) if it shows you recommendations at your very first item view, even if you haven’t created a profile.

Looks like magic, but it’s not. Although your profile hasn’t been created yet (because you aren’t logged in or you don't have any previous browsing history on that site), the system takes what amounts to a guess: It bases its recommendation on the item itself and what other customers viewed or bought after (or before) they purchased that item. So you’ll see some onscreen message like

  • Customers who bought this item also bought …
  • Customers who bought items in your recent history also bought …
  • Which other items do customers buy after viewing this item?

In essence, the recommendation is based on how similar the currently viewed item is to other items, based on the actions of the community of users.

Table 2-1 shows a sample matrix of customers and the items they purchased. It will be used as an example of item-based collaborative filtering.

TABLE 2-1 Item-Based Collaborative Filtering

Customer   Item 1   Item 2   Item 3   Item 4   Item 5   Item 6
A            X        X        X
B            X        X
C                              X                 X
D                              X        X        X
E                     X        X
F            X        X                 X        X
G            X                 X
H            X
I                                                         X

Table 2-2 shows the item similarities calculated with the cosine similarity formula, (A · B) / (||A|| ||B||), where A and B are the purchase vectors of the two items being compared (one entry per customer). To read the table and find out how similar a pair of items is, just locate the cell where the two items intersect. The number will be between 0 and 1. A value of 1 means the items are perfectly similar; 0 means they aren't similar at all.

TABLE 2-2 Item Similarity

         Item 1   Item 2   Item 3   Item 4   Item 5   Item 6
Item 6   0        0        0        0        0
Item 5   0.26     0.29     0.52     0.82              0
Item 4   0.32     0.35     0.32              0.82     0
Item 3   0.40     0.45              0.32     0.52     0
Item 2   0.67              0.45     0.35     0.29     0
Item 1            0.67     0.40     0.32     0.26     0

The system can provide a list of recommendations that are above a certain similarity value or can recommend the top n number of items. In this scenario, we can say that any value greater than or equal to 0.40 is similar; the system will recommend those items.

For example, the similarity between item 1 and item 2 is 0.67. The similarity between item 2 and item 1 is the same. Thus the table is a mirror image across the diagonal from lower-left to upper-right. You can also see that item 6 isn't similar to any other items because it has a value of 0.
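
If you want to see where those numbers come from, here’s a minimal sketch in Python (using NumPy) that recomputes the similarities in Table 2-2 from the purchase matrix in Table 2-1 and then applies the 0.40 threshold. It’s an illustration of the technique, not production recommender code.

```python
import numpy as np

#                     Item: 1  2  3  4  5  6
purchases = np.array([
    [1, 1, 1, 0, 0, 0],   # Customer A
    [1, 1, 0, 0, 0, 0],   # Customer B
    [0, 0, 1, 0, 1, 0],   # Customer C
    [0, 0, 1, 1, 1, 0],   # Customer D
    [0, 1, 1, 0, 0, 0],   # Customer E
    [1, 1, 0, 1, 1, 0],   # Customer F
    [1, 0, 1, 0, 0, 0],   # Customer G
    [1, 0, 0, 0, 0, 0],   # Customer H
    [0, 0, 0, 0, 0, 1],   # Customer I
])

def cosine(a, b):
    # (A . B) / (||A|| ||B||); returns 0 when either vector is all zeros
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b) / denom if denom else 0.0

n_items = purchases.shape[1]
similarity = np.array([[cosine(purchases[:, i], purchases[:, j])
                        for j in range(n_items)] for i in range(n_items)])
print(np.round(similarity, 2))   # matches the values in Table 2-2

def recommend_for_item(item_number, threshold=0.40):
    # Items similar to the item being viewed, best match first
    scores = similarity[item_number - 1]
    return [j + 1 for j in np.argsort(-scores)
            if j != item_number - 1 and scores[j] >= threshold]

print(recommend_for_item(1))     # viewing Item 1 -> [2, 3]
```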

This implementation of an item-based recommendation system is simplified to illustrate how it works. For simplicity, we only use one criterion to determine item similarity: whether the user purchased the item. More complex systems could go into greater detail by

  • Using profiles created by users that represent their tastes
  • Factoring in how much a user likes (or highly rates) an item
  • Weighing how many items the user purchased that are similar to the potential recommended item(s)
  • Making assumptions about whether a user likes an item on the basis of whether the user has simply viewed the item, even though no purchase was made

Here are two common ways you could use this recommender system:

  • Offline via an e-mail marketing campaign or when the user is on the website while logged in.

    The system could send marketing ads or make these recommendations on the website:

    • Item 3 to Customer B

      Recommended because Customer B purchased Items 1 and 2, and both items are similar to Item 3.

    • Item 4, then Item 2, to Customer C

      Recommended because Customer C purchased Items 3 and 5. Item 5 is similar to Item 4 (similarity value: 0.82). Item 2 is similar to Item 3 (similarity value: 0.45).

    • Item 2 to Customer D

      Recommended because Customer D purchased Items 3, 4, and 5. Item 3 is similar to Item 2.

    • Item 1 to Customer E

      Recommended because Customer E purchased Items 2 and 3, both of which are similar to Item 1.

    • Item 3 to Customer F

      Recommended because Customer F purchased Items 1, 2, 4, and 5. Items 1, 2, and 5 are similar to Item 3.

    • Item 2 to Customer G

      Recommended because Customer G purchased Items 1 and 3. They are both similar to Item 2.

    • Item 2, then Item 3, to Customer H

      Recommended because Customer H purchased Item 1. Item 1 is similar to Items 2 and 3.

    • Undetermined item to Customer A

      Ideally, you should have a lot more items and users. And there should be some items that a customer has purchased that are similar to other items that he or she hasn't yet purchased.

    • Undetermined item to Customer I

      In this case, the data is insufficient to serve as the basis of a recommendation. This is an example of the cold-start problem (more about this problem later in this chapter).

  • Online via a page view while the user isn't logged in.

    remember Order matters — the recommendations must start from highest similarity, unless there are other overriding factors, such as inventory or profitability.

    The system would recommend similar items when the user is viewing one of its items:

    • If Item 1 is being viewed, the system recommends Items 2 and 3.
    • If Item 2 is being viewed, the system recommends Items 1 and 3.
    • If Item 3 is being viewed, the system recommends Items 5, 2, and 1.
    • If Item 4 is being viewed, the system recommends Item 5.
    • If Item 5 is being viewed, the system recommends Items 4 and 3.
    • If Item 6 is being viewed, there is insufficient data to make a recommendation. This is an example of the cold-start problem (as described later in this chapter).

remember Whether the user is logged in will affect the recommendation that the system makes. After all, you want to avoid recommending an item that the customer has already purchased. When the user is logged in, the system will use the profile that the user created (or it created for the user). Within that profile, the system will have a record of previous purchases; it removes already-purchased items from the recommendation list.

In the example, Customer H only purchased a single item (Item 1). However, the item she purchased was similar to other items that show up in data from other customer purchases.

A customer viewing Item 6 is an example of the cold-start problem: Item 6 hasn’t been purchased by enough people yet, perhaps because it’s new or it isn't very popular. Either way, the system doesn’t have enough to go on. Collaborative filtering takes a little training with data before it can be effective. In that case, the system can take the approach of making the most purchased (or profitable, or liked) item the default recommendation.

Initially, of course, these data tables will be sparse until enough data points come in. As more data points are included in the collaborative filtering algorithm, the recommender system becomes more accurate.

User-based collaborative filtering

With a user-based approach to collaborative filtering, the system can calculate similarity between pairs of users by using the cosine similarity formula, a technique much like the item-based approach. Usually such calculations take longer to do, and may need to be computed more often, than those used in the item-based approach. That’s because

  • You’d have a lot more users than items (ideally anyway).
  • You’d expect items to change less frequently than users.
  • With more users and less change in the items offered, you can use many more attributes than just purchase history when calculating user similarity.

A user-based system can also use machine-learning algorithms to group all users who have shown that they have the same tastes. The system builds neighborhoods of users who have similar profiles, purchase patterns, or rating patterns. When a person in a neighborhood buys and likes an item, the recommender system can recommend that item to everyone else in the neighborhood.

As with item-based collaborative filtering, the user-based approach requires sufficient data on each user to be effective. Before the system can make recommendations, it must create a user profile — so it also requires that the user create an account and be logged in (or store session information in the browser via cookies) while viewing a website. Initially the system can ask the user explicitly to create a profile, flesh out the profile by asking questions, and then optimize its suggestions after the user’s purchase data has accumulated.

Netflix is an example of quickly building a profile for each customer. Here’s the general procedure:

  1. Netflix invites its customers to set up queues of the movies they’d like to watch.
  2. The chosen movies are analyzed to learn about the customer’s tastes in movies.
  3. The predictive model recommends more movies for the customer to watch, based on the movies already in the queue.

tip Netflix has discovered that the more movies you have in your queue, the more likely you are to stay a customer.

Table 2-3 — a sample matrix of customers and their purchased items — is an example of user-based collaborative filtering. For simplicity, we use a rule that a user neighborhood is formed from users who have bought at least two items in common.

TABLE 2-3 User-Based Collaborative Filtering

Customer   Item 1   Item 2   Item 3   Item 4   Item 5   Item 6
A - N1       X        X        X
B - N1       X        X
C - N2                         X                 X
D - N2                         X        X        X
E - N1                X        X
F - N1       X        X                 X        X
G - N1       X                 X
H - N3       X
I - N3                                                    X

Three user neighborhoods are formed: N1, N2, and N3. Every user in N1 and N2 has purchased at least two items in common with someone else in the same neighborhood. N3 contains the users who haven't yet met that criterion; they won't receive recommendations until they purchase enough additional items to meet it.
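
Here’s one possible way to implement that grouping rule in Python — a sketch, not the book’s code. It links customers who share at least two purchased items and, when a customer (like Customer F) qualifies for more than one neighborhood, joins the neighborhood of the most cosine-similar customer, as the discussion of Customer F later in this section explains.

```python
import math

baskets = {
    "A": {1, 2, 3}, "B": {1, 2}, "C": {3, 5}, "D": {3, 4, 5}, "E": {2, 3},
    "F": {1, 2, 4, 5}, "G": {1, 3}, "H": {1}, "I": {6},
}

def cosine(a, b):
    # Cosine similarity between two sets of purchased item IDs
    return len(a & b) / (math.sqrt(len(a)) * math.sqrt(len(b)))

neighborhoods = []  # list of sets of customer names
for customer, basket in baskets.items():
    # Already-grouped customers who share at least two items with this one
    linked = [(cosine(basket, baskets[other]), other)
              for group in neighborhoods for other in group
              if len(basket & baskets[other]) >= 2]
    if linked:
        _, best = max(linked)   # join the neighborhood of the most similar user
        next(g for g in neighborhoods if best in g).add(customer)
    else:
        neighborhoods.append({customer})

print(neighborhoods)
# [{'A','B','E','F','G'}, {'C','D'}, {'H'}, {'I'}] -> N1, N2, and the two
# ungrouped customers H and I (the book's N3), modulo set-printing order
```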

Here’s an example of how you could use this recommender system:

Offline via an e-mail marketing campaign or when the user is on the website while logged in. The system could send marketing ads or make recommendations on the website as follows:

  • Item 3 to Customer B
  • Item 4 to Customer C
  • Item 1 to Customer E
  • Item 3 to Customer F
  • Item 2 to Customer G
  • Undetermined item to Customers A and D

    tip Ideally you should have a lot more items than six. And there should always be some items in a customer’s neighborhood that the customer hasn’t purchased yet.

  • Undetermined item to Customers H and I

    In this case, there is insufficient data to serve as the basis of a recommendation.

Initially, this system can recommend all items that other members of the group already have that each individual member doesn’t have. In this simple example, the recommendations are similar to those produced by the item-based collaborative filter approach. You should expect the recommendations to diverge between approaches as more users, items, and data points come in.

tip One very important difference is that, since each customer belongs to a group, any future purchase that a member makes will be recommended to the other members of the group (until the filter is retrained). So Customers A and D will start getting recommendations very quickly: They already belong to a neighborhood, and their neighbors will surely buy something soon.

For example: If Customer B buys Item 6, then the recommender system will recommend item 6 to everyone in N1 (Customer A, B, E, F and G).

Customer F can potentially belong to either neighborhood N1 or N2, depending on how the collaborative filtering algorithm is implemented. We have chosen to group Customer F with neighborhood N1 because that customer is most similar to Customer B (who belongs to N1), as measured by cosine similarity. Either way, Item 3 will be recommended under this scenario.

Customers H and I provide examples of the cold-start problem: The customer just hasn’t generated enough data to be grouped into a user neighborhood. In the absence of a user profile, a new customer with very little or no purchase history — or who only buys obscure items — will always pose the cold-start problem to the system, regardless of which collaborative filtering approach is in use.

Customer I illustrates an aspect of the cold-start problem that’s unique to the user-based approach. The item-based approach would start finding other items similar to the item that the customer bought; then, when other users start purchasing Item 6, the system can start making recommendations. No further purchases need be made by the user; the item-based approach can start recommending. In a user-based system, however, Customer I has to make additional purchases in order to belong to a neighborhood of users; the system can’t make any recommendations yet.

Okay, there’s an assumption at work in these simple examples — namely, that the customer not only purchased the item but liked it enough to make similar purchases. What if the customer didn't like the item? The system needs, at the very least, to produce better precision in its recommendations. You can add a criterion to the recommender system to group people who gave similar ratings to the items they purchased. When the system finds customers who like and dislike the same items, the assumption of high precision is valid. In other words, there is a high probability that the customers share the same tastes.

User-based versus item-based collaborative filtering

In general, item-based collaborative filtering for large-scale e-commerce systems is faster and more scalable. Finding similar users, however — especially users with lots of features — takes longer than finding similar items.

tip Building user neighborhoods may be too time-consuming for large datasets, and may not be appropriate for large e-commerce sites that depend on real-time recommendations.

The user-based system also suffers more acutely than the item-based system from two other challenges: the cold-start problem (mentioned earlier) and sparsity — essentially the fact that even the most prolific customers can’t be expected to purchase even a fraction of a percent of the whole product catalog. So building user neighborhoods based on limited purchase histories may not produce very accurate recommendations.

The user-based approach also comes with a key restriction: Each user has to be logged in (or have a profile in his or her browser history) for user-based filtering to work. Before the system can make a recommendation, it has to know something about the prospective customer.

Content-based filtering

Content-based recommender systems mostly match features (tagged keywords) among similar items and the user’s profile to make recommendations. When a user purchases an item that has tagged features, items with features that match those of the original item will be recommended. The more features match, the higher the probability the user will like the recommendation. This degree of probability is called precision.

Content-based filtering uses various techniques to match the attributes of a particular item with a user's profile. These techniques include machine-learning algorithms to determine a user's profile without having to ask. This technique is called implicit data gathering. A more direct approach is to use explicit data gathering: Use a questionnaire to ask the users what features they like in an item. An example of that would be asking what genre of movie or which actresses they like when they first sign up for a movie subscription.

Tagging to describe items

In general, the company doing the selling (or the manufacturer) tags its items with keywords. On the Amazon website, however, it’s fairly typical never to see the tags for any items purchased or viewed — and not even to be asked to tag an item. Customers can review the items they’ve purchased, but that isn't the same as tagging.

Tagging items can pose a scale challenge for a store like Amazon that has so many items. Additionally, some attributes can be subjective and may be tagged incorrectly, depending on who does the tagging. One solution to the scaling issue is to allow customers or the general public to tag the items. (Photos are a good example of user-based tagging.) To keep tags manageable and accurate, the website may provide an acceptable set of tags. Only when an appropriate number of users agree (that is, use the same tag to describe an item) will the agreed-upon tag be used to describe the item.

User-based tagging, however, turns up other problems for a content-based filtering system (and collaborative filtering):

  • Credibility: Not all customers tell the truth (especially online), and users who have only a small rating history can skew the data. In addition, some vendors may give (or encourage others to give) positive ratings to their own products while giving negative ratings to their competitors’ products.
  • Sparsity: Not all items will be rated or will have enough ratings to produce useful data.
  • Inconsistency: Not all users use the same keywords to tag an item, even though the meaning may be the same. Additionally, some attributes can be subjective. For example, one viewer of a movie may consider it short while another says it’s too long.

Attributes need clear definitions. An attribute with too few boundaries is hard to evaluate; an attribute with too many rules may ask users to do too much work, which discourages them from tagging items.

tip Attributes with vague or undefined boundaries can result from offering the user free-form input fields on e-commerce shopping forms. When you restrict the user to selecting tag values from a set range of possible inputs, the resulting data is easier to analyze; you won’t have to cleanse “dirty data” of irrelevant content. Of course, many existing systems were built without analytics in mind, so cleansing the data is a large part of data preparation. You can save some of that cleansing time by building predictive analytic solutions into your system. One way to do so is to carefully consider and define the allowable inputs when you’re (re)building your e-commerce site.

Tagging most items in a product catalog can help solve the cold-start problem that plagues collaborative filtering. For a while, however, the precision of the system’s recommendations will be low until it creates or obtains a user profile.

Table 2-4, a sample matrix of customers and their purchased items, shows an example of content-based filtering.

TABLE 2-4 Content-Based Filtering

Items     Feature 1   Feature 2   Feature 3   Feature 4   Feature 5
Item 1       X           X
Item 2                   X           X
Item 3       X                       X           X
Item 4                   X                       X           X
Item 5                               X           X           X

Here, when a user likes Feature 2 — and that’s recorded in her profile — the system will recommend all items that have Feature 2 in them: Item 1, Item 2, and Item 4.

This approach works even when the user has never purchased or reviewed an item. The system just looks in the product database for any item that has been tagged with Feature 2. If (for example) a user is looking for movies with Audrey Hepburn — and that preference shows up in the user’s profile — the system will recommend all the movies that feature Audrey Hepburn to this user.
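
Here’s a minimal sketch in Python of that matching step. The tag assignments follow the spirit of Table 2-4 (only the Feature 2 column is spelled out in the text, so the remaining tags are illustrative), and the score is a simple count of overlapping features.

```python
# Content-based matching: recommend catalog items whose tags overlap the
# features recorded in the user's profile.
item_tags = {
    "Item 1": {"Feature 1", "Feature 2"},
    "Item 2": {"Feature 2", "Feature 3"},
    "Item 3": {"Feature 1", "Feature 3", "Feature 4"},
    "Item 4": {"Feature 2", "Feature 4", "Feature 5"},
    "Item 5": {"Feature 3", "Feature 4", "Feature 5"},
}

def recommend(profile_features, already_owned=()):
    # Rank items by how many tagged features match the user's profile
    scored = [(len(tags & profile_features), item)
              for item, tags in item_tags.items()
              if item not in already_owned and tags & profile_features]
    return [item for score, item in sorted(scored, reverse=True)]

# A user whose profile records a preference for Feature 2 gets the three
# items tagged with Feature 2:
print(recommend({"Feature 2"}))   # ['Item 4', 'Item 2', 'Item 1']
```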

This example, however, quickly exposes a limitation of the content-based filtering technique: The user probably already knows about all the movies that Audrey Hepburn has been in, or can easily find out — so, from that user’s point of view, the system hasn’t recommended anything new or of value. In this case, the system should recommend something relevant that the user wouldn’t have thought of. In the absence of telepathy, what’s needed is greater precision.

Improving precision with constant feedback

One way to improve the precision of the system’s recommendations is to ask customers for feedback whenever possible. Collecting customer feedback can be done in many different ways, through multiple channels. Some companies ask the customer to rate an item or service after purchase. Other systems provide social-media-style links so customers can “like” or “dislike” a product. Constant interaction between customers and companies may make the customer feel more fully engaged.

For example, Netflix asks its customers to rate movies they’ve watched so that the company can use that data to train their systems (models) to make more precise movie recommendations. Amazon sends you an e-mail asking you to rate items you’ve purchased after you’ve had enough time to evaluate the product and the buying experience.

Such data points constantly improve the recommendation models — and they’re possible because Amazon and Netflix make a point of convincing customers that their feedback benefits all customers. Such continuous interaction with customers to collect feedback not only increases the precision of the system’s recommendations, it can also make customers happier, reduce churn (subscription cancellation), and generate repeat business. This is a type of cycle that results in more product sales that generate more revenue for the company.

Measuring the effectiveness of system recommendations

The success of a system’s recommendations depends on how well it meets two criteria: precision (think of it as a set of perfect matches — usually a small set) and recall (think of it as a set of possible matches — usually a larger set). Here’s a closer look:

  • Precision measures how accurate the system’s recommendation was. Precision is difficult to measure because it can be subjective and hard to quantify. For example, when a user first visits the Amazon site, can Amazon know for sure whether its recommendations are on target? Some recommendations may connect with the customer’s interests but the customer may still not buy. The highest confidence that a recommendation is precise comes from clear evidence: The customer buys the item. Alternatively, the system can explicitly ask the user to rate its recommendations.
  • Recall measures the set of possible good recommendations your system comes up with. Think of recall as an inventory of possible recommendations, not all of which are perfect recommendations. There is generally an inverse relationship between precision and recall: As recall goes up, precision goes down, and vice versa. You can’t expect to have a large inventory of items that a customer will buy; you may expect to have a large inventory of items that a customer might consider buying. But a large inventory is only half the battle; see the cautionary sidebar, “Precision versus recall” (and the sketch after this list).
  • The ideal system has both high precision and high recall. But realistically, the best outcome is to strike a delicate balance between the two. Emphasizing precision or recall really depends on the problem you’re trying to solve.
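
As promised in the list above, here’s a minimal sketch in Python of the arithmetic behind those two measures. The data is made up; “relevant” here simply means items the customer actually went on to buy.

```python
def precision_recall(recommended, relevant):
    # precision = hits / number recommended; recall = hits / number relevant
    hits = len(set(recommended) & set(relevant))
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

relevant = {"Item 2", "Item 3", "Item 5"}   # what the customer actually bought

# A short, conservative list: high precision, lower recall.
print(precision_recall(["Item 2", "Item 3"], relevant))
# -> precision 1.0, recall ~0.67

# A long, generous list: lower precision, full recall.
print(precision_recall(["Item 1", "Item 2", "Item 3", "Item 4", "Item 5"], relevant))
# -> precision 0.6, recall 1.0
```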

warning One major problem with content-based recommender systems is that it’s easy to give them too narrow a focus. They make recommendations based on past purchases, likes, and dislikes. They may fall short for customers who want to try new things and experiences. Those customers want to hear about something totally different from what they’ve already bought or seen. People can be unpredictable that way.

Hybrid recommender systems

The best implementation may be a hybrid approach to creating a recommender system. By combining content-based and collaborative filtering into a single approach, the system can try to overcome each one’s shortcomings.

A major problem with the content-based approach is its narrow focus: The recommendations may not be very interesting or unique, and many of them may already be known to the user, so the system isn’t providing anything new from the user’s perspective. However, the implementation is much simpler than that of collaborative filtering. The content-based approach requires only that a profile be created for the items (keyword tagging); the user’s profile can be created implicitly or explicitly. This system can start working right away.

A major problem with collaborative filtering is that it suffers from the cold-start problem. Many users who are just starting out won’t receive accurate recommendations — or any recommendations at all — until enough data is gathered from the community of users. The data collection will require time to complete — and that collection depends on how active the website is. It may also require users to create accounts in their systems (in order to create a profile) before they can start receiving recommendations.

A hybrid recommender system can try to solve both these problems. It can start by using the content-based approach to avoid the cold-start problem. After enough data is collected from the community of users, the system can then use the collaborative filtering approach to produce more interesting and personalized recommendations.

Target Marketing

Predictive analytics make your marketing campaigns more customer-oriented. The idea is to customize your advertisements to target a segment of your total customer base — not the whole. If you send only the ads that are relevant to a segment of customers, you increase the likelihood that those particular visitors will perform the action that you hope for — buying. When you can determine which segment of your customer base will respond best to your message, you save money on the cost of convincing a customer to make the purchase (acquisition costs) and improve overall efficiency.

For example, when you pay an online ad network — for example, Google AdWords — to display your ads, typically you pay for each click that sends traffic to your website through a sponsored ad that appears in response to a search. Getting the visitor ultimately to do what you hope she’ll do while she’s on your website — become a paying customer — should be part of your marketing strategy. This type of marketing cost structure is called pay per click. You pay the network (in this case, Google) for each click, whether or not the visitor converts into a sale.

Because you're paying for each click with no guarantee of converting each visit into a sale, you’ll want to create some sort of filter to ensure that those likeliest to become customers receive your advertisement. No point displaying your ad to just anyone — a shotgun strategy is far from optimal, and your acquisition costs would be through the roof. Your ad’s target audience should be those visitors who have the highest chance of conversion.

This is where predictive analytics can come to your aid for target marketing. By creating an effective predictive model that ranks the customers in your database according to who is most likely to buy, subscribe, or meet some other organizational goal, you have the potential to increase the return on your marketing investment. Specifically, predictive analytics for marketing can

  • Increase profitability
  • Increase your conversion ratio
  • Increase customer satisfaction by reducing unwanted contact
  • Increase operational efficiencies
  • Learn what works (or doesn’t) in each marketing campaign

Targeting using predictive modeling

Traditional marketing targets a group of customers without applying modern techniques such as predictive modeling with data-mining and machine-learning algorithms. Predictive modeling in the area of direct marketing is called response modeling using predictive analytics (or simply response modeling from here on). Sometimes analysts create filters to apply to the dataset, thereby creating a select group to target. But that select group may not be optimally configured. Response modeling, on the other hand, seeks to discover patterns in the data that are present but not immediately apparent (in part because the number of variables being considered is nearly always greater than the number of variables that would be present in a marketer’s segment or “filter”); the result is an optimized group to target.

Table 2-5 uses a small sample to compare the profit generated by direct mailings — traditional marketing versus response modeling.

TABLE 2-5 Comparing Direct Mailing Results (Small Sample)

                                            Traditional Marketing   Response Modeling
Number of customers targeted                1,000                   100
Cost per customer targeted (assume $2)      $2                      $2
Number of responses                         20                      10
Response rate                               2 percent               10 percent
Total revenue (assume $100 per response)    $2,000                  $1,000
Total cost of campaign                      $2,000                  $200
Total profit                                $0                      $800

In Table 2-5, response modeling narrows the target to an optimized subset: 10 percent of the traditional number of customers (100 instead of 1,000). The response rate should be higher with response modeling — 10 percent instead of the 2 percent that is typical for traditional marketing. The net result is a profit of $800 under response modeling; traditional marketing breaks even. Also, as per-customer targeting costs increase, response modeling's value gets even better — without even taking into account the implicit benefits of not targeting unqualified customers.
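
If you’d like to check the arithmetic (or plug in your own numbers), here’s a minimal sketch in Python; the $2 contact cost and $100 revenue per response are the same assumptions used in the table.

```python
def campaign_profit(n_targeted, response_rate, cost_per_contact=2, revenue_per_response=100):
    # profit = (responses * revenue per response) - (contacts * cost per contact)
    responses = n_targeted * response_rate
    revenue = responses * revenue_per_response
    cost = n_targeted * cost_per_contact
    return revenue - cost

print(campaign_profit(1000, 0.02))   # traditional marketing in Table 2-5 -> 0.0
print(campaign_profit(100, 0.10))    # response modeling in Table 2-5     -> 800.0
```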

tip When you make constant contact with a customer without providing any benefit, you run the risk of being ignored in the future.

Table 2-6 is an example that shows the profit comparison between direct mailings using traditional marketing and response modeling with a larger sample size.

TABLE 2-6 Comparing Direct Mailing Results (Larger Sample)

                                            Traditional Marketing   Response Modeling
Number of customers targeted                10,000                  1,000
Cost per customer targeted                  $2                      $2
Number of responses                         200                     100
Response rate                               2 percent               10 percent
Total revenue (assume $100 per response)    $20,000                 $10,000
Total cost of campaign                      $20,000                 $2,000
Total profit                                $0                      $8,000

In Table 2-6, response modeling has (again) targeted only 10 percent of the 10,000 prospective customers traditionally targeted. In an optimized subset of 1,000, the response rate should be higher. We assumed a response rate of 2 percent for a traditional direct-mailing marketing campaign; with response modeling, the response rate is 10 percent because the customers are likelier to buy in the first place.

Response modeling creates a profit of $8,000 under this scenario; traditional marketing breaks even. As in the preceding scenario, any revenue earned using traditional marketing is consumed by marketing costs. Thus, as the accuracy of customer targeting increases, the value of response modeling also increases.

Response modeling also applies to email marketing campaigns, even though the production costs are much lower than for other channels. Fatigue from constant email contact diminishes the value of future marketing campaigns across all channels.

Uplift modeling

So how do you know that the customer you targeted wouldn’t have purchased anyway? To clarify this question, you can restate it in a couple different ways:

  • How do you know the customer wouldn’t have purchased even when she didn’t get the marketing contact from you?
  • How do you know that what you sent to the customer influenced her to make the purchase?

Some modelers claim that the problems with response modeling are as follows:

  • You’re taking a subset of your customers whom you’ve predicted will have some interest in the product or service already.
  • You’re wasting marketing dollars on customers who don’t need the extra influence to convert.
  • You may be decreasing your net margins because the discounts you’re using to entice the customer to buy may be unnecessary.
  • You may be reducing your customer satisfaction because some customers don’t want to be (constantly) contacted.
  • You’re incorrectly taking credit for the response in your evaluation of the model.

Uplift modeling, also called true lift modeling and net modeling among other terms, aims to answer those criticisms by predicting which customers will only convert when contacted.

Uplift modeling works by generating predictive scores that rank individuals by how likely they are to be influenced. Abstractly, there are four possible groupings of customers:

  • Persuadables: Customers who can be persuaded to purchase — but will only buy when contacted.
  • Sure Things: Customers who will buy, regardless of contact.
  • Lost Causes: Customers who will not buy, regardless of contact.
  • Do Not Disturbs: Customers whom you shouldn't contact. Contacting them may cause a negative response like provoking them to cancel a subscription, return a product, or ask for a price adjustment.

Uplift modeling is designed to target the Persuadables. The Persuadables get a high predictive score, while the Do Not Disturbs get a low predictive score.

In the middle of the range are the Sure Things and Lost Causes.

Uplift models have proven to be effective, but they are more difficult to create than a response model. Here’s why:

  • It generally is more complex to find a group of Persuadables than Sure Things. Sure Things are usually easier to identify because they are customers who have already purchased before or have shown interest in the product. Persuadables may have purchased a similar product or exhibit features similar to those of customers who have purchased. Given the potentially smaller target size of the Persuadables, plus the complexity, operating effort, and cost of building the uplift model, companies may not be able to justify its use over response modeling.

    In the case of elections, many undecided voters fall into the group of Persuadables: Given a particular treatment, they can be switched to your candidate and motivated to vote.

  • It’s more difficult to measure the success of the model because it’s attempting to measure the influence the treatment had on a customer’s behavior, not the concrete fact of whether the customer purchased after being contacted. A customer may have been influenced by other factors between the time of contact and the time of action.

    To truly measure whether a customer can be or has been influenced accurately, you would (in effect) have to clone her and split the identical clones into a separate group. The first (treated/cloned group) would receive the advertisement; the second (control/original group) would not. Setting aside such sci-fi scenarios, you have to make some concessions to reality and employ an alternative method to get a useful measurement of the model’s success.

Measuring uplift modeling requires having two test sets: a randomized control group and a treatment group. The treatment group gets the specially designed treatment (or contact), while the control group gets the default experience (no treatment or no contact). The positive difference in response rate between the treatment group and the control group is the uplift.
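
Here’s a minimal sketch in Python of that calculation; the group sizes and response counts are made up for illustration.

```python
def uplift(treated_responses, treated_size, control_responses, control_size):
    # Uplift = response rate of the treated group minus that of the control group
    treated_rate = treated_responses / treated_size
    control_rate = control_responses / control_size
    return treated_rate - control_rate

# 5,000 customers contacted, 400 bought; 5,000 held out as control, 250 bought.
print(uplift(400, 5000, 250, 5000))   # ~0.03, a 3-percentage-point uplift
```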

Even with these difficulties, some modelers argue that uplift modeling provides true marketing impact. They consider it more efficient than response modeling because it doesn’t include the Sure Things in the targeting (which artificially inflates response rates). For that reason, uplift modeling is the choice for target marketing using predictive analytics.

Uplift modeling is still a relatively new technique in target marketing. More companies are starting to use it and have found success using it in their customer retention, marketing campaigns, and even presidential campaigns. For companies with large customer lists, uplift modeling is worth exploring.

Some pundits are crediting uplift modeling for President Obama’s 2012 presidential campaign win. The campaign’s data analyst used uplift modeling to heavily target voters who were most likely to be influenced by contact. They used personalized messages via several channels of contact: social media, television, direct mail, and telephone. They concentrated their efforts to influence the group of Persuadables. They invested heavily in this strategy; apparently it paid off.

Personalization

You may have noticed that websites remember what you did or which pages you looked at last week or last month. Such websites are tracking your behavior, from clicks on certain parts of a page to the order of the pages you viewed in a session, to offer you the most relevant advertisements, products, or news articles.

Online customer experience

These websites are attempting to personalize your online experience to influence you and make it easier for you to take the action that the website operator wants. The desired outcomes or goals these companies usually look for include

  • Filling out a registration or an appointment form
  • Clicking on a product link
  • Reading another news article
  • Watching a video
  • Buying a product

After the company has collected enough data on customers who successfully meet the goal, it can try to learn the patterns those customers followed before taking the action, and then apply those patterns to similar customers. Finding similar customers is by no means a simple task. Sure, an analyst can easily pick one or a few customer attributes, manually segment the data by them, and see whether any patterns emerge. But this process becomes exponentially more difficult as the number of transactions grows and the set of customer attributes gets wider.

Using machine-learning algorithms, the system can find micro-segments of similar customers, so companies can target each group more precisely with personalized content and offers.
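
Here’s a minimal sketch in Python of that idea, using k-means clustering from scikit-learn on a handful of made-up customer attributes; a real system would use many more attributes and customers.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Columns: age, average order value, visits per month (all made up)
customers = np.array([
    [23,  40, 12], [25,  35, 15], [31,  60,  4], [45, 220,  2],
    [48, 180,  3], [52, 200,  2], [29,  55,  5], [24,  30, 20],
])

X = StandardScaler().fit_transform(customers)     # put attributes on one scale
segments = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
print(segments)   # a segment label per customer; each label gets its own content
```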

Retargeting

Tracking of your web-surfing behavior isn't limited to the website you're currently on; your behavior can be tracked across multiple websites. This is possible through browser cookies and advertising networks that share data with their affiliates. When you view a product page on one website, then move on to a different website and see an advertisement for that product in a banner or side rail, you're seeing retargeting at work.

By itself, retargeting isn’t predictive analytics. But it tells you that the data is out there and can be shared across websites. The ad networks share the data with their affiliates, and much of this data is available through data management platforms. By combining this third-party data with its own data, a company can create more advanced predictive models.

Implementation

Personalized websites can be created in several ways. It really depends on which types of data are available and whether the customer is logged in. Having profile data on the customer adds many more attributes for creating a predictive model.

These are some data types and sources that can be used to create personalization models:

  • Customer's profile data, when she is logged in.

    Profile data can include such attributes as age and gender.

  • Content on the page.

    Using text-mining techniques like TF-IDF, we can find important keywords (see the sketch after this list).

  • The referring webpage.

    The referring webpage may have keywords in its website address (URL).

  • The websites and pages you visited before.

    When this data is available, it can show interest in a product or subject.

  • The geolocation of the web browser.

    The physical location of the browser accessing the website, using its Internet Protocol address.

  • Temporal data, such as time of day.

    Time of day and day of week are common temporal segmentation attributes.
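
As promised in the list above, here’s a minimal sketch in Python of the TF-IDF idea, using scikit-learn on a few made-up snippets of page text; the top-scoring terms for each page could then feed a personalization model.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

pages = [
    "cheap flights to las vegas and las vegas hotel deals",
    "family resorts in lake tahoe with ski packages",
    "last minute caribbean cruise deals and beach resorts",
]

vectorizer = TfidfVectorizer(stop_words="english")
tfidf = vectorizer.fit_transform(pages)

# Print the three highest-scoring keywords for each page
terms = vectorizer.get_feature_names_out()
for row in tfidf.toarray():
    top = sorted(zip(row, terms), reverse=True)[:3]
    print([term for score, term in top if score > 0])
```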

Optimizing using personalization

Optimizing an e-commerce site using personalization is a great example of improving customer experience and satisfaction. By providing individualized content, customers see relevant offers, encounter fewer distractions, and build trust in the system, which will ultimately drive sales. Here’s an illustration of how personalization increases sales.

Suppose we are operating a travel booking website. We want to optimize the experience for our two biggest markets: California and Florida travelers. Our analysts know that Californians like to travel to Nevada and Floridians like to travel to Georgia. So we use a simple rule to personalize the home page by geolocation.

We show a hero image of Las Vegas to IP addresses belonging to California; for Floridians, we show a hero image of downtown Atlanta. (A hero image is a big banner placed at the top or center of the web page.) We're using the hero image to personalize, inspire, and influence the site visitor to make a booking. Some visitors will make the booking regardless of the personalized image, because that was their original intention. Others may be inspired and influenced into further researching and planning a trip to Las Vegas or Atlanta, thus increasing our booking rate.
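
Here’s a minimal sketch in Python of that simple rule. The IP-to-state lookup is a placeholder and the addresses are made up; a real site would call a geolocation service or database.

```python
HERO_IMAGES = {
    "CA": "las_vegas_strip.jpg",     # Californians see Las Vegas
    "FL": "downtown_atlanta.jpg",    # Floridians see Atlanta
}
DEFAULT_HERO = "generic_beach.jpg"

def state_from_ip(ip_address):
    # Placeholder lookup; swap in a real IP-geolocation provider here
    return {"13.52.0.10": "CA", "54.160.0.22": "FL"}.get(ip_address)

def hero_image_for(ip_address):
    return HERO_IMAGES.get(state_from_ip(ip_address), DEFAULT_HERO)

print(hero_image_for("13.52.0.10"))   # Californian visitor -> Las Vegas hero image
print(hero_image_for("8.8.8.8"))      # unknown location    -> default hero image
```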

Using predictive modeling, we can use every available attribute to create sophisticated models that target specific segments of the California and Florida markets. For example, the model could have discovered that northern Californians with children like to travel to Lake Tahoe, while the ones without children like to fly to Waikiki Beach in Hawaii. Meanwhile, southern Californians like to go to Las Vegas. Northern Floridians may like Atlanta, while southerners may prefer the Caribbean Islands.

Predictive modeling allows you to be as granular as you like in segmenting data to make personalized offers. It can detect patterns in micro-segments that would normally be very difficult to find. A rule-based system may start with a few simple rules but eventually turn into a complex and convoluted set of rules; the constant manual updating and deploying may make a rule-based system unmanageable. Using machine learning to algorithmically produce content presentation for personalization is a scalable and cost-effective solution. Personalization has great potential to increase ROI.

Similarities of Personalization and Recommendations

Personalization is similar to recommendations, and the two are often used together; some websites even use the term personalized recommendations. Recommendation implementations are easy to identify because they are often called out specifically with “Recommended for you” or “Customers that bought this also bought that.” When you read an article on a news site, the recommendation engine recommends other relevant articles for you to read. When you buy a product, the recommendation engine recommends other products that may interest you.

One type of personalization refers to how you’d like the website’s content to be presented to you:

  • The website’s arrangement:
    • Main navigation on the top or the left panel?
    • Content groupings by vertical columns or horizontal rows?
    • Text lists or thumbnail images?
  • The colors of CTA (call to action) buttons and links:
    • Colors matter, and they often mean different things and exude different feelings to different people. Some research has shown orange to be the color that converts the best, because orange contrasts well with typical websites.
    • Most people are accustomed to Google search links being blue for unvisited and purple for visited. But the default colors in browsers can differ, and people with color blindness may prefer different colors for links.
  • The flow of the checkout process.

    High-frequency customers may prefer one-click ordering from the product page, while standard customers may prefer a one-page checkout process after clicking the product to review their order.

Another type of personalization refers to which type of content on the website should be presented to the customer according to their profile. This can be in the form of personalized recommendations.

  • Relevant ads and offerings.

    For example, it may not make sense for an apparel company to show ads for men’s clothes when the customer is female and has no history of purchasing men’s clothes. The company should be showing ads for women’s clothes in price-point categories similar to her purchase history.

  • Relevant items.

    Based on your profile, purchase history, reading history, and similarity to other like-minded customers, the system can show relevant products or articles on your personalized home page.

Content and Text Analytics

There’s no shortage of information out there these days — but your success demands that you find and gather only the useful stuff. Valuable content is scattered across a massive number of files throughout your company. Harvesting that content and making sense of it can provide valuable insight — but doing that is challenging. The approach that meets the challenge is content analytics — analyzing content found in various types of documents and from a variety of sources. For example, the correct analysis of content from Word documents, system files, presentations, e-mails, and websites can illuminate a question from various angles.

Most content from such data sources is unstructured, at least as far as your business purposes are concerned. Content analytics can help you organize it into a structure that makes it easier to access, query, analyze, or feed directly into a traditional predictive analytic model.

Some common uses for content analytics and text analytics include

  • Summarizing documents: Reducing a document to its most important features or concepts can give the reader a quick overview, saving time.
  • Analyzing sentiment: Determining the mood or opinion of a person as evident in the content — say, regarding a product after it’s launched or the campaigns or policies of a political figure — can help clarify what the response should be.
  • Scoring essays: Automatic scoring of essays for exams can help universities filter applications.
  • Categorizing news: Categorizing news articles according to content can enable a recommender system to link recommended news articles to users.
  • Retrieving information: Finding and gathering content of interest from various data sources.