Chapter 14. Business Logic

By now, you may be thinking “Yes, our algorithmic ranking and recommendation has arrived! Personalization for every user with latent understanding is how we run our business.” Unfortunately, the business is rarely this simple.

Let’s take a really straightforward example: a recipe recommendation system. Consider a user who simply hates grapefruit (one of the authors of this book really does). There is a set of other ingredients that pair well with grapefruit and that they may love: asparagus, avocado, banana, butter, cashew nut, champagne, chicken, coconut, crab, fish, ginger, hazelnut, honey, lemon, lime, melon, mint, olive oil, onion, orange, pecan, pineapple, raspberry, rum, salmon, seaweed, shrimp, star anise, strawberry, tarragon, tomato, vanilla, wine, and yoghurt. These ingredients are the most popular to pair with grapefruit, and the user loves almost all of them.

What’s the right way for the recommender to handle this case? It may seem like this is something that collaborative filtering, latent features, or hybrid recommendations would catch. However, because the user likes all of these shared flavors, an item-based CF model would not handle this well. Similarly, if the user truly hates grapefruit, latent features may not be sufficient to truly avoid it.

In this case, the simple approach is a great one: hard avoids. In this chapter, we’ll talk about some of the intricacies of business logic intersecting the output of your recommendation system.

Instead of attempting to learn exceptions as part of the latent features that the model utilizes when making recommendations, it’s more consistent and simpler to integrate these business rules as an external step via deterministic logic. As an example: remove all grapefruit cocktails that are retrieved instead of attempting to learn to rank them lower.

Hard Ranking

There are a large number of examples of these phenomena that you can come up with once you start thinking of situations similar to the grapefruit example above. Hard ranking usually refers to one of two kinds of special ranking rules:

  1. Explicitly removing some items from the list before ranking.

  2. Using a categorical feature to rank the results by category (note that this can even be done for multiple categories to achieve a hierarchical hard ranking).
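The second flavor can be sketched in pandas; the item IDs, category names, and priority ordering below are hypothetical stand-ins for whatever your business defines:

```python
import pandas as pd

# Hypothetical candidate items with model scores and a business category.
recs = pd.DataFrame({
    "item_id": [1, 2, 3, 4, 5],
    "category": ["sale", "new", "sale", "standard", "new"],
    "score": [0.2, 0.9, 0.7, 0.95, 0.4],
})

# Hand-assigned category priority: lower number ranks first.
priority = {"sale": 0, "new": 1, "standard": 2}
recs["priority"] = recs["category"].map(priority)

# Hard rank by category first, then by model score within each category.
ranked = recs.sort_values(["priority", "score"], ascending=[True, False])
```

Here the model’s scores only break ties within a category; the categorical ordering always dominates, which is exactly what makes it a hard ranking.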

Have you ever observed any of the following?

  1. A user bought a sofa, and the system continues to recommend sofas to them…​ but they won’t need another sofa for the next 5 years.

  2. A user buys a birthday gift for a friend interested in gardening, and then the e-commerce site keeps recommending gardening tools despite the user having no interest in gardening themselves.

  3. A parent wants to buy a toy for his child. When he goes to the website where he usually buys toys, he’s recommended several things for a child a few years younger – he hasn’t purchased from the site since his child was at that age.

  4. A runner experiences serious knee pain and determines they can no longer go on long runs. They switch to cycling, which is lower impact; however, their local meetup recommendations are still all running-oriented.

All of these cases can be dealt with relatively easily via some deterministic logic, and they are situations where we would prefer not to try to learn the rules via machine learning. We should assume that for many preferences like these, we will get low signal: negative implicit feedback is often low in relevance, and many of the situations listed are things you want the system to learn once and for all. Additionally, in some of the previous examples, failing to respect these preferences can be upsetting or harmful to the relationship with the user.

The name for these preferences is avoids – or sometimes constraints, overrides, and hard rules. You should think of them as explicit expectations of the system: “don’t show me recipes with grapefruit”, “no more sofas”, “I don’t like gardening”, “my child is >10 now”, and “don’t show me trail runs”.

Learned avoids

Not all business rules are obvious avoids derived from explicit user feedback; some derive from explicit feedback not directly related to specific items. There is a wide variety of avoids that are important to include when serving recommendations.

For the sake of simplicity, let’s assume you’re building a fashion recommender system. Some examples of more subtle avoids include:

  1. Already owned items; things that you really only need to own once. This could be clothing users have bought through your platform, or told you they already own. Creating a virtual closet might be a way to ask users to tell you what they have, to assist in these avoids.

  2. Disliked features; a feature of items that the user can indicate disinterest in. During an onboarding questionnaire you may ask users if they like polka dots, or if they have a favorite color palette. These are explicitly indicated pieces of feedback that can be used for avoids.

  3. Ignored categories; some category or group of items that don’t resonate with the user. This can be implicit, but learned outside the primary recommender model. Maybe the user has never clicked on the dresses category on your e-comm website because they don’t enjoy wearing them.

  4. Low quality items; over time, you’ll learn that some items are simply low quality for most users. You can detect this via a high number of returns, or low ratings from buyers. These items ultimately should be removed from inventory, but in the meantime, including them as avoids for all but the strongest signals of match is important.

These additional avoids are most easily implemented during the serving stage, and can even include simple models. Training linear models to capture some of these rules, and then applying them during serving, can be a useful and reliable mechanism for improving ranking. Note that small models perform very fast inference, so there’s often little downside to including them in the pipeline. For larger-scale behavior trends, or higher-order factors, we expect our core recommendation models to learn these things.
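As a sketch of the "small model at serving time" idea, here is a hypothetical low-quality-item avoid built with scikit-learn’s logistic regression; the features (return rate, average rating), the training data, and the 0.5 threshold are all invented for illustration:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical training data: per-item features (return_rate, avg_rating)
# and a label for whether the item was flagged low quality.
X_train = np.array([[0.40, 2.1], [0.05, 4.6], [0.35, 2.8], [0.02, 4.9]])
y_train = np.array([1, 0, 1, 0])  # 1 = low quality

quality_model = LogisticRegression().fit(X_train, y_train)

def apply_quality_avoid(candidates, features, threshold=0.5):
    """Drop candidates whose predicted low-quality probability is too high."""
    p_low_quality = quality_model.predict_proba(features)[:, 1]
    return [c for c, p in zip(candidates, p_low_quality) if p < threshold]
```

Because inference here is a single dot product per item, running this filter inside the serving path adds negligible latency.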

Hand tuned weights

On the other side of the spectrum of avoids is hand-tuned ranking. This technique was popular in the earlier days of search ranking: humans would use analytics and observation to determine what they thought were the most important features in a ranking, and then craft a multi-objective ranker. For example, flower stores may rank higher in late April, when many users search for gifts. Since there can be many variable elements to track, these kinds of approaches don’t scale well and have been largely deemphasized in modern recommendation ranking.

However, hand-tuned ranking can be incredibly useful as an avoid (while technically it’s not an avoid, we sometimes still call it that). An example of this in practice: new users like to start with a lower-priced item while they’re learning whether your shipping is trustworthy. A useful technique is then to uprank lower-priced items before the first order.

While it may feel bad to consider building a hand-tuned ranking, it’s important not to count this technique out. It has a place, and is often a great place to start. One interesting human-in-the-loop application of this technique is hand-tuned ranking by experts. Back to our fashion recommender: a style expert may know that this summer’s trending color is mauve, especially among the younger generation. They can positively influence user satisfaction by upranking mauve items for users in the right age persona.
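The new-user uprank described above might look like the following toy sketch; the price cutoff and boost factor are hypothetical hand-tuned values, not recommendations:

```python
import pandas as pd

def uprank_low_price(recs: pd.DataFrame, num_prior_orders: int,
                     price_cutoff: float = 25.0, boost: float = 1.2) -> pd.DataFrame:
    """Hand-tuned weight: before a user's first order, boost cheaper items."""
    recs = recs.copy()
    if num_prior_orders == 0:
        is_cheap = recs["price"] <= price_cutoff
        recs.loc[is_cheap, "score"] *= boost
    return recs.sort_values("score", ascending=False)
```

Note that the rule is conditional on user state (no prior orders), so it quietly disappears once trust is established, without any model retraining.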

Inventory Health

A unique, and somewhat contentious, side of hard ranking is inventory health. Notoriously hard to define, inventory health estimates how good the existing inventory is at satisfying user demand.

Let’s take a quick look at one way to define inventory health: via affinity scores and forecasting. We can do this by leveraging a demand forecast, which is an incredibly powerful and popular way to optimize the business: what are the expected sales in each category over the next N time periods? Building these forecasting models is quite outside the scope of this book, but the core ideas are well captured in the well-known book Forecasting: Principles and Practice by Hyndman and Athanasopoulos. For the sake of our discussion, assume that you’re able to roughly approximate how many socks you’ll sell over the next month, broken down by size and usage type. This can be a really instructive estimate of how many socks of various types you should have on hand.

However, it doesn’t stop there; inventory may be finite, and in practice it is often the case that inventory is a major constraint on businesses that sell physical goods. With that caveat, we have to turn to the other side of the market: demand. If demand outstrips availability, we ultimately disappoint users who don’t have access to the item they desired.

Let’s take the example of selling bagels; you’ve calculated average demand for poppy seed, onion, asiago cheese, and egg. On any given day, many customers will come to buy a bagel with a clear preference in mind – but will you have enough of that bagel? Every bagel you don’t sell is wasted; people like fresh bagels. This means that the bagels you recommend to each person are dependent on good inventory. Some users are less picky; they can get one of two or three of the options and be just as happy. In that case, it’s better to give them another bagel option and save the lowest inventory for the picky ones. This is a kind of model refinement called optimization, and there are a huge number of techniques for it. We won’t get into optimization techniques, but books on mathematical optimization or operations research will provide direction; Algorithms for Optimization by Mykel J. Kochenderfer and Tim A. Wheeler is a good place to start.
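To make the bagel idea concrete, here is a greedy sketch (deliberately simpler than a real optimization routine) that serves the pickiest customers first, so scarce inventory goes to those with no alternative; the names, counts, and preferences are all invented:

```python
# Hypothetical inventory counts and customer preference lists.
inventory = {"poppy": 1, "onion": 5, "asiago": 3, "egg": 4}

# Each customer lists acceptable bagels in preference order.
customers = [
    ("ana", ["poppy"]),                  # picky: only poppy will do
    ("ben", ["poppy", "onion", "egg"]),  # flexible
    ("cal", ["asiago", "onion"]),
]

def allocate(inventory, customers):
    """Greedy allocation: least flexible customers are served first."""
    inventory = dict(inventory)  # don't mutate the caller's copy
    assignments = {}
    for name, prefs in sorted(customers, key=lambda c: len(c[1])):
        # Give each customer their most preferred in-stock option.
        for bagel in prefs:
            if inventory.get(bagel, 0) > 0:
                inventory[bagel] -= 1
                assignments[name] = bagel
                break
    return assignments
```

Here the flexible customer cedes the last poppy seed bagel to the picky one and happily takes onion; a real system would solve this as an assignment problem rather than greedily, but the intuition is the same.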

Inventory health ties back to hard ranking, because actively managing inventory as part of your recommendations is an incredibly important and powerful tool. Inventory optimization will degrade the perceived performance of your recommendations for some users, but by including it as part of your business rules, the overall health of your business and recommender system improves. This is why it is sometimes called global optimization.

The reason these methods stir up heated discussions is that not everyone agrees that the quality of recommendations for some users should be depressed to improve those of others for the “greater good”. Health of the marketplace and average satisfaction are useful metrics to consider, but ensure that they are aligned with the north-star metrics for the recommendation system at large.

Implementing avoids

The simplest approach to handling avoids is downstream filtering. To do this, you’ll want to apply the avoid rules for the user before the recommendations are passed along from the ranker to the user. Implementing this approach looks something like this:

import pandas as pd

def filter_dataframe(df: pd.DataFrame, filter_dict: dict):
    """
    Filter a dataframe to exclude rows where columns have certain values.

    Args:
        df (pd.DataFrame): Input dataframe.
        filter_dict (dict): Dictionary where keys are column names
        and values are the values to exclude.

    Returns:
        pd.DataFrame: Filtered dataframe.
    """
    for col, val in filter_dict.items():
        df = df.loc[df[col] != val]
    return df

# A user's avoids, expressed as column/value pairs to exclude.
filter_dict = {'column1': 'value1', 'column2': 'value2', 'column3': 'value3'}

df = df.pipe(filter_dataframe, filter_dict)

Admittedly, this is a trivial, and also relatively naive, attempt at avoids. First, as we’ve seen, working purely in Pandas will limit the scalability of your recommender – so let’s convert this to JAX.

import jax.numpy as jnp

def filter_jax_array(arr: jnp.ndarray, col_indices: list, values: list):
    """
    Filter a JAX array to exclude rows where certain columns have certain values.

    Args:
        arr (jnp.ndarray): Input array.
        col_indices (list): List of column indices to filter on.
        values (list): List of corresponding values to exclude.

    Returns:
        jnp.ndarray: Filtered array.
    """
    assert len(col_indices) == len(values), \
        "col_indices and values should have the same length"

    # One boolean mask per (column, value) pair; a row survives only if
    # it passes every mask.
    masks = [arr[:, col] != val for col, val in zip(col_indices, values)]
    total_mask = jnp.all(jnp.stack(masks), axis=0)

    # Note: boolean-mask indexing yields a dynamically shaped result,
    # so this function can't be jit-compiled as written.
    return arr[total_mask]

But there are deeper issues. The next question you may face is where that collection of avoids is stored. An obvious place is somewhere like a NoSQL database keyed on user, so that you can get all of a user’s avoids with a simple lookup. This is a natural use of feature stores, as we saw in “Feature stores”. Some avoids may be real-time, while others are learned during user onboarding; feature stores are a great place to house both.
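A minimal sketch of that lookup, using a hypothetical in-memory dictionary as a stand-in for a real feature store or NoSQL table, with values shaped like the avoid dictionaries our filters expect:

```python
# Hypothetical stand-in for a feature store keyed on user id; values are
# avoid dictionaries in the column/value shape the filters above consume.
AVOID_STORE = {
    "user_42": {"ingredient": "grapefruit", "category": "sofa"},
}

def get_avoids(user_id: str) -> dict:
    """Fetch a user's avoids with a single key lookup (empty if none stored)."""
    return AVOID_STORE.get(user_id, {})
```

In production the dictionary would be replaced by a call to your feature store client, but the access pattern (one key, one small document) is the same.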

The next potential gotcha with our naive filter is that it doesn’t naturally extend to covariate avoids, or more complicated avoid scenarios. Some avoids are actually dependent on context – a user who doesn’t wear white after Labor Day, users who don’t eat meat on Fridays, or coffee processing methods that don’t mesh well with certain brewers. All of these require some conditional logic. You might think that your very powerful and effective recommendation model can certainly learn all of the above – this is only sometimes true. The reality is that many of these kinds of considerations are lower signal than the large-scale concepts your recommendation system should be learning, and thus hard to learn consistently. Additionally, these kinds of rules are often things you should require, as opposed to remain optimistic about, so you often should specify them explicitly.

This can often be achieved by explicit deterministic algorithms that impose these requirements. For the coffee problem above, one of the authors hand-built a decision stump to handle a few bad combinations of coffee roast features and brewers – anaerobic espresso?! Yuck!
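A hand-built rule like that can be as simple as a small lookup of disallowed pairs; the anaerobic/espresso pair comes from the anecdote above, while the second pair is purely a hypothetical example:

```python
# Hand-maintained set of (process, brewer) combinations to avoid.
# ("anaerobic", "espresso") is from the anecdote above; the other
# pair is a made-up illustration.
BAD_PAIRS = {
    ("anaerobic", "espresso"),
    ("dark_roast", "cold_brew"),
}

def avoid_combination(process: str, brewer: str) -> bool:
    """Decision-stump-style check: should this coffee/brewer pair be avoided?"""
    return (process, brewer) in BAD_PAIRS
```

The virtue of this approach is auditability: anyone can read the rule table and know exactly which combinations will never be served.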

The other two examples above (not wearing white after Labor Day and not eating meat on Fridays), however, are a bit more nuanced. An explicit algorithmic approach may actually be tricky to handle. How do we know that a user doesn’t eat meat on Fridays during one period of the year?

For these use cases, model-based avoids are able to impose these requirements.

Model-based avoids

Our quest to include more complicated rules, and potentially learn them, may sound like we’re back in the realm of retrieval. Unfortunately, even with models like wide-and-deep with lots of parameters doing both user modeling and item modeling, it can be quite tricky to learn such high-level relationships.

While most of this book has focused on fairly large and deep models, this part of recommendation systems is very well suited to simple models. For feature-based binary predictions (should this item be recommended?), there’s certainly a zoo of good options; the right choice depends heavily on the number of features involved in implementing the avoid you wish to capture. It’s useful to remember that many of the avoids we’re considering in this section start out as assumptions or hypotheses: we think some users may not wear white after Labor Day, and then attempt to find features that model this outcome well. In this way, it can be more tractable to use extremely simple regression models to find features that covary with the outcome in question.
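One way to operationalize that hypothesize-then-check loop is a plain covariance screen: correlate each candidate feature with the hypothesized avoid outcome before bothering with any model. The features and data below are entirely made up for illustration:

```python
import numpy as np

# Hypothetical user features: [share_of_formal_purchases, lives_in_northeast]
X = np.array([
    [0.9, 1.0],
    [0.8, 1.0],
    [0.2, 0.0],
    [0.1, 0.0],
])
# Hypothesized outcome: 1 = user avoids white after Labor Day.
y = np.array([1.0, 1.0, 0.0, 0.0])

# Pearson correlation of each feature with the outcome; strongly
# correlated features are candidates for a simple avoid model.
corrs = [float(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])]
```

Features that survive this screen can then be fed into a small regression, keeping the whole avoid pipeline cheap, interpretable, and easy to retire if the hypothesis turns out to be wrong.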

Another related piece of this puzzle is latent representations. For our Friday vegetarians, we may be trying to infer a particular persona that we know has this rule. That persona is a latent feature that we hope to map from other attributes. It’s important to be careful with this kind of modeling (in general, personas can be a bit nuanced and worthy of thoughtful decision making), but it can be quite helpful. It may seem like the user-modeling parts of your large recommender model should learn these personas – and they can! A useful trick is to pull forward personas learned from that model and regress them against hypothesized avoids to get more signal. However, the large model doesn’t always learn these personas, because our loss functions for retrieval relevance (and downstream for ranking) are attempting to parse out relevance for individual users from the latent persona features – which may only predict these avoids amidst context features.

All in all, implementing avoids is both very easy and very hard. When building production recommendation systems, the journey is not over when you get to serving; many models factor into the final step of the process.

Summary

We’ve seen that you sometimes need to rely on more classic approaches to ensure the recommendations you’re sending downstream satisfy the essential rules of your business. Explicit or subtle lessons learned from your users can be turned into simple strategies that continue to delight them.

However, this is not the end of our serving challenge. There is another kind of downstream consideration that’s related to the kind of filtering we’ve done here, but derives from user preference and human behavior. Ensuring that recommendations are not repeated, rote, and redundant will be the subject of the next chapter on diversity in recs. We will also discuss how to balance multiple priorities simultaneously when determining exactly what to serve.
