Chapter 4


Use cases for (big) data analytics

In this chapter, I’ll cover important business applications of analytics, highlighting how big data technologies enhance them, whether by providing scalable computing power or through the data itself. It is not uncommon for these applications to raise KPIs by double digits.

A/B testing

In A/B testing, also called split testing, we test the impact of (typically small) product modifications. We divide customers into random groups and show each a different version. We run the test for a few weeks and then study the impact. Any attribute of your website can be tested in this way: arrangements, colours, fonts, picture sizes, etc. Companies run hundreds of A/B tests over the course of a year to find what best impacts total sales, bounce rate, conversion path length, etc.

A/B testing is the lifeblood of online companies, allowing them to quickly and easily test ideas and ‘fail fast’, discarding what doesn’t work and finding what does. Beyond simply observing customer behaviour, A/B testing lets you take an active role in creating data and making causal statements. You’re not simply watching customers; you’re creating new digital products and seeing how the customers react. A/B testing can boost revenue by millions of dollars.
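Studying the impact at the end of a test often starts with a simple significance check. A minimal sketch — a two-proportion z-test on conversion counts, with invented numbers:

```python
import math

def ab_test_z(conv_a, n_a, conv_b, n_b):
    """Two-proportion z-test comparing conversion rates of variants A and B."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)          # pooled conversion rate
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se                           # |z| > 1.96 ~ significant at 5%

z = ab_test_z(conv_a=500, n_a=10_000, conv_b=560, n_b=10_000)
print(round(z, 2))
```

Here the uplift looks promising but falls just short of significance at the 5 per cent level, which is exactly why tests run for weeks rather than days.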

A/B testing is not in itself a big data challenge, but coupling A/B testing with a big data application makes it much more effective. There are several reasons for this.

  • By eliminating sampling, big data allows you to perform deep dives into your target KPIs, exploring results within very specific segments. To illustrate with a simplistic example, if you run an A/B test in Europe where the A variant is English text and the B variant is German text, the A variant would probably do better. When you dive deeper, splitting the results by visitor country, you get a truer picture.
    If you run an e-commerce platform with several hundred product categories and customers in several dozen countries, the variants of your A/B test will perform very differently by category and location. If you only study the summary test data, or if you only keep a small percentage of the test data (as was the standard practice in many companies), you’ll lose the quantity of data you would need to draw a meaningful conclusion when a product manager asks you about the performance of a certain product in a certain market within a specific time window (for example, when a high-priced marketing campaign was run during a major network event). It is big data that gives you these valuable, detailed insights from A/B tests.
  • The second way in which big data improves A/B testing is that, by allowing you to keep all the customer journey data for each testing session, it allows you to go beyond KPIs and begin asking nuanced questions regarding how test variants impacted customer journey. Once you have added a test variant ID to the big data customer journey storage, you can then ask questions such as ‘which variant had the shorter average length of path to purchase?’ or ‘in which variant did the customer purchase the most expensive product viewed?’ These detailed questions would not be possible in standard A/B implementations without big data.
  • The third way, which we touched on in the last chapter, is that big data lets you answer new questions using data that you’ve already collected. Conjectures about user responses to product changes can sometimes be answered by looking to vast stores of historical data rather than by running new tests.

To illustrate, imagine a company such as eBay is trying to understand how additional item photos might boost sales. They could test this by listing identical products for sale, differing only in the number of photos, and running the experiment for several weeks. If they instead used a big data system, they could immediately comb through the historical data and identify pairs of such products that had already been sold. Power sellers on a site such as eBay would have already run such selling experiments for their own benefit; eBay need only find these user-run experiments already stored in the big data storage system. In this way, the company gets immediate answers to their question without waiting for new test results.
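In code, the segment deep-dive that unsampled data enables is essentially a group-by over raw events. A toy sketch (variants, countries and conversion flags invented):

```python
from collections import defaultdict

# Each event: (variant, country, converted) — full, unsampled test data.
events = [
    ("A", "UK", 1), ("A", "DE", 0), ("B", "DE", 1),
    ("B", "UK", 0), ("A", "UK", 1), ("B", "DE", 1),
]

# (variant, country) -> [conversions, visits]
totals = defaultdict(lambda: [0, 0])
for variant, country, converted in events:
    totals[(variant, country)][0] += converted
    totals[(variant, country)][1] += 1

# Conversion rate per segment — only possible if every event was retained.
for key, (conv, visits) in sorted(totals.items()):
    print(key, conv / visits)
```

With sampled or summarized data, the per-segment counts above would be too small or simply unavailable; with full event storage, the same query can be run per category, country and time window.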

Recommendation engines/next best offer

Recommendation engines have proven their worth for many companies. Netflix is the poster child for recommendation engines, having grown its user base and engagement metrics not only by acquiring and producing video content, but also through personalized recommendations.

In e-commerce, a key tactical capability is to recommend the right products at the appropriate moments in a manner that balances a set of sometimes conflicting goals: customer satisfaction, maximum revenue, inventory management, future sales, etc. You must assess which product would most appeal to each customer, balanced against your own business goals, and you must present the product to the customer in a manner most likely to result in a purchase.

If you’re a publisher, you are also facing the challenge of recommending articles to your readers, making choices related to content, title, graphics and positioning of articles. Even starting with a specific market segment and category (world news, local news, gardening, property etc.), you need to determine the content and format that will most appeal to your readers.

Case study – Predicting news popularity at The Washington Post30

The Washington Post is one of the few news agencies that have excelled in their creation of an online platform. Acquired by Amazon founder Jeff Bezos in 2013, it’s no surprise it has become innovative and data-driven. In fact, Digiday called The Post the most innovative publisher of 2015. By 2016, nearly 100 million global readers were accessing online content each month.

The Post publishes approximately 1000 articles each day. With print, publishers choose content and layout before going to press and have very limited feedback about what works well. Online publishing provides new insights, allowing them to measure readers’ interactions with content in real time and respond by immediately updating, modifying or repositioning content. The millions of daily online visits The Post receives generate hundreds of millions of online interactions, which can immediately be used to steer publishing and advertising.

The Post is also using this big data to predict article popularity, allowing editors to promote the most promising articles and enhance quality by adding links and supporting content. Importantly, they can monetize those articles more effectively. If the model predicts an article will not be popular, editors can modify headlines and images to increase success metrics such as views and social shares.

The Post’s data-driven culture is paying off. In an age where traditional publishers are struggling to reinvent themselves, The Post recently reported a 46 per cent annual increase in online visitors and a 145 per cent increase in annual digital-only subscriptions.31

We see how the move online provided insight into when articles were being read and shared. The publisher could see which articles were clicked (thus demonstrating the power of the headline and photo), which were read to the end (based on scrolls and time on page) and which were shared on social media. This digital feedback enabled a loop not possible in print. However, using the digital feedback effectively requires publishers to turn to digital data solutions. As the data grows, accelerates and becomes more complex, the publisher needs advanced tools and techniques for digital insights. To illustrate, consider a publisher who knows the number of readers of certain articles but wants to understand the sentiment of those readers. This publisher might start collecting and analysing text data from mentions of articles on social media, using sentiment analysis and more complex AI techniques to understand an article’s reception and impact.

For merchants, the move online made recommending more difficult. Placing an item for sale online made it easy to sell, but customers became faceless and often anonymous. As a merchant, you need to know what products customers are most likely to buy, and you need to know how to help them. Both require a continuous feedback cycle which is responsive to each question and action of the customer. When a customer enters a physical store, you form a sales strategy from first impressions. A young girl will likely buy different items than an older man. The first question from the customer will indicate their intention, and their response to the first items they see will give insights into their preferences.

Recommendation engines typically use a blend of two methodologies. The first, called collaborative filtering, contributes a recommendation score based on the past activity of similar customers. The second, content-based filtering, contributes a score based on properties of the product. As an example, after I’ve watched Star Wars Episode IV, collaborative filtering would suggest Star Wars Episode V, since people who liked Episode IV typically like Episode V. Content-based filtering, however, would recommend Episode V because it has many features in common with Episode IV (producer, actors, genre, etc.). An unwatched, newly released movie would not be recommended by the collaborative algorithm but might be by the content-based algorithm.
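A minimal sketch of such a blend, with invented watch histories and feature sets (all names hypothetical), might combine the two scores with a simple weighted average:

```python
# Hypothetical catalogue: item -> set of content features.
features = {
    "ep4": {"sci-fi", "lucas", "hamill"},
    "ep5": {"sci-fi", "lucas", "hamill"},
    "new_release": {"sci-fi", "new-director"},
}
# Hypothetical watch histories, used by the collaborative score.
histories = [{"ep4", "ep5"}, {"ep4", "ep5"}, {"ep4"}]

def collaborative_score(seed, candidate):
    # Fraction of users who watched the seed item that also watched the candidate.
    watched_seed = [h for h in histories if seed in h]
    return sum(candidate in h for h in watched_seed) / len(watched_seed)

def content_score(seed, candidate):
    # Jaccard similarity of the two items' feature sets.
    a, b = features[seed], features[candidate]
    return len(a & b) / len(a | b)

def hybrid(seed, candidate, w=0.5):
    return w * collaborative_score(seed, candidate) + (1 - w) * content_score(seed, candidate)

print(hybrid("ep4", "ep5"))          # scores highly on both signals
print(hybrid("ep4", "new_release"))  # no watch history yet: content signal only
```

Note how the new release still receives a non-zero score from the content side, which is exactly the cold-start behaviour described above.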

Big data is what makes recommendation engines work well. If you’re building a recommendation engine, you’ll want to calibrate it using abundant, detailed data, including browsing data, and this is provided by your big data stores. The big data ecosystem also provides you with the scalable computing power to run the machine learning algorithms behind your recommendation engines, whether they are crunching the numbers in daily batch jobs or performing real-time updates.

Your recommendation engine will work best when it can analyse and respond to real-time user behaviour. This ability, at scale, is what the big data ecosystem provides. Your customers are continuously expressing preferences as they type search terms and subsequently select or ignore the results. The best solution is one that learns from these actions in real time.

Forecasting: demand and revenue

If your forecasting model was built without incorporating big data, it is probably a statistical model constructed from a few standard variables and calibrated using basic historic data. You may have built it using features such as geography, date, trends and economic indicators. You may even be using weather forecasts if you are forecasting short-term demand and resulting revenue.

Big data can sharpen your forecasting in a couple of ways.

  • First, it gives you more tools for forecasting. You can keep using your standard statistical models, and you can also experiment with a neural network trained on a cluster of cloud-based graphics processing units (GPUs) and calibrated using all available data, not just a few pre-selected explanatory variables. Retailers are already using such methods to effectively forecast orders down to the item level.
  • Second, big data will provide you with additional explanatory variables for feature engineering in your current forecasting models. For example, in addition to standard features such as date, geography, etc., you can incorporate features derived from big data stores. A basic example would be sales of big-ticket items, where increasingly frequent product views would be a strong predictor of an impending sale.
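To illustrate the second point, a toy least-squares fit of daily sales against a big-data-derived feature — recent product views — with all numbers invented:

```python
# Hypothetical daily records: (weekday, product_views_last_7d, units_sold).
history = [
    (0, 120, 4), (1, 80, 2), (2, 200, 7),
    (3, 150, 5), (4, 90, 3),
]

# One-variable least-squares line: sales ~ recent views. The view count is the
# feature derived from big data stores; weekday etc. would join it in a fuller model.
n = len(history)
xs = [r[1] for r in history]
ys = [r[2] for r in history]
mx, my = sum(xs) / n, sum(ys) / n
slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
intercept = my - slope * mx

print(f"predicted sales at 180 recent views: {intercept + slope * 180:.1f}")
```

In practice you would feed such engineered features into your existing statistical model rather than fit them in isolation; the point is only that the feature comes from customer journey data you could not previously retain.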

IT cost savings

You can save significant IT costs by moving from proprietary technology to open-source big data technology for your enterprise storage needs. Open-source technologies run on commodity hardware can be 20–30 times cheaper per terabyte than traditional data warehouses.32 In many cases, expensive software licenses can be replaced by adopting open-source technologies. Be aware, though, that you’ll also need to consider the people cost involved with any migration.

Marketing

Marketing is one of the first places you should look for applying big data. In Dell’s 2015 survey,1 the top three big data use cases among respondents were all related to marketing. These three were:

  • Better targeting of marketing efforts.
  • Optimization of ad spending.
  • Optimization of social media marketing.

This highlights how important big data is for marketing. Consider the number of potential ad positions in the digital space. It’s enormous, as is the number of ways that you can compose (via keyword selection), purchase (typically through some bidding process) and place your digital advertisements. Once your advertisements are placed, you’ll collect details of the ad placements and the click responses (often by placing invisible pixels on the web pages, collectively sending millions of messages back to a central repository).

Once customers are engaged with your product, typically by visiting your website or interacting with your mobile application, they start to leave digital trails, which you can digest with traditional web analytics tools or analyse in full detail with a big data tool.

Marketing professionals are traditionally some of the heaviest users of web analytics, which in turn is one of the first points of entry for online companies that choose to store and analyse full customer journey data rather than summarized or sampled web analytics data. Marketing professionals are dependent on the online data to understand the behaviour of customer cohorts brought from various marketing campaigns or keyword searches, to allocate revenue back to various acquisition sources, and to identify the points of the online journey at which customers are prone to drop out of the funnel and abandon the purchase process.

Social media

Social media channels can play an important role in helping you understand customers, particularly in real time. Consider a recent comScore report showing that social networking accounts for nearly one out of every five minutes spent online in the US (see Figure 4.1).

Figure 4.1 Share of the total digital time spent by content category.


Source: comScore Media Metrix Multi-Platform, US, Total Audience, December 2015.33

Social media gives insight into customer sentiment, keyword usage and campaign effectiveness, and can flag a PR crisis you need to address immediately. Social media data is huge and it moves fast. Consider Twitter, where 6000 tweets are created each second, totalling 200 billion tweets per year.34 You’ll want to consider a range of social channels, as each may play an important role in understanding your customer base, and each has its own mixture of images, links, tags and free text, appealing to slightly different customer segments and enabling different uses.

Pricing

You may be using one or more standard pricing methods in your organization. These methods are specialized to fit specific sectors and applications.

Financial instruments are priced to prevent arbitrage, using formulas or simulations constructed from an underlying mathematical model of market rate movements. Insurance companies use risk- and cost-based models, which may also involve simulations to estimate the impact of unusual events. If you are employing such a simulation-based pricing method, the big data ecosystem provides you with a scalable infrastructure for fast Monte Carlo simulations (albeit with issues related to capturing correlations).
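To make the simulation-based approach concrete, here is a minimal sketch (all market parameters invented) that prices a European call by averaging discounted simulated payoffs under geometric Brownian motion. In a big data setting, the path loop is what you would distribute across the cluster:

```python
import math
import random

def mc_option_price(s0, strike, rate, vol, t, n_paths, seed=42):
    """Monte Carlo price of a European call under geometric Brownian motion."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_paths):
        z = rng.gauss(0.0, 1.0)
        # Terminal asset price under risk-neutral GBM.
        s_t = s0 * math.exp((rate - 0.5 * vol ** 2) * t + vol * math.sqrt(t) * z)
        total += max(s_t - strike, 0.0)
    return math.exp(-rate * t) * total / n_paths  # discounted average payoff

price = mc_option_price(s0=100, strike=100, rate=0.02, vol=0.2, t=1.0, n_paths=50_000)
print(round(price, 2))
```

Each path is independent, which is why this workload parallelizes so naturally; the caveat about correlations in the text arises when many instruments or risk factors must be simulated jointly.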

If you are in commerce or travel, you may be using methods of dynamic pricing that involve modelling both the supply and the demand curves and then using experimental methods to model price elasticity over those two curves. In this case, big data provides you with the forecasting tools and methods mentioned earlier in this chapter, and you can use the micro-conversions in your customer journey data as additional input for understanding price elasticity.

Customer retention/customer loyalty

Use big data technologies to build customer loyalty in two ways.

First, play defence by monitoring and responding to signals in social media and detecting warning signals based on multiple touch points in the omni-channel experience. I’ll illustrate such an omni-channel signal in the coming section on customer churn. In Chapter 6, I’ll also discuss an example of customer service initiated by video analysis, which is a specific technique for applying non-traditional data and AI to retain customers and build loyalty.

Second, play offence by optimizing and personalizing the customer experience you provide. Improve your product using A/B testing; build a recommendation engine to enable successful shopping experiences; and deliver customized content for each customer visit (constructed first using offline big data analytics and then implemented using streaming processing for real-time customization).

Cart abandonment (real time)

Roughly 75 per cent of online shopping carts are abandoned.35 Deploy an AI program that analyses customer behaviour leading up to the point of adding items to shopping carts. When the AI predicts that the customer is unlikely to complete the purchase, it should initiate appropriate action to improve the likelihood of purchase.

Conversion rate optimization

Conversion rate optimization (CRO) is the process of presenting your product in a way that maximizes the number of conversions. CRO is a very broad topic and requires a multi-disciplinary approach. It is a mixture of art and science, of psychology and technology. From the technology side, CRO is aided by A/B testing, by relevant recommendations and pricing, by real-time product customization, by cart abandonment technologies, etc.

Product customization (real time)

Adjust the content and format of your website in real time based on what you’ve learned about the visitor and on the visitor’s most recent actions. You’ll know general properties of the visitor from past interactions, but you’ll know what they are looking for today based on the past few minutes or seconds. You’ll need an unsampled customer journey to build your customization algorithms and you’ll need streaming data technologies to implement the solution in real time.

Retargeting (real time)

Deploy an AI program to analyse the customer behaviour on your website in real time and estimate the probability that the customer will convert during their next visit. Use this information to bid on retargeting slots on other sites that the customer subsequently visits. You should adjust your bidding prices immediately (within a fraction of a second) rather than in nightly batches.

Fraud detection (real time)

In addition to your standard approach to fraud detection using manual screening or automated rules-based methods, explore alternative machine learning methods trained on large data sets.36 The ability to store massive quantities of time series data provides a richer training set, additional possibilities for features, and scalable, real-time deployment using fast data methods (Chapter 5).

Churn reduction

You should actively identify customers at high risk of becoming disengaged from your product and then work to keep them with you. If you have a paid usage model, you’ll focus on customers at risk of cancelling a subscription or disengaging from paid usage. Since the cost of acquiring new customers can be quite high, the return on investment (ROI) on churn reduction can be significant.

There are several analytic models typically used for churn analysis. Some models will estimate the survival rate (longevity) of your customer, while others are designed to produce an estimated likelihood of churn over a period (e.g. the next two months). Churn is typically a rare event, which makes it more difficult for you to calibrate the accuracy of your model and balance between false positives and false negatives. Carefully consider your tolerance for error in either direction, balancing the cost of labelling a customer as a churn potential and wasting money on mitigation efforts vs the cost of not flagging a customer truly at risk of churning and eventually losing the customer.
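A toy sketch of that balancing act, choosing a flagging threshold by total cost (costs and churn scores invented, and optimistically assuming a retention offer always prevents churn):

```python
# Hypothetical costs: a retention offer vs the value lost when a churner leaves.
COST_OFFER = 10    # spent on every customer we flag, including false positives
COST_CHURN = 200   # lost when an unflagged customer actually churns

# (predicted churn probability, actually churned) for a scored customer base.
scored = [(0.9, 1), (0.8, 0), (0.6, 1), (0.4, 0), (0.3, 1), (0.1, 0)]

def expected_cost(threshold):
    cost = 0
    for prob, churned in scored:
        if prob >= threshold:
            cost += COST_OFFER   # mitigation spend on a flagged customer
        elif churned:
            cost += COST_CHURN   # missed a true churner (false negative)
    return cost

# Pick the threshold with the lowest total cost.
best = min((expected_cost(t), t) for t in (0.2, 0.5, 0.8))
print(best)
```

Because a missed churner here costs twenty times a wasted offer, the cheapest policy flags aggressively; with different relative costs, the optimal threshold moves accordingly.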

These traditional churn models take as input all relevant and available features, including subscription data, billing history, and usage patterns. As you increase your data supply, adding customer journey data such as viewings of the Terms and Conditions webpage, online chats with customer support, records of phone calls to customer support, and email exchanges, you can construct a more complete picture of the state of the customer, particularly when you view these events as a sequence (e.g. receipt of a high bill, followed by contact with customer support, followed by viewing cancellation policy online).

In addition to utilizing the additional data and data sources to improve the execution of the traditional models, consider using artificial intelligence models, particularly deep learning, to reduce churn. With deep learning models, you can work from unstructured data sources rather than focusing on pre-selecting features for the churn model.

Predictive maintenance

If your organization spends significant resources monitoring and repairing machinery, you’ll want to utilize big data technologies to help with predictive maintenance, both to minimize wear and to avoid unexpected breakdowns. This is an important area for many industries, including logistics, utilities, manufacturing and agriculture, and, for many of them, accurately predicting upcoming machine failures can bring enormous savings. In some airlines, for example, maintenance issues have been estimated to cause approximately half of all technical flight delays. In such cases, gains from predictive maintenance can save tens of millions annually, while providing a strong boost to customer satisfaction.

The Internet of Things (IoT) typically plays a strong role in such applications. As you deploy more sensors and feedback mechanisms within machine parts and systems, you gain access to a richer stream of real-time operational data. Use this not only to ensure reliability but also for tuning system parameters to improve productivity and extend component life.

This streaming big data moves you from model-driven predictive maintenance to data-driven predictive maintenance, in which you continuously respond to real-time data. Whereas previously we may have predicted, detected and diagnosed failures according to a standard schedule, supplemented with whatever data was periodically collected, you should increasingly monitor systems in real time and adjust any task or parameter that might improve the overall efficiency of the system.
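As a minimal sketch of such data-driven monitoring, the following flags sensor readings that drift more than a few standard deviations from a rolling mean (window size, threshold and readings all invented):

```python
from collections import deque

def anomaly_flags(readings, window=5, k=3.0):
    """Flag readings more than k standard deviations from a rolling mean."""
    recent = deque(maxlen=window)
    flags = []
    for x in readings:
        if len(recent) == window:
            mean = sum(recent) / window
            var = sum((r - mean) ** 2 for r in recent) / window
            std = var ** 0.5
            flags.append(std > 0 and abs(x - mean) > k * std)
        else:
            flags.append(False)  # not enough history yet to judge
        recent.append(x)
    return flags

temps = [70, 71, 70, 69, 70, 71, 70, 95, 70]
print(anomaly_flags(temps))
```

A production system would run far richer models over many sensors at once, but the shape is the same: a streaming window of recent data, continuously compared against what the data itself says is normal.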

Supply chain management

If you’re managing a supply chain, you’ve probably seen the amount of relevant data growing enormously over the past few years. Over half of respondents in a recent survey of supply chain industry leaders37 indicated they already had or expected to have a petabyte of data within a single database. Supply chain data has become much broader than simply inventory, routes and destinations. It now includes detailed, near-continuous inventory tracking technology at the level of transport, container and individual items, in addition to real-time environmental data from sensors within transports.

These same survey respondents indicated that the increased visibility into the movements of the supply chain was their most valuable application of big data technology, followed by an increased ability to trace the location of products. These were followed by the ability to harvest user sentiment from blogs, ratings, reviews and social media. Additional applications of value included streaming monitoring of sensor readings (particularly for temperature), equipment functionality, and applications related to processing relevant voice, video and warranty data.

Customer lifetime value (CLV)

As you work to understand your marketing ROI and the cost of customer churn, you’ll want to analyse customer lifetime value (CLV), the total future value that a customer will bring to your organization. A basic CLV calculation (before discounting) would be

(Annual profit from customer)
× (Expected number of years the customer is active)
− Cost of acquiring customer

Estimating CLV for customer segments lets you better understand the ROI from acquisition efforts in each segment. If the expected profits don’t exceed the acquisition costs, you won’t want to pursue those customers.

The accuracy of your CLV calculation increases with your ability to sub-segment customers and your ability to compute the corresponding churn rates. Your ability to mitigate churn and to further activate customers through cross-sell, up-sell and additional conversion rate optimization will boost your CLV.

Use available big data to produce a more refined customer segmentation. The additional data will primarily consist of digital activity (including acquisition source, webpage navigation, email open rates, content downloads and activity on social media) but for some industries may also include audio and video data produced by your customer. To illustrate, you may find that customers you acquire from social media referrals will remain with you longer than customers you acquire from price comparison sites.

Lead scoring

Lead scoring is the art/science/random guess whereby you rank your sales prospects in decreasing order of potential value. A 2012 study by MarketingSherpa reported that only 21 per cent of B2B marketers were using lead scoring,38 highlighting abundant room for growth.

Use lead scoring to help your sales team prioritize their efforts, wasting less time on dead-end leads and focusing their time on high-potential prospects. You’ll borrow techniques you used in churn analysis and CLV to generate a lead score, which multiplies the likelihood of lead conversion by the estimated CLV of the lead.
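That multiplication is all a basic lead score is. A sketch with hypothetical leads:

```python
def lead_score(p_convert, estimated_clv):
    """Expected value of pursuing a lead: conversion likelihood x estimated CLV."""
    return p_convert * estimated_clv

# Hypothetical leads: (name, conversion probability, estimated CLV).
leads = [("acme", 0.10, 5_000), ("globex", 0.40, 800), ("initech", 0.05, 20_000)]

# Rank prospects in decreasing order of expected value.
ranked = sorted(leads, key=lambda l: lead_score(l[1], l[2]), reverse=True)
print([name for name, *_ in ranked])
```

Note that the lead with the lowest conversion probability can still top the list when its estimated CLV is large enough, which is why scoring on likelihood alone misleads.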

For attempted cross-sell and up-sell to existing customers, start from the same sources of customer data. If the lead is not a current customer and conversions are infrequent, you’ll generally have much less data for them, so you’ll need to select and calibrate models that work with more limited data (e.g. data-hungry machine learning models won’t generally work).

Consider using AI methods to detect signals in audio and video records matched with sales events. If there is sufficient training data, these methods could be trained to automatically flag your high-potential sales prospects (in real time). We mention a very basic example of such a method in Chapter 6.

Human resources (HR)

If you work in HR, leverage the tools and methods for lead scoring, churn analysis and conversion rate optimization to find and attract the best candidates, reduce employee churn and improve KPIs related to productivity and employee satisfaction.

Recruitment and human resource professionals examine similar data to understand and ultimately influence recruitment success, increase employee productivity and minimize regretted attrition. In addition to traditional HR data (demographics, application date, starting date, positions, salaries, etc.), leverage the new data becoming available to you, such as response patterns for different types of job postings, photos and videos of candidates, free text on CVs / interview notes / emails / manager reviews and any other digital records available, including activity on social media.

Pay attention to privacy laws and to the privacy policies of your organization. The analytics on this data can provide valuable insights even without retaining personally identifiable information. It can be done not only at the level of individual employees but also at progressively aggregated levels: department, region and country.

Sentiment analysis

You can get insights into the intentions, attitudes and emotions of your customers by analysing their text, speech, video and typing rhythms, as well as from data returned by onsite monitors such as video cameras and infra-red monitors.

Always-on monitoring systems can give you the public’s reaction to your marketing or news events. If you are concerned with security or fraud, you can use sentiment analysis to flag high-risk individuals at entrance points or during an application process, forwarding these cases to trained staff for manual evaluation.

As with any AI, sentiment analysis will not be 100 per cent accurate, but it can prove invaluable in bringing trending opinions to your attention much more quickly than manual efforts, and in quickly combing through extensive and rapidly moving data to identify common themes. In addition, some systems can spot features and patterns more accurately than human observers.

Keep in mind

Big data technologies help you do many things better but are not a silver bullet. You should typically build your first solutions using traditional data, and then use big data to build even better solutions.

So far, we’ve painted the big picture of big data and AI, and we’ve looked at several business applications. We end Part 1 of this book with a slightly more detailed look at the tools and technologies that make big data solutions possible. We’ll then move to Part 2, which focuses on the practical steps you can take to utilize big data within your organization.

Takeaways

  • We provide a brief overview of 20 applications of business analytics, some of which are incrementally improved and some significantly improved by big data technologies.

Ask yourself

  • Which of these twenty business applications are most important for your organization? For those already in use within your organization, where could you add additional data sources, particularly big data or omni-channel data?
  • Which KPIs could significantly improve your results if they increased by 5 per cent? Consider that a concerted analytic effort should increase a well-managed KPI by 5 per cent and a poorly managed KPI by 20 per cent or more.