CHAPTER 8

A Sneak Peek Into Advanced Analytics

Trends in Advanced Analytics

Unlike previous generations, we live in a connected world surrounded by a variety of smart devices: the smart watch, smart coffee maker, smart yoga mat. (For anyone wondering what an AI yoga mat does, it helps guide your posture in real time.) And the list goes on. What do these devices have in common? They all produce a lot of data. We are bringing together disparate systems to offer efficiency, customization, and personalization, and to reduce latency. The growing amount of data in the digital world is driving demand for advanced techniques to handle it. This leads us to ask: what are the advanced techniques of analytics? The word analytics originates from the Greek word ánalytiká, which means “science of analysis.”88 Analytics covers a wide range of topics, some of which have been discussed in previous chapters, but at a high level it can be classified into two branches: business intelligence (BI) and advanced analytics (AA). In simple terms, BI is like a rearview mirror that shows what has happened, while AA looks forward to predict the future. AA can be seen as a supercharger for traditional analytics capabilities. Another way of looking at it is that AA goes beyond traditional BI solutions, which are primarily based on performance indicators, dashboards, and the querying of data warehouses, by incorporating algorithmic techniques from machine learning (ML), artificial intelligence (AI), natural language processing (NLP), and other computer science disciplines.89 If your company is just starting to use data, jumping directly to AA will be tough without a foundation of BI and data collection practices in place.

The word advanced shouldn’t frighten people away; we will discuss a relatable example of AA along with the various solution options analytics makes available. Let us dive into the concept of advanced data analytics with the analogy of a car that takes you from point A to point B. Remember the old-fashioned printed maps we used while exploring a new town or city? Older cars did not have GPS, and people did not have cell phones to use as a GPS. Imagine the time it would take to go to different places by reading a paper map. A ride-hailing service like Uber would never have been possible without AA. In 2021, Uber had an estimated 118 million users taking around 6.3 billion trips.90 This amount of data is massive and requires advanced techniques to process it, interpret it, derive relationships for corporate consumption, and help with business decisions. If you have ever used Uber or Lyft, you know that requesting a ride, tracking the driver’s real-time ETA (estimated time of arrival), and paying for the ride are as simple as clicking a button. Compare this to traditional taxi service, when drivers did not accept credit cards, when you waited for a taxi without knowing when it would arrive, and when you couldn’t track the taxi’s route. Companies like Uber and Lyft have transformed transportation in a few short years. It looks so simple that the underlying, behind-the-scenes complexity is almost invisible.

Uber drives people to places without a car, but data drives Uber. The company constantly adds sophisticated features like surge pricing (prices increase with demand and at peak hours) and use cases like fake driver detection to improve the overall experience.91 These forward-thinking features, which predict and detect different patterns and behaviors, are run by algorithms and ML, all part of AA capabilities. Uber is also an example of the crowdsourcing described in Chapter 3, as it does not provide a taxi service directly but uses its big data infrastructure to match an idle vehicle with a rider.

Big data and AA are a part of everything Uber does. So let’s take a closer look at how Uber handles surge pricing.

Exercise: Applying Advanced Analytics to a Taxi Service

Your imaginary taxi company, FunRide, is getting several customer complaints about long wait times. FunRide uses fixed trip pricing based on distance and maintains a fixed number of drivers based on the time of the day. You want to use data to staff enough drivers to meet demand and to evaluate if your pricing model needs to change. Write down the ways you would approach this problem and the advantages and disadvantages of each option, wearing your BI, AA, and creative hats in turn.

Business Problem Statement

There is a supply and demand issue: customer complaints about taxi availability are increasing each month. You do not have enough drivers to meet demand at peak hours, but you cannot precisely identify the high-demand days or hours. You are losing customers to the competition and need to figure out a way to retain and attract customers.

Solution Approach

Collect data about trips, process the trip data, analyze delays, create reports, and determine peak daytime hours from six months of historical data. Once you can identify your peak hours, you can make more drivers available. You could mandate that drivers sign up for more working hours and night shifts, but this still does not handle sudden changes in demand in situations like multiple flight delays and unexpected, one-time events. Another change could be to increase the trip price during peak and night hours, which raises drivers’ earnings and motivates more of them to work those hours. But how can you verify that customers will tolerate an increase in trip price? You can roll out a price increase to a group of customers and test it.
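For readers who are curious what the first step might look like in code, here is a minimal Python sketch of finding peak hours from a trip log. The trip records, the ten-minute wait target, and the function names are all assumptions made up for illustration; a real analysis would use six months of FunRide's actual data.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical trip log: (time the ride was requested, minutes the customer waited).
trips = [
    ("2023-03-06 08:05", 14),
    ("2023-03-06 08:40", 18),
    ("2023-03-06 08:55", 16),
    ("2023-03-06 13:10", 4),
    ("2023-03-06 13:45", 6),
    ("2023-03-06 18:20", 12),
    ("2023-03-06 18:35", 15),
    ("2023-03-06 23:00", 5),
]

def wait_by_hour(trips):
    """Return {hour of day: (number of trips, average wait in minutes)}."""
    waits = defaultdict(list)
    for requested_at, wait_minutes in trips:
        hour = datetime.strptime(requested_at, "%Y-%m-%d %H:%M").hour
        waits[hour].append(wait_minutes)
    return {hour: (len(w), sum(w) / len(w)) for hour, w in waits.items()}

def peak_hours(trips, max_wait=10):
    """Hours whose average wait exceeds max_wait (an assumed service target)."""
    stats = wait_by_hour(trips)
    return sorted(hour for hour, (_, avg) in stats.items() if avg > max_wait)

print(peak_hours(trips))  # [8, 18]
```

With this sample data, the morning and evening rush hours stand out as the times when more drivers are needed.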

These are just a few ways to solve this business problem using data analytics approaches. If your brainstormed approach was something similar, you are already applying the knowledge you gained in previous chapters and thriving in your data analytics journey.

Advanced Analytics Solutions Approach

Now let’s look for solutions through the lens of innovation and advanced data analytics, which was what Uber did. Instead of fixing prices based on distance, Uber used surge pricing to calculate the cost of a trip. The surge price algorithm uses both predictable and unpredictable parameters, like current supply and demand (the number of drivers and riders), peak hours, events or festivals, and accidents in the rider’s location, to calculate the ride price. Without algorithms and AA capabilities, Uber would not be able to offer surge pricing.
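To make the idea concrete, here is a deliberately simplified Python sketch of demand-based pricing. It is not Uber's algorithm, which is far more complex and considers many more factors; the formula, the cap of 3.0, and the function names are assumptions chosen only to show how a price can respond to the current ratio of riders to drivers.

```python
def surge_multiplier(riders_waiting, drivers_available, cap=3.0):
    """Scale the price with the demand/supply ratio.

    The multiplier never drops below 1.0 (the normal fare) and never
    exceeds `cap`, an assumed upper limit.
    """
    if drivers_available <= 0:
        return cap  # no drivers at all: charge the maximum allowed
    ratio = riders_waiting / drivers_available
    return round(min(max(ratio, 1.0), cap), 2)

def trip_price(base_fare, riders_waiting, drivers_available):
    """Fixed distance-based fare multiplied by the current surge multiplier."""
    return round(base_fare * surge_multiplier(riders_waiting, drivers_available), 2)

print(trip_price(20.0, riders_waiting=10, drivers_available=20))   # 20.0 (no surge)
print(trip_price(20.0, riders_waiting=30, drivers_available=20))   # 30.0
print(trip_price(20.0, riders_waiting=100, drivers_available=10))  # 60.0 (capped)
```

Even in this toy version, the price rises only when riders outnumber drivers, which is what distinguishes it from a fixed peak-hour surcharge.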

We can examine if AA solutioning is better for this problem by answering the following questions:

Why is surge pricing better in this example?

Surge pricing avoids the issues that come with pushing drivers to work night shifts or longer hours; instead, incentives lead drivers to choose to be available. Surge pricing also controls demand by changing the price to account for current conditions, like a snowstorm, instead of the time of the day. (People who badly want to get out of the storm will be willing to pay the premium price.) And with this approach, drivers accept more rider requests because of the monetary benefits of doing so, increasing the supply of drivers in crowded areas.

How did Uber test if surge pricing would be accepted by customers?

The implementation of surge pricing involves a complex algorithm that takes into consideration many factors, including behavioral aspects. But one creative way Uber tested surge pricing was to track phone battery levels—customers were willing to accept higher prices when their phone was low on battery. They also found that customers were more willing to accept surge pricing if the rate was not rounded off to a whole number.92

Alternative Approach

There isn’t just one way to solve a business problem with AA, and AA solutions are not flawless. It is important to constantly monitor and iteratively improve the algorithm to better handle various challenges. Take a glance at the above surge pricing method and come up with one potential flaw or improvement to this algorithm. Although surge pricing offers benefits, it was slammed during the terrorist attacks in Sydney and London—the criticism resulted in ride services setting an upper limit on pricing during emergencies. For example, during states of emergency in the United States, Uber caps its fares at a price that matches the fourth-highest price in a particular area over the preceding two months.93
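The emergency cap described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Uber's implementation: the price history is invented, and the sketch treats repeated prices as separate entries when finding the fourth-highest value.

```python
def emergency_cap(recent_prices):
    """Return the fourth-highest price in the look-back window.

    `recent_prices` is assumed to hold the prices charged in one area
    over the preceding two months.
    """
    if len(recent_prices) < 4:
        raise ValueError("need at least four prices to find the fourth-highest")
    return sorted(recent_prices, reverse=True)[3]

def capped_price(surge_price, recent_prices, emergency):
    """Apply the cap only while a state of emergency is in effect."""
    if not emergency:
        return surge_price
    return min(surge_price, emergency_cap(recent_prices))

history = [18.0, 42.0, 25.0, 31.0, 55.0, 22.0]  # hypothetical prices
print(capped_price(60.0, history, emergency=True))   # 25.0
print(capped_price(60.0, history, emergency=False))  # 60.0
```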

Now that you have introduced a cap on surge prices for emergencies, think of one additional way you can alleviate the potential negative impact of surge pricing on customers or on your public image. One way is to introduce loyalty rewards and establish different levels of user accounts based on the number of rides the user has taken in the past three months. Reward points can be used for future rides, priority pickups, ride cancellations, customized favorite rides, and so on.94
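As a small illustration, account levels based on recent ride counts could be assigned as in the Python sketch below. The tier names and thresholds are hypothetical; they are not taken from any real loyalty program.

```python
# Hypothetical tiers: (minimum rides in the past three months, tier name),
# listed from the highest threshold to the lowest.
TIERS = [(30, "gold"), (10, "silver"), (0, "basic")]

def loyalty_tier(rides_last_three_months):
    """Return the highest tier whose minimum ride count the user has reached."""
    for minimum, name in TIERS:
        if rides_last_three_months >= minimum:
            return name
    raise ValueError("ride count cannot be negative")

print(loyalty_tier(35))  # gold
print(loyalty_tier(12))  # silver
print(loyalty_tier(0))   # basic
```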

The purpose of this exercise is to better understand the basics of AA, how to iteratively develop an AA solution, and how to use predictive and behavioral aspects to fine-tune your approach. There are many ways to handle an AA problem, but the example mentioned above shows a logical way of thinking through various steps, even if you are technology, coding, or algorithm agnostic.

AI and ML

Artificial Intelligence, Machine Learning, Natural Language Processing, Predictive Analytics. Have you heard of these terms? Do you understand what they mean, and can you tell them apart? Artificial intelligence (AI) and machine learning (ML) are part of our daily lives in some shape or form, whether or not we are aware of it.

Artificial intelligence and machine learning are two commonly referenced advanced analytics concepts, but they are often used interchangeably, which creates confusion as to their meaning. Artificial intelligence refers to intelligence demonstrated by machines, as opposed to the natural intelligence of humans. In other words, AI is the ability of a digital computer or computer-controlled robot to perform tasks commonly associated with intelligent beings.95 Artificial intelligence is a broad term that can be divided into several subfields. These subfields can coexist and benefit from one another.

Machine learning (ML) is one of these subfields of artificial intelligence. The term machine learning was first used by Arthur Samuel in 1959.96 He described machine learning as a “field of study that gives computers the ability to learn without being explicitly programmed.” A self-driving car is an example of machine learning, in which the machine (the car) collects data and makes driving decisions to provide a safe experience for passengers. ML uses a collection of complex algorithms to identify complex or hidden patterns in data, and the result is an ML model (for example, a supervised or unsupervised model). Although describing ML models is beyond the scope of this book, we can think of models as the outputs of ML, produced by algorithms that search for patterns in a large volume of data and make decisions based on them.

ML models can also be used to predict future trends and aid decision making under uncertainty, which is a branch by itself called “predictive analytics” (PA), although not every use of ML is predictive. PA overlaps with ML and extends beyond it by employing statistical methods and data mining. In simple terms, predictive analytics is predicting the future using past and current data. Natural language processing (NLP) is a subfield of AI concerned with teaching machines human language. If you have used Alexa or Siri, you have seen how you can talk to machines and get answers, which is made possible by NLP. ML can be used with NLP to improve these devices with experience. If you feel your Alexa or Siri is understanding you better over time, you are seeing ML and NLP at work together.
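The core idea of predictive analytics, predicting the future from past data, can be illustrated with one of the simplest statistical methods: fitting a straight trend line to past observations and extending it forward. The Python sketch below uses invented weekly ride counts and a hand-written least-squares fit; real predictive models are usually far richer, so treat this only as an illustration of the principle.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical history: rides completed in each of the past five weeks.
weeks = [1, 2, 3, 4, 5]
rides = [100, 110, 120, 130, 140]

slope, intercept = fit_line(weeks, rides)
forecast = slope * 6 + intercept  # extend the trend to week 6
print(forecast)  # 150.0
```

The forecast is only as good as the assumption that the past trend continues, which is why predictions should be tested and validated before they are relied on.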

Although there is a growing interest in AI and ML, there are also concerns about bias in AI. Human bias is real, and these biases do get propagated to ML models if not carefully handled. Biases come in all shapes and forms: unconscious bias, racial and gender bias, and insufficient population representation as a result of incomplete data. We sometimes hear about bias in technology, for example AI recruitment models that disadvantage women or lending models that deny loans to people of color.97 Developers of AI solutions must make a conscious effort to eliminate bias. There are tools like the What-If Tool (WIT) to help with bias detection.98 It is a simple and powerful visual tool that lets you test trained ML models with minimal coding.

Another way to improve confidence in AI is to leverage a central organization like a data trust to govern the quality of data. A data trust provides independent, fiduciary stewardship of data.99 The data trust is a relatively new concept that is still evolving, but in simple terms, it provides guidance related to legal, governance, ethical, and technical standards for data sharing. A data trust can consist of a group of clients or organizations that want to share data with a central, independent trustee who acts as a steward for the shared data. A data trust can also aid data monetization because it increases trust in data.

While AI and ML offer several benefits to society, these technologies also open doors to a range of digital threats. The digital world is creating an information explosion, and it is becoming difficult to distinguish between reliable and unreliable sources of information. One malicious use of AI, for example, is the DeepFake, which has received widespread media attention due to its considerable potential to be used for a range of malicious and criminal purposes.100 DeepFakes use deep learning, a branch of AI, to manipulate or generate visual and audio content that is difficult for humans, or even technological solutions, to immediately distinguish from authentic content. Recently we have seen inauthentic videos of political leaders, created with DeepFake techniques, published.

In this age of misinformation wars, we can no longer rely on everything we see or hear. So building public awareness and providing tools to test the credibility of a source is a necessary exercise, a step in the data transformation process that is too often neglected. Cybersecurity teams, businesses using AI, and law enforcement have the daunting task of catching up with, and staying one step ahead of, technology advancements.

Data as a Product, Data as a Service, and Data Monetization

This whole book is about building data products. But is a data product the same as data as a product? Data products are solutions built using data being collected. In our coffee store example, we were collecting inventory data about coffee and building a dashboard with various KPIs to support our decision making about coffee store inventory. This is an example of a data product.

Data as a product, on the other hand, is data itself acting as a product. In simple terms, it’s the application of product thinking to raw data, thereby using data as a standalone product. Data as a product should be discoverable and readable. For example, if you created a high-quality dataset containing information about all the vendors buying coffee beans with available discounts, it could serve as standalone data as a product. Data as a Service (DaaS) is a similar concept in which two or more organizations buy, sell, or trade machine-readable data. Companies like SafeGraph provide clean, high-quality data (DaaS) about places (global points of interest) and patterns (traffic, or how often people visit global points of interest) to understand consumer behavior.101

Data monetization refers to tying data to revenue within an organization, and it also refers to creating new value streams from data as a source of revenue. A constant struggle in many organizations is that related data analytics initiatives do not recognize or support the business strategy, which keeps strategy in silos. Breaking down such barriers and connecting business strategy with data initiatives can demonstrate value realization and increase data monetization.

Analytics Tools for Noncoders

In recent years, there has been a lot of focus on low-code, no-code, and AutoML data analytics tools. The common theme among these tools is an invitation for everyone to use data, which increases impact and leads to greater overall data usage. As their names suggest, these tools require little-to-no code and employ visual interfaces and drag-and-drop options. AutoML focuses on automating the most time-consuming, iterative tasks of ML model development, thereby enabling faster deployment of ML models in production.102 No-code and low-code tools provide intuitive user interfaces and drag-and-drop capabilities that encourage a wide range of audiences to develop “Data as a Hobby.” Bringing more people from diverse backgrounds and cultures means integrating different points of view and new ways of thinking to solve a business problem. Data as a Hobby is my way of looking at data empowerment. It encourages people to play with data—a sharp contrast to being told by an organization to use data. When users, regardless of technical background, are provided with tools to experiment, it improves overall positive outcomes for the organization.

Although it’s handy to have a basic knowledge of some programming languages like Python, SQL, or R when analyzing data, it is no longer a strict entry point into analytics now that there are several no-code and low-code options available. These tools help users without technical (programming or data science) backgrounds join the data party of exploring data. They are simple to use but powerful enough to launch the user’s data analytics journey and contribute to the enterprise solutioning of better data products.

The number of low-code, no-code, and AutoML tools is constantly growing, but here are a few:

KNIME

Dataiku

DataRobot

Trifacta

Google Data Studio

BigML

Azure AutoML

Domo

QlikView

And the list goes on. It is tough to pick a single go-to tool, since most tools in this list offer comparable features and ease of use; it’s up to users and companies to explore various tools and find the one best suited to their particular use cases. In addition, visualization tools like Tableau and Power BI are starting to support plug-in extensions and Python libraries (such as PyCaret) from within their interfaces, which makes it much easier for users to leverage the benefits of AA. Organizations should use these tools to supplement traditional data tools or fill in gaps. I do not recommend replacing traditional or mainstream methods wholesale with the easy-to-use tools listed above, which tend to struggle when task complexity increases significantly. When combined, low-code, no-code, AutoML, and traditional techniques can ensure faster go-to-market (GTM) solutions and increase the adoption of tools built using a collaborative approach.

Although these tools make implementing ML easier than ever, never overlook the underlying complexity of certain business problems. Algorithms may not be the solution for every business need. The quality of your predictions always depends on the quality of your data, regardless of the tool used. When solving problems that involve uncertainty and high complexity, it is essential to test and validate your results before relying on algorithms to predict the outcome. In 2017, Zillow, an American online real-estate marketplace company, announced a contest challenging entrants to improve its home value prediction algorithm. It offered $1.2 million in prize money.103 More recently, Zillow became an example of failure caused by relying too heavily on algorithms.104 It underestimated the market risk and overestimated the accuracy of its predictive model. Both the 2007–2008 American financial crisis and the 2020–2022 global pandemic are examples of how markets can change quickly. Companies using ML need to take countermeasures to survive such events. The lesson here is not that ML doesn’t work, but that it is a powerful tool when combined with a human in the loop and effective business processes and procedures. Relying solely on AA to predict and handle business problems independently is risky and can lead to heavy financial losses.

Conclusion

Advancements in technology are making data cheaper and more abundant each year. With widely available data, AA is a burgeoning opportunity to help businesses stay ahead and gain a competitive advantage. The availability of easy-to-use analytics tools is broadening the developer and data explorer communities to include users who previously would have been waiting for solutions to be created by technical teams. In other words, these AA tools encourage data democratization across an organization and empower data users. It is a promising development, but it is equally important to consider ethics, eliminate bias, and provide transparency in the use of AA solutions by constantly evaluating their performance and changing them for the better.
