Chapter 11

Visualization of Analytical Results

IN THIS CHAPTER

Applying visualization to the predictive analytics lifecycle

Evaluating data visualization

Using visualization on different predictive analytics models

Introducing a novel predictive analytics visualization

Highlighting big data visualization tools

Visualization is an art. In predictive analytics, it’s the art of being able to analyze and tell a story from your data and analytical results. The story may not only be about the present or the past, but also about the future.

Quick, easy-to-generate visualizations can enhance the decision-making process, making it faster and more effective. Data visualization can also provide executives with a basis for asking better and smarter questions about the organization.

This chapter zooms in on the importance, benefits, and complexities of data visualization. You become familiar with four criteria you can use to evaluate a visualization of analytical results. You’re also introduced to the different types of visualizations that you can deploy for different types of prediction models.

remember This chapter focuses on the specific use of data visualization: making sense of the analytical results and using the visualization as part of your reporting back to the business stakeholders. See Chapter 4 for other data visualization techniques that can help you get a closer look at your data and understand it better.

Visualization as a Predictive Tool

Napoleon Bonaparte said, “A good sketch is better than a long speech.” The reason for this truism is that the human brain finds pictures easier to digest than text or numbers. Since the early days, mankind has been relying on pictorial representations to communicate and share information. Maps were one of the very first widespread visualizations, becoming indispensable enough to originate the field of cartography. Maps have played a far-reaching role in sharing ideas and distributing them widely to many generations, reinforcing the human tendency to communicate information visually.

In predictive analytics, data visualization presents analytical results as a picture that can be easily used to build realistic, actionable narratives of possible futures; such narratives can be archived and transmitted throughout your organization, helping to form the basis of its approach to its business.

So, how does visualization figure into the lifecycle of predictive analytics? Read on.

Why visualization matters

Reading rows of spreadsheets, scanning pages and pages of reports, and going through stacks of analytical results generated by predictive models can be painstaking, time-consuming, and (let’s face it) boring. Looking at a few graphs representing that same data is faster and easier, while imparting the same meaning. The graphs can bring more understanding more quickly, and drive the point home efficiently. Graphs can tell you more than tables. For example, summary statistics like mean and median don’t allow you to spot a bimodal distribution without an accompanying chart. A lift chart is often better than a gains table, and certainly better than a simple report of overall accuracy. Such advantages are behind the increased demand for data visualization. Companies are hungry for visualization tools that can help them understand the key drivers of their businesses.
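To see why summary statistics can hide a bimodal distribution, consider the following minimal sketch. The order values are made-up sample data (an assumption for illustration, not data from this book): half cluster near 20 and half near 80. The mean and median both land near 50, a value that almost no order actually has, while even a crude text histogram shows the two humps right away.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is repeatable

# Hypothetical data: two groups of order values, one near 20 and one near 80.
low_group = [random.gauss(20, 3) for _ in range(500)]
high_group = [random.gauss(80, 3) for _ in range(500)]
orders = low_group + high_group

mean_value = statistics.mean(orders)
median_value = statistics.median(orders)
print(f"mean={mean_value:.1f}  median={median_value:.1f}")


def bin_counts(values, bin_width=10, lo=0, hi=100):
    """Count how many values fall into each bin of the given width."""
    n_bins = (hi - lo) // bin_width
    counts = [0] * n_bins
    for v in values:
        index = int((v - lo) // bin_width)
        if 0 <= index < n_bins:
            counts[index] += 1
    return counts


counts = bin_counts(orders)

# Draw one row per bin; each '#' stands for 10 orders.
for i, count in enumerate(counts):
    print(f"{i * 10:3d}-{i * 10 + 9:3d} | {'#' * (count // 10)}")
```

The two numbers on the first line of output suggest a "typical" order of about 50; the chart underneath shows that the middle bins are empty.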

Arming your data analysts with visualization tools changes the way they analyze data: They can derive more insights and respond to risks more quickly. And they will be empowered to utilize imagination and creativity in their digging and mining for deeper insights. Additionally, through visualization tools, your analysts can present their findings to executives in a way that provides easy, user-friendly access to analytical results.

For example, if you’re dealing with content analytics and have to analyze text, emails, and presentations (for openers), you can use visualization tools to convert the content and ideas mentioned in raw content (usually as text) into a clear pictorial representation.

One such visualization is the graph shown in Figure 11-5; it represents the correlation between concepts mentioned in text sources. Think of it as a labor-saving device: Now someone doesn’t have to read thousands of pages, analyze them, extract the most relevant concepts, and derive a relationship among the items of data.

Analytics tools provide such visualizations as output, which goes beyond traditional visualizations by helping you with a sequence of tasks:

  1. Do the reading efficiently.
  2. Understand lengthy texts.
  3. Extract the most important concepts.
  4. Derive a clear visualization of the relationship between those concepts.
  5. Present the concepts in ways that your stakeholders find meaningful.

This process is known as interactive data visualization. It’s different from a simple visualization because

  • You can analyze and drill down into the data represented by the graphs and charts for more details and insights.
  • You can dynamically change the data used in those charts and graphs.
  • You can select the different predictive models or preprocessing techniques to apply to the data that generated the graph.

These visualization tools save the data analyst a tremendous amount of time when generating reports, graphs, and (most importantly) effective communication about the results of predictive analysis.

That effective communication includes getting people together in a room, presenting the visualizations, and leading discussions that emerge from questions such as these:

  • “What does that point in the graph mean?”
  • “Does everyone see what I see?”
  • “What would happen if we added or removed certain data elements or variables?”
  • “What would happen if we changed this or that variable?”

Such discussions could unveil aspects of the data that weren’t evident before, remove ambiguity, and answer some new questions about data patterns.

Getting the benefits of visualization

Using visualizations to present the results of your predictive analytics model can save you a lot of time when you’re conveying your ideas to management. Visualization can make the business case for you, providing an instant understanding of complex analytical results.

Another benefit of using charts and graphs is to ease the process of decision-making. For example, you can use visualizations to identify areas in your business that need attention, as when you show maps that present comparative sales of your product by location and can more easily identify areas that might need more advertising. Doing several such analyses and presentations over time can create a narrative of predicting sales volume by location.

Similarly, in political campaigns maps are powerful communication tools that can be used to convey visually the up-to-date status of votes and eventually help predict the chances of winning. They can also aid with rethinking the campaign strategy.

Walking into a meeting with eye-catching graphics in addition to spreadsheets of numbers can make your meeting more effective because visualizations are easy to explain to a diverse audience. Meetings can then become opportunities for discussion, focused imagination, and ingenuity, leading to the discovery of new insights.

Visualization can be used to confirm or disprove assumptions made about a specific topic or phenomenon in your data. It can also validate your predictive model by helping you determine whether the output of the model is in line with the business requirements, and the data supports the claims made for the model.

In summary, visualization

  • Is easy to understand
  • Is visually appealing
  • Simplifies the complexities of the analysis
  • Is an efficient medium for communicating results
  • Makes the business case
  • Validates the output of your model
  • Enables the decision-making process

Dealing with complexities

Let's face it: Visualization may help simplify communication, but making effective use of visualization isn’t exactly simple. Using data visualization to draft the storylines of scenarios that portray the future of your organization can be both powerful and complex.

The complexities of using visualization in predictive analytics can crop up in various areas:

  • Visualization requires a wide range of multi-disciplinary skills in (for example) statistics, analysis, graphic design, computer programming, and narrative.
  • A large body of data that comes from a variety of sources can be unruly to handle. Finding innovative ways to plot all that data — and represent it to the decision-makers in ways they find meaningful — can be challenging.
  • Visualizing analytical results can accidentally convey misleading patterns or predictions. Different interpretations and various possible insights might come from the same visualization.

    tip To head off this difficulty, you might want to have different analysts discuss these possibilities and their meanings beforehand, in depth; get them to agree on a single, consistent story derived from the visualization before you present it to management.

Evaluating Your Visualization

There are several ways to visualize data, but what defines a good visualization? The short answer: Whatever gets the meaning across is your best choice. To help you find that best choice, this section lists four criteria you can use to judge your visualization. This isn't a comprehensive list, but it should point you toward the best visualization to drive your idea home.

How relevant is this picture?

Your data visualization must have a clear, well-defined purpose — have a goal in mind and convey a clear idea of how to get there. That purpose could be the answering of the business need that brought you to apply predictive analytics in the first place. A subsidiary, immediately practical purpose could be your need to convey complex ideas through visualization. To answer both needs, first keep in mind that the data presented in the visualization has to be relevant to the overall theme of your analytics project. (That relevance won’t be far to seek; your analytical project started with selecting the relevant data to feed into your predictive model.)

With the theme in mind, the next step is to create a narrative that presents the relevant data, highlights the results that point toward the goal, and uses a relevant visualization medium. (If your company has a room that’s ideal for, say, PowerPoint presentations, consider that a big hint.)

How interpretable is the picture?

If you apply analytics to your data, build a predictive model, and then display your analytical results visually, you should be able to derive well-defined interpretations from your visualizations. Deriving those meaningful interpretations leads, in turn, to deriving insights, and that’s the linchpin for the whole predictive analytics process.

The story you tell via your visualization medium must be clear and unambiguous. A roomful of conflicting interpretations is usually a sign that something is amiss. To keep the interpretation of the visualization on track, be sure you keep it firmly in line with the model’s output — which in turn aligns the whole effort with the business questions that prompted the predictive analytics quest.

In cases where a visualization might allow several interpretations, those interpretations should converge to tell the same story in the end. As with many undertakings, multiple interpretations are often possible. Try to anticipate, discuss, and tweak them beforehand until they all convey the same underlying idea or support the same overarching concept.

Is the picture simple enough?

A visualization that’s too complex or too simple can be misleading or confusing. To achieve effectiveness, your visualization needs clarity and elegance. For example, to make it very easy to read, you might be tempted to use two simple charts instead of one more complex one; if that isn't done well, it makes the relationships harder to see. Sometimes a little staring at a well-crafted graphic for several minutes is worth the effort. The trick is not to distract with needless detail. Everything should work together to communicate the pattern.

You should always aim for clarity by adding as many legends (guides to what the parts of the image mean) as needed, and making them as clear as possible. You can use legends to define all the symbols, figures, axes, colors, data ranges, and other graphical components you have in your visualization.

Choosing the right combination of colors and objects to represent your data can enhance elegance. The medium you choose to present your data is also critical. The medium refers to the images, graphs, and charts in your presentations, in addition to the conference room and the visual aids you use to present your analytical results, such as a TV screen, whiteboard, or projector.

tip As a rule, the simpler the visualization and the more straightforward its meaning is, the better it is. You know you’ve succeeded when the visualization does the talking for you.

Does the picture lead to new actionable insights?

Your visualization should add something new to your predictive analytics project. Ideally, it should help you find new insights that weren't known before. During the building of your predictive analytics model, you can use visualization to fine-tune the output of your model, examine the data, and plot the result of the analysis. Visualization can be your guide to discovering new insights, or discerning and learning new relationships among items of data in the sea of data you’re analyzing.

Visualization should help you seal the deal and erase any doubts about the analysis; it should support the findings and the output of the model. If it does so effectively, then presenting these findings to management will help them embrace and act upon the results.

Visualizing Your Model’s Analytical Results

This section presents some ways to use visualization techniques to report the results of your models to the stakeholders.

Visualizing hidden groupings in your data

As discussed in Chapter 6, data clustering is the process of discovering hidden groups of related items within your data. In most cases, a cluster (grouping) consists of data objects of the same type such as social network users, text documents, or emails. One way to visualize the results of a data-clustering model is shown in Figure 11-1, where the graph represents social communities (clusters) that were discovered in data collected from social network users. In Figure 11-1, the data about customers was collected in a tabular format; then a clustering algorithm was applied to the data, and the three clusters (groups) were discovered: loyal customers, wandering customers, and discount customers. Assume that the X and Y axes represent the two principal components generated from the original data. Principal component analysis (PCA) is a data reduction technique. For more information about PCA, see Chapter 9.

image

FIGURE 11-1: Clustering customers in three groups: loyal, wandering, and discount.

Here the visual relationship among the three groups already suggests where enhanced and targeted marketing efforts might do the most good.
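If you'd like to see how groupings like these can be found, here is a minimal sketch of one common clustering algorithm, k-means, written in plain Python. Everything in it is an assumption for illustration: the twelve points stand in for customers already reduced to two principal-component scores, and the farthest-point starting rule is just one simple way to pick initial centroids. The algorithm behind Figure 11-1 may well differ.

```python
import math


def farthest_point_init(points, k):
    """Pick k starting centroids: the first point, then repeatedly the
    point that is farthest from every centroid chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        next_point = max(
            points, key=lambda p: min(math.dist(p, c) for c in centroids)
        )
        centroids.append(next_point)
    return centroids


def kmeans(points, k, iterations=10):
    """Return (centroids, labels) after a fixed number of k-means passes."""
    centroids = farthest_point_init(points, k)
    labels = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    return centroids, labels


# Hypothetical customers, expressed as (component 1, component 2) scores.
customers = [
    (8.0, 8.2), (7.5, 8.8), (8.4, 7.6), (9.0, 8.5),  # one tight group
    (1.0, 1.2), (1.5, 0.8), (0.8, 1.9), (1.3, 1.4),  # a second group
    (8.2, 1.0), (7.8, 1.6), (8.9, 0.7), (8.4, 1.3),  # a third group
]

centroids, labels = kmeans(customers, k=3)
for cluster_id, center in enumerate(centroids):
    size = labels.count(cluster_id)
    print(f"cluster {cluster_id}: center=({center[0]:.2f}, {center[1]:.2f}), size={size}")
```

The algorithm only hands back numbered clusters. Names such as loyal, wandering, and discount come later, when an analyst inspects what the members of each cluster have in common.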

Visualizing data classification results

A classification model assigns a specific class to each new data point it examines. The specific classes, in this case, could be the groups that result from your clustering work (see the preceding section). The output highlighted in the graph (refer to Figure 11-1) can define your target sets. For any given new customer, a predictive classification model attempts to predict which group the new customer will belong to.

After you’ve applied a clustering algorithm and discovered groupings in the customer data, you come to a moment of truth: Here comes a new customer — you want the model to predict which type of customer he or she will be.

Figure 11-2 shows how a new customer’s information is fed to your predictive analytics model, which in turn predicts which group of customers this new customer belongs to. In Figure 11-2, new Customers A, B, and C are about to be assigned to clusters according to the classification model. Applying the classification model resulted in a prediction that Customer A would belong with the loyal customers, Customer B would be a wanderer, and Customer C was only showing up for the discount.

image

FIGURE 11-2: Assigning Customers A, B, and C to their classifications (clusters).
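One simple way to turn discovered clusters into a classifier is to assign each new customer to the nearest cluster center. The sketch below uses that nearest-centroid rule, which is a stand-in chosen for brevity and not necessarily the classification model behind Figure 11-2. The centroid coordinates and the coordinates for Customers A, B, and C are hypothetical.

```python
import math

# Hypothetical cluster centers in the two-component space of Figure 11-1.
CENTROIDS = {
    "loyal": (8.2, 8.3),
    "wandering": (1.2, 1.3),
    "discount": (8.3, 1.2),
}


def classify(point, centroids=CENTROIDS):
    """Return the name of the cluster whose center is closest to the point."""
    return min(centroids, key=lambda name: math.dist(point, centroids[name]))


# Hypothetical new customers.
new_customers = {"A": (7.9, 7.5), "B": (2.0, 1.0), "C": (7.5, 2.0)}

predictions = {name: classify(point) for name, point in new_customers.items()}
for name, group in predictions.items():
    print(f"Customer {name} -> {group}")
```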

Visualizing outliers in your data

In the course of clustering or classifying new customers, every now and then you run into outliers (special cases that don’t fit the existing divisions).

Figure 11-3 shows a few outliers that don’t fit well into the predefined clusters. In Figure 11-3, six outlier customers have been detected and visualized. They behave differently enough that the model can’t tell whether they belong to any of the defined categories of customers.

image

FIGURE 11-3: Six outlier customers defy categorization just by showing up.
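A simple way to flag candidates for the outlier treatment is to measure how far each customer sits from the nearest cluster center and flag anyone beyond a cutoff. The following sketch does just that. The centers, the customers, and the cutoff of 2.5 are all hypothetical values picked for illustration; in practice you would tune the cutoff to your own data.

```python
import math

# Hypothetical cluster centers in a two-component space.
CENTROIDS = {
    "loyal": (8.2, 8.3),
    "wandering": (1.2, 1.3),
    "discount": (8.3, 1.2),
}


def find_outliers(customers, centroids=CENTROIDS, max_distance=2.5):
    """Return {customer: distance} for customers farther than
    max_distance from every cluster center."""
    outliers = {}
    for name, point in customers.items():
        nearest = min(math.dist(point, c) for c in centroids.values())
        if nearest > max_distance:
            outliers[name] = round(nearest, 2)
    return outliers


# Hypothetical customers: P and R sit inside clusters, Q and S do not.
customers = {"P": (8.0, 7.9), "Q": (4.5, 4.5), "R": (1.0, 1.5), "S": (4.8, 9.5)}

outliers = find_outliers(customers)
print(outliers)
```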

Visualizing decision trees

Many models use decision trees as their outputs: These diagrams show the possible results from alternative courses of action, laid out like the branches of a tree.

Figure 11-4 shows an example of a tree used as a classifier: It classifies baseball fans based on a few criteria, mainly the amount spent on tickets and the purchase dates. From this visualization, you can predict the type of fan that a new ticket-buyer will be: casual, loyal, bandwagon, diehard, or some other type. Attributes of each fan are mentioned at each level in the tree (total number of attended games, total amount spent, season); you can follow a path from the “root” to a specific “leaf” on the tree, where you hit one of the fan classes (c1, c2, c3, c4, c5). For more information on how to algorithmically generate a decision tree, see Chapter 7.

image

FIGURE 11-4: Finding the class in which a particular baseball fan belongs.
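Following a path from the root to a leaf is easy to express in code. The sketch below stores a small tree as nested dictionaries and walks it for one fan, recording each decision along the way. The attributes, thresholds, and the four classes used here are hypothetical; they are not the actual splits shown in Figure 11-4.

```python
# A hand-built tree: inner nodes are dicts, leaves are class names.
# All attributes and thresholds are made up for illustration.
FAN_TREE = {
    "attribute": "games_attended",
    "threshold": 10,
    "below": {
        "attribute": "playoff_purchases",
        "threshold": 1,
        "below": "casual",
        "at_or_above": "bandwagon",
    },
    "at_or_above": {
        "attribute": "total_spent",
        "threshold": 1500,
        "below": "loyal",
        "at_or_above": "diehard",
    },
}


def classify_fan(fan, tree=FAN_TREE):
    """Walk from the root to a leaf; return (fan class, decisions taken)."""
    node = tree
    path = []
    while isinstance(node, dict):
        attribute, threshold = node["attribute"], node["threshold"]
        if fan[attribute] >= threshold:
            path.append(f"{attribute} >= {threshold}")
            node = node["at_or_above"]
        else:
            path.append(f"{attribute} < {threshold}")
            node = node["below"]
    return node, path


new_buyer = {"games_attended": 3, "playoff_purchases": 2, "total_spent": 400}
fan_class, path = classify_fan(new_buyer)
print(fan_class, "via", " -> ".join(path))
```

Printing the path alongside the class is a handy habit: it gives you the same explanation in words that the tree diagram gives you in pictures.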

Suppose we want to determine the type of baseball fan a customer is so that we can determine what type of marketing ads to send to the customer. We want to know whether the customer is a baseball fanatic or someone who just rides the bandwagon. Suppose we hypothesize that baseball fanatics and bandwagon fans can be persuaded to buy a new car (or other discretionary goods) when their team is doing well and headed for the playoffs. We may want to send marketing ads and discounts to persuade them to make the purchase. Further, suppose we hypothesize that bandwagon fans can be persuaded to vote in support of certain political issues. We can send them marketing ads asking them for that support. If you know what type of fan base you have, using decision trees can help you decide how to approach it as a range of customer types.

Visualizing predictions

Assume you’ve run an array of predictive analytics models, including decision trees, random forests, and flocking algorithms. You can combine all those results and present a consistent narrative that they all support, as shown in Figure 11-5. Here confidence is a numerical percentage that can be calculated using a mathematical function. The result of the calculation encapsulates a score of how probable a possible occurrence is. On the x axis, the supporting evidence represents the content source that was analyzed with content-analytics models that identified the possible outcomes. In most cases, your predictive model would have processed a large dataset, using data from various sources, to derive those possible outcomes. Thus you need to show only the most important supporting evidence in your visualization, as depicted in Figure 11-5.

image

FIGURE 11-5: Showing only the most important supporting evidence in the visualization.

In Figure 11-5, a summary of the results obtained from applying predictive analytics is presented as a visualization that illustrates possible outcomes, along with a confidence score and supporting evidence for each one. Three possible scenarios are shown:

  • The inventory of Item A will not keep up with demand if you don’t ship at least 100 units weekly to Store S. (Confidence score: 98 percent.)
  • The number of sales will increase by 40 percent if you increase the production of Item A by at least 56 percent. (Confidence score: 83 percent.)
  • A marketing campaign in California will increase sales of Items A and D but not Item K. (Confidence score: 72 percent.)

The confidence score represents the likelihood that each scenario will happen, according to your predictive analytics model. Note that they are listed here in descending order of likelihood.
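A summary like this is straightforward to assemble once each scenario carries its confidence score and its evidence. The sketch below sorts the three scenarios from this section by confidence, keeps only the top piece of evidence for each, and prints a rough text chart. The data structure and the evidence strings are placeholders invented for illustration; real evidence would be excerpts from your own content sources.

```python
# The three scenarios from this section, deliberately listed out of order.
# The "evidence" entries are placeholders, not real excerpts.
scenarios = [
    {
        "outcome": "California campaign lifts sales of Items A and D, not K",
        "confidence": 72,
        "evidence": ["<excerpt from source 5>", "<excerpt from source 6>"],
    },
    {
        "outcome": "Item A inventory falls short without 100 units weekly to Store S",
        "confidence": 98,
        "evidence": ["<excerpt from source 1>", "<excerpt from source 2>"],
    },
    {
        "outcome": "Sales rise 40 percent if Item A production rises 56 percent",
        "confidence": 83,
        "evidence": ["<excerpt from source 3>", "<excerpt from source 4>"],
    },
]


def rank_scenarios(items, top_evidence=1):
    """Sort by confidence (highest first) and trim the evidence lists."""
    ranked = sorted(items, key=lambda s: s["confidence"], reverse=True)
    return [
        {
            "outcome": s["outcome"],
            "confidence": s["confidence"],
            "evidence": s["evidence"][:top_evidence],
        }
        for s in ranked
    ]


ranked = rank_scenarios(scenarios)
for s in ranked:
    bar = "#" * (s["confidence"] // 5)  # one '#' per 5 percentage points
    print(f"{s['confidence']:3d}% {bar:<20} {s['outcome']}")
    print(f"     evidence: {s['evidence'][0]}")
```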

Here the most important supporting evidence consists of excerpts from several content sources, presented along the x axis. You can refer to them if you need to explain how you got to a particular possible scenario, and trot out the evidence that supports it.

The power behind this visualization is its simplicity. Imagine, after months of applying predictive analytics to your data, working your way through several iterations, that you walk into a meeting with the decision maker. You’re armed with one slide visualization of three possible scenarios that might have a huge impact on the business. Such a visualization creates effective discussions and can lead management to “aha” moments.

Novel Visualization in Predictive Analytics

A visualization can also represent a simulation (a pictorial representation of a what-if scenario). You can follow up a visualization of a prediction with a simulation that overlaps and supports the prediction. For example, what happens if the company stops manufacturing Product D? What happens if a natural disaster strikes the home office? What happens if your customers lose interest in a particular product? You can use visualization to simulate the future behavior of a company, a market, a weather system — you name it.

A dashboard is another type of visualization you can use to display a comprehensive predictive analytics model. The dashboard will allow you, using a control button, to change any step in the predictive analytics pipeline. This can include selecting the data, preprocessing the data, selecting a predictive model, and selecting the right evaluation versions. You can easily modify any part of the pipeline at any time using the control button on the dashboard. A dashboard is an interactive type of visualization where you have control and you can change the diagrams, tables, or maps dynamically based on the inputs you choose to include in the analyses that generate those charts and graphs.
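Behind the control buttons, a dashboard of this kind needs a pipeline whose steps can be swapped by name. Here is a minimal sketch of that idea with no user interface at all: two preprocessing options, two toy forecasting models, and one function that reruns the whole pipeline for whichever combination is selected. Every name and number in it is a made-up assumption, and the models are deliberately trivial.

```python
# Selectable preprocessing steps (names are hypothetical).
PREPROCESSORS = {
    "drop_missing": lambda series: [v for v in series if v is not None],
    "drop_missing_and_cap": lambda series: [
        min(v, 105) for v in series if v is not None
    ],
}

# Selectable toy forecasting models.
MODELS = {
    "mean": lambda history: sum(history) / len(history),
    "last_value": lambda history: history[-1],
}


def run_pipeline(series, preprocessor, model, holdout=1):
    """Clean the data, forecast from all but the last `holdout` points,
    and score the forecast against the held-out points."""
    cleaned = PREPROCESSORS[preprocessor](series)
    history, actual = cleaned[:-holdout], cleaned[-holdout:]
    prediction = MODELS[model](history)
    error = sum(abs(a - prediction) for a in actual) / len(actual)
    return {"prediction": prediction, "mean_abs_error": error}


# Hypothetical weekly sales with one missing week.
weekly_sales = [100, 104, None, 98, 102, 106]

# Each "button press" on the dashboard would amount to one call like these.
for model_name in MODELS:
    result = run_pipeline(weekly_sales, "drop_missing", model_name)
    print(model_name, result)
```

Keeping the choices in dictionaries means a dashboard widget only has to pass along two strings; the charts can then be redrawn from whatever `run_pipeline` returns.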

Flock-by-leader algorithm for data visualization

At least one predictive analytics technique is purely inspired by the natural phenomenon of birds flocking (refer to Chapter 2). The bird-flocking model not only identifies groupings in data, it shows them in dynamic action. The same technique can be used to picture hidden patterns in your data.

The model represents data objects as birds flying in a virtual space, following flocking rules that orchestrate how a migrating swarm of birds moves in nature.

Representing several data objects as birds reveals that similar data objects will flock together to form subflocks (groupings). The similarity among objects in the real world is what drives the movements of the corresponding birds in the virtual space. For example, as shown in Figure 11-6, imagine that you want to analyze the online data collected from several Internet users (also known as netizens).

image

FIGURE 11-6: Using bird flocking to analyze the online behavior of Internet users.

Every piece of information (gleaned from such sources as social network user information and customer online transactions) will be represented as a corresponding bird in the virtual space, as shown in Figure 11-7.

image

FIGURE 11-7: Two netizens flocking.

If the model finds that two or more users interact with each other through email or chat, appear in the same online photo, buy the same product, or share the same interests, the model shows those two netizens as birds that flock together, following natural flocking rules.

The interaction (that is, how close the representative birds get to each other) is expressed as a mathematical function that depends on the frequency of social interaction, or the intensity with which the users buy the same products or share the same interests. This mathematical function depends purely on the type of analytics you’re applying.

Figure 11-7 depicts the Facebook interaction between Netizens X and Y in a bird-flocking virtual space, where both X and Y are represented as birds. Because Netizens X and Y have interacted with each other, the next flocking iteration will show their two birds as closer together.
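To make the idea of birds moving closer together concrete, here is a minimal sketch of one flocking iteration for two birds. Both functions are assumptions made for illustration: the interaction score is a simple capped ratio, and the movement rule pulls each bird toward the other in proportion to that score. The actual functions used in a flocking model depend on the analytics you apply.

```python
import math


def interaction_strength(messages, shared_purchases, max_count=50):
    """Hypothetical score between 0 and 1: more messages and more shared
    purchases mean a stronger interaction."""
    return min(1.0, (messages + shared_purchases) / max_count)


def flock_step(bird_x, bird_y, strength, rate=0.5):
    """Move two birds toward each other. The gap between them shrinks
    by the fraction rate * strength."""
    pull = rate * strength / 2  # each bird covers half of the shrinkage
    dx = bird_y[0] - bird_x[0]
    dy = bird_y[1] - bird_x[1]
    new_x = (bird_x[0] + pull * dx, bird_x[1] + pull * dy)
    new_y = (bird_y[0] - pull * dx, bird_y[1] - pull * dy)
    return new_x, new_y


# Hypothetical starting positions of the birds for Netizens X and Y.
bird_x, bird_y = (0.0, 0.0), (10.0, 0.0)
strength = interaction_strength(messages=20, shared_purchases=5)

before = math.dist(bird_x, bird_y)
bird_x, bird_y = flock_step(bird_x, bird_y, strength)
after = math.dist(bird_x, bird_y)
print(f"strength={strength}, distance before={before}, after={after}")
```

With no interaction at all, the strength is 0 and the birds stay put, so only netizens who actually interact drift into the same subflock.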

An algorithm known as “flock by leader,” invented by Prof. Anasse Bari and Prof. Bellaachia (see the following references), was inspired by a recent discovery that revealed the leadership dynamics in pigeons. This algorithm can mine user input for data points that enable it to detect leaders, discover their followers, and initiate flocking behavior in virtual space that closely mimics what happens when flocks form naturally — except the flocks, in this case, are data clusters called data flocks.

This technique not only detects patterns in data, but also provides a clear pictorial representation of the results obtained by applying predictive analytics models. The rules that orchestrate flocking behavior in nature were extended to create new flocking rules that conform to data analytics:

  • Data flock homogeneity: Members of the flock show similarity in data.
  • Data flock leadership: The model anticipates information leaders.

tip Representing a large dataset as a flock of birds is one way to easily visualize big data in a dashboard.

This visualization model can be used to detect pieces of data that are outliers, leaders, or followers. One political application could be to visualize community outliers, community leaders, or community followers. In the biomedical field, the model can be used to visualize outlier genomes and leaders among genetic samples of a particular disease (say, those that show a particular mutation most consistently).

A bird-flocking visualization can also be used to predict future patterns of unknown phenomena in cyberspace — civil unrest, an emerging social movement, a future customer’s lineage.

The flocking visualization is especially useful if you’re receiving a large volume of streamed data at high velocity: You can see the formation of flocking in the virtual space that contains the birds that represent your data objects. The results of data analytics are reflected (literally) on the fly in the virtual space. Reality is given a fictional, yet observable and analytically meaningful, representation inspired purely by nature. Such visualizations can also work well as simulations or what-if scenarios.

In Figure 11-8, a visualization based on flocking behavior starts by indexing each netizen to a virtual bird. Initially, all the birds are idle. As data comes in, each bird starts flocking in the virtual space according to the analytics results and the flocking rules.

image

FIGURE 11-8: Tracking the flocking netizens.

In Figure 11-9, the emerging flock is formed as the analytics are presented.

image

FIGURE 11-9: What the flock is doing.

After analyzing data over a long period of time ending at t+k, the results of this application of predictive analytics can be depicted as shown in Figure 11-10: The flock-by-leader algorithm differentiates the members of the flock into three classes: a leader, followers, and outliers.

image

FIGURE 11-10: Flock-by-leader subdivides the flock.
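The sketch below illustrates only the three-way split into a leader, followers, and outliers. It is emphatically not the published flock-by-leader algorithm: it uses a much cruder stand-in rule (the most active netizen is the leader, anyone who interacts directly with the leader is a follower, everyone else is an outlier), and the names and interaction counts are invented.

```python
from collections import defaultdict


def split_flock(interactions, min_interactions=1):
    """Crude stand-in rule, NOT the published algorithm.

    interactions maps (netizen_a, netizen_b) pairs to interaction counts.
    Returns (leader, followers, outliers).
    """
    totals = defaultdict(int)
    for (a, b), count in interactions.items():
        totals[a] += count
        totals[b] += count

    # The leader is the netizen with the most interactions overall.
    leader = max(totals, key=totals.get)

    # Followers interact directly with the leader often enough.
    followers = set()
    for (a, b), count in interactions.items():
        if count >= min_interactions and leader in (a, b):
            followers.add(b if a == leader else a)

    # Everyone else is treated as an outlier.
    outliers = set(totals) - followers - {leader}
    return leader, followers, outliers


# Hypothetical interaction counts between netizens.
interactions = {
    ("ana", "ben"): 12,
    ("ana", "cy"): 9,
    ("ana", "dee"): 7,
    ("ben", "cy"): 2,
    ("eve", "fay"): 1,
}

leader, followers, outliers = split_flock(interactions)
print("leader:", leader)
print("followers:", sorted(followers))
print("outliers:", sorted(outliers))
```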

technicalstuff The flock-by-leader algorithm was invented by Dr. Bari and Dr. Bellaachia, and it is explained in detail in these resources:

  • “Flock by Leader: A Novel Machine Learning Biologically-Inspired Clustering Algorithm”, IEEE International Conference of Swarm Intelligence, 2012.

    This also appears as a book chapter in Advances in Swarm Intelligence, 2012 Edition – (Springer-Verlag).

  • “SFLOSCAN: A Biologically Inspired Data Mining Framework for Community Identification in Dynamic Social Networks”, IEEE International Conference on Computational Intelligence, 2011 (SSCI 2011), 2011.

Big Data Visualization Tools

Big data has the potential to inspire businesses to make better decisions. It's important to be aware of the tools that can quickly help you create good visualizations. You want to always keep your audience engaged and interested.

This section introduces some popular visualization tools for large-scale enterprise analytics. Most of these tools don't require any coding experience and they are easy to use. If your raw data is in Excel sheets or resides in databases, you can load your data into these tools to visualize it for data exploration and analytics purposes. Alternatively, you may have the results from applying a predictive model on your data ready on spreadsheets, so you can also use these tools to visualize those results. (Examples of visualizations are illustrated in Chapter 4.)

Tableau

Tableau is a visualization tool for enterprise analytics. With Tableau, you can load your data and visualize it in charts, maps, tree maps, histograms, and word clouds. You can run Tableau as a desktop application, a server, or a cloud-based solution.

Tableau integrates with many big-data platforms, such as R, RapidMiner, and Hadoop. Tableau pulls data from major databases and supports many file formats. Tableau for enterprise isn't free. For academic purposes, Tableau can provide free licenses.

For information about Tableau, visit https://public.tableau.com/s and http://www.tableau.com.

Google Charts

Google chart tools are free and easy to use. They include histograms, geo charts, column charts, scatter charts, timeline charts, and organizational charts. Google Charts are interactive and zoomable, and they run on HTML5 and SVG. Google Charts can also visualize real-time data.

For more information about Google Charts, visit https://developers.google.com/chart.

Plotly

Plotly is another visualization tool that your teams of developers can adopt using APIs. You can create charts and dashboards with Plotly.

Plotly is compatible with Python, R, and MATLAB, and its visualizations can be embedded in web-based applications.

For more information about Plotly, visit https://plot.ly.
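Plotly describes a figure as a JSON structure with a list of traces under `data` and the chart settings under `layout`, which is part of what makes its charts easy to hand to a web page. The sketch below builds such a structure with nothing but Python's standard library, using the three confidence scores from Figure 11-5 as sample values. The titles and labels are assumptions for illustration, and the code stops at producing the JSON text; rendering it is left to Plotly itself.

```python
import json

# A bar chart described in Plotly's figure format: traces go in "data",
# chart-wide settings go in "layout". Labels here are illustrative.
figure = {
    "data": [
        {
            "type": "bar",
            "x": ["Scenario 1", "Scenario 2", "Scenario 3"],
            "y": [98, 83, 72],
        }
    ],
    "layout": {
        "title": {"text": "Predicted scenarios by confidence"},
        "yaxis": {"title": {"text": "Confidence (%)"}},
    },
}

# Serialize the figure so it can be stored or sent to a web application.
figure_json = json.dumps(figure, indent=2)
print(figure_json)
```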

Infogram

This tool helps you create visualizations in a three-step process: choosing a template, adding charts to visualize your data, and then sharing your visualizations. A monthly fee is required to use the professional, business, or enterprise version of the tool. The tool can support multiple accounts for your team.

For more information about Infogram, visit https://infogr.am.
