Chapter 19

Ten Steps to Build a Predictive Analytic Model

IN THIS CHAPTER

Building a predictive analytics team

Setting the business objectives

Preparing your data

Sampling your data

Avoiding “garbage in, garbage out”

Creating quick victories

Fostering change in your organization

Building deployable models

Evaluating your model

Updating your model

This chapter discusses best practices in building predictive analytics models. You'll get a handle on the importance of defining the business objectives early on — and on getting the leaders of your business to champion your project.

Building a Predictive Analytics Team

To assemble your predictive analytics team, you'll need to recruit business analysts, data scientists, and information technologists. Regardless of their particular areas of expertise, your team members should be curious, engaged, motivated, and excited to dig as deep as necessary to make the project — and the business — succeed.

Getting business expertise on board

Business analysts serve as your domain experts (see Chapter 15): They provide the business-based perspective on which problems to solve — and give valuable insight on all business-related questions. Their experience and domain knowledge give them an intuitive savvy about which approaches might or might not work, where to start, and what to look at to get something going.

remember A model is only as relevant as the questions you use it to answer. Solid knowledge of your specific business can start you off in the right direction; use your experts' perspectives to determine:

  • Which are the right questions? (Which aspects of your business do you want predictive analytics to improve?)
  • Which is the right data to include in the analysis? (Should your focus be on the efficiency of your business processes? The demographics of your customers? Which body of data stands out as the most critical?)
  • Who are the business stakeholders and how can they benefit from the insights gained from your predictive analytics project?

Hiring analytical team members who understand your line of business will help you focus the building of your predictive analytics solutions on the desired business outcomes.

Firing up IT and math expertise

Data scientists play an important role in linking the worlds of business and data to the technology and algorithms, while following well-established methodologies that are proven to be successful. They have a big say in developing the actual models, and their views will affect the outcome of your whole project. This role requires expertise in statistics, such as knowledge of regression analysis and cluster analysis. (Regression analysis is a statistical method that investigates the relationships between variables.) The role also requires the ability to choose the right technical solutions for the business problem, and the ability to articulate the business value of the outcome to the stakeholders.
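
For instance, here's a minimal sketch of regression analysis in Python, using the scikit-learn library. The advertising-versus-sales numbers are invented purely for illustration:

    # A minimal regression sketch: how does ad spend relate to sales?
    # The numbers are invented purely for illustration.
    import numpy as np
    from sklearn.linear_model import LinearRegression

    ad_spend = np.array([[10.0], [20.0], [30.0], [40.0], [50.0]])  # predictor
    sales = np.array([25.0, 44.0, 68.0, 85.0, 110.0])              # response

    model = LinearRegression().fit(ad_spend, sales)
    print(f"slope: {model.coef_[0]:.2f}, intercept: {model.intercept_:.2f}")
    print(f"predicted sales at spend 60: {model.predict([[60.0]])[0]:.1f}")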

Your data scientists should possess knowledge of advanced algorithms and techniques such as machine learning, data mining, and natural language processing.

Then you need IT experts to apply technical expertise to the implementation, monitoring, maintenance, and administration of the needed IT systems. Their job is to make sure the IT infrastructure and all IT strategic assets are stable, secure, and available to enable the business mission. An example of this is making sure the computer network and database work smoothly together.

When data scientists have selected the appropriate techniques, then (together with IT experts) they can oversee the overall design of the system's architecture, and improve its performance in response to different environments and different volumes of data.

tip In addition to the usual suspects — business experts, math and statistical modelers, and computer scientists — you may want to spice up your team with specialists from other disciplines such as physics, psychology, philosophy, or liberal arts to generate fresh ideas and new perspectives.

Setting the Business Objectives

To give your predictive analytics project its best shot at success, be sure to set out specific business goals right from the start. Is the company adding to its product line? Targeting new customers? Changing its overall business model? Whatever the major focus is, pay particular attention to how your project will make a positive impact on the bottom line. This practical perspective will help you get your stakeholders to champion your project — which in turn generates the confidence you need to go forward.

In the early phase of the project, your analytics team should gather relevant business information by meeting with the stakeholders to understand and record their business needs — and their take on the issues that the project is expected to solve. The stakeholders' domain knowledge and firsthand insights can

  • Help the team evaluate possible solutions
  • Identify attainable, quantifiable business objectives
  • Provide a practical perspective for prioritizing the project's goals

Preparing Your Data

This step in building your predictive analytics project is as crucial as it is unavoidably time-consuming and tedious: data preparation. The actual needed steps vary from one project to the next; they depend on the initial state of your data and the requirements for your project.

remember You'll need to outline a strategy for handling these common data issues (a brief sketch follows the list):

  • Which variables do you want to include in the analysis?
  • How will you check the correctness of certain field values?
  • How do you handle missing values in your data?
  • Will you include or exclude outliers?
  • Will you normalize some fields? Which ones?
  • Will you need to derive new variables from the existing data?
  • Will you need to include third-party data?
  • Does your data comprise enough records and variables?
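
To make a few of those decisions concrete, here's a brief sketch in Python using the pandas library. The column names (age, income, signup_date) and the cleaning rules are hypothetical stand-ins, not prescriptions:

    import pandas as pd

    # Hypothetical raw records; in practice you'd load your own data
    df = pd.DataFrame({
        "age": [34, 212, 45, 29],               # 212 is a data-entry error
        "income": [52000, 48000, None, 61000],  # one missing value
        "signup_date": ["2021-03-01", "2020-07-15", "2022-01-20", "2019-11-05"],
    })

    # Check correctness of field values: drop out-of-range ages
    df = df[df["age"].between(0, 120)].copy()

    # Handle missing values: fill missing income with the median
    df["income"] = df["income"].fillna(df["income"].median())

    # Normalize a field: rescale income to the 0-1 range
    span = df["income"].max() - df["income"].min()
    df["income_norm"] = (df["income"] - df["income"].min()) / span

    # Derive a new variable from existing data: tenure in days
    df["signup_date"] = pd.to_datetime(df["signup_date"])
    df["tenure_days"] = (pd.Timestamp.today() - df["signup_date"]).dt.days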

Sampling Your Data

To ensure that you can accurately measure the performance of the predictive analytics model you're building, separate your historical business data into training and test datasets:

  • The training dataset: This dataset comprises the majority (about 70 percent) of the data. You'll use it to train the predictive model.
  • The test dataset: This is a smaller percentage (about 30 percent) of the data, used to test and measure the model's performance. It's an independent set of data that the model hasn't yet seen.

Splitting historical data into training and test datasets helps protect against overfitting the model to the training data. (See Chapter 15 for more about overfitting.) You want your model to identify true signals, patterns, and relationships, and to avoid any false ones that can be attributed to the noise within the data. The essence of overfitting is as follows: When a model is tuned to a specific dataset, there is a higher chance that any uncovered patterns hold true only for that dataset; the same model may not perform as well on other datasets. Checking your model against the test dataset helps you catch these dataset-specific patterns (which are mostly noise) and weed them out, making your predictive model more accurate.

tip For better model development, make sure your training and test datasets are similar enough to each other to minimize inconsistencies in data quality, relevance, and time coverage. One common way to get a true representation of similar data in both datasets is to choose these data samples at random.
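
In Python, for example, scikit-learn's train_test_split function handles both the 70/30 division and the random sampling in one call. Here's a minimal sketch on synthetic stand-in data:

    import numpy as np
    from sklearn.model_selection import train_test_split

    # Illustrative stand-ins for your historical business data
    X = np.random.rand(1000, 5)        # 1,000 records, 5 predictor variables
    y = np.random.randint(0, 2, 1000)  # a binary outcome to predict

    # 70 percent for training, 30 percent held out for testing;
    # random shuffling gives both sets a similar representation
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, shuffle=True, random_state=42)

    print(len(X_train), "training records,", len(X_test), "test records")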

Avoiding “Garbage In, Garbage Out”

More data doesn't necessarily mean better data. A successful predictive analytics project requires, first and foremost, relevant and accurate data.

Keeping it simple isn't stupid

If you're trying to address a complex business decision, you may have to develop equally complex models. Keep in mind, however, that an overly complex model may degrade the quality of those precious predictions you're after, making them more ambiguous. The simpler you keep your model, the more control you have over the quality of the model's outputs.

Limiting the complexity of the model depends on knowing which variables to select before you even start building it — and that consideration leads right back to the people with domain knowledge. Your business experts are your best source of insight into which variables have a direct impact on the business problem you're trying to solve. You can also decide empirically which variables to include or exclude.

Use those insights to ensure that your training dataset includes most (if not all) the possible data that you expect to use to build the model.
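
On the empirical side, one common approach is to score each candidate variable against the outcome and keep only the strongest, then weigh those scores against your experts' judgment. Here's a sketch using scikit-learn's SelectKBest on synthetic stand-in data:

    import numpy as np
    from sklearn.feature_selection import SelectKBest, f_classif

    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 8))  # 8 candidate predictor variables
    y = (X[:, 2] + 0.5 * X[:, 5] + rng.normal(size=500) > 0).astype(int)

    # Score each variable's relationship to the outcome; keep the top 3
    selector = SelectKBest(score_func=f_classif, k=3).fit(X, y)
    print("scores by variable:", selector.scores_.round(1))
    print("variables kept:", selector.get_support(indices=True))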

Data preparation puts the good stuff in

To ensure high data quality as a factor in the success of the model you're building, data preparation and cleaning can be of enormous help. When you're examining your data, pay special attention to the following (a quick screening sketch comes after this list):

  • Data that was automatically collected (for example, from web forms)
  • Data that didn't undergo thorough screening
  • Data that wasn't collected via a controlled process
  • Data that may have out-of-range values, data-entry errors, and/or incorrect values
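
A few quick screens, sketched here in pandas with made-up column names and business rules, can surface such problems before they poison the model:

    import pandas as pd

    # Illustrative records standing in for auto-collected web-form data
    df = pd.DataFrame({
        "age": [25, 17, 340, 52],  # 340 is clearly an entry error
        "order_amount": [99.0, -5.0, 42.5, None],
    })

    print("missing values per column:")
    print(df.isna().sum())
    print("duplicate records:", df.duplicated().sum())

    # Screen for out-of-range values, using made-up business rules
    print("negative order amounts:", (df["order_amount"] < 0).sum())
    print("implausible ages:", (~df["age"].between(18, 100)).sum())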

remember Common mistakes that lead to the dreaded “garbage in, garbage out” scenario include these classic goofs:

  • Including more data than necessary
  • Building more complex models than necessary
  • Selecting bad predictor variables or features in your analysis
  • Using data that lacks sufficient quality and relevance

Creating Quick Victories

An iterative approach to building the model — trying a version of the model, fine-tuning it in light of your results, and then trying the improved version — allows you to evaluate the variables and algorithms used in your model, and to choose those best suited to your final solution. Building your model iteratively can help you make several key decisions (one such iteration is sketched after this list):

  • Determining whether to include other data types
  • Determining whether to aggregate some of the data fields
  • Clearly identifying a rollout plan
  • Identifying any data gaps early enough to improve the processes involved
  • Evaluating your model's scalability for bigger transactions and larger volume of data
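
Here's what one such iteration might look like in Python: a quick cross-validated comparison of two candidate algorithms on synthetic stand-in data. In practice, you'd swap in your own data and your own candidates:

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(1)
    X = rng.normal(size=(800, 6))
    y = (X[:, 0] + X[:, 3] > 0).astype(int)

    # One iteration: try two candidates, compare, keep the better one
    candidates = [("logistic regression", LogisticRegression()),
                  ("random forest", RandomForestClassifier(n_estimators=100))]
    for name, model in candidates:
        scores = cross_val_score(model, X, y, cv=5)
        print(f"{name}: mean accuracy {scores.mean():.3f}")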

remember You can show the value of the analytics for your business by implementing a small pilot project and showing quick victories. Delivering a specific solution can bring you the buy-in necessary to build larger-scale solutions and more powerful models. Creating quick victories early in the process will allow you to understand pressing business questions, and when you provide solutions to those questions, you can reinforce the buy-in from the business stakeholders. Success breeds success — and it doesn't have to be overnight. By establishing a track record of success for your model, you can help foster the cultural change needed for a widespread adoption of predictive analytics within your organization.

Fostering Change in Your Organization

Impressive past performance doesn't guarantee an equally impressive future for an organization. It isn't enough to look at how the business has been done thus far. Instead, organizations should look at how predictive analytics can transform the way they're doing business in response to a rapidly changing present environment. For that to happen, business leaders need a major shift in the way they think and operate the business. Your predictive analytics project is a good place for them to start that shift.

Granted, the old guard — traditional business leaders who have been operating their businesses on gut feelings — can be close-minded at first, reluctant to adopt new technologies and trust the predictions and recommendations that come from them. You should expect some degree of organizational resistance to the deployment of your new model. This is especially true when an analytical system detects a major shift in trends — or a bigger crisis than anticipated — prompting the business leaders to distrust the system's recommendations and rely on historical analysis. If the business managers aren't willing to act on the recommendations of the predictive model, the project will fail.

remember Creating cultural changes that promote the use of predictive analytics to drive business decisions isn't only essential to the success of your project, but also — if you've built the model well — to the success of your business. You have to build not only a working model, but also an in-house culture that champions the use of predictive analytics as an aspect of business intelligence.

When you've demonstrated that your analytics program can guide the organization effectively toward achieving its business goals, be sure you clearly communicate — and widely publicize — those results within the organization. The idea is to increase awareness and buy-in for the program. Educating stakeholders about the benefits of predictive analytics entails emphasizing the possible loss of both opportunities and competitive edge if this tool isn't developed and deployed. Maintaining focus on such business values can have a direct and positive impact on creating a cultural change that favors predictive analytics.

The process of educating and training may take time to bear fruit; most organizational changes require time to implement and to be adopted. Be sure you recruit business team members who have both an understanding of and experience in managing organizational change and developing internal communications strategy.

Building Deployable Models

In order to ensure a successful deployment of the predictive model you're building, you'll need to think about deployment very early on. The business stakeholders should have a say in what the final model looks like. Thus, at the beginning of the project, be sure your team discusses the required accuracy of the intended model and how best to interpret its results.

Data modelers should understand the business objectives the model is trying to achieve, and all team members should be familiar with the metrics against which the model will be judged. The idea is to make sure everyone is on the same page, working to achieve the same goals, and using the same metrics to evaluate the benefits of the model.

Keep in mind that the model's operational environment will most likely differ from the development environment. The differences can be significant, from the hardware and software configurations, to the nature of the data, to the footprint of the model itself. The modelers have to know all the requirements for a successful deployment in production before they can build a model that will actually work on the production systems. Otherwise, implementation constraints can become obstacles that stand between the model and its deployment.
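
One small but concrete piece of that preparation: the model trained in development has to be packaged so the production system can load and run it. A common (though by no means the only) approach in Python is to serialize the fitted model with joblib, sketched here on throwaway data:

    import joblib
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # Train in the development environment (illustrative data)
    X = np.random.rand(200, 4)
    y = (X[:, 0] > 0.5).astype(int)
    model = LogisticRegression().fit(X, y)

    # Package the fitted model for the production environment
    joblib.dump(model, "churn_model_v1.joblib")  # hypothetical file name

    # ...later, on the production system, load it and score new records
    deployed = joblib.load("churn_model_v1.joblib")
    print(deployed.predict(np.random.rand(3, 4)))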

Understanding the limitations of your model is also critical to ensuring its success. Pay particular attention to these typical limitations:

  • The time the model takes to run
  • The data the model needs: sources, types, and volume
  • The platform on which the model resides

remember Ideally, the model has a higher chance of getting deployed when

  • It uncovers some patterns within the data that were previously unknown.
  • Its results can be easily interpreted and explained to the business stakeholders.
  • The newly uncovered patterns actually make sense businesswise and offer an operational advantage.

Evaluating Your Model

Your goal, of course, is to build an analytical model that can actually solve the business objectives it was built for. Expect to spend some time evaluating the accuracy of your model's predictions so as to prove its value to the decision-making process — and to the bottom line.

Evaluate your model from these two distinct angles:

  • Business: The business analyst should evaluate the model's performance and the accuracy of its predictions in terms of how well they address business objectives. Are the insights derived from the model making it easier for you to make decisions? Are you spending more time or less time in meetings because of these new insights?
  • Technical: The data scientists and IT professionals should evaluate the algorithms used and the statistical techniques and methods applied. Are the algorithms chosen optimal for the model's purpose? Are the insights being generated fast enough to produce actionable advantages?

remember In addition to closely examining the data used, the variables selected for their predictive power, and the algorithms applied, the most critical test is to evaluate whether the model meets the business needs and whether it adds value to the business.

tip Test your model in a test environment that closely resembles the production environment. Set the metrics to evaluate the success of the model at the beginning of the project. Specifying the metrics early makes the model easier to validate later on.
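
On the technical side, that evaluation often boils down to a handful of standard metrics computed on the held-out test data. Here's a minimal scikit-learn sketch, again on synthetic stand-in data:

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score, classification_report
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(2)
    X = rng.normal(size=(1000, 5))
    y = (X[:, 1] - X[:, 4] > 0).astype(int)

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=42)
    model = LogisticRegression().fit(X_train, y_train)

    # Evaluate on data the model hasn't seen
    predictions = model.predict(X_test)
    print("accuracy:", accuracy_score(y_test, predictions))
    print(classification_report(y_test, predictions))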

Updating Your Model

Successful deployment of the model in production is no time to relax. You'll need to closely monitor its accuracy and performance over time. A model tends to degrade over time (some models faster than others), and a fresh infusion of energy is required from time to time to keep the model up and running. To stay successful, a model must be revisited and re-evaluated in light of new data and changing circumstances.

If conditions change so they no longer fit the model's original training, then you'll have to retrain the model to meet the new conditions. Such demanding new conditions include

  • An overall change in the business objective
  • The adoption of — and migration to — new and more powerful technology
  • The emergence of new trends in the marketplace
  • Evidence that the competition is catching up

Your strategic plan should include staying alert for any such emergent need to refresh your model and take it to the next level, but updating your model should be an ongoing process anyway. You'll keep on tweaking inputs and outputs, incorporating new data streams, retraining the model for the new conditions and continuously refining its outputs. Keep these goals in mind:

  • Stay on top of changing conditions by retraining and testing the model regularly; enhance it whenever necessary.
  • Monitor your model's accuracy to catch any degradation in its performance over time.
  • Automate the monitoring of your model by developing customized applications that report and track the model's performance.

    Automating the monitoring, or having other team members involved, alleviates any concerns a data scientist may have about the model's performance and makes better use of everyone's time.

    tip Automated monitoring saves time and helps you avoid errors in tracking the model's performance.
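
Such a custom monitoring application can start small. The sketch below assumes a hypothetical deployed model and a hypothetical data pipeline that supplies newly labeled records; it simply scores each batch and flags the model when accuracy drifts below an illustrative threshold:

    from sklearn.metrics import accuracy_score

    ALERT_THRESHOLD = 0.80  # illustrative; set this from your own baseline

    def check_model(model, X_new, y_new):
        """Score the deployed model on newly labeled records and report
        whether its accuracy has degraded past the alert threshold."""
        accuracy = accuracy_score(y_new, model.predict(X_new))
        return accuracy, accuracy < ALERT_THRESHOLD

    # Hypothetical usage; X_new and y_new come from your own pipeline:
    # accuracy, degraded = check_model(deployed_model, X_new, y_new)
    # if degraded:
    #     notify_team(accuracy)  # hypothetical alerting hook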
