6
The Art of Analytics

The secret of getting ahead is getting started. The secret of getting started is breaking your complex overwhelming tasks into small manageable tasks, and starting on the first one.

Mark Twain

6.1. Introduction

In the previous chapter, you saw that Big Data has marked a major turning point in the use of data and is a powerful vector for growth and profitability. A complete understanding of a company’s data and of its potential is, without a doubt, a new lever of performance.

For a company to successfully use Big Data, it is essential to obtain the right information and sources of knowledge, so as to ultimately help the company make more informed decisions and achieve better results.

But to achieve this, a company needs to look beyond the sheer amount of data it holds, because Big Data is not just about collecting data, but about using it effectively.

Data is therefore a form of wealth, but one that cannot be realized without understanding and mastering the process that leads to it.

This process, which gives the company the power to gain a competitive advantage, is called analysis. This is why when you run into the term “Big Data”, you often hear the term “Analytics” in the same sentence.

In the Big Data universe, data and analysis are totally interdependent: one without the other is practically useless. In other words, there is no data without analysis and no analysis without data. The process we apply to data is the means by which many products and services can be created.

To generate value, as previously discussed, we need a stack of tools and techniques forming a framework that can extract useful information from large volumes of data. This includes:

  • – the identification of relevant data;
  • – the definition and choice of methods able to detect correlations and see the forest for the trees, in order to establish clear guidelines and know what to use and what to discard.

The aim is to make operations and the various business functions more agile, and to support faster (real-time) and more strategic decisions based on data, much as a driver glances in the rearview mirror to decide on the next maneuver.

Understanding this process is therefore essential, so that you can follow the different examples of algorithmic applications that will be discussed later in this book (Part 3), and in turn, see the opportunities of Big Data analytics.

Thus, before explaining how to use large data-sets in the context of the sharing economy, we will identify the different phases of the data analysis process. This chapter will serve as an instruction manual to help you understand this process.

6.2. From simple analysis to Big Data analytics

In his book, Big Data at Work: Dispelling the Myths, Uncovering the Opportunities (2014), Thomas H. Davenport stated that we have moved beyond the time when data was analyzed for weeks or even months, with the results ultimately delivered to users the analysts had never met.

Nowadays, a new term has been coined, that of the Internet of Things (IoT), a world in which everything is connected and produces large volumes of data in different forms and in real-time. The amount of data has increased and, as a result, a different need has quickly emerged: hence the importance of Big Data analytics which, in itself, is not a new trend.

Data analysis has been important since companies first adopted information technology in the late 1960s, when experts began to focus on databases and their evolution. By analyzing these databases, companies have always tried to understand their customers’ behavior, or their general context.

However, with the emergence of the 3Vs phenomenon, a new form of analysis has appeared, with different types, methods and stages, because traditional tools cannot manage amounts of data that arrive in a continuous flow, in various forms and from different sources.

This data has become so large and important that several tools and methods have been developed to analyze and leverage it. What used to take a few hours, days, weeks or even months in traditional analysis can now be processed in seconds or in real-time.

In the Big Data universe, the thing that distinguishes a company from its competitors is the ability to identify the type of analysis that can be optimally exploited to its advantage. Many companies do not know where to start and what type of analysis may be more favorable.

Data analysis can be divided into three distinct types. These are: descriptive analysis, predictive analysis and prescriptive analysis.

At first glance, you can easily distinguish between the first and second types; their names say it all: “describe” and “predict”. But what about the third type? Simply put, prescriptive analysis is the type of analysis that determines the actions necessary to achieve a specific objective.

These three types of analysis are the most common, and are interdependent methods that allow the company to better manage the amount of data it has.

Some companies only use descriptive analysis to inform the decisions they face, while others may use a combination of analysis types in order to obtain useful information for planning and decision-making.

In the next section, we explore these three types of analysis in detail, referring to an example so that you can see what each type brings to improve a company’s operational capabilities.

6.2.1. Descriptive analysis: learning from past behavior to influence future outcomes

In its simplest form, data analysis involves a form of descriptive analysis (Delen and Demirkan 2013). Descriptive analysis is simply the analysis of historical (past) data of an event to understand and assess its trends over time. It involves the use of analytical techniques to locate relevant data and identify notable trends in order to better describe and understand what is happening in the data-set (Sedkaoui 2018b).

These techniques are therefore used to describe the past, which can refer to any point in time at which an event occurred. Consider basic calculations such as sums, means, percentages, etc., as well as graphical presentations. This analysis is based on standard database functions, which only require a knowledge of some elementary calculations.
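
To make this concrete, here is a minimal sketch in Python (using the pandas library) of the kind of elementary calculations that descriptive analysis relies on; the file name and the column names are hypothetical and stand in for a company’s historical records.

  import pandas as pd

  # Hypothetical sales history: one row per transaction.
  sales = pd.read_csv("sales_history.csv")  # columns: date, product, region, amount

  # Elementary descriptive measures: sums, means and percentages.
  total_revenue = sales["amount"].sum()
  average_basket = sales["amount"].mean()
  share_by_region = sales.groupby("region")["amount"].sum() / total_revenue * 100

  print(f"Total revenue: {total_revenue:.2f}")
  print(f"Average transaction: {average_basket:.2f}")
  print("Revenue share by region (%):")
  print(share_by_region.round(1))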

The most important thing is that the company can have a clear vision of the future results of the event (production, operations, sales, stock, customers, etc.), based on how it has operated in the past, in order to know how it should act. This provides a clear explanation of the behavior of an event and why certain events occurred. The objective is to find the reason for success or failure.

What makes this type useful is the fact that it is possible to draw lessons from an event, based on its behavior in the past. This helps to understand how it can influence the future. Of course, by using the right set of tools, the company can learn powerful lessons.

This type of analysis is of great importance and lays the groundwork for the predictive models that are considered a subset of data science (Waller and Fawcett 2013; Hazen et al. 2014).

6.2.2. Predictive analysis: analyzing data to predict future outcomes

With the increasing amount of data, the improvement of computing power, the development of Machine Learning algorithms and the use of advanced analysis tools, many companies can now use predictive analysis. This analysis goes beyond the previous one: it aims to predict future trends.

It should be noted here that the word “predict” is not synonymous with what will really happen in the future. This cannot be the case, and no analysis or algorithm can do it flawlessly. Indeed, predictive analysis can only predict what may happen in the future: this type of analysis rests on probabilities.

By providing usable information, predictive analytics allows companies to understand the future. This type of analysis involves predicting future events and behaviors in data-sets, based on a model constructed from similar previous data (Nyce 2007; Shmueli and Koppius 2011).

There is a wide range of applications in different fields, such as finance, education, health and law (SAS 2017; Sedkaoui 2018a). From analyzing sales trends based on customers’ buying habits (to recommend personalized goods or services), to forecasting demand for operations, determining risk profiles in finance or analyzing opinions in text (sentiment analysis), the scope of this analysis is vast and varied.

Predictive analysis can be considered one of the most commonly used methods by companies to analyze large data-sets to find an answer to the following question: what could happen based on previous trends or patterns?

To answer this question, predictive analysis combines all the data, filling in the missing data with historical data from company databases, and looks for models that identify the relationships between the different variables.
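
As an illustration only, the following sketch shows how such a model might be built with scikit-learn; the file, the variables (ad_spend, season, sales) and the data behind them are hypothetical.

  import pandas as pd
  from sklearn.linear_model import LinearRegression
  from sklearn.model_selection import train_test_split

  # Hypothetical historical data: past marketing spend, season and resulting sales.
  history = pd.read_csv("history.csv")
  X = pd.get_dummies(history[["ad_spend", "season"]], columns=["season"])
  y = history["sales"]

  # Hold out part of the past to check how well the model generalizes.
  X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

  model = LinearRegression().fit(X_train, y_train)

  # "Predict" means estimating what could happen, not stating what will happen.
  print("R^2 on unseen data:", model.score(X_test, y_test))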

In this case, it should be mentioned that the amount of data available is not a problem; the richness of the data is, however, often questionable (Sedkaoui 2018b), and this richness becomes essential when people want to perform a prescriptive analysis.

6.2.3. Prescriptive analysis: recommending one or more action plan(s)

Delen and Demirkan (2013) found that Big Data has introduced the possibility of a third type of analysis, called “prescriptive analysis”, which goes beyond previous analyses by recommending one or more action plans to be undertaken.

Just as the two types of analysis described above are closely related, prescriptive analysis is closely linked to predictive analysis. Prescriptive analysis uses evidence-based predictions to inform and suggest a set of actions, because in order to be able to prescribe a series of actions, it is first necessary to anticipate the situation in the future.

This analysis not only allows companies to consider the future of their own processes and opportunities, but also to determine the best course of action to take in order to achieve competitive advantages. This means that this type of analysis is more heavily based on determining what actions need to be taken in order to optimize certain results, than on what can happen if the company continues to do the same.

It is an advanced analysis concept based on optimizing analysis results and identifying the best possible action plans, and it uses a combination of Machine Learning techniques, tools and algorithms.

It thus requires a predictive model with two additional components:

  • – the data;
  • – the evaluation system, or feedback, to analyze and monitor the results of the measures taken.
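
To give a feel for the optimization side, here is a minimal sketch using linear programming with scipy; the expected profits, the budget and the bounds are hypothetical figures that would normally come from the predictive model and the feedback loop listed above.

  from scipy.optimize import linprog

  # Hypothetical expected profit per hour of effort for three candidate actions,
  # estimated beforehand by a predictive model.
  expected_profit = [120.0, 80.0, 60.0]

  # linprog minimizes, so the profits are negated in order to maximize them.
  c = [-p for p in expected_profit]

  # Shared constraint: the three actions compete for a 100-hour budget,
  # and each action can receive at most 60 hours.
  A_ub = [[1.0, 1.0, 1.0]]
  b_ub = [100.0]

  result = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, 60)] * 3, method="highs")
  print("Recommended hours per action:", result.x)
  print("Expected total profit:", -result.fun)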

Prescriptive analysis can influence the company’s activities and even the way decisions are made. It also has a very significant impact on companies in all sectors by enabling them to improve their effectiveness and become more efficient. Prescriptive analysis is therefore used to quantify the effect of future decisions in order to indicate possible outcomes before decisions are actually made.

The autonomous car is the best example of the application of this analysis. This car analyzes and assesses its environment, and decides on the action to be taken based on the data: it can accelerate, slow down, change direction to avoid traffic, etc. But it should be noted that this car also relies on predictive and descriptive analysis in its data analysis process.

So, with one of the three analyses previously described, or by combining two or three at a time, a company can understand its data and generate relevant information at different levels of analysis, that can facilitate decision-making and inform the different action plans.

Leveraging data to analyze trends, creating predictive models to identify potential challenges and opportunities in the near future, these all offer new ways to optimize processes and improve performance. Different types of analysis are established by experts in the field in order to facilitate the interpretation of the data.

However, the company must ensure it chooses the right analytical option in order to increase its ROI, reduce its costs, operationalize its activities and guarantee its success.

The easiest way to do this is to look at the answers that each type can generate.

6.2.4. From descriptive analysis to prescriptive analysis: an example

From one type to another, as can be seen, the analysis must be applied in a sequential way, in other words, descriptive, then predictive, then prescriptive. However, this does not mean that one type is better than another, on the contrary: these three types complement each other (Sedkaoui and Khelfaoui 2019).

What? Were the previous discussions not clear enough? Still can’t tell the difference between these three types of analysis? Do you want to understand what these different types of analysis mean? And what type of analysis is most appropriate and for which situations?

You probably need some additional explanations. In this case, we will use a simple example, hoping that it will help you.

Imagine that you are a mobile application developer, and you want to get an overview of your business. First, you would start with a simple analysis to calculate the number of downloads of your applications and the profits you have made over the last three years, for example. You would also define certain cohorts of interest, such as the most downloaded applications, the least downloaded applications, etc., and calculate the profits of each.

This is simple, because it is just an assessment of your activity, and it is done in order to understand its behavior (its history). What you have just done is describe your activity based on the data you already have. This is the descriptive analysis.

Now let’s assume that you want to develop these applications to expand your business (create new applications or develop some of them). The problem is that you can’t have accurate information about which applications will be the most popular in the future or how much profit they will make, etc., because all you have is historical data about your business.

What you can do here, for example, is take this historical data and create a model that will allow you to predict what will happen in a month, in six months, in a year or more. So, depending on the current and past situation, you will have a more advanced forecast (such as the number of downloads, profits, etc.) on your activity. You have then moved on to the second level of analysis, or predictive analysis.
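
As a toy sketch of this second level (the monthly figures below are invented), a simple trend fitted on past downloads can be extrapolated a few months ahead:

  import numpy as np
  from sklearn.linear_model import LinearRegression

  # Hypothetical monthly downloads of one application over the past 12 months.
  downloads = np.array([310, 340, 360, 400, 420, 450, 470, 520, 540, 580, 610, 650])
  months = np.arange(len(downloads)).reshape(-1, 1)

  trend = LinearRegression().fit(months, downloads)

  # Extrapolate the trend 6 months ahead: a rough forecast, not a certainty.
  future = np.arange(len(downloads), len(downloads) + 6).reshape(-1, 1)
  print("Forecast downloads:", trend.predict(future).round())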

Now, let’s imagine that you want to go further in your business by developing a new e-learning application and increasing your profits, and that you want to know what you need to do in order to achieve this. Would you need to look for new customers (universities, etc.) (Plan A), stop developing the least downloaded applications (Plan B), develop relationships with the academic world or professionals who are best qualified to understand learning needs (Plan C) or launch an advertising campaign (Plan D)?

To know which plan to choose, you would need to draw up some forecasts. But maybe you have never run an advertising campaign and you have always worked with the same customers. In this case, you must use new data sources to calculate the strategic effects of actions A and D. You would also need to look for an optimal solution that combines actions A, B, C and D. This is where the third type of analysis comes in, which is called prescriptive (or optimal) analysis.

6.3. The process of Big Data analytics: from the data source to its analysis

The concepts behind Big Data analytics are not new. Companies have always sought to use descriptive, predictive and prescriptive approaches to optimize solutions. Similarly, for many years, researchers and academics have been using different analytical techniques to study different phenomena.

However, valuing the volumes of data available in real-time in different forms, in addition to developing intuitive and innovative ideas, requires a solid layer of highly advanced data analysis techniques. This is not all, because following a structured approach to data analysis is also very important.

But before talking about the flow that will make your tasks easier and describing the different analysis techniques that will allow you to understand the different mechanisms of the analysis process, do you specifically know what data analysis is all about?

Data analysis is a process that starts by defining the objectives or questions you hope to answer. To achieve this, you need to collect all the data related to these questions, clean them up and prepare them for exploration and interpretation, in order to take advantage of them, or to obtain useful information that can suggest conclusions to better guide your decision-making process.

Davenport and Harris (2007) define it as:

The extensive use of data, statistical and quantitative analysis, explanatory and predictive models and evidence-based management to better guide decisions and actions.

Nowadays, in the literature, data analysis is often linked to the notion of Business Intelligence, due in particular to the increased processing capabilities of machines. This concept, which has been popular since the 1990s (Chen et al. 2012), shows the importance of data analysis capabilities.

Data analysis is the process of inspecting, cleaning, transforming and modeling data to discover useful information, suggest conclusions and support decision-making. It focuses on knowledge discovery for descriptive and predictive purposes, to discover new ideas or to confirm existing ideas.

It is true that analytics often gives off a “crystal ball” impression, capable of revealing the secrets behind each byte of data, but this does not prevent the fact that there is considerable work being done behind the scenes.

This process, which you may well find complex, consists of a series of distinct steps or phases. If you want to understand this analysis flow, you must follow certain steps, from a specific idea (an objective), to the formulation of good questions, to data mining, and to the preparation (collection, cleaning, etc.) of the data to be analyzed in order to create value.

In this context, in order for you to understand the process of data analysis and the power of analytics, we use the logic used during the “Taylorism” period. At the time, to simplify a given problem and deal with its complexity, you just needed to break it down into simpler sub-problems (Sedkaoui 2018b).

We will therefore follow the same logic to help you understand the process of Big Data analytics.

6.3.1. Definition of objectives and requirements

The first step in each process is to define the objectives that need to be achieved. It means asking the right questions. The aim here is to make sure that you have clearly defined the what, the how and the why, in other words, the context of your project, so as to settle on the best action plan.

This means that before you enter the data analysis phase, you need to understand the context in order to define the main problems and identify needs.

At the end of this small exploration step, you will have a global view of what you want to accomplish. This step must take place before there is any data.

This step must be done beforehand because, once you have collected the necessary data, you will examine it to guide your decision-making strategy. At this point, you will need a baseline or indicator to determine whether the project will achieve its objectives.

This reference or indicator should be defined at the beginning of the process. This means defining the target by considering the different constraints and potential solutions that will make it possible to achieve this target.

Let’s keep to the previous example: if you want, for example, to develop an application, it will be necessary to define the notion of “interest” that lies behind the creation of such an application, and then produce the work plan in accordance with the requirement of that interest. It is important here to understand the motivation of your project before undertaking the actual analysis tasks.

This step will therefore allow you to explore all possible avenues in order to identify the different variables that directly or indirectly affect the phenomenon that interests you. This will help you understand the following:

  • – the objective (target);
  • – the context (opportunities and challenges);
  • – data sources;
  • – the necessary analytical techniques;
  • – relevant technologies;
  • – the cost;
  • – the necessary time.

6.3.2. Data collection

Before you begin the data collection step, you must understand and identify the data that could be useful for your business. It is not just about the quantities of data or Big Data, but the value generated by the analysis of these data.

In the era of Big Data, the amount of data produced will continue to grow. There will be more data from various sources in real-time and in different forms, meaning that data will be captured and stored in many forms, ranging from simple files and database tables, to emails, photos, videos, etc. This means that you need two types of data (Sedkaoui 2018b):

  • – Those that are already stored internally, or the data you have. These may include databases, various digital files, e-mails, photos, archives, etc. You will search these data sources for the data that are relevant to your project.
  • – The ones that come from outside, that you will couple with your internal data. External data refers to all data that is not generated by your activity. Among these data, we can distinguish the following: social network data, videos, tweets, geolocation data, etc. Most of these data are unstructured, which generally makes them difficult to use. But they are useful for enriching internal data.

There is also data from other companies, or even from governments and organizations, which make their data accessible to all and share it for reuse. This type of data is known as Open Data. Its quality can be excellent, but it depends on the party managing it.
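
As a hedged illustration of combining internal and external sources (the file names, the URL and the columns are hypothetical), pandas can read both a local export of the customer database and an open-data file, and join them on a common key:

  import pandas as pd

  # Internal data: a hypothetical export of the customer database.
  customers = pd.read_csv("internal/customers.csv")  # columns: customer_id, city, ...

  # External open data: hypothetical population figures per city,
  # published as a CSV file by a public body.
  population = pd.read_csv("https://example.org/open-data/city_population.csv")

  # Enrich the internal data with the external source.
  enriched = customers.merge(population, on="city", how="left")
  print(enriched.head())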

In this step, you will therefore look for the necessary data to achieve the predefined objectives. This will be difficult, because during this phase, you will face some data issues, such as:

  • – the volume, variety and velocity, already detailed in Chapter 5 of this book;
  • – the complexity of data collected from different sources and in different formats;
  • – problems with missing data or outliers;
  • – the need to ensure the consistency and quality of data, etc.

You should therefore pay particular attention to this phase, as it is necessary to ensure the reliability of the data before starting an analysis. According to Soldatos (2017), data collection has several particularities compared to the consolidation of traditional data from distributed data sources, such as the need to process heterogeneous data for example.

The data collected in this step will probably be a very important source of information, on which the relevance of the model built will be based. What you need to do now is to prepare this source for the modeling phase. This is extremely important, because the performance of your model depends largely on this step.

6.3.3. Data preparation

Once you have collected the necessary data, proceed to the preparation step. This is the stage that gathers together the activities involved in constructing the data-set to be analyzed from the raw data collected at the source.

During this step, the data will be:

  • – classified according to selected criteria;
  • – cleaned;
  • – coded to make them automatically compatible with the tools that will be used.

It is a real challenge to take advantage of the amounts of data you have been able to collect. You will find that some types of data will be easy to analyze, such as data from relational databases. This is structured data. But if you are working on unstructured data, you need to prepare it for analysis.

Next, these data must be cleaned. Data cleaning is a sub-process of the data analysis process. It corresponds to the transformation of so-called unstructured data into semi-structured or fully structured data, in order to make them more consistent. Data cleaning includes several elements, which are elaborated on in the following three points.

6.3.3.1. Missing values

Handling missing values means removing incomplete and inconsistent data from a data-set; doing so will improve the model’s performance. When you encounter a missing value, you have two choices:

  • – simply delete them;
  • – replace them with default values.

Missing data can be deleted when the absence of certain values does not allow for tangible observation. In this case, you simply delete the observation, because it does not bring any real value. But in other cases, you can just replace it.

For example, if you work on a customer database of a chain of stores in France, you will have several variables or observations, such as: customer height, gender, date of birth, etc. And if somewhere in this database an observation of the ‘height’ variable is missing, you can replace it with the average height of the French population.
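
The following sketch shows both choices with pandas, following the store example above; the column names and the reference value are hypothetical.

  import pandas as pd

  # Hypothetical customer database with columns: height_cm, gender, birth_date.
  customers = pd.read_csv("customers.csv")

  # Choice 1: simply delete the observations that bring no real value.
  cleaned = customers.dropna(subset=["birth_date"]).copy()

  # Choice 2: replace a missing value with a sensible default, here the
  # average height of the French population (an assumed reference value).
  average_height_fr = 172.0
  cleaned["height_cm"] = cleaned["height_cm"].fillna(average_height_fr)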

This may sound simple, but it is extremely necessary to understand the context of this chain of stores in order to assign logical and appropriate values to the phenomenon to be analyzed. This will help you avoid a bias that will affect your model.

This does not necessarily mean that these values are false, but they should preferably be treated separately, as some modeling techniques cannot manage them.

6.3.3.2. Outliers

An outlier is an observation that seems to be far removed from other observations. In other words, it is an observation in the database that follows a completely different logic from other observations in the database.

For example, let’s look at the database of this chain of stores, where you have several people’s salaries that vary between 2,500 euros and 5,000 euros per month. If you observe that a customer’s salary reaches 15,000 euros per month, you can consider this value as an outlier. For example, this customer may be a celebrity (a football player, an actor, etc.).

Such a value is therefore an observation that differs from all the others. It is essential to remove these outliers so as not to bias the resulting model.

This must be done because the goal is to build a model that reflects the real situation, not one that is skewed by a few aberrant values.
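
One common way to flag such values, sketched here on the same hypothetical store data, is the interquartile-range (IQR) rule:

  import pandas as pd

  # Hypothetical customer database with a monthly_salary column.
  customers = pd.read_csv("customers.csv")

  q1 = customers["monthly_salary"].quantile(0.25)
  q3 = customers["monthly_salary"].quantile(0.75)
  iqr = q3 - q1

  # Observations far outside the bulk of the data (such as a 15,000 euro salary
  # among salaries of 2,500 to 5,000 euros) are flagged as outliers.
  is_outlier = ((customers["monthly_salary"] < q1 - 1.5 * iqr)
                | (customers["monthly_salary"] > q3 + 1.5 * iqr))

  without_outliers = customers[~is_outlier]
  print(f"{is_outlier.sum()} outliers removed out of {len(customers)} rows")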

6.3.3.3. Errors

In the process of data analysis, it is very important to correct errors. But what types of errors are they? When you work with large databases, you often have to deal with two types of errors:

  • – data entry errors: stray blank spaces, inconsistent upper-case letters, “bugs” caused by machines, etc.;
  • – an error that reveals inconsistencies between the different data.

For the first class of error, some data require human intervention, and that intervention may enter certain values incorrectly or leave trailing blank spaces. For example, suppose you are entering the answers from a customer satisfaction survey, and one of the questions offers two options: “Female” and “Male”. A respondent may enter “Man” instead of “Male”, type only a lowercase first letter, or leave a blank space after the entry. This type of error can be corrected using Excel, for example: a simple sort reveals the cases where values have been entered incorrectly, and another human intervention can then correct them.

For the second class of error, this can happen when, for example, you enter “Male” in some places and “M” in others, even though both mean the same thing. This can also happen when you indicate the salary of some employees in dollars ($) and others in euros (€).

An example of this class of error is to put “Female” in one table and “F” in another, when they represent the same thing: that the person is a woman. Another example is one where you use pounds in one table and dollars in another. These are two different currencies, whereas the same unit of measure should have been used.

This can happen, especially if you work with data from different countries around the world. In this case, a simple conversion is necessary to correct the values. This is also the case for data presented “per hour” and “per minute”. Here, you just need to use the same unit, either hour or minute, to correct errors.
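
Both classes of error can be corrected programmatically; here is a sketch with pandas, in which the column names, the value mappings and the exchange rate are hypothetical.

  import pandas as pd

  # Hypothetical survey file with columns: gender, salary, currency.
  survey = pd.read_csv("survey.csv")

  # First class: entry errors (stray spaces, inconsistent case, variant spellings).
  survey["gender"] = (survey["gender"]
                      .str.strip()
                      .str.capitalize()
                      .replace({"M": "Male", "Man": "Male",
                                "F": "Female", "Woman": "Female"}))

  # Second class: inconsistencies between data, such as mixed currencies.
  usd_to_eur = 0.92  # hypothetical conversion rate
  in_usd = survey["currency"] == "USD"
  survey.loc[in_usd, "salary"] = survey.loc[in_usd, "salary"] * usd_to_eur
  survey.loc[in_usd, "currency"] = "EUR"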

You now understand the importance of cleaning, processing and preparing data from different sources. It is time to move on to the next step in the process, which will allow you to understand the different relationships and correlations that exist between the different variables in your database.

6.3.4. Exploration and interpretation

Once you have collected the right data to answer the question defined in the first step, and have cleaned and prepared them well, it is time for an exploratory analysis in order to interpret the data and make sense of them. This step will help you understand the composition of your data, its distribution, and the different correlations (looking for similarities and differences) that can result.

When you manipulate data, you may find exactly the data you need, but it is more than likely that you will have to revise your initial question or collect more data. In any case, this initial analysis of trends, correlations, variations and outliers helps you focus your analysis in order to better answer your question, and to anticipate any objections that others may raise.

One of the simplest ways to explore your data is to use the basic parameters and indicators of descriptive statistics: the mean, standard deviation, frequencies, etc., which give a concise picture of the data and determine the main characteristics, thus allowing you to understand the overall trend.

For example, to manipulate the data, you can use a pivot table in Excel. This table allows you to sort and filter the data according to different variables, perform basic calculations and search for correlations.
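
The same manipulations can be done directly in Python; as a sketch with hypothetical columns, pandas offers describe(), corr() and pivot_table() as counterparts to the basic statistics and pivot tables mentioned above.

  import pandas as pd

  # Hypothetical sales file with columns: region, product, amount, quantity.
  sales = pd.read_csv("sales.csv")

  # Concise picture of the data: mean, standard deviation, quartiles, etc.
  print(sales.describe())

  # Correlations between the numerical variables.
  print(sales[["amount", "quantity"]].corr())

  # Equivalent of an Excel pivot table: total amount per region and product.
  pivot = pd.pivot_table(sales, values="amount", index="region",
                         columns="product", aggfunc="sum")
  print(pivot)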

You can also present the results in graphical form (histograms, box plots, scatter plots, 3D diagrams, etc.) that will be used as tools to visualize the data.

Data visualization occurs when information is presented in a visual form. Its purpose is therefore to communicate information in an understandable way, and it makes data easy to compare when presented visually. Cross-referencing data with one another in visualizations reveals relationships that are less obvious at first glance.

The information will be easier to capture when presented in the form of images or graphs, thanks to the different visualization techniques. It is therefore essential to have an exploratory view during this step, to verify the data and take a step back to correct anomalies if necessary. The objective here is to understand your data and ensure that it is properly cleaned and ready for analysis.

In practice, the techniques described in this step are not limited to visualization techniques or simple basic statistics. Other techniques and methods, such as modeling, clustering, classification, etc., may also be useful for exploratory analysis. However, the statistical and technical context is not enough; you must have a good understanding of the business and be able to recognize whether the results of the models are meaningful and relevant.

6.3.5. Modeling

Now that you have completed the data mining phase and understood the data you have prepared, it is time to move on to the next phase: the model creation phase. This model will be based on a data-set that is representative of the phenomenon (problem) you are trying to model.

This phase is much more specific than the exploratory phase, because you know what you are looking for and what you want to achieve. The construction of the model therefore depends on the available data and the techniques you wish to use.

It should be noted here that when you obtain this model, you will also need to analyze the questions below:

  • – does the model really answer the initial questions (objectives) and how?
  • – can it easily be implemented to explain the phenomenon under study?
  • – are there any constraints or considerations that have not been taken into account?

If your model can answer these kinds of questions, then you probably have a model that can reach productive conclusions.

Before making the model operational, you must assess the quality of the model, in other words, its ability to accurately represent your initial problem. This will ensure that it meets the objectives formulated at the beginning of the process. This assessment also contributes to the decision to deploy the model or, if necessary, to improve it. At this stage, the robustness and accuracy of the models obtained are tested.

6.3.5.1. Testing the model’s performance

After obtaining the model, the question of its performance arises. Indeed, here you will find out to what extent the model really explains the phenomenon. To do this, you will need to review the performance of the model and determine whether it meets all the requirements to be able to complete the project.

In this context, there is a set of techniques that you can use to test the model’s performance. These tests will allow you to quantify the behavior of your model in order to understand how it will react. To do this, we invite you to present the problem in a different way and test other analytical techniques before committing to a specific result.
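
One standard technique for quantifying a model’s behavior, shown here as a sketch on synthetic data with scikit-learn, is cross-validation: the model is trained and tested on several different splits of the data, which gives a more honest picture of its performance than a single test.

  from sklearn.datasets import make_classification
  from sklearn.ensemble import RandomForestClassifier
  from sklearn.model_selection import cross_val_score

  # Synthetic data-set standing in for the data prepared in the earlier steps.
  X, y = make_classification(n_samples=1000, n_features=10, random_state=0)

  model = RandomForestClassifier(random_state=0)

  # 5-fold cross-validation: five train/test splits, five performance scores.
  scores = cross_val_score(model, X, y, cv=5)
  print("Accuracy per fold:", scores.round(3))
  print("Mean accuracy:", round(scores.mean(), 3))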

6.3.5.2. Model optimization

Finding a consistent, well-performing predictive model is an iterative and empirical process. To do this, you will alternate between the modeling phase of the predictive system and the performance measurement phase. During these iterations, you will refine your hypotheses about the data, as well as about the characteristics that come into play in the prediction.
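
As an illustration of this iterative refinement (a sketch on synthetic data, not a prescription), a grid search alternates between refitting the model with different hyperparameters and measuring its performance, and keeps the best combination found:

  from sklearn.datasets import make_classification
  from sklearn.ensemble import RandomForestClassifier
  from sklearn.model_selection import GridSearchCV

  # Synthetic data-set standing in for the prepared data.
  X, y = make_classification(n_samples=1000, n_features=10, random_state=0)

  # Candidate hypotheses about the model, tried, measured and compared.
  param_grid = {"n_estimators": [50, 100, 200], "max_depth": [3, 5, None]}

  search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=5)
  search.fit(X, y)

  print("Best parameters:", search.best_params_)
  print("Best cross-validated score:", round(search.best_score_, 3))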

During this phase, you may need to look for more data if you realize that the data you have used is insufficient. In this case, you need to go back to the data preparation and exploratory analysis stages in order to integrate this new data.

Once this part is completed, you can take action.

6.3.6. Deployment

By following the previous steps of the data analysis process, you are making better decisions for your project. At this point, we can say that the choices have been guided and oriented by the data you have collected and prepared, explored and analyzed in a robust way. By performing the various tests, your model becomes more accurate and relevant. This means that you will make informed decisions to manage your business effectively. The emphasis put on the end-use is therefore one of the key factors in the data analysis process.

With clear data in place and a good understanding of the content, you will be able to create one or more models in order to make better forecasts, rank your priorities or better understand the phenomenon you modeled. Once you have acquired it, your model will be ready to be presented. This involves planning the deployment of your model, its monitoring and maintenance.

The deployment can therefore range from the simple generation of a report describing the knowledge acquired, to the implementation of an application allowing the use of the model obtained, for the prediction of unknown values of an element of interest (Sedkaoui 2018b).

At this point, you may think you have reached the end of the process flow, but this is not the case. Because it is during this stage that your general skills will be most useful and extremely important. If you have done things right, you now have an operational model and stakeholders.

This last step involves using the results of your data analysis process to determine the best course of action. This process may seem a little abstract at the moment, but we will deal with all these points in detail in the last part of this book. You will see a great commonality between all these steps.

6.4. Conclusion

Once you have collected the data you need, it is time to move on to the analysis. But to do this analysis, a structured approach is needed in order to better generate value. If you have done things right, you now have an operational model that you can use. We can therefore conclude this chapter here and move on to the relationship between Big Data analytics and the sharing economy.

TO REMEMBER.– In this chapter, you have had the opportunity to learn:

  • – the different types of analysis you can use, namely:
    • - descriptive analysis (which does exactly what its name says, “describe”), which is used to provide an overview of the past and answer the following question: “What happened?”;
    • - predictive analysis, which uses statistical models and forecasting techniques to understand the future and thus answer the following question: “What could happen?”;
    • - prescriptive analysis, which uses optimization and simulation algorithms to advise on potential outcomes, to answer questions such as: “What now?” or “What needs to be done?”;
  • – the different steps of the data analysis process:
    • - definition of objectives and requirements: defining what, why and how;
    • - data collection: search and capture of all necessary data;
    • - data preparation: data verification and correction (cleaning, etc.);
    • - exploration and interpretation: analysis of data using statistical and visual techniques in order to better understand them;
    • - modeling: the use of advanced techniques (Machine Learning algorithms) for modeling according to the initial objective;
    • - deployment: moving from data to knowledge (using the model).