Managing Customer Loyalty Using Time Series Data

In Chapter 1, Time Series Modeling in the Financial Industry, we briefly touched upon the myth that time series data is always about forecasting behavior. While this is true most of the time, there are many occasions where it doesn't hold. We mentioned that time series data can also be used for benchmarking, quality control, pattern recognition, and even estimating the effect of one variable on the observed values of another variable.

Survival analysis focuses on estimating the time to an event. In Chapter 2, Forecasting Stock Prices and Portfolio Decisions using Time Series Data, while discussing ARIMA, we saw that the model depends primarily on the time component and doesn't focus much on influencer variables. Survival analysis, just like multivariate regression, can also emphasize the role of other influencer variables. The methodology doesn't forecast solely on the basis of time components, such as the seasonality and trend of a single variable. We will focus on the nuances of survival modeling a bit later in the chapter. Survival analysis uses time series data to estimate when an event might happen. The event could be related to any of the following:

  • The probability of a machine ceasing to function within the next 20 years
  • A customer not renewing their internet broadband contract at the end of its fixed term
  • The likelihood of a patient who regularly consumes tobacco dying early, compared to a patient with relatively healthy habits
  • The likelihood of a customer defaulting on their payments at a future date

The Kaplan-Meier estimator, the Nelson-Aalen estimator, Cox regression, and parametric and non-parametric models are some of the key aspects of survival analysis that we will explore in detail. We will also cover the basics of the probability density function, the cumulative distribution function, the hazard ratio, the survival curve, and the importance of censoring.
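As a quick orientation before those topics are covered in detail, the standard textbook relationships between these quantities can be written down compactly. The notation here is an assumption introduced for illustration and is not taken from the chapter: T is the (continuous) time to the event, f is its probability density function, F its cumulative distribution function, S the survival function, and h the hazard function.

```latex
\begin{align*}
F(t) &= P(T \le t) = \int_0^t f(u)\,du \\
S(t) &= P(T > t) = 1 - F(t) \\
h(t) &= \lim_{\Delta t \to 0}
        \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t}
      = \frac{f(t)}{S(t)}
\end{align*}
```

Read this way, the survival curve answers "what fraction is still alive at time t?", while the hazard answers "given survival up to t, how likely is failure right now?".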

So, how can we define survival analysis? It is a broad family of statistical methods used to analyze the time until an event occurs. Let's plot the survival and hazard curves of a sample dataset. The dataset shows the lifespan of an engine in years:

The product-limit survival estimate chart plots the machine's lifespan on the X-axis and the survival probability on the Y-axis. It shows that for a period up to five years, and between 8 and 13 years of the machine's lifespan, the survival rate remains constant. However, before the 10th, 15th, and 20th anniversary of the machine, the survival rates quickly drop:

Figure 6.1: Survival function and hazard rate sample data illustration
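To make the idea of a product-limit (Kaplan-Meier) survival estimate more concrete, here is a minimal Python sketch. It is not the code used to produce the chart above. The function name `kaplan_meier` and the sample lifespans are hypothetical and made up for illustration, so the resulting numbers will not match Figure 6.1. A flag of 0 marks a censored machine, that is, one that was still running when it was last observed.

```python
from collections import Counter


def kaplan_meier(durations, observed):
    """Return [(time, survival_probability)] at each distinct failure time.

    durations: time at which each machine failed or was last seen running
    observed:  1 if the failure was observed, 0 if the record is censored
    """
    failures = Counter(t for t, e in zip(durations, observed) if e)
    exits = Counter(durations)      # failures and censorings per time point
    at_risk = len(durations)        # machines still under observation
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = failures.get(t, 0)
        if d:
            # Product-limit step: multiply by the fraction surviving time t
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        # Everyone who failed or was censored at t leaves the risk set
        at_risk -= exits[t]
    return curve


# Hypothetical engine lifespans in years (illustrative only)
lifespans = [5, 8, 10, 10, 13, 15, 15, 20]
failed = [1, 0, 1, 1, 0, 1, 1, 1]   # 0 = censored

curve = kaplan_meier(lifespans, failed)
for year, prob in curve:
    print(f"year {year:>2}: survival probability {prob:.3f}")
```

The estimate only changes at observed failure times. Censored machines do not cause a drop, but they do shrink the risk set used at later failure times, which is why censoring matters for the shape of the curve.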

The hazard rate chart shows the rate of failures (Y-axis) at each anniversary point (X-axis), assuming that the machine hasn't already failed. Once a machine reaches its 14th anniversary, the hazard rate, which had been rising until then, starts falling. In other words, if a machine survives past its 14th anniversary, its chances of failing start to decrease. The lowest hazard rate is at the five-year mark, where one of the machines first stops performing.
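One simple way to think about a hazard rate is as the share of machines failing at a given time out of those still running just before it. The Python sketch below computes that discrete hazard, along with its running sum (the Nelson-Aalen cumulative hazard). It is an illustration under stated assumptions and not necessarily the method behind the chart: the function name `discrete_hazard` and the sample data are hypothetical, so the values will not reproduce the rise and fall described above.

```python
from collections import Counter


def discrete_hazard(durations, observed):
    """Return [(time, hazard, cumulative_hazard)] at each failure time.

    hazard            = failures at t / machines at risk just before t
    cumulative_hazard = running sum of the hazards (Nelson-Aalen estimate)
    """
    failures = Counter(t for t, e in zip(durations, observed) if e)
    exits = Counter(durations)      # failures and censorings per time point
    at_risk = len(durations)
    cumulative = 0.0
    rows = []
    for t in sorted(exits):
        d = failures.get(t, 0)
        if d:
            hazard = d / at_risk    # conditional on surviving up to t
            cumulative += hazard
            rows.append((t, hazard, cumulative))
        at_risk -= exits[t]         # drop failed and censored machines
    return rows


# Hypothetical engine lifespans in years (illustrative only)
lifespans = [5, 8, 10, 10, 13, 15, 15, 20]
failed = [1, 0, 1, 1, 0, 1, 1, 1]   # 0 = censored

rows = discrete_hazard(lifespans, failed)
for year, hazard, cum in rows:
    print(f"year {year:>2}: hazard {hazard:.3f}, cumulative {cum:.3f}")
```

The denominator is what distinguishes a hazard from a plain failure count: it only includes machines that have survived up to that point, matching the "assuming that the machine hasn't already failed" condition in the text.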

By using the survival and hazard rate charts, we have been able to read off the survival probability of a machine at various stages of its lifespan, as well as the hazard rate at any given point. You may have already noticed the word censoring appearing on the chart. Let's take a step back and see why survival analysis is better suited for some business problems, and then explore the various aspects and terminologies used in survival models.
