The use case – clustering synthetic control data

A control chart shows how a system or process behaves over time: it is a graph that plots one or more of its variables against time. Such charts are used for quality control in manufacturing and in business processes. A chart that plots a single variable against time is called a univariate control chart, and one that plots several variables against time is called a multivariate control chart.

In this chapter, we will work with the synthetic control chart time series dataset provided by the UCI Machine Learning Repository. Each control chart in the dataset belongs to one of the following six categories:

  • Normal
  • Cyclic
  • Increasing trend
  • Decreasing trend
  • Upward shift
  • Downward shift

Each control chart consists of 60 columns, each holding a decimal value, and there are 100 records for each category, giving 600 records in total. Further details about the dataset can be found at http://archive.ics.uci.edu/ml/databases/synthetic_control/synthetic_control.data.html.
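To make this layout concrete, here is a minimal Python sketch that parses the file into labelled records. It assumes that values within a row are separated by whitespace and that the rows are grouped in blocks of 100 in the same order as the category list above; `load_control_charts` and `CATEGORIES` are hypothetical names used only for illustration and are not part of Trident-ML.

```python
# ASSUMPTION: rows appear in blocks of 100, in this category order.
CATEGORIES = [
    "Normal",
    "Cyclic",
    "Increasing trend",
    "Decreasing trend",
    "Upward shift",
    "Downward shift",
]
ROWS_PER_CATEGORY = 100
VALUES_PER_ROW = 60


def load_control_charts(lines):
    """Parse whitespace-separated rows into (category, values) pairs.

    Blank lines are skipped; every other row must hold exactly 60 numbers.
    The category is inferred from the row's position (hypothetical helper).
    """
    records = []
    for index, row in enumerate(row for row in lines if row.strip()):
        values = [float(token) for token in row.split()]
        if len(values) != VALUES_PER_ROW:
            raise ValueError(
                f"row {index} has {len(values)} values, expected {VALUES_PER_ROW}"
            )
        category = CATEGORIES[index // ROWS_PER_CATEGORY]
        records.append((category, values))
    return records
```

With the downloaded file, usage would look like `with open("synthetic_control.data") as f: records = load_control_charts(f)`.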

We will use 80 of the 100 records in each category to build a clustering model, and then use the remaining 20 records in each category to test it by predicting their categories. The clustering algorithm we will use is K-means, as provided by Trident-ML.
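The split-and-cluster workflow can be sketched in plain Python. This is not Trident-ML's API: it is a hypothetical, dependency-free illustration that uses Lloyd's algorithm for K-means and assumes that each cluster is named after the most common category among its training members, so that each held-out record can be given a predicted category.

```python
import random
from collections import Counter, defaultdict


def split_per_category(records, n_train=80):
    """Keep the first n_train records of each category for training, the rest for testing."""
    by_category = defaultdict(list)
    for category, values in records:
        by_category[category].append((category, values))
    train, test = [], []
    for group in by_category.values():
        train.extend(group[:n_train])
        test.extend(group[n_train:])
    return train, test


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(centroids, point):
    """Index of the centroid closest to point."""
    return min(range(len(centroids)), key=lambda i: squared_distance(centroids[i], point))


def kmeans(points, k, iterations=100, seed=0):
    """Plain Lloyd's K-means with k random initial points; returns k centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(centroids, p)].append(p)
        # An empty cluster keeps its previous centroid.
        new_centroids = [
            [sum(col) / len(members) for col in zip(*members)] if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # assignments are stable
        centroids = new_centroids
    return centroids


def label_clusters(centroids, train):
    """ASSUMPTION: name each cluster after its most common training category."""
    votes = defaultdict(Counter)
    for category, values in train:
        votes[nearest(centroids, values)][category] += 1
    return {cluster: counter.most_common(1)[0][0] for cluster, counter in votes.items()}
```

Given parsed records, one would call `train, test = split_per_category(records)`, then `centroids = kmeans([v for _, v in train], k=6)` and `names = label_clusters(centroids, train)`, and predict each test record with `names.get(nearest(centroids, v))`.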

Before writing the producer, we need to download the dataset from the UCI Machine Learning Repository at http://archive.ics.uci.edu/ml/databases/synthetic_control/synthetic_control.data. Save this file locally so that it can be used later for training and testing.
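If you prefer to script the download, a minimal Python sketch using only the standard library follows. It assumes the URL above is still reachable; `download_dataset` and the default output file name are hypothetical choices for illustration.

```python
import shutil
import urllib.request

# URL taken from the text; ASSUMPTION: it is still served at this location.
DATA_URL = "http://archive.ics.uci.edu/ml/databases/synthetic_control/synthetic_control.data"


def download_dataset(url=DATA_URL, destination="synthetic_control.data"):
    """Stream the resource at url into destination and return the destination path."""
    with urllib.request.urlopen(url) as response, open(destination, "wb") as out:
        shutil.copyfileobj(response, out)
    return destination
```

Calling `download_dataset()` with no arguments writes the file to the current working directory.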
