Dataset

We'll work with a publicly available dataset released by Yahoo! Labs, which is useful for discussing how to detect anomalies in time series data. For Yahoo, the main use case is detecting unusual traffic on Yahoo servers.

Even though Yahoo has announced that the data is publicly available, you have to apply to use it, and approval takes about 24 hours. The dataset is available at http://webscope.sandbox.yahoo.com/catalog.php?datatype=s&did=70.

The dataset comprises real traffic for Yahoo services, along with some synthetic data. In total, it contains 367 time series, each with between 741 and 1,680 observations recorded at regular intervals. Each series is stored in its own file, one observation per line. Every observation is accompanied by an indicator in a second column, set to 1 if the observation is an anomaly and 0 otherwise. The anomalies in the real data were labeled by human judgment, while those in the synthetic data were generated algorithmically. A snippet of the synthetic time series data is shown in the following table:
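The file layout described above can be read with a few lines of code. The following is a minimal sketch, assuming each line holds the observed value and the 0/1 anomaly flag separated by a comma. The function name `parse_series` and its `value_col`/`label_col` parameters are hypothetical; if the downloaded files carry extra columns (such as a timestamp) or a header row, adjust the column indices accordingly. The sample lines are made-up toy values, not data from the Yahoo dataset.

```python
from typing import Iterable, List, Tuple


def parse_series(lines: Iterable[str], value_col: int = 0, label_col: int = 1,
                 sep: str = ",") -> Tuple[List[float], List[int]]:
    """Parse one time series file into parallel lists of values and labels.

    Hypothetical helper. Assumes each data line contains the observed value
    and a 0/1 anomaly flag; lines that cannot be parsed as numbers (for
    example, a header row) are skipped.
    """
    values: List[float] = []
    labels: List[int] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        fields = line.split(sep)
        try:
            value = float(fields[value_col])
            # int(float(...)) accepts both "1" and "1.0" as the flag
            label = int(float(fields[label_col]))
        except (ValueError, IndexError):
            continue  # not a data row, e.g. a header
        values.append(value)
        labels.append(label)
    return values, labels


# Toy input for illustration only (hypothetical header and values).
sample = ["value,is_anomaly", "12.5,0", "13.1,0", "98.0,1", "12.9,0"]
values, labels = parse_series(sample)
print(values)                      # [12.5, 13.1, 98.0, 12.9]
print(labels)                      # [0, 0, 1, 0]
print(sum(labels) / len(labels))   # 0.25, the fraction of anomalous points
```

To read a real file, pass the open file object directly, for example `with open(path) as f: values, labels = parse_series(f)`, since iterating a text file yields one line at a time.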

In the following section, you'll learn how to transform time series data into an attribute representation that allows us to apply machine learning algorithms.
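To give a feel for what such a transformation involves, here is one common technique, a sliding window, sketched under stated assumptions; it is not necessarily the approach the following section takes. Each run of `width` consecutive observations becomes one fixed-length attribute vector, labeled here with the anomaly flag of its last observation. The function name `sliding_windows` and the labeling rule are illustrative choices, not part of the dataset or any library.

```python
from typing import List, Sequence, Tuple


def sliding_windows(values: Sequence[float], labels: Sequence[int],
                    width: int) -> Tuple[List[List[float]], List[int]]:
    """Turn one series into fixed-length attribute vectors (hypothetical helper).

    Window i covers values[i : i + width]; its target is the anomaly flag of
    the last observation in the window.
    """
    if width < 1:
        raise ValueError("width must be at least 1")
    instances: List[List[float]] = []
    targets: List[int] = []
    for end in range(width, len(values) + 1):
        instances.append(list(values[end - width:end]))
        targets.append(labels[end - 1])
    return instances, targets


# Toy series for illustration only.
X, y = sliding_windows([1.0, 2.0, 3.0, 4.0], [0, 0, 1, 0], width=2)
print(X)  # [[1.0, 2.0], [2.0, 3.0], [3.0, 4.0]]
print(y)  # [0, 1, 0]
```

A series of n observations yields n - width + 1 instances, so a series shorter than the window produces none. Labeling by the last observation suits online detection, where the newest point is the one being judged; other choices, such as flagging a window if any point in it is anomalous, are equally reasonable.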
