The alternative data revolution

The data deluge driven by digitization, networking, and plummeting storage costs has led to profound qualitative changes in the nature of information available for predictive analytics, often summarized by the five Vs:

  • Volume: The amount of data generated, collected, and stored as the byproduct of online and offline activity, transactions, records, and other sources is orders of magnitude larger than before, and volumes continue to grow along with the capacity for analysis and storage.
  • Velocity: Data is generated, transferred, and processed to become available near, or at, real-time speed.
  • Variety: Data is organized in formats no longer limited to structured, tabular forms, such as CSV files or relational database tables. Instead, new sources produce semi-structured formats, such as JSON or HTML, and unstructured content, including raw text, image, and audio or video data, adding new challenges to render data suitable for ML algorithms.
  • Veracity: The diversity of sources and formats makes it much more difficult to validate the reliability of the data's information content.
  • Value: Determining the value of new datasets can be much more time- and resource-consuming, as well as more uncertain, than before.
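
To make the Variety point concrete, the following minimal sketch shows one way a nested, semi-structured JSON record might be flattened into a single tabular row that an ML pipeline could consume. The record layout, the field names, and the flatten helper are assumptions made up for illustration; real alternative datasets will differ and typically need far more cleaning.

```python
import json


def flatten(record, parent_key="", sep="."):
    """Recursively flatten nested dicts into one level with dotted keys."""
    flat = {}
    for key, value in record.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Descend into nested objects and merge their flattened keys
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


# Hypothetical semi-structured record (not taken from any real data vendor)
raw = '{"ticker": "AAA", "store": {"city": "Austin", "visits": 1200}, "date": "2018-01-31"}'
row = flatten(json.loads(raw))
print(row)
```

Each flattened record then corresponds to one row of a table, with the dotted keys serving as column names.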

For algorithmic trading, new data sources offer an informational advantage if they provide access to information unavailable from traditional sources, or provide access sooner. Following global trends, the investment industry is rapidly expanding beyond market and fundamental data to alternative sources to reap alpha through an informational edge. Annual spending on data, technological capabilities, and related talent is expected to increase from the current $3 billion by 12.8% annually through 2020.

Today, investors can access macro or company-specific data in real time that historically has been available only at a much lower frequency. Use cases for new data sources include the following:

  • Online price data on a representative set of goods and services can be used to measure inflation
  • The number of store visits or purchases permits real-time estimates of company or industry-specific sales or economic activity
  • Satellite images can reveal agricultural yields, or activity at mines or on oil rigs, before this information is available elsewhere
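
As an illustration of the first use case, the sketch below computes a simple price index from online prices collected in two periods. It uses the Jevons formula (the geometric mean of price relatives over items observed in both periods), which is one common choice for unweighted elementary aggregates and is an assumption here, not a method prescribed by the text. The item names and prices are hypothetical.

```python
from math import exp, log


def jevons_index(base_prices, current_prices):
    """Geometric mean of price relatives for items present in both periods."""
    common = sorted(set(base_prices) & set(current_prices))
    if not common:
        raise ValueError("no overlapping items between the two periods")
    log_relatives = [log(current_prices[k] / base_prices[k]) for k in common]
    return exp(sum(log_relatives) / len(log_relatives))


# Hypothetical scraped prices for a small basket of goods
base = {"milk_1l": 1.00, "bread_loaf": 2.00, "coffee_250g": 4.00}
current = {"milk_1l": 1.10, "bread_loaf": 2.00, "coffee_250g": 4.40}

index = jevons_index(base, current)
inflation_pct = (index - 1) * 100
print(f"Estimated price change: {inflation_pct:.2f}%")
```

A production-grade measure would additionally need a representative basket, expenditure weights, and handling of items that appear or disappear between periods.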

As the standardization and adoption of big datasets advance, the information contained in conventional data will likely lose most of its predictive value.

Furthermore, the capability to process and integrate diverse datasets and apply ML allows for complex insights. In the past, quantitative approaches relied on simple heuristics to rank companies using historical data for metrics such as the price-to-book ratio, whereas ML algorithms synthesize new metrics, and learn and adapt such rules by taking evolving market data into account. These insights create new opportunities to capture classic investment themes such as value, momentum, quality, or sentiment:

  • Momentum: ML can identify asset exposures to market price movements, industry sentiment, or economic factors
  • Value: Algorithms can analyze large amounts of economic and industry-specific structured and unstructured data, beyond financial statements, to predict the intrinsic value of a company
  • Quality: The sophisticated analysis of integrated data allows for the evaluation of customer or employee reviews, e-commerce, or app traffic to identify gains in market share or other underlying earnings quality drivers
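
For contrast with the ML-driven approaches listed above, the kind of simple heuristic used in the past, ranking companies by their price-to-book ratio, might look like the following sketch. The tickers, prices, and book values are hypothetical, and the rule for skipping non-positive book values is an assumption made for this example.

```python
def rank_by_price_to_book(companies):
    """Return tickers sorted from lowest to highest price-to-book ratio."""
    ratios = {}
    for ticker, (price, book_value_per_share) in companies.items():
        if book_value_per_share <= 0:
            # The ratio is not meaningful for non-positive book value; skip
            continue
        ratios[ticker] = price / book_value_per_share
    return sorted(ratios, key=ratios.get)


# Hypothetical universe: ticker -> (share price, book value per share)
universe = {
    "AAA": (50.0, 25.0),
    "BBB": (30.0, 30.0),
    "CCC": (80.0, 20.0),
    "DDD": (10.0, -5.0),
}

ranking = rank_by_price_to_book(universe)
print(ranking)
```

A static rule like this uses a single metric and never changes, whereas the ML approaches described above combine many inputs and adapt as market data evolves.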

In practice, however, useful data is often not freely available, and alternative datasets instead require thorough evaluation, costly acquisition, careful management, and sophisticated analysis to extract tradable signals.
