Thanks in part to vigorous efforts by vendors (led by IBM) to bring the idea to a wider public, analytics is coming closer to the mainstream. Whether in ESPN ads for fantasy football, or election-night slicing and dicing of vote and poll data, or the ever-broadening influence of quantitative models for stock trading and portfolio development, numbers-driven decisions are no longer the exclusive province of people with hard-core quantitative skills backed by expensive, often proprietary infrastructure.
Not surprisingly, the definition of “analytics” is problematic. At the simple end of the spectrum, one Australian firm asserts that “[a]nalytics is basically using existing business data or statistics to make informed decisions.”1 Confronting market confusion, Gartner's market researchers settled on an assertion that is similarly generic but less elegant: “Analytics leverage data in a particular functional process (or application) to enable context-specific insight that is actionable.”2
To avoid terminological conflict, let us simply assert that analytics uses statistical and other processing methods to tease out business insights and decision cues from masses of data. To see the reach of these concepts and methods, consider a few examples drawn at random:
Perhaps as interesting as the range of its application are the many converging reasons for the rise of interest in analytics. Here are ten, from among many others:
Some examples follow.
For all the tools, all the data, and all the computing power, getting numbers to tell stories is still difficult. There are a variety of reasons for the current state of affairs.
First, organizational realities mean that different entities collect data for their own purposes, label and format it in often-nonstandard ways, and hold it locally, usually in Excel but also in e-mails, PDFs, or production systems. Data synchronization efforts can be among the most difficult of a chief information officer's tasks, with uncertain payback. Managers in separate but related silos may ask the same question using different terminology or see a cross-functional issue through only one lens.
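As a minimal illustration of the labeling problem, consider the sketch below (in Python with pandas; the column names, units, and figures are invented for this example), which reconciles two silos' differently labeled records before they can be combined:

# A minimal sketch of reconciling two silos' nonstandard labels before analysis.
# The column names and values are hypothetical illustrations, not a real schema.
import pandas as pd

# Revenue as exported from one regional office's spreadsheet
east = pd.DataFrame({"Cust #": [101, 102], "Rev ($K)": [12.5, 8.0]})

# The same concept as labeled by a production system elsewhere
west = pd.DataFrame({"customer_id": [201, 202], "revenue_usd": [9500.0, 14200.0]})

# Map both sources onto one vocabulary and one unit (dollars)
east = east.rename(columns={"Cust #": "customer_id", "Rev ($K)": "revenue_usd"})
east["revenue_usd"] = east["revenue_usd"] * 1000  # thousands -> dollars

combined = pd.concat([east, west], ignore_index=True)
print(combined)

Even this toy case requires someone to notice that one source records revenue in thousands and the other in dollars; multiply that judgment across hundreds of fields and systems and the scale of the synchronization problem becomes clear.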
Second, skills are not yet adequately distributed. Database analysts can write SQL* queries but usually don't have the managerial instincts or experience to probe the root cause of a business phenomenon. Statistical numeracy, often at a high level, remains a requirement for many analytics efforts; knowing the right tool for a given data type, business event, or time scale takes experience, even assuming a clean data set. For example, correlation does not imply causation, as every first-year statistics student knows, yet temptations to let it do so abound, especially as electronic scenarios outrun human understanding of ground truths.
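To make the correlation trap concrete, the short sketch below (again Python; the two series and their figures are invented for illustration) shows that any two quantities that merely trend in the same direction over time will correlate almost perfectly, with no causal link between them:

# A minimal sketch of how correlation can mislead: two series that both trend
# upward over time correlate strongly even though neither causes the other.
# The numbers are invented for illustration only.
import numpy as np

rng = np.random.default_rng(0)
years = np.arange(2000, 2020)

ice_cream_sales = 50 + 2.0 * (years - 2000) + rng.normal(0, 1.5, len(years))
broadband_subs  = 10 + 3.5 * (years - 2000) + rng.normal(0, 2.0, len(years))

r = np.corrcoef(ice_cream_sales, broadband_subs)[0, 1]
print(f"correlation: {r:.2f}")  # close to 1.0, yet no causal link exists

A dashboard would report the same near-1.0 coefficient without comment; only someone who understands the underlying business can say whether it means anything.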
Third, odd as it sounds in an age of assumed infoglut, getting the right data can be a challenge. Especially in extended enterprises but also in extrafunctional processes, measures are rarely sufficiently consistent, sufficiently rich, or sufficiently current to support robust analytics. Importing data to explain outside factors adds layers of cost, complexity, and uncertainty: Weather, credit, customer behavior, and other exogenous factors can be critically important to either long-term success or day-to-day operations, yet representing these phenomena in a data-driven model can pose substantial challenges. Finally, many forms of data do not readily plug into the available processing tools: Unstructured data, such as e-mails or text messages, is growing at a rapid rate, adding to the complexity of analysis.
Fourth, data often relates to people, and people may not willingly give it up. Loyalty-club bar codes have been shared (often by cashiers), smart electrical metering is being viewed as a privacy invasion in some quarters,6 and tools for online privacy (cookie blockers, adware removers, etc.) are increasingly popular.
In certain situations, the algorithmic sensemaking available in common analytical tools is useful in uncovering and providing relevant information, wherever it may have originated. The cost and availability of such information are improving: In oil and gas, for example, information technology has helped drop the cost of a three-dimensional seismic map of the subsurface from $8 million per square kilometer in 1980, to $1 million in 1990, to $90,000 in 2005. But even the best analytics cannot reliably replace the human intelligence needed to draw the right conclusions from the information. Furthermore, not every quantitative question has a calculable answer.
Getting numbers to tell stories requires the ability to ask the right question of the data, assuming the data is clean and trustworthy in the first place. This skill demands a blend of process knowledge, statistical numeracy, time, narrative facility, and both rigor and creativity in proper proportion. Not surprisingly, the managers who possess it are more than technicians, and they are difficult to find in many workplaces. For what analytics actually delivers to match its promise, the biggest breakthroughs will likely come in education and training rather than in algorithms or database technology.
1. www.onlineanalytics.com.au/glossary.
2. Jeremy Kirk, “‘Analytics’ Buzzword Needs Careful Definition,” InfoWorld, February 7, 2006, www.infoworld.com/t/data-management/analytics-buzzword-needs-careful-definition-567.
3. “NYSE May Merge With German Rival,” Bloomberg News, February 10, 2011, www.fa-mag.com/fa-news/6815-nyse-may-merge-with-german-rival.html.
4. Michael Lewis, Moneyball (New York: W. W. Norton & Company, 2011).
5. “Kryder's Law,” MattsComputerTrends.com, www.mattscomputertrends.com/Kryder%27s.html.
6. “PG&E Smart Meter Problem a PR Nightmare,” SmartMeters.com, November 21, 2009, www.smartmeters.com/the-news/690-pgae-smart-meter-problem-a-pr-nightmare.html.
*Structured Query Language (SQL) is the language of database interrogation. An example would be:

UPDATE a
SET a.[updated_column] = updatevalue
FROM articles a
JOIN classification c ON a.articleID = c.articleID
WHERE c.classID = 1