Chapter 8
Role of Collective Intelligence

By now, you may have started receiving notification e-mails or letters from your utility company about your usage and how you might save if you run your laundry or dishwasher after a certain time in the evening, during off-peak hours. You may also have come across an advertisement by Progressive Casualty Insurance Company for Snapshot, a sensor that captures driving patterns, and how a good driver may be rewarded with a discount that saves a lot on the policy premium. You shouldn't be surprised to see promotional offers from your favorite retailers on the specific merchandise you care about or frequently shop for, based on your buying interests. Have you ever paused to think about how these vendors and service providers are able to analyze your data and communicate with you directly in a way that suits your interests and needs? The smart meters used by utility companies, the sensor devices used by insurance companies, and the web logs analyzed by retailers enable them to capture data at the point of occurrence in real time, then store and analyze that data to understand behavioral patterns and spot trends. These data are high in volume, are generated at high velocity, and come in a wide variety, and are therefore rightly termed big data.

We face decisions every day. Some are minor, such as paying bills, and some are major, such as buying a house, investing in stocks, developing a product, acquiring a company, or growing market share; those major decisions require relevant information in context. When we look back many years, to a time when there were no computers or ready access to data, we wonder how people made such major decisions in day-to-day life, in business, or even in administering nations.

In ancient times, rulers based their real-time decisions on intuitive and cognitive intelligence and on the advice of their council of ministers. They would walk the streets in disguise to listen to the conversations of citizens, gathering real-time feedback and sentiment in order to make effective decisions. In those days there was no support from technology, and all decisions rested on intuitive judgment. Our brains take in massive streams of sensory data and make the necessary correlations that allow us to make value judgments and decisions, all in real time.

In our current era, by contrast, we have the support of computing power, with additional memory and data processing available on demand in the cloud infrastructure (discussed in Chapter 3), to make real-time decisions. Recent technology such as big data analytics supplies the right information; real-time event messaging provides it at the right time, mobility delivers it at the right place, and social media supplies the right context for making the right decisions. With computing power combined with cognitive intelligence, we are in a much better position to make real-time decisions in a business context.

Big data is a major revolution of our time and will have a large impact on advanced analytics in the coming years. Big data is becoming relevant to all business cases and will help in gaining sustainable competitive advantage. As the technology platform matures rapidly, organizations need to give strategic importance to big data sources to gain insights and to offer products and services based on customer needs. Analytics and business intelligence (BI) based on new big data sources will give business decision makers greater predictability.

This chapter deals with big data concepts, background, and relevance across industry sectors, and offers some case studies to provide in-depth understanding and many examples of how an organization may deploy and implement big data analytics alongside its existing infrastructure.

Why Should You Care about Big Data?

Big data, one of the most talked-about information technology (IT) solutions, has emerged as a new technology paradigm for creating business agility and predictability by analyzing data coming from various sources. The term big data was coined in the 1970s to describe the large amounts of data generated by oceanographic and meteorological experiments. Big data can be understood as a natural evolution of database management techniques that has changed the way data is analyzed. Early implementations of big data solutions can be found in the 1980s, the era of the first generation of software-based parallel database architectures. However, big data was not implemented significantly until Internet usage matured and web search companies faced the challenge of indexing and querying large aggregations of loosely structured data. Existing database technology was not ideal for this challenging task, nor was it cost effective. Google developed the first wave of big data tools in the early 2000s, which gave birth to several other frameworks and techniques that make the handling, processing, and interpretation of large data sets more economical. By leveraging big data, companies can extract value and meaningful insights from voluminous data beyond what was previously possible using traditional analytical techniques. Big data also deals with the new phenomena of volume, velocity, and variety in the massive data coming from social media, web logs, and sensors, combined with transactional systems. Within these heaps of massive data lies a treasure trove of information that can be extracted to save us from major disasters or accidents and to proactively help businesses grow.

Big data is characterized primarily by large and rapidly growing data volumes, varied data structures, and new or newly intensified data analysis requirements. This enables us to deliver to our customers, in context, the right offer, message, recommendation, service, or action, tailored and personalized to deliver unequaled value. A multichannel customer experience platform for true cross-channel decisions enables consistent operational decisions on the web channel, in the contact center, at the point of sale, and across all lines of business, giving us the technology for cross-channel learning and decisions. The insights derived automatically from one channel are seamlessly used both within and across other channels.

A balanced decision management framework that combines both business rules and self-learning predictive models helps in real-time decisions. This also helps in arbitrating rules and predictive model scores in the context of organizational goals/key performance indicators (KPIs) at the moment of a decision's execution.

With the evolution of social media, we started seeing the emergence of nontraditional, less structured data such as web logs, social media feeds, e-mail, sensor readings, photographs, and YouTube videos that can be analyzed for useful information. With the reduction in the cost of both storage and computing power, it is now feasible to store and analyze this data for meaningful purposes. As a result, it is important for existing and new businesses alike to understand and evaluate the relevance of big data for their business intelligence and decision making. Closed-loop, real-time learning becomes immediately available for the next prediction, driving adaptive, high-value interactions. Important correlations in the data can be discovered and highlighted automatically by way of user-friendly reports, and automated data discovery leads users to relevant business insights.

Big data addresses all types of data coming from various data sources, such as enterprise application data, which generally includes data generated from enterprise resource planning (ERP) systems, customer information from customer relationship management (CRM) systems, supply chain management systems, e-commerce transactions, and human resources (HR) and payroll transactions. It also includes machine-generated, or semantic, data comprising call detail records (CDRs) from call centers, web logs, smart meters, manufacturing sensors, equipment logs, and trading systems data generated by machines and computer systems. Social media data, including customer feedback streams, microblogging sites like Twitter, and social media platforms like Facebook, add to big data and help in sentiment analysis.

There are four key characteristics—volume, velocity, variety, and value—that are commonly used to characterize different aspects of big data and are widely referred to at major conferences. The McKinsey Global Institute estimates that data volume is growing 40 percent per year and will grow 44 times by 2020.

What Do Key Characteristics Signal about Big Data?

Big data is characterized by sheer volume, high velocity, and wide variety, often with low value density. Major sources such as web logs, sensors, and social media generate new types of unstructured or semi-structured data, which have given rise to a new phenomenon in decision making.

Volume

Social media platforms (Facebook, Twitter, LinkedIn, Foursquare, YouTube, and many more), discussed in Chapter 7, generate a large volume of data that needs to be stored and analyzed rapidly, in context, for the right decision making. The volume of machine-generated or semantic web data is much larger than traditional data volumes. For instance, a single jet engine can generate 10 TB (terabytes) of data in 30 minutes. With more than 25,000 airline flights per day, the daily volume of just this single data source runs into the petabytes. Smart meters and heavy industrial equipment such as oil refineries and drilling rigs generate similar data volumes, compounding the problem. The benefit gained from the ability to process large amounts of information is the main attraction of big data analytics. This volume presents the most immediate challenge to conventional IT structures: it requires scalable storage and a distributed approach to querying.
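
To get a feel for the scale, the back-of-the-envelope calculation below multiplies out the figures quoted above; the average flight length and engine count are assumptions for illustration, not figures from the text.

    # Rough daily engine-telemetry volume from the figures quoted above.
    tb_per_engine_per_half_hour = 10      # from the text
    flights_per_day = 25_000              # from the text
    avg_flight_hours = 2                  # assumption for illustration
    engines_per_aircraft = 2              # assumption for illustration

    tb_per_flight = tb_per_engine_per_half_hour * (avg_flight_hours * 2) * engines_per_aircraft
    daily_pb = tb_per_flight * flights_per_day / 1_000
    print(f"~{daily_pb:,.0f} PB of engine telemetry per day")  # comfortably in the petabyte range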

Velocity

Data comes into the data management system rapidly and often requires quick analysis for decision making. The importance lies in the speed of the feedback loop, taking data from input through to analysis and decision making; the tighter the feedback loop, the greater the competitive advantage. It is this need for speed, particularly on the web, that has driven the development of key-value stores and columnar databases, optimized for the fast retrieval of precomputed information. These databases form part of an umbrella category known as NoSQL (not only SQL), used when relational models do not suffice (discussed in detail in the technology platforms section later in this chapter). Social media data streams bring a large input of opinions and relationships that are valuable to customer relationship management. Even at 140 characters per tweet, the high velocity of Twitter data ensures large volumes (over 8 TB per day). Much of the data received may be of low value, and analytical processing may be required to transform it into a usable form or to derive meaningful information.

Variety

Big data brings a wide variety of data types, from text on social networks, to image and video data, to raw feeds directly from sensor sources, to semantic web logs generated by machines. These data are not easily integrated into applications. A common use of big data processing is to take unstructured data and extract meaningful information for consumption either by humans or as structured input to an application. Big data carries patterns, sentiments, and behavioral information that need analysis.
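
As a small illustration of turning unstructured input into structured output, the sketch below pulls a few fields out of a raw web-server log line with a regular expression; the log line and field names are made up for the example.

    import re

    # Hypothetical raw web-server log line (unstructured text).
    raw = '203.0.113.7 - - [12/Mar/2014:10:15:32 +0000] "GET /products/item42 HTTP/1.1" 200 5123'

    # Extract a few structured fields so they can feed a downstream application.
    pattern = re.compile(
        r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
        r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) (?P<bytes>\d+)'
    )
    m = pattern.match(raw)
    if m:
        record = m.groupdict()
        record["status"] = int(record["status"])
        record["bytes"] = int(record["bytes"])
        print(record)  # {'ip': '203.0.113.7', 'ts': '12/Mar/2014:10:15:32 +0000', ...}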

Value

The economic value of different data varies significantly. Generally, there is good information hidden within a larger body of nontraditional data. Big data offers greater value to businesses in bringing real-time market and customer insights, enabling improvement in new products and services. Big data analytics can reveal insights such as peer influence among customers, revealed by analyzing shoppers' transactions and social and geographical data. The past decade's successful web start-ups are prime examples of big data used as an enabler of new products and services. For example, by combining a large number of signals from a user's actions and those of the user's friends, Facebook has been able to craft a highly personalized user experience and create a new kind of advertising business.

Does Size of Data Really Matter?

With the proliferation of cloud computing and commoditization of hardware, software, and storage, the growth in data has been explosive in the recent past. This exponential growth is primarily catalyzed by increased activity by digital devices and proliferation of the Internet. Massive volumes of data are generated by digital transactions between companies, machine-generated data (embedded sensors in industrial applications and automobiles), and consumer devices such as laptops, computers, and smartphones. The International Data Corporation (IDC) estimated that 1.8 zettabytes of information were created and replicated in 2011, the equivalent of 200 billion 60-minute high-definition (HD) movies that would take one person 47 million years to watch.

In the past decade, information generated grew at a 38 percent compound annual growth rate (CAGR) versus the world's storage capacity at a 23 percent CAGR. We believe the gap between information and storage will continue to widen, given increased growth in computational power (58 percent 10-year CAGR) as a result of computers, smartphones, and smart sensors that will drive information generation.

With storage on the cloud infrastructure getting cheaper and more affordable, businesses should be able to take advantage of mixing various data types coming from different data sources and analyze them to make effective decisions to manage their enterprises.

How Complex Is Big Data?

Data has traditionally been stored in a structured format, which makes archiving, querying, and analyzing easier. However, with the wide usage of various devices, data has become more unstructured. It is estimated that 80 percent of the world's data is unstructured (i.e., unable to conform to traditional relational database structures), which makes analysis and insights across multiple data sets very challenging. Given the pervasiveness of unstructured data, the growth in file-based storage (unstructured data) has outpaced block-based storage (62 percent five-year CAGR versus 24 percent). The sudden rise in usage of social media, machine-generated data, and smart devices has added complexity to managing big data and deriving greater business value from it. These recently emerged data sources may provide greater intelligence and predictability if we can capture, process, and analyze the data in real time or near real time.

  • Social media. Increased usage of social networking sites continues to drive storage requirements for unstructured data: More than 300 million photos are uploaded daily to Facebook; Zynga processes 1 petabyte of gaming content on a daily basis; 72 hours of video are uploaded to YouTube every minute; and Twitter receives nearly 250 million tweets daily.
  • Machine-to-machine (M2M). The increased deployment of M2M devices such as smart meters, telematics, radio frequency identification (RFID) devices, vehicle sensors, and industrial sensors with embedded networking has driven machine-generated data. Data generated from M2M devices is expected to grow at a 35 percent compound annual growth rate (CAGR) through 2015. According to the McKinsey Global Institute, M2M will create an economic impact of $2.7 trillion to $6.2 trillion annually by 2025, and the World Bank and General Electric point to a $32 trillion opportunity on the premise that a 1 percent improvement from integrating the industrial Internet into energy, transportation, health care, aviation, and other industries can generate savings of around $200 billion (www.netcommwireless.com/information/articles/m2m.-the-numbers-are-big-and-only-getting-bigger).
  • Mobility. The widespread adoption of mobile devices (smartphones, tablets, etc.) has placed the power of the Internet within the reach of a fingertip. The number of global smartphone users recently crossed the 1 billion mark (i.e., one in seven people owns a smartphone), thus driving the consumption, demand, and generation of mobile data. Mobile data traffic is estimated to grow at a 78 percent CAGR from 2011 to 2016, reaching 10.8 exabytes per month by 2016, according to Cisco.
  • Enterprise data. Adoption of enterprise software solutions and greater IT sophistication have increased the data exhaust generated by enterprises. Unstructured data continues to make up a greater proportion of enterprise data and is expected to represent 80 percent of total enterprise data by 2015, up from 64 percent in 2006. The torrent of unstructured enterprise data places an additional strain on corporate IT systems. In a survey conducted by Avanade, a business technology consulting and solutions provider, 55 percent of respondents reported a slowdown of IT systems and 47 percent cited data security issues resulting from the increased data exhaust (www.netcommwireless.com/information/articles/m2m.-the-numbers-are-big-and-only-getting-bigger).

With the evolution of the cloud deployment model, the majority of big data solutions are offered as software only, as an appliance, or as cloud-based offerings. As with other application deployments, big data deployment will depend on several issues such as data locality, privacy and regulation, human resources, and project requirements. Many organizations are opting for a hybrid solution, using on-demand cloud resources to supplement in-house deployments.

The highest value from big data is achieved by combining data from big data sources such as web logs, machine data, and social media with other transactional data within the business. Decision makers then get the big picture of their customers' behavior, patterns, and preferences.

Therefore, it is highly important that businesses combine their strategy on big data with their comprehensive data analytics strategy. In order to succeed and remain competitive, organizations need to plan for comprehensive data management and analytics.

How Does Big Data Coexist with Existing Traditional Data?

Big data on its own offers great insights for businesses, but it becomes more powerful when it is combined with an organization's existing transactional data and used for analytics.

Web logs or browsing history, for example, indicate a customer's buying patterns and, together with past purchase history, help determine the value of that customer.

Big data is messy and requires enormous effort in data cleansing and data quality. The phenomenon of big data is closely tied to the emergence of data science, a discipline that combines math, programming, and scientific instinct. Current data warehousing projects take a long time to offer meaningful analytics to business users.

They depend on extract, transform, and load (ETL) processes that pull from various data sources. Big data analytics, by contrast, can be defined as the process of parsing large data sets from multiple sources and producing information in real time or near real time.

Big data analytics represents a big opportunity. Many large businesses are exploring the analytics capabilities to parse web-based data sources and extract value from social media. However, an even larger opportunity, the Internet of Things (IoT), is emerging as a data source. Cisco Systems estimates that there are approximately 35 billion electronic devices that can connect to the Internet. Any electronic device can be connected to the Internet, and even automakers are building Internet connectivity into vehicles. Connected cars will become commonplace by 2014 and generate millions of transient data streams.

Operational efficiencies, coupled with developments in the technologies and services that make big data a practical reality, will result in a supercharged CAGR of 58 percent between now and 2017.

Big data is the new definitive source of competitive advantage across all industries.

How Big Is the Big Data Market?

The big data market is on the verge of rapid growth to the tune of $50 billion worldwide within the next five years. We already see increased interest in and awareness of the power of big data and related analytic capabilities to gain competitive advantage and to improve business agility.

Of the current market, big data pure-play vendors account for $310 million in revenue. Despite their relatively small share of current overall revenue (approximately 5 percent), these vendors, such as Vertica, Splunk, and Cloudera, are responsible for the vast majority of the new innovations and modern approaches to data management and analytics that have emerged over the past several years and made big data the hottest sector in IT.

Marketing and sales organizations are ready for the transformation that big data and predictive analytics bring. This approach is making existing businesses smarter and more efficient by focusing the right resources on customers and prospects that are ready to buy—crushing the competition.

How Would You Manage Big Data on Technology Platforms?

The recent cloud-based technologies and cloud operating environment (discussed in detail in Chapter 3) based on a scalable elastic model have allowed support for a new services deployment model that can be consumed globally from anywhere on any device. The cloud platform has enabled big data storage, processing, and analytics as well.

Let us examine big data tools, platforms, and applications that may offer predictive analytics capabilities to enable effective decision making for sustainable competitive advantage.

Big Data Tools, Platforms, and Applications

Cloud-based applications and services are increasingly allowing small and midsize businesses to take advantage of big data without needing to deploy on-premises hardware or software. Manufacturing companies deploy sensors in their products to return a stream of telemetry. The proliferation of smartphones and other global positioning system (GPS) devices offers advertisers an opportunity to target consumers when they are in close proximity to a store, coffee shop, or restaurant. This opens up new revenue for service providers and offers many businesses a chance to target new customers.

Use of social media and web log files from their e-commerce sites can help retailers understand their customers' buying patterns, behaviors, likes, and dislikes. This can enable much more effective micro customer segmentation and targeted marketing campaigns, as well as improve supply chain efficiencies.

As with data warehousing, web stores, or any IT platform, an infrastructure for big data has unique requirements. In considering all the components of a big data platform, it is important to easily integrate big data with enterprise data to conduct deep analytics on the combined data set.

In order to make the most meaningful use of big data, businesses must evolve their IT infrastructures to handle the rapid rate of delivery of extreme volumes of data, with varying data types, which can then be integrated with an organization's other enterprise data to be analyzed. When big data is captured, optimized, and analyzed in combination with traditional enterprise data, companies can develop a more thorough and insightful understanding of their business, which can lead to enhanced productivity, a stronger competitive position, and greater innovation to have an impact on the bottom line. For example, in the delivery of health care services, management of chronic or long-term conditions is expensive. Use of in-home monitoring devices to measure vital signs and monitor progress is just one way that sensor data can be used to improve patient health care and reduce both office visits and hospital admittance.

The requirements in a big data infrastructure involve data acquisition, data organization, and data analysis. Because big data refers to data streams of higher velocity and higher variety, the infrastructure required to support the acquisition of big data must deliver low, predictable latency in both capturing data and executing short, simple queries; be able to handle very high transaction volumes, often in a distributed environment; and support flexible, dynamic data structures.

In classic data warehousing terms, organizing data is called data integration. Because there is such a high volume of big data, there is a tendency to organize data at its original storage location, thus saving both time and money by not moving around unnecessarily large volumes of data. The infrastructure required for organizing big data must be able to process and manipulate data in the original storage location; support very high throughput (often in batches) to deal with large data processing steps; and handle a large variety of data formats, from unstructured to structured.

Has Hadoop Solved Big Data Problems?

Apache Hadoop is a technology that allows large data volumes to be organized and processed while keeping the data on the original storage cluster. The Hadoop Distributed File System (HDFS), for example, can serve as the long-term storage system for web logs. These web logs are turned into browsing behavior (sessions) by running MapReduce programs on the cluster and generating aggregated results on the same cluster, which are then loaded into a relational DBMS. Since data is not always moved during the organization phase, the analysis may also be done in a distributed environment, where some data stays where it was originally stored and is transparently accessed from a data warehouse. The infrastructure required for analyzing big data must be able to support deeper analytics, such as statistical analysis and data mining, on a wider variety of data types stored in diverse systems; scale to extreme data volumes; deliver faster response times driven by changes in behavior; and automate decisions based on analytical models. Most important, the infrastructure must be able to integrate analysis on the combination of big data and traditional enterprise data. New insight comes not just from analyzing new data, but from analyzing it within the context of the old to provide new perspectives on old problems. For example, analyzing inventory data from a smart vending machine in combination with the events calendar for the venue in which the vending machine is located will dictate the optimal product mix and replenishment schedule for that machine.
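
The sessionization step mentioned above can be illustrated with a minimal Hadoop Streaming sketch in Python. The log format, field names, and 30-minute inactivity window are assumptions for illustration, and a real job would also need a secondary sort so that each user's hits arrive at the reducer in time order.

    #!/usr/bin/env python3
    # Hypothetical Hadoop Streaming sessionization sketch.
    # Assumed input: "user_id epoch_seconds url" per raw log line.
    # The mapper keys each hit by user; the reducer closes a session after
    # 30 minutes of inactivity and emits: user, session start, last hit, hit count.
    import sys

    SESSION_GAP = 30 * 60  # seconds of inactivity that ends a session

    def mapper(lines):
        for line in lines:
            parts = line.split()
            if len(parts) == 3:
                user, ts, url = parts
                print(f"{user}\t{ts}\t{url}")

    def reducer(lines):
        # Assumes hits arrive grouped by user and time-ordered within each user.
        user, start, last, hits = None, None, None, 0
        def emit():
            if user is not None:
                print(f"{user}\t{start}\t{last}\t{hits}")
        for line in lines:
            u, ts, _url = line.rstrip("\n").split("\t")
            ts = int(float(ts))
            if u != user or (last is not None and ts - last > SESSION_GAP):
                emit()
                user, start, hits = u, ts, 0
            last = ts
            hits += 1
        emit()

    if __name__ == "__main__":
        # e.g. hadoop jar hadoop-streaming.jar -mapper "sessionize.py map" -reducer "sessionize.py reduce" ...
        (mapper if sys.argv[1] == "map" else reducer)(sys.stdin)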

Many new technologies have emerged to address the IT infrastructure requirements just outlined. These new systems have created a divided solutions spectrum comprising NoSQL solutions, which are developer-centric specialized systems, and SQL solutions, which are typically equated with the manageability, security, and trusted nature of relational database management systems (RDBMSs).

A few niche vendors are developing applications and platforms that leverage the underlying Hadoop infrastructure to provide both data scientists and business users with easy-to-use tools for experimenting with big data. These include Datameer, which has developed a Hadoop-based business intelligence platform with a familiar spreadsheet-like interface; Karmasphere, whose platform allows data scientists to perform ad hoc queries on Hadoop-based data via an SQL interface; and Digital Reasoning, whose Synthesis platform sits on top of Hadoop to analyze text-based communication.

Tresata's cloud-based platform, for example, leverages Hadoop to process and analyze large volumes of financial data and returns results via on-demand visualizations for banks, financial data companies, and other financial services companies.

Additionally, 1010data offers a cloud-based application that allows business users and analysts to manipulate data in the familiar spreadsheet format but at big data scale. And the ClickFox platform mines large volumes of customer touch-point data to map the total customer experience with visuals and analytics delivered on demand.

Non-Hadoop Big Data Platforms

Other non-Hadoop vendors contributing significant innovation to the big data landscape include Splunk, which specializes in processing and analyzing log file data to allow administrators to monitor IT infrastructure performance and identify bottlenecks and other disruptions to service. HPCC (High-Performance Computing Cluster) Systems, a spin-off of LexisNexis, offers a competing big data framework to Hadoop that its engineers built internally over the past 10 years to assist the company in processing and analyzing large volumes of data for its clients in finance, utilities, and government. DataStax offers a commercial version of the open source Apache Cassandra NoSQL database along with related support services bundled with Hadoop.

NoSQL databases are frequently used to acquire and store big data. They are well suited for dynamic data structures and are highly scalable. The data stored in a NoSQL database is typically highly varied because these systems are intended simply to capture all data without categorizing and parsing it. For example, NoSQL databases are often used to collect and store social media data. While customer-facing applications frequently change, the underlying storage structures are kept simple.

Instead of designing a schema with relationships between entities, these simple structures often just contain a major key to identify the data point and then a content container holding the relevant data. This simple and dynamic structure allows changes to take place without costly reorganizations at the storage layer.

NoSQL systems are designed to capture all data without categorizing and parsing it upon entry into the system, and therefore the data is highly varied. SQL systems, however, typically place data in well-defined structures and impose metadata on the data captured to ensure consistency and validate data types.

Distributed file systems and transaction (key-value) stores are primarily used to capture data and are generally in line with the requirements discussed earlier in this chapter. To interpret and distill information from the data in these solutions, a programming paradigm called MapReduce is used. MapReduce programs are custom-written programs that run in parallel on the distributed data nodes.

The key-value stores, or NoSQL databases, are the online transaction processing (OLTP) databases of the big data world; they are optimized for very fast data capture and simple query patterns. NoSQL databases provide very fast performance because captured data is quickly stored with a single identifying key rather than being interpreted and cast into a schema. By doing so, a NoSQL database can rapidly store large numbers of transactions.
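
The toy sketch below mimics that capture pattern with an in-process Python dictionary standing in for a key-value store; the keys and record shapes are invented for illustration. Writes store an opaque blob under a single key, and interpretation is deferred until the data is read back.

    import json
    import time

    # Toy in-process key-value "store": a dict standing in for a NoSQL database.
    store = {}

    def put(key, value):
        store[key] = json.dumps(value)   # capture the record as an opaque blob, no schema imposed

    def get(key):
        return json.loads(store[key])    # interpretation happens only when the data is read

    # Records with different shapes can live side by side under simple keys.
    put("user:1001:click:1", {"ts": time.time(), "page": "/home"})
    put("user:1001:profile", {"name": "Alice", "segments": ["loyal", "mobile"]})

    print(get("user:1001:profile")["segments"])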

However, due to the changing nature of the data in a NoSQL database, any data organization effort requires programming to interpret the storage logic used. This, combined with the lack of support for complex query patterns, makes it difficult for end users to distill value out of data in a NoSQL database.

To get the most from NoSQL solutions and turn them from specialized, developer-centric solutions into solutions for the enterprise, they must be combined with SQL solutions into a single proven infrastructure that meets the manageability and security requirements of today's enterprises.

How Does Oracle Address Big Data Challenges?

Oracle's big data strategy centers on extending the current enterprise data architecture to incorporate big data and deliver business value, leveraging the proven reliability, flexibility, and performance of existing systems to address evolving big data requirements.

Oracle addresses the big data challenge with engineered systems that integrate software and hardware. The Oracle Big Data Appliance is an engineered system that combines optimized hardware with a comprehensive software stack, featuring specialized solutions developed by Oracle, to deliver a complete, easy-to-deploy solution for acquiring, organizing, and loading big data into Oracle Database 11g. It is designed to deliver extreme analytics on all data types, with enterprise-class performance, availability, supportability, and security. With Big Data Connectors, the solution is tightly integrated with Oracle Exadata and Oracle Database, so you can analyze all your data together with extreme performance.

Oracle Big Data Appliance

Oracle Big Data Appliance comes in a full rack configuration with 18 Sun servers for a total storage capacity of 648 TB. Every server in the rack has two CPUs, each with six cores for a total of 216 cores per full rack. Each server has 48 GB of memory for a total of 864 GB of memory per full rack.

Oracle Big Data Appliance includes a combination of open source software and specialized software developed by Oracle to address enterprise big data requirements.

Big Data Appliance contains Cloudera's Distribution including Apache Hadoop (CDH) and Cloudera Manager. CDH is the leading Apache Hadoop-based distribution in commercial and noncommercial environments. CDH consists of 100 percent open source Apache Hadoop plus the comprehensive set of open source software components needed to use Hadoop. Cloudera Manager is an end-to-end management application for CDH. Cloudera Manager gives a clusterwide, real-time view of nodes and services running; provides a single, central place to enact configuration changes across the cluster; and incorporates a full range of reporting and diagnostic tools to help optimize cluster performance and utilization.

Where Oracle Big Data Appliance makes it easy for organizations to acquire and organize new types of data, Oracle Big Data Connectors enables an integrated data set for analyzing all data. Oracle Big Data Connectors can be installed on an Oracle Big Data Appliance or on a generic Hadoop cluster.

Oracle Loader for Hadoop (OLH) enables users to use Hadoop MapReduce processing to create optimized data sets for efficient loading and analysis in Oracle Database 11g. Unlike other Hadoop loaders, it generates Oracle internal formats to load data faster and use fewer database system resources. OLH is added as the last step in the MapReduce transformations as a separate map–partition–reduce step. This last step uses the CPUs in the Hadoop cluster to format the data into Oracle-understood formats, allowing for a lower CPU load on the Oracle cluster and higher data ingest rates because the data is already formatted for Oracle Database. Once loaded, the data is permanently available in the database, providing very fast access to this data for general database users leveraging SQL or business intelligence tools.

Oracle Direct Connector for Hadoop Distributed File System (HDFS) is a high-speed connector for accessing data on HDFS directly from Oracle Database. Oracle Direct Connector for HDFS gives users the flexibility of querying data from HDFS at any time, as needed by their application. It allows the creation of an external table in Oracle Database, enabling direct SQL access on data stored in HDFS. The data stored in HDFS can then be queried via SQL, joined with data stored in Oracle Database, or loaded into the Oracle Database. Access to the data on HDFS is optimized for fast data movement and parallelized, with automatic load balancing. Data on HDFS can be in delimited files or in Oracle data pump files created by Oracle Loader for Hadoop.

Oracle Data Integrator Application Adapter for Hadoop simplifies data integration from Hadoop and an Oracle Database through Oracle Data Integrator's easy-to-use interface. Once the data is accessible in the database, end users can use SQL and Oracle BI Enterprise Edition to access data. Even enterprises that are already using a Hadoop solution, and don't need an integrated offering like Oracle Big Data Appliance, can integrate data from HDFS using Big Data Connectors as a stand-alone software solution.

Oracle R Connector for Hadoop is an R package that provides transparent access to Hadoop and to data stored in HDFS. R Connector for Hadoop gives users of the open source statistical environment R the ability to analyze data stored in HDFS and to run R models at scale against large volumes of data, leveraging MapReduce processing, without requiring R users to learn yet another API or language. End users can leverage over 3,500 open source R packages to analyze data stored in HDFS, while administrators do not need to learn R to schedule R MapReduce models in production environments. R Connector for Hadoop can optionally be used together with the Oracle Advanced Analytics Option for Oracle Database. The Oracle Advanced Analytics Option enables R users to work transparently with database-resident data without having to learn SQL or database concepts, with R computations executing directly in the database.

Oracle NoSQL Database is a distributed, highly scalable, key-value database based on Oracle Berkeley DB. It delivers a general-purpose, enterprise-class key-value store adding an intelligent driver on top of distributed Berkeley DB. This intelligent driver keeps track of the underlying storage topology, shards the data, and knows where data can be placed with the lowest latency. Unlike competitive solutions, Oracle NoSQL Database is easy to install, configure, and manage; it supports a broad set of workloads, and delivers enterprise-class reliability backed by enterprise-class Oracle support.

The primary use cases for Oracle NoSQL Database are low latency data capture and fast querying of that data, typically by key lookup. Oracle NoSQL Database comes with an easy-to-use Java API and a management framework.

The product is available in both an open source community edition and in a priced enterprise edition for large distributed data centers. The former version is installed as part of the Big Data Appliance integrated software.

In-Database Analytics

Once data has been loaded from Oracle Big Data Appliance into Oracle Database or Oracle Exadata, end users can use one of the following easy-to-use tools for in-database, advanced analytics:

  • Oracle R Enterprise. Oracle's version of the widely used Project R statistical environment enables statisticians to use R on very large data sets without any modifications to the end user experience. Examples of R usage include predicting airline delays at a particular airport and the submission of clinical trial analysis and results.
  • In-database data mining is the ability to create complex models and deploy these on very large data volumes to drive predictive analytics. End users can leverage the results of these predictive models in their BI tools without the need to know how to build the models. For example, regression models can be used to predict customer age based on purchasing behavior and demographic data.
  • In-database text mining is the ability to mine text from microblogs, CRM system comment fields, and review sites, combining Oracle Text and Oracle Data Mining. An example of text mining is sentiment analysis based on comments; sentiment analysis tries to show how customers feel about certain companies, products, or activities (a minimal sketch follows this list).
  • In-database semantic analysis is the ability to create graphs and connections between various data points and data sets. Semantic analysis creates, for example, networks of relationships determining the value of a customer's circle of friends. When looking at customer churn, customer value is based on the value of his or her network, rather than on just the value of the customer.
  • In-database spatial is the ability to add a spatial dimension to data and show data plotted on a map. This ability enables end users to understand geospatial relationships and trends much more efficiently. For example, spatial data can visualize a network of people and their geographical proximity. Customers who are in close proximity can readily influence each other's purchasing behavior, an opportunity that can be easily missed if spatial visualization is left out.
  • In-database MapReduce is the ability to write procedural logic and seamlessly leverage Oracle Database parallel execution. In-database MapReduce allows data scientists to create high-performance routines with complex logic. In-database MapReduce can be exposed via SQL. Examples of leveraging in-database MapReduce are sessionization of web logs or organization of call details records (CDRs).
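
To make the sentiment-analysis idea concrete, here is a deliberately simple, lexicon-based sketch in Python. It is illustrative only and does not represent Oracle Text or Oracle Data Mining; the word lists and comments are made up.

    # Minimal lexicon-based sentiment sketch (illustrative only).
    POSITIVE = {"great", "love", "excellent", "fast", "helpful"}
    NEGATIVE = {"slow", "broken", "poor", "rude", "refund"}

    def sentiment_score(comment):
        # Positive words add one, negative words subtract one.
        words = comment.lower().replace(",", " ").split()
        return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

    comments = [
        "Excellent service, the agent was fast and helpful",
        "Delivery was slow and the item arrived broken, I want a refund",
    ]
    for c in comments:
        print(sentiment_score(c), c)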

Every one of the analytical components in Oracle Database is valuable. Combining these components creates even more value to the business. Leveraging SQL or a BI tool to expose the results of these analytics to end users gives an organization an edge over others that do not leverage the full potential of analytics in Oracle Database. Connections between Oracle Big Data Appliance and Oracle Exadata are via InfiniBand, enabling high-speed data transfer for batch or query workloads. Oracle Exadata provides outstanding performance in hosting data warehouses and transaction processing databases. Now that the data is in mass-consumption format, Oracle Exalytics can be used to deliver the wealth of information to the business analyst.

Predictive Analytics

Predictive analytics is an area of data mining that deals with extracting information from data and using it to predict trends and behavior patterns. It offers the ability to analyze and understand behavior that may lead to future actions. In the current business environment, it is extremely important to implement predictive modeling, to score data with predictive models, and to forecast future actions. Predictive analytics is business intelligence technology that produces a predictive score for each customer or other organizational element. Assigning these predictive scores is the job of a predictive model that has, in turn, been trained over data, learning from the experience of the organization. Predictive analytics optimizes marketing campaigns and website behavior to increase customer responses, conversions, and clicks, and to decrease churn. Each customer's predictive score informs the actions to be taken with that customer.
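
As a minimal sketch of that scoring loop, the Python example below trains a logistic regression on synthetic customer features and turns each predicted churn probability into a suggested action. The features, labels, and 0.6 threshold are invented for illustration and assume scikit-learn is available.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)

    # Hypothetical training data: two behavioral features per customer
    # (say, visits last month and days since last purchase) and a churn label.
    X_train = rng.normal(size=(500, 2))
    y_train = (X_train[:, 1] - X_train[:, 0] + rng.normal(scale=0.5, size=500) > 0).astype(int)

    model = LogisticRegression().fit(X_train, y_train)

    # Score new customers: the predicted probability is the "predictive score"
    # that informs the next action (here, a retention offer above a threshold).
    X_new = rng.normal(size=(5, 2))
    scores = model.predict_proba(X_new)[:, 1]
    for customer_id, score in enumerate(scores, start=1):
        action = "retention offer" if score > 0.6 else "no action"
        print(f"customer {customer_id}: churn score {score:.2f} -> {action}")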

Analyzing new and diverse digital data streams can reveal new sources of economic value, provide fresh insights into customer behavior, and identify market trends early on. But this influx of new data creates challenges for IT departments. To derive real business value from big data, businesses need the right tools to capture and organize a wide variety of data types from different sources and to analyze that data easily within the context of all their enterprise data. In this competitive environment, businesses will need a healthy data-science culture to create business agility, stay competitive, and survive. Data scientists will play a major role in providing C-level decision makers with the right information, based on new big data analytics, in the context of the business.

In-Memory Analytics

In-memory analytics is the new revolution in data management, offering greater power to run a data-driven business with more agility and precision. It is a methodology used to solve complex and time-sensitive business scenarios by increasing the speed, performance, and reliability of data queries. In-memory analytics is an approach to querying data while it resides in a computer's random-access memory (RAM), as opposed to querying data stored on disk in databases. The software platform is optimized for distributed, in-memory processing to run new scenarios or complex analytical computations at a faster pace. Businesses can now instantly explore, visualize, and analyze data and tackle problems that were never considered before due to computing constraints.

In-memory analytics can provide fast access to deeper insights to seize opportunities and mitigate threats in near real time; run more sophisticated queries and models across all data to generate more precise insights that improve business performance; and answer the most difficult business questions quickly, with the speed and flexibility to meet business needs today and in the future.

SAP introduced HANA a few years ago to exploit the in-memory processing capabilities of the database, handling large data workloads to provide data processing and analytics in real time and help businesses make the right decisions based on critical information. It converges database and application platform capabilities in memory, bringing together transactions, analytics, text analysis, and predictive and spatial processing to enrich decision power in business.

Oracle has also introduced an in-memory option that switches the database to in-memory processing, exploiting memory and caching technology.

Spark cluster computing is yet another revolution that exploits in-memory computing in clusters handling large data sets, and it works alongside the Hadoop ecosystem. Spark can keep data in the memory of the thousands of servers it pulls together, unlike Hadoop's MapReduce, which writes its working data to hard disks. It not only fetches data fast but also provides scale-out deployment on demand across the large number of nodes in the cluster environment.
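
A minimal PySpark sketch of this caching behavior appears below. The HDFS path and column names are assumptions for illustration; the point is that cache() keeps the working set in cluster memory, so repeated aggregations avoid rereading from disk.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("weblog-cache-sketch").getOrCreate()

    # Hypothetical parsed web log with columns: user_id, url, ts.
    logs = spark.read.csv("hdfs:///data/weblogs.csv", header=True, inferSchema=True)

    # Persist the working set in cluster memory; the first action materializes the cache.
    logs.cache()
    print(logs.count())

    # Subsequent queries reuse the in-memory data instead of rereading from disk.
    top_pages = (logs.groupBy("url")
                     .agg(F.count("*").alias("hits"))
                     .orderBy(F.desc("hits"))
                     .limit(10))
    top_pages.show()

    spark.stop()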

The Spark cluster computing framework grew out of the work of two Romanian-born researchers: Matei Zaharia, a graduate student who spent several years at Berkeley's AMPLab, a research operation dedicated to distributed computing software, and Berkeley professor Ion Stoica.

In the next chapter, we discuss the predictability of your business that drives key decisions and the business wisdom associated with it. Knowledge is power, so it is extremely important for businesses to cultivate a knowledge ecosystem for survival, sustenance, continued growth, and business agility. We highlight the key elements of building knowledge ecosystems and a knowledge management process that will help a business harness that power for continued growth and leadership.
