Introducing big data applications

Big data applications require the ability to ingest and store massive amounts of structured and unstructured data. We also need to be able to access and process the data flexibly and securely. Additionally, we would like to future-proof our big data solution (both design and implementation) against rapidly evolving business use cases and technology.

There are three typical types of data-driven development:

  1. Historical analysis and reporting supported by using services such as Amazon Redshift, Amazon RDS, Amazon S3, and Amazon EMR
  2. Real-time processing and dashboards supported by using services such as Amazon Kinesis, Amazon EC2, and AWS Lambda
  3. Intelligent applications supported by using services such as AWS Deep Learning AMIs, Amazon Machine Learning, and Amazon SageMaker

Traditionally, batch processing has been used to process massive volumes of data, for example, processing hourly server logs, generating weekly or monthly bills, analyzing daily website clickstreams, and producing daily fraud reports. As machine learning applications have become mainstream, such jobs have also come to include training machine learning models. Recently, however, there has been a significant shift toward real-time streaming applications.

Organizations want to use streaming data because incoming data loses value over time. They want to ingest data as it is generated and analyze it in real time to get insights immediately. Examples of real-time data include events from mobile apps, web clickstreams, application logs, and IoT sensors. Stream processing may include the computation of real-time metrics, real-time spending alerts and caps, real-time clickstream analysis, and real-time fraud detection. For example, applications such as web analytics and leaderboards ingest web application data, compute the top 10 users, and persist the results to feed live apps. The continuous stream of data is typically processed over moving time windows or over a fixed number of events. Similarly, in IoT applications, sensor data is ingested, metrics such as average temperature are computed every 10 seconds, and the resulting time series is then persisted to a serving database.
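The windowed computation described above can be illustrated with a minimal sketch. It assumes events arrive as (timestamp in seconds, temperature) pairs and uses fixed, non-overlapping 10-second windows, which is a simplification of the moving windows mentioned in the text. The function name `tumbling_window_averages` and the sample readings are hypothetical, and a real stream processor would emit each result as its window closes rather than working over a finished list.

```python
from collections import defaultdict


def tumbling_window_averages(events, window_seconds=10):
    """Average (timestamp, value) events within fixed, non-overlapping windows.

    Returns a dict mapping each window's start time to the mean value
    of the events that fell inside it.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for timestamp, value in events:
        # Align the event to the start of the window that contains it.
        window_start = int(timestamp // window_seconds) * window_seconds
        sums[window_start] += value
        counts[window_start] += 1
    return {start: sums[start] / counts[start] for start in sorted(sums)}


# Hypothetical sensor readings: (seconds since start, temperature).
readings = [(0.5, 20.0), (4.0, 22.0), (9.9, 24.0), (10.0, 30.0), (15.2, 32.0)]
averages = tumbling_window_averages(readings)
for window_start, mean_temperature in averages.items():
    print(window_start, mean_temperature)
```

Each key in `averages` is a window start time, so the result is already in the time-series shape that would be written to a serving database.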

The main components of a streaming application include:

  • Data producer: This continuously creates data and continuously writes the data to a stream.
  • Streaming service: This durably stores data and provides a temporary buffer for data preparation and pre-processing. This service needs to support very high throughput.
  • Data consumer: This continuously processes the data and also cleans, prepares, and aggregates incoming data.
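As a rough illustration of how these three components relate, the sketch below wires a producer and a consumer together through an in-memory queue. This is only a stand-in: Python's `queue.Queue` is neither durable nor high-throughput in the sense a real streaming service would be, and the event fields (`user`, `clicks`), the `SENTINEL` end-of-stream marker, and the function names are all assumptions made for the example.

```python
import queue
import threading

SENTINEL = None  # hypothetical end-of-stream marker for this sketch


def producer(stream, events):
    """Data producer: writes each event to the stream as it is created."""
    for event in events:
        stream.put(event)
    stream.put(SENTINEL)


def consumer(stream, results):
    """Data consumer: cleans, prepares, and aggregates incoming events."""
    while True:
        event = stream.get()
        if event is SENTINEL:
            break
        if event.get("user") is None:
            continue  # clean: drop malformed events
        user = event["user"].strip().lower()  # prepare: normalize the key
        results[user] = results.get(user, 0) + event.get("clicks", 0)  # aggregate


# The bounded queue plays the role of the streaming service's temporary buffer.
stream = queue.Queue(maxsize=100)
results = {}
events = [
    {"user": " Alice ", "clicks": 2},
    {"user": None, "clicks": 5},
    {"user": "alice", "clicks": 3},
    {"user": "Bob", "clicks": 1},
]

producer_thread = threading.Thread(target=producer, args=(stream, events))
consumer_thread = threading.Thread(target=consumer, args=(stream, results))
producer_thread.start()
consumer_thread.start()
producer_thread.join()
consumer_thread.join()
print(results)
```

Because the producer and consumer only share the queue, either side could be replaced or scaled independently, which is the main reason for placing a streaming service between them.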

Real-time analytics applications are built from components that ingest, transform, analyze, react to, and persist event data. Such applications need to be durable, continuous, fast, reactive, available, and reliable.

There are three common patterns for streaming applications:

  • Streaming ingest-transform-load: This delivers data to analytical tools faster and more cheaply.
  • Continuous metric generation: This computes analytics as the data is generated.
  • Actionable insights: This reacts to the insights derived from the analytics.
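The last two patterns can be sketched together using the spending-cap scenario mentioned earlier: a running total per user is the continuously generated metric, and an alert raised when that total crosses a cap is the reaction. This is a minimal sketch under stated assumptions; the function name `spending_alerts`, the (user, amount) event shape, and the cap value are hypothetical.

```python
def spending_alerts(purchases, cap):
    """Yield (user, running_total) the first time a user's total exceeds cap.

    The running total is updated as each purchase event arrives
    (continuous metric generation), and an alert is emitted as soon as
    the cap is crossed (actionable insight).
    """
    totals = {}
    alerted = set()
    for user, amount in purchases:
        totals[user] = totals.get(user, 0.0) + amount
        if totals[user] > cap and user not in alerted:
            alerted.add(user)  # alert once per user, not on every later event
            yield user, totals[user]


# Hypothetical purchase events: (user id, amount spent).
purchases = [("u1", 40.0), ("u2", 10.0), ("u1", 70.0), ("u1", 5.0), ("u2", 95.0)]
alerts = list(spending_alerts(purchases, cap=100.0))
for user, total in alerts:
    print(f"{user} exceeded the cap with a total of {total}")
```

Writing the function as a generator means it can consume an unbounded iterator of events and emit alerts as they occur, rather than waiting for the input to end.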

The next wave of business applications includes predictive analytics applications. Predictive analytics is important because companies have been accumulating big data about customers, products and services, and operations for many years, and big data technologies now provide proven solutions for storing and processing this data. Companies are feeling increasing pressure to turn that data into insights: to identify trends, classify, detect anomalies, and provide feedback loops that improve their businesses. There is a strong desire to evolve from backward-looking monthly or quarterly reports to real-time alerts, and now to predictions about the future.

Enterprises want to answer questions related to customers, products, and business operations. For instance, customer predictions include: Which customers are likely to be the most profitable? How much revenue should I expect this customer to generate? Which customers are likely to churn? Among all of our customers, which are likely to respond to a given offer? What’s the probability that a given customer will respond to a given offer?

Similarly, product or service predictions typically include: What products should we offer or develop? What items are likely to be purchased together? Business operations predictions could include questions such as: Are the metrics for a service nominal or anomalous? Is a specific piece of equipment likely to fail within a given time period?
