Building real-time applications with Amazon Kinesis Analytics

Building a real-time application is typically a three-step process:

  1. Connect to the streaming source: Streaming data sources include Amazon Kinesis Firehose and Amazon Kinesis Streams. Supported input formats include JSON, CSV, variable-column, and unstructured text. Each input has a schema that can be inferred automatically, and you can also edit it manually. Review and test the inferred input schema carefully; you might need to update it by hand to handle nested JSON more than two levels deep.
  2. Write the SQL code: Build streaming applications with SQL statements. Kinesis Analytics provides robust SQL support and advanced analytic functions out of the box, along with extensions to the SQL standard that work with streaming data. It has built-in support for at-least-once processing semantics. Best practices include avoiding time-based windows longer than one hour and using several smaller SQL queries, connected through multiple in-application streams, rather than a single large query.
  3. Continuously deliver SQL results: You can send processed data to multiple destinations: Amazon S3, Amazon Redshift, and Amazon Elasticsearch Service (all through Firehose), and Amazon Kinesis Streams (with AWS Lambda integration for custom destinations). End-to-end processing latency is in the sub-second range, depending on the query.
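
To make the time-based windows mentioned in step 2 more concrete, the following is a minimal Python sketch of what a tumbling-window count computes. It is only a local simulation, not Kinesis Analytics SQL; the function name tumbling_window_counts and the sample events are made up for illustration.

```python
from collections import Counter, defaultdict


def tumbling_window_counts(events, window_seconds):
    """Count events per key in fixed, non-overlapping time windows.

    events is an iterable of (epoch_seconds, key) pairs. The result maps
    each window start time to a {key: count} dictionary.
    """
    windows = defaultdict(Counter)
    for timestamp, key in events:
        # Every event belongs to exactly one window: the one whose start
        # is the timestamp rounded down to a multiple of the window size.
        window_start = timestamp - (timestamp % window_seconds)
        windows[window_start][key] += 1
    return {start: dict(counts) for start, counts in sorted(windows.items())}


# Hypothetical sample data: (seconds since start, ticker symbol).
events = [(0, "AAPL"), (12, "AMZN"), (59, "AAPL"), (60, "AAPL"), (119, "AMZN")]
result = tumbling_window_counts(events, 60)
```

With a 60-second window, the first three events fall into the window starting at 0 and the last two into the window starting at 60. A longer window means more state has to be held before a result can be emitted, which is one way to read the advice to keep windows under an hour.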

You can also reference data sources (S3) for data enrichment. As a best practice, limit the number of applications reading from the same source to avoid exceeding its provisioned throughput. For example, for an Amazon Kinesis Streams source, limit the total to two applications, and for an Amazon Kinesis Firehose source, limit it to a single application.
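
The reasoning behind that limit can be sketched with simple arithmetic. The example below assumes the standard per-shard read limits of Amazon Kinesis Streams for shared-throughput consumers (5 read transactions per second and 2 MB per second per shard) and, as a simplification, assumes that the applications split the budget evenly. The function name per_application_budget is hypothetical.

```python
# Per-shard read limits for shared-throughput consumers of Kinesis Streams.
# These are assumptions taken from the service limits; verify them for
# your account and region.
SHARD_READS_PER_SECOND = 5
SHARD_READ_MB_PER_SECOND = 2.0


def per_application_budget(num_applications):
    """Return the read budget each application gets from a single shard,
    assuming all applications share the shard's limits evenly."""
    if num_applications < 1:
        raise ValueError("num_applications must be at least 1")
    return {
        "reads_per_second": SHARD_READS_PER_SECOND / num_applications,
        "mb_per_second": SHARD_READ_MB_PER_SECOND / num_applications,
    }


budget_for_two = per_application_budget(2)
budget_for_five = per_application_budget(5)
```

Under these assumptions, two applications each get 2.5 reads and 1 MB per second per shard, while five applications are down to a single read per second each, which leaves little headroom before reads are throttled.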

You can set up CloudWatch alarms to track how far the application has fallen behind the source and raise alarms accordingly. You can also increase input parallelism to improve performance. For example, if the application is not keeping up with the input stream, consider increasing input parallelism to create multiple source in-application streams.
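
As a rough illustration of such an alarm, the sketch below builds the keyword arguments for the CloudWatch put_metric_alarm call. It assumes the lag is reported as the MillisBehindLatest metric in the AWS/KinesisAnalytics namespace. The dimension names and values, the alarm name, and the threshold are illustrative assumptions, so check them against the metrics your application actually publishes.

```python
def build_lag_alarm(application_name, threshold_ms, evaluation_periods=5):
    """Build keyword arguments for CloudWatch put_metric_alarm that fire
    when the application stays too far behind its source."""
    return {
        "AlarmName": f"{application_name}-falling-behind",  # hypothetical name
        "Namespace": "AWS/KinesisAnalytics",
        "MetricName": "MillisBehindLatest",
        # Assumed dimensions; confirm them in the CloudWatch console.
        "Dimensions": [
            {"Name": "Application", "Value": application_name},
            {"Name": "Flow", "Value": "Input"},
            {"Name": "Id", "Value": "1.1"},
        ],
        "Statistic": "Maximum",
        "Period": 60,  # seconds per evaluation period
        "EvaluationPeriods": evaluation_periods,
        "Threshold": float(threshold_ms),
        "ComparisonOperator": "GreaterThanThreshold",
    }


# Alarm if the application is more than one minute behind for five minutes.
alarm = build_lag_alarm("demo-app", threshold_ms=60_000)
```

If boto3 is installed and credentials are configured, the dictionary could be passed along as boto3.client("cloudwatch").put_metric_alarm(**alarm). If the alarm fires repeatedly, that is a signal to consider raising input parallelism as described above.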
