
5. Real-Time Processing Using Azure Stream Analytics

Bob Familiar, Sudbury, Massachusetts, USA

Jeff Barnes, Miami, Florida, USA

This chapter examines the use of Microsoft Azure Streaming Analytics to create jobs that process incoming data streams from various sensors, perform data transformations and enrichment, and, finally, provide output results in various data formats.

It has been said that the cloud represents a once-in-a-generation technology transformation. Certainly, one of the cornerstones of this key transformation is the ability to efficiently ingest, process, and report on massive amounts of data at scale.

The Lambda Architecture

In today's modern data analytics, a new stream processing strategy has been proposed: the "Lambda" architecture, widely attributed to Nathan Marz, the creator of Apache Storm.

The essence of the Lambda architecture is that it is designed to ingest massive quantities of incoming data by taking advantage of both batch and stream processing methodologies. Additional attributes of the Lambda architecture include the following:

  • Ability to process a vast array of workloads and scenarios.

  • High throughput characterized by low-latency reads with frequent writes and updates.

  • Retaining the incoming data in the original format. This is the notion of a "data lake".

  • Modeling data transformations as a series of materialized stages from the original data.

  • Highly scalable, nearly linear, scale-out infrastructure to provide scale up/down.

Figure 5-1 depicts the Lambda architecture.

Figure 5-1. The Lambda architecture

By merging batch and stream processing in the same architecture, the result is an optimized data analytics engine that is capable of not just processing the data, but also of delivering the right data at the right time.

It should be noted that the just-in-time (JIT) data that can now be surfaced with streaming analytics (via the Lambda architecture) often becomes a source of competitive advantage for a business or enterprise that knows how to exploit this type of information across a wide array of use cases.

The real magic comes in knowing how to recognize the hidden opportunities buried deep inside the data and, from there, taking action to explore, develop, fail fast, and finally succeed in effecting true transformational changes in a business or industry. This is the essence of this book and the guiding light for our own reference implementation.

Microsoft has recognized the need for streaming analytics at scale as part of today's modern Internet of Things (IoT) solutions and has incorporated streaming capabilities into several popular architectures, such as Cortana Analytics, along with the IoT Suite offerings shown in Figure 5-2.

Figure 5-2. Azure IoT Suite architecture

What Is Streaming Analytics?

One of the real advantages of a great streaming analytics engine is the ability to provide real-time analytics and outputs so that data becomes actionable with a minimum of delay.

To provide context for the challenges involved in developing a streaming analytics solution, the best canonical example is the scenario of counting the number of red cars in a parking lot versus counting the number of red cars that pass by on a major freeway (assuming it's not rush hour) in every 10-minute interval over a one-hour period (see Figure 5-3).

Figure 5-3. Challenges of streaming analytics

The essence of this scenario is “data in motion” versus “data at rest”. The key difference here is the element of “time” and the ability to capture and analyze periodic “slices” of data across potentially millions of events in order to detect patterns and anomalies in the massive amounts of streaming data.

Real-Time Analytics

Real-time analytics is all about the ability to process data coming from literally millions of connected devices or applications, with the inherent ability to ingest and process potentially millions of events per second. A key component of this scenario is integration with a highly scalable publish/subscribe pattern. Another key requirement is simplified processing capabilities on continuous streams of data that allow a solution to transform, augment, correlate, and perform temporal (time-based) operations.

Correlating streaming data with reference data is also a core requirement in many cases, as the incoming data often needs to be matched with a corresponding host record.

Streaming Implementations and Time-Series Analysis

Today's modern businesses demand analytics in real time to obtain a competitive advantage; they have moved beyond the "old school" method of hourly, daily, weekly, and monthly reporting cadences and now rely on data that is only seconds old.

In order to make streaming analytics and reporting all the more relevant, support for time-window calculations becomes imperative. To this end, a set of time-series windowing constructs is beneficial, such as:

  • Tumbling Windows: A series of fixed-size, non-overlapping, and contiguous time intervals. The diagram in Figure 5-4 illustrates a stream with a series of events and how they are mapped into ten-second tumbling windows.

    Figure 5-4. An example of tumbling windows in streaming analytics
  • Hopping Windows: Model scheduled overlapping windows (see the sketch following this list). A hopping window specification consists of these parameters:

    • A time unit.

    • A window size: how long each window lasts.

    • A hop size: how far each window moves forward relative to the previous one.

    • An optional offset size.

    The illustration in Figure 5-5 shows a stream with a series of events. Each box represents a hopping window and the events that are counted as part of that window, assuming that the hop is 5, and the size is 10.

    Figure 5-5. An example of hopping windows in streaming analytics
  • Sliding Windows: When using a sliding window, the system is asked to logically consider all possible windows of a given length and output events only for those points in time when the content of the window actually changes; in other words, when an event enters or exits the window. Figure 5-6 illustrates a sliding window.

    Figure 5-6. Sample sliding window in streaming analytics
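To make the window parameters concrete, here is a brief sketch, ahead of the full treatment later in this chapter, of how a hopping window is expressed in the Azure Stream Analytics SQL dialect. The input name SensorStream and its columns are illustrative assumptions only, not part of any particular solution:

-- A minimal sketch, assuming an input named SensorStream with an
-- EventTime timestamp column. Every 5 seconds (the hop size), count
-- the events observed over the last 10 seconds (the window size).
SELECT COUNT(*) AS EventCount
FROM SensorStream TIMESTAMP BY EventTime
GROUP BY HoppingWindow(second, 10, 5)

A tumbling window is simply the special case where the hop size equals the window size, which is why TumblingWindow takes a single duration argument.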

Predicting Outcomes for Competitive Advantage

In addition to detecting patterns and anomalies in the data, another key element to running a business at Internet speed is the ability to shape new business outcomes by predicting what may happen in the future. This is exactly the problem domain for predictive analytics and Machine Learning, which can use historical data combined with modern data science algorithms to predict a future outcome.

One of the key methods to accelerating business and building a competitive advantage is the ability to automate, supplement, or accelerate key business decision-making processes through the use of predictive analytics and Machine Learning. In today’s business world, decisions are made every day, often without all the facts and data, so any additional insights can often result in a huge competitive advantage.

Across many industries and verticals, many bottlenecks can be found today where there is lack of actionable data. This is a key area where Azure Stream Analytics and Machine Learning can help reduce friction and accelerate results.

A recent addition to Azure Streaming Analytics is the ability to directly invoke Azure Machine Learning web services as streaming data is processed, in order to enrich the incoming data stream or predict outcomes that might, in turn, require triggering a notification or alert. This is a key capability, and we will cover it in detail in Chapter 8.

Stream Processing: Implementation Options in Azure

When using Microsoft Azure, there are several choices available for implementing a Stream Processing layer, as illustrated in Figure 5-7 with an IoT architecture.

Figure 5-7. Stream processing layer in an IoT architecture

With Microsoft Azure, there are many options available for creating a stream processing layer. The choices range from running Open Source Software (OSS) packages on Linux virtual machines to leveraging fully managed Platform-as-a-Service (PaaS) offerings, such as Azure HDInsight.

Here are the basic options for running a stream processing engine in Azure:

Streaming Options: Virtual Machines (Infrastructure-as-a-Service):

  • Virtual machines (running Windows or Linux) with Open Source Software (OSS) distributions:

    • Hortonworks

    • Cloudera

    • Roll your own: Apache Storm/Spark, Apache Samza, Twitter Heron, Kafka Streams, Apache Flink, Apache Beam (data processing workflows), Apache Mesos (Project Myriad)

  • Note that these options will also work for on-premises stream processing applications.

Streaming Options: Azure Managed Services (Platform-as-a-Service, PaaS):

  • Azure HDInsight: Managed Spark/Storm

    • Essentially a managed, 100% compatible Hadoop distribution in Azure.

    • HBase as a columnar NoSQL transactional database running on Azure blobs.

    • Apache Storm as a streaming service for near-real-time processing.

    • Hadoop 2.4 support for 100x query gains on Hive queries.

    • Mahout support for Machine Learning on Hadoop.

    • Graphical user interface for Hive queries.

  • Azure Streaming Analytics

    • Processes real-time data in Azure using a simple SQL-based language.

    • Consumes millions of real-time events from IoT Hub or Event Hubs, collected from devices, sensors, infrastructure, and applications.

    • Performs time-sensitive analysis using the SQL-like language against multiple real-time streams and reference data.

    • Outputs to persistent stores, dashboards, or back to devices.

Choosing a Managed Streaming Analytics Engine in Azure

With the availability of Apache Storm and Spark Streaming capabilities on HDInsight, along with Azure Streaming Analytics, Microsoft has made available multiple options for both proprietary and open source technologies for implementing a streaming analytics solution.

It should be noted that both Azure managed analytics platforms provide the benefits of a managed PaaS solution; however, there are a few distinct capabilities that differentiate them, and these should be considered when determining your final streaming architecture.

Ultimately, the final choice will be narrowed down to a few key considerations, which are highlighted in Table 5-1.

Table 5-1. Comparison Between Azure Stream Analytics and HDInsight Apache Storm

  • Input Data Formats

    • Azure Stream Analytics: Supported input formats are Avro, JSON, and CSV.

    • HDInsight Apache Storm: Any format may be implemented via custom code.

  • SQL Query Language

    • Azure Stream Analytics: Yes; easy-to-use SQL language support is available, with syntax similar to ANSI SQL.

    • HDInsight Apache Storm: No; users must write code in Java or C#, or use the Trident APIs.

  • Temporal Operators

    • Azure Stream Analytics: Windowed aggregates and temporal joins are supported out of the box.

    • HDInsight Apache Storm: Temporal operators must be implemented via custom code.

  • Custom Code Extensibility

    • Azure Stream Analytics: Available via JavaScript user-defined functions.

    • HDInsight Apache Storm: Yes; custom code can be written in C#, Java, or other languages supported on Storm.

  • Support for UDFs (User-Defined Functions)

    • Azure Stream Analytics: UDFs can be written in JavaScript and invoked as part of a real-time stream processing query.

    • HDInsight Apache Storm: UDFs can be written in C#, Java, or the language of your choice.

  • Pricing

    • Azure Stream Analytics: Priced by the number of streaming units required; a unit is a blend of compute, memory, and throughput.

    • HDInsight Apache Storm: The unit of purchase is cluster-based and is charged based on the time the cluster is running, independent of jobs deployed.

  • Business Continuity/High Availability with Guaranteed SLAs

    • Azure Stream Analytics: SLA of 99.9% uptime; auto-recovery from failures; recovery of stateful temporal operators is built in.

    • HDInsight Apache Storm: SLA of 99.9% uptime of the Storm cluster. Apache Storm is a fault-tolerant streaming platform, but it is the customer's responsibility to ensure that streaming jobs run uninterrupted.

  • Reference Data

    • Azure Stream Analytics: Reference data is available from Azure blobs, with a maximum of 100MB of in-memory lookup cache; refreshing of reference data is managed by the service.

    • HDInsight Apache Storm: No limits on data size. Connectors are available for HBase, DocumentDB, SQL Server, and Azure; unsupported connectors may be implemented via custom code. Refreshing of reference data must be handled by custom code.

  • Integration with Machine Learning

    • Azure Stream Analytics: Via configuration of published Azure Machine Learning models as functions during Azure Streaming Analytics job creation.

    • HDInsight Apache Storm: Available through Storm Bolts.

Note

Refer to the following link for additional information regarding choosing a managed streaming analytics platform in Azure:

Choosing a streaming analytics platform: Apache Storm comparison to Azure Stream Analytics:

https://azure.microsoft.com/en-us/documentation/articles/stream-analytics-comparison-storm

It should be noted that HDInsight supports both Apache Storm and Apache Spark Streaming. Each of these frameworks provides streaming capabilities that may be worthy of further analysis, depending on your specific implementation requirements.

See the following link for additional information:

Apache Spark for Azure HDInsight; https://azure.microsoft.com/en-us/services/hdinsight/apache-spark/

Here is an additional comparison of the three managed streaming implementations (via HDInsight and ASA) that are possible on Microsoft Azure:

  • Programming Model

    • Storm on HDInsight: Java, C#

    • Spark Streaming on HDInsight: Scala, Python, Java

    • Azure Streaming Analytics: SQL query language

  • Delivery Guarantee

    • Storm on HDInsight: At-least-once (exactly-once with Trident)

    • Spark Streaming on HDInsight: Exactly-once

    • Azure Streaming Analytics: At-least-once

  • State Management

    • Storm on HDInsight: Yes

    • Spark Streaming on HDInsight: Yes

    • Azure Streaming Analytics: Yes

  • Processing Model

    • Storm on HDInsight: Event-at-a-time

    • Spark Streaming on HDInsight: Micro-batching

    • Azure Streaming Analytics: Real-time event processing

  • Scaling

    • Storm on HDInsight: Manual

    • Spark Streaming on HDInsight: Manual

    • Azure Streaming Analytics: Automatic

  • Open Source

    • Storm on HDInsight: Yes

    • Spark Streaming on HDInsight: Yes

    • Azure Streaming Analytics: No

Streaming Technology Choice: Decision Considerations

When evaluating a managed streaming analytics platform in Azure, consideration should be given to the following factors in order to make a well-informed decision regarding your choice:

  • Development team expertise and background

  • Expertise in writing SQL queries versus writing code

  • Required skill levels: Analyst versus a developer for queries

  • Development effort and velocity

  • Using out-of-the-box (OOB) connectors versus OSS components

  • Troubleshooting and diagnostics

  • Custom logging, Azure operation logs

  • Scalability, adjustability, and pricing

Pain Points with Other Streaming Solutions

Regardless of your choice of managed streaming analytics platforms in Azure, there are many advantages to choosing a platform that runs in Azure versus a one-off solution.

The implementation options with other streaming analytics engines can leave a lot to be desired when it comes to providing a holistic and easily managed solution. A few key points that should be taken into consideration when evaluating a streaming analytics platform for your solution include the following:

  • Depth and breadth: Levels of development skills required.

  • Completeness: Typically not end-to-end solutions.

  • Expertise: Need for special skills to set up and maintain.

  • Costs: Development, testing, and production environments and licensing.

Reference Implementation Choice: Azure Streaming Analytics

In our reference implementation and throughout the remaining chapters of this book, we will be using Azure Streaming Analytics to implement the reference solution. Azure Streaming Analytics is a fully managed cloud service for real-time analytics on streams of data using a SQL-like query language with built-in temporal semantics. It’s a perfect “fit” for the solution requirements.

Advantages of Azure Streaming Analytics

Choosing the right streaming analytics platform is a critical decision that will have major impacts on the overall performance, reliability, scalability, and overall operation of your solution.

For that reason, it is helpful to document the specific reasons for choosing Microsoft Azure Streaming Analytics and share these for future solution architecture considerations.

No Challenges with Deployment

Azure Streaming Analytics is a fully managed PaaS service in the cloud, so you can quickly configure and deploy from the Azure Portal or via PowerShell deployment scripts.

  • No hardware acquisition and maintenance

  • Bypasses requirements for deployment expertise

  • Up and running in a few clicks (and within minutes)

  • No software provisioning or maintenance

  • Easily expand your business globally

Mission Critical Reliability

  • Achieve mission-critical reliability and scale with Azure Streaming Analytics

  • Exactly once delivery guarantee up to the output adapter that writes the output events

  • State management for auto recovery

  • Guaranteed not to lose events or produce incorrect output

  • Preserves event order on per-device basis

  • Guaranteed 99.9% availability SLA

See the following link for additional details:

Event Delivery Guarantees (Azure Stream Analytics)

https://msdn.microsoft.com/en-us/library/azure/mt721300.aspx

How to achieve exactly-once delivery for SQL output:

https://blogs.msdn.microsoft.com/streamanalytics/2017/01/13/how-to-achieve-exactly-once-delivery-for-sql-output/

Business Continuity

  • Stream Analytics processes data at a high throughput with predictable results and no data loss

  • Guaranteed uptime (three nines of availability)

  • Auto-recovery from failures

  • Built-in state management for fast recovery

No Challenges with Scale

Scale to any volume of data while still achieving high throughput, low-latency, and guaranteed resiliency.

  • Elasticity of the cloud for scale up or scale down

  • Spin up any number of resources on demand

  • Scale from small to large when required

  • Distributed, scale-out architecture

  • Scale using a slider in the Azure Portal, without writing code

Low Startup Costs

Azure Stream Analytics lets you rapidly develop and deploy low-cost solutions to gain real-time insights from devices, sensors, infrastructure, and applications.

  • Provision and run streaming solutions for as low as $25/month

  • Pay only for the resources you use

  • Ability to incrementally add resources

  • Reduce costs when business needs change

Rapid Development

Reduce friction and complexity and use fewer lines of code when developing analytic functions for scale-out of distributed systems. Describe the desired transformation with SQL-based syntax, and Stream Analytics automatically distributes it for scale, performance, and resiliency.

  • Lowers the bar for creating stream processing solutions via a SQL-like language

  • Easily filter, project, aggregate, join streams, add static data with streaming data, and detect patterns or lack of patterns with a few lines of SQL

  • Built-in temporal semantics

Development and Debugging Experience Through Azure Portal

Queries in Azure Stream Analytics are expressed in a SQL-like query language. Operational logging messages can be used for debugging purposes, such as viewing job status, job progress, and failure messages, to track the progress of a job over time: from start, to processing, to output.

  • Manage out-of-order events and actions on late arriving events via configurations

Scheduling and Monitoring Built-In

The Azure Management Portal and Azure Portal both surface key performance metrics that can be used to monitor and troubleshoot your query and job performance.

  • Built-in monitoring

  • View your system’s performance at a glance

  • Help you find the cost-optimal way of deployment

Why Are Customers Using Azure Stream Analytics?

The previous section outlines some of the key advantages of utilizing Azure Streaming Analytics. By leveraging Microsoft Azure Streaming Analytics for quick infrastructure provisioning along with the low-maintenance aspects of running on a completely managed streaming analytics platform, you can avoid the usual complications listed next:

  • Monitoring and troubleshooting the solution.

  • Developing solutions and infrastructure that can scale at pace with business growth.

  • Developing solutions to manage resiliency, such as infrastructure failures and geo-redundancy.

  • Developing solutions to integrate with other components like Machine Learning, BI, etc.

  • Developing solution code for ingestion, temporal processing, and hot/cold egress operations.

  • Infrastructure procurement: avoiding long hardware delays by provisioning in minutes.

Focus on building solutions, not on the solution infrastructure, and get the applications developed and deployed faster so you can truly work at Internet speed.

It should be noted that in addition to the many benefits outlined for individual Azure customers, Azure Streaming Analytics is also a core technology that makes up several other Microsoft Azure-based solution offerings, such as:

  • Azure IoT Suite: Microsoft provides Azure IoT Suite as part of its preconfigured IoT solutions built on the Azure platform and makes it easy to connect devices securely and ingest events at scale.

  • Cortana Intelligence Suite: A fully managed Big Data and advanced analytics suite to transform your data into intelligent action.

Key Vertical Scenarios to Explore for Azure Stream Analytics

There are many use cases for leveraging streaming analytics across industry verticals. Some of the more popular applications are listed here:

  • Financial Services:

    • Fraud detection

    • Asset tracking

  • Healthcare:

    • Patient monitoring

  • Government:

    • Surveillance and monitoring

  • Infrastructure, Energy, and Utilities:

    • Operations management in oil and gas

    • Smart buildings

  • Manufacturing:

    • Predictive maintenance

    • Remote monitoring

  • Retail:

    • Real-time customer engagement and marketing

    • Inventory optimization

  • Telco/IT:

    • IT Infrastructure and cellular network monitoring

    • Location-based awareness

  • Transportation and Logistics:

    • Container monitoring

    • Perishables shipment tracking

Our Solution: Leveraging Azure Streaming Analytics

In our reference solution, we use Azure Stream Analytics, which is a fully managed, cost effective, real-time, event processing engine that can help us unlock deep insights from our data. Azure Stream Analytics makes it easy to set up real-time analytic computations on data streaming from devices, sensors, web sites, social media, applications, infrastructure systems, and more. It’s perfect for our solution.

Before we begin the walk-through of our specific reference architecture implementation, we will explore the overall workflow of creating Streaming Analytics jobs in Microsoft Azure. Stream Analytics leverages years of Microsoft Research work in developing highly tuned streaming engines for time-sensitive processing, as well as SQL language integration for intuitive specifications of streaming jobs.

With a few clicks in the Azure Portal, you can author a Stream Analytics job by specifying the three major components of an Azure Streaming Analytics solution: inputs, outputs, and SQL queries.

Streaming Analytics Jobs: INPUT Definitions

The first task in setting up an Azure Streaming Analytics job is to define the inputs for the new streaming job. The input definitions are related to the source of the incoming streaming data.

Note that, at the time of this writing, there are three data format types supported:

  • Avro: Streaming message payloads in the compact, binary Apache Avro format.

  • JSON: Streaming message payloads in the JavaScript Object Notation format.

  • CSV: Streaming data in the comma-separated-value text format. Header rows are also supported (and recommended) to provide additional column-naming functionality.

When specifying data stream inputs, there are two definition types: data streams and reference data.

  • Data Streams: A data stream is denoted as an unbounded series of events flowing over time. Stream Analytics jobs must include at least one data stream input to be consumed and transformed by the job. The supported data stream input sources include the following input types (at the time of this writing):

    • IoT Hub Streams:

      • Azure IoT Hub is a highly scalable publish-subscribe event ingestion platform optimized for IoT scenarios.

      • Used for device-to-cloud and cloud-to-device messaging streams.

      • Optimized to support millions of simultaneously connected devices.

      • Can be used to send inbound and outbound messages to IoT devices.

    • Event Hub Streams:

      • Enables inbound (only) event message streams for device-to-cloud scenarios.

      • Supports a more limited number of simultaneous connections.

      • Event Hubs enables you to specify the partition for each message sent for increased scalability.

    • Blob Storage: Used as an input source for ingesting bulk data as a stream. For scenarios with large amounts of unstructured data to store in the cloud, blob storage offers a cost-effective and scalable solution.

      Note See the following link for more information concerning the differences between Azure IoT Hubs and Event Hubs:

      Comparison of IoT Hub and Event Hubs:

      https://azure.microsoft.com/en-us/documentation/articles/iot-hub-compare-event-hubs

  • Reference Data: Stream Analytics supports a second type of auxiliary input called reference data. As opposed to data in motion, reference data is static or slowly changing; a brief sketch of a reference data join follows this list.

    • It is typically used for performing lookups and correlations with other data streams to enrich a dataset.

    • At the time of this writing, Azure blob storage is the only supported input source for reference data, and reference data source blobs are limited to 100MB in size.
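As a hedged illustration of a reference data lookup, the sketch below joins a streaming input to a reference input to enrich each event. The input aliases and column names are assumptions for illustration only. Note that, unlike the stream-to-stream JOINs shown later in this chapter, a JOIN against reference data is a simple lookup and does not require DATEDIFF time bounds:

-- A minimal sketch, assuming a stream input named SensorStream and a
-- reference data input named DeviceRef are both defined on the job.
SELECT s.DeviceId, s.Reading, r.DeviceName, r.Location
FROM SensorStream s TIMESTAMP BY EventTime
JOIN DeviceRef r
    ON s.DeviceId = r.DeviceId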

Streaming Analytics Jobs: OUTPUT Definitions

We have defined our input sources for a new Azure Streaming Analytics job, so the next step is to define the output formats for the job. In order to enable a variety of application patterns, Azure Stream Analytics has different options for storing output and viewing analysis results. This makes it easy to view job outputs and provides flexibility in the consumption and storage of the job output for data warehousing and other purposes.

Note that any output configured in the job must exist before the job is started and events start flowing. For example, if you use blob storage as an output, the job will not create a storage account automatically. It needs to be provisioned by the user before the Streaming Analytics job is started.

The Output formats for Azure Streaming Analytics include the following storage options (at the time of this writing):

  • SQL Database: Azure SQL Database can be used as an output for data that is relational in nature or for applications that depend on content being hosted in a relational database. The one requirement for this Stream Analytics output option is that the destination be an existing table in an Azure SQL Database. Consequently, the table schema must exactly match the fields and their types being output from the streaming analytics job.

  • Azure SQL Data Warehouse: Note that an Azure SQL Data Warehouse can also be specified as an output via the SQL Database output option. This feature is currently in preview mode at the time of this writing.

  • Blob Storage: Blob storage offers a cost-effective and scalable solution for storing large amounts of unstructured data in the cloud.

  • Event Hubs: A highly scalable publish-subscribe event ingestion construct. Event Hubs are highly scalable and can handle ingestion of millions of events per second. Note that a key reason for using an Event Hub as an Output of a Stream Analytics job is so that the data can become the input of another streaming job. In this way, you can “chain” multiple streaming jobs together to complete an application scenario, such as providing real-time alerts and notifications.

  • Table Storage: A NoSQL key/attribute store that can be leveraged for structured data with minimal schema constraints. Table storage provides highly available, massively scalable storage, so that an application can automatically scale to meet user demand.

  • Service Bus Queue: Provides a First-In, First-Out (FIFO) message delivery to one or more consumers. Messages are then set up to be received and processed by the receivers in the date/time order in which they were added to the queue, and each message is received and processed by only one message consumer.

  • Service Bus Topic: While service bus queues provide a one-to-one communication pattern from sender to receiver, service bus topics provide a one-to-many form of communication where many applications can subscribe to a topic.

  • DocumentDB: A fully managed NoSQL document database service that offers querying and transactions over schema-free data, predictable and reliable performance, and rapid development.

  • Power BI: Can be used as an output for a Stream Analytics job to provide for rich visualizations for analytical results. This capability can be used for operational dashboards, report generation, and metric driven reporting.

  • Data Lake Store: This output option enables you to store data of any size, type and ingestion speed for operational and exploratory analytics. At this time, creation and configuration of Data Lake Store outputs is supported only in the Azure Classic Portal.

Planning Streaming Analytics Outputs

The output stage of the streaming analytics process requires a little advance planning, as some thought must be given to how the data will be delivered and consumed. This is where the real value of the modern Lambda architecture comes into play, with the notion of "hot", "warm", and "cold" data paths.

Proper analysis and exploitation of these key reporting capabilities can mean the difference between creating a true competitive advantage and just creating a noisy data overload scenario. The fortunes of many businesses can rise and fall on the timeliness and accuracy of key operational data. Choose your outputs carefully.

Hot Path

The processing pathway for urgent data, for example, data that is sent from field devices to an IoT system. This data typically requires immediate analysis. It is frequently used for raising alerts and other critical notifications. The “hot path” output option(s) for Azure Streaming Analytics include:

  • Power BI : For real-time streaming integration along with rich, visual dashboards.

  • Event Hubs : For integrating with other Streaming Analytics jobs and outbound Notification hubs.

  • Service Bus Queues: For integration with other publisher/subscriber notification systems.

  • Service Bus Topics: For integration with one-to-many notification scenarios.

Warm Path

The processing path for device data that is not urgent but typically has a limited lifetime before it becomes stale. This data should be considered to have an “expiration date” and consequently should be processed in a specified period of time.

This data can also be used to augment the results generated by hot path processing to provide additional context. Examples of warm path data include diagnostic information for performance analysis, troubleshooting, or A/B testing. The data may need to be held in storage that is relatively quick to access (and therefore possibly more expensive than that required for the cold path), but the storage capacity is likely to be much less, as this data has a limited life span and is unlikely to be retained for an extended period.

The “warm path” output option(s) for Azure Streaming Analytics include:

  • Azure SQL Database: For near-real-time, relational data queries

  • DocumentDB: For near-real-time, NoSQL data queries

Cold Path

The processing pathway for data that is stored and processed later. For example, this data can be pulled from storage for processing at a later time in batch mode. The data can be held in relatively cheap, high capacity, storage due to its potential high volume and historical nature. The data is commonly used to provide statistical information, to generate analytical reports, and for auditing purposes.

The “cold path” output option(s) for Azure Streaming Analytics include:

  • Blob Storage: For low cost, high-scale, generic data storage

  • Table Storage: For low cost, high-scale, key-value-pair data storage

  • Data Lake Store: For unlimited, low cost, high-scale, historical data storage platform with deep analytical processing capabilities.


Power BI for Real-Time Visualizations

Power BI can be used as an output for a Stream Analytics job to provide for a real-time, rich visualization experience of analysis results. This capability can be used for operational dashboards, dynamic report generation, and other forms of real-time, metric-driven reporting and analysis.

Note

See the following link for more information on specifying Streaming Analytics Outputs:

https://azure.microsoft.com/en-us/documentation/articles/stream-analytics-define-outputs

Streaming Analytics Jobs: Data Transformations via SQL Queries

After the Azure Streaming Analytics job input and output definitions have been created, the next task is creating the data transformations. This is where all the pieces start to come together and a complete solution can finally be configured based on the previously defined inputs and outputs.

Azure Stream Analytics offers a SQL-like query language for performing transformations and computations over incoming streams of event data. The Stream Analytics query language is a subset of the standard Transact-SQL (T-SQL) syntax for performing simple and complex streaming analytics computations.

Azure Streaming Analytics SQL: Developer Friendly

Since most developers already possess a good working knowledge of T-SQL, this makes Azure Streaming Analytics very approachable, and developers can become productive in a short amount of time.

This portion of the Streaming Analytics job setup is where the actual processing occurs, and it is where we map, enrich, and transform the incoming streaming data input into one or more predefined streaming outputs. Note that it is possible in a single Streaming Analytics job to send processed data from a single input to multiple output destinations by chaining SQL statements together in the job, as the sketch below illustrates.
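As a hedged example of this fan-out pattern, the following sketch sends a filtered projection of one input to a hot path output while archiving the full stream to a cold path output. The input and output names are assumptions; each INTO target must match an output definition already configured on the job:

-- A minimal sketch, assuming one input ([iothub-input]) and two
-- configured outputs ([powerbi-output] and [blob-archive]).
SELECT DeviceId, Temperature
INTO [powerbi-output]
FROM [iothub-input]
WHERE Temperature > 100

SELECT *
INTO [blob-archive]
FROM [iothub-input]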

Azure Streaming Analytics (ASA): SQL Query Dialect Features

The SQL language in ASA is very similar to T-SQL, the primary query language of Microsoft SQL Server, and it will feel familiar to anyone who has worked with modern relational database engines such as Microsoft SQL Server, IBM DB2, or Oracle Database.

It should be noted that Azure Streaming Analytics (ASA) SQL also contains a superset of functions that support advanced analytics capabilities, such as "temporal" (date/time) operations: applying sliding, hopping, or tumbling time windows to the event stream in order to get time-boxed, summarized data directly from the incoming events.

All this is accomplished elegantly and effortlessly using familiar T-SQL-style statements. To assist you in further understanding the features of the ASA SQL query language, the following is a short synopsis of its primary capabilities.

SQL Query Language

  • All data transformation jobs are written declaratively as a series of T-SQL-like query language statements.

  • There is no additional programming required and no code compilation required.

  • The SQL scripts are easy to author and deploy.

Supported Data Types

The following data types are supported in the ASA SQL language:

  • Bigint: Integers in the range -2^63 (-9,223,372,036,854,775,808) to 2^63-1 (9,223,372,036,854,775,807).

  • Float: Floating-point numbers in the range -1.79E+308 to -2.23E-308, 0, and 2.23E-308 to 1.79E+308.

  • Datetime: Defines a date that is combined with a time of day with fractional seconds that is based on a 24-hour clock and relative to UTC (time zone offset 0).

  • nvarchar (max): Text values made up of Unicode characters.

  • record: A set of name/value pairs. Values must be of a supported data type.

  • array: An ordered collection of values. Values must be of a supported data type.

Data Type Conversions

CAST

Data type conversions in the Stream Analytics query language are accomplished via the CAST function, which converts an expression of one data type to another supported data type.

Proper care should be taken when using the CAST function over inconsistent data streams, as a conversion failure will cause the streaming analytics job to stop.

As a good example of what not to do, the ASA SQL statement in Listing 5-1 will result in an Azure Streaming Analytics job failure.

Listing 5-1. An Example of the CAST Operator Usage That Will Result in a Job Failure in an ASA SQL Statement
CAST ('Test-String' AS bigint)

TRY_CAST

To avoid a catastrophic job failure due to a data type conversion error, it is highly recommended that the TRY_CAST SQL operator be used instead. This version returns a value cast to the specified data type if the cast succeeds, or null if it fails.

The SQL transformation job will gracefully continue no matter the result of the TRY_CAST call. Listing 5-2 illustrates the TRY_CAST call.

Listing 5-2. An Example of the TRY_CAST Operator in an ASA SQL Statement
SELECT TweetId, TweetTime
FROM Input
WHERE TRY_CAST( TweetTime AS datetime) IS NOT NULL

Temporal Semantic Functionality

All ASA SQL operators are compatible with the temporal properties of event streams. Additional functionality is added to the ASA SQL language via new operators such as:

  • TumblingWindow

  • HoppingWindow

  • SlidingWindow

Built-In Operators and Functions

  • The ASA SQL language supports other key T-SQL constructs such as filters, projections, joins, windowed (temporal) aggregates, and text and date manipulation functionality.

  • Advanced event stream queries can be composed via these powerful query extensions.

User-Defined Functions: Azure Machine Learning Integration

  • The ASA SQL language now supports direct calls to Azure Machine Learning (ML) via user-defined functions.

  • User-defined functions provide an extensible way for a streaming job to transform input data to output data using an externally defined function and accessed as part of the SQL query.

  • A Machine Learning function in stream analytics can be used like a regular function call in the stream analytics query language.

  • This functionality provides the ability to score individual events of streaming data by leveraging a Machine Learning model hosted in Azure and accessed via a web service call.

  • At the time of this writing, the Azure Machine Learning Request-Response Service (RRS) is the only supported UDF framework, and it is currently in "preview" mode.

  • This capability allows you to easily build applications for scenarios such as real-time Twitter sentiment analytics, as illustrated in Listing 5-3. The Azure Machine Learning user-defined function named sentiment is easily incorporated into an ASA SQL query. This provides a powerful mechanism for leveraging predictive analytics to enrich the incoming event stream data and turn it into actionable data.

Listing 5-3. Example of an Azure Machine Learning User-Defined Function
 WITH subquery AS (  
        SELECT text, sentiment(text) as result from input  
    )  
SELECT text, result.[Scored Labels]  
INTO output  
FROM subquery

Event Delivery Guarantees Provided in Azure Stream Analytics

The Azure Stream Analytics query language provides extensions to the T-SQL syntax to enable complex computations over incoming streams of events. With Azure Streaming Analytics, the following concepts related to event delivery are noteworthy to review:

  • Exactly once delivery

  • Duplicate records

Exactly Once Delivery

An “exactly once” delivery guarantee means all input events are processed exactly once by the streaming analytics system. In this way, the results are also guaranteed to be complete with no duplicate outputs. In terms of Azure Service Level Agreements (SLAs), Azure Stream Analytics guarantees exactly once delivery up to the output adapter that writes the output events.

Duplicate Records

When a Stream Analytics job is running, duplicate records may occasionally occur within the output data. These duplicates are expected, because Azure Stream Analytics output adapters do not write the output events in a fully transactional manner.

See the following link for more information:

How to achieve exactly-once delivery for SQL output:

https://blogs.msdn.microsoft.com/streamanalytics/2017/01/13/how-to-achieve-exactly-once-delivery-for-sql-output/

Time Management Functions

The Azure Stream Analytics SQL query language extends the T-SQL syntax to enable complex computations over streams of events. Stream Analytics provides language constructs to deal with the temporal aspects of the data. For example, it is possible to assign custom timestamps to the stream events, specify a time window for aggregations, specify the allowed time difference between two streams of data for a JOIN operation, and so on.

  • System.Timestamp: A system property that can be used to retrieve an event’s timestamp.

  • Time Skew Policies: Provides policies for out-of-order and late arrival events.

  • Aggregate Functions: Used to perform a calculation on a set of values from a time window and return a single value.

  • DATEDIFF: Allowed in the JOIN predicate and allows the specification of time boundaries for JOIN operations.

  • Date and Time Functions: Azure Stream Analytics provides a variety of date and time functions for use in creating time-sensitive streaming analytics queries.

  • TIMESTAMP BY: Allows specifying custom timestamp values.
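As a hedged illustration of System.Timestamp working together with a windowed aggregate, the sketch below stamps each aggregate row with the time at which its window ends, which is what System.Timestamp returns for a windowed GROUP BY. The input and column names are assumptions:

-- A minimal sketch: System.Timestamp labels each one-minute tumbling
-- window result with the time at which that window ends.
SELECT System.Timestamp AS WindowEnd, DeviceId, AVG(Temperature) AS AvgTemp
FROM SensorStream TIMESTAMP BY EventTime
GROUP BY DeviceId, TumblingWindow(minute, 1)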

The Importance of the TIMESTAMP BY Clause

In Azure Streaming Analytics, all incoming events have a well-defined timestamp. If a solution needs to use the application time instead, the TIMESTAMP BY keyword can be used to specify the payload column that should be used to timestamp every incoming event for any temporal computation, such as windowing functions (hopping, tumbling, sliding), temporal JOINs, and so on.

Note that using the TIMESTAMP BY clause over an "arrival time" is recommended as a best practice, since the TIMESTAMP BY clause can be used on any column of type datetime, and all ISO 8601 formats are supported. In comparison, the System.Timestamp value can only be used in the SELECT clause.

Listing 5-4 illustrates a TIMESTAMP BY example that uses the TweetTime column as the application time for all incoming events.

Listing 5-4. The TIMESTAMP BY Clause
SELECT TweetId, TweetTime
FROM TweetInput
TIMESTAMP BY TweetTime

Azure Stream Analytics: Unified Programming Model

As we have seen in the previous sections covering the superset of features, functions, and capabilities that extend the Azure Streaming Analytics SQL dialect, the end result is an extremely powerful, yet easily approachable, SQL-based programming model that brings together event streams, reference data, and Machine Learning extensions to create a comprehensive solution.

Azure Stream Analytics: Examples of the SQL Programming Model

The Simplest Example

Listing 5-5 is a very simple example of a streaming SQL Query that will copy all the fields in the input named iothub-input into an output named blob-output.

Listing 5-5. The Simplest ASA SQL Query Possible
select * into blob-output from iothub-input

In many cases, Azure Streaming Analytics SQL queries will be more complex and will usually incorporate various temporal semantics in order to surface data related to sliding, hopping, or tumbling time windows from the event stream. As mentioned previously, this is where the real power of Azure Streaming Analytics shines, as it is very easy to accomplish via the superset of functionality that Microsoft has added to the familiar T-SQL dialect.

To illustrate, the following temporal window examples will make the assumption that we are reading from an input stream of tweets from Twitter.

Tumbling Windows: A 10-Second Tumbling Window

Tumbling windows can be defined as a series of fixed-size, non-overlapping, and contiguous time intervals taken from a data stream. The ASA SQL in Listing 5-6 seeks to answer the following question:

“Tell me the count of tweets per time zone every 10 seconds”

Listing 5-6. Sample Tumbling Window SQL Statement
SELECT TimeZone, COUNT(*) AS Count
FROM TwitterStream TIMESTAMP BY CreatedAt
GROUP BY TimeZone, TumblingWindow(second,10)

Hopping Windows: A 10-Second Hopping Window with a 5-Second “Hop”

Hopping windows are designed to model scheduled overlapping windows. The ASA SQL in Listing 5-7 seeks to answer the following question:

“Every 5 seconds give me the count of tweets over the last 10 seconds”

Listing 5-7. Sample Hopping Window SQL Statement
SELECT Topic, COUNT(*) AS TotalTweets
FROM TwitterStream TIMESTAMP BY CreatedAt
GROUP BY Topic, HoppingWindow(second, 10 , 5)

Sliding Windows: A 10-Second Sliding Window

With a sliding window, the system is asked to logically consider all possible windows of a given length and output events for cases when the content of the window actually changes, for example, when an event enters or exits the window. The ASA SQL in Listing 5-8 seeks to answer the following question:

“Give me the count of tweets for all topics which are tweeted more than 10 times in the last 10 seconds”

Listing 5-8. A Sample Sliding Window SQL Statement
SELECT Topic, COUNT(*) FROM TwitterStream
TIMESTAMP BY CreatedAt
GROUP BY Topic, SlidingWindow(second, 10)
HAVING COUNT(*) > 10

Joining Multiple Streams

Similar to the standard T-SQL language, the JOIN clause in the Azure Stream Analytics query language is used to combine records from two or more input sources. However, JOINs in Azure Stream Analytics SQL are temporal in nature, meaning that each JOIN must provide limits on how far the matching rows can be separated in time. The ASA SQL in Listing 5-9 seeks to answer the following question:

"List all users and the topics on which they switched their sentiment within a minute"

Listing 5-9. JOINing Multiple Streams in ASA SQL
SELECT TS1.UserName, TS1.Topic
FROM TwitterStream TS1 TIMESTAMP BY CreatedAt
JOIN TwitterStream TS2 TIMESTAMP BY CreatedAt
                  ON TS1.UserName = TS2.UserName AND TS1.Topic = TS2.Topic
                  AND DateDiff(second, TS1, TS2) BETWEEN 1 AND 60
WHERE TS1.SentimentScore != TS2.SentimentScore

Detecting the Absence of Events

This SQL query pattern can be extremely useful, as it provides the ability to determine whether a stream has no value that matches certain criteria. For example, Listing 5-10 is a sample ASA SQL query that seeks to provide a real-time answer to the following question:

“Show me if a topic is not tweeted for 10 seconds since it was last tweeted.”

Listing 5-10. Detecting the Absence of Data in ASA SQL
SELECT TS1.CreatedAt, TS1.Topic
FROM TwitterStream TS1 TIMESTAMP BY CreatedAt
LEFT OUTER JOIN TwitterStream TS2 TIMESTAMP BY CreatedAt
ON TS1.Topic = TS2.Topic
AND DATEDIFF(second, TS1, TS2) BETWEEN 1 AND 10
WHERE TS2.Topic IS NULL
Note

The following link provides guidance for common Stream Analytics Usage Patterns:

Query examples for common Stream Analytics usage patterns:

https://azure.microsoft.com/en-us/documentation/articles/stream-analytics-stream-analytics-query-patterns

The Reference Implementation

Now that we have covered the basics of configuring Azure Streaming Analytics, it is time to walk through the actual configuration steps for our reference implementation. In this next section, we configure our streaming analytics job via the Azure Portal in order to implement various data pathways for our incoming IoT data streams. As part of the configuration, we will create and configure the following artifacts in Azure:

  • Azure Streaming Analytics Job.

    • Inputs: For our Azure Streaming Analytics job. In this case, we will be using two inputs: the first is for the incoming data stream from the IoT Hub; the second is for reference data. In this example, we will read from a reference .CSV file in Azure blob storage to match a team member's personal health information with their real-time health sensor readings.

    • Functions: References to Azure Machine Learning Web Services that help enrich the data with predictive analytics. In this example, we will check to see if a team member is fatigued to the point of exhaustion.

    • Outputs: For output of results from the Azure Streaming Analytics Job into various storage formats: Hot, Warm, and Cold (from the Lambda architecture).

    • ASA SQL Query Language: Will combine all of the previous configurations to process the incoming data streams and send computed results to various output destinations. A simplified sketch of such a query follows this list.
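To preview how these pieces fit together, here is a simplified, hedged sketch of the kind of ASA SQL query this walkthrough builds up to. Every input, output, function, and column name below is a placeholder invented for illustration; the actual names are defined during the configuration steps that follow:

-- A simplified sketch only: join the device stream to reference data,
-- score it with a hypothetical ML function, and feed the hot path.
-- Additional SELECT...INTO statements would feed the warm/cold paths.
WITH Scored AS (
    SELECT s.MemberId, s.HeartRate, r.Age,
           FatigueModel(s.HeartRate, r.Age) AS Result
    FROM [iothub-input] s TIMESTAMP BY EventTime
    JOIN [reference-input] r
        ON s.MemberId = r.MemberId
)
SELECT MemberId, HeartRate, Result
INTO [powerbi-output]
FROM Scored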

Business Use Case Scenario

As you may recall, the use case scenario for our reference implementation involves monitoring workers' health during strenuous activities. To that end, IoT sensors are worn by the members of the various work teams, and their sensor readings are transmitted to the Azure cloud via an IoT Hub configuration.

The next critical step in the process is where Azure Streaming Analytics is utilized to quickly and efficiently process the incoming data streams. Figure 5-8 denotes the use of Azure Streaming Analytics as the primary ingestion processing engine in the overall architecture.

Figure 5-8. Worker Health and Safety Reference solution

Summary

This chapter covered all of the fundamental capabilities of Azure Streaming Analytics. You learned how you can easily create streaming analytics jobs that let you leverage all of the positive attributes that a modern Lambda data architecture should possess, including "hot", "warm", and "cold" data pathways, to deliver maximum business results.

You also examined the benefits of using a fully managed PaaS service like Azure Streaming Analytics, versus building your own virtualized environment in Azure using Virtual Machine Linux images and the combination of many open source tools and utilities.

Finally, you applied knowledge of Azure Streaming Analytics to the reference IoT architecture and created two input definitions: one for the IoT Hub events and a second for reference data containing team members' health-related information. Next, we created a FUNCTION definition to represent an Azure Machine Learning Web Service call that we used in our ASA SQL query.

We then created three output definitions representing the Hot, Warm, and Cold data paths, using output definition parameters to update the corresponding Azure data platforms: Power BI for "Hot", Azure SQL Database for "Warm", and Azure blob storage for "Cold" storage.

As can easily be seen from this chapter, Azure Streaming Analytics can play a key role in the ingestion, organization, and orchestration of IoT sensor transactions. The environment makes it easy to get started yet is extremely powerful and flexible and can easily scale to handle millions of transactions per second. A powerful stream analytics engine is critical to success for the modern enterprise seeking to operate a business at Internet speed.
