Case study 3 – an Azure IoT app

I was the API architect on a project for delivering backend services that were consumed by a mobile application. There were two main APIs. The configuration API was read-only, and the devices called it to check for updates to settings and software. The events API was write-only, and the devices posted anonymous events about user behavior, which the product team used to inform design decisions for the next generation of devices.

The APIs supported over 1.5 million devices. The configuration APIs needed high availability; they had to respond quickly to device calls and scale to thousands of concurrent requests per second. The events APIs consumed data from the devices and pushed events to a message queue. Listening on the queue were two sets of handlers: one that stored all event data in Hadoop, for long-term analysis, and one that stored a subset of events to provide real-time dashboards.

All the components ran in Azure, and at the peak of the project, we were using Cloud Services, Event Hubs, SQL Azure, and HDInsight. The architecture looked like this:

  • 1: The Events API was hosted in a Cloud Service with multiple instances. Devices posted events to the API, which did some pre-processing and posted them in batches to an Azure Event Hub (see the sketch after this list).
  • 2: The Configuration API was also hosted in a Cloud Service with multiple instances. Devices connected to the API to check for software updates and configuration settings.
  • 3: Real-time analytics data, used for a subset of key performance indicators. This was stored in SQL Azure for fast access, as the quantities of data were modest.
  • 4: Batch analytics data, storing all the events posted by all devices. This was kept in HDInsight, the managed Hadoop service on Azure, for long-running Big Data queries.
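
To make the events flow concrete, here is a minimal sketch of the batching step in the Events API, written against the current Azure.Messaging.EventHubs SDK (the original project predates this SDK and used an earlier .NET library). The DeviceEvent type, hub name, and connection string handling are assumptions for illustration, not the project's actual code.

```csharp
// Minimal sketch: pre-process device events and send them to Event Hubs as a batch.
// Assumes the Azure.Messaging.EventHubs package; DeviceEvent is an illustrative type.
using System;
using System.Collections.Generic;
using System.Text.Json;
using System.Threading.Tasks;
using Azure.Messaging.EventHubs;
using Azure.Messaging.EventHubs.Producer;

public record DeviceEvent(string DeviceId, string EventType, DateTimeOffset Timestamp);

public class EventPublisher
{
    private readonly EventHubProducerClient _producer;

    public EventPublisher(string connectionString, string hubName) =>
        _producer = new EventHubProducerClient(connectionString, hubName);

    // Send a batch of events in a single call, rather than one call per device event.
    public async Task PublishAsync(IEnumerable<DeviceEvent> events)
    {
        using EventDataBatch batch = await _producer.CreateBatchAsync();
        foreach (var evt in events)
        {
            var payload = JsonSerializer.SerializeToUtf8Bytes(evt);
            if (!batch.TryAdd(new EventData(payload)))
                break; // batch is full - in production, send it and start a new one
        }
        await _producer.SendAsync(batch);
    }
}
```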

This system was expensive to run, but it gave the product team a lot of information on how the devices were used, which they fed into the design process for the next generation. Everyone was happy, but then the product roadmap was canceled and there weren't going to be any more devices, so we had to cut running costs.

I had the job of reducing the Azure bill from $50K per month to under $1K per month. I could lose some of the reporting features, but the events API and configuration API had to stay highly available.

This happened before Docker was available on Windows, so my first revision of the architecture used Linux containers running on a Docker Swarm in Azure. I replaced the analytics side of the system with Elasticsearch and Kibana, and replaced the configuration API with static content served from Nginx. I kept the custom .NET components running in Cloud Services; the events API still fed Azure Event Hubs with device data, and the message handler pushed data to Elasticsearch:

  • 1: The Configuration API, now running as a static website in Nginx. Configuration data is served as JSON payloads, maintaining the original API contract.
  • 2: Kibana is used for real-time and historical analytics. Storing only a subset of the event data reduced storage requirements significantly, at the cost of losing detailed metrics.
  • 3: Elasticsearch is used to store incoming event data. A .NET Cloud Service still reads from Event Hubs, but this version saves the data in Elasticsearch (see the sketch after this list).
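
As a rough illustration of that third component, the storage step of the message handler might look like this, using the NEST client for Elasticsearch. The EventDocument type and the device-events index name are assumptions for the example, not the project's real schema.

```csharp
// Minimal sketch: index a trimmed-down event document into Elasticsearch with NEST.
// The EventDocument type and the "device-events" index name are illustrative only.
using System;
using System.Threading.Tasks;
using Nest;

public record EventDocument(string DeviceId, string EventType, DateTimeOffset Timestamp);

public class EventIndexer
{
    private readonly ElasticClient _client;

    public EventIndexer(Uri elasticsearchUri) =>
        _client = new ElasticClient(
            new ConnectionSettings(elasticsearchUri).DefaultIndex("device-events"));

    // Called by the Event Hubs reader for each event it pulls off the hub.
    public async Task SaveAsync(EventDocument doc)
    {
        var response = await _client.IndexDocumentAsync(doc);
        if (!response.IsValid)
            throw new InvalidOperationException(
                $"Indexing failed: {response.DebugInformation}");
    }
}
```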

This first revision gave us the cost savings we needed, mainly by reducing the number of nodes needed for the APIs and the amount of data we stored from the devices. Instead of storing everything in Hadoop and real-time data in SQL Azure, I centralized on Elasticsearch and stored just a small subset of the data. Using Nginx to serve the configuration APIs, we lost the user-friendly features that the product team had for publishing configuration updates, but we could run with far smaller compute resources.

I oversaw a second revision when Windows Server 2016 launched and Docker was supported on Windows. I added Windows nodes to the existing Linux nodes in the Docker Swarm and migrated the events API and message handlers over to Windows Docker containers. At the same time, I moved the messaging system over to NATS, running in a Linux container:

  • 1: The Events API is now hosted in a Docker container, but the code hasn't changed; this is still an ASP.NET web API project, running in a Windows container.
  • 2: The messaging component is using NATS instead of Event Hubs. We lose the ability to store and reprocess messages, but the message queue now has the same availability as the Events API.
  • 3: The message handler reads from NATS and saves data in Elasticsearch. The majority of the code is unchanged, but it now runs as a .NET console app in a Windows container (see the sketch after this list).
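
A minimal sketch of the messaging side of that handler is below, assuming the NATS.Client package and an illustrative "events" subject; the real subject names and payload shapes are not from the project. The Elasticsearch indexing code sits behind the saveToElasticsearch callback.

```csharp
// Minimal sketch: subscribe to a NATS subject and hand each payload to the
// Elasticsearch indexing code. Assumes the NATS.Client package; the "events"
// subject and DeviceEvent type are illustrative, not the project's real names.
using System;
using System.Text.Json;
using NATS.Client;

public record DeviceEvent(string DeviceId, string EventType, DateTimeOffset Timestamp);

public static class NatsEventHandler
{
    public static void Run(string natsUrl, Action<DeviceEvent> saveToElasticsearch)
    {
        var factory = new ConnectionFactory();
        using IConnection connection = factory.CreateConnection(natsUrl);

        // Unlike Event Hubs there is no message store or replay, so each event
        // is indexed as soon as it arrives.
        using IAsyncSubscription subscription = connection.SubscribeAsync(
            "events", (sender, args) =>
            {
                var evt = JsonSerializer.Deserialize<DeviceEvent>(args.Message.Data);
                if (evt is not null)
                    saveToElasticsearch(evt);
            });

        Console.WriteLine("Listening for device events - press Enter to exit");
        Console.ReadLine();
    }
}
```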

This second revision further reduced costs and complexity:

  • Every component is now running in Docker, so I can replicate the whole system in development
  • All components are built with Dockerfiles and packaged as Docker images, so everything uses the same artifacts
  • The whole solution has the same level of service, running efficiently on a single Docker Swarm

In this case, the project is destined to wind down, and it will be easy to accommodate that with the new solution. Device usage is still recorded and shown with a Kibana dashboard. As fewer devices are used over time, the services need less compute, and we can remove nodes from the swarm. Ultimately, the project will run on a minimal infrastructure, possibly just a two-node swarm, running on small VMs in Azure, or it could move back into the company's data center.
