
4. Let’s Create Our Platform Wish List


Although it may be easier to work things out as we go, defining as many requirements as possible in advance will help us create the required platform fabric faster and more efficiently.

In this chapter, we list the expectations and general requirements for each module in our IoT platform. We discuss the following:
  • How we (and things) connect with the platform in real time

  • How we want to store the data

  • The types of APIs that we will build

  • The microservices and utilities that we need to build

Connecting with the Platform in Real Time

One of the challenges faced by web applications is the ability to communicate in real time. While synchronous communication is quite common, and we can achieve it with typical HTTP requests, asynchronous communication is not effectively possible with the same format and technique. However, connecting and communicating with the IoT platform in real time is a key requirement for IoT solutions and applications. This is where we need a message broker that implements a publish-subscribe mechanism.

This is a key reason why message brokers are important components of the latest web technologies. Message brokers are generally middleware programs that provide asynchronous communication abilities to all connected applications and devices, with the help of a publish-subscribe mechanism.

The publish-subscribe mechanism is an interesting paradigm, as it does not require both parties to be online at the same time. Moreover, it makes it possible for either party to initiate the data transfer, regardless of whether the other party is ready for it. This is the opposite of HTTP, where the client must originate the request to which the server responds; the server cannot contact the client in real time. When we connect the server and client with the publish-subscribe mechanism through a message broker, either of them can send data, which is a powerful capability.

So, in short, we need a message broker program.

It is important that the message broker we select fulfills certain essential criteria. In general, two criteria are important: it should be easy to configure and maintain, and stable enough for a production environment.

Using MQTT as the Message Broker

While there are several techniques for message brokering, we will use the MQTT standard, as it is almost the de facto protocol for IoT applications and solutions.

MQTT stands for MQ Telemetry Transport. It is a publish-subscribe, extremely simple and lightweight messaging protocol designed for constrained devices and low-bandwidth, high-latency, or unreliable networks. The design principles are to minimize network bandwidth and device resource requirements while attempting to ensure reliability and assurance of delivery. These principles make the protocol ideal for the emerging machine-to-machine (M2M) or Internet of Things world of connected devices, and for mobile applications, where bandwidth and battery power are at a premium. (mqtt.org, 2013)

There are many implementations of MQTT, both commercial and open source. Mosquitto is a popular open source MQTT implementation, and we will use it to build our message broker. We could also implement a message broker with one of the open source Node.js implementations of MQTT; let's explore that option later, as it might be useful as a fallback secondary broker for our platform's redundancy.
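As a minimal sketch of this publish-subscribe interaction, the snippet below uses the open source mqtt package for Node.js to connect to a local Mosquitto broker, subscribe to a topic, and publish to it. The broker URL and the topic name are placeholder assumptions for illustration only.

```javascript
// npm install mqtt
const mqtt = require("mqtt");

// Connect to a locally running Mosquitto broker (placeholder URL).
const client = mqtt.connect("mqtt://localhost:1883");

client.on("connect", () => {
  // Subscribe first, so we receive whatever is published on this topic.
  client.subscribe("demo/hello", (err) => {
    if (!err) {
      // Publish without knowing, or caring, who is subscribed.
      client.publish("demo/hello", "Hello from the platform!");
    }
  });
});

// The broker pushes messages to subscribers as soon as they arrive.
client.on("message", (topic, payload) => {
  console.log(`${topic}: ${payload.toString()}`);
});
```

Note that neither side polls: the broker pushes each message to every subscriber the moment it is published.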

How Do We Want to Store the Data?

So far, we have decided to use Mosquitto as our MQTT message broker. Brokers are not storage providers, however; they are more like a message courier, a conduit through which messages and data pass. This data is ephemeral and, if not stored, cannot be seen or retrieved later.

From the platform's perspective, we need this storage and retrieval mechanism so that we can retrieve data later; for non-synchronized applications and devices, the stored data can also serve as a shadow copy of the information.

Since we are building our platform on an Ubuntu server with the LAMP stack, MySQL is the default and obvious choice. Not only that, MySQL consistently ranked as the second-most popular database in the 2018 DB-Engines Ranking.

The key question is how we want to store the data. The data that we refer to is transactional data that passes through our message broker and central message bus communication manager. This data has only a few information fields, which are used for data processing and audit purposes, and accordingly, our data storage schema has to be suitable for that.

With MQTT communication, each message carries two fields: topic and payload. The topic typically works as a key for the data, while the payload is the actual data or content. Since MQTT is a messaging protocol and does not specify the format of the payload, we can be flexible. However, to maintain scalability and a unified approach, we will use JSON (JavaScript Object Notation) encoding for our payloads (a.k.a. data packets) throughout the platform. This will not only help us maintain consistency, but also make our platform extensible and easily adaptable to new changes.
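To make this concrete, here is what a single message might look like under this convention. The topic hierarchy, device ID, and payload field names are illustrative assumptions, not names fixed by the platform.

```javascript
// Illustrative only: a hypothetical topic and JSON-encoded payload.
const topic = "office/floor2/temperature";  // assumed topic hierarchy
const payload = JSON.stringify({
  deviceId: "sensor-042",  // hypothetical device identifier
  value: 22.5,             // temperature reading
  unit: "C"
});
// client.publish(topic, payload);  // using an mqtt.js client, as before
```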

Data Storage Schema

JSON data is essentially an ASCII character string, as is the topic in an MQTT message. It is important to note that MQTT also supports binary payloads, which can contain non-ASCII characters too. This means that we can easily transmit binary files and data through the message broker, and we should keep this in mind when designing our platform.

Besides storing the topic and the related data payload, we also need to assign a unique ID to each message stored. In addition, and most importantly, since this is going to be a time-series database, we need to store a timestamp for each message. Apart from these fields, we do not need any other information stored in the core of the IoT platform at this stage. With these considerations, our database table schema is shown in Figure 4-1.
Figure 4-1. Time-series data storage table schema

The following briefly explains each column.
  • ID. An incremental unique number. We use the MySQL AUTO_INCREMENT feature for this.

  • Topic. Declared as varchar to allow us to store variable-length data in this field. A topic can be of any length, and it varies depending upon the application. We will keep a 1 KB restriction, which is big enough for any conceivable topic name.

  • Payload. The data packet is larger and can be of any length (hence a variable type); however, we will restrict payload storage to 2 KB for now. Keep in mind that these are configurable options in MySQL, and thus can be changed without affecting the application. We can increase the limit without affecting previously stored data; when lowering it, however, prior data may be truncated. This can be decided as needed.

  • Timestamp. We will store UNIX (a.k.a. epoch-based) timestamps, which are date-time stamps represented as integers. The epoch (or UNIX time, POSIX time, or UNIX timestamp) is the number of seconds that have elapsed since January 1, 1970 (midnight UTC/GMT), not counting leap seconds. This is not the most precise timestamp, but it is close enough for real-life scenarios and sufficient for our application purposes.

Based on this table structure, we will store every data packet received in the Payload column and its topic in the Topic column, both in as-is format. The timestamp will come from our platform's system time, and the ID will be incremented automatically. This will enable us to query data in the same sequence in which it was stored, with reference to the timestamp, making it a time-series dataset.
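As a sketch, the table from Figure 4-1 could be created from Node.js with the open source mysql package. The database credentials, the table name (messages), and the exact column types here are assumptions consistent with the discussion above, not final choices.

```javascript
// npm install mysql
const mysql = require("mysql");

const connection = mysql.createConnection({
  host: "localhost",
  user: "platform",       // placeholder credentials
  password: "secret",
  database: "iotplatform" // hypothetical database name
});

// Assumed table and column names, mirroring Figure 4-1.
const createTable = `
  CREATE TABLE IF NOT EXISTS messages (
    id INT UNSIGNED NOT NULL AUTO_INCREMENT,  -- unique incremental ID
    topic VARCHAR(1024) NOT NULL,             -- topic, capped at 1 KB
    payload VARCHAR(2048),                    -- payload, capped at 2 KB
    timestamp BIGINT NOT NULL,                -- epoch time in seconds
    PRIMARY KEY (id)
  )`;

connection.query(createTable, (err) => {
  if (err) throw err;
  console.log("Time-series table is ready.");
  connection.end();
});
```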

Accessing Platform Resources Through APIs

With the Mosquitto MQTT broker and the time-series storage in place, our platform will be able to ingest data packets and communicate over MQTT in general. This communication will be stream based and will not necessarily have any built-in intelligence without the rest of the platform.

Devices or applications that are connected to the stream are able to access the data in real time; however, when offline or not connected, there is no mechanism to ask for data. This is where our APIs will play an important role.

In the computer programming domain, API means application programming interface: a set of subroutine definitions, communication protocols, and tools for building software.

Note

In general, it is a set of clearly defined methods of communication among various components (of a computer program or system). A good API makes it easier to develop a computer program by providing all the building blocks, which the programmer can put together for a meaningful purpose.

Let’s categorize our APIs into four different types. This will help us keep the development modular and pluggable.
  • Data access APIs. These APIs help us access the time-series data storage in our IoT platform and manipulate it in a limited manner. Additionally, they create linkages between live data streams (MQTT based) and non-live data streams (HTTP based).

  • Utility APIs. Certain utilities could be required on a non-regular basis by many applications. A classic example is data conversion or transformation into a certain format. If an application or device needs to encode or encrypt data for one-off use, or needs to translate or transform it for a specific condition, it can utilize these APIs. Essentially, they are packaged functions shared by multiple resources across and outside the platform.

  • Microservice APIs. Endpoints that are functionality based or serve a very specific and predefined purpose form part of this group. These are typically application services such as email and text messaging.

  • Plug-in APIs. Some of the interfaces that we will build will patch up two sections of the platform, which otherwise are not connected. Some of these APIs also act as a front end to mobile or computer applications.

Data Accessing APIs

To access time-series data safely and appropriately, we will design a set of APIs to cater to various scenarios and requirements. In general, we need at least seven endpoints.

Note

Each requirement is numbered so that we can easily refer to them throughout the book. Data requirements start with a D, while microservice and utility requirements start with an M.

  • D1. Get a single data record. Enables applications and devices to query a single data record from the time-series data storage, based on the specified topic or topic pattern. (A sketch of this endpoint appears after this list.)

  • D2. Get several data records in series. Enables applications and devices to query multiple data records based on a specified topic or topic pattern.

  • D3. Get one or several records based on certain condition(s). Enables applications to query one or more data records based on a specified condition—for topic or payload, or both. The condition could be a topic or payload pattern, or timestamp dependent, such as data within a time period.

  • D4. Store data record sent over an API (if not sent over MQTT stream). In addition to querying data from time-series storage, we want applications and devices to store the data in the time-series store. This is useful for devices and applications that cannot communicate over a live MQTT data stream.

  • D5. Delete a single data record. Enables applications or devices to delete a single data record based on the specified topic. Note that we do not implement a topic pattern mechanism here, to avoid accidental data purges.

  • D6. Delete several data records in series. Deletes a set of data records from the dataset based on topic. This is useful if we want to keep data storage lean and lightweight. A typical scenario is removing all data older than 24 hours, or combining this with a multirecord query to move the data out of platform storage and archive it somewhere for audit or regulatory purposes.

  • D7. Delete one or several records based on certain condition(s). Just as we can query one or multiple data records based on a specified condition, we may need to delete them from the time-series storage. Although this is useful functionality, it needs a built-in level of safety, which we will discuss in detail later.
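As a sketch of requirement D1, the endpoint below returns the most recent record for an exact topic over HTTP, using the open source express and mysql packages. The route path, credentials, and table name are illustrative assumptions carried over from the schema sketch.

```javascript
// npm install express mysql
const express = require("express");
const mysql = require("mysql");

const app = express();
const db = mysql.createConnection({
  host: "localhost",
  user: "platform",       // placeholder credentials
  password: "secret",
  database: "iotplatform" // hypothetical database name
});

// D1: return the most recent record for an exact topic.
// Example: GET /get/office%2Ffloor2%2Ftemperature
app.get("/get/:topic", (req, res) => {
  const sql =
    "SELECT id, topic, payload, timestamp FROM messages " +
    "WHERE topic = ? ORDER BY id DESC LIMIT 1";
  // A parameterized query prevents SQL injection via the topic string.
  db.query(sql, [req.params.topic], (err, rows) => {
    if (err) return res.status(500).json({ error: "query failed" });
    res.json(rows[0] || {});
  });
});

app.listen(3000, () => console.log("Data API listening on port 3000"));
```

An application could then call, for example, GET /get/office%2Ffloor2%2Ftemperature to fetch the latest reading stored under that topic.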

Elementary Microservices and Utilities

Here we list some of the microservices and utilities that we want our IoT platform to provide. These will be used frequently, but not necessarily on a regular schedule.
  • M1. Publish current timestamp. This service is something I highly recommend for distributed applications. Often, we find that the systems are not coordinated due to time zone differences and system clock limitations. We can overcome this with the help of a time broadcasting service. The other alternative for this is the use of NTP (Network Time Protocol) ; however, not all the applications or devices have access to NTP servers, which limits their ability to time synchronize operations.

    We will use this utility to publish/broadcast time values from our own IoT platform, so that all connected systems are synchronized with it. We can synchronize the platform itself with NTP servers separately; regardless, the platform remains a stable reference source. (A sketch of this service, together with M2, appears after this list.)

  • M2. Get current timestamp. This is the polling counterpart of the publish-current-timestamp service. It is helpful when a device or application missed a prior broadcast and cannot wait until the next one, or when the device or application is forced to synchronize by the user or a business rule.

  • M3. Get unique or random number/string. This is a very handy service for generating random strings and numbers. We can use randomly generated numbers and strings to create unique keys or reference numbers. We can also use them as random passwords or tokens.

  • M4. Get UUID. A UUID (Universally Unique Identifier) is like a random number or string, but a bit more structured and universally unique. A UUID generated by the algorithm is guaranteed to be different, or is at least extremely likely to be different, from any other UUID generated until about the year 3400 (i.e., roughly 1,500 years from now). Similar to random strings, we can use UUIDs to generate keys or passwords for devices and applications.

  • M5. Send an email. This is a ubiquitous service, and probably one of the most frequently used by applications and platforms. We need an email service for automation, alerts, user checks and verifications, password resets, key communications, and more. It is a must-have service in our IoT platform.

  • M6. Send a text message. We can use text messages for purposes similar to email. Additionally, we can use it for implementing two-factor authentication for our systems or critical sections where an additional security layer is required. Our applications and other applications connected to the platform can use this service.

  • M7. MQTT callback registration. Because the MQTT data feed is live, applications that depend on an HTTP-only mechanism have no way to be notified of newly available data unless they poll continuously or frequently. To avoid this, we will develop a service that creates a webhook, to be called whenever the platform receives a data packet matching given topic or payload criteria. This way, HTTP-only applications can post or transmit data packets using the REST API (as in D4) and be notified of new data with this service. We may have to leverage the rules engine to write this service. Note that this applies only to server-based applications; hardware devices are unlikely to benefit from the callback.
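As a sketch of M1 and M2, the snippet below broadcasts the platform's epoch timestamp over MQTT once per minute and also serves it over HTTP for polling clients. The topic name, route, and port are illustrative assumptions.

```javascript
// npm install mqtt express
const mqtt = require("mqtt");
const express = require("express");

const client = mqtt.connect("mqtt://localhost:1883");
const app = express();

// Current epoch time in seconds (see the Timestamp column discussion).
const now = () => Math.floor(Date.now() / 1000);

// M1: broadcast the platform time once per minute on an assumed topic.
client.on("connect", () => {
  setInterval(() => {
    client.publish("timestamp", JSON.stringify({ timestamp: now() }));
  }, 60 * 1000);
});

// M2: polling counterpart for clients that missed the broadcast.
app.get("/get/timestamp", (req, res) => {
  res.json({ timestamp: now() });
});

app.listen(3001, () => console.log("Timestamp service on port 3001"));
```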

Routing and Filtering Data and Messages

At this first stage, the routing and filtering of data flows and messages will be defined only as a general architecture, not a final design. We will keep it evolving as new devices and applications are added.

Updated Block Diagram of Our IoT Platform

Remember that none of the requirements that we have listed are hard and fast. Many of them could be built later, or skipped altogether. So far, we have defined the base requirements for four of the major blocks of the platform.

The agile way that we are building our platform enables us to add more features and functionalities in any of these modules. This way, we can get our core functional IoT platform up and running in less than 24 hours, and then keep augmenting it on an ongoing basis. The updated block diagram of our IoT platform is shown in Figure 4-2.
Figure 4-2. Updated block diagram of our IoT platform

Summary

In this chapter, we made a few key decisions related to the data storage schema and real-time connectivity. We also defined our API, microservice, and utility requirements. Now we hit the road and start building something.

The next chapter is completely hands-on. Accordingly, you may want to ensure that you have a laptop computer with all the software utilities installed. We also require a fully qualified domain name for our platform. It would be a good idea to think about this and select one.
