CHAPTER 7
AI in the Modern Software World

The first half of the book focused on Artificial Intelligence and particularly on Deep Learning. It included examples of using Machine Learning and Deep Learning to extract patterns from data and drive outcomes like classification and regression. You saw a full example of collecting images of soft drink brand logos, augmenting the data to generate more training samples, and building a deep neural network to classify these images. You used transfer learning to take a proven architecture and customize it for a specific problem. Hopefully, with all this knowledge, you are equipped to take your own dataset and build models to analyze it.

In the second half of the book, we try to bridge the gap between data scientists, who are the algorithm experts building models, and software developers, who build the code that runs in production. We see how the ML and DL models we build can be packaged with software code and deployed for real-time inference on live data from the field.

In this chapter, we take the data scientist hat off for a bit and put on the software developer's hat. We talk about how software development has evolved over the years; what kind of modern applications are being developed; and what improvements are happening in the process and tools for building software. It's important to understand these issues because this is the new domain and environment for which we need to build and deploy our ML models.

We talk about the growth of web applications, the rise of Cloud computing, IaaS versus PaaS versus SaaS versus CaaS, SOA versus microservices, and the latest trend of Cloud-native applications using containers. We then spend some time understanding Kubernetes and how it can help you package your code into a container for production deployment and scale it to thousands of instances in seconds.

A Quick Look at Modern Software Needs

Software development has undergone a major transformation in recent times. Customers (who pay for software) and consumers (who end up using it) have increased demands in terms of cost, speed of delivery, faster and automated updates, and an enhanced user experience. With the rise of mobile computing, everyone has a powerful smartphone in their pocket with dedicated Internet connectivity. The expectation is that software will take full advantage of this processing power and connectivity to get us improved outcomes. No one expects to download a binary file and plug their phone into a laptop with a USB cable to update to a new OS. We have started to expect over-the-air (OTA), seamless updates that happen in the background and do not interrupt our routines.

Customers no longer expect large, bulky, monolithic software that needs to be custom installed on racks of servers in a back office. Modern web applications are moving to public Clouds like Amazon Web Services (AWS), Google Cloud Platform (GCP), and Microsoft Azure. These Cloud vendors provide a unified ecosystem for building and deploying software very fast and take care of many infrastructure concerns for the developer. For example, with AWS, you can spin up new virtual machines in a matter of seconds without having to touch any hardware. All the memory, CPU, storage, and networking of these machines is provisioned virtually. This is the age of software-defined hardware and networking.

We are seeing an explosion in social media apps like Facebook, Twitter, WhatsApp, and Instagram, which manage millions of users connected to each other and provide real‐time updates. Customers are expecting a social media–like experience in other areas like mobile payments, movie bookings, and online shopping. A few years back I was working on software showing the status of gas turbine health. Our customer was pushing us for an interface like Twitter where, as soon as an event happened (a message by a user), the whole network was notified in milliseconds. We actually studied the Twitter architecture and ended up building a real‐time notification engine.

In the previous chapters, if you used Google Colaboratory to run some of the examples, you will have noticed an extremely seamless experience and a very powerful interface. Programmers are traditionally used to desktop‐based integrated development environments (IDE), which need to be installed and kept updated with the latest versions. We are now moving from desktop to web‐based IDEs like Google Colaboratory, where all the cool operations that an IDE supports are done in a web browser. No installation is needed and there's no updating of code and library packages. Especially for libraries like TensorFlow that get new versions every three to four months, you can expect the web IDE to always provide the latest library version to get you going.

You can do all the coding inside the browser and run the code on specialized hardware like a GPU, all in the background. This is the level of sophistication that modern software systems have grown to expect, including a user interface that gives an almost desktop feeling to programmers when writing code, with auto‐complete of code syntax. Your code runs in the background on a virtual machine with a dedicated GPU and you don't even notice it!

Another major improvement happening with software is in the user experience area. We are no longer happy with the traditional mouse and keyboard inputs for our systems. We now have to develop software that will be accessed on touchscreen phones and iPads and can listen to voice commands. We now have virtual and augmented reality devices that create an environment for the users, and our software needs to be rendered in this environment.

To meet the ever-growing needs of modern applications, the whole software development process is changing. Traditionally, we had a waterfall development model, where engineers spent a lot of time and money upfront capturing requirements, building a complete architecture, writing a detailed design specification, and then, over many months, delivering working code. The problem is that in a fast-changing world we cannot wait that long to get software. The people, environment, and requirements continuously change. Moreover, we rely heavily on the assumption that we have captured all requirements perfectly, which is almost never the case. This can lead to hours of rework, extended delivery dates, and missed targets.

Today, almost all organizations are moving to agile methods that promote smaller, self-organizing teams that build and deliver working software in short iterations or sprints. Many people feel that agile means building software very fast without a focus on quality and documentation. That is absolutely untrue. The expectation is to deliver working, production-quality code in each sprint, with an acceptable level of software quality checks and documentation. There are formal project management frameworks like Scrum that help engineers achieve agile development practices. In order to support such an agile process, we cannot have engineers waste hours running the same unit and system tests over and over. We also cannot afford a manual build process to generate production code from our source repository. Our build process should be such that, as developers check in code, the tests run automatically, checking for things that are broken and helping us fix the relevant areas. Then, once the code passes all the tests, it is automatically integrated with all the right dependencies (libraries, DLLs, etc.) and deployed as a package.

To solve this exact problem, a major component of the agile process is Continuous Integration and Continuous Delivery (CI/CD). CI aims at integrating your source code with unit and integration testing and making sure the code is not broken. This is absolutely invaluable when you have multiple developers checking in code at the same time—sometimes across the globe. CD focuses on packaging the validated build into a binary to deploy on target computers. This is how companies manage nightly builds of their software, which can be tested immediately. For example, Google's Chrome browser has 6.7 million lines of code, and it's all managed through such a process. We can go to the website and download the latest version. Similarly, the entire Android operating system that powers smartphones has around 15 million lines of code. It's open source and you can also look at the code for free online.

How AI Fits into Modern Software Development

Now, you may ask, what does this have to do with AI? Excellent question. For AI to be effective, it needs to be part of the modern software development process. Imagine you build a very effective AI model that reads an image and, if it sees a familiar face, sets a flag to unlock the phone. A data scientist would focus on using tools like Python and Jupyter to master the face recognition algorithm. However, once this brilliant model with 98% precision is developed, how do we integrate it into the smartphone app? You have a friend who is a mobile software developer—she's an expert in C++, Java, and mobile software. This developer needs to build a wrapper app that takes the image from the smartphone camera, normalizes it, and provides it to your model. Now, your model is developed in Python and stored as an H5 file. This nice mobile software developer needs to find a way to call your Deep Learning model from inside her environment, which could be Java or C++, and run the model. Even if this is done, the model H5 file will stick out as a sore dependency that needs to be integrated into the CI/CD process.

Now imagine that a new paper is published with insights on better hyper‐parameter tuning, particularly for your face recognition problem. Just to explain, hyper‐parameter tuning is basically adjusting the parameters of your model that are not learned. These are configuration parameters like number of layers, neurons in each layer, etc. You are excited and integrate these changes into your data science tools and retrain a model. Now the new model has 99% precision, and you have to go back and give this new model file to the mobile developer, who has to integrate it in the code. This has the potential to happen again and again and could pretty much sour your friendship with the nice mobile developer!

As mentioned earlier, requirements keep changing in the software world and hence we need an agile process to change as requirements change. The same goes with AI. As new requirements come up, you need to modify your Deep Learning models and quickly integrate them into the software CI/CD process. Just sending a model file across the board is not the solution. We need tools to manage the model lifecycle, evaluate models by running them in parallel, and have a seamless CI/CD process for our models. The entire Machine Learning model lifecycle needs to be considered and the right points should be automated to make the overall application development agile. This is what we cover in the second half of this book.

We will show—using the latest technologies like Cloud computing, microservices, and containerized applications—how we can modernize the model development process and make it agile, just like CI/CD does for the overall software development cycle. Modernizing the ML model development process and integrating with the software development process is an active area of study (as of 2018). The technology behind this is still being developed. I will share some of the best practices used in the industry and some top tools used. I will also show some examples of taking models developed using tools you saw in the first half of the book—like Keras and TensorFlow—and deploying these into real applications.

But before we get too far, let's talk a bit about the growth of these technologies—particularly web applications, Cloud computing, microservices, containers, and Docker. This will not be a comprehensive guide to any of these technologies. I explain them in simple language and try to relate these concepts back to the AI conversation we started with.

Simple to Fancy Web Applications

In the 1990s, as the world moved from desktop applications toward web applications, a lot of work went into making web applications dynamic and giving them the flexibility of desktop applications. A desktop application is something like Microsoft Word or Outlook that runs on a computer or laptop and has full access to the system's resources. Thus, we see tight control over the data and some fancy user interfaces. Web applications, on the other hand, run inside a web browser like Google Chrome, Apple Safari, or Microsoft Internet Explorer. These web applications connect to a remote computer called a web server, which delivers content in a universal format known as HyperText Markup Language (HTML). This is how the majority of web content is delivered. A user with a web browser connects to a website like Google.com. The website checks for the information requested by the browser, packages a response as HTML, and sends back the data. HTML is the language the browser understands well; it decodes this HTML into the web page that we see. The web server does all the magic of understanding our request, getting responses from some data source, and packaging them into an HTML document that can be rendered inside the browser. See Figure 7.1.

Figure 7.1: Displaying a web page and HTML code

All the underlying communication is done using a protocol called HTTP (HyperText Transfer Protocol). A protocol is basically a language that is used to transfer data on a network. The HTTP protocol defines the structure of data to be sent from the client (browser) to the server and back. Also, verbs like GET, POST, PUT, and DELETE define the action that needs to be performed at the server. For example, a browser may send a GET message to fetch the contents of a web page—the most common use case. There could be a POST or PUT message to update a value in a database, like a user's address or ZIP code. This is how HTTP works—through messages.
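
As a rough illustration, here is an abbreviated trace of a GET request made with the curl tool against the generic example.com address; lines starting with > are what the client sends, lines starting with < are the server's response headers, and the real exchange includes more headers than shown here:

$ curl -v http://example.com/
> GET / HTTP/1.1
> Host: example.com
> Accept: */*
>
< HTTP/1.1 200 OK
< Content-Type: text/html; charset=UTF-8
<
<!doctype html>
<html>
  <head><title>Example Domain</title></head>
  <body>...</body>
</html>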

In the early 1990s, web servers were pretty dumb and served just static HTML pages. All the logic of collecting data and building HTML was done ahead of time by a person, who created a static HTML page that was stored on the server and sent back. This was not enough to keep up with the dynamic needs of web applications. Hence, technologies like CGI scripts, Java Servlets, and PHP were developed to enable server-side code to generate HTML content dynamically. So, if you needed to query a database of books for a search topic, you could do that using Java code in a servlet, and the results were displayed as custom HTML.

Server-side scripting became extremely popular, but it was not enough. Every request still had to go to the server, and the client had to wait for a response, which was a full HTML document. There was advancement in scripting on the client side with the development of JavaScript. Developers could write some amazing JavaScript code to do things like validating data, modifying the style of pages, and adding animations. JavaScript, combined with HTML stylesheets, gave rise to very advanced user interfaces for modern web applications. With the growth of Ajax, dynamic content became available to web pages without having to request an entire HTML page. Pages could send just the relevant queries and get the results back in small packages to display on the page using Ajax. The rise in HTML, JavaScript, and stylesheets led to something known as HTML5 (see Figure 7.2), which is a modern, evolving standard for building dynamic, interactive, and responsive web applications.

Figure 7.2: The HTML5 logo

(Source: W3C – Wikimedia)

If you have been using a web‐based email tool like Yahoo or Google Mail over the years, you have probably noticed the evolution in its user interface—from the early versions in the early 2000s, which would take a few seconds to load, to each message loading in separate tabs, to a recent (2016) more modern desktop‐like user interface, letting you click and read a message in the preview pane and select and delete multiple messages.

The Rise of Cloud Computing

As web applications got sleeker and faster in the 2010s, a paradigm shift was also starting to evolve in the backend hosting of these applications. Traditionally, organizations had in-house servers tucked away at the back of their buildings, in rooms with hundreds of wires and cables running between big box computers. These rooms generated lots of heat from all the computers and needed dedicated cooling, with many fans. There was usually a dedicated IT admin team who knew where these wires connected and would spend hours debugging some issue. You have probably seen these server rooms in some of the '90s movies like Office Space. See Figure 7.3.

Figure 7.3: Data center with racks of blade servers

(Source: BalticServers.com – Wikimedia)

As applications grew in size and complexity, we soon found that the server room was not enough to maintain applications. The applications were no longer simple web pages showing reports and data entry forms. These were complex business process systems that needed high‐end processing and high availability. Also, with the rise in globalization, these applications were no longer accessed from one or two regions, but there could be customers accessing these applications 24×7, from all corners of the world. These applications now needed extremely high availability with minimal downtime.

As our web applications grew in complexity and importance, strong metrics to track downtime started coming into play. An availability number of 95%, although initially considered good, soon started becoming undesirable—95% availability for a 24×7 website translates to a downtime of 18 days in a year. Imagine Wikipedia, Facebook, BestBuy, or your bank's website being down for 18 days of the year! So, the new availability targets were as high as 99.99% (four nines) or 99.999% (five nines). Five nines translates to a downtime of about five minutes in a year, which is generally considered acceptable.

This downtime was needed because software had to be upgraded with new features or broken hardware had to be fixed or replaced. Engineers soon realized that individual servers could no longer support these global, high-availability applications. This gave rise to the move of applications into dedicated data centers in the late 2000s. The data centers had dedicated racks of blade servers with shared processing power, storage capacity, cooling, etc. There could be a dedicated IT team for the data center rather than individual teams at each site—saving millions of dollars. Another major benefit that the data center provided was disaster recovery. If a data center was destroyed because of a natural disaster or a terrorist attack, organizations could lose years of transaction histories and valuable data. Data centers started supporting data replication at sites in different geographic regions to avoid these scenarios. The data center was still on a private network and could only be accessed with dedicated network connectivity—it was still operating on the organization's own intranet.

In the early 2010s, a new concept started emerging—more like a public data center or Cloud. The idea was that there would be one or more data centers with data storage and processing capability and companies would “rent” this storage and processing power. This was available on the public Internet, but everything behind the scenes was abstracted out for users; hence the term Cloud, since you don't really know what happens up there. You get the desired storage, memory, and processing resources and pay a monthly fee for the privilege.

The technology that enables this is called virtualization. The racks of servers that you see in a data center in Figure 7.3—using virtualization—can be divided into smaller virtual machines (VMs), each with a dedicated processor, memory, and storage. All communication with the data center happens over the public Internet using the HTTP protocol we discussed earlier.

A security layer is developed on top of HTTP to make sure the right users get access to their resources and unauthorized access is blocked. Many security standards like HTTPS, OAuth, and SAML have evolved to ensure exactly this. So, once you have an account established on a public Cloud provider website, you can connect to an endpoint with client software and launch a virtual machine. Based on usage, your account will be billed. This is just like using any paid subscription service like Netflix.

As public Clouds gained popularity, several “as‐a‐Service” paradigms evolved around Cloud computing. Let me explain these with the help of Figure 7.4. There may be different versions of this block diagram available on the Internet. It's important to get the concept behind it.

Figure 7.4: IaaS vs. PaaS vs. SaaS, explained through a block diagram

The most basic version is called Infrastructure-as-a-Service, or IaaS. Here you rent the hardware and network from the Cloud provider. Basically, this is logging on to AWS and commissioning a virtual machine. You specify the number and type of CPU processors, the RAM, and the storage capacity needed. Of course, the bigger the resources, the more you pay per hour. Then you can log in to that virtual machine with SSH (Secure Shell) using a security key assigned to your account.

You can also enable Remote Desktop on a Windows VM and treat it like a regular desktop. You can install software on this machine and run dedicated processing jobs. You can install a web server like Apache Tomcat, deploy your code, and have it serve as your web application hosted on the Internet. Then you can install a database like SQL Server on the same or a different virtual machine and have your application write to this database. Many websites are hosted in this manner. This is IaaS. The Cloud vendor only takes care of the hardware and the network—the application developer handles the runtime, application data, and logic.

Application developers have to do a lot of work using the IaaS paradigm. They have to create VMs using a web admin screen, log in to the VM and then manually install the OS, drivers, web servers, databases, applications, etc. Recently, Big Data ecosystems like Hadoop have come into prominence. Hadoop allows regular Linux machines to act like an integrated cluster and distribute jobs on that cluster. This way, you get the processing power and storage of all the machines combined. If you were to set up an eight‐node Hadoop cluster on your own, you would need to commission eight VMs and then configure each to be part of the Hadoop cluster. Lots of work!

To solve this problem, Cloud vendors started introducing the Platform‐as‐a‐Service (PaaS) paradigm. PaaS lets the developer focus on application code and data and takes care of the runtime, as shown in Figure 7.4. Here the application developers do not explicitly commission VMs; rather, they package their code into binary files and upload them to the PaaS ecosystem. The PaaS takes care of setting up the database, the app server, and, in some cases, the Big Data ecosystem. The runtime is a major concern for application developers and installing, debugging, and managing versions can become a major workload. PaaS takes care of this for you.

Java developers package their applications into JAR files. The JAR file contains application code, configuration data, and database scripts. The PaaS automatically extracts these files and creates the environment. Internally it commissions multiple VMs to address each of the runtime concerns like server, database, etc. Developers save on deployment and maintenance time, but they have to sacrifice the fine control they would have if they used an IaaS. Also, they have to rely on the server and databases supported by the PaaS solutions. Modern PaaS tools like AWS Elastic Beanstalk and GCP App Engine are pretty good at supporting all the latest development servers and databases.

The next paradigm we will talk about is Software-as-a-Service (SaaS). SaaS has been used in the context of the Cloud; however, SaaS solutions existed even before Cloud computing was formally defined. SaaS means the vendor takes care of all your application concerns—right from the network and hardware, to the runtime, to the application data and code. Most web-based tools like Google Docs, Gmail, Yahoo Mail, etc., are SaaS tools. You don't need to install any software on your machine; you just open a compatible web browser and the entire application runs inside the browser. Companies like Salesforce.com provide extensive tools where you can build entire applications following the SaaS model. Microsoft has also embraced the SaaS model with Office 365, where you can build and manage all your documents in the Cloud with an online interface.

In recent years, a new paradigm is evolving in the industry, called Container‐as‐a‐Service (CaaS). Let's talk about that in the next section.

Containers and CaaS

Traditionally, web applications were packaged as binary packages like JAR files in Java or ZIP files. The development and testing teams would ensure that the package contained all the dependencies and installed fine on the app server and the platform, like Java or Python. However, invariably, as packages were moved from development to testing to staging platforms, there would be missing dependencies, incorrect versions, etc., causing problems. This would cause major delays in deploying software and has been a major deterrent to agile development.

I remember that in a Java application we were developing a few years back, we started getting null pointer exceptions (bad, bad stuff in Java) when we moved our JAR files from the development servers to the staging servers. We spent two days checking versions, but all seemed fine. Finally, we discovered that the charting library we used had a micro-version change on that environment, and this was causing the entire chart object to be null. The problem was that we were using the charting library as an external dependency, and on the new environment the expectation was that this library already existed and was the right version.

To manage problems like these, a new development pattern is evolving and getting very popular, called containerized applications. The idea is not just to package your application into a ZIP or JAR file, but to package the whole machine image—including the operating system, any dependent libraries, and your code—as a container. A container behaves like a lightweight virtual machine but uses a shared-kernel architecture.

Normally, if you package your application as a virtual machine, the file can be a few gigabytes. To initialize it on another machine, you will need specialized software called a hypervisor, and startup will take at least a few seconds. This is because when a VM initiates, the entire OS needs to be started, then your app server, and finally your application code.

In contrast to this, a container can get started in just a few milliseconds and the size can be a few megabytes. The reason is that containers reuse the kernel of the underlying operating system, as shown in Figure 7.5.

Figure 7.5: Virtual machines vs. containers

Docker is the most popular container technology today. A container is a standard unit of software that packages up the code and all its dependencies so the application runs quickly and reliably from one computing environment to another. A Docker container image is a lightweight, standalone, executable package of software that includes everything needed to run an application: code, runtime, system tools, system libraries, and settings.

All the host machine needs to have installed is a Docker agent. The container image is downloaded on this machine and instantiated as a container. The container reuses the Linux kernel of the underlying Docker agent. The agent also allows containers to share libraries among them, thus making the containers highly lightweight. Containers may be spun up in micro‐ or milliseconds and you can have thousands of containers running on a single high‐end machine.
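
For example, assuming Docker is installed on the host, pulling and starting a container from a public image on DockerHub takes just a couple of commands. The dattarajrao/simple-app image used here is the same test web app image we deploy to Kubernetes later in this chapter; the host port 8080 is an arbitrary choice:

$ docker pull dattarajrao/simple-app                 # download the container image from DockerHub
$ docker run -d -p 8080:80 dattarajrao/simple-app    # start a container, mapping the app's port 80 to port 8080 on the host
$ docker ps                                          # list running containers
$ curl http://localhost:8080                         # the web app inside the container answers on the mapped port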

Containers run in an isolated environment with their own network stack, giving the impression of virtual machines. They use three Linux technologies to achieve this. First, they use namespaces to isolate specific operations: each container gets its own isolated view of things like process IDs, network interfaces, and filesystem mounts. Second, Linux cgroups are used to assign resources to containers. These put limits on the amount of CPU, RAM, and storage consumed by each container so that many containers can coexist on the same machine. Finally, containers use a layered filesystem, with every increment made on top of a base image. For example, we can start with a standard version of a Linux image, add a web server, add our database, and add our code. Each of these will be a separate shared layer, allowing the final layer to be highly lightweight, containing only our code. The same OS and server layers can be reused on all machines.
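
To see the layering idea in practice, here is a hypothetical Dockerfile; each instruction adds a new layer on top of the previous one, and the base OS and web server layers are shared across all images built from the same base (the my-app folder name is just an illustration):

# Hypothetical Dockerfile illustrating layered images
# Base OS layer (shared with other images built on the same base)
FROM ubuntu:18.04
# Web server layer
RUN apt-get update && apt-get install -y nginx
# Application code layer - only this changes between builds
COPY ./my-app /var/www/html
# Command run when the container starts
CMD ["nginx", "-g", "daemon off;"]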

Containers have two major advantages. One is for DevOps. DevOps is basically a new concept in agile that enables tighter integration and coordination between developers and operations teams. Instead of developers testing their code and "throwing it across the fence" for operations to do the deployment and monitoring, modern software teams have dedicated DevOps members on the agile team who make sure the code from developers is validated and deployed correctly. Traditionally, managing library dependencies in software has been a nightmare for DevOps. Developers always point out that "it works on my machine," and then it's up to DevOps to make it work on the staging or production machine.

Containerized applications are lifesavers for DevOps. Since, along with the code, we package the app server, all the dependent libraries (each at the right version), and even the OS, we are almost guaranteed that the code will work in the staging and production environments exactly the way it worked on the developer's machine. We don't just deploy code anymore—we deploy fully tested environments, making DevOps much easier and potentially fully automated. This is the first major advantage that containerized apps drive.

A second, equally important advantage of containerization is that we can quickly (in a matter of milliseconds) spin up thousands of containers in parallel without running out of resources. Resources don't get allocated to containers in advance; they only get assigned when the container does some work. This shared resource model greatly helps in improving the performance of applications by running them in parallel. As long as we have a good tool to schedule containers in parallel, we can run our applications at scale, taking full advantage of parallel computing.

In the last couple of years, PaaS has actively been replaced or extended by the Container-as-a-Service (CaaS) model. This is similar to PaaS, but instead of sending a JAR file to the PaaS, we point the CaaS engine to our container image published to a registry like DockerHub. The CaaS engine pulls the image, deploys it, and instantiates the container in a standard runtime like Docker. The container is a fully self-contained entity with the right OS version, system libraries, web server, and all other dependencies. It can be configured to bring up your application when it starts and also to monitor whether the application goes down and restart it. The entire application lifecycle is managed inside the container, which gives developers much more flexibility than a dependency-heavy JAR file. It also gives DevOps much more insight and visibility in terms of logging and monitoring the applications.

As CaaS gains popularity, a complementary approach to building software applications in the Cloud is coming into prominence, called microservices. Using a microservices architecture, a new breed of applications is being developed from the start with the Cloud in mind. These are called Cloud-native applications.

Microservices Architecture with Containers

Along with the move to data centers and public Clouds, the architecture of software applications was also being simplified to a great extent with new styles. Software applications were traditionally developed in tiered architectures like Model-View-Controller (MVC), with strict separation of the data structures, the view generation logic, and the controller that integrates the two. However, these applications were developed in silos, each with a very limited focus on the particular domain it served. For example, a company might have a very elegantly structured maintenance management application, but it would not be able to communicate effectively with another application like inventory management. Organizations undertook huge implementations of Enterprise Resource Planning (ERP) systems to try to have different parts of the organization communicate effectively with each other.

This drive to remove monolithic applications operating in silos led to the development of an architectural style, or pattern, called Services Oriented Architecture (SOA). The goal of SOA was to find data and functionality that could be shared between monolithic applications and help them integrate better.

The focus of SOA was to enable interoperability between systems. Expert software architects started identifying integration points between systems and defining services that enabled the sharing of data and functionality. The key challenge was to manage the lifecycle of these services and provide an easy way for them to communicate smartly. This requirement led to the development of Enterprise Service Bus (ESB) products.

An ESB would provide a way to host services from multiple separate products and drive communication between them using common protocols like HTTP or messaging. Also, a key thing the ESB provided was the ability to store Enterprise Integration Patterns (EIPs) in the integration layer rather than in individual services. So the services could be developed generically, and all the smarts of the communication were encapsulated in the ESB.

An example of such an EIP is content-based routing, where, based on the content of a message (like a mobile SMS), the request is passed on to the appropriate service. This logic of processing the message and directing the output to the appropriate service was managed by the ESB. An example of this is your mobile provider sending an SMS asking for feedback and you replying with 1 for positive or 2 for negative.

The goal was to change the monolithic applications minimally, and to capture the integration patterns and store them in the ESB, with the services communicating over a message broker.

With the focus on Cloud computing, there was a change in philosophy of how services would be developed and implemented. A new architectural style started to emerge, called microservices. The idea of microservices is to have self-contained services that can be scaled and managed independently. Unlike SOA, the focus was not on integrating monolithic applications, but on breaking the silos and distributing functionality into smaller components. The idea was to modify applications with a focus on hosting in the Cloud and taking advantage of the distributed nature of Cloud computing.

For example, let's consider a huge shopping application that handles features like searching for a product, finding the cost, and completing the purchase. In a microservices architecture, each piece of functionality would be distributed to a separate microservice. The search microservice would fully own the capability of the system to provide the search UI to users, run the query, and show the results. An ideal microservice is self-contained. So our search microservice would manage the UI shown to users and would most likely have an optimized database of products specifically for search. If a new feature needs to be added, such as a photo-based search, that would be owned and implemented by the team owning this search microservice. It would have its own codebase, test scripts, and release cycle. Also, if we saw that search was getting slow, this search microservice could be scaled independently from 50 to 100 nodes to double its performance.

The microservices approach leads to a highly loosely coupled architecture. Also, team structure can be customized to provide developer, tester, and DevOps resources based on specific functionalities. Many companies are starting to adopt the microservices way of developing Cloud applications. Earlier, we saw CI/CD pipelines for software applications; we can have independent CI/CD pipelines and releases for each microservice so that key functionalities can be released faster. What microservices get us thinking about is loosely coupled applications. So if the search feature needs a quick functionality improvement, it can be implemented in that microservice without affecting the others.

Earlier, we saw how containers help build independent components of your software along with all their dependencies. As you can see, containers are tailor-made to fit the microservices model. You can package a microservice as a container and deploy it into a CaaS ecosystem, which handles scaling and management of that independent microservice. Just as we saw that it's easy and fast to scale a single container into thousands of instances, the same can be done with a microservice packaged as a container.

Revisiting the earlier search microservice of the shopping application—if we know that during Christmas or Diwali holidays, the search queries are going to double or triple, then we can scale the containers appropriately to handle this load. This independent scalability is just one of the many benefits the microservices architecture drives.

Now this brings us to the main topic of the chapter and I hope you have been waiting for it—Kubernetes. The next section explains how Kubernetes provides a CaaS framework for deploying microservices and helps take care of infrastructure concerns for the application. In the final part of this chapter, we cover some basic Kubernetes commands for configuring your own application packaged as containers.

Kubernetes: A CaaS Solution for Infrastructure Concerns

Kubernetes is basically a Container‐as‐a‐Service platform. For one, it allows us to deploy applications packaged as containers and scale them independently. However, it does a lot more than that. The key thing that Kubernetes brings is that it takes care of many of the infrastructure concerns for applications. Before we build applications on Kubernetes, let's quickly look at the Kubernetes architecture and its key abstractions like pods, deployments, and services. I explain these concepts at a high level and show some examples. For more details, I recommend looking at the Kubernetes.io site, which has some excellent material, and also finding online examples you can try out. I provide some good articles on this in the “References” section at the end of the book.

Also, I will provide commands that you can run in a Kubernetes environment. To run these commands, you can either have a server-based or Cloud-hosted Kubernetes cluster and connect to it, or you can have a local installation on a single node on your laptop. This single-node installation is a separate product called Minikube. The beauty of Kubernetes is that all the commands and containers you run on a single-node Minikube can pretty much be run unchanged on a cluster with hundreds of nodes.

This works whether the multi-node cluster is running on your own servers (on-premise) or on a public Cloud (hosted). Kubernetes was initially developed by Google and made open source. Hence, GCP has built-in support for Kubernetes, and you can log in to GCP, quickly start a Kubernetes cluster, and connect to it remotely. Internally, GCP manages the nodes, which are the virtual machines that make up the cluster—very similar to a PaaS setup. AWS and Microsoft have also recently started supporting hosted Kubernetes clusters. Kubernetes has definitely emerged as the technology of choice for managing containerized applications on a cluster.

To get familiar with it, I recommend installing Minikube on your laptop. It creates a single-node cluster where you can deploy containers. This single node acts as both the master and the slave: the master controls the slave and gets jobs scheduled, but here it's all done on a single machine. You can install it on Windows, Linux, or macOS using the installation steps at the Kubernetes.io website.

Internally, this creates a virtual machine for the node, which has a dedicated IP address and network stack. You can use any virtualization engine, like VMware or VirtualBox, for this. Minikube connects to the virtualization engine and creates the VM internally. You don't need to do anything to manage this VM. Table 7.1 lists some handy Minikube commands to make note of.

Table 7.1: Some Useful Minikube Commands

COMMAND ACTION
$ minikube start Starts the Minikube single‐node cluster by initializing the VM.
$ minikube status Shows the status of your Minikube cluster, if it is running.
$ minikube stop Stops the cluster and shuts down the VM.
$ minikube ip Gets the IP address of the virtual machine of your single‐node cluster.
$ minikube ssh SSHs to the single node of your Minikube cluster. After SSH, you will see a big Minikube logo and can then run commands like ls, pwd, and ifconfig. With ifconfig, you can see that this VM has a totally separate network stack from the machine where Minikube is installed.

Kubernetes is a CaaS platform, so it lets you define containers for your application or microservice and manage the lifecycle of these. It follows a master‐slave architecture pattern with slaves making their storage, memory, and CPUs available to do work and masters controlling data and jobs on slaves. The workers in Kubernetes are called nodes, which can be physical or virtual machines. Each node runs the container agent and can spin up containers. However, all this is hidden from the users. There are commands to see the cluster details, but typically you deal with abstractions pertaining to your application.

Once you have a local Minikube cluster or a Cloud‐ or server‐hosted Kubernetes cluster, you can connect to resources of this cluster. One of the salient features of Kubernetes is that it exposes an extendable Application Programming Interface (API). You can connect to the Kubernetes cluster with this API and access and modify resources. This is a very uniform way of interacting with the Kubernetes system. As new resources like custom objects and data sources get added to Kubernetes, we can still access these with the same API commands.

The tool that invokes these API commands and allows us to interact with the Kubernetes cluster is called Kubectl. Kubectl can be installed on your machine and you can connect to local or remote clusters. Table 7.2 lists some essential Kubectl commands.

Table 7.2: Useful Kubectl Commands to Access Local and Remote Kubernetes Cluster Resources Through APIs

COMMAND ACTION
$ kubectl cluster‐info Gets the information of the cluster, like the master node URL.
$ kubectl get nodes Shows all nodes in the cluster. For Minikube, that will be a single node acting as the master and worker.
$ kubectl get pods General get command to get Kubernetes resources, in this case pods. Here it will list all pods. We will talk about pods in this section.

Although nodes are workers in a Kubernetes cluster, we don't usually deal directly with them. Kubernetes provides a set of abstractions to run your applications on a cluster. These abstractions manage how the jobs get scheduled on nodes—saving you that effort. The key abstraction in Kubernetes is called a pod. A pod contains one or more containers and these share CPU, storage, and networks with each other. You will typically package your application as a single container and abstract it as a pod. Docker is the most popular container engine but Kubernetes supports others and is not tied to Docker. Each pod has an IP address associated with it. A pod is what is scheduled by Kubernetes on the different nodes. You don't have to bother with where these pods run eventually, thus freeing you from the scaling concerns.

Pods are typically not commissioned on their own. We use a higher-level abstraction called a deployment to create pods. A deployment is the most common type of resource in a Kubernetes cluster. It defines the pod structure, what container(s) the pod consists of, and the number of replicas you need. The Kubernetes scheduler creates the right number of pods and runs them on specific nodes based on resource availability. You can specify pod-creation policies; for example, create at least one instance of the pod on each node. Deployments can be created using the kubectl run command or by specifying a YAML file. YAML files are structured text files that specify the details of the Kubernetes resource you are trying to build.

Let's look at an example of a very simple application packaged as a container and deployed on Kubernetes. I will not focus on packaging of the application right now but more on deploying on Kubernetes and scaling. In Chapter 8, titled “Deploying AI Models as Microservices,” I show examples of building a web application, containerizing, and deploying at scale. For now, I will use a test web application image that I created and uploaded to a common Docker registry called DockerHub (https://hub.docker.com). The image is called dattarajrao/simple‐app. It's a simple web app that displays an index page with a message in the browser. Listing 7.1 shows the deployment YAML file that creates a deployment with a Docker image.
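
A minimal deployment manifest along those lines—using the deployment name, the app=simple-app label, and the container port 80 that show up in the command output later in this section—would look roughly like this:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: simple-app-deployment        # name that shows up in kubectl get deployments
spec:
  replicas: 1                        # start with a single pod
  selector:
    matchLabels:
      app: simple-app                # the deployment manages pods carrying this label
  template:
    metadata:
      labels:
        app: simple-app
    spec:
      containers:
      - name: simple-app
        image: dattarajrao/simple-app   # test web app image from DockerHub
        ports:
        - containerPort: 80             # the app serves HTTP on port 80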

Now let's look at the steps for running this YAML file and creating a deployment. As discussed earlier, a deployment will create pods that will contain the instance of the container or our application. See Listing 7.2.
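
Assuming the manifest above is saved in a file named simple-app-deployment.yaml (the filename is our choice), the deployment is created with a single kubectl command; the exact output message can vary with the kubectl version:

$ kubectl create -f simple-app-deployment.yaml
deployment.apps/simple-app-deployment created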

This code creates a deployment with the YAML file. It creates a pod with a container specified by the dattarajrao/simple‐app image:

$ kubectl get deployments
NAME                    DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
simple-app-deployment   1         1         1            1           41s

The deployment is a resource you can get using the API.

$ kubectl get pods
NAME                                    READY   STATUS    RESTARTS   AGE
simple-app-deployment-98f597cdb-dtplp   1/1     Running   0          1m

This command gets the pods created by this deployment—in this case, just one. Listing 7.3 scales the deployment to three replicas, which will create three pods.
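
The scaling can be done with a single kubectl command along these lines (again, the exact output message depends on the kubectl version):

$ kubectl scale deployment simple-app-deployment --replicas=3
deployment.apps/simple-app-deployment scaled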

You can check to see if this is the case using the following command.

$ kubectl get pods
NAME                                    READY   STATUS    RESTARTS   AGE
simple-app-deployment-98f597cdb-dtplp   1/1     Running   0          2m
simple-app-deployment-98f597cdb-kch76   1/1     Running   0          7s
simple-app-deployment-98f597cdb-wgpq9   1/1     Running   0          7s

Listing 7.4 shows how to delete a pod manually. Now the deployment should re‐create this pod.
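
Deleting a pod by name (using one of the pod names from the earlier output) looks like this:

$ kubectl delete pod simple-app-deployment-98f597cdb-dtplp
pod "simple-app-deployment-98f597cdb-dtplp" deleted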

The new pod is created with a new ID. The deployment takes care of restarting the needed pods when they go down:

$ kubectl describe pod simple-app-deployment-98f597cdb-kch76
 
Name:           simple-app-deployment-98f597cdb-kch76
Namespace:      default
Node:           minikube/172.17.0.7
Start Time:     Tue, 13 Nov 2018 13:22:12 +0000
Labels:         app=simple-app
                pod-template-hash=549153786
Annotations:    <none>
Status:         Running
IP:             172.18.0.5
Controlled By:  ReplicaSet/simple-app-deployment-98f597cdb
Containers:
  simple-app:
    Container ID:
docker://e203d9037001a44e5c3b0b93945c0d06f48be29538fabe41be012e9c7757a56b
    Image:          dattarajrao/simple-app
    Image ID:       docker-pullable://dattarajrao/simple-app@sha256:e670
81c7658e7035eab97014fb00e789ddee3df48d9f92aaacf1206ab2783543
    Port:           80/TCP
    Host Port:      0/TCP
    State:          Running
      Started:      Tue, 13 Nov 2018 13:22:15 +0000
    Ready:          True
    Restart Count:  0

The description of the pod shows details like the image used, the IP address, and interesting log messages. We won't go through them in detail, but you can debug many problems by looking at this output and getting more logs using the kubectl logs <podname> command.

In Listing 7.3, we saw an example of scaling a deployment by increasing the number of pods. Kubernetes internally decides which nodes to run these pods on, and that is totally transparent to you. In the case of Minikube, of course, all pods run on the same node. We also saw how we can manually terminate a pod and Kubernetes automatically starts it back up. The same thing happens if your application terminates during its run due to bad data or network issues: when the application packaged into a pod terminates, the deployment automatically brings it back up. This is the reliability concern that is taken care of by Kubernetes. Reliability for applications deals with handling failure, and being able to restart quickly after a failure greatly improves reliability.

You see that the pods created through deployments may get any ID, and these IDs keep changing as pods are deleted and re-created. The IP address assigned to a pod also changes. Kubernetes manages the lifecycle of these pods. So how do we let clients call our application without specifying the absolute name or IP address of the pods? This is the networking concern of the application, and it is handled by another abstraction on top of the deployment, called a service.

Let's look at an example of creating a service for our deployment and calling this service from our clients. Listing 7.5 shows the YAML for this.
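
A minimal service manifest along those lines—matching the simple-app-service name and port 80 shown in the output that follows—would look roughly like this:

apiVersion: v1
kind: Service
metadata:
  name: simple-app-service           # name that shows up in kubectl get service
spec:
  selector:
    app: simple-app                  # route traffic to pods carrying this label
  ports:
  - protocol: TCP
    port: 80                         # port the service listens on
    targetPort: 80                   # port the container serves on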

Now let's look at the steps for deploying this YAML file in the Kubernetes environment and creating a service. We will then use the networking features of the service to call the pods from a URL. See Listing 7.6.
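
Assuming the manifest is saved as simple-app-service.yaml (again, the filename is our choice), the service is created the same way as the deployment:

$ kubectl create -f simple-app-service.yaml
service/simple-app-service created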

This creates a service from the YAML file. Now let's look at the details of the service:

$ kubectl get service
NAME                TYPE       CLUSTER-IP    EXTERNAL-IP      PORT(S)  AGE
kubernetes          ClusterIP  10.96.0.1     <none>           443/TCP  47m
simple-app-service  ClusterIP  10.109.89.2   <none>           80/TCP   9s 

By default, the environment has a Kubernetes service, and now our simple-app-service has been added. It is of the default type, ClusterIP, meaning a unique internal IP address is assigned to the service; the service can be accessed within the cluster using this IP address. Other service types are NodePort, where the service is exposed on a port on each node, and LoadBalancer, where a separate external IP address is assigned through a load balancer.

Our service points to the deployment we created earlier—through the app label in the YAML. So, when we access the service using its URL, Kubernetes automatically directs those requests to the different pods that are part of the deployment. Multiple requests get load balanced across however many pods the application is scaled to. In this way, the load-balancing concern is handled.

Finally, let's call our service. We will not use a fancy client, but just use a CURL command to get the HTML content. See Listing 7.7.
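
A ClusterIP address is typically only reachable from inside the cluster, so one simple approach on Minikube is to SSH into the node first and then call the service by its cluster IP—a sketch of that interaction:

$ minikube ssh                       # get a shell on the Minikube node
$ curl http://10.109.89.2            # call the service using its cluster IP
<html>
  ... the index page served by the simple-app pods ...
</html>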

We get the cluster IP of our service and call it using the CURL command, which basically gets the HTTP response from that URL. As we saw earlier, the request gets routed to the pods that are part of the deployment. This HTML looks like Figure 7.6 in a web browser.

Figure 7.6: The simple application shown in a browser

Summary

In this chapter, we took a break from Machine Learning and looked at how software applications are developed. We saw the rise of Cloud computing and the rise of paradigms like IaaS, PaaS, SaaS, and the new CaaS. We saw a history of software applications with the emergence of architecture patterns like Services Oriented Architecture (SOA) and microservices. We also looked at packaging software applications into containers and building microservices.

Then we spent considerable time looking at the Kubernetes platform. We saw how Kubernetes allows deployment of applications packaged as containers at scale. We learned how Kubernetes manages infrastructure concerns like scaling, fail‐over, reliability, load‐balancing, and networking. We saw an example of deploying a web application on Kubernetes.

In the next chapter, we look at the Machine Learning model development cycle and how the software development architectures and practices we studied in this chapter apply there. Then we will take the Keras model we developed earlier and deploy it as a microservice on Kubernetes.
