Machine Learning Trends

This chapter is similar to the previous one, where I looked at trends in Artificial Intelligence, except that here I want to focus on the actual algorithms being used. The goal is to list the most popular techniques, how they are likely to be used in the future, and how they may develop. Any decision maker should at least glance over these notions to get a feel for what AI architectures look like and what makes sense. In that way, this chapter complements the previous one, which focuses on applications.

Natural Language Processing

2019 was especially significant for NLP. Various research breakthroughs happened, in particular the introduction of the GPT-2 model for text generation by OpenAI. The model achieved unprecedented quality in text generation, prompting serious discussions about security. The code of GPT-2 wasn't released all at once: OpenAI opted to publish a series of models, from weakest to strongest, over a span of six months to ensure the model wasn't used for malicious purposes.

Other research groups followed suit with Megatron (NVIDIA), BERT (Google), and models from Hugging Face and the Allen Institute, culminating in Turing-NLG from Microsoft, the largest model at the time. They all demonstrated that pre-trained language models can solve a wide range of NLP tasks. All of those models used massive datasets and considerable computing power. They were trained on large amounts of unlabeled text from the web (e.g. articles linked from Reddit), and their underlying architecture was based on Transformers, an improvement on the LSTMs that were previously popular in text generation tasks.
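To make this concrete, here is a minimal sketch of generating text with a pre-trained GPT-2 model through the Hugging Face transformers library. The prompt is an arbitrary example, and the generated continuation will differ on every run.

    # Minimal text generation with a pre-trained GPT-2 model.
    # Requires: pip install transformers torch
    from transformers import GPT2LMHeadModel, GPT2Tokenizer

    tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2")

    prompt = "Machine learning trends in the next decade include"
    inputs = tokenizer(prompt, return_tensors="pt")

    # Sample a continuation; the output changes on every run.
    outputs = model.generate(
        **inputs,
        max_length=50,
        do_sample=True,
        top_k=50,
    )
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))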

In 2020 and beyond, we will see many applications of these methods, from chatbots to marketing and media. At the same time, it seems that more data and more computing power will keep producing better models. We currently don't know whether there is a limit to Transformers' abilities, and if there is, where it lies.

AutoML or automatic AI

We have a global talent shortage in the AI industry. With the AI boom, every company needs someone to implement automation and harness data to improve the business. One solution to this problem is the democratization of AI education; another is AutoML. The goal of automatic machine learning is to discover suitable model architectures automatically, heavily reducing the time needed to build and deploy new models as well as the expertise required to use AI at all.

AutoML tools have recently been expanding to cover more of the pipeline: data preparation, training, model search, and feature engineering. Google offers Cloud AutoML, and other cloud platforms have followed by introducing ready-to-use machine learning models that optimize themselves on your data and task. AutoML can be used in computer vision, video processing, translation, and NLP tasks.

Startups are also offering plug-and-play solutions, for example, Databricks, DataRobot, H2O, and RapidMiner.

AutoML not only addresses the talent shortage but also lowers cost and complexity. Designing neural networks is a time-consuming manual process even for top talent, and AutoML can save their time on the least creative parts of ML architecture design.
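The principle can be illustrated with a toy automated model search. The sketch below uses scikit-learn's GridSearchCV to pick a model and its hyperparameters automatically from a small candidate pool; real AutoML systems such as Cloud AutoML search over far larger spaces, including neural architectures.

    # A toy "AutoML" loop: automatically search over models and hyperparameters.
    # Requires: pip install scikit-learn
    from sklearn.datasets import load_iris
    from sklearn.model_selection import GridSearchCV
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.svm import SVC

    X, y = load_iris(return_X_y=True)

    # Candidate models and their hyperparameter grids.
    candidates = [
        (RandomForestClassifier(), {"n_estimators": [50, 200], "max_depth": [3, None]}),
        (SVC(), {"C": [0.1, 1, 10], "kernel": ["rbf", "linear"]}),
    ]

    best_score, best_model = 0.0, None
    for estimator, grid in candidates:
        search = GridSearchCV(estimator, grid, cv=5)
        search.fit(X, y)
        if search.best_score_ > best_score:
            best_score, best_model = search.best_score_, search.best_estimator_

    print(f"Best cross-validation accuracy: {best_score:.3f}")
    print(f"Selected model: {best_model}")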

One-shot learning and transfer learning

If you don't have sufficient data to train deep learning algorithms, there are three ways to work around it: generate synthetic data, scrape or buy data from external sources, or develop AI models that work well with small data.

Deep learning is very data-hungry — models are trained on huge sets of labeled data, e.g. millions of tagged animal images — and large amounts of labeled data are not available for specific applications. In such cases, training an AI model from scratch is often difficult, if not impossible.

As we've mentioned, one potential solution is to enlarge real datasets with synthetic data by generating more examples. This has been used successfully in autonomous driving, where acquiring real-world data for rare situations is hard: autonomous vehicles drive millions of miles in photorealistic simulated environments that recreate situations like snowstorms and unusual pedestrian behavior.

Similarly, researchers have experimented with augmenting data in other scenarios where we lack sufficient real-world data, such as rare diseases. For example, NVIDIA did it by generating abnormal brain MRIs.83 As they wrote in their paper: "Medical imaging data sets are often imbalanced as pathologic findings are generally rare, which introduces significant challenges when training deep learning models. We propose a method to generate synthetic abnormal MRI images with brain tumors by training a generative adversarial network."

Another way to circumvent the lack of data is to develop AI models that need smaller datasets to learn. In computer vision, we see more uses of transfer learning: taking a pre-trained algorithm and applying it to a different task and dataset. Pre-trained models have also found their way into NLP with Transformers and OpenAI's GPT-2. The basic training principle there is to predict the next word in a sentence based on the preceding words. We should expect more applications of transfer learning across every domain, as it effectively lowers the cost of building and running a commercial model.

In general, transfer learning is a technique that takes a neural network trained for one task and applies it to another domain. There are obvious problems with approaching a new problem with an old solution, and transfer learning techniques aim to overcome them. Say you have only 1,000 images of horses and want to build an algorithm for horse detection. By tapping into an existing neural network like ResNet, which was trained on more than 1 million images, you can get good performance right away.
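A minimal sketch of this idea in PyTorch is shown below: we load a ResNet pre-trained on ImageNet, freeze its weights, and replace the final layer with a small classifier for a two-class horse/no-horse task. The horse dataset and its data loader are assumed, not shown.

    # Transfer learning sketch: reuse a pre-trained ResNet for a new, small task.
    # Requires: pip install torch torchvision
    import torch
    import torch.nn as nn
    from torchvision import models

    # Load a ResNet-18 pre-trained on ImageNet (older torchvision versions
    # use models.resnet18(pretrained=True) instead).
    model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

    # Freeze the pre-trained layers so only the new head is trained.
    for param in model.parameters():
        param.requires_grad = False

    # Replace the final classification layer: 2 classes (horse / not horse).
    model.fc = nn.Linear(model.fc.in_features, 2)

    # Only the new layer's parameters are passed to the optimizer.
    optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
    criterion = nn.CrossEntropyLoss()

    # Training loop over your own small dataset (data loader assumed):
    # for images, labels in horse_loader:
    #     optimizer.zero_grad()
    #     loss = criterion(model(images), labels)
    #     loss.backward()
    #     optimizer.step()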

Reinforcement Learning

We talked about reinforcement learning a lot in the context of video games and simulations. When it comes to deploying reinforcement learning, the crucial part is to build a framework within which an RL agent can be trained.

Currently, the most popular frameworks for reinforcement learning include:

  • DeepMind Lab for 3D games like Quake,
  • Arcade Learning Environment,
  • Google Research Football,
  • OpenAI Gym.
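As an illustration, here is a minimal interaction loop with OpenAI Gym, using an agent that simply takes random actions. The classic Gym API is shown; newer versions of the library return slightly different values from reset and step.

    # Minimal reinforcement learning loop with OpenAI Gym (classic API).
    # Requires: pip install gym
    import gym

    env = gym.make("CartPole-v1")
    observation = env.reset()

    total_reward = 0.0
    done = False
    while not done:
        # A real RL agent would choose an action from a learned policy;
        # here we simply sample a random action from the action space.
        action = env.action_space.sample()
        observation, reward, done, info = env.step(action)
        total_reward += reward

    print(f"Episode finished with total reward: {total_reward}")
    env.close()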

Reinforcement learning is still relatively new and undeveloped, and there's no single standard for benchmarking progress. However, RL is actively researched, and we should see a lot of breakthroughs in fundamental techniques and applications in the upcoming years.

Computer Vision

With a new generation of hardware (AI chips, stronger GPUs), we see a lot of advances in image processing. Generative Adversarial Networks (GANs) have become widespread since their introduction in 2014 and continue to surprise with their results. In particular, the next frontiers in image generation include commercial uses of GANs in the fashion and film industries, some of which we discussed in the previous chapter.
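At their core, GANs pit two networks against each other: a generator that produces images from random noise and a discriminator that tries to tell real images from generated ones. The sketch below shows this adversarial training step in PyTorch for small flattened images; it is a bare-bones illustration, not a production architecture.

    # Minimal GAN sketch: generator vs. discriminator on flattened 28x28 images.
    # Requires: pip install torch
    import torch
    import torch.nn as nn

    latent_dim, img_dim = 64, 28 * 28

    # Generator: maps random noise to a fake image.
    G = nn.Sequential(
        nn.Linear(latent_dim, 256), nn.ReLU(),
        nn.Linear(256, img_dim), nn.Tanh(),
    )

    # Discriminator: outputs the probability that an image is real.
    D = nn.Sequential(
        nn.Linear(img_dim, 256), nn.LeakyReLU(0.2),
        nn.Linear(256, 1), nn.Sigmoid(),
    )

    opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
    opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
    loss_fn = nn.BCELoss()

    def training_step(real_images):
        batch = real_images.size(0)
        noise = torch.randn(batch, latent_dim)
        fake_images = G(noise)

        # 1) Train the discriminator to separate real from fake.
        opt_d.zero_grad()
        d_loss = loss_fn(D(real_images), torch.ones(batch, 1)) + \
                 loss_fn(D(fake_images.detach()), torch.zeros(batch, 1))
        d_loss.backward()
        opt_d.step()

        # 2) Train the generator to fool the discriminator.
        opt_g.zero_grad()
        g_loss = loss_fn(D(fake_images), torch.ones(batch, 1))
        g_loss.backward()
        opt_g.step()
        return d_loss.item(), g_loss.item()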

In fashion, the standard process is organising a photo session for each new line of products. One needs high-quality photos with constraints on what is shown, namely a garment from a fashion designer. GANs could potentially be used to generate models on which fashion products are shown. Some experiments in this direction are already in place, but the full problem - generating photorealistic images of models wearing particular pieces of clothing - still seems to be out of reach. Its solution will likely require both local and global GANs to manage image generation at both the micro and macro scale.

Another application frontier of GANs and other generative methods is video generation, where the ultimate problem is to generate a photorealistic video from a script. Currently, researchers can't control the output precisely: GANs are useful for generating generic images and videos, but it's hard to impose constraints. With a better understanding of generative adversarial networks, we should expect movies entirely generated by AI, used both by Hollywood and marketing agencies. The end goal here is to generate everything using AI: script, actors, voices, and the movie itself. But for that result, we might wait at least another decade.

Fundamental concepts

Machine learning from a theoretical standpoint is still a relatively new domain, and a lot of theory is missing that would explain why certain fundamental concepts work in some cases and not in others. As of now, machine learning and data science are practical, experimental sciences, closer to experimental physics than to state-and-prove pure mathematics.

Nevertheless, I firmly believe that we're going to see a more mathematical approach to neural networks, which would help clarify fundamental concepts. Such a theory would likely sit within probability theory, with elements of dynamical systems (training dynamics), representation theory (feature engineering, representation characteristics), and probably much more. As such, it would hold considerable appeal for mathematicians and theoretical computer scientists.

Currently, we don't even know how to answer basic questions like:

  • how to select models for a particular problem and dataset,
  • what influences training dynamics of a model,
  • how neural networks represent data and what representation characteristics are.

We can often answer these questions only in particular scenarios, on limited real-world data, drawing on common sense or expert knowledge about the world; there is no single universal answer. It seems that modeling neural networks and their dynamics is a hard mathematical problem in itself and deserves more research in the near future.

Another reason for understanding fundamental machine learning concepts mathematically is the potential application to transfer learning, which in itself is the best route toward general artificial intelligence. Understanding neural networks' features such as training dynamics, representation characteristics, and model selection would allow us to generalize methods and use them across multiple domains, which would be significant progress towards better transfer learning methods.

Pushing boundaries of machine learning

Finally, various alternative approaches have emerged in recent years when it comes to model architectures, training, or even what a neural network should be. As I've written above, we're still early in the process of understanding machine learning and its applications. That's why each year brings breakthroughs, and it will stay that way for the foreseeable future.

In this section, I've gathered concepts that push the boundaries and are being actively tested. Some of them might need years to fully blossom, while others may never make it out of research labs. Here are the top four trends in machine learning to watch in the upcoming years:

  1. Graph Neural Networks
  2. Bayesian Deep Learning
  3. Active Learning
  4. Federated Learning

Graph Neural Networks84 are deep learning methods that operate on graphs. A graph is a data structure that models a set of objects (nodes) and their relationships (edges). Standard neural networks like CNNs and RNNs cannot handle graph input properly, while understanding graphs is crucial for modeling more complex relationships between objects. That's why graph neural networks are worth investigating. A related concept that also tries to model hierarchical relationships better is Capsule Neural Networks.
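As a rough illustration of the idea, a basic graph convolution layer updates each node's features by averaging over its neighbours and applying a learned transformation. The sketch below implements one such layer in plain PyTorch using an adjacency matrix; dedicated GNN libraries offer far more efficient and varied layers.

    # A minimal graph convolution layer: each node aggregates its neighbours'
    # features (plus its own) and passes them through a learned linear map.
    # Requires: pip install torch
    import torch
    import torch.nn as nn

    class SimpleGraphConv(nn.Module):
        def __init__(self, in_features, out_features):
            super().__init__()
            self.linear = nn.Linear(in_features, out_features)

        def forward(self, node_features, adjacency):
            # Add self-loops so each node keeps its own information.
            adj = adjacency + torch.eye(adjacency.size(0))
            # Normalize by node degree so the aggregation is an average.
            degree = adj.sum(dim=1, keepdim=True)
            aggregated = (adj @ node_features) / degree
            return torch.relu(self.linear(aggregated))

    # Toy graph: 4 nodes, 3 features each, edges given by the adjacency matrix.
    features = torch.randn(4, 3)
    adjacency = torch.tensor([[0., 1., 0., 0.],
                              [1., 0., 1., 1.],
                              [0., 1., 0., 0.],
                              [0., 1., 0., 0.]])

    layer = SimpleGraphConv(in_features=3, out_features=8)
    print(layer(features, adjacency).shape)  # torch.Size([4, 8])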

The use of Bayesian techniques in deep learning is not new; however, recent advances in the field seem to yield many exciting new results. Bayesian Deep Learning is a way to achieve state-of-the-art results while also quantifying the uncertainty of deep learning models. This is an important concept because it boils down to being able to pinpoint what an ML model doesn't know. If you train a dog/cat classifier and then feed it a photo of a truck, the model should be confident that it doesn't know what it is, rather than make a random prediction.
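One practical, approximate route to such uncertainty estimates is Monte Carlo dropout: keep dropout active at prediction time and run the model several times, treating the spread of the predictions as a measure of uncertainty. A minimal sketch with a toy, untrained model follows.

    # Monte Carlo dropout: an approximate Bayesian treatment of a neural network.
    # Keeping dropout on at inference time and sampling repeatedly gives a
    # distribution over predictions; high variance signals "I don't know".
    # Requires: pip install torch
    import torch
    import torch.nn as nn

    model = nn.Sequential(
        nn.Linear(10, 64), nn.ReLU(), nn.Dropout(p=0.5),
        nn.Linear(64, 2),  # e.g. dog vs. cat logits (toy, untrained model)
    )

    def predict_with_uncertainty(x, n_samples=50):
        model.train()  # keep dropout active during inference
        with torch.no_grad():
            samples = torch.stack([
                torch.softmax(model(x), dim=-1) for _ in range(n_samples)
            ])
        return samples.mean(dim=0), samples.std(dim=0)

    x = torch.randn(1, 10)  # a stand-in for an input's features
    mean, std = predict_with_uncertainty(x)
    print("Mean class probabilities:", mean)
    print("Uncertainty (std. dev.): ", std)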

Active learning is a machine learning approach in which the algorithm can interactively query an information source (e.g. a user) to label new data points with the desired outputs. In other words, it can ask for what it wants to learn next, or choose the subset of a dataset on which it wants to tune itself. Active learning is part of a larger movement to experiment with how the training process happens and how to make it more accurate.
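The most common flavour is uncertainty sampling: train on a small labeled pool, score the unlabeled examples by how uncertain the model is about them, and ask a human to label only the most uncertain ones. A sketch with scikit-learn follows; here the labels are simply revealed from the dataset rather than coming from a real annotator.

    # Active learning via uncertainty sampling: ask for labels only where the
    # model is least certain. Requires: pip install scikit-learn numpy
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression

    X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

    # Start with a tiny labeled pool; the rest is treated as "unlabeled".
    labeled = np.arange(20)
    unlabeled = np.arange(20, len(X))

    model = LogisticRegression(max_iter=1000)
    for round_ in range(5):
        model.fit(X[labeled], y[labeled])

        # Uncertainty = how close the predicted probability is to 0.5.
        probs = model.predict_proba(X[unlabeled])[:, 1]
        uncertainty = -np.abs(probs - 0.5)
        query = unlabeled[np.argsort(uncertainty)[-10:]]  # 10 most uncertain

        # In a real system a human would label these; here we reveal y directly.
        labeled = np.concatenate([labeled, query])
        unlabeled = np.setdiff1d(unlabeled, query)
        print(f"Round {round_}: {len(labeled)} labeled examples")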

Federated learning is an approach in which data stays on separate mobile devices or servers, and models are trained without exchanging data samples. This differs from the classical, centralized approach in which learning happens in one place on all of the data. It looks like a promising way to build common, stronger models without breaching data privacy, and it should allow for commercial collaborations even among competitors. The startup Owkin85 used federated learning to train machine learning models on data coming from multiple hospitals while preserving the privacy and security of each organisation.
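The core of many federated learning schemes is federated averaging: each participant trains a copy of the model on its own local data, and only the model weights, not the data, are sent back and averaged into a shared global model. A bare-bones sketch of that averaging step follows; the clients' local data loaders are assumed, not shown.

    # Federated averaging (FedAvg) sketch: average model weights across clients
    # instead of pooling their raw data. Requires: pip install torch
    import copy
    import torch
    import torch.nn as nn

    def make_model():
        return nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))

    def local_training(model, data_loader, epochs=1):
        # Each client trains its own copy on its private data (loader assumed).
        optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
        loss_fn = nn.CrossEntropyLoss()
        for _ in range(epochs):
            for x, y in data_loader:
                optimizer.zero_grad()
                loss_fn(model(x), y).backward()
                optimizer.step()
        return model.state_dict()

    def federated_round(global_model, client_loaders):
        # 1) Send the global weights to each client and train locally.
        client_weights = [
            local_training(copy.deepcopy(global_model), loader)
            for loader in client_loaders
        ]
        # 2) Average the clients' weights into a new global model.
        averaged = {
            key: torch.stack([w[key] for w in client_weights]).mean(dim=0)
            for key in client_weights[0]
        }
        global_model.load_state_dict(averaged)
        return global_model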

Altogether, these concepts lie at the boundary of what's possible in machine learning. We shall see in the upcoming years how they influence commercial applications. One thing is sure: AI will have a lot of surprises for us in the near future. It's time to embrace them as a society.

