DL – hype or breakthrough?

DL, and the hype associated with it, is a relatively recent development. Most discussion of its emergence centers on the ImageNet benchmark of 2012, where a deep convolutional neural network beat the previous year's error rate by around 9%. This was a dramatic jump in a competition where previous winners, relying on hand-crafted features in their models, had managed incremental improvements at best. The following diagram shows this improvement:

Despite the recent hype, the components that make DL work, the ones that allow us to train deep models and that have since proven so effective in image classification and various other tasks, were developed in the 1980s by Geoffrey Hinton and his group at the University of Toronto. Their early work took place during one of the fallow periods discussed earlier in this chapter; indeed, they were wholly dependent on funding from the Canadian Institute for Advanced Research (CIFAR).

As the 21st century got underway in earnest, and the tech bubble that had burst in March 2000 began to inflate again, the availability of high-performance GPUs and the broader growth in computational power meant that these techniques, developed decades earlier but left unused for lack of funding and industry interest, suddenly became viable. Benchmarks in image recognition, speech recognition, natural language processing, and sequence modeling that had previously seen only incremental improvements all had their y-axes adjusted.

It was not just massive advances in hardware paired with old algorithms that got us to this point. There have also been algorithmic advances that allow us to train particularly deep networks. The best known of these is batch normalization, introduced in 2015. It stabilizes the numerical behavior of activations across layers and can prevent exploding gradients, reducing training time dramatically. There is still active debate about why batch normalization is so effective. A paper published in May 2018, for example, disputes the central premise of the original paper: it argues that it is not internal covariate shift that is being reduced, but rather that batch normalization smooths the optimization landscape, so that gradients propagate more reliably and the effect of a given learning rate on training time and stability is more predictable.
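To make the mechanics concrete, here is a minimal sketch of the batch normalization forward pass (training mode) in plain NumPy. The function name, the `gamma`/`beta` parameters, and the `eps` constant are illustrative choices, not taken from any particular library:

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    """Minimal batch normalization forward pass (training mode).

    x:     (batch_size, features) activations from the previous layer
    gamma: (features,) learned scale
    beta:  (features,) learned shift
    """
    # Per-feature statistics computed across the mini-batch.
    mean = x.mean(axis=0)
    var = x.var(axis=0)

    # Normalize so each feature has roughly zero mean and unit variance;
    # eps guards against division by zero for low-variance features.
    x_hat = (x - mean) / np.sqrt(var + eps)

    # The learned scale and shift restore the layer's representational capacity.
    return gamma * x_hat + beta

# Tiny usage example: a batch of 4 samples with 3 badly scaled features.
x = np.random.randn(4, 3) * 10 + 5
out = batch_norm_forward(x, gamma=np.ones(3), beta=np.zeros(3))
print(out.mean(axis=0))  # approximately 0 per feature
print(out.std(axis=0))   # approximately 1 per feature
```

Keeping each layer's inputs in a well-behaved range like this is what lets gradients flow predictably through many layers, regardless of whether the explanation is reduced covariate shift or a smoother optimization landscape.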

Collectively, everything from the folk science of ancient Greek myths to the very real breakthroughs in information theory, neuroscience, and computer science, specifically in models of computation, has combined to produce network architectures, and the algorithms needed to train them, that scale well enough to solve many fundamental AI tasks in 2018 that had proven intractable for decades.
