CHAPTER 3

Images

We live in two worlds. That is, our brains perceive the world in two ways. With our eyes, we see, so one world involves images. With our ears, we hear language, and later on we learn to use our eyes to read the written word.

Let’s take images first, where a form of deep learning called a Convolutional Neural Network (CNN) holds sway.1
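
If you’re curious what a CNN actually looks like in code, here is a minimal sketch in Python using the PyTorch library. It is an illustration only; the layer sizes, the ten output classes, and the 32-by-32 images are assumptions of mine, not anything taken from a particular product.

    # A minimal convolutional neural network (CNN) for classifying small images.
    # Layer sizes, class count, and image size are illustrative assumptions.
    import torch
    import torch.nn as nn

    class TinyCNN(nn.Module):
        def __init__(self, num_classes=10):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(3, 16, kernel_size=3, padding=1),   # learn 16 edge/texture filters
                nn.ReLU(),
                nn.MaxPool2d(2),                              # downsample 32x32 -> 16x16
                nn.Conv2d(16, 32, kernel_size=3, padding=1),  # learn 32 higher-level filters
                nn.ReLU(),
                nn.MaxPool2d(2),                              # 16x16 -> 8x8
            )
            self.classifier = nn.Linear(32 * 8 * 8, num_classes)

        def forward(self, x):
            x = self.features(x)       # convolution layers extract visual patterns
            x = torch.flatten(x, 1)    # flatten to one vector per image
            return self.classifier(x)  # one score per class

    model = TinyCNN(num_classes=10)
    fake_batch = torch.randn(4, 3, 32, 32)  # four fake 32x32 color images
    print(model(fake_batch).shape)          # torch.Size([4, 10])

The point to notice is the division of labor: the convolution layers learn visual patterns directly from example images, and the final layer turns those patterns into a decision.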

For an overview of how computers can learn to process and “understand” images, look up two TED talks. The first is The Wonderful and Terrifying Implications of Computers That Can Learn by Jeremy Howard (December 2014). Howard is a data scientist, the former president of Kaggle, and a founder of Fast.ai. His TED talk runs about 20 minutes and has been viewed by over 2.4 million people.

The second is How We’re Teaching Computers to Understand Pictures by Fei-Fei Li (May 2015). Dr. Li is a Professor of Computer Science at Stanford University and is currently the Co-Director of Stanford’s Human-Centered AI Institute. Her TED talk runs about 18 minutes and has been viewed about 2.5 million times.

Now let me give you an overview of business applications of deep learning for images, beginning with one way that Walmart and Amazon use this new power.

Just visualize the many aisles of products in a Walmart or any other retailer. Some portion of one or more shelves in every single aisle is empty because customers have taken items off the shelves. Those gaps need attention, in part so that Walmart can measure demand and in part so that the shelves can be restocked. For human employees, that’s a laborious and boring job: walk the aisles; make notes; walk back to “receiving”; find the items; enter the notes in a spreadsheet; and walk back to the aisles that need more product and put the items on the shelves where they belong.

Some of these tasks can be automated. Consider a robot body with wheels (or a track), one or more cameras, and a deep learning “brain.” With deep learning, the robot (without being sentient and without understanding anything) learns the layout of the store: where certain items are placed for sale, and where the same items can be found in “receiving.”

On October 17, 2017, Walmart announced that it was beginning to use shelf-scanning robots to audit 50 of its stores. The robots are tracking inventory, prices, and misplaced items. Walmart wants to save its human employees from tasks that are repeatable, predictable, and manual, so (currently) these “bots” don’t have arms for re-stocking. Indeed, Walmart asserted in its 2017 announcement that no employees would lose their jobs.2

The trials in 2018 went well. On April 9, 2019, Walmart announced that “Automated Assistants” would help employees “work smarter.” In a post, Elizabeth Walker, of Walmart Corporate Affairs, wrote that Walmart was sold on robots. “Every hero needs a sidekick,” she wrote, “and some of the best have been automated. Think R2D2, Optimus Prime and Robby from Lost in Space. Just like Will Robinson and Luke Skywalker, having the right kind of support helps our associates succeed at their jobs.” She went on to describe “new technologies” that would minimize the time an employee might spend “cleaning floors” or “checking inventory on a shelf,” and instead let them spend more time serving customers on the sales floor. What’s coming? 1,500 new autonomous floor cleaners (aka Auto-C); 300 additional shelf scanners (aka Auto-S); 1,200 more FAST Unloaders; and 900 new Pickup Towers.3

As a second example, consider Amazon. Amazon is not a “big box” retailer in the same sense as Walmart; it is the foremost example of an e-commerce retailer, and it relies on shipping product from its many warehouses. In the same article announcing that Walmart was beginning to use shelf-scanning robots, the author noted that Amazon was already using more than 45,000 “bots” in its warehouses.

If you think these examples are confined to the United States, think again: deep learning is having an impact worldwide. I’ve already mentioned Davos, the annual meeting place of the World Economic Forum. AI was what everyone at Davos 2019 was talking about, and many of those attendees came from countries where English is not the native language.

Are examples hard to find? No. Here’s one. On May 4, 2016, an Israeli company, eyeSight Technologies, announced a $20 million investment by a Chinese conglomerate, Kuang-Chi, to bring eyeSight’s embedded computer vision and deep learning technologies to a variety of sectors, such as the Internet of Things (IoT), robotics, and automotive.4

Fast forward to March 12, 2019, when Eyesight Technologies announced that it had signed a “strategic cooperation agreement” with Chinese Tier 1 automotive vendor Hefei Zhixin Automotive Technology (HZAT), whose relationships include major Chinese OEM brands such as JAC Motors and bus manufacturer Anhui Ankai Automobile Co. Eyesight’s Drive Sense monitoring system “watches a driver’s eyes, pupils, head and gaze to determine if the driver is paying attention to the road or is drowsy or distracted.” If warranted, the system will take over the vehicle “to prevent accidents.”5

Let’s stick with images. AI is frequently associated with autonomous vehicles such as cars on the road or drones in the air. But let’s come back to the United States, where two car companies come to mind: Tesla and Waymo.

Tesla has self-driving capabilities via its Autopilot system. Autopilot is Tesla’s advanced assisted driving program with features like Autosteer, Autopark, and Traffic-Aware Cruise Control (TACC). Currently, with over 250,000 “cars” on the road, Tesla may have deployed the largest fleet of robots in the world, because those “cars” are regularly driven in autonomous mode.

Notice that I said autonomous mode. Currently, Tesla’s cars are not driverless.

But that day may come. On October 15, 2018, version 9 of Tesla’s Autopilot was described as having the ability to recognize roadside structures and a deep learning system with far greater processing capability than previous versions.6 More specifically, and in the same article, Elon Musk, Tesla’s founder and CEO, replied in a tweet that “V9.0 vs. V8.1 is more like a ~400 percent increase in useful ops/sec due to enabling integrated GPU and better use of discrete GPU” (@elonmusk, October 16, 2018). (GPU stands for graphics processing unit, or “graphics card.”)

For current articles about Tesla’s Autopilot, go to Electrek.co.7 For example, as of late May 2019, you can watch a Tesla using Autopilot react to a stop sign and make a right turn.8

The project for that ability was initiated and accomplished in only about two years. You may already know that, with images, the amount of training data a deep learning system needs to be accurate is enormous. Since Tesla didn’t already have data centers with millions of miles of visuals, it sold a lot of early models and then asked its customers for help. When Autopilot 2.0 was released in May of 2017, Tesla asked its car owners in advance for permission to collect data from the cars’ external cameras “to learn how to recognize things like lane lines, street signs, and traffic light positions,” assuring them, for the sake of privacy, that the external cameras were not linked to vehicle identification numbers.

Thus, over Wi-Fi, Tesla began receiving a huge influx of image data from the Model S cars it had sold and which were already on the road.9

I’m reminded of the 1986 movie Short Circuit, where a robot named “Johnny 5” goes off the research reservation because it has a craving for “input.”10 Tesla reminds me of the lab that lost track of Johnny 5, except that Tesla didn’t lose track of its Model S cars, and it started with 50,000 of them. Now Autopilot has moved from v2.0 to v9 and has had a lot of “input.”

So a Tesla is a mobile robot that can take us to places we’ve never visited. It’s just a look-see, but for a treat, click on this September 25, 2018 link to watch a Tesla drive in Paris through the “eyes” of Autopilot.11

Wow. I have to pause here, because an AI business application just occurred to me. If Tesla’s Autopilot can take us on a drive through Paris, it can take viewers on a drive through other places of interest. But now think about recording “tours” through other cities, national parks, or various vacation destinations. Where might such tours be displayed? On a website? Could be. How about on small screens during travel by air? Yes, that’s another potential application. And how about a television “travel” channel or YouTube channel? Or in a home on a screen attached to a workout treadmill? If I could go places (virtually) on a treadmill, I might actually buy one.

Tesla may be an energy company (think batteries) and an automaker, but it also has a software product.

Waymo, on the other hand, was known initially as the Google Self-Driving Car Project, which Google started in 2009. The project now operates independently of Google and is known as Waymo.12

As of March 20, 2019, Waymo’s website reports putting its autonomous vehicles through a lot: 10 million miles on public roads and 7 billion miles in simulation. And, of course, with deep learning and every mile driven, Waymo is correct to assert that “we never stop learning.” If you visit the Waymo website, you’ll see the “brain” of the vehicle has answers to “Where am I?” “What’s around me?” “What will happen next?” and “What should I do?”

Waymo answers the last question by asserting: “Based on all this information, our software determines the exact trajectory, speed, lane, and steering maneuvers needed to progress along this route safely.”13 You can also watch the Waymo 360-degree Experience: A Fully Self-Driving Journey.

Hmm. I had wondered about the source of Waymo’s 7 billion miles of simulation, that is, until I remembered Street View. Remember Street View? Launched in 2007 as part of Google Maps, Street View relied on fleets of vehicles that Google put out to drive the streets in one location after another. The cars were equipped with video cameras and other sensors that were intended to enable users to “virtually explore the world.”14

Google’s fleets have already covered much of the world, but they are still out there, going new places all the time. And Google lets you discover where the fleets are going next and when.15

Although I’m not certain of this, my guess is that Waymo had a simulation “head start” by way of access to the Street View videos.

But to better understand autonomous vehicles, let’s think the system through. What’s the input? The input consists of images (in pixels) from cameras and various sensors, for example, infrared and lidar, which is short for “light detection and ranging.”16 What’s the output? It’s mechanical. The cars are still riding along on wheels and they are controlled by brakes, the gas pedal, and the steering wheel. What’s in-between the input and output? It’s a deep learning “brain,” and it consists of software.
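
To make that division of labor concrete, here is a deliberately toy sketch, in Python, of the loop just described: pixels in, a software “brain” in the middle, mechanical commands out. The perception model, the thresholds, and the control values are placeholders of my own; they are not Tesla’s or Waymo’s software.

    # A toy perception-to-control loop: camera pixels in, steering/brake commands out.
    # The "brain" is a placeholder; real systems fuse cameras, radar, and lidar.
    from dataclasses import dataclass
    import numpy as np

    @dataclass
    class Controls:
        steering: float  # -1.0 (full left) to 1.0 (full right)
        throttle: float  # 0.0 to 1.0
        brake: float     # 0.0 to 1.0

    def fake_perception_model(frame):
        # Stand-in for a deep learning model; returns lane offset and obstacle distance.
        return {"lane_offset": 0.1, "obstacle_distance_m": 40.0}

    def decide(percepts):
        # Hand-written policy for illustration only.
        steer = -percepts["lane_offset"]  # steer back toward the lane center
        if percepts["obstacle_distance_m"] < 10.0:
            return Controls(steering=steer, throttle=0.0, brake=1.0)
        return Controls(steering=steer, throttle=0.3, brake=0.0)

    frame = np.zeros((720, 1280, 3), dtype=np.uint8)  # one fake camera frame
    print(decide(fake_perception_model(frame)))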

Now, with these examples in mind, let’s consider some very different applications. These examples will help you understand the versatility of deep learning in the context of images. As a prelude, I remind you that our world is visual by our very nature. In fact, our eyes are the only sense organs that are a direct extension of the brain: the retina and optic nerve are part of the central nervous system. Look around: applications are everywhere.

Let’s take ag-tech first. Consider a drone with cameras, deep learning software, and a sprayer. What’s the application? It’s a weed or insect killer. The drone flies up and down the long and seemingly unending rows of crops that have been planted. The software has been trained to identify, on the one hand, what the farmer wants to grow, and, on the other hand, the weeds and/or insects the farmer knows are threats to his or her harvest. The system then sprays the weed-killer or insecticide where it will do the most good, and does so without hitting the crops.
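
As a rough sketch of that logic, and nothing more, imagine a classifier that labels what the camera sees and triggers the sprayer only on confident “weed” detections. The placeholder classifier, class names, and confidence threshold below are my own assumptions for illustration, not any company’s actual code.

    # Toy "see and spray" loop: classify each frame, spray only on confident weed detections.
    # The classifier is a stand-in; a real system would load a trained CNN.
    import random

    CLASSES = ["crop", "weed"]   # assumed labels
    SPRAY_THRESHOLD = 0.9        # assumed confidence threshold

    def classify(frame):
        # Placeholder for a CNN forward pass; returns (label, confidence).
        confidence = random.random()
        label = CLASSES[1] if confidence > 0.5 else CLASSES[0]
        return label, confidence

    def spray_nozzle(position):
        print(f"Spraying at position {position}")

    for position, frame in enumerate([object()] * 5):  # pretend camera frames along a crop row
        label, confidence = classify(frame)
        if label == "weed" and confidence >= SPRAY_THRESHOLD:
            spray_nozzle(position)  # spray the weed, skip the crop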

Don’t believe me? See what Blue River Technology (acquired by John Deere in 2017) has already implemented with “lettuce bot” and “See and Spray” applications.17

Or how about a drone that helps farmers “see” where to apply fertilizers or add more irrigation? Yep, in a research paper, that’s been envisioned too.18

Now what about disasters? How would you envision a deep learning application in that context? Well, how about right after a disaster, which could be an earthquake, flood, forest fire, or something else? First responders who are unfamiliar with the locale should be deployed where they can do the most good, right? That mission calls for priorities. Which portions of the infrastructure (bridges, buildings, telephone and electrical poles) are too damaged to warrant immediate attention? Which portions are not damaged and need no attention? And, more to the point, which portions are only slightly damaged and can be returned to service if the appropriate responders and resources are devoted to the task right away?

That’s triage, a term that’s more frequently associated with battlefield medicine but which applies just as well to what some of the first responders should be doing with respect to infrastructure when they’re responding to a disaster.

But who’s going to tell them where to go? Well, it’s a drone again, with a camera, and with software that can identify what’s been damaged and to what extent. For this inventive application, watch the video about Ocean IT’s ioView Computer Vision for Rapid Damage Assessment. With this application, “recovery crews can map, evaluate, tag, and allocate repair resources and personnel to specific areas far more quickly than ever before.”19

And then there’s medicine itself. Think about X-rays and MRIs. There’s actually a lot of “visual” in medicine. There are numerous applications here, and I’ll tell you about two of them. The first is a deep learning system that accurately assesses whether indications of metastatic breast cancer are present. Google reported such a system only a few months ago, on October 12, 2018. The system is called LYNA, shorthand for Lymph Node Assistant. In two datasets, LYNA was able to distinguish a slide with metastatic cancer from a slide where the cancer was not present. Better still, “LYNA was able to accurately pinpoint the location of both cancers and other suspicious regions within each slide, some of which were too small to be consistently detected by pathologists. As such we reasoned that one potential benefit of LYNA would be to highlight these areas of concern for pathologists to review and determine the final diagnosis.”20
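
The underlying idea, in rough form: a huge pathology slide is cut into small patches, a CNN gives each patch a suspicion score, and the scores are assembled into a map that highlights areas of concern for the pathologist. The sketch below, in Python, is a generic illustration of that tiling-and-scoring step with a placeholder model; it is not Google’s LYNA code.

    # Tile a large slide image into patches, score each patch, and build a heat map.
    # The scoring function is a placeholder for a trained CNN.
    import numpy as np

    PATCH = 128  # assumed patch size in pixels

    def score_patch(patch):
        # Stand-in for a CNN that returns a tumor probability between 0 and 1.
        return float(patch.mean() / 255.0)

    def heatmap(slide):
        rows, cols = slide.shape[0] // PATCH, slide.shape[1] // PATCH
        scores = np.zeros((rows, cols))
        for r in range(rows):
            for c in range(cols):
                tile = slide[r * PATCH:(r + 1) * PATCH, c * PATCH:(c + 1) * PATCH]
                scores[r, c] = score_patch(tile)  # one score per tile
        return scores  # high values mark areas worth a pathologist's review

    slide = np.random.randint(0, 256, size=(1024, 1024), dtype=np.uint8)  # fake grayscale slide
    print(heatmap(slide).shape)  # an 8-by-8 grid of suspicion scores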

And here’s another example of a non-U.S. collaboration, recently reported (December 20, 2018) in JAMA Ophthalmology, a journal of the American Medical Association. The study was carried out by medical research scientists at the Centre for Eye Research Australia and the State Key Laboratory of Ophthalmology in China.

Here, the problem was one familiar to data scientists. Deep learning models may appear to be accurate but, as this study puts it, “the rationale for the outputs generated by these systems is inscrutable to clinicians. A visualization tool is needed …”

This study not only validated the deep learning models for retinal images (using CNNs), but it also verified the reliability of a visualization method so “that [the findings] may promote clinical adoption of these models.” The work resulted in an automated grading system based on photographs.

I’ll report the Findings as they were stated:

In this cross-sectional study, lesions typically observed in cases of referable diabetic retinopathy (exudate, hemorrhage, or vessel abnormality) were identified as the most important prognostic regions in 96 of 100 true-positive diabetic retinopathy cases. All 100 glaucomatous optic neuropathy cases displayed heat map visualization within traditional disease regions.


The Conclusions (where DLA means “deep learning algorithm” and DR refers to “diabetic retinopathy”) were: “This artificial intelligence-based DLA can be used with high accuracy in the detection of vision-threatening referable DR in retinal images. This technology offers potential to increase the efficiency and accessibility of DR screening programs.”21
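
One common way to produce heat maps like the ones described in the Findings, though not necessarily the method this study used, is occlusion: cover one region of the image at a time, measure how much the model’s confidence drops, and highlight the regions whose removal hurts the prediction most. Here is a minimal sketch in Python with a placeholder standing in for the trained CNN.

    # Occlusion-based heat map: regions whose removal lowers the model's score the most
    # are the regions the model relies on. The model here is a placeholder.
    import numpy as np

    BLOCK = 32  # assumed occlusion block size in pixels

    def model_score(image):
        # Stand-in for a CNN's probability of "referable disease" for this image.
        return float(image.mean() / 255.0)

    def occlusion_heatmap(image):
        base = model_score(image)
        rows, cols = image.shape[0] // BLOCK, image.shape[1] // BLOCK
        heat = np.zeros((rows, cols))
        for r in range(rows):
            for c in range(cols):
                occluded = image.copy()
                occluded[r * BLOCK:(r + 1) * BLOCK, c * BLOCK:(c + 1) * BLOCK] = 0  # black out one block
                heat[r, c] = base - model_score(occluded)  # bigger drop = more important region
        return heat

    retina = np.random.randint(0, 256, size=(256, 256), dtype=np.uint8)  # fake retinal image
    print(occlusion_heatmap(retina).round(3))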

To sum up: now you know that the form of deep learning built on CNNs can learn from visual examples in a wide variety of contexts.

Notes

1 “Convolutional Neural Network” (2019).

2 Vincent (2017).

3 Walker (2019).

4 “Chinese Technology Conglomerate Kuang Chi to invest $20 Million in EyeSight Technologies, a Leader in Embedded Computer Vision” (2017).

5 “Eyesight Technologies closes new China Auto Deal with Hefei Zhixin ­Automotive” (2019).

6 Lambert (2018).

7 Lambert (2019).

8 Lambert (2019).

9 Lambert (2017).

10 “Short Circuit” (2019).

11 Lambert (2018).

12 “We’re building the World’s Most Experienced Driver” (2019).

13 “Technology” (2019).

14 “What is Street View” (2019).

15 “Sources of Photography” (2019).

16 NOAA (National Oceanic and Atmospheric Administration) (2012).

17 “Optimize Every Plant” (2019).

18 “AI and Drones Help Farmers Detect Crop Needs” (2018).

19 “IO View is a Computer vision AI for Rapid Damage Assessment Named After the Native Hawaiian Hawk” (2019).

20 Stumpe and Mermel (2018).

21 Keel, Wu, Lee, Scheetz, and He (2018).
