Sight

Sight refers to the visual interface of intelligent machines, that is, their ability to interpret images and video. GCP provides the following APIs for visual information and intelligence:

Cloud Vision API: This is a Representational State Transfer (REST) API that sits on top of pre-trained models on GCP. It can classify images into generic categories, detect specific objects, and read text within images. Image metadata management and moderation of unwanted content for a specific application are also provided out of the box. This makes it straightforward to gather insights from images without training a model. Common use cases include image search, document classification, and product search (retail). The following diagram shows various applications and use cases for the Cloud Vision API:
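
To make the REST interface concrete, here is a minimal sketch of the JSON body for an images:annotate request that asks for labels, text, and content moderation in one call. It only builds the request and sends nothing. The helper name build_vision_request and the placeholder image bytes are assumptions made for illustration; a real call would also need credentials and an actual image file.

```python
import base64
import json

# REST endpoint for synchronous image annotation.
VISION_ENDPOINT = "https://vision.googleapis.com/v1/images:annotate"


def build_vision_request(image_bytes, feature_types, max_results=10):
    """Build the JSON body for an images:annotate call (hypothetical helper).

    Nothing is sent over the network; the caller decides how to POST it.
    """
    return {
        "requests": [
            {
                # Image bytes travel as base64 text inside the JSON body.
                "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
                # One entry per kind of analysis requested.
                "features": [
                    {"type": t, "maxResults": max_results} for t in feature_types
                ],
            }
        ]
    }


# Placeholder bytes stand in for a real image file (assumption).
body = build_vision_request(
    b"not-a-real-image",
    ["LABEL_DETECTION", "TEXT_DETECTION", "SAFE_SEARCH_DETECTION"],
)
print(json.dumps(body, indent=2))
```

The three feature types correspond to the capabilities described above: classification, reading text, and moderation of unwanted content.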

Cloud Video Intelligence API: This is a REST API that extracts information from video feeds and enables searching and extraction of metadata from video data. The API is easy to use and recognizes more than 20,000 predefined labels. It also links video tags to the content they describe, enabling text-based search across video assets stored in Google Cloud Storage. The following diagram shows various applications and use cases for the Cloud Video Intelligence API:
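
As a rough illustration of the workflow described for the Cloud Video Intelligence API, the sketch below builds the body of a videos:annotate request for a video in Cloud Storage and then runs a simple text search over labels. The helper names, the bucket paths, and the sample labels are all made up for this example; real labels would come from the API's response, which is delivered through a long-running operation rather than immediately.

```python
import json

# REST endpoint that starts an asynchronous video annotation job.
VIDEO_ENDPOINT = "https://videointelligence.googleapis.com/v1/videos:annotate"


def build_video_request(gcs_uri, features=("LABEL_DETECTION",)):
    """Build the JSON body for a videos:annotate call (hypothetical helper)."""
    if not gcs_uri.startswith("gs://"):
        raise ValueError("expected a Cloud Storage URI such as gs://bucket/video.mp4")
    return {"inputUri": gcs_uri, "features": list(features)}


def search_labels(labels_by_video, query):
    """Return the videos whose labels contain the query text (case-insensitive)."""
    q = query.lower()
    return sorted(
        uri
        for uri, labels in labels_by_video.items()
        if any(q in label.lower() for label in labels)
    )


request_body = build_video_request("gs://example-bucket/park.mp4")
print(json.dumps(request_body))

# Invented labels standing in for results collected from earlier annotation jobs.
labels_by_video = {
    "gs://example-bucket/park.mp4": ["dog", "grass", "tree"],
    "gs://example-bucket/city.mp4": ["car", "street"],
}
matches = search_labels(labels_by_video, "dog")
print(matches)
```

Keeping the labels in a plain dictionary is only for demonstration; at scale they would more likely be stored in a search index or database.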

AutoML Vision: This service makes it possible to train custom models for classifying images. Models can be trained and evaluated through an easy-to-use interface. They can also be registered under a unique namespace so that they can be used through the AutoML API. If the user has a large number of images to label, a human labeling service complements AutoML Vision and can be initiated directly from the AutoML Vision user interface.
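
Once a custom model has been trained, calling it through the AutoML API might look roughly like the sketch below. It assumes the "unique namespace" corresponds to a resource name of the form projects/.../locations/.../models/..., and that the v1 predict request takes base64 image bytes plus a score_threshold parameter. The project ID, model ID, and helper names are hypothetical, and nothing is sent over the network.

```python
import base64
import json


def model_resource_name(project_id, model_id, location="us-central1"):
    """Compose the resource name that identifies a trained model (assumed layout)."""
    return f"projects/{project_id}/locations/{location}/models/{model_id}"


def build_predict_request(image_bytes, score_threshold=0.5):
    """Build the JSON body for a predict call (hypothetical helper)."""
    return {
        "payload": {
            "image": {"imageBytes": base64.b64encode(image_bytes).decode("ascii")}
        },
        # Only predictions scoring at or above this threshold are requested.
        "params": {"score_threshold": str(score_threshold)},
    }


# Hypothetical identifiers; substitute the values from your own project.
name = model_resource_name("my-project", "ICN1234567890")
predict_url = f"https://automl.googleapis.com/v1/{name}:predict"
predict_body = build_predict_request(b"not-a-real-image")

print(predict_url)
print(json.dumps(predict_body, indent=2))
```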
