Chapter 2. Analyzing Images to Recognize a Face

"We can use the Computer Vision API to prove to our clients the reliability of the data, so they can be confident making important business decisions based on that information."

- Leendert de Voogd, CEO of Vigiglobe

In the previous chapter, you were briefly introduced to Microsoft Cognitive Services. Throughout this chapter, we will dive into the image-based APIs of the Vision category. We will learn how to perform image analysis. Moving on, we will dive deeper into the Face API, which we looked at briefly in the previous chapter, and learn how to identify people. Next, we will learn how to use the Face API to recognize emotions in faces. Finally, we will learn about the different ways to moderate content.

In this chapter, we will cover the following topics:

  • Analyzing images to identify content, metadata, and adult ratings
  • Recognizing celebrities in images and reading text in images
  • Diving into the Face API:
    • Learning to find the likelihood of two faces belonging to the same person
    • Grouping faces based on visual similarities and searching for similar faces
    • Identifying a person from a face
    • Recognizing emotions
  • Content moderation

Analyze an image using the Computer Vision API

The Computer Vision API allows us to process an image and retrieve information about it. It relies on advanced algorithms to analyze the content of the image in different ways, based on our needs.

Throughout this section, we will learn how to take advantage of this API. We will look at the different ways to analyze an image through standalone examples. Some of the features we will cover will also be incorporated into our end-to-end application in a later chapter.

Calling any of the APIs will return one of the following response codes:

  • 200: Information about the extracted features, returned in JSON format.
  • 400: Typically a bad request. This may mean an invalid image URL, an image that is too small or too large, an invalid image format, or any other error related to the request body.
  • 415: Unsupported media type.
  • 500: Possible errors include a failure to process the image, image processing timing out, or an internal server error.

Setting up a chapter example project

Before we go into the specifics of the API, we need to create an example project for this chapter. This project will contain all of the examples, which will not be put into the end-to-end application at this stage:

Note

If you have not already done so, sign up for an API key for Computer Vision by visiting https://portal.azure.com.

  1. Create a new project in Visual Studio using the template we created in Chapter 1, Getting Started with Microsoft Cognitive Services.
  2. Right-click on the project and choose Manage NuGet Packages. Search for the Microsoft.ProjectOxford.Vision package and install it into the project, as shown in the following screenshot:
    (Screenshot: installing the Microsoft.ProjectOxford.Vision package in the NuGet Package Manager)
  3. Create the following UserControl files and add them to the View folder:
    • CelebrityView.xaml
    • DescriptionView.xaml
    • ImageAnalysisView.xaml
    • OcrView.xaml
    • ThumbnailView.xaml
  4. Also, add the corresponding ViewModel classes from the following list to the ViewModel folder:
    • CelebrityViewModel.cs
    • DescriptionViewModel.cs
    • ImageAnalysisViewModel.cs
    • OcrViewModel.cs
    • ThumbnailViewModel.cs

Go through the newly created ViewModel instances and make sure that all classes are public.

We will switch between the different views using a TabControl element. Open the MainView.xaml file and add the following inside the pre-created Grid element:

    <TabControl x:Name="tabControl"
                HorizontalAlignment="Left"
                VerticalAlignment="Top"
                Width="810" Height="520">
        <TabItem Header="Analysis" Width="100">
            <controls:ImageAnalysisView />
        </TabItem>
        <TabItem Header="Description" Width="100">
            <controls:DescriptionView />
        </TabItem>
        <TabItem Header="Celebs" Width="100">
            <controls:CelebrityView />
        </TabItem>
        <TabItem Header="OCR" Width="100">
            <controls:OcrView />
        </TabItem>
        <TabItem Header="Thumbnail" Width="100">
            <controls:ThumbnailView />
        </TabItem>
    </TabControl>

This will add a tab bar at the top of the application that will allow you to navigate between the different views.
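For the TabItem content to resolve, the controls prefix must be mapped to the namespace that holds the UserControls. As a sketch, assuming the project is named Chapter2 and the views live in a View folder (adjust this to your actual project namespace), the mapping in the Window element would be:

    xmlns:controls="clr-namespace:Chapter2.View"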

Next, we will add the properties and members required in our MainViewModel.cs file.

The following is the variable used to access the Computer Vision API:

    private IVisionServiceClient _visionClient;

The following code declares a private variable holding the CelebrityViewModel object. It also declares the public property that we use to access the ViewModel in our View:

    private CelebrityViewModel _celebrityVm;
    public CelebrityViewModel CelebrityVm
    {
        get { return _celebrityVm; }
        set
        {
            _celebrityVm = value;
            RaisePropertyChangedEvent("CelebrityVm");
        }
    }

Following the same pattern, add properties for the rest of the created ViewModel instances.
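For instance, the property for the description ViewModel would look like the following; the remaining ViewModel properties follow suit:

    private DescriptionViewModel _descriptionVm;
    public DescriptionViewModel DescriptionVm
    {
        get { return _descriptionVm; }
        set
        {
            _descriptionVm = value;
            RaisePropertyChangedEvent("DescriptionVm");
        }
    }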

With all the properties in place, create the ViewModel instances in our constructor using the following code:

    public MainViewModel()
    {
        _visionClient = new VisionServiceClient("VISION_API_KEY_HERE", "ROOT_URI");

        CelebrityVm = new CelebrityViewModel(_visionClient);
        DescriptionVm = new DescriptionViewModel(_visionClient);
        ImageAnalysisVm = new ImageAnalysisViewModel(_visionClient);
        OcrVm = new OcrViewModel(_visionClient);
        ThumbnailVm = new ThumbnailViewModel(_visionClient);
    }

Note how we first create the VisionServiceClient object with the API key that we signed up for earlier and the root URI, as described in Chapter 1, Getting Started with Microsoft Cognitive Services. This is then injected into all the ViewModel instances to be used there.

This should now compile and present you with the application shown in the following screenshot:

(Screenshot: the example application with the tab bar at the top)

Generic image analysis

We start enabling generic image analysis by adding the UI to the ImageAnalysisView.xaml file. All of the Computer Vision example UIs will be built in the same manner.

The UI should have two columns, as shown in the following code:

    <Grid.ColumnDefinitions>
        <ColumnDefinition Width="*" />
        <ColumnDefinition Width="*" />
    </Grid.ColumnDefinitions>

The first one will contain the image selection, while the second one will display our results.

In the left-hand column, we create a vertically oriented StackPanel. To this, we add a TextBlock for the title and a ListBox element. The list box will display the list of visual features that we can add to our analysis query. Note how a SelectionChanged event is hooked up on the ListBox in the following code. The handler will be added in the code-behind, which will be covered shortly:

    <StackPanel Orientation="Vertical"Grid.Column="0">

    <TextBlock Text="Visual Features:"
               FontWeight="Bold"
               FontSize="15"
               Margin="5, 5" Height="20" />

    <ListBox: Name = "VisualFeatures"
          ItemsSource = "{Binding ImageAnalysisVm.Features}"
          SelectionMode = "Multiple" Height="150" Margin="5, 0, 5, 0"
          SelectionChanged = "VisualFeatures_SelectionChanged" />

The list box allows multiple items to be selected at once, and the selected items are gathered in the ViewModel.

In the same stack panel, we also add a button element and an image element. These will allow us to browse for an image, show it, and analyze it. Both the Button command and the image source are bound to the corresponding properties in the ViewModel, as shown in the following code:

    <Button Content = "Browse and analyze"
            Command = "{Binding ImageAnalysisVm.BrowseAndAnalyzeImageCommand}"
            Margin="5, 10, 5, 10" Height="20" Width="120"
            HorizontalAlignment="Right" />
       
    <Image Stretch = "Uniform"
           Source="{Binding ImageAnalysisVm.ImageSource}"
           Height="280" Width="395" />
    </StackPanel>

We also add another vertically oriented stack panel, placed in the right-hand column. It contains a title, as well as a textbox bound to the analysis result in our ViewModel, as shown in the following code:

    <StackPanel Orientation= "Vertical"Grid.Column="1">
        <TextBlock Text="Analysis Results:"
                   FontWeight = "Bold"
                   FontSize="15" Margin="5, 5" Height="20" />
        <TextBox Text = "{Binding ImageAnalysisVm.AnalysisResult}"
                 Margin="5, 0, 5, 5" Height="485" />
    </StackPanel>

Next, we want to add our SelectionChanged event handler to our code-behind. Open the ImageAnalysisView.xaml.cs file and add the following:

    private void VisualFeatures_SelectionChanged(object sender, SelectionChangedEventArgs e)
    {
        var vm = (MainViewModel) DataContext;
        vm.ImageAnalysisVm.SelectedFeatures.Clear();

The first line of the function will give us the current DataContext, which is the MainViewModel class. We access the ImageAnalysisVm property, which is our ViewModel, and clear the selected visual features list.

From there, we loop through the selected items from our list box. All items will be added to the SelectedFeatures list in our ViewModel:

        foreach (VisualFeature feature in VisualFeatures.SelectedItems)
        {
            vm.ImageAnalysisVm.SelectedFeatures.Add(feature);
        }
    }

Open the ImageAnalysisViewModel.cs file. Make sure that the class inherits the ObservableObject class.

Declare a private variable, as follows:

    private IVisionServiceClient _visionClient;    

This will be used to access the Computer Vision API, and it is initialized through the constructor.

Next, we declare a private variable and the corresponding property for our list of visual features, as follows:

    private List<VisualFeature> _features = new List<VisualFeature>();
    public List<VisualFeature> Features
    {
        get { return _features; }
        set
        {
            _features = value;
            RaisePropertyChangedEvent("Features");
        }
    }

In a similar manner, create a BitmapImage variable and property called ImageSource. Create a list of VisualFeature types called SelectedFeatures and a string called AnalysisResult.
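As a sketch, the ImageSource property would look like the following; SelectedFeatures and AnalysisResult use the same pattern:

    private BitmapImage _imageSource;
    public BitmapImage ImageSource
    {
        get { return _imageSource; }
        set
        {
            _imageSource = value;
            RaisePropertyChangedEvent("ImageSource");
        }
    }

    // SelectedFeatures (a List<VisualFeature>) and AnalysisResult (a string)
    // are declared in the same way, each raising its own property changed event.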

We also need to declare the property for our button, as follows:

    public ICommand BrowseAndAnalyzeImageCommand { get; private set; }

With that in place, we create our constructor, as follows:

    public ImageAnalysisViewModel(IVisionServiceClient visionClient)
    {
        _visionClient = visionClient;
        Initialize();
    }

The constructor takes one parameter, the IVisionServiceClient object, which we have created in our MainViewModel file. It assigns that parameter to the variable that we created earlier. Then we call an Initialize function, as follows:

    private void Initialize() {
        Features = Enum.GetValues(typeof(VisualFeature))
                       .Cast<VisualFeature>().ToList();

        BrowseAndAnalyzeImageCommand = new DelegateCommand(BrowseAndAnalyze);
    }

In the Initialize function, we fetch all the values of the VisualFeature enum type. These values are added to the Features list, which is displayed in the UI. We also create the command for our button; now we need to create the corresponding action, as follows:

    private async void BrowseAndAnalyze(object obj)
    {
        var openDialog = new Microsoft.Win32.OpenFileDialog();

        openDialog.Filter = "JPEG Image(*.jpg)|*.jpg";
        bool? result = openDialog.ShowDialog();

        if (result != true) return;

        string filePath = openDialog.FileName;

        Uri fileUri = new Uri(filePath);

        BitmapImage image = new BitmapImage();
        image.BeginInit();
        image.CacheOption = BitmapCacheOption.None;
        image.UriSource = fileUri;
        image.EndInit();

        ImageSource = image;

The first lines of the preceding code are similar to what we did in Chapter 1, Getting Started with Microsoft Cognitive Services. We open a file browser and get the selected image.

With an image selected, we run an analysis on it, as follows:

    try
    {
        using (Stream fileStream = File.OpenRead(filePath))
        {
            AnalysisResult analysisResult = await _visionClient.AnalyzeImageAsync(fileStream, SelectedFeatures);

We call the AnalyzeImageAsync function of our _visionClient. This function has four overloads, all of which are quite similar. In our case, we pass on the image as a Stream type and the SelectedFeatures list, containing the VisualFeatures variable to analyze.

The request parameters are as follows:

  • Image (required):
    • Can be uploaded in the form of a raw image binary or a URL.
    • Can be JPEG, PNG, GIF, or BMP.
    • The file size must be less than 4 MB.
    • The image dimensions must be at least 50 x 50 pixels.
  • Visual features (optional): A list indicating the visual feature types to return. It can include categories, tags, descriptions, faces, image types, color, and whether or not the image contains adult content.
  • Details (optional): A list indicating which domain-specific details to return.

The response to this request is an AnalysisResult object.

We then check to see if the result is null. If it is not, we call a function to parse it and assign the result to our AnalysisResult string, as follows:

    if (analysisResult != null)
        AnalysisResult = PrintAnalysisResult(analysisResult);

Remember to close the try clause and finish the method with the corresponding catch clause.
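A minimal sketch of those closing lines, surfacing any exception message in the result textbox:

            }
        }
        catch (Exception ex)
        {
            AnalysisResult = ex.Message;
        }
    }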

The AnalysisResult object contains data corresponding to the visual features requested in the API call.

Data in the AnalysisResult object is described in the following list:

  • Categories: Images are categorized according to a defined taxonomy, covering everything from animals and buildings to outdoor scenes and people.
  • Tags: Images are tagged with a list of words related to the content.
  • Description: Contains a full sentence describing the image.
  • Faces: Detects faces in the image and contains face coordinates, gender, and age.
  • ImageType: Detects whether an image is clip art or a line drawing.
  • Color: Contains information about dominant colors, accent colors, and whether or not the image is black and white.
  • Adult: Detects whether an image is pornographic in nature and whether or not it is racy.

To retrieve data, for example the image description, you can use the following:

    if (analysisResult.Description != null)
    {
        result.AppendFormat("Description: {0}\n", analysisResult.Description.Captions[0].Text);
        result.AppendFormat("Probability: {0}\n\n", analysisResult.Description.Captions[0].Confidence);
    }
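The other visual features can be read in a similar manner. For instance, categories could be appended as follows (a sketch, assuming each Category item in the client library exposes a Name and a Score):

    if (analysisResult.Categories != null)
    {
        foreach (var category in analysisResult.Categories)
        {
            // Append each category name with its confidence score
            result.AppendFormat("Category: {0} (score: {1})\n", category.Name, category.Score);
        }
    }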

A successful call would present us with the following result:

(Screenshot: the analysis result displayed in the application)

Sometimes, you may only be interested in the image description. In such cases, it is wasteful to ask for the kind of full analysis that we have just done. By calling the following function, you will get an array of descriptions:

    AnalysisResult descriptionResult = await _visionClient.DescribeAsync(ImageUrl, NumberOfDescriptions);

In this call, we have specified a URL for the image and the number of descriptions to return. The first parameter must always be included, but it may be an image upload instead of a URL. The second parameter is optional, and in cases where it is not provided, it defaults to one.

A successful query will result in an AnalysisResult object, which is the same as the one that was described in the preceding code. In this case, it will only contain the request ID, image metadata, and an array of captions. Each caption contains an image description and the confidence of that description being correct.
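A sketch of how those captions could be read, mirroring the description parsing we did earlier:

    if (descriptionResult != null && descriptionResult.Description != null)
    {
        foreach (var caption in descriptionResult.Description.Captions)
        {
            // Each caption holds a description and the confidence of it being correct
            result.AppendFormat("{0} (confidence: {1})\n", caption.Text, caption.Confidence);
        }
    }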

We will add this form of image analysis to our smart-house application in a later chapter.

Recognizing celebrities using domain models

One of the features of the Computer Vision API is the ability to recognize domain-specific content. At the time of writing, the API only supports celebrity recognition, where it is able to recognize around 200,000 celebrities.

For this example, we choose to use an image from the internet. The UI will then need a textbox to input the URL. It will need a button to load the image and perform the domain analysis. There should be an image element to see the image and a textbox to output the result.

The corresponding ViewModel should have two string properties for the URL and the analysis result. It should have a BitmapImage property for the image and an ICommand property for our button.

Add a private variable for the IVisionServiceClient type at the start of the ViewModel, as follows:

    private IVisionServiceClient _visionClient;

This should be assigned in the constructor, which will take a parameter of the IVisionServiceClient type.

As we need a URL to fetch an image from the internet, we need to initialize the ICommand property with both an action and a predicate. The latter checks whether the URL property has been set, as shown in the following code:

    public CelebrityViewModel(IVisionServiceClient visionClient) {
        _visionClient = visionClient;
        LoadAndFindCelebrityCommand = new DelegateCommand(LoadAndFindCelebrity, CanFindCelebrity);
    }
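The predicate itself can be as simple as the following, returning true only once a URL has been entered:

    private bool CanFindCelebrity(object obj)
    {
        return !string.IsNullOrEmpty(ImageUrl);
    }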

The LoadAndFindCelebrity function creates a Uri from the given URL. Using this, it creates a BitmapImage and assigns it to ImageSource, the BitmapImage property, as shown in the following code. The image should then be visible in the UI:

    private async void LoadAndFindCelebrity(object obj)
    {
        Uri fileUri = new Uri(ImageUrl);

        BitmapImage image = new BitmapImage();
        image.BeginInit();
        image.CacheOption = BitmapCacheOption.None;
        image.UriSource = fileUri;
        image.EndInit();

        ImageSource = image;

We then call the AnalyzeImageInDomainAsync method with the given URL, as shown in the following code. The first parameter we pass in is the image URL. Alternatively, this could have been an image opened as a Stream type:

    try
    {
        AnalysisInDomainResult celebrityResult = await _visionClient.AnalyzeImageInDomainAsync(ImageUrl, "celebrities");

        if (celebrityResult != null)
            Celebrity = celebrityResult.Result.ToString();
    }

The second parameter is the domain model name, which is in a string format. As an alternative, we could have used a specific Model object, which can be retrieved by calling the following:

    _visionClient.ListModelsAsync();

This would return an array of models, which we could display and select from. As only one model is available at the time of writing, there is little point in doing so.

The result from AnalyzeImageInDomainAsync is an object of the AnalysisInDomainResult type. This object will contain the request ID, metadata of the image, and the result, containing an array of celebrities. In our case, we simply output the entire result array. Each item in this array will contain the name of the celebrity, the confidence of a match, and the face rectangle in the image. Do try it in the example code provided.
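If you want more than the raw string, the Result object can be parsed as JSON. The following is a sketch only, assuming the Json.NET package and the celebrities layout returned by the REST API (an array named celebrities with name, confidence, and faceRectangle fields):

    // Requires: using Newtonsoft.Json.Linq; (an assumption; any JSON parser will do)
    JObject json = JObject.FromObject(celebrityResult.Result);

    foreach (JToken celebrity in json["celebrities"])
    {
        string name = (string)celebrity["name"];              // celebrity name
        double confidence = (double)celebrity["confidence"];  // confidence of the match
    }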

Utilizing optical character recognition

For some tasks, optical character recognition (OCR) can be very useful. Say that you took a photo of a receipt. Using OCR, you can read the amount from the photo itself and have it automatically added to accounting.

OCR will detect text in images and extract machine-readable characters. It will automatically detect language. Optionally, the API will detect image orientation and correct it before reading the text.

To specify a language, you need to use the BCP-47 language code. At the time of writing, the following languages are supported: simplified Chinese, traditional Chinese, Czech, Danish, Dutch, English, Finnish, French, German, Greek, Hungarian, Italian, Japanese, Korean, Norwegian, Polish, Portuguese, Russian, Spanish, Swedish, Turkish, Arabic, Romanian, Cyrillic Serbian, Latin Serbian, and Slovak.

In the code example, the UI will have an image element. It will also have a button to load the image and detect text. The result will be printed to a textbox element.

The ViewModel will need a string property for the result, a BitmapImage property for the image, and an ICommand property for the button.

Add a private variable to the ViewModel for the Computer Vision API, as follows:

    private IVisionServiceClient _visionClient;

The constructor should have one parameter of the IVisionServiceClient type, which should be assigned to the preceding variable.

Create a function as a command for our button. Call it BrowseAndAnalyze and have it accept object as the parameter. Then, open a file browser and find an image to analyze. With the image selected, we run the OCR analysis, as follows:

    using (Stream fileStream = File.OpenRead(filePath))
    {
        OcrResults analysisResult = await _visionClient.RecognizeTextAsync(fileStream);

        if (analysisResult != null)
            OcrResult = PrintOcrResult(analysisResult);
    }

With the image opened as a Stream type, we call the RecognizeTextAsync method. In this case, we pass on the image as a Stream type, but we could just as easily have passed on a URL to an image.

Two more parameters may be specified in this call. First, you can specify the language of the text. The default is unknown, which means that the API will try to detect the language automatically. Second, you can specify whether or not the API should detect the orientation of the image. The default is set to false.
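As a sketch, a call specifying both could look like the following, assuming the overload takes the BCP-47 language code and an orientation flag in that order:

    // Explicitly request English text and orientation detection (parameter
    // names are an assumption; adjust to the client library's signature)
    OcrResults analysisResult = await _visionClient.RecognizeTextAsync(fileStream, "en", detectOrientation: true);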

If the call succeeds, it will return data in the form of an OcrResults object. We send this result to a function, the PrintOcrResult function, where we will parse it and print the text, as follows:

    private string PrintOcrResult(OcrResults ocrResult)
    {
        StringBuilder result = new StringBuilder();

        result.AppendFormat("Language is {0}\n", ocrResult.Language);
        result.Append("The words are:\n\n");

First, we create a StringBuilder object, which will hold all the text. The first content we add to it is the language of the text in the image, as follows:

        foreach (var region in ocrResult.Regions)
        {
            foreach (var line in region.Lines)
            {
                foreach (var text in line.Words)
                {
                    result.AppendFormat("{0} ", text.Text);
                }
                result.Append("\n");
            }
            result.Append("\n\n");
        }

The result contains an array of regions in the Regions property. Each region represents an area of recognized text and contains an array of Lines. Each line, in turn, contains an array of Words, where each item represents a single recognized word.

With all the words appended to the StringBuilder object, we return its content as a string. This will then be printed in the UI, as shown in the following screenshot:

(Screenshot: the OCR result displayed in the application)

The result also contains the orientation and angle of the text. Combining this with the bounding box, also included, you can mark each word in the original image.
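The following sketch shows the idea, assuming each word's bounding box is exposed as a comma-separated left,top,width,height string, as it is in the underlying REST response (the BoundingBox property name is an assumption; adjust to the actual contract type):

    // Requires: using System.Linq;
    foreach (var region in ocrResult.Regions)
    {
        foreach (var line in region.Lines)
        {
            foreach (var word in line.Words)
            {
                // Hypothetical parsing of a "left,top,width,height" string
                int[] box = word.BoundingBox.Split(',').Select(int.Parse).ToArray();
                var rect = new System.Windows.Rect(box[0], box[1], box[2], box[3]);
                // rect can now be drawn over the original image to mark the word
            }
        }
    }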

Generating image thumbnails

In today's world, we, as developers, have to consider different screen sizes when displaying images. The Computer Vision API offers some help with this by providing the ability to generate thumbnails.

Thumbnail generation, in itself, is not that big a deal. What makes the API clever is that it analyzes the image and determines the region of interest.

It will also generate smart cropping coordinates. This means that if the specified aspect ratio differs from the original, it will crop the image, with a focus on the interesting regions.

In the example code, the UI consists of two image elements and one button. The first image is the image in its original size. The second is for the generated thumbnail, which we specify to be 250 x 250 pixels in size.

The ViewModel will need the corresponding properties: two BitmapImage properties to act as image sources and one ICommand property for our button command.

Define a private variable in the ViewModel, as follows:

    private IVisionServiceClient _visionClient;

This will be our API access point. The constructor should accept an IVisionServiceClient object, which should be assigned to the preceding variable.

For the ICommand property, we create a function, BrowseAndAnalyze, accepting an object parameter. We do not need to check whether we can execute the command, as we will browse for an image each time.

In the BrowseAndAnalyze function, we open a file dialog and select an image. When we have the image file path, we can generate our thumbnail, as follows:

    using (Stream fileStream = File.OpenRead(filePath))
    {
        byte[] thumbnailResult = await _visionClient.GetThumbnailAsync(fileStream, 250, 250);

        if (thumbnailResult != null && thumbnailResult.Length != 0)
            CreateThumbnail(thumbnailResult);
    }

We open the image file so that we have a Stream type. This stream is the first parameter in our call to the GetThumbnailAsync method. The next two parameters indicate the width and height that we want for our thumbnail.

By default, the API call will use smart cropping, so we do not have to specify it. If we have a case where we do not want smart cropping, we could add a bool variable as the fourth parameter.
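A sketch of such a call, assuming the flag is the optional fourth parameter of GetThumbnailAsync:

    // Disable smart cropping (parameter name is an assumption)
    byte[] thumbnailResult = await _visionClient.GetThumbnailAsync(fileStream, 250, 250, smartCropping: false);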

If the call succeeds, we get a byte array back. This is the image data. If it contains data, we pass it on to a new function, CreateThumbnail, to create a BitmapImage object from it, as follows:

    private void CreateThumbnail(byte[] thumbnailResult)
    {
        try
        {
            MemoryStream ms = new MemoryStream(thumbnailResult);
            ms.Seek(0, SeekOrigin.Begin);

To create an image from a byte array, we create a MemoryStream object from it. We make sure that we start at the beginning of the array.

Next, we create a BitmapImage object and begin to initialize it. We specify the CacheOption and set the StreamSource to the MemoryStream variable we created earlier. Finally, we end the BitmapImage initialization and assign the image to our Thumbnail property, as shown in the following code:

        BitmapImage image = new BitmapImage();
        image.BeginInit();
        image.CacheOption = BitmapCacheOption.None;
        image.StreamSource = ms;
        image.EndInit();

        Thumbnail = image;   

Close up the try clause and add the corresponding catch clause. You should now be able to generate thumbnails.
