Adding Artificial Intelligence (AI) to your Xamarin Apps

Posted by: Gerald Versluis, on 2/2/2018, in Category Xamarin
Abstract: Explore what Cognitive Services are and how easily we can incorporate them into Xamarin apps to make them intelligent.

These are awesome times to be a software developer!

There is so much power at the tip of your fingers for you to use. Take for instance, Cognitive Services.

Microsoft provides you with massive Artificial Intelligence (AI) and machine learning power and you can access it through a simple REST API.

In this article, we will see what Cognitive Services are and how easily we can incorporate them into Xamarin apps to make them smarter.

Background on Artificial Intelligence (AI)

If you have been working with Microsoft technologies (or others, for that matter), you can’t have missed all the excitement around AI and machine learning. While this is a very extensive area, I will focus on just a subset: Microsoft’s Cognitive Services.

But first, to understand the what and how, let’s take a step back and look at the overall concept of AI.

Artificial Intelligence isn’t new. It has been the subject of many movies like The Terminator and The Matrix, and there has also been a lot of talk about it in real life. All the way back in the 1950s, two figures played a key role in shaping modern-day AI: Alan Turing and Marvin Minsky.

Turing Test

Alan Turing was a brilliant mathematician and computer scientist, among other things, who played a significant role in WWII.

But more relevant to AI, he developed the Turing test in 1950.

Basically, what this test describes is a way to determine if a system is intelligent. The standard interpretation (there are variants nowadays) is to set up a human interrogator with another person and a machine, and let them communicate using only written natural language. If the interrogator cannot tell the human apart from the machine, the system is considered intelligent.

This setup is shown in Figure 1, below.


Figure 1: Turing Test Setup

While the test is widely known, it is also highly criticized. Read more on the Turing test on Wikipedia: https://en.wikipedia.org/wiki/Turing_test.

Dartmouth

In the summer of 1956, the Dartmouth Summer Research Project on Artificial Intelligence took place; it is now considered (by some, not others) to be the foundation of the modern ideas on artificial intelligence.

Although John McCarthy is often named ‘the father of AI’, others working on this project are also duly credited with defining different areas of AI.

One of the outcomes the brilliant minds of the Dartmouth project came up with is a definition of full AI. They identified five areas a system has to cover in order to be self-aware. The five areas are:

  • Knowledge: the ability to learn, know where to find new knowledge and link entities together;
  • Perception: see objects and recognize them, identify them and classify them;
  • Language: understand spoken and written language, as well as producing written and spoken language;
  • Problem solving/planning: analyze a problem and solve it by taking consecutive actions which build up to a solution. This includes navigation;
  • Reasoning: being able to explain its behavior and produce arguments why it is the best solution or action.

All these areas combined produce a self-aware system that is a fully functional artificial intelligence. That is probably the point at which we need to start worrying...!

What are the Cognitive Services?

Microsoft’s Cognitive Services follow the same pattern as the areas defined by the Dartmouth Project.

If you look at the categories shown in Figure 2, you will notice how similar they are to the areas identified during the Dartmouth Project.


Figure 2: Cognitive Services Overview

In Figure 2, you can see the five categories of Cognitive Services: vision, speech, language, knowledge and search. Under each category there are several sub-categories in which Microsoft has placed its products.

In this article, I will be focusing mostly on vision, but be sure to check out the products in the other areas as well. There is a lot of awesome and powerful stuff in there. For instance, under speech and language, there are solutions to interpret spoken language or determine the sentiment of a sentence.

You can also leverage a powerful spell checker, translate written or spoken text, as well as leverage the power of Bing. To get you started, check out www.microsoft.com/cognitive, which has some samples.

How all this is related to mobile

Everything that I have mentioned so far is not specific to mobile or Xamarin. You can incorporate it into any project; it doesn’t even have to be .NET!

Since Cognitive Services enables developers to harvest this powerful technology through simple REST APIs, you can use it from any programming language that allows you to work with HTTP and JSON.
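
To give an impression of how little is involved, here is a minimal sketch of calling the Computer Vision ‘describe’ operation with nothing but HttpClient. The key, region and file path below are placeholders, not values from the sample app; you retrieve the real values from the Azure Portal as described later in this article.

// A minimal sketch of a raw REST call to the Computer Vision 'describe' operation.
// The key, region and file path are placeholders; get your own from the Azure Portal.
public async Task<string> DescribeImageAsync(string imagePath)
{
    using (var http = new HttpClient())
    using (var content = new ByteArrayContent(File.ReadAllBytes(imagePath)))
    {
        http.DefaultRequestHeaders.Add("Ocp-Apim-Subscription-Key", "YOUR-KEY-HERE");
        content.Headers.ContentType = new MediaTypeHeaderValue("application/octet-stream");

        var response = await http.PostAsync(
            "https://westeurope.api.cognitive.microsoft.com/vision/v1.0/describe", content);

        // The response body is plain JSON describing what the service thinks it sees.
        return await response.Content.ReadAsStringAsync();
    }
}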

But using it with mobile has one big advantage: a mobile device tends to have all the sensors available to capture speech or images. So, it makes sense to use these APIs from within a mobile app, because it allows the end-user to snap a picture, speak into the microphone or enter text.

For this article, I will focus on the Vision APIs of Cognitive Services.

Cognitive Services & Xamarin

In this example, I will build a simple app that can take a picture and describe what is in the image. To show you how easy it is to implement another API, I will also make use of the Emotion API to determine whether a person in the image is happy, sad, angry, etc.

For this example, I will be using Visual Studio for Mac, but everything you see here is also possible with Visual Studio on Windows. I will not go through all the Xamarin.Forms specifics in detail, as this article is mostly about Cognitive Services.

Editorial Note: To learn more about what’s new in the latest Xamarin.Forms v3, check www.dotnetcurry.com/xamarin/1393/xamarin-forms-v3-new-features

All code for the upcoming example app can be found at https://github.com/jfversluis/DNC-CognitiveServices/.

Gathering the pre-requisites

Cognitive Services are part of the Azure suite.

To be able to communicate with the APIs, you need to generate a key. This key should be included in all the requests.

Depending on the specific API that you want to use, there is usually a free tier and a priced tier available. Limitations are imposed in the form of rate-limits, i.e. a specific number of requests within a certain amount of time. Some APIs are still in preview which means they are free to use while they remain in the preview phase.

Create an Azure account if you do not have one already and log into the portal at https://portal.azure.com. In the top-left corner, find the ‘New’ button to add a new service to your Azure subscription.

If you look at the resulting panel, you will notice the ‘AI + Cognitive Services’ option in the list. This can be seen in Figure 3.


Figure 3: AI and Cognitive Services in Azure Portal

Here you can choose the service that you are after. I will start with the Computer Vision API. This API can be used to recognize objects in images. Later on, I will use the Emotion API.

Here are the steps to retrieve a key for this procedure.

After clicking on the Computer Vision API option, you will have to configure a few basics. Name the service appropriately, choose the right subscription (if applicable), geographic location, pricing tier and resource group. When done, click the ‘Create’ button and let Azure work its magic. Once done, go into the newly created service and find the ‘Keys’ pane, shown in Figure 4.


Figure 4: Computer Vision API Key

Although two keys are shown here, you only need one of them. You can repeat this process for any other Cognitive Services API. Also, go into the ‘Overview’ pane and make a note of the URL under ‘endpoint’.

With this key in hand, you can now fire requests at the REST API, although to make life easier, there is also a range of NuGet packages available for us to use. We will see this in a little bit.

Setting up the Xamarin project

For starters, I will just ‘new-up’ a Xamarin.Forms application that will be the base of this example. I will use a Portable Class Library (PCL) instead of a shared library. On Windows, you should be able to use .NET Standard 2.0 out of the box by now.

The UI will be straight-forward - with one Image control to preview the image taken by the user, and a Button to take a new image.
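
To give you an idea, here is roughly what that page looks like when written in C#. The actual sample app defines its UI in XAML; the PhotoImage name is chosen here to match the snippets later in this article.

// A rough C# sketch of the page; the sample repository defines this in XAML.
// PhotoImage and the handler name mirror the snippets shown later on.
public class MainPage : ContentPage
{
    readonly Image PhotoImage = new Image { HeightRequest = 300 };

    public MainPage()
    {
        var takePhotoButton = new Button { Text = "Take picture" };
        takePhotoButton.Clicked += TakePhotoButton_Clicked;

        // One Image to preview the picture, one Button to take a new one.
        Content = new StackLayout
        {
            Padding = 20,
            Children = { PhotoImage, takePhotoButton }
        };
    }

    async void TakePhotoButton_Clicked(object sender, EventArgs e)
    {
        // The picture-taking and Cognitive Services code from the next sections goes here.
    }
}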

The first thing we will do is add a couple of NuGet packages. This will drastically simplify taking pictures and accessing the Cognitive Services APIs.

The first NuGet package, for taking pictures, is the Xam.Plugin.Media package by James Montemagno. You need to install this package in all of your projects. Don’t forget to add the right permissions on Android and the Info.plist entries on iOS. These are described in the package’s readme.txt or on the GitHub page at https://github.com/jamesmontemagno/MediaPlugin.

Second, we install the packages for Cognitive Services. Contrary to what you might expect, the packages are named after Project Oxford, the codename Microsoft gave to Cognitive Services. The packages have not been renamed accordingly, probably because doing so would cause widespread breaking changes.

For the Computer Vision service, we need the Microsoft.ProjectOxford.Vision package. This only needs to be installed on the PCL project.

Now we have everything we need to start implementing the real code.

Say Cheese Please! Taking the pictures

The code needed to take a picture is quite easy and is shown here.

var photo = await CrossMedia.Current.TakePhotoAsync(new Plugin.Media.Abstractions.StoreCameraMediaOptions());

if (photo != null)
{
    PhotoImage.Source = ImageSource.FromStream(() => photo.GetStream());
    // TODO: Send to Cognitive Services
}

This code launches the camera. When a picture has been taken successfully, it is used as the source for our Image control. I will put this code in the Clicked handler of our Button.

Note: Since the camera isn’t always supported by simulators/emulators, be sure to run this code on a physical device.
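
The Media plugin also lets you check up front whether a camera is available at all, which is a nice guard for exactly this situation. A small sketch (the alert text is mine):

// Guard against devices or emulators without a (usable) camera before taking the photo.
// IsCameraAvailable and IsTakePhotoSupported are provided by the Media plugin.
if (!CrossMedia.Current.IsCameraAvailable || !CrossMedia.Current.IsTakePhotoSupported)
{
    await DisplayAlert("No camera", "No camera is available on this device.", "OK");
    return;
}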

The code to send a request to the Computer Vision API can be found below. I will walk you through it. This code will be placed instead of the TODO in the above code.

var client = new VisionServiceClient("ccf53580b7494775954394530aadfd50",
                 "https://westeurope.api.cognitive.microsoft.com/vision/v1.0");

var result = await client.DescribeAsync(photo.GetStream());

await DisplayAlert("Result", result.Description.Captions.FirstOrDefault()?.Text ?? "", "Wow, thanks!"); 

First, a new VisionServiceClient is instantiated. This is done by specifying the key and endpoint URL we got earlier. And actually that is all we need to do to invoke requests!

In the above code, I have chosen the DescribeAsync method, but this is just one of the possibilities here. You can also analyse handwriting, generate a thumbnail, recognize text and much more. For this example, I will stick with just retrieving a description.
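
As an impression of one of those other possibilities, here is a quick sketch of generating a thumbnail with the same client. The dimensions are arbitrary, and this assumes the GetThumbnailAsync method of the Project Oxford Vision client, which returns the thumbnail as a byte array.

// A quick sketch: ask the service for a 200x200 thumbnail (the last argument enables smart cropping).
var thumbnailBytes = await client.GetThumbnailAsync(photo.GetStream(), 200, 200, true);

// Show the thumbnail in the Image control instead of the full picture.
PhotoImage.Source = ImageSource.FromStream(() => new MemoryStream(thumbnailBytes));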

The result is then shown in a simple alert.

When you now run the app, press the button and take a picture, you will receive a description from the Computer Vision API. Take a look at Figure 5, where you see the picture on the left and the result on the right.


Figure 5: Computer Vision API Output

In the result, I have also added a percentage of confidence, which in our case was 76.67%. This is something that is returned for each caption. Please refer to the extended code below to make this possible.

var client = new VisionServiceClient("ccf53580b7494775954394530aadfd50",
            "https://westeurope.api.cognitive.microsoft.com/vision/v1.0");

var result = await client.DescribeAsync(photo.GetStream());

var topCaption = result.Description.Captions.OrderByDescending(c => c.Confidence).FirstOrDefault();

var caption = string.Empty;

if (topCaption == null)
    caption = "Oh no, I can't find any smart caption for this picture... Sorry!";
else
    caption = $"I think I see: {topCaption.Text}, I am {Math.Round(topCaption.Confidence * 100, 2)}% sure";

await DisplayAlert("Result", caption, "Wow, thanks!");

The extra lines after the DescribeAsync call implement the percentage of confidence. You will notice how I order the captions by their confidence to get the top-most one. Then I check whether there was a caption at all, and pour the result into a human-readable form, together with the confidence level.

But you can retrieve much more than just the description.

The result can also contain a set of tags, any faces that were detected, whether the image contains adult content, the color usage and more. It is also possible to return just the amount of detail you are looking for by adding parameters to your request, saving yourself the overhead of all the unwanted data.
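
As a sketch of what that looks like with the same client, the AnalyzeImageAsync method lets you pass in only the visual features you are interested in; the feature selection below is just an example.

// A sketch: request only tags, adult-content detection and faces, nothing else.
var analysis = await client.AnalyzeImageAsync(photo.GetStream(),
    new[] { VisualFeature.Tags, VisualFeature.Adult, VisualFeature.Faces });

// For example, show the tags the service came up with.
var tags = string.Join(", ", analysis.Tags.Select(t => t.Name));
await DisplayAlert("Tags", tags, "Wow, thanks!");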

Everybody happy?!

To show you that other APIs are just as powerful and easy to integrate, I will also demonstrate the Emotion API. With this service, you can identify emotions of a human face through Artificial Intelligence (AI).

Retrieve a key and the right endpoint from the Azure Portal and install the Microsoft.ProjectOxford.Emotion package in the PCL project of our app.

To access the API, we can now simply create an instance of the EmotionServiceClient. I have added a second button to the (already spectacular) UI to snap a picture for our emotion recognition, and added the following code to its Clicked handler:

var client = new EmotionServiceClient("API-KEY",
            "https://westus.api.cognitive.microsoft.com/emotion/v1.0");

var result = await client.RecognizeAsync(photo.GetStream());

var topEmotion = result.FirstOrDefault()?.Scores?.ToRankedList().FirstOrDefault();

var caption = string.Empty;

if (topEmotion == null || !topEmotion.HasValue)
    caption = "Oh no, no face or emotion could be detected. Are you Vulcan?";
else
    caption = $"{topEmotion.Value.Key} is the emotion that comes to mind..";

await DisplayAlert("Result", caption, "Wow, thanks!");

As you can see, this code is almost identical to the code for the Computer Vision API. For a few results, refer to Figure 6, below.


Figure 6: Emotion API in action

Besides the Emotion API, there is also the Face API. I was especially impressed by the Face service. The amount of detail that is returned is astonishing. It can tell you a person’s age and gender, and whether someone is wearing make-up, glasses or even swim goggles!
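
To get you going, here is a minimal sketch of what such a call could look like. It assumes the Microsoft.ProjectOxford.Face NuGet package and a Face API key and endpoint created in the Azure Portal, just like the other services; the key below is a placeholder.

// A minimal sketch of detecting a face and asking for a few attributes.
// The key and endpoint are placeholders; create your own in the Azure Portal.
var faceClient = new FaceServiceClient("API-KEY",
    "https://westeurope.api.cognitive.microsoft.com/face/v1.0");

var faces = await faceClient.DetectAsync(photo.GetStream(),
    returnFaceAttributes: new[] { FaceAttributeType.Age, FaceAttributeType.Gender, FaceAttributeType.Glasses });

var face = faces.FirstOrDefault();

if (face != null)
    await DisplayAlert("Result",
        $"I think this is a {face.FaceAttributes.Gender}, roughly {face.FaceAttributes.Age} years old.",
        "Wow, thanks!");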

Why not download the sample app at https://github.com/jfversluis/DNC-CognitiveServices/, try to add the Face API yourself and tweet me the results? I am on Twitter at @jfversluis.

Final thoughts

The Cognitive Services are not just fun to play around with; they are really powerful tools that can help you with a broad variety of tasks. For example: programmatically detecting adult images, scanning product reviews for negative sentiment, retrieving text or numbers through OCR to prefill fields in your app, and much, much more.

All of these tasks were very hard to accomplish, or even impossible, not too long ago.

There are a few demo projects for you to play around with - how-old.net/, www.captionbot.ai/, www.celebslike.me/ and www.projectmurphy.net/. While all these options are fun, if you think about it, this stuff is very powerful.

A more serious project is the Seeing AI app (www.microsoft.com/en-us/seeing-ai/). This project is designed for the low-vision community, basically enabling them to ‘see’. The app narrates the world around you and combines a lot of Cognitive Services. It can read short texts to you, identify products by their barcode, recognize people and friends, etc. Imagine this being integrated into a powerful device like the HoloLens; the possibilities would be endless.

I hope you found this article useful and you will start integrating all this goodness into your own applications. I am very curious to see what you can come up with, so don’t hesitate to reach out to me. All the technology described in this article is available to you, TODAY, so… what are you waiting for?

Smile please!!!

Download the entire source code of this article (GitHub).

This article was technically reviewed by Suprotim Agarwal.


Author
Gerald Versluis (@jfversluis) is a full-stack software developer and Microsoft MVP (Xamarin) from Holland. After years of experience working with Xamarin and .NET technologies, he has been involved in a number of different projects and has built several apps. Not only does he like to code, but he is also passionate about spreading his knowledge, as well as gaining some in the bargain. Gerald involves himself in speaking, providing training sessions and writing blogs (https://blog.verslu.is) or articles in his free time. Twitter: @jfversluis Email: gerald[at]verslu[dott]is. Website: https://gerald.verslu.is

