Content Moderation using Azure Custom Vision and Xamarin

Posted by: Gerald Versluis, on 6/3/2018, in Category Xamarin
Abstract: The Cognitive Services Toolkit is a very powerful suite on Azure. We will learn how to use the Azure Custom Vision API and Content Moderator Services to implement content moderation.

In a previous edition of DotNetCurry Magazine, I wrote about how you could make your apps smarter with the Azure Cognitive Services. In that article, I demonstrated how to leverage some simple REST APIs to have the Cognitive Services describe an image or extract the emotion of a person in the picture.

In this edition, I will go a little deeper and show you how to use the Custom Vision API and Content Moderator Services to implement content moderation.

Warning: this article may contain (references to) a small amount of offensive language to demonstrate content moderation.


Azure Cognitive Services: Training Your Own Model

When you use Cognitive Services, you are essentially working with Azure’s highest-level APIs for machine learning.

All artificial intelligence and machine learning technologies are based on trained models. A model captures patterns from the data it is fed, and through that data a machine can ‘learn’ to tell things apart, such as which objects are seen in an image, or to detect emotions.

With Cognitive Services, these models are shared: all the data that goes through Azure is used to train them further. But, as with everything on Azure, you also have the ability to go a bit deeper and do more of the work yourself.

In this case, we will train our own model by supplying it with images and tagging them with what can be seen in each image.

To do this on Azure, we need to head over to a special portal: https://customvision.ai. Log in, or create an account if you don’t have one yet. Please note that you can start for free, but if you choose to continue using it, some costs might be involved. For more information, please refer to this link: https://azure.microsoft.com/en-us/pricing/details/cognitive-services/custom-vision-service/.

After you have logged in, you will land in an empty portal where you can’t do much other than create a new project, which is exactly what we will do.

[Screenshot: creating a new Custom Vision project]

To create a project, you only need to do two things: name the project and choose the domain that describes your target the best.

If you will be using the custom vision API to identify food, it’s best to pick that domain since it will be optimized to better recognize food-related images. If you’re unsure, choose the ‘General’ domain like I did for this example.

As you might have noticed, there are some domains at the bottom that are marked as ‘compact’. When using one of these domains, you have the ability to export the models that you create and use them locally on your mobile device. This means that you do not have to call the Azure APIs to get results for your image. Local models are very fast, which makes this a great piece of functionality. For brevity, I will not go into the details in this article, but feel free to explore.

After completing the form, create the project and we’re ready to go. Now it is time to start adding images!

You can upload multiple images at once or one by one. When you have selected the desired images, you need to add one or more tags that apply to that set of images. With these tags, the custom vision API will be able to tell you if another image looks more like tag A or tag B. It might sound a bit cryptic right now, so let me tell you about the sample app that we will be building.

A while ago my good friend Jim Bennett (@jimbobbennett), now one of the Cloud Developer Advocates at Microsoft, created a great sample app to show the capabilities of the custom vision API.

In this app, we mimic the entry page of a social media app but apply some content moderation. More precisely, on our social media platform, we do not tolerate duck faces! Therefore, we will train our model to distinguish duck face pictures from non-duck face pictures, so we can block them. The code for this sample app can be found here: https://github.com/jfversluis/Amazing-Social-Media.

Let’s quickly go back to training our model. This is where the fun starts. To properly train our model, we are going to need images of people making a duck face and people not making a duck face.

Start running around, grab whoever you can find to take random images making or not making duck faces, and upload them with the right tags to the portal. Unfortunately, I was alone while creating this model, but a result of my photoshoot can be seen in the screenshot below.

[Screenshot: training images uploaded to the Custom Vision portal]

As you can see, there are as few as 22 images in this project, which is enough to make pretty accurate predictions, as we will see in a little bit. When you have added some images, a minimum of five per tag, click the green ‘Train’ button at the top. This turns all the uploaded images into a new iteration of the model.
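If you would rather script this step than drag images into the portal, the Custom Vision training SDK can upload, tag and train for you. The snippet below is only a sketch: it assumes the Microsoft.Cognitive.CustomVision.Training NuGet package (the training counterpart of the prediction package we will use later), a training key and project id copied from the portal, and a local folder of duck face photos. The class and method names follow the quickstart samples of this SDK generation, so verify them against the version you install.

using System;
using System.Collections.Generic;
using System.IO;
using Microsoft.Cognitive.CustomVision.Training;

public static void UploadDuckFaceImages()
{
    // Sketch only: verify class and method names against your SDK version.
    var trainingApi = new TrainingApi { ApiKey = "<your training key>" };
    var projectId = new Guid("<your project id>");

    // Create the tag we want to apply to this batch of images.
    var duckFaceTag = trainingApi.CreateTag(projectId, "Duck Face");

    // Upload every photo in a local folder and tag it as a duck face.
    foreach (var file in Directory.GetFiles(@"photos\duckface"))
    {
        using (var image = File.OpenRead(file))
        {
            trainingApi.CreateImagesFromData(projectId, image,
                new List<string> { duckFaceTag.Id.ToString() });
        }
    }

    // Kick off a training run; this creates a new iteration, just like the portal's Train button.
    trainingApi.TrainProject(projectId);
}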

In the portal, at the top of the screen, you can also see a tab called ‘Performance’. On this screen you will see the iterations on the left-hand side. A new iteration is created whenever you click the train button. Each iteration has its own precision and recall. These terms are a bit tricky, so pay attention.

Precision tells you: whenever the model makes a prediction, how likely is it to be right? In other words, precision = true positives / (true positives + false positives).

Recall tells you: out of all the images that should have been classified with a given tag, how many did your classifier actually identify? In other words, recall = true positives / (true positives + false negatives).

The screenshot below shows you that I have a couple of iterations and actually iteration 3 turns out to be the one with the best results.

[Screenshot: the Performance tab showing iterations with their precision and recall]

To let you test different iterations and compare their results, each iteration is reachable at its own URL. You can also mark one iteration as the default; whenever you do not specify an iteration in your REST call, the default one is used.

At the very top of the screen, next to the train button, there is also a ‘Quick Test’ button which allows you to upload an image and test it against your current model.

The last tab in the top bar is ‘Predictions’. Here you can review, on a per iteration basis, which images have been received and what their results were. You can use these images to further train your model. Just pick the ones that give you false positives, tag them the right way and your model will improve.

Using Custom Vision in our Apps

Everything is in place now to start leveraging all this power from our Xamarin app. In the screenshot below you can see a somewhat basic UI. As I have mentioned before, the app mimics a screen for creating a post on a social network. A post can consist of a text at the top and an image at the bottom.

[Screenshot: the sample app’s post screen]

Content Moderation on Text

One thing we haven’t discussed is applying content moderation to text. This is very easy to do and does not require you to train your own model, since Azure provides another service for it out of the box. Technically speaking, you could also use the Content Moderator Services for image moderation, but where’s the fun in that?

To get a good overview of all the services associated with content moderation, have a look at the documentation page: https://docs.microsoft.com/en-us/azure/cognitive-services/content-moderator/overview.

As mentioned in my previous article, all cognitive (and related) services are exposed as simple REST APIs, which can be called from any tool or app that can make HTTP requests. But since I will be using a Xamarin app, I can just use the NuGet packages that are available.
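To give an idea of what such a raw call looks like, here is a rough sketch of screening a piece of text with nothing but HttpClient. The region, route and query parameters are assumptions based on the Content Moderator documentation, so double-check them against the endpoint of your own resource.

using System.Net.Http;
using System.Text;
using System.Threading.Tasks;

// Sketch only: the region and URL shape are assumptions; use the endpoint and key of your own resource.
public static async Task<string> ScreenTextRawAsync(string text)
{
    using (var client = new HttpClient())
    {
        client.DefaultRequestHeaders.Add("Ocp-Apim-Subscription-Key", "<your content moderator key>");

        var url = "https://westeurope.api.cognitive.microsoft.com/contentmoderator/moderate/v1.0/ProcessText/Screen"
                  + "?language=eng&autocorrect=true&PII=true&classify=true";

        var content = new StringContent(text, Encoding.UTF8, "text/plain");
        var response = await client.PostAsync(url, content);

        // The body is a JSON document like the one shown later in this article.
        return await response.Content.ReadAsStringAsync();
    }
}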

For the text moderation, we will need to install the Microsoft.CognitiveServices.ContentModerator package. To use these services, you will need an API key from the associated portal: https://contentmoderator.cognitive.microsoft.com/. This exercise should be pretty straightforward, so I won’t go into the details.

When we look at the implementation of the text moderation in the context of our app, the main code looks like this:

public async Task<bool> ContainsProfanity(string text)
{
    InitIfRequired();

    if (string.IsNullOrEmpty(text)) return false;

    // Detect the language first, so the right dictionary is used for screening
    var lang = await _client.TextModeration.DetectLanguageAsync("text/plain", text);

    // Screen the text against the detected language; any matched terms indicate profanity
    var moderation = await _client.TextModeration.ScreenTextAsync(lang.DetectedLanguageProperty, "text/plain", text);

    return moderation.Terms != null && moderation.Terms.Any();
}

With this method we check whether the supplied string value contains any content that we want to block. First, we detect the language so that Azure knows which dictionary has to be checked. By doing this, we minimize the chances of getting false positives.

At the top of the method there is also a single initialization call that makes sure the client is ready to use.
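A minimal sketch of what InitIfRequired might look like with this NuGet package is shown below. The constructor and the Endpoint property are assumptions based on the Content Moderator SDK samples, so the exact shape can differ per SDK version; the key and region come from your own Content Moderator resource.

using Microsoft.CognitiveServices.ContentModerator;

private ContentModeratorClient _client;

private void InitIfRequired()
{
    if (_client != null) return;

    // Sketch only: constructor and Endpoint are assumptions based on the SDK samples;
    // substitute the key and region of your own Content Moderator resource.
    _client = new ContentModeratorClient(
        new ApiKeyServiceClientCredentials("<your content moderator key>"))
    {
        Endpoint = "https://westeurope.api.cognitive.microsoft.com"
    };
}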

Minimizing false positives is harder than it sounds, and there are multiple sides to it. One of them is known as the Scunthorpe problem (https://en.wikipedia.org/wiki/Scunthorpe_problem). As you might notice, Scunthorpe, which is a perfectly normal name for a town in England, contains the word c*nt, which is offensive. Another problem is words that appear in multiple languages: for instance, ‘Mein Schwein ist dick’ means ‘my pig is fat’ in German, but d*ck in English is again offensive.

The models that Azure uses try to detect these exceptions as best they can, and with all the data coming in through these services, the results will only get better. But don’t be surprised when something does slip through.

There are some more helpers for text moderation that come in quite handy. These helpers can also detect personally identifiable information (PII) like email addresses, mailing addresses, phone numbers, etc. The content moderation service can also tell you whether a human review is recommended. That way you can put suspected malicious content on a separate queue for a human to review before you post it somewhere publicly.

An example of a result JSON object is shown here.

{
  "OriginalText": "Questions? email me at somename@microsoft.com or call me at 1-800-333-4567",
  "NormalizedText": "Questions? email me at some name@ Microsoft. com or call me at 1- 800- 333- 4567",
  "AutoCorrectedText": "Questions? email me at some name@ Microsoft. com or call me at 1- 800- 333- 4567",
  "Misrepresentation": null,
  "PII": {
    "Email": [
      {
        "Detected": "somename@microsoft.com",
        "SubType": "Regular",
        "Text": "somename@microsoft.com",
        "Index": 23
      },
      {
        "Detected": "me@somename@microsoft.com",
        "SubType": "Suspected",
        "Text": "me at somename@microsoft.com",
        "Index": 17
      }
    ],
    "IPA": [ ],
    "Phone": [
      {
        "CountryCode": "US",
        "Text": "1-800-333-4567",
        "Index": 60
      }
    ],
    "Address": [ ],
    "SSN": [ ]
  },
  "Classification": {
    "ReviewRecommended": false,
    "Category1": { "Score": 0.0065943244844675064 },
    "Category2": { "Score": 0.14019052684307098 },
    "Category3": { "Score": 0.0043589477427303791 }
  },
  "Language": "eng",
  "Terms": [
    {
      "Index": 32,
      "OriginalIndex": 32,
      "ListId": 233,
      "Term": "Microsoft"
    }
  ],
  "Status": {
    "Code": 3000,
    "Description": "OK",
    "Exception": null
  },
  "TrackingId": "bf162866-73e7-49f7-8a89-aa616a542f32"
}

This example is taken from: https://github.com/MicrosoftContentModerator/ContentModerator-API-Samples/.
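To actually get the PII and classification parts of that response back through the SDK, the screen call has to ask for them. The sketch below assumes that the ScreenTextAsync overload in this package exposes autocorrect, pII and classify flags, and that the response object mirrors the PII, Classification and Terms fields of the JSON above; treat those names as assumptions and check them against your SDK version.

// Sketch only: parameter and property names are assumptions based on the JSON above and the SDK samples.
public async Task<bool> NeedsHumanReview(string text)
{
    InitIfRequired();

    if (string.IsNullOrEmpty(text)) return false;

    var lang = await _client.TextModeration.DetectLanguageAsync("text/plain", text);
    var screen = await _client.TextModeration.ScreenTextAsync(
        lang.DetectedLanguageProperty, "text/plain", text,
        autocorrect: true, pII: true, classify: true);

    // Queue the post for a human when the service recommends a review,
    // when PII was detected, or when any profane terms matched.
    var hasPii = screen.PII != null &&
                 ((screen.PII.Email?.Any() ?? false) || (screen.PII.Phone?.Any() ?? false));

    return (screen.Classification?.ReviewRecommended ?? false)
           || hasPii
           || (screen.Terms?.Any() ?? false);
}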

Duck Face Moderation with Custom Vision

Back to our duck face filter. We also need an API key here, as well as the Azure region where our instance of the service is hosted. Both can be retrieved from the same portal where we trained our model.

Then getting results from the API is as easy as adding a NuGet package and retrieving the results. For this service, we will install the Microsoft.Cognitive.CustomVision.Prediction package.
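Before we can ask for predictions, the client from this package needs to be initialized with the prediction key from the portal. Here is a minimal sketch of the InitIfRequired counterpart for this service, assuming the PredictionEndpoint class from the package and a hypothetical ApiKeys helper that holds the values copied from customvision.ai.

using Microsoft.Cognitive.CustomVision.Prediction;

private PredictionEndpoint _endpoint;

private void InitIfRequired()
{
    if (_endpoint != null) return;

    // ApiKeys is a hypothetical helper holding the prediction key (and project id)
    // copied from the customvision.ai portal.
    _endpoint = new PredictionEndpoint { ApiKey = ApiKeys.CustomVisionPredictionKey };
}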

To determine whether or not a duck face is present, the only code we need is shown here.

public async Task<bool> IsDuckFace(MediaFile photo)
{
    InitIfRequired();

    if (photo == null) return false;

    using (var stream = photo.GetStreamWithImageRotatedForExternalStorage())
    {
        // Send the image to the Custom Vision prediction endpoint
        var predictionModels = await _endpoint.PredictImageAsync(ApiKeys.ProjectId, stream);

        // Only block the post when the 'Duck Face' tag is predicted with enough confidence
        return predictionModels.Predictions
                   .FirstOrDefault(p => p.Tag == "Duck Face")
                   ?.Probability > ProbabilityThreshold;
    }
}

In this method, we send a picture to the Custom Vision API and retrieve a result with predictions. Each prediction pairs one of our tags with a probability. If the probability for the ‘Duck Face’ tag exceeds our threshold, the model has determined that this picture most likely contains a duck face, and we want to block this content.

[Screenshot: the app showing an error because a duck face was detected in the uploaded image]

The screenshot above shows the image I sent to the Custom Vision API and the error that is shown because of the duck face detected in it. When another picture, this time without a duck face, is sent, it is allowed without any problems.

A fully functional app can be found at https://github.com/jfversluis/Amazing-Social-Media. You can also see this in action in my recorded session here: https://youtu.be/tFF8T_AqnBM?t=2m31s.

Conclusion

The entire Cognitive Services toolkit is a very powerful suite on Azure, and there are different levels at which you can hook into it. When using the Cognitive Services APIs, you can easily leverage the power of machine learning through a few simple REST calls. But when you want more fine-grained control over your models, or want to generate offline models, you have the possibility to do so, for example by using Custom Vision.

In this article, we have seen a somewhat humorous usage of this powerful technology, but a more useful and world-changing example can be found here: https://github.com/Azure/ai-toolkit-iot-edge/tree/master/Skin%20cancer%20detection. With this project, you can send in images of moles to detect possible skin cancer. Think about it: when a couple of great technologies are combined, you can easily detect medical conditions, possible dangers and other things with the use of a mixed-reality device and Azure Cognitive Services.

The sky is the limit!

This article was technically reviewed by Suprotim Agarwal.

Author
Gerald Versluis (@jfversluis) is a full-stack software developer and Microsoft MVP (Xamarin) from Holland. After years of experience working with Xamarin and .NET technologies, he has been involved in a number of different projects and has built several apps. Not only does he like to code, but he is also passionate about spreading his knowledge - as well as gaining some in the bargain. Gerald involves himself in speaking, providing training sessions and writing blogs (https://blog.verslu.is) or articles in his free time. Twitter: @jfversluis Email: gerald[at]verslu[dott]is . Website: https://gerald.verslu.is

