Creating intelligent apps with Computer Vision

After some experimenting with SkiaSharp, I came up with the idea of building a game like Draw It. Normally you draw something and someone else has to guess what you’ve drawn. This time I wanted to take a slightly different approach: I wanted to draw something with SkiaSharp and put Computer Vision to the test!

Computer Vision is a (small) part of the Cognitive Services that Microsoft offers. Cognitive Services allow developers without machine learning knowledge to create intelligent applications. At the moment of writing, there are five categories of Cognitive Services:

  • Vision: Image and video processing.
  • Knowledge: Mapping complex data to use for recommendations or semantic search.
  • Language: Process natural language to recognise sentiment and determine what users want.
  • Speech: Speaker recognition and real-time translation.
  • Search: More intelligent search capabilities.

The categories have a lot more to offer than described above. If you want to know more, please visit the Cognitive Services website. There is a 30-day trial available if you want to try it!

Computer Vision 

The Computer Vision API is all about analysing images and videos. After you send an image or video, the API returns a broad set of metadata about the visual aspects of the file. Besides telling you what’s in the picture, it can also tell who’s in the picture. Because it’s not trained to recognise everybody on this planet, this only works for celebrities. Even if you’re not recognised by Computer Vision, it can still tell a lot about your picture: it will add tags and captions, and even guess your age! It’s really fun to experiment with!
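
To give an idea of what that metadata looks like, here’s an illustrative (made-up) example of the kind of JSON the v1.0 analyze endpoint returns; the exact fields depend on the visual features you request:

    {
      "tags": [ { "name": "outdoor", "confidence": 0.92 } ],
      "description": {
        "captions": [ { "text": "a house with a tree", "confidence": 0.59 } ]
      },
      "faces": [ { "age": 28, "gender": "Male" } ]
    }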

Computer Vision also supports Optical Character Recognition (OCR), which can be used to read text from images. OCR is very useful when you’re building apps that need to interpret paper documents. Recognition of handwritten text is still in preview.
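
To sketch what that looks like in code: the Microsoft.ProjectOxford.Vision NuGet client (introduced in the next section) exposes an OCR call alongside the image analysis call. The snippet below is a minimal sketch from memory (imageStream is a placeholder Stream containing the image), so verify the exact member names against the package:

    // Minimal OCR sketch; "unk" asks the service to detect the language itself.
    // Requires System.Linq for Select().
    var ocrResults = await visionClient.RecognizeTextAsync(imageStream, "unk");
    foreach (var region in ocrResults.Regions)
        foreach (var line in region.Lines)
            Console.WriteLine(string.Join(" ", line.Words.Select(w => w.Text)));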

If you want to test whether Computer Vision suits your needs, you can give it a try on the website without writing a single line of code. This demo only works for basic scenarios (uploading an image); if you want to use the more advanced features (like real-time video), I’d suggest you try the API directly.

Getting started

Using Cognitive Services is pretty straightforward because all of the magic is available through REST APIs. In this post I will focus on the Computer Vision API, but the other APIs work in a similar way. You can start using the APIs with the following steps:

  1. To start using Cognitive Services, create a trial account on the website. Click the “Try Cognitive Services for free” button and then click “Get API Key” for the Computer Vision API. After signing in with your account, you will get an API key that allows you to make 5000 transactions, limited to 20 per minute. Note: the trial is limited to the westcentralus endpoint (https://westcentralus.api.cognitive.microsoft.com/vision/v1.0).
    If you want to use Cognitive Services in your production environment, you can obtain an API key from the Azure Portal. To do so, create a Cognitive Services resource of the type you want to use (for example: Computer Vision API). When the resource is created, you can get your keys from Quick Start -> Keys.
  2. Because Computer Vision is exposed as a REST API, you can call it from your app using HttpClient (a minimal sketch of such a call is shown after these steps). Instead of crafting the requests yourself, you can also use the Microsoft.ProjectOxford.Vision NuGet package to make your calls to the Computer Vision API.
    Note: in most cases you don’t want to make the calls directly from your app, but call the Cognitive Services from your server. This prevents users from stealing your API key (and burning through your Azure credits) when they decompile your app or intercept your calls.
  3. After adding the Microsoft.ProjectOxford.Vision package to your backend or frontend, you will be able to use Computer Vision with a few lines of code:
    // If you're using a trial, make sure to pass the westcentralus API base URL
    var visionClient = new VisionServiceClient(API_Key, "https://westcentralus.api.cognitive.microsoft.com/vision/v1.0");
    VisualFeature[] features = { VisualFeature.Tags, VisualFeature.Categories, VisualFeature.Description, VisualFeature.ImageType };
    var result = await visionClient.AnalyzeImageAsync(stream, features.ToList(), null);

    First we create a VisionServiceClient object with an API key and a base URL. Trial keys only work with the WestCentralUS instance, which is why I’m also passing the base URL. Once you’ve created a client, you can call AnalyzeImageAsync. This function takes a Stream (which contains the image) and a list of VisualFeatures. The VisualFeatures specify what you want Computer Vision to analyse. If you prefer to use HttpClient instead of the NuGet package, I’d recommend the video where René Ruppert explains how to use HttpClient to consume Cognitive Services.
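
For reference, here is a minimal sketch of what such a raw HttpClient call could look like. The endpoint and headers follow the v1.0 REST API; the variable names (imageBytes, API_Key) are placeholders:

    // Minimal sketch of calling the v1.0 analyze endpoint directly.
    // Requires System.Net.Http and System.Net.Http.Headers.
    using (var client = new HttpClient())
    using (var content = new ByteArrayContent(imageBytes))
    {
        client.DefaultRequestHeaders.Add("Ocp-Apim-Subscription-Key", API_Key);
        content.Headers.ContentType = new MediaTypeHeaderValue("application/octet-stream");

        var response = await client.PostAsync(
            "https://westcentralus.api.cognitive.microsoft.com/vision/v1.0/analyze?visualFeatures=Tags,Description",
            content);
        var json = await response.Content.ReadAsStringAsync();
    }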

Results

Unfortunately my drawing skills are not good enough for Computer Vision to recognise (with great certainty) what I was drawing. Although it was not sure what I was drawing, it suggested “Outdoor house” with a confidence of 0.1875. Another tag Computer Vision suggested was “Clock”, and I think that’s because the sun looks like a clock.
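
Those suggestions come from the AnalysisResult that AnalyzeImageAsync returns. Here is a small sketch of how you could read them out; the property names are from the Microsoft.ProjectOxford.Vision models as I recall them, so double-check against the package:

    // Print the best caption and every tag with its confidence score.
    // Requires System.Linq for FirstOrDefault().
    var caption = result.Description?.Captions?.FirstOrDefault();
    if (caption != null)
        Console.WriteLine($"{caption.Text} ({caption.Confidence:0.0000})");

    foreach (var tag in result.Tags)
        Console.WriteLine($"{tag.Name} ({tag.Confidence:0.0000})");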

If you want to give your drawing skills a shot, the code for my experiment is on GitHub. For the drawing part of the application, I’ve used some code snippets from the Xamarin.Forms samples. Besides this sample app, Xamarin has a lot of great sample apps on their GitHub.

If you want to learn more about the Computer Vision API or Cognitive Services, make sure to check out the related links below. If you have any questions or suggestions, please don’t hesitate to contact me on Twitter.

Related links: