Building a Speech To Text Artificial Intelligence app in C# and Azure

Posted by: Atanu Gupta, on 6/14/2018, in Category Microsoft Azure
Abstract: Create your first Speech-to-Text artificial intelligence console application in C# using the Microsoft Bing Speech Cognitive API

Speech recognition is a standard feature of modern apps. Users expect to be able to speak, be understood, and be spoken to.

The Microsoft Cognitive Services Speech API lets you easily add real-time speech recognition to your app. The app can then recognize audio from multiple sources and convert it into text it can work with.

In this tutorial, I will walk you through the steps for creating your first Speech-to-Text artificial intelligence app: a simple C# console application that uses the Microsoft Bing Speech Cognitive API.

Prerequisites:

  • Microsoft Visual Studio 2017 (you can try VS 2015, but this demo uses VS 2017)
  • Azure subscription (a free 30-day subscription will also do)
  • Basic C# programming knowledge

Setting up the Speech API in Azure

Step 1: Log in to Azure. If you do not already have a subscription, create one; otherwise, log in to your existing account.

Step 2: Click the “Create a resource” option.

Step 3: Search for “Bing Speech” in the search box and select Bing Speech from the results.

Step 4: Click on the “Create” button.

Step 5: Fill in the required details and click Create. You can choose the Name, Resource Group and Location as you prefer.

Step 6: Wait a few seconds for Azure to create the service. Once it is created, Azure takes you to its landing page (Quick start).

Step 7: Now select the “Keys” property and copy Key 1. You can copy Key 2 instead if you wish, because either key works. Paste it into Notepad; you will need it later.

We are done setting up the Speech API in Azure.

Visual Studio Application – Console App

Open Visual Studio 2017 and select File >> New >> Project. Select “Console App (.NET Framework)” as your project type.

Please note that Visual C# is my default language selection.

Name the project as you wish (I named it SpeechToText_AI).

To use the Speech API, we need to add a reference to the Microsoft.ProjectOxford.SpeechRecognition NuGet library. Right-click the project >> Manage NuGet Packages, then browse for the library. The library comes in x86 and x64 variants; install the one that matches the platform your app will run on.
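
The package's platform must match the bitness of the running process. If they differ, the native components may fail to load at run time, so a small startup check can catch the problem early. The sketch below is hypothetical: the PlatformCheck class and the installedX64Package flag are my own names, not part of the SDK. The only framework call it relies on is Environment.Is64BitProcess.

```csharp
using System;

// Hypothetical startup check (not part of the Speech SDK).
static class PlatformCheck
{
    // Returns null when the process bitness matches the package you installed,
    // otherwise a warning message. installedX64Package is a flag you supply yourself.
    public static string MismatchWarning(bool installedX64Package, bool is64BitProcess)
    {
        if (installedX64Package == is64BitProcess)
            return null;

        string expected = installedX64Package ? "x64" : "x86";
        string actual = is64BitProcess ? "64-bit" : "32-bit";
        return $"Installed the {expected} speech package but the process is {actual}; " +
               $"set the project's Platform target to {expected}.";
    }

    // Convenience overload that checks the current process.
    public static string MismatchWarning(bool installedX64Package) =>
        MismatchWarning(installedX64Package, Environment.Is64BitProcess);
}
```

You might call it at the top of Main, for example `string warning = PlatformCheck.MismatchWarning(installedX64Package: true); if (warning != null) Console.WriteLine(warning);`. The Platform target setting lives under the project's Properties >> Build.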

Add the following using directive at the top of Program.cs:

using Microsoft.CognitiveServices.SpeechRecognition;

Now declare a static field in the Program class:

static MicrophoneRecognitionClient _microRecogClient;

MicrophoneRecognitionClient is a public class in the Microsoft.CognitiveServices.SpeechRecognition namespace. It exposes two important methods, StartMicAndRecognition() and EndMicAndRecognition(), which we will use later in our code.
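
In this tutorial's usage, StartMicAndRecognition() appears to start listening and then return while recognition continues in the background. I am assuming this rather than quoting the SDK documentation. Under that assumption, a console app needs a way to stay alive and then call EndMicAndRecognition() cleanly. The helper below is a hypothetical sketch, not part of the SDK. It takes the start and stop calls as delegates, so you can try it without a microphone.

```csharp
using System;
using System.IO;

// Hypothetical helper (not part of the Speech SDK): start, wait for Enter, stop.
static class RecognitionSession
{
    public static void RunUntilEnter(Action start, Action stop, TextReader input)
    {
        start();
        try
        {
            // Blocks until a line (Enter) is read; returns null at end of input.
            input.ReadLine();
        }
        finally
        {
            // Runs even if reading throws, so the microphone is always released.
            stop();
        }
    }
}
```

With the real client, usage might look like `RecognitionSession.RunUntilEnter(() => _microRecogClient.StartMicAndRecognition(), () => _microRecogClient.EndMicAndRecognition(), Console.In);`.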

Now initialize the MicrophoneRecognitionClient instance and attach handlers to its OnResponseReceived, OnPartialResponseReceived and OnConversationError events. Finally, start listening with the StartMicAndRecognition() method mentioned above. Put all of these operations in a separate function (for example, ConvertSpeechToText):

public static void ConvertSpeechToText (SpeechRecognitionMode mode, string language, string subscriptionKey)
{
    _microRecogClient = SpeechRecognitionServiceFactory.CreateMicrophoneClient(mode, language, subscriptionKey);
    _microRecogClient.OnResponseReceived += OnResponseReceivedHandler;
    _microRecogClient.OnPartialResponseReceived += OnPartialResponseReceivedHandler;
    _microRecogClient.OnConversationError += OnConversationError;
    _microRecogClient.StartMicAndRecognition();
}

Note that the ConvertSpeechToText() function takes three parameters:

  • mode: a SpeechRecognitionMode enum value that selects short phrases (ShortPhrase) or long dictation (LongDictation).
  • language: the language of the speech (for example, "en-US").
  • subscriptionKey: your Bing Speech API key (the one we kept aside in Notepad).

All three are required to create the microphone client.
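
Pasting the key into Program.cs works for a demo, but the key then ends up in source control. One common alternative is to read it from an environment variable. The sketch below assumes a variable named BING_SPEECH_KEY; that name and the SpeechKey class are my own choices, not anything the service requires.

```csharp
using System;

// Hypothetical helper for loading the subscription key outside of source code.
static class SpeechKey
{
    // Hypothetical variable name; use any name you like and set it before running.
    public const string VariableName = "BING_SPEECH_KEY";

    // Returns the trimmed key, or throws a descriptive error if the variable is missing.
    public static string Load(string variableName = VariableName)
    {
        string key = Environment.GetEnvironmentVariable(variableName);
        if (string.IsNullOrWhiteSpace(key))
            throw new InvalidOperationException(
                $"Set the {variableName} environment variable to your Bing Speech API key.");
        return key.Trim();
    }
}
```

You could then call `ConvertSpeechToText(SpeechRecognitionMode.LongDictation, "en-US", SpeechKey.Load());`. If you set the variable with setx in a Windows command prompt, restart Visual Studio so the new process sees it.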

All the events are now wired up. The handler worth highlighting is OnResponseReceivedHandler():

static void OnResponseReceivedHandler(object sender, SpeechResponseEventArgs e)
{
    for (int i = 0; i < e.PhraseResponse.Results.Length; i++)
    {
        Console.Write("{0} ", e.PhraseResponse.Results[i].DisplayText);
    }
    //_microRecogClient.EndMicAndRecognition();
    Console.WriteLine();
}

The interesting part is what happens inside the for loop. Keep in mind that we do not speak a whole sentence at once; we speak words that together form a sentence. While you are speaking, the OnPartialResponseReceived event reports provisional guesses. When you pause, the service treats what you said as a complete phrase and fires OnResponseReceived. The loop then prints the DisplayText of each result in e.PhraseResponse.Results. I have purposely commented out the EndMicAndRecognition() call so that the client keeps listening and you can keep experimenting with your speech AI.
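
The partial-versus-final flow described above can be modeled in plain C#. The TranscriptBuffer class below is a hypothetical sketch, not part of the SDK. OnPartial and OnFinal stand in for what your two handlers would do with e.PartialResult and with the DisplayText of a final result. The idea is that a newer guess replaces the previous one, while final phrases are kept. A live display can therefore overwrite the current guess instead of printing every partial result on its own line.

```csharp
using System.Text;

// Hypothetical model of partial vs. final recognition results.
class TranscriptBuffer
{
    private readonly StringBuilder _committed = new StringBuilder();
    private string _pending = "";

    // A newer partial hypothesis replaces the previous guess for the current phrase.
    public void OnPartial(string text) => _pending = text ?? "";

    // A final phrase is appended to the transcript and the pending guess is cleared.
    public void OnFinal(string text)
    {
        if (_committed.Length > 0) _committed.Append(' ');
        _committed.Append(text);
        _pending = "";
    }

    // Only the text the service has confirmed.
    public string Committed => _committed.ToString();

    // What a live display would show: confirmed text followed by the current guess.
    public string Display
    {
        get
        {
            if (_pending.Length == 0) return Committed;
            if (_committed.Length == 0) return _pending;
            return Committed + " " + _pending;
        }
    }
}
```

Inside the real handlers you might call `_transcript.OnPartial(e.PartialResult);` and, guarding against an empty result list, `if (e.PhraseResponse.Results.Length > 0) _transcript.OnFinal(e.PhraseResponse.Results[0].DisplayText);`. Taking the first result is my assumption, not something the SDK prescribes.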

Congratulations on making it this far. Your intelligent speech recognition bot is now ready.

Here is the full source code for Program.cs

using System;
using System.Media;
using Microsoft.CognitiveServices.SpeechRecognition;
using System.IO;
using System.Threading;

namespace SpeechToText_AI
{
    class Program
    {
        static MicrophoneRecognitionClient _microRecogClient;

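        static void Main(string[] args)
        {
            Console.WriteLine("Press enter to start your speech");
            Console.ReadLine();
            ConvertSpeechToText(SpeechRecognitionMode.LongDictation, "en-US", "--Paste your Key from the notepad--");

            // Recognition continues in the background, so wait here until the user is done,
            // then stop the microphone explicitly.
            Console.WriteLine("Listening... press enter to stop");
            Console.ReadLine();
            _microRecogClient.EndMicAndRecognition();
        }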

        public static void ConvertSpeechToText(SpeechRecognitionMode mode, string language, string subscriptionKey)
        {
            _microRecogClient = SpeechRecognitionServiceFactory.CreateMicrophoneClient(mode, language, subscriptionKey);
            _microRecogClient.OnResponseReceived += OnResponseReceivedHandler;
            _microRecogClient.OnPartialResponseReceived += OnPartialResponseReceivedHandler;
            _microRecogClient.OnConversationError += OnConversationError;
            _microRecogClient.StartMicAndRecognition();
        }

        static void OnConversationError(object sender, SpeechErrorEventArgs e)
        {
            Console.WriteLine("Error Code: {0}", e.SpeechErrorCode.ToString());
            Console.WriteLine("Error Text: {0}", e.SpeechErrorText);
            Console.WriteLine();
        }

        static void OnPartialResponseReceivedHandler(object sender, PartialSpeechResponseEventArgs e)
        {
            Console.WriteLine("{0} ", e.PartialResult);
            Console.WriteLine();
        }

        static void OnResponseReceivedHandler(object sender, SpeechResponseEventArgs e)
        {
            for (int i = 0; i < e.PhraseResponse.Results.Length; i++)
            {
                Console.Write("{0} ", e.PhraseResponse.Results[i].DisplayText);
            }
            Console.WriteLine();
        }
    }
}

We are done with the coding part as well. Now press F5 and run the program.

The command prompt will show you the message: Press enter to start your speech

Press Enter on your keyboard and start talking slowly into your computer's microphone. Pronounce each word clearly and watch your small artificial intelligence program write back what you are saying in plain text.

Voilà! Your Speech-to-Text intelligence is up and running. Well done!

Do share your experience with me, along with what you have built on this foundation. You can take it to any level and integrate it into your own applications.

I look forward to hearing from you.

Author
Atanu is a Microsoft Certified cloud architect with 14 years of industry experience and hands-on exposure to cloud application architecture, design and development. He has a keen interest in cloud design patterns, industry best practices, distributed architecture, DevOps, artificial intelligence and cognitive services, and shares his knowledge and advice in various online communities, forums and blogs.

