
Building Text To Speech Applications using Windows Phone 8.1 and Cortana Overview

Posted by: Vikram Pendse, on 5/3/2014, in Category Windows Phone
Abstract: This article shows how to build Text to Speech Applications in Windows Phone 8.1. It also gives an overview of Cortana, the personal digital assistant.

Microsoft is working to improve both the user experience and the developer experience on its Phone and Desktop platforms. At the recent Build 2014 conference, Microsoft announced Windows Phone 8.1, a new version of the Windows Phone OS. With this announcement, it also made the Windows Phone 8.1 APIs available to developers for building next generation Windows Phone 8.1 apps. In this article, we will implement Text To Speech in apps targeting the new Windows Phone 8.1 platform and look at what has changed compared to Windows Phone 8.0.

 

 

Windows Phone 8.1 Development Tools

Unlike Windows Phone 8.0, there is no standalone SDK for Windows Phone 8.1, because Microsoft now ships it as part of a Visual Studio update. Visual Studio 2013 Update 2 RC has the built-in tooling, APIs and Project Templates needed to build Windows Phone 8.1 apps. For more information and downloads, see: https://dev.windowsphone.com/en-us/downloadsdk

Blank App (Windows Phone) Vs Blank App (Windows Phone Silverlight)

After installing the tools for Windows Phone 8.1 via the Visual Studio update, go to File > New Project and you will find two different Project Templates for Windows Phone:

1. Blank App (Windows Phone) – Creates a blank Windows Phone application that does not use Silverlight controls or APIs. It targets Windows Phone 8.1 only, so it is not the recommended template if you have existing Windows Phone 8 apps and wish to add 8.1 features to them. Choose this template if your app targets only Windows Phone 8.1, or if you wish to add a Windows 8.1 project to it later in development. If you look at the project references, you can see that it uses the .NET for Windows Store apps assembly.

[Image: project references showing .NET for Windows Store apps]

2. Blank App (Windows Phone Silverlight) - Creates a blank Windows Phone application and is mostly used to upgrade an existing Windows Phone 8.0 app to Windows Phone 8.1 so that it can use the new 8.1 features. If you want to target Windows Phone 8.1 only, I suggest using the first template rather than the Silverlight one. If you look at the project references, you will see that it uses the .NET for Windows Phone assembly.

[Image: project references showing .NET for Windows Phone]

Text To Speech / Speech To Text and “Cortana” in Windows Phone 8.1

In the Build 2014 Keynote on Day 1, Microsoft announced a new Personal Digital Assistant called Cortana (comparable to Apple’s Siri or Samsung’s S-Voice), which primarily takes care of search, search by voice and a lot of other things. You can invoke “Cortana” by tapping the “Search” button on the lower right corner of a device running Windows Phone 8.1. Earlier, in Windows Phone 8.0, Voice Commands were handled by a GSE (Global Speech Experience) popup, which you triggered by tapping and holding the “Windows/Home” button. In 8.1 you will no longer see that popup, since “Cortana” has replaced it with a much deeper integration for Voice Commands. We will talk about “Cortana” at the end of this article.

[Image: Cortana home screen]

Now, coming back to the APIs. In Windows Phone 8.1, the primary namespace for Text To Speech is Windows.Media.SpeechSynthesis (Speech To Text lives in Windows.Media.SpeechRecognition). In Windows Phone 8.0 it was Windows.Phone.Speech.Synthesis. In Windows Phone 8.1, Text To Speech works by passing a SpeechSynthesisStream to a MediaElement: the synthesizer asynchronously generates the voice output for a piece of text as a SpeechSynthesisStream, and the MediaElement plays it. So the MediaElement plays a vital role in reading text aloud. We will now build a simple Text To Speech app to see how this works on the new platform. Along the way, we will also discuss the code changes needed to migrate an existing Windows Phone 8.0 Text To Speech app to Windows Phone 8.1.

Quickly Build a Text To Speech App for Windows Phone 8.1 using Blank App (Windows Phone)

First, choose Blank App (Windows Phone) from the File > New Project menu, as shown below:

[Image: New Project dialog with the Windows Phone templates]

The Speech To Text scenario needs access to the microphone, so make sure you enable the “Microphone” capability in Package.appxmanifest.
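
As a rough sketch of what this setting looks like in the manifest XML (assuming the default manifest generated by the Blank App (Windows Phone) template, with all other elements omitted), ticking “Microphone” on the Capabilities tab of the manifest designer should add an entry along these lines:

```xml
<Capabilities>
  <!-- Lets the app capture audio from the microphone for speech recognition -->
  <DeviceCapability Name="microphone" />
</Capabilities>
```

If the manifest already has a Capabilities element, the DeviceCapability line goes inside it rather than in a second Capabilities block.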

Here is the XAML for our application:

<Grid>
<TextBlock x:Name="txtInfo" HorizontalAlignment="Left" Height="142" Margin="10,32,0,0" TextWrapping="Wrap" Text="Dubai is one of the few cities in the world that has undergone such a rapid transformation - from a humble beginning as a pearl-diving center - to one of the fastest growing cities on earth." VerticalAlignment="Top" Width="380" FontSize="24"/>
<MediaElement Name="audioPlayer" AutoPlay="True"/>
<ListView x:Name="lstvwVoices" Margin="10,199,10,254" />
<StackPanel Margin="0,404,0,0">
<Button x:Name="btnShowVoices" Content="Show Voices" Width="390" Click="btnShowVoices_Click"/>
<Button x:Name="btnTTS" Click="btnTTS_Click" Width="390" Content="Text To Speech"/>
<Button x:Name="btnSSML" Content="Speak SSML" Click="btnSSML_Click" Width="390"/>
<Button x:Name="btnSTT" Content="Speech To Text" Click="btnSTT_Click" Width="390"/>
</StackPanel>
</Grid>

This is how the screen will look:

[Image: main screen of the app]

Show Installed Voices on your Phone

You can loop over all the voices that are installed on your device by default. Here we bind them to a ListView to display them:

// Requires: using System.Collections.Generic; using Windows.Media.SpeechSynthesis;
private void btnShowVoices_Click(object sender, RoutedEventArgs e)
{
    // Build a fresh list on every click so repeated taps do not add duplicates
    List<string> lstVoices = new List<string>();

    //Get all the Voices
    foreach (var voice in SpeechSynthesizer.AllVoices)
    {
        lstVoices.Add(voice.DisplayName);
    }
    lstvwVoices.ItemsSource = lstVoices;
}

The voice names will be displayed on the device as shown below. Note that besides DisplayName, you can also show other information such as Language and Gender.

[Image: list of installed voices]
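
As an illustration of showing more than the name, here is a minimal sketch of a helper that builds one ListView line per voice. The VoiceLabel class and its Format method are hypothetical names introduced for this example. The three values are passed as plain strings so that the helper can also be tried outside a phone project.

```csharp
using System;

public static class VoiceLabel
{
    // Builds a single display line from a voice's name, language tag and gender.
    // On the phone these values would come from the voice object's DisplayName,
    // Language and Gender properties (the latter converted with ToString()).
    public static string Format(string displayName, string language, string gender)
    {
        return string.Format("{0} ({1}, {2})", displayName, language, gender);
    }
}
```

Inside the loop of the click handler above, the call might look like lstVoices.Add(VoiceLabel.Format(voice.DisplayName, voice.Language, voice.Gender.ToString()));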

Text To Speech

Now we will see an implementation of Text To Speech.

For Windows Phone 8.1

private void btnTTS_Click(object sender, RoutedEventArgs e)
{
    SpeakText(audioPlayer, txtInfo.Text);
}

private async void SpeakText(MediaElement audioPlayer, string TTS)
{
    SpeechSynthesizer ttssynthesizer = new SpeechSynthesizer();

    // Set the Voice/Speaker: use a female voice if one is installed,
    // otherwise keep the synthesizer's default voice.
    // (FirstOrDefault requires "using System.Linq;")
    var femaleVoice = SpeechSynthesizer.AllVoices.FirstOrDefault(x => x.Gender == VoiceGender.Female);
    if (femaleVoice != null)
    {
        ttssynthesizer.Voice = femaleVoice;
    }

    // Synthesize the text into an audio stream
    SpeechSynthesisStream ttsStream = await ttssynthesizer.SynthesizeTextToStreamAsync(TTS);

    // Hand the stream to the MediaElement, which plays it (AutoPlay="True" in the XAML)
    audioPlayer.SetSource(ttsStream, "");
}

Note the clear distinction between Silverlight and WinRT here. In non-Silverlight projects, you need a MediaElement to play the synthesized speech stream, whereas in Silverlight the synthesizer speaks the text directly and no explicit MediaElement is needed.

For Windows Phone 8.1 (Silverlight)

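// SpeechSynthesizer here is Windows.Phone.Speech.Synthesis.SpeechSynthesizer
SpeechSynthesizer synthesizer = new SpeechSynthesizer();
// An overload also accepts a second "object userState" argument
await synthesizer.SpeakTextAsync(txtInfo.Text);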

Text To Speech using SSML

If you are not familiar with SSML (Speech Synthesis Markup Language), please visit http://www.w3.org/TR/speech-synthesis/ for detailed documentation.

To test SSML, we will use non-English text (Japanese in this case), as shown below.

For Windows Phone 8.1

private async void btnSSML_Click(object sender, RoutedEventArgs e)
{
    //Speech Synthesis Markup Language
    txtInfo.Text =
@"趣味は日本語を勉強することです

趣味はいろんな新しい食べ物に挑戦することですパソコンいじりが得意なので、何か手伝えることがありましたら声をかけて下さい。";

    var ttsJP = new SpeechSynthesizer();
    SpeechSynthesisStream ttsStream =
        await ttsJP.SynthesizeSsmlToStreamAsync(@"<speak version=""1.0""
             xmlns=""http://www.w3.org/2001/10/synthesis"" xml:lang=""ja-JP"">
             <voice gender=""male"">
趣味は日本語を勉強することです

趣味はいろんな新しい食べ物に挑戦することですパソコンいじりが得意なので、何か手伝えることがありましたら声をかけて下さい。
             </voice>
             </speak>");
    audioPlayer.SetSource(ttsStream, "");
}

Here again, a MediaElement plays the stream produced from the SSML input. So the common conclusion is that in non-Silverlight projects we need a MediaElement to read out text, whether it is a plain string or SSML.
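
The SSML passed to SynthesizeSsmlToStreamAsync above is a hand-written string. If the text to speak comes from user input, characters such as & or < would break the markup. As a minimal sketch, assuming System.Xml.Linq is available to the project, the same speak/voice structure could be generated so that the text is escaped automatically. The SsmlBuilder class and its Build method are hypothetical names introduced for this example and are not part of any speech API.

```csharp
using System.Xml.Linq;

public static class SsmlBuilder
{
    // SSML namespace, the same one used in the hand-written markup above
    private static readonly XNamespace Ssml = "http://www.w3.org/2001/10/synthesis";

    // Wraps plain text in <speak><voice>...</voice></speak> for the given
    // language tag (for example "ja-JP") and voice gender ("male" or "female").
    public static string Build(string text, string language, string gender)
    {
        var speak = new XElement(Ssml + "speak",
            new XAttribute("version", "1.0"),
            new XAttribute(XNamespace.Xml + "lang", language),
            new XElement(Ssml + "voice",
                new XAttribute("gender", gender),
                text)); // XElement escapes &, < and > in the text content

        return speak.ToString(SaveOptions.DisableFormatting);
    }
}
```

The result could then be passed to the synthesizer in place of the literal, for example await ttsJP.SynthesizeSsmlToStreamAsync(SsmlBuilder.Build(txtInfo.Text, "ja-JP", "male"));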

For Windows Phone 8.1 (Silverlight)

// SpeechSynthesizer here is Windows.Phone.Speech.Synthesis.SpeechSynthesizer.
// It speaks the SSML directly, so there is no stream to assign.
var ttsJP = new SpeechSynthesizer();
await ttsJP.SpeakSsmlAsync(@"<speak version=""1.0""
             xmlns=""http://www.w3.org/2001/10/synthesis"" xml:lang=""ja-JP"">
             <voice gender=""male"">
趣味は日本語を勉強することです

趣味はいろんな新しい食べ物に挑戦することですパソコンいじりが得意なので、何か手伝えることがありましたら声をかけて下さい。
             </voice>
</speak>");

Speech To Text

Here comes the trickiest part: recognizing your voice and translating it to text. We are already aware of the richness of Voice Commands in the Windows Phone 8.0 world. But since we are looking at the primary API changes and new additions here, we will not cover Voice Commands. We will discuss them in depth in an upcoming article.

Earlier, in Windows Phone 8.0, pressing and holding the “Windows/Home” button brought up the GSE (Global Speech Experience). We used it to give certain Voice Commands, and GSE would interpret them and act on them, for example by calling a contact on the phone or running an app.

With the introduction of Cortana, things have changed, and GSE is now available out of the box in Cortana. So pressing the “Windows/Home” button in Windows Phone 8.1 will show you the following:

[Image: speech prompt shown on Windows Phone 8.1]

Let us see how quickly we can invoke a dialog where we can speak and have the phone recognize what we said. As mentioned earlier, we are not performing any action on the result, since Voice Commands are kept for a different article.

For Windows Phone 8.1

// Requires: using Windows.Media.SpeechRecognition;
// Declared as a field of the page, before the MainPage() constructor
SpeechRecognizer speechRecog = new SpeechRecognizer();

private async void btnSTT_Click(object sender, RoutedEventArgs e)
{
   // Compile the dictation grammar
   await speechRecog.CompileConstraintsAsync();

   // Start Recognition
   SpeechRecognitionResult speechRecognitionResult = await this.speechRecog.RecognizeWithUIAsync();

   // Show Output
   var sttDialog = new Windows.UI.Popups.MessageDialog(speechRecognitionResult.Text, "Heard you say...");
   await sttDialog.ShowAsync();
}

For Windows Phone 8.1 (Silverlight)

SpeechRecognizerUI mySpeechRecognizer = new SpeechRecognizerUI();           
SpeechRecognitionUIResult SpeechResult = await mySpeechRecognizer.RecognizeWithUIAsync();
if (SpeechResult.ResultStatus == SpeechRecognitionUIStatus.Succeeded)
{
    MessageBox.Show(SpeechResult.RecognitionResult.Text);
}

[Image: speech recognition input prompt]

[Image: recognized text shown in a dialog]

There you go! We saw how to build a simple Text To Speech application in Windows Phone 8.1 and how to use other features such as SSML and Speech To Text. Now let us look at “Cortana” in some detail.

Cortana – Your own Personal Digital Assistant & Making It Available In your Region (Hack)

As I briefly mentioned earlier, during Build 2014 Microsoft announced the availability of Cortana (Personal Digital Assistant) on devices running the Windows Phone 8.1 OS. “Cortana” is in a Beta stage and is currently available only on devices set to the USA region. Users in other regions might not find Cortana by tapping the Search button; they will be redirected to Bing Search instead. To read more about Cortana, visit the Windows Phone Blog article, which gives a macro-level overview.

So the next question is: “Is it only on devices? What about non-US regions?” You can change your device's region to US to make Cortana available, but this is entirely at your own risk. It might affect the functioning of your device, so we do not recommend it, although it has worked for some of us.

If you are a Windows Phone developer, Cortana is always available to you on the emulator, irrespective of your country. Once you start the emulator, tap its “Search” button and sign in with your Live ID. Once this is configured, your Live ID contacts and data are automatically synced to the emulator. You can also save this state to a Checkpoint to avoid signing in every time (Checkpoint is another new feature, which preserves the emulator state along with generic data such as Contacts and Settings).

1. Launch Cortana from “Search” button

[Image: launching Cortana on the emulator]

2. Sign-in with your Live ID

[Image: configuring Cortana on the emulator]

Here, clicking “no thanks” takes you out of Cortana, and you can use the normal “Bing Search” as you used to in Windows Phone 8.0. If you choose “allow”, it prompts you to sign in with your Live ID and then syncs your data. You can opt out of that step if you do not want to sync your Live ID.

[Image: Microsoft account sign-in for Cortana]

[Image: Live ID sync]

[Image: fallback to Bing Search]

3. After successfully signing in with your Live ID, you can start using Cortana. You can personalize it further from its Settings.

[Image: Cortana home screen]

Launchers and Choosers tasks do not work in the Windows Phone 8.1 non-Silverlight template. So if you want a glimpse of how Cortana behaves when you pass a search string from code, here is a small example for the Silverlight template that launches a SearchTask and invokes Cortana (provided your emulator is already configured with Cortana as described in the steps above). Otherwise it will only show “Bing Search”.

SearchTask tsk = new SearchTask();
tsk.SearchQuery = "Flight to Seattle";
tsk.Show();

The results will look as shown below:

[Image: Cortana invoked with the search query]

[Image: Cortana search results]

Cortana on the emulator is equally powerful. You can even try Music Search: play any music track near your PC/laptop microphone and Cortana on the emulator recognizes it accurately. Here are some results for the Music Search:

[Image: Cortana music search]

[Image: Cortana music search result]

Summary

Windows Phone 8.1 brings a lot of opportunities for developers to build next generation apps for a rapidly growing number of Windows Phone consumers across the world. Text To Speech and Speech To Text come in handy for consumers everywhere. Such features not only save time, but also provide a personalized experience in local languages. In this article, we saw how you can build such apps in a short span of time. We also looked at the equivalent code for the Silverlight based template, which carries over from Windows Phone 8.0. Finally, we had a decent overview of Cortana, the new personal digital assistant. So get the tools and start building voice based apps today.
Author
Vikram Pendse is currently working as a Technology Manager for Microsoft Technologies and Cloud at e-Zest Solutions Ltd. in Pune, India. He is responsible for building the strategy for moving Amazon AWS workloads to Azure, providing estimates and architecture, and supporting RFPs and deals. He has been a Microsoft MVP since 2008 and is currently a Microsoft Azure and Windows Platform Development MVP. He also provides quick start trainings on Azure to startups and colleges during weekends. He is a very active member of various Microsoft communities and participates as a speaker in many events. You can follow him on Twitter @VikramPendse






Feedback - Leave us some adulation, criticism and everything in between!
Comment posted by Josiah Gilbert on Wednesday, May 7, 2014 11:05 AM
I enjoyed very much reading this blog. How I can change my region to US? I use L 520.
Comment posted by Vikram Pendse on Wednesday, May 7, 2014 11:58 PM
Hi Josiah, Thanks for your comment. Hope you find article useful.
For your query about changing the region to US: if you aren't in the U.S., some Cortana features may be unavailable. On your phone, go to Settings > region and select United States. You need to restart your device after doing this. This might also change a couple of things such as the currency symbol. Some people have also faced issues connecting to the Store after doing this. So try it at your own risk; it is not recommended. I did this as an experiment, at my own risk, on a test device.
Comment posted by Joaquin Buscaglia on Monday, May 12, 2014 7:09 PM
Hi, I really learned a lot reading this article, but i had a problem with the part of speech to text in windows phone 8.1, I don't know Where and How I have to declare speechRecog, so then I can use this "await speechRecog.CompileConstraintsAsync();"
Comment posted by Joaquin Buscaglia on Monday, May 12, 2014 7:11 PM
sorry for the triple post, my browser didn't respond when I sent it.
Comment posted by Vikram Pendse on Monday, May 12, 2014 11:56 PM
Hi Joaquin! Thanks for your feedback, and good to know that the article is helping you learn the changes in Text to Speech for Windows Phone 8.1. For your query: I have declared SpeechRecognizer speechRecog = new SpeechRecognizer(); globally, before the constructor public MainPage(){}, so I am just using an instance of SpeechRecognizer. You can declare it the same way and then call the async method with the await keyword, as in await speechRecog.CompileConstraintsAsync(); shown in the code above. Hope this helps to resolve your query.
Comment posted by Joaquin Buscaglia on Wednesday, May 14, 2014 6:41 PM
Thank you so much, that solved my problem, and now the program is running perfectly.
Comment posted by Vikram Pendse on Wednesday, May 14, 2014 10:55 PM
Glad to know !
Comment posted by Roy on Monday, July 21, 2014 3:35 PM
Nice article, slightly related but when adding speech to a Windows Phone 8.1 xaml project how come the icon showed if I ask cortana what can I say is the starburst image and not an image of my app? No one seems to know the answer. Please help
Comment posted by Vikram Pendse on Tuesday, July 22, 2014 12:56 AM
Hi Roy, If you implement voice command with .VCD file then it will redirect to your own application and shows your app UI (again if you carefully observer Cortana during Speech To Text, it will show you list of Apps supports Voice Commands and it will then redirect to your own app depending on command), If you go to Cortana and check "see more" blue color link on Cortana UI, it will redirect you to her capabilities and integration with native apps and the apps supporting voice commands. If you implement voice command, ideally your app should be in that list as Cortana will be default UI which will understand your voice command and do the needful. Hope this helps.
Comment posted by Aleksandr on Thursday, July 24, 2014 5:11 AM
I had the same question as Roy had. Thank you for answering it and good article.
Comment posted by bharath on Wednesday, February 4, 2015 10:12 AM
learned a lot from this concept