Recognizing Speech with SRGS Grammars


In this section you look at creating an SRGS grammar file to augment the Voice Paint sample app from the previous section. By using an SRGS grammar, the user is able to add a shape to the canvas in a single step. You see how SRGS grammars give greater flexibility for composing phrases and for adding semantics to your grammars.

The sample for this section is located in the Speech/VoicePaint/Srgs directory of the WPUnleashed.Examples project in the downloadable sample code.

SRGS grammars have the following features:

• They allow combining of multiple phrase lists into a single grammar.

• They allow linking to other grammars.

• Weights can be assigned to words and phrases to increase or decrease the likelihood that a particular phrase will be detected.

• Optional words or phrases can be defined.

• Rules can be defined that help to filter out unanticipated speech.

• Semantic information can be embedded in a grammar to identify what was said without reparsing the recognized speech.

• Pronunciation can be specified either inline in a grammar or via a link to a lexicon.

When creating an SRGS grammar, the format must conform to that of the Speech Recognition Grammar Specification (SRGS) Version 1.0, located at http://bit.ly/SfqiXH.

SRGS grammars may either be defined in an XML document or provided as a string, and then loaded into a speech recognizer.

You now look at combining phrase lists, similar to the list grammars from the previous section, into a single SRGS grammar. See Listing 23.5.

The structure of the SRGS XML document has a root grammar element and a set of rule elements that allow you to combine sets of phrases within the grammar.

The grammar element contains the following four attributes:

• version: A required identifier that specifies the version of the SRGS specification.

• xml:lang: A required attribute that specifies the language for the content of the grammar.

• root: An optional attribute that specifies the name of the rule that is active when the grammar is loaded by the speech recognition engine. Rules that are not the root rule or that are not referenced by the root rule cannot be used for recognition. For this reason, the root rule often contains references to other rules that must be active when the grammar loads.

• xmlns: A required attribute that specifies the XML namespace.

Rule elements contain text or XML elements that define what users can say and the order in which speech fragments can be said. Every grammar must have at least one rule element.

The rule element may contain item elements and/or ruleref elements. Item elements specify words that a user might say, whereas ruleref elements reference other rules within the document or in another grammar.
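
As a rough sketch of a ruleref that points into another grammar, the fragment below references a Color rule in a separate file. The file name Colors.xml and the rule names are hypothetical and do not appear in the sample code; how a relative URI is resolved depends on the recognizer, so treat this as an illustration of the syntax rather than a tested configuration.

```xml
<?xml version="1.0" encoding="utf-8" ?>
<grammar version="1.0" xml:lang="en-US" root="PaintCommand"
 xmlns="http://www.w3.org/2001/06/grammar">

  <rule id="PaintCommand" scope="public">
    <item>paint it</item>
    <!-- Hypothetical external grammar file. The part after # names the rule
         to use, and that rule must be declared with scope="public". -->
    <ruleref uri="Colors.xml#Color" />
  </rule>

</grammar>
```

A reference within the same document omits the file name and keeps only the fragment, as in uri="#Color".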

Item elements may also contain ruleref elements and tag elements. Tag elements are used to embed semantic information in the grammar. We explore tag elements later in the chapter.

One-of elements define a set of alternative phrases that can possibly be matched to a phrase spoken by the user. Each alternative speech fragment must be enclosed within an item element.
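
The weighting feature listed earlier applies to the alternatives in a one-of element. The following is a minimal sketch, with weight values chosen purely for illustration: a weight above 1.0 biases the recognizer toward an alternative, a weight below 1.0 biases it away, and omitting the attribute is equivalent to a weight of 1.0.

```xml
<?xml version="1.0" encoding="utf-8" ?>
<grammar version="1.0" xml:lang="en-US" root="Shape"
 xmlns="http://www.w3.org/2001/06/grammar">

  <rule id="Shape" scope="public">
    <one-of>
      <!-- Favored alternative (illustrative value). -->
      <item weight="2">circle</item>
      <!-- Disfavored alternative (illustrative value). -->
      <item weight="0.5">square</item>
    </one-of>
  </rule>

</grammar>
```

Suitable weights are usually found by experiment, because their effect depends on the recognition engine.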

In Listing 23.5, the ruleref with the uri value of #Action indicates that when the speech recognizer encounters the element, it expects the user to say either “add” or “remove.”

LISTING 23.5. Simple SRGS Voice Paint Grammar


<?xml version="1.0" encoding="utf-8" ?>
<grammar version="1.0" xml:lang="en-US" root="CoreCommands"
 xmlns="http://www.w3.org/2001/06/grammar">

  <rule id="CoreCommands" scope="public">
    <ruleref uri="#Action" />
    <item>a</item>
    <ruleref uri="#Color" />
    <ruleref uri="#Shape" />
  </rule>

  <rule id="Action">
    <one-of>
      <item>add</item>
      <item>remove</item>
    </one-of>
  </rule>

  <rule id="Color">
    <one-of>
      <item>red</item>
      <item>green</item>
      <item>blue</item>
    </one-of>
  </rule>

  <rule id="Shape">
    <one-of>
      <item>circle</item>
      <item>square</item>
    </one-of>
  </rule>

</grammar>
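
Listing 23.5 requires every word of the phrase to be spoken. As a sketch of the optional-phrase and filtering features listed earlier, the variant below makes a leading "please" optional by using repeat="0-1", and uses the special GARBAGE rule to tolerate unanticipated trailing speech. The word "please" and the placement of the GARBAGE reference are assumptions made for illustration; they are not part of the sample app, and the exact behavior of GARBAGE is engine specific.

```xml
<?xml version="1.0" encoding="utf-8" ?>
<grammar version="1.0" xml:lang="en-US" root="CoreCommands"
 xmlns="http://www.w3.org/2001/06/grammar">

  <rule id="CoreCommands" scope="public">
    <!-- Optional lead-in word: may be spoken zero or one time. -->
    <item repeat="0-1">please</item>
    <ruleref uri="#Action" />
    <item>a</item>
    <ruleref uri="#Color" />
    <ruleref uri="#Shape" />
    <!-- Optionally absorb trailing speech that the grammar does not define. -->
    <item repeat="0-1">
      <ruleref special="GARBAGE" />
    </item>
  </rule>

  <rule id="Action">
    <one-of>
      <item>add</item>
      <item>remove</item>
    </one-of>
  </rule>

  <rule id="Color">
    <one-of>
      <item>red</item>
      <item>green</item>
      <item>blue</item>
    </one-of>
  </rule>

  <rule id="Shape">
    <one-of>
      <item>circle</item>
      <item>square</item>
    </one-of>
  </rule>

</grammar>
```

With this variant, both "add a red circle" and "please add a red circle" are valid phrases.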



Note

Although version 1.0 of the SRGS also defines an ABNF (Augmented Backus-Naur Form) syntax, that form is not supported by the Windows Phone SDK; grammars must use the XML form.


To use the grammar XML file, the sample viewmodel creates a new SpeechRecognizerUI object and adds the SRGS file to its grammar set by URI. See Listing 23.6. The path to the file is constructed using the Path property of the current package’s InstalledLocation object.

The ShowConfirmation property of the recognizer’s settings is switched off. This prevents the confirmation dialog from being displayed after a phrase is recognized, which speeds up the process of selecting a shape. Setting the ShowConfirmation property to false also means, however, that the user does not have an opportunity to cancel the operation.

If a grammar set contains large or numerous grammars, the time it takes the speech recognizer to load its grammar set may delay the start of recognition. To prevent a delay, the grammar set can be preloaded using the speech recognizer’s PreloadGrammarsAsync method before initiating a recognition operation.

LISTING 23.6. VoicePaintSrgsViewModel.GetSpeechRecognizerUI Method


async Task<SpeechRecognizerUI> GetSpeechRecognizerUI()
{
    if (recognizerUI == null)
    {
        recognizerUI = new SpeechRecognizerUI();

        string path = Path.Combine(Package.Current.InstalledLocation.Path,
                        @"Speech\VoicePaint\Srgs\VoicePaintGrammar.xml");
        Uri uri = new Uri(path, UriKind.Absolute);

        recognizerUI.Recognizer.Grammars.AddGrammarFromUri("CoreGrammar", uri);
        recognizerUI.Settings.ShowConfirmation = false;

        await recognizerUI.Recognizer.PreloadGrammarsAsync();
    }

    return recognizerUI;
}



Note

You may notice that in other examples, the ms-appx:// prefix is used to specify the root installation directory of the app, like so:

string path = "ms-appx:///Speech/VoicePaint/Srgs/VoicePaintGrammar.xml";

At the time of writing, however, the speech recognizer is unable to locate grammar files using the prefix, thus the Package class must be used instead.


Multiple SRGS grammars can be added to a grammar set using successive calls to AddGrammarFromUri. SRGS grammars may also be used alongside list grammars, in the same grammar set.


Note

The SRGS file must be a local file stored with your app. It cannot be a file located on the Internet.


Again the viewmodel’s Prompt method is tasked with commencing speech recognition; see Listing 23.7.

The TextConfidence property of the SpeechRecognitionResult is used to determine how confident the SpeechRecognizer is of the result. It can be one of the following values:

• High

• Medium

• Low

• Rejected

A Rejected value indicates that the spoken phrase was not matched to any phrase in any active grammar. Depending on the action being undertaken in response to the user’s speech input, you can decide whether to accept input that has low confidence in its accuracy.

If the recognition result fails to meet an adequate confidence level, the viewmodel’s Prompt method loops by calling itself.

LISTING 23.7. VoicePaintSrgsViewModel.Prompt Method


async void Prompt()
{
    SpeechRecognizerUI recognizer = await GetSpeechRecognizerUI();

    recognizer.Settings.ListenText = "Say a core phrase.";
    recognizer.Settings.ExampleText
        = " 'add a red circle', 'remove a green square', 'add a blue square' ";

    SpeechRecognitionUIResult uiResult = await recognizer.RecognizeWithUIAsync();

    if (uiResult.ResultStatus == SpeechRecognitionUIStatus.Succeeded)
    {
        SpeechRecognitionResult result = uiResult.RecognitionResult;

        if (result.TextConfidence == SpeechRecognitionConfidence.Medium
            || result.TextConfidence == SpeechRecognitionConfidence.High)
        {
            PerformAction(result.Semantics);
        }
        else
        {
            MessageService.ShowMessage("Sorry, I'm not certain I understood.");
            Prompt();
        }
    }
}


The viewmodel’s PerformAction method processes the output of the speech recognizer. It does this by using semantic information returned from the speech recognizer. Semantic information can be placed within the SRGS grammar.

The grammar that was presented earlier in the chapter, in Listing 23.5, has been adapted in Listing 23.8 to include semantic tags. These tag elements allow you to attach semantic information to the recognition result, depending on the phrase that was spoken by the user.

Semantic tags allow you to avoid reinterpreting the recognized speech after it has been passed back to your code. They also make it easier to process localized versions of your grammar, because semantic information can remain the same across all localized grammars.

To employ semantic tags in your grammar, add the tag-format attribute to the grammar element of the SRGS XML file.

The $ symbol, as shown in Listing 23.8, is used to populate a rule variable that is passed back to your code after the speech has been evaluated.


Caution

The semantic tag format shown in Listing 23.8 is Microsoft specific, and using it in your SRGS grammar may prevent it from being used outside of the Microsoft speech ecosystem. There is a different, non-Microsoft tag format that is specified using semantics/1.0 rather than semantics-ms/1.0. It is, however, somewhat more verbose. For example, when using the non-Microsoft-specific format, the Rule Variable is named “out”. Conversely, when using the Microsoft-specific format, the Rule Variable is named “$”.
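
As a sketch of the alternative format mentioned in this Caution, the same grammar might be written with semantics/1.0 as follows. It assumes the W3C semantic interpretation syntax, in which out is the rule variable and rules.latest() returns the value of the most recently matched rule reference. It has not been tested against the sample app.

```xml
<?xml version="1.0" encoding="utf-8" ?>
<grammar version="1.0" xml:lang="en-US" root="CoreCommands"
 xmlns="http://www.w3.org/2001/06/grammar" tag-format="semantics/1.0">

  <rule id="CoreCommands" scope="public">
    <ruleref uri="#Action" />
    <!-- Copy the value of the rule just matched into a named property. -->
    <tag>out.Action = rules.latest();</tag>
    <item>a</item>
    <ruleref uri="#Color" />
    <tag>out.Color = rules.latest();</tag>
    <ruleref uri="#Shape" />
    <tag>out.Shape = rules.latest();</tag>
  </rule>

  <rule id="Action">
    <one-of>
      <item>add<tag>out = "add";</tag></item>
      <item>remove<tag>out = "remove";</tag></item>
    </one-of>
  </rule>

  <rule id="Color">
    <one-of>
      <item>red<tag>out = "red";</tag></item>
      <item>green<tag>out = "green";</tag></item>
      <item>blue<tag>out = "blue";</tag></item>
    </one-of>
  </rule>

  <rule id="Shape">
    <one-of>
      <item>circle<tag>out = "circle";</tag></item>
      <item>square<tag>out = "square";</tag></item>
    </one-of>
  </rule>

</grammar>
```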


LISTING 23.8. VoicePaintGrammar.xml


<?xml version="1.0" encoding="utf-8" ?>
<grammar version="1.0" xml:lang="en-US" root="CoreCommands"
 xmlns="http://www.w3.org/2001/06/grammar" tag-format="semantics-ms/1.0">

  <rule id="CoreCommands" scope="public">
    <ruleref uri="#Action" />
    <tag>$.Action=$Action;</tag>
    <item>a</item>
    <ruleref uri="#Color" />
    <tag>$.Color=$Color;</tag>
    <ruleref uri="#Shape" />
    <tag>$.Shape=$Shape;</tag>
  </rule>

  <rule id="Action">
    <one-of>
      <item>add<tag>$ = "add"</tag></item>
      <item>remove<tag>$ = "remove"</tag></item>
    </one-of>
  </rule>

  <rule id="Color">
    <one-of>
      <item>red<tag>$ = "red"</tag></item>
      <item>green<tag>$ = "green"</tag></item>
      <item>blue<tag>$ = "blue"</tag></item>
    </one-of>
  </rule>

  <rule id="Shape">
    <one-of>
      <item>circle<tag>$ = "circle"</tag></item>
      <item>square<tag>$ = "square"</tag></item>
    </one-of>
  </rule>

</grammar>
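
To illustrate the earlier point about localization, here is a sketch of how the Color rule might look in a German version of the grammar. The spoken words change, but the semantic values stay the same, so code that reads the Color value would not need to change. The German grammar is hypothetical and is not part of the sample code; it also assumes that a German recognizer language is available on the device.

```xml
<?xml version="1.0" encoding="utf-8" ?>
<grammar version="1.0" xml:lang="de-DE" root="Color"
 xmlns="http://www.w3.org/2001/06/grammar" tag-format="semantics-ms/1.0">

  <rule id="Color" scope="public">
    <one-of>
      <!-- Spoken word is German; the semantic value matches the English grammar. -->
      <item>rot<tag>$ = "red"</tag></item>
      <item>grün<tag>$ = "green"</tag></item>
      <item>blau<tag>$ = "blue"</tag></item>
    </one-of>
  </rule>

</grammar>
```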


Recall that the viewmodel’s Prompt method processes the recognized speech using its PerformAction method. The SpeechRecognitionResult object contains semantic information, in particular the values of the rule variables, which gives you a breakdown of the subphrases that were recognized.

The viewmodel’s PerformAction method extracts the semantic information from the SpeechRecognitionResult’s Semantics dictionary property. Each piece of semantic information is stored using the rule variable name as a key. For example, the outcome of the Action rule in the SRGS grammar can be retrieved using semantics["Action"]; see Listing 23.9.

With these three pieces of information (action, color, and shape), the viewmodel is able to add the specified shape to the canvas or remove it from the canvas.

LISTING 23.9. VoicePaintSrgsViewModel.PerformAction Method


void PerformAction(IReadOnlyDictionary<string, SemanticProperty> semantics)
{
    string actionString = semantics["Action"].Value.ToString();
    string colorString = semantics["Color"].Value.ToString();
    string shapeString = semantics["Shape"].Value.ToString();

    ShapeType shapeType
        = (ShapeType)Enum.Parse(typeof(ShapeType), shapeString, true);
    Color color = colorString == "red" ? Colors.Red
        : colorString == "green" ? Colors.Green
        : Colors.Blue;

    if (actionString == "add")
    {
        AddShape(shapeType, color);
    }
    else
    {
        Type type = shapeType == ShapeType.Square ? typeof(Square) : typeof(Circle);

        foreach (Shape shape in shapes.ToList())
        {
            if (shape.OutlineColor == color)
            {
                if (type.IsInstanceOfType(shape))
                {
                    shapes.Remove(shape);
                    break;
                }
            }
        }
    }
}


The view used for the SRGS sample is named VoicePaintView.xaml and is the same page that was used for the earlier list grammar sample.
