Chapter 7. Speech Recognition

In this chapter, we will cover the following recipes:

  • Reading and plotting audio data
  • Transforming audio signals into the frequency domain
  • Generating audio signals with custom parameters
  • Synthesizing music
  • Extracting frequency domain features
  • Building Hidden Markov Models
  • Building a speech recognizer

Introduction

Speech recognition refers to the process of recognizing and understanding spoken language. Input comes in the form of audio data, and the speech recognizers will process this data to extract meaningful information from it. This has a lot of practical uses, such as voice controlled devices, transcription of spoken language into words, security systems, and so on.

Speech signals are very versatile in nature. There are many variations of speech in the same language. There are different elements to speech, such as language, emotion, tone, noise, accent, and so on. It's difficult to rigidly define a set of rules that can constitute speech. Even with all these variations, humans are really good at understanding all of this with relative ease. Hence, we need machines to understand speech in the same way.

Over the last couple of decades, researchers have worked on various aspects of speech, such as identifying the speaker, understanding words, recognizing accents, translating speech, and so on. Among all these tasks, automatic speech recognition has been the focal point of attention for many researchers. In this chapter, we will learn how to build a speech recognizer.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.118.28.197