Chapter 1
In This Chapter
Clarifying what NaturallySpeaking can do
Figuring out what NaturallySpeaking can’t do
Selecting the right NaturallySpeaking product
Voice recognition is used in places like cars, hospitals, and legal offices. Yet, some people are still skeptical about software that enables you to dictate to your computer and get a transcription of what you said. People think it’s very cool, but they secretly wonder if it really works.
It works. (And it is really cool.) Nuance reports up to 99 percent accuracy for today’s Dragon NaturallySpeaking, right out of the box. I bet that’s a score you’d like for your own personal output. Well then, read this book and dive right in. You’ll be rewarded with higher productivity and hands-free computing.
Sections of this book were written by dictating them into Dragon NaturallySpeaking. It was a lot of fun, and I predict that you will also find NaturallySpeaking to be useful and fun — if you approach it with the appropriate expectations.
Something about dictating to a computer awakens all kinds of unrealistic expectations in people. If you expect it to serve you breakfast in bed, you’re out of luck. I didn’t write this book by saying, “Computer, write a book about NaturallySpeaking.” I had to dictate it word for word, just as I would have had to type it word for word if I didn’t have NaturallySpeaking.
So what are realistic expectations? Think of NaturallySpeaking the way that you think about your keyboard and mouse. It’s an input device for your computer, not a brain transplant. It doesn’t add any new capabilities to your computer beyond deciphering your spoken words into text or ordinary PC commands. If you say, “Go make me a sandwich,” NaturallySpeaking will dutifully type “go make me a sandwich” into whatever word-processing application happens to be open.
Just because your computer can understand what you say, don’t expect it to understand what you mean. It’s still just a computer, you know.
Here are five specific things you can expect to do with NaturallySpeaking, and where to look for more details about how to do them:
Now, I happen to think that’s plenty to get excited about. I can make my own sandwiches, thank you.
Even with NaturallySpeaking, your computer’s capability to understand English is more limited than what you can reasonably expect from a human. People use a very wide sense of context to figure out what other people are saying. We know, for instance, what the teen behind the counter at Burger King means when he asks, “Wonfryzat?” (That’s fast-food-employed teenspeak for “Do you want fries with that?”) We’d be completely confused if that same teenager walked up to us at a public library and asked, “Wonfryzat?”
NaturallySpeaking figures things out from context, too, but only from the verbal context (and a fairly small verbal context at that). It knows that “two apples” and “too far” make more sense than “too apples” and “two far.” But two- to three-word context seems to be about the extent of the software’s powers. (I can’t say exactly how far it looks for context, because Nuance, the manufacturer, is understandably pretty hush-hush about the inner workings of NaturallySpeaking.) It doesn’t understand the content of your document, so it can’t know that words like “Labradoodles” and “Morkies” are going to show up just because you’re talking about dogs.
Within that narrow window, however, Dragon does put context to work. NaturallySpeaking has collected millions of hours of dictation from customers and uses this data to better understand the language. Each word receives a frequency score that reflects how often the word appears and how often it appears before or after other words. These scores are then used to pick the most likely candidate, for example, when the software has to decide between “too” and “two.”
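To make the idea of frequency scores concrete, here is a minimal sketch in Python. It is not how Dragon works internally (Nuance doesn't publish that); it only illustrates the general technique of counting how often words appear next to each other and using those counts to choose between sound-alike words. The tiny sample text and the function names build_counts and pick_homophone are made up for this illustration.

```python
from collections import Counter

# Toy sample text standing in for a large body of dictation (an assumption
# for illustration; a real system would use vastly more data).
CORPUS = (
    "i ate two apples . it is too far . two apples fell . "
    "that is too far to walk . she has two dogs"
)

def build_counts(text):
    """Count single words and adjacent word pairs in the text."""
    words = text.split()
    unigrams = Counter(words)
    bigrams = Counter(zip(words, words[1:]))
    return unigrams, bigrams

def pick_homophone(candidates, next_word, unigrams, bigrams):
    """Pick the candidate seen most often right before next_word.

    Ties are broken by how common the candidate is on its own.
    """
    return max(
        candidates,
        key=lambda w: (bigrams[(w, next_word)], unigrams[w]),
    )

UNIGRAMS, BIGRAMS = build_counts(CORPUS)

# The recognizer "heard" something that could be either spelling.
print(pick_homophone(["two", "too"], "apples", UNIGRAMS, BIGRAMS))
print(pick_homophone(["two", "too"], "far", UNIGRAMS, BIGRAMS))
```

Notice that the sketch looks only one word ahead. That is the kind of small verbal context described above, and it is why nothing in this approach "knows" what your document is about.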
Consequently, you can’t expect NaturallySpeaking to understand every form of speech that humans understand. In order to work well, it needs advantages like these:
Even though you no longer have to train Dragon, it still creates a speaker-dependent profile. Once the audio setup is complete, NaturallySpeaking is already tailoring that profile to you. That’s why you must always use your own profile.
For the first time, NaturallySpeaking supports using a PC’s internal mic.
Dragon does a pretty good job with accents, though, as long as you’re consistent.
NaturallySpeaking isn’t a single product; it’s a family of products. And like most families, some members are richer than others. Depending on the features you want, you can pay a hefty price for software. You get what you pay for.
In spite of their socioeconomic differences, this family gets along pretty well. All the products are based on the same underlying voice recognition system, so they create the same kinds of user files. This fact has two consequences for you as a user:
You can start out with the inexpensive Home edition, test out whether you like this whole idea of dictation, and then move up to a full-featured version without having to go through training all over again.
Which edition is best for you depends on why you’re interested in NaturallySpeaking in the first place. Are you a poor typist who wants to be able to create documents more quickly? A good typist who is starting to worry about carpal tunnel syndrome? A person who can’t use a mouse or keyboard at all? A busy executive who wants to dictate into a recorder rather than sit in front of a monitor? Is price an important factor to you? Do you need NaturallySpeaking to recognize a large, specialized vocabulary? Do you want to create macros that enable you to dictate directly into your company’s special forms?
The more features you want, the more you should expect to pay.
Speech recognition software is entrenched in many private sector industries. Dragon NaturallySpeaking serves several industries, including the following:
The public sector uses Dragon NaturallySpeaking as follows:
The current generation of NaturallySpeaking, version 13, was released in the second half of 2014. In addition to the usual bug fixes and incremental improvements that you expect in a new version of an application, NaturallySpeaking 13 brings the following five major enhancements:
Here is the current lineup of some of the NaturallySpeaking products, with a few comments about their features:
In addition to these off-the-shelf products, you can also have NaturallySpeaking installed on your office network. This corporate option goes beyond the scope of what I cover in this book. If you’re interested in this option, contact Nuance directly. Training programs for your staff are also available.
Nuance has eliminated the training during setup. But if you really want to make NaturallySpeaking blazing fast, you should train it to understand you and the special words and phrases you use. So why does NaturallySpeaking need to be customized before it understands your speech? The simple answer is that speech recognition is probably one of the hardest things your computer does. Humans may not think speech recognition is hard, but that’s because they are good at it. LeBron James probably has trouble understanding why the rest of us think it’s so hard to dunk a basketball.
This section explains why deciphering speech is hard for computers and how training NaturallySpeaking can overcome these difficulties. I hope that understanding these issues will give you confidence that the extra effort is worth it.
NaturallySpeaking comes out of the box not knowing anything about you. It has to work as well for a baritone with a Scottish accent as for a mezzo-soprano with a slight lisp. It needs time to figure out how you talk.
The exact error rate depends on many factors: how fast your computer is, how much memory it has, how good your microphone is, how quiet the environment is, how well you speak, what sound card your computer has, and so on.
If 3-year-olds can recognize and understand speech (other than the phrase “go to bed”), why is it so hard for computers? Aren’t computers supposed to be smart?
Well, yes and no. Computers are very smart when it comes to brain-straining activities like playing chess and filling out tax returns, so you may think they’d be whizzes at “simple” activities like recognizing faces or understanding speech. But after about 50 years of trying to make computers do these simple things, programmers have come to the conclusion that a skill isn’t simple just because humans master it easily. In fact, our brains and eyes and ears are chock-full of sophisticated sensing and processing equipment that still runs rings around anything we can design in silicon and metal.
We humans think it’s simple to understand speech because all the really hard work is done before we become conscious of it. To us, it seems as if English words just pop into our heads as soon as people open their mouths. The unconscious (or preconscious) nature of the process makes it doubly hard for computer programmers to mimic. If we don’t know exactly what we’re doing or how we do it, how can we tell computers how to do it?
To get an idea of why computers have such trouble with speech, think about something that they’re very good at recognizing and understanding: touch-tone phone numbers. Those blips and bloops on the phone lines are much more meaningful to computers than they are to people. Several important features make the phone tones an easy language for computers, as I discuss in the following list. English, on the other hand, is completely different:
In order to work effectively for you, a speech-recognition program like NaturallySpeaking needs to combine four vastly different areas of knowledge. It needs to know a lot about speaking in general, about the spoken English language in general, about the way your voice sounds, and about your word-choice habits.
Dragon NaturallySpeaking gets its general knowledge from the folks at Nuance, some of whom have spent most of their adult lives analyzing how English is spoken. NaturallySpeaking has been programmed to know in general what human voices sound like, how to model the characteristics of a given voice, the basic sounds that make up the English language, and the range of ways that different voices make those sounds. It has also been given a basic English vocabulary and some overall statistics about which words are likely to follow which other words. (For example, the word medical is more likely to be followed by miracle than by marigold.)
NaturallySpeaking learns about your voice by listening to you. During the training process, you read out loud some text selections that NaturallySpeaking has stored in its memory. Because it already knows the text that you’re reading, NaturallySpeaking uses this time to model your voice and learn how you pronounce words.
NaturallySpeaking goes on learning about your voice every time you use it. When you correct a word or phrase that NaturallySpeaking has guessed wrong, NaturallySpeaking adjusts its settings to make the mistake less likely in the future.
Dragon learns even when you make keyboard edits or delete everything and start over. It takes the net result of your dictated or typed text and uses it to help improve its accuracy.
NaturallySpeaking learns in two ways:
Initially, NaturallySpeaking learns how you choose words from the Vocabulary Builder phase of training. It may seem as if NaturallySpeaking is just learning how you say some unusual words. But, in fact, Vocabulary Builder is worthwhile even if no new words are found, because NaturallySpeaking analyzes how frequently you use common words and which words are likely to be used in combination.
NaturallySpeaking comes out of the box knowing general facts about the frequency of English words, but Vocabulary Builder helps sharpen those models for your particular vocabulary.
For example, if you want to use NaturallySpeaking to write letters to your mother and you let it study your previous letters, NaturallySpeaking will learn that the names of your family members appear much more frequently than they do in general English text. It is then much less likely to misinterpret your brother Johan’s name as John or yawn.
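If you like to see ideas in code, the following Python sketch shows the general principle behind this kind of vocabulary tuning. It is only an illustration under stated assumptions, not Nuance's actual method: the starting word counts, the sample letter, the weight value, and the function name adapt_frequencies are all invented for the example.

```python
from collections import Counter

def adapt_frequencies(general, user_text, weight=10):
    """Blend general word counts with counts from the user's own writing.

    Each occurrence in the user's text adds `weight` to that word's score,
    so words the user favors rise above their general-English ranking.
    """
    adapted = Counter(general)  # copy, so the general model is untouched
    for token in user_text.lower().split():
        word = token.strip(".,!?")
        if word:
            adapted[word] += weight
    return adapted

# Hypothetical general-English scores for three sound-alike words.
GENERAL = Counter({"john": 20, "yawn": 5, "johan": 1})

# Hypothetical excerpt from the user's previous letters.
LETTERS = "Dear Mom, Johan visited today. Johan says hello. Johan and I cooked."

ADAPTED = adapt_frequencies(GENERAL, LETTERS)
SOUND_ALIKES = ["john", "yawn", "johan"]

print(max(SOUND_ALIKES, key=lambda w: GENERAL[w]))  # best guess before adapting
print(max(SOUND_ALIKES, key=lambda w: ADAPTED[w]))  # best guess after adapting
```

Before the letters are studied, the most common general-English word wins; afterward, the name that keeps showing up in your own writing takes the lead.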
By enabling you to install NaturallySpeaking, your computer has taken on one of the hardest tasks a PC ever faces. It needs your help. If you train it with patience and persistence, and if you gently but firmly correct your NaturallySpeaking assistant whenever it makes a mistake, you’ll be rewarded with a computer that takes your verbal orders and transcribes your dictation without complaint (and even without a coffee break, unless you need one).