Tech Talk
June 23, 2006

Speech Recognition – Ready for Prime Time?

There are a couple of areas in which speech recognition can be extremely useful. For one, there are physically challenged people who don’t have proper control over their arms and hands but can speak easily. Given how pervasive computers have become in everyday life, flat-out denying access to certain people would be unconscionable.

Many businesses are finding speech recognition to be useful as well – or, more accurately, voice recognition. (The difference between speech recognition and voice recognition is that voice recognition generally only has to deal with a limited vocabulary.) As an example, warehousing job functions require only a relatively small vocabulary of around 400 words, and allowing a computer system to interface with the user via earphones and a microphone frees up the hands to do other things. The end result is increased productivity and fewer errors, which in turn yields better profitability.

Health Considerations: There’s at least one more area in which speech recognition can directly benefit many people. Typing on a keyboard for many hours every day is not the healthiest of practices. Every keyboard on the market today carries a warning about repetitive stress injuries (RSI), and with good reason. Not everyone will have problems, and not everyone who has problems will experience the same degree of discomfort. However, the more you type and the older you get, the greater your chance of developing RSI from computer use.

There are many things you can do to try to combat carpal tunnel syndrome (CTS). Some people feel that ergonomic keyboards help. A better chair and desk will also help: you want a chair and desk that put your wrists and hands in the proper position in order to minimize strain. If you’re not comfortable sitting at your computer, you should probably invest in a new chair at the very least. Even with modifications to your work area, though, there’s a reasonable chance that you’ll still have difficulty. You might consider surgery, but while that will generally help 70% of people initially, many find that discomfort returns within a couple of years.

The simple fact of the matter is that the best way to avoid RSI complications is to eliminate the repetitive activity that’s causing the problem in the first place. That means that if typing on a keyboard is giving you CTS, the best way to alleviate the problem is to not type on a keyboard anymore.

That makes it rather difficult to write for a living, as you can imagine. Of course, it usually isn’t necessary to completely stop an activity that’s causing RSI. The phrase itself gives you an idea of how to avoid difficulties: avoid excessive repetition.

That brings us to the present topic: speech recognition. Used properly, speech recognition has the potential to eliminate a large portion of your typing, among other things. It is a hard problem, though. Languages are complex, and learning a new language is always difficult. We spend years growing up in an environment, learning the language, learning the rules, developing our own accent, etc. No two people in the world are going to sound exactly alike, and it goes without saying that everyone makes periodic mistakes in grammar and pronunciation while speaking.

Programming a computer so that it understands everything that we say, corrects the mistakes, and gets all the grammar correct as well is a daunting task at best. As time has passed, computers have gotten faster and the algorithms have improved, and we’re at the point now where real-time speech recognition is actually feasible. Mistakes will still be made, and dealing with different accents and/or speech impediments only serves to make things more difficult, but for many people it is now possible to get accuracy higher than 90%. That isn’t that great, as it means one or two mistakes per sentence, but it’s a good place to start.
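To put the "one or two mistakes per sentence" figure in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes that accuracy is measured per word, that errors on different words are independent, and that a typical sentence runs 10 to 20 words; none of these assumptions comes from any particular speech product, and the function names are made up for illustration.

```python
def expected_errors(words_per_sentence: int, accuracy: float) -> float:
    """Average number of misrecognized words in one sentence.

    Assumes `accuracy` is a per-word recognition rate between 0 and 1.
    """
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be between 0 and 1")
    return words_per_sentence * (1.0 - accuracy)


def chance_of_clean_sentence(words_per_sentence: int, accuracy: float) -> float:
    """Probability that every word in a sentence is recognized correctly.

    Assumes errors on different words are independent, which is a
    simplification; real recognizers tend to make errors in clusters.
    """
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be between 0 and 1")
    return accuracy ** words_per_sentence


if __name__ == "__main__":
    # Illustrative sentence lengths and accuracy levels (assumed values).
    for accuracy in (0.90, 0.95, 0.99):
        for words in (10, 20):
            errors = expected_errors(words, accuracy)
            clean = chance_of_clean_sentence(words, accuracy)
            print(f"{accuracy:.0%} accuracy, {words} words: "
                  f"{errors:.1f} errors expected, "
                  f"{clean:.0%} chance of an error-free sentence")
```

Under these assumptions, 90% accuracy works out to about one error in a 10-word sentence and about two in a 20-word sentence, and only about a third of 10-word sentences come through with no errors at all.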

The Contenders: If you decide to try out speech recognition, there is an overwhelming favorite on the market: Dragon NaturallySpeaking. It is generally well regarded. Once you obtain your copy, there is some setup to complete before you can use it. The training process takes about 20 minutes, and another 20 or 30 minutes will be spent scanning your documents for words and speech patterns; after that, you are ready to start dictating. Dragon isn’t a particularly cheap piece of software, but when you consider the versatility it offers, it is well worth it.

Of course, Microsoft Office 2003 also has built-in speech recognition. Many businesses in the world have a copy of Microsoft Office 2003 installed, so perhaps there isn’t even a need to go out and purchase separate speech recognition software. One other item that may be of interest is how much processing time each product needs. Voice recognition may or may not benefit from dual-core processors, but there’s only one way to find out. Both Dragon and Microsoft Office let you trade recognition speed against accuracy. For Dragon, there are essentially six settings, ranging from minimum accuracy to maximum accuracy. The slider can be adjusted in smaller increments, but if you click in the slider bar it will jump between six positions, with each one bringing a moderate change in performance and possibly a change in accuracy.

My experience with Microsoft’s speech tool is that it is best used for rough drafts and that you shouldn’t worry about correcting errors initially. Once you’ve got the basic text in place, you should go through and manually edit the errors. That’s basically what Microsoft’s training wizard tells you as well, so right away their goals seem less ambitious – and thus their market is also more limited. At first glance, both of these speech recognition packages appear pretty reasonable. Dragon is more accurate in transcribe mode, but it also requires more processing time. Both manage to offer better than 90% accuracy, but as stated earlier, that really isn’t that great. I would say that 95% accuracy is the bare minimum you want to achieve, and more is better. If you already have Microsoft Office 2003, the performance offered might be enough to keep you happy. It’s still not perfect, but speech recognition software has become a viable alternative to typing for everyday document preparation.