Last Update: Nov 8

 


ALDA Speech Recognition Panel - Part 2

Here's Part One!

Not all speech recognition tasks are equal; some situations are much easier than others. Early speech recognition systems, beginning in the 1970s, depended on favorable conditions: a small vocabulary, discrete speech, known subject material, no background noise, and a single speaker who spent weeks training the system to match his or her voice patterns. Under those conditions they were able to attain 90% accuracy. The ideal system would function under difficult conditions: a large vocabulary, continuous natural speech, random subject material, many speakers with little or no training on the system, and much background noise. It would still attain 99% accuracy.

The technology is still years away from being able to handle a "hard" speech recognition task. For example, a person cannot just walk into a noisy party, point a microphone at someone and then read his or her speech on a screen. That kind of technology is not here yet, although speech recognition has advanced to the point where it can be useful in many situations. After a speech recognition system is properly trained for a specific user, it will perform about as well as a good typist. This means it could be used by deaf people in some interpreting situations, such as in a business meeting or in a classroom. The general state of automatic speech recognition (ASR) technology today is a large vocabulary, continuous speech, random subject material, one speaker after a few hours of training on the system, limited background noise and 95% accuracy.
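To put these accuracy figures in practical terms, here is a minimal sketch that converts a word accuracy into the number of wrong words a reader would see per minute. The function name and the speaking rate of 150 words per minute are assumptions chosen for illustration; they do not come from the panel.

```python
def expected_errors(words_per_minute: float, accuracy: float, minutes: float = 1.0) -> float:
    """Expected number of misrecognized words at a given word accuracy."""
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be between 0 and 1")
    return words_per_minute * minutes * (1.0 - accuracy)


# Hypothetical speaking rate of 150 words per minute (an assumption).
# The three accuracy levels are the ones mentioned in the article.
for accuracy in (0.90, 0.95, 0.99):
    errors = expected_errors(150, accuracy)
    print(f"{accuracy:.0%} accuracy: about {errors:.1f} wrong words per minute")
```

Under that assumed rate, the gap between 95% and 99% is the difference between roughly seven or eight wrong words a minute and one or two.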

Accuracy depends on the speaker and the speech recognition system used. Some people have clearer speech, or at least speech that the computer finds easier to analyze. A speech recognition system may have difficulty understanding a person who is under stress or who has a cold. Speech recognition systems have a very hard time understanding deaf speech because it usually does not conform to the speech patterns the computer is expecting. The presenters favor Dragon NaturallySpeaking, as it seems to have slightly better accuracy than other products currently on the market.

Dr. Ross Stuckless points out that IBM has worked closely with the National Technical Institute for the Deaf at Rochester Institute of Technology (NTID/RIT) using ViaVoice and is more familiar with the needs of users who are deaf and hard of hearing. He also demonstrated his DragonDictate speech recognition software. Despite logging 30 hours of training on his ASR system, he does not believe its claimed 98% accuracy rate; in his experience, regular practical use produces more errors than that. He feels, however, that ASR technology is improving: newer systems perform better with less training time and a larger vocabulary.
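A user who doubts an advertised accuracy figure can measure it directly by comparing what was said with what the system produced. The sketch below uses word-level edit distance, a common way to score recognizers; it is not necessarily the method behind any vendor's claim. The function names and the misrecognized sample sentence are made up for illustration.

```python
def word_errors(reference: list[str], hypothesis: list[str]) -> int:
    """Minimum number of word substitutions, insertions and deletions
    separating the recognized words from the spoken words."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        current = [i]
        for j, hyp_word in enumerate(hypothesis, start=1):
            cost = 0 if ref_word == hyp_word else 1
            current.append(min(
                previous[j] + 1,         # spoken word missing from the output
                current[j - 1] + 1,      # extra word in the output
                previous[j - 1] + cost,  # match or substitution
            ))
        previous = current
    return previous[-1]


def word_accuracy(reference: str, hypothesis: str) -> float:
    """Fraction of spoken words recognized correctly (1 minus word error rate)."""
    ref_words = reference.lower().split()
    hyp_words = hypothesis.lower().split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return 1.0 - word_errors(ref_words, hyp_words) / len(ref_words)


spoken = "turn right on main street"
recognized = "turn write on main street"  # hypothetical recognizer output
print(f"{word_accuracy(spoken, recognized):.0%}")  # prints 80%
```

Scoring a few minutes of ordinary dictation this way gives a figure that reflects everyday use rather than ideal test conditions.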

In ASR systems, the speaker needs to voice the punctuation marks, and the system does not distinguish one speaker from another. Without spoken punctuation, a phone conversation on ASR looks like this:

would you pick up some chinese on your way home from work tonight I dont have time to make dinner sure if you don't mind eating late I have a lot of work on my desk and probably wont be home until after eight thats ok what do you want me to pick up you know the usual remember as for no fat no salt yuk
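To show what "voicing the punctuation" involves, here is a minimal sketch that turns spoken punctuation words into marks. The list of spoken commands is hypothetical; real dictation products define their own. The sample sentence is shortened from the phone conversation above, with the punctuation words added.

```python
import re

# Hypothetical spoken commands (an assumption, not any product's actual list).
SPOKEN_PUNCTUATION = {
    "question mark": "?",
    "exclamation point": "!",
    "period": ".",
    "comma": ",",
}


def apply_spoken_punctuation(transcript: str) -> str:
    text = transcript
    for phrase, symbol in SPOKEN_PUNCTUATION.items():
        # Replace the spoken phrase, and any space before it, with the mark.
        text = re.sub(rf"\s*\b{re.escape(phrase)}\b", symbol, text)
    # Capitalize the first letter of the text and of each new sentence.
    return re.sub(
        r"(^|[.?!]\s+)([a-z])",
        lambda match: match.group(1) + match.group(2).upper(),
        text,
    )


dictated = (
    "would you pick up some chinese on your way home question mark "
    "sure comma if you don't mind eating late period"
)
print(apply_spoken_punctuation(dictated))
```

A simple table like this cannot tell a command from an ordinary word, so a sentence that really contains the word "period" would be mangled. It also does nothing to mark where one speaker stops and the other begins.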

At Gallaudet University, Dr. Judy Harkins, Director of the Technology Assessment Program (TAP), researched the viability of ASR systems as a communication aid for people who are deaf and hard of hearing. The research questions were: How successful and efficient is ASR as a communication aid compared to CART or lipreading? What can be done to improve its effectiveness for conversation? What happens over time, with practice?

There were two single-subject studies. Deaf participants had severe to profound hearing loss resulting from progressive or sudden adult onset and had good oral and English literacy skills. Hearing participants had previous experience communicating with deaf and hard of hearing people.

In order to have some predictable results, TAP used the Map Task, in which two people have maps of the same place that differ in detail. One map has a route drawn on it; the other is blank. One person gives the other instructions on how to draw the route on the map. In the Map Task, the words used are pretty much the same and vary little between speakers. One may say, "Turn right on Main Street," and another may say, "Go to Main Street, turn right."

In the first phase, the experimental condition was face-to-face only. In the second phase, participants used face-to-face communication together with either CART or ASR. In the final phase, they used CART only or ASR only.

From the pilot test, TAP researchers came up with some preliminary findings. Practice is crucial to success. The Map Task took the same amount of time to complete in the face-to-face and ASR-alone conditions. In the ASR-alone condition, participants used fewer words and made fewer requests for clarification. It sometimes helps if the speaker periodically hits the key two or three times, which helps the reader stay in place and gets rid of sentences that have lots of errors. "Saying" the punctuation in ASR can be either helpful or distracting, depending on the situation. The speaker may need to keep his or her eyes on the screen to help performance, although this is less natural than looking at the conversational partner.

In comparison, real-time steno-captioning, or computer assisted real-time transcription (CART), comes out ahead of speech recognition as a conversational aid. Both real-time captioning and CART use a skilled stenographer, who types phonetic symbols on a special keyboard; these are translated into a readable transcription of the dialogue. Multiple speakers are not a problem with CART. With ASR, by contrast, the program must unload the first speaker from memory and then load the next speaker. Mistranslated words can also be corrected more quickly on a CART system than on an ASR system.

Today's ASR systems don't seem to work well over the telephone. However, Ultratec and Sprint are now conducting joint trials using ASR in an effort to boost the speed of relay conversations. Instead of using ASR to replace the relay, the trials have the communications assistant (CA) repeat the hearing party's message into a computer that has been trained to recognize the CA's voice. If the trial is successful, this will cut down on typing keystrokes and requests for clarification, thereby reducing the error rate on relay calls.

Despite its drawbacks, ASR has proved popular. Computer users with physical disabilities use it all the time. Some computers are linked to a house's heating and cooling system so that a quadriplegic can adjust the thermostat by voice. Students at NTID/RIT have requested ASR training so they can voice their homework. Industry insiders predict that most computer users will soon choose ASR over typing on a keyboard to write letters. In his latest book, "The Age of Spiritual Machines: When Computers Exceed Human Intelligence," Ray Kurzweil predicts that palm-sized computers with ASR technology will be in wide use within ten years.

For further information on ASR, here are some other links:

http://www.speechxp.com/commercial/speech.htm
(This site has many, but not all, of the technical links.)

http://www.hearingresearch.org
(Lexington's RERC on Hearing Enhancement)

http://tap.gallaudet.edu
(Gallaudet Technology Assessment Program)

http://www.rit.edu/~Klweie/asr.htm
(First deaf female engineer in study on human dynamics at NTID/RIT)

http://www.wired.com/news/email/member/technology/story/22048.html
(University of Southern California)

http://www.scientificamerican.com/1999/0899issue/0899quicksummary.html
(MIT's Oxygen Project)