Last Update: Aug 29

 

Voice Recognition Captioning

February 2003

Last year I wrote about my experience using voice recognition technology to teach a computer class to people with hearing loss. It was a surprisingly positive experience for all concerned, and I mentioned at the time that I foresaw numerous applications for the technology.

I have just finished captioning an ALDA meeting using the same technology. What's different from the computer class is that in this case I was not the speaker. Rather than saying what I wanted to say, I had to repeat what the speakers said into the voice recognition software. This, in itself, is quite different from deciding what to say and just saying it.

An additional difference from speaking my own words is that I had to ensure that my repetition of the speaker's words wasn't distracting to the speaker or to others in the audience.

The basic tools in either situation are a reasonably fast laptop computer equipped with voice recognition software and a microphone, an LCD projector, and a screen. I talk into the microphone, which feeds to the voice recognition software via the sound card. The voice recognition software converts the audio to text, which is output to the LCD projector. The projector puts the text up on the screen.

For the computer class I used a headset with microphone. This was appropriate equipment for that situation, because those students with enough hearing to benefit from hearing my voice were able to do so.

For the ALDA meeting, however, I wanted to muffle my voice as much as possible. To do so I used a stenomask from Talk Technologies (http://www.talk-tech.net/pages/sylencer.html). The stenomask fits over the mouth and seals against the face to muffle the voice. It's not 100% effective, so the audience can still hear something, but it's far less distracting than speaking at normal volume into a conventional microphone.

To train ViaVoice (the voice recognition software I'm using) for the computer class, I spent about an hour reading the stories that ViaVoice provides for software training. After that training, ViaVoice performed with about 95% accuracy, provided I was careful to speak clearly and distinctly. If I got lazy, the accuracy declined fast.
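The accuracy figures in this article are informal estimates. For anyone who wants to measure accuracy rather than estimate it, one common approach is to read a passage whose text is known, save what the software produced, and count the word-level differences. The sketch below is only an illustration under that assumption. The function name word_accuracy and the sample sentences are made up for this example, and nothing here is part of ViaVoice.

```python
def word_accuracy(reference: str, recognized: str) -> float:
    """Return 1 - (word edit distance / number of reference words).

    Hypothetical helper for illustration only. Case is ignored and
    punctuation is not stripped, so feed it plain words.
    """
    ref = reference.lower().split()
    hyp = recognized.lower().split()
    if not ref:
        raise ValueError("reference text must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j recognized words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from the output
                curr[j - 1] + 1,     # extra word in the output
                prev[j - 1] + cost,  # matching or substituted word
            ))
        prev = curr

    # Many extra words could push the ratio above 1, so clamp at zero.
    return max(0.0, 1.0 - prev[-1] / len(ref))


if __name__ == "__main__":
    spoken = "the quick brown fox jumps over the lazy dog today"
    typed = "the quick brown box jumps over the lazy dog today"
    print(f"{word_accuracy(spoken, typed):.0%}")
```

One wrong word out of ten gives 90%. Running the same check on the same passage after each training session would show whether the voice model is actually improving.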

To use ViaVoice with the stenomask, I had to train a whole new model. Just as the software must be trained for each person who uses it, it must also be trained for each new hardware configuration. Changing a sound card or microphone, or even the background noise, can necessitate a new voice model.

So I trained ViaVoice with the stenomask for an hour using the provided stories, after which the accuracy was probably only 75%. I was disappointed at the performance, but not surprised, because the stenomask really requires an acclimation period. Ensuring a tight seal (to reduce escaping sound) requires that the mask be held firmly against the face. This makes it hard to move the lips in a natural and consistent manner, which certainly degrades the software performance.

The other problem is the restricted air movement that the mask causes. One result is that breathing is different, and that takes some getting used to. More closely related to the accuracy issue is the fact that the sealed stenomask prevents normal exhalation as a person speaks. The pressure builds up and makes vocalization difficult, which affects how sounds are produced.

I continued training the stenomask voice model for another few hours, but was unable to significantly improve the accuracy.

Hmmmmm. . . . . what to do?

After a while I realized what the problem was. When I first started using the stenomask, I was not at all used to it, and my speech was not at all natural. With additional training, I became more comfortable with the equipment, and I was able to speak more naturally. But that speech was very different from the speech with which the model had originally been trained! The problem was that the original training was not representative of my later speech, and no reasonable amount of additional training could overcome the original corrupted training.

So I started over with the provided training stories, and was able to get about 90% accuracy after the first hour. I attribute the slightly degraded performance (compared to using the microphone) to the fact that I'm still not entirely comfortable with the stenomask, so I don't speak consistently.

So how did the meeting go?

Very well, actually! The system exceeded my expectations. I was pretty much able to keep up with the speakers, and the accuracy was high, as long as I was careful to speak clearly. But as before, the first hint of lazy speech was brutally punished.

Oh, by the way, the reason I'm doing this is that our ALDA group just lost the funding that paid for CART services. We're looking for new funding, of course, but these are difficult times. It may be that I'll be providing voice recognition captioning for quite some time.

And why am I telling you all this? It's not just because I like to whine ;-} It's because voice recognition is a very real option for organizations that can't afford traditional captioning. If your organization can find a willing volunteer and can borrow an LCD projector, it's very doable at very reasonable cost.

And the quality? I'd say it was as good as some traditional CART reporters I've seen. It's nowhere near the quality of the best CART reporters - yet. But I saved the text and audio files from today's meetings and I'll use them to continue training the system. Between that and more practice time for me, I wouldn't be surprised to be rivaling the best CART reporters in a matter of months.

I'll be happy to do what I can to help anyone who wants to pursue this. Just email me - larry@hearinglossweb.com