ALDA Speech Recognition Panel - Part 1
Automatic Speech Recognition Systems as a Conversational Aid by
People Who are Deaf or Hard of Hearing
Presentation, October 1999 Association for Late-Deafened Adults Conference
Reported by Jim House, TDI
Presenters (2 sessions):
Dr. Carl Jensema - Institute on Disability
Dr. Ross Stuckless - NTID/RIT
Dr. Judy Harkins - Gallaudet Technology Assessment Program
Dr. Anita Haravon - Lexington School/Center for the Deaf
In 1876, Alexander Graham Bell announced his new invention, the
telephone. Ironically, this was probably the earliest attempt to create
visible speech as an aid for his mother and wife, who both had
hearing loss.
In 1950, Bell Labs created the first real speech recognition machine
that matched spoken audio patterns with patterns stored in the system.
It was speaker dependent and needed extensive training for a
vocabulary of just 10 words. The president of Bell Labs did not see much
of a future in automatic speech recognition (ASR) and gave the project
very little support.
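The pattern-matching idea described above can be illustrated with a minimal sketch. This is not a reconstruction of the Bell Labs machine: the three-number feature vectors, the word list, and the function names below are hypothetical, and Euclidean distance is simply one plausible way to compare a spoken pattern with stored ones.

```python
import math

def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def recognize(features, templates):
    """Return the word whose stored template is closest to the input."""
    return min(templates, key=lambda word: distance(features, templates[word]))

# Hypothetical stored patterns, one per word, recorded from a single
# speaker during training (hence "speaker dependent").
templates = {
    "one": [0.9, 0.1, 0.3],
    "two": [0.2, 0.8, 0.5],
    "three": [0.4, 0.4, 0.9],
}

# A new utterance from the same speaker, reduced to the same features.
spoken = [0.25, 0.75, 0.45]
print(recognize(spoken, templates))
```

Because the stored patterns come from one speaker, a different voice would produce vectors far from every template, which is one way to see why such a system needs retraining for each user.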
In the '50s and '60s, researchers found that ASR was a much tougher
goal than they originally expected. Speech recognition requires many
calculations, based on mathematical, audiological and computer science
formulas, in a very short time, and further development has depended
on the availability of faster computers.
Researchers decided to focus on developing a system that could recognize
the discrete speech of one person who paused between words and used
a small vocabulary of 50 words or fewer. During this period, IBM and
Carnegie Mellon University in Pittsburgh, PA did much of the basic ASR
research.
During the early '70s, Threshold Technology, Inc. developed the
first real ASR product, called the VIP-100 System. It had little
practical application, but it nevertheless drew the interest of the
Advanced Research Projects Agency (ARPA) of the US Department of
Defense. From 1971 through 1976, ARPA funded Speech Understanding
Research (SUR) projects at three contractors: Carnegie Mellon University;
Bolt, Beranek & Newman; and System Development Corporation. These
contractors were to build an ASR system that could
recognize multiple speakers using continuous speech and a 1,000-word
vocabulary. Only one of those contractors met these specifications.
Carnegie Mellon University's "Harpy" recognized 1,011 words
with 95% accuracy.
ARPA continued to support SUR projects during the '80s as personal
computers became available. Carnegie Mellon University went on to
develop what is now the Dragon Dictate speech recognition system. It was
one of the first ASR systems to use hidden Markov modeling, a popular
technique used by almost all ASR systems. IBM was also very active in
ASR and did important work on statistical modeling techniques. ASR
research began to focus on developing larger vocabulary systems and on
interactive telephone voice menus that use a small vocabulary while
being speaker independent. The best systems were able to recognize
discrete speech from one speaker after weeks of training, on known
subject materials without background noise and attain 90% accuracy.
Several more companies began to develop their own ASR products, such as
Dragon Systems, Inc.; IBM; ITT Defense Communications; Kurzweil AI;
Mimic; Speech Systems, Inc.; Vocollect; Voice Connexion; Voice Control
Systems; and Voice Processing Corporation.
In the last decade, with the introduction of the 486 processor in 1989,
personal computers reached the point where speech recognition could be
accomplished quickly. From there, speech recognition power increased
dramatically and prices plummeted on ASR systems. Large vocabularies
became the norm while continuous speech recognition and artificial
neural networks were introduced in commercially available systems.
Standards for computer application programming interfaces (API) began to
emerge and many applications of ASR appeared. More and more technology
companies are entering the ASR field, especially those involved in the
computer and telephone industries. Computing power has grown rapidly
since the 486 processor. The Pentium Pro chip runs at 200 MHz, the
Pentium II at 300 MHz, and the Pentium III chip in use today at more
than 600 MHz. Today's chip speed is 12 times as fast as the 486
processor of ten years ago. Within a year, we can expect to see
computers on the market that run at 1,000 MHz.
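The speed comparison above is simple arithmetic, and a short sketch makes the implied baseline explicit. It assumes that "12 times as fast" refers to clock speed alone, which is only a rough proxy for real performance; the function and variable names are illustrative.

```python
# Clock speeds quoted in the report, in MHz.
clock_mhz = {"Pentium Pro": 200, "Pentium II": 300, "Pentium III": 600}

def implied_baseline(current_mhz, speedup):
    """Clock speed of the older chip implied by a quoted speedup factor."""
    return current_mhz / speedup

def relative_speed(new_mhz, old_mhz):
    """How many times faster the new clock is than the old one."""
    return new_mhz / old_mhz

# A 12x speedup at 600 MHz implies a 486 baseline of about 50 MHz.
baseline_486 = implied_baseline(clock_mhz["Pentium III"], 12)
print(baseline_486)

# The anticipated 1,000 MHz machines, measured against that baseline.
print(relative_speed(1000, baseline_486))
```

On the same assumption, a 1,000 MHz machine would be about 20 times as fast as that baseline.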
ASR systems on the market today are relatively inexpensive and easy
to use. It helps if the speaker is reasonably computer literate and
wears a headset. The speaker also has to prepare in advance by adding
specialized words to the vocabulary, and spend approximately 30 minutes
training the system to recognize his or her voice patterns before
initial use. When using the ASR system, the speaker can talk
naturally and continuously, but should watch out for false starts like
"umm" and "ahh", monitor the output on the screen, and
fix errors as they occur.
ASR will become a common feature on personal computers as computing
power continues to grow. Major speech recognition systems in use today
include:
Dragon Systems' NaturallySpeaking
IBM's ViaVoice
Lernout & Hauspie's (L&H) Voice Xpress
Philips Dictation Systems' FreeSpeech