enCaption from ENCO Systems
Editor: ENCO Systems' enCaption is a speaker-independent voice
recognition (VR) system that is being proposed as a solution for TV
Captioning during emergencies. The following article from NVRC discusses
a recent enCaption demonstration. It cites accuracy rates in the
85% to 95% range. The company is trying to get support from the hearing
loss community to implement this system, and the folks at the meeting
seemed to be supportive.
I wasn't there, so take my comments with a grain of salt, but I think
that accepting 85% to 95% accuracy for standard captioning is a big
mistake, and accepting it for emergency captioning is out of the
question. ENCO's questions to the audience included, "Is something
better than nothing?" The answer to that is probably
"Yes", but it's not really a fair question. The FCC currently
requires 100% captioning in emergency situations. The issue is not that
the law isn't clear, but that the TV stations are ignoring it and the
FCC seems to be giving them a pass on it. The proper goal here is to
strengthen the law to require VERY HIGH accuracy in emergency
situations, not to accept 85%.
I'm a big fan of VR technology, and I use it myself to caption
presentations. But I'm not convinced that even the systems that are
trained to a specific voice are ready for prime time, much less those
that are speaker independent.
Those are my thoughts. What are yours?
~~~~~~~~~~~~~~~~~~~
enCaption from ENCO Systems: Solution for TV Captioning During
Emergencies?
At the close of NVRC's December 9, 2004 meeting with representatives
of consumer organizations, area TV stations, and Federal Communications
Commission staff, ENCO Systems, Inc. demonstrated its enCaption system.
Don Backus, ENCO's Vice President of Sales & Marketing, told the
audience that ENCO has been around since the early 1980s. It got its
start with some industrial automation projects and in 1992 it moved into
digital delivery systems for broadcasters. Some current clients include
CNN, WTOP, ESPN and The Weather Channel.
For the past 10-15 years, ENCO has worked in its lab on speech
recognition technologies. The company believes it is finally at a point
where its enCaption product can provide an alternative to traditional
captioning systems now used for TV news.
Eugene Novacek, President of ENCO, did a 90-second demonstration of a
video recorded off the air in Detroit with audio processed through a
computer. The accuracy was in the mid-90% range. The enCaption system
basically listened to the audio and created text. He said that a box
small enough to be quite portable can now provide the power needed to do
speech recognition accurately, when just a few years ago the box would
have needed to be half the size of NVRC's large meeting room.
The enCaption system doesn't require a phone line and a captioner at
the other end; it is a 'captioner in a box'.
ENCO's questions to the audience were:
"Is something better than nothing?"
"Are deaf and hard of hearing individuals interested in supporting this
kind of technology and encouraging broadcasters to utilize it?"
The following is the Q&A session with Gene Novacek.
Q: Does enCaption software require the system to be trained to each
individual person's voice?
A: No. The system is speaker independent. In fact, ENCO is now working
on a way to identify speaker names. When there is time in advance to
train it to recognize my voice, it will say Gene Novacek and the text
will come out very accurately, 85% and up. As it learns more and more,
it improves. This system is intended for breaking news and impromptu
voice, not for soap operas or production material.
Q: If voice recognition is not the issue, what accounts for the 10%
inaccuracy?
A: Mumbling. Bad audio. Two people talking at once, a very difficult
thing for a computer to discriminate. A news broadcast is going to be
one reporter talking, but every once in a while the greatest of
reporters may mumble, or the microphone may move. Technology is not
perfect. It will get better, but is this good enough? Are we close
enough? If not, I'll go away for three or four years and come back when
it's at a higher level. It's taken 12-15 years to get this far. This is
the first time we've shown the system to this type of group.
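The report does not say how ENCO measured its accuracy figures. One
common way to score captions is word accuracy: line up the caption text
against a verbatim transcript and count the words that were substituted,
dropped or added. The Python sketch below illustrates that approach only;
it is not ENCO's method, and the function names and sample sentences are
invented for the example.

```python
def word_errors(reference, hypothesis):
    """Return (errors, reference_length), where errors is the minimum number
    of word substitutions, insertions and deletions separating the caption
    (hypothesis) from the verbatim transcript (reference)."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j caption words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from the caption
                curr[j - 1] + 1,     # extra word in the caption
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1], len(ref)


def word_accuracy(reference, hypothesis):
    """Word accuracy = 1 - errors / number of words actually spoken."""
    errors, spoken = word_errors(reference, hypothesis)
    return 1.0 - errors / spoken if spoken else 1.0


# Hypothetical transcript and caption output, for illustration only.
spoken_text = "a tornado warning is in effect for wayne county until nine"
caption_text = "a tornado warning is in affect for wayne county until"

print(word_errors(spoken_text, caption_text))
print(f"{word_accuracy(spoken_text, caption_text):.1%}")
```

In this made-up sample one word is wrong and one is missing out of eleven
spoken, so the caption scores about 82%. The missing word is the time the
warning ends, which shows why the kind of error can matter as much as the
percentage.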
Comment: Our organization has been behind speech recognition research
for years. A new relay service using speech recognition is a tremendous
hit with our community. It's not perfect -- it's less perfect than this
-- but for our population it can be used in a variety of situations, not
just breaking news. We're behind you 100%.
Response: Could I tell the television broadcasters that? Broadcasters
spend millions to provide the best. When I show them something that's
only 90% accurate or whatever it is, they say 'I can't put that on
there; somebody will complain.'
Comment: You're asking if the consumer organizations would support
that kind of technology? The short answer is yes. We give you the full
support if you can improve the technology. It will be very interesting
to get more information within this area.
Comment: This is very exciting, but I've seen things come and go, and
I'm very skeptical. I would need to see a live demonstration before I
could give 100% support.
Q: I notice there is a slight delay from the time words are spoken
until they appear as captions. Why is there a delay?
A: I taught the system the English language. It's not just spitting out
a word at a time, like reading a book. It's reading four seconds' worth
of data and putting it into context and spitting it out. With more
development, which needs time and money, it could be improved. Money is
running out. Millions have been spent developing the system. It's been
tested in Michigan and some other places, but feedback from the deaf and
hard of hearing community hasn't been received yet. I'm interested in
whether the FCC representatives think this is a level of accuracy that
the FCC would be happy with.
Comment (Janet Sievert, FCC Disability Rights Office): Closed
captioning rules were established when closed captioning was new, and
they did not establish quality control. It's something we're looking at.
There are inaccuracies in live captioning as it is, and we get
complaints about those on a regular basis. But 90% is pretty good if
you're looking at standards.
Q: Is enCaption done entirely on site at the station?
A: Yes. It's a direct link from the microphones in the studio through
the master control, but with some audio processing so we don't have
music and applause. We're just trying to capture the speech from the
microphones, even the remote microphones. We've done a lot of testing
with that. The
audio is fed into the enCaption box at the television station site and
from that it is fed directly into the line 21 captioning encoder, which
is already at every TV station. Nothing is sent back to ENCO, other than
for maintenance, service and financial requirements. There's also a
link to the news room computer for the broadcasters, to improve the
accuracy. If I never said the word 'Schwarzenegger', it might caption as
'Schwarz in eggs'. So words can be fed in that might be expected.
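The report does not describe how the news room link works internally.
As a rough illustration of the idea, the Python sketch below scans a
script for words that are missing from a recognizer's word list so they
could be added before the broadcast. Everything here is an assumption
made for the example: the function name, the tiny word list and the
sample script are hypothetical and are not part of enCaption.

```python
import re


def new_vocabulary(script_text, known_words):
    """Return script words that are absent from the known word list,
    in order of first appearance and without repeats."""
    additions = []
    seen = set()
    for word in re.findall(r"[A-Za-z']+", script_text):
        key = word.lower()
        if key not in known_words and key not in seen:
            seen.add(key)
            additions.append(word)
    return additions


# Hypothetical word list and news room script, for illustration only.
known = {"arnold", "is", "the", "new", "governor", "of", "california"}
script = "Arnold Schwarzenegger is the new governor of California."

print(new_vocabulary(script, known))  # prints ['Schwarzenegger']
```

A real system would also need a pronunciation for each new word; this
sketch covers only the step of spotting which words are new.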
Q: Does your software work from a vocabulary base, like 30,000 words
built into the software?
A: Right now there's a library that is 100 million characters long. I
may have the same word over and over and over, but placed in context, so
it knows the difference between nouns and verbs and adjectives.
Captioning doesn't have upper and lower case requirements right now. We
did a lot of work to support upper-lower, should it some day be
required. The system also allows for automatic placement, so if there's
a crawl on the bottom of the screen that's being fed in, our captioning
doesn't automatically go to the bottom. We have heard of these problems
over the years and have tried to account for them.
Q: What about punctuation? If you have a long stream of words and
it's not broken up, it's hard to read. I didn't notice any punctuation.
A: That's correct. It's an ongoing effort. The next step is speaker
identification to let you know when a speaker has changed, and then we
will work on punctuation. The only way we can detect punctuation is by
pauses. I didn't want to hold up the whole product for that. Money
generated from selling the service will go to improve the product.
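To make the pause idea concrete, here is a minimal Python sketch that
ends a sentence whenever the gap between two words exceeds a threshold.
It assumes the recognizer reports a start and end time for each word;
the 0.6-second threshold, the function name and the sample timings are
all invented for illustration and do not come from ENCO.

```python
def punctuate_by_pauses(timed_words, pause_threshold=0.6):
    """Build captioned text from (word, start_seconds, end_seconds) tuples,
    closing a sentence wherever the silence before the next word is at
    least pause_threshold seconds (an assumed, tunable value)."""
    sentences = []
    current = []
    for i, (word, start, end) in enumerate(timed_words):
        current.append(word)
        is_last = i == len(timed_words) - 1
        # Gap between the end of this word and the start of the next one.
        gap = None if is_last else timed_words[i + 1][1] - end
        if is_last or gap >= pause_threshold:
            sentence = " ".join(current)
            sentences.append(sentence[0].upper() + sentence[1:] + ".")
            current = []
    return " ".join(sentences)


# Hypothetical word timings, for illustration only.
words = [
    ("the", 0.0, 0.2), ("storm", 0.25, 0.6), ("is", 0.65, 0.8),
    ("coming", 0.85, 1.3), ("stay", 2.1, 2.4), ("inside", 2.45, 3.0),
]

print(punctuate_by_pauses(words))  # prints: The storm is coming. Stay inside.
```

Pauses alone cannot tell a period from a comma or a question mark, and a
speaker who hesitates mid-sentence would trigger a false break, which may
be part of why punctuation is described as an ongoing effort.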
Q: Will companies then be able to get upgrades?
A: Yes. Every time I come up with something new, all the systems would
get upgraded.
Q: Is there a means for automatic correction like with CART when a
word is misspelled, to go back and fix it?
A: No.
Q: So it would go out that way until you go in and physically correct
it?
A: The computer would do that automatically. We don't want to have news
people required to be on site to sit at this thing. So with a line to
the news room computer, the script for Arnold becoming governor of
California would be in that computer. The enCaption system would grab it
and know it's spelled correctly. If something comes up that is brand
new, it may be spelled phonetically. It's an evolutionary thing. It will
never be worse than it is today; it can only get better.
For more information:
The enCaption website, www.encaption.com, has TV segments and
demonstrations in English and other languages. ENCO supports seven other
languages. You can download a brochure about enCaption. There's also a
price list; prices are based on the size of the station's market and
whether it captions 30 minutes or less of news each day.
(c)2004 by Northern Virginia Resource Center for Deaf and Hard of
Hearing Persons (NVRC), www.nvrc.org. When sharing this information,
please ensure credit is given to NVRC.