How selective hearing works in the brain
April 2012
The longstanding mystery of how selective hearing works - how people can
tune in to a single speaker while tuning out their crowded, noisy environs -
is solved this week in the journal Nature by two scientists from the
University of California, San Francisco.
Psychologists have known for decades about the so-called "cocktail party
effect," a name that evokes the "Mad Men" era in which it was coined. It is
the remarkable human ability to focus on a single speaker in virtually any
environment - a classroom, sporting event or coffee bar - even if that
person's voice is seemingly drowned out by a jabbering crowd.
To understand how selective hearing works in the brain, UCSF neurosurgeon
Edward Chang, M.D., a faculty member in the UCSF Department of Neurological
Surgery and the Keck Center for Integrative Neuroscience, and UCSF
postdoctoral fellow Nima Mesgarani, Ph.D., worked with three patients who
were undergoing brain surgery for severe epilepsy.
Part of this surgery involves pinpointing the parts of the brain
responsible for the patients' disabling seizures. The UCSF epilepsy team finds those
locales by mapping the brain's activity over a week, with a thin sheet of up
to 256 electrodes placed under the skull on the brain's outer surface or
cortex. These electrodes record activity in the temporal lobe, home to the
auditory cortex.
UCSF is one of the few leading academic epilepsy centers where these advanced
intracranial recordings are done, and, Chang said, the ability to safely
record from the brain itself provides unique opportunities to advance our
fundamental knowledge of how the brain works.
"The combination of high-resolution brain recordings and powerful
decoding algorithms opens a window into the subjective experience of the
mind that we've never seen before," Chang said.
In the experiments, patients listened to two speech samples played to
them simultaneously in which different phrases were spoken by different
speakers. They were asked to identify the words they heard spoken by one of
the two speakers.
The authors then applied new decoding methods to "reconstruct" what the
subjects heard by analyzing their brain activity patterns. Strikingly, the
authors found that neural responses in the auditory cortex reflected only
the speech of the targeted speaker. They found that their decoding algorithm
could predict which speaker and even what specific words the subject was
listening to based on those neural patterns. In other words, they could tell
when the listener's attention strayed to another speaker.
"The algorithm worked so well that we could predict not only the correct
responses, but also even when they paid attention to the wrong word," Chang
said.
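The decoding idea described above can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the authors' code or data: the neural responses are synthetic, the decoder is assumed to be a ridge-regularized linear map from electrode activity to a sound spectrogram, and every name (fit_linear_decoder, decode_attended_speaker, simulate_demo) and every parameter value is hypothetical. The sketch reconstructs a spectrogram from simulated responses to a two-speaker mixture, then compares it with each speaker's own spectrogram to guess which one was attended.

```python
import numpy as np


def fit_linear_decoder(responses, spectrogram, ridge=1.0):
    """Fit a ridge-regression map from neural responses (time x electrodes)
    to a spectrogram (time x frequency bands)."""
    n_electrodes = responses.shape[1]
    gram = responses.T @ responses + ridge * np.eye(n_electrodes)
    return np.linalg.solve(gram, responses.T @ spectrogram)


def correlation(a, b):
    """Pearson correlation between two arrays after flattening."""
    return float(np.corrcoef(a.ravel(), b.ravel())[0, 1])


def decode_attended_speaker(responses, decoder, spec_a, spec_b):
    """Reconstruct a spectrogram from the responses and report which
    speaker's spectrogram it resembles more closely."""
    reconstruction = responses @ decoder
    corr_a = correlation(reconstruction, spec_a)
    corr_b = correlation(reconstruction, spec_b)
    label = "A" if corr_a > corr_b else "B"
    return label, corr_a, corr_b


def simulate_demo(seed=0):
    """Run the sketch on synthetic data (all values are assumptions)."""
    rng = np.random.default_rng(seed)
    n_time, n_freq, n_electrodes = 400, 16, 64

    # Hypothetical forward model: each electrode responds to a random
    # weighted sum of frequency bands, plus recording noise.
    encoding = rng.normal(size=(n_freq, n_electrodes))

    def respond(spec):
        noise = 0.5 * rng.normal(size=(spec.shape[0], n_electrodes))
        return spec @ encoding + noise

    # Train the decoder on responses to a single speaker heard alone.
    train_spec = rng.normal(size=(n_time, n_freq))
    decoder = fit_linear_decoder(respond(train_spec), train_spec)

    # Two-speaker trial: the simulated listener attends to speaker B,
    # so speaker A contributes only weakly to the responses.
    spec_a = rng.normal(size=(n_time, n_freq))
    spec_b = rng.normal(size=(n_time, n_freq))
    mixed_responses = respond(spec_b + 0.2 * spec_a)

    return decode_attended_speaker(mixed_responses, decoder, spec_a, spec_b)


if __name__ == "__main__":
    label, corr_a, corr_b = simulate_demo()
    print(f"decoded attended speaker: {label}")
    print(f"correlation with A: {corr_a:.2f}, with B: {corr_b:.2f}")
```

In this toy setup the attended speaker dominates the simulated responses by construction, so the comparison is easy. With real recordings the difficulty lies in whether the cortical signal actually favors one speaker, which is what the study reports.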
The new findings show that the representation of speech in the cortex
does not reflect the entire external acoustic environment but instead
only what we really want or need to hear.
They represent a major advance in understanding how the human brain
processes language, with immediate implications for the study of impairment
during aging, attention deficit disorder, autism and language learning
disorders.
In addition, Chang, who is also co-director of the Center for Neural
Engineering and Prostheses at UC Berkeley and UCSF, said that we may someday
be able to use this technology in neuroprosthetic devices that decode the
intentions and thoughts of paralyzed patients who cannot communicate.
Revealing how our brains are wired to favor some auditory cues over
others may even inspire new approaches toward automating and improving
how voice-activated electronic interfaces filter sounds in order to properly
detect verbal commands.
How the brain can so effectively focus on a single voice is a problem of
keen interest to the companies that make consumer technologies because of
the tremendous future market for all kinds of electronic devices with
voice-activated interfaces. While the voice recognition technologies that
enable such interfaces as Apple's Siri have come a long way in the last few
years, they are nowhere near as sophisticated as the human speech system.
An average person can walk into a noisy room and have a private
conversation with relative ease - as if all the other voices in the room
were muted. In fact, said Mesgarani, an engineer with a background in
automatic speech recognition research, the engineering required to separate
a single intelligible voice from a cacophony of speakers and background
noise is a surprisingly difficult problem.
Speech recognition, he said, is "something that humans are remarkably
good at, but it turns out that machine emulation of this human ability is
extremely difficult."
The article, "Selective cortical representation of attended speaker in
multi-talker speech perception" by Mesgarani and Chang appears in the April
19 issue of the journal Nature.
This work was funded by the National Institutes of Health and the Esther
A. and Joseph Klingenstein Foundation.
UCSF is a leading university dedicated to promoting health worldwide
through advanced biomedical research, graduate-level education in the life
sciences and health professions, and excellence in patient care.
Source: UCSF