In a scientific first, Columbia neuroengineers have created a system that translates thought into intelligible, recognisable speech. By monitoring someone's brain activity, the technology can reconstruct the words a person hears with unprecedented clarity. This breakthrough, which harnesses the power of speech synthesisers and artificial intelligence, could lead to new ways for computers to communicate directly with the brain. It lays the groundwork for helping people who cannot speak, such as those living with amyotrophic lateral sclerosis (ALS) or recovering from a stroke, regain their ability to communicate with the outside world.
The findings have been published in the journal Nature. Nima Mesgarani, PhD, the paper’s senior author and a principal investigator at Columbia University’s Mortimer B. Zuckerman Mind Brain Behavior Institute said: “Our voices help connect us to our friends, family and the world around us, which is why losing the power of one’s voice due to injury or disease is so devastating. With today’s study, we have a potential way to restore that power. We’ve shown that, with the right technology, these people’s thoughts could be decoded and understood by any listener.”
Early efforts to decode brain signals by Dr. Mesgarani and others focused on simple computer models that analysed spectrograms, which are visual representations of sound frequencies.
But because this approach has failed to produce anything resembling intelligible speech, Dr. Mesgarani and his team, including the paper's first author Hassan Akbari, turned instead to a vocoder, a computer algorithm that can synthesize speech after being trained on recordings of people talking.
To teach the vocoder to interpret to brain activity, Dr. Mesgarani teamed up with Ashesh Dinesh Mehta, MD, PhD, a neurosurgeon at Northwell Health Physician Partners Neuroscience Institute and co-author of today’s paper. Dr. Mehta treats epilepsy patients, some of whom must undergo regular surgeries.
“Working with Dr. Mehta, we asked epilepsy patients already undergoing brain surgery to listen to sentences spoken by different people, while we measured patterns of brain activity”, said Dr. Mesgarani. “These neural patterns trained the vocoder.”
Next, the researchers asked those same patients to listen to speakers reciting digits between 0 to 9, while recording brain signals that could then be run through the vocoder. The sound produced by the vocoder in response to those signals was analyzed and cleaned up by neural networks, a type of artificial intelligence that mimics the structure of neurons in the biological brain.