Decoding speech from brain activity is a long-standing goal in health and neuroscience. Invasive devices have recently taken major steps in this direction: deep learning algorithms trained on intracranial recordings can now begin to decode basic linguistic features such as letters, words, and audio spectrograms.
Alexandre Défossez, Charlotte Caucheteux, Jérémy Rapin, Ori Kabeli, and Jean-Rémi King, from Meta (formerly Facebook) and the École Normale Supérieure in France, describe a computer model trained to decode representations of perceived speech from non-invasive recordings of a large cohort of healthy individuals.
To evaluate this approach, they used four public datasets comprising the recordings of 175 volunteers, collected by magnetoencephalography or electroencephalography while the participants listened to short stories and isolated sentences in English or Dutch. Their results show that, from 3 seconds of magnetoencephalographic signals, the model can identify the corresponding speech segment with up to 41% accuracy, which nevertheless remains below previous results. Moreover, the more complex the sentence, the less accurate the decoding, and the texts used here are very short.
An excellent example of previous work is the study recently published by Francis R. Willett and colleagues, which demonstrates a speech-to-text BCI that records spiking activity from intracortical microelectrode arrays.
Their participant, who can no longer speak intelligibly owing to bulbar-onset amyotrophic lateral sclerosis, had her attempted speech decoded at 62 words per minute, which is 3.4 times as fast as the previous record and begins to approach the speed of natural conversation (160 words per minute). Implantation requires surgeons to open the skull and find the optimal electrode location; once the operation is complete, the patient is left with a small box on the skull connected to a computer by cables. The box could certainly be miniaturized in the future and the cables replaced by Bluetooth or a similar radio link, yet the approach remains quite invasive. Extending it to non-invasive brain recordings is still a challenge.
"We were surprised by the decoding performance obtained," King said. "In most cases, we can retrieve what the participants hear, and if the decoder makes a mistake, it tends to be semantically similar to the target sentence."
"Our team is devoted to fundamental research: to understand how the brain works, and how this functioning can relate to and inform AI," King said. "There is a long road before a practical application, but our hope is that this development could help patients whose communication is limited or prevented by paralysis. The major next step, in this regard, is to move beyond decoding perceived speech and to decode produced speech."
However, such a system, if it relies on magnetoencephalography as suggested, is even less practicable than Willett's: even a miniaturized magnetoencephalography helmet currently requires supporting equipment weighing approximately one ton.
That said, 29% of the recordings used by the researchers were collected with EEG, which requires only a headset and standard computer equipment that a patient could plausibly use. Here too, the cables could be replaced by a wireless connection. EEG signals, however, are of lower quality than those obtained with an implant.
The road to easy-to-use brain-speech interfaces remains a long one.