For many people who are paralyzed and unable to speak, signals of what they'd like to say hide in their brains. No one has been able to decipher those signals directly. But three research teams recently made progress in turning data from electrodes surgically placed on the brain into computer-generated speech. Using computational models known as neural networks, they reconstructed words and sentences that were, in some cases, intelligible to human listeners.
People who have lost the ability to speak after a stroke or disease can use their eyes or make other small movements to control a cursor or select on-screen letters. (Cosmologist Stephen Hawking tensed his cheek to trigger a switch mounted on his glasses.) But if a brain-computer interface could re-create their speech directly, they might regain much more: control over tone and inflection, for example, or the ability to interject in a fast-moving conversation.
The hurdles are high. "We are trying to work out the pattern of … neurons that turn on and off at different time points, and infer the speech sound," says Nima Mesgarani, a computer scientist at Columbia University. "The mapping from one to the other is not very straightforward." How these signals translate to speech sounds varies from person to person, so computer models must be "trained" on each individual. And the models do best with extremely precise data, which requires opening the skull.
Researchers can do such invasive recording only in rare cases. One is during the removal of a brain tumor, when electrical readouts from the exposed brain help surgeons locate and avoid key speech and motor areas. Another is when a person with epilepsy is implanted with electrodes for several days to pinpoint the origin of seizures before surgical treatment. "We have, at maximum, 20 minutes, maybe 30," for data collection, Martin says. "We're really, really limited."
The groups behind the new papers made the most of precious data by feeding the information into neural networks, which process complex patterns by passing information through layers of computational "nodes." The networks learn by adjusting connections between nodes. In the experiments, networks were exposed to recordings of speech that a person produced or heard and data on simultaneous brain activity.
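To make that idea concrete, here is a minimal sketch, in PyTorch, of how such a mapping might be trained. It is not any of the teams' actual models: the electrode count, window length, spectrogram size, and architecture are invented placeholders, and random tensors stand in for the paired recordings of brain activity and speech.

```python
import torch
import torch.nn as nn

# Placeholder dimensions; none of these come from the papers.
N_ELECTRODES = 64   # channels of intracranial recording
N_TIMESTEPS = 100   # brain-activity samples per training window
N_SPEC_BINS = 32    # frequency bins of the target speech spectrogram

# A small network mapping a window of neural activity to one spectrogram
# frame: information flows through layers of "nodes" (units), and learning
# adjusts the connections (weights) between them.
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(N_ELECTRODES * N_TIMESTEPS, 256),
    nn.ReLU(),
    nn.Linear(256, N_SPEC_BINS),
)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# Stand-in data: random tensors in place of real simultaneous recordings
# of brain activity and the speech the person produced or heard.
brain_activity = torch.randn(8, N_ELECTRODES, N_TIMESTEPS)  # batch of 8 windows
target_spectrogram = torch.randn(8, N_SPEC_BINS)

for step in range(100):
    predicted = model(brain_activity)
    loss = loss_fn(predicted, target_spectrogram)  # mismatch with actual speech
    optimizer.zero_grad()
    loss.backward()   # compute how each connection contributed to the error
    optimizer.step()  # adjust the connections to reduce the mismatch
```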
Mesgarani's team relied on data from five people with epilepsy. Their network analyzed recordings from the auditory cortex (which is active during both speech and listening) as those patients heard recordings of stories and people naming digits from zero to nine. The computer then reconstructed spoken numbers from neural data alone; when the computer "spoke" the numbers, a group of listeners named them with 75% accuracy.
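That 75% figure is simply the fraction of listener responses matching the digit that was actually heard, against a 10% chance level for ten digits. A toy illustration of the scoring with invented data (not the study's):

```python
import random

random.seed(0)

# Invented stand-in data: the digits patients heard, and what listeners
# reported after hearing the network's reconstructions.
true_digits = [random.randint(0, 9) for _ in range(40)]
# Simulate listeners who identify most reconstructions correctly.
responses = [d if random.random() < 0.75 else random.randint(0, 9)
             for d in true_digits]

accuracy = sum(r == d for r, d in zip(responses, true_digits)) / len(true_digits)
print(f"Listener accuracy: {accuracy:.0%} (chance for ten digits: 10%)")
```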
Finally, neurosurgeon Edward Chang of the University of California, San Francisco, and his team reconstructed entire sentences from brain activity captured in speech and motor areas while three epilepsy patients read aloud. In an online test, 166 people heard one of the sentences and had to select it from among 10 written choices; some sentences were correctly identified more than 80% of the time. The researchers pushed the model further, using it to re-create sentences from data recorded while people silently mouthed words. That's an important result, Herff says: "one step closer to the speech prosthesis we all have in mind."