ArXiv

Zero-Shot Imagined Speech Decoding via Imagined-to-Listened MEG Mapping

Authors
Maryam Maghsoudi, Shihab Shamma
Categories
cs.LG, eess.AS
arXiv
https://arxiv.org/abs/2605.08075v1
PDF
https://arxiv.org/pdf/2605.08075v1

Brief

Imagined speech decoding: the authors recorded paired listened and imagined MEG from trained musicians and built a three-stage pipeline that maps imagined MEG to listened MEG (six mapping models), decodes words with a contrastive listened-only decoder (four embedding strategies), and applies the decoder to mapped imagined data from held-out subjects. Proof-of-concept results show significant above-chance decoding and scalability with more training data; only the abstract was available for this summary.

Why it matters

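Imagined-speech datasets are scarce and hard to align in time across subjects and sessions, whereas recordings made during listening are richer and more reliably labeled. To bridge the two, the authors collected paired listened and imagined MEG recordings from trained musicians for rhythmic, melodic, and spoken stimuli, then trained six linear and neural models to map imagined MEG responses to listened responses.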
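The imagined-to-listened mapping step can be illustrated with a minimal sketch. The abstract does not name the six models or describe how the null baseline was built, so everything here is an assumption for illustration: closed-form ridge regression stands in for one linear mapping model, the data are synthetic, and the null is formed by pairing each prediction with the listened response to a different stimulus. The names `fit_ridge` and `mean_rowwise_corr` are hypothetical helpers.

```python
import numpy as np

def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge regression: W = (X^T X + alpha * I)^-1 X^T Y."""
    gram = X.T @ X + alpha * np.eye(X.shape[1])
    return np.linalg.solve(gram, X.T @ Y)

def mean_rowwise_corr(A, B):
    """Average Pearson correlation between matching rows of A and B."""
    A = A - A.mean(axis=1, keepdims=True)
    B = B - B.mean(axis=1, keepdims=True)
    num = (A * B).sum(axis=1)
    den = np.linalg.norm(A, axis=1) * np.linalg.norm(B, axis=1)
    return float(np.mean(num / den))

# Synthetic stand-ins for flattened MEG features (assumption, not real data).
rng = np.random.default_rng(0)
n_train, n_test, n_feat = 400, 100, 32
true_map = rng.normal(size=(n_feat, n_feat))

imag_train = rng.normal(size=(n_train, n_feat))
list_train = imag_train @ true_map + 0.5 * rng.normal(size=(n_train, n_feat))
imag_test = rng.normal(size=(n_test, n_feat))    # plays the held-out subject
list_test = imag_test @ true_map + 0.5 * rng.normal(size=(n_test, n_feat))

# Fit the mapping on training data, predict listened responses for held-out data.
W = fit_ridge(imag_train, list_train, alpha=10.0)
pred = imag_test @ W

# Matched score: prediction vs. listened response to the same stimulus.
matched_score = mean_rowwise_corr(pred, list_test)

# Null scores: prediction vs. listened response to a randomly chosen stimulus.
null_scores = [
    mean_rowwise_corr(pred, list_test[rng.permutation(n_test)])
    for _ in range(200)
]
null_mean = float(np.mean(null_scores))
```

If the predicted listened responses keep stimulus-specific information, `matched_score` should sit well above the distribution in `null_scores`.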

Key details

  • A contrastive word decoder was trained solely on listened MEG and evaluated with four embedding strategies (including semantic, acoustic, and phonetic representations). When imagined responses from held-out subjects were mapped to predicted listened responses and passed to this decoder, words were decoded significantly above chance under a rank-based analysis, and performance improved with more training data.

Source evidence

Abstract

Decoding imagined speech from non-invasive brain recordings is challenging because imagined datasets are scarce and difficult to align temporally across subjects and sessions. In this work, we propose a new approach to the decoding of imagined speech that leverages the richer and more reliably labeled recordings during listening to speech. We collected paired listened and imagined MEG recordings to rhythmic, melodic, and spoken stimuli from trained musicians. Using trained musicians helped improve temporal alignment across conditions. We then developed a three-stage decoding pipeline that revealed consistent and meaningful relationships between neural activity evoked by imagining and listening to the same stimuli. First, we trained six linear and neural models to map imagined MEG responses to listened responses. We evaluated these models against a null baseline from unseen subjects to validate that the predicted-listening responses preserve stimulus-specific information. In the second stage, we trained a contrastive word decoder exclusively on the listened MEG responses, and evaluated it using four embedding strategies including semantic, acoustic, and phonetic representations. In the third stage, we process the imagined MEG responses from held-out subjects through the mapping pipeline to compute the corresponding listening responses that are then decoded by the listened decoder. Using rank-based analysis, we show that the imagined words are decodable significantly above chance. We shall report here the results of a proof-of-concept implementation to decode imagined speech, where all evaluations are performed on held-out subjects. We also demonstrate that performance improves with training data size, suggesting that this approach is scalable and can directly be made applicable to realistic brain-computer interface scenarios.
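The rank-based evaluation mentioned in the abstract can be sketched as follows. The abstract gives no details of the ranking metric or the significance test, so this is only one plausible reading: each decoder output is compared with every candidate word embedding by cosine similarity, the rank of the correct word is recorded, and the mean rank is tested against a label-permutation null. The embeddings and decoder outputs are synthetic, and `true_word_ranks` is a hypothetical helper.

```python
import numpy as np

def true_word_ranks(pred_emb, word_emb, true_idx):
    """Rank (1 = best) of the correct word among all candidates, by cosine similarity."""
    p = pred_emb / np.linalg.norm(pred_emb, axis=1, keepdims=True)
    w = word_emb / np.linalg.norm(word_emb, axis=1, keepdims=True)
    sims = p @ w.T                                   # shape (n_trials, n_words)
    true_sims = sims[np.arange(len(true_idx)), true_idx]
    # One plus the number of candidates scoring strictly higher than the true word.
    return 1 + (sims > true_sims[:, None]).sum(axis=1)

# Synthetic vocabulary and decoder outputs (assumptions, not the paper's data).
rng = np.random.default_rng(1)
n_words, dim, n_trials = 50, 16, 300
word_emb = rng.normal(size=(n_words, dim))
true_idx = rng.integers(0, n_words, size=n_trials)
pred_emb = 0.5 * word_emb[true_idx] + rng.normal(size=(n_trials, dim))

ranks = true_word_ranks(pred_emb, word_emb, true_idx)
observed = float(ranks.mean())
chance = (n_words + 1) / 2          # expected mean rank for a random decoder

# Permutation null: shuffle the trial labels and recompute the mean rank.
null = np.array([
    true_word_ranks(pred_emb, word_emb, rng.permutation(true_idx)).mean()
    for _ in range(1000)
])
# One-sided p-value: how often a shuffled labeling ranks as well as the real one.
p_value = (1 + int((null <= observed).sum())) / (1 + len(null))
```

A mean rank well below `chance` together with a small `p_value` is what "decodable significantly above chance" would look like under these assumptions.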