Brain decoding, reconstruction of visual perception // brain hacking

sbagency
4 min read · Feb 21, 2024


https://arxiv.org/pdf/2310.19812.pdf

In the past five years, the use of generative and foundational AI systems has greatly improved the decoding of brain activity. Visual perception, in particular, can now be decoded from functional Magnetic Resonance Imaging (fMRI) with remarkable fidelity. This neuroimaging technique, however, suffers from limited temporal resolution (≈0.5 Hz), which fundamentally constrains its real-time usage. Here, we propose an alternative approach based on magnetoencephalography (MEG), a neuroimaging device capable of measuring brain activity with high temporal resolution (≈5,000 Hz). For this, we develop an MEG decoding model trained with both contrastive and regression objectives and consisting of three modules: i) pretrained embeddings obtained from the image, ii) an MEG module trained end-to-end, and iii) a pretrained image generator. Our results are threefold: First, our MEG decoder shows a 7X improvement in image retrieval over classic linear decoders. Second, late brain responses to images are best decoded with DINOv2, a recent foundational image model. Third, image retrievals and generations both suggest that MEG signals primarily contain high-level visual features, whereas the same approach applied to 7T fMRI also recovers low-level features. Overall, these results provide an important step towards decoding, in real time, the visual processes continuously unfolding within the human brain.

This paper proposes a new approach for decoding and reconstructing visual perception in real time from brain activity measured with magnetoencephalography (MEG). The key contributions are:

1. Developing a deep learning pipeline consisting of three modules: pretrained image embeddings, an MEG module trained end-to-end, and a pretrained image generator (see the sketch after this list). This allows both image retrieval and generation conditioned on MEG signals.

2. Demonstrating a 7-fold improvement in image retrieval accuracy over classic linear decoders by leveraging the deep learning approach and pretrained vision models like VGG-19, CLIP and DINOv2.

3. Showing that MEG signals primarily encode high-level semantic features rather than low-level visual details, in contrast to 7T fMRI, which recovers both levels well.

4. Providing a temporally resolved analysis revealing peaks in decoding performance shortly after image onset and offset, suggesting that the underlying neural representations evolve dynamically (a toy per-window evaluation is sketched below).

5. Highlighting the potential of this work for real-time decoding of visual processes in the brain, which has applications in brain-computer interfaces and understanding visual cognition.
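
A minimal sketch of how such a pipeline could be wired up, in PyTorch. All shapes and hyperparameters here (272 MEG channels, 181 time samples, a 768-dimensional embedding, the loss weighting and temperature) are illustrative assumptions, not the paper's exact values; the image embedding `z_img` would come from a frozen pretrained model such as CLIP or DINOv2.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MEGEncoder(nn.Module):
    """Maps an MEG window (channels x time) into the image-embedding space."""
    def __init__(self, n_channels=272, n_times=181, emb_dim=768):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(n_channels, 320, kernel_size=3, padding=1),
            nn.GELU(),
            nn.Conv1d(320, 320, kernel_size=3, padding=1),
            nn.GELU(),
        )
        self.head = nn.Linear(320 * n_times, emb_dim)

    def forward(self, meg):                      # meg: (batch, channels, time)
        return self.head(self.conv(meg).flatten(1))

def combined_loss(z_meg, z_img, temperature=0.1, alpha=0.5):
    """Contrastive (InfoNCE over the batch) plus regression (MSE) objective."""
    sims = F.normalize(z_meg, dim=-1) @ F.normalize(z_img, dim=-1).T
    targets = torch.arange(len(z_meg), device=z_meg.device)
    contrastive = F.cross_entropy(sims / temperature, targets)
    return alpha * contrastive + (1 - alpha) * F.mse_loss(z_meg, z_img)

def retrieve(z_meg, z_gallery):
    """Ranks candidate images for each MEG trial by cosine similarity."""
    sims = F.normalize(z_meg, dim=-1) @ F.normalize(z_gallery, dim=-1).T
    return sims.argsort(dim=-1, descending=True)  # (n_trials, n_gallery)
```

For generation rather than retrieval, the predicted embedding would then condition a pretrained image generator (module iii in the abstract), e.g. a latent diffusion model that accepts image embeddings.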

The paper outlines an important step towards reconstructing visual experience from non-invasive brain recordings in real time, paving the way for neuroscientific insights and assistive technologies.
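
For the temporally resolved analysis in point 4, the idea is to decode one embedding per sliding time window and score each window separately. A toy evaluation in that spirit, assuming the per-window MEG embeddings have already been computed (hypothetical helper, not the paper's code):

```python
import torch
import torch.nn.functional as F

def windowed_retrieval_accuracy(z_meg_windows, z_img):
    """z_meg_windows: (n_windows, n_trials, emb_dim) MEG embeddings decoded
    from successive time windows; z_img: (n_trials, emb_dim) image embeddings.
    Returns top-1 retrieval accuracy per window, tracing how decodability
    rises and falls around image onset and offset."""
    z_img = F.normalize(z_img, dim=-1)
    correct = torch.arange(z_img.shape[0])
    accs = []
    for z in z_meg_windows:
        sims = F.normalize(z, dim=-1) @ z_img.T   # (n_trials, n_trials)
        accs.append((sims.argmax(dim=-1) == correct).float().mean())
    return torch.stack(accs)                      # (n_windows,)
```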

https://www.nature.com/articles/s41598-023-42891-8

In neural decoding research, one of the most intriguing topics is the reconstruction of perceived natural images based on fMRI signals. Previous studies have succeeded in re-creating different aspects of the visuals, such as low-level properties (shape, texture, layout) or high-level features (category of objects, descriptive semantics of scenes) but have typically failed to reconstruct these properties together for complex scene images. Generative AI has recently made a leap forward with latent diffusion models capable of generating high-complexity images. Here, we investigate how to take advantage of this innovative technology for brain decoding. We present a two-stage scene reconstruction framework called “Brain-Diffuser”. In the first stage, starting from fMRI signals, we reconstruct images that capture low-level properties and overall layout using a VDVAE (Very Deep Variational Autoencoder) model. In the second stage, we use the image-to-image framework of a latent diffusion model (Versatile Diffusion) conditioned on predicted multimodal (text and visual) features, to generate final reconstructed images. On the publicly available Natural Scenes Dataset benchmark, our method outperforms previous models both qualitatively and quantitatively. When applied to synthetic fMRI patterns generated from individual ROI (region-of-interest) masks, our trained model creates compelling “ROI-optimal” scenes consistent with neuroscientific knowledge. Thus, the proposed methodology can have an impact on both applied (e.g. brain–computer interface) and fundamental neuroscience.

This paper presents a new framework called “Brain-Diffuser” for reconstructing natural scene images from fMRI brain activity. The framework consists of two stages (a minimal sketch follows the list):

1. A Very Deep Variational Autoencoder (VDVAE) is used to generate an initial rough reconstruction that captures low-level details like layout and shapes from the fMRI data.

2. The Versatile Diffusion latent diffusion model conditions the image generation process on both the initial VDVAE reconstruction and predicted CLIP vision and text features from the fMRI data. This allows generating naturalistic final reconstructions with accurate high-level semantics.
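
A minimal sketch of the two-stage pipeline. The ridge regressions from fMRI voxels to model features follow the paper's general recipe, but the regularization strengths are illustrative, and `vdvae_decode` and `versatile_diffusion_img2img` are hypothetical stand-ins for the actual VDVAE and Versatile Diffusion interfaces:

```python
import numpy as np
from sklearn.linear_model import Ridge

def fit_regressors(X, vdvae_latents, clip_vision, clip_text):
    """X: (n_trials, n_voxels) fMRI responses; targets are flattened features
    extracted from the training images/captions by the frozen models."""
    return (Ridge(alpha=5e4).fit(X, vdvae_latents),
            Ridge(alpha=6e4).fit(X, clip_vision),
            Ridge(alpha=6e4).fit(X, clip_text))

def reconstruct(x, reg_vae, reg_cv, reg_ct,
                vdvae_decode, versatile_diffusion_img2img):
    # Stage 1: rough low-level image decoded from predicted VDVAE latents.
    init_image = vdvae_decode(reg_vae.predict(x))
    # Stage 2: image-to-image diffusion guided by predicted CLIP features.
    return versatile_diffusion_img2img(
        init_image=init_image,
        clip_vision=reg_cv.predict(x),
        clip_text=reg_ct.predict(x),
        strength=0.75,  # illustrative: how far to diffuse from the init image
    )
```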

The Brain-Diffuser model outperforms previous methods on both qualitative and quantitative metrics on the Natural Scenes Dataset benchmark of complex scene images. The paper also demonstrates using the model to visualize “ROI-optimal” stimuli for different brain regions by passing synthetic fMRI patterns, built from individual region-of-interest masks, through the trained pipeline. Overall, this work shows that latent diffusion models can significantly advance the state of the art in reconstructing natural images from brain activity.
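
That ROI visualization can be reproduced in spirit with a synthetic input: activate only the voxels inside one region-of-interest mask and run the pattern through the trained pipeline (hypothetical helper; the mask format and activation amplitude are assumptions):

```python
import numpy as np

def roi_pattern(roi_mask, amplitude=1.0):
    """roi_mask: boolean array (n_voxels,) marking voxels inside the ROI.
    Returns a synthetic fMRI pattern that activates only that region."""
    x = np.zeros((1, roi_mask.shape[0]), dtype=np.float32)
    x[0, roi_mask] = amplitude
    return x  # feed to `reconstruct` above to render an "ROI-optimal" scene
```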
