Representation learning is a core component of data-driven modeling of complex phenomena. Learning a contextually informative representation can especially benefit the analysis of fMRI data because of the complexities and dynamic dependencies present in such datasets. In this work, we propose a transformer-based framework that learns an embedding of fMRI data by taking the spatio-temporal contextual information in the data into account. The approach takes the multivariate BOLD time series of brain regions and their functional connectivity network simultaneously as input and creates a set of meaningful features that can in turn be used in downstream tasks such as classification, feature extraction, and statistical analysis. The proposed spatio-temporal framework uses the attention mechanism together with a graph convolutional neural network to jointly inject contextual information about the dynamics of the time series and their connectivity into the representation. We demonstrate the benefits of this framework by applying it to two resting-state fMRI datasets, and we discuss its various aspects and its advantages over a number of other commonly adopted architectures.
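
To make the idea of combining graph convolution with attention concrete, the following is a minimal NumPy sketch and not the implementation from the paper. It assumes that functional connectivity is the absolute Pearson correlation between regions, that a single graph-convolution step followed by a single self-attention head over time points is used, and that weights are random. The names `normalized_connectivity`, `spatio_temporal_embedding`, `w_graph`, `w_q`, `w_k`, and `w_v` are hypothetical and chosen only for illustration.

```python
import numpy as np


def normalized_connectivity(bold):
    """Symmetric-normalized |Pearson correlation| between regions.

    bold: array of shape (T, N), T time points and N brain regions.
    Assumption: connectivity is plain correlation; the paper may use another estimator.
    """
    adj = np.abs(np.corrcoef(bold, rowvar=False))  # (N, N); unit diagonal acts as self-loops
    d_inv_sqrt = 1.0 / np.sqrt(adj.sum(axis=1))    # degrees are >= 1 because of the diagonal
    return adj * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


def spatio_temporal_embedding(bold, params):
    """Return a (T, d) embedding and the (T, T) temporal attention weights."""
    adj_norm = normalized_connectivity(bold)

    # Spatial context: mix each time point's regional signals over the
    # connectivity graph, then project to d features with a ReLU.
    tokens = np.maximum((bold @ adj_norm) @ params["w_graph"], 0.0)  # (T, d)

    # Temporal context: single-head scaled dot-product self-attention over time.
    q = tokens @ params["w_q"]
    k = tokens @ params["w_k"]
    v = tokens @ params["w_v"]
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)         # (T, T)
    return attn @ v, attn


# Illustrative usage on synthetic data (random numbers, not real fMRI).
rng = np.random.default_rng(0)
n_time, n_regions, d_model = 50, 8, 4
bold = rng.standard_normal((n_time, n_regions))
params = {
    "w_graph": rng.standard_normal((n_regions, d_model)) * 0.1,
    "w_q": rng.standard_normal((d_model, d_model)) * 0.1,
    "w_k": rng.standard_normal((d_model, d_model)) * 0.1,
    "w_v": rng.standard_normal((d_model, d_model)) * 0.1,
}
embedding, attention = spatio_temporal_embedding(bold, params)
print(embedding.shape, attention.shape)
```

In this sketch the spatial step comes before the temporal step, so each time-point token already carries connectivity-weighted information when attention is applied. This is one simple way to inject both kinds of context; the ordering and depth used in the actual framework may differ.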

Author notes

Handling Editor: Vince Calhoun

This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. For a full description of the license, please visit
