Neural encoding and decoding provide perspectives for understanding neural representations of sensory inputs. Recent functional magnetic resonance imaging (fMRI) studies have succeeded in building prediction models for encoding and decoding numerous stimuli by representing a complex stimulus as a combination of simple elements. While arbitrary visual images were reconstructed using a modular model that combined the outputs of decoder modules for multiscale local image bases (elements), the shapes of the image bases were heuristically determined. In this work, we propose a method to establish mappings between the stimulus and the brain by automatically extracting modules from measured data. We develop a model based on Bayesian canonical correlation analysis, in which each module is modeled by a latent variable that relates a set of pixels in a visual image to a set of voxels in an fMRI activity pattern. The estimated mapping from a latent variable to pixels can be regarded as an image basis. We show that the model estimates a modular representation with spatially localized multiscale image bases. Further, using the estimated mappings, we derive encoding and decoding models that produce accurate predictions for brain activity and stimulus images. Our approach thus provides a novel means of revealing neural representations of stimuli by automatically extracting modules, which can be used to generate effective prediction models for encoding and decoding.