(a) To obtain the weight matrix, we first take the convolution of video frames with features from the feature basis (e.g., ). We then consider the convolution of these convolved image frames to detect feature co-occurrence (e.g., ). (b) Schematic of how weights are represented. Normalized convolutions between patches separated by the same spatial and temporal distances are averaged and stored in the corresponding entry of the weight matrix. (c) Top: Static weights for the data set of images of bars. Bottom: Moving weights for the data set of videos of bars. (d) Top: Static weights for the data set of natural images/videos during horizontal motion only. Bottom: Moving weights for the same data set. (e) Sparse versions of slices from the static and moving weights for the data sets of natural images/videos during horizontal motion. Weights were kept only between neurons whose receptive fields lie at certain preselected, sufficiently separated locations in visual space; all other weights were discarded to satisfy the constraint that patches are independent. (f) The full (nonsparse) tensors , , and , ordered first by spatial position, then by filter.
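The procedure in panels (a) and (b) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names `filter_responses` and `cooccurrence_weights`, the z-score normalization, and the use of a temporal lag `dt` (0 for static weights, 1 for moving weights) are assumptions made for illustration only.

```python
import numpy as np

def filter_responses(video, filters):
    """Responses of every filter at every valid position of every frame.

    video:   array of shape (T, H, W)
    filters: array of shape (K, h, w)
    returns: array of shape (K, T, H-h+1, W-w+1)

    Note: this is cross-correlation (no kernel flip); it equals
    convolution with the flipped filters.
    """
    _, h, w = filters.shape
    # All h x w patches per frame: shape (T, H-h+1, W-w+1, h, w)
    patches = np.lib.stride_tricks.sliding_window_view(video, (h, w), axis=(1, 2))
    return np.einsum('tyxhw,khw->ktyx', patches, filters)

def cooccurrence_weights(resp, max_shift, dt):
    """Average normalized co-occurrence of filter responses.

    Entry [i, j, dy+max_shift, dx+max_shift] is the mean product of the
    normalized response of filter i at (t, y, x) and of filter j at
    (t+dt, y+dy, x+dx), averaged over all positions where both exist.
    """
    K, T, Y, X = resp.shape
    # Assumed normalization: zero mean, unit variance per filter.
    z = resp - resp.mean(axis=(1, 2, 3), keepdims=True)
    z = z / (z.std(axis=(1, 2, 3), keepdims=True) + 1e-12)
    S = 2 * max_shift + 1
    W = np.zeros((K, K, S, S))
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            # Overlapping regions of the reference and the shifted responses.
            a = z[:, :T - dt, max(0, -dy):Y - max(0, dy), max(0, -dx):X - max(0, dx)]
            b = z[:, dt:, max(0, dy):Y - max(0, -dy), max(0, dx):X - max(0, -dx)]
            n = a[0].size  # number of averaged position pairs
            W[:, :, dy + max_shift, dx + max_shift] = np.einsum('ityx,jtyx->ij', a, b) / n
    return W
```

For a video of a vertical bar drifting one pixel to the right per frame, the static weights (`dt=0`) computed this way peak at zero horizontal offset, while the moving weights (`dt=1`) peak at an offset of one pixel in the direction of motion.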