Abstract
Motion is a crucial source of information for a variety of tasks in social interaction. How humans recognize complex articulated movements such as gestures or facial expressions remains largely unclear. There is an ongoing discussion about whether and how explicit low-level motion information, such as optical flow, is involved in the recognition process. Motivated by this discussion, we introduce a computational model that classifies the spatial configuration of gradient and optical flow patterns. The patterns are learned with an unsupervised algorithm based on translation-invariant nonnegative sparse coding, called VNMF, which extracts prototypical optical flow patterns shaped, for example, like moving heads or limb parts. A key element of the proposed system is a lateral inhibition term that suppresses activations of competing patterns during learning, leading to a small number of dominant and topologically sparse activations. We analyze the classification performance of the gradient and optical flow patterns on three real-world human action recognition data sets and one facial expression recognition data set. The results indicate that human actions can be recognized from gradient patterns alone, but that adding optical flow patterns increases classification performance. The combined patterns outperform other biologically inspired models and are competitive with current computer vision approaches.