Audiovisual perception and imitation are essential for musical learning and skill acquisition. Using fMRI, we compared professional pianists with musically naive controls while they observed piano-playing finger–hand movements and serial finger–thumb opposition movements, both with and without synchronous piano sound. While observing piano playing, pianists showed stronger activation within a fronto-parieto-temporal network, both relative to controls and relative to perception of serial finger–thumb opposition movements. Observation of silent piano playing additionally recruited auditory areas in pianists. Perception of piano sounds coupled with serial finger–thumb opposition movements evoked increased activation within the sensorimotor network. These findings indicate that professional musical training leads to specialization of multimodal auditory–sensorimotor systems within a fronto-parieto-temporal network. Musical “language,” which is acquired by observation and imitation, seems to be tightly coupled to this network, in accord with an observation–execution system linking visual and auditory perception to motor performance.