We demonstrate a method for efficiently rendering the audio generated by graphical scenes containing a large number of sounding objects. This is achieved by using modal synthesis for rigid bodies and rendering only those modes judged audible to a user observing the scene. We show how modal excitations can be estimated and inaudible modes eliminated based on the masking characteristics of the human ear. We describe a novel technique for generating contact events by performing closed-form particle simulation and collision detection with the aid of programmable graphics hardware. We demonstrate the effectiveness of our system on suitably complex simulations.
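To make the core idea concrete, the following is a minimal sketch of modal synthesis with audibility culling: each mode is a damped sinusoid, and modes whose excitation falls below a threshold are skipped before synthesis. The function name, the mode bank, and the simple amplitude threshold are illustrative assumptions; the paper's actual culling uses perceptual masking, not a fixed cutoff.

```python
import math

def synthesize_modes(modes, duration, sample_rate=44100, amp_threshold=1e-4):
    """Render a bank of damped-sinusoid modes, skipping modes whose
    excitation amplitude falls below a crude audibility threshold.
    (A simplified stand-in for a perceptual masking test.)"""
    # Cull modes judged inaudible before paying the synthesis cost.
    audible = [(f, d, a) for (f, d, a) in modes if a >= amp_threshold]
    n = int(duration * sample_rate)
    out = [0.0] * n
    for f, d, a in audible:
        w = 2.0 * math.pi * f
        for i in range(n):
            t = i / sample_rate
            # Each mode contributes a * exp(-d t) * sin(2*pi*f t).
            out[i] += a * math.exp(-d * t) * math.sin(w * t)
    return out, len(audible)

# Hypothetical mode bank: (frequency in Hz, damping in 1/s, excitation amplitude).
modes = [(440.0, 8.0, 0.9), (1230.0, 15.0, 0.3), (5000.0, 40.0, 1e-6)]
samples, kept = synthesize_modes(modes, 0.1)
```

In this sketch the third mode is culled because its excitation is far below the threshold, so only two modes are actually summed into the output buffer.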