An artificial neural network model is proposed that combines several aspects taken from physiological observations (oscillations, synchronization) with a visual latency mechanism in order to improve the analysis of visual scenes. The network consists of two parts. In the lower layers, which contain no lateral connections, the propagation velocity of the unit activity depends on the contrast of the individual objects in the scene. In the upper layers, lateral connections are used to achieve synchronization between corresponding image parts. This architecture ensures that the activity arising in response to a scene containing objects of different contrast is spread out over several layers of the network. Adjacent objects of different contrast are thereby separated, and synchronization occurs in the upper layers without mutual disturbance between different objects. A comparison with a one-layer network shows that synchronization in the latency-dependent multilayer network is indeed achieved much faster as soon as more than five objects have to be recognized. In addition, it is shown that the network is highly robust against noise in the stimuli and against variations in the propagation delays (latencies). For a consistent analysis of a visual scene, the different features of an individual object have to be recognized as belonging together and as separate from other objects. This study shows that temporal differences, naturally introduced by stimulus latencies in every biological sensory system, can strongly improve performance and allow more complex scenes to be analyzed.
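
The separation of objects by contrast-dependent propagation can be illustrated with a minimal sketch. It is not the model of this study: the linear relation between contrast and propagation velocity, the parameter `gain`, and the function names `layer_reached` and `separate_by_layer` are assumptions made only for illustration. The sketch shows how, at a fixed time, objects of different contrast would occupy different layers, while objects of equal contrast would share a layer.

```python
from collections import defaultdict
import math


def layer_reached(contrast, t, n_layers, gain=1.0):
    """Index of the layer an object's activity has reached at time t.

    Assumption (hypothetical): propagation velocity, in layers per time
    step, is proportional to the contrast of the object.
    """
    velocity = gain * contrast
    # Activity cannot travel beyond the top layer of the network.
    return min(math.floor(velocity * t), n_layers - 1)


def separate_by_layer(contrasts, t, n_layers, gain=1.0):
    """Group object names by the layer their activity occupies at time t."""
    layers = defaultdict(list)
    for name, contrast in contrasts.items():
        layers[layer_reached(contrast, t, n_layers, gain)].append(name)
    return dict(layers)


# Three objects with different (illustrative) contrast values.
scene = {"A": 0.875, "B": 0.5, "C": 0.25}
result = separate_by_layer(scene, t=4, n_layers=6)
print(result)  # {3: ['A'], 2: ['B'], 1: ['C']}
```

In this toy setting each object ends up alone in its layer, so a synchronization process with lateral connections restricted to one layer would not have to resolve interference between objects. Two objects of identical contrast would still share a layer and would have to be separated by other means.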