During the past few decades, remarkable progress has been made in solving pattern recognition problems using networks of spiking neurons. However, pattern recognition that spans the full computational process, from sensory encoding to synaptic learning, remains underexplored, as most existing models and algorithms target only part of that process. Furthermore, many learning algorithms proposed in the literature neglect or pay little attention to sensory information encoding, which makes them incompatible with neurally realistic sensory signals encoded from real-world stimuli. By treating sensory coding and learning as a single systematic process, we attempt to build an integrated model based on spiking neural networks (SNNs) that performs sensory neural encoding and supervised learning with precisely timed sequences of spikes. With emerging evidence of precisely timed neural spiking activity, the view that information is represented by the explicit firing times of action potentials rather than by mean firing rates has been receiving increasing attention. In the proposed model, external sensory stimulation is first converted into spatiotemporal spike patterns using a latency-phase encoding method and then transmitted to the subsequent network for learning. Spiking neurons are trained to reproduce target signals encoded as precisely timed spikes. We show that when a supervised spike-timing-based learning rule is used, different spatiotemporal patterns are recognized by different output spike patterns with temporal precision on the order of milliseconds.
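The encoding stage can be illustrated with a minimal sketch. It assumes a linear intensity-to-latency mapping (stronger stimulus, earlier spike) followed by alignment of each spike to the next peak of a per-neuron cosine reference oscillation. The function name `latency_phase_encode`, the parameters `t_max`, `period`, and `phase_step`, and the linear mapping itself are hypothetical choices made for illustration; they are not specified in the text above and may differ from the encoding actually used in the model.

```python
import math

def latency_phase_encode(intensities, t_max=100.0, period=10.0, phase_step=None):
    """Illustrative latency-phase encoder (assumed form, not the paper's).

    intensities: values in [0, 1], one per encoding neuron.
    t_max:       latency in ms assigned to zero intensity (assumption).
    period:      period in ms of the reference oscillation (assumption).
    phase_step:  phase shift in radians between neighbouring neurons;
                 defaults to an even spread over one cycle (assumption).
    Returns one spike time in ms per neuron.
    """
    n = len(intensities)
    if phase_step is None:
        phase_step = 2.0 * math.pi / n
    spike_times = []
    for i, s in enumerate(intensities):
        if not 0.0 <= s <= 1.0:
            raise ValueError("intensities must lie in [0, 1]")
        # Latency step: stronger input produces an earlier raw spike time.
        latency = t_max * (1.0 - s)
        # Peaks of cos(2*pi*t/period + phase_i) occur at
        # t = (k - phase_i / (2*pi)) * period for integer k.
        phase_i = i * phase_step
        offset = (-phase_i / (2.0 * math.pi) * period) % period
        # Phase step: delay the spike to the first peak at or after the latency.
        k = math.ceil((latency - offset) / period)
        spike_times.append(offset + k * period)
    return spike_times
```

Calling `latency_phase_encode([0.9, 0.3, 0.6, 0.0])` returns one spike time per input, each delayed by less than one oscillation period relative to its raw latency, so the ordering imposed by stimulus strength is preserved up to that quantization.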