Tour into the picture (TIP), proposed by Horry et al. (Horry, Anjyo, & Arai, 1997, ACM SIGGRAPH '97 Conference Proceedings, 225–232), is a method for generating a sequence of walk-through images from a single reference image. By navigating a 3D scene model constructed from the image, TIP provides convincing 3D effects. This paper presents a comprehensive scheme for creating walk-through images from a video sequence by generalizing the idea of TIP. To address the problems that arise in dealing with a video sequence rather than a single image, the proposed scheme has the following features. First, it incorporates a new modeling scheme based on a vanishing circle identified in the video, assuming that the input video contains negligible motion parallax and that dynamic objects move on flat terrain. Second, we propose a novel scheme for automatic background detection from the video, based on a 4-parameter motion model and statistical background color estimation. Third, to assist the extraction of static or dynamic foreground objects from the video, we devise a semiautomatic boundary-segmentation scheme based on enhanced lane (Kang & Shin, 2002, Graphical Models, 64 (5), 282–303). The purpose of this work is to let users experience the feel of navigating into a video sequence with their own interpretation and imagination of a given scene. The proposed scheme covers various types of video of dynamic scenes, such as sports coverage, cartoon animation, and movies, in which objects continuously change their shapes and locations. It can also be used to produce a variety of synthetic video sequences by importing dynamic foreign objects and merging them with the original video.