Current head-mounted displays (HMDs) provide only a fixed lens focus, so viewers have to decouple their accommodation and vergence responses when viewing stereoscopic images presented on an HMD. This study investigates how long viewers take to fuse a pair of stereoscopic images displayed on an HMD when the accommodative demand is matched to the vergence demand and when it is not. Four test conditions, covering the full factorial combination of two accommodative demands (2.5 D and 0.5 D) and two vergence demands (2.5 MA and 0.5 MA), were investigated. The results indicate that the time viewers take to fuse a pair of stereoscopic images (i.e., the fusion time) is significantly shorter when the accommodative demand and the stereoscopic depth cues match. Further analysis suggests that an unnatural demand for the eyes to verge toward stereoscopic images whose stereo depth is farther than the accommodative demand is associated with significantly longer fusion times. The study thereby evaluates the potential benefits of a dynamically adjustable lens focus in future HMD designs.
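The 2 × 2 design can be made concrete with a minimal sketch. It relies only on the standard definitions of the diopter (D) and the meter angle (MA), both of which are the reciprocal of the viewing distance in meters, so 2.5 corresponds to 0.4 m and 0.5 to 2 m. The function names, dictionary keys, and relation labels below are illustrative assumptions, not the study's own terminology or analysis code.

```python
from itertools import product

# Demand levels taken from the abstract; everything else here is illustrative.
ACCOMMODATIVE_DEMANDS_D = (2.5, 0.5)   # diopters
VERGENCE_DEMANDS_MA = (2.5, 0.5)       # meter angles


def demand_to_distance_m(demand):
    """Convert a demand in D or MA to the equivalent distance in meters."""
    return 1.0 / demand


def describe_conditions():
    """Enumerate the four factorial conditions and label each one."""
    conditions = []
    for acc_d, verg_ma in product(ACCOMMODATIVE_DEMANDS_D, VERGENCE_DEMANDS_MA):
        focal_m = demand_to_distance_m(acc_d)     # distance the lens focus implies
        stereo_m = demand_to_distance_m(verg_ma)  # distance the stereo depth implies
        if acc_d == verg_ma:
            relation = "matched"
        elif stereo_m > focal_m:
            relation = "stereo depth farther than focus"
        else:
            relation = "stereo depth nearer than focus"
        conditions.append({
            "accommodation_D": acc_d,
            "vergence_MA": verg_ma,
            "focal_distance_m": focal_m,
            "stereo_distance_m": stereo_m,
            "relation": relation,
        })
    return conditions


if __name__ == "__main__":
    for condition in describe_conditions():
        print(condition)
```

Under this labeling, the condition with 2.5 D accommodation and 0.5 MA vergence is the one in which the stereo depth lies farther than the focal distance, which is the case the abstract associates with longer fusion times.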