This paper develops a method for 3D human body tracking using multiple cameras and an automatic evaluation method based on machine learning, with the aim of constructing a virtual reality (VR) dance self-training system for fast-moving hip-hop dance. Dancers’ movements are input as time-series data describing temporal changes in joint positions and rotations, and are classified into the instructional items that coaches most frequently point out as areas for improvement in actual dance lessons. For automatic dance evaluation, contrastive learning is used to obtain better representation vectors from less data. Accuracy with contrastive learning was 0.79, a substantial improvement over 0.65 without it. In addition, because a coach provides a model performance of each dance, using the difference between the representation vectors of the coach's and the user's movement data as input improved the accuracy slightly further, to 0.84. Eight subjects used the VR dance training system, and the results of a questionnaire survey indicated that the system is effective.
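As a rough illustration of the two ideas summarized above, the following minimal sketch shows a contrastive objective over expression (representation) vectors and the coach-minus-user difference feature. The abstract does not state which contrastive loss, encoder, or vector dimensionality is used, so the InfoNCE-style loss, the temperature value, and the function names `info_nce_loss` and `difference_feature` are assumptions made purely for illustration, not the authors' implementation.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def info_nce_loss(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style loss for one anchor (assumed loss, not from the paper).

    The loss is small when the anchor is closer to the positive sample
    (e.g. a movement with the same instructional label) than to the negatives.
    """
    pos = math.exp(cosine_similarity(anchor, positive) / temperature)
    neg = sum(
        math.exp(cosine_similarity(anchor, n) / temperature) for n in negatives
    )
    return -math.log(pos / (pos + neg))


def difference_feature(coach_vec, user_vec):
    """Element-wise difference between coach and user vectors.

    Hypothetical helper: this difference would be fed to the classifier
    instead of the user's vector alone.
    """
    return [c - u for c, u in zip(coach_vec, user_vec)]


# Toy 2-D vectors standing in for encoder outputs (illustrative values only).
coach = [1.0, 0.0]
user = [0.8, 0.6]
print(difference_feature(coach, user))
print(info_nce_loss(coach, [0.9, 0.1], [[0.0, 1.0], [-1.0, 0.0]]))
```

In a real system the vectors would come from an encoder trained on the joint position and rotation time series; the toy values here only demonstrate the arithmetic.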