The Pk evaluation metric, initially proposed by Beeferman, Berger, and Lafferty (1997), is becoming the standard measure for assessing text segmentation algorithms. However, a theoretical analysis of the metric finds several problems: the metric penalizes false negatives more heavily than false positives, overpenalizes near misses, and is affected by variation in segment size distribution. We propose a simple modification to the Pk metric that remedies these problems. This new metric—called Window Diff—moves a fixed-sized window across the text and penalizes the algorithm whenever the number of boundaries within the window does not match the true number of boundaries for that window of text.