Results for verb-role SPs in the development partition of WSJ, the test partition of WSJ, and the Brown corpus. For each experiment, we show precision (P), recall (R), and F1. Values in boldface font are the highest in the corresponding column. F1 values marked with † are significantly lower than the highest F1 score in the same column.