Many-to-Many Performance by Domain. We report spBLEU on three partitions of the Flores-101 devtest set, split according to the domain each sentence originates from. We compute the corpus spBLEU for each language in each domain, and then average across languages.
| | News | Junior | Voyage | Avg |
|---|---|---|---|---|
| Num. sentences | 993 | 1006 | 1002 | |
| English | 20.64 | 20.67 | 19.41 | 20.24 |
| English | 16.85 | 16.67 | 15.48 | 16.33 |
| Chinese | 11.57 | 9.66 | 9.55 | 10.26 |
| Chinese | 10.02 | 9.93 | 9.57 | 9.84 |
| Spanish | 14.91 | 13.80 | 13.23 | 13.98 |
| Spanish | 11.67 | 10.96 | 10.37 | 11.00 |
| Hindi | 14.33 | 14.15 | 13.84 | 14.11 |
| Hindi | 10.88 | 10.86 | 10.11 | 10.62 |
| Arabic | 8.39 | 8.23 | 7.74 | 8.12 |
| Arabic | 9.81 | 10.31 | 9.54 | 9.88 |
| Many-to-Many | 8.56 | 7.97 | 7.59 | |
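The two aggregation steps described in the caption can be sketched as follows. This is a minimal illustration, assuming the per-language, per-domain corpus spBLEU scores have already been computed; the function names and the `(language, domain)` dictionary layout are hypothetical and not taken from the Flores-101 tooling. The `Avg` column above is consistent with an unweighted mean over the three domains; recomputing it from the rounded table cells can differ in the last digit (for example, the second Arabic row gives 9.89 rather than the reported 9.88).

```python
from statistics import mean

# Domain labels of the three Flores-101 devtest partitions, as used in the table.
DOMAINS = ("News", "Junior", "Voyage")


def average_across_languages(scores):
    """Average per-domain scores across languages.

    `scores` is a hypothetical dict mapping (language, domain) -> corpus spBLEU,
    assumed to be precomputed for each language within each domain.
    Returns a dict mapping each domain to its mean over languages.
    """
    by_domain = {domain: [] for domain in DOMAINS}
    for (_language, domain), value in scores.items():
        by_domain[domain].append(value)
    # Skip domains with no scores so mean() is never called on an empty list.
    return {d: round(mean(v), 2) for d, v in by_domain.items() if v}


def row_average(row):
    """Unweighted mean of one table row over the three domains (the Avg column)."""
    return round(mean(row[domain] for domain in DOMAINS), 2)


if __name__ == "__main__":
    # First English row of the table.
    english = {"News": 20.64, "Junior": 20.67, "Voyage": 19.41}
    print(row_average(english))  # 20.24, matching the reported Avg
```

Weighting each domain by its sentence count would change little here, since the three partitions are close in size (993, 1006, and 1002 sentences).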