Comparison of Various Evaluation Benchmarks. We compare Flores-101 to a variety of popular existing translation benchmarks, indicating language coverage, topic diversity, whether many-to-many translation is supported, whether the translations are manually aligned by humans, and whether document-level or multimodal translation tasks are supported.
Benchmark | # Languages | Diverse Topics | Many-to-Many | Manual Alignments | Document-Level | Multimodal |
---|---|---|---|---|---|---|
FLORES v1 (Guzmán et al., 2019) | 2 | ✓ | ✗ | ✓ | ✗ | ✗ |
AmericasNLI (Ebrahimi et al., 2021) | 10 | ✓ | ✓ | ✓ | ✗ | ✗ |
ALT (Riza et al., 2016) | 13 | ✓ | ✓ | ✓ | ✗ | ✗ |
Europarl (Koehn, 2005) | 21 | ✗ | ✓ | ✗ | ✓ | ✗ |
TICO-19 (Anastasopoulos et al., 2020) | 36 | ✗ | ✓ | ✓ | ✗ | ✗ |
OPUS-100 (Zhang et al., 2020) | 100 | ✓ | ✓ | ✗ | ✗ | ✗ |
M2M (Fan et al., 2020) | 100 | ✗ | ✓ | ✓✗ | ✗ | ✗ |
Flores-101 | 101 | ✓ | ✓ | ✓ | ✓ | ✓ |