Overview of evaluation methods for each criterion.
Criterion | Automatic Evaluation | Human Evaluation |
---|---|---|
Overall | BLEU with gold references | Rating or ranking |
- Transferred Style Strength | Accuracy by a separately trained style classifier | Rating or ranking |
- Semantic Preservation | BLEU/ROUGE/etc. with (modified) inputs | Rating or ranking |
- Fluency | Perplexity by a separately trained language model | Rating or ranking |
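
The three automatic measures in the table can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the implementation used by any particular paper: `style_accuracy`, `unigram_precision`, and `perplexity` are hypothetical helper names, the `classify` callable stands in for a separately trained style classifier, the token log-probabilities are assumed to come from a separately trained language model, and clipped unigram precision is used as a simplified stand-in for full BLEU (no higher-order n-grams, no brevity penalty).

```python
import math
from collections import Counter
from typing import Callable, Sequence


def style_accuracy(
    outputs: Sequence[str],
    target_style: str,
    classify: Callable[[str], str],
) -> float:
    """Fraction of outputs that the (assumed, pre-trained) classifier
    labels with the target style."""
    if not outputs:
        raise ValueError("outputs must be non-empty")
    hits = sum(1 for text in outputs if classify(text) == target_style)
    return hits / len(outputs)


def unigram_precision(candidate: str, reference: str) -> float:
    """Clipped unigram precision: the n=1 building block of BLEU.
    Simplified stand-in; real BLEU combines n=1..4 and a brevity penalty."""
    cand_tokens = candidate.split()
    if not cand_tokens:
        return 0.0
    ref_counts = Counter(reference.split())
    cand_counts = Counter(cand_tokens)
    # Each candidate token is credited at most as often as it occurs
    # in the reference (the "clipping" step).
    clipped = sum(min(n, ref_counts[tok]) for tok, n in cand_counts.items())
    return clipped / len(cand_tokens)


def perplexity(token_log_probs: Sequence[float]) -> float:
    """exp of the negative mean natural-log probability per token.
    Lower values indicate text the language model finds more fluent."""
    if not token_log_probs:
        raise ValueError("token_log_probs must be non-empty")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


if __name__ == "__main__":
    # Toy keyword rule standing in for a trained style classifier.
    def toy_classifier(text: str) -> str:
        return "polite" if "please" in text else "impolite"

    outputs = ["please sit", "sit down", "please wait"]
    print(style_accuracy(outputs, "polite", toy_classifier))
    # Semantic preservation: compare the output against its input.
    print(unigram_precision("the cat sat", "the cat sat down"))
    print(perplexity([math.log(0.5)] * 4))
```

For semantic preservation the reference passed to the overlap metric is the input sentence (or a modified version of it), whereas for the overall score it is a gold reference in the target style.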