Abstract
We propose two models for verbalizing numbers, a key component in speech recognition and synthesis systems. The first model uses an end-to-end recurrent neural network. The second model, drawing inspiration from the linguistics literature, uses finite-state transducers constructed with a minimal amount of training data. While both models achieve near-perfect performance, the latter model can be trained using several orders of magnitude less data than the former, making it particularly useful for low-resource languages.
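To make the task concrete, below is a minimal rule-based Python sketch of number verbalization (mapping digit strings to number names). It is purely illustrative: the function name, English-only coverage, and the one-million cutoff are assumptions of this sketch, not the paper's RNN or FST models.

```python
# Toy number verbalizer: integers 0 <= n < 1,000,000 -> English number names.
# Illustrates the input/output behavior the paper's models must learn.

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]

def verbalize(n: int) -> str:
    """Return the English number name for 0 <= n < 1,000,000."""
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, rem = divmod(n, 10)
        return TENS[tens] + ("-" + ONES[rem] if rem else "")
    if n < 1000:
        hundreds, rem = divmod(n, 100)
        head = ONES[hundreds] + " hundred"
        return head + (" " + verbalize(rem) if rem else "")
    thousands, rem = divmod(n, 1000)
    head = verbalize(thousands) + " thousand"
    return head + (" " + verbalize(rem) if rem else "")

if __name__ == "__main__":
    for n in (7, 42, 123, 90017):
        print(n, "->", verbalize(n))  # e.g. 123 -> one hundred twenty-three
```

A hand-written verbalizer like this must be rebuilt for every language; the paper's point is that the FST approach achieves comparable coverage from a small amount of training data, while the RNN learns the same mapping end to end from much more.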
©2016 Association for Computational Linguistics. Distributed under a CC BY 4.0 license (https://creativecommons.org/licenses/by/4.0/legalcode).