Abstract
This paper addresses the problem of determinizing finite-state automata that contain large numbers of ε-moves. Experiments with finite-state approximations of natural language grammars often produce very large automata that contain very many ε-moves. The paper identifies and compares several subset construction algorithms that treat ε-moves. Experiments indicate that the algorithms differ considerably in practice, both in the size of the resulting deterministic automaton and in practical efficiency. Furthermore, the experiments suggest that the average number of ε-moves per state can be used to predict which algorithm is likely to be fastest for a given input automaton.
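For readers unfamiliar with the technique, the following is a minimal sketch of one textbook way to handle ε-moves during subset construction: the ε-closure is computed for each subset as it is created. It is an illustration only and is not claimed to be any of the specific algorithms the paper compares. The transition encoding (a dict keyed by `(state, symbol)`), the `EPS` marker, and the function names `epsilon_closure`, `determinize`, and `eps_per_state` are all assumptions made for this example.

```python
from collections import deque

EPS = None  # hypothetical marker for an ε-move in the transition table


def epsilon_closure(states, delta):
    """Return every state reachable from `states` using only ε-moves."""
    closure = set(states)
    stack = list(states)
    while stack:
        q = stack.pop()
        for r in delta.get((q, EPS), ()):
            if r not in closure:
                closure.add(r)
                stack.append(r)
    return frozenset(closure)


def determinize(start, finals, delta, alphabet):
    """Subset construction with the ε-closure taken per subset.

    `delta` maps (state, symbol) to a set of target states.
    Returns (dfa_start, dfa_finals, dfa_delta), where each DFA state
    is a frozenset of states of the input automaton.
    """
    dfa_start = epsilon_closure({start}, delta)
    dfa_delta = {}
    dfa_finals = set()
    seen = {dfa_start}
    queue = deque([dfa_start])
    while queue:
        subset = queue.popleft()
        if subset & finals:
            dfa_finals.add(subset)
        for a in alphabet:
            # States reachable by one a-move from any member of the subset.
            moved = {r for q in subset for r in delta.get((q, a), ())}
            if not moved:
                continue
            target = epsilon_closure(moved, delta)
            dfa_delta[(subset, a)] = target
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return dfa_start, dfa_finals, dfa_delta


def eps_per_state(states, delta):
    """Average number of ε-moves per state of the input automaton."""
    n_eps = sum(len(targets) for (_, a), targets in delta.items() if a is EPS)
    return n_eps / len(states)


# Small automaton for a*b with one ε-move from state 0 to state 1.
delta = {(0, EPS): {1}, (1, "a"): {1}, (1, "b"): {2}}
dfa_start, dfa_finals, dfa_delta = determinize(0, {2}, delta, "ab")
```

The helper `eps_per_state` computes the statistic the abstract refers to, the average number of ε-moves per state; how that figure maps to the choice of algorithm is a result of the paper's experiments and is not encoded here.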
© 2000 Association for Computational Linguistics