Zhaopeng Tu
Journal Articles
Salute the Classic: Revisiting Challenges of Machine Translation in the Age of Large Language Models
Transactions of the Association for Computational Linguistics (2025) 13: 73–95.
Published: 07 January 2025
Abstract
The evolution of Neural Machine Translation (NMT) has been significantly influenced by six core challenges (Koehn and Knowles, 2017) that have acted as benchmarks for progress in this field. This study revisits these challenges, offering insights into their ongoing relevance in the context of advanced Large Language Models (LLMs): domain mismatch, amount of parallel data, rare word prediction, translation of long sentences, attention model as word alignment, and sub-optimal beam search. Our empirical findings show that LLMs effectively reduce reliance on parallel data for major languages during pretraining and significantly improve the translation of long sentences containing approximately 80 words, even translating documents of up to 512 words. Despite these improvements, challenges in domain mismatch and rare word prediction persist. While NMT-specific challenges like word alignment and beam search may not apply to LLMs, we identify three new challenges in LLM-based translation: inference efficiency, translation of low-resource languages during pretraining, and human-aligned evaluation.
Journal Articles
Exploring Human-Like Translation Strategy with Large Language Models
Transactions of the Association for Computational Linguistics (2024) 12: 229–246.
Published: 08 March 2024
Abstract
Large language models (LLMs) have demonstrated impressive capabilities in general scenarios, exhibiting a level of aptitude that approaches, and in some aspects even surpasses, human-level intelligence. Among their numerous skills, the translation abilities of LLMs have received considerable attention. Compared to typical machine translation that focuses solely on source-to-target mapping, LLM-based translation can potentially mimic the human translation process, which might take preparatory steps to ensure high-quality translation. This work explores this possibility by proposing the MAPS framework, which stands for Multi-Aspect Prompting and Selection. Specifically, we enable LLMs first to analyze the given source sentence and induce three aspects of translation-related knowledge (keywords, topics, and relevant demonstrations) to guide the final translation process. Moreover, we employ a selection mechanism based on quality estimation to filter out noisy and unhelpful knowledge. Both automatic (3 LLMs × 11 directions × 2 automatic metrics) and human evaluation (preference study and MQM) demonstrate the effectiveness of MAPS. Further analysis shows that by mimicking the human translation process, MAPS reduces various translation errors such as hallucination, ambiguity, mistranslation, awkward style, untranslated text, and omission. Source code is available at https://github.com/zwhe99/MAPS-mt.
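As a rough illustration of the framework described in this abstract, the following Python sketch walks through the three MAPS steps: knowledge induction, knowledge-guided candidate translation, and QE-based selection. The prompt wording and the `llm`/`qe_score` callables are illustrative assumptions, not the authors' implementation; the linked repository contains the actual code.

```python
# Minimal sketch of a MAPS-style pipeline (illustrative, not the
# authors' implementation). `llm` is any instruction-following LLM;
# `qe_score` is any reference-free quality-estimation metric.
from typing import Callable

def maps_translate(
    source: str,
    src: str,
    tgt: str,
    llm: Callable[[str], str],
    qe_score: Callable[[str, str], float],
) -> str:
    # Step 1: analyze the source sentence and induce three aspects of
    # translation-related knowledge.
    keywords = llm(f"Extract keywords from this {src} sentence and give "
                   f"their {tgt} translations:\n{source}")
    topics = llm(f"What topics does this {src} sentence involve?\n{source}")
    demo = llm(f"Write a {src}-{tgt} sentence pair related in meaning "
               f"to:\n{source}")

    # Step 2: translate with each kind of knowledge (and with none),
    # yielding several candidate translations.
    base = f"Translate the following {src} sentence into {tgt}:\n{source}"
    candidates = [
        llm(base),
        llm(f"Keyword pairs: {keywords}\n{base}"),
        llm(f"Topics: {topics}\n{base}"),
        llm(f"Related example: {demo}\n{base}"),
    ]

    # Step 3: selection — quality estimation filters out noisy or
    # unhelpful knowledge by keeping the highest-scoring candidate.
    return max(candidates, key=lambda cand: qe_score(source, cand))
```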
Journal Articles
Learning to Remember Translation History with a Continuous Cache
Transactions of the Association for Computational Linguistics (2018) 6: 407–420.
Published: 01 July 2018
Abstract
Existing neural machine translation (NMT) models generally translate sentences in isolation, missing the opportunity to take advantage of document-level information. In this work, we propose to augment NMT models with a very light-weight cache-like memory network, which stores recent hidden representations as translation history. The probability distribution over generated words is updated online depending on the translation history retrieved from the memory, endowing NMT models with the capability to dynamically adapt over time. Experiments on multiple domains with different topics and styles show the effectiveness of the proposed approach with negligible impact on the computational cost.
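A minimal sketch of the cache idea in this abstract, assuming dot-product retrieval over recent hidden states and a scalar mixing weight; the paper's choice of keys and values and its learned gating network differ in detail.

```python
# Illustrative cache-like memory: store recent hidden representations
# with the words they produced, retrieve by similarity, and interpolate
# the word distribution online.
import numpy as np
from collections import deque

class TranslationCache:
    def __init__(self, capacity: int = 200):
        self.keys = deque(maxlen=capacity)    # recent hidden states (1-D arrays)
        self.values = deque(maxlen=capacity)  # ids of the words they generated

    def write(self, hidden: np.ndarray, word_id: int) -> None:
        self.keys.append(hidden)
        self.values.append(word_id)

    def read(self, query: np.ndarray, vocab_size: int) -> np.ndarray:
        """Word distribution induced by similarity to the cached states."""
        p_cache = np.zeros(vocab_size)
        if not self.keys:
            return p_cache
        scores = np.array([key @ query for key in self.keys])
        weights = np.exp(scores - scores.max())  # softmax over cache slots
        weights /= weights.sum()
        for weight, word_id in zip(weights, self.values):
            p_cache[word_id] += weight
        return p_cache

def combine(p_model: np.ndarray, p_cache: np.ndarray, gate: float) -> np.ndarray:
    # Online update of the output distribution; in the paper the mixing
    # weight is predicted by a small gating network rather than fixed.
    return (1.0 - gate) * p_model + gate * p_cache
```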
Journal Articles
Modeling Past and Future for Neural Machine Translation
Transactions of the Association for Computational Linguistics (2018) 6: 145–157.
Published: 01 March 2018
Abstract
Existing neural machine translation systems do not explicitly model what has been translated and what has not during the decoding phase. To address this problem, we propose a novel mechanism that separates the source information into two parts: translated Past contents and untranslated Future contents, which are modeled by two additional recurrent layers. The Past and Future contents are fed to both the attention model and the decoder states, which provides Neural Machine Translation (NMT) systems with the knowledge of translated and untranslated contents. Experimental results show that the proposed approach significantly improves the performance in Chinese-English, German-English, and English-German translation tasks. Specifically, the proposed model outperforms the conventional coverage model in terms of both the translation quality and the alignment error rate.
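A minimal PyTorch sketch of this mechanism, under simplified assumptions: the attention context at each step is what moves source content from the Future state into the Past state, and both states would additionally be fed to the attention model and the decoder. The exact update rules and dimensions in the paper differ.

```python
# Illustrative Past/Future tracker: two extra GRU layers follow what has
# been translated and what remains untranslated.
import torch
import torch.nn as nn

class PastFutureTracker(nn.Module):
    def __init__(self, ctx_dim: int, hid_dim: int):
        super().__init__()
        self.past_rnn = nn.GRUCell(ctx_dim, hid_dim)    # accumulates translated content
        self.future_rnn = nn.GRUCell(ctx_dim, hid_dim)  # sheds it from the untranslated part
        self.init_future = nn.Linear(ctx_dim, hid_dim)

    def init_states(self, src_summary: torch.Tensor):
        # Past starts empty; Future starts as a summary of the whole source.
        past = torch.zeros(src_summary.size(0), self.past_rnn.hidden_size,
                           device=src_summary.device)
        future = torch.tanh(self.init_future(src_summary))
        return past, future

    def step(self, attn_context, past, future):
        # Each decoding step "moves" the newly translated source content
        # (the attention context) from the Future state into the Past state.
        past = self.past_rnn(attn_context, past)
        future = self.future_rnn(attn_context, future)
        return past, future
```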
Journal Articles
Context Gates for Neural Machine Translation
Transactions of the Association for Computational Linguistics (2017) 5: 87–99.
Published: 01 March 2017
Abstract
In neural machine translation (NMT), generation of a target word depends on both source and target contexts. We find that source contexts have a direct impact on the adequacy of a translation while target contexts affect its fluency. Intuitively, generation of a content word should rely more on the source context and generation of a functional word should rely more on the target context. Due to the lack of effective control over the influence from source and target contexts, conventional NMT tends to yield fluent but inadequate translations. To address this problem, we propose context gates which dynamically control the ratios at which source and target contexts contribute to the generation of target words. In this way, we can enhance both the adequacy and fluency of NMT with more careful control of the information flow from contexts. Experiments show that our approach significantly improves upon a standard attention-based NMT system by +2.3 BLEU points.
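The gating mechanism in this abstract can be sketched in a few lines of PyTorch. The dimensions and the exact point where the gate is applied are simplified assumptions relative to the paper's formulation, which computes the gate from the previous target word embedding, the previous decoder state, and the source context vector.

```python
# Illustrative context gate: a sigmoid gate rescales how much the source
# and target contexts contribute when generating the next target word.
import torch
import torch.nn as nn

class ContextGate(nn.Module):
    def __init__(self, emb_dim: int, hid_dim: int, ctx_dim: int):
        super().__init__()
        # Gate inputs: previous target word embedding, previous decoder
        # state, and the current source context vector.
        self.gate = nn.Linear(emb_dim + hid_dim + ctx_dim, ctx_dim)

    def forward(self, prev_emb, prev_state, src_ctx, tgt_ctx):
        # z near 1 -> trust the source context (adequacy);
        # z near 0 -> trust the target context (fluency).
        # Assumes src_ctx and tgt_ctx share the same dimensionality.
        z = torch.sigmoid(
            self.gate(torch.cat([prev_emb, prev_state, src_ctx], dim=-1)))
        return z * src_ctx + (1.0 - z) * tgt_ctx
```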