Enhancing English to Amharic Machine Translation with Prior Knowledge Integration

Muluken Hussen

Files

Primary Muluken Hussen.pdf (2.2 MB)

Date

2025

Authors

Muluken Hussen

Publisher

ASTU

Abstract

English and Amharic are widely used languages in Ethiopia and belong to distinct language families: English is an Indo-European language, whereas Amharic is a Semitic language within the Afro-Asiatic family. This linguistic divergence poses substantial challenges for machine translation, particularly due to Amharic’s rich morphology, non-Latin script, and subject-object-verb (SOV) syntactic structure. Existing neural machine translation (NMT) systems often struggle to model these characteristics effectively, resulting in inadequate alignment, word order errors, and reduced translation fluency. This study addresses these challenges by integrating prior syntactic knowledge into English–Amharic machine translation through a Graph-to-Sequence (Graph2Seq) framework. Specifically, the proposed model incorporates syntactic dependency trees of the source language to enhance the representation of grammatical relationships and long-distance dependencies. To evaluate this approach, the study uses a large-scale parallel corpus comprising over 1.14 million English–Amharic sentence pairs, divided into training (70%), validation (10%), and testing (20%) sets. The proposed Graph2Seq model is evaluated against a standard Transformer model and the pretrained M2M100 multilingual model. Experimental results demonstrate substantial improvements in translation quality: the Graph2Seq model achieves a BLEU score of 37.30, significantly outperforming the Transformer model (13.06) and surpassing the M2M100 model (32.74). Qualitative and quantitative analyses indicate that incorporating syntactic dependency structures reduces alignment errors, improves word ordering, and enhances the handling of long-distance dependencies. Overall, the findings confirm that embedding syntactic prior knowledge through Graph Neural Networks markedly improves English–Amharic machine translation performance.
This work highlights the effectiveness of graph-based approaches for morphologically rich and low-resource languages and provides a foundation for future research. Potential extensions include integrating semantic role labeling, expanding and refining parallel corpora, and developing computationally efficient models suitable for resource-constrained environments. By addressing linguistic structure explicitly, this study advances the development of more accurate, fluent, and linguistically informed graph-based neural machine translation systems.
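
As an illustration of the idea described in the abstract, the sketch below shows one way a source-side dependency tree might be turned into a labeled graph that a graph encoder could consume. It is not the thesis's implementation: the example sentence, its hand-written parse, the function names, and the choice to add reversed edges and self-loops are all assumptions made for illustration only.

```python
from collections import defaultdict

# Hypothetical, hand-annotated dependency parse of "She reads books".
# Each entry is (token, head_index, relation); head_index -1 marks the root.
PARSE = [
    ("She", 1, "nsubj"),
    ("reads", -1, "root"),
    ("books", 1, "obj"),
]


def dependency_graph(parse):
    """Turn a dependency parse into labeled (source, target, label) edges.

    Assumed convention: keep head->dependent edges, add a reversed copy of
    each so information can flow both ways, and add a self-loop per token.
    """
    edges = []
    for dep, (_, head, rel) in enumerate(parse):
        edges.append((dep, dep, "self"))
        if head >= 0:
            edges.append((head, dep, rel))
            edges.append((dep, head, rel + "_rev"))
    return edges


def incoming_neighbors(edges):
    """Map each node to the sorted list of nodes that send it a message."""
    adjacency = defaultdict(set)
    for src, dst, _ in edges:
        adjacency[dst].add(src)
    return {node: sorted(srcs) for node, srcs in adjacency.items()}


edges = dependency_graph(PARSE)
adjacency = incoming_neighbors(edges)

for node, (token, _, _) in enumerate(PARSE):
    print(token, "receives from", [PARSE[i][0] for i in adjacency[node]])
```

In this toy graph the verb "reads" is directly connected to both its subject and its object, regardless of how far apart they sit in the sentence, which is the intuition behind using dependency structure to capture long-distance relations.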

Keywords

Machine Translation, English–Amharic Translation, Linguistic Prior Knowledge, Syntactic Dependency Trees, Graph Neural Networks, Low-Resource Languages.

URI

https://etd.astu.edu.et/handle/123456789/3070

Collections

Information System Engineering
