Enhancing English to Amharic Machine Translation with Prior Knowledge Integration

Muluken Hussen

dc.contributor.author	Muluken Hussen
dc.date.accessioned	2026-04-09T06:26:54Z
dc.date.issued	2025
dc.description.abstract	English and Amharic are both widely used in Ethiopia but belong to distinct language families: English is an Indo-European language, whereas Amharic is a Semitic language within the Afro-Asiatic family. This linguistic divergence poses substantial challenges for machine translation, particularly because of Amharic’s rich morphology, non-Latin script, and subject-object-verb (SOV) word order. Existing neural machine translation (NMT) systems often struggle to model these characteristics effectively, resulting in inadequate alignment, word order errors, and reduced translation fluency. This study addresses these challenges by integrating prior syntactic knowledge into English–Amharic machine translation through a Graph-to-Sequence (Graph2Seq) framework. Specifically, the proposed model incorporates syntactic dependency trees of the source language to enhance the representation of grammatical relationships and long-distance dependencies. To evaluate this approach, the study uses a large-scale parallel corpus of over 1.14 million English–Amharic sentence pairs, divided into training (70%), validation (10%), and testing (20%) sets. The proposed Graph2Seq model is evaluated against a standard Transformer model and the pretrained M2M100 multilingual model. Experimental results demonstrate substantial improvements in translation quality: the Graph2Seq model achieves a BLEU score of 37.30, significantly outperforming the Transformer model (13.06) and surpassing the M2M100 model (32.74). Qualitative and quantitative analyses indicate that incorporating syntactic dependency structures reduces alignment errors, improves word ordering, and enhances the handling of long-distance dependencies. Overall, the findings confirm that embedding syntactic prior knowledge through Graph Neural Networks markedly improves English–Amharic machine translation performance.
This work highlights the effectiveness of graph-based approaches for morphologically rich and low-resource languages and provides a foundation for future research. Potential extensions include integrating semantic role labeling, expanding and refining parallel corpora, and developing computationally efficient models suitable for resource-constrained environments. By addressing linguistic structure explicitly, this study advances the development of more accurate, fluent, and linguistically informed graph-based neural machine translation systems.
dc.identifier.uri	https://etd.astu.edu.et/handle/123456789/3070
dc.language.iso	en_US
dc.publisher	ASTU
dc.subject	Machine Translation
dc.subject	English–Amharic Translation
dc.subject	Linguistic Prior Knowledge
dc.subject	Syntactic Dependency Trees
dc.subject	Graph Neural Networks
dc.subject	Low-Resource Languages
dc.title	Enhancing English to Amharic Machine Translation with Prior Knowledge Integration
dc.type	Dissertation

Files

Original bundle

Name: Muluken Hussen.pdf
Size: 2.2 MB
Format: Adobe Portable Document Format

Collections

Information System Engineering