Lemmatization For Afan Oromo Text

Publisher

ASTU

Abstract

This study evaluates the performance of lemmatization of Afan Oromo text for natural language processing applications such as information retrieval and machine learning. The objectives of the study are: to identify the various affixes and the lemmas in collected written Afan Oromo surface words, to perform text lemmatization and examine its relationship with stemming, and to evaluate its performance. The study was conducted on 1000 manually collected tokens, each paired with a candidate lemma assigned in consultation with language experts. The collected tokens were analyzed with the hierarchical clusterer of the Weka package; 231 candidate lemmas were selected and tested with 10, 15, and 20 clusters using the single-link criterion with edit-distance similarity between tokens. The findings show that single-link hierarchical clustering with edit distance produced an accuracy of 98.3%, which indicates strong lemmatization performance on Afan Oromo text. According to the study, lemmatization performs better than stemming when the results of the two are compared. The 1.7% error rate was attributed to the manually annotated lemmas and to the hierarchical clustering algorithm itself. Further improvements should be made to the implementation of lemmatization for this language so that it is easy to use for all users of the language.
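
The clustering step described in the abstract can be illustrated with a minimal sketch. This is not the study's Weka setup: it is a plain Python reimplementation of single-link agglomerative clustering over Levenshtein edit distance, and the function names `edit_distance` and `single_link_clusters` as well as the English toy tokens are assumptions chosen only for illustration (they are not taken from the study's Afan Oromo data).

```python
from itertools import combinations


def edit_distance(a, b):
    """Levenshtein distance between strings a and b (unit costs)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def single_link_clusters(tokens, k):
    """Merge the two closest clusters until only k clusters remain.

    Single-link: the distance between two clusters is the smallest
    edit distance between any member of one and any member of the other.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    clusters = [[t] for t in tokens]
    while len(clusters) > k:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            d = min(edit_distance(x, y)
                    for x in clusters[i] for y in clusters[j])
            if best is None or d < best[0]:
                best = (d, i, j)
        _, i, j = best
        clusters[i].extend(clusters[j])
        del clusters[j]  # safe because i < j
    return clusters


# Toy tokens (hypothetical, English): inflected forms of two lemmas.
toy_tokens = ["jump", "jumps", "jumped", "read", "reads", "reading"]
groups = single_link_clusters(toy_tokens, 2)
```

With these toy tokens, each resulting cluster gathers the surface forms of one lemma. The number of clusters `k` plays the role of the 10, 15, and 20 cluster settings mentioned above. The naive loop recomputes distances on every merge, which is acceptable for a few hundred tokens but would need caching for larger inputs.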
