A Comparative Analysis of Machine Learning Algorithms for Word Sense Disambiguation: In the Case of Wolaita Language

Destaye, Ukumo

Files

Destaye Ukumo.pdf (1.74 MB)

Date

2023-06

Authors

Destaye, Ukumo

Publisher

ASTU

Abstract

All human languages contain words that can mean different things in different contexts. In natural language processing (NLP), the term "word sense" refers to the particular meaning a word takes on in the context in which it is used. Word Sense Disambiguation (WSD) is the task of determining the correct meaning of a word within a given context. Word ambiguity remains a difficult problem for NLP because computers cannot interpret ambiguous words the way human beings do. To address this challenge, researchers have developed WSD systems for many languages. Like all other languages, Wolaita contains ambiguous words, so this thesis presents research on Word Sense Disambiguation for the Wolaita language. We adopted a corpus-based machine-learning approach using 3560 sentences collected from different data sources in the language. We selected seven ambiguous words from the language, namely “Sintta”, “Haytta”, “Ayfiya”, “Doona”, “Aadhdha”, “Naaga”, and “Ogiya”, and prepared seven different datasets. After preparing the datasets, we applied the preprocessing techniques of tokenization, stopword removal, stemming, and normalization. For feature extraction, we used BOW, Word2vec, and TF-IDF combined with N-grams. For unlabeled data, we compared four clustering algorithms (EM, simple k-means, farthest first, and hierarchical clustering), and for the supervised approach we selected six algorithms (SVM, NB, NN, Adaboost, RFC, and Bagging). Finally, we compared the performance of the algorithms in both the clustering and the classification models.
Among the selected clustering algorithms, EM performed best with 63.4%. Among the six supervised algorithms, SVM and NB performed well, with accuracies of 86.5% and 84.1% respectively, at an optimum window size of 5-5 for Wolaita language WSD.
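As an illustration of what a 5-5 window could mean in practice, the following is a minimal sketch, not the thesis implementation. It assumes the window is the five tokens to the left and the five to the right of the ambiguous word, and it turns that context into bag-of-words counts. The function names `context_window` and `bag_of_words` are hypothetical, and the tokens `w1` to `w14` are placeholders rather than real Wolaita words; only the target “Doona” comes from the abstract.

```python
from collections import Counter


def context_window(tokens, target, left=5, right=5):
    """Return up to `left` tokens before and `right` tokens after the
    first occurrence of `target`, lowercased, excluding the target itself.
    Raises ValueError if the target is not present."""
    lowered = [t.lower() for t in tokens]
    idx = lowered.index(target.lower())
    start = max(0, idx - left)
    return lowered[start:idx] + lowered[idx + 1:idx + 1 + right]


def bag_of_words(context):
    """Count how often each context token occurs (a simple BOW feature map)."""
    return Counter(context)


# Placeholder sentence: seven dummy tokens, the ambiguous word, seven more.
tokens = [f"w{i}" for i in range(1, 8)] + ["Doona"] + [f"w{i}" for i in range(8, 15)]
window = context_window(tokens, "doona")
features = bag_of_words(window)
```

Here `window` holds the ten tokens surrounding the target, and `features` maps each of them to its count. In a fuller pipeline such counts would be one possible input to a classifier such as SVM or NB.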

Keywords

Word Sense Disambiguation, Natural Language Processing, Wolaita language, Machine learning

URI

http://10.240.1.28:4000/handle/123456789/1602

Collections

Thesis
