A Comparative Analysis of Machine Learning Algorithms for Word Sense Disambiguation: In the Case of Wolaita Language

Destaye, Ukumo

dc.contributor.advisor	Sintayehu Hirpassa (Ph.D.)
dc.contributor.author	Destaye, Ukumo
dc.date.accessioned	2025-12-17T10:54:23Z
dc.date.issued	2023-06
dc.description.abstract	All human languages contain words whose meaning changes with context. In natural language processing (NLP), a "word sense" is one of the meanings a word can take depending on the context in which it is used, and Word Sense Disambiguation (WSD) is the task of determining the correct meaning of a word within a given context. Word ambiguity is a persistent difficulty for NLP because computers cannot resolve ambiguous words as human beings do. To address this challenge, researchers have developed WSD systems for many languages. Like all other languages, Wolaita contains ambiguous words, so this thesis presents research on Word Sense Disambiguation for the Wolaita language. We adopted a corpus-based machine-learning approach using 3560 sentences collected from different data sources in the language. We selected seven ambiguous words from the language, namely “Sintta”, “Haytta”, “Ayfiya”, “Doona”, “Aadhdha”, “Naaga”, and “Ogiya”, and prepared seven different datasets. After preparing the datasets, we applied preprocessing techniques such as tokenization, stopword removal, stemming, and normalization. For feature extraction, we used BOW, Word2vec, and TF-IDF combined with N-grams. For unlabeled data, we compared four clustering algorithms (EM, simple k-means, farthest first, and hierarchical clustering), and for the supervised approach we selected six algorithms (SVM, NB, NN, AdaBoost, RFC, and Bagging). Finally, we compared the performance of the algorithms in both the clustering and classification models. 
Among the selected clustering algorithms, EM had the best performance with 63.4%, and among the six supervised algorithms, SVM and NB performed well, with accuracies of 86.5% and 84.1% respectively, at an optimum window size of 5-5 for Wolaita language WSD.	en_US
dc.description.sponsorship	ASTU	en_US
dc.identifier.uri	http://10.240.1.28:4000/handle/123456789/1602
dc.language.iso	en_US	en_US
dc.publisher	ASTU	en_US
dc.subject	Word Sense Disambiguation, Natural Language Processing, Wolaita language, Machine learning	en_US
dc.title	A Comparative Analysis of Machine Learning Algorithms for Word Sense Disambiguation: In the Case of Wolaita Language	en_US
dc.type	Thesis	en_US
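
Illustrative note: the abstract describes a supervised pipeline that takes a context window around an ambiguous word, builds bag-of-words features, and trains a classifier such as NB. The following is a minimal sketch of that idea in plain Python, not the thesis's implementation. The function `context_window`, the class `NaiveBayesWSD`, the placeholder tokens (`a1`, `b1`, `x1`), and the labels `sense_A` and `sense_B` are all hypothetical; only the target word “Doona” and the 5-5 window size come from the abstract.

```python
import math
from collections import Counter, defaultdict


def context_window(tokens, target, left=5, right=5):
    """Return up to `left` tokens before and `right` tokens after
    the first occurrence of `target` (the target itself is excluded)."""
    i = tokens.index(target)
    return tokens[max(0, i - left):i] + tokens[i + 1:i + 1 + right]


class NaiveBayesWSD:
    """Multinomial Naive Bayes over bag-of-words context features,
    with add-one (Laplace) smoothing. Hypothetical helper class."""

    def fit(self, contexts, senses):
        self.sense_counts = Counter(senses)
        self.word_counts = defaultdict(Counter)
        for ctx, sense in zip(contexts, senses):
            self.word_counts[sense].update(ctx)
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, ctx):
        total = sum(self.sense_counts.values())
        best, best_score = None, -math.inf
        for sense, n in self.sense_counts.items():
            # Log prior plus smoothed log likelihood of each known context word.
            score = math.log(n / total)
            denom = sum(self.word_counts[sense].values()) + len(self.vocab)
            for w in ctx:
                if w in self.vocab:  # words unseen in training are skipped
                    score += math.log((self.word_counts[sense][w] + 1) / denom)
            if score > best_score:
                best, best_score = sense, score
        return best


# Placeholder training data: tokens and sense labels are invented stand-ins,
# not real Wolaita sentences or senses.
TARGET = "Doona"
train = [
    ("a1 a2 a3 Doona a4 a5".split(), "sense_A"),
    ("a2 a5 Doona a1 a6".split(), "sense_A"),
    ("b1 b2 Doona b3 b4".split(), "sense_B"),
    ("b3 Doona b1 b5 b2".split(), "sense_B"),
]

model = NaiveBayesWSD().fit(
    [context_window(tokens, TARGET) for tokens, _ in train],
    [sense for _, sense in train],
)
prediction = model.predict(context_window("b2 x1 Doona b4 b1".split(), TARGET))
print(prediction)
```

In practice the same window extraction could feed TF-IDF or N-gram features and any of the classifiers named in the abstract; the hand-written Naive Bayes here only keeps the sketch dependency-free.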

Files

Original bundle

Name: Destaye Ukumo.pdf
Size: 1.74 MB
Format: Adobe Portable Document Format

Collections

Thesis