Word Sense Disambiguation for Wolaita Language Using Machine Learning Approach

Temesgen, Tadesse

dc.contributor.advisor	Dr. Mesfin Abebe (Ph.D.)
dc.contributor.author	Temesgen, Tadesse
dc.date.accessioned	2025-12-17T10:54:13Z
dc.date.issued	2021-08
dc.description.abstract	The amount of data accessible online keeps increasing, and with it the need for Natural Language Processing to access and process this data. However, ambiguity makes Natural Language Processing difficult: unlike human beings, computers cannot readily tell which of several senses a word carries. To address this problem of lexical ambiguity, Word Sense Disambiguation models have been developed for many languages. The Wolaita language also has many polysemous words, which can cause difficulties for the Natural Language Processing applications developed by previous researchers. Therefore, a Word Sense Disambiguation model for the Wolaita language using a machine learning approach was proposed. To conduct the research, a total of 2797 sense examples were collected from the Holy Bible, academic books, media agencies (sport, health, business, and national and international news), and data from prior researchers. The collected data was annotated by language experts, and five datasets were then prepared for five ambiguous words: “Doona”, “Ayfiya”, “Aadhdha”, “Naaga” and “Ogiya”. We employed a quantitative experimental research approach to determine the best combination of machine learning algorithms and feature extraction techniques. Support Vector Classifier, Bagging, Random Forest Classifier, and AdaBoost classifiers were selected, combined with BOW, TF-IDF and Word2Vec feature extraction techniques, and trained on the five datasets at six window sizes (WS3, WS5, WS7, WS9, WS11 and WS13). Of the six window sizes, WS11 (5-5) was selected as the optimal window size in terms of accuracy and computational cost. Among the four algorithms, Support Vector Classifier and Bagging with TF-IDF achieved accuracies of 83.22% and 82.82%, respectively, on WS11 (5-5) using 10-fold cross-validation.	en_US
dc.description.sponsorship	ASTU	en_US
dc.identifier.uri	http://10.240.1.28:4000/handle/123456789/1563
dc.language.iso	en_US	en_US
dc.publisher	ASTU	en_US
dc.subject	Word Sense Disambiguation, Natural Language Processing, Window Sizes, Machine Learning Algorithms, Feature Extraction, Wolaita language	en_US
dc.title	Word Sense Disambiguation for Wolaita Language Using Machine Learning Approach	en_US
dc.type	Thesis	en_US
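
The window sizes named in the abstract can be illustrated with a minimal sketch. It assumes that a label such as WS11 (5-5) means five context tokens on each side of the ambiguous word plus the word itself, and that the other labels follow the same pattern; the thesis text itself is not quoted here. The function name context_window, the WINDOW_SIZES mapping, and the placeholder tokens t1..t12 are hypothetical and used only for illustration.

```python
# Assumption: "WS11 (5-5)" = 5 tokens left + target + 5 tokens right.
# The remaining labels are assumed to follow the same symmetric pattern.
WINDOW_SIZES = {"WS3": 1, "WS5": 2, "WS7": 3, "WS9": 4, "WS11": 5, "WS13": 6}


def context_window(tokens, target_index, half_width):
    """Return up to half_width tokens on each side of the target word.

    The target word itself is excluded; windows are truncated at the
    sentence boundaries rather than padded.
    """
    start = max(0, target_index - half_width)
    left = tokens[start:target_index]
    right = tokens[target_index + 1:target_index + 1 + half_width]
    return left + right


# Placeholder sentence: t1..t12 stand in for real context words.
tokens = ["t1", "t2", "t3", "t4", "t5", "t6", "Doona",
          "t7", "t8", "t9", "t10", "t11", "t12"]
target = tokens.index("Doona")

ws11_context = context_window(tokens, target, WINDOW_SIZES["WS11"])
ws3_context = context_window(tokens, target, WINDOW_SIZES["WS3"])
```

The resulting context tokens would then be passed to a feature extractor such as BOW, TF-IDF or Word2Vec before classification; that step is not shown here.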

Files

Original bundle

Name: Temesgen Tadesse.pdf
Size: 2.56 MB
Format: Adobe Portable Document Format

Collections

Thesis