Word Sense Disambiguation for Wolaita Language Using Machine Learning Approach

dc.contributor.advisor: Dr. Mesfin Abebe (Ph.D.)
dc.contributor.author: Temesgen, Tadesse
dc.date.accessioned: 2025-12-17T10:54:13Z
dc.date.issued: 2021-08
dc.description.abstract: The amount of data accessible online has been increasing, and the need for Natural Language Processing to access and process this data has increased significantly. However, ambiguity poses difficulties for Natural Language Processing. Unlike human beings, computers cannot readily understand a single word that carries different meanings in different contexts. As a solution, Word Sense Disambiguation models have been developed for many languages to address the problem of lexical ambiguity. The Wolaita language also has many polysemous words, which can cause difficulties for the Natural Language Processing applications developed by previous researchers. Therefore, a Word Sense Disambiguation model for the Wolaita language using a machine learning approach was proposed. To conduct the research, a total of 2797 sense examples were collected from the Holy Bible, academic books, media agencies (sport, health, business, and national and international news), and data from prior researchers. The collected data was annotated by language experts, and five datasets were then prepared for five ambiguous words: “Doona”, “Ayfiya”, “Aadhdha”, “Naaga”, and “Ogiya”. We employed a quantitative experimental research approach to determine the best combination of machine learning algorithms and feature extraction techniques. Support Vector Classifier, Bagging, Random Forest Classifier, and AdaBoost classifiers with BOW, TF-IDF, and Word2Vec feature extraction techniques were selected and trained using the five datasets on six window sizes (WS3, WS5, WS7, WS9, WS11, and WS13). Of the six window sizes, WS11 (5-5) was selected as the optimal window size in terms of accuracy and computational time. Among the four algorithms, the Support Vector Classifier and Bagging classifiers with TF-IDF achieved accuracies of 83.22% and 82.82%, respectively, on WS11 (5-5) using 10-fold cross-validation. (en_US)
dc.description.sponsorship: ASTU (en_US)
dc.identifier.uri: http://10.240.1.28:4000/handle/123456789/1563
dc.language.iso: en_US (en_US)
dc.publisher: ASTU (en_US)
dc.subject: Word Sense Disambiguation, Natural Language Processing, Window Sizes, Machine Learning Algorithms, Feature Extraction, Wolaita language (en_US)
dc.title: Word Sense Disambiguation for Wolaita Language Using Machine Learning Approach (en_US)
dc.type: Thesis (en_US)
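
The abstract reports that a context window of size 11, written WS11 (5-5), worked best. The following is a minimal sketch of how such a window might be extracted, assuming that "5-5" means five tokens to the left and five to the right of the ambiguous word. The function names and the placeholder tokens (t1 to t14) are hypothetical and are not taken from the thesis; only the target word "Doona" comes from the record above.

```python
from typing import List


def context_window(tokens: List[str], target_index: int,
                   left: int = 5, right: int = 5) -> List[str]:
    """Return the target token with up to `left` tokens before it
    and up to `right` tokens after it (truncated at sentence edges)."""
    if not 0 <= target_index < len(tokens):
        raise IndexError("target_index is outside the token list")
    start = max(0, target_index - left)
    end = min(len(tokens), target_index + right + 1)
    return tokens[start:end]


def windows_for_word(tokens: List[str], target: str,
                     left: int = 5, right: int = 5) -> List[List[str]]:
    """Collect one context window per occurrence of `target`
    (case-insensitive match; an assumption for this sketch)."""
    wanted = target.lower()
    return [
        context_window(tokens, i, left, right)
        for i, tok in enumerate(tokens)
        if tok.lower() == wanted
    ]


# Placeholder tokens stand in for a real annotated sentence.
tokens = [f"t{i}" for i in range(1, 8)] + ["Doona"] + [f"t{i}" for i in range(8, 15)]
ws11 = windows_for_word(tokens, "Doona")[0]
print(len(ws11), ws11)
```

Each window produced this way could then be joined into a string and passed to a feature extractor such as TF-IDF before classification. That step is omitted here because the record does not describe how the thesis configured it.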

Files

Original bundle

Name: Temesgen Tadesse.pdf
Size: 2.56 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 1.71 KB
Format: Plain Text