Word Sense Disambiguation for Wolaita Language Using Machine Learning Approach

Publisher

ASTU

Abstract

The amount of data accessible online has been increasing, and so has the need for Natural Language Processing (NLP) to access and process this data. However, ambiguity poses significant difficulties for NLP. Unlike human beings, computers cannot readily tell which of several senses a word carries in a given context. To address this problem of lexical ambiguity, Word Sense Disambiguation (WSD) models have been developed for many languages. The Wolaita language also contains many polysemous words, and these can cause difficulties for the NLP applications developed by previous researchers. Therefore, this work proposes a WSD model for the Wolaita language using a machine learning approach. A total of 2797 sense examples were collected from the Holy Bible, academic books, media agencies (sport, health, business, and national and international news), and data from prior researchers. The collected data were annotated by language experts, and five datasets were then prepared for five ambiguous words: “Doona”, “Ayfiya”, “Aadhdha”, “Naaga” and “Ogiya”. We employed a quantitative experimental research approach to determine the best combination of machine learning algorithms and feature extraction techniques. Support Vector Classifier, Bagging, Random Forest Classifier and AdaBoost classifiers, combined with the bag-of-words (BOW), TF-IDF and Word2Vec feature extraction techniques, were trained on the five datasets using six window sizes (WS3, WS5, WS7, WS9, WS11 and WS13). Of these six window sizes, WS11 (5-5) was selected as the optimal window size in terms of both accuracy and computational cost. Among the four algorithms, Support Vector Classifier and Bagging with TF-IDF achieved the best accuracies, 83.22% and 82.82% respectively, on WS11 (5-5) using 10-fold cross-validation.
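The window sizes above describe how many tokens around the ambiguous word are kept as context; WS11 (5-5) keeps five tokens on each side of the target word. The following Python sketch shows one possible way to extract such windows. It assumes whitespace tokenization, case-insensitive matching of the target word, and simple truncation (no padding) at sentence boundaries; the function names `context_window` and `find_target_windows` are hypothetical and are not taken from the thesis.

```python
from typing import List


def context_window(tokens: List[str], target_index: int, window_size: int) -> List[str]:
    """Return a symmetric window of tokens centred on tokens[target_index].

    window_size counts the target word itself, so WS11 keeps 5 tokens on
    each side (the "5-5" split). Near sentence boundaries the window is
    truncated rather than padded (an assumption of this sketch).
    """
    if window_size < 1 or window_size % 2 == 0:
        raise ValueError("window_size must be a positive odd number")
    if not 0 <= target_index < len(tokens):
        raise IndexError("target_index out of range")
    half = (window_size - 1) // 2
    start = max(0, target_index - half)
    end = min(len(tokens), target_index + half + 1)
    return tokens[start:end]


def find_target_windows(sentence: str, target: str, window_size: int) -> List[List[str]]:
    """Extract one context window per occurrence of `target` in `sentence`.

    Assumes whitespace tokenization and case-insensitive matching; the
    returned windows keep the original casing of the tokens.
    """
    tokens = sentence.split()
    return [
        context_window(tokens, i, window_size)
        for i, token in enumerate(tokens)
        if token.lower() == target.lower()
    ]


if __name__ == "__main__":
    # Placeholder tokens stand in for a real annotated Wolaita sentence.
    sentence = "w1 w2 w3 w4 w5 w6 w7 Doona w8 w9 w10 w11 w12 w13"
    for window in find_target_windows(sentence, "doona", 11):
        print(" ".join(window))
```

Each extracted window could then be joined into a single string and treated as one training example labelled with its annotated sense, before being vectorized with BOW, TF-IDF or Word2Vec. With scikit-learn, for instance, one might combine `TfidfVectorizer` and `SVC` in a pipeline and score it with `cross_val_score(..., cv=10)`; this is offered only as an illustration, not as the exact setup used in the study.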
