Developing an Automated Machine Learning Based Sentiment Analysis for Afaan Oromoo
Publisher: ASTU
Abstract
The number of social media users is increasing rapidly, and social media is becoming a common source
of information for many sectors. Social media content expresses a wide range of attitudes, sentiments,
and feelings. However, this content is unstructured, which makes it difficult to identify the polarity
of those sentiments for decision-making. We were therefore motivated to design, develop, and
implement an automated Machine Learning-based sentiment analysis (AMLSA) system for Afaan Oromoo.
We selected Afaan Oromoo because, although it has a large number of speakers, it is an
under-resourced language. For this study, 6670 statements were collected from public Facebook pages
and manually labeled into three classes: positive (2270), neutral (2210), and negative (2190). The
statements consist of words, phrases, and sentences. The dataset was split into 80% for training and
20% for testing. The research followed an experimental approach to determine the best combination of
machine learning or neural network algorithm and feature extraction method. Six techniques were
applied: four conventional machine learning techniques, namely Multinomial Naïve Bayes (MNB),
Logistic Regression (LR), Support Vector Machine (SVM), and Random Forest (RF), and two neural
network techniques, namely Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM). The
performance of each technique was evaluated with accuracy, F-score, precision, and recall. For the
conventional machine learning techniques, the feature extraction methods were unigram, bigram, and
trigram features, each weighted with TF-IDF; the neural network techniques used word2vec word
embeddings. Trigram features with TF-IDF achieved the best results, so we adopted them, and we report
accuracy as the main performance measure. The conventional machine learning techniques achieved
accuracies of 80.12% for MNB, 82.52% for LR, 80.62% for RF, and 84.62% for SVM. The neural network
techniques achieved accuracies of 72.91% for CNN and 85.01% for LSTM. Across all the techniques
applied, LSTM achieved the highest classification accuracy, so we used LSTM to deploy our model.
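The abstract reports an 80/20 train/test split of 6670 labeled statements but does not say how the split was drawn. The sketch below assumes a stratified split (each class keeps its share in both sets) with a fixed random seed; the function name `stratified_split`, the seed, and the placeholder statements are hypothetical and are not the author's code.

```python
import random
from collections import defaultdict


def stratified_split(texts, labels, test_ratio=0.2, seed=42):
    """Split (text, label) pairs into train/test lists, preserving each class's share."""
    by_label = defaultdict(list)
    for text, label in zip(texts, labels):
        by_label[label].append(text)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, test = [], []
    for label in sorted(by_label):
        items = list(by_label[label])
        rng.shuffle(items)
        n_test = round(len(items) * test_ratio)  # e.g. 20% of this class
        test.extend((text, label) for text in items[:n_test])
        train.extend((text, label) for text in items[n_test:])

    rng.shuffle(train)
    rng.shuffle(test)
    return train, test


# Hypothetical placeholder statements with the class counts reported in the abstract.
labels = ["positive"] * 2270 + ["neutral"] * 2210 + ["negative"] * 2190
texts = [f"statement_{i}" for i in range(len(labels))]
train, test = stratified_split(texts, labels)
print(len(train), len(test))  # → 5336 1334
```

With these counts the test set holds 454 positive, 442 neutral, and 438 negative statements, so the roughly balanced class distribution of the full dataset is carried over to both sets.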
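The abstract names unigram, bigram, and trigram features weighted with TF-IDF. The following is a minimal sketch of word n-gram TF-IDF, assuming whitespace tokenization, lower-casing, raw n-gram counts as term frequency, a smoothed IDF of ln((1 + N) / (1 + df)) + 1, and L2 normalization (one common variant; the thesis does not state which formula it used). The helper names and the placeholder tokens are hypothetical.

```python
import math
from collections import Counter


def word_ngrams(text, n):
    """Return the word n-grams of a whitespace-tokenized, lower-cased statement."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tfidf_vectors(docs, n=3):
    """Build L2-normalized TF-IDF vectors (as sparse dicts) over word n-grams."""
    counts = [Counter(word_ngrams(doc, n)) for doc in docs]

    # Document frequency: in how many statements each n-gram appears.
    df = Counter()
    for doc_counts in counts:
        df.update(doc_counts.keys())

    n_docs = len(docs)
    # Smoothed IDF (assumed variant): ln((1 + N) / (1 + df)) + 1
    idf = {g: math.log((1 + n_docs) / (1 + d)) + 1 for g, d in df.items()}

    vectors = []
    for doc_counts in counts:
        vec = {g: c * idf[g] for g, c in doc_counts.items()}
        norm = math.sqrt(sum(v * v for v in vec.values()))
        # Statements shorter than n words have no n-grams and get an empty vector.
        vectors.append({g: v / norm for g, v in vec.items()} if norm else {})
    return vectors, idf


docs = ["w1 w2 w3 w4", "w1 w2 w3", "w1"]  # hypothetical token placeholders
vectors, idf = tfidf_vectors(docs, n=3)
print(sorted(vectors[0]))  # → ['w1 w2 w3', 'w2 w3 w4']
print(vectors[2])          # → {}
```

One design point this makes visible: since the dataset contains single words and short phrases as well as sentences, pure trigram features leave any statement shorter than three words with an empty vector. Combining n-gram orders (for example unigrams through trigrams) is one way to avoid that, though the abstract does not say whether it was done.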
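Multinomial Naïve Bayes (MNB) is one of the four conventional classifiers listed in the abstract. The sketch below is a from-scratch illustration, not the thesis implementation: it assumes sparse `{feature: weight}` inputs (raw counts or TF-IDF weights), additive smoothing with `alpha = 1`, and a log-space decision rule that ignores features unseen during training. The class name `MultinomialNaiveBayes` and the toy English features are hypothetical.

```python
import math
from collections import Counter, defaultdict


class MultinomialNaiveBayes:
    """Multinomial Naive Bayes over sparse {feature: weight} dicts, with additive smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace (add-one) smoothing when alpha == 1

    def fit(self, feature_dicts, labels):
        self.classes = sorted(set(labels))
        class_counts = Counter(labels)
        self.log_prior = {c: math.log(class_counts[c] / len(labels)) for c in self.classes}

        # Sum feature weights per class (counts, or TF-IDF weights).
        totals = defaultdict(Counter)
        for feats, y in zip(feature_dicts, labels):
            totals[y].update(feats)

        self.vocab = {f for feats in feature_dicts for f in feats}
        self.log_lik = {}
        for c in self.classes:
            denom = sum(totals[c].values()) + self.alpha * len(self.vocab)
            self.log_lik[c] = {
                f: math.log((totals[c][f] + self.alpha) / denom) for f in self.vocab
            }
        return self

    def predict(self, feats):
        scores = {}
        for c in self.classes:
            score = self.log_prior[c]
            for f, w in feats.items():
                if f in self.vocab:  # features never seen in training are ignored
                    score += w * self.log_lik[c][f]
            scores[c] = score
        return max(scores, key=scores.get)


# Hypothetical toy features; real inputs would be n-gram weights from the statements.
train_x = [{"good": 2}, {"good": 1, "nice": 1}, {"bad": 2}, {"bad": 1, "awful": 1}]
train_y = ["positive", "positive", "negative", "negative"]
model = MultinomialNaiveBayes().fit(train_x, train_y)
print(model.predict({"good": 1}))             # → positive
print(model.predict({"bad": 1, "awful": 1}))  # → negative
```

Working in log space avoids numerical underflow when many small per-feature probabilities are multiplied, and smoothing keeps a feature that never occurred with a class from forcing that class's probability to zero.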
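The abstract evaluates each technique with accuracy, precision, recall, and F-score on three classes but does not say how per-class scores were averaged. This sketch computes accuracy plus per-class and macro-averaged (unweighted mean over classes) precision, recall, and F1; the macro averaging, the function name `evaluate`, and the toy predictions are assumptions for illustration.

```python
from collections import Counter


def evaluate(y_true, y_pred):
    """Return accuracy, macro-averaged (precision, recall, F1), and per-class scores."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted p, but it was wrong
            fn[t] += 1  # missed the true class t

    per_class = {}
    for c in labels:
        precision = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        recall = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        per_class[c] = (precision, recall, f1)

    accuracy = sum(tp.values()) / len(y_true)
    # Macro average: unweighted mean over classes.
    macro = tuple(sum(s[i] for s in per_class.values()) / len(labels) for i in range(3))
    return accuracy, macro, per_class


y_true = ["positive", "positive", "negative", "neutral"]
y_pred = ["positive", "negative", "negative", "neutral"]
accuracy, macro, per_class = evaluate(y_true, y_pred)
print(accuracy)            # → 0.75
print(round(macro[2], 4))  # → 0.7778
```

Because the three classes in this dataset are close to balanced, accuracy and macro-averaged scores tend to tell a similar story; on a skewed dataset they can diverge, which is why reporting the per-class values alongside accuracy is useful.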
