Developing an Automated Machine Learning Based Sentiment Analysis for Afaan Oromoo

Publisher

ASTU

Abstract

The number of social media users is growing rapidly, and social media is becoming a common source of information for many sectors. Social media content expresses many attitudes, sentiments, and feelings. However, these sentiments are unstructured, which makes it difficult to identify their polarity for decision-making. We were therefore motivated to design, develop, and implement an automated machine learning based sentiment analysis (AMLSA) system for Afaan Oromoo. We selected Afaan Oromoo because, although it has a large number of speakers, it is an under-resourced language. For this study, 6670 statements were collected from a public Facebook page and manually labeled into three classes: positive (2270), neutral (2210), and negative (2190). The statements consist of words, phrases, and sentences. The dataset was split into 80% for training and 20% for testing. The research followed an experimental approach to determine the best combination of learning algorithm and feature extraction technique. Six techniques were applied: four conventional machine learning techniques and two neural network techniques. The four conventional machine learning techniques are Multinomial Naïve Bayes (MNB), Logistic Regression (LR), Support Vector Machine (SVM), and Random Forest (RF). The two neural network techniques are Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM). To evaluate the performance of each technique, we used several evaluation metrics: accuracy, F-score, precision, and recall. The feature extraction techniques used for the conventional machine learning techniques are unigrams, bigrams, and trigrams, each weighted with TF-IDF; the neural network techniques used word2vec word embeddings.
We used trigram features combined with TF-IDF, and accuracy as the evaluation metric, because they achieved the highest performance. The conventional machine learning techniques achieved accuracies of 80.12% (MNB), 82.52% (LR), 80.62% (RF), and 84.62% (SVM). The neural network techniques achieved accuracies of 72.91% (CNN) and 85.01% (LSTM). Of all the techniques applied, LSTM achieved the highest accuracy, and we therefore used LSTM to deploy our model.
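As an illustration of the trigram with TF-IDF features described in the abstract, the following is a minimal sketch in plain Python, not the implementation used in the thesis. The function names, the whitespace tokenization, the TF-IDF variant (raw count multiplied by ln(N / df) + 1), and the placeholder statements are all assumptions made for this example.

```python
import math
from collections import Counter


def word_ngrams(text, n):
    """Return the word n-grams of a lowercased, whitespace-tokenized text."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tfidf(documents, n):
    """Return one {n-gram: weight} dict per document.

    Assumed weighting: tf is the raw count in the document and
    idf = ln(N / df) + 1, where N is the number of documents and
    df is the number of documents containing the n-gram.
    """
    grams_per_doc = [Counter(word_ngrams(doc, n)) for doc in documents]

    # Document frequency: in how many documents each n-gram occurs.
    df = Counter()
    for grams in grams_per_doc:
        df.update(grams.keys())

    total = len(documents)
    return [
        {g: count * (math.log(total / df[g]) + 1.0) for g, count in grams.items()}
        for grams in grams_per_doc
    ]


# Hypothetical placeholder statements, not taken from the thesis dataset.
docs = ["a b c d", "a b c e", "x y z"]
weights = tfidf(docs, 3)
```

In this sketch, a trigram shared by several statements (here "a b c") receives a lower weight than one that occurs in a single statement (such as "b c d"), which is the effect TF-IDF is meant to have. The resulting weight vectors would then be passed to a classifier such as MNB, LR, SVM, or RF.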
