Amharic Content Cyberbullying Detection on Social Media Using Deep Learning Approach
Publisher: ASTU
Abstract
Cyberbullying on social media has become a complex problem. Although social media is important for
communication and information sharing, it is also being used as a playground for cyberbullying.
Cyberbullying based on sexual content has recently become widespread in Ethiopia, and when such
content is shared widely it can cause social, mental, and psychological harm. Because social media
content is unstructured, detecting sexual-text cyberbullying on social media is a time-consuming and
complex task. Detecting sexual texts on social media has become harder in recent years, which has
attracted several researchers to work on cyberbullying detection. Given the success of deep learning
algorithms in natural language processing tasks, some researchers have proposed deep learning models
for cyberbullying detection. However, most studies on this topic have focused on ethnicity,
religion, disability, and socio-political contexts. This study therefore proposes deep learning
models for detecting cyberbullying in Amharic sexual text on social media. Amharic text was
collected from Facebook and used to prepare a binary dataset with the classes “Bullying” and
“Non-bullying”, which is used to develop the models. Bidirectional RNNs and attention mechanisms are
implemented using Word2vec embeddings as the feature representation. The Word2vec model is trained
with the Skip-gram architecture because it represents infrequent words well when the dataset is
small. LSTM and GRU networks are also implemented for comparison. All models are trained and
evaluated using 5-fold cross-validation and achieve good performance on our dataset. Several
experiments are then conducted to select the best-performing model; the Bi-LSTM model outperformed
all other models, with an accuracy of 96.94% and an F1-score of 95%.
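The abstract states that Word2vec is trained with the Skip-gram architecture. As a rough illustration of what that means, the sketch below generates the (center, context) training pairs that Skip-gram learns from. The function name `skipgram_pairs`, the placeholder tokens, and the window size are hypothetical and not taken from the study. Each occurrence of a word, however rare, yields its own pairs with that word as the input, which is one common explanation for why Skip-gram tends to suit infrequent words in small corpora.

```python
def skipgram_pairs(tokens, window=2):
    """Return (center, context) pairs of the kind a Skip-gram model is trained on.

    `window` is a hypothetical context size; the study's value is not stated.
    """
    pairs = []
    for i, center in enumerate(tokens):
        # Context = words up to `window` positions to the left and right of the center word.
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # the center word is not its own context
                pairs.append((center, tokens[j]))
    return pairs


# Placeholder tokens standing in for a tokenized Amharic post.
print(skipgram_pairs(["w1", "w2", "w3"], window=1))
# [('w1', 'w2'), ('w2', 'w1'), ('w2', 'w3'), ('w3', 'w2')]
```

In practice the embeddings would be trained by a library rather than by hand; for example, gensim's `Word2Vec` class selects Skip-gram with `sg=1`. The window size, vector size, and other hyperparameters used in this study are not given in the abstract.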
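The models are evaluated with 5-fold cross-validation: the data is split into five folds, and each fold serves once as the test set while the other four are used for training. A minimal sketch of the index splitting, assuming contiguous folds over an already shuffled dataset, is shown below. The helper `k_fold_indices` is hypothetical.

```python
def k_fold_indices(n_samples, k=5):
    """Yield (train_idx, test_idx) for k contiguous folds of near-equal size.

    Assumes the samples were shuffled beforehand; no shuffling is done here.
    """
    # The first (n_samples % k) folds get one extra sample so every sample is used.
    fold_sizes = [n_samples // k + (1 if f < n_samples % k else 0) for f in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        # Training indices are everything outside the current test fold.
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


for fold, (train_idx, test_idx) in enumerate(k_fold_indices(10, k=5)):
    print(fold, test_idx)
```

Each model is trained on `train_idx`, scored on `test_idx`, and the five scores are averaged. With a class-imbalanced binary dataset, a stratified splitter such as scikit-learn's `StratifiedKFold` keeps the Bullying/Non-bullying ratio similar across folds; whether the study stratified its folds is not stated.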
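The abstract does not specify which attention variant is combined with the bidirectional RNNs. The sketch below shows one common, simplified form, which is an assumption and not the study's confirmed design. Each Bi-LSTM output h_t is scored as tanh(h_t · w) against a learned vector w. The scores are normalized with a softmax, and the sequence is reduced to a single weighted-sum vector that a classification layer could consume. The names `attention_pool` and `w` are hypothetical.

```python
import math


def attention_pool(hidden_states, w):
    """Reduce a sequence of hidden vectors to one context vector with attention.

    hidden_states: list of T vectors (e.g. concatenated forward/backward Bi-LSTM outputs).
    w: hypothetical learned scoring vector with the same dimension as each h_t.
    """
    # Score each time step: e_t = tanh(h_t . w)
    scores = [math.tanh(sum(h_i * w_i for h_i, w_i in zip(h, w))) for h in hidden_states]
    # Softmax over time steps; subtracting the max keeps exp() numerically stable.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    alphas = [e / total for e in exps]
    # Context vector: attention-weighted sum of the hidden states, dimension by dimension.
    dim = len(hidden_states[0])
    context = [sum(a * h[d] for a, h in zip(alphas, hidden_states)) for d in range(dim)]
    return context, alphas


context, alphas = attention_pool([[2.0, 0.0], [0.0, 1.0]], w=[1.0, 0.0])
# The first time step scores higher against w, so it receives the larger weight.
print(alphas)
```

In a trained network, w (and usually a projection matrix before the tanh) would be learned jointly with the Bi-LSTM by backpropagation; here it is fixed only to show the computation.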
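The results are reported as accuracy and F1-score. For a binary Bullying/Non-bullying task these can be computed as in the sketch below. Treating “Bullying” as the positive class is an assumption, since the abstract does not say whether the reported F1-score is for one class or averaged (macro or weighted) over both. The function name `accuracy_and_f1` is hypothetical.

```python
def accuracy_and_f1(y_true, y_pred, positive="Bullying"):
    """Return (accuracy, F1) for binary labels, with `positive` as the positive class (assumed)."""
    pairs = list(zip(y_true, y_pred))
    # Confusion-matrix counts for the positive class.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, f1


y_true = ["Bullying", "Bullying", "Non-bullying", "Non-bullying"]
y_pred = ["Bullying", "Non-bullying", "Non-bullying", "Bullying"]
print(accuracy_and_f1(y_true, y_pred))  # (0.5, 0.5)
```

Because F1 ignores true negatives, it can differ noticeably from accuracy when the two classes are unbalanced, which is one reason both metrics are worth reporting.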
