Amharic Content Cyberbullying Detection on Social Media Using Deep Learning Approach

Publisher

ASTU

Abstract

Cyberbullying on social media has become a complex problem. Although social media is important for communication and information sharing, it is also widely used as a venue for cyberbullying. Cyberbullying based on sexual content has recently become widespread in Ethiopia, and when it affects a large number of people it can lead to social, mental, and psychological problems. Because social media content is unstructured, detecting cyberbullying in sexual texts is a time-consuming and complex task, and its growing difficulty has attracted several researchers to cyberbullying detection. Given the success of deep learning algorithms in natural language processing tasks, some researchers have proposed deep learning models for cyberbullying detection. However, most studies on this topic have concentrated on ethnicity, religion, disability, and socio-political context. This study therefore proposes deep learning-based cyberbullying detection for Amharic sexual text on social media. A binary dataset of Amharic sexual text, with the classes “Bullying” and “Non-bullying”, was prepared from text collected from the Facebook platform and used to develop the models. Bidirectional RNNs with attention mechanisms are implemented using Word2vec as the feature representation. The Word2vec model is trained with the Skip-gram architecture because it is well suited to representing infrequent keywords in a dataset of limited size. LSTM and GRU networks are also implemented for comparison. All models are trained and evaluated using 5-fold cross-validation, under which they achieved good performance on our dataset. Several experiments were then conducted to select the best-performing model; the Bi-LSTM model outperformed all other models, with an accuracy of 96.94% and an F1-score of 95%.
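
The evaluation protocol named in the abstract, 5-fold cross-validation scored by accuracy and F1, can be illustrated with a minimal standard-library sketch. This is not the thesis code: the function names, the shuffling seed, and the majority-class stand-in for a trained network are all assumptions made for illustration, and the abstract does not say whether the folds were stratified.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Shuffle sample indices and deal them into k near-equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def accuracy_and_f1(y_true, y_pred, positive=1):
    """Accuracy over all samples and F1 for the positive ("Bullying") class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1


def majority_baseline(train_texts, train_labels):
    """Hypothetical stand-in for a trained model: always predicts the
    most frequent training label. A Bi-LSTM would be fitted here instead."""
    majority = max(set(train_labels), key=train_labels.count)
    return lambda text: majority


def cross_validate(texts, labels, train_fn, k=5, seed=0):
    """Train on k-1 folds, score on the held-out fold, average over folds."""
    folds = k_fold_indices(len(texts), k, seed)
    scores = []
    for held_out in folds:
        held = set(held_out)
        train_idx = [i for i in range(len(texts)) if i not in held]
        model = train_fn([texts[i] for i in train_idx],
                         [labels[i] for i in train_idx])
        preds = [model(texts[i]) for i in held_out]
        scores.append(accuracy_and_f1([labels[i] for i in held_out], preds))
    mean_acc = sum(a for a, _ in scores) / k
    mean_f1 = sum(f for _, f in scores) / k
    return mean_acc, mean_f1
```

Because accuracy counts every correct prediction while F1 only tracks the positive class, the two figures can differ on the same predictions, which is consistent with the abstract reporting them separately.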
