Tigrinya Hate Speech Detection and Classification from Facebook Posts and Comments Using Deep Learning Approaches


Publisher

ASTU

Abstract

Nowadays, sharing information with loved ones, friends, and coworkers over social media is one of the easiest things to do. Although people are free to express their feelings as they choose, these platforms have also made it more convenient to spread hate speech. Hate speech detection is the task in which a machine with a trained model analyzes a piece of text and classifies it as hate or hate-free speech. To the best of our knowledge, no study has yet been conducted on Tigrinya hate speech detection. The main objective of this study is therefore to design and develop a hate speech detection model for the Tigrinya language from Facebook posts and comments. Hate speech detection models have been developed by various scholars worldwide, including in our own country of Ethiopia, for a variety of languages. However, because of linguistic differences, a model designed for Amharic or Afan Oromo is not applicable to Tigrinya. We therefore developed a model that detects Tigrinya hate speech under both binary and multi-class classification. To achieve this objective, we collected a dataset of 5,400 posts and comments from different Facebook pages using the Facepager tool and prepared it in CSV format. All 5,400 instances were used for multi-class classification, while 3,608 of them were used for binary classification. We split the dataset into training and test sets using stratified k-fold cross-validation. We designed models using three deep learning architectures, Bi-LSTM, CNN-LSTM, and CNN, combined with three feature extraction techniques: Word2vec, FastText, and the Keras embedding layer.
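The stratified k-fold split mentioned above can be illustrated with a short sketch. This is not the authors' code; the labels, fold count, and helper name are hypothetical, and in practice one would use scikit-learn's `StratifiedKFold`. The point is that each test fold preserves the overall class proportions, which matters for an imbalanced hate/hate-free dataset.

```python
import random
from collections import defaultdict

def stratified_kfold(labels, k=5, seed=42):
    """Yield (train_idx, test_idx) pairs whose test folds preserve
    the overall class proportions (plain-Python sketch of what
    sklearn.model_selection.StratifiedKFold does)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    # Deal each class's (shuffled) indices round-robin across the folds.
    for idxs in by_class.values():
        rng.shuffle(idxs)
        for j, i in enumerate(idxs):
            folds[j % k].append(i)
    for t in range(k):
        test = sorted(folds[t])
        train = sorted(i for f in range(k) if f != t for i in folds[f])
        yield train, test

# Toy labels standing in for the (hypothetical) hate / hate-free annotations.
labels = ["hate"] * 40 + ["free"] * 60
for train, test in stratified_kfold(labels, k=5):
    hate_share = sum(labels[i] == "hate" for i in test) / len(test)
    # each test fold keeps the overall 40 % hate-speech share
```

Averaging metrics over the k test folds gives a more stable estimate than a single train/test split, which is presumably why the study reports cross-validated scores.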
We applied dropout and L2 regularization to overcome the overfitting observed in each model. For multi-class classification, the Bi-LSTM model with FastText (CBOW) embeddings outperformed the CNN-LSTM and CNN models, achieving an accuracy of 87.41 %, precision of 88.02 %, recall of 85.74 %, and F1-score of 86.86 %. For binary classification, the Bi-LSTM model with FastText (skip-gram) embeddings outperformed the CNN-LSTM and CNN models with an accuracy of 96.11 %.
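The two regularization techniques named above can be sketched in plain Python (the function names, rates, and penalty weight are illustrative, not taken from the thesis; in Keras these correspond to the `Dropout` layer and the `kernel_regularizer=l2(...)` argument):

```python
import random

def l2_penalty(weights, lam=0.5):
    """L2 regularization term added to the training loss: lam * sum(w^2).
    Penalizing large weights discourages the network from memorizing
    individual training posts, one source of overfitting."""
    return lam * sum(w * w for w in weights)

def dropout(activations, rate=0.5, rng=None, training=True):
    """Inverted dropout: during training, zero each unit with probability
    `rate` and scale survivors by 1/(1-rate) so the expected activation
    is unchanged; at inference time, pass values through untouched."""
    if not training:
        return list(activations)
    rng = rng or random.Random()
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

# The penalty grows with the squared weights: 0.5 * (1 + 4 + 9) = 7.0
print(l2_penalty([1.0, -2.0, 3.0]))  # 7.0
```

Randomly silencing units forces the network to spread information across many weights instead of relying on a few co-adapted ones, which is why dropout and L2 are commonly combined in text classifiers like the ones described here.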
