Clickbait Detection for Amharic Language Using Neural Networks

Publisher

ASTU

Abstract

With a growing number of Ethiopians active on the Internet and social media platforms, clickbait content has become a significant concern. Clickbait uses enticing titles to tempt users into clicking, and it has become rampant for various reasons, including advertising and revenue generation. Amharic, although spoken by a large population, lacks sufficient NLP resources for addressing this issue. In this research, we developed a model for classifying and detecting clickbait titles in Amharic. To facilitate this, we present the first Amharic clickbait dataset, comprising 53,227 posts collected from prominent social media platforms such as YouTube, Twitter, and Facebook. We established a baseline with traditional machine learning models such as Logistic Regression (LR), Random Forest (RF), and Support Vector Machines (SVM), trained on TF-IDF and N-gram features. We then explored the effectiveness of neural network architectures, including Long Short-Term Memory (LSTM), Gated Recurrent Unit (GRU), and Convolutional Neural Network (CNN), each combined with two word embedding techniques, word2vec and fastText. Our findings show that the CNN model with fastText word embeddings achieves the highest performance, with an accuracy of 94.27% and an F1-score of 94.24%. This research provides insight into combating clickbait content in Amharic and contributes to the advancement of natural language processing for low-resource languages.
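
To make the baseline feature step more concrete, the following is a minimal sketch of TF-IDF weighting over word N-grams, written in plain Python. It is an illustration only, not the authors' implementation: the function names `word_ngrams` and `tfidf_vectors`, the whitespace tokenization, the choice of unigrams plus bigrams, and the smoothed IDF formula are all assumptions, and the sample titles are placeholders rather than items from the dataset.

```python
import math
from collections import Counter


def word_ngrams(text, n_max=2):
    """Return all word n-grams of length 1..n_max from a title.

    Assumption: titles are tokenized on whitespace; the abstract does not
    describe the actual tokenization or preprocessing.
    """
    tokens = text.split()
    grams = []
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def tfidf_vectors(titles, n_max=2):
    """Map each title to a sparse {n-gram: TF-IDF weight} dictionary.

    Assumption: smoothed IDF, log((1 + N) / (1 + df)) + 1, which is one
    common variant; the abstract does not state which one was used.
    """
    docs = [Counter(word_ngrams(t, n_max)) for t in titles]

    # Document frequency: in how many titles each n-gram appears.
    df = Counter()
    for doc in docs:
        df.update(doc.keys())

    n_docs = len(docs)
    vectors = []
    for doc in docs:
        total = sum(doc.values())
        vectors.append({
            gram: (count / total) * (math.log((1 + n_docs) / (1 + df[gram])) + 1)
            for gram, count in doc.items()
        })
    return vectors


# Placeholder titles (hypothetical, not taken from the dataset).
sample_titles = ["a b", "a c"]
vectors = tfidf_vectors(sample_titles)
```

In this sketch, an N-gram that occurs in every title (here `a`) receives a lower weight than one that occurs in only one title (here `b`), which is the property that lets a linear classifier such as Logistic Regression focus on distinctive phrasing. Sparse vectors of this kind would then be passed to the LR, RF, or SVM classifier.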