Clickbait Detection for Amharic Language Using Neural Networks
Publisher
ASTU
Abstract
With the increasing number of Ethiopians actively engaging on the Internet and social media
platforms, the prevalence of clickbait content has become a significant concern. Clickbait,
often utilizing enticing titles to tempt users into clicking, has become rampant for various
reasons, including advertising and revenue generation. However, the Amharic language,
spoken by a large population, lacks sufficient NLP resources for addressing this issue. In
this research, we developed a model for classifying and detecting clickbait titles in Amharic.
To facilitate this, we present the first Amharic clickbait dataset, comprising 53,227 posts
collected from prominent social media platforms such as YouTube, Twitter, and Facebook.
We established a baseline with traditional machine learning models, namely Logistic
Regression (LR), Random Forest (RF), and Support Vector Machines (SVM), trained on TF-IDF
and N-gram features. We then explored the effectiveness of neural network architectures,
including Long Short-Term Memory (LSTM), Gated Recurrent Unit (GRU), and
Convolutional Neural Network (CNN), combined with two kinds of word embedding
techniques, namely word2vec and fastText. Our findings reveal that the CNN model with
fastText word embedding achieves the highest performance, with an accuracy of 94.27%
and an F1-score of 94.24%. This research provides valuable insights into combating
clickbait content in Amharic and contributes to the advancement of natural language
processing for resource-poor languages.
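
As a rough illustration of the TF-IDF and N-gram features used by the baseline models, the following minimal sketch computes TF-IDF weights over word unigrams and bigrams for a handful of titles. It is not the authors' implementation: the function names `word_ngrams` and `tfidf_vectors`, the whitespace tokenization, the smoothed IDF formula, and the English placeholder titles are all assumptions made for illustration, and none of the example titles come from the dataset.

```python
import math
from collections import Counter


def word_ngrams(text, n_max=2):
    """Return all word n-grams of length 1..n_max from a whitespace-tokenized title."""
    tokens = text.split()
    grams = []
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def tfidf_vectors(titles, n_max=2):
    """Map each title to a sparse {ngram: tf-idf weight} dictionary.

    Assumption: a smoothed IDF, log((1 + N) / (1 + df)) + 1, is used here;
    the source does not state which TF-IDF variant was applied.
    """
    docs = [Counter(word_ngrams(title, n_max)) for title in titles]

    # Document frequency: in how many titles each n-gram appears.
    doc_freq = Counter()
    for counts in docs:
        doc_freq.update(counts.keys())

    n_docs = len(docs)
    vectors = []
    for counts in docs:
        total = sum(counts.values())
        vectors.append({
            gram: (count / total)
            * (math.log((1 + n_docs) / (1 + doc_freq[gram])) + 1.0)
            for gram, count in counts.items()
        })
    return vectors


if __name__ == "__main__":
    # Hypothetical placeholder titles, not taken from the Amharic dataset.
    titles = [
        "you will not believe this",
        "parliament approves budget",
        "you will not guess this",
    ]
    for title, vec in zip(titles, tfidf_vectors(titles)):
        top = sorted(vec.items(), key=lambda kv: kv[1], reverse=True)[:3]
        print(title, "->", top)
```

Sparse vectors of this kind could then be passed to a classifier such as Logistic Regression. N-grams shared across many titles receive lower weights than n-grams specific to one title, which is the property that makes TF-IDF a reasonable baseline feature for distinguishing clickbait phrasing.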
