Afaan Oromo MultiLabel News Text Classification Using Convolutional Neural Network (CNN)
Publisher: ASTU
Abstract
In today’s world, a news text often belongs to more than one category at once, which makes news
classification a multi-label problem. Furthermore, large amounts of text documents are generated
from many different sources, both online and offline.
With the explosive growth of Internet news media and the disorganized state of news texts, it is
difficult to access the desired content from these sources in a timely manner. This thesis
therefore puts forward an automatic classification model for news text based on a Convolutional
Neural Network (CNN): a multi-label news text classification model for Afaan Oromo. The model takes
text as input and assigns it to predefined labels (categories) based on its content. Text
classification is a technique that assigns textual information to a predefined set of classes. In
this work, various natural language processing tasks are performed, including text preprocessing:
normalization, tokenization, text cleaning, and removal of stop words. The main objective of
natural language processing is to make computers perform tasks that would otherwise require human
participation, reducing the labor, cost, and time devoted to them. Most previous research has
addressed single-label classification, which treats categories as mutually exclusive, whereas this
research mainly focuses on
classifying news text as a multi-label problem. In this study, 6,411 newly collected and annotated
news texts were used to build the model for the Afaan Oromo language. Based on experiments on the
problem domain, the CNN was selected because it can readily incorporate pre-trained word embeddings
and because the non-linearity of the network leads to greater classification accuracy. The
experiments show different results for the model with pre-trained word embeddings compared to the
model with non-pre-trained word embeddings. The CNN model for news text classification based on
pre-trained word embeddings, using a 10/90 train-test ratio, achieved the better performance, with
a precision of 83.3%, recall of 76.3%, F1-score of 79.3%, and accuracy of 73.2%. In contrast, the
model with non-pre-trained word embeddings achieved a precision of 74%, recall of 73.6%, F1-score of
73.8%, and accuracy of 68%.
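The preprocessing steps listed in the abstract (normalization, tokenization, text cleaning, and stop-word removal) can be sketched as a small pipeline. This is a minimal illustration, not the thesis's implementation: the stop-word set is a handful of common Afaan Oromo function words chosen only for illustration, and cleaning is done before tokenization for simplicity. The sketch keeps the apostrophe because Qubee, the Latin-based Afaan Oromo script, uses it (hudhaa) inside words such as "ba'e".

```python
import re

# Illustrative stop-word set only (assumption); the thesis's actual list is not given.
STOP_WORDS = {"fi", "kan", "kana", "sana", "akka"}


def normalize(text: str) -> str:
    # Lowercase and map curly apostrophes to ASCII "'", because Qubee writes the
    # glottal stop (hudhaa) with an apostrophe inside words.
    return text.lower().replace("\u2019", "'").replace("\u2018", "'")


def clean(text: str) -> str:
    # Replace every character that is not a-z, an apostrophe, or whitespace with a space.
    return re.sub(r"[^a-z'\s]", " ", text)


def tokenize(text: str) -> list[str]:
    # Split on whitespace and trim apostrophes left at token edges (e.g. quotation marks).
    tokens = (tok.strip("'") for tok in text.split())
    return [tok for tok in tokens if tok]


def remove_stop_words(tokens: list[str]) -> list[str]:
    return [tok for tok in tokens if tok not in STOP_WORDS]


def preprocess(text: str) -> list[str]:
    return remove_stop_words(tokenize(clean(normalize(text))))


print(preprocess("Oromiyaa fi Finfinnee keessatti, ba\u2019e 2023!"))
# → ['oromiyaa', 'finfinnee', 'keessatti', "ba'e"]
```

Digits and punctuation are simply discarded here; a real pipeline might instead map numbers to a placeholder token.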
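A CNN text classifier of the kind the abstract describes can be sketched as a forward pass: look up word embeddings, slide convolution filters over the token sequence, apply ReLU, max-pool over time, and feed the pooled features to one sigmoid output per label. Independent sigmoids (rather than a softmax) let several labels be active at once, which is what separates multi-label from single-label, mutually exclusive classification. This is a dependency-free sketch under stated assumptions, not the thesis's architecture: all sizes, the label names, the 0.5 threshold, and the helper names are hypothetical, and training (typically binary cross-entropy with backpropagation) is omitted. `build_embeddings` shows the pre-trained versus non-pre-trained distinction: words found in a pre-trained table start from those vectors, the rest start random.

```python
import math
import random


def sigmoid(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def build_embeddings(vocab, dim, pretrained=None, seed=0):
    """Map each word to a vector: copied from `pretrained` if present, else random."""
    rng = random.Random(seed)
    table = {}
    for word in vocab:
        if pretrained is not None and word in pretrained:
            table[word] = list(pretrained[word])
        else:
            table[word] = [rng.uniform(-0.1, 0.1) for _ in range(dim)]
    return table


def conv_max_feature(seq, filt, bias):
    """Slide one (width x dim) filter over the token vectors, apply ReLU, max-pool over time."""
    width, dim = len(filt), len(filt[0])
    # Zero-pad short documents so there is always at least one window.
    padded = seq + [[0.0] * dim for _ in range(max(0, width - len(seq)))]
    best = 0.0  # starting at 0.0 makes max(best, s) equal to the max of ReLU(s)
    for start in range(len(padded) - width + 1):
        s = bias + sum(
            w * x
            for offset in range(width)
            for w, x in zip(filt[offset], padded[start + offset])
        )
        best = max(best, s)
    return best


def init_params(n_filters, width, dim, n_labels, seed=0):
    """Random filters and a dense output layer with one weight row per label."""
    rng = random.Random(seed)

    def rand():
        return rng.uniform(-0.5, 0.5)

    filters = [([[rand() for _ in range(dim)] for _ in range(width)], 0.0)
               for _ in range(n_filters)]
    out_w = [[rand() for _ in range(n_filters)] for _ in range(n_labels)]
    out_b = [0.0] * n_labels
    return filters, out_w, out_b


def predict(tokens, table, filters, out_w, out_b, threshold=0.5):
    """Per-label sigmoid probabilities and 0/1 decisions for one document."""
    seq = [table[t] for t in tokens if t in table]  # out-of-vocabulary tokens are skipped
    features = [conv_max_feature(seq, f, b) for f, b in filters]
    probs = [sigmoid(b + sum(w * h for w, h in zip(row, features)))
             for row, b in zip(out_w, out_b)]
    return probs, [int(p >= threshold) for p in probs]


# Usage with toy, hypothetical data (untrained weights, so the outputs are arbitrary).
LABELS = ["politics", "sports", "economy"]  # hypothetical label set
vocab = ["oromiyaa", "finfinnee", "kubbaa", "miilaa"]
pretrained = {"oromiyaa": [0.1, 0.2, 0.0, -0.1]}  # hypothetical pre-trained vector
table = build_embeddings(vocab, 4, pretrained)
filters, out_w, out_b = init_params(n_filters=2, width=2, dim=4, n_labels=len(LABELS))
probs, decisions = predict(["oromiyaa", "kubbaa", "miilaa"], table, filters, out_w, out_b)
```

In practice a framework layer stack (embedding, 1D convolution, global max pooling, dense with sigmoid) would replace these loops; the plain-Python version only makes each step explicit.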
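The abstract reports precision, recall, F1-score, and accuracy but does not state how they were averaged over labels. One common multi-label convention is sketched below as an assumption: micro-averaged precision, recall, and F1 (counting true positives, false positives, and false negatives over every document-label pair), plus exact-match (subset) accuracy, where a document counts as correct only if all of its labels are predicted correctly. Under that accuracy definition, accuracy can fall below F1. For reference, the harmonic mean of the reported 83.3% precision and 76.3% recall is about 79.6%, close to but not equal to the reported 79.3% F1-score, so that figure was presumably averaged differently (for example, per label); the abstract does not say. The function name `multilabel_metrics` is hypothetical.

```python
def multilabel_metrics(y_true, y_pred):
    """Micro-averaged precision, recall, F1, and exact-match (subset) accuracy.

    y_true, y_pred: lists of equal-length 0/1 label vectors, one per document.
    """
    tp = fp = fn = exact = 0
    for t_row, p_row in zip(y_true, y_pred):
        pairs = list(zip(t_row, p_row))
        tp += sum(1 for t, p in pairs if t == 1 and p == 1)
        fp += sum(1 for t, p in pairs if t == 0 and p == 1)
        fn += sum(1 for t, p in pairs if t == 1 and p == 0)
        exact += int(list(t_row) == list(p_row))  # every label must match
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = exact / len(y_true) if y_true else 0.0
    return precision, recall, f1, accuracy


y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0]]
y_pred = [[1, 0, 0], [0, 1, 0], [1, 1, 1]]
print([round(m, 3) for m in multilabel_metrics(y_true, y_pred)])
# → [0.8, 0.8, 0.8, 0.333]
```

Here four of five predicted labels are correct and four of five true labels are found, yet only one of the three documents is predicted exactly, which is why subset accuracy is much lower than the micro-averaged scores.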
