Afaan Oromo MultiLabel News Text Classification Using Convolutional Neural Network (CNN)


Publisher

ASTU

Abstract

In practice, a single news text often belongs to more than one category, so news classification is naturally a multi-label problem. Furthermore, large numbers of text documents are generated from different sources, both online and offline. With the explosive growth of Internet news media and the disordered state of news texts, it is difficult to find the desired content in a timely manner. This thesis therefore puts forward an automatic classification model for news text based on a Convolutional Neural Network (CNN), developed for multi-label news text classification in Afaan Oromo. The model takes text as input and assigns it to predefined labels (categories) based on its content. Text classification is a technique that assigns textual information to a predefined set of classes. In this work, several natural language processing tasks are performed, including text preprocessing: normalization, tokenization, text cleaning, and stop-word removal. The main objective of natural language processing is to make computers perform tasks that would otherwise require human participation, reducing the labor, cost, and time involved. Most previous research has addressed single-label classification, which treats the classes as mutually exclusive; this research instead focuses on multi-label classification of news text. In this study, 6,411 newly collected and annotated Afaan Oromo news texts were used to build the CNN model. Based on experiments in the problem domain, the CNN was selected because it can readily incorporate pre-trained word embeddings and because the non-linearity of the network leads to greater classification accuracy.
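The preprocessing steps named in the abstract (normalization, text cleaning, tokenization, and stop-word removal) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the function names, the regular expressions, and the tiny `STOP_WORDS` set are hypothetical. Keeping the apostrophe (hudhaa) inside words follows ordinary Qubee spelling, not anything the abstract specifies.

```python
import re
import unicodedata

# Hypothetical, illustrative sample of common function words;
# NOT the stop-word list used in the thesis.
STOP_WORDS = {"fi", "kan", "akka", "keessa", "irratti", "garuu", "ni"}

# Assumption: typographic apostrophe variants are unified to ASCII "'",
# because the apostrophe (hudhaa) is part of Afaan Oromo spelling.
APOSTROPHES = {"\u2019": "'", "\u2018": "'", "\u02bc": "'", "`": "'"}


def normalize(text: str) -> str:
    """Unicode-normalize, unify apostrophes, and lowercase."""
    text = unicodedata.normalize("NFC", text)
    for variant, ascii_apos in APOSTROPHES.items():
        text = text.replace(variant, ascii_apos)
    return text.lower()


def clean(text: str) -> str:
    """Remove URLs and digits, then any character that is not a-z, an apostrophe, or whitespace."""
    text = re.sub(r"https?://\S+", " ", text)
    text = re.sub(r"\d+", " ", text)
    return re.sub(r"[^a-z'\s]", " ", text)


def tokenize(text: str) -> list[str]:
    """Split on whitespace; keep word-internal apostrophes, strip leading/trailing ones."""
    return [tok.strip("'") for tok in text.split() if tok.strip("'")]


def preprocess(text: str) -> list[str]:
    """Full pipeline: normalize -> clean -> tokenize -> drop stop words."""
    tokens = tokenize(clean(normalize(text)))
    return [tok for tok in tokens if tok not in STOP_WORDS]


if __name__ == "__main__":
    print(preprocess("Barattoonni fi barsiisonni ba’an 2024!"))
    # -> ['barattoonni', 'barsiisonni', "ba'an"]
```

Restricting letters to a-z after lowercasing fits the Latin-based Qubee alphabet, but it would also strip accented letters in foreign names; a real pipeline might need a broader character class.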
The experiments showed different results for the model with pre-trained word embeddings compared to the model without them. The CNN model for news text classification based on pre-trained word embeddings, using a 10/90 train-test ratio, achieved the better performance, with a precision of 83.3%, recall of 76.3%, F1-score of 79.3%, and accuracy of 73.2%. By contrast, the model without pre-trained word embeddings achieved a precision of 74%, recall of 73.6%, F1-score of 73.8%, and accuracy of 68%.
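To make the multi-label setting and the reported metrics concrete, the sketch below shows one common way a multi-label classifier turns per-label scores into a label set (an independent sigmoid per label with a 0.5 threshold, so several labels can be chosen at once), and how micro-averaged precision, recall, and F1, plus exact-match (subset) accuracy, are computed from gold and predicted label sets. The abstract does not state the output layer, threshold, or averaging scheme used in the thesis, so all of these are assumptions; the category names and numbers are hypothetical.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def predict_labels(logits: list[float], labels: list[str], threshold: float = 0.5) -> set[str]:
    """Assumed multi-label head: each label is scored independently,
    unlike softmax + argmax, which always picks exactly one class."""
    return {lab for lab, z in zip(labels, logits) if sigmoid(z) >= threshold}


def micro_scores(gold: list[set[str]], pred: list[set[str]]) -> dict[str, float]:
    """Micro-averaged precision/recall/F1 and subset (exact-match) accuracy."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and the same length")
    tp = sum(len(g & p) for g, p in zip(gold, pred))  # labels correctly predicted
    fp = sum(len(p - g) for g, p in zip(gold, pred))  # predicted but not gold
    fn = sum(len(g - p) for g, p in zip(gold, pred))  # gold but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    subset_acc = sum(g == p for g, p in zip(gold, pred)) / len(gold)
    return {"precision": precision, "recall": recall, "f1": f1, "subset_accuracy": subset_acc}


if __name__ == "__main__":
    labels = ["politics", "economy", "sports", "health"]  # hypothetical categories
    print(predict_labels([2.0, 0.3, -1.5, -3.0], labels))  # politics and economy pass 0.5
    gold = [{"politics", "economy"}, {"sports"}, {"health", "economy"}]
    pred = [{"politics", "economy"}, {"sports", "health"}, {"health"}]
    # 4 TP, 1 FP, 1 FN: precision, recall and F1 are all 0.8; subset accuracy is 1/3
    print(micro_scores(gold, pred))
```

Exact-match accuracy only credits a document when its whole label set is right, so under that definition it is typically lower than F1; whether the thesis's "accuracy" uses this definition is not stated in the abstract.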
