Information Extraction from Amharic News Text by Integrating Reinforcement Learning
Publisher: ASTU
Abstract
Information extraction (IE) is a technology that enables relevant content to be extracted from
electronically available text. In the view of many researchers, IE techniques require access to a
large collection of documents. For low-resource languages such as Amharic, however, preparing a
large document collection and well-annotated datasets is challenging because documents are
scarce. Moreover, most researchers working in this area have not considered that information is
relative to its source, so the IE systems they developed are source-dependent. The objective of
this research is to propose a model that extracts information from Amharic news text when only a
small amount of training data is available and when more related information is needed. The model
integrates reinforcement learning techniques to acquire additional documents containing related
information. The work consists of two phases, each with its own components and subcomponents.
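The overall idea of extracting from a given news text and then acquiring additional related documents can be illustrated with a minimal Python sketch. It rests on assumptions the abstract does not state: a single regular-expression rule that pulls four-digit years from a text stands in for the real Amharic extraction rules, `fetch_related` is a hypothetical source of related documents, and the decision to keep fetching, which this work assigns to a reinforcement learning agent, is reduced to a hypothetical `is_satisfied` check. The example documents are made-up English placeholders.

```python
import re
from typing import Callable, Dict, List

# Hypothetical rule: capture four-digit years (1800-2099).
# The actual Amharic extraction rules of this work are not reproduced here.
YEAR_RULE = re.compile(r"\b(1[89]\d\d|20\d\d)\b")


def rule_based_extract(text: str) -> Dict[str, List[str]]:
    """Apply the hand-written rule(s) to one document."""
    return {"year": YEAR_RULE.findall(text)}


def extract_until_satisfied(
    text: str,
    fetch_related: Callable[[int], str],  # hypothetical document source
    is_satisfied: Callable[[Dict[str, List[str]]], bool],  # stands in for the agent
    max_documents: int = 5,
) -> Dict[str, List[str]]:
    """Extract from the given article, then pull related documents until
    the collected information is judged sufficient or the budget runs out."""
    collected = rule_based_extract(text)
    for step in range(max_documents):
        if is_satisfied(collected):
            break
        for slot, values in rule_based_extract(fetch_related(step)).items():
            bucket = collected.setdefault(slot, [])
            for value in values:
                if value not in bucket:  # skip values already found elsewhere
                    bucket.append(value)
    return collected


# Usage with made-up placeholder documents.
related_docs = ["Related report from 2019.", "An older piece mentions 2015 and 2019."]
result = extract_until_satisfied(
    "The report was published in 2011.",
    fetch_related=lambda i: related_docs[i] if i < len(related_docs) else "",
    is_satisfied=lambda c: len(c.get("year", [])) >= 3,
)
print(result)  # {'year': ['2011', '2019', '2015']}
```

In a learned version, the fixed `is_satisfied` test would be replaced by the agent's choice of action given the current state of the collected information.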
Phase one is a rule-based extraction system; phase two is a reinforcement learning agent that
makes decisions based on the state of its environment. The work covers both extraction from a
given news text and extraction from new sources or documents, which repeats until satisfactory
information has been collected; this repeated decision is handled by the Deep Q-Network (DQN)
algorithm. We use a deep Q-network tuned to maximize a reward function that measures extraction
precision, so that the model learns to choose the best course of action based on the information
in the document. The DQN agent is trained for 50,000 steps with three different policies, and their
rewards are compared to identify the best one. Each phase and component was evaluated separately
in individual experiments. The experiments, performed on a total of 1,590 news stories comprising
65,810 sentences and 225,269 words, show that our solution gives promising results, with a
combined accuracy score of 93.5%.
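The abstract says the deep Q-network is tuned to maximize a reward that measures extraction precision and that three policies were compared, without naming them. The sketch below shows those pieces in plain Python under explicit assumptions: the reward is precision (correctly extracted values divided by all extracted values), the learning target is the standard DQN target y = r + γ·max_a′ Q_target(s′, a′), and greedy, ε-greedy, and Boltzmann (softmax) selection are three common policies chosen only for illustration, not necessarily the three used in this work. The neural network and replay buffer are omitted, and all values are made up.

```python
import math
import random
from typing import Sequence, Set


def precision_reward(extracted: Set[str], gold: Set[str]) -> float:
    """Reward = precision of the extracted values against a gold set."""
    if not extracted:
        return 0.0  # assumption: extracting nothing earns no reward
    return len(extracted & gold) / len(extracted)


def dqn_target(reward: float, next_q: Sequence[float], done: bool,
               gamma: float = 0.99) -> float:
    """Standard DQN regression target: r + gamma * max_a' Q_target(s', a').
    At the end of an episode there is no future value to bootstrap from."""
    return reward if done else reward + gamma * max(next_q)


# Three illustrative action-selection policies over a list of Q-values.
def greedy(q: Sequence[float]) -> int:
    """Always pick the action with the highest Q-value."""
    return max(range(len(q)), key=lambda a: q[a])


def epsilon_greedy(q: Sequence[float], epsilon: float = 0.1, rng=random) -> int:
    """Explore uniformly with probability epsilon, otherwise act greedily."""
    if rng.random() < epsilon:
        return rng.randrange(len(q))
    return greedy(q)


def boltzmann(q: Sequence[float], temperature: float = 1.0, rng=random) -> int:
    """Sample an action with probability proportional to exp(Q / temperature)."""
    top = max(q)  # subtract the max for numerical stability
    weights = [math.exp((v - top) / temperature) for v in q]
    return rng.choices(range(len(q)), weights=weights, k=1)[0]


# Usage with made-up values.
reward = precision_reward({"2011", "2019", "Adama"}, {"2011", "2019"})
target = dqn_target(reward, next_q=[0.2, 0.5, 0.1], done=False, gamma=0.9)
print(round(reward, 3), round(target, 3))  # 0.667 1.117
print(greedy([0.2, 0.5, 0.1]))  # 1
```

In a full DQN, `dqn_target` supplies the value toward which the online network's Q(s, a) is regressed, while the chosen policy decides which action is taken at each training step.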
