Information Extraction from Amharic News Text by Integrating Reinforcement Learning

Natnael, Girma

Files

Natnael Girma.pdf (3.26 MB)

Date

2022-09

Authors

Natnael, Girma

Publisher

ASTU

Abstract

Information extraction (IE) is a technology that enables relevant content to be extracted from electronically available text. In the opinion of many researchers, information extraction techniques require access to a large collection of documents. Preparing a large document collection and well-annotated datasets for low-resource languages such as Amharic, however, is a challenging task because documents are scarce. Because many researchers working in this area have not considered that information varies with its source, the IE systems developed so far are source-dependent. The objective of this research is to propose a model that extracts information from Amharic news text when training data is scarce and when more related information is needed. The model integrates reinforcement learning techniques to acquire additional documents containing related information. The work has two phases, each with its own components and subcomponents. Phase one is a rule-based extraction system; phase two consists of a reinforcement learning agent that makes decisions based on the state of the environment. The research covers extraction from a given news text and extraction from new sources or documents, repeated until satisfactory information has been collected; this loop is handled by the DQN algorithm. We use a deep Q-network tuned to maximize a reward function that measures extraction precision, so that the model learns to choose the best course of action based on the information in the document. The DQN agent is trained for 50,000 steps under three different policies in order to compare which yields the better reward.
We evaluated each phase and component in separate experiments. The experiments, performed on a total of 1,590 news stories containing 65,810 sentences and 225,269 words, show that our solution gives promising results, with a combined accuracy score of 93.5%.
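The abstract does not give the reward definition or the network architecture, so the following is only a minimal sketch of the idea it describes: a reward derived from extraction precision, driving a Q-value update. It uses tabular Q-learning as a stand-in for the deep Q-network (a DQN approximates the same update with a neural network). The state names, the action names `stop` and `query`, the sample values, and the hyperparameters `alpha` and `gamma` are all assumptions made for illustration and are not taken from the thesis.

```python
def precision(extracted, gold):
    """Fraction of distinct extracted values that appear in the gold set."""
    extracted, gold = set(extracted), set(gold)
    if not extracted:
        return 0.0
    return len(extracted & gold) / len(extracted)


def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step (stand-in for a DQN gradient step).

    q maps state -> {action: value}. alpha and gamma are assumed values.
    """
    # Value of the best action available in the next state (0 if unknown).
    best_next = max(q[next_state].values()) if q.get(next_state) else 0.0
    # Move the current estimate toward the temporal-difference target.
    td_target = reward + gamma * best_next
    q[state][action] += alpha * (td_target - q[state][action])
    return q[state][action]


# Hypothetical example: the agent queries a new document and the
# precision of its extracted values serves as the reward.
q_table = {
    "doc_0": {"stop": 0.0, "query": 0.0},
    "doc_1": {"stop": 1.0, "query": 0.0},
}
reward = precision(["Addis Ababa", "Monday"], ["Addis Ababa"])
new_value = q_update(q_table, "doc_0", "query", reward, "doc_1")
```

In this sketch a higher extraction precision raises the value of the action that led to it, which is the sense in which an agent can be tuned toward a precision-based reward.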

Keywords

information extraction, reinforcement learning, environmental state, deep Q network (DQN)

URI

http://10.240.1.28:4000/handle/123456789/1612

Collections

Thesis
