Identification of Cyber Threats Information from Online News using Hybrid Machine Learning Algorithm
Publisher
ASTU
Abstract
Large volumes of freely available cyber threat news are generated online, and this data
may contain valuable information. The amount of cyber threat information is growing
rapidly, and it can be analysed to gain insight into the current situation.
However, news is delivered in a variety of forms, and the emergence of new cyber-attacks,
as well as the use of ambiguous news items, has made detecting related news more
challenging. To handle these situations, this paper proposes an
identification mechanism for cyber threat information. The system starts by identifying
cyber-attack features, which are used to classify cyber threat information with a
bidirectional long short-term memory with conditional random field (Bi-LSTM-CRF) model,
and it groups similar news articles using Latent Semantic Analysis (LSA) to
eliminate ambiguous cyber threat news. Data were collected from news articles
related to cyber-attacks, including incidents and attacks that have already occurred. The
data can be obtained from news websites such as Recorded Future, FireEye, SecurityWeek,
Trend Micro, and the Ethiopian Monitor website. The cyber-attack features are identified
from the collected data: the type of cyber threat, the threat actor, the organization
affected, and the country affected. For this research, 2019 cyber-related news articles were collected
in the form of unstructured text. Experimental results demonstrate that the bidirectional
long short-term memory with conditional random field model classifies cyber threat
information effectively. The model achieves an overall F-measure of 98.48% for cyber
threat information identification, with an accuracy of 99.12%. The findings of this study should
assist individuals by presenting a realistic picture of cyber-attack occurrences in our
environment and providing useful information to the general public, thereby improving
societal awareness of cyber-attack activities. However, the model requires further
improvement in selecting cyber-attack features that are difficult for a
machine to categorize.
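
The F-measure and accuracy reported above can be illustrated with a minimal sketch. The abstract does not state how the scores were computed, so the sketch assumes token-level labels, micro-averaging over all labels other than an "outside" label, and hypothetical tag names (B-ACTOR, B-THREAT, B-ORG, B-COUNTRY) chosen to mirror the four features listed. It is not the authors' evaluation code.

```python
from collections import Counter


def token_metrics(gold, pred, outside="O"):
    """Return (accuracy, precision, recall, f1) over token labels.

    Assumption: precision, recall, and F1 are micro-averaged over every
    label except `outside`; accuracy counts all tokens, including `outside`.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")

    counts = Counter()
    for g, p in zip(gold, pred):
        counts["total"] += 1
        if g == p:
            counts["correct"] += 1
            if g != outside:
                counts["tp"] += 1  # correctly tagged feature token
        else:
            if p != outside:
                counts["fp"] += 1  # predicted a feature label that is wrong
            if g != outside:
                counts["fn"] += 1  # missed or mislabeled a gold feature token

    tp, fp, fn = counts["tp"], counts["fp"], counts["fn"]
    accuracy = counts["correct"] / counts["total"] if counts["total"] else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return accuracy, precision, recall, f1


# Hypothetical gold and predicted labels for one short sentence.
gold = ["B-ACTOR", "O", "B-THREAT", "O", "B-ORG", "O", "B-COUNTRY"]
pred = ["B-ACTOR", "O", "B-THREAT", "O", "O", "O", "B-COUNTRY"]

accuracy, precision, recall, f1 = token_metrics(gold, pred)
print(f"accuracy={accuracy:.4f} precision={precision:.4f} "
      f"recall={recall:.4f} f1={f1:.4f}")
```

In this toy example the model misses one organization token, so recall drops while precision stays perfect. Accuracy also counts the "O" tokens, which is one reason it can sit above the F-measure when most tokens carry no feature label.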
