A Hybrid Analysis And Detection Of Android Malware Using Machine Learning And Blockchain Technology
Publisher
ASTU
Abstract
As the number of mobile devices running the Android operating system grows, along with their extensive usage and wide range of applications, these devices have become valuable targets for malicious applications. The goal of this study is to design a hybrid analysis framework for Android malware detection by integrating machine learning and blockchain technology. Static and dynamic analyses give researchers valuable insight into malware techniques, and machine learning is often used to detect new Android malware. Android malware evolves continuously, so training a machine learning model on out-of-date malware can degrade its ability to identify more recent malware. Several existing solutions used outdated malware collections. In this study, recent malicious and benign Android apps were collected from suitable repositories. Component and interactive hybrid analyses were implemented to extract dynamic and static data from Android apps. Both methods attempt to increase the code coverage of the dynamic analysis, which was executed on real devices. The results of the hybrid analysis are stored persistently in Elasticsearch in JSON format. Random forest and extra trees classifiers were used to perform binary classification of malicious and benign Android apps, using 49,554 features for the component analysis and 46,625 features for the interactive analysis. The VirusTotal service was applied to reduce training noise. The classification results and the detected malicious Android code are sent to the blockchain, where new blocks are generated automatically. K-fold cross-validation was used to estimate test error, measured using well-known metrics: precision, recall, and F1 score. Finally, prediction on unknown applications achieved an accuracy of approximately 93%.
From this result, it can be concluded that hybrid analysis is preferable, since it performs better and is less error-prone than either individual analysis approach.
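The evaluation procedure named in the abstract (k-fold cross-validation scored with precision, recall, and F1) can be illustrated with a minimal sketch. This is not the study's code: the function names `k_fold_indices` and `precision_recall_f1`, the contiguous unshuffled fold layout, and the label convention (1 = malicious, 0 = benign) are assumptions made for illustration only.

```python
from typing import List, Sequence, Tuple


def k_fold_indices(n_samples: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Return k (train, test) index pairs over range(n_samples).

    Assumption: folds are contiguous and unshuffled; a real experiment
    would normally shuffle (and often stratify) before splitting.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    base, extra = divmod(n_samples, k)
    folds = []
    start = 0
    for i in range(k):
        # The first `extra` folds receive one additional sample.
        size = base + (1 if i < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        folds.append((train, test))
        start += size
    return folds


def precision_recall_f1(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1
) -> Tuple[float, float, float]:
    """Compute precision, recall, and F1 for the `positive` class.

    Assumption: label 1 marks a malicious app and 0 a benign one.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical labels for six apps, not data from the study.
    y_true = [1, 1, 1, 0, 0, 1]
    y_pred = [1, 0, 1, 0, 1, 1]
    print(precision_recall_f1(y_true, y_pred))
    for train, test in k_fold_indices(10, 3):
        print(len(train), test)
```

In a full experiment, each fold's training indices would be used to fit a classifier and the test indices to score it, with the per-fold metrics averaged to estimate test error.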
