Plasmodium Falciparum Microarray Data Analysis Using Machine Learning Approaches

Publisher

ASTU

Abstract

Malaria is one of the deadliest human diseases. The malaria parasite is developing resistance to antimalarial drugs in countries worldwide, and no vaccine is available despite decades of research. The steady development of drug resistance has driven researchers to generate massive volumes of data, as the data retrieved from microarrays illustrate. The objective of this research is to identify drug targets at the gene level from gene expression data obtained from microarrays. We first preprocess the raw microarray data before high-level analysis. After loading the raw data into the RStudio working environment, we explore the dataset to check that it contains the required components. We then apply microarray quality control, followed by background correction, normalization, and log2 transformation of the data. Finally, we check for missing values and screen for outliers. In the high-level analysis, we applied the Empirical Bayes method to identify differentially expressed genes (DEGs), which yielded 2500 DEGs out of 22769 genes. We then clustered the DEGs by their expression values, on the assumption that genes within the same cluster share the same biological behavior. Using hierarchical clustering, we obtained 226 genes in the fourth cluster, 283 genes in the third cluster, 616 genes in the first cluster, and 904 genes in the second cluster. We validated the clustering result with an internal validation measure, which selected hierarchical clustering as the best clustering technique for this study. The final step was to construct a gene-gene interaction network for the up-regulated genes using Cytoscape. From this network we extracted the top 20 genes that can be used as drug targets.
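
The clustering step described in the abstract can be illustrated with a minimal sketch. This is not the study's code: the abstract does not state the distance metric or linkage used, so Euclidean distance and average linkage are assumptions here, the function names are hypothetical, and the expression values are made-up toy data. It only shows the general idea of merging the closest groups of genes until four clusters remain.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two expression profiles (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(c1, c2, profiles):
    """Mean pairwise distance between two clusters (assumed linkage)."""
    total = sum(euclidean(profiles[i], profiles[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def hierarchical_clusters(profiles, k):
    """Agglomerative clustering: merge the two closest clusters until k remain."""
    clusters = [[i] for i in range(len(profiles))]
    while len(clusters) > k:
        # Find the pair of clusters with the smallest average-linkage distance.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: average_linkage(clusters[p[0]], clusters[p[1]], profiles),
        )
        merged = clusters[a] + clusters[b]
        clusters = [c for i, c in enumerate(clusters) if i not in (a, b)]
        clusters.append(merged)
    return [sorted(c) for c in clusters]


# Toy log2 expression profiles (rows = genes, columns = samples); values are invented.
toy_profiles = [
    [8.0, 8.1, 7.9],
    [8.2, 8.0, 8.1],
    [2.0, 2.1, 1.9],
    [2.2, 2.0, 2.1],
    [5.0, 9.0, 1.0],
    [5.1, 9.1, 1.2],
    [12.0, 0.5, 6.0],
    [12.1, 0.4, 6.2],
]

# Cut the hierarchy at four clusters, mirroring the four clusters reported above.
clusters = hierarchical_clusters(toy_profiles, k=4)
for label, members in enumerate(sorted(clusters), start=1):
    print(f"cluster {label}: genes {members}")
```

On real data with thousands of genes, this quadratic pure-Python loop would be too slow; a library implementation would be the practical choice.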
