Plasmodium falciparum Microarray Data Analysis Using Machine Learning Approaches
Publisher
ASTU
Abstract
Malaria is one of the deadliest human diseases. The parasite that causes it is developing
resistance to antimalarial drugs in countries worldwide, and no vaccine is available despite
decades of research. The systematic development of resistance to antimalarial drugs has
driven researchers to generate massive volumes of data, as the data obtained from
microarrays show. The objective of this research is to identify drug targets
at the gene level from gene expression data obtained from microarrays. We first
preprocessed the raw microarray data before high-level analysis. After loading the raw data
into the RStudio working environment, we explored the dataset to check that it contained
the required components. We then applied microarray quality control, followed by
background correction, normalization, and log2 transformation of the data. Finally, we
checked for missing values and screened for outliers. In the high-level analysis, we applied
the empirical Bayes method to identify differentially expressed genes (DEGs), which yielded 2500 differentially
expressed genes out of 22769 genes. We then clustered the DEGs by their expression values,
on the assumption that genes within the same cluster share similar biological behavior.
Using hierarchical clustering, we obtained 616 genes in the first cluster, 904 genes in the
second cluster, 283 genes in the third cluster, and 226 genes in the fourth cluster. We
validated the clustering result using an internal validation measure, under which
hierarchical clustering was selected as the best clustering technique for this study. The
final step was constructing a gene-gene interaction network for the up-regulated genes
using Cytoscape software. From this network, we extracted the top 20 genes that can be
used as drug targets.
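
The normalization and log2 transformation steps of the preprocessing described above can be illustrated with a minimal sketch. The abstract does not say which normalization method was used, so quantile normalization is an assumption here. The sketch is written in plain Python rather than the R environment used in the study, and the function names, the offset of 1, and the toy intensity matrix are all hypothetical.

```python
import math


def quantile_normalize(matrix):
    """Quantile-normalize the columns (arrays) of a genes x arrays matrix.

    Assumption: quantile normalization stands in for the unspecified
    normalization method. Ties are broken by row order (a simplification).
    """
    n_genes = len(matrix)
    n_arrays = len(matrix[0])
    columns = [[matrix[g][a] for g in range(n_genes)] for a in range(n_arrays)]
    sorted_cols = [sorted(col) for col in columns]
    # Mean intensity at each rank, taken across all arrays.
    rank_means = [
        sum(col[r] for col in sorted_cols) / n_arrays for r in range(n_genes)
    ]
    normalized = [[0.0] * n_arrays for _ in range(n_genes)]
    for a, col in enumerate(columns):
        # Gene indices ordered from lowest to highest intensity on this array.
        order = sorted(range(n_genes), key=lambda g: col[g])
        for rank, g in enumerate(order):
            normalized[g][a] = rank_means[rank]
    return normalized


def log2_transform(matrix, offset=1.0):
    """Apply log2(x + offset) to every value; the offset avoids log2(0)."""
    return [[math.log2(v + offset) for v in row] for row in matrix]


# Hypothetical toy data: 4 genes (rows) measured on 3 arrays (columns).
raw = [
    [5.0, 4.0, 3.0],
    [2.0, 1.0, 4.0],
    [3.0, 4.0, 6.0],
    [4.0, 2.0, 8.0],
]
normalized = quantile_normalize(raw)
logged = log2_transform(normalized)
```

After quantile normalization every array has the same set of intensity values, so differences between arrays that remain reflect gene rank rather than overall array brightness.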
