Cardiovascular Disease Prediction Using Machine Learning Approach
Publisher
ASTU
Abstract
Cardiovascular diseases (CVDs) are the primary contributors to worldwide mortality, responsible
for 17.9 million deaths, or 32% of all global deaths. Notably, more than 75% of
these deaths occurred in low- and middle-income countries. In Ethiopia, CVDs kill 170 people
every day and account for a significant portion of non-communicable disease (NCD) deaths
and healthcare expenses. Early diagnosis of CVDs is crucial but challenging because of
limited access to primary health care programs, a lack of expertise, a shortage of diagnostic and
treatment equipment, and traditional diagnostic methods that are ineffective,
costly, and time-consuming. Recently, machine learning has emerged as a valuable tool to support
the diagnosis of CVDs. This study leverages a stacking-based machine learning model to address
the challenge of CVD diagnosis. While prior research has focused on general CVD prediction or
specific disease types, this study classifies the disease into four common CVD categories:
CAD, PAD, RHD, and stroke. The proposed stacking model was implemented using a CVD dataset
collected at St. Paul’s Hospital Millennium Medical College (SPHMMC), which consists of 2,196
instances with 19 features. Before being passed to the model, the dataset was prepared using
several preprocessing techniques, including imputation, z-score, label encoding, and min-max
normalization. Stratified k-fold cross-validation was used to split the dataset during
model construction. Performance was compared against three individual machine
learning models: SVM, RF, and XGB. Model performance was evaluated with and
without feature selection techniques (recursive feature elimination (RFE) and lasso
regularization (L1)) using accuracy, precision, recall, and F1-score. Our
experiments revealed that the stacking model with RFE outperformed the others, achieving the highest
accuracy of 97.55%.
The proposed model helps medical practitioners or cardiologists classify CVDs into four
common categories effectively, thereby potentially saving lives and reducing the burden of these
diseases.
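
The evaluation protocol named in the abstract (stratified k-fold splitting, then accuracy, precision, recall, and F1-score) can be illustrated with a minimal, standard-library-only Python sketch. This is not the study's code: the function names `stratified_k_fold` and `macro_scores` are hypothetical, the round-robin fold assignment omits the shuffling a real pipeline would likely use, and macro-averaging across the four classes is an assumption, since the abstract does not state how per-class scores were combined.

```python
from collections import defaultdict


def stratified_k_fold(labels, k):
    """Yield (train_idx, test_idx) pairs whose class mix mirrors the full dataset.

    Hypothetical helper: indices of each class are dealt round-robin
    across the k folds (no shuffling, for the sake of a short example).
    """
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    for label in sorted(by_class):
        for pos, idx in enumerate(by_class[label]):
            folds[pos % k].append(idx)

    for i in range(k):
        test_idx = sorted(folds[i])
        train_idx = sorted(
            idx for j, fold in enumerate(folds) if j != i for idx in fold
        )
        yield train_idx, test_idx


def macro_scores(y_true, y_pred):
    """Return accuracy plus macro-averaged precision, recall, and F1.

    Macro-averaging is an assumption made for this sketch.
    """
    pairs = list(zip(y_true, y_pred))
    classes = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for c in classes:
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    n = len(classes)
    return {
        "accuracy": sum(1 for t, p in pairs if t == p) / len(pairs),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


if __name__ == "__main__":
    # Toy labels invented for illustration; not drawn from the SPHMMC dataset.
    toy_labels = ["CAD", "PAD", "RHD", "Stroke"] * 3
    for train_idx, test_idx in stratified_k_fold(toy_labels, k=3):
        print(len(train_idx), "train /", len(test_idx), "test")
```

In a full pipeline, a classifier would be fitted on each `train_idx` split and `macro_scores` applied to its predictions on the matching `test_idx` split, with the per-fold scores averaged at the end.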
