Auto-Insurance Fraud Claim Detection by using Ensemble Learning Techniques
Publisher
ASTU
Abstract
Insurance companies have recently suffered from dishonest claims filed by insured parties. As a
result, auto insurance fraud has become a major source of concern for both businesses and consumers.
Multiple researchers have proposed auto insurance fraud claim detection systems based on
machine learning techniques, but these systems lack a sound feature engineering approach,
use too few features, and do not identify which type of auto insurance fraud has occurred.
The goal of this thesis is to develop an auto insurance fraud claim detector
that improves the feature engineering approach by introducing a fraud type-based ensemble
classifier. To achieve this goal, several ensemble learning models were tested with different
methods for handling a publicly available auto-insurance claim dataset, which has 11,210
rows and 40 features. Different data preprocessing techniques were
applied, such as data cleaning, missing value handling, data encoding, data scaling, and
feature selection methods. Our study proposes a combined feature selection method, one
detection model, and one classification model, which together provide three functions. The
first function, the claim detection model, determines whether a claim is fraudulent or not.
Three experiments were carried out for it: Experiment 1 on the original dataset with no
feature selection, Experiment 2 with sequential feature selection, and Experiment 3 with
superior feature selection, all using ensembles such as RF, bagging classifiers, XGBoost,
AdaBoost, and stacking classifiers. To
avoid overfitting, the proposed model is tested using K-fold cross-validation. Other evaluation
metrics such as precision, recall, F1 score, and specificity are also used. The stacking classifier
with sequential feature selection produced the highest accuracy, 99.31%, with an F1 score of
0.98, across all three experiments. The second function, the fraud type classification model,
determines the type of fraud among five classes. The RF classifier with sequential feature
selection produced the highest precision, recall, and F1 score of 97%. The third function is
feature selection using a combined feature selector, which combines and votes on the top
selected features that increase the classification performance of the auto insurance claim
detection model. In the future, text analytics and image processing, as well as video processing,
should be considered.
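
The abstract does not spell out how the combined feature selector votes or how the reported metrics are computed, so the following is only a minimal sketch under stated assumptions. It assumes each individual selector returns a ranked list of feature names, that a feature is kept when it appears in the top k of at least a chosen number of selectors, and that fraud is encoded as class 1. The function names (vote_features, binary_metrics), the selector labels, and the feature names are hypothetical and are not taken from the thesis or its dataset.

```python
from collections import Counter


def vote_features(rankings, top_k, min_votes):
    """Keep features found in the top_k of at least min_votes rankings.

    rankings maps a selector name to its feature list, best feature first.
    This majority-style rule is an assumption, not the thesis's exact rule.
    """
    votes = Counter()
    for ranked in rankings.values():
        votes.update(ranked[:top_k])
    kept = [name for name, count in votes.items() if count >= min_votes]
    # Most-voted features first; ties broken alphabetically for stability.
    return sorted(kept, key=lambda name: (-votes[name], name))


def binary_metrics(y_true, y_pred):
    """Precision, recall, F1 and specificity, with fraud encoded as 1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    # Specificity is the share of genuine claims correctly left unflagged.
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return {"precision": precision, "recall": recall,
            "f1": f1, "specificity": specificity}


# Hypothetical rankings from three feature selectors (illustrative only).
rankings = {
    "selector_a": ["claim_amount", "policy_age", "incident_hour", "vehicle_age"],
    "selector_b": ["claim_amount", "incident_hour", "witnesses", "policy_age"],
    "selector_c": ["policy_age", "claim_amount", "vehicle_age", "witnesses"],
}
selected = vote_features(rankings, top_k=3, min_votes=2)

# Toy labels and predictions (illustrative only, not results from the thesis).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
metrics = binary_metrics(y_true, y_pred)

print(selected)
print(metrics)
```

In this sketch the voting threshold and k are free parameters. A stricter threshold keeps fewer features, trading possible information loss for a smaller input to the detection model.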
