Auto-Insurance Fraud Claim Detection by using Ensemble Learning Techniques

Publisher

ASTU

Abstract

Insurance companies have recently suffered losses from dishonest claims by insured parties. As a result, auto insurance fraud has become a major concern for both businesses and consumers. Several researchers have proposed auto insurance fraud claim detection systems built on machine learning techniques, but these systems lack a sound feature engineering approach, use too few features, and do not identify which type of auto insurance fraud has occurred. The goal of this thesis is to develop an auto insurance fraud claim detector that improves the feature engineering approach by introducing a fraud type-based ensemble classifier. To achieve this goal, several ensemble learning models were trained and tested with different methods on a publicly available auto insurance claim dataset with 11,210 rows and 40 features. Data preprocessing included data cleaning, missing value handling, data encoding, data scaling, and feature selection. The study proposes a combined feature selection method, one detection model, and one classification model, which together provide three functions. The first function, the claim detection model, determines whether a claim is fraudulent or not. Three experiments were carried out for it: Experiment 1 on the original dataset with no feature selection, Experiment 2 with sequential feature selection, and Experiment 3 with superior feature selection. The experiments used ensembles such as random forest (RF), bagging classifiers, XGBoost, AdaBoost, and stacking classifiers. To guard against overfitting, the proposed model was evaluated using K-fold cross-validation. Besides accuracy, precision, recall, F1 score, and specificity were also reported. Across all three experiments, the stacking classifier with sequential feature selection produced the highest accuracy, 99.31%, with an F1 score of 0.98.
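
The evaluation metrics named above can all be derived from the four counts of a binary confusion matrix. The following is a minimal sketch, not the thesis's own code: the function name `binary_metrics`, the `BinaryMetrics` container, and the toy labels are assumptions made for illustration, with 1 standing for a fraudulent claim and 0 for a legitimate one.

```python
from dataclasses import dataclass


@dataclass
class BinaryMetrics:
    accuracy: float
    precision: float
    recall: float
    specificity: float
    f1: float


def binary_metrics(y_true, y_pred, positive=1):
    """Compute metrics for a fraud (positive) vs. not-fraud (negative) task."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1  # fraud correctly flagged
            else:
                fp += 1  # legitimate claim wrongly flagged
        else:
            if truth == positive:
                fn += 1  # fraud missed
            else:
                tn += 1  # legitimate claim correctly passed

    def ratio(num, den):
        # Return 0.0 instead of raising when a denominator is empty.
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return BinaryMetrics(
        accuracy=ratio(tp + tn, tp + fp + tn + fn),
        precision=precision,
        recall=recall,
        specificity=ratio(tn, tn + fp),
        f1=ratio(2 * precision * recall, precision + recall),
    )


# Toy labels, invented for illustration only (1 = fraud, 0 = not fraud).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
metrics = binary_metrics(y_true, y_pred)
```

Specificity is worth reporting alongside recall on fraud data because legitimate claims usually far outnumber fraudulent ones, so accuracy alone can look high even when many frauds are missed.
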
The second function, the fraud type classification model, determines the type of fraud among five classes. The RF classifier with sequential feature selection produced the highest precision, recall, and F1 score, each 97%. The third function is feature selection using a combined feature selector, which combines the top selected features and votes on those that increase the classification performance of the auto insurance claim detection model. Future work should consider text analytics, image processing, and video processing.
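
One way to read the combined feature selector is as a vote over the feature lists returned by several individual selectors. The sketch below illustrates that idea only. The abstract does not state the voting rule, so the `min_votes` threshold, the function name `vote_on_features`, the selector names other than sequential selection, and all feature names are hypothetical.

```python
from collections import Counter


def vote_on_features(selections, min_votes):
    """Keep features chosen by at least `min_votes` of the individual selectors."""
    votes = Counter()
    for chosen in selections.values():
        votes.update(set(chosen))  # each selector votes once per feature
    # Sort by vote count (descending), then by name for a stable order.
    ranked = sorted(votes.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, count in ranked if count >= min_votes]


# Hypothetical selector outputs and feature names, for illustration only.
selections = {
    "sequential": ["claim_amount", "vehicle_age", "policy_tenure"],
    "tree_importance": ["claim_amount", "incident_hour", "vehicle_age"],
    "mutual_information": ["claim_amount", "policy_tenure", "witness_count"],
}
combined = vote_on_features(selections, min_votes=2)
```

Under these assumed inputs, a feature survives only if at least two of the three selectors picked it, which trades a smaller feature set for agreement between methods.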
