Feature Selection and Imbalanced Data Problem Solving in Classification of Banking Fraud Prevention

Rachatawan Virakul; Kitsana Waiyamai

EasyChair Preprint no. 8776

5 pages • Date: September 3, 2022

Abstract

Feature selection and class imbalance are important problems for classification techniques. This research therefore compares the efficiency of feature selection and imbalanced-data handling methods for customer classification, using banking fraud prevention as a case study. We perform feature selection by measuring the relationship between each independent variable and the dependent variable: the Weight of Evidence (WOE) and the Information Value (IV) are used to rank variables by how strongly they affect the dependent variable. To handle the imbalanced data, Random Undersampling (RUS), SMOTE (Synthetic Minority Oversampling Technique), Borderline-SMOTE, and SMOTE-ENN (SMOTE combined with Edited Nearest Neighbours) are used to pre-process the data, and the resulting accuracy is compared using logistic regression and decision tree classifiers. Our experimental results show that WOE-based feature selection combined with RUS + SMOTE sampling and logistic regression achieves the best accuracy.
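The WOE/IV ranking described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' code. It assumes each variable is already binned (or categorical), that label 1 marks fraud, and that WOE for bin i is defined as ln(%non-fraud_i / %fraud_i), with IV = Σ_i (%non-fraud_i − %fraud_i) · WOE_i; sign conventions vary between sources, but IV is non-negative under either. The smoothing constant of 0.5 added to every bin count (to avoid log(0) for empty bins), the function names `woe_iv` and `rank_by_iv`, and the toy data are all hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence, Tuple


def woe_iv(
    bins: Sequence[Hashable],
    labels: Sequence[int],
    smoothing: float = 0.5,  # hypothetical: avoids log(0) for empty bins
) -> Tuple[Dict[Hashable, float], float]:
    """Return per-bin WOE and total IV for one pre-binned feature.

    labels: 1 = event (fraud), 0 = non-event (non-fraud).
    WOE_i = ln(%non-event_i / %event_i)
    IV    = sum_i (%non-event_i - %event_i) * WOE_i
    """
    if len(bins) != len(labels):
        raise ValueError("bins and labels must have the same length")
    events = Counter(b for b, y in zip(bins, labels) if y == 1)
    non_events = Counter(b for b, y in zip(bins, labels) if y == 0)
    categories = sorted(set(bins), key=str)
    k = len(categories)
    # Totals include the smoothing mass so each distribution still sums to 1.
    total_events = sum(events.values()) + smoothing * k
    total_non_events = sum(non_events.values()) + smoothing * k

    woe: Dict[Hashable, float] = {}
    iv = 0.0
    for c in categories:
        p_event = (events[c] + smoothing) / total_events
        p_non_event = (non_events[c] + smoothing) / total_non_events
        woe[c] = math.log(p_non_event / p_event)
        iv += (p_non_event - p_event) * woe[c]
    return woe, iv


def rank_by_iv(
    features: Dict[str, Sequence[Hashable]], labels: Sequence[int]
) -> List[Tuple[str, float]]:
    """Rank features from highest to lowest Information Value."""
    scores = [(name, woe_iv(col, labels)[1]) for name, col in features.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Toy data (hypothetical): 3 fraud cases out of 10 customers.
    labels = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
    features = {
        "channel": ["web", "web", "web", "web", "branch",
                    "branch", "branch", "atm", "atm", "atm"],
        "weekday": ["mon", "tue", "mon", "tue", "mon",
                    "tue", "mon", "tue", "mon", "tue"],
    }
    for name, iv in rank_by_iv(features, labels):
        print(f"{name}: IV = {iv:.3f}")
```

On this toy data every fraud case falls in the `web` bin of `channel`, so `channel` ranks first (IV ≈ 1.68) well ahead of `weekday` (IV ≈ 0.14). A cutoff on IV, or keeping the top-ranked variables, would then select the feature subset passed to the classifiers.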
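The best-performing sampling setting in the abstract combines Random Undersampling with SMOTE. The sketch below shows one plausible way to chain the two using only the standard library; the paper's actual sampling ratios, neighbour count, and ordering are not given here, so `majority_ratio=2.0`, `k=5`, the undersample-then-oversample order, and all function names are assumptions. SMOTE itself is implemented as usual: each synthetic row lies on the segment between a random minority row and one of its k nearest minority neighbours. Production implementations exist in imbalanced-learn (`RandomUnderSampler`, `SMOTE`, `BorderlineSMOTE`, `SMOTEENN`).

```python
import math
import random
from typing import List, Sequence, Tuple

Point = List[float]


def random_undersample(majority: Sequence[Point], n_keep: int,
                       rng: random.Random) -> List[Point]:
    """Random undersampling: keep n_keep majority rows drawn without replacement."""
    return [list(p) for p in rng.sample(list(majority), n_keep)]


def smote(minority: Sequence[Point], n_new: int, k: int,
          rng: random.Random) -> List[Point]:
    """Create n_new synthetic minority rows by SMOTE interpolation."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    k = min(k, len(minority) - 1)
    # Precompute the k nearest minority neighbours (Euclidean) of every row.
    neighbours: List[List[int]] = []
    for i, base in enumerate(minority):
        others = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j, b=base: math.dist(b, minority[j]),
        )
        neighbours.append(others[:k])

    synthetic: List[Point] = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        neighbour = minority[rng.choice(neighbours[i])]
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def rus_then_smote(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    majority_ratio: float = 2.0,  # hypothetical: majority kept = ratio * |minority|
    k: int = 5,                   # hypothetical SMOTE neighbour count
    seed: int = 0,
) -> Tuple[List[Point], List[int]]:
    """Hypothetical RUS + SMOTE: undersample class 0, then oversample class 1 to match."""
    rng = random.Random(seed)
    minority = [list(x) for x, label in zip(X, y) if label == 1]
    majority = [list(x) for x, label in zip(X, y) if label == 0]
    n_keep = min(len(majority), int(majority_ratio * len(minority)))
    kept = random_undersample(majority, n_keep, rng)
    new_rows = smote(minority, max(0, n_keep - len(minority)), k, rng)
    X_out = kept + minority + new_rows
    y_out = [0] * len(kept) + [1] * (len(minority) + len(new_rows))
    return X_out, y_out


if __name__ == "__main__":
    # Toy data (hypothetical): 100 non-fraud rows, 10 fraud rows.
    X = [[float(i), 0.0] for i in range(100)] + [[float(i), 10.0] for i in range(10)]
    y = [0] * 100 + [1] * 10
    X_res, y_res = rus_then_smote(X, y, majority_ratio=2.0, k=3, seed=42)
    print(y_res.count(0), y_res.count(1))
```

Undersampling first shrinks the majority class cheaply, so SMOTE has fewer synthetic rows to create. In either order, only the training split should be resampled; resampling the test split would make the reported accuracy optimistic.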

Keyphrases: banking, classification, feature selection, fraud, imbalance

Links:

https://easychair.org/publications/preprint/L9vn

BibTeX entry

BibTeX has no dedicated entry type for preprints. The following workaround produces the correct reference:

@Booklet{EasyChair:8776,
  author = {Rachatawan Virakul and Kitsana Waiyamai},
  title = {Feature Selection and Imbalanced Data Problem Solving in Classification of Banking Fraud Prevention},
  howpublished = {EasyChair Preprint no. 8776},
  year = {EasyChair, 2022}}