
Feature Selection and Imbalanced Data Problem Solving in Classification of Banking Fraud Prevention

EasyChair Preprint no. 8776

5 pages. Date: September 3, 2022

Abstract

Feature selection and imbalanced data are important problems for classification techniques. Therefore, this research aims to compare the efficiency of feature selection and imbalanced data problem solving for customer classification in a case study of banking fraud prevention. We perform feature selection to measure the relationship between each independent variable and the dependent variable. The Weight of Evidence (WOE) and the Information Value (IV) are used to rank variables by how strongly they affect the dependent variable. To address imbalanced data, Random Undersampling (RUS), SMOTE (Synthetic Minority Oversampling Technique), Borderline-SMOTE, and SMOTE-ENN (Synthetic Minority Oversampling Technique with Edited Nearest Neighbours) are used to pre-process the data, and the resulting accuracy is compared using logistic regression and decision tree classifiers. Our experimental results show that WOE-based feature selection, combined with the RUS + SMOTE sampling methods and logistic regression, provides the best accuracy.
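The WOE/IV ranking mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes categorical (or pre-binned) features, labels where 1 marks a fraud case and 0 a non-fraud case, the common convention WOE = ln(share of non-events / share of events), and an additive smoothing term to avoid log(0). The function names `woe_iv` and `rank_features` and the `smoothing` parameter are hypothetical.

```python
import math
from collections import defaultdict


def woe_iv(values, labels, smoothing=0.5):
    """Return ({category: WOE}, IV) for one categorical feature.

    Assumptions (not from the paper): labels are 1 for the event (fraud)
    and 0 for the non-event; WOE = ln(non-event share / event share).
    """
    events = defaultdict(float)
    non_events = defaultdict(float)
    for value, label in zip(values, labels):
        if label == 1:
            events[value] += 1
        else:
            non_events[value] += 1

    categories = set(events) | set(non_events)
    # Additive smoothing keeps every share strictly positive.
    total_events = sum(events.values()) + smoothing * len(categories)
    total_non_events = sum(non_events.values()) + smoothing * len(categories)

    woe = {}
    iv = 0.0
    for category in categories:
        event_share = (events[category] + smoothing) / total_events
        non_event_share = (non_events[category] + smoothing) / total_non_events
        woe[category] = math.log(non_event_share / event_share)
        # Each term is non-negative, so IV >= 0.
        iv += (non_event_share - event_share) * woe[category]
    return woe, iv


def rank_features(features, labels, smoothing=0.5):
    """Sort feature names by Information Value, highest first."""
    scores = [(name, woe_iv(column, labels, smoothing)[1])
              for name, column in features.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Toy data: "informative" separates the classes, "noise" does not.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
features = {
    "informative": ["a", "a", "a", "b", "a", "b", "b", "b"],
    "noise": ["x", "y", "x", "y", "x", "y", "x", "y"],
}
ranking = rank_features(features, labels)
```

In this toy example the `noise` feature has identical event and non-event shares in every category, so its IV is zero and it ranks last. Features with an IV below some chosen threshold would then be dropped before resampling and model fitting; the threshold itself is a modelling choice that the abstract does not specify.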

Keyphrases: Banking, Classification, feature selection, fraud, imbalance

BibTeX entry
BibTeX does not have the right entry for preprints. This is a hack for producing the correct reference:
@Booklet{EasyChair:8776,
  author = {Rachatawan Virakul and Kitsana Waiyamai},
  title = {Feature Selection and Imbalanced Data Problem Solving in Classification of Banking Fraud Prevention},
  howpublished = {EasyChair Preprint no. 8776},

  year = {EasyChair, 2022}}