Tags: Discrete Minimax Classifier, Imbalanced Data, Multi-label Classification
Abstract:
Multi-label classification (MLC) is a supervised learning problem in which each instance can be associated with none, one, or multiple labels. MLC has received increasing attention due to its wide range of applications, such as text categorization and medical diagnosis. Despite a rich literature on MLC, handling imbalanced data, which is common in real-world MLC datasets, has not been tackled satisfactorily. A thorough literature review indicates that existing methods for imbalanced MLC are either difficult to couple with sound theoretical guarantees or of limited scalability. This paper discusses the potential advantages and disadvantages of existing methods for imbalanced MLC when coupled with the Binary Relevance Classifier (BRC), and introduces the Discrete Minimax BRC (DMBRC), a promising attempt to robustify the BRC by leveraging the theoretically sound properties of the Discrete Minimax Classifier. We also provide empirical evidence illustrating how DMBRC can be advantageous in balancing label-wise error rates. Finally, we outline future work to further strengthen DMBRC with respect to both label-wise error rates and conventional MLC evaluation metrics.
Discrete Minimax Binary Relevance Classifier for Imbalanced Multi-Label Classification