Detecting Toxic Content Online and the Effect of Training Data on Classification Performance

The spread of toxic content online has attracted a wealth of research into automatic detection and classification methods in recent years. However, two limitations remain: 1) a lack of support for multi-label classification; and 2) a limited understanding of how the unbalanced datasets typical of this domain affect such tasks. In this work, we build three state-of-the-art methods for multi-label classification of toxic content online and compare the effect of training data size on their performance. The three methods are based on Support Vector Machines (SVM), Convolutional Neural Networks (CNN) and Long Short-Term Memory networks (LSTM), respectively. We conduct a learning curve analysis and show that the CNN is the most robust method: it outperforms the other two regardless of dataset size, even on very small amounts of data. This challenges the conventional belief that neural networks require large amounts of data to train accurate models. We also empirically derive indicative thresholds of training data size that help determine when an estimate of classifier performance becomes reliable, and how much data is needed to maximise potential classifier performance in such tasks.
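
To make the learning curve procedure concrete, the following minimal sketch trains a classifier on progressively larger subsets of the training data and scores each model on a fixed held-out set. It is an illustration only, not the implementation used in this work: `train_token_voter` is a hypothetical toy stand-in for the SVM, CNN and LSTM models, micro-averaged F1 is an assumed choice of metric, and all function names are invented for this example.

```python
import random
from collections import Counter, defaultdict


def micro_f1(gold, pred):
    """Micro-averaged F1 over all (example, label) decisions."""
    tp = sum(len(g & p) for g, p in zip(gold, pred))
    fp = sum(len(p - g) for g, p in zip(gold, pred))
    fn = sum(len(g - p) for g, p in zip(gold, pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def train_token_voter(examples):
    """Toy multi-label classifier (hypothetical stand-in for SVM/CNN/LSTM).

    For each label, it keeps the tokens that occur in more training texts
    with that label than without it, and predicts the label whenever a new
    text contains at least one of those tokens.
    """
    labels = set()
    for _, labs in examples:
        labels |= labs
    with_label = defaultdict(Counter)
    without_label = defaultdict(Counter)
    for text, labs in examples:
        tokens = set(text.lower().split())
        for label in labels:
            target = with_label if label in labs else without_label
            target[label].update(tokens)
    cues = {
        label: {t for t, c in with_label[label].items()
                if c > without_label[label][t]}
        for label in labels
    }

    def predict(text):
        tokens = set(text.lower().split())
        return {label for label, cue in cues.items() if tokens & cue}

    return predict


def learning_curve(train, test, sizes, train_fn=train_token_voter, seed=0):
    """Return (training size, micro-F1) pairs, one per requested size.

    The training data is shuffled once, so each larger subset contains
    every smaller one; the test set stays fixed across all sizes.
    """
    shuffled = train[:]
    random.Random(seed).shuffle(shuffled)
    gold = [labs for _, labs in test]
    curve = []
    for n in sizes:
        predict = train_fn(shuffled[:n])
        pred = [predict(text) for text, _ in test]
        curve.append((n, micro_f1(gold, pred)))
    return curve
```

Each example is a `(text, set_of_labels)` pair, with an empty set for non-toxic text. Passing a different `train_fn` allows several classifiers to be compared at the same training sizes, which is the kind of comparison the analysis above relies on.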

Keyphrases: classifier performance, Convolutional Neural Network, deep learning, Deep Neural Network, detecting hate speech, hate speech, learning curve, machine learning, multi-label classification, Natural Language Processing, neural network, NLP, offensive language, text classification, text mining, toxic comment, toxic content, toxic content classification, training data

