Hate Speech Analysis and Moderation on Twitter Data Using BERT and Ensemble Techniques

EasyChair Preprint 13799, version 2
6 pages • Date: August 26, 2024

Abstract

Twitter, a popular social media platform, has become a channel for spreading hate speech, racism, sexism, and similar sentiments. This has raised ethical, social, and legal concerns, and researchers have developed methods to identify and classify hate speech. This paper investigates Twitter discourse with a focus on detecting hate speech, a prevalent form of online expression. The study uses a curated dataset to analyze negative tweets, with the BERT model and ensemble techniques trained to detect and classify hateful content. The best classification results were achieved by BERT and CatBoost with hyper-parameter tuning, with accuracies of 92% and 91.1% on the test data, respectively. Additionally, response strategies are devised to moderate content and foster constructive engagement among users. Sentiment analysis is employed to explore the emotional landscape of Twitter discourse. The research is further extended by using clustering to categorize hate speech, aiming for a more detailed characterization of online hate speech. The analysis also includes a dedicated exploration of racism and sexism detection, identifying tweets that exhibit bias. The study culminates in a comprehensive view of online discourse, with potential applications in content moderation, user engagement strategies, and the cultivation of a more positive digital space.

Keyphrases: BERT, CatBoost, Classification, Ensemble Techniques, Moderation, Sentiment, cluster, deep learning, hyper-parameter