Tags: Large Language Models, Privacy, Prompt Engineering, Rephrased Text, Sanitizing, Self-disclosure, Sensitivity Detection
Abstract:
The proliferation of platforms such as e-commerce sites and social networks has led to an increasing amount of personal health information being disclosed in user-generated content. This study investigates the use of Large Language Models (LLMs) to detect and sanitize sensitive health disclosures in reviews posted on Amazon. Specifically, we present an approach that uses ChatGPT to evaluate both the sensitivity and the informativeness of Amazon reviews. The approach relies on prompt engineering to identify sensitive content and to rephrase reviews so that sensitive disclosures are reduced while informativeness is preserved. Empirical results indicate that ChatGPT can reliably assign sensitivity and informativeness scores to user-generated reviews and can generate sanitized reviews that remain informative.
“I Was Diagnosed with...”: Sensitivity Detection and Rephrasing of Amazon Reviews with ChatGPT
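To make the abstract's pipeline concrete, the following is a minimal sketch of how prompt engineering could be used to score and then rephrase a review with ChatGPT. It assumes the `openai` Python client and an `OPENAI_API_KEY` in the environment; the prompt wording, model name, and 1-5 score scale are illustrative assumptions, not the authors' exact prompts or configuration.

```python
# Illustrative sketch of prompt-based sensitivity/informativeness scoring and
# rephrasing of a product review. Prompts, model, and scales are assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def score_review(review: str) -> str:
    """Ask the model to rate the sensitivity and informativeness of a review."""
    prompt = (
        "Rate the following Amazon review on two 1-5 scales:\n"
        "1. Sensitivity: how much personal health information it discloses.\n"
        "2. Informativeness: how useful it is to other shoppers.\n"
        f"Review: {review}\n"
        "Answer in the form 'sensitivity=<n>, informativeness=<n>'."
    )
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content


def sanitize_review(review: str) -> str:
    """Ask the model to rephrase the review, removing health disclosures."""
    prompt = (
        "Rephrase the following Amazon review so that it no longer discloses "
        "personal health information, while keeping it informative for other "
        f"shoppers:\n{review}"
    )
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content


if __name__ == "__main__":
    review = (
        "I was diagnosed with diabetes last year, and this glucose monitor "
        "has been a lifesaver."
    )
    print(score_review(review))
    print(sanitize_review(review))
```

In this sketch the scores and the sanitized text come from separate prompts, so the scoring step can also be reapplied to the rephrased review to check that sensitivity drops while informativeness is retained.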