Tags: Audio Authentication, Deep learning, Deepfake, Speech and Watermark
Abstract:
Generative AI, particularly "deepfake" technology, stands at the crossroads of innovation and ethical dilemma. On one hand, it brings unprecedented advances that transform how we interact with digital content. On the other hand, it compromises privacy and security: it undermines the reliability of speaker recognition systems and fuels telecommunication fraud and the manipulation of public opinion. This contrast raises legitimate concerns about the safety of sharing personal audio and video, and it calls the authenticity of digital media itself into question. To address the traceability of deepfake content and protect the integrity of audio, we propose a new solution designed specifically to counteract voice conversion and synthetic speech attacks. By combining deep learning, three extension strategies, and ensemble learning of the synthesis layer, the approach overcomes inherent limitations of existing forensic methods and resolves the issues associated with high-capacity watermarks. It achieves high accuracy and imperceptibility across multiple speech datasets, various synthetic forgery methods, and numerous speech processing algorithms.
Proactive Audio Authentication Using Speaker Identity Watermarking