In remote surveillance, acquiring high-quality speech from a target has long been a sought-after goal. In this paper, we propose a convolutional neural network based method to extract a target’s speech signals remotely. The method consists of two parts: an optical setup that captures speckle images conveniently and covertly, and a convolutional neural model that recovers speech signals from a continuous sequence of speckle images. Correlation coefficient and root mean square error metrics show the effectiveness of our method for high-quality speech extraction. Compared with traditional spatial image correlation, our convolutional neural model is more accurate and more efficient in speckle image processing. The model achieves an average accuracy of 94% on real data and 98% on simulated data, which is far better than spatial image correlation. In addition, with GPU hardware the model can process speckle images at up to 237 frames per second, compared with 10 frames per second for spatial image correlation. Experimental results show that the method is simple, efficient, and accurate, which represents significant progress in remote sound extraction.
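The two evaluation metrics named above, the correlation coefficient and the root mean square error, can be illustrated with a minimal sketch. It assumes the correlation coefficient is the Pearson coefficient and that both metrics compare a recovered signal with a reference signal sample by sample; the function names and the synthetic signals are hypothetical and are not taken from the paper.

```python
import math


def correlation_coefficient(reference, recovered):
    """Pearson correlation coefficient between two equal-length signals."""
    if len(reference) != len(recovered) or len(reference) < 2:
        raise ValueError("signals must have the same length (at least 2 samples)")
    n = len(reference)
    mean_ref = sum(reference) / n
    mean_rec = sum(recovered) / n
    # Covariance and variances, all without the common 1/n factor (it cancels).
    cov = sum((a - mean_ref) * (b - mean_rec) for a, b in zip(reference, recovered))
    var_ref = sum((a - mean_ref) ** 2 for a in reference)
    var_rec = sum((b - mean_rec) ** 2 for b in recovered)
    if var_ref == 0 or var_rec == 0:
        raise ValueError("correlation is undefined for a constant signal")
    return cov / math.sqrt(var_ref * var_rec)


def root_mean_square_error(reference, recovered):
    """Root mean square error between two equal-length signals."""
    if len(reference) != len(recovered) or len(reference) == 0:
        raise ValueError("signals must be non-empty and have the same length")
    squared_error = sum((a - b) ** 2 for a, b in zip(reference, recovered))
    return math.sqrt(squared_error / len(reference))


# Synthetic stand-ins (assumption): a sine wave as the reference signal and
# a scaled, offset copy as the "recovered" signal.
reference = [math.sin(2 * math.pi * 5 * t / 100) for t in range(100)]
recovered = [0.9 * x + 0.01 for x in reference]

cc = correlation_coefficient(reference, recovered)
rmse = root_mean_square_error(reference, recovered)
print(f"correlation coefficient = {cc:.4f}, RMSE = {rmse:.4f}")
```

The correlation coefficient is insensitive to a constant gain or offset between the two signals, whereas the root mean square error penalizes both, which is one reason the two metrics are usefully reported together.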
Remote Speech Extraction from Speckle Image by Convolutional Neural Network