Representation Separation Adversarial Networks for Cross-Modal Retrieval

Tags: Adversarial learning, Common representation, Cross-modal retrieval, Private representation, Representation separation
Abstract:
Cross-modal retrieval aims to retrieve semantically similar instances from other modalities given a query from one modality. Recently, generative adversarial networks (GANs) have been used to model the joint distribution over data from different modalities and to learn common representations for cross-modal retrieval. However, most existing GAN-based methods simply project the original representations of different modalities into a common representation space, ignoring the fact that different modalities share common characteristics while each modality also has its own private characteristics. To address this problem, we propose a novel cross-modal retrieval method, called Representation Separation Adversarial Networks (ReSAN), which explicitly separates the original representations into common latent representations and private representations. Specifically, we minimize the correlation between the common and private representations to ensure their independence. Then, we reconstruct the original representations by exchanging the common representations of the two modalities, encouraging an information swap between them. Finally, labels are used to increase the discriminability of the common representations. Comprehensive experiments on two widely used datasets show that the proposed method achieves better performance than many existing GAN-based methods, and demonstrate that explicitly modeling a private representation for each modality helps the model extract better common latent representations.
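To make the three training signals in the abstract concrete, here is a minimal PyTorch sketch (not the authors' code): each modality's feature is split into a common and a private code, a cross-covariance penalty pushes the two codes toward independence, and each original feature is reconstructed from the *other* modality's common code paired with its own private code. All module names, dimensions, and the exact form of the correlation loss are assumptions for illustration; the paper's adversarial and label-supervision terms are only noted in comments.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ModalityNet(nn.Module):
    """Splits one modality's original feature into common and private codes."""
    def __init__(self, in_dim, common_dim, private_dim):
        super().__init__()
        self.common_enc = nn.Sequential(nn.Linear(in_dim, common_dim), nn.Tanh())
        self.private_enc = nn.Sequential(nn.Linear(in_dim, private_dim), nn.Tanh())
        # Decoder rebuilds the original feature from a (common, private) pair.
        self.decoder = nn.Linear(common_dim + private_dim, in_dim)

    def forward(self, x):
        return self.common_enc(x), self.private_enc(x)

def correlation_loss(c, p):
    """Penalize batch-level correlation between common and private codes."""
    c = c - c.mean(dim=0)
    p = p - p.mean(dim=0)
    cov = c.t() @ p / c.size(0)   # cross-covariance matrix
    return (cov ** 2).sum()       # drive it toward zero -> independence

# Hypothetical feature dimensions for pre-extracted image/text features.
img_net = ModalityNet(in_dim=4096, common_dim=256, private_dim=64)
txt_net = ModalityNet(in_dim=300, common_dim=256, private_dim=64)

img, txt = torch.randn(32, 4096), torch.randn(32, 300)
c_i, p_i = img_net(img)
c_t, p_t = txt_net(txt)

# Reconstruct each modality from the OTHER modality's common code plus its
# own private code, forcing the common codes to carry the shared semantics.
rec_img = img_net.decoder(torch.cat([c_t, p_i], dim=1))
rec_txt = txt_net.decoder(torch.cat([c_i, p_t], dim=1))

loss = (F.mse_loss(rec_img, img) + F.mse_loss(rec_txt, txt)
        + correlation_loss(c_i, p_i) + correlation_loss(c_t, p_t))
# The full method would add a label classifier on c_i / c_t for
# discriminability and an adversarial modality discriminator; both omitted.
```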