Tags:Approximate nearest neighbor search, Billion-scale high-dimensional data, GPU and Quantization
Abstract:
Nowadays approximate nearest neighbor search (ANN) system is widely applied to the large-scale multi-media data retrieval problem. We introduce an ANN system called Vector and Product Quantization Tree(V-PQT) based on GPU, which can be applied to billion-scale high-dimensional data sets. We propose a novel two-level tree indexing structure combined with vector and product quantization that can not only decrease the number of redundant cells significantly for indexing, but also reduce the number of distance computations during the query time. Our system also provides a novel highly parallelizable encoding method for re-ranking step with lower quantization error than previous quantization method. Due to its highly parallelizable non-exhaustive search algorithm and small memory consumption, our system can be perfectly implemented on GPU. Comparing to other state-of-the-art systems based on GPU, V-PQT system can provide a significantly better recall with a same level of runtime. The experimental result shows that our system outperforms the state-of-the-art method by 27% and 20% in the term of search accuracy on two public billion-scale datesets.
Similarity Search for Billion Scale Data Sets with V-PQ Tree on the GPU