Tags: Deep learning, Document Matching and Information Retrieval
Abstract:
Plagiarism detection is a widely used technique for assessing the originality of work. In this paper, we address the problem of predicting similarities among a collection of documents. Such detection has widespread use in academic institutions. We propose a simple yet effective method for detecting plagiarism that uses a robust word detection and segmentation procedure followed by a convolutional neural network (CNN) and bidirectional Long Short-Term Memory (biLSTM) pipeline to extract the text. Our approach also extracts and encodes common patterns, such as scratches in handwriting, to improve accuracy on real-world use cases. The information extracted from multiple documents is compared using similarity metrics to identify the documents that have been plagiarized from a source. Extensive experiments show that this approach may help simplify the examination process and can act as a cheap, viable alternative to many modern approaches for detecting plagiarism in handwritten documents.
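The abstract does not name the comparison metric applied to the recognized text. As a minimal sketch of that final step only, assuming the CNN-biLSTM stage has already produced plain-text transcriptions, one common choice is Jaccard similarity over character trigrams. The function names `shingles`, `jaccard`, and `flag_plagiarized`, the trigram size, and the 0.5 threshold are all hypothetical illustrations, not the authors' method.

```python
import re


def shingles(text: str, k: int = 3) -> set[str]:
    """Return the set of character k-grams of a normalized transcription."""
    # Lowercase and collapse non-alphanumeric runs to single spaces so that
    # punctuation and spacing noise from recognition does not dominate.
    norm = re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()
    if len(norm) < k:
        return {norm} if norm else set()
    return {norm[i:i + k] for i in range(len(norm) - k + 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity |A & B| / |A | B|; defined as 0.0 for two empty sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def flag_plagiarized(
    source: str, candidates: dict[str, str], threshold: float = 0.5
) -> list[tuple[str, float]]:
    """Return (doc_id, score) for candidates whose similarity to the source
    meets the (assumed) threshold, highest score first."""
    src = shingles(source)
    scores = [(doc_id, jaccard(src, shingles(text))) for doc_id, text in candidates.items()]
    flagged = [(doc_id, s) for doc_id, s in scores if s >= threshold]
    return sorted(flagged, key=lambda pair: pair[1], reverse=True)


# Usage with made-up transcriptions standing in for recognizer output.
source_text = "Neural networks learn representations from data."
submissions = {
    "student_a": "Neural networks learn representations from the data.",
    "student_b": "Photosynthesis converts light energy into chemical energy.",
}
for doc_id, score in flag_plagiarized(source_text, submissions):
    print(doc_id, round(score, 2))
```

Character n-grams are used here rather than whole words so that a few misrecognized characters lower the score only slightly instead of discarding entire word matches; a real system would tune `k` and the threshold on labeled data.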
A Robust Approach to Plagiarism Detection in Handwritten Documents