Feedback Learning: Automating the Process of Correcting and Completing the Extracted Information

EasyChair Preprint no. 992

6 pages. Date: May 12, 2019

Abstract

In recent years, with the increasing use of digital media and advances in deep learning architectures, most paper-based documents have been converted into digital versions. These advances have helped state-of-the-art Optical Character Recognition (OCR) and digital mailroom technologies become progressively more efficient. Commercially, there already exist end-to-end systems that use OCR and digital mailroom technologies to extract relevant information from financial documents such as invoices. However, there is plenty of room for improvement in automating the correction of errors that remain after information extraction. This paper describes a user-involved self-correction concept, based on sequence-to-sequence Neural Machine Translation (NMT), applied to rectify errors in the results of information extraction. Even though many efficient post-OCR error rectification methods have been introduced in the recent past to improve the quality of digitized documents, they are still imperfect and need improvement in context-based error correction, specifically for documents involving sensitive information. This paper further illustrates that sequence learning, with the help of feedback provided during each training cycle, yields relatively better results and outperforms state-of-the-art OCR error correction methods.

Keyphrases: document understanding, Post IE Error Correction and Completeness, Sequence to Sequence Neural Machine Translation (NMT)

BibTeX entry
BibTeX does not have the right entry for preprints. This is a hack for producing the correct reference:
@Booklet{EasyChair:992,
  author = {Rakshith Bymana Ponnappa and Khurram Azeem Hashmi and Syed Saqib Bukhari and Andreas Dengel},
  title = {Feedback Learning: Automating the Process of Correcting and Completing the Extracted Information},
  howpublished = {EasyChair Preprint no. 992},
  doi = {10.29007/jl5f},
  year = {EasyChair, 2019}}