Paper documents are routinely encountered in general litigation and in criminal and terrorist investigations. The current state of the art in processing these documents is simply to run OCR on them and search only the extracted text. This ignores all handwriting, signatures, logos, images, watermarks, and any other non-text artifacts in a document. Technology exists, however, to extract key metadata such as logos and signatures from paper documents and match them against a set of known logos and signatures. We describe a prototype that moves beyond OCR-only processing of paper documents and uses these additional document artifacts, rather than text alone, in the search process. We also describe a benchmark developed for the evaluation of paper document search systems.