In this paper, an effective error checking and correction method for Chinese medical records recognized by OCR is proposed. In our research, an optimized N-gram language model based on vocabulary rather than words is adopted to correct errors, and supervised machine learning based on maximum entropy (MaxEnt) is deployed to build a model for tokenization and named entity recognition. A medical knowledge base (MKB) is established, comprising dictionaries of medicines, symptoms, diseases, etc., together with the frequency of each word in the study corpus. Furthermore, a Knowledge Base for Error correction (KBE) is built to automatically correct high-frequency errors. With the developed approach, the accuracy of the electronic medical records increases from 85.20% to 95.72%, corresponding to an error reduction of 71.08%.
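The reported error reduction follows from the two accuracy figures: it is the share of the original errors (100% minus the baseline accuracy) that the method removes. The short sketch below reproduces that arithmetic; the function name `relative_error_reduction` is a hypothetical helper introduced here for illustration and is not part of the described system.

```python
def relative_error_reduction(acc_before: float, acc_after: float) -> float:
    """Percentage of the original errors removed, given accuracies in percent.

    Hypothetical helper for illustration only.
    """
    err_before = 100.0 - acc_before  # error rate before correction
    err_after = 100.0 - acc_after    # error rate after correction
    if err_before <= 0:
        raise ValueError("baseline accuracy must be below 100%")
    return 100.0 * (err_before - err_after) / err_before


# Accuracy figures taken from the abstract.
reduction = relative_error_reduction(85.20, 95.72)
print(f"{reduction:.2f}%")  # prints "71.08%"
```

In other words, the error rate falls from 14.80% to 4.28%, and the 10.52 percentage points removed amount to about 71.08% of the original 14.80%.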