Volume 12, no. 3, pages 74-88

A Method to Reduce Errors of String Recognition Based on Combination of Several Recognition Results with Per-Character Alternatives

K.B. Bulatov

We consider the problem of recognizing a string object that appears in several frames of a video stream. To maximize the accuracy of the output, we combine the recognition results obtained from the individual frames. To this end, we introduce a model of a string recognition result that takes into account the estimates of alternative per-character classification results, and we propose an algorithm that combines string recognition results according to this model. The algorithm was evaluated on the MIDV-500 dataset of document images. The experimental results show that the proposed algorithm achieves high recognition accuracy by analyzing several images, and that using the estimates of alternative per-character classification results yields higher accuracy than combining strings that contain only the final alternative for each character.
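
The idea of combining per-frame results that keep per-character alternatives can be illustrated with a minimal sketch. It is not the algorithm of the paper, only one possible reading of the abstract under stated assumptions: each recognized character is represented as a dictionary of alternatives with estimates that sum to 1, an empty character marks a missing position, two results are aligned by dynamic programming with a cell distance equal to half the L1 difference of the estimates, and aligned cells are merged by a running average. All names (`combine`, `cell_distance`, `merge_cells`, `best_string`, `EMPTY`) are hypothetical.

```python
EMPTY = ""            # hypothetical marker for "no character at this position"
GAP = {EMPTY: 1.0}    # a cell that is certainly empty


def cell_distance(a, b):
    """Half the L1 distance between two cells of alternatives (range 0..1)."""
    keys = set(a) | set(b)
    return 0.5 * sum(abs(a.get(k, 0.0) - b.get(k, 0.0)) for k in keys)


def merge_cells(a, b, weight_a):
    """Running average: cell `a` summarizes `weight_a` frames, `b` one frame."""
    keys = set(a) | set(b)
    return {k: (weight_a * a.get(k, 0.0) + b.get(k, 0.0)) / (weight_a + 1)
            for k in keys}


def combine(acc, new, weight_acc):
    """Align the accumulated result with a new frame result and merge them."""
    n, m = len(acc), len(new)
    cost = [[0.0] * (m + 1) for _ in range(n + 1)]
    move = [[None] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            options = []
            if i > 0 and j > 0:   # pair two cells
                d = cell_distance(acc[i - 1], new[j - 1])
                options.append((cost[i - 1][j - 1] + d, "pair"))
            if i > 0:             # accumulated cell has no counterpart
                d = cell_distance(acc[i - 1], GAP)
                options.append((cost[i - 1][j] + d, "skip_new"))
            if j > 0:             # new cell has no counterpart
                d = cell_distance(GAP, new[j - 1])
                options.append((cost[i][j - 1] + d, "skip_acc"))
            cost[i][j], move[i][j] = min(options)
    # Trace the cheapest alignment back and merge the aligned cells.
    merged, i, j = [], n, m
    while i > 0 or j > 0:
        if move[i][j] == "pair":
            merged.append(merge_cells(acc[i - 1], new[j - 1], weight_acc))
            i, j = i - 1, j - 1
        elif move[i][j] == "skip_new":
            merged.append(merge_cells(acc[i - 1], GAP, weight_acc))
            i -= 1
        else:
            merged.append(merge_cells(GAP, new[j - 1], weight_acc))
            j -= 1
    merged.reverse()
    return merged


def best_string(result):
    """Take the top alternative of every cell; empty cells vanish."""
    return "".join(max(cell, key=cell.get) for cell in result)


# Illustrative (made-up) results for three frames of the same field.
frames = [
    [{"A": 0.9, "4": 0.1}, {"8": 0.6, "B": 0.4}, {"C": 1.0}],
    [{"A": 0.8, "4": 0.2}, {"B": 0.7, "8": 0.3}, {"C": 0.9, "G": 0.1}],
    [{"A": 1.0}, {"8": 0.55, "B": 0.45}, {"C": 1.0}],
]
combined = frames[0]
for count, frame in enumerate(frames[1:], start=1):
    combined = combine(combined, frame, weight_acc=count)
print(best_string(combined))  # ABC
```

In this made-up example the top alternatives of the three frames read "A8C", "ABC" and "A8C", so a vote over final alternatives alone would keep "8" in the second position. Averaging the estimates of the alternatives gives "B" a higher total than "8", which is the kind of effect the abstract attributes to using per-character alternatives.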

Keywords: recognition in video stream; mobile OCR; recognition algorithms.