Volume 14, no. 4, pages 5-23

Evolution of the Viola-Jones Object Detection Method: a Survey

V.V. Arlazarov, Ju.S. Voysyat, D.P. Matalov, D.P. Nikolaev, S.A. Usilin
The Viola-Jones algorithm is one of the best-known methods for detecting objects in digital images. In the 20 years since its first publication, the method has been studied extensively, and researchers and engineers have proposed many modifications of the original algorithm and of its individual parts. Some ideas popularized by Paul Viola and Michael Jones became the basis for many other algorithms for object localization in images. This paper describes the Viola-Jones algorithm, traces the history of its development and modifications in the context of various object localization problems, and reviews the current state of affairs: the method's place in an era when convolutional neural networks are widely applied.
Keywords
Viola-Jones algorithm; pattern recognition; machine learning; object classification; object localization; object detection.
References
1. Henderson C. Driving Crime Down: Denying Criminals the Use of the Road. Available at: https://popcenter.asu.edu/sites/default/files/Henderson.pdf (accessed 21 July 2021)
2. China's Watchful Eye (2021). Available at: https://www.washingtonpost.com/news/world/wp/2018/01/07/feature/in-china-facial-recognition-is-sharp-end-of-a-drive-for-total-surveillance/ (accessed 21 July 2021)
3. Du S. et al. Automatic License Plate Recognition: A State-of-the-Art Review. IEEE Transactions on Circuits and Systems for Video Technology, 2012, vol. 23, no. 2, pp. 311-325.
4. Law Enforcement's Use of Facial Recognition Technology. Available at: https://www.fbi.gov/news/testimony/law-enforcements-use-of-facial-recognition-technology (accessed 21 July 2021)
5. Sung K.K., Poggio T. Example-Based Learning for View-Based Human Face Detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1998, vol. 20, no. 1, pp. 39-51.
6. Schneiderman H., Kanade T. A Statistical Method for 3D Object Detection Applied to Faces and Cars. Proceedings IEEE Conference on Computer Vision and Pattern Recognition, 2000, no. 1, pp. 746-751.
7. Viola P., Jones M. Robust Real-Time Object Detection. International Journal of Computer Vision, 2001, no. 4, pp. 34-47.
8. Freund Y., Schapire R.E. A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting. Journal of Computer and System Sciences, 1997, vol. 55, no. 1, pp. 119-139.
9. Papageorgiou C. P., Oren M., Poggio T. A General Framework for Object Detection. Sixth International Conference on Computer Vision, 1998, pp. 555-562.
10. Lienhart R., Maydt J. An Extended Set of Haar-Like Features for Rapid Object Detection. Proceedings of International Conference on Image Processing, 2002, vol. 1, pp. I-I.
11. Huang C.C., Tsai C.Y., Yang H.C. An Extended Set of Haar-Like Features for Bird Detection Based on AdaBoost. International Conference on Signal Processing, Image Processing, and Pattern Recognition, 2011, pp. 160-169.
12. Wen X. et al. A Rapid Learning Algorithm for Vehicle Classification. Information Sciences, 2015, no. 295, pp. 395-406.
13. Gaszczak A., Breckon T. P., Han J. Real-Time People and Vehicle Detection from UAV imagery. Intelligent Robots and Computer Vision XXVIII: Algorithms and Techniques, 2011, no. 7878, pp. 78780B.
14. Li S.Z. et al. Statistical Learning of Multi-View Face Detection. European Conference on Computer Vision, 2002, pp. 67-81.
15. Jones M., Viola P. Fast Multi-View Face Detection. Mitsubishi Electric Research Lab TR-20003-96, 2003, vol. 3, no. 14, pp. 2.
16. Viola P., Jones M. J., Snow D. Detecting Pedestrians Using Patterns of Motion and Appearance. International Journal of Computer Vision, 2005, vol. 63, no. 2, pp. 153-161.
17. Messom C., Barczak A. Fast and Efficient Rotated Haar-Like Features Using Rotated Integral Images. Australian Conference on Robotics and Automation, 2006, pp. 1-6.
18. Ramirez G. A., Fuentes O. Multi-Pose Face Detection with Asymmetric Haar Features. 2008 IEEE Workshop on Applications of Computer Vision, 2008, pp. 1-6.
19. Pavani S. K., Delgado D., Frangi A. F. Haar-Like Features with Optimally Weighted Rectangles for Rapid Object Detection. Pattern Recognition, 2010, vol. 43, no. 1, pp. 160-172.
20. Pham M. T. et al. Fast Polygonal Integration and Its Application in Extending Haar-Like Features to Improve Object Detection. 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2010, pp. 942-949.
21. Ojala T., Pietikainen M., Harwood D. A Comparative Study of Texture Measures with Classification Based on Featured Distributions. Pattern Recognition, 1996, vol. 29, no. 1, pp. 51-59.
22. Zhang L. et al. Face Detection Based on Multi-Block LBP Representation. International Conference on Biometrics, 2007, pp. 11-18.
23. Nikisins O., Greitans M. Local Binary Patterns and Neural Network Based Technique for Robust Face Detection and Localization. 2012 BIOSIG-Proceedings of the International Conference of Biometrics Special Interest Group (BIOSIG), 2012, pp. 1-6.
24. Suri P.K., Verma E.A. Robust Face Detection Using Circular Multi Block Local Binary Pattern and Integral Haar Features. International Journal of Advanced Computer Science and Applications, Special Issue on Artificial Intelligence, 2011, pp. 67-71.
25. Jammoussi A. Y., Masmoudi D. S. Joint Integral Histogram Based Adaboost for Face Detection System. International Journal of Computer Applications, 2011, vol. 23, no. 5.
26. Lowe D. G. Object Recognition from Local Scale-Invariant Features. Proceedings of the Seventh IEEE International Conference on Computer Vision, 1999, no. 2, pp. 1150-1157.
27. Dalal N., Triggs B. Histograms of Oriented Gradients for Human Detection. 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2005, no. 1, pp. 886-893.
28. Zhang W., Sun J., Tang X. Cat Head Detection-How to Effectively Exploit Shape and Texture Features. European Conference on Computer Vision, 2008, pp. 802-816.
29. Dollar P. et al. Integral Channel Features. Proceedings of the British Machine Vision Conference, 2009, pp. 91.1-91.11.
30. Dollar P., Belongie S., Perona P. The Fastest Pedestrian Detector in the West. Proceedings of the British Machine Vision Conference, 2010, pp. 68.1-68.11.
31. Bourdev L., Brandt J. Robust Object Detection Via Soft Cascade. 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), 2005, no. 2, pp. 236-243.
32. Xiao R., Zhu L., Zhang H. J. Boosting Chain Learning for Object Detection. Proceedings Ninth IEEE International Conference on Computer Vision, 2003, pp. 709-715.
33. Wu B. et al. Fast Rotation Invariant Multi-View Face Detection Based on Real AdaBoost. Sixth IEEE International Conference on Automatic Face and Gesture Recognition, 2004, pp. 79-84.
34. Zhang C., Viola P. Multiple-Instance Pruning for Learning Efficient Cascade Detectors. Advances in Neural Information Processing Systems, 2007, no. 20, pp. 1681-1688.
35. Friedman J., Hastie T., Tibshirani R. Additive Logistic Regression: a Statistical View of Boosting (with Discussion and a Rejoinder by the authors). The Annals of Statistics, 2000, vol. 28, no. 2, pp. 337-407.
36. Minkina A. et al. Generalization of the Viola-Jones Method as a Decision Tree of Strong Classifiers for Real-Time Object Recognition in Video Stream. Seventh International Conference on Machine Vision, 2015, no. 9445, pp. 944517. DOI: 10.1117/12.2180941
37. Mason L. et al. Boosting Algorithms as Gradient Descent in Function Space. Advances in Neural Information Processing Systems, 1999, no. 12, pp. 512-518.
38. Friedman J.H. Greedy Function Approximation: a Gradient Boosting Machine. The Annals of Statistics, 2001, pp. 1189-1232.
39. Schapire R.E., Singer Y. Improved Boosting Algorithms Using Confidence-Rated Predictions. Machine Learning, 1999, vol. 37, no. 3, pp. 297-336.
40. Huang C. et al. Vector Boosting for Rotation Invariant Multi-View Face Detection. Tenth IEEE International Conference on Computer Vision, 2005, no. 1, pp. 446-453.
41. Duan S., Wang X., Wan W. The Logitboost Based on Joint Feature for Face Detection. 2013 Seventh International Conference on Image and Graphics, 2013, pp. 483-488.
42. Gualdi G., Prati A., Cucchiara R. Multi-Stage Sampling with Boosting Cascades for Pedestrian Detection in Images and Videos. European Conference on Computer Vision, 2010, pp. 196-209.
43. Wang L., Zhang Z. Automatic Detection of Wind Turbine Blade Surface Cracks based on UAV-Taken Images. IEEE Transactions on Industrial Electronics, 2017, vol. 64, no. 9, pp. 7293-7303.
44. Poljakov I.V. et al. [Training Optimal Viola-Jones Detectors using Greedy Algorithms for Selecting Control Parameters with Intermediate Validation on Each Level]. Sensory Systems, 2016, vol. 30, no. 3, pp. 241-248. (in Russian)
45. Li H. et al. A Convolutional Neural Network Cascade for Face Detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 5325-5334.
46. Owusu E., Abdulai J. D., Zhan Y. Face Detection Based on Multilayer Feed-Forward Neural Network and Haar Features. Software: Practice and Experience, 2019, vol. 49, no. 1, pp. 120-129.
47. Cai Z., Vasconcelos N. Cascade R-CNN: Delving into High Quality Object Detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 6154-6162.
48. Krizhevsky A., Sutskever I., Hinton G. E. ImageNet Classification with Deep Convolutional Neural Networks. Advances in Neural Information Processing Systems, 2012, no. 25, pp. 1097-1105.
49. Everingham M. et al. The Pascal Visual Object Classes (VOC) Challenge. International Journal of Computer Vision, 2010, vol. 88, no. 2, pp. 303-338.
50. Girshick R. et al. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 580-587.
51. Redmon J. et al. You Only Look Once: Unified, Real-Time Object Detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 779-788.
52. Liu W. et al. SSD: Single Shot Multibox Detector. European Conference on Computer Vision, 2016, pp. 21-37.
53. Lin T. Y. et al. Feature Pyramid Networks for Object Detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 2117-2125.
54. Jiao L. et al. A Survey of Deep Learning-Based Object Detection. IEEE Access, 2019, vol. 7, pp. 128837-128868.
55. Zou Z. et al. Object Detection in 20 Years: A Survey. Available at: https://arxiv.org/abs/1905.05055 (accessed 21 July 2021)
56. Granger E. et al. A Comparison of CNN-Based Face and Head Detectors for Real-Time Video Surveillance Applications. 2017 Seventh International Conference on Image Processing Theory, Tools and Applications, 2017, pp. 1-7.
57. Jain V., Learned-Miller E. FDDB: A Benchmark for Face Detection in Unconstrained Settings. UMass Amherst technical report, 2010, vol. 2, no. 4, pp. 5.
58. Yan J. et al. Real-Time High Performance Deformable Model for Face Detection in the Wild. 2013 International Conference on Biometrics, 2013, pp. 1-6.
59. Felzenszwalb P., McAllester D., Ramanan D. A Discriminatively Trained, Multiscale, Deformable Part Model. 2008 IEEE Conference on Computer Vision and Pattern Recognition, 2008, pp. 1-8.
60. Usilin S. A., Slavin O. A., Arlazarov V. V. Memory Consumption and Computation Efficiency Improvements of Viola-Jones Object Detection Method for Remote Sensing Applications. Pattern Recognition and Image Analysis, 2021, vol. 31, no. 3, pp. 571-579. DOI: 10.1007/978-3-030-68821-9_23
61. Xu Y. et al. A Hybrid Vehicle Detection Method Based on Viola-Jones and HOG+SVM from UAV Images. Sensors, 2016, vol. 16, no. 8, p. 1325.
62. Irgens P. et al. An Efficient and Cost Effective FPGA Based Implementation of the Viola-Jones Face Detection Algorithm. HardwareX, 2017, no. 1, pp. 68-75.
63. Skoryukina N., Arlazarov V., Nikolaev D. Fast Method of ID Documents Location and Type Identification for Mobile and Server Application. 2019 International Conference on Document Analysis and Recognition, 2019, pp. 850-857. DOI: 10.1109/ICDAR.2019.00141
64. Usilin S. et al. Visual Appearance Based Document Image Classification. 2010 IEEE International Conference on Image Processing, 2010, pp. 2133-2136.
65. Tropin D. V. et al. Localization of Planar Objects on the Images with Complex Structure of Projective Distortion. Informatsionnye Protsessy, 2019, vol. 19, no. 2, pp. 208-229. (in Russian)
66. Matalov D. P., Usilin S. A., Arlazarov V. V. Modification of the Viola-Jones Approach for the Detection of the Government Seal Stamp of the Russian Federation. Eleventh International Conference on Machine Vision, 2019, vol. 11041, p. 110411Y. DOI: 10.1117/12.2522793
67. Polevoy D. et al. Key Aspects of Document Recognition Using Small Digital Cameras. RFBR Journal, 2016, vol. 4, no. 92, pp. 98-105. DOI: 10.22204/2410-4639-2016-092-04-97-108 (in Russian)
68. Limonova E. E. et al. [Recognition System Efficiency Evaluation on VLIW Architecture on the Example of Elbrus Platform]. Programming and Computer Software, 2019, vol. 45, no. 1, pp. 15-21. DOI: 10.1134/S0132347419010047 (in Russian)