Volume 18, no. 2, pp. 102-111

The Impact of Dataset Size on the Reliability of Model Testing and Ranking
A.V. Chuiko, V.V. Arlazarov, S.A. Usilin

Machine learning is widely applied across diverse domains, with research teams continually developing new recognition models that compete on open datasets. In some tasks, accuracy surpasses 99%, so competing models are separated by mere fractions of a percentage point. These minimal differences, combined with the varying sizes of the benchmark datasets, raise questions about the reliability of model evaluation and ranking. This paper introduces a method for determining the dataset size necessary to ensure robust hypothesis testing of model performance. It also examines the statistical significance of accuracy rankings in recent studies on the MNIST, CIFAR-10, and CIFAR-100 datasets.
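To illustrate the kind of calculation involved, the sketch below estimates how large a test set must be for an observed accuracy gap between two models to reach statistical significance. This is a minimal illustration only, not the paper's exact method: it assumes a two-sided two-proportion z-test with the normal approximation to the binomial, and the function name `required_test_size` and the default significance level and power are illustrative choices.

```python
# A minimal sketch (NOT the paper's exact method): estimating the test-set
# size needed for a two-sided two-proportion z-test to separate two
# classifiers whose accuracies differ by a small margin.
# Assumes the normal approximation to the binomial; alpha and power
# defaults are illustrative choices.
from scipy.stats import norm

def required_test_size(acc_a: float, acc_b: float,
                       alpha: float = 0.05, power: float = 0.8) -> int:
    """Samples per model needed to detect the gap acc_a - acc_b."""
    z_alpha = norm.ppf(1 - alpha / 2)  # critical value for the significance level
    z_beta = norm.ppf(power)           # critical value for the desired power
    variance = acc_a * (1 - acc_a) + acc_b * (1 - acc_b)
    n = (z_alpha + z_beta) ** 2 * variance / (acc_a - acc_b) ** 2
    return int(n) + 1

# Separating 99.0% from 99.2% accuracy requires roughly 35,000 test samples.
print(required_test_size(0.990, 0.992))
```

Under these assumptions, a gap between 99.0% and 99.2% accuracy cannot be resolved by a test set the size of MNIST's (10,000 images), which is the kind of concern the paper formalizes.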
- Keywords
- dataset size; object recognition; statistical significance; model evaluation; recognition quality assessment.
- References
1. Arlazarov V.L., Slavin O.A. Issues of Recognition and Verification of Text Documents. Intelligent Systems and Technologies, 2023, no. 3, pp. 55-61. DOI: 10.14357/20718632230306
2. Kunina I.A., Sher A.V., Nikolaev D.P. Screen Recapture Detection Based on Color-Texture Analysis of Document Boundary Regions. Computer Optics, 2023, vol. 47, no. 4, pp. 650-657. DOI: 10.18287/2412-6179-CO-1237
3. Gayer A.V. Context-Independent Fast Text Detection Method for Recognizing Phone Numbers. Proceedings of ISA RAS, 2024, vol. 74, no. 3, pp. 39-47. DOI: 10.14357/20790279240305
4. Maksimova T.R., Bulatov K.B. Reducing Errors and Computational Load in Road Scene Text Recognition. Intelligent Systems and Technologies, 2024, no. 3, pp. 1-15. DOI: 10.14357/20718632240301
5. Deng Li. The MNIST Database of Handwritten Digit Images for Machine Learning Research [Best of the Web]. IEEE Signal Processing Magazine, 2012, vol. 29, no. 6, pp. 141-142. DOI: 10.1109/MSP.2012.2211477
6. Krizhevsky A., Hinton G. Learning Multiple Layers of Features From Tiny Images. Toronto, University of Toronto, 2009.
7. Kowsari K., Heidarysafa M., Brown D.E., Meimandi K.J., Barnes L.E. RMDL: Random Multimodel Deep Learning for Classification. Proceedings of the 2nd International Conference on Information System and Data Mining, New York, 2018, pp. 19-28. DOI: 10.1145/3206098.3206111
8. Ciregan D., Meier U., Schmidhuber J. Multi-Column Deep Neural Networks for Image Classification. IEEE Conference on Computer Vision and Pattern Recognition, Providence, 2012, pp. 3642-3649. DOI: 10.1109/CVPR.2012.6248110
9. Romanuke V. Training Data Expansion and Boosting of Convolutional Neural Networks for Reducing the MNIST Dataset Error Rate. Research Bulletin of the National Technical University of Ukraine "Kyiv Polytechnic Institute", 2016, no. 6, pp. 29-34. DOI: 10.20535/1810-0546.2016.6.84115
10. Gesmundo A., Dean J. An Evolutionary Approach to Dynamic Introduction of Tasks in Large-Scale Multitask Learning Systems. arXiv: Machine Learning, 2022. Available at: https://arxiv.org/abs/2205.12755. DOI: 10.48550/arXiv.2205.12755
11. Byerly A., Kalganova T., Dear I. No Routing Needed Between Capsules. Neurocomputing, 2021, vol. 463, pp. 545-553. DOI: 10.1016/j.neucom.2021.08.064
12. Hirata D., Takahashi N. Ensemble Learning in CNN Augmented with Fully Connected Subnetworks. IEICE Transactions on Information and Systems, 2023, vol. 106, no. 7, pp. 1258-1261. DOI: 10.1587/transinf.2022EDL8098
13. Bruno A., Moroni D., Martinelli M. Efficient Adaptive Ensembling for Image Classification. arXiv: Computer Vision and Pattern Recognition, 2022. Available at: https://arxiv.org/abs/2206.07394. DOI: 10.48550/arXiv.2206.07394
14. Foret P., Kleiner A., Mobahi H., Neyshabur B. Sharpness-Aware Minimization for Efficiently Improving Generalization. arXiv: Machine Learning, 2020. Available at: https://arxiv.org/abs/2010.01412. DOI: 10.48550/arXiv.2010.01412
15. Gehring J., Auli M., Grangier D., Yarats D., Dauphin Y.N. Convolutional Sequence to Sequence Learning. Proceedings of the 34th International Conference on Machine Learning, 2017, pp. 1243-1252. DOI: 10.48550/arXiv.1705.03122
16. Dosovitskiy A., et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. arXiv: Computer Vision and Pattern Recognition, 2020. Available at: https://arxiv.org/abs/2010.11929. DOI: 10.48550/arXiv.2010.11929
17. Oquab M., Darcet T., Moutakanni T., Huy Vo, et al. DINOv2: Learning Robust Visual Features without Supervision. arXiv: Computer Vision and Pattern Recognition, 2023. Available at: https://arxiv.org/abs/2304.07193. DOI: 10.48550/arXiv.2304.07193
18. Kabir H.M. Reduction of Class Activation Uncertainty with Background Information. arXiv: Computer Vision and Pattern Recognition, 2023. Available at: https://arxiv.org/abs/2305.03238. DOI: 10.48550/arXiv.2305.03238
19. Zhichao Lu, Sreekumar G., Goodman E., Banzhaf W., Deb K., Boddeti V.N. Neural Architecture Transfer. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021, vol. 43, no. 9, pp. 2971-2989. DOI: 10.1109/TPAMI.2021.3052758
20. Ridnik T., Sharir G., Ben-Cohen A., Ben-Baruch E., Noy A. ML-Decoder: Scalable and Versatile Classification Head. Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2023, pp. 32-41. DOI: 10.1109/WACV56688.2023.00012
21. Ridnik T., Ben-Baruch E., Noy A., Zelnik-Manor L. ImageNet-21K Pretraining for the Masses. arXiv: Computer Vision and Pattern Recognition, 2021. Available at: https://arxiv.org/abs/2104.10972. DOI: 10.48550/arXiv.2104.10972
22. Haiping Wu, Bin Xiao, Codella N., Mengchen Liu, Xiyang Dai, Lu Yuan, Lei Zhang. CvT: Introducing Convolutions to Vision Transformers. Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 22-31. DOI: 10.1109/ICCV48922.2021.00009
23. Ching-Hsun Tseng, Liu-Hsueh Cheng, Shin-Jye Lee, Xiaojun Zeng. Perturbed Gradients Updating Within Unit Space for Deep Learning. IEEE International Joint Conference on Neural Networks, 2022, pp. 1-8. DOI: 10.1109/IJCNN55064.2022.9892245