Volume 19, no. 1, pages 77-82

Peculiarity of Dynamic Memory Allocation Using the OpenACC Standard

N.M. Kuzmin, A.V. Khoperskov

We discuss a peculiarity of mapping central processing unit (CPU) memory to graphics processing unit (GPU) memory when arrays are allocated dynamically and transferred with OpenACC directives. The problem arises when a dynamic array is not placed contiguously in memory. In that case, the program maps every byte of host RAM between the first and last elements of the array to GPU memory, regardless of whether those bytes are actually occupied by array elements. This leads to an unjustified and unpredictable increase in the GPU memory allocated to store the array. Our simulations show that the increase can reach two orders of magnitude. We give C source code that dynamically allocates a contiguous block of memory for two-dimensional arrays; the approach generalizes easily to an arbitrary number of dimensions. Tests of this method show that the dynamic arrays occupy the same amount of memory on the CPU and on the GPU.
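The contiguous-allocation idea described in the abstract can be sketched in C as follows. This is a minimal illustration, not the authors' published code: the function names `alloc2d_contiguous`, `free2d_contiguous`, `span_bytes`, and `sum2d` are hypothetical, and the element type `double` is an assumption. All elements live in one block, a separate table of row pointers preserves `a[i][j]` indexing, and the OpenACC directive is guarded by the standard `_OPENACC` macro so the code also compiles with a compiler that does not support OpenACC.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Allocate a rows x cols array of double as ONE contiguous data block
 * plus a table of row pointers into that block (hypothetical helper). */
double **alloc2d_contiguous(size_t rows, size_t cols)
{
    /* Reject empty shapes and sizes whose byte count would overflow. */
    if (rows == 0 || cols == 0 || cols > SIZE_MAX / sizeof(double) / rows)
        return NULL;

    double **a = malloc(rows * sizeof *a);
    double *data = malloc(rows * cols * sizeof *data);
    if (a == NULL || data == NULL) {
        free(a);
        free(data);
        return NULL;
    }
    for (size_t i = 0; i < rows; ++i)
        a[i] = data + i * cols;   /* row i starts right after row i-1 */
    return a;
}

/* Release an array created by alloc2d_contiguous. */
void free2d_contiguous(double **a)
{
    if (a != NULL) {
        free(a[0]);   /* the single data block */
        free(a);      /* the row-pointer table */
    }
}

/* Number of bytes between the start of the lowest-addressed row and the
 * end of the highest-addressed row: the host range that would have to be
 * mapped to cover every element. */
size_t span_bytes(double **a, size_t rows, size_t cols)
{
    assert(a != NULL && rows > 0);
    uintptr_t lo = (uintptr_t)a[0], hi = lo;
    for (size_t i = 1; i < rows; ++i) {
        uintptr_t p = (uintptr_t)a[i];
        if (p < lo) lo = p;
        if (p > hi) hi = p;
    }
    return (size_t)(hi - lo) + cols * sizeof(double);
}

/* Sum all elements; with an OpenACC compiler the flat block data[0:n]
 * is copied to the device as exactly n elements. */
double sum2d(double **a, size_t rows, size_t cols)
{
    double *data = a[0];
    size_t n = rows * cols;
    double s = 0.0;
#ifdef _OPENACC
#pragma acc parallel loop reduction(+:s) copyin(data[0:n])
#endif
    for (size_t i = 0; i < n; ++i)
        s += data[i];
    return s;
}
```

For an array created this way, `span_bytes(a, rows, cols)` equals `rows * cols * sizeof(double)`, so the mapped host range contains nothing but array elements. If each row were instead obtained from its own `malloc` call, the same helper could report a much larger span, which is the situation the abstract describes.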

Keywords: dynamic arrays; memory allocation; OpenACC standard; parallel computing.