Volume 10, no. 1, pages 113-124

Baltic Sea Water Dynamics Model Acceleration

A.P. Bagliy, A.V. Boukhanovsky, B.Ya. Steinberg, R.B. Steinberg
This paper describes the optimization and parallelization of an industrial program for modelling Baltic Sea water dynamics. The program numerically solves the shallow-water system of partial differential equations. A mechanical approach to program modernization is demonstrated: a module dependency graph is built, and every module is rewritten in a specific order.
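As an illustration of ordering modules by a dependency graph, the following minimal sketch computes one possible rewriting order in which every module comes after the modules it depends on. The module names and the dependencies-first ordering are assumptions made for the example; the abstract does not state which order was actually used.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical module names (not taken from the paper).
# Each module maps to the set of modules it depends on.
DEPENDENCIES = {
    "main": {"solver", "io"},
    "solver": {"grid", "boundary"},
    "boundary": {"grid"},
    "io": {"grid"},
    "grid": set(),
}


def rewrite_order(dependencies):
    """Return the modules so that each one appears after its dependencies."""
    # static_order() raises graphlib.CycleError if the graph has a cycle.
    return list(TopologicalSorter(dependencies).static_order())


order = rewrite_order(DEPENDENCIES)
print(order)
```

With this ordering, a module is rewritten only after everything it calls has already been rewritten, so each step can be checked against the already converted modules.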
To achieve the desired speed-up, the program is translated into another language and several key optimization methods are applied, including parallelization of the most time-consuming loop nests. The theory of optimizing and parallelizing program transformations is used to obtain the best performance gain for a given amount of work. The applied program transformations are listed together with the speed-up achieved for the most time-consuming subroutines. Speed-up results for the entire program on a shared-memory computer system are presented.
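The idea of parallelizing a loop nest can be sketched as follows. The four-point averaging stencil is a generic stand-in chosen for the example, not the model's shallow-water scheme, and all function names are hypothetical. Each row of the new grid is computed only from the old grid, so the outer-loop iterations are independent and can be distributed across workers.

```python
from concurrent.futures import ThreadPoolExecutor


def smooth_row(old, i):
    """Compute interior points of row i of the new grid from the old grid."""
    row = list(old[i])
    for j in range(1, len(row) - 1):
        row[j] = 0.25 * (old[i - 1][j] + old[i + 1][j]
                         + old[i][j - 1] + old[i][j + 1])
    return row


def smooth_serial(old):
    """Reference version: plain nested loops over the interior points."""
    new = [list(r) for r in old]
    for i in range(1, len(old) - 1):
        new[i] = smooth_row(old, i)
    return new


def smooth_parallel(old, workers=4):
    """Same sweep, with the outer-loop iterations handed to a thread pool."""
    new = [list(r) for r in old]
    interior = range(1, len(old) - 1)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Rows read only from `old`, so no iteration depends on another.
        rows = pool.map(lambda i: smooth_row(old, i), interior)
        for i, row in zip(interior, rows):
            new[i] = row
    return new


grid = [[float(i * j) for j in range(6)] for i in range(5)]
```

In standard CPython, threads do not speed up pure-Python arithmetic because of the global interpreter lock, so this sketch shows only the dependence structure that makes the transformation legal. In a compiled language the same outer loop would typically be distributed across the cores of a shared-memory system.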
Keywords: program transformation; program optimization; program parallelization.