Load balancing issues uncovered in a particle tracking application
The Centre for Environment, Fisheries and Aquaculture Science (Cefas) is a world leader in marine science and technology, providing innovative solutions for the aquatic environment, biodiversity and food security. The centre uses behaviour and transport models to address a range of marine management questions.
Cefas uses an off-line particle tracking model, the Individual Behaviour Model (IBM) GITM (General Individuals Transport Model) code, which requires velocity fields from a hydrodynamic model. It includes physical particle advection and diffusion, as well as biological development and behaviour. The code is written in Fortran 90; it was originally sequential and was later parallelized with OpenMP.
The objective of this work was to investigate how to improve the parallel performance of the GITM application using OpenMP, and to quantify the effort required to run it on distributed-memory HPC systems using MPI.
Audit – The GITM audit took two months. A set of performance metrics relating to computational scaling and load balance was used to assess performance and identify limiting issues. The audit found that the application was not yet gaining the full benefit of multiple threads. The underlying cause was load imbalance, resulting from inefficient cache usage and from array alignments that were unsuited to vectorization. Because the code was parallelized with OpenMP, I/O through NetCDF was performed sequentially.
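For reference, the POP methodology commonly quantifies load balance as the ratio of the average to the maximum useful computation time across threads (or processes). Here t_i is the useful computation time of thread i and P is the number of threads; a value of 1 means perfect balance. The four timings in the worked line are purely illustrative and are not GITM measurements.

```latex
\mathrm{LB} = \frac{\frac{1}{P}\sum_{i=1}^{P} t_i}{\max_{1 \le i \le P} t_i},
\qquad
t = (10, 10, 10, 14)\,\mathrm{s}
\;\Rightarrow\;
\mathrm{LB} = \frac{(10+10+10+14)/4}{14} = \frac{11}{14} \approx 0.79
```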
Example of load balance test report
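As a minimal sketch of how a load balance figure of this kind can be derived, the hypothetical helper below takes per-thread useful computation times and returns the average divided by the maximum (1.0 for perfect balance). It assumes the timings have already been measured, for example by calling OpenMP's `omp_get_wtime` around each thread's share of the work; the function name and interface are illustrative, not part of GITM or any POP tool.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: load balance efficiency = mean(t) / max(t),
 * where t[i] is the useful computation time of thread i.
 * Returns 1.0 when every thread does the same amount of work. */
static double load_balance(const double *t, size_t p)
{
    assert(t != NULL && p > 0);
    double sum = 0.0;
    double max = t[0];
    for (size_t i = 0; i < p; ++i) {
        sum += t[i];
        if (t[i] > max)
            max = t[i];
    }
    assert(max > 0.0); /* timings must be positive */
    return (sum / (double)p) / max;
}
```

A value well below 1.0 indicates that some threads sit idle while the slowest one finishes, which limits the speedup obtainable from adding threads.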
Report – Recommendations were made to improve the vectorization and computational performance of the application. Specifically, switching to a different compiler would help vectorize masked Fortran 90 array operations. Improved array alignment would also allow more efficient vectorization, and reducing the use of floating-point division would make the computation more efficient. In the long term, adopting MPI was recommended so that I/O could be performed through the parallel NetCDF library rather than sequential access methods, making it more scalable.
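The recommendations on masked operations, alignment and division can be illustrated with a small loop. Since the GITM source (Fortran 90) is not reproduced here, this is a hedged C analogue of a masked update such as `WHERE (active) x = x + u * dt / dx`; the function names, the 64-byte alignment and the variables `x`, `u`, `dt` and `dx` are all hypothetical. The rewritten loop hoists the loop-invariant division into a single reciprocal and replaces the branch with a select that compilers can map to a vector blend.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define ALIGNMENT 64 /* assumed alignment in bytes (a common cache-line size) */

/* Allocate n doubles on an ALIGNMENT-byte boundary. C11 aligned_alloc
 * requires the size to be a multiple of the alignment, so round it up. */
static double *alloc_aligned(size_t n)
{
    size_t bytes = n * sizeof(double);
    bytes = (bytes + ALIGNMENT - 1) / ALIGNMENT * ALIGNMENT;
    return aligned_alloc(ALIGNMENT, bytes);
}

/* Returns nonzero if p lies on an ALIGNMENT-byte boundary. */
static int is_aligned(const void *p)
{
    return ((uintptr_t)p % ALIGNMENT) == 0;
}

/* Baseline: a branch plus a division by the loop-invariant dx
 * in every iteration of the masked update. */
static void advect_baseline(size_t n, double *x, const double *u,
                            const unsigned char *active, double dt, double dx)
{
    for (size_t i = 0; i < n; ++i) {
        if (active[i])
            x[i] = x[i] + u[i] * dt / dx;
    }
}

/* Rewritten: one division before the loop, and the mask applied as a
 * select rather than a branch, which eases vectorization. */
static void advect_vectorizable(size_t n, double *restrict x,
                                const double *restrict u,
                                const unsigned char *restrict active,
                                double dt, double dx)
{
    assert(dx != 0.0);
    const double scale = dt * (1.0 / dx); /* single division */
    for (size_t i = 0; i < n; ++i) {
        double step = u[i] * scale;
        x[i] += active[i] ? step : 0.0; /* inactive elements unchanged */
    }
}
```

The two versions can differ in the last bits, because multiplying by a reciprocal is not bit-identical to dividing; this is why compilers typically do not make the substitution themselves unless relaxed floating-point options (such as GCC's `-freciprocal-math`) are enabled. Whether a loop is actually vectorized should be checked in the compiler's optimization report, for example with GCC's `-fopt-info-vec`.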
Code improvements – The code improvements are being carried out by the existing development team, with advice available from NAG on request.
On-going – Work is currently progressing to improve I/O and vectorize further parts of the code.
The long-term recommendation to adopt MPI is being considered, so that the parallel NetCDF library could be used for parallel I/O.
Note: This work was carried out by NAG staff working under the remit of the Performance Optimisation and Productivity Centre of Excellence in Computing Applications (POP).