ORB5  v4.9.4
MPI, OpenMP, OpenACC

Description of the parallel programming models in ORB5.

Author
T. Hayward-Schneider
Date
04.2021

MPI

MPI Background

MPI is used in ORB5 for domain cloning and domain decomposition. Modern Fortran code should use the F2008 use mpi_f08 Fortran bindings, which allow extensive type checking at compile time (GCC >= 10 requires either these bindings or a workaround). Before 2021, ORB5 used the F90 use mpi Fortran bindings, in which most MPI handles (communicators, datatypes, requests) are of type integer, which limits the extent of compile-time checking. Unfortunately, support for use mpi_f08 is not universal, especially among vendors such as Cray. Therefore, to support systems where use mpi_f08 is not available, we implement a fallback to the use mpi interface.

MPI implementation

  • All files which use MPI should contain include "precomp.h".
  • The macro USE_MPI resolves to either use mpi_f08 or use mpi, depending on the system.
  • Macros for MPI types are defined in precomp.h: MPI_comm_TYPE, MPI_datatype_TYPE, MPI_request_TYPE, MPI_status_TYPE (any additional types should be added following this pattern).
  • Systems which don't support use mpi_f08 should add -DNO_MPI_F08 to the FFLAGS.
  • In a small number of places, #ifdef NO_MPI_F08 ... #else ... #endif is used to define a different codepath for use mpi_f08 and use mpi.
  • Jenkins tests, run at CSCS, test both cases.
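
The pattern described in the list above can be sketched as follows. This is a minimal illustration and not the actual contents of precomp.h: the names USE_MPI, MPI_comm_TYPE and NO_MPI_F08 come from the list above, but the macro definitions shown here are assumptions, and the program mpi_macro_demo is hypothetical. To keep the sketch in a single file, the macros are defined inline at the point where an ORB5 source file would pull in precomp.h.

```fortran
! Sketch only. In ORB5 these macros are provided by precomp.h; the
! definitions below are assumed for illustration.
#ifdef NO_MPI_F08
#define USE_MPI use mpi
#define MPI_comm_TYPE integer
#else
#define USE_MPI use mpi_f08
#define MPI_comm_TYPE type(MPI_Comm)
#endif

program mpi_macro_demo
  USE_MPI
  implicit none
  MPI_comm_TYPE :: comm
  integer :: rank, nranks, handle, ierr

  call MPI_Init(ierr)
  comm = MPI_COMM_WORLD
  call MPI_Comm_rank(comm, rank, ierr)
  call MPI_Comm_size(comm, nranks, ierr)

  ! The two bindings expose the raw integer handle differently, so this
  ! is the kind of place that needs an explicit preprocessor branch.
#ifdef NO_MPI_F08
  handle = comm
#else
  handle = comm%MPI_VAL
#endif

  if (rank == 0) print *, "ranks:", nranks, " communicator handle:", handle
  call MPI_Finalize(ierr)
end program mpi_macro_demo
```

Built with preprocessing enabled (for example by using a .F90 suffix), the same source compiles against use mpi_f08 by default and against use mpi when -DNO_MPI_F08 is added to the FFLAGS.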

OpenMP + OpenACC

OpenMP/OpenACC Background

In ORB5, particle loops were ported to both OpenACC and OpenMP. The code was compiled with at most one of these enabled (threads or GPU). We want to be able to compile the CPU-side parts of the code with OpenMP support while the particle loops run on GPUs. This was not possible with the original nesting of both OpenACC and OpenMP directives around the particle loops.

OpenMP/OpenACC implementation

  • OpenMP directives which should be ignored when compiling with OpenACC support (e.g. all particle loops) are changed from !$omp to !pomp.
  • All files which use these features should contain include "precomp.h".
  • If we explicitly need the number of OpenMP threads, this can be accessed as parallelp_nthreads (particles) and parallelnthreads (non-particles).
  • This allows OpenMP to be used on the CPU when compiling with GPU support.
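
One way such a sentinel could work is sketched below. The mechanism is an assumption, not taken from the ORB5 source: it supposes that precomp.h maps pomp back to $omp only when OpenACC is disabled, using the standard _OPENACC macro, so that !pomp lines become ordinary comments in a GPU build. It also relies on the preprocessor expanding macros inside Fortran comment lines, which gfortran's traditional-mode cpp does but other preprocessors may not. The program pomp_demo is hypothetical.

```fortran
! Sketch only. Assumed mechanism: without OpenACC the particle-loop
! sentinel is turned back into a normal OpenMP sentinel; with OpenACC
! it is left alone and the line stays a plain comment.
#ifndef _OPENACC
#define pomp $omp
#endif

program pomp_demo
  implicit none
  integer, parameter :: n = 1000
  integer :: i
  real :: x(n), y(n), total

  x = 1.0
  total = 0.0

  ! Particle-style loop: OpenACC in a GPU build, OpenMP threads otherwise.
  !$acc parallel loop reduction(+:total)
  !pomp parallel do reduction(+:total)
  do i = 1, n
     total = total + x(i)
  end do

  ! CPU-only loop: keeps a regular OpenMP sentinel, so it can be
  ! threaded on the host even when the loop above runs on the GPU.
  !$omp parallel do
  do i = 1, n
     y(i) = 2.0 * x(i)
  end do

  print *, total, y(n)
end program pomp_demo
```

With this arrangement, enabling OpenACC and OpenMP together leaves only the second loop under OpenMP control, which is the behaviour the list above describes.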

OpenMP Offloading

  • In 2023, some effort was made to port ORB5 to GPUs with OpenMP offloading.
  • This code is not yet production-ready, but will be further developed in 2024:
    • The work is in the OMP5 branch.
    • Currently, one can run at least the cyclone.in and TAE_pb.in test files.
  • At that stage, there will be 2 sets of OpenMP directives in the code, one for CPU and one for GPU.
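
As a rough illustration of what two sets of OpenMP directives on one loop might look like, the sketch below selects between a host directive and an offloading directive at compile time. It is not taken from the OMP5 branch: the macro OFFLOAD_DEMO and the program offload_demo are hypothetical, and the real code may select the directive set differently.

```fortran
program offload_demo
  implicit none
  integer, parameter :: n = 1000
  integer :: i
  real :: x(n), y(n)

  x = 1.0
  y = 2.0

  ! OFFLOAD_DEMO is a hypothetical switch used only in this sketch.
#ifdef OFFLOAD_DEMO
  ! GPU variant: offload the loop and copy the arrays explicitly.
  !$omp target teams distribute parallel do map(to: x) map(tofrom: y)
#else
  ! CPU variant: ordinary host threading.
  !$omp parallel do
#endif
  do i = 1, n
     y(i) = y(i) + 2.0 * x(i)
  end do

  print *, y(1), y(n)
end program offload_demo
```

Both variants compute the same result; if no offload device is available, an OpenMP implementation typically falls back to running the target region on the host.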