A forwarding MPI implementation that can use any other MPI implementation via an MPI ABI

Overview

MPItrampoline

  • MPI wrapper library: GitHub CI
  • MPI trampoline library: GitHub CI
  • MPI integration tests: GitHub CI

MPI is the de facto standard for inter-node communication on HPC systems, and has been for the past 25 years. While highly successful, MPI is a standard for source code (it defines an API), not a standard for binary compatibility (it does not define an ABI). This means that applications running on HPC systems need to be compiled anew on every system. This is tedious, since the software available on each HPC system is slightly different.

This project attempts to remedy this. It defines an ABI for MPI, and provides an MPI implementation based on this ABI. That is, MPItrampoline does not implement any MPI functions itself; it only forwards them to a "real" implementation via this ABI. The advantage is that one can produce "portable" applications that can use any given MPI implementation. For example, this will make it possible to build external packages for Julia via Yggdrasil that run efficiently on almost any HPC system.
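
The forwarding idea can be pictured as a table of function pointers. The following is a minimal C sketch, not MPItrampoline's actual code: the names `abi_table`, `tramp_set_backend`, `tramp_comm_rank`, and `fake_backend` are hypothetical, and a stand-in backend takes the place of a real MPI library so that the example builds without MPI.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical ABI: a fixed set of entry points with fixed signatures.
   Handles are plain ints here; a real ABI would pin down every type. */
typedef struct {
  int (*comm_rank)(int comm, int *rank);
  int (*comm_size)(int comm, int *size);
} abi_table;

/* Set once at startup to whichever backend has been selected. */
static const abi_table *active_backend = NULL;

void tramp_set_backend(const abi_table *backend) { active_backend = backend; }

/* The "trampoline" functions contain no MPI logic; they only forward. */
int tramp_comm_rank(int comm, int *rank) {
  assert(active_backend != NULL);
  return active_backend->comm_rank(comm, rank);
}

int tramp_comm_size(int comm, int *size) {
  assert(active_backend != NULL);
  return active_backend->comm_size(comm, size);
}

/* Stand-in backend (assumption): pretends to be rank 2 of 4. */
static int fake_comm_rank(int comm, int *rank) {
  (void)comm;
  *rank = 2;
  return 0;
}

static int fake_comm_size(int comm, int *size) {
  (void)comm;
  *size = 4;
  return 0;
}

const abi_table fake_backend = {fake_comm_rank, fake_comm_size};
```

A caller that uses only the `tramp_*` functions never references the backend directly, so passing a different table to `tramp_set_backend` changes the implementation without recompiling the caller.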

A small and simple MPIwrapper library is used to provide this ABI for any given MPI installation. MPIwrapper needs to be compiled for each MPI installation that is to be used with MPItrampoline, but this is quick and easy.

Successfully Tested

  • Debian 11.0 via Docker (MPICH; arm32v5, arm32v7, arm64v8, mips64le, ppc64le, riscv64; C/C++ only)
  • Debian 11.0 via Docker (MPICH; i386, x86-64)
  • macOS laptop (MPICH, OpenMPI; x86-64)
  • macOS via GitHub Actions (OpenMPI; x86-64)
  • Ubuntu 20.04 via Docker (MPICH; x86-64)
  • Ubuntu 20.04 via GitHub Actions (MPICH, OpenMPI; x86-64)
  • Blue Waters, HPC system at the NCSA (Cray MPICH; x86-64)
  • Graham, HPC system at Compute Canada (Intel MPI; x86-64)
  • Marconi A3, HPC system at Cineca (Intel MPI; x86-64)
  • Niagara, HPC system at Compute Canada (OpenMPI; x86-64)
  • Summit, HPC system at ORNL (Spectrum MPI; IBM POWER 9)
  • Symmetry, in-house HPC system at the Perimeter Institute (MPICH, OpenMPI; x86-64)

Workflow

Preparing an HPC system

Install MPIwrapper, wrapping the MPI installation you want to use there. You can install MPIwrapper multiple times if you want to wrap more than one MPI implementation.

This is possibly as simple as

cmake -S . -B build -DMPIEXEC_EXECUTABLE=mpiexec -DCMAKE_BUILD_TYPE=RelWithDebInfo -DCMAKE_INSTALL_PREFIX=$HOME/mpiwrapper
cmake --build build
cmake --install build

but nothing is ever simple on an HPC system. It might be necessary to load certain modules, or to specify more cmake MPI configuration options.

The MPIwrapper libraries remain on the HPC system; they are installed independently of any application.

Building an application

Build your application as usual, using MPItrampoline as the MPI library.

Running an application

At startup time, MPItrampoline needs to be told which MPIwrapper library to use. This is done via the environment variable MPITRAMPOLINE_LIB. You also need to point MPItrampoline's mpiexec to the corresponding wrapper created by MPIwrapper, using the environment variable MPITRAMPOLINE_MPIEXEC.

For example:

env MPITRAMPOLINE_MPIEXEC=$HOME/mpiwrapper/bin/mpiwrapper-mpiexec MPITRAMPOLINE_LIB=$HOME/mpiwrapper/lib/libmpiwrapper.so mpiexec -n 4 ./your-application

The mpiexec you run here needs to be the one provided by MPItrampoline.
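
The startup lookup described above can be sketched in C as follows. This is an illustration under stated assumptions rather than MPItrampoline's own code: the helper names (`choose_setting`, `wrapper_lib`, `wrapper_mpiexec`) and the example paths are hypothetical, the `fallback` arguments stand for optional build-time defaults, and the step of actually loading the library is left out.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Prefer the value from the environment; otherwise use a fallback.
   Returns NULL when neither is set to a non-empty string. */
const char *choose_setting(const char *env_value, const char *fallback) {
  if (env_value != NULL && env_value[0] != '\0') return env_value;
  if (fallback != NULL && fallback[0] != '\0') return fallback;
  return NULL;
}

/* Look up the two variables named in this README. The fallbacks are
   hypothetical build-time defaults and may be NULL. */
const char *wrapper_lib(const char *fallback) {
  return choose_setting(getenv("MPITRAMPOLINE_LIB"), fallback);
}

const char *wrapper_mpiexec(const char *fallback) {
  return choose_setting(getenv("MPITRAMPOLINE_MPIEXEC"), fallback);
}

/* Example use; call from main(). */
void example_settings(void) {
  /* An explicit environment value wins over the fallback. */
  assert(strcmp(choose_setting("/opt/a/libmpiwrapper.so",
                               "/opt/b/libmpiwrapper.so"),
                "/opt/a/libmpiwrapper.so") == 0);
  /* An unset or empty variable selects the fallback. */
  assert(strcmp(choose_setting(NULL, "/opt/b/libmpiwrapper.so"),
                "/opt/b/libmpiwrapper.so") == 0);
  assert(strcmp(choose_setting("", "/opt/b/libmpiwrapper.so"),
                "/opt/b/libmpiwrapper.so") == 0);
  /* Nothing configured: the caller should report an error. */
  assert(choose_setting(NULL, NULL) == NULL);
}
```

Keeping the decision in a pure function (`choose_setting`) separates it from the environment access, which makes the precedence rule easy to check in isolation.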

Current state

MPItrampoline uses the C preprocessor to create wrapper functions for each MPI function. This is how MPI_Send is wrapped:

FUNCTION(int, Send,
         (const void *buf, int count, MT(Datatype) datatype, int dest, int tag,
          MT(Comm) comm),
         (buf, count, (MP(Datatype))datatype, dest, tag, (MP(Comm))comm))
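
To make the macro invocation above concrete, here is one way such macros could be defined. This is a sketch under assumptions, not the project's actual definitions: the `abi_` and `real_` prefixes, the handle types, and the stand-in `real_Send` are all hypothetical, chosen so that the example compiles without an MPI installation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the "real" MPI library (assumption: no MPI is needed to
   build this sketch). Its handle types are plain ints. */
typedef int real_Datatype;
typedef int real_Comm;

static int real_last_dest = -1;

static int real_Send(const void *buf, int count, real_Datatype datatype,
                     int dest, int tag, real_Comm comm) {
  (void)buf; (void)datatype; (void)tag; (void)comm;
  real_last_dest = dest;      /* record the call so it can be checked */
  return count >= 0 ? 0 : 1;  /* 0 plays the role of success */
}

/* ABI-side handle types: one fixed representation for every backend. */
typedef intptr_t abi_Datatype;
typedef intptr_t abi_Comm;

/* Hypothetical definitions of the three macros. */
#define MT(TYPE) abi_##TYPE  /* type as seen through the ABI */
#define MP(TYPE) real_##TYPE /* type used by the wrapped library */
#define FUNCTION(RTYPE, NAME, PTYPES, PARGS) \
  RTYPE abi_##NAME PTYPES { return real_##NAME PARGS; }

/* Same invocation as in the text; it expands to a function abi_Send
   that converts the handles and forwards to real_Send. */
FUNCTION(int, Send,
         (const void *buf, int count, MT(Datatype) datatype, int dest, int tag,
          MT(Comm) comm),
         (buf, count, (MP(Datatype))datatype, dest, tag, (MP(Comm))comm))

/* Example use; call from main(). */
void example_send(void) {
  int payload = 42;
  assert(abi_Send(&payload, 1, 0, 3, 0, 0) == 0);
  assert(real_last_dest == 3);
}
```

The parameter list and the argument list are each passed as one parenthesized macro argument, which is why the commas inside them do not split the invocation into more arguments.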

Unfortunately, MPItrampoline does not yet wrap the Fortran API. Your help is welcome.

Certain MPI types, constants, and functions are difficult to wrap. Theoretically, there could be MPI libraries where it is not possible to implement the current MPI ABI. If you encounter this, please let me know -- maybe there is a work-around.

Comments
  • Add support for MPI profiling interface

    Each standard MPI function can be called with an MPI_ or PMPI_ prefix (quoting from https://www.open-mpi.org/faq/?category=perftools#PMPI). I think MPItrampoline doesn't currently support the PMPI_ calls.

    opened by ocaisa 6
  • Supporting `MPIX_Query_cuda_support()`

    While not part of the standard, MPIX_Query_cuda_support() is available in a number of MPI implementations (see https://github.com/pmodels/mpich/pull/4741). It's also being used by a number of applications (and hopefully that will grow, see my issue at https://github.com/lammps/lammps/issues/3140 which links back to the support in GROMACS). This would be a valuable inclusion in MPItrampoline since it would then be able to handle the runtime detection of CUDA support in the MPI implementation.

    opened by ocaisa 6
  • Allow overriding default compilation options

    Currently, default compilation options are set during the build of MPItrampoline. It would probably be useful to be able to fully override these, e.g.,

    exec ${MPITRAMPOLINE_CC:-@CMAKE_C_COMPILER@} ${CFLAGS:-"@CMAKE_C_FLAGS@"} -I@CMAKE_INSTALL_PREFIX@/@CMAKE_INSTALL_INCLUDEDIR@ @LINK_FLAGS@ -L@CMAKE_INSTALL_PREFIX@/@CMAKE_INSTALL_LIBDIR@ -Wl,-rpath,@CMAKE_INSTALL_PREFIX@/@CMAKE_INSTALL_LIBDIR@ "$@" -lmpi -ldl
    
    opened by ocaisa 5
  • Allow use of a fallback/default value for `MPITRAMPOLINE_MPIEXEC`

    It's possible to configure a default MPI library with -DMPITRAMPOLINE_DEFAULT_LIB=XXX. It would be good to be able to also configure a default mpiexec (-DMPITRAMPOLINE_DEFAULT_MPIEXEC=XXX) so that one can have a fully functional fallback in place at build time.

    opened by ocaisa 3
  • Incomplete installation

    Hello! I am trying to install MPItrampoline, but after following the steps outlined in the README.md, I cannot find files that are referenced later.

    For instance, on Summit at ORNL, using the following script:

    INSTALLDIR=$HOME/mpiwrapper
    
    module load cmake/3.18
    module load gcc
    
    cmake -S . -B build -DMPIEXEC_EXECUTABLE=mpiexec \
                                   -DCMAKE_BUILD_TYPE=RelWithDebInfo \
                                   -DCMAKE_INSTALL_PREFIX=$INSTALLDIR
    cmake --build build
    cmake --install build
    

    The following installation is generated:

    $ tree mpiwrapper/
    mpiwrapper/
    |-- bin
    |   |-- mpicc
    |   |-- mpicxx
    |   |-- mpiexec
    |   |-- mpifc
    |   `-- mpifort
    |-- include
    |   |-- mpi.h
    |   |-- mpi.mod
    |   |-- mpi_declarations.h
    |   |-- mpi_declarations_fortran.h
    |   |-- mpi_declarations_fortran90.h
    |   |-- mpi_defaults.h
    |   |-- mpi_f08.mod
    |   |-- mpi_version.h
    |   |-- mpiabi.h
    |   |-- mpiabif.h
    |   |-- mpif.h
    |   `-- mpio.h
    |-- lib
    |   |-- cmake
    |   |   `-- MPItrampoline
    |   |       |-- MPItrampolineConfig.cmake
    |   |       |-- MPItrampolineConfigVersion.cmake
    |   |       |-- MPItrampolineTargets-relwithdebinfo.cmake
    |   |       `-- MPItrampolineTargets.cmake
    |   `-- pkgconfig
    |       `-- MPItrampoline.pc
    `-- lib64
        |-- libmpi.a
        `-- libmpifort.a
    
    7 directories, 24 files
    

    The README.md says to use the wrapped libraries, but no shared libraries are available here. I looked through the options of the CMakeLists.txt, but the only one defined in the project is the Fortran flag.

    Am I missing something?

    opened by spoutn1k 2
  • Issues building Global Arrays

    I'm trying to build the Global Arrays library with MPItrampoline and have run into some issues. This package is a dependency for Molpro and several other computational chemistry applications. Any assistance that you can provide to get it working would be greatly appreciated.

    The compilation fails at https://github.com/GlobalArrays/ga/blob/f4016b869dfd1a2b2856f74a73dd9452dbbc8ae4/comex/src-mpi-pr/comex.c#L4968 with the error message:

    libtool: compile:  mpicc -DHAVE_CONFIG_H -I. -I./src-common -I./src-mpi-pr -g -O2 -MT src-mpi-pr/comex.lo -MD -MP -MF src-mpi-pr/.deps/comex.Tpo -c src-mpi-pr/comex.c -o src-mpi-pr/comex.o
    src-mpi-pr/comex.c: In function ‘str_mpi_retval’:
    src-mpi-pr/comex.c:4932:9: error: case label does not reduce to an integer constant
             case MPI_SUCCESS       : msg = "MPI_SUCCESS"; break;
             ^
    

    Steps to reproduce:

    git clone -b develop https://github.com/GlobalArrays/ga
    cd ga
    ./autogen.sh
    ./configure --with-mpi-pr --with-blas=no --with-lapack=no --with-scalapack=no --disable-f77
    make
    

    I've tried both the develop and master branches.

    There are lots of configurations which can be used. We're most interested in the recommended port, which uses MPI-1 with progress ranks (--with-mpi-pr), but I tried several other configurations without success.

    I'm not sure whether it's an MPItrampoline issue or whether it's non-standard use of MPI within Global Arrays.

    I've attached the full output from the build: build-ga.txt

    I've tried on

    • RHEL 7.9 with gcc 4.8.5
    • Ubuntu 22.04 with gcc 11.3.0

    opened by nick-wilson 9
  • Issues building CP2K

    I thought I would give this a full test with Fortran, and CP2K is a good benchmark for that. The build (v8.2) is failing with:

    /project/60005/easybuild/build/CP2K/8.2/gmtfbf-2021a/cp2k-8.2/exts/dbcsr/src/mpi/dbcsr_mpiwrap.F:1669:21:
    
     1669 |       CALL mpi_bcast(msg, msglen, MPI_LOGICAL, source, gid, ierr)
          |                     1
    ......
     3160 |       CALL mpi_bcast(msg, msglen, ${mpi_type1}$, source, gid, ierr)
          |                     2
    Error: Type mismatch between actual argument at (1) and actual argument at (2) (LOGICAL(4)/COMPLEX(4)).
    
    opened by ocaisa 24
  • Building shared and static libraries at once

    Currently the default behaviour is to only build static libraries. It might be good to build both static and shared libraries, since then if libmpi.so is in the default search path it is not selected over the library from MPItrampoline (MPItrampoline would shadow libmpi.so and libmpi.a).

    opened by ocaisa 4
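
The compiler message in the Global Arrays report above shows that, with the header in use there, `MPI_SUCCESS` did not count as an integer constant expression, which C requires for `case` labels. One possible application-side workaround, sketched here with hypothetical stand-in constants (`DEMO_SUCCESS`, `DEMO_ERR_COMM`) instead of real MPI symbols, is to compare with an if/else chain, which has no such requirement.

```c
#include <assert.h>
#include <string.h>

/* Stand-ins (assumption): error codes that are const objects and not
   integer constant expressions, as if their values were supplied by a
   library at run time. */
const int DEMO_SUCCESS = 0;
const int DEMO_ERR_COMM = 5;

/* Writing `case DEMO_SUCCESS:` would not be valid standard C, because a
   const-qualified object is not an integer constant expression; GCC
   reports "case label does not reduce to an integer constant".
   Plain comparisons work with any int value. */
const char *str_retval(int code) {
  if (code == DEMO_SUCCESS) return "DEMO_SUCCESS";
  if (code == DEMO_ERR_COMM) return "DEMO_ERR_COMM";
  return "unknown";
}

/* Example use; call from main(). */
void example_retval(void) {
  assert(strcmp(str_retval(DEMO_SUCCESS), "DEMO_SUCCESS") == 0);
  assert(strcmp(str_retval(DEMO_ERR_COMM), "DEMO_ERR_COMM") == 0);
  assert(strcmp(str_retval(123), "unknown") == 0);
}
```

Whether this is the right fix for Global Arrays itself is not settled by the report; the sketch only shows why the `switch` form fails and what a portable alternative looks like.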
Releases(v5.2.0)
Owner
Erik Schnetter