Python bindings for MPI

MPI for Python

Overview

Welcome to MPI for Python. This package provides Python bindings for the Message Passing Interface (MPI) standard. It is implemented on top of the MPI-1/2/3 specifications and exposes an API that closely follows the standard MPI-2 C++ bindings.
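
For example, a minimal program using the bindings looks like this (a sketch; run it with an MPI launcher, e.g. mpiexec -n 4 python hello.py):

    # hello.py -- minimal mpi4py example
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    print("Hello from rank", comm.Get_rank(), "of", comm.Get_size())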

Dependencies

  • Python 2.7, 3.5 or above, or PyPy 2.0 or above.
  • A functional MPI 1.x/2.x/3.x implementation like MPICH or Open MPI built with shared/dynamic libraries.
  • To work with the in-development version, you need to install Cython.

Testsuite

The testsuite is run periodically on GitHub Actions, Azure Pipelines, AppVeyor, Circle CI, Travis CI, and Read the Docs.

Comments
  • Importing the C API fails on Windows

    Importing the C API fails on Windows

    I am trying to build a C++ Python extension (with pybind11) that uses MPI and mpi4py on Windows. I am working in a conda environment and I installed mpi4py as follows:

    conda install mpi4py -c conda-forge --yes
    

    The following:

    // import the mpi4py API
    if (import_mpi4py() < 0) {
      throw std::runtime_error("Could not load mpi4py API.");
    }
    

    throws the exception with traceback:

    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "C:\Users\IEUser\VeloxChemMP\build_dbg_14.27\lib\python3.8\site-packages\veloxchem\__init__.py", line 2, in <module>
        from .veloxchemlib import AtomBasis
    ImportError: Could not load mpi4py API.
    

    Running mpiexec -n 4 python -c "from mpi4py import MPI; comm = MPI.COMM_WORLD; print(comm.Get_rank())" works as expected. I am not sure whether this is a bug or some silly mistake I am making.
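
    As a side note, a minimal sanity check for such builds (illustrative, not from the original report) is to confirm that mpi4py itself imports and to locate the headers that import_mpi4py() relies on; mpi4py.get_include() returns that directory:

    import mpi4py
    from mpi4py import MPI

    # Where the mpi4py C API headers live (add this to the extension's include dirs).
    print("mpi4py", mpi4py.__version__, "headers at", mpi4py.get_include())
    # Which MPI library the bindings were actually linked against.
    print(MPI.Get_library_version())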

    This issue was migrated from Bitbucket issue #177

    opened by robertodr 60
  • cuda tests fail when CUDA is available but not configured

    cuda tests fail when CUDA is available but not configured

    I'm testing the build of the new release 3.1.1.

    All tests accessing CUDA are failing. This is not entirely surprising in itself. My system has NVIDIA drivers available and a switchable NVIDIA card accessible via Bumblebee (primusrun), but I have not specifically configured it to run CUDA, so it is not surprising that CUDA_ERROR_NO_DEVICE is reported. The NVIDIA card I have at hand is for experimentation, not routine operation; the main video card is Intel.

    What's the best way to handle this situation? How can a non-CUDA build be enforced when CUDA is otherwise "available"?
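
    One way to detect this condition up front (a sketch, assuming Numba is installed; numba.cuda.is_available() returns False rather than raising when driver initialization fails with CUDA_ERROR_NO_DEVICE):

    from numba import cuda

    # True only if the CUDA driver initializes and a usable device is present.
    if not cuda.is_available():
        print("CUDA not usable on this system; skipping device-array tests")

    Setting the environment variable NUMBA_DISABLE_CUDA=1 should likewise make Numba report CUDA as unavailable.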

    An example test log is:

    ERROR: testAllgather (test_cco_buf.TestCCOBufInplaceSelf)
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "/projects/python/build/mpi4py/test/test_cco_buf.py", line 382, in testAllgather
        buf = array(-1, typecode, (size, count))
      File "/projects/python/build/mpi4py/test/arrayimpl.py", line 459, in __init__
        self.array = numba.cuda.device_array(shape, typecode)
      File "/usr/lib/python3/dist-packages/numba/cuda/cudadrv/devices.py", line 223, in _require_cuda_context
        with _runtime.ensure_context():
      File "/usr/lib/python3.9/contextlib.py", line 117, in __enter__
        return next(self.gen)
      File "/usr/lib/python3/dist-packages/numba/cuda/cudadrv/devices.py", line 121, in ensure_context
        with driver.get_active_context():
      File "/usr/lib/python3/dist-packages/numba/cuda/cudadrv/driver.py", line 393, in __enter__
        driver.cuCtxGetCurrent(byref(hctx))
      File "/usr/lib/python3/dist-packages/numba/cuda/cudadrv/driver.py", line 280, in __getattr__
        self.initialize()
      File "/usr/lib/python3/dist-packages/numba/cuda/cudadrv/driver.py", line 240, in initialize
        raise CudaSupportError("Error at driver init: \n%s:" % e)
    numba.cuda.cudadrv.error.CudaSupportError: Error at driver init:
    [100] Call to cuInit results in CUDA_ERROR_NO_DEVICE:
    -------------------- >> begin captured logging << --------------------
    numba.cuda.cudadrv.driver: INFO: init
    numba.cuda.cudadrv.driver: DEBUG: call driver api: cuInit
    numba.cuda.cudadrv.driver: ERROR: Call to cuInit results in CUDA_ERROR_NO_DEVICE
    --------------------- >> end captured logging << ---------------------
    
    opened by drew-parsons 43
  • test_io.TestIOSelf failures on Fedora Rawhide i686

    test_io.TestIOSelf failures on Fedora Rawhide i686

    Seeing this on Fedora Rawhide i686:

    + mpiexec -np 1 python3 test/runtests.py -v --no-builddir
    /builddir/build/BUILD/mpi4py-3.1.1/test/runtests.py:76: DeprecationWarning: The distutils package is deprecated and slated for removal in Python 3.12. Use setuptools or check PEP 632 for potential alternatives
      from distutils.util import get_platform
    [[email protected]] Python 3.10 (/usr/bin/python3)
    [[email protected]] MPI 3.1 (Open MPI 4.1.1)
    [[email protected]] mpi4py 3.1.1 (/builddir/build/BUILDROOT/mpi4py-3.1.1-1.fc36.i386/usr/lib/python3.10/site-packages/openmpi/mpi4py)
    --------------------------------------------------------------------------
    The OSC pt2pt component does not support MPI_THREAD_MULTIPLE in this release.
    Workarounds are to run on a single node, or to use a system with an RDMA
    capable network such as Infiniband.
    --------------------------------------------------------------------------
    ...
    ======================================================================
    ERROR: testIReadIWrite (test_io.TestIOSelf)
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "/builddir/build/BUILD/mpi4py-3.1.1/test/test_io.py", line 124, in testIReadIWrite
        fh.Set_view(0, etype)
      File "mpi4py/MPI/File.pyx", line 215, in mpi4py.MPI.File.Set_view
    mpi4py.MPI.Exception: MPI_ERR_ARG: invalid argument of some other kind
    ======================================================================
    ERROR: testIReadIWriteAll (test_io.TestIOSelf)
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "/builddir/build/BUILD/mpi4py-3.1.1/test/test_io.py", line 302, in testIReadIWriteAll
        fh.Set_view(0, etype)
      File "mpi4py/MPI/File.pyx", line 215, in mpi4py.MPI.File.Set_view
    mpi4py.MPI.Exception: MPI_ERR_ARG: invalid argument of some other kind
    ======================================================================
    ERROR: testIReadIWriteAt (test_io.TestIOSelf)
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "/builddir/build/BUILD/mpi4py-3.1.1/test/test_io.py", line 75, in testIReadIWriteAt
        fh.Set_view(0, etype)
      File "mpi4py/MPI/File.pyx", line 215, in mpi4py.MPI.File.Set_view
    mpi4py.MPI.Exception: MPI_ERR_ARG: invalid argument of some other kind
    ======================================================================
    ERROR: testIReadIWriteAtAll (test_io.TestIOSelf)
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "/builddir/build/BUILD/mpi4py-3.1.1/test/test_io.py", line 227, in testIReadIWriteAtAll
        fh.Set_view(0, etype)
      File "mpi4py/MPI/File.pyx", line 215, in mpi4py.MPI.File.Set_view
    mpi4py.MPI.Exception: MPI_ERR_ARG: invalid argument of some other kind
    ======================================================================
    ERROR: testIReadIWriteShared (test_io.TestIOSelf)
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "/builddir/build/BUILD/mpi4py-3.1.1/test/test_io.py", line 176, in testIReadIWriteShared
        fh.Set_view(0, etype)
      File "mpi4py/MPI/File.pyx", line 215, in mpi4py.MPI.File.Set_view
    mpi4py.MPI.Exception: MPI_ERR_ARG: invalid argument of some other kind
    ======================================================================
    ERROR: testReadWrite (test_io.TestIOSelf)
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "/builddir/build/BUILD/mpi4py-3.1.1/test/test_io.py", line 97, in testReadWrite
        fh.Set_view(0, etype)
      File "mpi4py/MPI/File.pyx", line 215, in mpi4py.MPI.File.Set_view
    mpi4py.MPI.Exception: MPI_ERR_ARG: invalid argument of some other kind
    ======================================================================
    ERROR: testReadWriteAll (test_io.TestIOSelf)
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "/builddir/build/BUILD/mpi4py-3.1.1/test/test_io.py", line 276, in testReadWriteAll
        fh.Set_view(0, etype)
      File "mpi4py/MPI/File.pyx", line 215, in mpi4py.MPI.File.Set_view
    mpi4py.MPI.Exception: MPI_ERR_ARG: invalid argument of some other kind
    ======================================================================
    ERROR: testReadWriteAllBeginEnd (test_io.TestIOSelf)
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "/builddir/build/BUILD/mpi4py-3.1.1/test/test_io.py", line 329, in testReadWriteAllBeginEnd
        fh.Set_view(0, etype)
      File "mpi4py/MPI/File.pyx", line 215, in mpi4py.MPI.File.Set_view
    mpi4py.MPI.Exception: MPI_ERR_ARG: invalid argument of some other kind
    ======================================================================
    ERROR: testReadWriteAt (test_io.TestIOSelf)
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "/builddir/build/BUILD/mpi4py-3.1.1/test/test_io.py", line 53, in testReadWriteAt
        fh.Set_view(0, etype)
      File "mpi4py/MPI/File.pyx", line 215, in mpi4py.MPI.File.Set_view
    mpi4py.MPI.Exception: MPI_ERR_ARG: invalid argument of some other kind
    ======================================================================
    ERROR: testReadWriteAtAll (test_io.TestIOSelf)
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "/builddir/build/BUILD/mpi4py-3.1.1/test/test_io.py", line 203, in testReadWriteAtAll
        fh.Set_view(0, etype)
      File "mpi4py/MPI/File.pyx", line 215, in mpi4py.MPI.File.Set_view
    mpi4py.MPI.Exception: MPI_ERR_ARG: invalid argument of some other kind
    ======================================================================
    ERROR: testReadWriteAtAllBeginEnd (test_io.TestIOSelf)
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "/builddir/build/BUILD/mpi4py-3.1.1/test/test_io.py", line 252, in testReadWriteAtAllBeginEnd
        fh.Set_view(0, etype)
      File "mpi4py/MPI/File.pyx", line 215, in mpi4py.MPI.File.Set_view
    mpi4py.MPI.Exception: MPI_ERR_ARG: invalid argument of some other kind
    ======================================================================
    ERROR: testReadWriteOrdered (test_io.TestIOSelf)
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "/builddir/build/BUILD/mpi4py-3.1.1/test/test_io.py", line 355, in testReadWriteOrdered
        fh.Set_view(0, etype)
      File "mpi4py/MPI/File.pyx", line 215, in mpi4py.MPI.File.Set_view
    mpi4py.MPI.Exception: MPI_ERR_ARG: invalid argument of some other kind
    ======================================================================
    ERROR: testReadWriteOrderedBeginEnd (test_io.TestIOSelf)
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "/builddir/build/BUILD/mpi4py-3.1.1/test/test_io.py", line 379, in testReadWriteOrderedBeginEnd
        fh.Set_view(0, etype)
      File "mpi4py/MPI/File.pyx", line 215, in mpi4py.MPI.File.Set_view
    mpi4py.MPI.Exception: MPI_ERR_ARG: invalid argument of some other kind
    ======================================================================
    ERROR: testReadWriteShared (test_io.TestIOSelf)
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "/builddir/build/BUILD/mpi4py-3.1.1/test/test_io.py", line 151, in testReadWriteShared
        fh.Set_view(0, etype)
      File "mpi4py/MPI/File.pyx", line 215, in mpi4py.MPI.File.Set_view
    mpi4py.MPI.Exception: MPI_ERR_ARG: invalid argument of some other kind
    ======================================================================
    ERROR: testIReadIWrite (test_io.TestIOWorld)
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "/builddir/build/BUILD/mpi4py-3.1.1/test/test_io.py", line 124, in testIReadIWrite
        fh.Set_view(0, etype)
      File "mpi4py/MPI/File.pyx", line 215, in mpi4py.MPI.File.Set_view
    mpi4py.MPI.Exception: MPI_ERR_ARG: invalid argument of some other kind
    ======================================================================
    ERROR: testIReadIWriteAll (test_io.TestIOWorld)
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "/builddir/build/BUILD/mpi4py-3.1.1/test/test_io.py", line 302, in testIReadIWriteAll
        fh.Set_view(0, etype)
      File "mpi4py/MPI/File.pyx", line 215, in mpi4py.MPI.File.Set_view
    mpi4py.MPI.Exception: MPI_ERR_ARG: invalid argument of some other kind
    ======================================================================
    ERROR: testIReadIWriteAt (test_io.TestIOWorld)
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "/builddir/build/BUILD/mpi4py-3.1.1/test/test_io.py", line 75, in testIReadIWriteAt
        fh.Set_view(0, etype)
      File "mpi4py/MPI/File.pyx", line 215, in mpi4py.MPI.File.Set_view
    mpi4py.MPI.Exception: MPI_ERR_ARG: invalid argument of some other kind
    ======================================================================
    ERROR: testIReadIWriteAtAll (test_io.TestIOWorld)
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "/builddir/build/BUILD/mpi4py-3.1.1/test/test_io.py", line 227, in testIReadIWriteAtAll
        fh.Set_view(0, etype)
      File "mpi4py/MPI/File.pyx", line 215, in mpi4py.MPI.File.Set_view
    mpi4py.MPI.Exception: MPI_ERR_ARG: invalid argument of some other kind
    ======================================================================
    ERROR: testIReadIWriteShared (test_io.TestIOWorld)
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "/builddir/build/BUILD/mpi4py-3.1.1/test/test_io.py", line 176, in testIReadIWriteShared
        fh.Set_view(0, etype)
      File "mpi4py/MPI/File.pyx", line 215, in mpi4py.MPI.File.Set_view
    mpi4py.MPI.Exception: MPI_ERR_ARG: invalid argument of some other kind
    ======================================================================
    ERROR: testReadWrite (test_io.TestIOWorld)
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "/builddir/build/BUILD/mpi4py-3.1.1/test/test_io.py", line 97, in testReadWrite
        fh.Set_view(0, etype)
      File "mpi4py/MPI/File.pyx", line 215, in mpi4py.MPI.File.Set_view
    mpi4py.MPI.Exception: MPI_ERR_ARG: invalid argument of some other kind
    ======================================================================
    ERROR: testReadWriteAll (test_io.TestIOWorld)
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "/builddir/build/BUILD/mpi4py-3.1.1/test/test_io.py", line 276, in testReadWriteAll
        fh.Set_view(0, etype)
      File "mpi4py/MPI/File.pyx", line 215, in mpi4py.MPI.File.Set_view
    mpi4py.MPI.Exception: MPI_ERR_ARG: invalid argument of some other kind
    ======================================================================
    ERROR: testReadWriteAllBeginEnd (test_io.TestIOWorld)
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "/builddir/build/BUILD/mpi4py-3.1.1/test/test_io.py", line 329, in testReadWriteAllBeginEnd
        fh.Set_view(0, etype)
      File "mpi4py/MPI/File.pyx", line 215, in mpi4py.MPI.File.Set_view
    mpi4py.MPI.Exception: MPI_ERR_ARG: invalid argument of some other kind
    ======================================================================
    ERROR: testReadWriteAt (test_io.TestIOWorld)
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "/builddir/build/BUILD/mpi4py-3.1.1/test/test_io.py", line 53, in testReadWriteAt
        fh.Set_view(0, etype)
      File "mpi4py/MPI/File.pyx", line 215, in mpi4py.MPI.File.Set_view
    mpi4py.MPI.Exception: MPI_ERR_ARG: invalid argument of some other kind
    ======================================================================
    ERROR: testReadWriteAtAll (test_io.TestIOWorld)
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "/builddir/build/BUILD/mpi4py-3.1.1/test/test_io.py", line 203, in testReadWriteAtAll
        fh.Set_view(0, etype)
      File "mpi4py/MPI/File.pyx", line 215, in mpi4py.MPI.File.Set_view
    mpi4py.MPI.Exception: MPI_ERR_ARG: invalid argument of some other kind
    ======================================================================
    ERROR: testReadWriteAtAllBeginEnd (test_io.TestIOWorld)
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "/builddir/build/BUILD/mpi4py-3.1.1/test/test_io.py", line 252, in testReadWriteAtAllBeginEnd
        fh.Set_view(0, etype)
      File "mpi4py/MPI/File.pyx", line 215, in mpi4py.MPI.File.Set_view
    mpi4py.MPI.Exception: MPI_ERR_ARG: invalid argument of some other kind
    ======================================================================
    ERROR: testReadWriteOrdered (test_io.TestIOWorld)
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "/builddir/build/BUILD/mpi4py-3.1.1/test/test_io.py", line 355, in testReadWriteOrdered
        fh.Set_view(0, etype)
      File "mpi4py/MPI/File.pyx", line 215, in mpi4py.MPI.File.Set_view
    mpi4py.MPI.Exception: MPI_ERR_ARG: invalid argument of some other kind
    ======================================================================
    ERROR: testReadWriteOrderedBeginEnd (test_io.TestIOWorld)
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "/builddir/build/BUILD/mpi4py-3.1.1/test/test_io.py", line 379, in testReadWriteOrderedBeginEnd
        fh.Set_view(0, etype)
      File "mpi4py/MPI/File.pyx", line 215, in mpi4py.MPI.File.Set_view
    mpi4py.MPI.Exception: MPI_ERR_ARG: invalid argument of some other kind
    ======================================================================
    ERROR: testReadWriteShared (test_io.TestIOWorld)
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "/builddir/build/BUILD/mpi4py-3.1.1/test/test_io.py", line 151, in testReadWriteShared
        fh.Set_view(0, etype)
      File "mpi4py/MPI/File.pyx", line 215, in mpi4py.MPI.File.Set_view
    mpi4py.MPI.Exception: MPI_ERR_ARG: invalid argument of some other kind
    ======================================================================
    FAIL: testStruct4 (test_util_dtlib.TestUtilDTLib) (typecode='q')
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "/builddir/build/BUILD/mpi4py-3.1.1/test/test_util_dtlib.py", line 165, in testStruct4
        self.assertEqual(mt.extent, n*ex1)
    AssertionError: 20 != 24
    ======================================================================
    FAIL: testStruct4 (test_util_dtlib.TestUtilDTLib) (typecode='Q')
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "/builddir/build/BUILD/mpi4py-3.1.1/test/test_util_dtlib.py", line 165, in testStruct4
        self.assertEqual(mt.extent, n*ex1)
    AssertionError: 20 != 24
    ======================================================================
    FAIL: testStruct4 (test_util_dtlib.TestUtilDTLib) (typecode='d')
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "/builddir/build/BUILD/mpi4py-3.1.1/test/test_util_dtlib.py", line 165, in testStruct4
        self.assertEqual(mt.extent, n*ex1)
    AssertionError: 20 != 24
    ======================================================================
    FAIL: testStruct4 (test_util_dtlib.TestUtilDTLib) (typecode='g')
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "/builddir/build/BUILD/mpi4py-3.1.1/test/test_util_dtlib.py", line 165, in testStruct4
        self.assertEqual(mt.extent, n*ex1)
    AssertionError: 28 != 36
    ======================================================================
    FAIL: testStruct4 (test_util_dtlib.TestUtilDTLib) (typecode='D')
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "/builddir/build/BUILD/mpi4py-3.1.1/test/test_util_dtlib.py", line 165, in testStruct4
        self.assertEqual(mt.extent, n*ex1)
    AssertionError: 36 != 40
    ======================================================================
    FAIL: testStruct4 (test_util_dtlib.TestUtilDTLib) (typecode='G')
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "/builddir/build/BUILD/mpi4py-3.1.1/test/test_util_dtlib.py", line 165, in testStruct4
        self.assertEqual(mt.extent, n*ex1)
    AssertionError: 52 != 60
    ======================================================================
    FAIL: testStruct4 (test_util_dtlib.TestUtilDTLib) (typecode='i8')
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "/builddir/build/BUILD/mpi4py-3.1.1/test/test_util_dtlib.py", line 165, in testStruct4
        self.assertEqual(mt.extent, n*ex1)
    AssertionError: 20 != 24
    ======================================================================
    FAIL: testStruct4 (test_util_dtlib.TestUtilDTLib) (typecode='u8')
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "/builddir/build/BUILD/mpi4py-3.1.1/test/test_util_dtlib.py", line 165, in testStruct4
        self.assertEqual(mt.extent, n*ex1)
    AssertionError: 20 != 24
    ======================================================================
    FAIL: testStruct4 (test_util_dtlib.TestUtilDTLib) (typecode='f8')
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "/builddir/build/BUILD/mpi4py-3.1.1/test/test_util_dtlib.py", line 165, in testStruct4
        self.assertEqual(mt.extent, n*ex1)
    AssertionError: 20 != 24
    ----------------------------------------------------------------------
    Ran 1385 tests in 41.098s
    FAILED (failures=9, errors=28, skipped=191)
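
    For reference, a minimal reproduction of the failing File.Set_view call pattern, pieced together from the tracebacks above (an illustrative sketch; the temporary file name is made up):

    from mpi4py import MPI

    fh = MPI.File.Open(MPI.COMM_SELF, "setview-repro.tmp",
                       MPI.MODE_CREATE | MPI.MODE_RDWR | MPI.MODE_DELETE_ON_CLOSE)
    etype = MPI.DOUBLE
    fh.Set_view(0, etype)   # raises MPI.Exception (MPI_ERR_ARG) in the logs above
    fh.Close()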
    
    opened by opoplawski 31
  • Warn users if mpi4py running under different MPI implementation

    Warn users if mpi4py running under different MPI implementation

    It does happen that inexperienced users run packages using mpi4py under an MPI implementation different from the one that mpi4py was built with.

    MPI.jl, the analogue of mpi4py in Julia land, tries to warn users if this happens by detecting at MPI_INIT time whether MPI_COMM_WORLD reports only 1 rank while the known launcher environment variables (MPI_LOCALNRANKS, OMPI_COMM_WORLD_SIZE) suggest that there should be more than one. This is exactly what happens when, for example, mpi4py is built against MPICH but run under Open MPI.

    If this condition is detected it prints a warning.

    What do you think about doing this? Is it something that you would merge into mpi4py?
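
    A minimal sketch of the heuristic described above (environment variable names taken from the text; this is not an existing mpi4py feature):

    import os
    import warnings
    from mpi4py import MPI

    def warn_if_mpi_mismatch():
        # The launcher says it started several ranks, but MPI_COMM_WORLD only
        # sees one: mpi4py was likely built against a different MPI library.
        size = MPI.COMM_WORLD.Get_size()
        launcher_vars = ("MPI_LOCALNRANKS", "OMPI_COMM_WORLD_SIZE")
        launched = max((int(os.environ[v]) for v in launcher_vars if v in os.environ),
                       default=1)
        if size == 1 and launched > 1:
            warnings.warn(
                "MPI_COMM_WORLD reports 1 rank but the launcher started %d; "
                "mpi4py may have been built against a different MPI implementation."
                % launched)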

    opened by PhilipVinc 19
  • [WIP] Add auto-generated "API Reference" to the RTD docs for future cross referencing

    [WIP] Add auto-generated "API Reference" to the RTD docs for future cross referencing

    This is a pure exploration and I have no idea if I have time to complete it. Just for fun...

    Notes to self:

    • I temporarily changed the theme to pydata's so that I can make a direct comparison with NumPy's and CuPy's websites. I will revert it if I ever have the chance to finish.
    • _templates/autosummary/class.rst is copied from CuPy
    • Need to compare with the old API ref https://mpi4py.github.io/apiref/index.html
    • I am still confused about how the new website http://mpi4py.readthedocs.org/ is generated. My local test, running make html in docs/source/usrman/, produces the old site https://mpi4py.github.io/ instead...
    opened by leofang 15
  • mpi4py error while getting results (in combination with SLURM)

    mpi4py error while getting results (in combination with SLURM)

    ERROR:

    Traceback (most recent call last):
      File "/opt/software/anaconda/3/lib/python3.6/runpy.py", line 193, in _run_module_as_main
        "__main__", mod_spec)
      File "/opt/software/anaconda/3/lib/python3.6/runpy.py", line 85, in _run_code
        exec(code, run_globals)
      File "/home/vasko/.local/lib/python3.6/site-packages/mpi4py/futures/__main__.py", line 72, in <module>
        main()
      File "/home/vasko/.local/lib/python3.6/site-packages/mpi4py/futures/__main__.py", line 60, in main
        run_command_line()
      File "/home/vasko/.local/lib/python3.6/site-packages/mpi4py/run.py", line 47, in run_command_line
        run_path(sys.argv[0], run_name='__main__')
      File "/opt/software/anaconda/3/lib/python3.6/runpy.py", line 263, in run_path
        pkg_name=pkg_name, script_name=fname)
      File "/opt/software/anaconda/3/lib/python3.6/runpy.py", line 96, in _run_module_code
        mod_name, mod_spec, pkg_name, script_name)
      File "/opt/software/anaconda/3/lib/python3.6/runpy.py", line 85, in _run_code
        exec(code, run_globals)
      File "cali_send_2.py", line 137, in <module>
        globals()[sys.argv[1]](sys.argv[2], sys.argv[3])
      File "cali_send_2.py", line 94, in solve_on_cali
        sols = list(executor.map(solve_matrix, repeat(inputs), range(len(wls)), wls))
      File "/home/vasko/.local/lib/python3.6/site-packages/mpi4py/futures/pool.py", line 207, in result_iterator
        yield futures.pop().result()
      File "/opt/software/anaconda/3/lib/python3.6/concurrent/futures/_base.py", line 432, in result
        return self.__get_result()
      File "/opt/software/anaconda/3/lib/python3.6/concurrent/futures/_base.py", line 384, in __get_result
        raise self._exception
    UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc1 in position 5: invalid start byte

    Environment: CentOS release 6.5 (Final), Python 3.6 (Anaconda), mpiexec (OpenRTE) 1.8.2, mpi4py 3.0.3

    Piece of Code:

    inputs = [der_mats, ref_ind_yee_grid, n_xy_sq, param_sweep_on, i_m, inv_eps, sol_params]
    with MPIPoolExecutor(max_workers=int(nodes)) as executor:
       sols = list(executor.map(solve_matrix, repeat(inputs), range(len(wls)), wls))
       executor.shutdown(wait=True)  # wait for all complete
       zipobj = ZipFile(zp_fl_nm, 'w')
    
       for sol in sols:
          w, v, solnum, vq = sol
          print(w[0], solnum)  # this line shows whether the data contains duplicates
          w.tofile(f"w_sol_{solnum}.npy")
          v.tofile(f"v_sol_{solnum}.npy")
          vq.tofile(f"vq_sol_{solnum}.npy")
          zipobj.write(f"w_sol_{solnum}.npy")
          zipobj.write(f"v_sol_{solnum}.npy")
          zipobj.write(f"vq_sol_{solnum}.npy")
          os.remove(f"w_sol_{solnum}.npy")
          os.remove(f"v_sol_{solnum}.npy")
          os.remove(f"vq_sol_{solnum}.npy")
    

    I invoke the method by submitting a command like this: f'srun --mpi=pmi2 -n ${{SLURM_NTASKS}} python -m mpi4py.futures cali_send_2.py solve_on_cali \"\"{name}\"\" {num_nodes}'

    The error does not always appear. With the range wls = np.arange(0.4e-6, 1.8e-6, 0.01e-6) it crashes with this error, or returns duplicates of some solutions when the step is 0.1e-6. With the range wls = np.arange(0.55e-6, 1.55e-6, 0.01e-6), using either step 0.1e-6 or 0.001e-6, it does NOT crash with the mentioned error and returns good results without duplicates.

    Could someone please explain the origin of this error? My suspicion is that it is related to floating-point numbers like 1.699999999999999999999e-6.

    opened by byquip 15
  • Add GPU tests for DLPack support

    Add GPU tests for DLPack support

    DO NOT MERGE. To be continued tomorrow...

    With Open MPI 4.1.1, I am seeing MPI_ERR_TRUNCATE in testAllgatherv3, so I skip them for now:

    $ mpirun --mca opal_cuda_support 1 -n 1 python test/runtests.py --no-numba --cupy -e "test_cco_nb_vec" -e "test_cco_vec"
    

    Tomorrow I will continue investigating the only errors in test_msgspec.py.

    opened by leofang 14
  • mpi4py.futures.MPIPoolExecutor hangs at comm.Disconnect() inside client_close(comm)

    mpi4py.futures.MPIPoolExecutor hangs at comm.Disconnect() inside client_close(comm)

    Background information

    • mpi4py 3.0.3
    • PyPy 7.3.3 (Python 2.7.18)
    • Intel MPI Version 2019 Update 8 Build 20200624
    • GNU/Linux 3.10.0-1160.11.1.el7.x86_64
    • Slurm 20.02.5

    Details of the problem

    Sample code

    from mpi4py import MPI
    import mpi4py.futures as mp
    import sys
    
    def write(x):
        sys.stdout.write(x)
        sys.stdout.flush()
    
    def fun(args):
        rank = MPI.COMM_WORLD.Get_rank()
        return rank
        
    if __name__ == '__main__':
        size = MPI.COMM_WORLD.Get_size()
        rank = MPI.COMM_WORLD.Get_rank()
        for i in range(size):
            MPI.COMM_WORLD.Barrier()
            if rank == i:
                write('Parent %d\n' % rank)
                with mp.MPIPoolExecutor(2) as pool:
                    write(', '.join(map(str, pool.map(fun, range(2)))) + '\n')
    
    shell$ srun -N 16 --ntasks-per-node 2 --pty /bin/bash -l
    shell$ mpirun -ppn 1 ./test_mpi.py
    Parent 0
    0, 1
    

    It hangs here. When I traced the function calls, I found the problem in futures/_lib.py, in client_close(comm), at comm.Disconnect(). Similarly, when the with statement is replaced with an assignment and pool.shutdown() is called after the for loop, all the processes print their messages correctly but the program still hangs at comm.Disconnect(). An equivalent C version of this code does not exhibit the problem.
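
    For reference, a self-contained version of that shutdown() variant (a sketch based on the description above, not code taken from the report):

    from mpi4py import MPI
    import mpi4py.futures as mp
    import sys

    def write(x):
        sys.stdout.write(x)
        sys.stdout.flush()

    def fun(args):
        return MPI.COMM_WORLD.Get_rank()

    if __name__ == '__main__':
        size = MPI.COMM_WORLD.Get_size()
        rank = MPI.COMM_WORLD.Get_rank()
        pool = mp.MPIPoolExecutor(2)          # explicit executor instead of `with`
        for i in range(size):
            MPI.COMM_WORLD.Barrier()
            if rank == i:
                write('Parent %d\n' % rank)
                write(', '.join(map(str, pool.map(fun, range(2)))) + '\n')
        pool.shutdown(wait=True)              # reported to hang at comm.Disconnect()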

    opened by KyuzoR 14
  • 3.1.3: pytest is failing

    3.1.3: pytest is failing

    I'm trying to package your module as an RPM. I'm using the typical PEP 517-based build, install, and test cycle used when building packages from a non-root account.

    • python3 -sBm build -w --no-isolation
    • because I'm calling build with --no-isolation, only locally installed modules are used throughout the process
    • install the .whl file in </install/prefix>
    • run pytest with PYTHONPATH pointing to sitearch and sitelib inside </install/prefix>

    Here is pytest output:

    + PYTHONPATH=/home/tkloczko/rpmbuild/BUILDROOT/python-mpi4py-3.1.3-2.fc35.x86_64/usr/lib64/python3.8/site-packages:/home/tkloczko/rpmbuild/BUILDROOT/python-mpi4py-3.1.3-2.fc35.x86_64/usr/lib/python3.8/site-packages
    + /usr/bin/pytest -ra
    =========================================================================== test session starts ============================================================================
    platform linux -- Python 3.8.13, pytest-7.1.2, pluggy-1.0.0
    rootdir: /home/tkloczko/rpmbuild/BUILD/mpi4py-3.1.3, configfile: setup.cfg, testpaths: test
    collected 1402 items
    
    test/test_address.py .....                                                                                                                                           [  0%]
    test/test_attributes.py ........................................ssssssss                                                                                             [  3%]
    test/test_cco_buf.py ........................................................................                                                                        [  8%]
    test/test_cco_nb_buf.py ......................................................................                                                                       [ 13%]
    test/test_cco_nb_vec.py ..........................................................                                                                                   [ 18%]
    test/test_cco_ngh_buf.py ................                                                                                                                            [ 19%]
    test/test_cco_ngh_obj.py ........                                                                                                                                    [ 19%]
    test/test_cco_obj.py ........................................                                                                                                        [ 22%]
    test/test_cco_obj_inter.py ssssssssssssssssssssssss                                                                                                                  [ 24%]
    test/test_cco_vec.py ..............................................................                                                                                  [ 28%]
    test/test_cffi.py ss                                                                                                                                                 [ 28%]
    test/test_comm.py ...................................................................                                                                                [ 33%]
    test/test_comm_inter.py ssssssssssss                                                                                                                                 [ 34%]
    test/test_comm_topo.py ............................                                                                                                                  [ 36%]
    test/test_ctypes.py ..                                                                                                                                               [ 36%]
    test/test_datatype.py .........................                                                                                                                      [ 38%]
    test/test_dl.py ....                                                                                                                                                 [ 38%]
    test/test_doc.py .                                                                                                                                                   [ 38%]
    test/test_dynproc.py sss.                                                                                                                                            [ 39%]
    test/test_environ.py ..............                                                                                                                                  [ 40%]
    test/test_errhandler.py .....s                                                                                                                                       [ 40%]
    test/test_errorcode.py .....                                                                                                                                         [ 40%]
    test/test_exceptions.py ....................................sssss...                                                                                                 [ 44%]
    test/test_file.py ..............                                                                                                                                     [ 45%]
    test/test_fortran.py ...........                                                                                                                                     [ 45%]
    test/test_grequest.py ...                                                                                                                                            [ 46%]
    test/test_group.py ................................................                                                                                                  [ 49%]
    test/test_info.py ..........                                                                                                                                         [ 50%]
    test/test_io.py ............................                                                                                                                         [ 52%]
    test/test_memory.py .............                                                                                                                                    [ 53%]
    test/test_mpimem.py ..                                                                                                                                               [ 53%]
    test/test_msgspec.py ..s....ss............FF.........ssssss......ssssssssssssssssss......................................................ssssssssssss......s.s...... [ 63%]
    s..ss                                                                                                                                                                [ 63%]
    test/test_msgzero.py ...s...s                                                                                                                                        [ 64%]
    test/test_objmodel.py .........                                                                                                                                      [ 64%]
    test/test_op.py .........                                                                                                                                            [ 65%]
    test/test_p2p_buf.py ....................................                                                                                                            [ 68%]
    test/test_p2p_buf_matched.py ..........                                                                                                                              [ 68%]
    test/test_p2p_obj.py ................................................................................                                                                [ 74%]
    test/test_p2p_obj_matched.py ..........                                                                                                                              [ 75%]
    test/test_pack.py ......                                                                                                                                             [ 75%]
    test/test_pickle.py ..s...s                                                                                                                                          [ 76%]
    test/test_rc.py ...                                                                                                                                                  [ 76%]
    test/test_request.py .........                                                                                                                                       [ 77%]
    test/test_rma.py ssssssssssssssssssssssssssssssssssss                                                                                                                [ 79%]
    test/test_rma_nb.py ssssssssssssss                                                                                                                                   [ 80%]
    test/test_spawn.py ssssssssssssssssssssssssssssssssssssssss                                                                                                          [ 83%]
    test/test_status.py .........                                                                                                                                        [ 84%]
    test/test_subclass.py ..............................ss..                                                                                                             [ 86%]
    test/test_threads.py ...                                                                                                                                             [ 86%]
    test/test_util_dtlib.py ...................                                                                                                                          [ 88%]
    test/test_util_pkl5.py ..........................................................................................                                                    [ 94%]
    test/test_win.py ssssssssssssssssss......................................ssssssssssssssssssss                                                                        [100%]
    
    ================================================================================= FAILURES =================================================================================
    _________________________________________________________________ TestMessageSimpleNumPy.testNotContiguous _________________________________________________________________
    
    >   ???
    E   ValueError: ndarray is not contiguous
    
    mpi4py/MPI/asbuffer.pxi:140: ValueError
    
    During handling of the above exception, another exception occurred:
    
    self = <test_msgspec.TestMessageSimpleNumPy testMethod=testNotContiguous>
    
        def testNotContiguous(self):
            sbuf = numpy.ones([3,2])[:,0]
            rbuf = numpy.zeros([3])
            sbuf.flags.writeable = False
    >       self.assertRaises((BufferError, ValueError),
                              Sendrecv, sbuf, rbuf)
    
    test/test_msgspec.py:457:
    _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
    test/test_msgspec.py:158: in Sendrecv
        MPI.COMM_SELF.Sendrecv(sendbuf=smsg, dest=0,   sendtag=0,
    mpi4py/MPI/Comm.pyx:327: in mpi4py.MPI.Comm.Sendrecv
        ???
    mpi4py/MPI/msgbuffer.pxi:455: in mpi4py.MPI.message_p2p_send
        ???
    mpi4py/MPI/msgbuffer.pxi:438: in mpi4py.MPI._p_msg_p2p.for_send
        ???
    mpi4py/MPI/msgbuffer.pxi:203: in mpi4py.MPI.message_simple
        ???
    mpi4py/MPI/msgbuffer.pxi:138: in mpi4py.MPI.message_basic
        ???
    mpi4py/MPI/asbuffer.pxi:365: in mpi4py.MPI.getbuffer
        ???
    mpi4py/MPI/asbuffer.pxi:144: in mpi4py.MPI.PyMPI_GetBuffer
        ???
    mpi4py/MPI/commimpl.pxi:142: in mpi4py.MPI.PyMPI_Lock
        ???
    _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
    
    >   ???
    E   TypeError: NumPy currently only supports dlpack for writeable arrays
    
    mpi4py/MPI/asdlpack.pxi:193: TypeError
    _________________________________________________________________ TestMessageSimpleNumPy.testNotWriteable __________________________________________________________________
    
    >   ???
    E   ValueError: buffer source array is read-only
    
    mpi4py/MPI/asbuffer.pxi:140: ValueError
    
    During handling of the above exception, another exception occurred:
    
    self = <test_msgspec.TestMessageSimpleNumPy testMethod=testNotWriteable>
    
        def testNotWriteable(self):
            sbuf = numpy.ones([3])
            rbuf = numpy.zeros([3])
            rbuf.flags.writeable = False
    >       self.assertRaises((BufferError, ValueError),
                              Sendrecv, sbuf, rbuf)
    
    test/test_msgspec.py:450:
    _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
    test/test_msgspec.py:158: in Sendrecv
        MPI.COMM_SELF.Sendrecv(sendbuf=smsg, dest=0,   sendtag=0,
    mpi4py/MPI/Comm.pyx:328: in mpi4py.MPI.Comm.Sendrecv
        ???
    mpi4py/MPI/msgbuffer.pxi:460: in mpi4py.MPI.message_p2p_recv
        ???
    mpi4py/MPI/msgbuffer.pxi:446: in mpi4py.MPI._p_msg_p2p.for_recv
        ???
    mpi4py/MPI/msgbuffer.pxi:203: in mpi4py.MPI.message_simple
        ???
    mpi4py/MPI/msgbuffer.pxi:138: in mpi4py.MPI.message_basic
        ???
    mpi4py/MPI/asbuffer.pxi:365: in mpi4py.MPI.getbuffer
        ???
    mpi4py/MPI/asbuffer.pxi:144: in mpi4py.MPI.PyMPI_GetBuffer
        ???
    mpi4py/MPI/commimpl.pxi:142: in mpi4py.MPI.PyMPI_Lock
        ???
    _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
    
    >   ???
    E   TypeError: NumPy currently only supports dlpack for writeable arrays
    
    mpi4py/MPI/asdlpack.pxi:193: TypeError
    ========================================================================= short test summary info ==========================================================================
    SKIPPED [1] test/test_attributes.py:20: mpi-win-attr
    SKIPPED [1] test/test_attributes.py:190: mpi-win-attr
    SKIPPED [1] test/test_attributes.py:42: mpi-win-attr
    SKIPPED [1] test/test_attributes.py:45: mpi-win-attr
    SKIPPED [1] test/test_attributes.py:48: mpi-win-attr
    SKIPPED [1] test/test_attributes.py:71: mpi-win-attr
    SKIPPED [1] test/test_attributes.py:100: mpi-win-attr
    SKIPPED [1] test/test_attributes.py:96: mpi-win-attr
    SKIPPED [3] test/test_cco_obj_inter.py:115: mpi-world-size<2
    SKIPPED [3] test/test_cco_obj_inter.py:158: mpi-world-size<2
    SKIPPED [3] test/test_cco_obj_inter.py:124: mpi-world-size<2
    SKIPPED [3] test/test_cco_obj_inter.py:55: mpi-world-size<2
    SKIPPED [3] test/test_cco_obj_inter.py:59: mpi-world-size<2
    SKIPPED [3] test/test_cco_obj_inter.py:77: mpi-world-size<2
    SKIPPED [3] test/test_cco_obj_inter.py:132: mpi-world-size<2
    SKIPPED [3] test/test_cco_obj_inter.py:96: mpi-world-size<2
    SKIPPED [1] test/test_cffi.py:49: cffi
    SKIPPED [1] test/test_cffi.py:70: cffi
    SKIPPED [3] test/test_comm_inter.py:34: mpi-world-size<2
    SKIPPED [3] test/test_comm_inter.py:41: mpi-world-size<2
    SKIPPED [3] test/test_comm_inter.py:57: mpi-world-size<2
    SKIPPED [3] test/test_comm_inter.py:50: mpi-world-size<2
    SKIPPED [1] test/test_dynproc.py:63: mpi-world-size<2
    SKIPPED [1] test/test_dynproc.py:112: mpi-world-size<2
    SKIPPED [1] test/test_dynproc.py:161: mpi-world-size<2
    SKIPPED [1] test/test_errhandler.py:45: mpi-win
    SKIPPED [1] test/test_exceptions.py:314: mpi-win
    SKIPPED [1] test/test_exceptions.py:301: mpi-win
    SKIPPED [1] test/test_exceptions.py:305: mpi-win
    SKIPPED [1] test/test_exceptions.py:309: mpi-win
    SKIPPED [1] test/test_exceptions.py:331: mpi-win
    SKIPPED [1] test/test_msgspec.py:239: python3
    SKIPPED [1] test/test_msgspec.py:231: python3
    SKIPPED [1] test/test_msgspec.py:263: mpi-world-size<2
    SKIPPED [2] test/test_msgspec.py:393: cupy
    SKIPPED [2] test/test_msgspec.py:396: cupy
    SKIPPED [2] test/test_msgspec.py:399: cupy
    SKIPPED [2] test/test_msgspec.py:402: cupy
    SKIPPED [2] test/test_msgspec.py:405: cupy
    SKIPPED [2] test/test_msgspec.py:408: cupy
    SKIPPED [1] test/test_msgspec.py:506: cupy
    SKIPPED [1] test/test_msgspec.py:493: cupy
    SKIPPED [1] test/test_msgspec.py:499: cupy
    SKIPPED [1] test/test_msgspec.py:393: numba
    SKIPPED [1] test/test_msgspec.py:396: numba
    SKIPPED [1] test/test_msgspec.py:399: numba
    SKIPPED [1] test/test_msgspec.py:402: numba
    SKIPPED [1] test/test_msgspec.py:405: numba
    SKIPPED [1] test/test_msgspec.py:408: numba
    SKIPPED [1] test/test_msgspec.py:550: numba
    SKIPPED [1] test/test_msgspec.py:524: numba
    SKIPPED [1] test/test_msgspec.py:537: numba
    SKIPPED [1] test/test_msgspec.py:1040: cupy
    SKIPPED [1] test/test_msgspec.py:1043: cupy
    SKIPPED [1] test/test_msgspec.py:1046: cupy
    SKIPPED [1] test/test_msgspec.py:1049: cupy
    SKIPPED [1] test/test_msgspec.py:1052: cupy
    SKIPPED [1] test/test_msgspec.py:1055: cupy
    SKIPPED [1] test/test_msgspec.py:1040: numba
    SKIPPED [1] test/test_msgspec.py:1043: numba
    SKIPPED [1] test/test_msgspec.py:1046: numba
    SKIPPED [1] test/test_msgspec.py:1049: numba
    SKIPPED [1] test/test_msgspec.py:1052: numba
    SKIPPED [1] test/test_msgspec.py:1055: numba
    SKIPPED [1] test/test_msgspec.py:1199: cupy
    SKIPPED [1] test/test_msgspec.py:1208: numba
    SKIPPED [1] test/test_msgspec.py:1332: cupy
    SKIPPED [1] test/test_msgspec.py:1341: numba
    SKIPPED [1] test/test_msgspec.py:1300: python3
    SKIPPED [2] test/test_msgzero.py:33: openmpi
    SKIPPED [1] test/test_pickle.py:126: dill
    SKIPPED [1] test/test_pickle.py:168: yaml
    SKIPPED [2] test/test_rma.py:91: mpi-rma
    SKIPPED [2] test/test_rma.py:261: mpi-rma
    SKIPPED [2] test/test_rma.py:270: mpi-rma
    SKIPPED [2] test/test_rma.py:207: mpi-rma
    SKIPPED [2] test/test_rma.py:307: mpi-rma
    SKIPPED [2] test/test_rma.py:319: mpi-rma
    SKIPPED [2] test/test_rma.py:163: mpi-rma
    SKIPPED [2] test/test_rma.py:407: mpi-rma
    SKIPPED [2] test/test_rma.py:114: mpi-rma
    SKIPPED [2] test/test_rma.py:279: mpi-rma
    SKIPPED [2] test/test_rma.py:256: mpi-rma
    SKIPPED [2] test/test_rma.py:340: mpi-rma
    SKIPPED [2] test/test_rma.py:42: mpi-rma
    SKIPPED [2] test/test_rma.py:251: mpi-rma
    SKIPPED [2] test/test_rma.py:335: mpi-rma
    SKIPPED [2] test/test_rma.py:369: mpi-rma
    SKIPPED [2] test/test_rma.py:345: mpi-rma
    SKIPPED [2] test/test_rma.py:397: mpi-rma
    SKIPPED [2] test/test_rma_nb.py:67: mpi-rma-nb
    SKIPPED [2] test/test_rma_nb.py:151: mpi-rma-nb
    SKIPPED [2] test/test_rma_nb.py:161: mpi-rma-nb
    SKIPPED [2] test/test_rma_nb.py:100: mpi-rma-nb
    SKIPPED [2] test/test_rma_nb.py:144: mpi-rma-nb
    SKIPPED [2] test/test_rma_nb.py:44: mpi-rma-nb
    SKIPPED [2] test/test_rma_nb.py:137: mpi-rma-nb
    SKIPPED [4] test/test_spawn.py:120: using CUDA
    SKIPPED [4] test/test_spawn.py:219: using CUDA
    SKIPPED [4] test/test_spawn.py:94: using CUDA
    SKIPPED [4] test/test_spawn.py:151: using CUDA
    SKIPPED [4] test/test_spawn.py:169: using CUDA
    SKIPPED [4] test/test_spawn.py:183: using CUDA
    SKIPPED [4] test/test_spawn.py:106: using CUDA
    SKIPPED [4] test/test_spawn.py:197: using CUDA
    SKIPPED [4] test/test_spawn.py:134: using CUDA
    SKIPPED [4] test/test_spawn.py:239: using CUDA
    SKIPPED [1] test/test_subclass.py:234: mpi-win
    SKIPPED [1] test/test_subclass.py:229: mpi-win
    SKIPPED [2] test/test_win.py:47: mpi-win-create
    SKIPPED [2] test/test_win.py:95: mpi-win-create
    SKIPPED [2] test/test_win.py:30: mpi-win-create
    SKIPPED [2] test/test_win.py:53: mpi-win-create
    SKIPPED [2] test/test_win.py:71: mpi-win-create
    SKIPPED [2] test/test_win.py:61: mpi-win-create
    SKIPPED [2] test/test_win.py:85: mpi-win-create
    SKIPPED [2] test/test_win.py:38: mpi-win-create
    SKIPPED [2] test/test_win.py:106: mpi-win-create
    SKIPPED [2] test/test_win.py:194: mpi-win-dynamic
    SKIPPED [2] test/test_win.py:189: mpi-win-dynamic
    SKIPPED [2] test/test_win.py:95: mpi-win-dynamic
    SKIPPED [2] test/test_win.py:176: mpi-win-dynamic
    SKIPPED [2] test/test_win.py:53: mpi-win-dynamic
    SKIPPED [2] test/test_win.py:71: mpi-win-dynamic
    SKIPPED [2] test/test_win.py:61: mpi-win-dynamic
    SKIPPED [2] test/test_win.py:85: mpi-win-dynamic
    SKIPPED [2] test/test_win.py:182: mpi-win-dynamic
    SKIPPED [2] test/test_win.py:106: mpi-win-dynamic
    FAILED test/test_msgspec.py::TestMessageSimpleNumPy::testNotContiguous - TypeError: NumPy currently only supports dlpack for writeable arrays
    FAILED test/test_msgspec.py::TestMessageSimpleNumPy::testNotWriteable - TypeError: NumPy currently only supports dlpack for writeable arrays
    =============================================================== 2 failed, 1167 passed, 233 skipped in 24.75s ===============================================================
    

    In my build procedure I've temporarily added those failing units to the --deselect list.

    build 
    opened by kloczek 12
  • Problem with corrupted data using allgather

    Problem with corrupted data using allgather

    I'm having a problem with data corruption when using allgather, but only on one of the HPC systems we use. I think the problem is very likely somewhere in the InfiniBand stack, but I'm at a complete loss to track it down.

    We can trigger the error with the following simple test script:

    from mpi4py import MPI
    import numpy as np  # needed for np.random.randn below
    
    NCOORD = 3600
    
    
    class State:
        def __init__(self, coords):
            self.coords = coords
    
    
    def main():
        world = MPI.COMM_WORLD
        rank = world.Get_rank()
        N = world.Get_size()
    
        with open(f"output_{rank}.txt", "w") as outfile:
            if rank == 0:
                # generate N random states
                states = []
                for i in range(N):
                    coords = np.random.randn(NCOORD, 3)
                    state = State(coords)
                    states.append(state)
    
                # broadcast all states to each node
                world.bcast(states, root=0)
    
                # scatter a single state to each node
                state = world.scatter(states, root=0)
    
            else:
                states = world.bcast(None, root=0)
                state = world.scatter(None, root=0)
    
            while True:
                results = world.allgather(state)
                for i, (s1, s2) in enumerate(zip(states, results)):
                    if (s1.coords != s2.coords).any():
                        print(f"Coordinates do not match on rank {rank}", file=outfile)
                        print(f"position {i}", file=outfile)
                        print("expected:", file=outfile)
                        print(s1.coords, file=outfile)
                        print("got:", file=outfile)
                        print(s2.coords, file=outfile)
    
                        mask = (s1.coords != s2.coords)
                        x1_diff = s1.coords[mask]
                        x2_diff = s2.coords[mask]
                        print("diff expected", file=outfile)
                        print(x1_diff, file=outfile)
                        print("diff got", file=outfile)
                        print(x2_diff, file=outfile)
                        raise ValueError()
                print("success")
    
    
    if __name__ == "__main__":
        main()
    

    This runs successfully when ranks are confined to a single node, but fails whenever ranks span multiple nodes. The errors are either the ValueError triggered when the data does not match what is expected, an "unknown pickle protocol" error, or a "pickle data truncated" error. In each case, the data is apparently being corrupted.

    Typically, one of the ranks on a node will complete successfully, whereas the remaining ranks will receive garbage data. This makes me think this is some kind of data race.

    The same script works fine on the other HPC system that we use, which makes me think it is some kind of problem with the network stack rather than with mpi4py. However, I have been getting pushback from our system administrators because the following equivalent C program runs without issue:

    #include <stdio.h>
    #include <stdlib.h>
    #include <mpi.h>
    #include <sys/time.h>
    
    double **alloc_darray2d(int, int);
    void free_darray2d(double **, int, int);
    double ***alloc_darray3d(int, int, int);
    void free_darray3d(double ***, int, int, int);
    double ***alloc_parray2d(int, int);
    void free_parray2d(double ***, int, int);
    void gauss_rng(double *, int, double);
    
    int main(int argc, char *argv[]) {
    int i, j, p,  myid, numprocs, n, ntot, iter, found;
    double *sbuf, *rbuf;
    int ncoord = 3600;
    int seed = -1, maxiter = 1000000;
    unsigned int lseed;
    double ***states, **state, ***results;
    double sigma = 1.0;
    const double randmax = (double)RAND_MAX + 1.0;
    struct timeval curtime;
    char fname[25];
    FILE *fd;
    
       MPI_Init(NULL, NULL);
       MPI_Comm_rank(MPI_COMM_WORLD, &myid);
       MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
    
       /* allocate 2d array of doubles state[ncoord][3] contiguously in memory */
       state = alloc_darray2d(ncoord, 3);
       /* allocate 3d array of doubles states[numprocs][ncoord][3] contiguously in memory */
       states = alloc_darray3d(numprocs, ncoord, 3);
       n = ncoord*3;
       ntot = n*numprocs;
       if (states == NULL) { fprintf(stderr, "error: failed to allocate states array\n"); }
    
       if (myid == 0) {
          /* configure the random number generator */
          if (argc >= 2) {
             seed = atoi(argv[1]);
             if (argc == 3) {
                maxiter = atoi(argv[2]);
             }
          }
          if (seed >= 0) {
             lseed = seed;
          } else {
             /* randomly seed rng using the current microseconds within the current second */
             gettimeofday(&curtime, NULL);
             lseed = curtime.tv_usec;
          }
          srandom(lseed);
    
          /* fill the states array with gaussian distributed random numbers */
          for (p = 0; p < numprocs; p++) {
             for (i = 0; i < ncoord; i++) {
                for (j = 0; j < 3; j++) {
                   /* random returns integer between 0 and RAND_MAX, divide
                      by randmax gives number within [0,1) */
                   states[p][i][j] = random()/randmax;
                }
             }
          }
          gauss_rng(&states[0][0][0], ntot, sigma);
       }
       /* scatter states array into state array at each process */
       MPI_Bcast(&states[0][0][0], ntot, MPI_DOUBLE, 0, MPI_COMM_WORLD);
       MPI_Scatter(&states[0][0][0], n, MPI_DOUBLE, &state[0][0], n, MPI_DOUBLE, 0, MPI_COMM_WORLD);
       /* gather state array from each process into results array */
       results = alloc_darray3d(numprocs, ncoord, 3);
       found = 0;
       snprintf(fname, 24, "scattergather.%d", myid);
       fd = fopen(fname, "w");   /* open the output file once; reopening inside the loop would leak descriptors */
       for (iter = 1; iter <= maxiter; iter++) {
          MPI_Allgather(&state[0][0], n, MPI_DOUBLE, &results[0][0][0], n, MPI_DOUBLE, MPI_COMM_WORLD);
          /* compare states and results arrays */
          for (p = 0; p < numprocs; p++) {
             for (i = 0; i < ncoord; i++) {
                if (states[p][i][0] != results[p][i][0]
                    || states[p][i][1] != results[p][i][1]
                    || states[p][i][2] != results[p][i][2]) {
                   fprintf(fd, "states and results differ on process %d at position %d:\n", myid, p);
                   fprintf(fd, "states[%d][%d][0] = %g\nresults[%d][%d][0] = %g\n", p, i, p, i, states[p][i][0], results[p][i][0]);
                   fprintf(fd, "states[%d][%d][1] = %g\nresults[%d][%d][1] = %g\n", p, i, p, i, states[p][i][1], results[p][i][1]);
                   fprintf(fd, "states[%d][%d][2] = %g\nresults[%d][%d][2] = %g\n", p, i, p, i, states[p][i][2], results[p][i][2]);
                   found = 1;
                }
             }
          }
          if (found == 1) { break; }
       }
       fclose(fd);
       MPI_Finalize();
    }
    
    #include <math.h>
    void gauss_rng(double *z, int n, double sigma) {
    /*
       gauss_rng converts an array z of length n filled with uniformly
       distributed random numbers in [0,1) into gaussian distributed
       random numbers with width sigma.
    */
    double sq;
    const double pi2 = 8.0*atan(1.0);
    int i;
    
       if ((n/2)*2 != n) {
          fprintf(stderr, "error: n in gauss_rng must be even\n");
          return;
       }
       for (i = 0; i < n; i += 2) {
          sq = sigma*sqrt(-2.0*log(1.0 - z[i]));
          z[i] = sq*sin(pi2*z[i + 1]);
          z[i + 1] = sq*cos(pi2*z[i + 1]);
       }
    }
    
    /* allocate a double 2d array with subscript range
       dm[0,...,nx-1][0,...,ny-1]                      */
    double **alloc_darray2d(int nx, int ny) {
      int i;
      double **dm;
    
      dm = (double **)malloc((size_t) nx*sizeof(double*));
      if (dm == NULL) { return NULL; }
      dm[0] = (double *)malloc((size_t) nx*ny*sizeof(double));
      if (dm[0] == NULL) {
         free((void *) dm);
         return NULL;
      }
      for (i = 1; i < nx; i++) dm[i] = dm[i - 1] + ny;
    /* return pointer to array of pointers to rows */
      return dm;
    }
    
    void free_darray2d(double **dm, int nx, int ny) {
      free((void *) dm[0]);
      free((void *) dm);
    }
    
    /* allocate a 2d array of pointers to doubles with subscript range
       pm[0,...,nx-1][0,...,ny-1]           */
    double ***alloc_parray2d(int nx, int ny) {
      int i;
      double ***pm;
    
      pm = (double ***)malloc((size_t) nx*sizeof(double**));
      if (pm == NULL) { return NULL; }
      pm[0]=(double **)malloc((size_t) nx*ny*sizeof(double*));
      if (pm[0] == NULL) {
         free((void *) pm);
         return NULL;
      }
      for (i = 1; i < nx; i++) pm[i] = pm[i - 1] + ny;
    /* return pointer to array of pointers to rows */
      return pm;
    }
    
    void free_parray2d(double ***pm,int nx, int ny) {
      free((void *) pm[0]);
      free((void *) pm);
    }
    
    /* allocate 3d array of doubles with subscript range
       dm[0,...,nx-1][0,...,ny-1][0,...,nz-1]           */
    double ***alloc_darray3d(int nx, int ny, int nz) {
      int i, j;
      double ***dm;
    
    /* first, allocate 2d array of pointers to doubles */
      dm = alloc_parray2d(nx, ny);
      if (dm == NULL) return NULL;
    
    /* allocate memory for the whole thing */
      dm[0][0] = (double *)malloc((size_t) nx*ny*nz*sizeof(double));
      if (dm[0][0] == NULL) {
        free_parray2d(dm, nx, ny);
        return NULL;
      }
    
    /* set the pointers inside the matrix */
      for (i = 0; i < nx; i++) {
        for (j = 1; j < ny; j++) {
          dm[i][j] = dm[i][j - 1] + nz;
        }
        if (i < nx - 1) dm[i + 1][0] = dm[i][ny - 1] + nz;
      }
    
    /* return pointer to array of matrix of pointers */
      return dm;
    }
    
    void free_darray3d(double ***dm, int nx, int ny, int nz) {
      free((void *) dm[0][0]);
      free_parray2d(dm, nx, ny);
    }
    
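    A closer Python analogue of the C reproducer, using the buffer-based uppercase API so that no pickling is involved, can help separate a network-stack fault from a pickle-layer fault. This is an editor's sketch, not part of the original report:

    import numpy as np
    from mpi4py import MPI

    NCOORD = 3600
    MAXITER = 1000000


    def main():
        comm = MPI.COMM_WORLD
        rank, size = comm.Get_rank(), comm.Get_size()

        # root fills one (NCOORD, 3) block per rank, then broadcasts the full array
        states = np.empty((size, NCOORD, 3), dtype='d')
        if rank == 0:
            states[:] = np.random.randn(size, NCOORD, 3)
        comm.Bcast(states, root=0)

        # each rank contributes its own block and gathers everyone's blocks back
        state = np.ascontiguousarray(states[rank])
        results = np.empty_like(states)
        for _ in range(MAXITER):
            comm.Allgather(state, results)
            if not np.array_equal(states, results):
                raise ValueError(f"corruption detected on rank {rank}")
            print("success")


    if __name__ == "__main__":
        main()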
    opened by jlmaccal 12
  • CUDA-aware Ireduce and Iallreduce operations for PyTorch GPU tensors segfault

    CUDA-aware Ireduce and Iallreduce operations for PyTorch GPU tensors segfault

    When calling either Ireduce or Iallreduce on PyTorch GPU tensors, a segfault occurs. I haven't exhaustively tested all of the ops, but I don't have problems with Reduce, Allreduce, Isend / Irecv, and Ibcast when tested the same way. I haven't tested CuPy tensors, but it might be worthwhile.

    It might just be something I'm doing wrong when using these functions, so here is a minimal script that can be used to demonstrate this behavior. The errors are only present when running on GPU:

    # mpirun -np 2 python repro.py gpu Ireduce
    from mpi4py import MPI
    import torch
    import sys
    
    if len(sys.argv) < 3:
        print('Usage: python repro.py [cpu|gpu] [MPI function to test]')
        sys.exit(1)
    
    use_gpu = sys.argv[1] == 'gpu'
    func_name = sys.argv[2]
    
    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()
    size = comm.Get_size()
    if use_gpu:
        device = torch.device('cuda:' + str(rank % torch.cuda.device_count()))
    else:
        device = torch.device('cpu')
    
    def test_Iallreduce():
        sendbuf = torch.ones(1, device=device)
        recvbuf = torch.empty_like(sendbuf)
        torch.cuda.synchronize()
        req = comm.Iallreduce(sendbuf, recvbuf, op=MPI.SUM)  # also fails with MPI.MAX
        req.wait()
        assert recvbuf[0] == size
    
    def test_Ireduce():
        buf = torch.ones(1, device=device)
        if rank == 0:
            sendbuf = MPI.IN_PLACE
            recvbuf = buf
        else:
            sendbuf = buf
            recvbuf = None
        torch.cuda.synchronize()
        req = comm.Ireduce(sendbuf, recvbuf, root=0, op=MPI.SUM)  # also fails with MPI.MAX
        req.wait()
        if rank == 0:
            assert buf[0] == size
    
    eval('test_' + func_name + '()')
    

    Software/Hardware Versions:

    • OpenMPI 4.1.2, 4.1.1, 4.1.0, and 4.0.7 (built w/ --with-cuda flag)
    • mpi4py 3.1.3 (built against above MPI version)
    • CUDA 11.0
    • Python 3.6 (also tested under 3.8)
    • Nvidia K80 GPU (also tested with V100)
    • OS Ubuntu 18.04 (also tested in containerized environment)
    • torch 1.10.1 (w/ GPU support)

    You can reproduce my environment setup with the following commands:

    wget https://www.open-mpi.org/software/ompi/v4.1/downloads/openmpi-4.1.2.tar.gz
    tar xvf openmpi-4.1.2.tar.gz
    cd openmpi-4.1.2
    ./configure --with-cuda --prefix=/opt/openmpi-4.1.2
    sudo make -j4 all install
    export PATH=/opt/openmpi-4.1.2/bin:$PATH
    export LD_LIBRARY_PATH=/opt/openmpi-4.1.2/lib:$LD_LIBRARY_PATH
    env MPICC=/opt/openmpi-4.1.2/bin/mpicc pip install mpi4py
    pip install torch numpy
    

    The error message for Ireduce is the following:

    [<host>:25864] *** Process received signal ***
    [<host>:25864] Signal: Segmentation fault (11)
    [<host>:25864] Signal code: Invalid permissions (2)
    [<host>:25864] Failing at address: 0x1201220000
    [<host>:25864] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x3f040)[0x7f00efcf3040]
    [<host>:25864] [ 1] /opt/openmpi-4.1.2/lib/openmpi/mca_op_avx.so(+0xc079)[0x7f00e41c0079]
    [<host>:25864] [ 2] /opt/openmpi-4.1.2/lib/openmpi/mca_coll_libnbc.so(+0x7385)[0x7f00d3330385]
    [<host>:25864] [ 3] /opt/openmpi-4.1.2/lib/openmpi/mca_coll_libnbc.so(NBC_Progress+0x1f3)[0x7f00d3330033]
    [<host>:25864] [ 4] /opt/openmpi-4.1.2/lib/openmpi/mca_coll_libnbc.so(ompi_coll_libnbc_progress+0x8e)[0x7f00d332e84e]
    [<host>:25864] [ 5] /opt/openmpi-4.1.2/lib/libopen-pal.so.40(opal_progress+0x2c)[0x7f00edefba3c]
    [<host>:25864] [ 6] /opt/openmpi-4.1.2/lib/libopen-pal.so.40(ompi_sync_wait_mt+0xc5)[0x7f00edf025a5]
    [<host>:25864] [ 7] /opt/openmpi-4.1.2/lib/libmpi.so.40(ompi_request_default_wait+0x1f9)[0x7f00ee4eafa9]
    [<host>:25864] [ 8] /opt/openmpi-4.1.2/lib/libmpi.so.40(PMPI_Wait+0x52)[0x7f00ee532e02]
    [<host>:25864] [ 9] /home/ubuntu/venv/lib/python3.6/site-packages/mpi4py/MPI.cpython-36m-x86_64-linux-gnu.so(+0xa81e2)[0x7f00ee8911e2]
    [<host>:25864] [10] python[0x50a865]
    [<host>:25864] [11] python(_PyEval_EvalFrameDefault+0x444)[0x50c274]
    [<host>:25864] [12] python[0x509989]
    [<host>:25864] [13] python[0x50a6bd]
    [<host>:25864] [14] python(_PyEval_EvalFrameDefault+0x444)[0x50c274]
    [<host>:25864] [15] python[0x507f94]
    [<host>:25864] [16] python(PyRun_StringFlags+0xaf)[0x63500f]
    [<host>:25864] [17] python[0x600911]
    [<host>:25864] [18] python[0x50a4ef]
    [<host>:25864] [19] python(_PyEval_EvalFrameDefault+0x444)[0x50c274]
    [<host>:25864] [20] python[0x507f94]
    [<host>:25864] [21] python(PyEval_EvalCode+0x23)[0x50b0d3]
    [<host>:25864] [22] python[0x634dc2]
    [<host>:25864] [23] python(PyRun_FileExFlags+0x97)[0x634e77]
    [<host>:25864] [24] python(PyRun_SimpleFileExFlags+0x17f)[0x63862f]
    [<host>:25864] [25] python(Py_Main+0x591)[0x6391d1]
    [<host>:25864] [26] python(main+0xe0)[0x4b0d30]
    [<host>:25864] [27] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xe7)[0x7f00efcd5bf7]
    [<host>:25864] [28] python(_start+0x2a)[0x5b2a5a]
    [<host>:25864] *** End of error message ***
    --------------------------------------------------------------------------
    Primary job  terminated normally, but 1 process returned
    a non-zero exit code. Per user-direction, the job has been aborted.
    --------------------------------------------------------------------------
    --------------------------------------------------------------------------
    mpirun noticed that process rank 0 with PID 0 on node <host> exited on signal 11 (Segmentation fault).
    --------------------------------------------------------------------------
    

    I appreciate any guidance!
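
    A possible workaround, offered as an editor's sketch rather than part of the original report: the backtrace shows the reduction being applied by mca_coll_libnbc/mca_op_avx on the host, which suggests the nonblocking-collective path is not treating the buffers as device memory. Staging through CPU tensors (reusing the script's comm, device, size, torch, and MPI globals) may avoid the crash, at the cost of extra copies:

    def test_Iallreduce_host_staged():
        # editor's assumption: host buffers sidestep the nonblocking reduction
        # path that faults on device pointers; copy the result back afterwards
        sendbuf = torch.ones(1, device=device)
        send_host = sendbuf.cpu()
        recv_host = torch.empty_like(send_host)
        req = comm.Iallreduce(send_host, recv_host, op=MPI.SUM)
        req.wait()
        recvbuf = recv_host.to(device)
        assert recvbuf[0] == size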

    opened by jmerizia 12
  • The kernel appears to have died. It will restart automatically.

    The kernel appears to have died. It will restart automatically.

    Hi there,

    I would really like to know how to fix or work around the problem I encountered when running a simple code snippet from the docs in a Jupyter notebook.

    from mpi4py import MPI
    from mpi4py.futures import MPICommExecutor
    
    with MPICommExecutor(MPI.COMM_WORLD, root=0) as executor:
        if executor is not None:
           future = executor.submit(abs, -42)
           assert future.result() == 42
           answer = set(executor.map(abs, [-42, 42]))
           assert answer == {42}
    

    When I run this code with

    MPI4PY_FUTURES_MAX_WORKERS=8 mpiexec -n 1 jupyter notebook
    

    in a Jupyter notebook, I get the error "The kernel appears to have died. It will restart automatically." The problem happens on Linux, but on Windows it works fine.

    Thanks in advance.
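
    One alternative worth trying, offered as an editor's sketch rather than a confirmed fix: MPIPoolExecutor spawns its workers through MPI dynamic process management, so the notebook kernel itself stays a single-rank process. Whether spawning works depends on the MPI build:

    from mpi4py.futures import MPIPoolExecutor

    # max_workers (or the MPI4PY_FUTURES_MAX_WORKERS environment variable)
    # controls the pool size
    with MPIPoolExecutor(max_workers=8) as executor:
        future = executor.submit(abs, -42)
        assert future.result() == 42
        assert set(executor.map(abs, [-42, 42])) == {42}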

    opened by YarShev 6
  • 3.1.3: sphinx fails

    3.1.3: sphinx fails

    I have problems generating the documentation. Any hint as to what the cause could be?

    + /usr/bin/sphinx-build -j48 -n -T -b man docs/source/usrman build/sphinx/man
    Running Sphinx v5.3.0
    /usr/lib/python3.8/site-packages/requests/__init__.py:109: RequestsDependencyWarning: urllib3 (1.26.12) or chardet (None)/charset_normalizer (3.0.0) doesn't match a supported version!
      warnings.warn(
    making output directory... done
    [pers-jacek:3312327] mca_base_component_repository_open: unable to open mca_pmix_ext3x: /usr/lib64/openmpi/mca_pmix_ext3x.so: undefined symbol: pmix_value_load (ignored)
    [pers-jacek:3312327] [[17983,0],0] ORTE_ERROR_LOG: Not found in file ess_hnp_module.c at line 320
    --------------------------------------------------------------------------
    It looks like orte_init failed for some reason; your parallel process is
    likely to abort.  There are many reasons that a parallel process can
    fail during orte_init; some of which are due to configuration or
    environment problems.  This failure appears to be an internal failure;
    here's some additional information (which may only be relevant to an
    Open MPI developer):
    
      opal_pmix_base_select failed
      --> Returned value Not found (-13) instead of ORTE_SUCCESS
    --------------------------------------------------------------------------
    [pers-jacek:3312325] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to start a daemon on the local node in file ess_singleton_module.c at line 716
    [pers-jacek:3312325] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to start a daemon on the local node in file ess_singleton_module.c at line 172
    --------------------------------------------------------------------------
    It looks like orte_init failed for some reason; your parallel process is
    likely to abort.  There are many reasons that a parallel process can
    fail during orte_init; some of which are due to configuration or
    environment problems.  This failure appears to be an internal failure;
    here's some additional information (which may only be relevant to an
    Open MPI developer):
    
      orte_ess_init failed
      --> Returned value Unable to start a daemon on the local node (-127) instead of ORTE_SUCCESS
    --------------------------------------------------------------------------
    --------------------------------------------------------------------------
    It looks like MPI_INIT failed for some reason; your parallel process is
    likely to abort.  There are many reasons that a parallel process can
    fail during MPI_INIT; some of which are due to configuration or environment
    problems.  This failure appears to be an internal failure; here's some
    additional information (which may only be relevant to an Open MPI
    developer):
    
      ompi_mpi_init: ompi_rte_init failed
      --> Returned "Unable to start a daemon on the local node" (-127) instead of "Success" (0)
    --------------------------------------------------------------------------
    *** An error occurred in MPI_Init_thread
    *** on a NULL communicator
    *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
    ***    and potentially your MPI job)
    [pers-jacek:3312325] Local abort before MPI_INIT completed completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!
    
    opened by kloczek 12
  • MPICommExecutor & MPIPoolExecutor Freeze Indefinitely

    MPICommExecutor & MPIPoolExecutor Freeze Indefinitely

    Architecture: Power9 (Summit Super Computer)

    MPI Version: Package: IBM Spectrum MPI Spectrum MPI: 10.4.0.03rtm4 Spectrum MPI repo revision: IBM_SPECTRUM_MPI_10.04.00.03_2021.01.12_RTM4 Spectrum MPI release date: Unreleased developer copy

    MPI4py Version: 3.1.1

    Reproduce Script:

    from mpi4py.futures import MPICommExecutor
    from mpi4py import MPI
    import time
    import os
    
    comm = MPI.COMM_WORLD
    size = comm.Get_size()
    rank = comm.Get_rank()
    
    # square the numbers
    def apply_fun(i):
        print('running apply!', flush=True)
        return i**2*rank
    
    print('pid:',os.getpid(), flush=True)
    print('rank:', rank, flush=True)
    
    # this *does* implement map-reduce and supposedly works on legacy systems without dynamic process management
    # (I've gotten it working with `jsrun -n 1` but so far no luck with multiple processes)
    # see the docs: https://mpi4py.readthedocs.io/en/stable/mpi4py.futures.html?highlight=MPICommExecutor#mpicommexecutor
    with MPICommExecutor(MPI.COMM_WORLD, root=0) as executor:
        if executor is not None:
            print('Executor started from root!', flush=True)
            answer = list(executor.map(apply_fun, range(41)))
            print('pid: ',os.getpid(),'rank:',rank, answer, flush=True)
    

    jsrun python mpi_test.py output:

    Warning: OMP_NUM_THREADS=16 is greater than available PU's
    Warning: OMP_NUM_THREADS=16 is greater than available PU's
    pid: 1448946
    rank: 1
    pid: 1448945
    rank: 0
    Executor started from root!
    running apply!
    

    Then it freezes indefinitely. By the way, jsrun is Summit's custom version of mpirun/mpiexec, and it works really well in general (in contrast to mpirun and mpiexec). Also, with this exact same setup I had no problem using MPI.gather() & MPI.scatter(); it is just the executors that don't work, which is troublesome because I really like the map-based API.
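
    For systems without dynamic process management, the documented alternative is to let the mpi4py.futures runtime own all the ranks by launching the script through python -m mpi4py.futures and using MPIPoolExecutor. An editor's sketch follows; whether jsrun interacts cleanly with this launcher is an assumption to verify:

    # assumed launch command: jsrun -n 42 python -m mpi4py.futures mpi_test.py
    from mpi4py.futures import MPIPoolExecutor


    def apply_fun(i):
        return i ** 2


    if __name__ == '__main__':
        with MPIPoolExecutor() as executor:
            print(list(executor.map(apply_fun, range(41))))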

    opened by profPlum 16
  • Wrap MPIX_Query_cuda_support

    Wrap MPIX_Query_cuda_support

    Our application, PyFR, makes very successful use of mpi4py and has support for CUDA-aware MPI implementations. However, our biggest issue is knowing whether the MPI distribution we are running under is CUDA-aware or not. Although there does not appear to be a perfect solution, for Open MPI derivatives the following is reasonable:

    https://www.open-mpi.org/faq/?category=runcuda#mpi-cuda-aware-support

    with the API of interest being MPIX_Query_cuda_support. It would therefore be nice if mpi4py could expose this attribute (None/False/True).

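    Until such an attribute exists, a ctypes-based sketch along the following lines may work for Open MPI builds. Editor's assumptions: the MPIX_Query_cuda_support symbol is exported by the MPI shared library, takes no arguments, and returns an int; the symbol is absent in other implementations, and the library lookup below may need adjusting for a given installation:

    import ctypes
    import ctypes.util

    from mpi4py import MPI  # load and initialize the MPI library first


    def query_cuda_support():
        # assumption: the Open MPI shared library is discoverable by name
        name = ctypes.util.find_library("mpi") or "libmpi.so"
        libmpi = ctypes.CDLL(name, mode=ctypes.RTLD_GLOBAL)
        try:
            fn = libmpi.MPIX_Query_cuda_support  # Open MPI extension (mpi-ext.h)
        except AttributeError:
            return None  # unknown: symbol not provided by this MPI
        fn.restype = ctypes.c_int
        fn.argtypes = []
        return bool(fn())


    print(query_cuda_support())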
    opened by FreddieWitherden 6
  • testPackUnpackExternal alignment error on sparc64

    testPackUnpackExternal alignment error on sparc64

    sparc64 is not the most common architecture around, but for what it's worth, 3.1.2 has started giving a Bus Error (Invalid address alignment) in testPackUnpackExternal (test_pack.TestPackExternal):

    testProbeRecv (test_p2p_obj_matched.TestP2PMatchedWorldDup) ... ok
    testPackSize (test_pack.TestPackExternal) ... ok
    testPackUnpackExternal (test_pack.TestPackExternal) ... [sompek:142729] *** Process received signal ***
    [sompek:142729] Signal: Bus error (10)
    [sompek:142729] Signal code: Invalid address alignment (1)
    [sompek:142729] Failing at address: 0xffff800100ea2821
    [sompek:142729] *** End of error message ***
    Bus error
    make[1]: *** [debian/rules:91: override_dh_auto_test] Error 1
    

    Full log at https://buildd.debian.org/status/fetch.php?pkg=mpi4py&arch=sparc64&ver=3.1.2-1&stamp=1636215944&raw=0

    It previously passed with 3.1.1.

    Ongoing sparc64 build logs at https://buildd.debian.org/status/logs.php?pkg=mpi4py&arch=sparc64

    opened by drew-parsons 11
Releases(3.1.4)
  • 3.1.4(Nov 2, 2022)

  • 3.1.3(Nov 25, 2021)

  • 3.1.2(Nov 4, 2021)

    WARNING: This is the last release supporting Python 2.

    • mpi4py.futures: Add _max_workers property to MPIPoolExecutor.

    • mpi4py.util.dtlib: Fix computation of alignment for predefined datatypes.

    • mpi4py.util.pkl5: Fix deadlock when using ssend() + mprobe().

    • mpi4py.util.pkl5: Add environment variable MPI4PY_PICKLE_THRESHOLD.

    • mpi4py.rc: Interpret "y" and "n" strings as boolean values.

    • Fix/add typemap/typestr for MPI.WCHAR/MPI.COUNT datatypes.

    • Minor fixes and additions to documentation.

    • Minor fixes to typing support.

    • Support for local version identifier (PEP-440).

    Source code(tar.gz)
    Source code(zip)
    mpi4py-3.1.2.tar.gz(2.34 MB)
  • 3.1.1(Aug 14, 2021)

  • 3.1.0(Aug 12, 2021)

    WARNING: This is the last release supporting Python 2.

    • New features:

      • mpi4py.util: New package collecting miscellaneous utilities.
    • Enhancements:

      • Add pickle-based Request.waitsome() and Request.testsome().

      • Add lowercase methods Request.get_status() and Request.cancel().

      • Support for passing Python GPU arrays compliant with the DLPack data interchange mechanism (link) and the __cuda_array_interface__ (CAI) standard (link) to uppercase methods; a hedged usage sketch follows this release entry. This support requires that mpi4py is built against CUDA-aware MPI implementations. This feature is currently experimental and subject to future changes.

      • mpi4py.futures: Add support for initializers and canceling futures at shutdown. Environment variable names now follow the pattern MPI4PY_FUTURES_*; the previous MPI4PY_* names are deprecated.

      • Add type annotations to Cython code. The first line of the docstring of functions and methods displays a signature including type annotations.

      • Add companion stub files to support type checkers.

      • Support for weak references.

    • Miscellaneous:

      • Add a new mpi4py publication (link) to the citation listing.
    Source code(tar.gz)
    Source code(zip)
    mpi4py-3.1.0.tar.gz(2.33 MB)
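    As referenced in the GPU-array item above, a minimal usage sketch (an editor's addition, assuming a CUDA-aware MPI build and CuPy installed; not part of the release notes):

    from mpi4py import MPI
    import cupy as cp

    comm = MPI.COMM_WORLD
    sendbuf = cp.arange(10, dtype='f8')
    recvbuf = cp.empty_like(sendbuf)
    cp.cuda.get_current_stream().synchronize()  # data must be ready before MPI reads it
    comm.Allreduce(sendbuf, recvbuf, op=MPI.SUM)
    assert cp.allclose(recvbuf, sendbuf * comm.Get_size())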
  • 3.0.3(Jul 27, 2020)

  • 3.0.2(Jul 27, 2020)

    • Bug fixes:

      • Fix handling of readonly buffers in support for Python 2 legacy buffer interface. The issue triggers only when using a buffer-like object that is readonly and does not export the new Python 3 buffer interface.
      • Fix build issues with Open MPI 4.0.x series related to removal of many MPI-1 symbols deprecated in MPI-2 and removed in MPI-3.
      • Minor documentation fixes.
    Source code(tar.gz)
    Source code(zip)
    mpi4py-3.0.2.tar.gz(1.36 MB)
  • 3.0.1(Jul 27, 2020)

    • Bug fixes:

      • Fix Comm.scatter() and other collectives corrupting input send list. Add safety measures to prevent related issues in global reduction operations.
      • Fix error-checking code for counts in Op.Reduce_local().
    • Enhancements:

      • Map size-specific Python/NumPy typecodes to MPI datatypes.
      • Allow partial specification of target list/tuple arguments in the various Win RMA methods.
      • Workaround for removal of MPI_{LB|UB} in Open MPI 4.0.
      • Support for Microsoft MPI v10.0.
    Source code(tar.gz)
    Source code(zip)
    mpi4py-3.0.1.tar.gz(1.36 MB)
  • 3.0.0(Jul 27, 2020)

    • New features:

      • mpi4py.futures: Execute computations asynchronously using a pool of MPI processes. This package is based on concurrent.futures from the Python standard library.
      • mpi4py.run: Run Python code and abort execution in case of unhandled exceptions to prevent deadlocks.
      • mpi4py.bench: Run basic MPI benchmarks and tests.
    • Enhancements:

      • Lowercase, pickle-based collective communication calls are now thread-safe through the use of fine-grained locking.
      • The MPI module now exposes a memory type which is a lightweight variant of the builtin memoryview type, but exposes both the legacy Python 2 and the modern Python 3 buffer interface under a Python 2 runtime.
      • The MPI.Comm.Alltoallw() method now uses count=1 and displ=0 as defaults, assuming that messages are specified through user-defined datatypes.
      • The Request.Wait[all]() methods now return True to match the interface of Request.Test[all]().
      • The Win class now implements the Python buffer interface.
    • Backward-incompatible changes:

      • The buf argument of the MPI.Comm.recv() method is deprecated, passing anything but None emits a warning.
      • The MPI.Win.memory property was removed, use the MPI.Win.tomemory() method instead.
      • Executing python -m mpi4py in the command line is now equivalent to python -m mpi4py.run. For the former behavior, use python -m mpi4py.bench.
      • Python 2.6 and 3.2 are no longer supported. The mpi4py.MPI module may still build and partially work, but other pure-Python modules under the mpi4py namespace will not.
      • Windows: Remove support for legacy MPICH2, Open MPI, and DeinoMPI.
    Source code(tar.gz)
    Source code(zip)
    mpi4py-3.0.0.tar.gz(1.36 MB)
  • 2.0.0(Jul 27, 2020)

    • Support for MPI-3 features.

      • Matched probes and receives.
      • Nonblocking collectives.
      • Neighborhood collectives.
      • New communicator constructors.
      • Request-based RMA operations.
      • New RMA communication and synchronisation calls.
      • New window constructors.
      • New datatype constructor.
      • New C++ boolean and floating complex datatypes.
    • Support for MPI-2 features not included in previous releases.

      • Generalized All-to-All collective (Comm.Alltoallw())
      • User-defined data representations (Register_datarep())
    • New scalable implementation of reduction operations for Python objects. This code is based on binomial tree algorithms using point-to-point communication and duplicated communicator contexts. To disable this feature, use mpi4py.rc.fast_reduce = False.

    • Backward-incompatible changes:

      • Python 2.4, 2.5, 3.0 and 3.1 are no longer supported.
      • Default MPI error handling policies are overridden. After import, mpi4py sets the ERRORS_RETURN error handler in COMM_SELF and COMM_WORLD, as well as any new Comm, Win, or File instance created through mpi4py, thus effectively ignoring the MPI rules about error handler inheritance. This way, MPI errors translate to Python exceptions. To disable this behavior and use the standard MPI error handling rules, use mpi4py.rc.errors = 'default' (a brief sketch of the mpi4py.rc options follows this release entry).
      • Change signature of all send methods, dest is a required argument.
      • Change signature of all receive and probe methods, source defaults to ANY_SOURCE, tag defaults to ANY_TAG.
      • Change signature of send lowercase-spelling methods, obj arguments are not mandatory.
      • Change signature of recv lowercase-spelling methods, renamed 'obj' arguments to 'buf'.
      • Change Request.Waitsome() and Request.Testsome() to return None or list.
      • Change signature of all lowercase-spelling collectives, sendobj arguments are now mandatory, recvobj arguments were removed.
      • Reduction operations MAXLOC and MINLOC are no longer special-cased in lowercase-spelling methods Comm.[all]reduce() and Comm.[ex]scan(), the input object must be specified as a tuple (obj, location).
      • Change signature of name publishing functions. The new signatures are Publish_name(service_name, port_name, info=INFO_NULL) and Unpublish_name(service_name, port_name, info=INFO_NULL).
      • Win instances now cache Python objects exposing memory by keeping references instead of using MPI attribute caching.
      • Change signature of Win.Lock(). The new signature is Win.Lock(rank, lock_type=LOCK_EXCLUSIVE, assertion=0).
      • Move Cartcomm.Map() to Intracomm.Cart_map().
      • Move Graphcomm.Map() to Intracomm.Graph_map().
      • Remove the mpi4py.MPE module.
      • Rename the Cython definition file for use with cimport statement from mpi_c.pxd to libmpi.pxd.
    Source code(tar.gz)
    Source code(zip)
    mpi4py-2.0.0.tar.gz(1.22 MB)
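    As referenced in the notes above, the mpi4py.rc options controlling the scalable reductions and the error-handling policy must be set before importing the MPI module. A minimal sketch (an editor's addition, not part of the release notes):

    import mpi4py
    mpi4py.rc.fast_reduce = False  # disable the binomial-tree reduction code path
    mpi4py.rc.errors = 'default'   # keep the standard MPI error-handler inheritance
    from mpi4py import MPI         # options take effect at import time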
  • 1.3.1(Jul 27, 2020)

    • Regenerate C wrappers with Cython 0.19.1 to support Python 3.3.

    • Install *.pxd files in <site-packages>/mpi4py to ease support for Cython's cimport statement in code that needs to access mpi4py internals.

    • As a side-effect of using Cython 0.19.1, ancient Python 2.3 is no longer supported. If you really need it, you can install an older Cython and run python setup.py build_src --force.

    Source code(tar.gz)
    Source code(zip)
    mpi4py-1.3.1.tar.gz(1022.05 KB)