CUDA Python Low-level Bindings

Overview

CUDA-Python

Building

Requirements

Dependencies of the CUDA-Python bindings and some versions that are known to work are as follows:

  • Driver: Linux (450.80.02 or later), Windows (456.38 or later)
  • CUDA Toolkit 11.0 to 11.4 - e.g. 11.4.48
  • Cython - e.g. 0.29.21
  • Versioneer - e.g. 0.20

Compilation

To compile the extension in-place, run:

python setup.py build_ext --inplace

To build the extension modules with debug information (for example, to debug them with gdb), pass the --debug argument to setup.py.

Develop installation

You can use

python setup.py develop

to use the module in-place in your current Python environment (e.g. to test porting other libraries to use the bindings).

Build the Docs

conda env create -f docs_src/environment-docs.yml
conda activate cuda-python-docs

Then compile and install cuda-python following the steps above.

cd docs_src
make html
open build/html/index.html

Publish the Docs

git checkout gh-pages
cd docs_src
make html
cp -a build/html/. ../docs/

Testing

Requirements

Dependencies for running the tests, with some versions that are known to work, are as follows:

  • numpy-1.19.5
  • numba-0.53.1
  • matplotlib-3.3.4
  • scipy-1.6.3
  • pytest-benchmark-3.4.1
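
Before running the tests, it can help to compare what is installed against the versions listed above. The following is a minimal sketch using only the standard library (Python 3.8 or later); the names KNOWN_GOOD, installed_versions and report are illustrative and not part of this repository.

```python
from importlib import metadata

# Versions listed above as known to work (copied from the list, not enforced).
KNOWN_GOOD = {
    "numpy": "1.19.5",
    "numba": "0.53.1",
    "matplotlib": "3.3.4",
    "scipy": "1.6.3",
    "pytest-benchmark": "3.4.1",
}


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


def report(known_good=KNOWN_GOOD):
    """Return one line per dependency: installed version next to the known-good one."""
    installed = installed_versions(known_good)
    return [
        f"{name}: installed={installed[name] or 'missing'}, known-good={version}"
        for name, version in known_good.items()
    ]


if __name__ == "__main__":
    print("\n".join(report()))
```

A mismatch does not necessarily mean the tests will fail; the listed versions are only ones that are known to work.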

Unit-tests

You can run the included tests with:

pytest

Samples

You can run the included examples with:

pytest examples

Benchmark

You can run only the benchmark tests with:

pytest --benchmark-only

Examples

The included examples are:

  • examples/extra/jit_program.py: Demonstrates the use of the API to compile and launch a kernel on the device. Includes device memory allocation / deallocation, transfers between host and device, creation and usage of streams, and context management.
  • examples/extra/numba_emm_plugin.py: Implements a Numba External Memory Management plugin, showing that this CUDA Python Driver API can coexist with other wrappers of the driver API.
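
When adapting the examples, note that the bindings follow the C API closely: each call is expected to return a tuple whose first element is the status code, followed by any output values. The helper below is a small sketch of one way to unwrap such tuples. It is not part of the bindings; the name check is hypothetical, and it assumes that a status of 0 means success (as CUDA_SUCCESS does in the driver API).

```python
def check(result):
    """Unwrap a (status, *values) tuple, raising if the status is non-zero.

    Assumption: status is either a plain int or an enum member whose
    .value is an int, and 0 means success.
    """
    status, *values = result
    code = getattr(status, "value", status)  # accept enum members or plain ints
    if code != 0:
        raise RuntimeError(f"CUDA call failed with status {status!r}")
    if not values:
        return None
    return values[0] if len(values) == 1 else tuple(values)


# Intended usage with the bindings (not executed here, requires a GPU):
#   from cuda import cuda
#   check(cuda.cuInit(0))
#   device = check(cuda.cuDeviceGet(0))
```

Raising on the first failing call keeps a script from continuing with invalid handles, at the cost of hiding the individual status codes from callers that want to handle them.
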
Comments
  • Fails to build on AmazonLinux

    Hi,

    I checked out the package and tried to build it on AmazonLinux, but it fails to compile. Please see the build output below. I also tried all the other commands that were mentioned in the installation guide, but they all failed with the same issue.

    CUDA: 11.2, GCC: 9.3

    $ python setup.py build
    Compiling cuda/_cuda/ccuda.pyx because it changed.
    Compiling cuda/_cuda/cnvrtc.pyx because it changed.
    [1/2] Cythonizing cuda/_cuda/ccuda.pyx
    [2/2] Cythonizing cuda/_cuda/cnvrtc.pyx
    Compiling cuda/_lib/utils.pyx because it changed.
    [1/1] Cythonizing cuda/_lib/utils.pyx
    Compiling cuda/_lib/ccudart/ccudart.pyx because it changed.
    Compiling cuda/_lib/ccudart/utils.pyx because it changed.
    [1/2] Cythonizing cuda/_lib/ccudart/ccudart.pyx
    [2/2] Cythonizing cuda/_lib/ccudart/utils.pyx
    Compiling cuda/ccuda.pyx because it changed.
    Compiling cuda/ccudart.pyx because it changed.
    Compiling cuda/cnvrtc.pyx because it changed.
    Compiling cuda/cuda.pyx because it changed.
    Compiling cuda/cudart.pyx because it changed.
    Compiling cuda/nvrtc.pyx because it changed.
    [1/6] Cythonizing cuda/ccuda.pyx
    [2/6] Cythonizing cuda/ccudart.pyx
    [3/6] Cythonizing cuda/cnvrtc.pyx
    [4/6] Cythonizing cuda/cuda.pyx
    [5/6] Cythonizing cuda/cudart.pyx
    [6/6] Cythonizing cuda/nvrtc.pyx
    Compiling cuda/tests/test_ccuda.pyx because it changed.
    Compiling cuda/tests/test_ccudart.pyx because it changed.
    Compiling cuda/tests/test_interoperability_cython.pyx because it changed.
    [1/3] Cythonizing cuda/tests/test_ccuda.pyx
    [2/3] Cythonizing cuda/tests/test_ccudart.pyx
    [3/3] Cythonizing cuda/tests/test_interoperability_cython.pyx
    running build
    running build_py
    creating build
    creating build/lib.linux-x86_64-3.8
    creating build/lib.linux-x86_64-3.8/cuda
    copying cuda/__init__.py -> build/lib.linux-x86_64-3.8/cuda
    copying cuda/_version.py -> build/lib.linux-x86_64-3.8/cuda
    creating build/lib.linux-x86_64-3.8/cuda/_cuda
    copying cuda/_cuda/__init__.py -> build/lib.linux-x86_64-3.8/cuda/_cuda
    creating build/lib.linux-x86_64-3.8/cuda/_lib
    copying cuda/_lib/__init__.py -> build/lib.linux-x86_64-3.8/cuda/_lib
    creating build/lib.linux-x86_64-3.8/cuda/benchmarks
    copying cuda/benchmarks/__init__.py -> build/lib.linux-x86_64-3.8/cuda/benchmarks
    copying cuda/benchmarks/kernels.py -> build/lib.linux-x86_64-3.8/cuda/benchmarks
    copying cuda/benchmarks/perf_test_utils.py -> build/lib.linux-x86_64-3.8/cuda/benchmarks
    copying cuda/benchmarks/test_cupy.py -> build/lib.linux-x86_64-3.8/cuda/benchmarks
    copying cuda/benchmarks/test_launch_latency.py -> build/lib.linux-x86_64-3.8/cuda/benchmarks
    copying cuda/benchmarks/test_numba.py -> build/lib.linux-x86_64-3.8/cuda/benchmarks
    copying cuda/benchmarks/test_pointer_attributes.py -> build/lib.linux-x86_64-3.8/cuda/benchmarks
    creating build/lib.linux-x86_64-3.8/cuda/tests
    copying cuda/tests/__init__.py -> build/lib.linux-x86_64-3.8/cuda/tests
    copying cuda/tests/test_cuda.py -> build/lib.linux-x86_64-3.8/cuda/tests
    copying cuda/tests/test_cudart.py -> build/lib.linux-x86_64-3.8/cuda/tests
    copying cuda/tests/test_cython.py -> build/lib.linux-x86_64-3.8/cuda/tests
    copying cuda/tests/test_interoperability.py -> build/lib.linux-x86_64-3.8/cuda/tests
    copying cuda/tests/test_kernelParams.py -> build/lib.linux-x86_64-3.8/cuda/tests
    copying cuda/tests/test_nvrtc.py -> build/lib.linux-x86_64-3.8/cuda/tests
    creating build/lib.linux-x86_64-3.8/cuda/_lib/ccudart
    copying cuda/_lib/ccudart/__init__.py -> build/lib.linux-x86_64-3.8/cuda/_lib/ccudart
    copying cuda/__init__.pxd -> build/lib.linux-x86_64-3.8/cuda
    copying cuda/ccuda.pxd -> build/lib.linux-x86_64-3.8/cuda
    copying cuda/ccudart.pxd -> build/lib.linux-x86_64-3.8/cuda
    copying cuda/cnvrtc.pxd -> build/lib.linux-x86_64-3.8/cuda
    copying cuda/cuda.pxd -> build/lib.linux-x86_64-3.8/cuda
    copying cuda/cudart.pxd -> build/lib.linux-x86_64-3.8/cuda
    copying cuda/nvrtc.pxd -> build/lib.linux-x86_64-3.8/cuda
    copying cuda/ccuda.pyx -> build/lib.linux-x86_64-3.8/cuda
    copying cuda/ccudart.pyx -> build/lib.linux-x86_64-3.8/cuda
    copying cuda/cnvrtc.pyx -> build/lib.linux-x86_64-3.8/cuda
    copying cuda/cuda.pyx -> build/lib.linux-x86_64-3.8/cuda
    copying cuda/cudart.pyx -> build/lib.linux-x86_64-3.8/cuda
    copying cuda/nvrtc.pyx -> build/lib.linux-x86_64-3.8/cuda
    copying cuda/ccuda.cpp -> build/lib.linux-x86_64-3.8/cuda
    copying cuda/ccudart.cpp -> build/lib.linux-x86_64-3.8/cuda
    copying cuda/cnvrtc.cpp -> build/lib.linux-x86_64-3.8/cuda
    copying cuda/cuda.cpp -> build/lib.linux-x86_64-3.8/cuda
    copying cuda/cudart.cpp -> build/lib.linux-x86_64-3.8/cuda
    copying cuda/nvrtc.cpp -> build/lib.linux-x86_64-3.8/cuda
    copying cuda/_cuda/ccuda.pxd -> build/lib.linux-x86_64-3.8/cuda/_cuda
    copying cuda/_cuda/cnvrtc.pxd -> build/lib.linux-x86_64-3.8/cuda/_cuda
    copying cuda/_cuda/loader.pxd -> build/lib.linux-x86_64-3.8/cuda/_cuda
    copying cuda/_cuda/ccuda.pyx -> build/lib.linux-x86_64-3.8/cuda/_cuda
    copying cuda/_cuda/cnvrtc.pyx -> build/lib.linux-x86_64-3.8/cuda/_cuda
    copying cuda/_cuda/loader.h -> build/lib.linux-x86_64-3.8/cuda/_cuda
    copying cuda/_cuda/loader.cpp -> build/lib.linux-x86_64-3.8/cuda/_cuda
    copying cuda/_cuda/ccuda.cpp -> build/lib.linux-x86_64-3.8/cuda/_cuda
    copying cuda/_cuda/cnvrtc.cpp -> build/lib.linux-x86_64-3.8/cuda/_cuda
    copying cuda/_lib/dlfcn.pxd -> build/lib.linux-x86_64-3.8/cuda/_lib
    copying cuda/_lib/param_packer.pxd -> build/lib.linux-x86_64-3.8/cuda/_lib
    copying cuda/_lib/utils.pxd -> build/lib.linux-x86_64-3.8/cuda/_lib
    copying cuda/_lib/utils.pyx -> build/lib.linux-x86_64-3.8/cuda/_lib
    copying cuda/_lib/param_packer.h -> build/lib.linux-x86_64-3.8/cuda/_lib
    copying cuda/_lib/param_packer.cpp -> build/lib.linux-x86_64-3.8/cuda/_lib
    copying cuda/_lib/utils.cpp -> build/lib.linux-x86_64-3.8/cuda/_lib
    copying cuda/tests/test_ccuda.pyx -> build/lib.linux-x86_64-3.8/cuda/tests
    copying cuda/tests/test_ccudart.pyx -> build/lib.linux-x86_64-3.8/cuda/tests
    copying cuda/tests/test_interoperability_cython.pyx -> build/lib.linux-x86_64-3.8/cuda/tests
    copying cuda/tests/test_ccuda.cpp -> build/lib.linux-x86_64-3.8/cuda/tests
    copying cuda/tests/test_ccudart.cpp -> build/lib.linux-x86_64-3.8/cuda/tests
    copying cuda/tests/test_interoperability_cython.cpp -> build/lib.linux-x86_64-3.8/cuda/tests
    copying cuda/_lib/ccudart/ccudart.pxd -> build/lib.linux-x86_64-3.8/cuda/_lib/ccudart
    copying cuda/_lib/ccudart/utils.pxd -> build/lib.linux-x86_64-3.8/cuda/_lib/ccudart
    copying cuda/_lib/ccudart/ccudart.pyx -> build/lib.linux-x86_64-3.8/cuda/_lib/ccudart
    copying cuda/_lib/ccudart/utils.pyx -> build/lib.linux-x86_64-3.8/cuda/_lib/ccudart
    copying cuda/_lib/ccudart/ccudart.cpp -> build/lib.linux-x86_64-3.8/cuda/_lib/ccudart
    copying cuda/_lib/ccudart/utils.cpp -> build/lib.linux-x86_64-3.8/cuda/_lib/ccudart
    UPDATING build/lib.linux-x86_64-3.8/cuda/_version.py
    set build/lib.linux-x86_64-3.8/cuda/_version.py to '11.7.1'
    running build_ext
    building 'cuda._cuda.ccuda' extension
    creating build/temp.linux-x86_64-3.8
    creating build/temp.linux-x86_64-3.8/cuda
    creating build/temp.linux-x86_64-3.8/cuda/_cuda
    /home/ec2-user/anaconda3/envs/tensorflow2_p38/bin/x86_64-conda-linux-gnu-cc -Wno-unused-result -Wsign-compare -DNDEBUG -fwrapv -O2 -Wall -Wstrict-prototypes -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -pipe -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -pipe -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -pipe -isystem /home/ec2-user/anaconda3/envs/tensorflow2_p38/include -DNDEBUG -D_FORTIFY_SOURCE=2 -O2 -isystem /home/ec2-user/anaconda3/envs/tensorflow2_p38/include -fPIC -I./cuda -I./cuda/_cuda -I/home/ec2-user/anaconda3/envs/tensorflow2_p38/include -I/usr/local/cuda-11.2/include -I/home/ec2-user/anaconda3/envs/tensorflow2_p38/include/python3.8 -c cuda/_cuda/ccuda.cpp -o build/temp.linux-x86_64-3.8/cuda/_cuda/ccuda.o -std=c++14 -fpermissive -Wno-deprecated-declarations -D _GLIBCXX_ASSERTIONS -fno-var-tracking-assignments -O3
    cc1plus: warning: command line option '-Wstrict-prototypes' is valid for C/ObjC but not for C++
    cuda/_cuda/ccuda.cpp: In function 'int __pyx_f_4cuda_5_cuda_5ccuda_cuPythonInit()':
    cuda/_cuda/ccuda.cpp:4202:138: error: 'CU_GET_PROC_ADDRESS_PER_THREAD_DEFAULT_STREAM' was not declared in this scope
     4202 |         __pyx_t_8 = __pyx_f_4cuda_5ccuda_cuGetProcAddress(((char const *)"cuMemcpy"), (&__pyx_v_4cuda_5_cuda_5ccuda___cuMemcpy), 0x1B58, CU_GET_PROC_ADDRESS_PER_THREAD_DEFAULT_STREAM); if (unlikely(__pyx_t_8 == ((CUresult)CUDA_ERROR_NOT_FOUND) && __Pyx_ErrOccurredWithGIL())) __PYX_ERR(0, 836, __pyx_L4_error)
          |                                                                                                                                          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    cuda/_cuda/ccuda.cpp:4924:137: error: 'CU_GET_PROC_ADDRESS_DEFAULT' was not declared in this scope
     4924 |         __pyx_t_8 = __pyx_f_4cuda_5ccuda_cuGetProcAddress(((char const *)"cuMemcpy"), (&__pyx_v_4cuda_5_cuda_5ccuda___cuMemcpy), 0xFA0, CU_GET_PROC_ADDRESS_DEFAULT); if (unlikely(__pyx_t_8 == ((CUresult)CUDA_ERROR_NOT_FOUND) && __Pyx_ErrOccurredWithGIL())) __PYX_ERR(0, 917, __pyx_L4_error)
          |                                                                                                                                         ^~~~~~~~~~~~~~~~~~~~~~~~~~~
    cuda/_cuda/ccuda.cpp:5637:152: error: 'CU_GET_PROC_ADDRESS_DEFAULT' was not declared in this scope
     5637 |       __pyx_t_8 = __pyx_f_4cuda_5ccuda_cuGetProcAddress(((char const *)"cuGetErrorString"), (&__pyx_v_4cuda_5_cuda_5ccuda___cuGetErrorString), 0x1770, CU_GET_PROC_ADDRESS_DEFAULT); if (unlikely(__pyx_t_8 == ((CUresult)CUDA_ERROR_NOT_FOUND) && __Pyx_ErrOccurredWithGIL())) __PYX_ERR(0, 997, __pyx_L4_error)
          |                                                                                                                                                        ^~~~~~~~~~~~~~~~~~~~~~~~~~~
    cuda/_cuda/ccuda.cpp: At global scope:
    cuda/_cuda/ccuda.cpp:15609:73: error: 'CUflushGPUDirectRDMAWritesTarget' was not declared in this scope
    15609 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuFlushGPUDirectRDMAWrites(CUflushGPUDirectRDMAWritesTarget __pyx_v_target, CUflushGPUDirectRDMAWritesScope __pyx_v_scope) {
          |                                                                         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    cuda/_cuda/ccuda.cpp:15609:122: error: 'CUflushGPUDirectRDMAWritesScope' was not declared in this scope
    15609 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuFlushGPUDirectRDMAWrites(CUflushGPUDirectRDMAWritesTarget __pyx_v_target, CUflushGPUDirectRDMAWritesScope __pyx_v_scope) {
          |                                                                                                                          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    cuda/_cuda/ccuda.cpp:15609:167: warning: expression list treated as compound expression in initializer [-fpermissive]
    15609 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuFlushGPUDirectRDMAWrites(CUflushGPUDirectRDMAWritesTarget __pyx_v_target, CUflushGPUDirectRDMAWritesScope __pyx_v_scope) {
          |                                                                                                                                                                       ^
    cuda/_cuda/ccuda.cpp:16977:94: error: 'CUexecAffinityType' has not been declared
    16977 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuDeviceGetExecAffinitySupport(int *__pyx_v_pi, CUexecAffinityType __pyx_v_typename, CUdevice __pyx_v_dev) {
          |                                                                                              ^~~~~~~~~~~~~~~~~~
    cuda/_cuda/ccuda.cpp: In function 'CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuDeviceGetExecAffinitySupport(int*, int, CUdevice)':
    cuda/_cuda/ccuda.cpp:17082:30: error: expected primary-expression before '(' token
    17082 |     __pyx_v_err = ((CUresult (*)(int *, CUexecAffinityType, CUdevice))__pyx_v_4cuda_5_cuda_5ccuda___cuDeviceGetExecAffinitySupport)(__pyx_v_pi, __pyx_v_typename, __pyx_v_dev);
          |                              ^
    cuda/_cuda/ccuda.cpp:17082:32: error: expected primary-expression before ')' token
    17082 |     __pyx_v_err = ((CUresult (*)(int *, CUexecAffinityType, CUdevice))__pyx_v_4cuda_5_cuda_5ccuda___cuDeviceGetExecAffinitySupport)(__pyx_v_pi, __pyx_v_typename, __pyx_v_dev);
          |                                ^
    cuda/_cuda/ccuda.cpp:17082:34: error: expected primary-expression before 'int'
    17082 |     __pyx_v_err = ((CUresult (*)(int *, CUexecAffinityType, CUdevice))__pyx_v_4cuda_5_cuda_5ccuda___cuDeviceGetExecAffinitySupport)(__pyx_v_pi, __pyx_v_typename, __pyx_v_dev);
          |                                  ^~~
    cuda/_cuda/ccuda.cpp:17082:41: error: 'CUexecAffinityType' was not declared in this scope
    17082 |     __pyx_v_err = ((CUresult (*)(int *, CUexecAffinityType, CUdevice))__pyx_v_4cuda_5_cuda_5ccuda___cuDeviceGetExecAffinitySupport)(__pyx_v_pi, __pyx_v_typename, __pyx_v_dev);
          |                                         ^~~~~~~~~~~~~~~~~~
    cuda/_cuda/ccuda.cpp:17082:69: error: expected primary-expression before ')' token
    17082 |     __pyx_v_err = ((CUresult (*)(int *, CUexecAffinityType, CUdevice))__pyx_v_4cuda_5_cuda_5ccuda___cuDeviceGetExecAffinitySupport)(__pyx_v_pi, __pyx_v_typename, __pyx_v_dev);
          |                                                                     ^
    cuda/_cuda/ccuda.cpp:17082:71: error: expected ')' before '__pyx_v_4cuda_5_cuda_5ccuda___cuDeviceGetExecAffinitySupport'
    17082 |     __pyx_v_err = ((CUresult (*)(int *, CUexecAffinityType, CUdevice))__pyx_v_4cuda_5_cuda_5ccuda___cuDeviceGetExecAffinitySupport)(__pyx_v_pi, __pyx_v_typename, __pyx_v_dev);
          |                   ~                                                   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          |                                                                       )
    cuda/_cuda/ccuda.cpp: At global scope:
    cuda/_cuda/ccuda.cpp:17319:86: error: 'CUexecAffinityParam' has not been declared
    17319 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuCtxCreate_v3(CUcontext *__pyx_v_pctx, CUexecAffinityParam *__pyx_v_paramsArray, int __pyx_v_numParams, unsigned int __pyx_v_flags, CUdevice __pyx_v_dev) {
          |                                                                                      ^~~~~~~~~~~~~~~~~~~
    cuda/_cuda/ccuda.cpp: In function 'CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuCtxCreate_v3(CUctx_st**, int*, int, unsigned int, CUdevice)':
    cuda/_cuda/ccuda.cpp:17424:30: error: expected primary-expression before '(' token
    17424 |     __pyx_v_err = ((CUresult (*)(CUcontext *, CUexecAffinityParam *, int, unsigned int, CUdevice))__pyx_v_4cuda_5_cuda_5ccuda___cuCtxCreate_v3)(__pyx_v_pctx, __pyx_v_paramsArray, __pyx_v_numParams, __pyx_v_flags, __pyx_v_dev);
          |                              ^
    cuda/_cuda/ccuda.cpp:17424:32: error: expected primary-expression before ')' token
    17424 |     __pyx_v_err = ((CUresult (*)(CUcontext *, CUexecAffinityParam *, int, unsigned int, CUdevice))__pyx_v_4cuda_5_cuda_5ccuda___cuCtxCreate_v3)(__pyx_v_pctx, __pyx_v_paramsArray, __pyx_v_numParams, __pyx_v_flags, __pyx_v_dev);
          |                                ^
    cuda/_cuda/ccuda.cpp:17424:44: error: expected primary-expression before '*' token
    17424 |     __pyx_v_err = ((CUresult (*)(CUcontext *, CUexecAffinityParam *, int, unsigned int, CUdevice))__pyx_v_4cuda_5_cuda_5ccuda___cuCtxCreate_v3)(__pyx_v_pctx, __pyx_v_paramsArray, __pyx_v_numParams, __pyx_v_flags, __pyx_v_dev);
          |                                            ^
    cuda/_cuda/ccuda.cpp:17424:45: error: expected primary-expression before ',' token
    17424 |     __pyx_v_err = ((CUresult (*)(CUcontext *, CUexecAffinityParam *, int, unsigned int, CUdevice))__pyx_v_4cuda_5_cuda_5ccuda___cuCtxCreate_v3)(__pyx_v_pctx, __pyx_v_paramsArray, __pyx_v_numParams, __pyx_v_flags, __pyx_v_dev);
          |                                             ^
    cuda/_cuda/ccuda.cpp:17424:47: error: 'CUexecAffinityParam' was not declared in this scope
    17424 |     __pyx_v_err = ((CUresult (*)(CUcontext *, CUexecAffinityParam *, int, unsigned int, CUdevice))__pyx_v_4cuda_5_cuda_5ccuda___cuCtxCreate_v3)(__pyx_v_pctx, __pyx_v_paramsArray, __pyx_v_numParams, __pyx_v_flags, __pyx_v_dev);
          |                                               ^~~~~~~~~~~~~~~~~~~
    cuda/_cuda/ccuda.cpp:17424:68: error: expected primary-expression before ',' token
    17424 |     __pyx_v_err = ((CUresult (*)(CUcontext *, CUexecAffinityParam *, int, unsigned int, CUdevice))__pyx_v_4cuda_5_cuda_5ccuda___cuCtxCreate_v3)(__pyx_v_pctx, __pyx_v_paramsArray, __pyx_v_numParams, __pyx_v_flags, __pyx_v_dev);
          |                                                                    ^
    cuda/_cuda/ccuda.cpp:17424:70: error: expected primary-expression before 'int'
    17424 |     __pyx_v_err = ((CUresult (*)(CUcontext *, CUexecAffinityParam *, int, unsigned int, CUdevice))__pyx_v_4cuda_5_cuda_5ccuda___cuCtxCreate_v3)(__pyx_v_pctx, __pyx_v_paramsArray, __pyx_v_numParams, __pyx_v_flags, __pyx_v_dev);
          |                                                                      ^~~
    cuda/_cuda/ccuda.cpp:17424:75: error: expected primary-expression before 'unsigned'
    17424 |     __pyx_v_err = ((CUresult (*)(CUcontext *, CUexecAffinityParam *, int, unsigned int, CUdevice))__pyx_v_4cuda_5_cuda_5ccuda___cuCtxCreate_v3)(__pyx_v_pctx, __pyx_v_paramsArray, __pyx_v_numParams, __pyx_v_flags, __pyx_v_dev);
          |                                                                           ^~~~~~~~
    cuda/_cuda/ccuda.cpp:17424:97: error: expected primary-expression before ')' token
    17424 |     __pyx_v_err = ((CUresult (*)(CUcontext *, CUexecAffinityParam *, int, unsigned int, CUdevice))__pyx_v_4cuda_5_cuda_5ccuda___cuCtxCreate_v3)(__pyx_v_pctx, __pyx_v_paramsArray, __pyx_v_numParams, __pyx_v_flags, __pyx_v_dev);
          |                                                                                                 ^
    cuda/_cuda/ccuda.cpp:17424:99: error: expected ')' before '__pyx_v_4cuda_5_cuda_5ccuda___cuCtxCreate_v3'
    17424 |     __pyx_v_err = ((CUresult (*)(CUcontext *, CUexecAffinityParam *, int, unsigned int, CUdevice))__pyx_v_4cuda_5_cuda_5ccuda___cuCtxCreate_v3)(__pyx_v_pctx, __pyx_v_paramsArray, __pyx_v_numParams, __pyx_v_flags, __pyx_v_dev);
          |                   ~                                                                               ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          |                                                                                                   )
    cuda/_cuda/ccuda.cpp: At global scope:
    cuda/_cuda/ccuda.cpp:20397:67: error: 'CUexecAffinityParam' was not declared in this scope
    20397 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuCtxGetExecAffinity(CUexecAffinityParam *__pyx_v_pExecAffinity, CUexecAffinityType __pyx_v_typename) {
          |                                                                   ^~~~~~~~~~~~~~~~~~~
    cuda/_cuda/ccuda.cpp:20397:88: error: '__pyx_v_pExecAffinity' was not declared in this scope
    20397 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuCtxGetExecAffinity(CUexecAffinityParam *__pyx_v_pExecAffinity, CUexecAffinityType __pyx_v_typename) {
          |                                                                                        ^~~~~~~~~~~~~~~~~~~~~
    cuda/_cuda/ccuda.cpp:20397:111: error: 'CUexecAffinityType' was not declared in this scope
    20397 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuCtxGetExecAffinity(CUexecAffinityParam *__pyx_v_pExecAffinity, CUexecAffinityType __pyx_v_typename) {
          |                                                                                                               ^~~~~~~~~~~~~~~~~~
    cuda/_cuda/ccuda.cpp:20397:146: warning: expression list treated as compound expression in initializer [-fpermissive]
    20397 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuCtxGetExecAffinity(CUexecAffinityParam *__pyx_v_pExecAffinity, CUexecAffinityType __pyx_v_typename) {
          |                                                                                                                                                  ^
    cuda/_cuda/ccuda.cpp:33564:75: error: 'CUDA_ARRAY_MEMORY_REQUIREMENTS' was not declared in this scope
    33564 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuArrayGetMemoryRequirements(CUDA_ARRAY_MEMORY_REQUIREMENTS *__pyx_v_memoryRequirements, CUarray __pyx_v_array, CUdevice __pyx_v_device) {
          |                                                                           ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    cuda/_cuda/ccuda.cpp:33564:107: error: '__pyx_v_memoryRequirements' was not declared in this scope
    33564 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuArrayGetMemoryRequirements(CUDA_ARRAY_MEMORY_REQUIREMENTS *__pyx_v_memoryRequirements, CUarray __pyx_v_array, CUdevice __pyx_v_device) {
          |                                                                                                           ^~~~~~~~~~~~~~~~~~~~~~~~~~
    cuda/_cuda/ccuda.cpp:33564:143: error: expected primary-expression before '__pyx_v_array'
    33564 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuArrayGetMemoryRequirements(CUDA_ARRAY_MEMORY_REQUIREMENTS *__pyx_v_memoryRequirements, CUarray __pyx_v_array, CUdevice __pyx_v_device) {
          |                                                                                                                                               ^~~~~~~~~~~~~
    cuda/_cuda/ccuda.cpp:33564:167: error: expected primary-expression before '__pyx_v_device'
    33564 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuArrayGetMemoryRequirements(CUDA_ARRAY_MEMORY_REQUIREMENTS *__pyx_v_memoryRequirements, CUarray __pyx_v_array, CUdevice __pyx_v_device) {
    ---
    truncated due to git issue limit
    ---
    cuda/_cuda/ccuda.cpp:58806:44: error: 'CUgraphMem_attribute' was not declared in this scope
    58806 |     __pyx_v_err = ((CUresult (*)(CUdevice, CUgraphMem_attribute, void *))__pyx_v_4cuda_5_cuda_5ccuda___cuDeviceSetGraphMemAttribute)(__pyx_v_device, __pyx_v_attr, __pyx_v_value);
          |                                            ^~~~~~~~~~~~~~~~~~~~
    cuda/_cuda/ccuda.cpp:58806:66: error: expected primary-expression before 'void'
    58806 |     __pyx_v_err = ((CUresult (*)(CUdevice, CUgraphMem_attribute, void *))__pyx_v_4cuda_5_cuda_5ccuda___cuDeviceSetGraphMemAttribute)(__pyx_v_device, __pyx_v_attr, __pyx_v_value);
          |                                                                  ^~~~
    cuda/_cuda/ccuda.cpp:58806:74: error: expected ')' before '__pyx_v_4cuda_5_cuda_5ccuda___cuDeviceSetGraphMemAttribute'
    58806 |     __pyx_v_err = ((CUresult (*)(CUdevice, CUgraphMem_attribute, void *))__pyx_v_4cuda_5_cuda_5ccuda___cuDeviceSetGraphMemAttribute)(__pyx_v_device, __pyx_v_attr, __pyx_v_value);
          |                   ~                                                      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          |                                                                          )
    cuda/_cuda/ccuda.cpp: At global scope:
    cuda/_cuda/ccuda.cpp:64515:65: error: 'CUuserObject' was not declared in this scope; did you mean 'CUsurfObject'?
    64515 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuUserObjectCreate(CUuserObject *__pyx_v_object_out, void *__pyx_v_ptr, CUhostFn __pyx_v_destroy, unsigned int __pyx_v_initialRefcount, unsigned int __pyx_v_flags) {
          |                                                                 ^~~~~~~~~~~~
          |                                                                 CUsurfObject
    cuda/_cuda/ccuda.cpp:64515:79: error: '__pyx_v_object_out' was not declared in this scope
    64515 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuUserObjectCreate(CUuserObject *__pyx_v_object_out, void *__pyx_v_ptr, CUhostFn __pyx_v_destroy, unsigned int __pyx_v_initialRefcount, unsigned int __pyx_v_flags) {
          |                                                                               ^~~~~~~~~~~~~~~~~~
    cuda/_cuda/ccuda.cpp:64515:99: error: expected primary-expression before 'void'
    64515 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuUserObjectCreate(CUuserObject *__pyx_v_object_out, void *__pyx_v_ptr, CUhostFn __pyx_v_destroy, unsigned int __pyx_v_initialRefcount, unsigned int __pyx_v_flags) {
          |                                                                                                   ^~~~
    cuda/_cuda/ccuda.cpp:64515:127: error: expected primary-expression before '__pyx_v_destroy'
    64515 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuUserObjectCreate(CUuserObject *__pyx_v_object_out, void *__pyx_v_ptr, CUhostFn __pyx_v_destroy, unsigned int __pyx_v_initialRefcount, unsigned int __pyx_v_flags) {
          |                                                                                                                               ^~~~~~~~~~~~~~~
    cuda/_cuda/ccuda.cpp:64515:144: error: expected primary-expression before 'unsigned'
    64515 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuUserObjectCreate(CUuserObject *__pyx_v_object_out, void *__pyx_v_ptr, CUhostFn __pyx_v_destroy, unsigned int __pyx_v_initialRefcount, unsigned int __pyx_v_flags) {
          |                                                                                                                                                ^~~~~~~~
    cuda/_cuda/ccuda.cpp:64515:182: error: expected primary-expression before 'unsigned'
    64515 | atic CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuUserObjectCreate(CUuserObject *__pyx_v_object_out, void *__pyx_v_ptr, CUhostFn __pyx_v_destroy, unsigned int __pyx_v_initialRefcount, unsigned int __pyx_v_flags) {
          |                                                                                                                                                                                    ^~~~~~~~
    
    cuda/_cuda/ccuda.cpp:64515:208: warning: expression list treated as compound expression in initializer [-fpermissive]
    64515 | a_5_cuda_5ccuda__cuUserObjectCreate(CUuserObject *__pyx_v_object_out, void *__pyx_v_ptr, CUhostFn __pyx_v_destroy, unsigned int __pyx_v_initialRefcount, unsigned int __pyx_v_flags) {
          |                                                                                                                                                                                    ^
    
    cuda/_cuda/ccuda.cpp:64686:65: error: 'CUuserObject' was not declared in this scope; did you mean 'CUsurfObject'?
    64686 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuUserObjectRetain(CUuserObject __pyx_v_object, unsigned int __pyx_v_count) {
          |                                                                 ^~~~~~~~~~~~
          |                                                                 CUsurfObject
    cuda/_cuda/ccuda.cpp:64686:94: error: expected primary-expression before 'unsigned'
    64686 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuUserObjectRetain(CUuserObject __pyx_v_object, unsigned int __pyx_v_count) {
          |                                                                                              ^~~~~~~~
    cuda/_cuda/ccuda.cpp:64686:120: warning: expression list treated as compound expression in initializer [-fpermissive]
    64686 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuUserObjectRetain(CUuserObject __pyx_v_object, unsigned int __pyx_v_count) {
          |                                                                                                                        ^
    cuda/_cuda/ccuda.cpp:64857:66: error: 'CUuserObject' was not declared in this scope; did you mean 'CUsurfObject'?
    64857 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuUserObjectRelease(CUuserObject __pyx_v_object, unsigned int __pyx_v_count) {
          |                                                                  ^~~~~~~~~~~~
          |                                                                  CUsurfObject
    cuda/_cuda/ccuda.cpp:64857:95: error: expected primary-expression before 'unsigned'
    64857 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuUserObjectRelease(CUuserObject __pyx_v_object, unsigned int __pyx_v_count) {
          |                                                                                               ^~~~~~~~
    cuda/_cuda/ccuda.cpp:64857:121: warning: expression list treated as compound expression in initializer [-fpermissive]
    64857 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuUserObjectRelease(CUuserObject __pyx_v_object, unsigned int __pyx_v_count) {
          |                                                                                                                         ^
    cuda/_cuda/ccuda.cpp:65028:93: error: 'CUuserObject' has not been declared
    65028 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuGraphRetainUserObject(CUgraph __pyx_v_graph, CUuserObject __pyx_v_object, unsigned int __pyx_v_count, unsigned int __pyx_v_flags) {
          |                                                                                             ^~~~~~~~~~~~
    cuda/_cuda/ccuda.cpp: In function 'CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuGraphRetainUserObject(CUgraph, int, unsigned int, unsigned int)':
    cuda/_cuda/ccuda.cpp:65133:30: error: expected primary-expression before '(' token
    65133 |     __pyx_v_err = ((CUresult (*)(CUgraph, CUuserObject, unsigned int, unsigned int))__pyx_v_4cuda_5_cuda_5ccuda___cuGraphRetainUserObject)(__pyx_v_graph, __pyx_v_object, __pyx_v_count, __pyx_v_flags);
          |                              ^
    cuda/_cuda/ccuda.cpp:65133:32: error: expected primary-expression before ')' token
    65133 |     __pyx_v_err = ((CUresult (*)(CUgraph, CUuserObject, unsigned int, unsigned int))__pyx_v_4cuda_5_cuda_5ccuda___cuGraphRetainUserObject)(__pyx_v_graph, __pyx_v_object, __pyx_v_count, __pyx_v_flags);
          |                                ^
    cuda/_cuda/ccuda.cpp:65133:41: error: expected primary-expression before ',' token
    65133 |     __pyx_v_err = ((CUresult (*)(CUgraph, CUuserObject, unsigned int, unsigned int))__pyx_v_4cuda_5_cuda_5ccuda___cuGraphRetainUserObject)(__pyx_v_graph, __pyx_v_object, __pyx_v_count, __pyx_v_flags);
          |                                         ^
    cuda/_cuda/ccuda.cpp:65133:43: error: 'CUuserObject' was not declared in this scope; did you mean 'CUsurfObject'?
    65133 |     __pyx_v_err = ((CUresult (*)(CUgraph, CUuserObject, unsigned int, unsigned int))__pyx_v_4cuda_5_cuda_5ccuda___cuGraphRetainUserObject)(__pyx_v_graph, __pyx_v_object, __pyx_v_count, __pyx_v_flags);
          |                                           ^~~~~~~~~~~~
          |                                           CUsurfObject
    cuda/_cuda/ccuda.cpp:65133:57: error: expected primary-expression before 'unsigned'
    65133 |     __pyx_v_err = ((CUresult (*)(CUgraph, CUuserObject, unsigned int, unsigned int))__pyx_v_4cuda_5_cuda_5ccuda___cuGraphRetainUserObject)(__pyx_v_graph, __pyx_v_object, __pyx_v_count, __pyx_v_flags);
          |                                                         ^~~~~~~~
    cuda/_cuda/ccuda.cpp:65133:71: error: expected primary-expression before 'unsigned'
    65133 |     __pyx_v_err = ((CUresult (*)(CUgraph, CUuserObject, unsigned int, unsigned int))__pyx_v_4cuda_5_cuda_5ccuda___cuGraphRetainUserObject)(__pyx_v_graph, __pyx_v_object, __pyx_v_count, __pyx_v_flags);
          |                                                                       ^~~~~~~~
    cuda/_cuda/ccuda.cpp:65133:85: error: expected ')' before '__pyx_v_4cuda_5_cuda_5ccuda___cuGraphRetainUserObject'
    65133 |     __pyx_v_err = ((CUresult (*)(CUgraph, CUuserObject, unsigned int, unsigned int))__pyx_v_4cuda_5_cuda_5ccuda___cuGraphRetainUserObject)(__pyx_v_graph, __pyx_v_object, __pyx_v_count, __pyx_v_flags);
          |                   ~                                                                 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          |                                                                                     )
    cuda/_cuda/ccuda.cpp: At global scope:
    cuda/_cuda/ccuda.cpp:65199:94: error: 'CUuserObject' has not been declared
    65199 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuGraphReleaseUserObject(CUgraph __pyx_v_graph, CUuserObject __pyx_v_object, unsigned int __pyx_v_count) {
          |                                                                                              ^~~~~~~~~~~~
    cuda/_cuda/ccuda.cpp: In function 'CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuGraphReleaseUserObject(CUgraph, int, unsigned int)':
    cuda/_cuda/ccuda.cpp:65304:30: error: expected primary-expression before '(' token
    65304 |     __pyx_v_err = ((CUresult (*)(CUgraph, CUuserObject, unsigned int))__pyx_v_4cuda_5_cuda_5ccuda___cuGraphReleaseUserObject)(__pyx_v_graph, __pyx_v_object, __pyx_v_count);
          |                              ^
    cuda/_cuda/ccuda.cpp:65304:32: error: expected primary-expression before ')' token
    65304 |     __pyx_v_err = ((CUresult (*)(CUgraph, CUuserObject, unsigned int))__pyx_v_4cuda_5_cuda_5ccuda___cuGraphReleaseUserObject)(__pyx_v_graph, __pyx_v_object, __pyx_v_count);
          |                                ^
    cuda/_cuda/ccuda.cpp:65304:41: error: expected primary-expression before ',' token
    65304 |     __pyx_v_err = ((CUresult (*)(CUgraph, CUuserObject, unsigned int))__pyx_v_4cuda_5_cuda_5ccuda___cuGraphReleaseUserObject)(__pyx_v_graph, __pyx_v_object, __pyx_v_count);
          |                                         ^
    cuda/_cuda/ccuda.cpp:65304:43: error: 'CUuserObject' was not declared in this scope; did you mean 'CUsurfObject'?
    65304 |     __pyx_v_err = ((CUresult (*)(CUgraph, CUuserObject, unsigned int))__pyx_v_4cuda_5_cuda_5ccuda___cuGraphReleaseUserObject)(__pyx_v_graph, __pyx_v_object, __pyx_v_count);
          |                                           ^~~~~~~~~~~~
          |                                           CUsurfObject
    cuda/_cuda/ccuda.cpp:65304:57: error: expected primary-expression before 'unsigned'
    65304 |     __pyx_v_err = ((CUresult (*)(CUgraph, CUuserObject, unsigned int))__pyx_v_4cuda_5_cuda_5ccuda___cuGraphReleaseUserObject)(__pyx_v_graph, __pyx_v_object, __pyx_v_count);
          |                                                         ^~~~~~~~
    cuda/_cuda/ccuda.cpp:65304:71: error: expected ')' before '__pyx_v_4cuda_5_cuda_5ccuda___cuGraphReleaseUserObject'
    65304 |     __pyx_v_err = ((CUresult (*)(CUgraph, CUuserObject, unsigned int))__pyx_v_4cuda_5_cuda_5ccuda___cuGraphReleaseUserObject)(__pyx_v_graph, __pyx_v_object, __pyx_v_count);
          |                   ~                                                   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          |                                                                       )
    cuda/_cuda/ccuda.cpp: At global scope:
    cuda/_cuda/ccuda.cpp:74604:69: error: 'CUmoduleLoadingMode' was not declared in this scope
    74604 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuModuleGetLoadingMode(CUmoduleLoadingMode *__pyx_v_mode) {
          |                                                                     ^~~~~~~~~~~~~~~~~~~
    cuda/_cuda/ccuda.cpp:74604:90: error: '__pyx_v_mode' was not declared in this scope; did you mean '__pyx_k_name'?
    74604 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuModuleGetLoadingMode(CUmoduleLoadingMode *__pyx_v_mode) {
          |                                                                                          ^~~~~~~~~~~~
          |                                                                                          __pyx_k_name
    cuda/_cuda/ccuda.cpp:74775:145: error: 'CUmemRangeHandleType' has not been declared
    74775 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuMemGetHandleForAddressRange(void *__pyx_v_handle, CUdeviceptr __pyx_v_dptr, size_t __pyx_v_size, CUmemRangeHandleType __pyx_v_handleType, unsigned PY_LONG_LONG __pyx_v_flags) {
          |                                                                                                                                                 ^~~~~~~~~~~~~~~~~~~~
    cuda/_cuda/ccuda.cpp: In function 'CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuMemGetHandleForAddressRange(void*, CUdeviceptr, size_t, int, long long unsigned int)':
    cuda/_cuda/ccuda.cpp:74880:30: error: expected primary-expression before '(' token
    74880 |     __pyx_v_err = ((CUresult (*)(void *, CUdeviceptr, size_t, CUmemRangeHandleType, unsigned PY_LONG_LONG))__pyx_v_4cuda_5_cuda_5ccuda___cuMemGetHandleForAddressRange)(__pyx_v_handle, __pyx_v_dptr, __pyx_v_size, __pyx_v_handleType, __pyx_v_flags);
          |                              ^
    cuda/_cuda/ccuda.cpp:74880:32: error: expected primary-expression before ')' token
    74880 |     __pyx_v_err = ((CUresult (*)(void *, CUdeviceptr, size_t, CUmemRangeHandleType, unsigned PY_LONG_LONG))__pyx_v_4cuda_5_cuda_5ccuda___cuMemGetHandleForAddressRange)(__pyx_v_handle, __pyx_v_dptr, __pyx_v_size, __pyx_v_handleType, __pyx_v_flags);
          |                                ^
    cuda/_cuda/ccuda.cpp:74880:34: error: expected primary-expression before 'void'
    74880 |     __pyx_v_err = ((CUresult (*)(void *, CUdeviceptr, size_t, CUmemRangeHandleType, unsigned PY_LONG_LONG))__pyx_v_4cuda_5_cuda_5ccuda___cuMemGetHandleForAddressRange)(__pyx_v_handle, __pyx_v_dptr, __pyx_v_size, __pyx_v_handleType, __pyx_v_flags);
          |                                  ^~~~
    cuda/_cuda/ccuda.cpp:74880:53: error: expected primary-expression before ',' token
    74880 |     __pyx_v_err = ((CUresult (*)(void *, CUdeviceptr, size_t, CUmemRangeHandleType, unsigned PY_LONG_LONG))__pyx_v_4cuda_5_cuda_5ccuda___cuMemGetHandleForAddressRange)(__pyx_v_handle, __pyx_v_dptr, __pyx_v_size, __pyx_v_handleType, __pyx_v_flags);
          |                                                     ^
    cuda/_cuda/ccuda.cpp:74880:61: error: expected primary-expression before ',' token
    74880 |     __pyx_v_err = ((CUresult (*)(void *, CUdeviceptr, size_t, CUmemRangeHandleType, unsigned PY_LONG_LONG))__pyx_v_4cuda_5_cuda_5ccuda___cuMemGetHandleForAddressRange)(__pyx_v_handle, __pyx_v_dptr, __pyx_v_size, __pyx_v_handleType, __pyx_v_flags);
          |                                                             ^
    cuda/_cuda/ccuda.cpp:74880:63: error: 'CUmemRangeHandleType' was not declared in this scope; did you mean 'CUmemHandleType'?
    74880 |     __pyx_v_err = ((CUresult (*)(void *, CUdeviceptr, size_t, CUmemRangeHandleType, unsigned PY_LONG_LONG))__pyx_v_4cuda_5_cuda_5ccuda___cuMemGetHandleForAddressRange)(__pyx_v_handle, __pyx_v_dptr, __pyx_v_size, __pyx_v_handleType, __pyx_v_flags);
          |                                                               ^~~~~~~~~~~~~~~~~~~~
          |                                                               CUmemHandleType
    cuda/_cuda/ccuda.cpp:74880:85: error: expected primary-expression before 'unsigned'
    74880 |     __pyx_v_err = ((CUresult (*)(void *, CUdeviceptr, size_t, CUmemRangeHandleType, unsigned PY_LONG_LONG))__pyx_v_4cuda_5_cuda_5ccuda___cuMemGetHandleForAddressRange)(__pyx_v_handle, __pyx_v_dptr, __pyx_v_size, __pyx_v_handleType, __pyx_v_flags);
          |                                                                                     ^~~~~~~~
    cuda/_cuda/ccuda.cpp:74880:108: error: expected ')' before '__pyx_v_4cuda_5_cuda_5ccuda___cuMemGetHandleForAddressRange'
    74880 |     __pyx_v_err = ((CUresult (*)(void *, CUdeviceptr, size_t, CUmemRangeHandleType, unsigned PY_LONG_LONG))__pyx_v_4cuda_5_cuda_5ccuda___cuMemGetHandleForAddressRange)(__pyx_v_handle, __pyx_v_dptr, __pyx_v_size, __pyx_v_handleType, __pyx_v_flags);
          |                   ~                                                                                        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          |                                                                                                            )
    error: command '/home/ec2-user/anaconda3/envs/tensorflow2_p38/bin/x86_64-conda-linux-gnu-cc' failed with exit status 1
    
    opened by pranavladkat 9
  • nvrtc.nvrtcCompileProgram is changing the preferred encoding from UTF-8 to ANSI_X3.4-1968

    Dear developers,

    I found out that calling the NVRTC for compilation is changing the preferred encoding for the current Python instance.

    For more details and to reproduce the issue, please refer to this StackOverflow question.

    Do you have an idea on why this happens, and how it is possible to revert the preferred encoding to its original setting?

    Thank you in advance
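
One way to narrow this down is to snapshot the process locale before the call and restore it afterwards. The sketch below assumes the encoding change comes from the C locale being altered during compilation, which the report does not confirm. `preserve_locale` is a hypothetical helper name, not part of cuda-python.

```python
import contextlib
import locale


@contextlib.contextmanager
def preserve_locale():
    """Restore the process-wide locale after the wrapped block (hypothetical helper)."""
    # Calling setlocale without a second argument only queries the current setting.
    saved = locale.setlocale(locale.LC_ALL)
    try:
        yield saved
    finally:
        # Put back whatever was active before the block ran.
        locale.setlocale(locale.LC_ALL, saved)
```

Wrapping the `nvrtc.nvrtcCompileProgram` call in `with preserve_locale():` and comparing `locale.getpreferredencoding()` before and after would show whether the locale is involved.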

    opened by redsnic 5
  • Failed to dlopen libcuda.so in WSL environment

    from cuda import cuda
    cuda.cuInit(0)
    
    ---------------------------------------------------------------------------
    RuntimeError                              Traceback (most recent call last)
    Input In [2], in <module>
    ----> 1 cuda.cuInit(0)
    
    File ~/miniconda3/envs/dev/lib/python3.8/site-packages/cuda/cuda.pyx:8876, in cuda.cuda.cuInit()
    
    File ~/miniconda3/envs/dev/lib/python3.8/site-packages/cuda/ccuda.pyx:17, in cuda.ccuda.cuInit()
    
    File ~/miniconda3/envs/dev/lib/python3.8/site-packages/cuda/_cuda/ccuda.pyx:3553, in cuda._cuda.ccuda._cuInit()
    
    File ~/miniconda3/envs/dev/lib/python3.8/site-packages/cuda/_cuda/ccuda.pyx:424, in cuda._cuda.ccuda.cuPythonInit()
    
    RuntimeError: Failed to dlopen libcuda.so
    

    This is because in a WSL environment libcuda.so lives in /usr/lib/wsl/lib which is not in the default search path of dlopen. For libraries that link against libcuda this isn't a problem because there's a file at /etc/ld.so.conf.d/ld.wsl.conf which instructs the linker as to where it can find the libraries, but unfortunately dlopen doesn't use this.

    As a workaround, adding /usr/lib/wsl/lib to the LD_LIBRARY_PATH environment variable resolves the problem.
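
A minimal sketch of the workaround follows. The helper name `with_wsl_lib_dir` is hypothetical and only computes the new `LD_LIBRARY_PATH` value. The variable generally has to be set before the Python process starts, because the dynamic loader reads it at startup.

```python
import os

WSL_LIB_DIR = "/usr/lib/wsl/lib"  # location of libcuda.so under WSL, per the report above


def with_wsl_lib_dir(current):
    """Return an LD_LIBRARY_PATH value that includes the WSL library directory."""
    parts = [p for p in current.split(os.pathsep) if p]
    if WSL_LIB_DIR not in parts:
        parts.insert(0, WSL_LIB_DIR)
    return os.pathsep.join(parts)


if __name__ == "__main__":
    # Print the value to export in the shell before launching Python.
    print(with_wsl_lib_dir(os.environ.get("LD_LIBRARY_PATH", "")))
```

The printed value can then be exported in the shell that launches Python.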

    opened by kkraus14 4
  • No module named 'examples'

    I change directories to try to run some examples.

    (cython) [email protected]:~/Documents/cuda-start-dec2022/cuda-python/examples/0_Introduction$ python clock_nvrtc_test.py
    Traceback (most recent call last):
      File "/home/nyck33/Documents/cuda-start-dec2022/cuda-python/examples/0_Introduction/clock_nvrtc_test.py", line 10, in <module>
        from examples.common import common
    ModuleNotFoundError: No module named 'examples'
    

    What am I doing wrong? I am looking at a PyPI package called absolufy-imports to try to get this going.
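
A likely cause is that `from examples.common import common` only resolves when the directory containing `examples/` (the repository root) is on `sys.path`. Running a script directly puts the script's own directory there instead. The sketch below uses a hypothetical helper, `add_package_root`, that walks up from a script path until it finds that directory.

```python
import sys
from pathlib import Path


def add_package_root(script_path, package="examples"):
    """Put the directory that contains `package` on sys.path (hypothetical helper)."""
    for parent in Path(script_path).resolve().parents:
        if (parent / package).is_dir():
            root = str(parent)
            if root not in sys.path:
                sys.path.insert(0, root)
            return root
    raise FileNotFoundError(f"no parent directory contains {package!r}")
```

Calling `add_package_root(__file__)` at the top of the script, before the `examples` import, would apply it. Setting `PYTHONPATH` to the repository root before running the script is an alternative that needs no code change.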

    opened by nyck33 3
  • Adopting a set of "supported" python versions

    Right now the project doesn't have any set of explicitly supported python versions. NEP 29 provides an example of how this can be done:

    All minor versions of Python released 42 months prior to the project, and at minimum the two latest minor versions.

    Minimum Python ... version support should be adjusted upward on [a] major and minor release, but never on a patch release.

    Because PEP 602 normalizes the release schedule of Python versions, this language also makes it possible to forecast which Python versions will be supported and, to some degree, the resources required to maintain the project.

    There are at least two areas this practically impacts:

    • Support for version-specific issues. Having a specified set of supported versions allows version-specific issues to be declared in or out of scope and prioritized appropriately.
    • Binary distributions are currently made available on PyPI and the nvidia channel of conda-forge; a supported set bounds which versions of Python those binaries target.
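
The NEP 29 window quoted above can be sketched as a small calculation. This is an illustration only: `supported_versions` and `months_before` are hypothetical helpers, and the month arithmetic clamps the day of month for simplicity.

```python
from datetime import date

# Release dates of recent CPython minor versions.
PYTHON_RELEASES = {
    (3, 7): date(2018, 6, 27),
    (3, 8): date(2019, 10, 14),
    (3, 9): date(2020, 10, 5),
    (3, 10): date(2021, 10, 4),
    (3, 11): date(2022, 10, 24),
}


def months_before(day, months):
    """Step back a whole number of months, clamping the day to 28 for simplicity."""
    total = day.year * 12 + (day.month - 1) - months
    year, month_index = divmod(total, 12)
    return date(year, month_index + 1, min(day.day, 28))


def supported_versions(releases, on_date, window_months=42, minimum=2):
    """Versions released within the window, plus the newest `minimum` regardless."""
    cutoff = months_before(on_date, window_months)
    released = {v: d for v, d in releases.items() if d <= on_date}
    newest = sorted(released, key=released.get)[-minimum:]
    in_window = [v for v, d in released.items() if d >= cutoff]
    return sorted(set(in_window) | set(newest))
```

For a project release on `date(2023, 1, 1)`, the cutoff falls in mid-2019, so Python 3.7 is outside the window while 3.8 through 3.11 remain inside it.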
    opened by m3vaz 3
  • cudart.cudaSetDevice allocates memory on GPU other than target

    cuda-python 11.6.1, CUDA Toolkit 11.2, Ubuntu Linux

    If you run something like the following on a multi-GPU machine

    device_num = 5
    err, = cuda.cuInit(0)
    err, device = cuda.cuDeviceGet(device_num)
    err, cuda_context = cuda.cuCtxCreate(0, device)
    err, = cudart.cudaSetDevice(device)
    

    The call to cudart.cudaSetDevice correctly sets the device to 5, but it also allocates ~305 MB of memory on device 0 (or whichever device comes first in the list provided by CUDA_VISIBLE_DEVICES). I think this issue (possibly in the underlying CUDA C runtime?) may be the root of many downstream issues in libraries like TensorFlow and PyTorch, where a user selects a device but still gets large allocations on other devices.

    305 MB may not sound like a lot, but I'm running a program on an NVIDIA DGX with 16 GPUs and 64 worker processes, so 64 * 305 MB, roughly 19 GB, of unusable space is allocated on GPU 0, which crashes the program. I cannot simply set CUDA_VISIBLE_DEVICES to correct this, because the workers communicate with their parent process through shared GPU memory (via cuIPCMemHandle), and the parent process needs access to all GPUs. Additionally, the worker processes perform data augmentation on one GPU while writing output to another GPU with a different device ID.

    I am investigating a workaround that avoids calling cudart.cudaSetDevice altogether, but without that call I cannot use the pointer returned by cuda.cuMemAlloc to create a PyTorch tensor. When I do call cudart.cudaSetDevice, the pointer works properly.

    opened by QuiteAFoxtrot 3
  • Use python exceptions instead of `err, ... =`

    Congratulations on the GA release! 🥳

    I've been looking forward to the cuda bindings for a while, and was just looking through the docs.

    The overview notes an implementation of ASSERT_DRV, which already contains the caveat:

    In a future release, this may automatically raise exceptions using a Python object model.

    I'm not sure if that means that the errors are going to be subclasses of something like a CUDAError, or if that is to be interpreted some other way, but in any case, I was quite surprised about this choice of exception API

    Why not make the functions raise err by default? Right now, IIUC, every invocation would need to accept an extra err-return (and handle it with something like ASSERT_DRV). This seems like a really onerous task to achieve the default behaviour of "fail in case of something unexpected" (and actively choosing where to introduce try... except: handling to continue even if things fail).

    It seems like a bad trade-off for me (high verbosity, and easy to forget adding an ASSERT_DRV), but maybe I'm overlooking something?

    The reason I'm raising this right now is that this would be a pretty fundamental API change, and if there's any chance at all of making it (assuming it's not already "zero" after GA), it should happen ASAP.
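
To make the proposal concrete, here is a sketch of a wrapper that turns the `err, ... =` convention into exceptions. `CUDAError` and `checked` are hypothetical names, not part of the current bindings, and the sketch assumes the status value converts to `int` with 0 meaning success.

```python
class CUDAError(RuntimeError):
    """Hypothetical exception type; not part of the current bindings."""

    def __init__(self, status):
        super().__init__(f"CUDA call failed with status {status!r}")
        self.status = status


def checked(result):
    """Turn an `(err, *values)` tuple into a return value or a raised CUDAError."""
    err, *values = result
    # Assumption: the status converts to int and 0 means success.
    if int(err) != 0:
        raise CUDAError(err)
    if not values:
        return None
    return values[0] if len(values) == 1 else tuple(values)
```

With such a helper, a call site could read `device = checked(cuda.cuDeviceGet(0))`, and failures would surface through ordinary `try ... except` handling.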

    opened by h-vetinari 3
  • ERROR  Could not find a version that satisfies the requirement torch==1.12.1+cu113 (from versions: none)

    venv "C:\stable-diffusion-webui-master\stable-diffusion-webui-master\venv\Scripts\Python.exe"
    Python 3.11.0 (main, Oct 24 2022, 18:26:48) [MSC v.1933 64 bit (AMD64)]
    Commit hash:
    Installing torch and torchvision
    Traceback (most recent call last):
      File "C:\stable-diffusion-webui-master\stable-diffusion-webui-master\launch.py", line 227, in
        prepare_enviroment()
      File "C:\stable-diffusion-webui-master\stable-diffusion-webui-master\launch.py", line 150, in prepare_enviroment
        run(f'"{python}" -m {torch_command}', "Installing torch and torchvision", "Couldn't install torch")
      File "C:\stable-diffusion-webui-master\stable-diffusion-webui-master\launch.py", line 33, in run
        raise RuntimeError(message)
    RuntimeError: Couldn't install torch.
    Command: "C:\stable-diffusion-webui-master\stable-diffusion-webui-master\venv\Scripts\python.exe" -m pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 --extra-index-url https://download.pytorch.org/whl/cu113
    Error code: 1
    stdout: Looking in indexes: https://pypi.org/simple, https://download.pytorch.org/whl/cu113

    stderr: ERROR: Could not find a version that satisfies the requirement torch==1.12.1+cu113 (from versions: none)
    ERROR: No matching distribution found for torch==1.12.1+cu113

    Press any key to continue . . .

    opened by GreatHK 2
  • First base of 'CUkernelNodeAttrValue_v1' is not an extension type

    I'm trying to compile cuda-python in a fairly minimal conda environment (nothing installed but the requirements), with cuda-11.6 installed, and seeing several instances of the following sort of error:

          Error compiling Cython file:
          ------------------------------------------------------------
          ...
                  Get memory address of class instance
    
              """
              pass
    
          cdef class CUkernelNodeAttrValue_v1(CUlaunchAttributeValue_union):
                                             ^
          ------------------------------------------------------------
    
          cuda/cuda.pxd:2637:36: First base of 'CUkernelNodeAttrValue_v1' is not an extension type
    

    I do have other cuda versions installed alongside 11.6 but judging from the output of Parsing headers in "/usr/local/cuda-11.6/include" it seems like it's probably finding the right version? Any advice on how to get past this, or debug it? Thanks!

    opened by bertmaher 2
  • Remove duplicate code in vectorAddMMAP example

    The code to determine granularity is duplicated: immediately after performing that check, the same code appears again. This change removes the second copy.

    opened by pentschev 2
  • _ZSt28__throw_bad_array_new_lengthv

    ~/cuda-python$ pip install -e .
    Obtaining file:///home/vinuj/cuda-python
    Requirement already satisfied: cython in /home/vinuj/anaconda3/lib/python3.9/site-packages (from cuda-python==11.7.1) (0.29.28)
    Installing collected packages: cuda-python
      Attempting uninstall: cuda-python
        Found existing installation: cuda-python 11.7.1
        Uninstalling cuda-python-11.7.1:
          Successfully uninstalled cuda-python-11.7.1
      Running setup.py develop for cuda-python
    
        from cuda import cuda, cudart
    ImportError: /home/vinuj/cuda-python/cuda/cuda.cpython-39-x86_64-linux-gnu.so: undefined symbol: _ZSt28__throw_bad_array_new_lengthv
    
    
    opened by vinutah 2
  • Dropping Python 3.7

    We're considering dropping support for Python 3.7 in the next release. Per NEP 29, Python 3.7 was scheduled to be dropped almost a year ago, and many associated libraries have already dropped it.

    Let us know if there are concerns about dropping Python 3.7 in the next release. Thanks!

    opened by vzhurba01 0
  • No module named 'cuda._lib'; 'cuda' is not a package

    After following the conda installation instructions for cuda-python, I tried to run

    from cuda import cuda, nvrtc
    

    as in the example, from the PyCharm Python console, but it raises an error:

    Traceback (most recent call last):
      File "D:\Anaconda\envs\hierot\lib\code.py", line 90, in runcode
        exec(code, self.locals)
      File "<input>", line 1, in <module>
      File "D:\PyCharm Community Edition 2022.1.3\plugins\python-ce\helpers\pydev\_pydev_bundle\pydev_import_hook.py", line 21, in do_import
        module = self._system_import(name, *args, **kwargs)
      File "cuda\cuda.pyx", line 1, in init cuda.cuda
        # Copyright 2021-2022 NVIDIA Corporation.  All rights reserved.
      File "D:\PyCharm Community Edition 2022.1.3\plugins\python-ce\helpers\pydev\_pydev_bundle\pydev_import_hook.py", line 21, in do_import
        module = self._system_import(name, *args, **kwargs)
    ModuleNotFoundError: No module named 'cuda._lib'; 'cuda' is not a package
    

    But the code above can be successfully run in the terminal

    (hierot) D:\Projects\SimPlatform>python
    Python 3.9.13 (main, Aug 25 2022, 23:51:50) [MSC v.1916 64 bit (AMD64)] :: Anaconda, Inc. on win32
    Type "help", "copyright", "credits" or "license" for more information.
    >>> from cuda import cuda, nvrtc
    >>>
    

    Please help me with the problem, thanks in advance. Further information provided on request.

    I searched with

    ModuleNotFoundError: No module named 'xxx'
    

    The suggested solutions are to configure the correct Python interpreter, but I believe my interpreter is already properly configured.

    And search with

    No module named 'xxx'; 'yyy' is not a package
    

    Some say the cause is that the name cuda is shadowed by a package named cuda, and I think that might be the problem here. Please check this.
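
To check the shadowing hypothesis, one could compare what each environment resolves `cuda` to. The sketch below uses only the standard library; `describe_module` is a hypothetical helper name.

```python
import importlib.util


def describe_module(name):
    """Report where `name` would be imported from and whether it is a package."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    return {
        "origin": spec.origin,
        # Only packages have submodule search locations.
        "is_package": spec.submodule_search_locations is not None,
    }
```

Running `describe_module("cuda")` in both the PyCharm console and the terminal would be the test. If the `origin` differs between the two, or `is_package` is false in PyCharm, that would support the hypothesis.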

    opened by HIEROT 0
  • Windows: ModuleNotFoundError: No module named 'win32api'

    Installing on Windows:

    python -m pip install cuda-python

    Then from python:

    from cuda import cuda

    Fails with

        File "cuda\cuda.pyx", line 1, in init cuda.cuda
    
        File "cuda\ccuda.pyx", line 1, in init cuda.ccuda
    
        File "cuda\_cuda\ccuda.pyx", line 8, in init cuda._cuda.ccuda
    
      ModuleNotFoundError: No module named 'win32api'
    

    I can fix this by installing pypiwin32 manually. But I think it should be listed in requirements.txt if platform_system is Windows.

    Thanks
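
The conditional dependency suggested above can be expressed with a PEP 508 environment marker. The fragment below sketches how it might look as a Python list for `setup.py`. The name `INSTALL_REQUIRES` is illustrative and not taken from the project's build files.

```python
# Illustrative dependency list; the marker after ';' restricts the entry to Windows.
INSTALL_REQUIRES = [
    "cython",
    'pypiwin32; platform_system == "Windows"',
]
```

The same marker syntax is accepted on a line in `requirements.txt`.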

    opened by ilyasher 0
  • more inference time in cuda env compared to cpu (occured only for a layer)

    Dear sir/madam,

    When I run inference with a deep learning model (SlowFast), my Python program seems to take more time in the CUDA environment than on the CPU. This does not affect the whole model: one specific layer takes more time on CUDA than on the CPU. I am confused and hope someone can help. Here are the details.

    The layer in question is the "slowway-conv1" layer, shown in the picture below of the SlowFast model structure.

    [image]

    My confusing results follow, first for CUDA and then for the CPU.

    [image] [image]

    In the CUDA environment, the processing time of "conv1" (0.97s) accounts for most of the processing time of the whole model (1.04s), while in the CPU environment the processing time of "conv1" (0.07s) accounts for only a very small share of the whole model's processing time (4.43s). I reckon the CPU proportion is reasonable considering the computational cost.

    Is my method of time measurement mistaken? I used the following code to measure the time cost.

    [image] [image]

    If the confusing result is my fault, please kindly point it out, or give me some ideas to help solve this problem. Thank you very much!

    Yours, Koala
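
One common pitfall with GPU timing is that GPU work is launched asynchronously and the first call often includes one-time initialization. This may or may not apply here, since the measurement code is not visible. The sketch below is a generic harness with warm-up runs and an optional synchronization hook. `time_call` and its parameters are hypothetical names; with PyTorch, the hook could be `torch.cuda.synchronize`.

```python
import time


def time_call(fn, sync=None, warmup=1, repeats=5):
    """Time `fn` after warm-up runs, waiting for pending work before each clock read."""
    def wait():
        if sync is not None:
            sync()  # e.g. block until queued GPU work has finished

    for _ in range(warmup):
        fn()  # absorb one-time setup costs outside the measurement
    wait()
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        wait()
        timings.append(time.perf_counter() - start)
    return timings
```

If the "conv1" time drops sharply once warm-up and synchronization are in place, the original figure was probably dominated by initialization or by work queued earlier.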

    opened by koalaaaaaaaaa 0
  • cuda.cudart.cudaRuntimeGetVersion() hard-codes the runtime version, rather than querying the runtime

    The current implementation of cuda.cudart.cudaRuntimeGetVersion() hard-codes the runtime version, rather than querying the runtime for its version. This results in incorrect runtime versions if the runtime version is different from the version of cuda-python.

    https://github.com/NVIDIA/cuda-python/blob/746b773c91e1ede708fe9a584b8cdb1c0f32b51d/cuda/_lib/ccudart/ccudart.pyx#L79-L82

    https://github.com/NVIDIA/cuda-python/blob/746b773c91e1ede708fe9a584b8cdb1c0f32b51d/cuda/_lib/ccudart/utils.pyx#L37

    Additional context

    A workaround used in https://github.com/rapidsai/rmm/pull/946 is to use numba's API for this instead:

    import numba.cuda
    
    def cudaRuntimeGetVersion():
        major, minor = numba.cuda.runtime.get_version()
        return major * 1000 + minor * 10
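
The workaround encodes the version as `major * 1000 + minor * 10`. As a small illustration, that encoding can be inverted as follows; `decode_cuda_version` is a hypothetical helper name.

```python
def decode_cuda_version(encoded):
    """Split an integer such as 11040 into (major, minor)."""
    major, remainder = divmod(encoded, 1000)
    return major, remainder // 10
```

For example, 11040 decodes to major 11 and minor 4.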
    
    opened by bdice 6
Releases(v12.0.0)