ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator

Overview

ONNX Runtime is a cross-platform inference and training machine-learning accelerator.

ONNX Runtime inference can enable faster customer experiences and lower costs, supporting models from deep learning frameworks such as PyTorch and TensorFlow/Keras as well as classical machine learning libraries such as scikit-learn, LightGBM, XGBoost, etc. ONNX Runtime is compatible with different hardware, drivers, and operating systems, and provides optimal performance by leveraging hardware accelerators where applicable alongside graph optimizations and transforms. Learn more →

ONNX Runtime training can accelerate the model training time on multi-node NVIDIA GPUs for transformer models with a one-line addition for existing PyTorch training scripts. Learn more →

Get Started

General Information: onnxruntime.ai

Usage documentation and tutorials: onnxruntime.ai/docs

Companion sample repositories:

Build Pipeline Status

(Build status badges for the Windows, Linux, Mac, Android, iOS, and WebAssembly CI pipelines, covering CPU, GPU, and other execution providers. Linux additionally publishes several execution-provider-specific pipelines.)

Data/Telemetry

Windows distributions of this project may collect usage data and send it to Microsoft to help improve our products and services. See the privacy statement for more details.

Contributions and Feedback

We welcome contributions! Please see the contribution guidelines.

For feature requests or bug reports, please file a GitHub Issue.

For general discussion or questions, please use GitHub Discussions.

Code of Conduct

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact [email protected] with any additional questions or comments.

License

This project is licensed under the MIT License.

Comments
  • Openvino ep 2021.4 v3.3

    Openvino ep 2021.4 v3.3

    Changes enabled in the OpenVINO EP for IO buffer optimization and the Auto plugin feature

    Motivation and Context

    • Change was required to enable IO buffer optimization
    • Change was required to enable the Auto plugin and to fix the Multi and Hetero flows
    • Adds an ONNX Runtime API to get the device location for an ORT value tensor
    opened by sfatimar 79
  • Java API for onnxruntime

    Java API for onnxruntime

    Description: This pull request provides a Java 8 API using JNI. It has unit tests ported from the v0.5.0 release of the C# API; I'll work on porting the new tests from the master branch over the next few weeks. I assume there will be some design & naming discussion on this PR, so we can have that while I work on the unit tests.

    Currently it builds using a separate gradle project which I've tested on Mac & Linux. The build process involves running gradle clean build -x test; gradle build as the combination of a JNI and Java project in Gradle 5 isn't properly supported. I could do with some help integrating it into the CMake build system, but I've not used CMake much before. Integrating it into CMake will make it simpler to put in the appropriate provider compilation flags and fix the oddities in the build (as CMake has all the information necessary).

    opened by Craigacp 75
  • Support CUDA Graph

    Support CUDA Graph

    Description

    This PR adds support for CUDA Graphs. This feature can significantly reduce the CPU overhead of calling CUDA APIs by submitting the entire graph to the GPU with a single call to cudaGraphLaunch.

    Motivation and Context

    • Why is this change required? What problem does it solve? This feature is very helpful for reducing model latency, especially for online inference, where the above CPU overhead is a bottleneck. For example, it reduces the 95th-percentile latency of a transformer-based online inference model (with 148 million parameters) from 4.3ms to 2.1ms.
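    The 95th-percentile figure quoted above can be computed from raw per-request timings. A minimal sketch with the standard library (the sample data here is hypothetical, purely for illustration):

```python
import statistics

def p95_latency_ms(samples_ms):
    """Return the 95th-percentile latency from per-request timings (ms).

    statistics.quantiles(..., n=100) returns the 1st..99th percentile
    cut points; index 94 is the 95th percentile.
    """
    return statistics.quantiles(samples_ms, n=100)[94]

# Hypothetical timings: most requests are fast, a slow tail dominates p95.
timings = [2.0] * 95 + [4.3] * 5
print(p95_latency_ms(timings))
```

    Tail percentiles like p95 are the usual metric for online serving because a small fraction of slow requests (here, the 4.3ms tail) dominates the user-visible latency even when the median is low.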
    opened by feihugis 72
  • Resolve Optim Params Issues

    Resolve Optim Params Issues

    • Includes a test of Optimizer Parameter Groups for the ONNX BERT Model (3 variations)
    • Resolves the issue of not passing default hyperparameters for parameters not in a group
    • Resolves the issue of sending 'lambda_coef' instead of 'lambda' to the backend
    • Resolves the issue of sending lr to the backend as a hyperparameter
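    The defaulting behavior the fix describes — parameters not listed in any group fall back to the optimizer-wide defaults — can be sketched generically. This is a hypothetical illustration of the merge logic, not the ORT training API:

```python
def resolve_param_groups(all_params, groups, defaults):
    """Give every parameter a complete hyperparameter dict.

    Parameters listed in a group use that group's overrides layered on top
    of the defaults; parameters in no group get the defaults unchanged.
    """
    resolved = {}
    for group in groups:
        overrides = {k: v for k, v in group.items() if k != "params"}
        for name in group["params"]:
            resolved[name] = {**defaults, **overrides}
    for name in all_params:
        # Fix described above: ungrouped params still get the defaults.
        resolved.setdefault(name, dict(defaults))
    return resolved

defaults = {"lr": 1e-3, "lambda": 0.01}  # note the key 'lambda', not 'lambda_coef'
groups = [{"params": ["bert.embeddings.weight"], "lambda": 0.0}]
out = resolve_param_groups(
    ["bert.embeddings.weight", "classifier.weight"], groups, defaults)
```

    The hyperparameter names and parameter names here are invented for the example; the point is only the merge order (group overrides win over defaults, and ungrouped parameters are never left without values).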
    opened by rayankrish 68
  • Upgrade GIST memory compression nodes, kernels, optimizer rule, and cli

    Upgrade GIST memory compression nodes, kernels, optimizer rule, and cli

    Description: Extend Gist memory compression to support additional compression formats, the new priority execution order, and other upgrades:

    • New Feature: GistPack1 compression. It compresses from float32/bool to 1 bit. It is used for lossless compression for dropout and relu nodes.
    • New Feature: GistPack8 compression. It compresses from 32 bits/16 bits to 8 bits. It is used for lossy compression for any operator.
    • New Feature: GistPackMsfp15 compression. It compresses 8 (or tile size) values each 32 bits wide to 8 (or tile size) values each 7 bits wide (sign and mantissa) and a single 8 bits shared exponent. It is used for lossy compression for any operator.
    • New Feature: GistPack16 compression. It compresses from 32 bits to 16 bits. It is used for lossy compression for any operator.
    • We also upgraded the Gist rule to support different operators. We created a generic Gist rule that works as long as a pattern map is provided. The pattern map uses the target operator as the key and the destination operator as the value (e.g. PATTERN_MAP[Softmax] = {“SoftmaxGrad”}). The rule is operator-agnostic, which makes Gist robust enough to support new operators in the future.
    • New test for Priority execution order for nested compression.
    • Gist upgrade to support priority execution order to trigger encoder (compression) and decoder (decompression) accordingly.
    • Gist CLI: --use_gist, --op <which operator is being targeted, e.g. Softmax is op 1> --gist_compr <GistPack1|GistPack8|GistPack16|GistPackMsfp15>
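    As a rough illustration of the lossy 32-bit-to-8-bit idea behind GistPack8, here is a simplified linear quantizer in plain Python. This is a sketch of the general technique only, not the actual Gist kernels (which operate on tensors in the execution graph):

```python
def pack8(values):
    """Lossily compress float values to 8-bit codes via linear quantization."""
    lo, hi = min(values), max(values)
    scale = (hi - lo) / 255 if hi > lo else 1.0
    codes = [round((v - lo) / scale) for v in values]  # each code fits in 0..255
    return codes, lo, scale

def unpack8(codes, lo, scale):
    """Decompress 8-bit codes back to approximate float values."""
    return [lo + c * scale for c in codes]

vals = [0.0, 0.5, 1.0, 2.0]
codes, lo, scale = pack8(vals)
approx = unpack8(codes, lo, scale)
# Reconstruction error is bounded by half a quantization step (scale / 2).
```

    The compression is lossy: values are snapped to one of 256 levels spanning the observed range, trading precision for a 4x reduction from float32. The lossless formats (e.g. GistPack1 for dropout/relu masks) instead exploit the fact that the stashed activations only carry one bit of information.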

    Motivation and Context

    • Why is this change required? What problem does it solve? It fixes and improves the Gist optimizer rule by changing Gist operators to handle 1 input and 1 output without needing an early encoder input or a late decoder output. It also adds new compression formats (Pack1, Pack8).
    training 
    opened by fninaparavecino 61
  • Multi-stream executor

    Multi-stream executor

    Description: This PR includes the following work:

    1. provide stream and related synchronization abstractions in onnxruntime.
    2. enhance onnxruntime's execution planner / executor / memory arena to support executing multiple streams in parallel.
    3. deprecate the parallel executor for CPU.
    4. deprecate the Fence mechanism.
    5. update the CUDA / TensorRT EPs to support the stream mechanism, supporting running different requests in different CUDA streams.

    Motivation and Context

    • Why is this change required? Currently, the execution plan is just a linear list of execution primitives that ORT executes step by step. For any given graph, ORT serializes it to a fixed execution order. This sequential execution design simplifies most scenarios, but it has the following limitations:
    1. it is difficult to enable inter-node parallelization; we have a half-baked parallel executor, but it is very difficult to make it work with GPUs.
    2. the Fence mechanism works for the single GPU stream + CPU thread case, but when extended to multiple streams, it is difficult to manage cross-stream synchronization.
    3. our CUDA EP relies on the BFCArena to make memory management work with asynchronous GPU kernels, but the current BFCArena is not stream-aware, so it does not behave correctly when run with multiple streams.

    This PR enhances our existing execution plan and executor to support multi-stream execution. We use a unified algorithm to manage both the single-stream and multi-stream scenarios. This PR mainly focuses on the infrastructure support for multi-stream execution; that is, given a valid stream assignment, onnxruntime can execute it correctly. How to generate a good stream assignment for a given model will come in a future PR.
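    The stream-assignment idea above can be modeled with plain threads: each "stream" executes its assigned nodes in order, and cross-stream edges become wait/notify synchronization points. This is a toy model of the concept (using threading.Event in place of CUDA events), not the ORT executor:

```python
import threading

def run_streams(streams, deps, results):
    """Execute a multi-stream assignment.

    streams: {stream_id: [(node_name, fn), ...]} -- nodes run in order per stream.
    deps:    {node_name: [upstream node_names]}  -- cross-stream edges to wait on.
    results: dict filled with each node's output.
    """
    done = {node: threading.Event() for nodes in streams.values() for node, _ in nodes}

    def worker(nodes):
        for node, fn in nodes:
            for dep in deps.get(node, []):
                done[dep].wait()      # block until the upstream node finishes
            results[node] = fn(results)
            done[node].set()          # signal downstream nodes, possibly on other streams

    threads = [threading.Thread(target=worker, args=(nodes,))
               for nodes in streams.values()]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

# Node "c" on stream 0 must wait for "b", which runs concurrently on stream 1.
results = {}
streams = {
    0: [("a", lambda r: 1), ("c", lambda r: r["a"] + r["b"])],
    1: [("b", lambda r: 2)],
}
run_streams(streams, {"c": ["a", "b"]}, results)
```

    The correctness condition mirrors the PR's framing: as long as the given stream assignment plus synchronization edges respect the graph's data dependencies, per-stream in-order execution yields a correct result regardless of how nodes are partitioned across streams.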

    opened by souptc 60
  • Amdmigraphx fix build error

    Amdmigraphx fix build error

    Description: Fixes a build error related to EP API changes.

    Motivation and Context

    1. The ORT EP infrastructure was changed to use a shared library, and the EP APIs changed, so AMD MIGraphX needs corresponding changes to work as an EP.
    2. Added a few operators that AMDMIGraphX implemented recently.
    • Why is this change required? What problem does it solve? See above explanation

    • If it fixes an open issue, please link to the issue here. No

    opened by scxiao 60
  • Python MacOS arm64 release binaries

    Python MacOS arm64 release binaries

    Describe the bug

    ONNX Runtime does not install using pip on M1.

    System information

    • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): macOS 11.2.1
    • ONNX Runtime installed from (source or binary): pip
    • Python version: 3.9.1

    To Reproduce

    ~: uname -v
    Darwin Kernel Version 20.3.0: Thu Jan 21 00:06:51 PST 2021; root:xnu-7195.81.3~1/RELEASE_ARM64_T8101
    ~: which python3
    /opt/homebrew/bin/python3
    ~: which pip
    /opt/homebrew/bin/pip
    ~: python3 --version
    Python 3.9.1
    ~: pip install onnxruntime
    ERROR: Could not find a version that satisfies the requirement onnxruntime
    ERROR: No matching distribution found for onnxruntime
    
    feature request 
    opened by lutzroeder 59
  • Bump numpy from 1.21.0 to 1.22.0 in /tools/ci_build/github/linux/docker/scripts/training/ortmodule/stage1/requirements_torch1.11.0_rocm4.3.1

    Bump numpy from 1.21.0 to 1.22.0 in /tools/ci_build/github/linux/docker/scripts/training/ortmodule/stage1/requirements_torch1.11.0_rocm4.3.1

    Bumps numpy from 1.21.0 to 1.22.0.

    Release notes

    Sourced from numpy's releases.

    v1.22.0

    NumPy 1.22.0 Release Notes

    NumPy 1.22.0 is a big release featuring the work of 153 contributors spread over 609 pull requests. There have been many improvements, highlights are:

    • Annotations of the main namespace are essentially complete. Upstream is a moving target, so there will likely be further improvements, but the major work is done. This is probably the most user visible enhancement in this release.
    • A preliminary version of the proposed Array-API is provided. This is a step in creating a standard collection of functions that can be used across applications such as CuPy and JAX.
    • NumPy now has a DLPack backend. DLPack provides a common interchange format for array (tensor) data.
    • New methods for quantile, percentile, and related functions. The new methods provide a complete set of the methods commonly found in the literature.
    • A new configurable allocator for use by downstream projects.

    These are in addition to the ongoing work to provide SIMD support for commonly used functions, improvements to F2PY, and better documentation.

    The Python versions supported in this release are 3.8-3.10, Python 3.7 has been dropped. Note that 32 bit wheels are only provided for Python 3.8 and 3.9 on Windows, all other wheels are 64 bits on account of Ubuntu, Fedora, and other Linux distributions dropping 32 bit support. All 64 bit wheels are also linked with 64 bit integer OpenBLAS, which should fix the occasional problems encountered by folks using truly huge arrays.

    Expired deprecations

    Deprecated numeric style dtype strings have been removed

    Using the strings "Bytes0", "Datetime64", "Str0", "Uint32", and "Uint64" as a dtype will now raise a TypeError.

    (gh-19539)

    Expired deprecations for loads, ndfromtxt, and mafromtxt in npyio

    numpy.loads was deprecated in v1.15, with the recommendation that users use pickle.loads instead. ndfromtxt and mafromtxt were both deprecated in v1.17 - users should use numpy.genfromtxt instead with the appropriate value for the usemask parameter.

    (gh-19615)

    ... (truncated)

    Commits

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    • @dependabot use these labels will set the current labels as the default for future PRs for this repo and language
    • @dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language
    • @dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language
    • @dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

    You can disable automated security fix PRs for this repo from the Security Alerts page.

    api 
    opened by dependabot[bot] 55
  • [Java] Adds support for DNNL, OpenVINO, TensorRT shared providers and refactors the CUDA shared provider loader

    [Java] Adds support for DNNL, OpenVINO, TensorRT shared providers and refactors the CUDA shared provider loader

    Description:

    Refactors the native library loading in Java to allow CUDA to be loaded on demand, fixing #7044. Then expands the shared provider library loading to DNNL, OpenVINO, TensorRT, fixing #6553.

    Added a flag to the native library loading to allow users to supply a directory which contains all the native libraries, fixing #8003. This is also the only way to make the shared library providers load from a different place than the jar, as the individual library path specification conflicts with the way that the ONNX Runtime native code loads the shared library providers.

    I also slightly refactored the Java cmake bits, and added the --console=plain flag to the gradle executions to stop gradle writing over cmake's output.

    Motivation and Context

    • Why is this change required? What problem does it solve? Re-enables DNNL, OpenVINO and TensorRT in Java by allowing them to be packaged in the jar and dynamically loaded in the same way CUDA is.
    • If it fixes an open issue, please link to the issue here. Fixes #6553. Fixes #7044. Fixes #8003.
    opened by Craigacp 54
  • Jetson Xavier - building from source

    Jetson Xavier - building from source

    1. I tried the solution proposed here:

       ../build.sh --config Release --update --build --build_wheel --use_tensorrt --cuda_home /usr/local/cuda --cudnn_home /usr/lib/aarch64-linux-gnu --tensorrt_home /usr/lib/aarch64-linux-gnu

    The build syncs and updates the git submodules, then generates the CMake build tree with /usr/local/bin/cmake (CUDA and TensorRT enabled, -Donnxruntime_TENSORRT_HOME=/usr/lib/aarch64-linux-gnu, -DCMAKE_BUILD_TYPE=Release). CMake identifies the CUDA compiler as NVIDIA 10.0.326 and then fails its compiler check:

       -- Check for working CUDA compiler: /usr/local/cuda-10.0/bin/nvcc - broken
       CMake Error at /usr/local/share/cmake-3.17/Modules/CMakeTestCUDACompiler.cmake:46 (message): The CUDA compiler

      "/usr/local/cuda-10.0/bin/nvcc"

    is not able to compile a simple test program.

    It fails with the following output:

    Change Dir: /code/onnxruntime/build/Linux/Release/CMakeFiles/CMakeTmp
    
    Run Build Command(s):/usr/bin/make cmTC_bb43d/fast && /usr/bin/make -f CMakeFiles/cmTC_bb43d.dir/build.make CMakeFiles/cmTC_bb43d.dir/build
    make[1]: Entering directory '/code/onnxruntime/build/Linux/Release/CMakeFiles/CMakeTmp'
    Building CUDA object CMakeFiles/cmTC_bb43d.dir/main.cu.o
    /usr/local/cuda-10.0/bin/nvcc    -cudart shared  -Xcompiler=-fPIE   -x cu -c /code/onnxruntime/build/Linux/Release/CMakeFiles/CMakeTmp/main.cu -o CMakeFiles/cmTC_bb43d.dir/main.cu.o
    Linking CUDA executable cmTC_bb43d
    /usr/local/bin/cmake -E cmake_link_script CMakeFiles/cmTC_bb43d.dir/link.txt --verbose=1
    /usr/bin/g++   CMakeFiles/cmTC_bb43d.dir/main.cu.o -o cmTC_bb43d  -lcudadevrt -lcudart_static  -L"/usr/local/cuda-10.0/targets/aarch64-linux/lib/stubs" -L"/usr/local/cuda-10.0/targets/aarch64-linux/lib" -lcudadevrt -lcudart
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::globalState::initializeDriverEntrypoints()':
    :(.text+0x23488): undefined reference to `dlsym'
    :(.text+0x234b0): undefined reference to `dlsym'
    :(.text+0x234d4): undefined reference to `dlsym'
    :(.text+0x234f8): undefined reference to `dlsym'
    :(.text+0x2351c): undefined reference to `dlsym'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o)::(.text+0x23540): more undefined references to `dlsym' follow
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::globalState::loadDriverInternal()':
    :(.text+0x288cc): undefined reference to `dlopen'
    :(.text+0x28904): undefined reference to `dlclose'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::__loadDriverInternalUtil()':
    :(.text+0x289e0): undefined reference to `dlopen'
    :(.text+0x28a14): undefined reference to `dlclose'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::globalState::initializeDriverInternal()':
    :(.text+0x2b664): undefined reference to `dlclose'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosInit()':
    :(.text+0x5c7bc): undefined reference to `dlerror'
    :(.text+0x5c7c8): undefined reference to `dlopen'
    :(.text+0x5c7dc): undefined reference to `dlsym'
    :(.text+0x5c7e4): undefined reference to `dlerror'
    :(.text+0x5c7f4): undefined reference to `dlclose'
    :(.text+0x5c838): undefined reference to `dlerror'
    :(.text+0x5c844): undefined reference to `dlopen'
    :(.text+0x5c858): undefined reference to `dlsym'
    :(.text+0x5c860): undefined reference to `dlerror'
    :(.text+0x5c870): undefined reference to `dlclose'
    :(.text+0x5c8b4): undefined reference to `dlerror'
    :(.text+0x5c8c0): undefined reference to `dlopen'
    :(.text+0x5c8d4): undefined reference to `dlsym'
    :(.text+0x5c8dc): undefined reference to `dlerror'
    :(.text+0x5c8ec): undefined reference to `dlclose'
    :(.text+0x5c930): undefined reference to `dlerror'
    :(.text+0x5c93c): undefined reference to `dlopen'
    :(.text+0x5c950): undefined reference to `dlsym'
    :(.text+0x5c958): undefined reference to `dlerror'
    :(.text+0x5c968): undefined reference to `dlclose'
    :(.text+0x5c9a0): undefined reference to `dlerror'
    :(.text+0x5c9ac): undefined reference to `dlopen'
    :(.text+0x5c9c0): undefined reference to `dlsym'
    :(.text+0x5c9c8): undefined reference to `dlerror'
    :(.text+0x5c9d8): undefined reference to `dlclose'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosSemaphoreCreate(sem_t*, int)':
    :(.text+0x5d910): undefined reference to `sem_init'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosSemaphoreDestroy(sem_t*)':
    :(.text+0x5d92c): undefined reference to `sem_destroy'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosSemaphoreWait(sem_t*, unsigned int)':
    :(.text+0x5da10): undefined reference to `sem_timedwait'
    :(.text+0x5da48): undefined reference to `sem_wait'
    :(.text+0x5da60): undefined reference to `sem_trywait'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosSemaphoreSignal(sem_t*)':
    :(.text+0x5dab0): undefined reference to `sem_post'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosVirtualReserveInRangeBug1778973WARInit()':
    :(.text+0x5f448): undefined reference to `pthread_mutexattr_init'
    :(.text+0x5f464): undefined reference to `pthread_mutexattr_settype'
    :(.text+0x5f474): undefined reference to `pthread_mutexattr_setpshared'
    :(.text+0x5f484): undefined reference to `pthread_mutexattr_setprotocol'
    :(.text+0x5f4a4): undefined reference to `pthread_mutexattr_destroy'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosPosixInit()':
    :(.text+0x5f4f0): undefined reference to `dlerror'
    :(.text+0x5f4fc): undefined reference to `dlopen'
    :(.text+0x5f510): undefined reference to `dlsym'
    :(.text+0x5f518): undefined reference to `dlerror'
    :(.text+0x5f528): undefined reference to `dlclose'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosVirtualReserveInRange(unsigned long, void*, void*, unsigned long)':
    :(.text+0x5f768): undefined reference to `pthread_once'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosLoadLibrary(char const*)':
    :(.text+0x5fc8c): undefined reference to `dlerror'
    :(.text+0x5fca0): undefined reference to `dlopen'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosLoadLibraryUnsafe(char const*)':
    :(.text+0x5fcb4): undefined reference to `dlerror'
    :(.text+0x5fcc8): undefined reference to `dlopen'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosFreeLibrary(void*)':
    :(.text+0x5fcd4): undefined reference to `dlclose'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosGetProcAddress(void*, char const*)':
    :(.text+0x5fce8): undefined reference to `dlsym'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosTlsAlloc(void (*)(void*))':
    :(.text+0x5fdec): undefined reference to `pthread_key_create'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosTlsFree(unsigned int)':
    :(.text+0x5fe10): undefined reference to `pthread_key_delete'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosTlsGetValue(unsigned int)':
    :(.text+0x5fe18): undefined reference to `pthread_getspecific'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosTlsSetValue(unsigned int, void*)':
    :(.text+0x5fe28): undefined reference to `pthread_setspecific'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosInitializeCriticalSectionWithSharedFlag(pthread_mutex_t*, int)':
    :(.text+0x5fef4): undefined reference to `pthread_mutexattr_init'
    :(.text+0x5ff14): undefined reference to `pthread_mutexattr_settype'
    :(.text+0x5ff24): undefined reference to `pthread_mutexattr_setpshared'
    :(.text+0x5ff34): undefined reference to `pthread_mutexattr_setprotocol'
    :(.text+0x5ff50): undefined reference to `pthread_mutexattr_destroy'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosInitializeCriticalSection(pthread_mutex_t*)':
    :(.text+0x5ff70): undefined reference to `pthread_mutexattr_init'
    :(.text+0x5ff8c): undefined reference to `pthread_mutexattr_settype'
    :(.text+0x5ff9c): undefined reference to `pthread_mutexattr_setpshared'
    :(.text+0x5ffac): undefined reference to `pthread_mutexattr_setprotocol'
    :(.text+0x5ffc8): undefined reference to `pthread_mutexattr_destroy'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosInitializeCriticalSectionShared(pthread_mutex_t*)':
    :(.text+0x5ffe8): undefined reference to `pthread_mutexattr_init'
    :(.text+0x60004): undefined reference to `pthread_mutexattr_settype'
    :(.text+0x60014): undefined reference to `pthread_mutexattr_setpshared'
    :(.text+0x60024): undefined reference to `pthread_mutexattr_setprotocol'
    :(.text+0x60040): undefined reference to `pthread_mutexattr_destroy'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosTryEnterCriticalSection(pthread_mutex_t*)':
    :(.text+0x60058): undefined reference to `pthread_mutex_trylock'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosInitRWLockEx(void**, void*, unsigned long)':
    :(.text+0x600b4): undefined reference to `pthread_rwlockattr_init'
    :(.text+0x600c4): undefined reference to `pthread_rwlockattr_setpshared'
    :(.text+0x600d4): undefined reference to `pthread_rwlock_init'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosInitRWLock(void**)':
    :(.text+0x60114): undefined reference to `pthread_rwlockattr_init'
    :(.text+0x60144): undefined reference to `pthread_rwlockattr_setpshared'
    :(.text+0x60154): undefined reference to `pthread_rwlock_init'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosAcquireReaderLock(void**)':
    :(.text+0x60164): undefined reference to `pthread_rwlock_rdlock'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosAcquireWriterLock(void**)':
    :(.text+0x6016c): undefined reference to `pthread_rwlock_wrlock'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosTryAcquireReaderLock(void**)':
    :(.text+0x6017c): undefined reference to `pthread_rwlock_tryrdlock'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosTryAcquireWriterLock(void**)':
    :(.text+0x601a4): undefined reference to `pthread_rwlock_trywrlock'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosReleaseReaderLock(void**)':
    :(.text+0x601c4): undefined reference to `pthread_rwlock_unlock'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosReleaseWriterLock(void**)':
    :(.text+0x601cc): undefined reference to `pthread_rwlock_unlock'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosDestroyRWLockEx(void**)':
    :(.text+0x601d4): undefined reference to `pthread_rwlock_destroy'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosDestroyRWLock(void**)':
    :(.text+0x601ec): undefined reference to `pthread_rwlock_destroy'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosOnce(int*, void (*)())':
    :(.text+0x60210): undefined reference to `pthread_once'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosCondCreateWithSharedFlag(pthread_cond_t*, int)':
    :(.text+0x60250): undefined reference to `pthread_condattr_setpshared'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosCondCreate(pthread_cond_t*)':
    :(.text+0x602b0): undefined reference to `pthread_condattr_setpshared'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosCondCreateShared(pthread_cond_t*)':
    :(.text+0x60310): undefined reference to `pthread_condattr_setpshared'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosThreadCreateWithName(cudart::CUOSthread_st**, int (*)(void*), void*, char const*)':
    :(.text+0x60564): undefined reference to `pthread_create'
    :(.text+0x60578): undefined reference to `pthread_setname_np'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosThreadCreate(cudart::CUOSthread_st**, int (*)(void*), void*)':
    :(.text+0x60640): undefined reference to `pthread_create'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosThreadJoin(cudart::CUOSthread_st*, int*)':
    :(.text+0x606a8): undefined reference to `pthread_join'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosThreadDetach(cudart::CUOSthread_st*)':
    :(.text+0x60708): undefined reference to `pthread_detach'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosHasThreadExited(cudart::CUOSthread_st*)':
    :(.text+0x60758): undefined reference to `pthread_kill'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosShmCreateNamedEx(void*, char const*, unsigned long, cudart::cuosShmInfoEx_st**)':
    :(.text+0x60ee0): undefined reference to `shm_unlink'
    :(.text+0x60ef8): undefined reference to `shm_open'
    :(.text+0x60f98): undefined reference to `shm_unlink'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosShmOpenNamedEx(void*, char const*, unsigned long, cudart::cuosShmInfoEx_st**)':
    :(.text+0x61124): undefined reference to `shm_open'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosShmCloseEx(cudart::cuosShmInfoEx_st*, unsigned int, unsigned int)':
    :(.text+0x61370): undefined reference to `shm_unlink'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosSetThreadName(cudart::CUOSthread_st*, char const*)':
    :(.text+0x62294): undefined reference to `pthread_setname_np'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `CUOSdlsymLoader<int (*)(int, sockaddr*, unsigned int*, int)>::~CUOSdlsymLoader()':
    :(.text._ZN15CUOSdlsymLoaderIPFiiP8sockaddrPjiEED2Ev[_ZN15CUOSdlsymLoaderIPFiiP8sockaddrPjiEED5Ev]+0x18): undefined reference to `dlclose'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `CUOSdlsymLoader<int (*)(int*, int)>::~CUOSdlsymLoader()':
    :(.text._ZN15CUOSdlsymLoaderIPFiPiiEED2Ev[_ZN15CUOSdlsymLoaderIPFiPiiEED5Ev]+0x18): undefined reference to `dlclose'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `CUOSdlsymLoader<int (*)(unsigned long, unsigned long, unsigned long const*)>::~CUOSdlsymLoader()':
    :(.text._ZN15CUOSdlsymLoaderIPFimmPKmEED2Ev[_ZN15CUOSdlsymLoaderIPFimmPKmEED5Ev]+0x18): undefined reference to `dlclose'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `CUOSdlsymLoader<int (*)(unsigned long, unsigned long, unsigned long*)>::~CUOSdlsymLoader()':
    :(.text._ZN15CUOSdlsymLoaderIPFimmPmEED2Ev[_ZN15CUOSdlsymLoaderIPFimmPmEED5Ev]+0x18): undefined reference to `dlclose'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `CUOSdlsymLoader<int (*)()>::~CUOSdlsymLoader()':
    :(.text._ZN15CUOSdlsymLoaderIPFivEED2Ev[_ZN15CUOSdlsymLoaderIPFivEED5Ev]+0x18): undefined reference to `dlclose'
    collect2: error: ld returned 1 exit status
    CMakeFiles/cmTC_bb43d.dir/build.make:103: recipe for target 'cmTC_bb43d' failed
    make[1]: *** [cmTC_bb43d] Error 1
    make[1]: Leaving directory '/code/onnxruntime/build/Linux/Release/CMakeFiles/CMakeTmp'
    Makefile:138: recipe for target 'cmTC_bb43d/fast' failed
    make: *** [cmTC_bb43d/fast] Error 2
    

    CMake will not be able to correctly generate this project.
    Call Stack (most recent call first):
      CMakeLists.txt:715 (enable_language)

    -- Configuring incomplete, errors occurred!
    See also "/code/onnxruntime/build/Linux/Release/CMakeFiles/CMakeOutput.log".
    See also "/code/onnxruntime/build/Linux/Release/CMakeFiles/CMakeError.log".
    Traceback (most recent call last):
      File "/code/onnxruntime/tools/ci_build/build.py", line 1043, in <module>
        sys.exit(main())
      File "/code/onnxruntime/tools/ci_build/build.py", line 972, in main
        args, cmake_extra_args)
      File "/code/onnxruntime/tools/ci_build/build.py", line 422, in generate_build_tree
        run_subprocess(cmake_args + ["-DCMAKE_BUILD_TYPE={}".format(config)], cwd=config_build_dir)
      File "/code/onnxruntime/tools/ci_build/build.py", line 196, in run_subprocess
        return subprocess.run(args, cwd=cwd, check=True, stdout=stdout, stderr=stderr, env=my_env, shell=shell)
      File "/usr/lib/python3.6/subprocess.py", line 438, in run
        output=stdout, stderr=stderr)
    subprocess.CalledProcessError: Command '['/usr/local/bin/cmake', '/code/onnxruntime/cmake', '-Donnxruntime_RUN_ONNX_TESTS=OFF', '-Donnxruntime_GENERATE_TEST_REPORTS=ON', '-Donnxruntime_DEV_MODE=OFF', '-DPYTHON_EXECUTABLE=/usr/bin/python3', '-Donnxruntime_USE_CUDA=ON', '-Donnxruntime_USE_NSYNC=OFF', '-Donnxruntime_CUDNN_HOME=/usr/lib/aarch64-linux-gnu', '-Donnxruntime_USE_AUTOML=OFF', '-Donnxruntime_CUDA_HOME=/usr/local/cuda', '-Donnxruntime_USE_JEMALLOC=OFF', '-Donnxruntime_USE_MIMALLOC=OFF', '-Donnxruntime_ENABLE_PYTHON=ON', '-Donnxruntime_BUILD_CSHARP=OFF', '-Donnxruntime_BUILD_SHARED_LIB=OFF', '-Donnxruntime_USE_EIGEN_FOR_BLAS=ON', '-Donnxruntime_USE_OPENBLAS=OFF', '-Donnxruntime_USE_MKLDNN=OFF', '-Donnxruntime_USE_MKLML=OFF', '-Donnxruntime_USE_GEMMLOWP=OFF', '-Donnxruntime_USE_NGRAPH=OFF', '-Donnxruntime_USE_OPENVINO=OFF', '-Donnxruntime_USE_OPENVINO_BINARY=OFF', '-Donnxruntime_USE_OPENVINO_SOURCE=OFF', '-Donnxruntime_USE_OPENVINO_MYRIAD=OFF', '-Donnxruntime_USE_OPENVINO_GPU_FP32=OFF', '-Donnxruntime_USE_OPENVINO_GPU_FP16=OFF', '-Donnxruntime_USE_OPENVINO_CPU_FP32=OFF', '-Donnxruntime_USE_OPENVINO_VAD_M=OFF', '-Donnxruntime_USE_OPENVINO_VAD_F=OFF', '-Donnxruntime_USE_NNAPI=OFF', '-Donnxruntime_USE_OPENMP=ON', '-Donnxruntime_USE_TVM=OFF', '-Donnxruntime_USE_LLVM=OFF', '-Donnxruntime_ENABLE_MICROSOFT_INTERNAL=OFF', '-Donnxruntime_USE_BRAINSLICE=OFF', '-Donnxruntime_USE_NUPHAR=OFF', '-Donnxruntime_USE_EIGEN_THREADPOOL=OFF', '-Donnxruntime_USE_TENSORRT=ON', '-Donnxruntime_TENSORRT_HOME=/usr/lib/aarch64-linux-gnu', '-Donnxruntime_CROSS_COMPILING=OFF', '-Donnxruntime_BUILD_SERVER=OFF', '-Donnxruntime_BUILD_x86=OFF', '-Donnxruntime_USE_FULL_PROTOBUF=ON', '-Donnxruntime_DISABLE_CONTRIB_OPS=OFF', '-Donnxruntime_MSVC_STATIC_RUNTIME=OFF', '-Donnxruntime_ENABLE_LANGUAGE_INTEROP_OPS=OFF', '-Donnxruntime_USE_DML=OFF', '-DCUDA_CUDA_LIBRARY=/usr/local/cuda/lib64/stubs', '-Donnxruntime_PYBIND_EXPORT_OPSCHEMA=OFF', '-DCMAKE_BUILD_TYPE=Release']' returned non-zero exit status 1.
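For context, every undefined reference in the log above (pthread_*, dl*, shm_*) comes from libcudart_static.a's dependencies on the POSIX threading, dynamic-loading, and shared-memory APIs, which on Linux live in libpthread, libdl, and librt. A hedged sketch of the usual fix in a consuming CMakeLists (the target name `my_app` is hypothetical; the `CUDA::cudart_static` imported target requires CMake 3.17+ with `find_package(CUDAToolkit)`):

```cmake
find_package(Threads REQUIRED)
find_package(CUDAToolkit REQUIRED)

# Link order matters for static archives: the system libraries
# must come after libcudart_static.a on the link line.
target_link_libraries(my_app PRIVATE
    CUDA::cudart_static   # static CUDA runtime
    Threads::Threads      # pthread_* symbols
    ${CMAKE_DL_LIBS}      # dlopen/dlsym/dlclose/dlerror
    rt)                   # shm_open/shm_unlink
```

With a plain compiler invocation the equivalent is appending `-lpthread -ldl -lrt` after the static cudart archive.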

    opened by AndreV84 52
  • make WITHCACHE as an option in MacOS workflow

    Description

    1. Set the WithCache default value to false in the macOS CI workflow as well.
    2. Add today's date to the cache key so the cache size does not keep growing.

    With the cache enabled, the pipeline duration drops from 70-odd minutes to 10-odd minutes.
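For illustration only (this is not the repository's actual workflow file, and the paths and key names are made up), a date-stamped cache key in a GitHub-Actions-style step might look like:

```yaml
- name: Compute cache key suffix
  run: echo "TODAY=$(date +%Y%m%d)" >> "$GITHUB_ENV"

- name: Cache build artifacts
  uses: actions/cache@v3
  with:
    path: ~/.cache/onnxruntime      # hypothetical cache path
    key: macos-build-${{ env.TODAY }}
```

Because the key changes daily, each day starts a fresh cache entry instead of growing one entry forever.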

    opened by mszhanyi 0
  • please reopen the issue

    Describe the issue

    Could you please reopen this issue? We hit the same problem with opset_version=16. Original issue: https://github.com/microsoft/onnxruntime/issues/2756#issue-543199292.

    Urgency

    No response

    Target platform

    Windows

    Build script

    .

    Error / output

    onnxruntime.capi.onnxruntime_pybind11_state.RuntimeException: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Non-zero status code returned while running BatchNormalization node. Name:'BatchNormalization_123' Status Message: D:\a_work\1\s\onnxruntime\core\framework\op_kernel.cc:81 onnxruntime::OpKernelContext::OutputMLValue status.IsOK() was false. Shape mismatch attempting to re-use buffer. {1,3,256,192} != {1,6,256,192}. Validate usage of dim_value (values should be > 0) and dim_param (all values with the same string should equate to the same size) in shapes in the model.

    Visual Studio Version

    No response

    GCC / Compiler Version

    No response

    build platform:windows 
    opened by shu0o0yX 0
  • CUDNN error executing cudnnConvolutionForward

    Describe the issue

    Hi, I'm running the same ONNX model on many different machines in Azure (all of the same type, same configuration, Docker image, etc.), and on some of them I get the following error on the first batch executed:

    <class 'onnxruntime.capi.onnxruntime_pybind11_state.Fail'>
    
    [ONNXRuntimeError] : 1 : FAIL : Non-zero status code returned while running Conv node. Name:'efficientnetb4/stem_conv/Conv2D' Status Message: CUDNN error executing cudnnConvolutionForward(s_.handle, &alpha, s_.x_tensor, s_.x_data, s_.w_desc, s_.w_data, s_.conv_desc, s_.algo, workspace.get(), s_.workspace_bytes, &beta, s_.y_tensor, s_.y_data)
    

    It happens only on some of the machines, and only on the first message.

    To reproduce

    onnxruntime-gpu==1.10.0

    ONNX_PROVIDERS = [
        ('CUDAExecutionProvider', {
            'device_id': 0,
            'cudnn_conv_algo_search': 'DEFAULT',
        }),
    ]
    ONNX_SESSION_OPTIONS = onnxruntime.SessionOptions()
    ONNX_SESSION_OPTIONS.graph_optimization_level = onnxruntime.GraphOptimizationLevel.ORT_ENABLE_ALL
    feature_extractor = onnxruntime.InferenceSession(str(fe_net_weights),
                                                     sess_options=ONNX_SESSION_OPTIONS,
                                                     providers=ONNX_PROVIDERS)

    feature_extractor.run([output_layer], {"input": input})
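When the failure shows up only on some machines, two documented CUDAExecutionProvider options worth experimenting with (the values below are examples, not a verified fix) are a different conv algorithm search mode and an explicit GPU memory cap:

```python
# Alternative provider options to try when cudnnConvolutionForward
# fails sporadically on otherwise identical machines.
providers = [
    ('CUDAExecutionProvider', {
        'device_id': 0,
        'cudnn_conv_algo_search': 'EXHAUSTIVE',   # instead of 'DEFAULT'
        'gpu_mem_limit': 6 * 1024 * 1024 * 1024,  # cap the arena at 6 GiB
    }),
]
```

'EXHAUSTIVE' benchmarks all conv algorithms up front, which can avoid a workspace-allocation failure that 'DEFAULT' hits on the first batch.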
    

    Urgency

    No response

    Platform

    Linux

    OS Version

    Ubuntu 20.04

    ONNX Runtime Installation

    Released Package

    ONNX Runtime Version or Commit ID

    onnxruntime-gpu==1.10.0

    ONNX Runtime API

    Python

    Architecture

    X64

    Execution Provider

    CUDA

    Execution Provider Library Version

    cuda 11.3.0, cudnn8

    ep:CUDA 
    opened by kfirgoldwsc 0
  • How to save inference onnx model?

    Describe the issue

    I can now build my own training session from a PyTorch net, but when I save the ONNX model after training, BatchNormalization is in training mode and cannot be fused into Conv. What should I do to save an inference model?

    current format: 1

    expect format: 0

    To reproduce

    2

    Urgency

    No response

    ONNX Runtime Installation

    Built from Source

    ONNX Runtime Version or Commit ID

    1.8.1

    PyTorch Version

    3.7

    Execution Provider

    CUDA

    Execution Provider Library Version

    No response

    training ep:CUDA 
    opened by ArtyZe 0
  • [MIGraphX] update the MIGraphX version used in ORT to rocm-5.4.0

    Description

    Update the MIGraphX version used in ORT to rocm-5.4.0

    Motivation and Context

    The previous branch migraphx_for_ort is no longer updated and has fallen far behind the latest MIGraphX release branch. More discussion here: https://github.com/microsoft/onnxruntime/issues/14126#issuecomment-1373201049

    opened by PeixuanZuo 0
  • Update HistogramCalibrater.collect_data method to reduce memory consumption

    Description

    Updated the HistogramCalibrater.collect_data method.

    Inference results are no longer appended to the self.intermediate_outputs list. Instead, the self.collector.collect method is called inside a while loop.

    Motivation and Context

    When CalibrationMethod.Entropy or CalibrationMethod.Percentile is specified, the HistogramCalibrater class is used.

    In the HistogramCalibrater.collect_data method, all intermediate outputs were gathered before histograms were collected with the HistogramCollector class. This two-pass scheme consumes a lot of memory when a network has many intermediate output nodes and the CalibrationDataReader provides a lot of data.

    Note that quantized models are not bit-identical after this change; I don't expect it to cause harmful results, though.
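The change can be pictured as replacing accumulate-then-collect with a streaming loop (hypothetical names that mirror the PR description, not the actual ORT source):

```python
def collect_data_streaming(data_reader, run_inference, collector):
    """Feed each batch's intermediate outputs straight into the histogram
    collector instead of accumulating them all in a list first."""
    while True:
        inputs = data_reader.get_next()
        if inputs is None:
            break
        # Only one batch of intermediate outputs is alive at a time,
        # so peak memory no longer scales with dataset size.
        collector.collect(run_inference(inputs))
```

The trade-off is that the collector sees batches one at a time, which is why the resulting histograms (and hence the quantized models) need not be bit-identical to the old two-pass results.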

    opened by beru 0
Releases(v1.13.1)
Owner
Microsoft
Open source projects and samples from Microsoft