DirectML

DirectML is a high-performance, hardware-accelerated DirectX 12 library for machine learning. DirectML provides GPU acceleration for common machine learning tasks across a broad range of supported hardware and drivers, including all DirectX 12-capable GPUs from vendors such as AMD, Intel, NVIDIA, and Qualcomm.

When used standalone, the DirectML API is a low-level DirectX 12 library and is suitable for high-performance, low-latency applications such as frameworks, games, and other real-time applications. Its seamless interoperability with Direct3D 12, together with its low overhead and conformance across hardware, makes DirectML ideal for accelerating machine learning when both high performance and reliable, predictable results across hardware are critical.

More information about DirectML can be found in Introduction to DirectML.

Visit the DirectX Landing Page for more resources for DirectX developers.

Getting Started with DirectML

DirectML is distributed as a system component of Windows 10, and is available as part of the Windows 10 operating system (OS) in Windows 10, version 1903 (10.0; Build 18362), and newer.

Starting with DirectML version 1.4.0, DirectML is also available as a standalone redistributable package (see Microsoft.AI.DirectML), which is useful for applications that wish to use a fixed version of DirectML, or when running on older versions of Windows 10.

Hardware requirements

DirectML requires a DirectX 12-capable device. Almost all commercially available graphics cards released in the last several years support DirectX 12. Examples of compatible hardware include:

  • AMD GCN 1st Gen (Radeon HD 7000 series) and above
  • Intel Haswell (4th-gen Core) HD Integrated Graphics and above
  • NVIDIA Kepler (GTX 600 series) and above
  • Qualcomm Adreno 600 and above

For application developers

DirectML exposes a native C++ DirectX 12 API. The header and library (DirectML.h/DirectML.lib) are available as part of the redistributable NuGet package, and are also included in the Windows 10 SDK version 10.0.18362 or newer.

For users, data scientists, and researchers

DirectML is built into several frameworks as a backend, including Windows ML, ONNX Runtime, and TensorFlow.

See the following sections for more information:

DirectML Samples

DirectML C++ sample code is available under Samples.

  • HelloDirectML: A minimal "hello world" application that executes a single DirectML operator.
  • DirectMLSuperResolution: A sample that uses DirectML to execute a basic super-resolution model to upscale video from 540p to 1080p in real time.
  • yolov4: YOLOv4 is an object detection model capable of recognizing up to 80 different classes of objects in an image. This sample contains a complete end-to-end implementation of the model using DirectML, and is able to run in real time on a user-provided video stream.

DirectML Python sample code is available under Python/samples. The samples require PyDirectML, an open source Python projection library for DirectML, which can be built and installed into a Python execution environment from Python/src. Refer to the Python/README.md file for more details.

Windows ML on DirectML

Windows ML (WinML) is a high-performance, reliable API for deploying hardware-accelerated ML inference on Windows devices. DirectML provides the GPU backend for Windows ML.

DirectML acceleration can be enabled in Windows ML using the LearningModelDevice with any one of the DirectX DeviceKinds.

For more information, see Get Started with Windows ML.

ONNX Runtime on DirectML

ONNX Runtime is a cross-platform inferencing and training accelerator compatible with many popular ML/DNN frameworks, including PyTorch, TensorFlow/Keras, scikit-learn, and more.

DirectML is available as an optional execution provider for ONNX Runtime that provides hardware acceleration when running on Windows 10.

For more information about getting started, see Using the DirectML execution provider.
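
As an illustration, here is a minimal Python sketch of inference through the DirectML execution provider; it assumes the onnxruntime-directml PyPI package is installed, and the model file name "model.onnx" and its float32 input are hypothetical placeholders:

    import numpy as np
    import onnxruntime as ort

    # Request the DirectML execution provider, with CPU as a fallback.
    session = ort.InferenceSession(
        "model.onnx",  # hypothetical model file
        providers=["DmlExecutionProvider", "CPUExecutionProvider"],
    )

    # Build a dummy input matching the model's first input; dynamic
    # dimensions (reported as strings or None) are replaced with 1.
    meta = session.get_inputs()[0]
    shape = [d if isinstance(d, int) else 1 for d in meta.shape]
    dummy = np.zeros(shape, dtype=np.float32)

    # Run inference; outputs come back as a list of NumPy arrays.
    outputs = session.run(None, {meta.name: dummy})
    print(outputs[0].shape)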

TensorFlow with DirectML

TensorFlow is a popular open source platform for machine learning and a leading framework for training machine learning models.

DirectML acceleration for TensorFlow 1.15 is currently available for Public Preview. TensorFlow on DirectML enables training and inference of complex machine learning models on a wide range of DirectX 12-compatible hardware.

TensorFlow on DirectML is supported on both the latest versions of Windows 10 and the Windows Subsystem for Linux, and is available for download as a PyPI package. For more information about getting started, see GPU accelerated ML training (docs.microsoft.com).
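
As a quick smoke test (a minimal sketch, assuming the tensorflow-directml PyPI package is installed), the same TensorFlow 1.15 pattern that appears in the issue reports below confirms whether ops land on a DirectML device:

    import tensorflow.compat.v1 as tf

    # log_device_placement prints which device each op runs on; with
    # tensorflow-directml installed, DirectML adapters are used automatically.
    tf.enable_eager_execution(tf.ConfigProto(log_device_placement=True))

    print(tf.add([1.0, 2.0], [3.0, 4.0]))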

PyTorch with DirectML

DirectML acceleration for PyTorch 1.8.0 is currently available for Public Preview. PyTorch with DirectML enables training and inference of complex machine learning models on a wide range of DirectX 12-compatible hardware.

PyTorch on DirectML is supported on both the latest versions of Windows 10 and the Windows Subsystem for Linux, and is available for download as a PyPI package. For more information about getting started, see GPU accelerated ML training (docs.microsoft.com).
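
As a rough sketch (assuming the pytorch-directml preview package is installed), tensors are placed on the DirectML device through the "dml" device string, a pattern several issue reports below also use; note that later torch-directml packages expose torch_directml.device() instead:

    import torch

    # The pytorch-directml preview maps the "dml" device string to the
    # default DirectML adapter.
    device = torch.device("dml")

    a = torch.randn(2000, 2000).to(device)
    b = torch.randn(2000, 2000).to(device)
    c = a + b  # element-wise add executed via DirectML

    print(c.shape)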

Feedback

We look forward to hearing from you!

External Links

Documentation

DirectML programming guide
DirectML API reference

More information

Introducing DirectML (Game Developers Conference '19)
Accelerating GPU Inferencing with DirectML and DirectX 12 (SIGGRAPH '18)
Windows AI: hardware-accelerated ML on Windows devices (Microsoft Build '20)
Gaming with Windows ML (DirectX Developer Blog)
DirectML at GDC 2019 (DirectX Developer Blog)
DirectX Linux (DirectX Developer Blog)

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.microsoft.com.

When you submit a pull request, a CLA-bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., label, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact [email protected] with any additional questions or comments.

Comments
  • DirectML is x2.8 slower than CUDA

    I tested training the same deepfake model on the same hardware using tensorflow-cuda and tensorflow-directml. (my project https://github.com/iperov/DeepFaceLab)

DirectML: avg iter time 626ms

CUDA: avg iter time 222ms

    DirectML is x2.8 slower :-(

    I think that's what I was talking about here https://github.com/microsoft/DirectML/issues/104

    So what is the point of using DirectML if every millisecond of training acceleration is important in today's world?

    x2.8 slower is serious performance degradation. I reached the same speed in my weekend OpenCL NN library in pure python (https://github.com/iperov/litenn)

But you guys are from Microsoft. Don't you think there is no point in further development of DirectML until it reaches the level of CUDA performance?

    opened by iperov 36
  • Could not load dynamic library 'libcuda.so.1'

    Followed the instructions here

    ~ » cat /proc/version                                                                                                                                                             1 ↵ [email protected]
    Linux version 4.4.0-20150-Microsoft ([email protected]) (gcc version 5.4.0 (GCC) ) #1000-Microsoft Thu Jun 12 17:34:00 PST 2020
    

    I'm running build 20150, but am getting this error:

    Python 3.6.10 |Anaconda, Inc.| (default, May  8 2020, 02:54:21)
    [GCC 7.3.0] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import tensorflow.compat.v1 as tf
    >>>
    >>> tf.enable_eager_execution(tf.ConfigProto(log_device_placement=True))
    >>>
    >>> print(tf.add([1.0, 2.0], [3.0, 4.0]))
    2020-06-17 16:36:05.469811: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
    2020-06-17 16:36:05.469926: E tensorflow/stream_executor/cuda/cuda_driver.cc:313] failed call to cuInit: UNKNOWN ERROR (303)
    2020-06-17 16:36:05.470029: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (MAKERPC): /proc/driver/nvidia/version does not exist
    2020-06-17 16:36:05.470532: I tensorflow/core/platform/cpu_feature_guard.cc:143] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
    2020-06-17 16:36:05.483133: I tensorflow/core/platform/profile_utils/cpu_utils.cc:102] CPU Frequency: 3400000000 Hz
    2020-06-17 16:36:05.487879: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7fffe52ac420 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
    2020-06-17 16:36:05.488038: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
    tf.Tensor([4. 6.], shape=(2,), dtype=float32)
    
    opened by jflam 23
  • [installation] Could not find a version that satisfies the requirement tensorflow-directml (from versions: none)

    Hi,

    After following the steps described in https://docs.microsoft.com/en-us/windows/win32/direct3d12/gpu-tensorflow-wsl till pip install tensorflow-directml,

    the error appeared as

    ERROR: Could not find a version that satisfies the requirement tensorflow-directml (from versions: none)
    ERROR: No matching distribution found for tensorflow-directml

BTW, I am using Python 3.8,

and I did pip list tensorflow*, which outputted

    Package     Version
    ----------  -------------------
    certifi     2020.6.20
    pip         20.1.1
    setuptools  49.2.0.post20200714
    wheel       0.34.2

    opened by shuwang1 19
  • How to get available devices and set a specific device in Pytorch-DML?

Hi, for accessing available devices in PyTorch we'd normally do:

        print(f'available devices: {torch.cuda.device_count()}')
        print(f'current device: { torch.cuda.current_device()}')
    

However, I noticed this fails (AssertionError: Torch not compiled with CUDA enabled).
I thought the transition would be minimal, and stuff like this would work out of the box! Especially so, after noting we can't write:

        print(f'available devices: {torch.dml.device_count()}')
        print(f'current device: { torch.dml.current_device()}')
    

as it fails with the error:

    AttributeError: module 'torch.dml' has no attribute 'device_count'
    

Apart from this, trying to specify a device using the form "dml:number" fails if number > 0! That is, this fails for "dml:1":

    import torch 
    import time
    def bench(device ='cpu'):
        print(f'running on {device}:')
        a = torch.randn(size=(2000,2000)).to(device=device)
        b = torch.randn(size=(2000,2000)).to(device=device)
       
        start = time.time()
        c = a+b
        end = time.time()
        
        # print(f'available devices: {torch.dml.device_count()}')
        # print(f'current device: { torch.dml.current_device()}')
        print(f'--took {end-start:.2f} seconds')
    
    bench('cpu')
    bench('dml')
    bench('dml:0')
    bench('dml:1')    
    

it outputs:

    running on cpu:
    --took 0.00 seconds
    running on dml:
    --took 0.01 seconds
    running on dml:0:
    --took 0.00 seconds
    running on dml:1:
    

and that's it; it doesn't execute when it comes to "dml:1".

Also, trying to do:

    import torch 
    import time
    def bench(device ='cpu'):
        print(f'running on {device}:')
        a = torch.randn(size=(2000,2000)).to(device=device)
        b = torch.randn_like(a).to(device=device)
        
        start = time.time()
        c = a+b
        end = time.time()
        
        # print(f'available devices: {torch.dml.device_count()}')
        # print(f'current device: { torch.dml.current_device()}')
        print(f'--took {end-start:.2f} seconds')
    
    bench('cpu')
    bench('dml')
    bench('dml:0')
    bench('dml:1')    
    

This fails with the following error:

    running on cpu:
    --took 0.00 seconds
    running on dml:
    Traceback (most recent call last):
      File "g:\tests.py", line 1246, in <module>
        bench('dml')
      File "g:\tests.py", line 1235, in bench
        b = torch.randn_like(a).to(device=device)
    RuntimeError: Could not run 'aten::normal_' with arguments from the 'UNKNOWN_TENSOR_TYPE_ID' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom 
    build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'aten::normal_' is only available for these backends: [CPU, BackendSelect, Named, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, AutogradNestedTensor, UNKNOWN_TENSOR_TYPE_ID, AutogradPrivateUse1, AutogradPrivateUse2, AutogradPrivateUse3, Tracer, Autocast, Batched, VmapMode].
    
    CPU: registered at D:\a\_work\1\s\build\aten\src\ATen\RegisterCPU.cpp:5926 [kernel]
    BackendSelect: fallthrough registered at D:\a\_work\1\s\aten\src\ATen\core\BackendSelectFallbackKernel.cpp:3 [backend fallback]
    Named: fallthrough registered at D:\a\_work\1\s\aten\src\ATen\core\NamedRegistrations.cpp:11 [kernel]
    AutogradOther: registered at D:\a\_work\1\s\torch\csrc\autograd\generated\VariableType_4.cpp:8893 [autograd kernel]
    AutogradCPU: registered at D:\a\_work\1\s\torch\csrc\autograd\generated\VariableType_4.cpp:8893 [autograd kernel]
    AutogradCUDA: registered at D:\a\_work\1\s\torch\csrc\autograd\generated\VariableType_4.cpp:8893 [autograd kernel]
    AutogradXLA: registered at D:\a\_work\1\s\torch\csrc\autograd\generated\VariableType_4.cpp:8893 [autograd kernel]
    AutogradNestedTensor: registered at D:\a\_work\1\s\torch\csrc\autograd\generated\VariableType_4.cpp:8893 [autograd kernel]
    UNKNOWN_TENSOR_TYPE_ID: registered at D:\a\_work\1\s\torch\csrc\autograd\generated\VariableType_4.cpp:8893 [autograd kernel]
    AutogradPrivateUse1: registered at D:\a\_work\1\s\torch\csrc\autograd\generated\VariableType_4.cpp:8893 [autograd kernel]
    AutogradPrivateUse2: registered at D:\a\_work\1\s\torch\csrc\autograd\generated\VariableType_4.cpp:8893 [autograd kernel]
    AutogradPrivateUse3: registered at D:\a\_work\1\s\torch\csrc\autograd\generated\VariableType_4.cpp:8893 [autograd kernel]
    Tracer: registered at D:\a\_work\1\s\torch\csrc\autograd\generated\TraceType_4.cpp:10612 [kernel]
    Autocast: fallthrough registered at D:\a\_work\1\s\aten\src\ATen\autocast_mode.cpp:250 [backend fallback]
    Batched: registered at D:\a\_work\1\s\aten\src\ATen\BatchingRegistrations.cpp:1016 [backend fallback]
    VmapMode: registered at D:\a\_work\1\s\aten\src\ATen\VmapModeRegistrations.cpp:37 [kernel]
    
    
    pytorch-directml 
    opened by Coderx7 11
  • Conv2D-Fail: internal compiler error, abnormal program termination

I ran across DirectML a few hours ago and am currently playing around with it on a Surface Pro 6 with an Intel HD Graphics 620. To set it all up, I followed this article to the letter: https://docs.microsoft.com/en-us/windows/win32/direct3d12/gpu-tensorflow-windows

    For testing purposes, I used a slightly modified version of my small go-to script:

    import tensorflow.compat.v1 as tf 
    
    tf.enable_eager_execution(tf.ConfigProto(log_device_placement=False)) 
    
    fashion_mnist = tf.keras.datasets.fashion_mnist
    (train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data()
    
    
    class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
                   'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']
    
    train_images = train_images.reshape(60000, 28, 28, 1)
    train_images = train_images / 255.0
    
    test_images = test_images.reshape(10000, 28, 28, 1)
    test_images = test_images / 255.0
    
    #model = tf.keras.Sequential([
    #    tf.keras.layers.Flatten(input_shape=(28, 28, 1)),
    #    tf.keras.layers.Dense(128, activation=tf.nn.relu),
    #    tf.keras.layers.Dense(10, activation=tf.nn.softmax)
    #])
    
    model = tf.keras.Sequential([
        tf.keras.layers.Conv2D(64, (3,3), activation=tf.nn.relu, input_shape=(28, 28, 1)),
        tf.keras.layers.MaxPooling2D(2,2),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(128, activation=tf.nn.relu),
        tf.keras.layers.Dense(10, activation=tf.nn.softmax)
    ])
    
    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    
    model.fit(train_images, train_labels, epochs=5)
    
    test_loss, test_acc = model.evaluate(test_images,  test_labels, verbose=2)
    
    print('Test accuracy:', test_acc)
    

    The version of the model without convolutions runs absolutely fine. But as soon as I add the Conv2D layer, nothing works anymore.

    The entire output I get is:

    2021-04-23 21:23:05.241248: I tensorflow/stream_executor/platform/default/dso_loader.cc:99] Successfully opened dynamic library C:\Users\cyphus309\.conda\envs\directml\lib\site-packages\tensorflow_core\python/directml.b6e3bc69b89cfca5486e178bb9d51724d0c4a94a.dll
    2021-04-23 21:23:05.298554: I tensorflow/core/common_runtime/dml/dml_device_cache.cc:249] DirectML device enumeration: found 1 compatible adapters.
    2021-04-23 21:23:05.299189: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
    2021-04-23 21:23:05.331743: I tensorflow/core/common_runtime/dml/dml_device_cache.cc:185] DirectML: creating device on adapter 0 (Intel(R) HD Graphics 620)
    2021-04-23 21:23:05.363568: I tensorflow/stream_executor/platform/default/dso_loader.cc:99] Successfully opened dynamic library Kernel32.dll
    Train on 60000 samples
    Epoch 1/5
    
    internal compiler error, abnormal program termination
    
    

    Any ideas?

    bug 
    opened by kampfhamster309 11
  • Tensorflow directml crashes my python session

    Hi,

I've recently purchased a 6900 XT GPU which I would like to use with TensorFlow. I followed the installation guide on https://docs.microsoft.com/en-us/windows/win32/direct3d12/gpu-tensorflow-windows, which worked, but the issue I have now is that whenever I try to use TensorFlow it closes my Python environment.

I've attached an image to show what I mean. I can import TensorFlow fine, and it shows me that I have version 1.15.5 available. The problem is that when I want to check if my GPU is available, I get two messages and then it crashes me out of my Python environment.

    Does anybody know how to solve this issue and what is going on?

    Thank you in advance!

    bug 
    opened by bwintertkb 9
  • C++ DirectML.dll causes crash in debug x64 mode when using NuGet package Microsoft.AI.MachineLearning 1.5.2

    Hello,

    I'm experiencing a runtime crash with the C++ DirectML API in Debug x64 mode after upgrading my NuGet package Microsoft.AI.MachineLearning from version 1.4.0 to 1.5.2. There is no error in Release x64 mode.

The reason I'm using this package is that the included DirectML.dll improves DirectML performance greatly. There seems to be an issue when creating a DirectML operator. The operator type is DML_OPERATOR_JOIN.

Can you please help me identify the issue? Also, how can I find the latest DirectML.dll file without downloading the package?

    opened by momower1 9
  • Performance will be improved by setting input strides=output strides for Clip in DirectMLX

I am investigating the performance of MobileNet V2 from TFLite models with "nhwc" layout and MobileNet V2 from ONNX models with "nchw" layout, implemented with the DirectML and DirectMLX APIs.

I find that the nhwc MobileNetV2 model has lots of Clip operators after Conv2d, and these Clips cost much time during inference. I guess that Clip does a memory copy and hasn't been optimized in the compilation stage.

I have a workaround to resolve this problem: set Clip's input strides to be the same as its output strides by changing this line to TensorDesc outputTensor = inputTensor in DirectMLX.h. The Clip will then be optimized as if fused into Conv2d, and the inference time is significantly reduced to the same as the nchw MobileNetV2.

When building the nhwc MobileNetV2 model, we need to append an Identity after each Conv2d to transpose the output tensor from the default nchw to nhwc, then transpose this output tensor from nhwc back to nchw as the next Conv2d's input tensor. In my opinion, the Identity and Reinterpret could be optimized by DML in this model, like Conv0->Identity(nchw->nhwc)->Reinterpret strides(nhwc->nchw)->Conv1, just like transpose sinking in the OpenVINO backend.

I guess that the Identity and Reinterpret sinking may be blocked when there is a Clip in between, as in Conv0->Identity(nchw->nhwc)->Clip->Reinterpret strides(nhwc->nchw)->Conv1. I verified that if I remove the Identity and run Conv0->Reinterpret strides(nchw->nhwc)->Clip(input strides = output strides)->Reinterpret strides(nhwc->nchw)->Conv1, the inference time is much lower than before.

So in conclusion, I suggest setting Clip's input strides to be the same as its output strides by changing this line to TensorDesc outputTensor = inputTensor in DirectMLX.h.

    opened by mingmingtasd 8
  • TensorFlow & DirectML & ROCm performance and roadmap

The current DirectML library for GPU is more than 2x slower than the TensorFlow CPU library. When will the DirectML team improve the performance of the library? Could you share a roadmap for DirectML? Will the DirectML team cooperate with the ROCm team (https://github.com/RadeonOpenCompute/ROCm), Intel, and NVIDIA to improve performance?

    opened by YuriyTigiev 8
  • pytorch-directml simple command error

Just trying a simple command with pytorch-directml 1.8.0a0.dev220224 and getting an error:

    >>> torch.tensor([1], dtype=torch.float32, device='dml')
    
    Traceback (most recent call last):
      File "<console>", line 1, in <module>
      File "D:\DevelopPPP\projects\DeepFakeBox\_internal\python\lib\site-packages\torch\tensor.py", line 193, in __repr__
        return torch._tensor_str._str(self)
      File "D:\DevelopPPP\projects\DeepFakeBox\_internal\python\lib\site-packages\torch\_tensor_str.py", line 383, in _str
        return _str_intern(self)
      File "D:\DevelopPPP\projects\DeepFakeBox\_internal\python\lib\site-packages\torch\_tensor_str.py", line 358, in _str_intern
        tensor_str = _tensor_str(self, indent)
      File "D:\DevelopPPP\projects\DeepFakeBox\_internal\python\lib\site-packages\torch\_tensor_str.py", line 242, in _tensor_str
        formatter = _Formatter(get_summarized_data(self) if summarize else self)
      File "D:\DevelopPPP\projects\DeepFakeBox\_internal\python\lib\site-packages\torch\_tensor_str.py", line 90, in __init__
        nonzero_finite_vals = torch.masked_select(tensor_view, torch.isfinite(tensor_view) & tensor_view.ne(0))
    RuntimeError: Could not run 'aten::masked_select' with arguments from the 'UNKNOWN_TENSOR_TYPE_ID' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'aten::masked_select' is only available for these backends: [CPU, BackendSelect, Named, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, AutogradNestedTensor, UNKNOWN_TENSOR_TYPE_ID, AutogradPrivateUse1, AutogradPrivateUse2, AutogradPrivateUse3, Tracer, Autocast, Batched, VmapMode].
    
    CPU: registered at D:\a\_work\1\s\pytorch-directml\build\aten\src\ATen\RegisterCPU.cpp:5926 [kernel]
    BackendSelect: fallthrough registered at D:\a\_work\1\s\pytorch-directml\aten\src\ATen\core\BackendSelectFallbackKernel.cpp:3 [backend fallback]
    Named: fallthrough registered at D:\a\_work\1\s\pytorch-directml\aten\src\ATen\core\NamedRegistrations.cpp:11 [kernel]
    AutogradOther: registered at D:\a\_work\1\s\pytorch-directml\torch\csrc\autograd\generated\VariableType_4.cpp:8893 [autograd kernel]
    AutogradCPU: registered at D:\a\_work\1\s\pytorch-directml\torch\csrc\autograd\generated\VariableType_4.cpp:8893 [autograd kernel]
    AutogradCUDA: registered at D:\a\_work\1\s\pytorch-directml\torch\csrc\autograd\generated\VariableType_4.cpp:8893 [autograd kernel]
    AutogradXLA: registered at D:\a\_work\1\s\pytorch-directml\torch\csrc\autograd\generated\VariableType_4.cpp:8893 [autograd kernel]
    AutogradNestedTensor: registered at D:\a\_work\1\s\pytorch-directml\torch\csrc\autograd\generated\VariableType_4.cpp:8893 [autograd kernel]
    UNKNOWN_TENSOR_TYPE_ID: registered at D:\a\_work\1\s\pytorch-directml\torch\csrc\autograd\generated\VariableType_4.cpp:8893 [autograd kernel]
    AutogradPrivateUse1: registered at D:\a\_work\1\s\pytorch-directml\torch\csrc\autograd\generated\VariableType_4.cpp:8893 [autograd kernel]
    AutogradPrivateUse2: registered at D:\a\_work\1\s\pytorch-directml\torch\csrc\autograd\generated\VariableType_4.cpp:8893 [autograd kernel]
    AutogradPrivateUse3: registered at D:\a\_work\1\s\pytorch-directml\torch\csrc\autograd\generated\VariableType_4.cpp:8893 [autograd kernel]
    Tracer: registered at D:\a\_work\1\s\pytorch-directml\torch\csrc\autograd\generated\TraceType_4.cpp:10612 [kernel]
    Autocast: fallthrough registered at D:\a\_work\1\s\pytorch-directml\aten\src\ATen\autocast_mode.cpp:250 [backend fallback]
    Batched: registered at D:\a\_work\1\s\pytorch-directml\aten\src\ATen\BatchingRegistrations.cpp:1016 [backend fallback]
    VmapMode: fallthrough registered at D:\a\_work\1\s\pytorch-directml\aten\src\ATen\VmapModeRegistrations.cpp:33 [backend fallback]
    

CPU is fine:

    >>> torch.tensor([1], dtype=torch.float32, device='cpu')
    tensor([1.])
    
    pytorch-directml 
    opened by iperov 7
  • Is there any low power mode for DirectML

Hi, I now have a model that is quick enough (120 fps) and will run at 20 fps; what I need is to use as little GPU power as possible, but I find the GPU frequency jumps to 1150 MHz too many times. By comparison, in Tencent Meeting ("https://voovmeeting.com/download-center.html?from=1001") with human segmentation enabled, on an 8xxx laptop the GPU frequency holds below 400 MHz while GPU load is over 75%, which is strange for the frequency policy.
So I guess maybe DirectX 12 or DX11 has some low-power mode? Or some other way, e.g. adding some wait in each op (e.g. the convolution op)?

    opened by liyuming1978 7
  • pytorch-directml produces "[W dml_heap_allocator.cc:97] DML allocator out of memory!"

    I was trying to run the simple code below:

    import torch
    import torch_directml
    dml = torch_directml.device()

    print(f"dml={dml}")

    tensor1 = torch.tensor([1])
    print(tensor1)
    tensor1 = tensor1.to(dml)

When running tensor1.to(dml), I got the following error:

    [W dml_heap_allocator.cc:97] DML allocator out of memory!
    Traceback (most recent call last):
      File "/home/fnz/workspace/direct-ml/main.py", line 9, in <module>
        tensor1=tensor1.to(dml)
    RuntimeError: Unknown error -2147024882

    It seems that my pytorch-directml doesn't work at all.

Below are my packages in conda:

    (direct_ml) [email protected]:~/workspace/direct-ml$ conda list | grep torch
    torch                     1.13.1                  pypi_0    pypi
    torch-directml            0.1.13.dev221216        pypi_0    pypi

BTW, my environment is WSL2 on top of Windows 11 Pro.

The TensorFlow DirectML package seems to work well.

Any ideas?

Thanks,

    Feng

    opened by virtual-feng 1
  • torch-directml : torch.div with trunc rounding on int64 fails with RuntimeError

Hi, because 'aten::fmod.Tensor_out' is not implemented, I tried to implement it myself. I encountered a new error when using the rounding mode trunc with an int64 tensor.

    Code:

    import torch
    import torch_directml
    dml = torch_directml.device()
    
    a = torch.tensor([1,2,3]).to(dml) #
    b = 2
    a = a - torch.div(a, b, rounding_mode="trunc") * b
    
    opened by Theucalyptus 0
  • Very low validation and testing accuracy on CNN

Hello everyone. I am facing an issue; let me explain what I am trying to do. I have a traffic and road sign dataset that contains 43 classes, and I am trying to classify the images. I am using the resnet34 pre-trained model, and I have an AMD RX 6600 GPU that I use for running the model. For running the model on my AMD GPU I am using PyTorch DirectML. Until now everything has worked fine: training speed is fast enough, GPU utilization is near 100%, and training loss decreases per epoch.

But when I check the model using validation data after one training phase, validation loss increases and validation accuracy is too low, even though training is OK. When I run the same code on my friend's PC, who has an NVIDIA GPU, all is OK: validation loss decreases and it converges, and I got an accuracy of 98% when running the same code on the NVIDIA GPU. I cannot figure out what the problem is. I also tuned the hyperparameters but had no luck. One strange thing is that this problem only arises when I use a CNN-based model; I ran the NLP pre-trained model BERT on my AMD GPU and there was no issue, validation loss decreases and it converges. Can anyone help me with this issue? I am giving the code below. Thanks in advance.

    opened by AtiqurRahmanAni 0
  • Spacy seems outdated + problems running attention...

    Disclaimer: NOT a coder. Generally curious individual with just enough copy-paste and google skills. I may not know what I'm talking about.

Just playing around with the repo. The install failed for me because of the spaCy version in requirements.txt. Using Python 3.10 on Ubuntu 22.10. I changed spaCy to 3.4.4 (which I had cached, so I just did pip install spacy, to see whichever worked).

It installed, but gave further warnings like:

    ⚠ As of spaCy v3.0, shortcuts like 'en' are deprecated. Please use the full pipeline package name 'en_core_web_sm' instead.
    Collecting en-core-web-sm==3.4.1...

and

    ⚠ As of spaCy v3.0, shortcuts like 'de' are deprecated. Please use the full pipeline package name 'de_core_news_sm' instead.
    Collecting de-core-news-sm==3.4.0

    opened by Vidyut 0
  • Operator 'aten::amax.out' is not currently supported on the DML backend.

    C:\ProgramData\Anaconda3\envs\torchdml\lib\site-packages\torch\optim\adamax.py:231: UserWarning: The operator 'aten::amax.out' is not currently supported on the DML backend and will fall back to run on the CPU. This may have performance implications. (Triggered internally at D:\a_work\1\s\pytorch-directml-plugin\torch_directml\csrc\dml\dml_cpu_fallback.cpp:16.)
      torch.amax(norm_buf, 0, keepdim=False, out=exp_inf)

    opened by rmskmr05 0