NVIDIA Deep Learning Examples for Tensor Cores

Overview

NVIDIA Deep Learning Examples for Tensor Cores

Introduction

This repository provides State-of-the-Art Deep Learning examples that are easy to train and deploy, achieving the best reproducible accuracy and performance with NVIDIA CUDA-X software stack running on NVIDIA Volta, Turing and Ampere GPUs.

NVIDIA GPU Cloud (NGC) Container Registry

These examples, along with our NVIDIA deep learning software stack, are provided in a monthly updated Docker container on the NGC container registry (https://ngc.nvidia.com). These containers include:

  • The latest NVIDIA examples from this repository
  • The latest NVIDIA contributions shared upstream to the respective framework
  • The latest NVIDIA Deep Learning software libraries, such as cuDNN, NCCL, cuBLAS, etc. which have all been through a rigorous monthly quality assurance process to ensure that they provide the best possible performance
  • Monthly release notes for each of the NVIDIA optimized containers

Computer Vision

Models Framework A100 AMP Multi-GPU Multi-Node TRT ONNX Triton DLC NB
ResNet-50 PyTorch Yes Yes Yes - Yes - Yes Yes -
ResNeXt-101 PyTorch Yes Yes Yes - Yes - Yes Yes -
SE-ResNeXt-101 PyTorch Yes Yes Yes - Yes - Yes Yes -
EfficientNet-B0 PyTorch Yes Yes Yes - - - - Yes -
EfficientNet-B4 PyTorch Yes Yes Yes - - - - Yes -
EfficientNet-WideSE-B0 PyTorch Yes Yes Yes - - - - Yes -
EfficientNet-WideSE-B4 PyTorch Yes Yes Yes - - - - Yes -
Mask R-CNN PyTorch Yes Yes Yes - - - - - Yes
nnUNet PyTorch Yes Yes Yes - - - - Yes -
SSD PyTorch Yes Yes Yes - - - - - Yes
ResNet-50 TensorFlow Yes Yes Yes - - - - Yes -
ResNeXt101 TensorFlow Yes Yes Yes - - - - Yes -
SE-ResNeXt-101 TensorFlow Yes Yes Yes - - - - Yes -
Mask R-CNN TensorFlow Yes Yes Yes - - - - Yes -
SSD TensorFlow Yes Yes Yes - - - - Yes Yes
U-Net Ind TensorFlow Yes Yes Yes - - - - Yes Yes
U-Net Med TensorFlow Yes Yes Yes - - - - Yes -
U-Net 3D TensorFlow Yes Yes Yes - - - - Yes -
V-Net Med TensorFlow Yes Yes Yes - - - - Yes -
U-Net Med TensorFlow2 Yes Yes Yes - - - - Yes -
Mask R-CNN TensorFlow2 Yes Yes Yes - - - - Yes -
EfficientNet TensorFlow2 Yes Yes Yes Yes - - - Yes -
ResNet-50 MXNet - Yes Yes - - - - - -

Natural Language Processing

Models Framework A100 AMP Multi-GPU Multi-Node TRT ONNX Triton DLC NB
BERT PyTorch Yes Yes Yes Yes - - Yes Yes -
TransformerXL PyTorch Yes Yes Yes Yes - - - Yes -
GNMT PyTorch Yes Yes Yes - - - - - -
Transformer PyTorch Yes Yes Yes - - - - - -
ELECTRA TensorFlow2 Yes Yes Yes Yes - - - Yes -
BERT TensorFlow Yes Yes Yes Yes Yes - Yes Yes Yes
BERT TensorFlow2 Yes Yes Yes Yes - - - Yes -
BioBert TensorFlow Yes Yes Yes - - - - Yes Yes
TransformerXL TensorFlow Yes Yes Yes - - - - - -
GNMT TensorFlow Yes Yes Yes - - - - - -
Faster Transformer Tensorflow - - - - Yes - - - -

Recommender Systems

Models Framework A100 AMP Multi-GPU Multi-Node TRT ONNX Triton DLC NB
DLRM PyTorch Yes Yes Yes - - Yes Yes Yes Yes
DLRM TensorFlow2 Yes Yes Yes Yes - - - Yes -
NCF PyTorch Yes Yes Yes - - - - - -
Wide&Deep TensorFlow Yes Yes Yes - - - - Yes -
Wide&Deep TensorFlow2 Yes Yes Yes - - - - Yes -
NCF TensorFlow Yes Yes Yes - - - - Yes -
VAE-CF TensorFlow Yes Yes Yes - - - - - -

Speech to Text

Models Framework A100 AMP Multi-GPU Multi-Node TRT ONNX Triton DLC NB
Jasper PyTorch Yes Yes Yes - Yes Yes Yes Yes Yes
Hidden Markov Model Kaldi - - Yes - - - Yes - -

Text to Speech

Models Framework A100 AMP Multi-GPU Multi-Node TRT ONNX Triton DLC NB
FastPitch PyTorch Yes Yes Yes - - - - Yes -
FastSpeech PyTorch - Yes Yes - Yes - - - -
Tacotron 2 and WaveGlow PyTorch Yes Yes Yes - Yes Yes Yes Yes -

Graph Neural Networks

Models Framework A100 AMP Multi-GPU Multi-Node TRT ONNX Triton DLC NB
SE(3)-Transformer PyTorch Yes Yes Yes - - - - - -

NVIDIA support

In each of the network READMEs, we indicate the level of support that will be provided. The range is from ongoing updates and improvements to a point-in-time release for thought leadership.

Glossary

Multinode Training
Supported on a pyxis/enroot Slurm cluster.

Deep Learning Compiler (DLC)
TensorFlow XLA and PyTorch JIT and/or TorchScript

Accelerated Linear Algebra (XLA)
XLA is a domain-specific compiler for linear algebra that can accelerate TensorFlow models with potentially no source code changes. The results are improvements in speed and memory usage.

PyTorch JIT and/or TorchScript
TorchScript is a way to create serializable and optimizable models from PyTorch code. TorchScript, an intermediate representation of a PyTorch model (subclass of nn.Module) that can then be run in a high-performance environment such as C++.

Automatic Mixed Precision (AMP)
Automatic Mixed Precision (AMP) enables mixed precision training on Volta, Turing, and NVIDIA Ampere GPU architectures automatically.

TensorFloat-32 (TF32)
TensorFloat-32 (TF32) is the new math mode in NVIDIA A100 GPUs for handling the matrix math also called tensor operations. TF32 running on Tensor Cores in A100 GPUs can provide up to 10x speedups compared to single-precision floating-point math (FP32) on Volta GPUs. TF32 is supported in the NVIDIA Ampere GPU architecture and is enabled by default.

Jupyter Notebooks (NB)
The Jupyter Notebook is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations and narrative text.

Feedback / Contributions

We're posting these examples on GitHub to better support the community, facilitate feedback, as well as collect and implement contributions using GitHub Issues and pull requests. We welcome all contributions!

Known issues

In each of the network READMEs, we indicate any known issues and encourage the community to provide feedback.

Owner
NVIDIA Corporation
NVIDIA Corporation
EMNLP 2021: Single-dataset Experts for Multi-dataset Question-Answering

MADE (Multi-Adapter Dataset Experts) This repository contains the implementation of MADE (Multi-adapter dataset experts), which is described in the pa

Princeton Natural Language Processing 68 Jul 18, 2022
Official PyTorch implementation of Synergies Between Affordance and Geometry: 6-DoF Grasp Detection via Implicit Representations

Synergies Between Affordance and Geometry: 6-DoF Grasp Detection via Implicit Representations Zhenyu Jiang, Yifeng Zhu, Maxwell Svetlik, Kuan Fang, Yu

UT-Austin Robot Perception and Learning Lab 63 Jan 03, 2023
Source code for the ACL-IJCNLP 2021 paper entitled "T-DNA: Taming Pre-trained Language Models with N-gram Representations for Low-Resource Domain Adaptation" by Shizhe Diao et al.

T-DNA Source code for the ACL-IJCNLP 2021 paper entitled Taming Pre-trained Language Models with N-gram Representations for Low-Resource Domain Adapta

shizhediao 17 Dec 22, 2022
The GitHub repository for the paper: “Time Series is a Special Sequence: Forecasting with Sample Convolution and Interaction“.

SCINet This is the original PyTorch implementation of the following work: Time Series is a Special Sequence: Forecasting with Sample Convolution and I

386 Jan 01, 2023
MTCNN face detection implementation for TensorFlow, as a PIP package.

MTCNN Implementation of the MTCNN face detector for Keras in Python3.4+. It is written from scratch, using as a reference the implementation of MTCNN

Iván de Paz Centeno 1.9k Dec 30, 2022
DeepHawkeye is a library to detect unusual patterns in images using features from pretrained neural networks

English | 简体中文 Introduction DeepHawkeye is a library to detect unusual patterns in images using features from pretrained neural networks Reference Pat

CV Newbie 28 Dec 13, 2022
Forest R-CNN: Large-Vocabulary Long-Tailed Object Detection and Instance Segmentation (ACM MM 2020)

Forest R-CNN: Large-Vocabulary Long-Tailed Object Detection and Instance Segmentation (ACM MM 2020) Official implementation of: Forest R-CNN: Large-Vo

Jialian Wu 54 Jan 06, 2023
use tensorflow 2.0 to tell a dog and cat from a specified picture

dog_or_cat use tensorflow 2.0 to tell a dog and cat from a specified picture This is one of the classic experiments for the introduction of deep learn

你这个代码我看不懂 1 Oct 22, 2021
Random-Afg - Afghanistan Random Old Idz Cloner Tools

AFGHANISTAN RANDOM OLD IDZ CLONER TOOLS Install $ apt update $ apt upgrade $ apt

MAHADI HASAN AFRIDI 5 Jan 26, 2022
xitorch: differentiable scientific computing library

xitorch is a PyTorch-based library of differentiable functions and functionals that can be widely used in scientific computing applications as well as deep learning.

24 Apr 15, 2021
PipeTransformer: Automated Elastic Pipelining for Distributed Training of Large-scale Models

PipeTransformer: Automated Elastic Pipelining for Distributed Training of Large-scale Models This repository is the official implementation of the fol

DistributedML 41 Dec 06, 2022
Neurolab is a simple and powerful Neural Network Library for Python

Neurolab Neurolab is a simple and powerful Neural Network Library for Python. Contains based neural networks, train algorithms and flexible framework

152 Dec 06, 2022
The 3rd place solution for competition

The 3rd place solution for competition "Lyft Motion Prediction for Autonomous Vehicles" at Kaggle Team behind this solution: Artsiom Sanakoyeu [Homepa

Artsiom 104 Nov 22, 2022
Implementation of Perceiver, General Perception with Iterative Attention in TensorFlow

Perceiver This Python package implements Perceiver: General Perception with Iterative Attention by Andrew Jaegle in TensorFlow. This model builds on t

Rishit Dagli 84 Oct 15, 2022
This is the official repository for our paper: ''Pruning Self-attentions into Convolutional Layers in Single Path''.

Pruning Self-attentions into Convolutional Layers in Single Path This is the official repository for our paper: Pruning Self-attentions into Convoluti

Zhuang AI Group 77 Dec 26, 2022
Myia prototyping

Myia Myia is a new differentiable programming language. It aims to support large scale high performance computations (e.g. linear algebra) and their g

Mila 456 Nov 07, 2022
code for "Feature Importance-aware Transferable Adversarial Attacks"

Feature Importance-aware Attack(FIA) This repository contains the code for the paper: Feature Importance-aware Transferable Adversarial Attacks (ICCV

Hengchang Guo 44 Nov 24, 2022
PolyGlot, a fuzzing framework for language processors

PolyGlot, a fuzzing framework for language processors Build We tested PolyGlot on Ubuntu 18.04. Get the source code: git clone https://github.com/s3te

Software Systems Security Team at Penn State University 79 Dec 27, 2022
DeepSpeed is a deep learning optimization library that makes distributed training easy, efficient, and effective.

DeepSpeed is a deep learning optimization library that makes distributed training easy, efficient, and effective.

Microsoft 8.4k Jan 01, 2023
Creative Applications of Deep Learning w/ Tensorflow

Creative Applications of Deep Learning w/ Tensorflow This repository contains lecture transcripts and homework assignments as Jupyter Notebooks for th

Parag K Mital 1.5k Dec 30, 2022