Deep Learning GPU Training System

Last update: Jan 03, 2023

Overview

DIGITS

DIGITS (the Deep Learning GPU Training System) is a webapp for training deep learning models. The currently supported frameworks are Caffe, Torch, and TensorFlow.

Feedback

In addition to submitting pull requests, feel free to submit and vote on feature requests via our ideas portal.

Documentation

The most up-to-date documentation is available at NVIDIA Accelerated Computing, Deep Learning Documentation, NVIDIA DIGITS.

Installation

Installation method	Supported platform[s]	Available versions	Instructions
Source	Ubuntu 14.04, 16.04	GitHub tags	docs/BuildDigits.md

The official DIGITS container is available at nvcr.io via the docker pull command.

Usage

Once you have installed DIGITS, visit docs/GettingStarted.md for an introductory walkthrough.

Then, take a look at some of the other documentation in docs/ and examples/.

Get help

Installation issues

First, check out the instructions above
Then, ask questions on our user group

Usage questions

First, check out the Getting Started page
Then, ask questions on our user group

Bugs and feature requests

Please let us know by filing a new issue
Bonus points if you want to contribute by opening a pull request!
- You will need to send a signed copy of the Contributor License Agreement to [email protected] before your change can be accepted.

Notice on security

Users shall understand that DIGITS is not designed to be run as an exposed external web service.

Comments

Torch Data Augmentation
Data augmentation needs little introduction, I reckon. It counters overfitting and makes your model generalize better, yielding better validation accuracy, or alternatively allows you to use smaller datasets with similar performance.

In the zoo that is the internet, I see many implementations of different augmentations, few of which are proper and nicely portable. Apart from DIGITS providing a great UI, ease of use, and a turn-key deep learning solution, I strongly feel we can expand on the functional side as well to make this a killer app for deep learning.

For Torch, I have implemented augmentation in the Lua preprocessing, wired from frontend to backend so that DIGITS can do this. #330 already attempted augmentation, but on the dataset-creation side, which I am strongly against. I would consider resizing and cropping a transformation, whereas I consider augmenting the data in its container an augmentation. I therefore think it's fine to resize (and squash/fill/etc.) during dataset loading, but I would probably leave it at that.

Anyway, I set up a more dynamic structure to pass these options around on the Torch side: instead of adding a dozen arguments to each function, I am just passing a table.
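The "one options table instead of a dozen arguments" pattern can be sketched in Python (the PR itself is Lua/Torch). The option names and defaults below are hypothetical and only illustrate the idea: defaults are merged with user-supplied values once, unknown keys are rejected early, and every stage receives the same table.

```python
# Hypothetical option names and defaults, for illustration only.
DEFAULT_AUG_OPTS = {
    "flip": "none",       # e.g. "none", "fliplr", "flipud"
    "quad_rot": "none",   # e.g. "none", "rot90", "rotall"
    "rot_deg": 0.0,       # maximum arbitrary rotation, in degrees
    "scale_std": 0.0,     # std-dev of a random scale factor
    "noise_std": 0.0,     # std-dev of additive noise
}


def resolve_opts(user_opts=None):
    """Merge user-supplied options over the defaults into one table,
    rejecting keys that no stage understands."""
    user_opts = dict(user_opts or {})
    unknown = sorted(set(user_opts) - set(DEFAULT_AUG_OPTS))
    if unknown:
        raise ValueError("unknown augmentation option(s): " + ", ".join(unknown))
    opts = dict(DEFAULT_AUG_OPTS)
    opts.update(user_opts)
    return opts


def preprocess(sample, opts):
    """Each stage takes the whole table and reads only the keys it needs,
    so adding an option does not change any function signature."""
    if opts["flip"] == "fliplr":
        sample = sample[::-1]  # toy stand-in for a real image flip
    return sample
```

Adding a new augmentation then means adding one key to the defaults and reading it in the stage that uses it, rather than threading a new argument through every function.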

Implements the following (screenshot):

I have iterated through many augmentation types, but these were the most useful. Almost done; now running elaborate tests.

Progress

The code is already functional, though see the progress below. Have a look at the code and shoot!

Features

[x] Make UI data transforms only visible for the Torch framework (invisible for Caffe)

[x] ~~Implement UI option for normalization (scales the [0 255] to [0 1])~~

[x] Data Augmentation UI

[x] Flips (mirrors)

[x] Quadrilateral rotations

[x] Arbitrary rotations

[x] Arbitrary scales

[x] Augmenting in HSV space

[x] Augmenting with noise (Thoughts?)

[x] [Travis] Tests

[x] Use Data Augmentation Template: data_augmentation.html
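As a rough illustration of what some of these augmentations do to an image array, here is a NumPy sketch of flips, quadrilateral (90-degree) rotations, and additive noise. It is not the Lua/Torch code from this PR; the function name, its parameters, and the choice of Gaussian noise clipped to [0, 255] are assumptions. Arbitrary rotations, scales, and HSV jitter need interpolation or a color-space conversion and are left out.

```python
import numpy as np


def augment(image, rng, flip=True, quad_rot=True, noise_std=0.0):
    """Randomly augment one H x W x C uint8 image (a sketch, not the DIGITS code)."""
    out = image.astype(np.float32)
    if flip and rng.random() < 0.5:
        out = out[:, ::-1, :]                 # horizontal mirror (reverse columns)
    if quad_rot:
        k = int(rng.integers(0, 4))           # rotate by 0, 90, 180 or 270 degrees
        out = np.rot90(out, k, axes=(0, 1))
    if noise_std > 0:
        out = out + rng.normal(0.0, noise_std, size=out.shape)
    return np.clip(out, 0, 255).astype(np.uint8)
```

Note that a 90- or 270-degree rotation swaps height and width, so for non-square inputs this kind of augmentation usually happens before the final crop or resize.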

Testing

[x] No augmentation

[x] Flips (mirrors)

[x] Quadrilateral rotations

[x] Arbitrary rotations

[x] Arbitrary scales

[x] Arbitrary rotations & arbitrary scales

[x] Augmenting in HSV space

[x] Augmenting with noise

[x] All Augmentations & benchmark speed; identify bottlenecks

[x] Verify that models show the expected trade-off of slower learning and less overfitting, i.e. better generalization.

enhancement torch
opened by TimZaman 46
Running on multiple GPUs is very slow

I am trying to run a 50-layer residual network on 4 K40m GPUs and it is very slow (same batch_size of 16 as on a single GPU): it takes 6 hours for 1 epoch. However, if I run it on 1 GPU, the speed is normal.

System: CentOS, digits v3, nvcaffe-0.14

BTW, I tried GoogLeNet and it was OK on 4 GPUs.

Any suggestion or potential issue?
duplicate

opened by 201power 37

ERROR: Expected caffe suffix "-nv". libcaffe.so does not match. Are you building from the NVIDIA/caffe fork?

Hi,

I'm running on Ubuntu 14.04 LTS.

ERROR: Expected caffe suffix "-nv". libcaffe.so does not match. Are you building from the NVIDIA/caffe fork?

[email protected]:~/digits$ pip install -r requirements.txt
You are using pip version 7.0.3, however version 7.1.0 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
Requirement already satisfied (use --upgrade to upgrade): Pillow>=2.3.0 in /home/ubuntu/anaconda/lib/python2.7/site-packages (from -r requirements.txt (line 1))
Requirement already satisfied (use --upgrade to upgrade): numpy>=1.7 in /home/ubuntu/anaconda/lib/python2.7/site-packages (from -r requirements.txt (line 2))
Requirement already satisfied (use --upgrade to upgrade): scipy>=0.13.3 in /home/ubuntu/anaconda/lib/python2.7/site-packages (from -r requirements.txt (line 3))
Collecting protobuf>=2.5.0 (from -r requirements.txt (line 4))
  Downloading protobuf-2.6.1.tar.gz (188kB)
    100% |████████████████████████████████| 188kB 2.3MB/s 
Collecting pydot>=1.0.2 (from -r requirements.txt (line 5))
  Downloading pydot-1.0.2.tar.gz
Requirement already satisfied (use --upgrade to upgrade): six>=1.5.2 in /home/ubuntu/anaconda/lib/python2.7/site-packages (from -r requirements.txt (line 6))
Requirement already satisfied (use --upgrade to upgrade): requests>=2.2.1 in /home/ubuntu/anaconda/lib/python2.7/site-packages (from -r requirements.txt (line 7))
Requirement already satisfied (use --upgrade to upgrade): gevent>=1.0 in /home/ubuntu/anaconda/lib/python2.7/site-packages (from -r requirements.txt (line 8))
Requirement already satisfied (use --upgrade to upgrade): Flask>=0.10.1 in /home/ubuntu/anaconda/lib/python2.7/site-packages (from -r requirements.txt (line 9))
Collecting Flask-WTF>=0.11 (from -r requirements.txt (line 10))
  Downloading Flask_WTF-0.12-py2-none-any.whl
Collecting Flask-SocketIO (from -r requirements.txt (line 11))
  Downloading Flask-SocketIO-0.6.0.tar.gz
Collecting lmdb (from -r requirements.txt (line 12))
  Downloading lmdb-0.86.tar.gz (144kB)
    100% |████████████████████████████████| 147kB 2.9MB/s 
Requirement already satisfied (use --upgrade to upgrade): nose>=1.3.1 in /home/ubuntu/anaconda/lib/python2.7/site-packages (from -r requirements.txt (line 13))
Requirement already satisfied (use --upgrade to upgrade): mock>=1.0.1 in /home/ubuntu/anaconda/lib/python2.7/site-packages (from -r requirements.txt (line 14))
Requirement already satisfied (use --upgrade to upgrade): beautifulsoup4>=4.2.1 in /home/ubuntu/anaconda/lib/python2.7/site-packages (from -r requirements.txt (line 15))
Requirement already satisfied (use --upgrade to upgrade): selenium>=2.25.0 in /home/ubuntu/anaconda/lib/python2.7/site-packages (from -r requirements.txt (line 16))
Collecting gunicorn (from -r requirements.txt (line 17))
  Downloading gunicorn-19.3.0-py2.py3-none-any.whl (110kB)
    100% |████████████████████████████████| 110kB 3.8MB/s 
Requirement already satisfied (use --upgrade to upgrade): setuptools in /home/ubuntu/anaconda/lib/python2.7/site-packages/setuptools-17.1.1-py2.7.egg (from protobuf>=2.5.0->-r requirements.txt (line 4))
Requirement already satisfied (use --upgrade to upgrade): pyparsing in /home/ubuntu/anaconda/lib/python2.7/site-packages (from pydot>=1.0.2->-r requirements.txt (line 5))
Requirement already satisfied (use --upgrade to upgrade): Werkzeug in /home/ubuntu/anaconda/lib/python2.7/site-packages (from Flask-WTF>=0.11->-r requirements.txt (line 10))
Collecting WTForms (from Flask-WTF>=0.11->-r requirements.txt (line 10))
  Downloading WTForms-2.0.2-py27-none-any.whl (128kB)
    100% |████████████████████████████████| 131kB 3.3MB/s 
Collecting gevent-socketio>=0.3.6 (from Flask-SocketIO->-r requirements.txt (line 11))
  Downloading gevent_socketio-0.3.6-py27-none-any.whl
Requirement already satisfied (use --upgrade to upgrade): gevent-websocket in /home/ubuntu/anaconda/lib/python2.7/site-packages (from gevent-socketio>=0.3.6->Flask-SocketIO->-r requirements.txt (line 11))
Installing collected packages: protobuf, pydot, WTForms, Flask-WTF, gevent-socketio, Flask-SocketIO, lmdb, gunicorn
  Running setup.py install for protobuf
  Running setup.py install for pydot
  Running setup.py install for Flask-SocketIO
  Running setup.py install for lmdb
Successfully installed Flask-SocketIO-0.6.0 Flask-WTF-0.12 WTForms-2.0.2 gevent-socketio-0.3.6 gunicorn-19.3.0 lmdb-0.86 protobuf-2.6.1 pydot-1.0.2
[email protected]:~/digits$ sudo apt-get install graphviz
Reading package lists... Done
Building dependency tree       
Reading state information... Done
graphviz is already the newest version.
The following packages were automatically installed and are no longer required:
  linux-headers-3.13.0-49 linux-headers-3.13.0-49-generic
  linux-image-3.13.0-49-generic linux-image-extra-3.13.0-49-generic
Use 'apt-get autoremove' to remove them.
0 upgraded, 0 newly installed, 0 to remove and 267 not upgraded.
[email protected]:~/digits$ ./digits-devserver
  ___ ___ ___ ___ _____ ___
 |   \_ _/ __|_ _|_   _/ __|
 | |) | | (_ || |  | | \__ \
 |___/___\___|___| |_| |___/

Welcome to the DIGITS config module.

Where is caffe installed?
    (enter "SYS" if installed system-wide)
    [default is SYS]
(q to quit) >>> SYS
ERROR: Expected caffe suffix "-nv". libcaffe.so does not match. Are you building from the NVIDIA/caffe fork?

(q to quit) >>>

caffe

opened by dbl001 35

Accuracy & confusion matrix
See #17

Adds a new kind of job for performance evaluation of trained classifiers. It is now possible to visualize:

accuracy / recall curve

confusion matrix

Accuracy and the confusion matrix are computed against a chosen snapshot of a training task, and against both the validation set and testing set (if it exists). An "evaluate performance" button has been added on the training view. This is currently the only way to run an evaluation job. The results are stored in the job directory in the form of two pickle files.

Accuracy / recall curve

Confusion matrix

I chose a very simple representation of the confusion matrix (not in the form of a matrix!), because it is better suited to datasets with many classes. For each class, the 10 most frequently predicted classes are displayed, with their respective percentages.
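The per-class "top 10" view and the overall accuracy can be sketched in plain Python: for each true class, count the predicted classes and keep the most frequent ones with their percentages. The function names and the parallel-lists input format are assumptions; the PR's actual implementation may differ.

```python
from collections import Counter, defaultdict


def top_confusions(true_labels, predicted_labels, k=10):
    """For each true class, return its k most frequent predicted classes
    as (predicted_class, percent) pairs, most frequent first."""
    per_class = defaultdict(Counter)
    for truth, pred in zip(true_labels, predicted_labels):
        per_class[truth][pred] += 1
    result = {}
    for truth, counts in per_class.items():
        total = sum(counts.values())
        result[truth] = [(pred, 100.0 * n / total) for pred, n in counts.most_common(k)]
    return result


def accuracy(true_labels, predicted_labels):
    """Fraction of samples whose prediction matches the label."""
    pairs = list(zip(true_labels, predicted_labels))
    return sum(t == p for t, p in pairs) / len(pairs)
```

Storing only the top k entries per row keeps the display compact even with thousands of classes, where a full N x N matrix would be unreadable.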

Related jobs

I added a "Related jobs" section to each job's show view. It displays the jobs that depend on the current job, for example models trained on a specific dataset or evaluations run on a specific model.

Let me know what you think; critiques and comments are more than welcome.
opened by groar 29
Windows Compatibility

On my machine, image serving (e.g. of mean.jpg) does not work. The browser (tested with IE and Chrome) cannot interpret the image, probably because the content type is missing. The send_file function takes care of all of that.
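The underlying problem is a response without a usable Content-Type. A minimal sketch of choosing one explicitly with Python's standard mimetypes module, with a small hard-coded table for common image extensions so the result does not depend on the platform's MIME database; the helper name and the table are assumptions. In Flask, the result could be passed as the mimetype argument of send_file.

```python
import mimetypes
import os

# Hypothetical table of image types that should never depend on the
# platform's MIME database; extend as needed.
_KNOWN_IMAGE_TYPES = {
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".ppm": "image/x-portable-pixmap",
}


def content_type_for(path):
    """Return a Content-Type for a file: known image types first, then
    mimetypes' guess, then a generic binary type."""
    ext = os.path.splitext(path)[1].lower()
    if ext in _KNOWN_IMAGE_TYPES:
        return _KNOWN_IMAGE_TYPES[ext]
    guessed, _encoding = mimetypes.guess_type(path)
    return guessed or "application/octet-stream"
```

For example, a route could return send_file(path, mimetype=content_type_for(path)).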
windows

opened by crohkohl 27
Add support for HDF5 datasets
Closes #224

TODO before merge

[x] Create models from HDF5 datasets using HDF5Data layers

[x] Expose backend and compression information in REST API

[x] Shard HDF5 files into acceptable dataset sizes - https://github.com/BVLC/caffe/issues/2953#issuecomment-137274066

TODO after merge

Allow non-image data (see #197)

Analyze prebuilt HDF5 datasets in "generic" path
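Sharding can be planned up front by bounding the number of values stored per file. A minimal sketch, assuming the constraint is a maximum element count per HDF5 file; INT_MAX is used as the default because Caffe checks that a blob's element count does not exceed it. The helper name and the exact limit DIGITS uses are assumptions.

```python
from math import prod

INT_MAX = 2**31 - 1  # assumed per-file element limit


def shard_ranges(num_images, image_shape, max_elements=INT_MAX):
    """Split num_images into consecutive [start, end) ranges so that each
    shard holds at most max_elements values in total."""
    per_image = prod(image_shape)
    if per_image > max_elements:
        raise ValueError("a single image exceeds max_elements")
    per_shard = max_elements // per_image
    return [(start, min(start + per_shard, num_images))
            for start in range(0, num_images, per_shard)]
```

For 3x256x256 images this gives 10922 images per shard, since 10922 * 196608 = 2147352576 is the largest multiple below INT_MAX.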

enhancement
opened by lukeyeager 26
Set map_size for LMDB

@crohkohl, @danst18, I'm breaking the discussion in #203 out into a new issue.

Here's the situation as I understand it. Please correct me if any of this is wrong.

| map_size | Linux | OSX & Windows |
| --- | --- | --- |
| lower than size of dataset | LMDB runs out of memory | ? |
| higher than system memory | No problem | LMDB can't allocate enough memory |

On Linux, you can just set it as high as you like and never see a problem. But that strategy blows up on other platforms.

Should [map_size] be made configurable? https://github.com/NVIDIA/DIGITS/pull/203#issuecomment-128859465

This is a sufficient but lazy solution. I would like to understand whether this can be avoided programmatically somehow before making a decision. My googling skills are failing me.
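One programmatic alternative is to start with a modest map_size and grow it only when a write fails; the v2.1.0 release notes record DIGITS moving to doubling the LMDB map_size when it runs out of memory (#209). The sketch below shows that strategy, not the DIGITS code. It is written against callbacks so it runs without py-lmdb; the helper name and the MapFullError stand-in are assumptions.

```python
class MapFullError(Exception):
    """Stand-in for lmdb.MapFullError so this sketch runs without py-lmdb."""


def write_with_growing_map(write_all, set_map_size, map_size, max_map_size,
                           full_error=MapFullError):
    """Call write_all(); whenever it reports a full map, double the map size
    (up to max_map_size) and retry. Returns the map size that worked."""
    while True:
        try:
            write_all()
            return map_size
        except full_error:
            if map_size * 2 > max_map_size:
                raise  # give up rather than exceed the platform's limit
            map_size *= 2
            set_map_size(map_size)
```

With py-lmdb, write_all would do its puts inside a write transaction (so a failed transaction is aborted), set_map_size would be env.set_mapsize, and full_error would be lmdb.MapFullError. Capping growth at max_map_size is what would keep this from hitting the allocation failure seen on OSX and Windows.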
question

opened by lukeyeager 26

Can't find hdf5.h when building Caffe

I want to install DIGITS on my Debian Jessie machine.
When I build Caffe (NVIDIA's fork), I get errors complaining that hdf5.h cannot be found.

I'm sure I have installed libhdf5-serial-dev and libhdf5-dev, and I found the header file in /usr/include/hdf5/serial and its libraries in /usr/lib/x86_64-linux-gnu.

So, what's wrong? Can someone help me?

The build error messages are shown below:

(venv)➜  caffe  make all --jobs=4
CXX src/caffe/layer_factory.cpp
CXX src/caffe/util/insert_splits.cpp
CXX src/caffe/util/db.cpp
CXX src/caffe/util/upgrade_proto.cpp
In file included from src/caffe/util/upgrade_proto.cpp:10:0:
./include/caffe/util/io.hpp:8:18: fatal error: hdf5.h: no such file or directory
 #include "hdf5.h"
                  ^
compilation terminated.
Makefile:512: recipe for target '.build_release/src/caffe/util/upgrade_proto.o' failed
make: *** [.build_release/src/caffe/util/upgrade_proto.o] Error 1
make: *** Waiting for unfinished jobs....
In file included from ./include/caffe/common_layers.hpp:10:0,
                 from ./include/caffe/vision_layers.hpp:10,
                 from src/caffe/layer_factory.cpp:6:
./include/caffe/data_layers.hpp:9:18: fatal error: hdf5.h: no such file or directory
 #include "hdf5.h"
                  ^
compilation terminated.
Makefile:512: recipe for target '.build_release/src/caffe/layer_factory.o' failed
make: *** [.build_release/src/caffe/layer_factory.o] Error 1

question caffe platform

opened by tangshi 26

mAP always zero

I can't figure out why my model training mAP (val) doesn't get above zero. I'm trying to use the same approach and the SpaceNet_DetectNet_Train_Val.prototxt from this article.

My label files 000n.txt look like this: p 0.0 0 0.0 0 0 24 118 0 0 0 0 0 0 0 0

My images are 1280x1280, and I'm using these custom classes: dontcare,p

Where am I going wrong?
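The label line above appears to follow the KITTI object format that DIGITS uses for object-detection datasets. A sketch of how such a line splits into fields, assuming the standard KITTI layout (class, truncation, occlusion, alpha, 4 bounding-box values, 3 dimensions, 3 location values, rotation_y, plus an optional score as a 16th value); the helper name is hypothetical.

```python
def parse_kitti_line(line):
    """Split one KITTI-format label line into named fields."""
    parts = line.split()
    if len(parts) not in (15, 16):
        raise ValueError("expected 15 or 16 fields, got %d" % len(parts))
    values = [float(v) for v in parts[1:]]
    left, top, right, bottom = values[3:7]
    return {
        "type": parts[0],
        "truncated": values[0],
        "occluded": int(values[1]),
        "alpha": values[2],
        "bbox": (left, top, right, bottom),  # pixel coordinates
        "dimensions": tuple(values[7:10]),
        "location": tuple(values[10:13]),
        "rotation_y": values[13],
        "score": values[14] if len(values) == 15 else None,
    }
```

Under this layout, the line in the question describes a box from (0, 0) to (24, 118), i.e. 24 px wide and 118 px tall in a 1280x1280 image.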
object-detection

opened by DarylWM 25

CUDNN_STATUS_BAD_PARAM

Ubuntu 14.04 LTS, clean install, NVIDIA packages installed with dpkg

$ sudo apt-get install cuda
$ sudo apt-get install digits

$ gedit .bashrc
Add the following lines to the end of the file:

export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH

$ sudo reboot

$ nvidia-smi
Tue May 31 13:32:37 2016       
+------------------------------------------------------+                       
| NVIDIA-SMI 352.93     Driver Version: 352.93         |                       
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 960     Off  | 0000:01:00.0      On |                  N/A |
| 20%   37C    P8    10W / 160W |    289MiB /  4095MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 960     Off  | 0000:02:00.0     Off |                  N/A |
| 20%   43C    P8     9W / 160W |     13MiB /  4095MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2015 NVIDIA Corporation
Built on Tue_Aug_11_14:27:32_CDT_2015
Cuda compilation tools, release 7.5, V7.5.17

---- Run DIGITS and create a dataset ----

MNIST, image size 28x28, image type GRAYSCALE

Create an image classification model

Select Caffe and LeNet

Run it, and the following error is raised:

ERROR: Check failed: status == CUDNN_STATUS_SUCCESS (3 vs. 0) CUDNN_STATUS_BAD_PARAM

bug

opened by shinfo001 25

Error: status == CUDNN_STATUS_SUCCESS (8 vs. 0) CUDNN_STATUS_EXECUTION_FAILED

I am getting this error when trying to run training with my custom network.

status == CUDNN_STATUS_SUCCESS (8 vs. 0) CUDNN_STATUS_EXECUTION_FAILED

I found this post that refers to this error: https://github.com/BVLC/caffe/issues/1700#issuecomment-133476490

But it doesn't specify where or how to fix it. Also, I am not sure whether the issues are related or something completely different. Let me mention that this custom network works perfectly fine when I run it in my local Caffe install, and I can also see all the nodes if I hit the visualize button. It starts training and fails after the first epoch.

bug

opened by alfredox10 24
Fix TypeError

File "/opt/digits/digits/extensions/data/imageSegmentation/data.py", line 225, in split_image_list
    random.shuffle(self.random_indices)
File "/usr/lib/python3.8/random.py", line 307, in shuffle
    x[i], x[j] = x[j], x[i]
TypeError: 'range' object does not support item assignment
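In Python 3, range() returns an immutable range object rather than a list, so random.shuffle cannot swap its items in place. The usual fix is to materialize the indices as a list before shuffling. The function below is a hypothetical illustration; only the list(range(...)) change is the point.

```python
import random


def split_indices(count, validation_fraction, seed=None):
    """Shuffle 0..count-1 and split them into (train, validation) lists."""
    indices = list(range(count))  # a list is mutable; a range is not
    random.Random(seed).shuffle(indices)
    n_val = int(count * validation_fraction)
    return indices[n_val:], indices[:n_val]
```

In data.py, the equivalent change would presumably be to assign a list (for example list(range(...))) to self.random_indices before the shuffle call.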

opened by vertexodessa 0
DIGITS DOCKER CONTAINER: INSTALLING SUNNYBROOK PLUGIN

I'm sorry, I'm trying to install Sunnybrook for the segmentation example in the Docker container, as I want to run it on the TensorFlow backend (not Caffe). I tried to repeat the install procedure from inside the container by running docker exec -it XXXXX bash, where XXXXX is the container ID, then downloading the plugin from https://github.com/NVIDIA/DIGITS/tree/master/plugins/data and doing the install procedure, but it does not work. Is there an official way to do this? I ran pip install --ignore-installed setuptools (no error appeared):

Installing collected packages: setuptools Successfully installed setuptools-44.1.1

Then I ran git clone https://github.com/NVIDIA/DIGITS.git, went to /DIGITS/plugins/data/sunnybrook with cd, and finally ran pip install . with no errors. However, after restarting Docker, creating a Sunny dataset fails (I've posted the error separately in the following post, for clarity).

Can you help please? Kind regards

opened by crmuinos 1
I'm confused about which version of DIGITS to install

Apologies in advance, since I'm new to all this, but I'm confused about which version of DIGITS to install. I'm beginning a fresh install of the latest Ubuntu version, and so far, after hours of scouring the internet, I have found DIGITS versions that work standalone and versions that work in Docker; then there's the official DIGITS GitHub page, which has DIGITS up to version 6, and on NGC there's DIGITS 20.03???

What is going on? I'm so confused. I was excited to get DIGITS up and running on my local machine as soon as I had completed the NVIDIA DLI course, and now I'm stumped as to where to start. I would also like to know how DIGITS running TensorFlow differs from DIGITS running Caffe.

Please help.

opened by RazaZaidi2802 0
Cannot see DetectNet bounding boxes using a Caffe model on Nano

We have trained and deployed a custom model on the Nano using a Caffe DetectNet model. We trained it in DIGITS, and it works well when running inference in DIGITS, but no bounding boxes are shown when running on the Nano. Is there a patch for this issue?

opened by eanmikale 0
Model Creation Errors

I am trying to train with DIGITS as described in Hello AI World, and this is the run output:

inception_5b/relu_pool_proj ← inception_5b/pool_proj
inception_5b/relu_pool_proj → inception_5b/pool_proj (in-place)
Setting up inception_5b/relu_pool_proj
TRAIN Top shape for layer 158 'inception_5b/relu_pool_proj' 5 128 40 40 (1024000)
Creating layer 'inception_5b/output' of type 'Concat'
Layer's types are Ftype:FLOAT Btype:FLOAT Fmath:FLOAT Bmath:FLOAT
Created Layer inception_5b/output (159)
inception_5b/output ← inception_5b/1x1
inception_5b/output ← inception_5b/3x3
inception_5b/output ← inception_5b/5x5
inception_5b/output ← inception_5b/pool_proj
inception_5b/output → inception_5b/output
Setting up inception_5b/output
TRAIN Top shape for layer 159 'inception_5b/output' 5 1024 40 40 (8192000)
Creating layer 'pool5/drop_s1' of type 'Dropout'
Layer's types are Ftype:FLOAT Btype:FLOAT Fmath:FLOAT Bmath:FLOAT
Created Layer pool5/drop_s1 (160)
pool5/drop_s1 ← inception_5b/output
pool5/drop_s1 → pool5/drop_s1
Check failed: status == CUDNN_STATUS_SUCCESS (8 vs. 0) CUDNN_STATUS_EXECUTION_FAILED, device 0

I am using an RTX 2070 Super.

Server: 9dca63a42e15
DIGITS version: 6.1.1
Caffe version: 0.17.0
Caffe flavor: NVIDIA

My brain is soup at this point; please help me out.
caffe_output.log

I have not been able to create a single model yet.

I am also unable to install DIGITS from source without crashing Ubuntu. Today is May 11 and I have been trying to get it working since the 7th; could you please help me out? I am really excited about this tool.

opened by cespedesk 0

Releases(v6.1.1)

v6.1.1(Apr 10, 2018)
Since 6.1.0

Bugfixes

Update for new TF API (#2014)

Update CI scripts to add some new deps to Caffe build (#1993)

Update import and API for pydicom 1.0

Fix label distribution and its view page (#1916)

Source code(tar.gz)
Source code(zip)
v6.1.0(Dec 12, 2017)
Since 6.0

New Features

Added functionality to integrate DIGITS with S3 Endpoints (#1868)

Added publishing to an inference server in the classification workflow (#1906)

Bugfixes

Fix frozen graph issue (#1907)

Fix 404 error for /datasets/inference-form/... from #1888 (#1889)

Remove timeout assertion (#1859)

Changes

Various documentation updates

Known Issues

Out of memory error in the semantic-segmentation example when training the FCN AlexNet model on Tesla P100.

v6.0.0(Aug 30, 2017)
See release notes for the 6.0 release candidate.

Since 6.0 RC1

New Features

Added support for URL prefix (#1803)

Bugfixes

Fixed loading/saving tensorflow models (#1794)

Changes

Various documentation updates

Known Issues

Visualization for Caffe models does not currently work. (#1738)

v6.0.0-rc.1(Jul 25, 2017)
New Features

Added TensorFlow backend for DIGITS as an alternative to Caffe and Torch (#1714)

Added examples and support for GANs (#1714)

Added support for text classification (#1025)

Added more viewing options for image segmentation (#1188)

Changes

HTML embedding now defaults to PNG (#1270)

Images that cause exceptions now show the file name (#1636)

Bugfixes

Fixed softmax visualization issue with scaled images (#1647)

Documentation was changed for model store with official pictures (#1650)

Fixed Caffe search path in Windows (#1244)

Fixed image file entry in Sunnybrook inference form (#1237)

Fixed bugs when visiting nested image folders (#1477)

Known Issues

Visualization for Caffe models does not currently work. (#1738)

v5.0.0(Feb 2, 2017)
See release notes for the 5.0 release candidate.

New since 5.0 RC

Enable the DIGITS Model Store (https://github.com/NVIDIA/DIGITS/pull/1308)

Fix calculations related to batch accumulation for Caffe (https://github.com/NVIDIA/DIGITS/pull/1307)

Various documentation updates

v5.0.0-rc.1(Oct 15, 2016)
279 commits since v4.0.0

New Features

Import pretrained models from a model "store" (#896, #1077, #1161)

Support for image segmentation workflows (#830, #961, #1131)

Online data augmentation with Torch (#777)

Show CPU and system memory utilization during training (#800)

Improved bounding-box visualizations for object detection models (#869)

Create groups of jobs for easier display on the home page (#734)

Reuse data extensions for inference (#1024)

Support for plugin extensions (#1093, #927, #947)

Add documentation for the REST API (#964)

Changes

Use environment variables for configuration instead of a file (#1091)

Remove digits-server and dependency on gunicorn (#1127)

digits-devserver is now just a small shell script instead of a Python script (#1121)

New design for Torch multi-GPU training (#828)

Add Ubuntu 16.04 support by updating dependency versions (#965)

Allow testing of only Caffe or only Torch with the testsuite (#1143)

Return more info when downloading a model tarball or json (#891)

Bugfixes

Fix bug with Torch and CUDA_VISIBLE_DEVICES (#1130)

Fix issues with browsers returning incorrectly cached css and js files (#904)

Known Issues

Training goes on longer than required when using batch accumulation (#1240)

v4.0.0(Jul 19, 2016)
529 commits since v3.0.0

New Features

Add support for object-detection networks like DetectNet (#735) with documentation (#803)

Parameter sweep over batch size and learning rate (#708)

Show accuracy confusion matrix for "Classify Many" (#608)

Test a model with an LMDB (#638)

Add basic login functionality (#463)

Changes

Major revamp of home page (#728, #790)

Allow use of BVLC/caffe (#769)

Run inference jobs in separate processes (#573)

Bugfixes

Made device_query compatible with CUDA 8.0 (#890)

For more information, see the release notes for v3.1, v3.2, v3.3, and the 4.0 RC.
v4.0.0-rc.2(Jul 19, 2016)
211 commits since v3.3.0

New Features

Add support for object-detection networks like DetectNet (#735) with documentation (#803)

Parameter sweep over batch size and learning rate (#708)

Add plugin systems for data formats (#731) and inference visualizations (#756)

Expose Caffe's iter_size solver option (#744)

Add syntax highlighting when editing custom networks (#751)

View list of related jobs (#767)

Explore generic datasets (#822)

Add example for doing text classification with Torch (#684)

Changes

Major revamp of home page (#728, #790)

Allow use of BVLC/caffe (#769)

New Torch multi-GPU programming model (#732)

Make small improvements to standard networks (#733, #749)

Set weight_decay to lr / 100 (#792)

Make major improvements to TravisCI build system (#766, #788)

v3.3.0(Apr 25, 2016)
New Features

Show accuracy confusion matrix for "Classify Many" (#608)

Test a model with an LMDB (#638)

Use layer stages in network descriptions for full control over train/val/deploy networks (#628)

Option to limit number of images to use for "Classify/Test Many" (#592)

Better in-app documentation for Python layers (#651)

Changes

Run inference jobs in separate processes (#573)

Path autocompletion returns sorted list (#621)

Bugfixes

Fixed UI bugs when using Safari (#702)

Fixed file serving for files with absolute paths (#586)

Fixed some UI bugs related to permissions (#594, #596)

Various torch-related bugfixes (#661, #663, #681, #686, #699)

Windows compatibility fixes (#698)

v3.2.0(Feb 18, 2016)
New Features

Add support for new solvers - RMSprop, AdaDelta and Adam (#564)

AlexNet for Torch now works for multiple GPUs (#539)

New documentation for installing CUDA toolkit, drivers, etc. (#558)

Changes

Only look in one location for config files (#541)

Re-use weights when retraining a model on the same dataset (#538)

Functional improvements and documentation changes for examples/classification (#559, #557, #579, #582)

Better error-checking for Caffe networks referencing invalid layer "bottoms" (#576)

Bugfixes

Fixes for multistep learning rate (#549, #550)

v3.1.0(Jan 22, 2016)
New Features

Enable multi-GPU for Torch (#480)

Add basic login functionality (#463)

Allow Torch to fine-tune pretrained models (#499)

Allow Caffe to fine-tune from multiple pretrained models (#498)

New tutorials

Fine-tuning (#500)

Siamese networks (#453)

Weight initialization (#522)

Allow optional specification of image folder during multiple inference (#526)

Changes

Torch performance improvements (#368, #390, #441, #339)

Disable colormap for "Top N" feature (#481)

Better real-time updates for dataset creation (#473)

Better display for device_query tool (#497)

Display the job directory for all job types (#469)

Use Flask "Blueprints" to clean up routing code (#507)

Clean up and alphabetize imports throughout the project (#501)

Removed docs/API.md and docs/FlaskRoutes.md (a05356ebfe0fe462f20143625ec8c942847348de)

Bugfixes

Enable importing of LMDBs created with Caffe's convert_imageset tool (#517)

v3.0.0(Jan 22, 2016)
See release notes for v3.0 RC.

New since 3.0 RC

Fix handling of unencoded LMDBs in Torch (#475)

Significant performance enhancement for creating datasets (#491)

Various documentation fixes / updates

v3.0.0-rc.3(Dec 10, 2015)
New Features

Add Torch7 as an alternative backend to Caffe (#324, #345)

Make using python layers easier by [optionally] attaching a python file to each model (#329)

Add the ability to clone previous jobs with a click (#334)

Update the homepage to show job updates in real-time (#240)

Enable mean subtraction by subtracting the mean file as well as subtracting the mean pixel (#321)

Support NVcaffe v0.14 (#341, #336)

Display the job directory size for each DatasetJob and ModelJob (#309)

Add a backend badge (LMDB/HDF5) to DatasetJobs on the homepage (#323)

Explore images in LMDB datasets (#331)
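The two mean-subtraction modes differ only in what is subtracted: a per-position mean image (the "mean file") or a single mean value per channel (the "mean pixel"). A NumPy sketch of both, assuming H x W x C images; the function names are hypothetical and this is not the DIGITS code path.

```python
import numpy as np


def mean_image(images):
    """Per-position, per-channel mean over an N x H x W x C stack."""
    return images.astype(np.float64).mean(axis=0)


def subtract_mean(image, mean_img, mode="image"):
    """Subtract either the full mean image or its per-channel mean pixel."""
    image = image.astype(np.float64)
    if mode == "image":
        return image - mean_img
    if mode == "pixel":
        # shape (C,), broadcast over H and W
        return image - mean_img.mean(axis=(0, 1))
    raise ValueError("mode must be 'image' or 'pixel'")
```

The mean pixel only needs C numbers and works for any crop or input size, while the mean image has to match the input's spatial size.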

Changes

Use port 34448 for the digits-server instead of port 8080 (#392)

Remove digits-walkthrough (#352)

Enforce standard UI for file input fields across different browsers (#325)

Bugfixes

Fix PicklingError issues on all platforms (#307)

Fix issue when running inference on many images at once (#361)

Known Issues

Large inference requests (i.e. "Classify many") may cause timeouts or even crashes (#479)

Incorrect handling of unencoded LMDB in Torch wrapper (#477)

v2.2.1(Sep 17, 2015)
New since 2.2.0

Fixed snapshot list for previous networks (#285)

Fixed parameter counting (#317)

v2.2.0(Sep 16, 2015)
New Features

Add [initial] support for HDF5 datasets (#226)

Zoom in on weight/activation visualizations (#267)

Add a new page for comparing training results (#195)

Add notes to jobs (#283)

Changes

Open inference results in a new browser tab (#244)

Various improvements for using prebuilt LMDBs (#268)

Sort subfolders when parsing a folder of images (#296)

Use input_shape instead of input_dim for deploy network prototxt (#231)

Known Issues

Using a snapshot from a previous network doesn't work unless the network is on the first page (#285)

Parameter counting fails for some layer types (like PReLU) (#317)

v2.1.0(Sep 14, 2015)
New Features

Add support for "Generic Inference" (i.e. non-classification) networks (#189)

Display number of learned parameters in a model (#221)

Show ground truth in "Classify Many" if provided (#110)

Zoom in on a selection of the loss/accuracy graph (#113)

Add autocomplete for server-side path input fields (#183)

Select max/min images per class when parsing a folder of images (#161)

Allow user to download log from CreateDb tasks (#221)

Show number of available GPUs on home page (#207)

Allow local file upload for image lists (#106)

Display DIGITS version in top right of page header (#153) and in the console output (c181797cdf3ce27bf65a22fd39fbc61b95ecaab6)

Changes

Double the LMDB map_size when running out of memory instead of setting to 1TB (#209)

Requires py-lmdb 0.87

Rename default GoogLeNet layers and tops (9ff246eed47ec04461956b133495260855168e2e)

Add pagination to Previous Networks list (c181797cdf3ce27bf65a22fd39fbc61b95ecaab6)

Various changes that help with Windows compatibility (#199)

Major refactoring of tests (#192)

Known Issues

Parameter counting fails for some layer types (like PReLU) (#317)

v2.0.0(Sep 3, 2015)
New Features

Enabled support for multi-GPU Caffe (#92)

Select multiple and/or specific GPUs for training (#92, #104)

Created new routes for JSON REST API (#134, #136)

Started using GPU for inference (#66)

Added NVML info about GPU memory/utilization (#93)

Enabled ADAGRAD and NESTEROV as alternative solver types (@drozdvadym in #102)

Added scripts to download standard datasets MNIST and CIFAR

Added option to set server name (#111)

Added support for PPM images (#123)

Enabled path autocompletion while setting values in the configuration (#96)
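
ADAGRAD and NESTEROV differ from plain SGD only in how each step is computed. A minimal sketch of the textbook forms of both update rules on plain Python lists, with weight decay and learning-rate schedules omitted. The function names and default hyperparameters are hypothetical; this is an illustration of the techniques, not Caffe's solver code.

```python
import math


def adagrad_step(weights, grads, history, lr=0.01, delta=1e-8):
    """One AdaGrad step, in place.

    Squared gradients accumulate per parameter, so parameters with large
    past gradients get progressively smaller effective learning rates.
    """
    for i, g in enumerate(grads):
        history[i] += g * g
        weights[i] -= lr * g / (math.sqrt(history[i]) + delta)


def nesterov_step(weights, grads, velocity, lr=0.01, momentum=0.9):
    """One Nesterov momentum step, in place (look-ahead reformulation).

    velocity <- momentum * velocity + lr * grad
    weights  <- weights - ((1 + momentum) * velocity - momentum * old_velocity)
    """
    for i, g in enumerate(grads):
        v_prev = velocity[i]
        velocity[i] = momentum * v_prev + lr * g
        weights[i] -= (1 + momentum) * velocity[i] - momentum * v_prev
```

For example, minimizing f(w) = w² (gradient 2w) from w = 3 with `adagrad_step` and lr = 0.5, the first step moves w to 2.5.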

Changes

Added a python classification example (#147)

Subtract mean pixel during training (#169)

Added TravisCI integration to run tests (#28)

Added Coveralls integration for test coverage

Added Landscape integration to inspect code

Added auto-generated documentation of the webapp’s HTTP routes

Switched to loading config files from new, more logical locations (#96)

Started suppressing most of Caffe’s raw output (b382e99b8a143c9bbbf659ba74e67bf2ef12718e, 019bc6ca750601396a502ad0fd2b0d47b239f0d7)

Added a CLA
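
Subtracting the mean pixel means removing a single per-channel average, computed over every pixel of the training set, from each input pixel (as opposed to subtracting a full mean image). A minimal sketch with images represented as lists of 3-channel pixel tuples; the representation and function names are assumptions for illustration, not DIGITS code.

```python
def mean_pixel(images):
    """Per-channel mean over every pixel of every image.

    Each image is a list of (c0, c1, c2) pixel tuples; assumes at least
    one pixel in total.
    """
    totals = [0.0, 0.0, 0.0]
    count = 0
    for image in images:
        for pixel in image:
            for c in range(3):
                totals[c] += pixel[c]
            count += 1
    return tuple(t / count for t in totals)


def subtract_mean_pixel(image, mean):
    """Center an image by removing the dataset's mean pixel from each pixel."""
    return [tuple(p[c] - mean[c] for c in range(3)) for p in image]
```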

Bugfixes

Fixed various OSX platform-specific issues (#32, @trivedigaurav in #94)

Known Issues

Some motherboards cause P2P bandwidth issues (https://github.com/NVIDIA/caffe/issues/10)

v2.0.0-rc3 (Jul 31, 2015)
See release notes for v2.0.0-preview.

New since 2.0 Preview

Recommend NVIDIA/Caffe v0.13 (https://github.com/NVIDIA/DIGITS/commit/5dc0f8e646d28587c07ff6fe9bcd1990820b41c2)

Requires cuDNN v3

Subtract mean pixel during training (#169)

Fixes regarding deployment of digits-server (c9a9dce2fcf7bb12363e6cccc44a6dd0a26a8271, e7bbc63213a10bbea516ee51adc5ffcf160494e8)

v2.0.0-preview (Jul 7, 2015)
New Features

Enabled support for multi-GPU Caffe (#92)

Select multiple and/or specific GPUs for training (#92, #104)

Created new routes for JSON REST API (#134, #136)

Started using GPU for inference (#66)

Added NVML info about GPU memory/utilization (#93)

Enabled ADAGRAD and NESTEROV as alternative solver types (@drozdvadym in #102)

Added scripts to download standard datasets MNIST and CIFAR

Added option to set server name (#111)

Added support for PPM images (#123)

Enabled path autocompletion while setting values in the configuration (#96)

Changes

Added a python classification example (#147)

Added TravisCI integration to run tests (#28)

Added Coveralls integration for test coverage

Added Landscape integration to inspect code

Added auto-generated documentation of the webapp’s HTTP routes

Switched to loading config files from new, more logical locations (#96)

Started suppressing most of Caffe’s raw output (b382e99b8a143c9bbbf659ba74e67bf2ef12718e, 019bc6ca750601396a502ad0fd2b0d47b239f0d7)

Added a CLA

Bugfixes

Fixed various OSX platform-specific issues (#32, @trivedigaurav in #94)

Known Issues

Some motherboards cause P2P bandwidth issues (https://github.com/NVIDIA/caffe/issues/10)

v1.1.2 (Jun 26, 2015)
See release notes for v1.1.0.

New since 1.1.0

Fixed a few things in the documentation (6ab2d6f8e0541fb92cf157b8d95072f057fa2459)

Fixed upgrade path for datasets and jobs created with older versions of DIGITS (6a838e0b44a4352480b889efac848845bccad5fc)

v1.1.0 (Apr 24, 2015)
New Features

Add GoogLeNet as a default network (#11)

"Classify Many Images" shows classification results of many images at once (#61)

Show statistics (mean, standard deviation, histogram of values) for each layer of the network at inference time (#67)

Allow saving images in database with PNG encoding (#73)

Optionally turn off shuffling when creating a dataset (#72)

Optionally provide a random seed to Caffe (73fe257)
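
Turning shuffling off and fixing a seed both serve reproducibility. The seed item above applies to Caffe itself; the sketch below shows the same idea for the order in which dataset entries are written, with a hypothetical helper. With shuffling off the input order is kept; with a seed, the shuffled order is identical on every run.

```python
import random


def order_entries(entries, shuffle=True, seed=None):
    """Return dataset entries in the order they will be written.

    shuffle=False keeps the input order unchanged. A dedicated
    random.Random(seed) instance makes the shuffle reproducible without
    touching the global random state.
    """
    entries = list(entries)
    if shuffle:
        random.Random(seed).shuffle(entries)
    return entries
```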

Changes

Upgrade to NVIDIA/caffe version 0.11.0 (e2bcb27)

Update pip requirements list to match packages available on Ubuntu 14.04 where possible (4162db4, 133213d)

Use C3.js instead of Google Charts to enable DIGITS to run without an internet connection (#34)

Change default image resize mode from HALF_CROP to SQUASH (b4f3261)

Bugfixes

Save images in BGR order instead of RGB because Caffe uses OpenCV to read encoded images (#59)

Scale the LeNet standard network by the standard deviation of MNIST (~80) during train, val and test phases (5a38aa5, 23c1a78)

Use a white background when removing transparency from images (#85)
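
The three fixes above are all per-pixel preprocessing steps. A minimal sketch of each on single pixels with 0 to 255 channel values; the helper names are hypothetical and this is not DIGITS code. Removing transparency is standard alpha compositing over white, BGR order is a channel reversal, and scaling by the MNIST standard deviation (~80) amounts to multiplying by 1/80 = 0.0125.

```python
def flatten_on_white(r, g, b, a):
    """Composite an RGBA pixel onto a white background; returns integer RGB.

    out = alpha * color + (1 - alpha) * 255, with alpha = a / 255.
    """
    alpha = a / 255.0
    return tuple(round(alpha * c + (1.0 - alpha) * 255) for c in (r, g, b))


def rgb_to_bgr(pixel):
    """Reverse the channel order, since OpenCV-based readers expect BGR."""
    r, g, b = pixel
    return (b, g, r)


# Roughly 1 / (standard deviation of MNIST pixel values), per the note above.
LENET_SCALE = 1.0 / 80


def scale_for_lenet(value):
    """Scale a raw 0-255 pixel value by the LeNet input scale."""
    return value * LENET_SCALE
```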

Known Issues

The GoogLeNet standard network is not behaving correctly when trained on the full ImageNet dataset (#82)

"Classify Many Images" may time out if too many images are uploaded and the server takes too long to respond (#70)


Owner

NVIDIA Corporation

GitHub Repository https://developer.nvidia.com/digits

