TorchCV: A PyTorch-Based Framework for Deep Learning in Computer Vision

Last update: Jan 06, 2023

@misc{you2019torchcv,
    author = {Ansheng You and Xiangtai Li and Zhen Zhu and Yunhai Tong},
    title = {TorchCV: A PyTorch-Based Framework for Deep Learning in Computer Vision},
    howpublished = {\url{https://github.com/donnyyou/torchcv}},
    year = {2019}
}

This repository provides source code for most deep-learning-based computer vision problems. We'll do our best to keep it up to date. If you find a problem with this repository, please raise an issue or submit a pull request.

- Semantic Flow for Fast and Accurate Scene Parsing
- Code and models: https://github.com/lxtGH/SFSegNets

Implemented Papers

Image Classification
- VGG: Very Deep Convolutional Networks for Large-Scale Image Recognition
- ResNet: Deep Residual Learning for Image Recognition
- DenseNet: Densely Connected Convolutional Networks
- ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices
- ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design
- Partial Order Pruning: for Best Speed/Accuracy Trade-off in Neural Architecture Search

Semantic Segmentation
- DeepLabV3: Rethinking Atrous Convolution for Semantic Image Segmentation
- PSPNet: Pyramid Scene Parsing Network
- DenseASPP: DenseASPP for Semantic Segmentation in Street Scenes
- Asymmetric Non-local Neural Networks for Semantic Segmentation
- Semantic Flow for Fast and Accurate Scene Parsing

Object Detection
- SSD: Single Shot MultiBox Detector
- Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
- YOLOv3: An Incremental Improvement
- FPN: Feature Pyramid Networks for Object Detection

Pose Estimation
- CPM: Convolutional Pose Machines
- OpenPose: Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields

Instance Segmentation
- Mask R-CNN

Generative Adversarial Networks
- Pix2pix: Image-to-Image Translation with Conditional Adversarial Nets
- CycleGAN: Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks

QuickStart with TorchCV

Currently, only Python 3.x and PyTorch 1.3 are supported.

pip3 install -r requirements.txt
cd lib/exts
sh make.sh
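
Before building the extensions, it can help to confirm that the environment matches the stated requirement. The following is a minimal sketch and not part of TorchCV: the function name `meets_requirements` is hypothetical, and it assumes that "PyTorch 1.3" means any 1.3.x release.

```python
def meets_requirements(python_version, torch_version):
    """Return True if the versions match Python 3.x and PyTorch 1.3.x.

    python_version: tuple such as sys.version_info[:2]
    torch_version: string such as torch.__version__ ("1.3.1", "1.3.0+cu100")
    """
    # Drop any local build suffix such as "+cu100" before parsing.
    major_minor = torch_version.split("+")[0].split(".")[:2]
    return python_version[0] == 3 and major_minor == ["1", "3"]
```

With PyTorch installed, the check could be called as `meets_requirements(sys.version_info[:2], torch.__version__)`.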

Performances with TorchCV

All the results shown below come from our reimplementations, which fully reproduce the results reported in the papers.

Image Classification

ImageNet (Center Crop Test): 224x224

Model	Train	Test	Top-1	Top-5	BS	Iters	Scripts
ResNet50	train	val	77.54	93.59	512	300k	ResNet50
ResNet101	train	val	78.94	94.56	512	300k	ResNet101
ShuffleNetV2x0.5	train	val	60.90	82.54	1024	400k	ShuffleNetV2x0.5
ShuffleNetV2x1.0	train	val	69.71	88.91	1024	400k	ShuffleNetV2x1.0
DFNetV1	train	val	70.99	89.68	1024	400k	DFNetV1
DFNetV2	train	val	74.22	91.61	1024	400k	DFNetV2
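
The Top-1 and Top-5 columns are assumed here to follow the usual ImageNet convention: a prediction counts as correct when the true label is among the k highest-scoring classes. The sketch below illustrates that metric in plain Python. It is not TorchCV's evaluation code, and the function name `topk_accuracy` is hypothetical.

```python
def topk_accuracy(scores, labels, k):
    """Percentage of samples whose true label is among the k top-scoring classes.

    scores: list of per-sample score lists (one score per class)
    labels: list of true class indices
    """
    hits = 0
    for sample_scores, label in zip(scores, labels):
        # Class indices sorted by descending score.
        ranked = sorted(range(len(sample_scores)),
                        key=lambda c: sample_scores[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return 100.0 * hits / len(labels)


scores = [
    [0.1, 0.7, 0.2],  # highest score: class 1
    [0.5, 0.3, 0.2],  # highest score: class 0, runner-up: class 1
]
labels = [1, 1]
top1 = topk_accuracy(scores, labels, k=1)
top2 = topk_accuracy(scores, labels, k=2)
```

In this toy case only the first sample is right at k=1, while both are right at k=2.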

Semantic Segmentation

Cityscapes (Single Scale Whole Image Test): Base LR 0.01, Crop Size 769

Model	Backbone	Train	Test	mIOU	BS	Iters	Scripts
PSPNet	3x3-Res101	train	val	78.20	8	40k	PSPNet
DeepLabV3	3x3-Res101	train	val	79.13	8	40k	DeepLabV3

ADE20K (Single Scale Whole Image Test): Base LR 0.02, Crop Size 520

Model	Backbone	Train	Test	mIOU	PixelACC	BS	Iters	Scripts
PSPNet	3x3-Res50	train	val	41.52	80.09	16	150k	PSPNet
DeepLabV3	3x3-Res50	train	val	42.16	80.36	16	150k	DeepLabV3
PSPNet	3x3-Res101	train	val	43.60	81.30	16	150k	PSPNet
DeepLabV3	3x3-Res101	train	val	44.13	81.42	16	150k	DeepLabV3
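
To help read the mIOU and PixelACC columns, the sketch below assumes the standard definitions: per-class intersection over union averaged over classes, and the fraction of correctly labeled pixels, both derived from a confusion matrix. It is an illustration in plain Python, not TorchCV's evaluation code, and all function names are hypothetical.

```python
def confusion_matrix(preds, targets, num_classes):
    """Rows index the true class, columns the predicted class."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(preds, targets):
        matrix[t][p] += 1
    return matrix


def mean_iou(matrix):
    """Average of TP / (TP + FP + FN) over classes that appear at all."""
    n = len(matrix)
    ious = []
    for c in range(n):
        tp = matrix[c][c]
        fn = sum(matrix[c]) - tp
        fp = sum(matrix[r][c] for r in range(n)) - tp
        union = tp + fp + fn
        if union > 0:  # skip classes absent from both prediction and target
            ious.append(tp / union)
    return sum(ious) / len(ious)


def pixel_accuracy(matrix):
    """Fraction of pixels whose predicted class equals the true class."""
    correct = sum(matrix[c][c] for c in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


# Four "pixels" flattened into lists of class indices.
preds = [0, 0, 1, 1]
targets = [0, 1, 1, 1]
cm = confusion_matrix(preds, targets, num_classes=2)
```

Real evaluations usually also skip pixels marked with an ignore label; that step is omitted here for brevity.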

Object Detection

Pascal VOC2007/2012 (Single Scale Test): 20 Classes

Model	Backbone	Train	Test	mAP	BS	Epochs	Scripts
SSD300	VGG16	07+12_trainval	07_test	0.786	32	235	SSD300
SSD512	VGG16	07+12_trainval	07_test	0.808	32	235	SSD512
Faster R-CNN	VGG16	07_trainval	07_test	0.706	1	15	Faster R-CNN

Pose Estimation

OpenPose: Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields

Instance Segmentation

Mask R-CNN

Generative Adversarial Networks

Pix2pix
CycleGAN

DataSets with TorchCV

TorchCV defines a dataset format for each task; see the subdirectories of data for details. The following is an example dataset directory tree for training semantic segmentation. You can preprocess public datasets with the scripts in the folder data/seg/preprocess.

Dataset
    train
        image
            00001.jpg/png
            00002.jpg/png
            ...
        label
            00001.png
            00002.png
            ...
    val
        image
            00001.jpg/png
            00002.jpg/png
            ...
        label
            00001.png
            00002.png
            ...
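
The layout above suggests that each image is matched to its label by file name. The sketch below checks one split for images without a label. It is not part of TorchCV: the function name `pair_images_and_labels` is hypothetical, and matching on the file stem (for example `00001.jpg` with `00001.png`) is an assumption drawn from the example tree.

```python
import os
import tempfile


def pair_images_and_labels(split_dir):
    """Match files in <split_dir>/image with <split_dir>/label by file stem."""
    image_dir = os.path.join(split_dir, "image")
    label_dir = os.path.join(split_dir, "label")
    labels = {os.path.splitext(name)[0]: name for name in os.listdir(label_dir)}
    pairs, missing = [], []
    for name in sorted(os.listdir(image_dir)):
        stem = os.path.splitext(name)[0]
        if stem in labels:
            pairs.append((os.path.join(image_dir, name),
                          os.path.join(label_dir, labels[stem])))
        else:
            missing.append(name)  # image with no matching label file
    return pairs, missing


# Build a tiny throwaway dataset to demonstrate the check.
root = tempfile.mkdtemp()
for sub, names in (("image", ["00001.jpg", "00002.png"]),
                   ("label", ["00001.png"])):
    os.makedirs(os.path.join(root, "train", sub))
    for name in names:
        open(os.path.join(root, "train", sub, name), "w").close()

pairs, missing = pair_images_and_labels(os.path.join(root, "train"))
```

Running the same check on the val split before training would catch incomplete preprocessing early.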

Commands with TorchCV

Take PSPNet as an example. ("tag" can be any string, including an empty one.)

Training

cd scripts/seg/cityscapes/
bash run_fs_pspnet_cityscapes_seg.sh train tag

Resume Training

cd scripts/seg/cityscapes/
bash run_fs_pspnet_cityscapes_seg.sh train tag

Validate

cd scripts/seg/cityscapes/
bash run_fs_pspnet_cityscapes_seg.sh val tag

Testing

cd scripts/seg/cityscapes/
bash run_fs_pspnet_cityscapes_seg.sh test tag
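
The commands above share one pattern: change into the script directory, then pass a phase and a tag to the script. When launching runs from Python, that pattern could be wrapped as in the sketch below. The helper names `build_command` and `run` are hypothetical, and only the three phases shown above are accepted.

```python
import subprocess

SCRIPT_DIR = "scripts/seg/cityscapes/"
SCRIPT = "run_fs_pspnet_cityscapes_seg.sh"
PHASES = ("train", "val", "test")


def build_command(phase, tag=""):
    """Return the argument list for one of the phases shown above."""
    if phase not in PHASES:
        raise ValueError("phase must be one of %s" % (PHASES,))
    # An empty tag is passed through as an explicit empty argument.
    return ["bash", SCRIPT, phase, tag]


def run(phase, tag=""):
    # cwd replaces the explicit `cd` step; check=True raises on a non-zero exit.
    return subprocess.run(build_command(phase, tag), cwd=SCRIPT_DIR, check=True)


cmd = build_command("val", "baseline")
```

Calling `run("train", "baseline")` from the repository root would then be equivalent to the Training commands above with the tag set to `baseline`.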

Demos with TorchCV

Example output of VGG19-OpenPose

Owner

Donny You
