TorchCV: A PyTorch-Based Framework for Deep Learning in Computer Vision

Last update: Jan 06, 2023

@misc{you2019torchcv,
    author = {Ansheng You and Xiangtai Li and Zhen Zhu and Yunhai Tong},
    title = {TorchCV: A PyTorch-Based Framework for Deep Learning in Computer Vision},
    howpublished = {\url{https://github.com/donnyyou/torchcv}},
    year = {2019}
}

This repository provides source code for most deep-learning-based computer vision problems. We will do our best to keep it up to date. If you find a problem with this repository, please open an issue or submit a pull request.

- Semantic Flow for Fast and Accurate Scene Parsing
- Code and models: https://github.com/lxtGH/SFSegNets

Implemented Papers

Image Classification
- VGG: Very Deep Convolutional Networks for Large-Scale Image Recognition
- ResNet: Deep Residual Learning for Image Recognition
- DenseNet: Densely Connected Convolutional Networks
- ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices
- ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design
- Partial Order Pruning: for Best Speed/Accuracy Trade-off in Neural Architecture Search

Semantic Segmentation
- DeepLabV3: Rethinking Atrous Convolution for Semantic Image Segmentation
- PSPNet: Pyramid Scene Parsing Network
- DenseASPP: DenseASPP for Semantic Segmentation in Street Scenes
- Asymmetric Non-local Neural Networks for Semantic Segmentation
- Semantic Flow for Fast and Accurate Scene Parsing

Object Detection
- SSD: Single Shot MultiBox Detector
- Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
- YOLOv3: An Incremental Improvement
- FPN: Feature Pyramid Networks for Object Detection

Pose Estimation
- CPM: Convolutional Pose Machines
- OpenPose: Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields

Instance Segmentation
- Mask R-CNN

Generative Adversarial Networks
- Pix2pix: Image-to-Image Translation with Conditional Adversarial Nets
- CycleGAN: Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks

QuickStart with TorchCV

Currently only Python 3.x and PyTorch 1.3 are supported.

pip3 install -r requirements.txt
cd lib/exts
sh make.sh
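Because only Python 3.x and PyTorch 1.3 are supported, a quick check before building the extensions can save a failed build. The sketch below is not part of TorchCV: the helpers `parse_major_minor` and `check_environment` are hypothetical, and they only compare version strings against the stated requirement.

```python
import sys
from typing import List, Optional, Tuple


def parse_major_minor(version: str) -> Tuple[int, int]:
    """Return (major, minor) from a version string such as '1.3.1+cu101'."""
    core = version.split("+")[0]  # drop a local build suffix like '+cu101'
    major, minor = core.split(".")[:2]
    return int(major), int(minor)


def check_environment(torch_version: Optional[str] = None) -> List[str]:
    """Hypothetical pre-flight check; returns a list of problems (empty if none)."""
    problems = []
    if sys.version_info.major != 3:
        problems.append("Python 3.x is required")
    if torch_version is None:
        try:
            import torch  # imported lazily so the check also runs without PyTorch
        except ImportError:
            problems.append("PyTorch is not installed")
            return problems
        torch_version = str(torch.__version__)
    if parse_major_minor(torch_version) != (1, 3):
        problems.append(f"PyTorch 1.3 is expected, found {torch_version}")
    return problems
```

Calling `check_environment()` with no argument reads `torch.__version__` from the installed PyTorch, if any; passing a version string lets you test the logic without PyTorch installed.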

Performances with TorchCV

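All results shown below reproduce the performance reported in the original papers. In the tables, BS is the batch size and "W" denotes 10,000 (for example, 30W means 300,000 iterations).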

Image Classification

ImageNet (Center Crop Test): 224x224

Model	Train	Test	Top-1	Top-5	BS	Iters	Scripts
ResNet50	train	val	77.54	93.59	512	30W	ResNet50
ResNet101	train	val	78.94	94.56	512	30W	ResNet101
ShuffleNetV2x0.5	train	val	60.90	82.54	1024	40W	ShuffleNetV2x0.5
ShuffleNetV2x1.0	train	val	69.71	88.91	1024	40W	ShuffleNetV2x1.0
DFNetV1	train	val	70.99	89.68	1024	40W	DFNetV1
DFNetV2	train	val	74.22	91.61	1024	40W	DFNetV2
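The Top-1 and Top-5 columns are the usual top-k accuracies: a prediction counts as correct if the ground-truth class is among the k highest-scoring classes. A minimal pure-Python sketch of the metric (not TorchCV's own evaluation code; `topk_accuracy` is a hypothetical name) is:

```python
from typing import Sequence


def topk_accuracy(scores: Sequence[Sequence[float]], labels: Sequence[int], k: int) -> float:
    """Percentage of samples whose true label is among the k highest-scoring classes."""
    correct = 0
    for row, label in zip(scores, labels):
        # Class indices sorted by descending score, truncated to the top k.
        top_k = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        correct += int(label in top_k)
    return 100.0 * correct / len(labels)


# Three samples, three classes.
scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]
labels = [1, 1, 0]
print(topk_accuracy(scores, labels, k=1))  # one of three correct
print(topk_accuracy(scores, labels, k=2))  # two of three correct
```

With tensors you would typically take the top-k indices in one batched call instead of sorting each row, but the definition is the same.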

Semantic Segmentation

Cityscapes (Single Scale Whole Image Test): Base LR 0.01, Crop Size 769

Model	Backbone	Train	Test	mIOU	BS	Iters	Scripts
PSPNet	3x3-Res101	train	val	78.20	8	4W	PSPNet
DeepLabV3	3x3-Res101	train	val	79.13	8	4W	DeepLabV3

ADE20K (Single Scale Whole Image Test): Base LR 0.02, Crop Size 520

Model	Backbone	Train	Test	mIOU	PixelACC	BS	Iters	Scripts
PSPNet	3x3-Res50	train	val	41.52	80.09	16	15W	PSPNet
DeepLabV3	3x3-Res50	train	val	42.16	80.36	16	15W	DeepLabV3
PSPNet	3x3-Res101	train	val	43.60	81.30	16	15W	PSPNet
DeepLabV3	3x3-Res101	train	val	44.13	81.42	16	15W	DeepLabV3
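The mIOU and PixelACC columns are standard segmentation metrics. As a minimal sketch (pure Python, not TorchCV's own evaluation code), both can be derived from a confusion matrix accumulated over the whole validation set. The function names are hypothetical, and the `ignore_index=255` default for unlabeled pixels is an assumption borrowed from the common Cityscapes convention.

```python
from typing import List, Sequence, Tuple


def confusion_matrix(pred: Sequence[int], gt: Sequence[int], num_classes: int,
                     ignore_index: int = 255) -> List[List[int]]:
    """Count pixels: rows are ground-truth classes, columns are predicted classes."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for p, g in zip(pred, gt):
        if g == ignore_index:  # unlabeled pixels do not count
            continue
        cm[g][p] += 1
    return cm


def miou_and_pixel_acc(cm: List[List[int]]) -> Tuple[float, float]:
    """Return (mIoU, pixel accuracy) in percent."""
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                       # class c predicted as something else
        fp = sum(cm[r][c] for r in range(n)) - tp  # other classes predicted as c
        union = tp + fp + fn
        if union > 0:  # skip classes absent from both prediction and ground truth
            ious.append(tp / union)
    correct = sum(cm[c][c] for c in range(n))
    total = sum(sum(row) for row in cm)
    return 100.0 * sum(ious) / len(ious), 100.0 * correct / total


# Flattened toy example: two classes, the last pixel is unlabeled.
gt = [0, 0, 1, 1, 255]
pred = [0, 1, 1, 1, 0]
cm = confusion_matrix(pred, gt, num_classes=2)
print(cm)                      # [[1, 1], [0, 2]]
print(miou_and_pixel_acc(cm))  # mIoU = (1/2 + 2/3) / 2, pixel accuracy = 3/4
```

For full-resolution images you would vectorize the accumulation (for example with NumPy's `bincount`), but the arithmetic is unchanged.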

Object Detection

Pascal VOC2007/2012 (Single Scale Test): 20 Classes

Model	Backbone	Train	Test	mAP	BS	Epochs	Scripts
SSD300	VGG16	07+12_trainval	07_test	0.786	32	235	SSD300
SSD512	VGG16	07+12_trainval	07_test	0.808	32	235	SSD512
Faster R-CNN	VGG16	07_trainval	07_test	0.706	1	15	Faster R-CNN
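The mAP column is the mean, over the 20 classes, of per-class average precision (AP), reported here as a fraction (0.786 = 78.6%). The sketch below computes AP for one class, assuming detections have already been matched to ground truth (under the usual VOC rule, a detection is a true positive if its IoU with a not-yet-matched ground-truth box exceeds 0.5). Which interpolation TorchCV uses is not stated here, so both the VOC2007 11-point variant and the all-point variant are shown; `average_precision` is a hypothetical name.

```python
from typing import List, Sequence


def average_precision(scores: Sequence[float], is_tp: Sequence[bool], num_gt: int,
                      use_11_point: bool = False) -> float:
    """AP for a single class from detections already matched to ground truth."""
    # Rank detections by confidence, highest first, and trace the PR curve.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = fp = 0
    recalls: List[float] = []
    precisions: List[float] = []
    for i in order:
        if is_tp[i]:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_gt)
        precisions.append(tp / (tp + fp))

    if use_11_point:
        # VOC2007 style: average the best precision at recall >= t for t = 0.0, 0.1, ..., 1.0.
        ap = 0.0
        for t in (x / 10 for x in range(11)):
            best = max((prec for rec, prec in zip(recalls, precisions) if rec >= t), default=0.0)
            ap += best / 11
        return ap

    # All-point interpolation: pad the curve, replace each precision with the
    # maximum precision at equal or higher recall (sweeping right to left),
    # then sum the area under the resulting step function.
    mrec = [0.0] + recalls + [1.0]
    mpre = [0.0] + precisions + [0.0]
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])
    return sum((mrec[i + 1] - mrec[i]) * mpre[i + 1] for i in range(len(mrec) - 1))


# Three detections for one class, two ground-truth objects.
scores = [0.9, 0.8, 0.7]
is_tp = [True, False, True]
print(average_precision(scores, is_tp, num_gt=2))                     # all-point: 5/6
print(average_precision(scores, is_tp, num_gt=2, use_11_point=True))  # 11-point: 28/33
```

mAP is then the plain average of this value over all classes.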

Pose Estimation

OpenPose: Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields

Instance Segmentation

Mask R-CNN

Generative Adversarial Networks

Pix2pix
CycleGAN

DataSets with TorchCV

TorchCV defines a dataset format for every task; see the subdirectories of data for details. The following is an example directory tree for a semantic segmentation training dataset. You can preprocess the open datasets with the scripts in the folder data/seg/preprocess.

Dataset
    train
        image
            00001.jpg/png
            00002.jpg/png
            ...
        label
            00001.png
            00002.png
            ...
    val
        image
            00001.jpg/png
            00002.jpg/png
            ...
        label
            00001.png
            00002.png
            ...
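In this layout each label shares its image's file stem, so a loader only has to match names. A minimal sketch that pairs each image with its label for one split, assuming exactly the tree above; `list_pairs` is a hypothetical helper, not a TorchCV API.

```python
from pathlib import Path
from typing import List, Tuple

IMAGE_SUFFIXES = {".jpg", ".png"}


def list_pairs(root: str, split: str) -> List[Tuple[Path, Path]]:
    """Pair <split>/image/<stem>.jpg|png with <split>/label/<stem>.png."""
    image_dir = Path(root) / split / "image"
    label_dir = Path(root) / split / "label"
    pairs = []
    for image_path in sorted(image_dir.iterdir()):
        if image_path.suffix.lower() not in IMAGE_SUFFIXES:
            continue  # ignore files that are not images
        # Labels are always PNG and share the image's file stem.
        label_path = label_dir / (image_path.stem + ".png")
        if not label_path.is_file():
            raise FileNotFoundError(f"no label for {image_path.name} in {label_dir}")
        pairs.append((image_path, label_path))
    return pairs
```

For example, `list_pairs("Dataset", "train")` returns `(image, label)` path pairs in file-name order and fails loudly on the first image without a matching label, which catches preprocessing mistakes before training starts.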

Commands with TorchCV

Take PSPNet as an example. ("tag" can be any string, including an empty one.)

Training

cd scripts/seg/cityscapes/
bash run_fs_pspnet_cityscapes_seg.sh train tag

Resume Training

cd scripts/seg/cityscapes/
bash run_fs_pspnet_cityscapes_seg.sh train tag

Validate

cd scripts/seg/cityscapes/
bash run_fs_pspnet_cityscapes_seg.sh val tag

Testing

cd scripts/seg/cityscapes/
bash run_fs_pspnet_cityscapes_seg.sh test tag
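All of these commands follow one pattern: change into the script directory, then call the script with a phase (`train`, `val`, or `test`) and a tag. If you want to drive runs from Python, a hypothetical wrapper could look like the sketch below; it only rebuilds the command lines shown above and is not part of TorchCV.

```python
import subprocess
from pathlib import Path
from typing import List

PHASES = ("train", "val", "test")


def build_command(script: str, phase: str, tag: str = "") -> List[str]:
    """Argument list equivalent to `bash <script> <phase> <tag>`."""
    if phase not in PHASES:
        raise ValueError(f"phase must be one of {PHASES}, got {phase!r}")
    # The tag is passed through unchanged; an empty string is allowed.
    return ["bash", script, phase, tag]


def run_phase(script_dir: str, script: str, phase: str, tag: str = "") -> int:
    """Run the script from its own directory, like `cd scripts/seg/cityscapes/` first."""
    completed = subprocess.run(build_command(script, phase, tag), cwd=Path(script_dir))
    return completed.returncode
```

For example, `run_phase("scripts/seg/cityscapes", "run_fs_pspnet_cityscapes_seg.sh", "val", "tag")` mirrors the Validate command. Passing a list rather than a shell string avoids quoting problems when the tag is empty or contains spaces.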

Demos with TorchCV

Example output of VGG19-OpenPose

Owner

Donny You
