This repository contains the source code of our work on designing efficient CNNs for computer vision.

Last update: Nov 26, 2022

Overview

Efficient networks for Computer Vision

This repo contains the source code of our work on designing efficient networks for different computer vision tasks: (1) image classification, (2) object detection, and (3) semantic segmentation.

Real-time semantic segmentation using ESPNetv2 on iPhone 7. See here for iOS application source code using Core ML.

Real-time object detection using ESPNetv2

Table of contents

Key highlights
Supported networks
Relevant papers
Blogs
Performance comparison
Training recipe
Instructions for segmentation and detection demos
Citation
License
Acknowledgements
Contributions
Notes

Key highlights

Object classification on ImageNet and MS-COCO (multi-label)
Semantic segmentation on PASCAL VOC and Cityscapes
Object detection on PASCAL VOC and MS-COCO
Supports PyTorch 1.0
Integrated with TensorBoard for easy visualization of training logs.
Scripts for downloading different datasets.
Semantic segmentation application using ESPNetv2 on iPhone can be found here.

Supported networks

This repo supports the following networks:

ESPNetv2 (Classification, Segmentation, Detection)
DiCENet (Classification, Segmentation, Detection)
ShuffleNetv2 (Classification)

Relevant papers

Blogs

Performance comparison

ImageNet

The figure below compares the performance of DiCENet with other efficient networks on the ImageNet dataset. In this comparison, DiCENet outperforms the other efficient networks, including MobileNetv2 and ShuffleNetv2. More details here.

Object detection

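The table below compares the performance of our architecture with other detection networks on the MS-COCO dataset. ESPNetv2-SSD needs far fewer FLOPs (3.2 B) than SSD-VGG (100 B) and YOLOv2 (17.5 B), runs at 35 FPS, and stays competitive in accuracy. More details here.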

	MS-COCO
	Image Size	FLOPs	mAP	FPS
SSD-VGG	512x512	100 B	26.8	19
YOLOv2	544x544	17.5 B	21.6	40
ESPNetv2-SSD (Ours)	512x512	3.2 B	24.54	35
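
As a rough illustration of the trade-off in the table above, the sketch below recomputes the FLOP ratios and the accuracy and FPS differences relative to ESPNetv2-SSD. The numbers are copied from the table; reading "B" as billions of FLOPs, the dictionary layout, and the helper name `compare_to` are assumptions made for this example and are not part of the repository.

```python
# Values copied from the detection table above.
# Assumption: "B" means billions of FLOPs; "accuracy" is the table's accuracy column.
detectors = {
    "SSD-VGG": {"flops_b": 100.0, "accuracy": 26.8, "fps": 19},
    "YOLOv2": {"flops_b": 17.5, "accuracy": 21.6, "fps": 40},
    "ESPNetv2-SSD": {"flops_b": 3.2, "accuracy": 24.54, "fps": 35},
}


def compare_to(ours, table):
    """Compare every other row of `table` against the row named `ours`.

    Hypothetical helper for illustration only.
    """
    ref = table[ours]
    report = {}
    for name, row in table.items():
        if name == ours:
            continue
        report[name] = {
            # How many times more FLOPs the baseline needs than `ours`.
            "flops_ratio": row["flops_b"] / ref["flops_b"],
            # Positive means `ours` is more accurate than the baseline.
            "accuracy_delta": ref["accuracy"] - row["accuracy"],
            # Positive means `ours` is faster than the baseline.
            "fps_delta": ref["fps"] - row["fps"],
        }
    return report


report = compare_to("ESPNetv2-SSD", detectors)
for name, stats in report.items():
    print(
        f"vs {name}: {stats['flops_ratio']:.1f}x fewer FLOPs, "
        f"{stats['accuracy_delta']:+.2f} accuracy, {stats['fps_delta']:+d} FPS"
    )
```

On these numbers, ESPNetv2-SSD uses about 31x fewer FLOPs than SSD-VGG and about 5.5x fewer than YOLOv2, at the cost of 2.26 accuracy points against SSD-VGG and with a 2.94-point gain over YOLOv2.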

Semantic Segmentation

The table below compares the performance of ESPNet and ESPNetv2 on two different datasets. Note that ESPNets are among the first efficient networks to deliver performance competitive with existing networks on the PASCAL VOC dataset, even with low-resolution images (for example, 256x256). See here for more details.

	Cityscapes			PASCAL VOC 2012
	Image Size	FLOPs	mIOU	Image Size	FLOPs	mIOU
ESPNet	1024x512	4.5 B	60.3	512x512	2.2 B	63
ESPNetv2	1024x512	2.7 B	66.2	384x384	0.76 B	68

Training Recipe

Image Classification

Details about training and testing are provided here.

Details about performance of different models are provided here.

Semantic segmentation

Details about training and testing are provided here.

Details about performance of different models are provided here.

Object Detection

Details about training and testing are provided here.

Details about performance of different models are provided here.

Instructions for segmentation and detection demos

To run the segmentation demo, just type:

python segmentation_demo.py

To run the detection demo, run the following command:

python detection_demo.py

OR 

python detection_demo.py --live

For other supported arguments, please see the corresponding files.
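
For readers unfamiliar with boolean command-line switches such as `--live`, the sketch below shows how a flag of that kind is commonly declared with Python's standard `argparse` module. It is a hypothetical, self-contained illustration: only the flag name `--live` comes from this README, and the parser, the helper names, and the mode strings are assumptions that may not match what `detection_demo.py` actually does.

```python
import argparse


def build_parser():
    """Build a hypothetical parser; this is NOT the repository's real parser."""
    parser = argparse.ArgumentParser(
        description="Illustrative sketch of a demo script with a --live switch"
    )
    # store_true makes the flag False when omitted and True when passed.
    parser.add_argument(
        "--live",
        action="store_true",
        help="switch the demo to live mode (exact meaning assumed; see the script)",
    )
    return parser


def describe_mode(argv):
    """Return which mode the given argument list selects (hypothetical helper)."""
    args = build_parser().parse_args(argv)
    return "live" if args.live else "default"


print(describe_mode([]))          # same as running without any flags
print(describe_mode(["--live"]))  # same as passing --live on the command line
```

A script whose command line is built with `argparse` also prints its supported arguments when run with `-h`; whether the demo scripts are built that way is best confirmed by reading the files themselves.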

Citation

If you find this repository helpful, please feel free to cite our work:

@article{mehta2019dicenet,
  title={DiCENet: Dimension-wise Convolutions for Efficient Networks},
  author={Mehta, Sachin and Hajishirzi, Hannaneh and Rastegari, Mohammad},
  journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
  year={2020}
}

@inproceedings{mehta2018espnetv2,
  title={ESPNetv2: A Light-weight, Power Efficient, and General Purpose Convolutional Neural Network},
  author={Mehta, Sachin and Rastegari, Mohammad and Shapiro, Linda and Hajishirzi, Hannaneh},
  booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
  year={2019}
}

@inproceedings{mehta2018espnet,
  title={Espnet: Efficient spatial pyramid of dilated convolutions for semantic segmentation},
  author={Mehta, Sachin and Rastegari, Mohammad and Caspi, Anat and Shapiro, Linda and Hajishirzi, Hannaneh},
  booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
  pages={552--568},
  year={2018}
}

License

By downloading this software, you acknowledge that you agree to the terms and conditions given here.

Acknowledgements

Most of our object detection code is adapted from SSD in PyTorch. We thank the authors for their amazing work.

Want to help out?

Thanks for your interest in our work :).

Open tasks that are interesting:

TensorFlow implementation. I would like to do this but have not had enough time. If you are interested, drop a message and we can talk about it.
Optimizing the EESP and the DiCENet blocks at the CUDA level.
Optimizing and porting pretrained models across multiple mobile platforms, including Android.
Other thoughts are also welcome :).

Notes

Notes about DiCENet paper

This repository contains DiCENet's source code in PyTorch only, and you should be able to reproduce the results of v1/v2 of our arXiv paper. To reproduce the results of our T-PAMI paper, you need to incorporate the MobileNet tricks in Section 5.3, which are currently not a part of this repository.


Owner

Sachin Mehta