This repository contains the source code of our work on designing efficient CNNs for computer vision

Overview

Efficient networks for Computer Vision

This repo contains source code of our work on designing efficient networks for different computer vision tasks: (1) Image classification, (2) Object detection, and (3) Semantic segmentation.

Real-time semantic segmentation using ESPNetv2 on iPhone7. See here for iOS application source code using COREML.
Seg demo on iPhone7 Seg demo on iPhone7
Real-time object detection using ESPNetv2
Demo 1
Demo 2 Demo 3

Table of contents

  1. Key highlihgts
  2. Supported networks
  3. Relevant papers
  4. Blogs
  5. Performance comparison
  6. Training receipe
  7. Instructions for segmentation and detection demos
  8. Citation
  9. License
  10. Acknowledgements
  11. Contributions
  12. Notes

Key highlights

  • Object classification on the ImageNet and MS-COCO (multi-label)
  • Semantic Segmentation on the PASCAL VOC and the CityScapes
  • Object Detection on the PASCAL VOC and the MS-COCO
  • Supports PyTorch 1.0
  • Integrated with Tensorboard for easy visualization of training logs.
  • Scripts for downloading different datasets.
  • Semantic segmentation application using ESPNetv2 on iPhone can be found here.

Supported networks

This repo supports following networks:

  • ESPNetv2 (Classification, Segmentation, Detection)
  • DiCENet (Classification, Segmentation, Detection)
  • ShuffleNetv2 (Classification)

Relevant papers

Blogs

Performance comparison

ImageNet

Below figure compares the performance of DiCENet with other efficient networks on the ImageNet dataset. DiCENet outperforms all existing efficient networks, including MobileNetv2 and ShuffleNetv2. More details here

DiCENet performance on the ImageNet

Object detection

Below table compares the performance of our architecture with other detection networks on the MS-COCO dataset. Our network is fast and accurate. More details here

MSCOCO
Image Size FLOPs mIOU FPS
SSD-VGG 512x512 100 B 26.8 19
YOLOv2 544x544 17.5 B 21.6 40
ESPNetv2-SSD (Ours) 512x512 3.2 B 24.54 35

Semantic Segmentation

Below figure compares the performance of ESPNet and ESPNetv2 on two different datasets. Note that ESPNets are one of the first efficient networks that delivers competitive performance to existing networks on the PASCAL VOC dataset, even with low resolution images say 256x256. See here for more details.

Cityscapes PASCAL VOC 2012
Image Size FLOPs mIOU Image Size FLOPs mIOU
ESPNet 1024x512 4.5 B 60.3 512x512 2.2 B 63
ESPNetv2 1024x512 2.7 B 66.2 384x384 0.76 B 68

Training Receipe

Image Classification

Details about training and testing are provided here.

Details about performance of different models are provided here.

Semantic segmentation

Details about training and testing are provided here.

Details about performance of different models are provided here.

Object Detection

Details about training and testing are provided here.

Details about performance of different models are provided here.

Instructions for segmentation and detection demos

To run the segmentation demo, just type:

python segmentation_demo.py

To run the detection demo, run the following command:

python detection_demo.py

OR 

python detection_demo.py --live

For other supported arguments, please see the corresponding files.

Citation

If you find this repository helpful, please feel free to cite our work:

@article{mehta2019dicenet,
Author = {Sachin Mehta and Hannaneh Hajishirzi and Mohammad Rastegari},
Title = {DiCENet: Dimension-wise Convolutions for Efficient Networks},
Year = {2020},
journal = {IEEE Transactions on Pattern Analysis and Machine Intelligence},
}

@inproceedings{mehta2018espnetv2,
  title={ESPNetv2: A Light-weight, Power Efficient, and General Purpose Convolutional Neural Network},
  author={Mehta, Sachin and Rastegari, Mohammad and Shapiro, Linda and Hajishirzi, Hannaneh},
  booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
  year={2019}
}

@inproceedings{mehta2018espnet,
  title={Espnet: Efficient spatial pyramid of dilated convolutions for semantic segmentation},
  author={Mehta, Sachin and Rastegari, Mohammad and Caspi, Anat and Shapiro, Linda and Hajishirzi, Hannaneh},
  booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
  pages={552--568},
  year={2018}
}

License

By downloading this software, you acknowledge that you agree to the terms and conditions given here.

Acknowledgements

Most of our object detection code is adapted from SSD in pytorch. We thank authors for such an amazing work.

Want to help out?

Thanks for your interest in our work :).

Open tasks that are interesting:

  • Tensorflow implementation. I kind of wanna do this but not getting enough time. If you are interested, drop a message and we can talk about it.
  • Optimizing the EESP and the DiceNet block at CUDA-level.
  • Optimize and port pretrained models across multiple mobile platforms, including Android.
  • Other thoughts are also welcome :).

Notes

Notes about DiCENet paper

This repository contains DiCENet's source code in PyTorch only and you should be able to reproduce the results of v1/v2 of our arxiv paper. To reproduce the results of our T-PAMI paper, you need to incorporate MobileNet tricks in Section 5.3, which are currently not a part of this repository.

Owner
Sachin Mehta
Research Scientist at Apple and Affiliate Assistant Professor at UW
Sachin Mehta
Intrinsic Image Harmonization

Intrinsic Image Harmonization [Paper] Zonghui Guo, Haiyong Zheng, Yufeng Jiang, Zhaorui Gu, Bing Zheng Here we provide PyTorch implementation and the

VISION @ OUC 44 Dec 21, 2022
MAVE: : A Product Dataset for Multi-source Attribute Value Extraction

The dataset contains 3 million attribute-value annotations across 1257 unique categories on 2.2 million cleaned Amazon product profiles. It is a large, multi-sourced, diverse dataset for product attr

Google Research Datasets 89 Jan 08, 2023
We are More than Our JOints: Predicting How 3D Bodies Move

We are More than Our JOints: Predicting How 3D Bodies Move Citation This repo contains the official implementation of our paper MOJO: @inproceedings{Z

72 Oct 20, 2022
Code for "LoFTR: Detector-Free Local Feature Matching with Transformers", CVPR 2021

LoFTR: Detector-Free Local Feature Matching with Transformers Project Page | Paper LoFTR: Detector-Free Local Feature Matching with Transformers Jiami

ZJU3DV 1.4k Jan 04, 2023
[CVPR 2021 Oral] ForgeryNet: A Versatile Benchmark for Comprehensive Forgery Analysis

ForgeryNet: A Versatile Benchmark for Comprehensive Forgery Analysis ForgeryNet: A Versatile Benchmark for Comprehensive Forgery Analysis [arxiv|pdf|v

Yinan He 78 Dec 22, 2022
Unbiased Learning To Rank Algorithms (ULTRA)

This is an Unbiased Learning To Rank Algorithms (ULTRA) toolbox, which provides a codebase for experiments and research on learning to rank with human annotated or noisy labels.

71 Dec 01, 2022
Neighbor2Seq: Deep Learning on Massive Graphs by Transforming Neighbors to Sequences

Neighbor2Seq: Deep Learning on Massive Graphs by Transforming Neighbors to Sequences This repository is an official PyTorch implementation of Neighbor

DIVE Lab, Texas A&M University 8 Jun 12, 2022
Official implementation of "Accelerating Reinforcement Learning with Learned Skill Priors", Pertsch et al., CoRL 2020

Accelerating Reinforcement Learning with Learned Skill Priors [Project Website] [Paper] Karl Pertsch1, Youngwoon Lee1, Joseph Lim1 1CLVR Lab, Universi

Cognitive Learning for Vision and Robotics (CLVR) lab @ USC 134 Dec 06, 2022
95.47% on CIFAR10 with PyTorch

Train CIFAR10 with PyTorch I'm playing with PyTorch on the CIFAR10 dataset. Prerequisites Python 3.6+ PyTorch 1.0+ Training # Start training with: py

5k Dec 30, 2022
Training Cifar-10 Classifier Using VGG16

opevcvdl-hw3 This project uses pytorch and Qt to achieve the requirements. Version Python 3.6 opencv-contrib-python 3.4.2.17 Matplotlib 3.1.1 pyqt5 5.

Kenny Cheng 3 Aug 17, 2022
PyTorch implementation of popular datasets and models in remote sensing

PyTorch Remote Sensing (torchrs) (WIP) PyTorch implementation of popular datasets and models in remote sensing tasks (Change Detection, Image Super Re

isaac 222 Dec 28, 2022
Learning to Map Large-scale Sparse Graphs on Memristive Crossbar

Release of AutoGMap:Learning to Map Large-scale Sparse Graphs on Memristive Crossbar For reproduction of our searched model, the Ubuntu OS is recommen

2 Aug 23, 2022
[ICRA2021] Reconstructing Interactive 3D Scene by Panoptic Mapping and CAD Model Alignment

Interactive Scene Reconstruction Project Page | Paper This repository contains the implementation of our ICRA2021 paper Reconstructing Interactive 3D

97 Dec 28, 2022
ToFFi - Toolbox for Frequency-based Fingerprinting of Brain Signals

ToFFi Toolbox This repository contains "before peer review" version of the software related to the preprint of the publication ToFFi - Toolbox for Fre

4 Aug 31, 2022
Accelerated NLP pipelines for fast inference on CPU and GPU. Built with Transformers, Optimum and ONNX Runtime.

Optimum Transformers Accelerated NLP pipelines for fast inference 🚀 on CPU and GPU. Built with 🤗 Transformers, Optimum and ONNX runtime. Installatio

Aleksey Korshuk 115 Dec 16, 2022
Anomaly detection related books, papers, videos, and toolboxes

Anomaly Detection Learning Resources Outlier Detection (also known as Anomaly Detection) is an exciting yet challenging field, which aims to identify

Yue Zhao 6.7k Dec 31, 2022
WarpRNNT loss ported in Numba CPU/CUDA for Pytorch

RNNT loss in Pytorch - Numba JIT compiled (warprnnt_numba) Warp RNN Transducer Loss for ASR in Pytorch, ported from HawkAaron/warp-transducer and a re

Somshubra Majumdar 15 Oct 22, 2022
An open source Jetson Nano baseboard and tools to design your own.

My Jetson Nano Baseboard This basic baseboard gives the user the foundation and the flexibility to design their own baseboard for the Jetson Nano. It

NVIDIA AI IOT 57 Dec 29, 2022
A self-supervised 3D representation learning framework named viewpoint bottleneck.

Pointly-supervised 3D Scene Parsing with Viewpoint Bottleneck Paper Created by Liyi Luo, Beiwen Tian, Hao Zhao and Guyue Zhou from Institute for AI In

63 Aug 11, 2022
Human Detection - Pedestrian Detection using OpenCV Python

Pedestrian Detection using OpenCV Python Follow us on Instagram for Machine Lear

Hrishikesh Dutta 1 Jan 23, 2022