General Multi-label Image Classification with Transformers

Last update: Dec 21, 2022

Overview

General Multi-label Image Classification with Transformers
Jack Lanchantin, Tianlu Wang, Vicente Ordóñez Román, Yanjun Qi
Conference on Computer Vision and Pattern Recognition (CVPR) 2021

Training and Running C-Tran

Python 3.7 is required. All major packages and their versions are listed in requirements.txt.
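As a quick sanity check before installing the packages, the interpreter version can be verified from Python itself. This is a minimal sketch, not part of the repository: the helper name `python_version_ok` is hypothetical, and it assumes "3.7 is required" means the major.minor version must be exactly 3.7.

```python
import sys

# Version stated in this README; treating it as an exact major.minor match
# is an assumption, not something the repository enforces.
REQUIRED = (3, 7)


def python_version_ok(version_info=sys.version_info, required=REQUIRED):
    """Return True when the major.minor version equals the required one."""
    return tuple(version_info[:2]) == required


if __name__ == "__main__":
    found = "{}.{}".format(*sys.version_info[:2])
    if python_version_ok():
        print("Python " + found + " matches the required version.")
    else:
        print("Warning: found Python " + found + ", but 3.7 is expected.")
```

If the check fails, creating a dedicated Python 3.7 environment before installing from requirements.txt is likely the simplest fix.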

C-Tran on COCO80 Dataset

Download COCO data (19G)

wget http://cs.virginia.edu/~jjl5sw/data/vision/coco.tar.gz
mkdir -p data/
tar -xvf coco.tar.gz -C data/

Train New Model

python main.py --batch_size 16 --lr 0.00001 --optim 'adam' --layers 3 --dataset 'coco' --use_lmt --dataroot data/

C-Tran on VOC20 Dataset

Download VOC2007 data (1.7G)

wget http://cs.virginia.edu/~jjl5sw/data/vision/voc.tar.gz
mkdir -p data/
tar -xvf voc.tar.gz -C data/

Train New Model

python main.py --batch_size 16 --lr 0.00001 --optim 'adam' --layers 3 --dataset 'voc' --use_lmt --grad_ac_step 2 --dataroot data/
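The VOC command adds --grad_ac_step 2. Assuming this flag sets the number of gradient accumulation steps (the name suggests it, but this README does not say so), the optimizer would update once every 2 batches, giving an effective batch size of 16 × 2 = 32. The sketch below illustrates the general idea on a toy one-parameter model in plain Python. It is not the repository's training loop, and all function names are hypothetical.

```python
import math


def mse_grad(w, batch):
    """Gradient of the mean squared error of y ~ w * x over one batch."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)


def accumulated_grad(w, micro_batches):
    """Sum per-micro-batch gradients, each scaled by 1 / number of steps.

    This mirrors what gradient accumulation computes before a single
    optimizer update (assumed behavior, not taken from the repository).
    """
    steps = len(micro_batches)
    total = 0.0
    for batch in micro_batches:
        total += mse_grad(w, batch) / steps
    return total


# Toy data: 32 samples split into 2 micro-batches of 16,
# matching --batch_size 16 with --grad_ac_step 2.
data = [(float(i), 3.0 * i + 1.0) for i in range(32)]
micro_batches = [data[:16], data[16:]]

w = 0.5
full_batch_grad = mse_grad(w, data)
accumulated = accumulated_grad(w, micro_batches)

# With equal-sized micro-batches the two gradients agree up to rounding.
print(math.isclose(full_batch_grad, accumulated, rel_tol=1e-9))
```

The practical point is that accumulation trades extra forward and backward passes for lower peak memory, which may be why it appears in the VOC command but not the COCO one.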

Citing

@article{lanchantin2020general,
  title={General Multi-label Image Classification with Transformers},
  author={Lanchantin, Jack and Wang, Tianlu and Ordonez, Vicente and Qi, Yanjun},
  journal={arXiv preprint arXiv:2011.14027},
  year={2020}
}

Owner

QData
