Official implementation for the paper: "Multi-label Classification with Partial Annotations using Class-aware Selective Loss"

Overview

PWC

Multi-label Classification with Partial Annotations using Class-aware Selective Loss


Paper | Pretrained models

Official PyTorch Implementation

Emanuel Ben-Baruch, Tal Ridnik, Itamar Friedman, Avi Ben-Cohen, Nadav Zamir, Asaf Noy, Lihi Zelnik-Manor
DAMO Academy, Alibaba Group

Abstract

Large-scale multi-label classification datasets are commonly, and perhaps inevitably, partially annotated. That is, only a small subset of labels are annotated per sample. Different methods for handling the missing labels induce different properties on the model and impact its accuracy. In this work, we analyze the partial labeling problem, then propose a solution based on two key ideas. First, un-annotated labels should be treated selectively according to two probability quantities: the class distribution in the overall dataset and the specific label likelihood for a given data sample. We propose to estimate the class distribution using a dedicated temporary model, and we show its improved efficiency over a naive estimation computed using the dataset's partial annotations. Second, during the training of the target model, we emphasize the contribution of annotated labels over originally un-annotated labels by using a dedicated asymmetric loss. Experiments conducted on three partially labeled datasets, OpenImages, LVIS, and simulated-COCO, demonstrate the effectiveness of our approach. Specifically, with our novel selective approach, we achieve state-of-the-art results on OpenImages dataset. Code will be made available.

Class-aware Selective Approach

An overview of our approach is summarized in the following figure:

Loss Implementation

Our loss consists of a selective approach for adjusting the training mode for each class individualy and a partial asymmetric loss.

An implementation of the Class-aware Selective Loss (CSL) can be found here.

  • class PartialSelectiveLoss(nn.Module)

Pretrained Models

We provide models pretrained on the OpenImages datasset with different modes and architectures:

Model Architecture Link mAP
Ignore TResNet-M link 85.38
Negative TResNet-M link 85.85
Selective (CSL) TResNet-M link 86.72
Selective (CSL) TResNet-L link 87.34

Inference Code (Demo)

We provide inference code, that demonstrate how to load the model, pre-process an image and do inference. Example run of OpenImages model (after downloading the relevant model):

python infer.py  \
--dataset_type=OpenImages \
--model_name=tresnet_m \
--model_path=./models_local/mtresnet_opim_86.72.pth \
--pic_path=./pics/10162266293_c7634cbda9_o.jpg \
--input_size=448

Result Examples

Training Code

Training code is provided in (train.py). Also, code for simulating partial annotation for the MS-COCO dataset is available (here). In particular, two "partial" simulation schemes are implemented: fix-per-class(FPC) and random-per-sample (RPS).

  • FPC: For each class, we randomly sample a fixed number of positive annotations and the same number of negative annotations. The rest of the annotations are dropped.
  • RPA: We omit each annotation with probability p.

Pretrained weights using the ImageNet-21k dataset can be found here: link
Pretrained weights using the ImageNet-1k dataset can be found here: link

Example of training with RPS simulation:

--data=/mnt/datasets/COCO/COCO_2014
--model-path=models/pretrain/mtresnet_21k
--gamma_pos=0
--gamma_neg=4
--gamma_unann=4
--simulate_partial_type=rps
--simulate_partial_param=0.5
--partial_loss_mode=selective
--likelihood_topk=5
--prior_threshold=0.5
--prior_path=./outputs/priors/prior_fpc_1000.csv

Example of training with FPC simulation:

--data=/mnt/datasets/COCO/COCO_2014
--model-path=models/pretrain/mtresnet_21k
--gamma_pos=0
--gamma_neg=4
--gamma_unann=4
--simulate_partial_type=fpc
--simulate_partial_param=1000
--partial_loss_mode=selective
--likelihood_topk=5
--prior_threshold=0.5
--prior_path=./outputs/priors/prior_fpc_1000.csv

Typical Training Results

FPC (1,000) simulation scheme:

Model mAP
Ignore, CE 76.46
Negative, CE 81.24
Negative, ASL (4,1) 81.64
CSL - Selective, P-ASL(4,3,1) 83.44

RPS (0.5) simulation scheme:

Model mAP
Ignore, CE 84.90
Negative, CE 81.21
Negative, ASL (4,1) 81.91
CSL- Selective, P-ASL(4,1,1) 85.21

Estimating the Class Distribution

The training code contains also the procedure for estimting the class distribution from the data. Our approach enables to rank the classes based on training a temporary model usinig the Ignore mode. link

Top 10 classes:

Method Top 10 ranked classes
Original 'person', 'chair', 'car', 'dining table', 'cup', 'bottle', 'bowl', 'handbag', 'truck', 'backpack'
Estiimate (Ignore mode) 'person', 'chair', 'handbag', 'cup', 'bench', 'bottle', 'backpack', 'car', 'cell phone', 'potted plant'
Estimate (Negative mode) 'kite' 'truck' 'carrot' 'baseball glove' 'tennis racket' 'remote' 'cat' 'tie' 'horse' 'boat'

Citation

@misc{benbaruch2021multilabel,
      title={Multi-label Classification with Partial Annotations using Class-aware Selective Loss}, 
      author={Emanuel Ben-Baruch and Tal Ridnik and Itamar Friedman and Avi Ben-Cohen and Nadav Zamir and Asaf Noy and Lihi Zelnik-Manor},
      year={2021},
      eprint={2110.10955},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

Acknowledgements

Several images from OpenImages dataset are used in this project. ֿ
Some components of this code implementation are adapted from the repository https://github.com/Alibaba-MIIL/ASL.

Final Project for the CS238: Decision Making Under Uncertainty course at Stanford University in Autumn '21.

Final Project for the CS238: Decision Making Under Uncertainty course at Stanford University in Autumn '21. We optimized wind turbine placement in a wind farm, subject to wake effects, using Q-learni

Manasi Sharma 2 Sep 27, 2022
Multi-modal Content Creation Model Training Infrastructure including the FACT model (AI Choreographer) implementation.

AI Choreographer: Music Conditioned 3D Dance Generation with AIST++ [ICCV-2021]. Overview This package contains the model implementation and training

Google Research 365 Dec 30, 2022
Depth-Aware Video Frame Interpolation (CVPR 2019)

DAIN (Depth-Aware Video Frame Interpolation) Project | Paper Wenbo Bao, Wei-Sheng Lai, Chao Ma, Xiaoyun Zhang, Zhiyong Gao, and Ming-Hsuan Yang IEEE C

Wenbo Bao 7.7k Dec 31, 2022
The dataset of tweets pulling from Twitters with keyword: Hydroxychloroquine, location: US, Time: 2020

HCQ_Tweet_Dataset: FREE to Download. Keywords: HCQ, hydroxychloroquine, tweet, twitter, COVID-19 This dataset is associated with the paper "Understand

2 Mar 16, 2022
ROMP: Monocular, One-stage, Regression of Multiple 3D People, ICCV21

Monocular, One-stage, Regression of Multiple 3D People ROMP, accepted by ICCV 2021, is a concise one-stage network for multi-person 3D mesh recovery f

Yu Sun 937 Jan 04, 2023
Code to train models from "Paraphrastic Representations at Scale".

Paraphrastic Representations at Scale Code to train models from "Paraphrastic Representations at Scale". The code is written in Python 3.7 and require

John Wieting 71 Dec 19, 2022
🔥🔥High-Performance Face Recognition Library on PaddlePaddle & PyTorch🔥🔥

face.evoLVe: High-Performance Face Recognition Library based on PaddlePaddle & PyTorch Evolve to be more comprehensive, effective and efficient for fa

Zhao Jian 3.1k Jan 02, 2023
Code for our CVPR2021 paper coordinate attention

Coordinate Attention for Efficient Mobile Network Design (preprint) This repository is a PyTorch implementation of our coordinate attention (will appe

Qibin (Andrew) Hou 726 Jan 05, 2023
This is the pytorch implementation for the paper: Generalizable Mixed-Precision Quantization via Attribution Rank Preservation, which is accepted to ICCV2021.

GMPQ: Generalizable Mixed-Precision Quantization via Attribution Rank Preservation This is the pytorch implementation for the paper: Generalizable Mix

18 Sep 02, 2022
This repository contains the implementation of the paper: Federated Distillation of Natural Language Understanding with Confident Sinkhorns

Federated Distillation of Natural Language Understanding with Confident Sinkhorns This repository provides an alternative method for ensembled distill

Deep Cognition and Language Research (DeCLaRe) Lab 11 Nov 16, 2022
Code for "Multi-Time Attention Networks for Irregularly Sampled Time Series", ICLR 2021.

Multi-Time Attention Networks (mTANs) This repository contains the PyTorch implementation for the paper Multi-Time Attention Networks for Irregularly

The Laboratory for Robust and Efficient Machine Learning 68 Dec 17, 2022
Rethinking Transformer-based Set Prediction for Object Detection

Rethinking Transformer-based Set Prediction for Object Detection Here are the code for the ICCV paper. The code is adapted from Detectron2 and AdelaiD

Zhiqing Sun 62 Dec 03, 2022
Human Action Controller - A human action controller running on different platforms.

Human Action Controller (HAC) Goal A human action controller running on different platforms. Fun Easy-to-use Accurate Anywhere Fun Examples Mouse Cont

27 Jul 20, 2022
Official implementations of PSENet, PAN and PAN++.

News (2021/11/03) Paddle implementation of PAN, see Paddle-PANet. Thanks @simplify23. (2021/04/08) PSENet and PAN are included in MMOCR. Introduction

395 Dec 14, 2022
Semi-Supervised Semantic Segmentation via Adaptive Equalization Learning, NeurIPS 2021 (Spotlight)

Semi-Supervised Semantic Segmentation via Adaptive Equalization Learning, NeurIPS 2021 (Spotlight) Abstract Due to the limited and even imbalanced dat

Hanzhe Hu 99 Dec 12, 2022
Official implementation for paper Render In-between: Motion Guided Video Synthesis for Action Interpolation

Render In-between: Motion Guided Video Synthesis for Action Interpolation [Paper] [Supp] [arXiv] [4min Video] This is the official Pytorch implementat

8 Oct 27, 2022
Fast Style Transfer in TensorFlow

Fast Style Transfer in TensorFlow Add styles from famous paintings to any photo in a fraction of a second! You can even style videos! It takes 100ms o

Jefferson 5 Oct 24, 2021
Synthetic Scene Text from 3D Engines

Introduction UnrealText is a project that synthesizes scene text images using 3D graphics engine. This repository accompanies our paper: UnrealText: S

Shangbang Long 215 Dec 29, 2022
Code for ICCV2021 paper SPEC: Seeing People in the Wild with an Estimated Camera

SPEC: Seeing People in the Wild with an Estimated Camera [ICCV 2021] SPEC: Seeing People in the Wild with an Estimated Camera, Muhammed Kocabas, Chun-

Muhammed Kocabas 187 Dec 26, 2022
Official pytorch implementation of DeformSyncNet: Deformation Transfer via Synchronized Shape Deformation Spaces

DeformSyncNet: Deformation Transfer via Synchronized Shape Deformation Spaces Minhyuk Sung*, Zhenyu Jiang*, Panos Achlioptas, Niloy J. Mitra, Leonidas

Zhenyu Jiang 21 Aug 30, 2022