Implementation of the paper All Labels Are Not Created Equal: Enhancing Semi-supervision via Label Grouping and Co-training

Related tags

Deep Learningsemco
Overview

SemCo

The official pytorch implementation of the paper All Labels Are Not Created Equal: Enhancing Semi-supervision via Label Grouping and Co-training (appearing in CVPR2021)

SemCo Conceptual Diagram

Install Dependencies

  • Create a new environment and install dependencies using pip install -r requirements.txt
  • Install apex to enable automatic mixed precision training (AMP).
git clone https://github.com/NVIDIA/apex
cd apex
python setup.py install --cpp_ext --cuda_ext

Note: Installing apex is optional, if you don't want to implement amp, you can simply pass --no_amp command line argument to the launcher.

Dataset

We use a standard directory structure for all our datasets to enable running the code on any dataset of choice without the need to edit the dataloaders. The datasets directory follow the below structure (only shown for cifar100 but is the same for all other datasets):

datasets
└───cifar100
   └───train
       │   <image1>
       │   <image2>
       │   ...
   └───test
       │   <image1-test>
       │   <image2-test>
       │   ...
   └───labels
       │   labels_train.feather
       │   labels_test.feather

An example of the above directory structure for cifar100 can be found here.

To preprocess a generic dataset into the above format, you can refer to utils/utils.py for several examples.

To configure the datasets directory path, you can either set the environment variable SEMCO_DATA_PATH or pass a command line argument --dataset-path to the launcher. (e.g. export SEMCO_DATA_PATH=/home/data). Note that this path references the parent datasets directory which contains the different sub directories for the individual datasets (e.g. cifar100, mini-imagenet, etc.)

Label Semantics Embeddings

SemCo expects a prior representation of all class labels via a semantic embedding for each class name. In our experiments, we use embeddings obtained from ConceptNet knowledge graph which contains a total of ~550K term embeddings. SemCo uses a matching criteria to find the best embedding for each of the class labels. Alternatively, you can use class attributes as the prior (like we did for CUB200 dataset), so you can build your own semantic dictionary.

To run experiments, please download the semantic embedding file here and set the path to the downloaded file either via SEMCO_WV_PATH environment variable or --word-vec-path command line argument. (e.g. export SEMCO_WV_PATH=/home/inas0003/data/numberbatch-en-19.08_128D.dict.pkl

Defining the Splits

For each of the experiments, you will need to specify to the launcher 4 command line arguments:

  • --dataset-name: denoting the dataset directory name (e.g. cifar100)
  • --train-split-pickle: path to pickle file with training split
  • --valid-split-pickle: (optional) path to pickle file with validation/test split (by default contains all the files in the test folder)
  • --classes-pickle: (optional) path to pickle file with list of class names

To obtain the three pickle files for any dataset, you can use generate_tst_pkls.py script specifying the dataset name and the number of instances per label and optionally a random seed. Example as follows:

python generate_tst_pkls.py --dataset-name cifar100 --instances-per-label 10 --random-seed 000 --output-path splits

The above will generate a train split with 10 images per class using a random seed of 000 together with the class names and the validation split containing all the files placed in the test folder. This can be tweaked by editing the python script.

Training the model

To train the model on cifar100 with 40 labeled samples, you can run the script:

    $ python launch_semco.py --dataset-name cifar100 --train-split-pickle splits/cifar100_labelled_data_40_seed123.pkl --model_backbone=wres --wres-k=2

or without amp

    $ python launch_semco.py --dataset-name cifar100 --train-split-pickle splits/cifar100_labelled_data_40_seed123.pkl --model_backbone=wres --wres-k=2 --no_amp

Similary to train the model on mini_imagenet with 400 labeled samples, you can run the script:

    $  python launch_semco.py --dataset-name mini_imagenet --train-split-pickle testing/mini_imagenet_labelled_data_40_seed456.pkl --model_backbone=resnet18 --im-size=84 --cropsize=84 
MIRACLE (Missing data Imputation Refinement And Causal LEarning)

MIRACLE (Missing data Imputation Refinement And Causal LEarning) Code Author: Trent Kyono This repository contains the code used for the "MIRACLE: Cau

van_der_Schaar \LAB 15 Dec 29, 2022
git《Investigating Loss Functions for Extreme Super-Resolution》(CVPR 2020) GitHub:

Investigating Loss Functions for Extreme Super-Resolution NTIRE 2020 Perceptual Extreme Super-Resolution Submission. Our method ranked first and secon

Sejong Yang 0 Oct 17, 2022
On Nonlinear Latent Transformations for GAN-based Image Editing - PyTorch implementation

On Nonlinear Latent Transformations for GAN-based Image Editing - PyTorch implementation On Nonlinear Latent Transformations for GAN-based Image Editi

Valentin Khrulkov 22 Oct 24, 2022
[ICLR'19] Trellis Networks for Sequence Modeling

TrellisNet for Sequence Modeling This repository contains the experiments done in paper Trellis Networks for Sequence Modeling by Shaojie Bai, J. Zico

CMU Locus Lab 460 Oct 13, 2022
The official GitHub repository for the Argoverse 2 dataset.

Argoverse 2 API Official GitHub repository for the Argoverse 2 family of datasets. If you have any questions or run into any problems with either the

Argo AI 156 Dec 23, 2022
We will see a basic program that is basically a hint to brute force attack to crack passwords. In other words, we will make a program to Crack Any Password Using Python. Show some ❤️ by starring this repository!

Crack Any Password Using Python We will see a basic program that is basically a hint to brute force attack to crack passwords. In other words, we will

Ananya Chatterjee 11 Dec 03, 2022
Film review classification

Film review classification Решение задачи классификации отзывов на фильмы на положительные и отрицательные с помощью рекуррентных нейронных сетей 1. З

Nikita Dukin 3 Jan 21, 2022
MOOSE (Multi-organ objective segmentation) a data-centric AI solution that generates multilabel organ segmentations to facilitate systemic TB whole-person research

MOOSE (Multi-organ objective segmentation) a data-centric AI solution that generates multilabel organ segmentations to facilitate systemic TB whole-person research.The pipeline is based on nn-UNet an

QIMP team 30 Jan 01, 2023
Implementation of ConvMixer in TensorFlow and Keras

ConvMixer ConvMixer, an extremely simple model that is similar in spirit to the ViT and the even-more-basic MLP-Mixer in that it operates directly on

Sayan Nath 8 Oct 03, 2022
Unsupervised Semantic Segmentation by Contrasting Object Mask Proposals.

Unsupervised Semantic Segmentation by Contrasting Object Mask Proposals This repo contains the Pytorch implementation of our paper: Unsupervised Seman

Wouter Van Gansbeke 335 Dec 28, 2022
CSAC - Collaborative Semantic Aggregation and Calibration for Separated Domain Generalization

CSAC Introduction This repository contains the implementation code for paper: Co

ScottYuan 5 Jul 22, 2022
This repository is for the preprint "A generative nonparametric Bayesian model for whole genomes"

BEAR Overview This repository contains code associated with the preprint A generative nonparametric Bayesian model for whole genomes (2021), which pro

Debora Marks Lab 10 Sep 18, 2022
A library for efficient similarity search and clustering of dense vectors.

Faiss Faiss is a library for efficient similarity search and clustering of dense vectors. It contains algorithms that search in sets of vectors of any

Meta Research 18.8k Jan 08, 2023
DSAC* for Visual Camera Re-Localization (RGB or RGB-D)

DSAC* for Visual Camera Re-Localization (RGB or RGB-D) Introduction Installation Data Structure Supported Datasets 7Scenes 12Scenes Cambridge Landmark

Visual Learning Lab 143 Dec 22, 2022
The Wearables Development Toolkit - a development environment for activity recognition applications with sensor signals

Wearables Development Toolkit (WDK) The Wearables Development Toolkit (WDK) is a framework and set of tools to facilitate the iterative development of

Juan Haladjian 114 Nov 27, 2022
NVIDIA Deep Learning Examples for Tensor Cores

NVIDIA Deep Learning Examples for Tensor Cores Introduction This repository provides State-of-the-Art Deep Learning examples that are easy to train an

NVIDIA Corporation 10k Dec 31, 2022
Portfolio analytics for quants, written in Python

QuantStats: Portfolio analytics for quants QuantStats Python library that performs portfolio profiling, allowing quants and portfolio managers to unde

Ran Aroussi 2.7k Jan 08, 2023
PyTorch Implementation of Daft-Exprt: Robust Prosody Transfer Across Speakers for Expressive Speech Synthesis

Daft-Exprt - PyTorch Implementation PyTorch Implementation of Daft-Exprt: Robust Prosody Transfer Across Speakers for Expressive Speech Synthesis The

Keon Lee 47 Dec 18, 2022
Generating Radiology Reports via Memory-driven Transformer

R2Gen This is the implementation of Generating Radiology Reports via Memory-driven Transformer at EMNLP-2020. Citations If you use or extend our work,

CUHK-SZ NLP Group 101 Dec 13, 2022
Source code for the paper "PLOME: Pre-training with Misspelled Knowledge for Chinese Spelling Correction" in ACL2021

PLOME:Pre-training with Misspelled Knowledge for Chinese Spelling Correction (ACL2021) This repository provides the code and data of the work in ACL20

197 Nov 26, 2022