Implementation of the paper All Labels Are Not Created Equal: Enhancing Semi-supervision via Label Grouping and Co-training

Related tags

Deep Learningsemco
Overview

SemCo

The official pytorch implementation of the paper All Labels Are Not Created Equal: Enhancing Semi-supervision via Label Grouping and Co-training (appearing in CVPR2021)

SemCo Conceptual Diagram

Install Dependencies

  • Create a new environment and install dependencies using pip install -r requirements.txt
  • Install apex to enable automatic mixed precision training (AMP).
git clone https://github.com/NVIDIA/apex
cd apex
python setup.py install --cpp_ext --cuda_ext

Note: Installing apex is optional, if you don't want to implement amp, you can simply pass --no_amp command line argument to the launcher.

Dataset

We use a standard directory structure for all our datasets to enable running the code on any dataset of choice without the need to edit the dataloaders. The datasets directory follow the below structure (only shown for cifar100 but is the same for all other datasets):

datasets
└───cifar100
   └───train
       │   <image1>
       │   <image2>
       │   ...
   └───test
       │   <image1-test>
       │   <image2-test>
       │   ...
   └───labels
       │   labels_train.feather
       │   labels_test.feather

An example of the above directory structure for cifar100 can be found here.

To preprocess a generic dataset into the above format, you can refer to utils/utils.py for several examples.

To configure the datasets directory path, you can either set the environment variable SEMCO_DATA_PATH or pass a command line argument --dataset-path to the launcher. (e.g. export SEMCO_DATA_PATH=/home/data). Note that this path references the parent datasets directory which contains the different sub directories for the individual datasets (e.g. cifar100, mini-imagenet, etc.)

Label Semantics Embeddings

SemCo expects a prior representation of all class labels via a semantic embedding for each class name. In our experiments, we use embeddings obtained from ConceptNet knowledge graph which contains a total of ~550K term embeddings. SemCo uses a matching criteria to find the best embedding for each of the class labels. Alternatively, you can use class attributes as the prior (like we did for CUB200 dataset), so you can build your own semantic dictionary.

To run experiments, please download the semantic embedding file here and set the path to the downloaded file either via SEMCO_WV_PATH environment variable or --word-vec-path command line argument. (e.g. export SEMCO_WV_PATH=/home/inas0003/data/numberbatch-en-19.08_128D.dict.pkl

Defining the Splits

For each of the experiments, you will need to specify to the launcher 4 command line arguments:

  • --dataset-name: denoting the dataset directory name (e.g. cifar100)
  • --train-split-pickle: path to pickle file with training split
  • --valid-split-pickle: (optional) path to pickle file with validation/test split (by default contains all the files in the test folder)
  • --classes-pickle: (optional) path to pickle file with list of class names

To obtain the three pickle files for any dataset, you can use generate_tst_pkls.py script specifying the dataset name and the number of instances per label and optionally a random seed. Example as follows:

python generate_tst_pkls.py --dataset-name cifar100 --instances-per-label 10 --random-seed 000 --output-path splits

The above will generate a train split with 10 images per class using a random seed of 000 together with the class names and the validation split containing all the files placed in the test folder. This can be tweaked by editing the python script.

Training the model

To train the model on cifar100 with 40 labeled samples, you can run the script:

    $ python launch_semco.py --dataset-name cifar100 --train-split-pickle splits/cifar100_labelled_data_40_seed123.pkl --model_backbone=wres --wres-k=2

or without amp

    $ python launch_semco.py --dataset-name cifar100 --train-split-pickle splits/cifar100_labelled_data_40_seed123.pkl --model_backbone=wres --wres-k=2 --no_amp

Similary to train the model on mini_imagenet with 400 labeled samples, you can run the script:

    $  python launch_semco.py --dataset-name mini_imagenet --train-split-pickle testing/mini_imagenet_labelled_data_40_seed456.pkl --model_backbone=resnet18 --im-size=84 --cropsize=84 
Empirical Study of Transformers for Source Code & A Simple Approach for Handling Out-of-Vocabulary Identifiers in Deep Learning for Source Code

Transformers for variable misuse, function naming and code completion tasks The official PyTorch implementation of: Empirical Study of Transformers fo

Bayesian Methods Research Group 56 Nov 15, 2022
Best practices for segmentation of the corporate network of any company

Best-practice-for-network-segmentation What is this? This project was created to publish the best practices for segmentation of the corporate network

2k Jan 07, 2023
Official code of paper "PGT: A Progressive Method for Training Models on Long Videos" on CVPR2021

PGT Code for paper PGT: A Progressive Method for Training Models on Long Videos. Install Run pip install -r requirements.txt. Run python setup.py buil

Bo Pang 27 Mar 30, 2022
code for TCL: Vision-Language Pre-Training with Triple Contrastive Learning, CVPR 2022

Vision-Language Pre-Training with Triple Contrastive Learning, CVPR 2022 News (03/16/2022) upload retrieval checkpoints finetuned on COCO and Flickr T

187 Jan 02, 2023
Train Yolov4 using NBX-Jobs

yolov4-trainer-nbox Train Yolov4 using NBX-Jobs. Use the powerfull functionality available in nbox-SDK repo to train a tiny-Yolo v4 model on Pascal VO

Yash Bonde 1 Jan 12, 2022
Algo-burn - Script to configure an Algorand address as a "burn" address for one or more ASA tokens

Algorand Burn Address This is a simple script to illustrate how a "burn address"

GSD 5 May 10, 2022
Program your own vulkan.gpuinfo.org query in Python. Used to determine baseline hardware for WebGPU.

query-gpuinfo-data License This software is not presently released under a license. The data in data/ is obtained under CC BY 4.0 as specified there.

Kai Ninomiya 5 Jul 18, 2022
Balancing Principle for Unsupervised Domain Adaptation

Blancing Principle for Domain Adaptation NeurIPS 2021 Paper Abstract We address the unsolved algorithm design problem of choosing a justified regulari

Marius-Constantin Dinu 4 Dec 15, 2022
MemStream: Memory-Based Anomaly Detection in Multi-Aspect Streams with Concept Drift

MemStream Implementation of MemStream: Memory-Based Anomaly Detection in Multi-Aspect Streams with Concept Drift . Siddharth Bhatia, Arjit Jain, Shivi

Stream-AD 61 Dec 02, 2022
A research toolkit for particle swarm optimization in Python

PySwarms is an extensible research toolkit for particle swarm optimization (PSO) in Python. It is intended for swarm intelligence researchers, practit

Lj Miranda 1k Dec 30, 2022
Given a 2D triangle mesh, we could randomly generate cloud points that fill in the triangle mesh

generate_cloud_points Given a 2D triangle mesh, we could randomly generate cloud points that fill in the triangle mesh. Run python disp_mesh.py Or you

Peng Yu 2 Dec 24, 2021
blind SQLIpy sebuah alat injeksi sql yang menggunakan waktu sql untuk mendapatkan sebuah server database.

blind SQLIpy Alat blind SQLIpy ini merupakan alat injeksi sql yang menggunakan metode time based blind sql injection metode tersebut membutuhkan waktu

Galih Anggoro Prasetya 4 Feb 24, 2022
The comma.ai Calibration Challenge!

Welcome to the comma.ai Calibration Challenge! Your goal is to predict the direction of travel (in camera frame) from provided dashcam video. This rep

comma.ai 697 Jan 05, 2023
An index of recommendation algorithms that are based on Graph Neural Networks.

An index of recommendation algorithms that are based on Graph Neural Networks.

FIB LAB, Tsinghua University 564 Jan 07, 2023
Official implementation of Rich Semantics Improve Few-Shot Learning (BMVC, 2021)

Rich Semantics Improve Few-Shot Learning Paper Link Abstract : Human learning benefits from multi-modal inputs that often appear as rich semantics (e.

Mohamed Afham 11 Jul 26, 2022
Breaching - Breaching privacy in federated learning scenarios for vision and text

Breaching - A Framework for Attacks against Privacy in Federated Learning This P

Jonas Geiping 139 Jan 03, 2023
A python package for generating, analyzing and visualizing building shadows

pybdshadow Introduction pybdshadow is a python package for generating, analyzing and visualizing building shadows from large scale building geographic

Qing Yu 13 Nov 30, 2022
A smaller subset of 10 easily classified classes from Imagenet, and a little more French

Imagenette 🎶 Imagenette, gentille imagenette, Imagenette, je te plumerai. 🎶 (Imagenette theme song thanks to Samuel Finlayson) NB: Versions of Image

fast.ai 718 Jan 01, 2023
RobustART: Benchmarking Robustness on Architecture Design and Training Techniques

The first comprehensive Robustness investigation benchmark on large-scale dataset ImageNet regarding ARchitecture design and Training techniques towards diverse noises.

132 Dec 23, 2022
The implementation for paper Joint t-SNE for Comparable Projections of Multiple High-Dimensional Datasets.

Joint t-sne This is the implementation for paper Joint t-SNE for Comparable Projections of Multiple High-Dimensional Datasets. abstract: We present Jo

IDEAS Lab 7 Dec 18, 2022