Supervised Contrastive Learning for Downstream Optimized Sequence Representations

Last update: Oct 21, 2022

SupCL-Seq 📖

Supervised Contrastive Learning for Downstream Optimized Sequence Representations (SupCL-Seq), accepted for publication at EMNLP 2021, extends supervised contrastive learning from computer vision to the optimization of sequence representations in NLP. By altering the dropout mask probability in standard Transformer architectures (e.g. BERT_base), we generate augmented, altered views of every representation (anchor). A supervised contrastive loss is then used to pull together similar samples (e.g. anchors and their altered views) and push apart samples belonging to other classes. Despite its simplicity, SupCL-Seq leads to large gains on many sequence classification tasks in the GLUE benchmark compared to a standard BERT_base, including absolute improvements of 6% on CoLA, 5.4% on MRPC, 4.7% on RTE and 2.6% on STS-B.
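To make the "pull together, push apart" objective concrete, here is a minimal sketch of a supervised contrastive loss in the form commonly used in the supervised contrastive learning literature. It is not taken from this package: the function names `l2_normalize` and `sup_con_loss` are hypothetical, and plain Python lists are used so that it runs without any deep learning framework.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def sup_con_loss(embeddings, labels, temperature=0.05):
    """Illustrative supervised contrastive loss over one batch.

    For each anchor, every other sample with the same label is a positive
    and all other samples in the batch form the contrast set.
    """
    z = [l2_normalize(v) for v in embeddings]
    n = len(z)
    per_anchor = []
    for i in range(n):
        # Temperature-scaled cosine similarity of anchor i to every other sample.
        sims = {
            j: sum(a * b for a, b in zip(z[i], z[j])) / temperature
            for j in range(n)
            if j != i
        }
        positives = [j for j in sims if labels[j] == labels[i]]
        if not positives:
            continue  # an anchor without positives contributes nothing
        # Numerically stable log-sum-exp over the contrast set.
        m = max(sims.values())
        log_denom = m + math.log(sum(math.exp(s - m) for s in sims.values()))
        # Average negative log-probability of the positives.
        loss_i = -sum(sims[p] - log_denom for p in positives) / len(positives)
        per_anchor.append(loss_i)
    return sum(per_anchor) / len(per_anchor) if per_anchor else 0.0


# Toy batch: the first two vectors point one way, the last two another.
toy_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
loss_matching = sup_con_loss(toy_embeddings, [0, 0, 1, 1])
loss_mismatched = sup_con_loss(toy_embeddings, [0, 1, 0, 1])
```

When the labels agree with the geometry of the embeddings (`loss_matching`), the loss is lower than when they do not (`loss_mismatched`), which is the signal that drives same-class samples together during training.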

This package runs on almost all Hugging Face 🤗 Transformer models that contain an encoder, including but not limited to:

ALBERT
BERT
BigBird
RoBERTa
ERNIE
And many more models!

GLUE Benchmark BERT SupCL-SEQ

The table below reports the improvements over naive fine-tuning of the BERT model on the GLUE benchmark. We used the [CLS] token embedding during training and expect that mean pooling would further improve these results.
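The difference between the two pooling choices mentioned above can be sketched as follows. This is an illustration only: the helper names `cls_pooling` and `mean_pooling` are hypothetical, and averaging only over non-padding tokens (via an attention mask) is an assumption rather than a description of what the package does internally.

```python
def cls_pooling(token_embeddings):
    """Use the embedding of the first token ([CLS]) as the sequence embedding."""
    return list(token_embeddings[0])


def mean_pooling(token_embeddings, attention_mask):
    """Average the embeddings of real tokens, skipping padding (mask == 0)."""
    kept = [tok for tok, m in zip(token_embeddings, attention_mask) if m == 1]
    if not kept:
        raise ValueError("attention_mask selects no tokens")
    dim = len(token_embeddings[0])
    return [sum(tok[d] for tok in kept) / len(kept) for d in range(dim)]


# Three token vectors; the last one is padding and is masked out.
tokens = [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]
mask = [1, 1, 0]
cls_vector = cls_pooling(tokens)
mean_vector = mean_pooling(tokens, mask)
```

With [CLS] pooling the sequence is summarized by a single position, while mean pooling spreads the summary over every unmasked token.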

Installation

First, install TensorFlow 2.0, PyTorch, or both. Refer to the TensorFlow installation page, the PyTorch installation page and/or the Flax installation page for the install command specific to your platform.
Then install the package:

$ pip install SupCL-Seq

Usage

The package builds on the Trainer from Hugging Face 🤗, so it is used in the same way as Trainer. The pipeline works as follows:

1. First, use supervised contrastive learning to contrastively optimize sentence embeddings on your annotated data.

from SupCL_Seq import SupCsTrainer

SupCL_trainer = SupCsTrainer.SupCsTrainer(
    w_drop_out=[0.0, 0.05, 0.2],     # One view per entry; each value is that view's dropout probability [Optional]
    temperature=0.05,                # Temperature for the contrastive loss function [Optional]
    def_drop_out=0.1,                # Default dropout of the transformer, usually 0.1 [Optional]
    pooling_strategy='mean',         # Strategy used to extract embeddings: `mean` or `pooling` [Optional]
    model=model,                     # Model
    args=CL_args,                    # Arguments from `TrainingArguments` [Optional]
    train_dataset=train_dataset,     # Training dataset
    tokenizer=tokenizer,             # Tokenizer
    compute_metrics=compute_metrics  # Custom evaluation function, if needed [Optional]
)
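As a rough illustration of what the `w_drop_out` list controls, the sketch below produces one view per dropout probability. It is a deliberately simplified toy: the README describes altering the dropout probability inside the Transformer, whereas here dropout is applied directly to a small vector. The helpers `dropout` and `make_views` are hypothetical and not part of the package.

```python
import random


def dropout(vec, p, rng):
    """Inverted dropout: zero each entry with probability p, rescale the survivors."""
    if p <= 0.0:
        return list(vec)  # p = 0 leaves the input unchanged
    keep = 1.0 - p
    return [x / keep if rng.random() >= p else 0.0 for x in vec]


def make_views(vec, drop_probs, seed=0):
    """Return one perturbed copy of `vec` per dropout probability."""
    rng = random.Random(seed)
    return [dropout(vec, p, rng) for p in drop_probs]


anchor = [0.5, -1.0, 2.0, 0.25]
views = make_views(anchor, [0.0, 0.05, 0.2])  # three views, as in the call above
```

A probability of 0.0 reproduces the anchor itself, and larger probabilities yield increasingly perturbed views that still share the anchor's label.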

2. After contrastive training:

2.1 Add a linear classification layer to your model

2.2 Freeze the base layer

2.3 Finetune the linear layer on your annotated data
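Steps 2.1 to 2.3 amount to training a linear probe on top of a frozen encoder. The sketch below shows that idea with the standard library only; it is not the package's code (glue.ipynb holds the actual implementation). `frozen_encoder` is a hypothetical stand-in for the contrastively trained model, and the binary logistic-regression head and its hyperparameters are assumptions chosen for brevity.

```python
import math


def frozen_encoder(x):
    """Stand-in for the trained encoder; it has no trainable state here."""
    return [x[0] + x[1], x[0] - x[1]]


def train_linear_head(features, labels, lr=0.5, epochs=200):
    """Fit a binary logistic-regression head on fixed features with SGD."""
    w = [0.0] * len(features[0])
    b = 0.0
    for _ in range(epochs):
        for f, y in zip(features, labels):
            logit = sum(wi * fi for wi, fi in zip(w, f)) + b
            prob = 1.0 / (1.0 + math.exp(-logit))
            err = prob - y  # gradient of the cross-entropy loss w.r.t. the logit
            # Only the head's weights and bias are updated.
            w = [wi - lr * err * fi for wi, fi in zip(w, f)]
            b -= lr * err
    return w, b


def predict(w, b, f):
    """Return class 1 when the head's logit is positive, else class 0."""
    return 1 if sum(wi * fi for wi, fi in zip(w, f)) + b > 0 else 0


inputs = [[1.0, 0.2], [0.9, 0.1], [-1.0, -0.3], [-0.8, -0.1]]
labels = [1, 1, 0, 0]
# The encoder is used for inference only, which is what "freezing" means here.
features = [frozen_encoder(x) for x in inputs]
w, b = train_linear_head(features, labels)
predictions = [predict(w, b, f) for f in features]
```

Because the encoder never changes, the features can be computed once and reused across all epochs, which keeps this second stage cheap compared with the contrastive stage.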

For detailed implementation see glue.ipynb

Run on GLUE

To evaluate the method on the GLUE benchmark, see glue.ipynb.

How to Cite

@misc{sedghamiz2021supclseq,
      title={SupCL-Seq: Supervised Contrastive Learning for Downstream Optimized Sequence Representations}, 
      author={Hooman Sedghamiz and Shivam Raval and Enrico Santus and Tuka Alhanai and Mohammad Ghassemi},
      year={2021},
      eprint={2109.07424},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

References

[1] Supervised Contrastive Learning

[2] SimCSE: Simple Contrastive Learning of Sentence Embeddings

Owner

Hooman Sedghamiz
