Official PyTorch Implementation of SSMix (Findings of ACL 2021)

Related tags

Deep Learningssmix
Overview

SSMix: Saliency-based Span Mixup for Text Classification (Findings of ACL 2021)

Official PyTorch Implementation of SSMix | Paper


SSMix

Abstract

Data augmentation with mixup has shown to be effective on various computer vision tasks. Despite its great success, there has been a hurdle to apply mixup to NLP tasks since text consists of discrete tokens with variable length. In this work, we propose SSMix, a novel mixup method where the operation is performed on input text rather than on hidden vectors like previous approaches. SSMix synthesizes a sentence while preserving the locality of two original texts by span-based mixing and keeping more tokens related to the prediction relying on saliency information. With extensive experiments, we empirically validate that our method outperforms hidden-level mixup methods on the wide range of text classification benchmarks, including textual entailment, sentiment classification, and question-type classification.

Code Structure

|__ augmentation/ --> augmentation methods by method type
    |__ __init__.py --> wrapper for all augmentation methods. Contains metric used for single & paired sentence tasks
    |__ saliency.py --> Calculates saliency by L2 norm gradient backpropagation
    |__ ssmix.py --> Output ssmix sentence with options such as no span and no saliency given two input sentence with additional information
    |__ unk.py --> Output randomly replaced unk sentence 
|__ read_data/ --> Module used for loading data
    |__ __init__.py --> wrapper function for getting data split by train and valid depending on dataset type
    |__  dataset.py --> Class to get NLU dataset
    |__ preprocess.py --> preprocessor that makes input, label, and accuracy metric depending on dataset type
|__ trainer.py --> Code that does actual training 
|__ run_train.py --> Load hyperparameter, initiate training, pipeline
|__ classifiation_model.py -> Augmented from huggingface modeling_bert.py. Define BERT architectures that can handle multiple inputs for Tmix

Part of code is modified from the MixText implementation.

Getting Started

pip install -r requirements.txt

Code is runnable on both CPU and GPU, but we highly recommended to run on GPU. Strictly following the versions specified in the requirements.txt file is desirable to sucessfully execute our code without errors.

Model Training

python run_train.py --batch_size ${BSZ} --seed ${SEED} --dataset {DATASET} --optimizer_lr ${LR} ${MODE}

For all our experiments, we use 32 as the batch size (BSZ), and perform five different runs by changing the seed (SEED) from 0 to 4. We experiment on a wide range of text classifiction datasets (DATASET): 'sst2', 'qqp', 'mnli', 'qnli', 'rte', 'mrpc', 'trec-coarse', 'trec-fine', 'anli'. You should set --anli_round argument to one of 1, 2, 3 for the ANLI dataset.

Once you run the code, trained checkpoints are created under checkpoints directory. To train a model without mixup, you have to set MODE to 'normal'. To run with mixup approaches including our SSMix, you should set MODE as the name of the mixup method ('ssmix', 'tmix', 'embedmix', 'unk'). We load the checkpoint trained without mixup before training with mixup. We use 5e-5 for the normal mode and 1e-5 for mixup methods as the learning rate (LR).

You can modify the argument values (e.g., embed_alpha, hidden_alpha, etc) to adjust to your training hyperparameter needs. For ablation study of SSMix, you can exclude salieny constraint (--ss_no_saliency) or span constraint (--ss_no_span). Type python run_train.py --help or check run_train.py to see the full list of available hyperparameters. For debugging or analysis, you can turn on verbose options (--verbose and --verbose_show_augment_example).

License

Copyright 2021-present NAVER Corp.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
Owner
Clova AI Research
Open source repository of Clova AI Research, NAVER & LINE
Clova AI Research
PyTorch implementation of "PatchGame: Learning to Signal Mid-level Patches in Referential Games" to appear in NeurIPS 2021

PatchGame: Learning to Signal Mid-level Patches in Referential Games This repository is the official implementation of the paper - "PatchGame: Learnin

Kamal Gupta 22 Mar 16, 2022
A Number Recognition algorithm

Paddle-VisualAttention Results_Compared SVHN Dataset Methods Steps GPU Batch Size Learning Rate Patience Decay Step Decay Rate Training Speed (FPS) Ac

1 Nov 12, 2021
Official implementation of Protected Attribute Suppression System, ICCV 2021

Official implementation of Protected Attribute Suppression System, ICCV 2021

Prithviraj Dhar 6 Jan 01, 2023
Official repository of the paper 'Essentials for Class Incremental Learning'

Essentials for Class Incremental Learning Official repository of the paper 'Essentials for Class Incremental Learning' This Pytorch repository contain

33 Nov 27, 2022
Feed forward VQGAN-CLIP model, where the goal is to eliminate the need for optimizing the latent space of VQGAN for each input prompt

Feed forward VQGAN-CLIP model, where the goal is to eliminate the need for optimizing the latent space of VQGAN for each input prompt. This is done by

Mehdi Cherti 135 Dec 30, 2022
NitroFE is a Python feature engineering engine which provides a variety of modules designed to internally save past dependent values for providing continuous calculation.

NitroFE is a Python feature engineering engine which provides a variety of modules designed to internally save past dependent values for providing continuous calculation.

100 Sep 28, 2022
StyleGAN - Official TensorFlow Implementation

StyleGAN — Official TensorFlow Implementation Picture: These people are not real – they were produced by our generator that allows control over differ

NVIDIA Research Projects 13.1k Jan 09, 2023
A toolkit for Lagrangian-based constrained optimization in Pytorch

Cooper About Cooper is a toolkit for Lagrangian-based constrained optimization in Pytorch. This library aims to encourage and facilitate the study of

Cooper 34 Jan 01, 2023
Code release for ICCV 2021 paper "Anticipative Video Transformer"

Anticipative Video Transformer Ranked first in the Action Anticipation task of the CVPR 2021 EPIC-Kitchens Challenge! (entry: AVT-FB-UT) [project page

Facebook Research 123 Dec 13, 2022
DeepSpeed is a deep learning optimization library that makes distributed training easy, efficient, and effective.

DeepSpeed is a deep learning optimization library that makes distributed training easy, efficient, and effective.

Microsoft 8.4k Jan 01, 2023
Finite-temperature variational Monte Carlo calculation of uniform electron gas using neural canonical transformation.

CoulombGas This code implements the neural canonical transformation approach to the thermodynamic properties of uniform electron gas. Building on JAX,

FermiFlow 9 Mar 03, 2022
CDTrans: Cross-domain Transformer for Unsupervised Domain Adaptation

[ICCV2021] TransReID: Transformer-based Object Re-Identification [pdf] The official repository for TransReID: Transformer-based Object Re-Identificati

DamoCV 569 Dec 30, 2022
CBREN: Convolutional Neural Networks for Constant Bit Rate Video Quality Enhancement

CBREN This is the Pytorch implementation for our IEEE TCSVT paper : CBREN: Convolutional Neural Networks for Constant Bit Rate Video Quality Enhanceme

Zhao Hengrun 3 Nov 04, 2022
Official implementation of the paper DeFlow: Learning Complex Image Degradations from Unpaired Data with Conditional Flows

DeFlow: Learning Complex Image Degradations from Unpaired Data with Conditional Flows Official implementation of the paper DeFlow: Learning Complex Im

Valentin Wolf 86 Nov 16, 2022
Multi Camera Calibration

Multi Camera Calibration 'modules/camera_calibration/app/camera_calibration.cpp' is for calculating extrinsic parameter of each individual cameras. 'm

7 Dec 01, 2022
A PyTorch implementation of SlowFast based on ICCV 2019 paper "SlowFast Networks for Video Recognition"

SlowFast A PyTorch implementation of SlowFast based on ICCV 2019 paper SlowFast Networks for Video Recognition. Requirements Anaconda PyTorch conda in

Hao Ren 8 Dec 23, 2022
Software for Multimodalty 2D+3D Facial Expression Recognition (FER) UI

EmotionUI Software for Multimodalty 2D+3D Facial Expression Recognition (FER) UI. demo screenshot (with RealSense) required packages Python = 3.6 num

Yang Jiao 2 Dec 23, 2021
Random-Afg - Afghanistan Random Old Idz Cloner Tools

AFGHANISTAN RANDOM OLD IDZ CLONER TOOLS Install $ apt update $ apt upgrade $ apt

MAHADI HASAN AFRIDI 5 Jan 26, 2022
Keras implementation of PersonLab for Multi-Person Pose Estimation and Instance Segmentation.

PersonLab This is a Keras implementation of PersonLab for Multi-Person Pose Estimation and Instance Segmentation. The model predicts heatmaps and vari

OCTI 160 Dec 21, 2022
Not All Points Are Equal: Learning Highly Efficient Point-based Detectors for 3D LiDAR Point Clouds (CVPR 2022, Oral)

Not All Points Are Equal: Learning Highly Efficient Point-based Detectors for 3D LiDAR Point Clouds (CVPR 2022, Oral) This is the official implementat

Yifan Zhang 259 Dec 25, 2022