Mind the Trade-off: Debiasing NLU Models without Degrading the In-distribution Performance

Overview

Mind the Trade-off: Debiasing NLU Models without Degrading the In-distribution Performance

Abstract: Models for natural language understanding (NLU) tasks often rely on the idiosyncratic biases of the dataset, which make them brittle against test cases outside the training distribution. Recently, several proposed debiasing methods are shown to be very effective in improving out-of-distribution performance. However, their improvements come at the expense of performance drop when models are evaluated on the in-distribution data, which contain examples with higher diversity. This seemingly inevitable trade-off may not tell us much about the changes in the reasoning and understanding capabilities of the resulting models on broader types of examples beyond the small subset represented in the out-of-distribution data. In this paper, we address this trade-off by introducing a novel debiasing method, called confidence regularization, which discourage models from exploiting biases while enabling them to receive enough incentive to learn from all the training examples. We evaluate our method on three NLU tasks and show that, in contrast to its predecessors, it improves the performance on out-of-distribution datasets (e.g., 7pp gain on HANS dataset) while maintaining the original in-distribution accuracy.

The repository contains the code to reproduce our work in debiasing NLU models without in-distribution degradation. We provide 2 runs of experiment that are shown in our paper:

  1. Debias MNLI model from syntactic bias and evaluate on HANS as the out-of-distribution data.
  2. Debias MNLI model from hypothesis-only bias and evaluate on MNLI-hard sets as the out-of-distribution data.

Requirements

The code requires python >= 3.6 and pytorch >= 1.1.0.

Additional required dependencies can be found in requirements.txt. Install all requirements by running:

pip install -r requirements.txt

Data

Our experiments use MNLI dataset version provided by GLUE benchmark. Download the file from here, and unzip under the directory ./dataset Additionally download the following files here for evaluating on hard/easy splits of both MNLI dev and test sets. The dataset directory should be structured as the following:

└── dataset 
    └── MNLI
        ├── train.tsv
        ├── dev_matched.tsv
        ├── dev_mismatched.tsv
        ├── dev_mismatched.tsv
        ├── dev_matched_easy.tsv
        ├── dev_matched_hard.tsv
        ├── dev_mismatched_easy.tsv
        ├── dev_mismatched_hard.tsv
        ├── multinli_hard
        │   ├── multinli_0.9_test_matched_unlabeled_hard.jsonl
        │   └── multinli_0.9_test_mismatched_unlabeled_hard.jsonl
        ├── multinli_test
        │   ├── multinli_0.9_test_matched_unlabeled.jsonl
        │   └── multinli_0.9_test_mismatched_unlabeled.jsonl
        └── original

Running the experiments

For each evaluation setting, use the --mode and --which_bias arguments to set the appropriate loss function and the type of bias to mitigate (e.g, hans, hypo).

To reproduce our result on MNLI ⮕ HANS, run the following:

cd src/
CUDA_VISIBLE_DEVICES=6 python train_distill_bert.py \
    --output_dir ../checkpoints/hans/bert_confreg_lr5_epoch3_seed444 \
    --do_train --do_eval --mode smoothed_distill \
    --seed 444 --which_bias hans

For the MNLI ⮕ hard splits, run the following:

cd src/
CUDA_VISIBLE_DEVICES=10 python train_distill_bert.py \
    --output_dir ../checkpoints/hypo/bert_confreg_lr5_epoch3_seed111 \
    --do_train --do_eval --mode smoothed_distill \
    --seed 111 --which_bias hypo

Expected results

Results on the MNLI ⮕ HANS setting:

Mode Seed MNLI-m MNLI-mm HANS avg.
None 111 84.57 84.72 62.04
conf-reg 111 84.17 85.02 69.62

Results on the MNLI ⮕ Hard-splits setting:

Mode Seed MNLI-m MNLI-mm MNLI-m hard MNLI-mm hard
None 111 84.62 84.71 76.07 76.75
conf-reg 111 85.01 84.87 78.02 78.89

Contact

Contact person: Ajie Utama, [email protected]

https://www.ukp.tu-darmstadt.de/

Please reach out to us for further questions or if you encounter any issue. You can cite this work by the following:

@InProceedings{UtamaDebias2020,
  author    = {Utama, P. Ajie and Moosavi, Nafise Sadat and Gurevych, Iryna},
  title     = {Mind the Trade-off: Debiasing NLU Models without Degrading the In-distribution Performance},
  booktitle = {Proceedings of the 58th Conference of the Association for Computational Linguistics},
  month     = jul,
  year      = {2020},
  publisher = {Association for Computational Linguistics}
}

Acknowledgement

The code in this repository is build on the implementation of debiasing method by Clark et al. The original version can be found here

Owner
Ubiquitous Knowledge Processing Lab
Ubiquitous Knowledge Processing Lab
Moving Object Segmentation in 3D LiDAR Data: A Learning-based Approach Exploiting Sequential Data

LiDAR-MOS: Moving Object Segmentation in 3D LiDAR Data This repo contains the code for our paper: Moving Object Segmentation in 3D LiDAR Data: A Learn

Photogrammetry & Robotics Bonn 394 Dec 29, 2022
NCVX (NonConVeX): A User-Friendly and Scalable Package for Nonconvex Optimization in Machine Learning.

NCVX NCVX: A User-Friendly and Scalable Package for Nonconvex Optimization in Machine Learning. Please check https://ncvx.org for detailed instruction

SUN Group @ UMN 28 Aug 03, 2022
A deep learning framework for historical document image analysis

DIVA-DAF Description A deep learning framework for historical document image analysis. How to run Install dependencies # clone project git clone https

9 Aug 04, 2022
Tensorflow implementation of Semi-supervised Sequence Learning (https://arxiv.org/abs/1511.01432)

Transfer Learning for Text Classification with Tensorflow Tensorflow implementation of Semi-supervised Sequence Learning(https://arxiv.org/abs/1511.01

DONGJUN LEE 82 Oct 22, 2022
This YoloV5 based model is fit to detect people and different types of land vehicles, and displaying their density on a fitted map, according to their coordinates and detected labels.

This YoloV5 based model is fit to detect people and different types of land vehicles, and displaying their density on a fitted map, according to their

Liron Bdolah 8 May 22, 2022
Evaluating AlexNet features at various depths

Linear Separability Evaluation This repo provides the scripts to test a learned AlexNet's feature representation performance at the five different con

Yuki M. Asano 32 Dec 30, 2022
Large-scale open domain KNOwledge grounded conVERsation system based on PaddlePaddle

Knover Knover is a toolkit for knowledge grounded dialogue generation based on PaddlePaddle. Knover allows researchers and developers to carry out eff

607 Dec 31, 2022
An End-to-End Machine Learning Library to Optimize AUC (AUROC, AUPRC).

Logo by Zhuoning Yuan LibAUC: A Machine Learning Library for AUC Optimization Website | Updates | Installation | Tutorial | Research | Github LibAUC a

Optimization for AI 176 Jan 07, 2023
Code release for Convolutional Two-Stream Network Fusion for Video Action Recognition

Convolutional Two-Stream Network Fusion for Video Action Recognition

Christoph Feichtenhofer 676 Dec 31, 2022
A set of tools for creating and testing machine learning features, with a scikit-learn compatible API

Feature Forge This library provides a set of tools that can be useful in many machine learning applications (classification, clustering, regression, e

Machinalis 380 Nov 05, 2022
UMich 500-Level Mobile Robotics Course

MOBILE ROBOTICS: METHODS & ALGORITHMS - WINTER 2022 University of Michigan - NA 568/EECS 568/ROB 530 For slides, lecture notes, and example codes, see

393 Dec 29, 2022
Speech Separation Using an Asynchronous Fully Recurrent Convolutional Neural Network

Speech Separation Using an Asynchronous Fully Recurrent Convolutional Neural Network This repository is the official implementation of Speech Separati

Kai Li (李凯) 116 Nov 09, 2022
Tensorflow implementation of ID-Unet: Iterative Soft and Hard Deformation for View Synthesis.

ID-Unet: Iterative-view-synthesis(CVPR2021 Oral) Tensorflow implementation of ID-Unet: Iterative Soft and Hard Deformation for View Synthesis. Overvie

17 Aug 23, 2022
Python codes for Lite Audio-Visual Speech Enhancement.

Lite Audio-Visual Speech Enhancement (Interspeech 2020) Introduction This is the PyTorch implementation of Lite Audio-Visual Speech Enhancement (LAVSE

Shang-Yi Chuang 85 Dec 01, 2022
Learning kernels to maximize the power of MMD tests

Code for the paper "Generative Models and Model Criticism via Optimized Maximum Mean Discrepancy" (arXiv:1611.04488; published at ICLR 2017), by Douga

Danica J. Sutherland 201 Dec 17, 2022
PyTorch Implement of Context Encoders: Feature Learning by Inpainting

Context Encoders: Feature Learning by Inpainting This is the Pytorch implement of CVPR 2016 paper on Context Encoders 1) Semantic Inpainting Demo Inst

321 Dec 25, 2022
Code & Data for the Paper "Time Masking for Temporal Language Models", WSDM 2022

Time Masking for Temporal Language Models This repository provides a reference implementation of the paper: Time Masking for Temporal Language Models

Guy Rosin 12 Jan 06, 2023
CCPD: a diverse and well-annotated dataset for license plate detection and recognition

CCPD (Chinese City Parking Dataset, ECCV) UPdate on 10/03/2019. CCPD Dataset is now updated. We are confident that images in subsets of CCPD is much m

detectRecog 1.8k Dec 30, 2022
CausalNLP is a practical toolkit for causal inference with text as treatment, outcome, or "controlled-for" variable.

CausalNLP CausalNLP is a practical toolkit for causal inference with text as treatment, outcome, or "controlled-for" variable. Install pip install -U

Arun S. Maiya 95 Jan 03, 2023
[CVPR 2021] MiVOS - Mask Propagation module. Reproduced STM (and better) with training code :star2:. Semi-supervised video object segmentation evaluation.

MiVOS (CVPR 2021) - Mask Propagation Ho Kei Cheng, Yu-Wing Tai, Chi-Keung Tang [arXiv] [Paper PDF] [Project Page] [Papers with Code] This repo impleme

Rex Cheng 106 Jan 03, 2023