Diagnostic tests for linguistic capacities in language models

Overview

LM diagnostics

This repository contains the diagnostic datasets and experimental code for What BERT is not: Lessons from a new suite of psycholinguistic diagnostics for language models, by Allyson Ettinger.

Diagnostic test data

The datasets folder contains TSV files with data for each diagnostic test, along with explanatory README files for each dataset.

Code

[All code now updated to be run with Python 3.]

The code in this section can be used to process the diagnostic datasets for input to a language model, and then to run the diagnostic tests on that language model's predictions. The code should be used in three steps:

Step 1: Process datasets to produce inputs for LM

proc_datasets.py can be used to process the provided datasets into 1) <testname>-contextlist files containing contexts (one per line) on which the LM's predictions should be conditioned, and b) <testname>-targetlist files containing target words (one per line, aligned with the contexts in *-contextlist) for which you will need probabilities conditioned on the corresponding contexts. Repeats in *-contextlist are intentional, to align with the targets in *-targetlist.

Basic usage:

python proc_datasets.py \
  --outputdir <location for output files> \
  --role_stim datasets/ROLE-88/ROLE-88.tsv \
  --negnat_stim datasets/NEG-88/NEG-88-NAT.tsv \
  --negsimp_stim datasets/NEG-88/NEG-88-SIMP.tsv \
  --cprag_stim datasets/CPRAG-34/CPRAG-34.tsv \
  --add_mask_tok
  • add_mask_tok flag will append '[MASK]' to the contexts in *-contextlist, for use with BERT.
  • <testname> comes from the following list: cprag, role, negsimp, negnat for CPRAG-34, ROLE-88, NEG-88-SIMP and NEG-88-NAT, respectively.

Step 2: Get LM predictions/probabilities

You will need to produce two files: one containing top word predictions conditioned on each context, and one containing the probabilities for each target word conditioned on its corresponding context.

Predictions: Model word predictions should be written to a file with naming modelpreds-<testname>-<modelname>. Each line of this file should contain the top word predictions conditioned on the context in the corresponding line in *-contextlist. Word predictions on a given line should be separated by whitespace. Number of predictions per line should be no less than the highest k that you want to use for accuracy tests.

Probabilities Model target probabilities should be written to a file with naming modeltgtprobs-<testname>-<modelname>. Each line of this file should contain the probability of the target word on the corresponding line of *-targetlist, conditioned on the context on the corresponding line of *-contextlist.

  • <testname> list is as above. <modelname> should be the name of the model that will be input to the code in Step 3.

Step 3: Run accuracy and sensitivity tests for each diagnostic

prediction_accuracy_tests.py takes modelpreds-<testname>-<modelname> as input and runs word prediction accuracy tests.

Basic usage:

python prediction_accuracy_tests.py \
  --preddir <location of modelpreds-<testname>-<modelname>> \
  --resultsdir <location for results files> \
  --models <names of models to be tested, e.g., bert-base-uncased bert-large-uncased> \
  --k_values <list of k values to be tested, e.g., 1 5> \
  --role_stim datasets/ROLE-88/ROLE-88.tsv \
  --negnat_stim datasets/NEG-88/NEG-88-NAT.tsv \
  --negsimp_stim datasets/NEG-88/NEG-88-SIMP.tsv \
  --cprag_stim datasets/CPRAG-34/CPRAG-34.tsv

sensitivity_tests.py takes modeltgtprobs-<testname>-<modelname> as input and runs sensitivity tests.

Basic usage:

python sensitivity_tests.py \
  --probdir <location of modelpreds-<testname>-<modelname>> \
  --resultsdir <location for results files> \
  --models <names of models to be tested, e.g., bert-base-uncased bert-large-uncased> \
  --role_stim datasets/ROLE-88/ROLE-88.tsv \
  --negnat_stim datasets/NEG-88/NEG-88-NAT.tsv \
  --negsimp_stim datasets/NEG-88/NEG-88-SIMP.tsv \
  --cprag_stim datasets/CPRAG-34/CPRAG-34.tsv

Experimental code

run_diagnostics_bert.py is the code that was used for the experiments on BERTBASE and BERTLARGE reported in the paper, including perturbations.

Example usage:

python run_diagnostics_bert.py \
  --cprag_stim datasets/CPRAG-34/CPRAG-34.tsv \
  --role_stim datasets/ROLE-88/ROLE-88.tsv \
  --negnat_stim datasets/NEG-88/NEG-88-NAT.tsv \
  --negsimp_stim datasets/NEG-88/NEG-88-SIMP.tsv \
  --resultsdir <location for results files> \
  --bertbase <BERT BASE location> \
  --bertlarge <BERT LARGE location> \
  --incl_perturb
  • bertbase and bertlarge specify locations for PyTorch BERTBASE and BERTLARGE models -- each folder is expected to include vocab.txt, bert_config.json, and pytorch_model.bin for the corresponding PyTorch BERT model. (Note that experiments were run with the original pytorch-pretrained-bert version, so I can't guarantee identical results with the updated pytorch-transformers.)
  • incl_perturb runs experiments with all perturbations reported in the paper. Without this flag, only runs experiments without perturbations.
[ICML'21] Estimate the accuracy of the classifier in various environments through self-supervision

What Does Rotation Prediction Tell Us about Classifier Accuracy under Varying Testing Environments? [Paper] [ICML'21 Project] PyTorch Implementation T

24 Oct 26, 2022
Repository of 3D Object Detection with Pointformer (CVPR2021)

3D Object Detection with Pointformer This repository contains the code for the paper 3D Object Detection with Pointformer (CVPR 2021) [arXiv]. This wo

Zhuofan Xia 117 Jan 06, 2023
Unoffical implementation about Image Super-Resolution via Iterative Refinement by Pytorch

Image Super-Resolution via Iterative Refinement Paper | Project Brief This is a unoffical implementation about Image Super-Resolution via Iterative Re

LiangWei Jiang 2.5k Jan 02, 2023
PaddleRobotics is an open-source algorithm library for robots based on Paddle, including open-source parts such as human-robot interaction, complex motion control, environment perception, SLAM positioning, and navigation.

简体中文 | English PaddleRobotics paddleRobotics是基于paddle的机器人开源算法库集,包括人机交互、复杂运动控制、环境感知、slam定位导航等开源算法部分。 人机交互 主动多模交互技术TFVT-HRI 主动多模交互技术是通过视觉、语音、触摸传感器等输入机器人

185 Dec 26, 2022
Tensorflow Repo for "DeepGCNs: Can GCNs Go as Deep as CNNs?"

DeepGCNs: Can GCNs Go as Deep as CNNs? In this work, we present new ways to successfully train very deep GCNs. We borrow concepts from CNNs, mainly re

Guohao Li 612 Nov 15, 2022
Deep Learning Algorithms for Hedging with Frictions

Deep Learning Algorithms for Hedging with Frictions This repository contains the Forward-Backward Stochastic Differential Equation (FBSDE) solver and

Xiaofei Shi 3 Dec 22, 2022
Equipped customers with insights about their EVs Hourly energy consumption and helped predict future charging behavior using LSTM model

Equipped customers with insights about their EVs Hourly energy consumption and helped predict future charging behavior using LSTM model. Designed sample dashboard with insights and recommendation for

Yash 2 Apr 07, 2022
[TIP 2020] Multi-Temporal Scene Classification and Scene Change Detection with Correlation based Fusion

Multi-Temporal Scene Classification and Scene Change Detection with Correlation based Fusion Code for Multi-Temporal Scene Classification and Scene Ch

Lixiang Ru 33 Dec 12, 2022
Code for the paper "Controllable Video Captioning with an Exemplar Sentence"

SMCG Code for the paper "Controllable Video Captioning with an Exemplar Sentence" Introduction We investigate a novel and challenging task, namely con

10 Dec 04, 2022
Tandem Mass Spectrum Prediction with Graph Transformers

MassFormer This is the original implementation of MassFormer, a graph transformer for small molecule MS/MS prediction. Check out the preprint on arxiv

Röst Lab 13 Oct 27, 2022
We propose a new method for effective shadow removal by regarding it as an exposure fusion problem.

Auto-exposure fusion for single-image shadow removal We propose a new method for effective shadow removal by regarding it as an exposure fusion proble

Qing Guo 146 Dec 31, 2022
Keras implementation of Deeplab v3+ with pretrained weights

Keras implementation of Deeplabv3+ This repo is not longer maintained. I won't respond to issues but will merge PR DeepLab is a state-of-art deep lear

1.3k Dec 07, 2022
Pytorch implementation of Generative Models as Distributions of Functions 🌿

Generative Models as Distributions of Functions This repo contains code to reproduce all experiments in Generative Models as Distributions of Function

Emilien Dupont 117 Dec 29, 2022
HAT: Hierarchical Aggregation Transformers for Person Re-identification

HAT: Hierarchical Aggregation Transformers for Person Re-identification

11 Sep 05, 2022
Multi-Scale Aligned Distillation for Low-Resolution Detection (CVPR2021)

MSAD Multi-Scale Aligned Distillation for Low-Resolution Detection Lu Qi*, Jason Kuen*, Jiuxiang Gu, Zhe Lin, Yi Wang, Yukang Chen, Yanwei Li, Jiaya J

DV Lab 115 Dec 23, 2022
Learning To Have An Ear For Face Super-Resolution

Learning To Have An Ear For Face Super-Resolution [Project Page] This repository contains demo code of our CVPR2020 paper. Training and evaluation on

50 Nov 16, 2022
This example implements the end-to-end MLOps process using Vertex AI platform and Smart Analytics technology capabilities

MLOps with Vertex AI This example implements the end-to-end MLOps process using Vertex AI platform and Smart Analytics technology capabilities. The ex

Google Cloud Platform 238 Dec 21, 2022
Human-Pose-and-Motion History

Human Pose and Motion Scientist Approach Eadweard Muybridge, The Galloping Horse Portfolio, 1887 Etienne-Jules Marey, Descent of Inclined Plane, Chron

Daito Manabe 47 Dec 16, 2022
Project NII pytorch scripts

project-NII-pytorch-scripts By Xin Wang, National Institute of Informatics, since 2021 I am a new pytorch user. If you have any suggestions or questio

Yamagishi and Echizen Laboratories, National Institute of Informatics 184 Dec 23, 2022
Equivariant layers for RC-complement symmetry in DNA sequence data

Equi-RC Equivariant layers for RC-complement symmetry in DNA sequence data This is a repository that implements the layers as described in "Reverse-Co

7 May 19, 2022