Source code for the paper: Variance-Aware Machine Translation Test Sets (NeurIPS 2021 Datasets and Benchmarks Track)

Overview

Variance-Aware-MT-Test-Sets

Variance-Aware Machine Translation Test Sets

License

See LICENSE. We follow the data licensing plan as the same as the WMT benchmark.

VAT Data

We release 70 lightweight and discriminative test sets for machine translation evaluation, covering 35 translation directions from WMT16 to WMT20 competitions. See VAT_data folder for detailed information.

For each translation direction of a specific year, both source and reference are provided for different types of evaluation metrics. For example,

VAT_data/
├── wmt20
    ├── ...
    ├── vat_newstest2020-zhen-ref.en.txt
    └── vat_newstest2020-zhen-src.zh.txt

Meta-Information of VAT

We also provide the meta-inforamtion of reserved data. Each json file contains the IDs of retained data in the original test set. For instance, file wmt20/bert-r_filter-std60.json describes:

{
	...
	"en-de": [4, 6, 10, 13, 14, 15, ...],
	"de-en": [0, 3, 4, 5, 7, 9, ...],
	...
}

Reproduce & Create VAT

The reported results in the paper were produced by single NVIDIA GeForce 1080Ti card.

We will keep updating the code and related documentation after the response.

Requirements

  • sacreBLEU version >= 1.4.14
  • BLEURT version >= 0.0.2
  • COMET version >= 0.1.0
  • BERTScore version >= 0.3.7 (hug_trans==4.2.1)
  • PyTorch version >= 1.5.1
  • Python version >= 3.8
  • CUDA & cudatoolkit >= 10.1

Note: the minimal version is for reproducing the results

Pipeline

  1. Use score_xxx.py to generate the CSV files that stores the sentence-level scores evaluated by the corresponding metrics. For example, evaluating all the WMT20 submissions of all the language pairs using BERTScore:
    CUDA_VISIBLE_DEVICES=0 python score_bert.py -b 128 -s -r dummy -c dummy --rescale_with_baseline \
    	--hypos-dir ${WMT_DATA_PATH}/system-outputs \
    	--refs-dir ${WMT_DATA_PATH}/references \
    	--scores-dir ${WMT_DATA_PATH}/results/system-level/scores_ALL \
    	--testset-name newstest2020 --score-dump wmt20-bertscore.csv
    (Alternative Option) You can use your implementation for dumping the scores given by the metrics. But the CSV header should contain:
    ,TESTSET,LP,ID,METRIC,SYS,SCORE
    
  2. Use cal_filtering.py to filter the test set based on the score warehouse calculated in the last step. For example,
    python cal_filtering.py --score-dump wmt20-bertscore.csv --output VAT_meta/wmt20-test/ --filter-per 60
    It will produce the json files which contain the IDs of reserved sentences.

Statistics of VAT (References)

Benchmark Translation Direction # Sentences # Words # Vocabulary
wmt20 km-en 928 17170 3645
wmt20 cs-en 266 12568 3502
wmt20 en-de 567 21336 5945
wmt20 ja-en 397 10526 3063
wmt20 ps-en 1088 20296 4303
wmt20 en-zh 567 18224 5019
wmt20 en-ta 400 7809 4028
wmt20 de-en 314 16083 4046
wmt20 zh-en 800 35132 6457
wmt20 en-ja 400 12718 2969
wmt20 en-cs 567 16579 6391
wmt20 en-pl 400 8423 3834
wmt20 en-ru 801 17446 6877
wmt20 pl-en 400 7394 2399
wmt20 iu-en 1188 23494 3876
wmt20 ru-en 396 6966 2330
wmt20 ta-en 399 7427 2148
wmt19 zh-en 800 36739 6168
wmt19 en-cs 799 15433 6111
wmt19 de-en 800 15219 4222
wmt19 en-gu 399 8494 3548
wmt19 fr-de 680 12616 3698
wmt19 en-zh 799 20230 5547
wmt19 fi-en 798 13759 3555
wmt19 en-fi 799 13303 6149
wmt19 kk-en 400 9283 2584
wmt19 de-cs 799 15080 6166
wmt19 lt-en 400 10474 2874
wmt19 en-lt 399 7251 3364
wmt19 ru-en 800 14693 3817
wmt19 en-kk 399 6411 3252
wmt19 en-ru 799 16393 6125
wmt19 gu-en 406 8061 2434
wmt19 de-fr 680 16181 3517
wmt19 en-de 799 18946 5340
wmt18 en-cs 1193 19552 7926
wmt18 cs-en 1193 23439 5453
wmt18 en-fi 1200 16239 7696
wmt18 en-tr 1200 19621 8613
wmt18 en-et 800 13034 6001
wmt18 ru-en 1200 26747 6045
wmt18 et-en 800 20045 5045
wmt18 tr-en 1200 25689 5955
wmt18 fi-en 1200 24912 5834
wmt18 zh-en 1592 42983 7985
wmt18 en-zh 1592 34796 8579
wmt18 en-ru 1200 22830 8679
wmt18 de-en 1199 28275 6487
wmt18 en-de 1199 25473 7130
wmt17 en-lv 800 14453 6161
wmt17 zh-en 800 20590 5149
wmt17 en-tr 1203 17612 7714
wmt17 lv-en 800 18653 4747
wmt17 en-de 1202 22055 6463
wmt17 ru-en 1200 24807 5790
wmt17 en-fi 1201 17284 7763
wmt17 tr-en 1203 23037 5387
wmt17 en-zh 800 18001 5629
wmt17 en-ru 1200 22251 8761
wmt17 fi-en 1201 23791 5300
wmt17 en-cs 1202 21278 8256
wmt17 de-en 1202 23838 5487
wmt17 cs-en 1202 22707 5310
wmt16 tr-en 1200 19225 4823
wmt16 ru-en 1199 23010 5442
wmt16 ro-en 800 16200 3968
wmt16 de-en 1200 22612 5511
wmt16 en-ru 1199 20233 7872
wmt16 fi-en 1200 20744 5176
wmt16 cs-en 1200 23235 5324
Owner
NLP2CT Lab, University of Macau
Natural Language Processing & Portuguese - Chinese Machine Translation Laboratory
NLP2CT Lab, University of Macau
potpourri3d - An invigorating blend of 3D geometry tools in Python.

A Python library of various algorithms and utilities for 3D triangle meshes and point clouds. Managed by Nicholas Sharp, with new tools added lazily as needed. Currently, mainly bindings to C++ tools

Nicholas Sharp 295 Jan 05, 2023
FedJAX is a library for developing custom Federated Learning (FL) algorithms in JAX.

FedJAX: Federated learning with JAX What is FedJAX? FedJAX is a library for developing custom Federated Learning (FL) algorithms in JAX. FedJAX priori

Google 208 Dec 14, 2022
TensorFlow implementation of "A Simple Baseline for Bayesian Uncertainty in Deep Learning"

TensorFlow implementation of "A Simple Baseline for Bayesian Uncertainty in Deep Learning"

YeongHyeon Park 7 Aug 28, 2022
Library of deep learning models and datasets designed to make deep learning more accessible and accelerate ML research.

Tensor2Tensor Tensor2Tensor, or T2T for short, is a library of deep learning models and datasets designed to make deep learning more accessible and ac

12.9k Jan 09, 2023
Image Completion with Deep Learning in TensorFlow

Image Completion with Deep Learning in TensorFlow See my blog post for more details and usage instructions. This repository implements Raymond Yeh and

Brandon Amos 1.3k Dec 23, 2022
Bottom-up Human Pose Estimation

Introduction This is the official code of Rethinking the Heatmap Regression for Bottom-up Human Pose Estimation. This paper has been accepted to CVPR2

108 Dec 01, 2022
Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network

Super Resolution Examples We run this script under TensorFlow 2.0 and the TensorLayer2.0+. For TensorLayer 1.4 version, please check release. 🚀 🚀 🚀

TensorLayer Community 2.9k Jan 08, 2023
Deepface is a lightweight face recognition and facial attribute analysis (age, gender, emotion and race) framework for python

deepface Deepface is a lightweight face recognition and facial attribute analysis (age, gender, emotion and race) framework for python. It is a hybrid

Kushal Shingote 2 Feb 10, 2022
MTCNN face detection implementation for TensorFlow, as a PIP package.

MTCNN Implementation of the MTCNN face detector for Keras in Python3.4+. It is written from scratch, using as a reference the implementation of MTCNN

Iván de Paz Centeno 1.9k Dec 30, 2022
Experiments with differentiable stacks and queues in PyTorch

Please use stacknn-core instead! StackNN This project implements differentiable stacks and queues in PyTorch. The data structures are implemented in s

Will Merrill 141 Oct 06, 2022
Open source simulator for autonomous vehicles built on Unreal Engine / Unity, from Microsoft AI & Research

Welcome to AirSim AirSim is a simulator for drones, cars and more, built on Unreal Engine (we now also have an experimental Unity release). It is open

Microsoft 13.8k Jan 05, 2023
🚗 INGI Dakar 2K21 - Be the first one on the finish line ! 🚗

🚗 INGI Dakar 2K21 - Be the first one on the finish line ! 🚗 This year's first semester Club Info challenge will put you at the head of a car racing

ClubINFO INGI (UCLouvain) 6 Dec 10, 2021
Vector AI — A platform for building vector based applications. Encode, query and analyse data using vectors.

Vector AI is a framework designed to make the process of building production grade vector based applications as quickly and easily as possible. Create

Vector AI 267 Dec 23, 2022
Implementation of Cross Transformer for spatially-aware few-shot transfer, in Pytorch

Cross Transformers - Pytorch (wip) Implementation of Cross Transformer for spatially-aware few-shot transfer, in Pytorch Install $ pip install cross-t

Phil Wang 40 Dec 22, 2022
Space Ship Simulator using python

FlyOver Basic space-ship simulator using python How to run? Just double click run.py What modules do i need? All modules that i currently using is bui

0 Oct 09, 2022
Fuzzing JavaScript Engines with Aspect-preserving Mutation

DIE Repository for "Fuzzing JavaScript Engines with Aspect-preserving Mutation" (in S&P'20). You can check the paper for technical details. Environmen

gts3.org (<a href=[email protected])"> 190 Dec 11, 2022
An Open Source Machine Learning Framework for Everyone

Documentation TensorFlow is an end-to-end open source platform for machine learning. It has a comprehensive, flexible ecosystem of tools, libraries, a

170.1k Jan 04, 2023
This is an easy python software which allows to sort images with faces by gender and after by age.

Gender-age Classifier This is an easy python software which allows to sort images with faces by gender and after by age. Usage First install Deepface

Claudio Ciccarone 6 Sep 17, 2022
PyTorch implementation of Deformable Convolution

Deformable Convolutional Networks in PyTorch This repo is an implementation of Deformable Convolution. Ported from author's MXNet implementation. Buil

411 Dec 16, 2022
[ICCV 2021] Official Tensorflow Implementation for "Single Image Defocus Deblurring Using Kernel-Sharing Parallel Atrous Convolutions"

KPAC: Kernel-Sharing Parallel Atrous Convolutional block This repository contains the official Tensorflow implementation of the following paper: Singl

Hyeongseok Son 50 Dec 29, 2022