BEAMetrics: Benchmark to Evaluate Automatic Metrics in Natural Language Generation

Overview

BEAMetrics: Benchmark to Evaluate Automatic Metrics in Natural Language Generation

Installing The Dependencies

$ conda create --name beametrics python>=3.8
$ conda activate beametrics

WARNING: You need to install, before any package, correct version of pytorch linked to your cuda version.

(beametrics) $ conda install pytorch cudatoolkit=10.1 -c pytorch

Install BEAMetrics:

(beametrics) $ cd BEAMetrics
(beametrics) $ pip install -e .

Install Nubia metric (not on PyPI, 16/08/2021):

(beametrics) git clone https://github.com/wl-research/nubia.git
(beametrics) pip install -r requirements.txt

Alternatively, you can remove nubia from _DEFAULT_METRIC_NAMES in metrics.metric_reporter.

Reproducing the results

First you need to get the processed files, which include the metric scores. You can do that either by simply downloading the processed data (see Section Download Data), or by re-computing the scores (see Section Compute Correlations).

Then, the first bloc in the notebook visualize.ipynb allows to get all the tables from the paper (and also to generate the latex code in data/correlation).

Download the data

All the dataset can be downloaded from this zip file. It needs to be unzipped into the path data before running the correlations.

unzip data.zip

The data folder contains:

  • a subfolder raw containing all the original dataset
  • a subfolder processed containing all the dataset processed in a unified format
  • a subfolder correlation containing all the final correlation results, and the main tables of the paper
  • a subfolder datacards containing all the data cards

Computing the correlations

Processing the files to a clean json with the metrics computed:

python beametrics/run_all.py

The optional argument --dataset allows to compute only on a specific dataset, e.g.:

python run_all.py --dataset SummarizationCNNDM.

The list of the datasets and their corresponding configuration can be found in configs/__init__.

When finished, you can print the final table as in the paper, see the notebook visualize.ipynb.

Data Cards:

For each dataset, a data card is available in the datacard folder. The cards are automatically generated when running run_all.py, by filling the template with the dataset configuration as detailed bellow, in Adding a new dataset.

Adding a new dataset:

In configs/, you need to create a new .py file that inherites from ConfigBase (in configs/co'nfig_base.py). You are expected to fill the mandatory fields that allow to run the code and fill the data card template:

  • file_name: the file name located in data/raw
  • file_name_processed: the file name once processed and formated
  • metric_names: you can pass _DEFAULT_METRIC_NAMES by default or customize it, e.g. metric_names = metric_names + ('sari',) where sari corresponds to a valid metric (see the next section)
  • name_dataset: the name of the dataset as it was published
  • short_name_dataset: few letters that will be used to name the dataset in the final table report
  • languages: the languages of the dataset (e.g. [en] or [en, fr])
  • task: e.g. 'simplification', 'data2text
  • number_examples: the total number of evaluated texts
  • nb_refs: the number of references available in the dataset
  • dimensions_definitions: the evaluated dimensions and their corresponding definition e.g. {'fluency: 'How fluent is the text?'}
  • scale: the scale used during the evaluation, as defined in the protocol
  • source_eval_sets: the dataset from which the source were collected to generate the evaluated examples
  • annotators: some information about who were the annotators
  • sampled_from: the URL where was released the evaluation dataset
  • citation: the citation of the paper where the dataset was released

Your class needs its custom method format_file. The function takes as input the dataset's file_name and return a dictionary d_data. The format for d_data has to be the same for all the datasets:

d_data = {
    key_1: {
        'source': "a_source", 
        'hypothesis': "an_hypothesis",
        'references': ["ref_1", "ref_2", ...],
        'dim_1': float(a_score),
        'dim_2': float(an_other_score),
    },
    ...
    key_n: {
        ...
    }
}

where 'key_1' and 'key_n' are the keys for the first and nth example, dim_1 and dim_2 dimensions corresponding to self.dimensions.

Finally, you need to add your dataset to the dictionary D_ALL_DATASETS located in config/__init__.

Adding a new metric:

First, create a class inheriting from metrics/metrics/MetricBase. Then, simply add it to the dictionary _D_METRICS in metrics/__init__.

For the metric to be computed by default, its name has to be added to either

  • _DEFAULT_METRIC_NAMES: metrics computed on each dataset
  • _DEFAULT_METRIC_NAMES_SRC: metrics computed on dataset that have a text format for their source (are excluded for now image captioning and data2text). These two tuples are located in metrics/metric_reported.

Alternatively, you can add the metric to a specific configuration by adding it to the attribute metric_names in the config.

Implementation of the Paper: "Parameterized Hypercomplex Graph Neural Networks for Graph Classification" by Tuan Le, Marco Bertolini, Frank Noé and Djork-Arné Clevert

Parameterized Hypercomplex Graph Neural Networks (PHC-GNNs) PHC-GNNs (Le et al., 2021): https://arxiv.org/abs/2103.16584 PHM Linear Layer Illustration

Bayer AG 26 Aug 11, 2022
VGGVox models for Speaker Identification and Verification trained on the VoxCeleb (1 & 2) datasets

VGGVox models for speaker identification and verification This directory contains code to import and evaluate the speaker identification and verificat

338 Dec 27, 2022
YOLOX-RMPOLY

本算法为适应robomaster比赛,而改动自矩形识别的yolox算法。 基于旷视科技YOLOX,实现对不规则四边形的目标检测 TODO 修改onnx推理模型 更改/添加标注: 1.yolox/models/yolox_polyhead.py: 1.1继承yolox/models/yolo_

3 Feb 25, 2022
Cascaded Pyramid Network (CPN) based on Keras (Tensorflow backend)

ML2 Takehome Project Reimplementing the paper: Cascaded Pyramid Network for Multi-Person Pose Estimation Dataset The model uses the COCO dataset which

Vo Van Tu 1 Nov 22, 2021
STMTrack: Template-free Visual Tracking with Space-time Memory Networks

STMTrack This is the official implementation of the paper: STMTrack: Template-free Visual Tracking with Space-time Memory Networks. Setup Prepare Anac

Zhihong Fu 62 Dec 21, 2022
Run containerized, rootless applications with podman

Why? restrict scope of file system access run any application without root privileges creates usable "Desktop applications" to integrate into your nor

119 Dec 27, 2022
Real-Time Semantic Segmentation in Mobile device

Real-Time Semantic Segmentation in Mobile device This project is an example project of semantic segmentation for mobile real-time app. The architectur

708 Jan 01, 2023
Reinforcement learning library(framework) designed for PyTorch, implements DQN, DDPG, A2C, PPO, SAC, MADDPG, A3C, APEX, IMPALA ...

Automatic, Readable, Reusable, Extendable Machin is a reinforcement library designed for pytorch. Build status Platform Status Linux Windows Supported

Iffi 348 Dec 24, 2022
Rotation-Only Bundle Adjustment

ROBA: Rotation-Only Bundle Adjustment Paper, Video, Poster, Presentation, Supplementary Material In this repository, we provide the implementation of

Seong 51 Nov 29, 2022
FedML: A Research Library and Benchmark for Federated Machine Learning

FedML: A Research Library and Benchmark for Federated Machine Learning 📄 https://arxiv.org/abs/2007.13518 News 2021-02-01 (Award): #NeurIPS 2020# Fed

FedML-AI 2.3k Jan 08, 2023
Automatic library of congress classification, using word embeddings from book titles and synopses.

Automatic Library of Congress Classification The Library of Congress Classification (LCC) is a comprehensive classification system that was first deve

Ahmad Pourihosseini 3 Oct 01, 2022
Original Implementation of Prompt Tuning from Lester, et al, 2021

Prompt Tuning This is the code to reproduce the experiments from the EMNLP 2021 paper "The Power of Scale for Parameter-Efficient Prompt Tuning" (Lest

Google Research 282 Dec 28, 2022
OneFlow is a performance-centered and open-source deep learning framework.

OneFlow OneFlow is a performance-centered and open-source deep learning framework. Latest News Version 0.5.0 is out! First class support for eager exe

OneFlow 4.2k Jan 07, 2023
Invertible conditional GANs for image editing

Invertible Conditional GANs This is the implementation of the IcGAN model proposed in our paper: Invertible Conditional GANs for image editing. Novemb

Guim 278 Dec 12, 2022
Fast and simple implementation of RL algorithms, designed to run fully on GPU.

RSL RL Fast and simple implementation of RL algorithms, designed to run fully on GPU. This code is an evolution of rl-pytorch provided with NVIDIA's I

Robotic Systems Lab - Legged Robotics at ETH Zürich 68 Dec 29, 2022
A tensorflow model that predicts if the image is of a cat or of a dog.

Quick intro Hello and thank you for your interest in my project! This is the backend part of a two-repo application. The other part can be found here

Tudor Matei 0 Mar 08, 2022
i-RevNet Pytorch Code

i-RevNet: Deep Invertible Networks Pytorch implementation of i-RevNets. i-RevNets define a family of fully invertible deep networks, built from a succ

Jörn Jacobsen 378 Dec 06, 2022
This is the source code for the experiments related to the paper Unsupervised Audio Source Separation Using Differentiable Parametric Source Models

Unsupervised Audio Source Separation Using Differentiable Parametric Source Models This is the source code for the experiments related to the paper Un

30 Oct 19, 2022
Adversarial Graph Augmentation to Improve Graph Contrastive Learning

ADGCL : Adversarial Graph Augmentation to Improve Graph Contrastive Learning Introduction This repo contains the Pytorch [1] implementation of Adversa

susheel suresh 62 Nov 19, 2022
Generalized and Efficient Blackbox Optimization System.

OpenBox Doc | OpenBox中文文档 OpenBox: Generalized and Efficient Blackbox Optimization System OpenBox is an efficient and generalized blackbox optimizatio

DAIR Lab 238 Dec 29, 2022