NL-Augmenter 🦎 → 🐍 A Collaborative Repository of Natural Language Transformations

Overview

Checks Forks Issues Pull requests Contributors License

NL-Augmenter 🦎 🐍

The NL-Augmenter is a collaborative effort intended to add transformations of datasets dealing with natural language. Transformations augment text datasets in diverse ways, including: randomizing names and numbers, changing style/syntax, paraphrasing, KB-based paraphrasing ... and whatever creative augmentation you contribute. We invite submissions of transformations to this framework by way of GitHub pull request, through August 31, 2021. All submitters of accepted transformations (and filters) will be included as co-authors on a paper announcing this framework.

The framework organizers can be contacted at [email protected].

Submission timeline

Due date Description
A̶u̶g̶u̶s̶t̶ 3̶1̶, 2̶0̶2̶1̶ P̶u̶l̶l̶ r̶e̶q̶u̶e̶s̶t̶ m̶u̶s̶t̶ b̶e̶ o̶p̶e̶n̶e̶d̶ t̶o̶ b̶e̶ e̶l̶i̶g̶i̶b̶l̶e̶ f̶o̶r̶ i̶n̶c̶l̶u̶s̶i̶o̶n̶ i̶n̶ t̶h̶e̶ f̶r̶a̶m̶e̶w̶o̶r̶k̶ a̶n̶d̶ a̶s̶s̶o̶c̶i̶a̶t̶e̶d̶ p̶a̶p̶e̶r̶
September 2̶2̶, 30 2021 Review process for pull request above must be complete

A transformation can be revised between the pull request submission and pull request merge deadlines. We will provide reviewer feedback to help with the revisions.

The transformations which are already accepted to NL-Augmenter are summarized in the transformations folder. Transformations undergoing review can be seen as pull requests.

Table of contents

Colab notebook

Open In Colab To quickly see transformations and filters in action, run through our colab notebook.

Some Ideas for Transformations

If you need inspiration for what transformations to implement, check out https://github.com/GEM-benchmark/NL-Augmenter/issues/75, where some ideas and previous papers are discussed. So far, contributions have focused on morphological inflections, character level changes, and random noise. The best new pull requests will be dissimilar from these existing contributions.

Installation

Requirements

  • Python 3.7

Instructions

# When creating a new transformation, replace this with your forked repository (see below)
git clone https://github.com/GEM-benchmark/NL-Augmenter.git
cd NL-Augmenter
python setup.py sdist
pip install -e .
pip install https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.0.0/en_core_web_sm-3.0.0.tar.gz

How do I create a transformation?

Setup

First, fork the repository in GitHub! 🍴

fork button

Your fork will have its own location, which we will call PATH_TO_YOUR_FORK. Next, clone the forked repository and create a branch for your transformation, which here we will call my_awesome_transformation:

git clone $PATH_TO_YOUR_FORK
cd NL-Augmenter
git checkout -b my_awesome_transformation

We will base our transformation on an existing example. Create a new transformation directory by copying over an existing transformation. You can choose to copy from other transformation directories depending on the task you wish to create a transformation for. Check some of the existing pull requests and merged transformations first to avoid duplicating efforts or creating transformations too similar to previous ones.

cd transformations/
cp -r butter_fingers_perturbation my_awesome_transformation
cd my_awesome_transformation

Creating a transformation

  1. In the file transformation.py, rename the class ButterFingersPerturbation to MyAwesomeTransformation and choose one of the interfaces from the interfaces/ folder. See the full list of options here.
  2. Now put all your creativity in implementing the generate method. If you intend to use external libraries, add them with their version numbers in requirements.txt
  3. Update my_awesome_transformation/README.md to describe your transformation.

Testing and evaluating (Optional)

Once you are done, add at least 5 example pairs as test cases in the file test.json so that no one breaks your code inadvertently.

Once the transformation is ready, test it:

pytest -s --t=my_awesome_transformation

If you would like to evaluate your transformation against a common 🤗 HuggingFace model, we encourage you to check evaluation

Code Styling To standardized the code we use the black code formatter which will run at the time of pre-commit. To use the pre-commit hook, install pre-commit with pip install pre-commit (should already be installed if you followed the above instructions). Then run pre-commit install to install the hook. On future commits, you should see the black code formatter is run on all python files you've staged for commit.

Submitting

Once the tests pass and you are happy with the transformation, submit them for review. First, commit and push your changes:

git add transformations/my_awesome_transformation/*
git commit -m "Added my_awesome_transformation"
git push --set-upstream origin my_awesome_transformation

Finally, submit a pull request. The last git push command prints a URL that can be copied into a browser to initiate such a pull request. Alternatively, you can do so from the GitHub website.

pull request button

Congratulations, you've submitted a transformation to NL-Augmenter!

How do I create a filter?

We also accept pull-requests for creating filters which identify interesting subpopulations of a dataset. The process to add a new filter is just the same as above. All filter implementations require implementing .filter instead of .generate and need to be placed in the filters folder. So, just the way transformations can transform examples of text, filters can identify whether an example follows some pattern of text! The only difference is that while transformations return another example of the same input format, filters simply return True or False! For step-by-step instructions, follow these steps.

BIG-Bench 🪑

If you are interested in NL-Augmenter, you may also be interested in the BIG-bench large scale collaborative benchmark for language models.

Most Creative Implementations 🏆

After all pull-requests have been merged, 3 of the most creative implementations would be selected and featured on this README page and on the NL-Augmenter webpage.

License

Some transformations include components released under a different (permissive, open source) license. For license details, refer to the README.md and any license files in the transformations's or filter's directory.

[CoRL 21'] TANDEM: Tracking and Dense Mapping in Real-time using Deep Multi-view Stereo

TANDEM: Tracking and Dense Mapping in Real-time using Deep Multi-view Stereo Lukas Koestler1*    Nan Yang1,2*,†    Niclas Zeller2,3    Daniel Cremers1

TUM Computer Vision Group 744 Jan 04, 2023
Modeling Temporal Concept Receptive Field Dynamically for Untrimmed Video Analysis

Modeling Temporal Concept Receptive Field Dynamically for Untrimmed Video Analysis This is a PyTorch implementation of the model described in our pape

qzhb 6 Jul 08, 2021
Global Pooling, More than Meets the Eye: Position Information is Encoded Channel-Wise in CNNs, ICCV 2021

Global Pooling, More than Meets the Eye: Position Information is Encoded Channel-Wise in CNNs, ICCV 2021 Global Pooling, More than Meets the Eye: Posi

Md Amirul Islam 32 Apr 24, 2022
GNN-based Recommendation Benchmark

GRecX A Fair Benchmark for GNN-based Recommendation Homepage and Documentation Homepage: Documentation: Paper: GRecX: An Efficient and Unified Benchma

73 Oct 17, 2022
Python interface for the DIGIT tactile sensor

DIGIT-INTERFACE Python interface for the DIGIT tactile sensor. For updates and discussions please join the #DIGIT channel at the www.touch-sensing.org

Facebook Research 35 Dec 22, 2022
Python Classes: Medical Insurance Project using Object Oriented Programming Concepts

Medical-Insurance-Project-OOP Python Classes: Medical Insurance Project using Object Oriented Programming Concepts Classes are an incredibly useful pr

Hugo B. 0 Feb 04, 2022
Urban mobility simulations with Python3, RLlib (Deep Reinforcement Learning) and Mesa (Agent-based modeling)

Deep Reinforcement Learning for Smart Cities Documentation RLlib: https://docs.ray.io/en/master/rllib.html Mesa: https://mesa.readthedocs.io/en/stable

1 May 15, 2022
Code to accompany our paper "Continual Learning Through Synaptic Intelligence" ICML 2017

Continual Learning Through Synaptic Intelligence This repository contains code to reproduce the key findings of our path integral approach to prevent

Ganguli Lab 82 Nov 03, 2022
Provide baselines and evaluation metrics of the task: traffic flow prediction

Note: This repo is adpoted from https://github.com/UNIMIBInside/Smart-Mobility-Prediction. Due to technical reasons, I did not fork their code. Introd

Zhangzhi Peng 11 Nov 02, 2022
Official Repo for ICCV2021 Paper: Learning to Regress Bodies from Images using Differentiable Semantic Rendering

[ICCV2021] Learning to Regress Bodies from Images using Differentiable Semantic Rendering Getting Started DSR has been implemented and tested on Ubunt

Sai Kumar Dwivedi 83 Nov 27, 2022
Implementation of Kalman Filter in Python

Kalman Filter in Python This is a basic example of how Kalman filter works in Python. I do plan on refactoring and expanding this repo in the future.

Enoch Kan 35 Sep 11, 2022
Fast RFC3339 compliant Python date-time library

udatetime: Fast RFC3339 compliant date-time library Handling date-times is a painful act because of the sheer endless amount of formats used by people

Simon Pirschel 235 Oct 25, 2022
Build a medical knowledge graph based on Unified Language Medical System (UMLS)

UMLS-Graph Build a medical knowledge graph based on Unified Language Medical System (UMLS) Requisite Install MySQL Server 5.6 and import UMLS data int

Donghua Chen 6 Dec 25, 2022
Official Implementation of "Designing an Encoder for StyleGAN Image Manipulation"

Designing an Encoder for StyleGAN Image Manipulation (SIGGRAPH 2021) Recently, there has been a surge of diverse methods for performing image editing

749 Jan 09, 2023
Simple implementation of OpenAI CLIP model in PyTorch.

It was in January of 2021 that OpenAI announced two new models: DALL-E and CLIP, both multi-modality models connecting texts and images in some way. In this article we are going to implement CLIP mod

Moein Shariatnia 226 Jan 05, 2023
PointRCNN: 3D Object Proposal Generation and Detection from Point Cloud, CVPR 2019.

PointRCNN PointRCNN: 3D Object Proposal Generation and Detection from Point Cloud Code release for the paper PointRCNN:3D Object Proposal Generation a

Shaoshuai Shi 1.5k Dec 27, 2022
A very short and easy implementation of Quantile Regression DQN

Quantile Regression DQN Quantile Regression DQN a Minimal Working Example, Distributional Reinforcement Learning with Quantile Regression (https://arx

Arsenii Senya Ashukha 80 Sep 17, 2022
Gym environment for FLIPIT: The Game of "Stealthy Takeover"

gym-flipit Gym environment for FLIPIT: The Game of "Stealthy Takeover" invented by Marten van Dijk, Ari Juels, Alina Oprea, and Ronald L. Rivest. Desi

Lisa Oakley 2 Dec 15, 2021
Python inverse kinematics for your robot model based on Pinocchio.

Python inverse kinematics for your robot model based on Pinocchio.

Stéphane Caron 50 Dec 22, 2022
The code of paper "Block Modeling-Guided Graph Convolutional Neural Networks".

Block Modeling-Guided Graph Convolutional Neural Networks This repository contains the demo code of the paper: Block Modeling-Guided Graph Convolution

22 Dec 08, 2022