InfoBERT: Improving Robustness of Language Models from An Information Theoretic Perspective

Last update: Nov 25, 2022

Overview

InfoBERT: Improving Robustness of Language Models from An Information Theoretic Perspective

This is the official code base for our ICLR 2021 paper:

"InfoBERT: Improving Robustness of Language Models from An Information Theoretic Perspective".

Boxin Wang, Shuohang Wang, Yu Cheng, Zhe Gan, Ruoxi Jia, Bo Li, Jingjing Liu

Usage

Prepare your environment

Download required packages

pip install -r requirements.txt

ANLI and TextFooler

To run ANLI and TextFooler experiments, refer to README in the ANLI directory.

SQuAD

We will upload the code for the SQuAD experiments soon.

Citation

@inproceedings{
wang2021infobert,
title={InfoBERT: Improving Robustness of Language Models from An Information Theoretic Perspective},
author={Wang, Boxin and Wang, Shuohang and Cheng, Yu and Gan, Zhe and Jia, Ruoxi and Li, Bo and Liu, Jingjing},
booktitle={International Conference on Learning Representations},
year={2021}}

Owner

AI Secure

UIUC Secure Learning Lab

GitHub Repository

Checking spelling of form elements

Checking spelling of form elements. You can check the source files of external workflows/reports and configuration files

15 Sep 12, 2022

This is a Python binding to the tokenizer Ucto. Tokenisation is one of the first step in almost any Natural Language Processing task, yet it is not always as trivial a task as it appears to be. This binding makes the power of the ucto tokeniser available to Python. Ucto itself is regular-expression based, extensible, and advanced tokeniser written in C++ (http://ilk.uvt.nl/ucto).

Ucto for Python This is a Python binding to the tokeniser Ucto. Tokenisation is one of the first step in almost any Natural Language Processing task,

27 Dec 14, 2022

In this repository, I have developed an end to end Automatic speech recognition project. I have developed the neural network model for automatic speech recognition with PyTorch and used MLflow to manage the ML lifecycle, including experimentation, reproducibility, deployment, and a central model registry.

End to End Automatic Speech Recognition In this repository, I have developed an end to end Automatic speech recognition project. I have developed the

22 Nov 13, 2022

Official PyTorch implementation of SegFormer

SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers Figure 1: Performance of SegFormer-B0 to SegFormer-B5. Project page

1.4k Dec 29, 2022

Sentiment-Analysis and EDA on the IMDB Movie Review Dataset

Sentiment-Analysis and EDA on the IMDB Movie Review Dataset The main part of the work focuses on the exploration and study of different approaches whi

1 Jan 12, 2022

🚀Clone a voice in 5 seconds to generate arbitrary speech in real-time

English | 中文 Features 🌍 Chinese supported mandarin and tested with multiple datasets: aidatatang_200zh, magicdata, aishell3, data_aishell, and etc. ?

25.6k Dec 31, 2022

An Open-Source Package for Neural Relation Extraction (NRE)

OpenNRE We have a DEMO website (http://opennre.thunlp.ai/). Try it out! OpenNRE is an open-source and extensible toolkit that provides a unified frame

3.9k Jan 03, 2023

A curated list of efficient attention modules

awesome-fast-attention A curated list of efficient attention modules

891 Dec 22, 2022

This is the Alpha of Nutte language, she is not complete yet / Essa é a Alpha da Nutte language, não está completa ainda

nutte-language This is the Alpha of Nutte language, it is not complete yet / Essa é a Alpha da Nutte language, não está completa ainda My language was

2 Dec 18, 2021

Diaformer: Automatic Diagnosis via Symptoms Sequence Generation

Diaformer Diaformer: Automatic Diagnosis via Symptoms Sequence Generation (AAAI 2022) Diaformer is an efficient model for automatic diagnosis via symp

20 Dec 13, 2022

ACL'22: Structured Pruning Learns Compact and Accurate Models

☕ CoFiPruning: Structured Pruning Learns Compact and Accurate Models This repository contains the code and pruned models for our ACL'22 paper Structur

130 Jan 04, 2023

ConferencingSpeech2022; Non-intrusive Objective Speech Quality Assessment (NISQA) Challenge

ConferencingSpeech 2022 challenge This repository contains the datasets list and scripts required for the ConferencingSpeech 2022 challenge. For more

21 Dec 02, 2022

Opal-lang - A WIP programming language based on Python

thanks to aphitorite for the beautiful logo! opal opal is a WIP transcompiled pr

3 Nov 04, 2022

Automatic privilege escalation for misconfigured capabilities, sudo and suid binaries

GTFONow Automatic privilege escalation for misconfigured capabilities, sudo and suid binaries. Features Automatically escalate privileges using miscon

101 Jan 03, 2023

Snips Python library to extract meaning from text

Snips NLU Snips NLU (Natural Language Understanding) is a Python library that allows to extract structured information from sentences written in natur

3.7k Dec 30, 2022

NVDA, the free and open source Screen Reader for Microsoft Windows

NVDA NVDA (NonVisual Desktop Access) is a free, open source screen reader for Microsoft Windows. It is developed by NV Access in collaboration with a

1.6k Jan 07, 2023

Mkdocs + material + cool stuff

Modern-Python-Doc-Example mkdocs + material + cool stuff Doc is live here Features out of the box amazing good looking website thanks to mkdocs.org an

61 Oct 26, 2022

Unlimited Call - Text Bombing Tool

FastBomber Unlimited Call - Text Bombing Tool Installation On Termux

6 Nov 10, 2022

🦆 Contextually-keyed word vectors

sense2vec: Contextually-keyed word vectors sense2vec (Trask et. al, 2015) is a nice twist on word2vec that lets you learn more interesting and detaile

1.5k Dec 25, 2022

TextFlint is a multilingual robustness evaluation platform for natural language processing tasks,

TextFlint is a multilingual robustness evaluation platform for natural language processing tasks, which unifies general text transformation, task-specific transformation, adversarial attack, sub-popu

587 Dec 20, 2022

InfoBERT: Improving Robustness of Language Models from An Information Theoretic Perspective

Related tags

Overview

InfoBERT: Improving Robustness of Language Models from An Information Theoretic Perspective

Usage

Prepare your environment

ANLI and TextFooler

SQuAD

Citation

Owner

AI Secure

Checking spelling of form elements

Official PyTorch implementation of SegFormer

Sentiment-Analysis and EDA on the IMDB Movie Review Dataset

🚀Clone a voice in 5 seconds to generate arbitrary speech in real-time

An Open-Source Package for Neural Relation Extraction (NRE)

A curated list of efficient attention modules

This is the Alpha of Nutte language, she is not complete yet / Essa é a Alpha da Nutte language, não está completa ainda

Diaformer: Automatic Diagnosis via Symptoms Sequence Generation

ACL'22: Structured Pruning Learns Compact and Accurate Models

ConferencingSpeech2022; Non-intrusive Objective Speech Quality Assessment (NISQA) Challenge

Opal-lang - A WIP programming language based on Python

Automatic privilege escalation for misconfigured capabilities, sudo and suid binaries

Snips Python library to extract meaning from text

NVDA, the free and open source Screen Reader for Microsoft Windows

Mkdocs + material + cool stuff

Unlimited Call - Text Bombing Tool

🦆 Contextually-keyed word vectors

TextFlint is a multilingual robustness evaluation platform for natural language processing tasks,