Tracking Progress in Natural Language Processing

Table of contents

English

Vietnamese

Hindi

Chinese

For more tasks, datasets and results in Chinese, check out the Chinese NLP website.

French

Russian

Spanish

Portuguese

Korean

Nepali

Bengali

Persian

Turkish

German

This document aims to track the progress in Natural Language Processing (NLP) and give an overview of the state-of-the-art (SOTA) across the most common NLP tasks and their corresponding datasets.

It aims to cover both traditional and core NLP tasks such as dependency parsing and part-of-speech tagging as well as more recent ones such as reading comprehension and natural language inference. The main objective is to provide the reader with a quick overview of benchmark datasets and the state-of-the-art for their task of interest, which serves as a stepping stone for further research. To this end, if there is a place where results for a task are already published and regularly maintained, such as a public leaderboard, the reader will be pointed there.

If you want to find this document again in the future, just go to nlpprogress.com or nlpsota.com in your browser.

Contributing

Guidelines

Results   Results reported in published papers are preferred; an exception may be made for influential preprints.

Datasets   Datasets should have been used for evaluation in at least one published paper besides the one that introduced the dataset.

Code   We recommend adding a link to an implementation if one is available. You can add a Code column (see below) to the table if it does not exist yet. In the Code column, indicate an official implementation with Official. If an unofficial implementation is available, use Link (see below). If no implementation is available, you can leave the cell empty.

Adding a new result

If you would like to add a new result, you can just click on the small edit button in the top-right corner of the file for the respective task (see below).

(Screenshot: click on the edit button to edit the file)

This allows you to edit the file in Markdown. Simply add a row to the corresponding table in the same format. Make sure that the table stays sorted, with the best result on top (a quick local check is sketched at the end of this subsection). After you've made your change, check that the table still looks OK by clicking on the "Preview changes" tab at the top of the page. If everything looks good, go to the bottom of the page, where you will see the form shown below.

(Screenshot: filling out the file change information)

Add a name for your proposed change, an optional description, indicate that you would like to "Create a new branch for this commit and start a pull request", and click on "Propose file change".
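
Since tables must stay sorted with the best result on top, a minimal local sanity check is sketched here; the file path and the assumption that the first numeric cell in each row is the score are illustrative, not part of the repository's tooling:

```python
# Minimal sketch: check that a results table stays sorted, best score
# first. The file path and the assumption that the first numeric cell
# of a row is the score are illustrative.
import re

def table_is_sorted(markdown: str) -> bool:
    """True if the first numeric cell of every table row is descending."""
    scores = []
    for line in markdown.splitlines():
        stripped = line.strip()
        if not stripped.startswith("|") or set(stripped) <= set("|-: "):
            continue  # skip non-table lines and separator rows
        for cell in (c.strip() for c in stripped.strip("|").split("|")):
            if re.fullmatch(r"\d+(?:\.\d+)?", cell):
                scores.append(float(cell))
                break  # first numeric cell is treated as the score
    return all(a >= b for a, b in zip(scores, scores[1:]))

with open("english/constituency_parsing.md") as f:  # hypothetical path
    print(table_is_sorted(f.read()))
```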

Adding a new dataset or task

For adding a new dataset or task, you can also follow the steps above. Alternatively, you can fork the repository. In both cases, follow the steps below:

  1. If your task is completely new, create a new file and link to it in the table of contents above.
  2. If not, add your task or dataset to the respective section of the corresponding file (in alphabetical order).
  3. Briefly describe the dataset/task and include relevant references.
  4. Describe the evaluation setting and evaluation metric.
  5. Show what an annotated example of the dataset/task looks like.
  6. Add a download link if available.
  7. Copy the table below and fill in at least two results (including the state-of-the-art) for your dataset/task; change Score to the metric of your dataset. If your dataset/task has multiple metrics, add them to the right of Score.
  8. Submit your change as a pull request.
| Model | Score | Paper / Source | Code |
| ----- | ----- | -------------- | ---- |

Wish list

These are tasks and datasets that are still missing:

  • Bilingual dictionary induction
  • Discourse parsing
  • Keyphrase extraction
  • Knowledge base population (KBP)
  • More dialogue tasks
  • Semi-supervised learning
  • Frame-semantic parsing (FrameNet full-sentence analysis)

Exporting into a structured format

You can extract all the data into a structured, machine-readable JSON format with parsed tasks, descriptions and SOTA tables.

The instructions are in structured/README.md.
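
As a quick illustration, consuming that export might look like the sketch below; the output file name and field names here are assumptions, so defer to structured/README.md for the actual format:

```python
# Minimal sketch of reading the structured export. The file name
# ("structured.json") and the field names ("task", "datasets") are
# assumptions for illustration; see structured/README.md for the
# actual export format.
import json

with open("structured.json") as f:
    tasks = json.load(f)

for task in tasks:
    print(f"{task.get('task', '?')}: {len(task.get('datasets', []))} dataset(s)")
```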

Instructions for building the site locally

Instructions for building the website locally using Jekyll can be found here.

Comments
  • CoNLL-2003 non-comparable results

    Because of the small size of the CoNLL-2003 training set, some authors incorporate the development set into the training data after tuning their hyper-parameters. Consequently, not all results are directly comparable.

    Train+dev:

    • Flair embeddings (Akbik et al., 2018)
    • Peters et al. (2017)
    • Yang et al. (2017)

    Maybe those results should be marked with an asterisk.

    opened by ghaddarAbs 28
  • NLP Progress Graph

    Hi Sebastian, I loved your idea for this repo. I was thinking we could have a graph, something like this (example image omitted), showing the progress of different tasks in NLP based on the updates to their Markdown files. I have created a shell script which clones your repo locally, counts the number of commits for each file, preprocesses the result with Python/pandas, creates a bar chart from it, and uploads it to a free image-hosting service.
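
    A rough sketch of the commit-counting step (assuming a local clone; the repository path and the restriction to Markdown files are illustrative):

    ```python
    # Rough sketch of the commit-counting idea. Assumes the repository
    # has been cloned locally; the path and the .md filter are
    # illustrative choices, not the actual script.
    import subprocess
    from collections import Counter

    def commits_per_file(repo_path: str) -> Counter:
        """Count how many commits touched each Markdown file."""
        log = subprocess.run(
            ["git", "-C", repo_path, "log", "--name-only", "--pretty=format:"],
            capture_output=True, text=True, check=True,
        ).stdout
        return Counter(line for line in log.splitlines()
                       if line.endswith(".md"))

    for path, n in commits_per_file("NLP-progress").most_common(10):
        print(f"{n:4d}  {path}")
    ```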

    Currently, it shows the count of all commits for a specific file, but if we had a guideline for distinguishing commits that add new results from ones that fix errors (maybe via different identifiers), then we could count the number of times a new result has been added for an NLP task. This would help visualize the most active and fastest-improving areas of NLP research.

    Currently, the graph doesn't make much sense, but over time it will improve as we update it with more results.

    Also, if you think something like this can benefit the community, I can create a cron job on my PC (I don't have a server) which will update the image URL with the latest graph, which you can show on the main page.

    opened by nirmalsinghania2008 16
  • YAML - pros and cons

    I'd like to discuss here the pros and cons of using YAML going forward or whether we should stick with Markdown tables. Here are some pros and cons, mainly from @NirantK (in https://github.com/sebastianruder/NLP-progress/pull/116), @stared (in https://github.com/sebastianruder/NLP-progress/issues/43, https://github.com/sebastianruder/NLP-progress/pull/64) and myself.

    Pros:

    • Easier trend spotting in performance improvements
    • Easy to create plots and visualizations going forward
    • Data is separated from presentation

    Cons:

    • Hard for contributors: e.g. HTML omissions can't be spotted without setting up Jekyll locally
    • The GitHub repo becomes useless for readers, who would rely exclusively on nlpprogress.com
    • Many visualizations (e.g. bar charts) based on performance numbers are no more useful than the raw tables

    Other opinions are welcome.

    opened by sebastianruder 10
  • What about other languages?

    Thanks for this work!

    These pages seem to cover the progress only for English (well, except MT). Do you have plans to include other languages?

    One extreme example is POS tagging and dependency parsing: UD (Universal Dependencies) covers 60+ languages :) For other tasks, there should be very limited data.

    opened by Hrant-Khachatrian 10
  • Incorrect BLEU score for English-Hindi MT System

    The BLEU score written in the document is 89.35, which looks wrong to me. The referenced paper reports a BLEU score of 12.83, which itself is not state-of-the-art for this language pair.

    opened by kartikeypant 7
  • add G2P conversion task of schwa deletion to Hindi

    There's been a good body of previous work on schwa deletion in NLP/CL; you can see some of it in our paper. It would be good to keep track of the SOTA on it, since it's an important task for grapheme-to-phoneme (G2P) conversion in North Indian languages.

    opened by aryamanarora 6
  • Added new task: data-to-text generation

    I have added a new task: Data-to-Text Natural Language Generation (D2T NLG). D2T NLG differs from other NLG tasks such as MT or QA in that the input to the text generation system is a structured representation (a table, knowledge graph, or JSON) instead of unstructured text. The document provides an overview of the three most recent and popular publicly available datasets for D2T NLG. With the advancements in deep learning, several novel neural methods have been proposed that are capable of generating accurate, fluent and diverse texts.

    opened by ashishu007 6
  • Explain relation to paperswithcode.com

    Since the inception of this great repository of state-of-the-art results, alternatives such as paperswithcode.com have gained traction. This raises the question of the usefulness of keeping both resources up to date with the latest results. Could users and maintainers of this repository perhaps elaborate a bit, here and/or in the README, on how they see this resource relating to paperswithcode.com, and particularly on what nlpprogress.com does well that the former does not?

    opened by cwenner 6
  • add TCAN results to LM

    To be honest, I'm a bit skeptical about their results and have asked them some questions via email. So let's put a hold on this pull request for now (unless the maintainers think it's fine), and I will update it once they have answered my questions.

    opened by Separius 6
  • Add missing LM SOTA result + # params + prev SOTA

    Add the missing LM ensemble which is SOTA for PTB. Add the second-in-line LM SOTA under a strict interpretation. Add the number of parameters for LM results.

    (unsure why it lists commits that have already been merged)

    opened by cwenner 6
  • Data in YAML for structure and plots

    Related to #43.

    For now, I did a demo for CCG. I didn't work on the plot layout; I just wanted to show that it is possible and easy. Also, I think the data format can be standardized, so it would be simpler to add more complicated things (e.g. further comments, links to multiple implementations, etc.).

    See files in:

    • _data - data in YAML format
    • _includes - for ways of converting data into its presentations (tables, charts, etc)
    • ccg_supertagging.md to see how to include these

    IMHO YAML is cleaner to write and read than Markdown tables, so that is an advantage on its own. From my experience, contributors (the ones who use GitHub) have no problem at all using YAML (see https://p.migdal.pl/interactive-machine-learning-list/).

    Right now I render the data through a Liquid template.
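
    Outside of Jekyll/Liquid, the same rendering is easy to sketch in Python; the file path and field names below are assumptions based on this demo, not the actual schema:

    ```python
    # Sketch: render a YAML results file into a Markdown table, mirroring
    # what the Liquid template does. The path and the field names
    # ("model", "accuracy", "paper") are assumptions for illustration.
    import yaml  # PyYAML

    with open("_data/ccg_supertagging.yaml") as f:
        results = yaml.safe_load(f)

    print("| Model | Accuracy | Paper / Source |")
    print("| ----- | -------- | -------------- |")
    for r in sorted(results, key=lambda r: r["accuracy"], reverse=True):
        print(f"| {r['model']} | {r['accuracy']} | {r['paper']} |")
    ```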

    opened by stared 6
  • Pull request with new emotion detection dataset

    There seem to be some conflicts; I am not resolving them myself as that might remove some code. Could you kindly resolve them and merge my request?

    opened by KhondokerIslam 0
  • Update paraphrase-generation.md

    MULTIPIT, MULTIPIT_CROWD and MULTIPIT_EXPERT

    Past efforts on creating paraphrase corpora consider only a single paraphrase criterion, without taking into account the fact that the desired "strictness" of semantic equivalence in paraphrases varies from task to task (Bhagat and Hovy, 2013; Liu and Soh, 2022). For example, for the purpose of tracking unfolding events, "A tsunami hit Haiti." and "303 people died because of the tsunami in Haiti" are sufficiently close to be considered paraphrases, whereas for paraphrase generation the extra information "303 people dead" in the latter sentence may lead models to learn to hallucinate and generate more unfaithful content. In this paper, the authors present an effective data collection and annotation method to address these issues.

    MULTIPIT is a multi-topic Paraphrase in Twitter corpus that consists of a total of 130k sentence pairs with crowdsourced (MULTIPIT_CROWD) and expert (MULTIPIT_EXPERT) annotations. MULTIPIT_CROWD is a large crowdsourced set of 125K sentence pairs that is useful for tracking information on Twitter.

    | Model | F1 | Paper / Source | Code |
    | ------------- | :-----: | --- | --- |
    | DeBERTaV3-large | 92.00 | Improving Large-scale Paraphrase Acquisition and Generation | Unavailable |

    MULTIPIT_EXPERT is an expert-annotated set of 5.5K sentence pairs using a stricter definition that is more suitable for acquiring paraphrases for generation purposes.

    | Model | F1 | Paper / Source | Code |
    | ------------- | :-----: | --- | --- |
    | DeBERTaV3-large | 83.20 | Improving Large-scale Paraphrase Acquisition and Generation | Unavailable |

    opened by adrienpayong 0
  • Add this to machine translation. Is it okay?

    opened by adrienpayong 0