Official code of our work, Unified Pre-training for Program Understanding and Generation [NAACL 2021].

Last update: Dec 30, 2022

Overview

PLBART

Code pre-release of our work, Unified Pre-training for Program Understanding and Generation, accepted at NAACL 2021.

Note: Detailed documentation is coming soon.

Pre-training data

PLBART is pre-trained on Java and Python functions and natural language descriptions collected from GitHub and StackOverflow.

Evaluation tasks

We evaluated PLBART on five tasks.

Code summarization [REF]
Code generation [REF]
Code translation [REF]
Clone detection [REF]
Vulnerability detection [REF]

Notes

We will publish the pretrained PLBART checkpoint soon.
We list all the files in this repository here.

Acknowledgement

PLBART uses Fairseq, CodeXGLUE, and TransCoder; we thank the authors of these works for their contributions.

Citation

@inproceedings{ahmad2020summarization,
    author = {Ahmad, Wasi Uddin and Chakraborty, Saikat and Ray, Baishakhi and Chang, Kai-Wei},
    booktitle = {Proceedings of the 2021 Conference of the North {A}merican Chapter of the Association for Computational Linguistics},
    title = {Unified Pre-training for Program Understanding and Generation},
    year = {2021}
}

Owner

Wasi Ahmad

A collection of scripts to preprocess ASR datasets and finetune language-specific Wav2Vec2 XLSR models

This program translates English words to Portuguese

This repository contains the code for "Exploiting Cloze Questions for Few-Shot Text Classification and Natural Language Inference"

Fine-tune GPT-3 with a Google Chat conversation history

Trex is a tool to match semantically similar functions based on transfer learning.

Code for Editing Factual Knowledge in Language Models

Data preprocessing rosetta parser for python

A PyTorch implementation of paper "Learning Shared Semantic Space for Speech-to-Text Translation", ACL (Findings) 2021

Top2Vec is an algorithm for topic modeling and semantic search.

File-based TF-IDF: Calculates keywords in a document, using a word corpus.

Snips Python library to extract meaning from text

Trains an OpenNMT PyTorch model and SentencePiece tokenizer.

Visual Automata is a Python 3 library built as a wrapper for Caleb Evans' Automata library to add more visualization features.

This is Assignment1 code for the Web Data Processing System.

[ICLR 2021 Spotlight] PyTorch implementation for "Long-tailed Recognition by Routing Diverse Distribution-Aware Experts."

Composed Image Retrieval using Pretrained LANguage Transformers (CIRPLANT)

Easy to start. Uses a deep neural network to predict the sentiment of movie reviews.

⛵️The official PyTorch implementation for "BERT-of-Theseus: Compressing BERT by Progressive Module Replacing" (EMNLP 2020).

A BERT-based reverse dictionary of Korean proverbs

Turn clang-tidy warnings and fixes to comments in your pull request