Official code of our work, Unified Pre-training for Program Understanding and Generation [NAACL 2021].

Last update: Dec 30, 2022

Overview

PLBART

Code pre-release of our work, Unified Pre-training for Program Understanding and Generation, accepted at NAACL 2021.

Note: Detailed documentation is coming soon.

Pre-training data

PLBART is pre-trained on Java and Python functions and natural language descriptions collected from GitHub and Stack Overflow.

Evaluation tasks

We evaluated PLBART on five tasks.

Code summarization [REF]
Code generation [REF]
Code translation [REF]
Clone detection [REF]
Vulnerability detection [REF]

Notes

We will publish the pretrained PLBART checkpoint soon.
We list all the files in this repository here.

Acknowledgement

PLBART builds on Fairseq, CodeXGLUE, and TransCoder. We thank the authors of these works for their contributions.

Citation

@inproceedings{ahmad2020summarization,
    author = {Ahmad, Wasi Uddin and Chakraborty, Saikat and Ray, Baishakhi and Chang, Kai-Wei},
    booktitle = {Proceedings of the 2021 Conference of the North {A}merican Chapter of the Association for Computational Linguistics},
    title = {Unified Pre-training for Program Understanding and Generation},
    year = {2021}
}

Owner

Wasi Ahmad
