Code, Data and Demo for Paper: Controllable Generation from Pre-trained Language Models via Inverse Prompting

Overview

InversePrompting

Paper: Controllable Generation from Pre-trained Language Models via Inverse Prompting

Code: The code is provided in the "chinese_ip" and "english_ip" package.

Chinese Inverse Prompting:

based on https://github.com/THUDM/Chinese-Transformer-XL

Packages Required

torch,apex,boto3,sentencepiece,nltk,jsonlines,filelock,deepspeed,pypinyin,pandas

Train:

bash scripts/ds_pretrain_gpt2_29B.sh

Direct Generation:

bash scripts/generate_text.sh

Generate Poems:

python generate_pms_refined.py  --Inverse Prompting for TCP Generation

Generate QA:

python generate_qa_desc.py  --Inverse Prompting for QA

English Inverse Prompting:

based on megatron-lm https://github.com/NVIDIA/Megatron-LM, follow its guide to download model weights and put them under the correct path, then run

python tools/generate_samples_sgpu.py --use-set 1

for inverse prompting.

Data:

Chinese Language Model:

See https://github.com/THUDM/Chinese-Transformer-XL

English Language Model:

See https://github.com/NVIDIA/Megatron-LM

Generated TCPs:

jiuge:

data/poems_jiuge.jsonl
jiuge generated from http://jiuge.thunlp.org/

IP+RL:

data/poems_ip_rl.zip
IP-only:
data/poems_ip_norl.zip
Base Model:
data/poems_noip.zip

QAs:

CPM:

data/qa_cpm.zip
IP:
data/qa_ip.zip
base model:
data/qa_basemodel.zip
Human:
data/qa_human.jsonl

Human Evaluation Raw Data (results listed in paper):

based on evaluator:

data/user-records.jsonl
based on prompts: QA:
data/qa-records.jsonl
poem:
data/poem-records.jsonl

Paper: full version of paper(generated using XeLaTeX) is included in this repo. The arXiv version uses pdflatex and tables with Chinese characters are transferred to English as pdflatex does not allow UTF-8 characters(non-English languages) presence.

paper.pdf

There's also a demo where you can try your own questions/titles for QA/poem generation.

QA: https://pretrain.aminer.cn/app/qa

Poem Generation: https://pretrain.aminer.cn/apps/poetry.html

Note that the demo version is updating frequently and may be different from the repo version.

Some examples of poems it generates:

咏特朗普

天下岂有华盛顿,外强中衰叹累累。
白宫总统成陪衬,螳臂挡车虎尾寒。
坐观美国朝野势,风雨飘摇现暴难。
拜登再任难抵挡,明年恐将命归残。
夜过虹桥机场 

卢浦斜晖里,西楼醉客行。
影侵双塔晚,灯落一城明。
空客还频顾,航灯未可惊。
空留城市夜,月映水帘星。
排队购房作 

向晚万人候,售楼幢馅齐。
验资堪买主,瞧室亦堪栖。
回柱瞻佳处,连楼仰远姿。
殷勤申买者,莫待扣扉期。
论资本主义 

若为自由故,如今逐利逃。
入城操法律,两股战空槽。
漂白藏珠玉,欢呼夺锦袍。
管窥矜势利,夸视堕尘劳。
赠美国友人

清远寄吴士,华州逢旧知。
大洋环万里,学馆阻三时。
道别殷勤意,地连海峤西。
同来艰运日,异域远风姿。
安克雷奇中美会谈

特务狂声振,朗官降虏庭。
普天皆窃笑,攻守几无惊。
入市商人拜,国殇将士迎。
会同诛狡寇,世界定清明。

If you have any questions, please contact [email protected]

Please cite

@article{zou2021controllable,
  title={Controllable Generation from Pre-trained Language Models via Inverse Prompting},
  author={Zou, Xu and Yin, Da and Zhong, Qingyang and Yang, Hongxia and Yang, Zhilin and Tang, Jie}, 
  journal={arXiv preprint arXiv:2103.10685},  
  year={2021}  
}
Owner
THUDM
Data Mining Research Group at Tsinghua University
THUDM
Code for DeepCurrents: Learning Implicit Representations of Shapes with Boundaries

DeepCurrents | Webpage | Paper DeepCurrents: Learning Implicit Representations of Shapes with Boundaries David Palmer*, Dmitriy Smirnov*, Stephanie Wa

Dima Smirnov 36 Dec 08, 2022
Read Like Humans: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Recognition

Read Like Humans: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Recognition The official code of ABINet (CVPR 2021, Oral).

334 Dec 31, 2022
Pytorch implementation of the paper Improving Text-to-Image Synthesis Using Contrastive Learning

T2I_CL This is the official Pytorch implementation of the paper Improving Text-to-Image Synthesis Using Contrastive Learning Requirements Linux Python

42 Dec 31, 2022
Towards the D-Optimal Online Experiment Design for Recommender Selection (KDD 2021)

Towards the D-Optimal Online Experiment Design for Recommender Selection (KDD 2021) Contact 0 Jan 11, 2022

Voice assistant - Voice assistant with python

🌐 Python Voice Assistant 🌵 - User's greeting 🌵 - Writing tasks to todo-list ?

PythonToday 10 Dec 26, 2022
Analysis code and Latex source of the manuscript describing the conditional permutation test of confounding bias in predictive modelling.

Git repositoty of the manuscript entitled Statistical quantification of confounding bias in predictive modelling by Tamas Spisak The manuscript descri

PNI - Predictive Neuroimaging Lab, University Hospital Essen, Germany 0 Nov 22, 2021
Official PyTorch implementation of "Uncertainty-Based Offline Reinforcement Learning with Diversified Q-Ensemble" (NeurIPS'21)

Uncertainty-Based Offline Reinforcement Learning with Diversified Q-Ensemble This is the code for reproducing the results of the paper Uncertainty-Bas

43 Nov 23, 2022
SHIFT15M: multiobjective large-scale fashion dataset with distributional shifts

[arXiv] The main motivation of the SHIFT15M project is to provide a dataset that contains natural dataset shifts collected from a web service IQON, wh

ZOZO, Inc. 138 Nov 24, 2022
PyTorch implementation of Anomaly Transformer: Time Series Anomaly Detection with Association Discrepancy

Anomaly Transformer in PyTorch This is an implementation of Anomaly Transformer: Time Series Anomaly Detection with Association Discrepancy. This pape

spencerbraun 160 Dec 19, 2022
M2MRF: Many-to-Many Reassembly of Features for Tiny Lesion Segmentation in Fundus Images

M2MRF: Many-to-Many Reassembly of Features for Tiny Lesion Segmentation in Fundus Images This repo is the official implementation of paper "M2MRF: Man

12 Dec 14, 2022
PEPit is a package enabling computer-assisted worst-case analyses of first-order optimization methods.

PEPit: Performance Estimation in Python This open source Python library provides a generic way to use PEP framework in Python. Performance estimation

Baptiste 53 Nov 16, 2022
Benchmark VAE - Library for Variational Autoencoder benchmarking

Documentation pythae This library implements some of the most common (Variational) Autoencoder models. In particular it provides the possibility to pe

1.1k Jan 02, 2023
Activity tragle - Google is tracking everything, we just look at it

activity_tragle Google is tracking everything, we just look at it here. You need

BERNARD Guillaume 1 Feb 15, 2022
A 3D Dense mapping backend library of SLAM based on taichi-Lang designed for the aerial swarm.

TaichiSLAM This project is a 3D Dense mapping backend library of SLAM based Taichi-Lang, designed for the aerial swarm. Intro Taichi is an efficient d

XuHao 230 Dec 19, 2022
v objective diffusion inference code for PyTorch.

v-diffusion-pytorch v objective diffusion inference code for PyTorch, by Katherine Crowson (@RiversHaveWings) and Chainbreakers AI (@jd_pressman). The

Katherine Crowson 635 Dec 30, 2022
The dataset and source code for our paper: "Did You Ask a Good Question? A Cross-Domain Question IntentionClassification Benchmark for Text-to-SQL"

TriageSQL The dataset and source code for our paper: "Did You Ask a Good Question? A Cross-Domain Question Intention Classification Benchmark for Text

Yusen Zhang 22 Nov 09, 2022
Allows including an action inside another action (by preprocessing the Yaml file). This is how composite actions should have worked.

actions-includes Allows including an action inside another action (by preprocessing the Yaml file). Instead of using uses or run in your action step,

Tim Ansell 70 Nov 04, 2022
End-to-End Dense Video Captioning with Parallel Decoding (ICCV 2021)

PDVC Official implementation for End-to-End Dense Video Captioning with Parallel Decoding (ICCV 2021) [paper] [valse论文速递(Chinese)] This repo supports:

Teng Wang 118 Dec 16, 2022
(CVPR 2021) PAConv: Position Adaptive Convolution with Dynamic Kernel Assembling on Point Clouds

PAConv: Position Adaptive Convolution with Dynamic Kernel Assembling on Point Clouds by Mutian Xu*, Runyu Ding*, Hengshuang Zhao, and Xiaojuan Qi. Int

CVMI Lab 228 Dec 25, 2022
Compartmental epidemic model to assess undocumented infections: applications to SARS-CoV-2 epidemics in Brazil - Datasets and Codes

Compartmental epidemic model to assess undocumented infections: applications to SARS-CoV-2 epidemics in Brazil - Datasets and Codes The codes for simu

1 Jan 12, 2022