Chinese question generator using the Delta Reading Comprehension Dataset (DRCD)

Last update: Oct 22, 2021

Overview

Transformer QG on DRCD

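The model input follows the highlight format of Chan & Fan (2019): the context C and the answer A are integrated into a new context C' of the following form, with the answer span wrapped in [HL] tokens.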
$C' = [c_1, c_2, \ldots, [\mathrm{HL}], a_1, \ldots, a_{|A|}, [\mathrm{HL}], \ldots, c_{|C|}]$
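A minimal sketch of building C' from a raw context, assuming the answer is given as a character offset `answer_start` plus its text (as in SQuAD-style annotations). The function name `highlight_answer` is hypothetical and not part of the repository; it only illustrates the input format, which matches the one used in the request example further down.

```python
HL = "[HL]"  # highlight token that wraps the answer span in C'


def highlight_answer(context: str, answer_start: int, answer_text: str) -> str:
    """Build C' by wrapping the answer span of `context` in [HL] tokens.

    Assumes `answer_start` is a character offset into `context`
    (hypothetical helper, not the repository's own code).
    """
    end = answer_start + len(answer_text)
    # Guard against offsets that do not point at the stated answer
    if context[answer_start:end] != answer_text:
        raise ValueError("answer_text does not match context at answer_start")
    return context[:answer_start] + HL + answer_text + HL + context[end:]


example = highlight_answer("伊隆·里夫·馬斯克是一名企業家和商業大亨", 0, "伊隆·里夫·馬斯克")
print(example)  # [HL]伊隆·里夫·馬斯克[HL]是一名企業家和商業大亨
```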

Proposed in Ying-Hong Chan and Yao-Chung Fan (2019), A Recurrent BERT-based Model for Question Generation.

We also have an English QG project: Transformer-QG-on-SQuAD

Features

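Complete pipeline, from fine-tuning to model evaluation
Supports many state-of-the-art language models
Built-in Flask server for quickly deploying the model as an API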

DRCD dataset

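The Delta Reading Comprehension Dataset (DRCD) is an open-domain Traditional Chinese machine reading comprehension dataset. It contains 10,014 paragraphs drawn from 2,108 Wikipedia articles, with more than 30,000 questions annotated on those paragraphs.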
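DRCD is distributed in a SQuAD-style JSON layout (`data`, then `paragraphs`, then `qas`); treat that layout as an assumption if you work with a different copy. The hypothetical helpers below count articles, paragraphs, and questions, a quick way to sanity-check a download against the figures above. Those figures are presumably totals over the train, dev, and test splits, so a single split file will report fewer.

```python
import json
from typing import Any, Dict, Tuple


def load_drcd(path: str) -> Dict[str, Any]:
    """Load one DRCD split file (assumed SQuAD-style JSON, UTF-8)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def count_drcd(dataset: Dict[str, Any]) -> Tuple[int, int, int]:
    """Return (articles, paragraphs, questions) for a SQuAD-style dict."""
    articles = dataset["data"]
    paragraphs = [p for article in articles for p in article["paragraphs"]]
    questions = sum(len(p["qas"]) for p in paragraphs)
    return len(articles), len(paragraphs), questions


# Tiny in-memory example with the assumed structure
toy = {
    "data": [
        {
            "title": "example",
            "paragraphs": [
                {
                    "context": "伊隆·里夫·馬斯克是一名企業家和商業大亨",
                    "qas": [
                        {
                            "id": "1",
                            "question": "誰是一名企業家和商業大亨?",
                            "answers": [{"text": "伊隆·里夫·馬斯克", "answer_start": 0}],
                        }
                    ],
                }
            ],
        }
    ]
}
print(count_drcd(toy))  # (1, 1, 1)
```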

Available models

BART (based on uer/bart-base-chinese-cluecorpussmall)

Use in Transformers

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Load the fine-tuned BART question-generation model from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("p208p2002/bart-drcd-qg-hl")
model = AutoModelForSeq2SeqLM.from_pretrained("p208p2002/bart-drcd-qg-hl")

Experiments

Model	BLEU-1	BLEU-2	BLEU-3	BLEU-4	METEOR	ROUGE-L
BART-HLSQG	34.25	27.70	22.43	18.13	23.58	36.88
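For reference, ROUGE-L scores a generated question by the longest common subsequence (LCS) it shares with the reference question. The sketch below is a minimal character-level version; character tokenization and the recall weight beta = 1.2 (the value used in the COCO caption evaluation code) are assumptions, and the scorer installed by setup_scorer.py may tokenize and aggregate differently, so it is not expected to reproduce the table exactly. Note that the table reports scores on a 0-100 scale, while this sketch returns values in [0, 1].

```python
from typing import Sequence


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence, via row-by-row DP."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            # Extend a match diagonally, otherwise keep the best neighbour
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate: str, reference: str, beta: float = 1.2) -> float:
    """Character-level ROUGE-L F-score; beta > 1 weights recall more."""
    cand, ref = list(candidate), list(reference)
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return (1 + beta**2) * precision * recall / (recall + beta**2 * precision)


score = rouge_l("哪一個人是一名企業家和商業大亨?", "誰是一名企業家和商業大亨?")
print(round(score, 4))
```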

Environment requirements

The whole project was developed on Ubuntu.

If you do not have PyTorch 1.6 or later, install or update it first:

https://pytorch.org/get-started/locally/
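A minimal sketch for checking the PyTorch requirement before installing the other packages. It reads the installed distribution's metadata through the standard library (Python 3.8+) instead of importing torch; the helper names are hypothetical and not part of the repository.

```python
import re
from importlib import metadata
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from strings like '1.6.0' or '1.9.0+cu111'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def torch_is_recent_enough(minimum: Tuple[int, int] = (1, 6)) -> bool:
    """True if an installed torch distribution is at least `minimum`."""
    try:
        installed = metadata.version("torch")
    except metadata.PackageNotFoundError:
        return False  # torch is not installed at all
    return parse_version(installed) >= minimum


if __name__ == "__main__":
    print("PyTorch 1.6+ installed:", torch_is_recent_enough())
```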

Install packages: pip install -r requirements.txt
Set up the scorer: python setup_scorer.py
Download the dataset: python init_dataset.py

Training

Seq2Seq LM

usage: train_seq2seq_lm.py [-h]
                           [--base_model {bert-base-chinese,uer/bart-base-chinese-cluecorpussmall,p208p2002/bart-drcd-qg-hl}]
                           [-d {drcd}] [--batch_size BATCH_SIZE]
                           [--epoch EPOCH] [--lr LR] [--dev DEV] [--server]
                           [--run_test] [-fc FROM_CHECKPOINT]

optional arguments:
  -h, --help            show this help message and exit
  --base_model {bert-base-chinese,uer/bart-base-chinese-cluecorpussmall,p208p2002/bart-drcd-qg-hl}
  -d {drcd}, --dataset {drcd}
  --batch_size BATCH_SIZE
  --epoch EPOCH
  --lr LR
  --dev DEV
  --server
  --run_test
  -fc FROM_CHECKPOINT, --from_checkpoint FROM_CHECKPOINT
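To make the option types explicit, here is a minimal argparse sketch that accepts the same options as the usage text above. It is not the script's actual parser: the types (int for --batch_size and --epoch, float for --lr, string for --dev) are assumptions, and the real defaults are whatever train_seq2seq_lm.py defines, so every option here simply defaults to None (or False for flags).

```python
import argparse
from typing import List, Optional

BASE_MODELS = [
    "bert-base-chinese",
    "uer/bart-base-chinese-cluecorpussmall",
    "p208p2002/bart-drcd-qg-hl",
]


def build_parser() -> argparse.ArgumentParser:
    # Option names and choices follow the usage text; types are assumptions.
    parser = argparse.ArgumentParser(prog="train_seq2seq_lm.py")
    parser.add_argument("--base_model", choices=BASE_MODELS)
    parser.add_argument("-d", "--dataset", choices=["drcd"])
    parser.add_argument("--batch_size", type=int)
    parser.add_argument("--epoch", type=int)
    parser.add_argument("--lr", type=float)
    parser.add_argument("--dev")  # meaning and type undocumented; kept as a string
    parser.add_argument("--server", action="store_true")
    parser.add_argument("--run_test", action="store_true")
    parser.add_argument("-fc", "--from_checkpoint")
    return parser


def parse(argv: Optional[List[str]] = None) -> argparse.Namespace:
    return build_parser().parse_args(argv)


args = parse(["--server", "--base_model", "p208p2002/bart-drcd-qg-hl"])
print(args.server, args.base_model)  # True p208p2002/bart-drcd-qg-hl
```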

Run as API server

From the pre-trained model (recommended)

python train_seq2seq_lm.py --server --base_model p208p2002/bart-drcd-qg-hl

From your own checkpoint

python train_xxx_lm.py --server --base_model YOUR_BASE_MODEL --from_checkpoint FROM_CHECKPOINT

Request example

curl --location --request POST 'http://127.0.0.1:5000/' \
--header 'Content-Type: application/x-www-form-urlencoded' \
--data-urlencode 'context=[HL]伊隆·里夫·馬斯克[HL]是一名企業家和商業大亨'

{"predict": "哪一個人是一名企業家和商業大亨?"}
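The same request can be sent from Python with only the standard library. This is a minimal sketch assuming the server from the previous step is listening at http://127.0.0.1:5000/ and replies with JSON containing a "predict" field, as in the curl example; the function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

API_URL = "http://127.0.0.1:5000/"  # address used in the curl example


def build_request(context: str, url: str = API_URL) -> urllib.request.Request:
    """Form-encode `context` the way `curl --data-urlencode` does."""
    body = urllib.parse.urlencode({"context": context}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def generate_question(context: str, url: str = API_URL, timeout: float = 30.0) -> str:
    """POST a highlighted context and return the server's "predict" field."""
    with urllib.request.urlopen(build_request(context, url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))["predict"]
```

With the server running, `generate_question("[HL]伊隆·里夫·馬斯克[HL]是一名企業家和商業大亨")` should return the generated question string.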


Owner

Philip
