ConvBERT-Prod

Overview

ConvBERT

Contents

0. Repository Structure

root:[./]
|--convbert_base_outputs
|      |--args.json
|      |--best.pdparams
|      |      |--model_config.json
|      |      |--model_state.pdparams
|      |      |--tokenizer_config.json
|      |      |--vocab.txt
|--convbert_infer
|      |--inference.pdiparams
|      |--inference.pdiparams.info
|      |--inference.pdmodel
|      |--tokenizer_config.json
|      |--vocab.txt
|--deploy
|      |--inference_python
|      |      |--infer.py
|      |      |--README.md
|      |--serving_python
|      |      |--config.yml
|      |      |--convbert_client
|      |      |      |--serving_client_conf.prototxt
|      |      |      |--serving_client_conf.stream.prototxt
|      |      |--convbert_server
|      |      |      |--inference.pdiparams
|      |      |      |--inference.pdmodel
|      |      |      |--serving_server_conf.prototxt
|      |      |      |--serving_server_conf.stream.prototxt
|      |      |--PipelineServingLogs
|      |      |      |--pipeline.log
|      |      |      |--pipeline.log.wf
|      |      |      |--pipeline.tracer
|      |      |--pipeline_http_client.py
|      |      |--ProcessInfo.json
|      |      |--README.md
|      |      |--web_service.py
|--images
|      |--convbert_framework.jpg
|      |--py_serving_client_results.jpg
|      |--py_serving_startup_visualization.jpg
|--LICENSE
|--output_inference_engine.npy
|--output_predict_engine.npy
|--paddlenlp
|--print_project_tree.py
|--README.md
|--requirements.txt
|--shell
|      |--export.sh
|      |--inference_python.sh
|      |--predict.sh
|      |--train.sh
|      |--train_dist.sh
|--test_tipc
|      |--common_func.sh
|      |--configs
|      |      |--ConvBERT
|      |      |      |--model_linux_gpu_normal_normal_serving_python_linux_gpu_cpu.txt
|      |      |      |--train_infer_python.txt
|      |--docs
|      |      |--test_serving.md
|      |      |--test_train_inference_python.md
|      |      |--tipc_guide.png
|      |      |--tipc_serving.png
|      |      |--tipc_train_inference.png
|      |--output
|      |      |--python_infer_cpu_usemkldnn_False_threads_null_precision_null_batchsize_null.log
|      |      |--python_infer_gpu_usetrt_null_precision_null_batchsize_null.log
|      |      |--results_python.log
|      |      |--results_serving.log
|      |      |--server_infer_gpu_pipeline_http_usetrt_null_precision_null_batchsize_1.log
|      |--README.md
|      |--test_serving.sh
|      |--test_train_inference_python.sh
|--tools
|      |--export_model.py
|      |--predict.py
|--train.log
|--train.py

1. Introduction

Paper: ConvBERT: Improving BERT with Span-based Dynamic Convolution

Abstract: Pre-trained language models such as BERT and its variants have recently achieved impressive performance on various natural language understanding tasks. However, BERT relies heavily on global self-attention blocks and therefore suffers from a large memory footprint and high computation cost. Although all of its attention heads query the whole input sequence to produce the attention map from a global perspective, we observe that some heads only need to learn local dependencies, which implies computational redundancy. We therefore propose a novel span-based dynamic convolution to replace these self-attention heads and directly model local dependencies. The new convolution heads, together with the remaining self-attention heads, form a mixed attention block that is more efficient at both global and local context learning. We equip BERT with this mixed attention design and build the ConvBERT model. Experiments show that ConvBERT significantly outperforms BERT and its variants on various downstream tasks, with lower training cost and fewer model parameters. Remarkably, the ConvBERT-base model reaches a GLUE score of 86.4, 0.7 higher than ELECTRA-base, while using less than 1/4 of its training cost.

2. Dataset and Reproduced Accuracy

The dataset is SST-2.

Model      SST-2 dev acc (reproduced)
ConvBERT   0.9461

3. Prepare the Environment and Data

3.1 Prepare the Environment

  • Download the code
git clone https://github.com/junnyu/ConvBERT-Prod.git
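  • Change into the cloned repository (the directory name below is assumed from the clone URL)
# assumed directory name; adjust if you cloned to a different path
cd ConvBERT-Prod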
  • Install paddlepaddle
# Paddle 2.2 or later is required.
# Install the GPU version of Paddle
pip install paddlepaddle-gpu==2.2.0
# Install the CPU version of Paddle
pip install paddlepaddle==2.2.0

For more installation options, see the Paddle installation guide.

  • Install the requirements
pip install -r requirements.txt

3.2 Prepare the Data

The SST-2 dataset is already integrated into the paddlenlp repository.

3.3 Prepare the Model

If you want to try evaluation or prediction/inference directly, you can download the pretrained model provided in Section 2 and go straight to model evaluation, prediction, and inference deployment.

4. Getting Started

4.1 Model Training

  • Single-machine, single-GPU training
export CUDA_VISIBLE_DEVICES=0
python -m paddle.distributed.launch --gpus "0" train.py \
    --model_type convbert \
    --model_name_or_path convbert-base \
    --task_name sst-2 \
    --max_seq_length 128 \
    --learning_rate 1e-4 \
    --num_train_epochs 3 \
    --output_dir ./convbert_base_outputs/ \
    --logging_steps 100 \
    --save_steps 400 \
    --batch_size 32   \
    --warmup_proportion 0.1

Part of the training log is shown below.

====================================================================================================
global step 2500/6315, epoch: 1, batch: 394, rank_id: 0, loss: 0.140546, lr: 0.0000671182, speed: 3.7691 step/s
global step 2600/6315, epoch: 1, batch: 494, rank_id: 0, loss: 0.062813, lr: 0.0000653589, speed: 4.1413 step/s
global step 2700/6315, epoch: 1, batch: 594, rank_id: 0, loss: 0.051268, lr: 0.0000635996, speed: 4.1867 step/s
global step 2800/6315, epoch: 1, batch: 694, rank_id: 0, loss: 0.133289, lr: 0.0000618403, speed: 4.1769 step/s
eval loss: 0.342346, acc: 0.9461009174311926,
eval done total : 1.9056718349456787 s
====================================================================================================
  • Single-machine, multi-GPU training
export CUDA_VISIBLE_DEVICES=0,1,2,3
python -m paddle.distributed.launch --gpus "0,1,2,3" train.py \
    --model_type convbert \
    --model_name_or_path convbert-base \
    --task_name sst-2 \
    --max_seq_length 128 \
    --learning_rate 1e-4 \
    --num_train_epochs 3 \
    --output_dir ./convbert_base_outputs/ \
    --logging_steps 100 \
    --save_steps 400 \
    --batch_size 32   \
    --warmup_proportion 0.1

For more configuration options, see the get_args_parser function in train.py.
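
The shell directory also provides wrapper scripts for these commands. Assuming they simply wrap the launch commands above (a sketch, not verified against the script contents), training can be started with:

# single-GPU training (assumed to wrap the single-GPU command above)
bash shell/train.sh
# multi-GPU training (assumed to wrap the distributed command above)
bash shell/train_dist.sh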

4.2 Model Evaluation

In this project, evaluation runs together with the training script; check the evaluation metrics reported during training.

4.3 Model Prediction

  • Predict with the GPU
python tools/predict.py --model_path=./convbert_base_outputs/best.pdparams

Prediction is run on the following text:

the problem , it is with most of these things , is the script .

The final output is label_id: 0, prob: 0.9959235191345215, which means the predicted label ID is 0 with a confidence of 0.9959.

  • Predict with the CPU
python tools/predict.py --model_path=./convbert_base_outputs/best.pdparams --device=cpu

Prediction is run on the following text:

the problem , it is with most of these things , is the script .

The final output is label_id: 0, prob: 0.995919406414032, which means the predicted label ID is 0 with a confidence of 0.9959.

5. Model Inference and Deployment

5.1 Inference with Paddle Inference

The Paddle Inference deployment tutorial is available at deploy/inference_python/README.md.
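
If the tutorial is not at hand, the following commands reproduce the Python inference flow exercised by the TIPC test in Section 6: export the trained checkpoint to a static inference model, then run the inference script on it. The flags are taken from the TIPC log; adjust the paths to your own checkpoint.

# export the trained dygraph checkpoint to an inference model
python tools/export_model.py \
    --model_path=./convbert_base_outputs/best.pdparams \
    --save_inference_dir ./convbert_infer
# run Python inference with the exported model (set --use_gpu=False for CPU)
python deploy/inference_python/infer.py \
    --model_dir ./convbert_infer \
    --use_gpu=True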

5.2 Service Deployment with Paddle Serving

The Paddle Serving deployment tutorial is available at deploy/serving_python/README.md.

6. TIPC Automated Test Scripts

Taking the basic Linux training and inference test as an example, the test procedure is as follows.

  • Run the test command
bash test_tipc/test_train_inference_python.sh test_tipc/configs/ConvBERT/train_infer_python.txt whole_train_whole_infer

If the run succeeds, the messages below are printed to the terminal, and the detailed logs are also written to files under the test_tipc/output/ directory.

Run successfully with command - python train.py --save_steps 400      --max_steps=6315           !
Run successfully with command - python tools/export_model.py --model_path=./convbert_base_outputs/best.pdparams --save_inference_dir ./convbert_infer      !
Run successfully with command - python deploy/inference_python/infer.py --model_dir ./convbert_infer --use_gpu=True               > ./test_tipc/output/python_infer_gpu_usetrt_null_precision_null_batchsize_null.log 2>&1 !
Run successfully with command - python deploy/inference_python/infer.py --model_dir ./convbert_infer --use_gpu=False --benchmark=False               > ./test_tipc/output/python_infer_cpu_usemkldnn_False_threads_null_precision_null_batchsize_null.log 2>&1 !
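
The serving test can presumably be driven in the same way with test_tipc/test_serving.sh and the serving configuration shipped in this repository; the invocation below mirrors the pattern of the training/inference test and is an assumption, not taken from the original docs.

# assumed invocation pattern; see test_tipc/docs/test_serving.md for the authoritative usage
bash test_tipc/test_serving.sh test_tipc/configs/ConvBERT/model_linux_gpu_normal_normal_serving_python_linux_gpu_cpu.txt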

7. Notes

To enable static-graph export, this project modifies the convbert model in the paddlenlp repository. The main changes are as follows.

    1. Use paddle.shape instead of tensor.shape to obtain tensor shapes.
    2. F.unfold is not static-graph friendly, so a for loop is used instead, as shown in the snippet below.
if self.conv_type == "sdconv":
    bs = paddle.shape(q)[0]
    seqlen = paddle.shape(q)[1]
    mixed_key_conv_attn_layer = self.key_conv_attn_layer(query)
    conv_attn_layer = mixed_key_conv_attn_layer * q

    # conv_kernel_layer
    conv_kernel_layer = self.conv_kernel_layer(conv_attn_layer)
    conv_kernel_layer = tensor.reshape(
        conv_kernel_layer, shape=[-1, self.conv_kernel_size, 1])
    conv_kernel_layer = F.softmax(conv_kernel_layer, axis=1)
    conv_out_layer = self.conv_out_layer(query)
    conv_out_layer = paddle.stack(
        [
            paddle.slice(
                F.pad(conv_out_layer, pad=[self.padding, self.padding], data_format="NLC"),
                [1],
                starts=[i],
                ends=[i + seqlen],
            )
            for i in range(self.conv_kernel_size)
        ],
        axis=-1,
    )
    conv_out_layer = tensor.reshape(
        conv_out_layer,
        shape=[-1, self.head_dim, self.conv_kernel_size])
    conv_out_layer = tensor.matmul(conv_out_layer, conv_kernel_layer)
    conv_out = tensor.reshape(
        conv_out_layer,
        shape=[bs, seqlen, self.num_heads, self.head_dim])

8. LICENSE

This project is released under the Apache 2.0 license.

9. References and Links

TODO
