ConvBERT-Prod

Overview

ConvBERT

0. Repository structure

root:[./]
|--convbert_base_outputs
|      |--args.json
|      |--best.pdparams
|      |      |--model_config.json
|      |      |--model_state.pdparams
|      |      |--tokenizer_config.json
|      |      |--vocab.txt
|--convbert_infer
|      |--inference.pdiparams
|      |--inference.pdiparams.info
|      |--inference.pdmodel
|      |--tokenizer_config.json
|      |--vocab.txt
|--deploy
|      |--inference_python
|      |      |--infer.py
|      |      |--README.md
|      |--serving_python
|      |      |--config.yml
|      |      |--convbert_client
|      |      |      |--serving_client_conf.prototxt
|      |      |      |--serving_client_conf.stream.prototxt
|      |      |--convbert_server
|      |      |      |--inference.pdiparams
|      |      |      |--inference.pdmodel
|      |      |      |--serving_server_conf.prototxt
|      |      |      |--serving_server_conf.stream.prototxt
|      |      |--PipelineServingLogs
|      |      |      |--pipeline.log
|      |      |      |--pipeline.log.wf
|      |      |      |--pipeline.tracer
|      |      |--pipeline_http_client.py
|      |      |--ProcessInfo.json
|      |      |--README.md
|      |      |--web_service.py
|--images
|      |--convbert_framework.jpg
|      |--py_serving_client_results.jpg
|      |--py_serving_startup_visualization.jpg
|--LICENSE
|--output_inference_engine.npy
|--output_predict_engine.npy
|--paddlenlp
|--print_project_tree.py
|--README.md
|--requirements.txt
|--shell
|      |--export.sh
|      |--inference_python.sh
|      |--predict.sh
|      |--train.sh
|      |--train_dist.sh
|--test_tipc
|      |--common_func.sh
|      |--configs
|      |      |--ConvBERT
|      |      |      |--model_linux_gpu_normal_normal_serving_python_linux_gpu_cpu.txt
|      |      |      |--train_infer_python.txt
|      |--docs
|      |      |--test_serving.md
|      |      |--test_train_inference_python.md
|      |      |--tipc_guide.png
|      |      |--tipc_serving.png
|      |      |--tipc_train_inference.png
|      |--output
|      |      |--python_infer_cpu_usemkldnn_False_threads_null_precision_null_batchsize_null.log
|      |      |--python_infer_gpu_usetrt_null_precision_null_batchsize_null.log
|      |      |--results_python.log
|      |      |--results_serving.log
|      |      |--server_infer_gpu_pipeline_http_usetrt_null_precision_null_batchsize_1.log
|      |--README.md
|      |--test_serving.sh
|      |--test_train_inference_python.sh
|--tools
|      |--export_model.py
|      |--predict.py
|--train.log
|--train.py

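1. Introduction

Paper: ConvBERT: Improving BERT with Span-based Dynamic Convolution

Abstract: Pre-trained language models like BERT and its variants have recently achieved impressive performance in various natural language understanding tasks. However, BERT relies heavily on the global self-attention block and therefore has a large memory footprint and computation cost. Although all its attention heads query the whole input sequence to generate the attention map from a global perspective, we observe that some heads only need to learn local dependencies, which means there is computational redundancy. We therefore propose a novel span-based dynamic convolution to replace these self-attention heads and model local dependencies directly. The new convolution heads, together with the remaining self-attention heads, form a new mixed attention block that is more efficient at both global and local context learning. We equip BERT with this mixed attention design and build a ConvBERT model. Experiments show that ConvBERT significantly outperforms BERT and its variants on various downstream tasks, with lower training cost and fewer model parameters. Notably, the ConvBERT-base model achieves a GLUE score of 86.4, 0.7 higher than ELECTRA-base, while using less than 1/4 of the training cost.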

2. Dataset and reproduced accuracy

The dataset is SST-2.

Model       SST-2 dev accuracy (reproduced)
ConvBERT    0.9461
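
The reported accuracy can be read as a count of correct predictions. The sketch below assumes the standard GLUE SST-2 dev split of 872 sentences; that split size is an assumption, not something stated in this README.

```python
# Assumption: the SST-2 dev split has 872 examples (standard GLUE split).
DEV_SIZE = 872
reported_acc = 0.9461  # reproduced dev accuracy from the table above

# Nearest whole number of correct predictions implied by the accuracy.
correct = round(reported_acc * DEV_SIZE)
recomputed = f"{correct / DEV_SIZE:.4f}"

print(correct)     # 825
print(recomputed)  # 0.9461
```

Under that assumption, 0.9461 corresponds to 825 of 872 dev sentences classified correctly.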

3. Environment and data preparation

3.1 Environment setup

  • Download the code
git clone https://github.com/junnyu/ConvBERT-Prod.git
  • Install PaddlePaddle
# Paddle 2.2 or later is required.
# Install the GPU build of Paddle
pip install paddlepaddle-gpu==2.2.0
# Install the CPU build of Paddle
pip install paddlepaddle==2.2.0

For other installation methods, see the Paddle installation guide.

  • Install the requirements
pip install -r requirements.txt

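3.2 Data preparation

The SST-2 data is already integrated into the paddlenlp repository.

3.3 Model preparation

If you want to try evaluation or prediction directly, you can download the provided pretrained model as described in Section 2 and go straight to model evaluation, prediction, and inference deployment.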

4. Getting started

4.1 Model training

  • Single-machine, single-GPU training
export CUDA_VISIBLE_DEVICES=0
python -m paddle.distributed.launch --gpus "0" train.py \
    --model_type convbert \
    --model_name_or_path convbert-base \
    --task_name sst-2 \
    --max_seq_length 128 \
    --learning_rate 1e-4 \
    --num_train_epochs 3 \
    --output_dir ./convbert_base_outputs/ \
    --logging_steps 100 \
    --save_steps 400 \
    --batch_size 32   \
    --warmup_proportion 0.1

Part of the training log is shown below.

====================================================================================================
global step 2500/6315, epoch: 1, batch: 394, rank_id: 0, loss: 0.140546, lr: 0.0000671182, speed: 3.7691 step/s
global step 2600/6315, epoch: 1, batch: 494, rank_id: 0, loss: 0.062813, lr: 0.0000653589, speed: 4.1413 step/s
global step 2700/6315, epoch: 1, batch: 594, rank_id: 0, loss: 0.051268, lr: 0.0000635996, speed: 4.1867 step/s
global step 2800/6315, epoch: 1, batch: 694, rank_id: 0, loss: 0.133289, lr: 0.0000618403, speed: 4.1769 step/s
eval loss: 0.342346, acc: 0.9461009174311926,
eval done total : 1.9056718349456787 s
====================================================================================================
  • Single-machine, multi-GPU training
export CUDA_VISIBLE_DEVICES=0,1,2,3
python -m paddle.distributed.launch --gpus "0,1,2,3" train.py \
    --model_type convbert \
    --model_name_or_path convbert-base \
    --task_name sst-2 \
    --max_seq_length 128 \
    --learning_rate 1e-4 \
    --num_train_epochs 3 \
    --output_dir ./convbert_base_outputs/ \
    --logging_steps 100 \
    --save_steps 400 \
    --batch_size 32   \
    --warmup_proportion 0.1

For more configuration options, see the get_args_parser function in train.py.
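
The step count and learning rates in the single-GPU log follow from the command-line arguments. The sketch below is a reconstruction, not the code in train.py: it assumes the SST-2 training split has 67,349 examples and that the schedule is a linear warmup followed by a linear decay to zero, with the warmup length taken as the integer part of warmup_proportion times the total step count.

```python
import math

TRAIN_SIZE = 67_349        # assumption: size of the GLUE SST-2 training split
BATCH_SIZE = 32            # --batch_size
EPOCHS = 3                 # --num_train_epochs
PEAK_LR = 1e-4             # --learning_rate
WARMUP_PROPORTION = 0.1    # --warmup_proportion

steps_per_epoch = math.ceil(TRAIN_SIZE / BATCH_SIZE)
total_steps = steps_per_epoch * EPOCHS
warmup_steps = int(WARMUP_PROPORTION * total_steps)


def lr_at(step: int) -> float:
    """Linear warmup to PEAK_LR, then linear decay to zero (assumed schedule)."""
    if step < warmup_steps:
        return PEAK_LR * step / warmup_steps
    return PEAK_LR * (total_steps - step) / (total_steps - warmup_steps)


print(steps_per_epoch, total_steps, warmup_steps)  # 2105 6315 631
for step in (2500, 2600, 2700, 2800):
    print(step, f"{lr_at(step):.10f}")
```

With these assumptions the total comes to 6315 steps, and the learning rate at step 2500 comes to 0.0000671182, which agrees with the logged values.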

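4.2 Model evaluation

In this project, training and evaluation run in the same script; check the evaluation metrics reported during training.

4.3 Model prediction

  • Predict on GPU
python tools/predict.py --model_path=./convbert_base_outputs/best.pdparams

Prediction is run on the following text:

the problem , it is with most of these things , is the script .

The final output is label_id: 0, prob: 0.9959235191345215, meaning the predicted label ID is 0 with a confidence of 0.9959.

  • Predict on CPU
python tools/predict.py --model_path=./convbert_base_outputs/best.pdparams --device=cpu

Prediction is run on the following text:

the problem , it is with most of these things , is the script .

The final output is label_id: 0, prob: 0.995919406414032, meaning the predicted label ID is 0 with a confidence of 0.9959.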
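
A label_id and prob pair like this is what a softmax over the two class logits followed by an argmax produces. The sketch below illustrates that step with made-up logits; the helper names and the logit values are hypothetical and are not taken from tools/predict.py. In GLUE SST-2, label 0 denotes negative sentiment. The last lines compare the GPU and CPU probabilities reported above using a tolerance rather than exact equality.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(logits):
    """Return (label_id, prob) for the highest-scoring class."""
    probs = softmax(logits)
    label_id = max(range(len(probs)), key=probs.__getitem__)
    return label_id, probs[label_id]


# Hypothetical logits for illustration only; not produced by the model.
label_id, prob = predict([2.9, -2.6])
print(label_id, round(prob, 4))

# The GPU and CPU probabilities reported above differ by about 4e-6.
gpu_prob, cpu_prob = 0.9959235191345215, 0.995919406414032
devices_agree = math.isclose(gpu_prob, cpu_prob, abs_tol=1e-5)
print(devices_agree)  # True
```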

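5. Model inference and deployment

5.1 Inference-based prediction

For the Inference tutorial, see: link

5.2 Serving-based service deployment

For the Serving deployment tutorial, see: link

6. TIPC automated test scripts

Taking the basic Linux training and inference test as an example, the test procedure is as follows.

  • Run the test command
bash test_tipc/test_train_inference_python.sh test_tipc/configs/ConvBERT/train_infer_python.txt whole_train_whole_infer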

If the run succeeds, the terminal shows the following output, and detailed logs are written to files in the test_tipc/output/ folder.

Run successfully with command - python train.py --save_steps 400      --max_steps=6315           !
Run successfully with command - python tools/export_model.py --model_path=./convbert_base_outputs/best.pdparams --save_inference_dir ./convbert_infer      !
Run successfully with command - python deploy/inference_python/infer.py --model_dir ./convbert_infer --use_gpu=True               > ./test_tipc/output/python_infer_gpu_usetrt_null_precision_null_batchsize_null.log 2>&1 !
Run successfully with command - python deploy/inference_python/infer.py --model_dir ./convbert_infer --use_gpu=False --benchmark=False               > ./test_tipc/output/python_infer_cpu_usemkldnn_False_threads_null_precision_null_batchsize_null.log 2>&1 !
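
When the test output is captured to a file, the status lines are wrapped in ANSI terminal color codes. The sketch below is one possible way to strip those codes and list the commands that reported success; the function name and regular expressions are illustrative and are not part of the test_tipc scripts.

```python
import re

# Matches ANSI color sequences such as ESC[33m and ESC[0m.
ANSI_ESCAPE = re.compile(r"\x1b\[[0-9;]*m")
# Captures the command between the status prefix and the trailing "!".
SUCCESS = re.compile(r"Run successfully with command - (.*?)\s*!$")


def successful_commands(text):
    """Return the commands from every 'Run successfully' status line."""
    commands = []
    for line in text.splitlines():
        clean = ANSI_ESCAPE.sub("", line).strip()
        match = SUCCESS.match(clean)
        if match:
            commands.append(match.group(1))
    return commands


sample = (
    "\x1b[33m Run successfully with command - "
    "python train.py --save_steps 400 --max_steps=6315   !  \x1b[0m"
)
print(successful_commands(sample))
# ['python train.py --save_steps 400 --max_steps=6315']
```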

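7. Notes

To support static-graph export, this project modifies the convbert model in the paddlenlp repository. The main changes are as follows.

    1. Use paddle.shape instead of tensor.shape to get a tensor's shape.
    2. F.unfold is not well suited to static graphs, so a for loop is used instead.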
if self.conv_type == "sdconv":
    # paddle.shape returns the sizes as tensors, so they stay dynamic after export.
    bs = paddle.shape(q)[0]
    seqlen = paddle.shape(q)[1]
    mixed_key_conv_attn_layer = self.key_conv_attn_layer(query)
    conv_attn_layer = mixed_key_conv_attn_layer * q

    # conv_kernel_layer: per-position kernels, normalized over the kernel axis
    conv_kernel_layer = self.conv_kernel_layer(conv_attn_layer)
    conv_kernel_layer = tensor.reshape(
        conv_kernel_layer, shape=[-1, self.conv_kernel_size, 1])
    conv_kernel_layer = F.softmax(conv_kernel_layer, axis=1)
    conv_out_layer = self.conv_out_layer(query)
    # Replacement for F.unfold: stack conv_kernel_size shifted slices of the
    # padded sequence along a new last axis.
    conv_out_layer = paddle.stack(
        [
            paddle.slice(
                F.pad(conv_out_layer,
                      pad=[self.padding, self.padding],
                      data_format="NLC"),
                [1], starts=[i], ends=[i + seqlen])
            for i in range(self.conv_kernel_size)
        ],
        axis=-1,
    )
    conv_out_layer = tensor.reshape(
        conv_out_layer,
        shape=[-1, self.head_dim, self.conv_kernel_size])
    # Apply each position's kernel to its window of neighbors.
    conv_out_layer = tensor.matmul(conv_out_layer, conv_kernel_layer)
    conv_out = tensor.reshape(
        conv_out_layer,
        shape=[bs, seqlen, self.num_heads, self.head_dim])
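
The pad, slice, and stack pattern that stands in for F.unfold can be illustrated on a plain Python list. This is a minimal sketch of the idea only, not the model code: the function name is hypothetical, the sequence is one-dimensional, and the padding is assumed to be half the (odd) kernel size.

```python
def unfold_by_slices(seq, kernel_size):
    """For each position, collect the kernel_size neighbors centered on it.

    Pads both ends with zeros, takes kernel_size shifted views of
    length len(seq), and stacks them along a new last axis.
    """
    padding = kernel_size // 2  # assumption: odd kernel size, symmetric padding
    seqlen = len(seq)
    padded = [0] * padding + list(seq) + [0] * padding
    # One shifted view per kernel offset, analogous to
    # paddle.slice(..., starts=[i], ends=[i + seqlen]).
    views = [padded[i:i + seqlen] for i in range(kernel_size)]
    # Stacking on the last axis: position t gets [views[0][t], ..., views[k-1][t]].
    return [list(window) for window in zip(*views)]


windows = unfold_by_slices([1, 2, 3, 4], kernel_size=3)
print(windows)  # [[0, 1, 2], [1, 2, 3], [2, 3, 4], [3, 4, 0]]
```

Each row is the sliding window centered on one position, which is the same result an unfold would give. The loop runs only over the kernel size, which is a fixed Python integer, so the sequence length can remain dynamic.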

8. LICENSE

This project is released under the Apache 2.0 license.

9. References

TODO

Owner
yujun