p-tuning_NLU

Overview

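This small project was inspired by the ever-generous Su Jianlin's article on P-tuning, and likewise uses P-tuning for an NLU classification task. The idea is the same, but the prompt is implemented differently: the embedding parameters of the [unused*] tokens are extracted to initialize prompt_embed, which is then followed by an LSTM and an MLP that relate the prompt tokens to one another, as in the implementation from the original P-tuning paper, "GPT Understands, Too". The results show that in the few-shot setting P-tuning comes very close to full fine-tuning.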
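The prompt encoder described above can be sketched roughly as follows. This is a minimal illustration in PyTorch, not the code from few_shot_ptuning.py: the class name `PromptEncoder`, the two-layer bidirectional LSTM, the MLP sizes, the random stand-in for the pretrained word-embedding matrix, and the [unused*] token ids are all assumptions made for the example.

```python
import torch
import torch.nn as nn


class PromptEncoder(nn.Module):
    """Trainable prompt embeddings passed through a BiLSTM and an MLP."""

    def __init__(self, init_embeddings: torch.Tensor):
        super().__init__()
        num_prompts, hidden = init_embeddings.shape
        # Trainable copy of the [unused*] rows taken from the word embeddings.
        self.prompt_embed = nn.Embedding.from_pretrained(
            init_embeddings.clone(), freeze=False
        )
        # The BiLSTM lets each prompt vector depend on its neighbours.
        # Layer count and sizes are assumptions; hidden must be even.
        self.lstm = nn.LSTM(
            hidden, hidden // 2, num_layers=2,
            bidirectional=True, batch_first=True,
        )
        self.mlp = nn.Sequential(
            nn.Linear(hidden, hidden), nn.ReLU(), nn.Linear(hidden, hidden)
        )
        self.register_buffer("prompt_ids", torch.arange(num_prompts))

    def forward(self) -> torch.Tensor:
        x = self.prompt_embed(self.prompt_ids).unsqueeze(0)  # (1, P, H)
        x, _ = self.lstm(x)                                  # (1, P, H)
        return self.mlp(x).squeeze(0)                        # (P, H)


# Stand-in for a pretrained word-embedding matrix (vocab 100, hidden 16).
word_embeddings = torch.randn(100, 16)
# Hypothetical ids of four [unused*] tokens in the vocabulary.
unused_ids = torch.tensor([1, 2, 3, 4])

encoder = PromptEncoder(word_embeddings[unused_ids])
prompts = encoder()
print(prompts.shape)
```

In a full model, the rows of `prompts` would replace the input embeddings at the prompt positions before the sequence is fed to the frozen language model.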

Dataset

The data is a sentiment classification dataset. Download: Baidu Netdisk, extraction code: osja

Evaluation

1. finetune

python few_shot_finetune.py

Test set results:

epoch: 0 - acc: 0.897679 - best_test_acc: 0.8976788252013264
epoch: 1 - acc: 0.876362 - best_test_acc: 0.8976788252013264
epoch: 2 - acc: 0.884889 - best_test_acc: 0.8976788252013264
epoch: 3 - acc: 0.884415 - best_test_acc: 0.8976788252013264
epoch: 4 - acc: 0.884415 - best_test_acc: 0.8976788252013264

Fine-tuning all parameters on the few-shot data converges after just 1 epoch.

2. p-tuning

python few_shot_ptuning.py

Test set results:

epoch: 0 - acc: 0.546660 - best_test_acc: 0.5466603505447655
epoch: 1 - acc: 0.687826 - best_test_acc: 0.6878256750355282
epoch: 2 - acc: 0.737091 - best_test_acc: 0.7370914258645191
epoch: 3 - acc: 0.722406 - best_test_acc: 0.7370914258645191
epoch: 4 - acc: 0.776883 - best_test_acc: 0.7768829938417812
epoch: 5 - acc: 0.805306 - best_test_acc: 0.8053055423969683
epoch: 6 - acc: 0.833254 - best_test_acc: 0.8332543818095689
epoch: 7 - acc: 0.837991 - best_test_acc: 0.8379914732354334
epoch: 8 - acc: 0.854571 - best_test_acc: 0.8545712932259593
epoch: 9 - acc: 0.858361 - best_test_acc: 0.8583609663666508
epoch: 10 - acc: 0.856466 - best_test_acc: 0.8583609663666508
epoch: 11 - acc: 0.853150 - best_test_acc: 0.8583609663666508
epoch: 12 - acc: 0.868783 - best_test_acc: 0.8687825675035529
epoch: 13 - acc: 0.877309 - best_test_acc: 0.877309332070109
epoch: 14 - acc: 0.873993 - best_test_acc: 0.877309332070109
epoch: 15 - acc: 0.877783 - best_test_acc: 0.8777830412126955
epoch: 16 - acc: 0.882994 - best_test_acc: 0.8829938417811464
epoch: 17 - acc: 0.881573 - best_test_acc: 0.8829938417811464
epoch: 18 - acc: 0.889626 - best_test_acc: 0.8896257697773567
epoch: 19 - acc: 0.877783 - best_test_acc: 0.8896257697773567

With only prompt_embed, the LSTM, and the MLP trained for p-tuning, the model is close to convergence after 20 epochs with acc=0.8896, slightly below the finetune acc of 0.8977.
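As a quick check of the comparison above, the best accuracy of each run can be read back out of the logs. The sketch below is only illustrative: the helper `best_acc` is hypothetical, the finetune log is copied in full, and the p-tuning log is an excerpt (epochs 16 to 19) that contains that run's best epoch.

```python
import re

FINETUNE_LOG = """\
epoch: 0 - acc: 0.897679 - best_test_acc: 0.8976788252013264
epoch: 1 - acc: 0.876362 - best_test_acc: 0.8976788252013264
epoch: 2 - acc: 0.884889 - best_test_acc: 0.8976788252013264
epoch: 3 - acc: 0.884415 - best_test_acc: 0.8976788252013264
epoch: 4 - acc: 0.884415 - best_test_acc: 0.8976788252013264
"""

# Excerpt of the p-tuning log (last four epochs only).
PTUNING_LOG = """\
epoch: 16 - acc: 0.882994 - best_test_acc: 0.8829938417811464
epoch: 17 - acc: 0.881573 - best_test_acc: 0.8829938417811464
epoch: 18 - acc: 0.889626 - best_test_acc: 0.8896257697773567
epoch: 19 - acc: 0.877783 - best_test_acc: 0.8896257697773567
"""

LINE = re.compile(r"epoch: (\d+) - acc: ([\d.]+)")


def best_acc(log: str) -> tuple[int, float]:
    """Return (epoch, acc) of the highest per-epoch accuracy in a log."""
    rows = [(int(e), float(a)) for e, a in LINE.findall(log)]
    return max(rows, key=lambda row: row[1])


ft_epoch, ft_acc = best_acc(FINETUNE_LOG)
pt_epoch, pt_acc = best_acc(PTUNING_LOG)
gap = ft_acc - pt_acc

print(f"finetune best: {ft_acc:.4f} at epoch {ft_epoch}")
print(f"p-tuning best: {pt_acc:.4f} at epoch {pt_epoch}")
print(f"gap: {gap:.4f}")
```

On these logs the gap between the two best accuracies is about 0.008, that is, under one percentage point.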

Su Jianlin's results are attached for comparison (image not included here).

