NLP subtasks I studied on my own during my master's program, shared as a learning reference.

Last update: May 31, 2022

Overview

NLP_Chinese_down_stream_task


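Task 1: Short text classification

(1). Dataset: THUCNews Chinese text dataset (10 classes)

(2). Model: BERT + FC/LSTM, implemented in PyTorch

(3). Usage:

The pretrained model is Chinese BERT-WWM, available at https://github.com/ymcui/Chinese-BERT-wwm. Download it, unzip it into the [bert_pretrain] folder, and run "main.py".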
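Before starting a training run, it can help to confirm that the pretrained weights were unpacked into the right place. The sketch below is a hypothetical helper, not part of the repository. It assumes the PyTorch archive from Chinese-BERT-wwm, which ships `bert_config.json`, `pytorch_model.bin`, and `vocab.txt`; whether `main.py` expects exactly these file names is an assumption.

```python
from pathlib import Path

# File names follow the PyTorch archive published by Chinese-BERT-wwm.
# That main.py looks for exactly these names is an assumption.
EXPECTED_FILES = ("bert_config.json", "pytorch_model.bin", "vocab.txt")


def missing_pretrain_files(folder="bert_pretrain", expected=EXPECTED_FILES):
    """Return the expected file names that are not present in `folder`."""
    root = Path(folder)
    return [name for name in expected if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_pretrain_files()
    if missing:
        print("Missing from bert_pretrain:", ", ".join(missing))
    else:
        print("bert_pretrain looks complete; run main.py next.")
```

Run it from the task folder so that the relative path `bert_pretrain` resolves correctly.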

(4). Training results:

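Task 2: Named entity recognition (NER)

(1). Dataset: china-people-daily-ner-corpus (the People's Daily dataset)

(2). Model: BiLSTM + CRF, Tensorflow_cpu >= 2.1

The model uses 100-dimensional word vectors pretrained on Chinese Wikipedia. Run main.py to train it.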
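In a BiLSTM + CRF tagger, the BiLSTM produces a per-token score for each tag (emissions) and the CRF adds a learned score for each tag-to-tag transition; the best tag sequence is then found with Viterbi decoding. The following is a minimal plain-Python sketch of that decoding step, not the repository's TensorFlow code. The function name and the omission of start/end transition scores are simplifying assumptions.

```python
def viterbi_decode(emissions, transitions):
    """Return (best_path, best_score) for one sentence.

    emissions[t][j]:   score of tag j at position t (e.g. BiLSTM output).
    transitions[i][j]: score of moving from tag i to tag j (CRF parameters).
    Start and end transition scores are left out for brevity.
    """
    num_tags = len(transitions)
    scores = list(emissions[0])  # best score of any path ending in each tag
    backpointers = []

    for emission in emissions[1:]:
        new_scores, pointers = [], []
        for j in range(num_tags):
            # Best previous tag for arriving at tag j.
            best_prev = max(range(num_tags),
                            key=lambda i: scores[i] + transitions[i][j])
            pointers.append(best_prev)
            new_scores.append(scores[best_prev] + transitions[best_prev][j] + emission[j])
        scores = new_scores
        backpointers.append(pointers)

    # Follow the back-pointers from the best final tag.
    last = max(range(num_tags), key=lambda j: scores[j])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, scores[last]
```

With all-zero transitions this reduces to picking the highest-scoring tag at each position independently; the transition scores are what let the CRF discourage invalid sequences such as an `I-` tag following `O`.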

(3). Training results:

(4). F1-score results:
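NER F1 is often reported at the entity level, where a prediction counts only if both the span and the entity type match the gold annotation. Whether this repository scores at the entity level or the token level is not stated; the sketch below assumes entity-level scoring over BIO tags, and the function names are illustrative.

```python
def extract_entities(tags):
    """Collect (type, start, end) spans from a BIO tag sequence; end is exclusive."""
    entities, start, label = set(), None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes the last span
        inside = tag.startswith("I-") and tag[2:] == label
        if start is not None and not inside:
            entities.add((label, start, i))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return entities


def entity_f1(gold_sequences, pred_sequences):
    """Return (precision, recall, f1) over exact-match entity spans."""
    tp = n_gold = n_pred = 0
    for gold, pred in zip(gold_sequences, pred_sequences):
        g, p = extract_entities(gold), extract_entities(pred)
        tp += len(g & p)
        n_gold += len(g)
        n_pred += len(p)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1
```

An `I-` tag that does not continue a span of the same type is ignored here; other conventions treat it as the start of a new entity.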

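Task 3: Text matching (semantic textual similarity)

(1). Dataset: fake-news-pair-classification-challenge (a Kaggle competition on classifying fake news headlines; the labels describe three relations: 'unrelated', 'agreed', 'disagreed')

(2). Model: Siamese LSTM + any text similarity matching method, Tensorflow_cpu >= 2.1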
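A Siamese LSTM encodes both headlines with the same weights and then compares the two resulting vectors. The sketch below shows two common comparison functions in plain Python: cosine similarity and the exp(-L1) similarity used in the Manhattan LSTM formulation. Which measure the repository applies is not stated, so both functions and their names are illustrative assumptions.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between u and v; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def manhattan_similarity(u, v):
    """exp(-L1 distance): 1.0 for identical vectors, approaching 0 as they diverge."""
    return math.exp(-sum(abs(a - b) for a, b in zip(u, v)))
```

Because the dataset has three labels rather than a single similarity score, a measure like these would typically feed a small classification layer instead of being thresholded directly.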

(3). Usage:

Run "main.py" directly.

