NLP downstream tasks I taught myself during my master's studies, shared as a learning reference

Last update: May 31, 2022

Overview

NLP_Chinese_down_stream_task

Self-taught NLP subtasks, provided for study and reference.

Task 1: Short text classification

(1). Dataset: THUCNews Chinese text dataset (10 classes)

(2). Model: BERT + FC/LSTM, implemented in PyTorch

(3). Usage:

The pretrained model is Chinese BERT-WWM, available at https://github.com/ymcui/Chinese-BERT-wwm. Download it, extract it into the [bert_pretrain] folder, then run "main.py".
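
Before running "main.py", it can help to confirm that the extracted checkpoint sits where the script expects it. The following is a minimal sketch, assuming the PyTorch archive unpacks to bert_config.json, pytorch_model.bin and vocab.txt. These file names are an assumption and are not stated in this README, so adjust them to whatever your download contains. `missing_pretrain_files` is a hypothetical helper, not part of the repository.

```python
from pathlib import Path

# Assumed file names for a PyTorch BERT checkpoint; not confirmed by this README.
EXPECTED_FILES = ("bert_config.json", "pytorch_model.bin", "vocab.txt")


def missing_pretrain_files(folder="bert_pretrain", expected=EXPECTED_FILES):
    """Return the expected file names that are absent from `folder`."""
    root = Path(folder)
    return [name for name in expected if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_pretrain_files()
    if missing:
        print("Missing from bert_pretrain:", ", ".join(missing))
    else:
        print("bert_pretrain looks complete; run main.py next.")
```

An empty list means every assumed file was found.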

(4). Training results:

Task 2: Named entity recognition (NER)

(1). Dataset: china-people-daily-ner-corpus (People's Daily dataset)

(2). Model: BiLSTM + CRF, Tensorflow_cpu >= 2.1

The model uses 100-dimensional word vectors pretrained on Chinese Wikipedia. Run "main.py" to train.
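
In a BiLSTM + CRF tagger, the CRF layer selects the tag sequence with the highest total score, typically via Viterbi decoding. The block below is a pure-Python sketch of that decoding step only; it is not the repository's TensorFlow code. `emissions` (per-token tag scores that a BiLSTM would produce) and `transitions` (tag-to-tag scores) are hypothetical inputs with made-up values.

```python
def viterbi_decode(emissions, transitions):
    """Return (best_path, best_score) for one sentence.

    emissions[t][j]: score of tag j at position t.
    transitions[i][j]: score of moving from tag i to tag j.
    """
    num_tags = len(transitions)
    scores = list(emissions[0])  # best score of any path ending in each tag
    backpointers = []
    for emit in emissions[1:]:
        new_scores, pointers = [], []
        for j in range(num_tags):
            # Best previous tag for a path that ends in tag j at this position.
            best_prev = max(range(num_tags),
                            key=lambda i: scores[i] + transitions[i][j])
            pointers.append(best_prev)
            new_scores.append(scores[best_prev] + transitions[best_prev][j] + emit[j])
        scores = new_scores
        backpointers.append(pointers)
    # Follow the back-pointers from the best final tag.
    last = max(range(num_tags), key=lambda j: scores[j])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, scores[last]


TAGS = ["O", "B", "I"]  # illustrative tag set
emissions = [[1.0, 2.0, 0.0], [0.5, 0.0, 1.0], [2.0, 0.0, 0.5]]
transitions = [[0.0, 0.0, -10.0],  # O -> I is heavily penalized
               [0.0, 0.0, 1.0],    # B -> I is encouraged
               [0.0, 0.0, 0.0]]
path, score = viterbi_decode(emissions, transitions)
print([TAGS[i] for i in path], score)  # ['B', 'I', 'O'] 6.0
```

The large negative O -> I score shows how learned transitions can rule out invalid tag sequences that per-token scores alone would allow.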

(3). Training results:

(4). F1-score results:
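
NER F1 is commonly reported at the entity level: a predicted entity counts as correct only if its type and both boundaries match the gold annotation. The sketch below shows one way to compute it for a single sentence, assuming BIO tags such as B-PER / I-PER / O. The tagging scheme is an assumption, `extract_entities` and `entity_f1` are hypothetical helpers, and the example tags are made up; this is not the repository's evaluation code.

```python
def extract_entities(tags):
    """Collect (type, start, end) spans from a BIO tag sequence; end is exclusive."""
    entities, start, etype = set(), None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes a trailing span
        continues = tag.startswith("I-") and etype == tag[2:]
        if start is not None and not continues:
            entities.add((etype, start, i))
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
    return entities


def entity_f1(gold_tags, pred_tags):
    """Return (precision, recall, f1) over exact-match entity spans."""
    gold, pred = extract_entities(gold_tags), extract_entities(pred_tags)
    true_pos = len(gold & pred)
    precision = true_pos / len(pred) if pred else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0, 0.0, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


gold = ["B-PER", "I-PER", "O", "B-LOC", "O"]
pred = ["B-PER", "I-PER", "O", "B-ORG", "O"]
print(entity_f1(gold, pred))  # (0.5, 0.5, 0.5)
```

For a whole test set, the true-positive, predicted and gold counts would be summed over all sentences before computing precision and recall.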

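Task 3: Text matching (semantic textual similarity)

(1). Dataset: fake-news-pair-classification-challenge (a Kaggle competition on classifying pairs of fake news titles; each pair is labeled with one of three relations: 'unrelated', 'agreed', 'disagreed')

(2). Model: Siamese LSTM + any text similarity matching method, Tensorflow_cpu >= 2.1

(3). Usage: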

Simply run "main.py".
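
In a Siamese setup, both titles pass through the same LSTM encoder, and the "text similarity matching method" is the function that compares the two resulting vectors. This README does not say which function the code uses. As an illustration, here is a plain-Python sketch of two common choices, cosine similarity and the Manhattan similarity exp(-||a - b||_1); the vectors are made-up values standing in for encoder outputs.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between a and b; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def manhattan_similarity(a, b):
    """exp(-L1 distance): 1.0 for identical vectors, approaching 0 as they diverge."""
    return math.exp(-sum(abs(x - y) for x, y in zip(a, b)))


# Hypothetical sentence vectors for a pair of titles.
left = [0.2, 0.7, 0.1]
right = [0.1, 0.8, 0.1]
print(cosine_similarity(left, right))
print(manhattan_similarity(left, right))
```

Both functions return a single score per pair. For the three-way labels in this dataset, such a score (or the pair of vectors themselves) would still need to feed a classification layer.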
