NLP Algorithms

Description

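This repository covers five mainstream NLP tasks: text classification, sequence labeling, relation extraction, text matching, and text similarity matching. Together they involve 22 related models and algorithms.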

Framework Structure

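File Structure

Directory names are listed exactly as they appear in the repository. In the Chinese names, 机器学习 means "machine learning", the suffix 文本分类 means "text classification", 数据准备 means "data preparation", and 记录 means "records".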

all_models
├── Base_line
│   ├── __init__.py
│   ├── base_data_process.py
│   ├── base_evaluation.py
│   └── single_tokenizer.py
│
├── Texts_Classification
│   ├── 机器学习_文本分类
│   ├── fasttext_文本分类
│   ├── textcnn_文本分类
│   ├── lstm_文本分类
│   ├── han_文本分类
│   ├── bert_文本分类
│   └── 数据准备
│
├── Sequence_Labeling
│   ├── crf_suite
│   ├── lstm_crf
│   ├── bert_lstm_crf
│   ├── bert_mrc
│   └── 数据准备
│
├── Relation_Extraction
│   ├── CasRel
│   ├── multihead_joint_extraction
│   ├── R-bert_relation_recognition
│   ├── attention_lstm_relation_recognition
│   ├── attention_lstm_relation_recognition_for_single_sentence
│   ├── tagging_scheme_joint_extraction
│   ├── entity_extraction_bert_lstm_crf
│   └── 数据准备
│
├── Text_Matching
│   ├── DSSM
│   ├── ARC-II
│   ├── ESIM
│   ├── bert
│   └── 数据准备
│
├── Text_Similarity_Matching
│   ├── tfidf
│   ├── BM25
│   ├── pysparnn
│   └── commodity_title.txt
│
├── 记录
├── .gitignore
└── README.md
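
The Text_Similarity_Matching directory lists BM25 among its methods. As a rough illustration of what that kind of scoring involves, here is a minimal sketch of Okapi BM25 ranking in plain Python. It is not the repository's implementation: the function name `bm25_scores`, the parameter defaults `k1=1.5` and `b=0.75`, the whitespace tokenization, and the sample titles are all assumptions made for this example. Chinese product titles would need a word segmenter instead of `split()`.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenized document in `docs` against the tokenized `query`.

    Hypothetical helper for illustration only; not taken from the repository.
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: how many documents contain each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Smoothed IDF; the "+ 1.0" keeps the value non-negative.
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Made-up sample titles, tokenized on whitespace (an assumption).
titles = [
    "red cotton t-shirt".split(),
    "blue denim jeans".split(),
    "red leather wallet".split(),
]
scores = bm25_scores("red t-shirt".split(), titles)
best = max(range(len(titles)), key=scores.__getitem__)
print(best, scores)
```

The first title matches both query terms, so it receives the highest score. The second title shares no terms with the query and scores zero.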

Owner

zuxinqi
