nlpcommon is a python Open Source Toolkit for text classification.

Overview

nlpcommon

PyPI version Contributions welcome GitHub contributors License Apache 2.0 python_vesion GitHub issues Wechat Group

nlpcommon, Python Text Tool. Python3开发。

Guide

Feature

nlpcommon is a python Open Source Toolkit for text classification. The goal is to implement text analysis algorithm, so as to achieve the use in the production environment.

nlpcommon has the characteristics of clear algorithm, high performance and customizable corpus.

Functions:

Classifier

  • LogisticRegression
  • Random Forest
  • Decision Tree
  • K-Nearest Neighbours
  • Naive bayes
  • Xgboost
  • Support Vector Machine(SVM)
  • TextCNN
  • TextRNN_Att
  • Fasttext
  • Bert

Cluster

  • MiniBatchKmeans

While providing rich functions, nlpcommon internal modules adhere to low coupling, model adherence to inert loading, dictionary publication, and easy to use.

Install

  • Requirements and Installation
pip3 install nlpcommon

or

git clone https://github.com/shibing624/nlpcommon.git
cd nlpcommon
python3 setup.py install

Usage

data

Stopwrods

examples/base_demo.py:

import sys

sys.path.append('..')
from nlpcommon import stopwords

if __name__ == '__main__':
    print(len(stopwords), stopwords)

output:

2438 {'', '大家', '孰知', '至于', './', '知道', '二话没说', '一何', '从宽', 'especially' ... }

Contact

  • Issue(建议):GitHub issues
  • 邮件我:xuming: [email protected]
  • 微信我:加我微信号:xuming624, 进Python-NLP交流群,备注:姓名-公司名-NLP

Cite

如果你在研究中使用了nlpcommon,请按如下格式引用:

@software{nlpcommon,
  author = {Xu Ming},
  title = {nlpcommon: A Tool for Text NLP},
  year = {2021},
  url = {https://github.com/shibing624/nlpcommon},
}

License

授权协议为 The Apache License 2.0,可免费用做商业用途。请在产品说明中附加nlpcommon的链接和授权协议。

Contribute

项目代码还很粗糙,如果大家对代码有所改进,欢迎提交回本项目,在提交之前,注意以下两点:

  • tests添加相应的单元测试
  • 使用python setup.py test来运行所有单元测试,确保所有单测都是通过的

之后即可提交PR。

Reference

  • pytextclassifier
Owner
xuming
Researcher, Machine Learning Developer, Advertising Risk Control.
xuming
Simple program that translates the name of files into English

Simple program that translates the name of files into English. Useful for when editing/inspecting programs that were developed in a foreign language.

0 Dec 22, 2021
NLP Overview

NLP-Overview Introduction The field of NPL encompasses a variety of topics which involve the computational processing and understanding of human langu

PeterPham 1 Jan 13, 2022
Full Spectrum Bioinformatics - a free online text designed to introduce key topics in Bioinformatics using the Python

Full Spectrum Bioinformatics is a free online text designed to introduce key topics in Bioinformatics using the Python programming language. The text is written in interactive Jupyter Notebooks, whic

Jesse Zaneveld 33 Dec 28, 2022
🤗🖼️ HuggingPics: Fine-tune Vision Transformers for anything using images found on the web.

🤗 🖼️ HuggingPics Fine-tune Vision Transformers for anything using images found on the web. Check out the video below for a walkthrough of this proje

Nathan Raw 185 Dec 21, 2022
Implemented shortest-circuit disambiguation, maximum probability disambiguation, HMM-based lexical annotation and BiLSTM+CRF-based named entity recognition

Implemented shortest-circuit disambiguation, maximum probability disambiguation, HMM-based lexical annotation and BiLSTM+CRF-based named entity recognition

0 Feb 13, 2022
ANTLR (ANother Tool for Language Recognition) is a powerful parser generator for reading, processing, executing, or translating structured text or binary files.

ANTLR (ANother Tool for Language Recognition) is a powerful parser generator for reading, processing, executing, or translating structured text or binary files.

Antlr Project 13.6k Jan 05, 2023
【原神】自动演奏风物之诗琴的程序

疯物之诗琴 读取midi并自动演奏原神风物之诗琴。 可以自定义配置文件自动调整音符来适配风物之诗琴。 (原神1.4直播那天就开始做了!到现在才能放出来。。) 如何使用 在Release页面中下载打包好的程序和midi压缩包并解压。 双击运行“疯物之诗琴.exe”。 在原神中打开风物之诗琴,软件内输入

435 Jan 04, 2023
Implementation of Memorizing Transformers (ICLR 2022), attention net augmented with indexing and retrieval of memories using approximate nearest neighbors, in Pytorch

Memorizing Transformers - Pytorch Implementation of Memorizing Transformers (ICLR 2022), attention net augmented with indexing and retrieval of memori

Phil Wang 364 Jan 06, 2023
kochat

Kochat 챗봇 빌더는 성에 안차고, 자신만의 딥러닝 챗봇 애플리케이션을 만드시고 싶으신가요? Kochat을 이용하면 손쉽게 자신만의 딥러닝 챗봇 애플리케이션을 빌드할 수 있습니다. # 1. 데이터셋 객체 생성 dataset = Dataset(ood=True) #

1 Oct 25, 2021
端到端的长本文摘要模型(法研杯2020司法摘要赛道)

端到端的长文本摘要模型(法研杯2020司法摘要赛道)

苏剑林(Jianlin Su) 334 Jan 08, 2023
Codename generator using WordNet parts of speech database

codenames Codename generator using WordNet parts of speech database References: https://possiblywrong.wordpress.com/2021/09/13/code-name-generator/ ht

possiblywrong 27 Oct 30, 2022
Study German declensions (dER nettE Mann, ein nettER Mann, mit dEM nettEN Mann, ohne dEN nettEN Mann ...) Generate as many exercises as you want using the incredible power of SPACY!

Study German declensions (dER nettE Mann, ein nettER Mann, mit dEM nettEN Mann, ohne dEN nettEN Mann ...) Generate as many exercises as you want using the incredible power of SPACY!

Hans Alemão 4 Jul 20, 2022
Pangu-Alpha for Transformers

Pangu-Alpha for Transformers Usage Download MindSpore FP32 weights for GPU from here to data/Pangu-alpha_2.6B.ckpt Activate MindSpore environment and

One 5 Oct 01, 2022
jel - Japanese Entity Linker - is Bi-encoder based entity linker for japanese.

jel: Japanese Entity Linker jel - Japanese Entity Linker - is Bi-encoder based entity linker for japanese. Usage Currently, link and question methods

izuna385 10 Jan 06, 2023
Code for the paper "VisualBERT: A Simple and Performant Baseline for Vision and Language"

This repository contains code for the following two papers: VisualBERT: A Simple and Performant Baseline for Vision and Language (arxiv) with a short

Natural Language Processing @UCLA 464 Jan 04, 2023
VMD Audio/Text control with natural language

This repository is a proof of principle for performing Molecular Dynamics analysis, in this case with the program VMD, via natural language commands.

Andrew White 13 Jun 09, 2022
DANeS is an open-source E-newspaper dataset by collaboration between DATASET JSC (dataset.vn) and AIV Group (aivgroup.vn)

DANeS - Open-source E-newspaper dataset Source: Technology vector created by macrovector - www.freepik.com. DANeS is an open-source E-newspaper datase

DATASET .JSC 64 Aug 17, 2022
Code for the ACL 2021 paper "Structural Guidance for Transformer Language Models"

Structural Guidance for Transformer Language Models This repository accompanies the paper, Structural Guidance for Transformer Language Models, publis

International Business Machines 10 Dec 14, 2022
Experiments in converting wikidata to ftm

FollowTheMoney / Wikidata mappings This repo will contain tools for converting Wikidata entities into FtM schema. Prefixes: https://www.mediawiki.org/

Friedrich Lindenberg 2 Nov 12, 2021
Spacy-ginza-ner-webapi - Named Entity Recognition API with spaCy and GiNZA

Named Entity Recognition API with spaCy and GiNZA I wrote a blog post about this

Yuki Okuda 3 Feb 27, 2022