Machine learning models from Singapore's NLP research community

Last update: Dec 17, 2022

Overview

SG-NLP

Machine learning models from Singapore's natural language processing (NLP) research community.

sgnlp is a Python package that lets you get started quickly with various NLP models implemented using the PyTorch and Transformers frameworks.

We have an accompanying demo site where you can interact with our models and get a better understanding of how they work.

Installation

Python >= 3.8

pip install sgnlp
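
Before installing, it can help to confirm that the interpreter meets the Python >= 3.8 requirement, and afterwards that the package is visible. The snippet below is a minimal sketch using only the standard library; the helper names `python_is_supported` and `installed_version` are illustrative and not part of sgnlp.

```python
import sys
from importlib import metadata

# Minimum interpreter version stated in the Installation section.
MIN_PYTHON = (3, 8)


def python_is_supported(version_info=sys.version_info) -> bool:
    """Return True if the (major, minor) version meets the minimum."""
    return tuple(version_info[:2]) >= MIN_PYTHON


def installed_version(package: str = "sgnlp"):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("Python supported:", python_is_supported())
    print("sgnlp version:", installed_version())
```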

Documentation

Visit our documentation for tutorials.

License

Code and models from this project are released under the MIT License unless otherwise stated. If a model's code is under a separate license, it can be found in the respective model's folder.

Comments

Change demo api to use gevent worker
Using multiple workers of the default 'sync' type in gunicorn does not work on Kubernetes.

Workers are constantly terminated by signal 9 (SIGKILL).

Try the gevent worker type to see whether it resolves the problem.
opened by jonheng 2
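
Switching the worker type as proposed above could look roughly like the following gunicorn configuration file. This is a sketch under assumptions: the file name, bind address, worker count and timeout are illustrative values, not the project's actual settings, and the gevent package must be installed alongside gunicorn for this worker class to load.

```python
# gunicorn.conf.py (hypothetical file name; all values are assumptions)

# Address and port the server listens on inside the container.
bind = "0.0.0.0:8000"

# Use gevent workers instead of the default "sync" worker type.
worker_class = "gevent"

# Number of worker processes; kept small here as an example value.
workers = 2

# Seconds before an unresponsive worker is killed and restarted.
timeout = 120
```

gunicorn would pick this up with `gunicorn -c gunicorn.conf.py api:app`, where `api:app` is a placeholder for the demo API's actual module and application object.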
UFD use case tutorial and usability improvement
Added additional tutorial on how to use UFD to train and evaluate on custom dataset

Bug fix for UFD parse_args_and_load_config util function

Added feature to create folder if folder doesn't exist

Added some training-argument parameters to the evaluation arguments to improve usability

Made caching optional

Added validation to make debugging easier

Added links to config file examples for RECCON models
opened by vincenttzc 1
Wrong assert comparison for SenticGCN dataclass
In the latest SenticGCN implementation on the dev branch, the __post_init__ method of SenticGCNTrainArgs in dataclass.py contains the following assertions:

assert self.repeats > 1, "Repeats value must be at least 1."
assert self.patience > 1, "Patience value must be at least 1."

The comparison operator should be >= instead. As written, a value of 1 fails the assertion even though the error message says 1 is allowed.
bug
opened by raymondng76 0
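
The SenticGCN assertion issue above can be reproduced and fixed in a small standalone dataclass. This is a sketch only: `TrainArgsSketch` is a hypothetical stand-in with just the two fields named in the issue, not the real SenticGCNTrainArgs class.

```python
from dataclasses import dataclass


@dataclass
class TrainArgsSketch:
    """Hypothetical stand-in holding only the two fields from the issue."""

    repeats: int = 1
    patience: int = 1

    def __post_init__(self):
        # ">=" accepts 1, which matches the wording of the error messages.
        assert self.repeats >= 1, "Repeats value must be at least 1."
        assert self.patience >= 1, "Patience value must be at least 1."


# A value of 1 is now accepted; 0 still raises AssertionError.
args = TrainArgsSketch(repeats=1, patience=1)
```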
47 centralized logging
Create a centralized base logger named 'sgnlp'

The 'sgnlp' logger is created from a JSON config and initialised in the 'sgnlp' module's __init__.py

Replace all direct logging method calls with calls on each script's own logger
opened by raymondng76 0
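
The centralized logging approach described above can be sketched with the standard library's `logging.config.dictConfig`. The JSON content, the function name `configure_base_logger` and the child logger name are assumptions for illustration; the project's actual config file and layout may differ.

```python
import json
import logging
import logging.config

# Illustrative JSON config; in the package this would live in a config file.
LOGGING_CONFIG_JSON = """
{
  "version": 1,
  "disable_existing_loggers": false,
  "formatters": {
    "default": {"format": "%(asctime)s %(name)s %(levelname)s %(message)s"}
  },
  "handlers": {
    "console": {"class": "logging.StreamHandler", "formatter": "default"}
  },
  "loggers": {
    "sgnlp": {"level": "INFO", "handlers": ["console"], "propagate": false}
  }
}
"""


def configure_base_logger(config_json: str = LOGGING_CONFIG_JSON) -> logging.Logger:
    """Configure logging from a JSON string and return the 'sgnlp' base logger."""
    logging.config.dictConfig(json.loads(config_json))
    return logging.getLogger("sgnlp")


# Done once, for example at package import time.
base_logger = configure_base_logger()

# Each script then asks for its own logger under the 'sgnlp' namespace
# (inside the package, logging.getLogger(__name__) gives the same effect).
logger = logging.getLogger("sgnlp.example_script")
logger.info("Script-specific logger inherits level and handlers from 'sgnlp'.")
```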
Add parent class for preprocessor
[x] Create a module named sgnlp.base

[x] Add abstractmethods for preprocess, save, load

[x] Add batch iteration to parent __call__

[x] Parent __call__ should return a dictionary

enhancement
opened by jonheng 0
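
The checklist above describes a parent class with abstract `preprocess`, `save` and `load` methods and a `__call__` that iterates in batches and returns a dictionary. A minimal sketch of that shape follows; the class names, the `batch_size` parameter and the way batch outputs are merged are assumptions, not the actual sgnlp.base implementation.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List


class PreprocessorBase(ABC):
    """Hypothetical parent class for preprocessors."""

    def __init__(self, batch_size: int = 2):
        self.batch_size = batch_size

    @abstractmethod
    def preprocess(self, batch: List[Any]) -> Dict[str, List[Any]]:
        """Convert one batch of raw inputs into named lists of features."""

    @abstractmethod
    def save(self, path: str) -> None:
        """Persist any state the preprocessor needs."""

    @abstractmethod
    def load(self, path: str) -> None:
        """Restore state previously written by save()."""

    def __call__(self, data: List[Any]) -> Dict[str, List[Any]]:
        # Iterate over the input in batches and merge the per-batch
        # dictionaries into a single dictionary of lists.
        output: Dict[str, List[Any]] = {}
        for start in range(0, len(data), self.batch_size):
            batch = data[start:start + self.batch_size]
            for key, values in self.preprocess(batch).items():
                output.setdefault(key, []).extend(values)
        return output


class WhitespacePreprocessor(PreprocessorBase):
    """Toy subclass that splits text on whitespace."""

    def preprocess(self, batch):
        return {"tokens": [text.split() for text in batch]}

    def save(self, path):
        pass  # nothing to persist in this toy example

    def load(self, path):
        pass  # nothing to restore in this toy example


result = WhitespacePreprocessor(batch_size=2)(["a b", "c", "d e f"])
```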
46 senticgcn bugfix
Add multi-word aspect support

Update documentation to reflect multi-word support

Update unit tests

Update usage example to include multi-word support
opened by raymondng76 0
Fix multi-word aspect issue with Sentic-GCN preprocessor

The current implementation of the preprocessor matches a single aspect index for the purpose of matching postprocessor output. The aspect index field in the process_input payload should be expanded to handle aspects that span multiple indexes.
bug

opened by raymondng76 0
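
To illustrate what handling multiple indexes for one aspect means, the sketch below finds every token position covered by a multi-word aspect in a tokenised sentence. It is an illustration of the idea only, with a hypothetical function name, and not the Sentic-GCN preprocessor's actual matching logic.

```python
from typing import List


def find_aspect_token_indices(tokens: List[str], aspect_tokens: List[str]) -> List[List[int]]:
    """Return one list of token indexes per occurrence of the aspect."""
    n = len(aspect_tokens)
    if n == 0:
        return []
    spans = []
    # Slide a window of the aspect's length across the sentence.
    for start in range(len(tokens) - n + 1):
        if tokens[start:start + n] == aspect_tokens:
            spans.append(list(range(start, start + n)))
    return spans


sentence = "the battery life is great".split()
multi_word = find_aspect_token_indices(sentence, ["battery", "life"])
single_word = find_aspect_token_indices(sentence, ["great"])
```

A single-word aspect yields a one-element index list, so the same field can carry both cases.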
Add Sentic-GCN demo_api to SGNlp
Close #43

This pull request adds the Sentic-GCN demo_api models to sgnlp. It includes the following components:

model_card

api.py

dockerfiles

requirements.txt

usage.py
opened by K-WeiMing 0
Add Sentic-GCN to SGNlp
close #41

This pull request adds the Sentic-GCN models to sgnlp. It includes the following components:

Models

Configs

Tokenizers

Embedding models

Trainer/Evaluator

Unit test

documentation

Does not include demo_api, as it is covered in a separate issue.
opened by raymondng76 0
download_pretrained for demo API does not cache downloaded files/models
To allow the containers to start up quicker, models and files were downloaded and cached during build time.

Recent changes in the Hugging Face transformers package have broken this functionality:

Released in v4.22.0

Issue

Possible choices moving forward:

Write a simple caching utility function

Stick to versions of transformers before 4.22.0
opened by jonheng 0
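
The first option above, a simple caching utility, might look like the following sketch. It is not the project's download_pretrained code: the function names, the hash-based file naming and the injectable `fetch` parameter are all assumptions made for illustration.

```python
import hashlib
from pathlib import Path
from typing import Callable
from urllib.request import urlopen


def _fetch_bytes(url: str) -> bytes:
    """Download a URL and return its raw content."""
    with urlopen(url) as response:
        return response.read()


def cached_download(
    url: str,
    cache_dir: str,
    fetch: Callable[[str], bytes] = _fetch_bytes,
) -> Path:
    """Return a local path for url, downloading only on a cache miss."""
    cache_path = Path(cache_dir)
    cache_path.mkdir(parents=True, exist_ok=True)
    # Name the cached file after a hash of the URL so that any URL
    # maps to a valid, stable file name.
    target = cache_path / hashlib.sha256(url.encode("utf-8")).hexdigest()
    if not target.exists():
        target.write_bytes(fetch(url))
    return target
```

Calling `cached_download` for each model file during the image build would populate the cache directory, so the same calls at container start-up find the files locally and skip the download.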
Add Stance Detection model

Paper: https://aclanthology.org/2020.emnlp-main.108.pdf

Prof: Jiang Jing from SMU

Repo: GitHub - jefferyYu/DualHierarchicalTransformer: Predicting Stance and Rumor Veracity via Dual Hierarchical Transformer

opened by atenzer 0

Releases(v0.4.0)

v0.4.0(Oct 7, 2022)

New model: Coherence Momentum Model
v0.3.0(Apr 22, 2022)
New models:

Sentic GCN

LIF

UFD

v0.2.0(Oct 19, 2021)
New models:

RST Pointer

GEC

v0.1.1(Aug 26, 2021)

Bug fix on rumour detection module paths
v0.1.0(Aug 26, 2021)

Removed UFD for further review.

Refactoring and improvements to LSR and Rumour detection models.
v0.0.1(Aug 5, 2021)
Initial release of sgnlp.

Models included:

RECCON

LSR

UFD

Rumour detection twitter


Owner

AI Singapore | AI Makerspace

Growing local AI talent and empowering start-ups, SMEs and enterprises with AI components, frameworks, platforms and advisory services.

