Machine learning models from Singapore's NLP research community

Last update: Dec 17, 2022

Overview

SG-NLP

Machine learning models from Singapore's natural language processing (NLP) research community.

sgnlp is a Python package that lets you get started quickly with various NLP models implemented with the PyTorch and Transformers frameworks.

We have an accompanying demo site where you can interact with our models and get a better understanding of how they work.

Installation

Python >= 3.8

pip install sgnlp
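
After installing, a quick sanity check can confirm that the interpreter meets the stated minimum version and that the package can be found. The sketch below uses only the standard library; the helper name environment_ready is an assumption for illustration and is not part of sgnlp.

```python
import sys
from importlib import util

MIN_PYTHON = (3, 8)  # minimum version stated in this README


def environment_ready(package: str = "sgnlp") -> bool:
    """Return True if Python is new enough and `package` can be located.

    Hypothetical helper for illustration; it does not import the package,
    it only checks that an import would find it.
    """
    python_ok = sys.version_info >= MIN_PYTHON
    package_found = util.find_spec(package) is not None
    return python_ok and package_found


if __name__ == "__main__":
    print("Python:", sys.version.split()[0])
    print("sgnlp ready:", environment_ready())
```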

Documentation

Visit our documentation for tutorials.

License

Code and models from this project are released under the MIT License unless otherwise stated. If a model's code is under a separate license, it can be found in the respective model's folder.

Comments

Change demo api to use gevent worker
Using multiple workers of the default type 'sync' in gunicorn is not working on Kubernetes

Workers constantly terminated due to signal 9

Try gevent to see if it works out
opened by jonheng 2
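
For the gevent change described above, a gunicorn configuration file might look roughly like the sketch below. gunicorn reads settings such as worker_class and workers from a Python config file; the file name, port, and numeric values here are assumptions rather than the project's actual settings, and the gevent package must be installed for this worker class to load.

```python
# gunicorn.conf.py (hypothetical file name and values for the demo API)

bind = "0.0.0.0:8000"      # assumed address and port
workers = 2                # number of worker processes (assumed)
worker_class = "gevent"    # replaces the default "sync" worker type
worker_connections = 100   # max simultaneous clients per gevent worker (assumed)
timeout = 120              # seconds before an unresponsive worker is restarted (assumed)
```

The same choice can also be made on the command line with gunicorn's -k gevent option.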
UFD use case tutorial and usability improvement
Added a tutorial on how to use UFD to train and evaluate on a custom dataset

Bug fix for UFD parse_args_and_load_config util function

Added feature to create folder if folder doesn't exist

Added some train args param in eval args param to improve usability

Made caching optional

Added validation to make debugging easier

Added links to config file examples for RECCON models
opened by vincenttzc 1
Wrong assert comparison for SenticGCN dataclass
This concerns the latest SenticGCN implementation on the dev branch. In dataclass.py, the __post_init__ method of SenticGCNTrainArgs contains the following assertions:

assert self.repeats > 1, "Repeats value must be at least 1."
assert self.patience > 1, "Patience value must be at least 1."

The comparison operator should be >= instead.
bug
opened by raymondng76 0
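
The fix proposed in the issue above can be illustrated with a stripped-down dataclass. TrainArgsSketch is a hypothetical stand-in that carries only the two fields mentioned in the issue; it is not the real SenticGCNTrainArgs, and the default values are assumptions.

```python
from dataclasses import dataclass


@dataclass
class TrainArgsSketch:
    """Hypothetical stand-in with only the two fields named in the issue."""

    repeats: int = 1   # assumed default
    patience: int = 1  # assumed default

    def __post_init__(self):
        # ">=" makes the check agree with the error message:
        # a value of exactly 1 is now accepted.
        assert self.repeats >= 1, "Repeats value must be at least 1."
        assert self.patience >= 1, "Patience value must be at least 1."
```

With the original > comparison, TrainArgsSketch(repeats=1) would fail even though the message says 1 is allowed.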
47 centralized logging
Create a centralized logger based on the 'sgnlp' base logger

The 'sgnlp' logger is created from a JSON config and is initialised in the 'sgnlp' module's __init__.py

Replace all direct logging method calls with calls on each script's own logger
opened by raymondng76 0
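
The centralized logging approach described above could be sketched with the standard logging.config module as follows. The JSON content, the formatter, and the module path sgnlp.models.example are assumptions for illustration; the project's actual config file may differ.

```python
import json
import logging
import logging.config

# Hypothetical config; in the pull request this is described as a JSON file.
LOGGING_CONFIG_JSON = """
{
  "version": 1,
  "disable_existing_loggers": false,
  "formatters": {
    "simple": {"format": "%(asctime)s %(name)s %(levelname)s %(message)s"}
  },
  "handlers": {
    "console": {"class": "logging.StreamHandler", "formatter": "simple", "level": "INFO"}
  },
  "loggers": {
    "sgnlp": {"handlers": ["console"], "level": "INFO", "propagate": false}
  }
}
"""


def setup_base_logger() -> logging.Logger:
    """Configure the 'sgnlp' base logger once, e.g. from the package's __init__.py."""
    logging.config.dictConfig(json.loads(LOGGING_CONFIG_JSON))
    return logging.getLogger("sgnlp")


base_logger = setup_base_logger()

# Each script then asks for its own child logger (usually via __name__),
# which inherits the level and handlers of the 'sgnlp' base logger.
module_logger = logging.getLogger("sgnlp.models.example")  # hypothetical module path
module_logger.info("handled by the 'sgnlp' base logger")
```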
Add parent class for preprocessor
[x] Create a module named sgnlp.base

[x] Add abstractmethods for preprocess, save, load

[x] Add batch iteration to parent __call__

[x] Parent __call__ should return a dictionary

enhancement
opened by jonheng 0
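
The checklist above can be pictured as an abstract base class. This is a minimal sketch under stated assumptions: the class names, the batch_size parameter, and the method signatures are hypothetical and are not taken from sgnlp.base.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List


class PreprocessorBase(ABC):
    """Hypothetical parent class; names and signatures are assumptions."""

    def __init__(self, batch_size: int = 2):
        self.batch_size = batch_size

    @abstractmethod
    def preprocess(self, batch: List[str]) -> Dict[str, List[Any]]:
        """Turn one batch of raw inputs into a dictionary of features."""

    @abstractmethod
    def save(self, path: str) -> None:
        """Persist any state the preprocessor needs."""

    @abstractmethod
    def load(self, path: str) -> None:
        """Restore state written by save()."""

    def __call__(self, data: List[str]) -> Dict[str, List[Any]]:
        # Iterate over the input in batches and merge the per-batch
        # dictionaries into a single dictionary.
        output: Dict[str, List[Any]] = {}
        for start in range(0, len(data), self.batch_size):
            batch_output = self.preprocess(data[start:start + self.batch_size])
            for key, values in batch_output.items():
                output.setdefault(key, []).extend(values)
        return output


class WhitespacePreprocessor(PreprocessorBase):
    """Toy subclass used only to exercise the parent __call__."""

    def preprocess(self, batch: List[str]) -> Dict[str, List[Any]]:
        return {"tokens": [text.split() for text in batch]}

    def save(self, path: str) -> None:
        pass  # nothing to persist in this toy example

    def load(self, path: str) -> None:
        pass
```

Because the three methods are abstract, instantiating PreprocessorBase directly raises a TypeError, so every concrete preprocessor has to provide them.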
46 senticgcn bugfix
Add multi-word aspect support

Update documentation to reflect multi-word support

Update unit tests

Update usage example to include multi-word support
opened by raymondng76 0
Fix multi-word aspect issue with Sentic-GCN preprocessor

The current preprocessor implementation matches only a single aspect index when aligning with the postprocessor output. The aspect index field in the process_input payload should be expanded to handle aspects that span multiple indexes.
bug

opened by raymondng76 0
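
To illustrate what multiple indexes per aspect means in the issue above, the sketch below locates every occurrence of a possibly multi-word aspect in a tokenised sentence and returns all of its token indexes. The function name and the whitespace tokenisation are assumptions; the real preprocessor may tokenise and match differently.

```python
from typing import List


def find_aspect_spans(tokens: List[str], aspect: str) -> List[List[int]]:
    """Return one list of token indexes per occurrence of `aspect`.

    Hypothetical helper: the aspect is split on whitespace and matched
    case-insensitively against consecutive tokens.
    """
    aspect_tokens = aspect.lower().split()
    width = len(aspect_tokens)
    if width == 0:
        return []
    lowered = [token.lower() for token in tokens]
    spans = []
    for start in range(len(lowered) - width + 1):
        if lowered[start:start + width] == aspect_tokens:
            spans.append(list(range(start, start + width)))
    return spans
```

A single-word aspect yields one index per occurrence, while a two-word aspect such as "battery life" yields a pair of consecutive indexes.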
Add Sentic-GCN demo_api to SGNlp
Close #43

This pull request adds the Sentic-GCN demo_api models to sgnlp. It includes the following components:

model_card

api.py

dockerfiles

requirements.txt

usage.py
opened by K-WeiMing 0
Add Sentic-GCN to SGNlp
close #41

This pull request adds the Sentic-GCN models to sgnlp. It includes the following components:

Models

Configs

Tokenizers

Embedding models

Trainer/Evaluator

Unit test

documentation

It does not include demo_api, which is covered in a separate issue.
opened by raymondng76 0
download_pretrained for demo API does not cache downloaded files/models
To allow the containers to start up quicker, models and files were downloaded and cached during build time.

Recent changes in the Hugging Face transformers package have broken this functionality:

Released in v4.22.0

Issue

Possible choices moving forward:

Write a simple caching utility function

Stick to versions of transformers before 4.22.0
opened by jonheng 0
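
The first option above, a simple caching utility, could look something like the sketch below. It is not the project's download_pretrained; the function name, the hashed file naming, and the injectable fetch parameter are assumptions made for illustration, and only the standard library is used.

```python
import hashlib
import urllib.request
from pathlib import Path
from typing import Callable


def cached_download(
    url: str,
    cache_dir: str,
    fetch: Callable[[str, str], object] = urllib.request.urlretrieve,
) -> Path:
    """Download `url` into `cache_dir` unless a cached copy already exists.

    Hypothetical helper: the cache key is the SHA-256 of the URL, and
    `fetch(url, filename)` can be swapped out (for example in tests).
    """
    cache_path = Path(cache_dir)
    cache_path.mkdir(parents=True, exist_ok=True)

    suffix = Path(url.split("?")[0]).suffix  # keep e.g. ".json" or ".bin"
    key = hashlib.sha256(url.encode("utf-8")).hexdigest()
    target = cache_path / (key + suffix)

    if not target.exists():
        # Write to a temporary name first so an interrupted download
        # is never mistaken for a complete cached file.
        partial = target.with_name(target.name + ".part")
        fetch(url, str(partial))
        partial.replace(target)
    return target
```

Calling this for each model file during the image build would populate the cache directory, so containers started from that image find the files locally instead of downloading them again.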
Add Stance Detection model

Paper: https://aclanthology.org/2020.emnlp-main.108.pdf

Prof: Jiang Jing from SMU

Repo: GitHub - jefferyYu/DualHierarchicalTransformer: Predicting Stance and Rumor Veracity via Dual Hierarchical Transformer

opened by atenzer 0

Releases(v0.4.0)

v0.4.0(Oct 7, 2022)

New model: Coherence Momentum Model
v0.3.0(Apr 22, 2022)
New models:

Sentic GCN

LIF

UFD

v0.2.0(Oct 19, 2021)
New models:

RST Pointer

GEC

v0.1.1(Aug 26, 2021)

Bug fix on rumour detection module paths
v0.1.0(Aug 26, 2021)

Removed UFD for further review.

Refactoring and improvements to LSR and Rumour detection models.
v0.0.1(Aug 5, 2021)
Initial release of sgnlp.

Models included:

RECCON

LSR

UFD

Rumour detection twitter


Owner

AI Singapore | AI Makerspace

Growing local AI talent and empowering start-ups, SMEs and enterprises with AI components, frameworks, platforms and advisory services.
