Basic yet complete Machine Learning pipeline for NLP tasks

Last update: Aug 22, 2022

Overview

This repository accompanies the article on building basic yet complete ML pipelines for solving NLP tasks.

Requirements

Docker

telnet

Refer to the installation instructions for your system if either tool is missing.

Running the pipeline

The whole pipeline consists of four services (mail server, database, prediction service and orchestrator) and can be started with one command:

docker-compose -f docker-compose.yaml up --build

Once the containers are up, log messages from the services should start to appear in the terminal.

Sending an email

The pipeline is triggered by an unread email appearing in the mailbox. To send one, the telnet utility can be used.

Connecting to the SMTP port of the mail server: telnet localhost 3025

Sending the email with telnet:

EHLO user
MAIL FROM:<[email protected]>
RCPT TO:<user>
DATA
Subject: Hello World

Hello!

She works at Apple now but before that she worked at Microsoft.
.
QUIT
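The same message can also be sent from a script instead of an interactive telnet session. Below is a minimal sketch using Python's standard smtplib and email modules. It assumes the mail server accepts unauthenticated, unencrypted SMTP on localhost:3025, as the telnet session suggests. The function names and the sender address sender@example.com are placeholders, not values taken from the repository.

```python
import smtplib
from email.message import EmailMessage


def build_message(sender: str, recipient: str) -> EmailMessage:
    """Build the same test email that the telnet session sends."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "Hello World"
    msg.set_content(
        "Hello!\n\n"
        "She works at Apple now but before that she worked at Microsoft.\n"
    )
    return msg


def send_message(msg: EmailMessage, host: str = "localhost", port: int = 3025) -> None:
    """Deliver the message over plain SMTP (assumed: no TLS, no login)."""
    with smtplib.SMTP(host, port) as smtp:
        smtp.send_message(msg)
```

With the pipeline running, calling send_message(build_message("sender@example.com", "user")) should place one unread message in the mailbox of the recipient user, just like the manual session.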

If everything went well, output similar to the following should appear in the logs:

orchestrator_1                   | Polling mailbox...
prediction-worker_1              | INFO:     172.19.0.5:55294 - "POST /predict HTTP/1.1" 200 OK
orchestrator_1                   | Recorded to DB with id=34: [{'entity_text': 'Apple', 'start': 24, 'end': 29}, {'entity_text': 'Microsoft', 'start': 58, 'end': 67}]
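Each recorded entity carries an entity_text plus start and end character offsets. In the log line above, end minus start equals the length of the entity text, so the offsets appear to be end-exclusive, the same convention as Python slicing. The sketch below checks that assumption on a sample string. The helper name check_spans is hypothetical, and the offsets are computed for this exact string, so they differ from the logged values, which depend on the full message body the service received.

```python
from typing import Dict, List, Union

Entity = Dict[str, Union[str, int]]


def check_spans(text: str, entities: List[Entity]) -> bool:
    """Return True if every text[start:end] slice equals its entity_text."""
    return all(
        text[e["start"]:e["end"]] == e["entity_text"]
        for e in entities
    )


# Sample sentence only; offsets below are valid for this string alone.
text = "She works at Apple now but before that she worked at Microsoft."
entities = [
    {"entity_text": "Apple", "start": 13, "end": 18},
    {"entity_text": "Microsoft", "start": 53, "end": 62},
]
spans_ok = check_spans(text, entities)
```

A check like this is a cheap way to catch off-by-one errors when the stored offsets are later used to highlight entities in the original email.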

Checking the result

The result should also be recorded in the database. To check that, connect with any DB client using the following parameters:

host: localhost
port: 5432
database: maildb
username: pguser
password: password

Then run the query SELECT * FROM mail LIMIT 10.
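The check can also be scripted. The sketch below assumes the database is PostgreSQL (port 5432 suggests this, but the README does not state it) and uses the third-party psycopg2 driver, which must be installed separately. The names CONN_PARAMS, QUERY and fetch_recent_rows are illustrative and do not come from the repository.

```python
# Connection parameters as listed above; psycopg2 calls the database "dbname"
# and the username "user".
CONN_PARAMS = {
    "host": "localhost",
    "port": 5432,
    "dbname": "maildb",
    "user": "pguser",
    "password": "password",
}
QUERY = "SELECT * FROM mail LIMIT 10"


def fetch_recent_rows(params=CONN_PARAMS, query=QUERY):
    """Run the check query and return the rows as a list of tuples."""
    # Imported lazily so the module loads even without the driver installed
    # (assumed install command: pip install psycopg2-binary).
    import psycopg2

    conn = psycopg2.connect(**params)
    try:
        with conn.cursor() as cur:
            cur.execute(query)
            return cur.fetchall()
    finally:
        conn.close()
```

With the pipeline running and at least one email processed, fetch_recent_rows() should return a non-empty list.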

Owner

Ivan
