Basic yet complete Machine Learning pipeline for NLP tasks

Last update: Aug 22, 2022

Related tags

Text Data & NLP ml-pipeline

Overview

This repository accompanies the article on building basic yet complete ML pipelines for solving NLP tasks.

Requirements

Docker

telnet

Please refer to the installation instructions for your system if needed.

Running the pipeline

The whole pipeline of 4 services (mail server, database, prediction service and orchestrator) can be started with one command:

docker-compose -f docker-compose.yaml up --build

It should start printing log messages from the services.

Sending an email

The pipeline is triggered by an unread email appearing in the mailbox. To send one, you can use the telnet utility.

Connecting to the SMTP port of the mail server: telnet localhost 3025

Sending the email with telnet:

EHLO user
MAIL FROM:<[email protected]>
RCPT TO:<user>
DATA
Subject: Hello World
 
Hello!

She works at Apple now but before that she worked at Microsoft.
.
QUIT
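The same message can also be sent from a script instead of typing the SMTP commands by hand. The following is a minimal sketch using Python's standard smtplib, assuming the mail server accepts unauthenticated SMTP on localhost:3025 as in the telnet session above. The function names and the sender address sender@example.com are placeholders chosen for illustration and are not part of the repository.

```python
import smtplib
from email.message import EmailMessage

SMTP_HOST = "localhost"
SMTP_PORT = 3025  # same port as in the telnet session


def build_message(sender: str, recipient: str) -> EmailMessage:
    """Build the same test email that the telnet session sends."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "Hello World"
    msg.set_content(
        "Hello!\n\n"
        "She works at Apple now but before that she worked at Microsoft.\n"
    )
    return msg


def send(msg: EmailMessage, sender: str, recipient: str,
         host: str = SMTP_HOST, port: int = SMTP_PORT) -> None:
    """Deliver the message; the envelope addresses are passed explicitly."""
    with smtplib.SMTP(host, port) as smtp:
        smtp.send_message(msg, from_addr=sender, to_addrs=[recipient])
```

With the pipeline running, calling send(build_message("sender@example.com", "user"), "sender@example.com", "user") should have the same effect as the telnet session.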

If everything went well, something like this should appear in the logs:

orchestrator_1                   | Polling mailbox...
prediction-worker_1              | INFO:     172.19.0.5:55294 - "POST /predict HTTP/1.1" 200 OK
orchestrator_1                   | Recorded to DB with id=34: [{'entity_text': 'Apple', 'start': 24, 'end': 29}, {'entity_text': 'Microsoft', 'start': 58, 'end': 67}]
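Each recorded entity carries its text plus start and end positions. In the log line above, end minus start equals the length of entity_text, which suggests half-open character offsets into the analyzed text; this is an inference from the log and not something the repository states. Under that assumption, a small sketch of how such records could be checked (the helper names are hypothetical):

```python
from typing import Any, Dict, List

Entity = Dict[str, Any]


def spans_are_consistent(entities: List[Entity]) -> bool:
    """True if every span length matches the length of its entity text."""
    return all(e["end"] - e["start"] == len(e["entity_text"]) for e in entities)


def slice_entity(text: str, entity: Entity) -> str:
    """Return the substring the entity points at, assuming half-open offsets."""
    return text[entity["start"]:entity["end"]]


# Records copied from the example log line.
logged = [
    {"entity_text": "Apple", "start": 24, "end": 29},
    {"entity_text": "Microsoft", "start": 58, "end": 67},
]
```

Here spans_are_consistent(logged) holds for the logged records, and slice_entity can be used to confirm that an offset pair really points at the entity inside a given text.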

Checking the result

The result should also be recorded in the database. To check, connect with any DB client using the following connection parameters:

host: localhost
port: 5432
database: maildb
username: pguser
password: password

Then run the query SELECT * FROM mail LIMIT 10.
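The check can also be scripted. Below is a minimal sketch that assumes the database is PostgreSQL (the port 5432 suggests this) and that the third-party psycopg2 package is installed; neither is stated by the repository. The connection values and the table name mail are the ones listed above, while the function name is hypothetical.

```python
from typing import Any, Dict, List, Tuple

# Connection parameters from this README, using psycopg2's keyword names.
DB_PARAMS: Dict[str, Any] = {
    "host": "localhost",
    "port": 5432,
    "dbname": "maildb",
    "user": "pguser",
    "password": "password",
}


def fetch_recent_mail(limit: int = 10) -> List[Tuple[Any, ...]]:
    """Return up to `limit` rows from the mail table."""
    import psycopg2  # third-party; imported here so the module loads without it

    conn = psycopg2.connect(**DB_PARAMS)
    try:
        with conn.cursor() as cur:
            cur.execute("SELECT * FROM mail LIMIT %s", (limit,))
            return cur.fetchall()
    finally:
        conn.close()
```

With the pipeline running and at least one email processed, iterating over fetch_recent_mail() and printing each row should show the recorded entries.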

Owner

Ivan
