Deep learning for NLP crash course at ABBYY.

Overview

Deep NLP Course at ABBYY

Deep learning for NLP crash course at ABBYY.

Suggested textbook: Neural Network Methods in Natural Language Processing by Yoav Goldberg

I'm gradually updating and translating the notebooks right now. Stay in touch.

Materials

Week 1: Introduction

Sentiment analysis on the IMDB movie review dataset: a short overview of classical machine learning for NLP + indecently brief intro to keras.

Russian version: Open In Colab

Updated English version: Open In Colab

Week 2: Word Embeddings: Part 1

Meet the Word Embeddings: an unsupervised method to capture some fun relationships between words.
Phrases similarity with word embeddings model + word based machine translation without parallel data (with MUSE word embeddings).

Russian version: Open In Colab

Updated English version: Open In Colab

Week 3: Word Embeddings: Part 2

Introduction to PyTorch. Implementation of pet linear regression on pure numpy and pytorch. Implementations of CBoW, skip-gram, negative sampling and structured Word2vec models.

Russian version: Open In Colab

Updated English version: Open In Colab

Week 4: Convolutional Neural Networks

Introduction to convolutional networks. Relations between convolutions and n-grams. Simple surname detector on character-level convolutions + fun visualizations.

Russian version: Open In Colab

Updated English version: Open In Colab

Week 5: RNNs: Part 1

RNNs for text classification. Simple RNN implementation + memorization test. Surname detector in multilingual setup: character-level LSTM classifier.

Russian version: Open In Colab

Updated English version: Open In Colab

Week 6: RNNs: Part 2

RNNs for sequence labelling. Part-of-speech tagger implementations based on word embeddings and character-level word embeddings.

Russian version: Open In Colab

Week 7: Language Models: Part 1

Character-level language model for Russian troll tweets generation: fixed-window model via convolutions and RNN model.
Simple conditional language model: surname generation given source language.
And Toxic Comment Classification Challenge - to apply your skills to a real-world problem.

Russian version: Open In Colab

Week 8: Language Models: Part 2

Word-level language model for poetry generation. Pet examples of transfer learning and multi-task learning applied to language models.

Russian version: Open In Colab

Week 9: Seq2seq

Seq2seq for machine translation and image captioning. Byte-pair encoding, beam search and other usefull stuff for machine translation.

Russian version: Open In Colab

Week 10: Seq2seq with Attention

Seq2seq with attention for machine translation and image captioning.

Russian version: Open In Colab

Week 11: Transformers & Text Summarization

Implementation of Transformer model for text summarization. Discussion of Pointer-Generator Networks for text summarization.

Russian version: Open In Colab

Week 12: Dialogue Systems: Part 1

Goal-orientied dialogue systems. Implemention of the multi-task model: intent classifier and token tagger for dialogue manager.

Russian version: Open In Colab

Week 13: Dialogue Systems: Part 2

General conversation dialogue systems and DSSMs. Implementation of question answering model on SQuAD dataset and chit-chat model on OpenSubtitles dataset.

Russian version: Open In Colab

Week 14: Pretrained Models

Pretrained models for various tasks: Universal Sentence Encoder for sentence similarity, ELMo for sequence tagging (with a bit of CRF), BERT for SWAG - reasoning about possible continuation.

Russian version: Open In Colab

Final Presentation

NLP Summary - summary of cool stuff that appeared and didn't in the course.

Owner
Dan Anastasyev
Dan Anastasyev
Idea is to build a model which will take keywords as inputs and generate sentences as outputs.

keytotext Idea is to build a model which will take keywords as inputs and generate sentences as outputs. Potential use case can include: Marketing Sea

Gagan Bhatia 364 Jan 03, 2023
Named-entity recognition using neural networks. Easy-to-use and state-of-the-art results.

NeuroNER NeuroNER is a program that performs named-entity recognition (NER). Website: neuroner.com. This page gives step-by-step instructions to insta

Franck Dernoncourt 1.6k Dec 27, 2022
Deploying a Text Summarization NLP use case on Docker Container Utilizing Nvidia GPU

GPU Docker NLP Application Deployment Deploying a Text Summarization NLP use case on Docker Container Utilizing Nvidia GPU, to setup the enviroment on

Ritesh Yadav 9 Oct 14, 2022
Research code for "What to Pre-Train on? Efficient Intermediate Task Selection", EMNLP 2021

efficient-task-transfer This repository contains code for the experiments in our paper "What to Pre-Train on? Efficient Intermediate Task Selection".

AdapterHub 26 Dec 24, 2022
🐍 A hyper-fast Python module for reading/writing JSON data using Rust's serde-json.

A hyper-fast, safe Python module to read and write JSON data. Works as a drop-in replacement for Python's built-in json module. This is alpha software

Matthias 479 Jan 01, 2023
Code for paper "Which Training Methods for GANs do actually Converge? (ICML 2018)"

GAN stability This repository contains the experiments in the supplementary material for the paper Which Training Methods for GANs do actually Converg

Lars Mescheder 884 Nov 11, 2022
An end to end ASR Transformer model training repo

END TO END ASR TRANSFORMER 本项目基于transformer 6*encoder+6*decoder的基本结构构造的端到端的语音识别系统 Model Instructions 1.数据准备: 自行下载数据,遵循文件结构如下: ├── data │ ├── train │

旷视天元 MegEngine 10 Jul 19, 2022
REST API for sentence tokenization and embedding using Multilingual Universal Sentence Encoder.

What is MUSE? MUSE stands for Multilingual Universal Sentence Encoder - multilingual extension (16 languages) of Universal Sentence Encoder (USE). MUS

Dani El-Ayyass 47 Sep 05, 2022
Model for recasing and repunctuating ASR transcripts

Recasing and punctuation model based on Bert Benoit Favre 2021 This system converts a sequence of lowercase tokens without punctuation to a sequence o

Benoit Favre 88 Dec 29, 2022
Research Code for NeurIPS 2020 Spotlight paper "Large-Scale Adversarial Training for Vision-and-Language Representation Learning": UNITER adversarial training part

VILLA: Vision-and-Language Adversarial Training This is the official repository of VILLA (NeurIPS 2020 Spotlight). This repository currently supports

Zhe Gan 109 Dec 31, 2022
Sentence Embeddings with BERT & XLNet

Sentence Transformers: Multilingual Sentence Embeddings using BERT / RoBERTa / XLM-RoBERTa & Co. with PyTorch This framework provides an easy method t

Ubiquitous Knowledge Processing Lab 9.1k Jan 02, 2023
Legal text retrieval for python

legal-text-retrieval Overview This system contains 2 steps: generate training data containing negative sample found by mixture score of cosine(tfidf)

Nguyễn Minh Phương 22 Dec 06, 2022
Ongoing research training transformer language models at scale, including: BERT & GPT-2

Megatron (1 and 2) is a large, powerful transformer developed by the Applied Deep Learning Research team at NVIDIA.

NVIDIA Corporation 3.5k Dec 30, 2022
Predicting the usefulness of reviews given the review text and metadata surrounding the reviews.

Predicting Yelp Review Quality Table of Contents Introduction Motivation Goal and Central Questions The Data Data Storage and ETL EDA Data Pipeline Da

Jeff Johannsen 3 Nov 27, 2022
中文生成式预训练模型

T5 PEGASUS 中文生成式预训练模型,以mT5为基础架构和初始权重,通过类似PEGASUS的方式进行预训练。 详情可见:https://kexue.fm/archives/8209 Tokenizer 我们将T5 PEGASUS的Tokenizer换成了BERT的Tokenizer,它对中文更

410 Jan 03, 2023
Kurumi ChatBot

KurumiChatBot Just another Telegram AI chat bot written in Python using Pyrogram. A public running instance can be found on telegram as @TokisakiChatB

Yoga Pranata 3 Jun 28, 2022
Multilingual word vectors in 78 languages

Aligning the fastText vectors of 78 languages Facebook recently open-sourced word vectors in 89 languages. However these vectors are monolingual; mean

Babylon Health 1.2k Dec 17, 2022
Pytorch-version BERT-flow: One can apply BERT-flow to any PLM within Pytorch framework.

Pytorch-version BERT-flow: One can apply BERT-flow to any PLM within Pytorch framework.

Ubiquitous Knowledge Processing Lab 59 Dec 01, 2022
💬 Open source machine learning framework to automate text- and voice-based conversations: NLU, dialogue management, connect to Slack, Facebook, and more - Create chatbots and voice assistants

Rasa Open Source Rasa is an open source machine learning framework to automate text-and voice-based conversations. With Rasa, you can build contextual

Rasa 15.3k Dec 30, 2022
Knowledge Graph,Question Answering System,基于知识图谱和向量检索的医疗诊断问答系统

Knowledge Graph,Question Answering System,基于知识图谱和向量检索的医疗诊断问答系统

wangle 823 Dec 28, 2022