Interpretable Models for NLP using PyTorch

Overview

This repo is deprecated. Please find the updated package here.

https://github.com/EdGENetworks/anuvada

Anuvada: Interpretable Models for NLP using PyTorch

One of the common criticisms of deep learning has been it's black box nature. To address this issue, researchers have developed many ways to visualise and explain the inference. Some examples would be attention in the case of RNN's, activation maps, guided back propagation and occlusion (in the case of CNN's). This library is an ongoing effort to provide a high-level access to such models relying on PyTorch.

Installing

Clone this repo and add it to your python library path.

Getting started

Importing libraries

import anuvada
import numpy as np
import torch
import pandas as pd
from anuvada.models.classification_attention_rnn import AttentionClassifier

Creating the dataset

from anuvada.datasets.data_loader import CreateDataset
from anuvada.datasets.data_loader import LoadData
data = CreateDataset()
df = pd.read_csv('MovieSummaries/movie_summary_filtered.csv')
# passing only the first 512 samples, I don't have a GPU!
y = list(df.Genre.values)[0:512]
x = list(df.summary.values)[0:512]
x, y = data.create_dataset(x,y, folder_path='test', max_doc_tokens=500)

Loading created dataset

l = LoadData()
x, y, token2id, label2id, lengths_mask = l.load_data_from_path('test')

Change into torch vectors

x = torch.from_numpy(x)
y = torch.from_numpy(y)

Create attention classifier

acf = AttentionClassifier(vocab_size=len(token2id),embed_size=25,gru_hidden=25,n_classes=len(label2id))
loss = acf.fit(x,y, lengths_mask ,epochs=5)
Epoch 1 / 5
[========================================] 100%	loss: 3.9904loss: 3.9904

Epoch 2 / 5
[========================================] 100%	loss: 3.9851loss: 3.9851

Epoch 3 / 5
[========================================] 100%	loss: 3.9783loss: 3.9783

Epoch 4 / 5
[========================================] 100%	loss: 3.9739loss: 3.9739

Epoch 5 / 5
[========================================] 100%	loss: 3.9650loss: 3.9650

To do list

  • Implement Attention with RNN
  • Implement Attention Visualisation
  • Implement working Fit Module
  • Implement support for masking gradients in RNN (Working now!)
  • Implement a generic data set loader
  • Implement CNN Classifier with feature map visualisation

Acknowledgments

Owner
Sandeep Tammu
Data Scientist.
Sandeep Tammu
ConvBERT: Improving BERT with Span-based Dynamic Convolution

ConvBERT Introduction In this repo, we introduce a new architecture ConvBERT for pre-training based language model. The code is tested on a V100 GPU.

YITUTech 237 Dec 10, 2022
Web mining module for Python, with tools for scraping, natural language processing, machine learning, network analysis and visualization.

Pattern Pattern is a web mining module for Python. It has tools for: Data Mining: web services (Google, Twitter, Wikipedia), web crawler, HTML DOM par

Computational Linguistics Research Group 8.4k Dec 30, 2022
Contact Extraction with Question Answering.

contactsQA Extraction of contact entities from address blocks and imprints with Extractive Question Answering. Goal Input: Dr. Max Mustermann Hauptstr

Jan 2 Apr 20, 2022
Production First and Production Ready End-to-End Keyword Spotting Toolkit

Production First and Production Ready End-to-End Keyword Spotting Toolkit

223 Jan 02, 2023
Python api wrapper for JellyFish Lights

Python api wrapper for JellyFish Lights The hope is to make this a pip installable package Current capabalilities: Connects to a local JellyFish Light

10 Dec 18, 2022
A number of methods in order to perform Natural Language Processing on live data derived from Twitter

A number of methods in order to perform Natural Language Processing on live data derived from Twitter

1 Nov 24, 2021
Technique for Order of Preference by Similarity to Ideal Solution (TOPSIS)

TOPSIS implementation in Python Technique for Order of Preference by Similarity to Ideal Solution (TOPSIS) CHING-LAI Hwang and Yoon introduced TOPSIS

Hamed Baziyad 8 Dec 10, 2022
Train BPE with fastBPE, and load to Huggingface Tokenizer.

BPEer Train BPE with fastBPE, and load to Huggingface Tokenizer. Description The BPETrainer of Huggingface consumes a lot of memory when I am training

Lizhuo 1 Dec 23, 2021
NLP codes implemented with Pytorch (w/o library such as huggingface)

NLP_scratch NLP codes implemented with Pytorch (w/o library such as huggingface) scripts ├── models: Neural Network models ├── data: codes for dataloa

3 Dec 28, 2021
Transformers implementation for Fall 2021 Clinic

Installation Download miniconda3 if not already installed You can check by running typing conda in command prompt. Use conda to create an environment

Aakash Tripathi 1 Oct 28, 2021
Python generation script for BitBirds

BitBirds generation script Intro This is published under MIT license, which means you can do whatever you want with it - entirely at your own risk. Pl

286 Dec 06, 2022
ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators

ELECTRA Introduction ELECTRA is a method for self-supervised language representation learning. It can be used to pre-train transformer networks using

Google Research 2.1k Dec 28, 2022
A collection of Classical Chinese natural language processing models, including Classical Chinese related models and resources on the Internet.

GuwenModels: 古文自然语言处理模型合集, 收录互联网上的古文相关模型及资源. A collection of Classical Chinese natural language processing models, including Classical Chinese related models and resources on the Internet.

Ethan 66 Dec 26, 2022
Code associated with the "Data Augmentation using Pre-trained Transformer Models" paper

Data Augmentation using Pre-trained Transformer Models Code associated with the Data Augmentation using Pre-trained Transformer Models paper Code cont

44 Dec 31, 2022
🤗 The largest hub of ready-to-use NLP datasets for ML models with fast, easy-to-use and efficient data manipulation tools

🤗 The largest hub of ready-to-use NLP datasets for ML models with fast, easy-to-use and efficient data manipulation tools

Hugging Face 15k Jan 02, 2023
QVHighlights: Detecting Moments and Highlights in Videos via Natural Language Queries

Moment-DETR QVHighlights: Detecting Moments and Highlights in Videos via Natural Language Queries Jie Lei, Tamara L. Berg, Mohit Bansal For dataset de

Jie Lei 雷杰 133 Dec 22, 2022
nlabel is a library for generating, storing and retrieving tagging information and embedding vectors from various nlp libraries through a unified interface.

nlabel is a library for generating, storing and retrieving tagging information and embedding vectors from various nlp libraries through a unified interface.

Bernhard Liebl 2 Jun 10, 2022
Cải thiện Elasticsearch trong bài toán semantic search sử dụng phương pháp Sentence Embeddings

Cải thiện Elasticsearch trong bài toán semantic search sử dụng phương pháp Sentence Embeddings Trong bài viết này mình sẽ sử dụng pretrain model SimCS

Vo Van Phuc 18 Nov 25, 2022
PyJPBoatRace: Python-based Japanese boatrace tools 🚤

pyjpboatrace :speedboat: provides you with useful tools for data analysis and auto-betting for boatrace.

5 Oct 29, 2022