The Easy-to-use Dialogue Response Selection Toolkit for Researchers

Last update: Nov 13, 2022

Related tags

Deep Learning SimpleReDial-v1

Overview

An easy-to-use toolkit for retrieval-based chatbots.

Recent Activity

Our released RRS corpus can be found here.
Our released BERT-FP post-training checkpoint for the RRS corpus can be found here.

How to Use

Init the repo

Before using the repo, run the following commands to initialize it:

# create the necessary folders
python init.py

# prepare the environment
# if a package cannot be installed this way, install it manually by other means
pip install -r requirements.txt

train the model

./scripts/train.sh <dataset_name> <model_name> <cuda_ids>

test the model [rerank]

./scripts/test_rerank.sh <dataset_name> <model_name> <cuda_id>
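The exact metrics printed by test_rerank.sh are not listed here. As a rough sketch, assuming each test context comes with a list of candidate scores and 0/1 relevance labels, recall at k (R10@k when there are 10 candidates, as commonly reported on Douban, E-commerce and Ubuntu) and MRR can be computed as follows. All function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def rank_indices(scores: Sequence[float]) -> List[int]:
    """Candidate indices sorted by descending score (ties keep input order)."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def recall_at_k(scores: Sequence[float], labels: Sequence[int], k: int) -> float:
    """Fraction of the positive candidates ranked within the top k."""
    num_pos = sum(labels)
    if num_pos == 0:
        return 0.0
    top = rank_indices(scores)[:k]
    return sum(labels[i] for i in top) / num_pos


def reciprocal_rank(scores: Sequence[float], labels: Sequence[int]) -> float:
    """1 / rank of the first positive candidate (0.0 if there is none)."""
    for rank, i in enumerate(rank_indices(scores), start=1):
        if labels[i]:
            return 1.0 / rank
    return 0.0


def evaluate(sessions: Sequence[Tuple[Sequence[float], Sequence[int]]],
             ks: Sequence[int] = (1, 2, 5)) -> Dict[str, float]:
    """Average the metrics over (scores, labels) pairs, one pair per test context."""
    totals = {f"R@{k}": 0.0 for k in ks}
    totals["MRR"] = 0.0
    for scores, labels in sessions:
        for k in ks:
            totals[f"R@{k}"] += recall_at_k(scores, labels, k)
        totals["MRR"] += reciprocal_rank(scores, labels)
    n = max(len(sessions), 1)
    return {name: total / n for name, total in totals.items()}
```

This sketch counts the fraction of positives found in the top k; when every context has exactly one positive, that equals the usual hit rate.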

test the model [recall]

# two recall_modes are available: q-q (query-context matching) and q-r (query-response matching)
./scripts/test_recall.sh <dataset_name> <model_name> <cuda_id>
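The two recall modes differ only in what the query is compared against. In q-q mode the query is matched against the stored contexts and the responses paired with the best-matching contexts are returned; in q-r mode the query is matched against the responses directly. The toy sketch below uses token-level Jaccard overlap as a hypothetical stand-in for the learned encoders; `recall` and `jaccard` are illustrative names, not the repo's API.

```python
from typing import List, Tuple


def jaccard(a: str, b: str) -> float:
    """Token-level Jaccard overlap (hypothetical stand-in for encoder similarity)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def recall(query: str, corpus: List[Tuple[str, str]], mode: str, topk: int = 1) -> List[str]:
    """Return the top-k responses for `query` from (context, response) pairs."""
    if mode == "q-q":
        keys = [context for context, _ in corpus]    # compare the query with contexts
    elif mode == "q-r":
        keys = [response for _, response in corpus]  # compare the query with responses
    else:
        raise ValueError(f"unknown recall mode: {mode}")
    order = sorted(range(len(corpus)), key=lambda i: jaccard(query, keys[i]), reverse=True)
    # In both modes the returned items are responses.
    return [corpus[i][1] for i in order[:topk]]


corpus = [
    ("how is the weather today", "it is sunny"),
    ("what do you want for dinner", "pizza sounds good"),
    ("is it sunny outside", "yes the weather is nice"),
]
print(recall("how is the weather", corpus, "q-q"))  # ['it is sunny']
print(recall("how is the weather", corpus, "q-r"))  # ['yes the weather is nice']
```

The same query yields different responses: q-q relies on a similar past context, while q-r relies on the response itself overlapping with the query.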

run inference on the responses and save them into the faiss index

Inference sometimes misses data samples; if this happens, run it on a single GPU (faiss-gpu search is fast on one GPU).

Note that:
1. For the writer dataset, use the extract_inference.py script to generate inference.txt.
2. For the other datasets (douban, ecommerce, ubuntu), simply run cp train.txt inference.txt. The dataloader also reads test.txt automatically to supply the corpus.

# work_mode=response: encode the responses and save them into faiss (for q-r matching) [dual-bert/dual-bert-fusion]
# work_mode=context: encode the contexts for q-q matching
# work_mode=gray: encode the contexts, read the faiss index (built beforehand with work_mode=response), and search the top-k hard negative samples; remember to set BERTDualInferenceContextDataloader in config/base.yaml
./scripts/inference.sh <dataset_name> <model_name> <cuda_ids>
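Conceptually, work_mode=response encodes every response into a vector and stores the vectors in a faiss index, so that recall becomes a nearest-neighbour search. The sketch below is a hypothetical pure-Python stand-in for an exact inner-product index such as faiss.IndexFlatIP; with L2-normalized vectors, inner product equals cosine similarity. The embeddings are assumed to come from the trained encoder, and `FlatIPIndex` is an illustrative name.

```python
import math
from typing import List, Sequence, Tuple


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so inner product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


class FlatIPIndex:
    """Exact inner-product search (hypothetical stand-in for faiss.IndexFlatIP)."""

    def __init__(self, dim: int) -> None:
        self.dim = dim
        self.vectors: List[List[float]] = []
        self.texts: List[str] = []

    def add(self, vectors: Sequence[Sequence[float]], texts: Sequence[str]) -> None:
        if len(vectors) != len(texts):
            raise ValueError("vectors and texts must have the same length")
        for vec, text in zip(vectors, texts):
            if len(vec) != self.dim:
                raise ValueError(f"expected dimension {self.dim}, got {len(vec)}")
            self.vectors.append(list(vec))
            self.texts.append(text)

    def search(self, query: Sequence[float], k: int) -> List[Tuple[str, float]]:
        """Return the k stored texts with the highest inner product to `query`."""
        scores = [sum(q * x for q, x in zip(query, vec)) for vec in self.vectors]
        order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        return [(self.texts[i], scores[i]) for i in order[:k]]


# Toy 2-d "embeddings" for three responses.
index = FlatIPIndex(dim=2)
index.add([l2_normalize(v) for v in ([1, 0], [0, 1], [3, 4])], ["a", "b", "c"])
print(index.search(l2_normalize([0, 1]), k=2))  # "b" first, then "c"
```

A real faiss index performs the same exact search much faster and returns (distances, ids) arrays from index.search(queries, k) rather than the stored texts.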

To generate the gray (hard negative) dataset for a given dataset:

# 1. set the mode to response to generate the response faiss index; corresponding dataset name: BERTDualInferenceDataset
./scripts/inference.sh <dataset_name> response <cuda_ids>

# 2. set the mode to gray to encode the contexts in train.txt and search the top-k candidates as the gray (hard negative) samples; corresponding dataset name: BERTDualInferenceContextDataset
./scripts/inference.sh <dataset_name> gray <cuda_ids>

# 3. set the mode to gray-one2many to generate extra positive samples for each context in the train set; this mode has the same requirements as the gray mode
./scripts/inference.sh <dataset_name> gray-one2many <cuda_ids>
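The gray mode turns retrieval results into training data: for each training context, the top-k responses retrieved from the response index are kept as hard negatives once the ground-truth response is removed. A minimal sketch, assuming the retrieved candidate lists are already available; the function and field names are hypothetical.

```python
from typing import Dict, List, Sequence


def mine_gray_negatives(
    contexts: Sequence[str],
    gold_responses: Sequence[str],
    retrieved: Sequence[Sequence[str]],
    num_negatives: int,
) -> List[Dict[str, object]]:
    """Build (context, response, hard negatives) samples from top-k retrieval results."""
    samples = []
    for context, gold, candidates in zip(contexts, gold_responses, retrieved):
        negatives: List[str] = []
        for cand in candidates:
            if len(negatives) >= num_negatives:
                break
            # Skip the ground-truth response and duplicate candidates.
            if cand != gold and cand not in negatives:
                negatives.append(cand)
        samples.append({"context": context, "response": gold, "hard_negatives": negatives})
    return samples


samples = mine_gray_negatives(
    contexts=["any plans for the weekend ?"],
    gold_responses=["going hiking with friends"],
    retrieved=[["going hiking with friends", "not sure yet", "not sure yet", "staying home"]],
    num_negatives=2,
)
print(samples[0]["hard_negatives"])  # ['not sure yet', 'staying home']
```

Only exact copies of the gold response are removed here; retrieved responses that are also valid replies would still end up as (false) negatives.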

To generate pseudo positive pairs, run the following command:

# make sure the dual-bert inference dataset name is BERTDualInferenceDataset
./scripts/inference.sh <dataset_name> unparallel <cuda_ids>

deploy the rerank and recall model

# load the models on cuda:0 (can be changed in the deploy.sh script)
./scripts/deploy.sh <cuda_id>

While the service is running, you can test the deployed model with:

# test_mode: recall, rerank, pipeline
./scripts/test_api.sh <test_mode> <dataset>
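The pipeline test mode chains the two deployed models: the recall model narrows the corpus to a candidate set, and the rerank model rescores those candidates. A hypothetical sketch of that control flow, with `recall_fn` and `rerank_fn` standing in for the deployed services (their real request formats are not documented here):

```python
from typing import Callable, List, Tuple

RecallFn = Callable[[str, int], List[str]]
RerankFn = Callable[[str, str], float]


def recall_then_rerank(query: str, recall_fn: RecallFn, rerank_fn: RerankFn,
                       recall_k: int = 100, top_n: int = 1) -> List[Tuple[str, float]]:
    """Recall `recall_k` candidates, rescore each with the reranker, return the best `top_n`."""
    candidates = recall_fn(query, recall_k)
    scored = [(cand, rerank_fn(query, cand)) for cand in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Toy stand-ins: recall returns the first k responses, rerank counts shared tokens.
responses = ["the weather is nice", "i like pizza", "nice to meet you"]


def toy_recall(query: str, k: int) -> List[str]:
    return responses[:k]


def toy_rerank(query: str, cand: str) -> float:
    return float(len(set(query.split()) & set(cand.split())))


print(recall_then_rerank("is the weather nice", toy_recall, toy_rerank, recall_k=3))
# [('the weather is nice', 4.0)]
```

Keeping recall cheap and applying the more expensive reranker only to recall_k candidates is what makes the two-stage design practical on large corpora.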

test the recall performance of Elasticsearch

Before testing Elasticsearch (ES) recall, make sure the ES index has been built:

# recall_mode: q-q/q-r
./scripts/build_es_index.sh <dataset_name> <recall_mode>

# recall_mode: q-q/q-r
./scripts/test_es_recall.sh <dataset_name> <recall_mode> 0
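Elasticsearch ranks documents with BM25 by default (k1=1.2, b=0.75). For q-q recall the indexed documents would be the contexts; for q-r recall, the responses. The simplified BM25 scorer below only sketches the ranking idea. It splits on whitespace and does not reproduce Elasticsearch's analyzers or its exact scores. The `BM25` class is a hypothetical illustration.

```python
import math
from collections import Counter
from typing import List


class BM25:
    """Simplified BM25 ranking over whitespace-tokenized documents."""

    def __init__(self, docs: List[str], k1: float = 1.2, b: float = 0.75) -> None:
        self.docs = [d.lower().split() for d in docs]
        self.k1, self.b = k1, b
        self.n_docs = len(self.docs)
        self.avgdl = sum(len(d) for d in self.docs) / self.n_docs if self.n_docs else 0.0
        self.df = Counter(t for d in self.docs for t in set(d))  # document frequency
        self.tfs = [Counter(d) for d in self.docs]                # term frequency per doc

    def idf(self, term: str) -> float:
        n = self.df.get(term, 0)
        return math.log(1 + (self.n_docs - n + 0.5) / (n + 0.5))

    def score(self, query: str, i: int) -> float:
        tf, dl, total = self.tfs[i], len(self.docs[i]), 0.0
        for term in query.lower().split():
            f = tf.get(term, 0)
            if f == 0:
                continue
            norm = f + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
            total += self.idf(term) * f * (self.k1 + 1) / norm
        return total

    def search(self, query: str, k: int) -> List[int]:
        """Indices of the k highest-scoring documents."""
        order = sorted(range(self.n_docs), key=lambda i: self.score(query, i), reverse=True)
        return order[:k]


bm25 = BM25(["the cat sat on the mat", "dogs chase cats", "the weather is sunny today"])
print(bm25.search("sunny weather", k=1))  # [2]
```

Unlike the dense indexes above, BM25 matches surface tokens only, which is why it serves as a lexical baseline for the learned recall models.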

generate the gray responses with SimCSE

# train the simcse model
./scripts/train.sh <dataset_name> simcse <cuda_ids>

# generate the faiss index, dataset name: BERTSimCSEInferenceDataset
./scripts/inference_response.sh <dataset_name> simcse <cuda_ids>

# generate the context index
./scripts/inference_simcse_response.sh <dataset_name> simcse <cuda_ids>
# generate the test set for the unlikelyhood-gen dataset
./scripts/inference_simcse_unlikelyhood_response.sh <dataset_name> simcse <cuda_ids>

# generate the gray responses
./scripts/inference_gray_simcse.sh <dataset_name> simcse <cuda_ids>
# generate the test set for the unlikelyhood-gen dataset
./scripts/inference_gray_simcse_unlikelyhood.sh <dataset_name> simcse <cuda_ids>
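In unsupervised SimCSE, each sentence is encoded twice with different dropout masks. The two embeddings form a positive pair, the other sentences in the batch act as negatives, and training minimizes a cross-entropy (InfoNCE) loss over temperature-scaled cosine similarities. Whether the simcse model in this repo uses exactly this objective is an assumption. The sketch below computes the loss for one batch from precomputed, nonzero embeddings, with temperature 0.05 (the value used in the SimCSE paper); `simcse_loss` is a hypothetical name.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; assumes both vectors are nonzero."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def simcse_loss(view1: List[Sequence[float]], view2: List[Sequence[float]],
                temperature: float = 0.05) -> float:
    """Mean InfoNCE loss where view2[i] is the positive for view1[i] and
    every other view2[j] in the batch is an in-batch negative."""
    n = len(view1)
    total = 0.0
    for i in range(n):
        logits = [cosine(view1[i], view2[j]) / temperature for j in range(n)]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_denom - logits[i]  # -log softmax probability of the positive
    return total / n


aligned = [[1.0, 0.0], [0.0, 1.0]]
print(simcse_loss(aligned, aligned))                     # close to 0
print(simcse_loss(aligned, [[0.0, 1.0], [1.0, 0.0]]))    # close to 20
```

The loss is near zero when each sentence's two views agree and differ from the rest of the batch, and large when a sentence is closer to another sentence's view than to its own.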


Owner

GMFTBY

A super lightweight Lagrangian model for calculating millions of trajectories using ERA5 data

This deep learning model predicts which disease you are suffering from.

Updated for TTS(CE) = Also Known as TTN V3. The code requires the first server to be 'ttn' protocol.

Real-life Contra: a deep learning project built using MediaPipe and OpenCV

Learning Time-Critical Responses for Interactive Character Control

Code for reproducible experiments presented in KSD Aggregated Goodness-of-fit Test.

Stroke-predictions-ml-model: a machine learning model to predict an individual's chance of having a stroke

PyZebrascope - an open-source Python platform for brain-wide neural activity imaging in behaving zebrafish

In real-world applications of machine learning, reliable and safe systems must consider measures of performance beyond standard test set accuracy

Generating Band-Limited Adversarial Surfaces Using Neural Networks

ADOP: Approximate Differentiable One-Pixel Point Rendering

The official implementation of CVPR 2021 Paper: Improving Weakly Supervised Visual Grounding by Contrastive Knowledge Distillation.

Code accompanying the paper "Knowledge Base Completion Meets Transfer Learning"

AlphaBot2 Pi Core software for interfacing with the various components.

Source code of NeurIPS 2021 Paper "Be Confident! Towards Trustworthy Graph Neural Networks via Confidence Calibration"

This is the second-place solution for UmojaHack Africa 2022: African Snake Antivenom Binding Challenge

A Low Complexity Speech Enhancement Framework for Full-Band Audio (48kHz) based on Deep Filtering.

SCALE: Modeling Clothed Humans with a Surface Codec of Articulated Local Elements (CVPR 2021)

Release of the ConditionalQA dataset

Multi-task Learning of Order-Consistent Causal Graphs (NeurIPS 2021)