The Easy-to-use Dialogue Response Selection Toolkit for Researchers

Overview

Easy-to-use toolkit for retrieval-based Chatbot

Recent Activity

  1. Our released RRS corpus can be found here.
  2. Our released BERT-FP post-training checkpoint for the RRS corpus can be found here.

How to Use

  1. Init the repo

    Before using the repo, please run the following command to init:

    # create the necessay folders
    python init.py
    
    # prepare the environment
    # if some package cannot be installed, just google and install it from other ways
    pip install -r requirements.txt
  2. train the model

    ./scripts/train.sh <dataset_name> <model_name> <cuda_ids>
  3. test the model [rerank]

    ./scripts/test_rerank.sh <dataset_name> <model_name> <cuda_id>
  4. test the model [recal]

    # different recall_modes are available: q-q, q-r
    ./scripts/test_recall.sh <dataset_name> <model_name> <cuda_id>
  5. inference the responses and save into the faiss index

    Somethings inference will missing data samples, please use the 1 gpu (faiss-gpu search use 1 gpu quickly)

    It should be noted that: 1. For writer dataset, use extract_inference.py script to generate the inference.txt 2. For other datasets(douban, ecommerce, ubuntu), just cp train.txt inference.txt. The dataloader will automatically read the test.txt to supply the corpus.

    # work_mode=response, inference the response and save into faiss (for q-r matching) [dual-bert/dual-bert-fusion]
    # work_mode=context, inference the context to do q-q matching
    # work_mode=gray, inference the context; read the faiss(work_mode=response has already been done), search the topk hard negative samples; remember to set the BERTDualInferenceContextDataloader in config/base.yaml
    ./scripts/inference.sh <dataset_name> <model_name> <cuda_ids>

    If you want to generate the gray dataset for the dataset:

    # 1. set the mode as the **response**, to generate the response faiss index; corresponding dataset name: BERTDualInferenceDataset;
    ./scripts/inference.sh <dataset_name> response <cuda_ids>
    
    # 2. set the mode as the **gray**, to inference the context in the train.txt and search the top-k candidates as the gray(hard negative) samples; corresponding dataset name: BERTDualInferenceContextDataset
    ./scripts/inference.sh <dataset_name> gray <cuda_ids>
    
    # 3. set the mode as the **gray-one2many** if you want to generate the extra positive samples for each context in the train set, the needings of this mode is the same as the **gray** work mode
    ./scripts/inference.sh <dataset_name> gray-one2many <cuda_ids>

    If you want to generate the pesudo positive pairs, run the following commands:

    # make sure the dual-bert inference dataset name is BERTDualInferenceDataset
    ./scripts/inference.sh <dataset_name> unparallel <cuda_ids>
  6. deploy the rerank and recall model

    # load the model on the cuda:0(can be changed in deploy.sh script)
    ./scripts/deploy.sh <cuda_id>

    at the same time, you can test the deployed model by using:

    # test_mode: recall, rerank, pipeline
    ./scripts/test_api.sh <test_mode> <dataset>
  7. test the recall performance of the elasticsearch

    Before testing the es recall, make sure the es index has been built:

    # recall_mode: q-q/q-r
    ./scripts/build_es_index.sh <dataset_name> <recall_mode>
    # recall_mode: q-q/q-r
    ./scripts/test_es_recall.sh <dataset_name> <recall_mode> 0
  8. simcse generate the gray responses

    # train the simcse model
    ./script/train.sh <dataset_name> simcse <cuda_ids>
    # generate the faiss index, dataset name: BERTSimCSEInferenceDataset
    ./script/inference_response.sh <dataset_name> simcse <cuda_ids>
    # generate the context index
    ./script/inference_simcse_response.sh <dataset_name> simcse <cuda_ids>
    # generate the test set for unlikelyhood-gen dataset
    ./script/inference_simcse_unlikelyhood_response.sh <dataset_name> simcse <cuda_ids>
    # generate the gray response
    ./script/inference_gray_simcse.sh <dataset_name> simcse <cuda_ids>
    # generate the test set for unlikelyhood-gen dataset
    ./script/inference_gray_simcse_unlikelyhood.sh <dataset_name> simcse <cuda_ids>
Owner
GMFTBY
Those who are crazy enough to think they can change the world are the ones who can.
GMFTBY
Lightweight Salient Object Detection in Optical Remote Sensing Images via Feature Correlation

CorrNet This project provides the code and results for 'Lightweight Salient Object Detection in Optical Remote Sensing Images via Feature Correlation'

Gongyang Li 13 Nov 03, 2022
Dialect classification

Dialect-Classification This repository presents the data that was used in a talk at ICKL-5 (5th International Conference on Kurdish Linguistics) at th

Kurdish-BLARK 0 Nov 12, 2021
Nvidia Semantic Segmentation monorepo

Paper | YouTube | Cityscapes Score Pytorch implementation of our paper Hierarchical Multi-Scale Attention for Semantic Segmentation. Please refer to t

NVIDIA Corporation 1.6k Jan 04, 2023
BraTs-VNet - BraTS(Brain Tumour Segmentation) using V-Net

BraTS(Brain Tumour Segmentation) using V-Net This project is an approach to dete

Rituraj Dutta 7 Nov 27, 2022
Fuzzification helps developers protect the released, binary-only software from attackers who are capable of applying state-of-the-art fuzzing techniques

About Fuzzification Fuzzification helps developers protect the released, binary-only software from attackers who are capable of applying state-of-the-

gts3.org (<a href=[email protected])"> 55 Oct 25, 2022
《Geo Word Clouds》paper implementation

《Geo Word Clouds》paper implementation

Russellwzr 2 Jan 28, 2022
A pure PyTorch implementation of the loss described in "Online Segment to Segment Neural Transduction"

ssnt-loss ℹ️ This is a WIP project. the implementation is still being tested. A pure PyTorch implementation of the loss described in "Online Segment t

張致強 1 Feb 09, 2022
Numenta published papers code and data

Numenta research papers code and data This repository contains reproducible code for selected Numenta papers. It is currently under construction and w

Numenta 293 Jan 06, 2023
LETR: Line Segment Detection Using Transformers without Edges

LETR: Line Segment Detection Using Transformers without Edges Introduction This repository contains the official code and pretrained models for Line S

mlpc-ucsd 157 Jan 06, 2023
PatchMatch-RL: Deep MVS with Pixelwise Depth, Normal, and Visibility

PatchMatch-RL: Deep MVS with Pixelwise Depth, Normal, and Visibility Jae Yong Lee, Joseph DeGol, Chuhang Zou, Derek Hoiem Installation To install nece

31 Apr 19, 2022
PyTorch implementation of Barlow Twins.

Barlow Twins: Self-Supervised Learning via Redundancy Reduction PyTorch implementation of Barlow Twins. @article{zbontar2021barlow, title={Barlow Tw

Facebook Research 839 Dec 29, 2022
Keras community contributions

keras-contrib : Keras community contributions Keras-contrib is deprecated. Use TensorFlow Addons. The future of Keras-contrib: We're migrating to tens

Keras 1.6k Dec 21, 2022
This is the official PyTorch implementation of our paper: "Artistic Style Transfer with Internal-external Learning and Contrastive Learning".

Artistic Style Transfer with Internal-external Learning and Contrastive Learning This is the official PyTorch implementation of our paper: "Artistic S

51 Dec 20, 2022
A complete, self-contained example for training ImageNet at state-of-the-art speed with FFCV

ffcv ImageNet Training A minimal, single-file PyTorch ImageNet training script designed for hackability. Run train_imagenet.py to get... ...high accur

FFCV 92 Dec 31, 2022
DeLag: Detecting Latency Degradation Patterns in Service-based Systems

DeLag: Detecting Latency Degradation Patterns in Service-based Systems Replication package of the work "DeLag: Detecting Latency Degradation Patterns

SEALABQualityGroup @ University of L'Aquila 2 Mar 24, 2022
Official Implementation of 'UPDeT: Universal Multi-agent Reinforcement Learning via Policy Decoupling with Transformers' ICLR 2021(spotlight)

UPDeT Official Implementation of UPDeT: Universal Multi-agent Reinforcement Learning via Policy Decoupling with Transformers (ICLR 2021 spotlight) The

hhhusiyi 96 Dec 22, 2022
PyTorch implementation of EfficientNetV2

[NEW!] Check out our latest work involution accepted to CVPR'21 that introduces a new neural operator, other than convolution and self-attention. PyTo

Duo Li 375 Jan 03, 2023
Deep Learning Head Pose Estimation using PyTorch.

Hopenet is an accurate and easy to use head pose estimation network. Models have been trained on the 300W-LP dataset and have been tested on real data with good qualitative performance.

Nataniel Ruiz 1.3k Dec 26, 2022
Use evolutionary algorithms instead of gridsearch in scikit-learn

sklearn-deap Use evolutionary algorithms instead of gridsearch in scikit-learn. This allows you to reduce the time required to find the best parameter

rsteca 709 Jan 03, 2023
This program will stylize your photos with fast neural style transfer.

Neural Style Transfer (NST) Using TensorFlow Demo TensorFlow TensorFlow is an end-to-end open source platform for machine learning. It has a comprehen

Ismail Boularbah 1 Aug 08, 2022