The Easy-to-use Dialogue Response Selection Toolkit for Researchers

Overview

Easy-to-use toolkit for retrieval-based Chatbot

Our released data can be found at this link. Make sure the following steps are adopted to use our codes.

How to Use

  1. Init the repo

    Before using the repo, please run the following command to init:

    # create the necessay folders
    python init.py
    
    # prepare the environment
    # if some package cannot be installed, just google and install it from other ways
    pip install -r requirements.txt
  2. train the model

    ./scripts/train.sh <dataset_name> <model_name> <cuda_ids>
  3. test the model [rerank]

    ./scripts/test_rerank.sh <dataset_name> <model_name> <cuda_id>
  4. test the model [recal]

    # different recall_modes are available: q-q, q-r
    ./scripts/test_recall.sh <dataset_name> <model_name> <cuda_id>
  5. inference the responses and save into the faiss index

    Somethings inference will missing data samples, please use the 1 gpu (faiss-gpu search use 1 gpu quickly)

    It should be noted that: 1. For writer dataset, use extract_inference.py script to generate the inference.txt 2. For other datasets(douban, ecommerce, ubuntu), just cp train.txt inference.txt. The dataloader will automatically read the test.txt to supply the corpus.

    # work_mode=response, inference the response and save into faiss (for q-r matching) [dual-bert/dual-bert-fusion]
    # work_mode=context, inference the context to do q-q matching
    # work_mode=gray, inference the context; read the faiss(work_mode=response has already been done), search the topk hard negative samples; remember to set the BERTDualInferenceContextDataloader in config/base.yaml
    ./scripts/inference.sh <dataset_name> <model_name> <cuda_ids>

    If you want to generate the gray dataset for the dataset:

    # 1. set the mode as the **response**, to generate the response faiss index; corresponding dataset name: BERTDualInferenceDataset;
    ./scripts/inference.sh <dataset_name> response <cuda_ids>
    
    # 2. set the mode as the **gray**, to inference the context in the train.txt and search the top-k candidates as the gray(hard negative) samples; corresponding dataset name: BERTDualInferenceContextDataset
    ./scripts/inference.sh <dataset_name> gray <cuda_ids>
    
    # 3. set the mode as the **gray-one2many** if you want to generate the extra positive samples for each context in the train set, the needings of this mode is the same as the **gray** work mode
    ./scripts/inference.sh <dataset_name> gray-one2many <cuda_ids>

    If you want to generate the pesudo positive pairs, run the following commands:

    # make sure the dual-bert inference dataset name is BERTDualInferenceDataset
    ./scripts/inference.sh <dataset_name> unparallel <cuda_ids>
  6. deploy the rerank and recall model

    # load the model on the cuda:0(can be changed in deploy.sh script)
    ./scripts/deploy.sh <cuda_id>

    at the same time, you can test the deployed model by using:

    # test_mode: recall, rerank, pipeline
    ./scripts/test_api.sh <test_mode> <dataset>
  7. test the recall performance of the elasticsearch

    Before testing the es recall, make sure the es index has been built:

    # recall_mode: q-q/q-r
    ./scripts/build_es_index.sh <dataset_name> <recall_mode>
    # recall_mode: q-q/q-r
    ./scripts/test_es_recall.sh <dataset_name> <recall_mode> 0
  8. simcse generate the gray responses

    # train the simcse model
    ./script/train.sh <dataset_name> simcse <cuda_ids>
    # generate the faiss index, dataset name: BERTSimCSEInferenceDataset
    ./script/inference_response.sh <dataset_name> simcse <cuda_ids>
    # generate the context index
    ./script/inference_simcse_response.sh <dataset_name> simcse <cuda_ids>
    # generate the test set for unlikelyhood-gen dataset
    ./script/inference_simcse_unlikelyhood_response.sh <dataset_name> simcse <cuda_ids>
    # generate the gray response
    ./script/inference_gray_simcse.sh <dataset_name> simcse <cuda_ids>
    # generate the test set for unlikelyhood-gen dataset
    ./script/inference_gray_simcse_unlikelyhood.sh <dataset_name> simcse <cuda_ids>
Owner
GMFTBY
Those who are crazy enough to think they can change the world are the ones who can.
GMFTBY
Based on nonebot, a common bot framework for maimai.

mai bot 使用指南 此 README 提供了最低程度的 mai bot 教程与支持。 Step 1. 安装 Python 请自行前往 https://www.python.org/ 下载 Python 3 版本( 3.7)并将其添加到环境变量(在安装过程中勾选 Add to system P

Diving-Fish 150 Jan 01, 2023
Infinity: a Twitter retweet bot that can be used by anyone

INSTAMATE Requires Firefox Instapy Python3 How To Use? Fork the repository Add your credentials in the bot.py file Save commits Clone your fork cd int

unofficialdxnny 3 Jun 23, 2022
AWS Lambda Fast API starter application

AWS Lambda Fast API Fast API starter application compatible with API Gateway and Lambda Function. How to deploy it? Terraform AWS Lambda API is a reus

OBytes 6 Apr 20, 2022
Your custom slash commands Discord bot!

Slashy - Your custom slash-commands bot Hey, I'm Slashy - your friendly neighborhood custom-command bot! The code for this bot exists because I'm like

Omar Zunic 8 Dec 20, 2022
Discord bot for name verifying. Created for TinkerHubGCEK discord server. Tinky is now deployed in heroku

Custom Discord bot This custom discord-python bot assigns roles to members joined at discord server. It looks and compares a list before verifying the

Edwin Jose George 2 Dec 16, 2021
Free & open source API service for obtaining information about +9600 universities worldwide.

Free & open source API service for obtaining information about +9600 universities worldwide.

Yagiz Degirmenci 57 Nov 04, 2022
A multi purpose discord bot for python

Sypher The best multi purpose discord bot. Add Sypher right now Invite Me | Join

Johan Naizu 1 Dec 15, 2022
A auto clock-in script based on python3 for BJUTer.

Introduction A auto clock-in script based on python3 for BJUTer. It could clock in at 9:00 a.m everyday. The script is inspired by tsosunchia What can

X 7 Nov 15, 2022
A Python Script to automate searching of available vaccination centers in the city and hence booking

Cowin Vaccine Availability Notifier Cowin Vaccine Availability Notifier takes your City or PIN code as an input and automatically notifies you via ema

Jayesh Padhiar 7 Sep 05, 2021
An API-driven solution for Makerspaces, Tinkerers, and Hackers.

Mventory is an API-driven inventory solution for Makers, Makerspaces, Hackspaces, and just about anyone else who needs to keep track of "stuff".

Matthew Macdonald-Wallace 107 Dec 21, 2022
Discord bot to monitor collection of mods on the Steam Workshop and notify on update to selected discord server via Nextcordbot API.

Steam-Workshop-Monitor Discord bot to monitor collection of mods on the Steam Workshop and notify on update to selected Discord channel via Nextcordbo

7 Nov 03, 2022
A Python library for PagerDuty.

Pygerduty Python Library for PagerDuty's REST API and Events API. This library was originally written to support v1 and is currently being updated to

Dropbox 164 Dec 20, 2022
Advanced and powerful Userbot written with telethon. ♥

Daisy-X-UB ☣️ The Most Super Powerfull UserBot ☣️ ⚡ †hê ∂αιѕу χ ⚡ Legendary AF Ꭰαιѕу χ This is a userbot made for telegram. I made this userbot with h

TeamDaisyX 31 Jul 30, 2021
Check and write all account info + Check nitro on account

Discord-Token-Checker Check and write all account info + Check nitro on account Also check https://github.com/GuFFy12/Discord-Token-Parser (Parse disc

36 Jan 01, 2023
Python Client for Yandex Cloud Logging

Python Client for Yandex Cloud Logging Installation pip3 install python-yandex-cloud-logging Creating a Yandex Cloud Logging Group yc logging group c

MCode 0 Dec 08, 2021
Trading bot - A Trading bot With Python

Trading_bot Trading bot intended for 1) Tracking current prices of tokens 2) Set

Tymur Kotkov 29 Dec 01, 2022
Popcorn-time-api - Python API for interacting with the Popcorn Time Servers

Popcorn Time API 📝 CONTRIBUTIONS Before doing any contribution read CONTRIBUTIN

Antonio 3 Oct 31, 2022
GBSLocalLauncher - A script to compose ENV file for Local Compose

GBSLocalLauncher This is a script to compose ENV file for Local Compose. It crea

2 Jan 27, 2022
StudyLion is a Discord bot that tracks members' study and work time while offering members to view their statistics and use productivity tools such as: To-do lists, Pomodoro timers, reminders, and much more.

StudyLion - Discord Productivity Bot StudyLion is a Discord bot that tracks members' study and work time while offering members the ability to view th

45 Dec 26, 2022
A program to convert YouTube channel registration information into Json files for ThirdTube.

ThirdTubeImporter A program to convert YouTube channel registration information into Json files for ThirdTube. Usage Japanese https://takeout.google.c

Hidegon 2 Dec 18, 2021