The Easy-to-use Dialogue Response Selection Toolkit for Researchers

Overview

Easy-to-use toolkit for retrieval-based Chatbot

Our released data can be found at this link. Make sure the following steps are adopted to use our codes.

How to Use

  1. Init the repo

    Before using the repo, please run the following command to init:

    # create the necessay folders
    python init.py
    
    # prepare the environment
    # if some package cannot be installed, just google and install it from other ways
    pip install -r requirements.txt
  2. train the model

    ./scripts/train.sh <dataset_name> <model_name> <cuda_ids>
  3. test the model [rerank]

    ./scripts/test_rerank.sh <dataset_name> <model_name> <cuda_id>
  4. test the model [recal]

    # different recall_modes are available: q-q, q-r
    ./scripts/test_recall.sh <dataset_name> <model_name> <cuda_id>
  5. inference the responses and save into the faiss index

    Somethings inference will missing data samples, please use the 1 gpu (faiss-gpu search use 1 gpu quickly)

    It should be noted that: 1. For writer dataset, use extract_inference.py script to generate the inference.txt 2. For other datasets(douban, ecommerce, ubuntu), just cp train.txt inference.txt. The dataloader will automatically read the test.txt to supply the corpus.

    # work_mode=response, inference the response and save into faiss (for q-r matching) [dual-bert/dual-bert-fusion]
    # work_mode=context, inference the context to do q-q matching
    # work_mode=gray, inference the context; read the faiss(work_mode=response has already been done), search the topk hard negative samples; remember to set the BERTDualInferenceContextDataloader in config/base.yaml
    ./scripts/inference.sh <dataset_name> <model_name> <cuda_ids>

    If you want to generate the gray dataset for the dataset:

    # 1. set the mode as the **response**, to generate the response faiss index; corresponding dataset name: BERTDualInferenceDataset;
    ./scripts/inference.sh <dataset_name> response <cuda_ids>
    
    # 2. set the mode as the **gray**, to inference the context in the train.txt and search the top-k candidates as the gray(hard negative) samples; corresponding dataset name: BERTDualInferenceContextDataset
    ./scripts/inference.sh <dataset_name> gray <cuda_ids>
    
    # 3. set the mode as the **gray-one2many** if you want to generate the extra positive samples for each context in the train set, the needings of this mode is the same as the **gray** work mode
    ./scripts/inference.sh <dataset_name> gray-one2many <cuda_ids>

    If you want to generate the pesudo positive pairs, run the following commands:

    # make sure the dual-bert inference dataset name is BERTDualInferenceDataset
    ./scripts/inference.sh <dataset_name> unparallel <cuda_ids>
  6. deploy the rerank and recall model

    # load the model on the cuda:0(can be changed in deploy.sh script)
    ./scripts/deploy.sh <cuda_id>

    at the same time, you can test the deployed model by using:

    # test_mode: recall, rerank, pipeline
    ./scripts/test_api.sh <test_mode> <dataset>
  7. test the recall performance of the elasticsearch

    Before testing the es recall, make sure the es index has been built:

    # recall_mode: q-q/q-r
    ./scripts/build_es_index.sh <dataset_name> <recall_mode>
    # recall_mode: q-q/q-r
    ./scripts/test_es_recall.sh <dataset_name> <recall_mode> 0
  8. simcse generate the gray responses

    # train the simcse model
    ./script/train.sh <dataset_name> simcse <cuda_ids>
    # generate the faiss index, dataset name: BERTSimCSEInferenceDataset
    ./script/inference_response.sh <dataset_name> simcse <cuda_ids>
    # generate the context index
    ./script/inference_simcse_response.sh <dataset_name> simcse <cuda_ids>
    # generate the test set for unlikelyhood-gen dataset
    ./script/inference_simcse_unlikelyhood_response.sh <dataset_name> simcse <cuda_ids>
    # generate the gray response
    ./script/inference_gray_simcse.sh <dataset_name> simcse <cuda_ids>
    # generate the test set for unlikelyhood-gen dataset
    ./script/inference_gray_simcse_unlikelyhood.sh <dataset_name> simcse <cuda_ids>
Owner
GMFTBY
Those who are crazy enough to think they can change the world are the ones who can.
GMFTBY
Python SDK for LUSID by FINBOURNE, a bi-temporal investment management data platform with portfolio accounting capabilities.

LUSID® Python SDK This is the Python SDK for LUSID by FINBOURNE, a bi-temporal investment management data platform with portfolio accounting capabilit

FINBOURNE 6 Dec 24, 2022
Neko is An Anime themed advance Telegram group management bot.

NekoRobot A modular telegram Python bot running on python3 with an sqlalchemy, mongodb database. ╒═══「 Status 」 Maintained Support Group Included Free

Lovely Boy 22 Jan 05, 2023
Official API documentation for Highrise

Highrise API The Highrise API is implemented as vanilla XML over HTTP using all four verbs (GET/POST/PUT/DELETE). Every resource, like Person, Deal, o

Basecamp 128 Dec 06, 2022
Automatically pulls specified repository whenever a specified file is pushed. Great for working collaboratively when you need to run something locally.

autopull Simple python tool that allows you to automatically pull from a github repository whenever a file with a specified name is uploaded installat

carreb 0 Sep 27, 2022
SpamSMS - SPAM SMS menggunakan api web INDIHOME

SPAM SMS Unlimited SPAM SMS menggunakan api web INDIHOME Cara Install Di Termux

Zuck-Ker 1 Jan 08, 2022
Python3 based bittrex rest api wrapper

bittrex-rest-api This open source project was created to give an understanding of the Bittrex Rest API v1.1/v3.0 in pearl language. The sample file sh

4 Nov 15, 2022
Tiktok-bot - A Simple Tiktok bot With Python

Install the requirements pip install selenium pip install pyfiglet==0.7.5 How ca

Muchlis Faroqi 5 Aug 23, 2022
Stock market bot that will be used to learn about API calls and database connections.

Stock market bot that will be used to learn about API calls and database connections.

1 Dec 24, 2021
The system to host your files on the Discord application

Distorage The system to host your files on the Discord application Documentation Documentation Distorage How to use the package You can install it wit

6 Jun 27, 2022
Implementation of Chatterbot using Discord API

discord-chat-bot Implementation of Chatterbot using Discord API. Usage Due to the necessity of storing files to train the AI, the bot is not hosted pu

kiwijuice56 0 Sep 29, 2022
A feishu bot daily push arxiv latest articles.

arxiv-feishu-bot We develop A simple feishu bot script daily pushes arxiv latest articles. His effect is as follows: Of course, you can also use other

huchi 6 Apr 06, 2022
This is a repository for the Duke University Cloud Computing course project on Serveless Data Engineering Pipeline. For this project, I recreated the below pipeline.

AWS Data Engineering Pipeline This is a repository for the Duke University Cloud Computing course project on Serverless Data Engineering Pipeline. For

15 Jul 28, 2021
Autov2new - Pro Auto Filter Bot V2

Pro Auto Filter Bot V2 Deploy You can deploy this bot anywhere. Watch Deploying

1 Jan 06, 2022
Simple Webhook Spammer with Optional Proxy Support

😎 �Simple Webhook Spammer with Optional Proxy Support:- [+] git clone https://g

Terminal1337 12 Sep 29, 2022
Starlink Order Status Notification

Starlink Order Status Notification This script logs into Starlink order portal, pulls your estimated delivery date and emails it to a designated email

Aaron R. 1 Jul 08, 2022
BT CCXT Store

bt-ccxt-store-cn backtrader是一个非常好的开源量化回测平台,我自己也时常用它,backtrader也能接入实盘,而bt-ccxt-store就是帮助backtrader接入数字货币实盘交易的一个插件,但是bt-ccxt-store的某些实现并不是很好,无节制的网络轮询,一些

moses 40 Dec 31, 2022
Tools used by Ada Health's internal IT team to deploy and manage a serverless Munki setup.

Serverless Munki This repository contains cross platform code to deploy a production ready Munki service, complete with AutoPkg, that runs entirely fr

Ada Health 17 Dec 05, 2022
AWS-serverless-starter - AWS Lambda serverless stack via Serverless framework

Serverless app via AWS Lambda, ApiGateway and Serverless framework Configuration

Bəxtiyar 3 Feb 02, 2022
A Python library for inserting an reverse shell attached to Telegram in any Python application.

py tel reverse shell the reverse shell in your telgram! What is this? This program is a Python library that you can use to put an inverted shell conne

Torham 12 Dec 28, 2022
Discord Webhook Spammer (fastest)

Discord Webhook Spammer A simple fast asynchronous webhook spammer. Spammer Features Fast message spamming. Controllable speed. Noob friendly. Usage N

Varient 2 Apr 22, 2022