This repo contains the code and data used in the paper "Wizard of Search Engine: Access to Information Through Conversations with Search Engines"

Last update: Oct 27, 2022

Overview

Wizard of Search Engine: Access to Information Through Conversations with Search Engines

by Pengjie Ren, Zhongkun Liu, Xiaomeng Song, Hongtao Tian, Zhumin Chen, Zhaochun Ren and Maarten de Rijke

@inproceedings{ren2021wizard,
title={Wizard of Search Engine: Access to Information Through Conversations with Search Engines},
author={Ren, Pengjie and Liu, Zhongkun and Song, Xiaomeng and Tian, Hongtao and Chen, Zhumin and Ren, Zhaochun and de Rijke, Maarten},
booktitle={Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval},
year={2021}
}

Paper summary

Task pipeline for conversational information seeking (CIS)

Model pipeline for conversational information seeking (CIS)

In this work, we facilitate research on conversational information seeking (CIS) from three angles: (1) We formulate a pipeline for CIS with six sub-tasks: intent detection, keyphrase extraction, action prediction, query selection, passage selection, and response generation. (2) We release a benchmark dataset, called Wizard of Search Engine (WISE), which allows for comprehensive and in-depth research on all aspects of CIS. (3) We design a neural architecture that can be trained and evaluated on the six sub-tasks both jointly and separately, and devise a pre-train/fine-tune learning scheme that reduces the amount of WISE data required by making full use of other available data.
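To make the ordering of the six sub-tasks concrete, here is a minimal sketch of the pipeline as plain Python. It is not the neural architecture from the paper or the code in this repo: the `SubTask` enum, the `Stage` type, and `run_pipeline` are hypothetical names introduced only for illustration, and each stage is assumed to be a simple function over a shared state dictionary.

```python
from enum import Enum
from typing import Callable, Dict, List


class SubTask(Enum):
    """The six CIS sub-tasks, in the order the summary above lists them."""
    INTENT_DETECTION = "intent detection"
    KEYPHRASE_EXTRACTION = "keyphrase extraction"
    ACTION_PREDICTION = "action prediction"
    QUERY_SELECTION = "query selection"
    PASSAGE_SELECTION = "passage selection"
    RESPONSE_GENERATION = "response generation"


# Hypothetical stage type: reads the shared state and returns updates to it.
Stage = Callable[[Dict[str, object]], Dict[str, object]]


def run_pipeline(stages: Dict[SubTask, Stage],
                 state: Dict[str, object]) -> List[SubTask]:
    """Run the supplied stages in sub-task order and return the ones that ran.

    Sub-tasks without an entry in `stages` are skipped, so a single
    sub-task can be exercised separately from the others.
    """
    executed = []
    for task in SubTask:  # Enum iteration follows definition order
        stage = stages.get(task)
        if stage is None:
            continue
        state.update(stage(state))
        executed.append(task)
    return executed
```

Passing all six stages corresponds to running the sub-tasks jointly in order, while passing one or two mirrors evaluating them separately.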

Running experiments

Requirements

This code is written in PyTorch. Any version later than 1.6 is expected to work with the provided code. Please refer to the official website for an installation guide.

We recommend using conda to install the requirements. If you haven't installed conda yet, you can find instructions here. The installation steps are:

Create a new environment:
```
conda create -n WISE "python>3.6"
```
The environment should use a Python version later than 3.6, which is what the version spec in the command requests.
Activate the environment:
```
conda activate WISE
```
Install the requirements within the environment via pip:
```
pip install -r requirements.txt
```
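After installation, a short script can confirm that the environment matches the stated version requirements. The sketch below is an assumption-laden helper, not part of the repo: `major_minor` and `is_later_than` are hypothetical names, and "later than" is read strictly at the major.minor level, so a 1.6.x patch release of PyTorch would be reported as not satisfying the requirement.

```python
import re
import sys
from typing import Tuple


def major_minor(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from strings such as '1.7.1+cu110'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def is_later_than(version: str, floor: Tuple[int, int]) -> bool:
    """True if `version` is strictly later than `floor` at major.minor level."""
    return major_minor(version) > floor


if __name__ == "__main__":
    python_version = f"{sys.version_info.major}.{sys.version_info.minor}"
    print("Python > 3.6:", is_later_than(python_version, (3, 6)))
    try:
        import torch  # only importable once PyTorch is installed
    except ImportError:
        print("PyTorch is not installed in this environment")
    else:
        print("PyTorch > 1.6:", is_later_than(torch.__version__, (1, 6)))
```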

Datasets

We use the WebQA, DuReader, KdConv, and DuConv datasets for pretraining. You can get them from the provided links and put them in the corresponding folders under ./data/. For example, the WebQA dataset goes in ./data/WebQA, the DuReader dataset in ./data/Dureader, and so on. We fine-tune the model on the WISE dataset, which is available in ./data/WISE. Details about the WISE dataset can be found here.
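Before processing the data, it can help to check that the dataset folders are where the code expects them. The following is a minimal sketch under stated assumptions: `missing_folders` is a hypothetical helper, and only the three folder names given above are checked by default. The folder names for KdConv and DuConv are not stated here, so pass them explicitly once you know them.

```python
from pathlib import Path
from typing import Iterable, List

# Folder names stated in the text above. The folders for KdConv and DuConv
# are not named there, so they are left for the caller to supply.
NAMED_FOLDERS = ("WebQA", "Dureader", "WISE")


def missing_folders(data_root: str,
                    names: Iterable[str] = NAMED_FOLDERS) -> List[str]:
    """Return the names under `data_root` that are not existing directories."""
    root = Path(data_root)
    return [name for name in names if not (root / name).is_dir()]


if __name__ == "__main__":
    missing = missing_folders("./data")
    if missing:
        print("Missing dataset folders:", ", ".join(missing))
    else:
        print("All named dataset folders are present")
```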

Training

Run the following command to automatically convert the pretraining datasets into the required format:

python ./Run.py --mode='data'

Run the following commands one after the other, pretraining first and fine-tuning second:

python -m torch.distributed.launch --nproc_per_node=4 ./Run.py --mode='pretrain'
python -m torch.distributed.launch --nproc_per_node=4 ./Run.py --mode='finetune'

Note that before fine-tuning you need to create the folder ./output/pretrained_ready yourself, select the appropriate pretrained models from ./output/pretrained, and put them into it. The hyperparameters are set to the default values used in our experiments. For an overview of all hyperparameters, please refer to ./Run.py.
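The manual step between pretraining and fine-tuning can be scripted. The sketch below is illustrative only: `stage_checkpoints` is a hypothetical helper, the checkpoints are assumed to be regular files, and their names are not specified in this README, so you have to supply the names of the models you selected.

```python
import shutil
from pathlib import Path
from typing import Iterable, List


def stage_checkpoints(names: Iterable[str],
                      source: str = "./output/pretrained",
                      target: str = "./output/pretrained_ready") -> List[Path]:
    """Copy the chosen checkpoint files from `source` into `target`.

    `names` are file names picked by the user; this sketch assumes the
    checkpoints are regular files rather than directories.
    """
    source_dir, target_dir = Path(source), Path(target)
    # The target folder has to be created by the user, so create it here.
    target_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for name in names:
        checkpoint = source_dir / name
        if not checkpoint.is_file():
            raise FileNotFoundError(f"no such checkpoint: {checkpoint}")
        copied.append(Path(shutil.copy2(checkpoint, target_dir / name)))
    return copied
```

Copying rather than moving keeps the full set of pretrained models in ./output/pretrained, so a different selection can be tried later.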

Evaluating

Run the following commands, first for the validation set and then for the test set:

python -m torch.distributed.launch --nproc_per_node=4 ./Run.py --mode='infer-valid'
python -m torch.distributed.launch --nproc_per_node=4 ./Run.py --mode='eval-valid'

python -m torch.distributed.launch --nproc_per_node=4 ./Run.py --mode='infer-test'
python -m torch.distributed.launch --nproc_per_node=4 ./Run.py --mode='eval-test'
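If you prefer to drive the four steps from one script, the commands above can be assembled and run in order with `subprocess`. This is a sketch, not part of the repo: `build_command` and `run_evaluation` are hypothetical helpers, the current Python interpreter is assumed to be the one from the WISE environment, and the modes are simply run in the order listed above.

```python
import subprocess
import sys
from typing import List

# Modes in the order they are listed above.
EVAL_MODES = ("infer-valid", "eval-valid", "infer-test", "eval-test")


def build_command(mode: str, nproc_per_node: int = 4) -> List[str]:
    """Build the launch command for one mode as an argument list."""
    return [
        sys.executable, "-m", "torch.distributed.launch",
        f"--nproc_per_node={nproc_per_node}",
        "./Run.py", f"--mode={mode}",
    ]


def run_evaluation(nproc_per_node: int = 4) -> None:
    """Run every mode in order, stopping at the first failure."""
    for mode in EVAL_MODES:
        # check=True raises CalledProcessError if a step exits non-zero,
        # so a later step never runs after an earlier one has failed.
        subprocess.run(build_command(mode, nproc_per_node), check=True)
```

Calling `run_evaluation()` from the repository root runs all four steps. Pass a different `nproc_per_node` if you have fewer or more than 4 GPUs.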
