PyTorch implementation of the EMNLP 2020 (Findings) paper: Connecting the Dots: A Knowledgeable Path Generator for Commonsense Question Answering

Last update: Dec 05, 2022

Overview

Path-Generator-QA

This is a PyTorch implementation of the EMNLP 2020 (Findings) paper: Connecting the Dots: A Knowledgeable Path Generator for Commonsense Question Answering [arxiv][project page]

Code folders:

(1) learning-generator: samples paths and then trains the path generator.

(2) commonsense-qa: uses the generator to generate paths and then trains the QA system on the task dataset.

(3) A-Commonsense-Path-Generator-for-Connecting-Entities.ipynb: a notebook illustrating how to use the proposed generator to connect a pair of entities with a commonsense relational path.

Parts of this code and these instructions rely on another project of ours [code][arxiv]. Please cite both works if you use this code. Thanks!

@article{wang2020connecting,
  title={Connecting the Dots: A Knowledgeable Path Generator for Commonsense Question Answering},
  author={Wang, Peifeng and Peng, Nanyun and Szekely, Pedro and Ren, Xiang},
  journal={arXiv preprint arXiv:2005.00691},
  year={2020}
}

@article{feng2020scalable,
  title={Scalable Multi-Hop Relational Reasoning for Knowledge-Aware Question Answering},
  author={Feng, Yanlin and Chen, Xinyue and Lin, Bill Yuchen and Wang, Peifeng and Yan, Jun and Ren, Xiang},
  journal={arXiv preprint arXiv:2005.00646},
  year={2020}
}

Dependencies

Python >= 3.6
PyTorch == 1.1
transformers == 2.8.0
dgl == 0.3 (GPU version)
networkx == 2.3

Run the following commands to create a conda environment:

conda create -n pgqa python=3.6
source activate pgqa
conda install pytorch torchvision cudatoolkit=10.0 -c pytorch
pip install dgl-cu100
pip install transformers==2.8.0 tqdm networkx==2.3 nltk spacy==2.1.6
python -m spacy download en
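Once the environment is created, it can help to confirm that the pinned versions were actually installed. The following is a minimal sketch, not part of the repository: the helper names `find_mismatches` and `installed_versions` are hypothetical, and the pins are copied from the dependency list and the pip command above. It assumes each package exposes a `__version__` attribute and that `dgl-cu100` is imported as `dgl`.

```python
import importlib
from typing import Dict, Iterable, List

# Pins copied from the dependency list and the pip command above.
REQUIRED = {
    "torch": "1.1",
    "transformers": "2.8.0",
    "dgl": "0.3",
    "networkx": "2.3",
    "spacy": "2.1.6",
}


def find_mismatches(installed: Dict[str, str],
                    required: Dict[str, str] = REQUIRED) -> List[str]:
    """Return one message per missing or mismatched package.

    A pin such as "1.1" matches "1.1" and "1.1.0" but not "1.10.0",
    because versions are compared component by component.
    """
    problems = []
    for name, pin in required.items():
        found = installed.get(name)
        if found is None:
            problems.append(f"{name}: not installed (need {pin})")
            continue
        pin_parts = pin.split(".")
        if found.split(".")[:len(pin_parts)] != pin_parts:
            problems.append(f"{name}: found {found} (need {pin})")
    return problems


def installed_versions(names: Iterable[str]) -> Dict[str, str]:
    """Import each package and read its __version__ (hypothetical helper)."""
    versions = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            continue  # reported as "not installed" by find_mismatches
        versions[name] = getattr(module, "__version__", "unknown")
    return versions


# Usage inside the activated environment:
#   for line in find_mismatches(installed_versions(REQUIRED)):
#       print(line)
```

An empty result means every pinned package was found at the expected version.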

For training a path generator

cd learning-generator
cd data
unzip conceptnet.zip
cd ..
python sample_path_rw.py
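Judging by its name, sample_path_rw.py samples paths with random walks over the ConceptNet graph. The sketch below is not the repository's code; it only illustrates the general idea on a toy graph. The function name, the graph layout, and the triples are all hypothetical.

```python
import random
from typing import Dict, List, Tuple

# Toy knowledge graph: head entity -> list of (relation, tail entity).
# These triples are made up for illustration and are not read from ConceptNet.
Graph = Dict[str, List[Tuple[str, str]]]


def random_walk_path(graph: Graph, start: str, num_hops: int,
                     rng: random.Random) -> List[str]:
    """Walk up to num_hops edges from start without revisiting an entity.

    Returns an alternating list [entity, relation, entity, ...].
    The walk stops early when the current entity has no unvisited neighbor.
    """
    path = [start]
    visited = {start}
    current = start
    for _ in range(num_hops):
        candidates = [(rel, tail) for rel, tail in graph.get(current, [])
                      if tail not in visited]
        if not candidates:
            break
        relation, tail = rng.choice(candidates)
        path.extend([relation, tail])
        visited.add(tail)
        current = tail
    return path


toy_graph = {
    "bird": [("capable_of", "fly"), ("at_location", "tree")],
    "tree": [("part_of", "forest")],
}
sampled = random_walk_path(toy_graph, "bird", num_hops=2, rng=random.Random(0))
```

Each sampled path alternates entities and relations, which is the kind of sequence a path generator could be trained on.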

After path sampling, shuffle the resulting file './data/sample_path/sample_path.txt' and split it into train.txt, dev.txt and test.txt at a ratio of 0.9:0.05:0.05, all under './data/sample_path/'.
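The shuffle-and-split step is left to the reader; one way to do it is sketched below. This is not a script from the repository. The helper names `split_lines` and `split_file` are hypothetical, and the sketch assumes that sample_path.txt holds one sampled path per line.

```python
import random
from pathlib import Path
from typing import List, Sequence, Tuple


def split_lines(lines: Sequence[str],
                ratios: Tuple[float, float, float] = (0.9, 0.05, 0.05),
                seed: int = 0) -> Tuple[List[str], List[str], List[str]]:
    """Shuffle lines with a fixed seed and cut them into train/dev/test."""
    shuffled = list(lines)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * ratios[0])
    n_dev = int(len(shuffled) * ratios[1])
    train = shuffled[:n_train]
    dev = shuffled[n_train:n_train + n_dev]
    test = shuffled[n_train + n_dev:]  # the remainder goes to test
    return train, dev, test


def split_file(src: str, out_dir: str, seed: int = 0) -> Tuple[int, int, int]:
    """Write train.txt, dev.txt and test.txt into out_dir; return their sizes."""
    raw = Path(src).read_text(encoding="utf-8").splitlines()
    lines = [line for line in raw if line.strip()]  # drop blank lines
    parts = split_lines(lines, seed=seed)
    for name, part in zip(("train.txt", "dev.txt", "test.txt"), parts):
        text = "".join(line + "\n" for line in part)
        (Path(out_dir) / name).write_text(text, encoding="utf-8")
    return len(parts[0]), len(parts[1]), len(parts[2])


# Usage from inside learning-generator:
#   split_file("./data/sample_path/sample_path.txt", "./data/sample_path")
```

Fixing the seed keeps the split reproducible across runs.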

Then train the path generator by running:

# the first argument specifies which GPU to use
./run.sh $gpu_device

The path generator checkpoint is stored in './checkpoints/model.ckpt'. Move it to '../commonsense-qa/saved_models/pretrain_generator'. This completes generator training.
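The move can also be scripted. The sketch below is not part of the repository, and `install_checkpoint` is a hypothetical helper. It assumes that 'pretrain_generator' is a directory that should receive model.ckpt; if the QA code instead expects a file at that exact path, adjust the target accordingly.

```python
import shutil
from pathlib import Path


def install_checkpoint(
    src: str = "./checkpoints/model.ckpt",
    dest_dir: str = "../commonsense-qa/saved_models/pretrain_generator",
) -> Path:
    """Move the trained checkpoint into dest_dir and return its new path.

    Assumption: dest_dir is a directory; it is created if it does not exist.
    """
    src_path = Path(src)
    if not src_path.is_file():
        raise FileNotFoundError(f"no checkpoint found at {src_path}")
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    target = dest / src_path.name
    shutil.move(str(src_path), str(target))
    return target


# Usage from inside learning-generator, after training has finished:
#   install_checkpoint()
```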

Alternatively, you can download our pretrained path generator checkpoint.

For training a commonsense qa system

1. Download Data

First, download all the data needed to train the model:

cd commonsense-qa
bash scripts/download.sh

2. Preprocess

To preprocess the data, run:

python preprocess.py

3. Using the path generator to connect question-answer entities

First modify ./config/path_generate.config to specify the dataset and GPU device, then run:

./scripts/run_generate.sh

4. Commonsense QA system training

bash scripts/run_main.sh ./config/csqa.config

Training progress and final evaluation results are stored in './saved_models/'.

Owner

Peifeng Wang
