Leaf: Multiple-Choice Question Generation

Overview

Leaf: Multiple-Choice Question Generation

Easy to use and understand multiple-choice question generation algorithm using T5 Transformers. The application accepts a short passage of text and uses two fine-tuned T5 Transformer models to first generate multiple question-answer pairs corresponding to the given text, after which it uses them to generate distractors - additional options used to confuse the test taker.

question generation process

Originally inspired by a Bachelor's machine learning course (github link) and then continued as a topic for my Master's thesis at Sofia University, Bulgaria.

ECIR 2022 Demonstration paper

This work has been accepted as a demo paper for the ECIR 2022 conference.

Video demonstration: here

Live demo: coming soon

Paper: will be uploaded before the conference - 14th April 2022

Abstract: Testing with quiz questions has proven to be an effective strategy for better educational processes. However, manually creating quizzes is a tedious and time-consuming task. To address this challenge, we present Leaf, a system for generating multiple-choice questions from factual text. In addition to being very well suited for classroom settings, Leaf could be also used in an industrial setup, e.g., to facilitate onboarding and knowledge sharing, or as a component of chatbots, question answering systems, or Massive Open Online Courses (MOOCs).

Generating question and answer pairs

To generate the question-answer pairs we have fine-tuned a T5 transformer model from huggingface on the SQuAD1.1. dataset which is a reading comprehension dataset, consisting of questions posed by crowdworkers on a set of Wikipedia articles.

The model accepts the target answer and context as input:

'answer' + '
   
     + 'context' 

   

and outputs a question that answers the given answer for the corresponding text.

'answer' + '
   
     + 'question' 

   

To allow us to generate question-answer pairs without providing a target answer, we have trained the algorithm to do so when in place of the target answer the '[MASK]' token is passed.

'[MASK]' + '
   
     + 'context' 

   

The full training script can be found in the training directory or accessed directly in Google Colab.

Generating incorrect options (distractors)

To generate the distractors, another T5 transformer model has been fine-tuned. This time using the RACE dataset which consists of more than 28,000 passages and nearly 100,000 questions. The dataset is collected from English examinations in China, which are designed for middle school and high school students.

The model accepts the target answer, question and context as input:

'answer' + '
   
     + 'question' + 'context' 

   

and outputs 3 distractors separated by the ' ' token.

'distractor1' + '
   
     + 'distractor2' + '
    
      'distractor3' 

    
   

The full training script can be found in the training directory or accessed directly in Google Colab.

To extend the variety of distractors with simple words that are not so closely related to the context, we have also used sense2vec word embeddings in the cases where the T5 model does not good enough distractors.

Web application

To demonstrate the algorithm, a simple Angular web application has been created. It accepts the given paragraph along with the desired number of questions and outputs each generated question with the ability to redact them (shown below). The algorithm is exposing a simple REST API using flask which is consumed by the web app.

question generation process

The code for the web application is located in a separated repository here.

Installation guide

Creating a virtual environment (optional)

To avoid any conflicts with python packages from other projects, it is a good practice to create a virtual environment in which the packages will be installed. If you do not want to this you can skip the next commands and directly install the the requirements.txt file.

Create a virtual environment :

python -m venv venv

Enter the virtual environment:

Windows:

. .\venv\Scripts\activate

Linux or MacOS

source .\venv\Scripts\activate

Installing packages

pip install -r .\requirements.txt 

Downloading data

Question-answer model

Download the multitask-qg-ag model checkpoint and place it in the app/ml_models/question_generation/models/ directory.

Distractor generation

Download the race-distractors model checkpoint and place it in the app/ml_models/distractor_generation/models/ directory.

Download sense2vec, extract it and place the s2v_old folder and place it in the app/ml_models/sense2vec_distractor_generation/models/ directory.

Training on your own

The training scripts are available in the training directory. You can download the notebooks directly from there or open the Question-Answer Generation and Distractor Generation in Google Colab.

Owner
Kristiyan Vachev
Kristiyan Vachev
torchbearer: A model fitting library for PyTorch

Note: We're moving to PyTorch Lightning! Read about the move here. From the end of February, torchbearer will no longer be actively maintained. We'll

631 Jan 04, 2023
mlpack: a scalable C++ machine learning library --

a fast, flexible machine learning library Home | Documentation | Doxygen | Community | Help | IRC Chat Download: current stable version (3.4.2) mlpack

mlpack 4.2k Jan 09, 2023
MPViT:Multi-Path Vision Transformer for Dense Prediction

MPViT : Multi-Path Vision Transformer for Dense Prediction This repository inlcu

Youngwan Lee 272 Dec 20, 2022
A GUI to automatically create a TOPAS-readable MLC simulation file

Python script to create a TOPAS-readable simulation file descriring a Multi-Leaf-Collimator. Builds the MLC using the data from a 3D .stl file.

Sebastian Schäfer 0 Jun 19, 2022
[Nature Machine Intelligence' 21] "Advancing COVID-19 Diagnosis with Privacy-Preserving Collaboration in Artificial Intelligence"

[UCADI] COVID-19 Diagnosis With Federated Learning Intro We developed a Federated Learning (FL) Framework for global researchers to collaboratively tr

HUST EIC AI-LAB 30 Dec 12, 2022
ADSPM: Attribute-Driven Spontaneous Motion in Unpaired Image Translation

ADSPM: Attribute-Driven Spontaneous Motion in Unpaired Image Translation This repository provides a PyTorch implementation of ADSPM. Requirements Pyth

24 Jul 24, 2022
Implementation for Paper "Inverting Generative Adversarial Renderer for Face Reconstruction"

StyleGAR TODO: add arxiv link Implementation of Inverting Generative Adversarial Renderer for Face Reconstruction TODO: for test Currently, some model

155 Oct 27, 2022
A PyTorch implementation of SlowFast based on ICCV 2019 paper "SlowFast Networks for Video Recognition"

SlowFast A PyTorch implementation of SlowFast based on ICCV 2019 paper SlowFast Networks for Video Recognition. Requirements Anaconda PyTorch conda in

Hao Ren 8 Dec 23, 2022
Object Detection using YOLO from PyImageSearch

Object Detection using YOLO from PyImageSearch By applying object detection, you’ll not only be able to determine what is in an image, but also where

Mohamed NIANG 1 Feb 09, 2022
Xview3 solution - XView3 challenge, 2nd place solution

Xview3, 2nd place solution https://iuu.xview.us/ test split aggregate score publ

Selim Seferbekov 24 Nov 23, 2022
Doing fast searching of nearest neighbors in high dimensional spaces is an increasingly important problem

Benchmarking nearest neighbors Doing fast searching of nearest neighbors in high dimensional spaces is an increasingly important problem, but so far t

Erik Bernhardsson 3.2k Jan 03, 2023
Controlling the MicriSpotAI robot from scratch

Project-MicroSpot-AI Controlling the MicriSpotAI robot from scratch Colaborators Alexander Dennis Components from MicroSpot The MicriSpotAI has the fo

Dennis Núñez-Fernández 5 Oct 20, 2022
Weakly- and Semi-Supervised Panoptic Segmentation (ECCV18)

Weakly- and Semi-Supervised Panoptic Segmentation by Qizhu Li*, Anurag Arnab*, Philip H.S. Torr This repository demonstrates the weakly supervised gro

Qizhu Li 159 Dec 20, 2022
[CVPR'21] Multi-Modal Fusion Transformer for End-to-End Autonomous Driving

TransFuser This repository contains the code for the CVPR 2021 paper Multi-Modal Fusion Transformer for End-to-End Autonomous Driving. If you find our

695 Jan 05, 2023
Gradient representations in ReLU networks as similarity functions

Gradient representations in ReLU networks as similarity functions by Dániel Rácz and Bálint Daróczy. This repo contains the python code related to our

1 Oct 08, 2021
Program your own vulkan.gpuinfo.org query in Python. Used to determine baseline hardware for WebGPU.

query-gpuinfo-data License This software is not presently released under a license. The data in data/ is obtained under CC BY 4.0 as specified there.

Kai Ninomiya 5 Jul 18, 2022
A Pytorch implementation of the multi agent deep deterministic policy gradients (MADDPG) algorithm

Multi-Agent-Deep-Deterministic-Policy-Gradients A Pytorch implementation of the multi agent deep deterministic policy gradients(MADDPG) algorithm This

Phil Tabor 159 Dec 28, 2022
Codes for our paper "SentiLARE: Sentiment-Aware Language Representation Learning with Linguistic Knowledge" (EMNLP 2020)

SentiLARE: Sentiment-Aware Language Representation Learning with Linguistic Knowledge Introduction SentiLARE is a sentiment-aware pre-trained language

74 Dec 30, 2022
PyTorch code for the NAACL 2021 paper "Improving Generation and Evaluation of Visual Stories via Semantic Consistency"

Improving Generation and Evaluation of Visual Stories via Semantic Consistency PyTorch code for the NAACL 2021 paper "Improving Generation and Evaluat

Adyasha Maharana 28 Dec 08, 2022
Official Repository for our ECCV2020 paper: Imbalanced Continual Learning with Partitioning Reservoir Sampling

Imbalanced Continual Learning with Partioning Reservoir Sampling This repository contains the official PyTorch implementation and the dataset for our

Chris Dongjoo Kim 40 Sep 18, 2022