System Combination for Grammatical Error Correction Based on Integer Programming

Last update: Mar 29, 2022

Related tags

Overview

System Combination for Grammatical Error Correction Based on Integer Programming

This repository contains the code and scripts that implement the system combination approach for grammatical error correction in Lin and Ng (2021).

Reference

Ruixi Lin and Hwee Tou Ng (2021). System Combination for Grammatical Error Correction Based on Integer Programming.

Please cite:

@inproceedings{lin2021gecip,
  author    = "Lin, Ruixi and Ng, Hwee Tou",
  title     = "System Combination for Grammatical Error Correction Based on Integer Programming",
  booktitle = "Proceedings of Recent Advances in Natural Language Processing",
  year      = "2021",
  pages     = "829-834"
}

Table of contents

Prerequisites

Example

License

Prerequisites

conda create --name comb python=3.6
conda activate comb
pip install spacy
python -m spacy download en

For the nonlinear integer programming solver, we use

LINGO10.0

Note that educational institutions can obtain a free license to use the LINGO solver.

Example

Combine the 3 GEC systems listed in the paper using the IP approach. The three systems are UEdin-MS (https://aclanthology.org/W19-4427), Kakao (https://aclanthology.org/W19-4423), and Tohoku (https://aclanthology.org/D19-1119). The core functions for the IP objective are implemented in model.lg4. You can find model.lg4 under lingo/inputs.

Run python prepare_data.py -dir . -list kakao uedinms tohoku to generate aggregated TP, FP, and FN counts. The counts files are stored under lingo/inputs.
Load model.lg4 into the LINGO console and specify the input data path with the counts file path, select the INLP model, and run optimizations. Store the solutions to lingo/outputs/sol_kakao_uedinms_tohoku.txt.
Run ./comb.sh . sol_kakao_uedinms_tohoku.txt to load LINGO solutions, merge and apply edits. The resulted blind test file can be found under submissions. It can be zipped and submitted to the BEA CodeLab website (https://competitions.codalab.org/competitions/20228) for evaluations.

The data folder provides individual GEC system output files, and .m2 files generated using ERRANT for the listed systems. For more information, please visit the ERRANT github page.

We include the IP combined .m2 files under merged_m2, and the corresponding text files under submissions.

License

The source code and models in this repository are licensed under the GNU General Public License v3.0 (see LICENSE). For further research interests and commercial use of the code and models, please contact Ruixi Lin ([email protected]) and Prof. Hwee Tou Ng ([email protected]).

System Combination for Grammatical Error Correction Based on Integer Programming

Related tags

Overview

System Combination for Grammatical Error Correction Based on Integer Programming

Reference

Prerequisites

Example

License

Owner

NUS NLP Group

All materials of Cassandra Event, Udyam'22

A Fast Knowledge Distillation Framework for Visual Recognition

PyTorch implementation of 1712.06087 "Zero-Shot" Super-Resolution using Deep Internal Learning

A faster pytorch implementation of faster r-cnn

GeneDisco is a benchmark suite for evaluating active learning algorithms for experimental design in drug discovery.

Node Editor Plug for Blender

InsCLR: Improving Instance Retrieval with Self-Supervision

Setup and customize deep learning environment in seconds.

[ICCV 2021] Code release for "Sub-bit Neural Networks: Learning to Compress and Accelerate Binary Neural Networks"

Implementation of the CVPR 2021 paper "Online Multiple Object Tracking with Cross-Task Synergy"

Distributed Evolutionary Algorithms in Python

PyTorch implementation for Stochastic Fine-grained Labeling of Multi-state Sign Glosses for Continuous Sign Language Recognition.

A crash course in six episodes for software developers who want to become machine learning practitioners.

Homepage of paper: Paint Transformer: Feed Forward Neural Painting with Stroke Prediction, ICCV 2021.

Transfer-Learn is an open-source and well-documented library for Transfer Learning.

Implementation of the master's thesis "Temporal copying and local hallucination for video inpainting".

CM-NAS: Cross-Modality Neural Architecture Search for Visible-Infrared Person Re-Identification (ICCV2021)

Experiments for Operating Systems Lab (ETCS-352)

Setup freqtrade/freqUI on Heroku

Multispectral Object Detection with Yolov5