System Combination for Grammatical Error Correction Based on Integer Programming

Last update: Mar 29, 2022

Overview

This repository contains the code and scripts that implement the system combination approach for grammatical error correction described in Lin and Ng (2021).

Reference

Ruixi Lin and Hwee Tou Ng (2021). System Combination for Grammatical Error Correction Based on Integer Programming. In Proceedings of Recent Advances in Natural Language Processing, pages 829-834.

Please cite:

@inproceedings{lin2021gecip,
  author    = "Lin, Ruixi and Ng, Hwee Tou",
  title     = "System Combination for Grammatical Error Correction Based on Integer Programming",
  booktitle = "Proceedings of Recent Advances in Natural Language Processing",
  year      = "2021",
  pages     = "829--834"
}

Table of contents

Prerequisites

Example

License

Prerequisites

conda create --name comb python=3.6
conda activate comb
pip install spacy
python -m spacy download en

For the nonlinear integer programming solver, we use LINGO 10.0.

Note that educational institutions can obtain a free license to use the LINGO solver.

Example

This example combines the three GEC systems used in the paper with the integer programming (IP) approach: UEdin-MS (https://aclanthology.org/W19-4427), Kakao (https://aclanthology.org/W19-4423), and Tohoku (https://aclanthology.org/D19-1119). The core functions for the IP objective are implemented in model.lg4, which is located under lingo/inputs.
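model.lg4 is a LINGO model and is not reproduced here. To give a rough feel for the kind of objective involved, the Python sketch below assumes, for illustration only, that the objective is the F0.5 score computed from aggregated TP, FP, and FN counts, and uses a deliberately simplified formulation: each error type is assigned to at most one system, and counts from different types are simply summed. It finds the best assignment by exhaustive search. This is not the formulation in the paper; the function names and the toy counts are hypothetical.

```python
from itertools import product


def f_beta(tp, fp, fn, beta=0.5):
    """F-beta score from edit counts (F0.5 by default, weighting precision over recall)."""
    p = tp / (tp + fp) if tp + fp else 1.0
    r = tp / (tp + fn) if tp + fn else 1.0
    b2 = beta ** 2
    return (1 + b2) * p * r / (b2 * p + r) if p + r else 0.0


def best_assignment(counts):
    """Assign each error type to one system (or None) so that F0.5 is maximized.

    counts[system][etype] = (tp, fp, fn). Simplifying assumption: tp + fn equals
    the number of gold edits of that type, whichever system reports it.
    """
    systems = sorted(counts)
    types = sorted({t for s in systems for t in counts[s]})
    # Gold edits per type, recovered as tp + fn from any system reporting the type.
    gold = {}
    for t in types:
        tp, _, fn = next(counts[s][t] for s in systems if t in counts[s])
        gold[t] = tp + fn

    best_score, best_choice = -1.0, None
    # Enumerate all (len(systems) + 1) ** len(types) assignments.
    for choice in product([None] + systems, repeat=len(types)):
        tp = fp = fn = 0
        for t, s in zip(types, choice):
            if s is None or t not in counts[s]:
                fn += gold[t]  # type left uncorrected: all of its gold edits are missed
            else:
                s_tp, s_fp, s_fn = counts[s][t]
                tp, fp, fn = tp + s_tp, fp + s_fp, fn + s_fn
        score = f_beta(tp, fp, fn)
        if score > best_score:
            best_score, best_choice = score, dict(zip(types, choice))
    return best_score, best_choice


toy_counts = {  # invented numbers, for illustration only
    "kakao": {"M:DET": (4, 1, 2), "R:VERB": (1, 3, 4)},
    "tohoku": {"M:DET": (3, 0, 3), "R:VERB": (3, 1, 2)},
}
score, choice = best_assignment(toy_counts)
print(round(score, 4), choice)  # 0.7692 {'M:DET': 'tohoku', 'R:VERB': 'tohoku'}
```

Because F0.5 is a ratio of sums, it does not decompose per error type, so even this toy version must score whole assignments. Exhaustive search grows exponentially with the number of error types, which is one reason to hand a real formulation to a dedicated solver such as LINGO.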

1. Run python prepare_data.py -dir . -list kakao uedinms tohoku to generate the aggregated TP, FP, and FN counts. The counts files are stored under lingo/inputs.
2. Load model.lg4 into the LINGO console, set the input data path to the counts file path, select the INLP model, and run the optimization. Save the solution to lingo/outputs/sol_kakao_uedinms_tohoku.txt.
3. Run ./comb.sh . sol_kakao_uedinms_tohoku.txt to load the LINGO solution, then merge and apply the selected edits. The resulting blind test file can be found under submissions. It can be zipped and submitted to the BEA CodaLab website (https://competitions.codalab.org/competitions/20228) for evaluation.
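The merge-and-apply logic in comb.sh is not shown in this README. As a minimal sketch under stated assumptions, the Python code below merges M2-style edits from several systems with a hypothetical greedy rule (edits from earlier systems win, and any later edit that overlaps a kept one is dropped), then applies the surviving edits to the tokenized source sentence. The helper names parse_m2_edit, merge_edits, and apply_edits are illustrative only, and comb.sh may resolve conflicts differently.

```python
def parse_m2_edit(line):
    """Parse an M2 edit line, e.g. 'A 2 3|||R:VERB:FORM|||gone|||REQUIRED|||-NONE-|||0'."""
    fields = line[2:].split("|||")
    start, end = map(int, fields[0].split())
    return start, end, fields[1], fields[2]  # (start, end, error type, correction)


def merge_edits(system_edits):
    """Hypothetical greedy merge: earlier systems win; overlapping later edits are dropped."""
    kept = []
    for edits in system_edits:
        for start, end, etype, cor in edits:
            if etype == "noop" or start < 0:
                continue  # 'A -1 -1|||noop|||...' marks a sentence with no edits
            clash = any((start, end) == (s, e) or (start < e and s < end)
                        for s, e, _ in kept)
            if not clash:
                kept.append((start, end, cor))
    return kept


def apply_edits(tokens, edits):
    """Apply non-overlapping (start, end, correction) edits to a token list."""
    out = list(tokens)
    offset = 0  # shift between original and current indices caused by earlier edits
    for start, end, cor in sorted(edits):
        cor_tokens = cor.split()  # an empty correction deletes the span
        out[start + offset:end + offset] = cor_tokens
        offset += len(cor_tokens) - (end - start)
    return out


source = "I have went to the the store .".split()
sys_a = [parse_m2_edit("A 2 3|||R:VERB:FORM|||gone|||REQUIRED|||-NONE-|||0")]
sys_b = [parse_m2_edit("A 5 6|||U:DET||||||REQUIRED|||-NONE-|||0"),
         parse_m2_edit("A 2 3|||R:VERB:TENSE|||go|||REQUIRED|||-NONE-|||0")]
print(" ".join(apply_edits(source, merge_edits([sys_a, sys_b]))))
# I have gone to the store .
```

M2 edit spans are token offsets into the original sentence, so the edits are applied in order of their start offset while a running offset tracks how earlier insertions and deletions have shifted the token list.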

The data folder contains the output files of the individual GEC systems, together with the .m2 files generated from them using ERRANT. For more information, please visit the ERRANT GitHub page.
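To make the TP, FP, and FN counts produced by the prepare_data.py step more concrete, the sketch below reads M2 text (an S line with the tokenized source sentence, followed by A lines of the form A start end|||type|||correction|||REQUIRED|||-NONE-|||annotator) and counts, per error type, hypothesis edits that exactly match a gold edit (TP), hypothesis edits with no gold match (FP), and gold edits the hypothesis misses (FN). It assumes a single annotator with id 0 and exact span-and-correction matching. ERRANT's own scorer, compare_m2, is more elaborate (for example, it handles multiple annotators), and prepare_data.py may compute its counts differently; the function names here are hypothetical.

```python
from collections import Counter


def read_m2(text):
    """Return one set of (start, end, type, correction) edits per sentence block."""
    blocks = []
    for block in text.strip().split("\n\n"):
        edits = set()
        for line in block.splitlines():
            if not line.startswith("A "):
                continue  # skip the 'S ...' source-sentence line
            fields = line[2:].split("|||")
            start, end = map(int, fields[0].split())
            # Assumption: only annotator 0 is used; noop lines carry no edit.
            if fields[1] == "noop" or fields[-1] != "0":
                continue
            edits.add((start, end, fields[1], fields[2]))
        blocks.append(edits)
    return blocks


def count_by_type(hyp_text, gold_text):
    """Per-error-type TP/FP/FN using exact (start, end, correction) matching."""
    hyp_blocks, gold_blocks = read_m2(hyp_text), read_m2(gold_text)
    if len(hyp_blocks) != len(gold_blocks):
        raise ValueError("hypothesis and gold M2 have different sentence counts")
    tp, fp, fn = Counter(), Counter(), Counter()
    for hyp, gold in zip(hyp_blocks, gold_blocks):
        hyp_edits = {(s, e, c): t for s, e, t, c in hyp}
        gold_edits = {(s, e, c): t for s, e, t, c in gold}
        for key, etype in hyp_edits.items():
            if key in gold_edits:
                tp[etype] += 1
            else:
                fp[etype] += 1
        for key, etype in gold_edits.items():
            if key not in hyp_edits:
                fn[etype] += 1
    return tp, fp, fn
```

Summing such per-type counters over a development set for each system yields per-system, per-type counts of the kind an IP objective can combine.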

The IP-combined .m2 files are included under merged_m2, and the corresponding text files under submissions.

License

The source code and models in this repository are licensed under the GNU General Public License v3.0 (see LICENSE). For further research inquiries or commercial use of the code and models, please contact Ruixi Lin ([email protected]) and Prof. Hwee Tou Ng ([email protected]).

Owner

NUS NLP Group
