pysimt

License: MIT | Python 3.7 | Python 3.8

This repository contains the code, the experiment configurations, and the scripts to prepare and download the data for the paper Simultaneous Machine Translation with Visual Context, presented at EMNLP 2020.

Note for RL-based codebase

Please visit the sim-mt repository for the implementation of our RL-based pipeline. sim-mt provides the codebase for the papers listed there.

Overview

pysimt is a PyTorch-based sequence-to-sequence framework that facilitates research in unimodal and multi-modal machine translation. The framework is especially geared towards a set of recent simultaneous MT approaches, including heuristics-based decoding and prefix-to-prefix training/decoding. Common metrics such as average proportion (AP), average lag (AL), and consecutive wait (CW) are provided through well-defined APIs as well.
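
To make the latency metrics mentioned above concrete, here is a minimal, self-contained sketch of AP and AL computed from a wait-k read/write schedule. It is not pysimt's API: the function names `wait_k_delays`, `average_proportion` and `average_lag` are hypothetical, and the formulas follow the commonly cited definitions (AP from Cho and Esipova, 2016; AL from Ma et al., 2019), which may differ in detail from what pysimt implements. The sketch assumes non-empty source and target sequences.

```python
from typing import List


def wait_k_delays(k: int, src_len: int, trg_len: int) -> List[int]:
    """Hypothetical helper: g(t) for a wait-k schedule.

    g(t) is the number of source tokens read before the t-th target token
    is written. Wait-k reads k tokens first, then alternates write/read
    until the source is exhausted.
    """
    return [min(k + t, src_len) for t in range(trg_len)]


def average_proportion(delays: List[int], src_len: int) -> float:
    """AP: mean fraction of the source read per emitted target token."""
    return sum(delays) / (src_len * len(delays))


def average_lag(delays: List[int], src_len: int) -> float:
    """AL: average number of source tokens the output lags behind an ideal
    fully simultaneous translator, counted up to the first target token
    that was emitted after the whole source had been read."""
    gamma = len(delays) / src_len  # target-to-source length ratio
    # tau: 1-based index of the first target token with g(t) == |x|;
    # falls back to the last token if the source is never fully read.
    tau = next(
        (t for t, g in enumerate(delays, start=1) if g >= src_len),
        len(delays),
    )
    lag = sum(g - (t - 1) / gamma for t, g in enumerate(delays[:tau], start=1))
    return lag / tau


# Wait-2 on a 5-token source and 5-token target: g = [2, 3, 4, 5, 5]
delays = wait_k_delays(k=2, src_len=5, trg_len=5)
print(delays, average_proportion(delays, 5), average_lag(delays, 5))
```

For equal source and target lengths, the AL of a wait-k schedule equals k under this definition, which gives a quick sanity check for any implementation.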

Please visit https://imperialnlp.github.io/pysimt for detailed documentation.

Citation

@inproceedings{caglayan-etal-2020-simultaneous,
    title = "Simultaneous Machine Translation with Visual Context",
    author = {Caglayan, Ozan  and
      Ive, Julia  and
      Haralampieva, Veneta  and
      Madhyastha, Pranava  and
      Barrault, Lo{\"\i}c  and
      Specia, Lucia},
    booktitle = "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.emnlp-main.184",
    pages = "2350--2361",
}

Installation

The essential dependency of pysimt is torch>=1.7. The following command creates an appropriate Anaconda environment with pysimt installed in editable mode.

conda env create -f environment.yml

Once the installation is done, run the pysimt-install-extra command if you want to use METEOR as an evaluation metric.