Argos Train

Argos Train trains an OpenNMT PyTorch Transformer model and a SentencePiece tokenizer and packages them with Stanza data as an Argos Translate package. Argos Translate packages, which are zip archives with a .argosmodel extension, can be used with Argos Translate and LibreTranslate.

Pre-trained Argos Translate packages are available for download. If you have trained packages you're willing to share, please get in touch so that they can be published on the Argos Translate package index.

LibreTranslate/Locomotive has similar functionality to Argos Train and can also be used to train translation models.

Training example

From inside the argosopentech/argostrain Docker container:

$ su argosopentech
$ source ~/argos-train-init

...


$ argos-train
From code (ISO 639): en
To code (ISO 639): es
From name: English
To name: Spanish
Version: 1.0

...

Package saved to /home/argosopentech/argos-train/run/en_es.argosmodel

Data

Data from data-index.json is used for training. Argos Translate primarily uses data from the Opus project.

To train a model with custom data, run argos-train-init and then add an entry to data-index.json with a link to download your custom data package. Data packages are zipped directories with a .argosdata extension that contain a metadata.json file plus a source file and a target file holding parallel data on corresponding lines. Data packages are downloaded over HTTP, so you will need to run a web server such as Nginx to host custom data.
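A data package with the layout described above could be assembled with a short script. The following is a minimal sketch, not the project's own tooling: the function name build_data_package, the single top-level directory inside the archive, and the metadata field names used in the usage note are all assumptions for illustration. Compare against an existing .argosdata package before relying on it.

```python
import json
import zipfile
from pathlib import Path


def build_data_package(pairs, out_path, metadata):
    """Write parallel sentence pairs into a zipped data package.

    pairs: iterable of (source_sentence, target_sentence) tuples.
    out_path: destination file, e.g. "demo-en_es.argosdata".
    metadata: dict serialized to metadata.json (field names are assumptions).
    """
    out_path = Path(out_path)
    # Assumed layout: one top-level directory named after the package file.
    root = out_path.stem
    source_lines, target_lines = [], []
    for src, tgt in pairs:
        # Collapse all whitespace runs (including newlines) to single spaces
        # so that each sentence occupies exactly one line in both files.
        source_lines.append(" ".join(src.split()))
        target_lines.append(" ".join(tgt.split()))
    with zipfile.ZipFile(out_path, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(f"{root}/source", "\n".join(source_lines) + "\n")
        zf.writestr(f"{root}/target", "\n".join(target_lines) + "\n")
        zf.writestr(f"{root}/metadata.json", json.dumps(metadata, indent=2))
    return out_path
```

For example, build_data_package([("Hello", "Hola")], "demo-en_es.argosdata", {"from_code": "en", "to_code": "es"}) writes an archive containing demo-en_es/source, demo-en_es/target, and demo-en_es/metadata.json. The resulting file still has to be hosted over HTTP and linked from data-index.json.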

You can also load data manually by placing it at run/source and run/target and setting data_exists=True in bin/argos-train.
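Because the source and target files are aligned line by line, it can be worth checking that both files have the same number of lines before starting a long training run. This is a minimal sketch; the helper name count_parallel_lines is hypothetical and is not part of Argos Train.

```python
def count_parallel_lines(source_path, target_path):
    """Return the number of sentence pairs, or raise if the files are misaligned."""
    with open(source_path, encoding="utf-8") as src:
        n_src = sum(1 for _ in src)
    with open(target_path, encoding="utf-8") as tgt:
        n_tgt = sum(1 for _ in tgt)
    if n_src != n_tgt:
        # A mismatch means at least one sentence has no counterpart.
        raise ValueError(
            f"line count mismatch: {n_src} source lines vs {n_tgt} target lines"
        )
    return n_src
```

Calling count_parallel_lines("run/source", "run/target") from the argos-train directory would then either return the pair count or fail early with a clear message.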

You can use this project to automatically download data from Opus.

Docker

Docker image available at argosopentech/argostrain.

docker run -it argosopentech/argostrain /bin/bash

Run training

argos-train

Environment

CUDA required, tested on vast.ai.

Vast.ai seems to recognize the CUDA version of the Docker container incorrectly, so you may need to check the "Incompatible Machines" option if you're using vast.ai.

Manually creating an Argos Translate package

If you don't want to use Argos Train you can manually train a model with OpenNMT and package it for Argos Translate. Argos Translate packages are zip archives with a .argosmodel extension containing a CTranslate2 model, a SentencePiece model, a Stanza 1.1.1 model, and a metadata file. Reference the training script at bin/argos-train for more information.
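Since a .argosmodel file is an ordinary zip archive, one way to learn the expected layout is to list the contents of an existing package. The sketch below uses only Python's standard zipfile module; the helper name list_package_contents is hypothetical, and the file names inside a real package are not assumed here.

```python
import zipfile


def list_package_contents(package_path):
    """Return the sorted list of entries inside a .argosmodel zip archive."""
    if not zipfile.is_zipfile(package_path):
        raise ValueError(f"{package_path} is not a zip archive")
    with zipfile.ZipFile(package_path) as zf:
        return sorted(zf.namelist())
```

Running it against a downloaded pre-trained package shows where the CTranslate2 model, SentencePiece model, Stanza model, and metadata file sit, which can then be mirrored when packaging a manually trained model.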

Documentation

Contributing

Contributions are welcome! Please make a pull request.

Roadmap

License

Licensed under either the MIT License or the Creative Commons CC0 License.