Sequence-to-sequence framework with a focus on Neural Machine Translation based on Apache MXNet

Overview

Sockeye


This package contains the Sockeye project, an open-source sequence-to-sequence framework for Neural Machine Translation based on Apache MXNet (Incubating). Sockeye powers several Machine Translation use cases, including Amazon Translate. The framework implements state-of-the-art machine translation models with Transformers (Vaswani et al., 2017). Recent developments and changes are tracked in our CHANGELOG.

If you have any questions or discover problems, please file an issue. You can also send questions to sockeye-dev-at-amazon-dot-com.

Version 2.0

With version 2.0, we have updated the usage of MXNet by moving to the Gluon API and adding support for several state-of-the-art features such as distributed training, low-precision training and decoding, as well as easier debugging of neural network architectures. In the context of this rewrite, we also trimmed down the large feature set of version 1.18.x to concentrate on the most important types of models and features, to provide a maintainable framework that is suitable for fast prototyping, research, and production. We welcome Pull Requests if you would like to help with adding back features when needed.

Installation

The easiest way to run Sockeye is with Docker or nvidia-docker. To build a Sockeye image with all features enabled, run the build script:

python3 sockeye_contrib/docker/build.py

See the Dockerfile documentation for more information.

Documentation

For information on how to use Sockeye, please visit our documentation.

Citation

For more information about Sockeye, see our papers (BibTeX).

Sockeye 2.x

Tobias Domhan, Michael Denkowski, David Vilar, Xing Niu, Felix Hieber, Kenneth Heafield. The Sockeye 2 Neural Machine Translation Toolkit at AMTA 2020. Proceedings of the 14th Conference of the Association for Machine Translation in the Americas (AMTA'20).

Felix Hieber, Tobias Domhan, Michael Denkowski, David Vilar. Sockeye 2: A Toolkit for Neural Machine Translation. Proceedings of the 22nd Annual Conference of the European Association for Machine Translation, Project Track (EAMT'20).

Sockeye 1.x

Felix Hieber, Tobias Domhan, Michael Denkowski, David Vilar, Artem Sokolov, Ann Clifton, Matt Post. The Sockeye Neural Machine Translation Toolkit at AMTA 2018. Proceedings of the 13th Conference of the Association for Machine Translation in the Americas (AMTA'18).

Felix Hieber, Tobias Domhan, Michael Denkowski, David Vilar, Artem Sokolov, Ann Clifton and Matt Post. 2017. Sockeye: A Toolkit for Neural Machine Translation. ArXiv e-prints.

Research with Sockeye

Sockeye has been used for both academic and industrial research. A list of known publications that use Sockeye is shown below. If you know more, please let us know or submit a pull request (last updated: October 2020).

2020

  • Dinu, Georgiana, Prashant Mathur, Marcello Federico, Stanislas Lauly, Yaser Al-Onaizan. "Joint translation and unit conversion for end-to-end localization." arXiv preprint arXiv:2004.05219 (2020)
  • Hisamoto, Sorami, Matt Post, Kevin Duh. "Membership Inference Attacks on Sequence-to-Sequence Models: Is My Data In Your Machine Translation System?" Transactions of the Association for Computational Linguistics, Volume 8 (2020)
  • Naradowsky, Jason, Xuan Zhan, Kevin Duh. "Machine Translation System Selection from Bandit Feedback." arXiv preprint arXiv:2002.09646 (2020)
  • Niu, Xing, Prashant Mathur, Georgiana Dinu, Yaser Al-Onaizan. "Evaluating Robustness to Input Perturbations for Neural Machine Translation". arXiv preprint arXiv:2005.00580 (2020)
  • Niu, Xing, Marine Carpuat. "Controlling Neural Machine Translation Formality with Synthetic Supervision." Proceedings of AAAI (2020)
  • Keung, Phillip, Julian Salazar, Yichao Liu, Noah A. Smith. "Unsupervised Bitext Mining and Translation via Self-Trained Contextual Embeddings." arXiv preprint arXiv:2010.07761 (2020).
  • Sokolov, Alex, Tracy Rohlin, Ariya Rastrow. "Neural Machine Translation for Multilingual Grapheme-to-Phoneme Conversion." arXiv preprint arXiv:2006.14194 (2020)
  • Stafanovičs, Artūrs, Toms Bergmanis, Mārcis Pinnis. "Mitigating Gender Bias in Machine Translation with Target Gender Annotations." arXiv preprint arXiv:2010.06203 (2020)
  • Stojanovski, Dario, Alexander Fraser. "Addressing Zero-Resource Domains Using Document-Level Context in Neural Machine Translation." arXiv preprint arXiv:2004.14927 (2020)
  • Zhang, Xuan, Kevin Duh. "Reproducible and Efficient Benchmarks for Hyperparameter Optimization of Neural Machine Translation Systems." Transactions of the Association for Computational Linguistics, Volume 8 (2020)
  • Swe Zin Moe, Ye Kyaw Thu, Hnin Aye Thant, Nandar Win Min, and Thepchai Supnithi, "Unsupervised Neural Machine Translation between Myanmar Sign Language and Myanmar Language", Journal of Intelligent Informatics and Smart Technology, April 1st Issue, 2020, pp. 53-61. (Submitted December 21, 2019; accepted March 6, 2020; revised March 16, 2020; published online April 30, 2020)
  • Thazin Myint Oo, Ye Kyaw Thu, Khin Mar Soe and Thepchai Supnithi, "Neural Machine Translation between Myanmar (Burmese) and Dawei (Tavoyan)", In Proceedings of the 18th International Conference on Computer Applications (ICCA 2020), Feb 27-28, 2020, Yangon, Myanmar, pp. 219-227
  • Müller, Mathias, Annette Rios, Rico Sennrich. "Domain Robustness in Neural Machine Translation." Proceedings of AMTA (2020)
  • Rios, Annette, Mathias Müller, Rico Sennrich. "Subword Segmentation and a Single Bridge Language Affect Zero-Shot Neural Machine Translation." Proceedings of the 5th WMT: Research Papers (2020)

2019

  • Agrawal, Sweta, Marine Carpuat. "Controlling Text Complexity in Neural Machine Translation." Proceedings of EMNLP (2019)
  • Beck, Daniel, Trevor Cohn, Gholamreza Haffari. "Neural Speech Translation using Lattice Transformations and Graph Networks." Proceedings of TextGraphs-13 (EMNLP 2019)
  • Currey, Anna, Kenneth Heafield. "Zero-Resource Neural Machine Translation with Monolingual Pivot Data." Proceedings of EMNLP (2019)
  • Gupta, Prabhakar, Mayank Sharma. "Unsupervised Translation Quality Estimation for Digital Entertainment Content Subtitles." IEEE International Journal of Semantic Computing (2019)
  • Hu, J. Edward, Huda Khayrallah, Ryan Culkin, Patrick Xia, Tongfei Chen, Matt Post, and Benjamin Van Durme. "Improved Lexically Constrained Decoding for Translation and Monolingual Rewriting." Proceedings of NAACL-HLT (2019)
  • Rosendahl, Jan, Christian Herold, Yunsu Kim, Miguel Graça, Weiyue Wang, Parnia Bahar, Yingbo Gao and Hermann Ney. "The RWTH Aachen University Machine Translation Systems for WMT 2019." Proceedings of the 4th WMT: Research Papers (2019)
  • Thompson, Brian, Jeremy Gwinnup, Huda Khayrallah, Kevin Duh, and Philipp Koehn. "Overcoming catastrophic forgetting during domain adaptation of neural machine translation." Proceedings of NAACL-HLT 2019 (2019)
  • Tättar, Andre, Elizaveta Korotkova, Mark Fishel. "University of Tartu’s Multilingual Multi-domain WMT19 News Translation Shared Task Submission." Proceedings of the 4th WMT: Research Papers (2019)
  • Thazin Myint Oo, Ye Kyaw Thu and Khin Mar Soe, "Neural Machine Translation between Myanmar (Burmese) and Rakhine (Arakanese)", In Proceedings of the Sixth Workshop on NLP for Similar Languages, Varieties and Dialects, NAACL-2019, June 7th 2019, Minneapolis, United States, pp. 80-88

2018

  • Domhan, Tobias. "How Much Attention Do You Need? A Granular Analysis of Neural Machine Translation Architectures". Proceedings of 56th ACL (2018)
  • Kim, Yunsu, Yingbo Gao, and Hermann Ney. "Effective Cross-lingual Transfer of Neural Machine Translation Models without Shared Vocabularies." arXiv preprint arXiv:1905.05475 (2019)
  • Korotkova, Elizaveta, Maksym Del, and Mark Fishel. "Monolingual and Cross-lingual Zero-shot Style Transfer." arXiv preprint arXiv:1808.00179 (2018)
  • Niu, Xing, Michael Denkowski, and Marine Carpuat. "Bi-directional neural machine translation with synthetic parallel data." arXiv preprint arXiv:1805.11213 (2018)
  • Niu, Xing, Sudha Rao, and Marine Carpuat. "Multi-Task Neural Models for Translating Between Styles Within and Across Languages." COLING (2018)
  • Post, Matt and David Vilar. "Fast Lexically Constrained Decoding with Dynamic Beam Allocation for Neural Machine Translation." Proceedings of NAACL-HLT (2018)
  • Schamper, Julian, Jan Rosendahl, Parnia Bahar, Yunsu Kim, Arne Nix, and Hermann Ney. "The RWTH Aachen University Supervised Machine Translation Systems for WMT 2018." Proceedings of the 3rd WMT: Shared Task Papers (2018)
  • Schulz, Philip, Wilker Aziz, and Trevor Cohn. "A stochastic decoder for neural machine translation." arXiv preprint arXiv:1805.10844 (2018)
  • Alkhouli, Tamer, Gabriel Bretschner, and Hermann Ney. "On The Alignment Problem In Multi-Head Attention-Based Neural Machine Translation." Proceedings of the 3rd WMT: Research Papers (2018)
  • Tang, Gongbo, Rico Sennrich, and Joakim Nivre. "An Analysis of Attention Mechanisms: The Case of Word Sense Disambiguation in Neural Machine Translation." Proceedings of 3rd WMT: Research Papers (2018)
  • Thompson, Brian, Huda Khayrallah, Antonios Anastasopoulos, Arya McCarthy, Kevin Duh, Rebecca Marvin, Paul McNamee, Jeremy Gwinnup, Tim Anderson, and Philipp Koehn. "Freezing Subnetworks to Analyze Domain Adaptation in Neural Machine Translation." arXiv preprint arXiv:1809.05218 (2018)
  • Vilar, David. "Learning Hidden Unit Contribution for Adapting Neural Machine Translation Models." Proceedings of NAACL-HLT (2018)
  • Vyas, Yogarshi, Xing Niu and Marine Carpuat. "Identifying Semantic Divergences in Parallel Text without Annotations." Proceedings of NAACL-HLT (2018)
  • Wang, Weiyue, Derui Zhu, Tamer Alkhouli, Zixuan Gan, and Hermann Ney. "Neural Hidden Markov Model for Machine Translation". Proceedings of 56th ACL (2018)
  • Zhang, Xuan, Gaurav Kumar, Huda Khayrallah, Kenton Murray, Jeremy Gwinnup, Marianna J Martindale, Paul McNamee, Kevin Duh, and Marine Carpuat. "An Empirical Exploration of Curriculum Learning for Neural Machine Translation." arXiv preprint arXiv:1811.00739 (2018)
  • Swe Zin Moe, Ye Kyaw Thu, Hnin Aye Thant and Nandar Win Min, "Neural Machine Translation between Myanmar Sign Language and Myanmar Written Text", In the second Regional Conference on Optical character recognition and Natural language processing technologies for ASEAN languages 2018 (ONA 2018), December 13-14, 2018, Phnom Penh, Cambodia.
  • Tang, Gongbo, Mathias Müller, Annette Rios and Rico Sennrich. "Why Self-attention? A Targeted Evaluation of Neural Machine Translation Architectures." Proceedings of EMNLP (2018)

2017

  • Domhan, Tobias and Felix Hieber. "Using target-side monolingual data for neural machine translation through multi-task learning." Proceedings of EMNLP (2017).
Comments
  • Unable to install the requirements

    Unable to install the requirements

    Hello,

    I have installed Sockeye in an Anaconda (Conda 4.10.3 with Python 3.8.8) environment as explained here: https://awslabs.github.io/sockeye/setup.html

    But I can't install mxnet:

    I get the error "Could not find a version that satisfies the requirement mxnet==1.8.0.post0". I tried conda install -c anaconda mxnet and pip install mxnet==1.8.0.post0, but neither helped.

    Do you know why I can't install mxnet?

    I want to train the model described here: https://aws.amazon.com/blogs/machine-learning/train-neural-machine-translation-models-with-sockeye/

    opened by RamoramaInteractive 39
  • Sockeye freezes at new validation start [v1.18.54]

    Sockeye freezes at new validation start [v1.18.54]

    For the third time in a few days and on 2 independent trainings, I observed that Sockeye freezes after starting some new validation, i.e. it does not crash, does not send any warning, but stops going forward (0% on CPU/GPU). Here are the last lines of my log file before this issue occurs:

    [2018-09-24:21:45:33:INFO:sockeye.training:__call__] Epoch[3] Batch [270000]    Speed: 650.11 samples/sec 22445.47 tokens/sec 2.06 updates/sec  perplexity=3.546109
    [2018-09-24:21:45:34:INFO:root:save_params_to_file] Saved params to "/run/work/generic_fr2en/model_baseline/params.00007"
    [2018-09-24:21:45:34:INFO:sockeye.training:fit] Checkpoint [7]  Updates=270000 Epoch=3 Samples=81602144 Time-cost=4711.141 Updates/sec=2.123
    [2018-09-24:21:45:34:INFO:sockeye.training:fit] Checkpoint [7]  Train-perplexity=3.546109
    [2018-09-24:21:45:36:INFO:sockeye.training:fit] Checkpoint [7]  Validation-perplexity=3.752938
    [2018-09-24:21:45:36:INFO:sockeye.utils:log_gpu_memory_usage] GPU 0: 10093/11178 MB (90.29%) GPU 1: 9791/11178 MB (87.59%) GPU 2: 9795/11178 MB (87.63%) GPU 3: 9789/11178 MB (87.57%)
    [2018-09-24:21:45:36:INFO:sockeye.training:collect_results] Decoder-6 finished: {'rouge2-val': 0.4331754429258854, 'rouge1-val': 0.6335038896620699, 'decode-walltime-val': 3375.992604494095, 'rougel-val': 0.5947101830587342, 'avg-sec-per-sent-val': 1.794786073627908, 'chrf-val': 0.6585073715647153, 'bleu-val': 0.43439024563194745}
    [2018-09-24:21:45:36:INFO:sockeye.training:start_decoder] Starting process: Decoder-7
    

    So at this point, it has outputted params.00007. When I kill the Sockeye process and restart to continue training, it starts again after validation 6 (update 260000), then later overwrites params.00007, starts Decoder-7 and continues training successfully.

    I noted that the freezing occurs at the same moment as in #462, but I have no idea whether it is related to this case. I checked all parameters of the last param file after the issue with numpy.isnan() and no nans were reported.

    opened by franckbrl 30
  • How to measure the BLEU of training/translation

    How to measure the BLEU of training/translation

    Hi, I just trained an 8-layer RNN model and got the following result:

    python -m sockeye.train -s corpus.tc.BPE.de \
                            -t corpus.tc.BPE.en \
                            -vs newstest2016.tc.BPE.de \
                            -vt newstest2016.tc.BPE.en \
                            --num-embed 512 \
                            --rnn-num-hidden 512 \
                            --rnn-attention-type dot \
                            --embed-dropout=0.2 \
                            --rnn-decoder-hidden-dropout=0.2 \
                            --max-seq-len 50 \
                            --decode-and-evaluate 500 \
                            --batch-size 128 \
    			--batch-type sentence \
                            -o gnmt_model \
    			--optimized-metric bleu \
                            --initial-learning-rate=0.0001 \
                            --learning-rate-reduce-num-not-improved=8 \
                            --learning-rate-reduce-factor=0.7 \
                            --weight-init xavier --weight-init-scale 3.0 \
                            --weight-init-xavier-factor-type avg \
    			--lock-dir ~/.temp/ \
                            --num-layers 8:8 \
                            --device-ids 0 1
    
    [2018-07-18:00:56:34:INFO:sockeye.training:fit] Training finished. Best checkpoint: 60. Best validation bleu: 0.746082
    [2018-07-18:00:56:34:INFO:sockeye.utils:__exit__] Releasing GPU 1.
    [2018-07-18:00:56:34:INFO:sockeye.utils:__exit__] Releasing GPU 0.
    

    Evaluate translation:

    [INFO:__main__] bleu	(s_opt)	chrf	(s_opt)
    0.171	(-)	0.484	(-)
    

    Is this BLEU value correct? It is far away from the BLEU value (20+) reported in the paper.

    By the way, how can I use sockeye to build a GNMT model that matches TensorFlow's config?

    opened by xinyu-intel 26
  • Sampling chooses vocab index that does not exist with certain random seeds

    Sampling chooses vocab index that does not exist with certain random seeds

    Running into the following error while sampling with certain seeds:

    Traceback (most recent call last):
      File "/usr/lib/python3.5/runpy.py", line 184, in _run_module_as_main
        "__main__", mod_spec)
      File "/usr/lib/python3.5/runpy.py", line 85, in _run_code
        exec(code, run_globals)
      File "/net/cephfs/scratch/mathmu/map-volatility/venvs/sockeye3/lib/python3.5/site-packages/sockeye/translate.py", line 269, in <module>
        main()
      File "/net/cephfs/scratch/mathmu/map-volatility/venvs/sockeye3/lib/python3.5/site-packages/sockeye/translate.py", line 46, in main
        run_translate(args)
      File "/net/cephfs/scratch/mathmu/map-volatility/venvs/sockeye3/lib/python3.5/site-packages/sockeye/translate.py", line 155, in run_translate
        input_is_json=args.json_input)
      File "/net/cephfs/scratch/mathmu/map-volatility/venvs/sockeye3/lib/python3.5/site-packages/sockeye/translate.py", line 237, in read_and_translate
        chunk_time = translate(output_handler, chunk, translator)
      File "/net/cephfs/scratch/mathmu/map-volatility/venvs/sockeye3/lib/python3.5/site-packages/sockeye/translate.py", line 260, in translate
        trans_outputs = translator.translate(trans_inputs)
      File "/net/cephfs/scratch/mathmu/map-volatility/venvs/sockeye3/lib/python3.5/site-packages/sockeye/inference.py", line 861, in translate
        results.append(self._make_result(trans_input, translation))
      File "/net/cephfs/scratch/mathmu/map-volatility/venvs/sockeye3/lib/python3.5/site-packages/sockeye/inference.py", line 963, in _make_result
        target_tokens = [self.vocab_target_inv[target_id] for target_id in target_ids]
      File "/net/cephfs/scratch/mathmu/map-volatility/venvs/sockeye3/lib/python3.5/site-packages/sockeye/inference.py", line 963, in <listcomp>
        target_tokens = [self.vocab_target_inv[target_id] for target_id in target_ids]
    KeyError: 7525
    

    I am calling Sockeye with a script such as

    OMP_NUM_THREADS=1 python -m sockeye.translate \
                    -i $data_sub/$corpus.pieces.src \
                    -o $samples_sub_sub/$corpus.pieces.$seed.trg \
                    -m $model_path \
                    --sample \
                    --seed $seed \
                    --length-penalty-alpha 1.0 \
                    --device-ids 0 \
                    --batch-size 64 \
                    --disable-device-locking
    

    Sockeye and Mxnet versions:

    [2020-08-25:17:03:03:INFO:sockeye.utils:log_sockeye_version] Sockeye version 2.1.17, commit 92a020a25cbe75935c700ce2f29b286b31a87189, path /net/cephfs/scratch/mathmu/map-volatility/venvs/sockeye3/lib/python3.5/site-packages/sockeye/__init__.py
    [2020-08-25:17:03:03:INFO:sockeye.utils:log_mxnet_version] MXNet version 1.6.0, path /net/cephfs/scratch/mathmu/map-volatility/venvs/sockeye3/lib/python3.5/site-packages/mxnet/__init__.py
    

    Details that may be relevant:

    • This only happens for certain random --seeds
    • Running on a Tesla V100
    • OS: Ubuntu 16.04.6 LTS
    • the MXNet version in the CUDA 10.2 requirements file (https://github.com/awslabs/sockeye/blob/master/requirements/requirements.gpu-cu102.txt) is no longer available on PyPI. I had to install mxnet-cu102mkl==1.6.0.post0.

    The vocabulary does not have this index:

    
    [INFO:sockeye.vocab] Vocabulary (7525 words) loaded from "/net/cephfs/scratch/mathmu/map-volatility/models/bel-eng/baseline/vocab.src.0.json"
    [INFO:sockeye.vocab] Vocabulary (7525 words) loaded from "/net/cephfs/scratch/mathmu/map-volatility/models/bel-eng/baseline/vocab.trg.0.json"
    

    I suspect that the sampling procedure somehow assumes 1-based indexing, whereas the vocabulary is 0-indexed. This would mean that there is a small chance that max_vocab_id+1 is picked as the next token.

    Looking at the inference code, I am not sure yet why this happens.
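
    Below is a toy sketch (not Sockeye's actual sampling code) of the suspected off-by-one: if sampling effectively assumes 1-based ids while the vocabulary is 0-indexed, an id equal to the vocabulary size can occasionally be produced. All names and sizes here are made up for illustration.

    import numpy as np

    vocab_size = 7525                      # valid ids are 0 .. 7524
    rng = np.random.default_rng(seed=42)
    probs = rng.random(vocab_size)
    probs /= probs.sum()

    # Correct 0-indexed sampling: ids stay in [0, vocab_size - 1]
    ok_id = rng.choice(vocab_size, p=probs)

    # Hypothetical 1-based assumption: shifting by one can yield vocab_size itself,
    # which is not a key in the inverted vocabulary -> KeyError: 7525
    bad_id = rng.choice(vocab_size, p=probs) + 1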

    sockeye_2 
    opened by bricksdont 21
  • Sockeye transformer has different total number of trainable parameters from T2T Transformer

    Sockeye transformer has different total number of trainable parameters from T2T Transformer

    I read your arXiv paper, and I found that the total number of trainable parameters of the Sockeye transformer is 62,946,611 on the EN→DE task, while the number is 60,668,928 for the T2T transformer. I wonder what contributes to this difference?

    opened by szhengac 21
  • [WIP] Sockeye 2 Performance Optimizations

    [WIP] Sockeye 2 Performance Optimizations

    Made changes to Sockeye 2 to improve the performance of the Transformer model in machine translation. The current changes only apply to inference; training optimizations are planned for later, but will land before this pull request is completed.

    A list of changes detailed below:

    1. Replaced the batch_dot ops in multihead attention with ops that do not require folding the heads into the batch dimension; one caveat is that the batch and sequence_length dimensions are swapped, requiring some adjustments to other parts of the code to account for the change
    2. Removed the take ops that were applied to the encoder states, as they do not change on different beams; this effectively cuts compute time for these takes in half
    3. Gathered the input token ids into a numpy array on CPU before sending them all to the GPU at the beginning of beam search, rather than sending each batch element to the GPU one at a time
    4. Set the data type of arrays during beam search computation to match the model's data type, rather than explicitly setting it to fp32

    Pull Request Checklist

    • [ ] Changes are complete (if posting work-in-progress code, prefix your pull request title with '[WIP]' until you can check this box).
    • [ ] Unit tests pass (pytest)
    • [ ] Were system tests modified? If so did you run these at least 5 times to account for the variation across runs?
    • [ ] System tests pass (pytest test/system)
    • [ ] Passed code style checking (./style-check.sh)
    • [x] You have considered writing a test
    • [ ] Updated major/minor version in sockeye/__init__.py. Major version bump if this is a backwards incompatible change.
    • [ ] Updated CHANGELOG.md

    By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

    enhancement sockeye_2 
    opened by blchu 20
  • Provide multiple source vocabularies as argument

    Provide multiple source vocabularies as argument

    Following issue #527, --source-vocab can now take multiple files for additional source factor vocabularies.

    We may want to consider changing a few variable/parameter names. For instance in train.py, now that args.source_vocab is a list, we may rename it to args.source_vocabs (parameter --source-vocabs), but it would probably not go well with the variable source_vocabs (produced by create_data_iters_and_vocabs()). This would also lead to a backwards-incompatible change.

    Unit tests output 11 failures on the current master branch version. Since the content of this PR added no failures, I'm assuming the tests pass.

    Pull Request Checklist

    • [x] Changes are complete (if posting work-in-progress code, prefix your pull request title with '[WIP]' until you can check this box).
    • [x] Unit tests pass (pytest)
    • [ ] Were system tests modified? If so did you run these at least 5 times to account for the variation across runs?
    • [x] System tests pass (pytest test/system)
    • [x] Passed code style checking (./style-check.sh)
    • [ ] You have considered writing a test
    • [x] Updated major/minor version in sockeye/__init__.py. Major version bump if this is a backwards incompatible change.
    • [x] Updated CHANGELOG.md

    By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

    enhancement 
    opened by franckbrl 20
  • Sockeye 2 Interleaved Multi-head Attention Operators

    Sockeye 2 Interleaved Multi-head Attention Operators

    Replaced the batched dot product in multi-head attention with interleaved_matmul attention operators to improve performance. Also changed the batch-major data to time-major format while in the model to comply with the new operator requirements.

    Pull Request Checklist

    • [x] Changes are complete (if posting work-in-progress code, prefix your pull request title with '[WIP]' until you can check this box).
    • [ ] Unit tests pass (pytest)
    • [x] Were system tests modified? If so did you run these at least 5 times to account for the variation across runs?
    • [ ] System tests pass (pytest test/system)
    • [ ] Passed code style checking (./style-check.sh)
    • [x] You have considered writing a test
    • [ ] Updated major/minor version in sockeye/__init__.py. Major version bump if this is a backwards incompatible change.
    • [ ] Updated CHANGELOG.md

    By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

    enhancement sockeye_2 
    opened by blchu 19
  • serve sockeye using mxnet-model-server

    serve sockeye using mxnet-model-server

    Hello!

    mxnet-model-server seems to be a neat way to serve MXNet models. Does sockeye have a plan to add a serving function using mxnet-model-server?

    Besides the mxnet model parameter file and the symbol.json, mxnet-model-server requires a customized data preprocessing and postprocessing pipeline. If I want to write that code myself, is it feasible to do with sockeye? Would you have some suggestions for that?

    opened by boliangz 18
  • Bug with beam-size=1?

    Bug with beam-size=1?

    In trying to get tests passing with scoring (#538), I have turned up some weird behavior with scores output by Sockeye. Here are two commands using a transformer model built in the system tests. Notice:

    • The invocations differ only in the beam size (1 or 2)
    • --skip-topk is not enabled
    • With beam size of 1, the scores output should be impossible, since Sockeye outputs negative logprobs.

    Any ideas?

    CC: @bricksdont

    $ python3 -m sockeye.translate -i src --output-type translation_with_score --use-cpu -m model --beam-size 1 2> /dev/null | head
    -10.556	7 5 2 7 3 6 5 4 7 7
    -10.727	9 2 4 1 6 7 8 6 8
    -12.788	8 6 8 7
    -10.413	0 5 0 7 5 9 0 6 3 1
    -10.731	7 9 2 6 8 5 0 6 5
    -12.490	5 6 3 2
    -inf	
    -11.242	3 9 1 3 8 7
    -15.759	2 1
    -10.506	8 8 8 2 4 4 5 5 2 5
    $ python3 -m sockeye.translate -i src --output-type translation_with_score --use-cpu -m model --beam-size 2 2> /dev/null | head
    0.003	7 5 2 7 3 6 5 4 7 7
    0.001	9 2 4 1 6 7 8 6 8
    0.000	8 6 8 7
    0.002	0 5 0 7 5 9 0 6 3 1
    0.001	7 9 2 6 8 5 0 6 5
    0.001	5 6 3 2
    -inf	
    0.001	3 9 1 3 8 7
    0.001	2 1
    0.002	8 8 8 2 4 4 5 5 2 5
    
    opened by mjpost 18
  • Source factors

    Source factors

    Added source factors, as described in:

    Linguistic Input Features Improve Neural Machine Translation.
    Rico Sennrich & Barry Haddow
    In Proceedings of the First Conference on Machine Translation. Berlin, Germany, pp. 83-91.
    

    Source factors are enabled by passing --source-factors file1 [file2 ...] (-sf), where file1, etc. are token-parallel to the source (-s). This option can be passed either to sockeye.train or to the data preparation step, if data sharding is used. An analogous parameter, --validation-source-factors, is used to pass factors for validation data. The flag --source-factors-num-embed D1 [D2 ...] denotes the embedding dimensions. These are concatenated with the source word dimension (--num-embed), which can continue to be tied to the target (--weight-tying --weight-tying-type=src_trg).

    At test time, the input sentence and its factors can be passed by multiple parallel files (--input and --input-factors) or through stdin with token-level annotations, separated by |. Another way is to send a string-serialized JSON object to the CLI through stdin; it needs to have a top-level key 'text' and optionally a key 'factors' of type List[str], as in the sketch below.
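
    A minimal sketch of that JSON stdin format; the sentence and factor strings are made-up examples:

    import json

    # One input sentence with a single token-parallel source factor (hypothetical values)
    line = json.dumps({"text": "ein kleines Haus", "factors": ["DET ADJ NOUN"]})
    print(line)  # pipe lines like this into sockeye.translate together with --json-input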

    Pull Request Checklist

    • [x] Changes are complete (if posting work-in-progress code, prefix your pull request title with '[WIP]' until you can check this box).
    • [x] Unit tests pass (pytest)
    • [x] System tests pass (pytest test/system)
    • [x] Passed code style checking (./pre-commit.sh or manual run of pylint & mypy)
    • [x] You have considered writing a test
    • [x] Updated major/minor version in sockeye/__init__.py. Major version bump if this is a backwards incompatible change.
    • [x] Updated CHANGELOG.md

    By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

    backwards_incompatible 
    opened by mjpost 18
  • "Specify or omit --shared-vocab consistently when training and preparing the data" error

    "Specify or omit --shared-vocab consistently when training and preparing the data" error

    Recently when trying to use sockeye 3, it always returns the error "Specify or omit --shared-vocab consistently when training and preparing the data": I use sockeye-prepare-data to prepare the data without specifying --shared-vocab, and the error then shows up when running sockeye-train. I've confirmed that neither the preparation nor the training procedure specified --shared-vocab. Is there anything I can do to fix this problem, or does sockeye 3 only support a shared vocab?

    Best regards Peter

    opened by NLP-Peter 1
Releases(3.1.29)
  • 3.1.29(Dec 12, 2022)

    [3.1.29]

    Changed

    • Running sockeye-evaluate no longer applies text tokenization for TER (same behavior as other metrics).
    • Turned on type checking for all sockeye modules except test_utils and addressed resulting type issues.
    • Refactored code in various modules without changing user-level behavior.

    [3.1.28]

    Added

    • Added kNN-MT model from Khandelwal et al., 2021.
      • Installation: see the faiss documentation -- installation via conda is recommended.
      • Building a faiss index from a sockeye model takes two steps:
        • Generate decoder states: sockeye-generate-decoder-states -m [model] --source [src] --target [tgt] --output-dir [output dir]
        • Build index: sockeye-knn -i [input_dir] -o [output_dir] -t [faiss_index_signature] where input_dir is the same as output_dir from the sockeye-generate-decoder-states command.
        • Faiss index signature reference: see here
      • Running inference using the built index: sockeye-translate ... --knn-index [index_dir] --knn-lambda [interpolation_weight] where index_dir is the same as output_dir from the sockeye-knn command.
  • 3.1.27(Nov 6, 2022)

    [3.1.27]

    Changed

    • allow torch 1.13 in requirements.txt
    • Replaced deprecated torch.testing.assert_allclose with torch.testing.assert_close for PyTorch 1.14 compatibility.

    [3.1.26]

    Added

    • Added boolean argument --tf32 0|1 controlling torch.backends.cuda.matmul.allow_tf32, which enables transparent TensorFloat-32 acceleration of float32 matrix multiplications (10-bit mantissa precision, 19 bits total). Defaults to true for backward compatibility with torch < 1.12. Training can be continued with a different --tf32 setting.
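
    For reference, a minimal sketch of the plain PyTorch switch this flag wraps (this is generic PyTorch, not Sockeye code):

      import torch

      # Equivalent of --tf32 1: allow TensorFloat-32 matmuls on Ampere and newer GPUs
      torch.backends.cuda.matmul.allow_tf32 = True
      # Equivalent of --tf32 0: keep full float32 matmul precision
      # torch.backends.cuda.matmul.allow_tf32 = False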

    Changed

    • device.init_device() called by train, translate, and score
    • allow torch 1.12 in requirements.txt

    [3.1.25]

    Changed

    • Updated to sacrebleu==2.3.1. Changed default BLEU floor smoothing offset from 0.01 to 0.1.

    [3.1.24]

    Fixed

    • Updated DeepSpeed checkpoint conversion to support newer versions of DeepSpeed.

    [3.1.23]

    Changed

    • Change decoder softmax size logging level from info to debug.

    [3.1.22]

    Added

    • log beam search avg output vocab size

    Changed

    • common base Search for GreedySearch and BeamSearch
    • .pylintrc: suppress warnings about deprecated pylint warning suppressions

    [3.1.21]

    Fixed

    • Send skip_nvs and nvs_thresh args now to Translator constructor in sockeye-translate instead of ignoring them.

    [3.1.20]

    Added

    • Added training support for DeepSpeed.
      • Installation: pip install deepspeed
      • Usage: deepspeed --no_python ... sockeye-train ...
      • DeepSpeed mode uses Zero Redundancy Optimizer (ZeRO) stage 1 (Rajbhandari et al., 2019).
      • Run in FP16 mode with --deepspeed-fp16 or BF16 mode with --deepspeed-bf16.

    [3.1.19]

    Added

    • Clean up GPU and CPU memory used during training initialization before starting the main training loop.

    Changed

    • Refactored training code in advance of adding DeepSpeed support:
      • Moved logic for flagging interleaved key-value parameters from layers.py to model.py.
      • Refactored LearningRateScheduler API to be compatible with PyTorch/DeepSpeed.
      • Refactored optimizer and learning rate scheduler creation to be modular.
      • Migrated to ModelWithLoss API, which wraps a Sockeye model and its losses in a single module.
      • Refactored primary and secondary worker logic to reduce redundant calculations.
      • Refactored code for saving/loading training states.
      • Added utility code for managing model/training configurations.

    Removed

    • Removed unused training option --learning-rate-t-scale.

    [3.1.18]

    Added

    • Added sockeye-train and sockeye-translate option --clamp-to-dtype that clamps outputs of transformer attention, feed-forward networks, and process blocks to the min/max finite values for the current dtype. This can prevent inf/nan values from overflow when running large models in float16 mode. See: https://discuss.huggingface.co/t/t5-fp16-issue-is-fixed/3139
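
    A minimal sketch of the clamping idea in generic PyTorch (not Sockeye's internal code):

      import torch

      def clamp_to_dtype(x: torch.Tensor) -> torch.Tensor:
          """Clamp values to the finite range of x's dtype, e.g. to avoid inf from float16 overflow."""
          finfo = torch.finfo(x.dtype)
          return x.clamp(min=finfo.min, max=finfo.max)

      y = clamp_to_dtype(torch.tensor([65519.0, 1e9, -1e9], dtype=torch.float16))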

    [3.1.17]

    Added

    • Added support for offline model quantization with sockeye-quantize.
      • Pre-quantizing a model avoids the load-time memory spike of runtime quantization. For example, a float16 model loads directly as float16 instead of loading as float32 then casting to float16.

    [3.1.16]

    Added

    • Added nbest list reranking options using isometric translation criteria as proposed in an ICASSP 2021 paper https://arxiv.org/abs/2110.03847. To use this feature pass a criterion (isometric-ratio, isometric-diff, isometric-lc) when specifying --metric.
    • Added --output-best-non-blank to output non-blank best hypothesis from the nbest list.

    [3.1.15]

    Fixed

    • Fix type of valid_length to be pt.Tensor instead of Optional[pt.Tensor] = None for jit tracing
  • 3.1.14(May 5, 2022)

    [3.1.14]

    Added

    • Added the implementation of Neural vocabulary selection to Sockeye as presented in our NAACL 2022 paper "The Devil is in the Details: On the Pitfalls of Vocabulary Selection in Neural Machine Translation" (Tobias Domhan, Eva Hasler, Ke Tran, Sony Trenous, Bill Byrne and Felix Hieber).
      • To use NVS simply specify --neural-vocab-selection to sockeye-train. This will train a model with Neural Vocabulary Selection that is automatically used by sockeye-translate. If you want to look at translations without vocabulary selection, specify --skip-nvs as an argument to sockeye-translate.

    [3.1.13]

    Added

    • Added sockeye-train argument --no-reload-on-learning-rate-reduce that disables reloading the best training checkpoint when reducing the learning rate. This currently only applies to the plateau-reduce learning rate scheduler since other schedulers do not reload checkpoints.
  • 3.1.12(Apr 26, 2022)

    [3.1.12]

    Fixed

    • Fix scoring with batches of size 1 (which may occur when |data| % batch_size == 1).

    [3.1.11]

    Fixed

    • When resuming training with a fully trained model, sockeye-train will correctly exit without creating a duplicate (but separately numbered) checkpoint.
  • 3.1.10(Apr 12, 2022)

    [3.1.10]

    Fixed

    • When loading parameters, SockeyeModel now ignores false positive missing parameters for traced modules. These modules use the same parameters as their original non-traced versions.
  • 3.1.9(Apr 11, 2022)

    [3.1.9]

    Changed

    • Clarified usage of batch_size in Translator code.

    [3.1.8]

    Fixed

    • When saving parameters, SockeyeModel now skips parameters for traced modules because these modules are created at runtime and use the same parameters as non-traced versions. When loading parameters, SockeyeModel ignores parameters for traced modules that may have been saved by earlier versions.
  • 3.1.7(Mar 23, 2022)

    [3.1.7]

    Changed

    • SockeyeModel components are now traced regardless of whether inference_only is set, including for the CheckpointDecoder during training.

    [3.1.6]

    Changed

    • Moved offsetting of topk scores out of the (traced) TopK module. This allows sending requests of variable batch size to the same Translator/Model/BeamSearch instance.

    [3.1.5]

    Changed

    • Allow PyTorch 1.11 in requirements
  • 3.1.4(Mar 10, 2022)

  • 3.1.3(Feb 28, 2022)

    [3.1.3]

    Added

    • Added support for adding source prefixes to the input in JSON format during inference.

    [3.1.2]

    Changed

    • Optimized creation of source length mask by using expand instead of repeat_interleave.

    [3.1.1]

    Changed

    • Updated torch dependency to 1.10.x (torch>=1.10.0,<1.11.0)
  • 3.1.0(Feb 11, 2022)

    [3.1.0]

    Sockeye is now exclusively based on PyTorch.

    Changed

    • Renamed x_pt modules to x. Updated entry points in setup.py.

    Removed

    • Removed MXNet from the codebase
    • Removed device locking / GPU acquisition logic. Removed dependency on portalocker.
    • Removed arguments --softmax-temperature, --weight-init-*, --mc-dropout, --horovod, --device-ids
    • Removed all MXNet-related tests
  • 3.0.15(Feb 9, 2022)

    [3.0.15]

    Fixed

    • Fixed GPU-based scoring by copying tensors to the CPU before converting them to numpy.

    [3.0.14]

    Added

    • Added support for Translation Error Rate (TER) metric as implemented in sacrebleu==1.4.14. Checkpoint decoder metrics will now include TER scores and early stopping can be determined via TER improvements (--optimized-metric ter)
  • 3.0.13(Feb 3, 2022)

    [3.0.13]

    Changed

    • use expand instead of repeat for attention masks to not allocate additional memory
    • avoid repeated transpose for initializing cached encoder-attention states in the decoder.

    [3.0.12]

    Removed

    • Removed unused code for Weight Normalization. Minor code cleanups.

    [3.0.11]

    Fixed

    • Fixed training with a single, fixed learning rate instead of a rate scheduler (--learning-rate-scheduler none --initial-learning-rate ...).
  • 3.0.10(Jan 19, 2022)

    [3.0.10]

    Changed

    • End-to-end trace decode_step of the Sockeye model. Creates less overhead during decoding and a small speedup.

    [3.0.9]

    Fixed

    • Fixed not calling the traced target embedding module during inference.

    [3.0.8]

    Changed

    • Add support for JIT tracing source/target embeddings and JIT scripting the output layer during inference.
  • 3.0.7(Dec 20, 2021)

    [3.0.7]

    Changed

    • Improve training speed by using torch.nn.functional.multi_head_attention_forward for self- and encoder-attention during training. Requires reorganization of the parameter layout of the key-value input projections, as the current Sockeye attention interleaves for faster inference. Attention masks (both source masks and autoregressive masks) need some shape adjustments, as requirements for the fused MHA op differ slightly. A layout sketch follows the list below.
      • Non-interleaved format for joint key-value input projection parameters: in_features=hidden, out_features=2*hidden -> Shape: (2*hidden, hidden)
      • Interleaved format for joint-key-value input projection stores key and value parameters, grouped by heads: Shape: ((num_heads * 2 * hidden_per_head), hidden)
      • Models save and load key-value projection parameters in interleaved format.
      • When model.training == True key-value projection parameters are put into non-interleaved format for torch.nn.functional.multi_head_attention_forward
      • When model.training == False, i.e. model.eval() is called, key-value projection parameters are again converted into interleaved format in place.
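
    A toy sketch of the two parameter layouts described above (shapes follow the list; the conversion itself is only illustrative, not Sockeye's actual code):

      import torch

      hidden, num_heads = 512, 8
      head_dim = hidden // num_heads

      # Non-interleaved joint key-value projection: [W_k; W_v] stacked -> shape (2*hidden, hidden)
      w_kv = torch.randn(2 * hidden, hidden)
      w_k, w_v = w_kv[:hidden], w_kv[hidden:]

      # Interleaved layout groups each head's key rows with its value rows:
      # (head0_k, head0_v, head1_k, head1_v, ...) -> shape (num_heads * 2 * head_dim, hidden)
      w_k_heads = w_k.view(num_heads, head_dim, hidden)
      w_v_heads = w_v.view(num_heads, head_dim, hidden)
      w_kv_interleaved = torch.cat([w_k_heads, w_v_heads], dim=1).reshape(num_heads * 2 * head_dim, hidden)
      assert w_kv_interleaved.shape == (num_heads * 2 * head_dim, hidden)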

    [3.0.6]

    Fixed

    • Fixed checkpoint decoder issue that prevented using bleu as --optimized-metric for distributed training (#995).

    [3.0.5]

    Fixed

    • Fixed data download in multilingual tutorial.
  • 3.0.4(Dec 13, 2021)

    [3.0.4]

    • Make sure data permutation indices are in int64 format (doesn't seem to be the case by default on all platforms).

    [3.0.3]

    Fixed

    • Fixed ensemble decoding for models without target factors.

    [3.0.2]

    Changed

    • sockeye-translate: Beam search now computes and returns secondary target factor scores. Secondary target factors do not participate in beam search, but are greedily chosen at every time step. Accumulated scores for secondary factors are not normalized by length. Factor scores are included in JSON output (--output-type json).
    • sockeye-score now returns tab-separated scores for each target factor. Users can decide how to combine factor scores depending on the downstream application. Score for the first, primary factor (i.e. output words) are normalized, other factors are not.

    [3.0.1]

    Fixed

    • Parameter averaging (sockeye-average) now always uses the CPU, which enables averaging parameters from GPU-trained models on CPU-only hosts.
  • 3.0.0(Nov 30, 2021)

    [3.0.0] Sockeye 3: Fast Neural Machine Translation with PyTorch

    Sockeye is now based on PyTorch. We maintain backwards compatibility with MXNet models in version 2.3.x until 3.1.0. If MXNet 2.x is installed, Sockeye can run with either PyTorch or MXNet, but MXNet is no longer strictly required.

    Added

    • Added model converter CLI sockeye.mx_to_pt that converts MXNet models to PyTorch models.
    • Added --apex-amp training argument that runs entire model in FP16 mode, replaces --dtype float16 (requires Apex).
    • Training automatically uses Apex fused optimizers if available (requires Apex).
    • Added training argument --label-smoothing-impl to choose label smoothing implementation (default of mxnet uses the same logic as MXNet Sockeye 2).

    Changed

    • CLI names point to the PyTorch code base (e.g. sockeye-train etc.).
    • MXNet-based CLIs are now accessible via sockeye-<name>-mx.
    • MXNet code requires MXNet >= 2.0 since we adopted the new numpy interface.
    • sockeye-train now uses PyTorch's distributed data-parallel mode for multi-process (multi-GPU) training. Launch with: torchrun --no_python --nproc_per_node N sockeye-train --dist ...
    • Updated the quickstart tutorial to cover multi-device training with PyTorch Sockeye.
    • Changed --device-ids argument (plural) to --device-id (singular). For multi-GPU training, see distributed mode noted above.
    • Updated default value: --pad-vocab-to-multiple-of 8
    • Removed --horovod argument used with horovodrun (use --dist with torchrun).
    • Removed --optimizer-params argument (use --optimizer-betas, --optimizer-eps).
    • Removed --no-hybridization argument (use PYTORCH_JIT=0, see Disable JIT for Debugging).
    • Removed --omp-num-threads argument (use --env=OMP_NUM_THREADS=N).

    Removed

    • Removed support for constrained decoding (both positive and negative lexical constraints)
    • Removed support for beam histories
    • Removed --amp-scale-interval argument.
    • Removed --kvstore argument.
    • Removed arguments: --weight-init, --weight-init-scale, --weight-init-xavier-factor-type, --weight-init-xavier-rand-type
    • Removed --decode-and-evaluate-device-id argument.
    • Removed arguments: --monitor-pattern, --monitor-stat-func
    • Removed CUDA-specific requirements files in requirements/
  • 2.3.24(Nov 5, 2021)

    [2.3.24]

    Added

    • Use of the safe yaml loader for the model configuration files.

    [2.3.23]

    Changed

    • Do not sort BIAS_STATE in beam search. It is constant across decoder steps.
  • 2.3.22(Sep 30, 2021)

    [2.3.22]

    Fixed

    • The previous commit introduced a regression for vocab creation. The result was that the vocabulary was created on the input characters rather than on tokens.

    [2.3.21]

    Added

    • Extended parallelization of data preparation to vocabulary and statistics creation while minimizing the overhead of sharding.

    [2.3.20]

    Added

    • Added debug logging for restrict_lexicon lookups

    [2.3.19]

    Changed

    • When training only the decoder (--fixed-param-strategy all_except_decoder), disable autograd for the encoder and embeddings to save memory.

    [2.3.18]

    Changed

  • 2.3.17(Jun 17, 2021)

    [2.3.17]

    Added

    • Added an alternative, faster implementation of greedy search. The '--greedy' flag to sockeye.translate will enable it. This implementation does not support hypothesis scores, batch decoding, or lexical constraints.

    [2.3.16]

    Added

    [2.3.15]

    Changed

    • Optimization: Decoder class is now a complete HybridBlock (no forward method).
  • 2.3.14(Apr 7, 2021)

    [2.3.14]

    Changed

    • Updated to MXNet 1.8.0
    • Removed dependency support for Cuda 9.2 (no longer supported by MXNet 1.8).
    • Added dependency support for Cuda 11.0 and 11.2.
    • Updated Python requirement to 3.7 and later. (Removed backporting dataclasses requirement)

    [2.3.13]

    Added

    • Target factors are now also collected for nbest translations (and stored in the JSON output handler).

    [2.3.12]

    Added

    • Added --config option to prepare_data CLI to allow setting commandline flags via a yaml config.
    • Flags for the prepare_data CLI are now stored in the output folder under args.yaml (equivalent to the behavior of sockeye_train)

    [2.3.11]

    Added

    • Added option prevent_unk to avoid generating <unk> token in beam search.
  • 2.3.10(Feb 8, 2021)

    [2.3.10]

    Changed

    • Make sure that the top N best params files are retained, even if N > --keep-last-params. This ensures that model averaging will not be crippled when keeping only a few params files during training. Keeping only a few params files can result in significant savings of disk space during training.

    [2.3.9]

    Added

  • 2.3.8(Jan 8, 2021)

    [2.3.8]

    Fixed

    • Fix problem identified in issue #925 that caused learning rate warmup to fail in some instances when doing continued training

    [2.3.7]

    Changed

    • Use dataclass module to simplify Config classes. No functional change.

    [2.3.6]

    Fixed

    • Fixes the problem identified in issue #890, where the lr_scheduler does not behave as expected when continuing training. The problem is that the lr_scheduler is kept as part of the optimizer, but the optimizer is not saved when saving state. Therefore, every time training is restarted, a new lr_scheduler is created with initial parameter settings. Fixed by saving and restoring the lr_scheduler separately.

    [2.3.5]

    Fixed

    • Fixed issue with LearningRateSchedulerPlateauReduce.repr printing out num_not_improved instead of reduce_num_not_improved.

    [2.3.4]

    Fixed

    • Fixed issue with dtype mismatch in beam search when translating with --dtype float16.

    [2.3.3]

    Changed

    • Upgraded SacreBLEU dependency of Sockeye to a newer version (1.4.14).
  • 2.3.2(Nov 18, 2020)

    [2.3.2]

    Fixed

    • Fixed edge case that unintentionally skips softmax for sampling if beam size is 1.

    [2.3.1]

    Fixed

    • Optimizing for BLEU/CHRF with horovod required the secondary workers to also create checkpoint decoders.

    [2.3.0]

    Added

    • Added support for target factors. If provided with additional target-side tokens/features (token-parallel to the regular target side) at training time, the model can now learn to predict these in a multi-task setting. See the toy illustration below.
      • You can provide target factor data similar to source factors: --target-factors <factor_file1> [<factor_fileN>].
      • During training, Sockeye optimizes one loss per factor in a multi-task setting. The weight of the losses can be controlled by --target-factors-weight.
      • At inference, target factors are decoded greedily; they do not participate in beam search. The predicted factor at each time step is the argmax over its separate output layer distribution.
      • To receive the target factor predictions at inference time, use --output-type translation_with_factors.
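
    A toy illustration of what "token-parallel" target factor data means; the file contents here are made up:

      # Hypothetical contents of one target line and its factor line (token-parallel):
      target_line = "the house is big"
      factor_line = "DET NOUN VERB ADJ"   # exactly one factor token per target token
      assert len(target_line.split()) == len(factor_line.split())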

    Changed

    • load_model(s) now returns a list of target vocabs.
    • Default source factor combination changed to sum (was concat before).
    • SockeyeModel class has three new properties: num_target_factors, target_factor_configs, and factor_output_layers.
  • 2.2.8(Nov 5, 2020)

    [2.2.8]

    Changed

    • Make source/target data parameters required for the scoring CLI to avoid cryptic error messages.

    [2.2.7]

    Added

    • Added an argument to specify the log level of secondary workers. Defaults to ERROR to hide any logs except for exceptions.

    [2.2.6]

    Fixed

    • Avoid a crash due to an edge case when no model improvement has been observed by the time the learning rate gets reduced for the first time.

    [2.2.5]

    Fixed

    • Enforce sentence batching for sockeye score tool, set default batch size to 56

    [2.2.4]

    Changed

    • Use softmax with length in DotAttentionCell.
    • Use contrib.arange_like in AutoRegressiveBias block to reduce number of ops.

    [2.2.3]

    Added

    • Log the absolute number of <unk> tokens in source and target data

    [2.2.2]

    Fixed

    • Fix: Guard against null division for small batch sizes.

    [2.2.1]

    Fixed

    • Fixes a corner case bug by which the beam decoder can wrongly return a best hypothesis with -infinite score.
  • 2.2.0(Oct 4, 2020)

    [2.2.0]

    Changed

    • Replaced multi-head attention with interleaved_matmul_encdec operators, which removes previously needed transposes and improves performance.

    • Beam search states and model layers now assume time-major format.

    [2.1.26]

    Fixed

    • Fixes a backwards incompatibility introduced in 2.1.17, which would prevent models trained with prior versions from being used for inference.

    [2.1.25]

    Changed

    • Reverting PR #772 as it causes issues with amp.

    [2.1.24]

    Changed

    • Make sure to write a final checkpoint when stopping with --max-updates, --max-samples or --max-num-epochs.

    [2.1.23]

    Changed

    • Updated to MXNet 1.7.0.
    • Re-introduced use of softmax with length parameter in DotAttentionCell (see PR #772).

    [2.1.22]

    Added

    • Re-introduced --softmax-temperature flag for sockeye.score and sockeye.translate.
  • 2.1.21(Aug 27, 2020)

    [2.1.21]

    Added

    • Added an optional ability to cache encoder outputs of the model.

    [2.1.20]

    Fixed

    • Fixed a bug where the training state object was saved to disk before training metrics were added to it, leading to an inconsistency between the training state object and the metrics file (see #859).

    [2.1.19]

    Fixed

    • When loading a shard in Horovod mode, there is now a check that each non-empty bucket contains enough sentences to cover each worker's slice. If not, the bucket's sentences are replicated to guarantee coverage.

    [2.1.18]

    Fixed

    • Fixed a bug where sampling translation fails because an array is created in the wrong context.
  • 2.1.17(Aug 20, 2020)

    [2.1.17]

    Added

    • Added layers.SSRU, which implements a Simpler Simple Recurrent Unit as described in Kim et al, "From Research to Production and Back: Ludicrously Fast Neural Machine Translation" WNGT 2019.

    • Added ssru_transformer option to --decoder, which enables the usage of SSRUs as a replacement for the decoder-side self-attention layers.

    Changed

    • Reduced the number of arguments for MultiHeadSelfAttention.hybrid_forward(). previous_keys and previous_values should now be input together as previous_states, a list containing two symbols.
  • 2.1.16(Jul 31, 2020)

    [2.1.16]

    Fixed

    • Fixed batch sizing error introduced in version 2.1.12 (c00da52) that caused batch sizes to be multiplied by the number of devices. Batch sizing now works as documented (same as pre-2.1.12 versions).
    • Fixed max-word batching to properly size batches to a multiple of both --batch-sentences-multiple-of and the number of devices.

    [2.1.15]

    Added

    • Inference option --mc-dropout to use dropout during inference, leading to non-deterministic output. This option uses the same dropout parameters present in the model config file.

    [2.1.14]

    Added

    • Added sockeye.rerank option --output to specify output file.
    • Added sockeye.rerank option --output-reference-instead-of-blank to output reference line instead of best hypothesis when best hypothesis is blank.
  • 2.1.13(Jul 7, 2020)

    [2.1.13]

    Added

    • Training option --quiet-secondary-workers that suppresses console output for secondary workers when training with Horovod/MPI.
    • Set version of isort to <5.0.0 in requirements.dev.txt to avoid incompatibility between newer versions of isort and pylint.

    [2.1.12]

    Added

    • Batch type option max-word for max number of words including padding tokens (more predictable memory usage than word).
    • Batching option --batch-sentences-multiple-of that is similar to --round-batch-sizes-to-multiple-of but always rounds down (more predictable memory usage).

    Changed

    • Default bucketing settings changed to width 8, max sequence length 95 (96 including BOS/EOS tokens), and no bucket scaling.
    • Argument --no-bucket-scaling replaced with --bucket-scaling which is False by default.

    [2.1.11]

    Changed

    • Updated sockeye.rerank module to use "add-k" smoothing for sentence-level BLEU.
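
    A generic sketch of "add-k" smoothed n-gram precision, the idea behind the smoothing named above (an illustration, not the sockeye.rerank implementation):

      def smoothed_precision(matches: int, total: int, k: float = 1.0) -> float:
          """Add-k smoothed n-gram precision, avoiding zero precisions in sentence-level BLEU."""
          return (matches + k) / (total + k)

      # e.g. zero 4-gram matches no longer zero out the whole sentence-level BLEU score
      print(smoothed_precision(matches=0, total=12))  # ~0.077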

    Fixed

    • Updated sockeye.rerank module to use current N-best format.
  • 2.1.10(Jun 23, 2020)

    [2.1.10]

    Changed

    • Changed to a cross-entropy loss implementation that avoids the use of SoftmaxOutput.

    [2.1.9]

    Added

    • Added training argument --ignore-extra-params to ignore extra parameters when loading models. The primary use case is continuing training with a model that has already been annotated with scaling factors (sockeye.quantize).

    Fixed

    • Properly pass allow_missing flag to model.load_parameters()

    [2.1.8]

    Changed

    • Update to sacrebleu=1.4.10
Owner
Amazon Web Services - Labs (AWS Labs)