Limit the use of end-to-end data for Speech Translation (by leveraging Automatic Speech Recognition and Machine Translation data instead) using zero-shot multilingual text translation techniques.

TuAnh23/MultiModalST

Tackling data scarcity in Speech Translation using zero-shot multilingual Machine Translation techniques

This repository is derived from the NMTGMinor project (https://github.com/quanpn90/NMTGMinor).
The SVCCA calculation is derived from https://github.com/nlp-dke/svcca.

Powered by Mediaan.com


Speech Translation (ST) is the task of translating speech audio in a source language into text in a target language. This repository implements and experiments with different approaches to ST:

  • Cascaded ST, comprising two steps: Automatic Speech Recognition (ASR) followed by Machine Translation (MT)
  • Direct ST: end-to-end models trained only on ST data
  • (Main contribution) End-to-end ST that limits the use of ST data: multi-modal models leveraging ASR and MT training data for the ST task

The Transformer architecture is used as the baseline for the implementation.
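To make the distinction between the approaches concrete, here is a minimal sketch (not code from this repository) of how cascaded ST composes two separately trained models. The `asr` and `mt` functions below are hypothetical stand-ins; the actual models in this repo are Transformers.

```python
# Illustrative sketch only: `asr` and `mt` are hypothetical stand-ins
# for trained ASR and MT models.

def asr(audio_path: str) -> str:
    """Stand-in ASR model: maps audio to a source-language transcript."""
    transcripts = {"hallo_welt.wav": "hallo welt"}
    return transcripts[audio_path]

def mt(source_text: str) -> str:
    """Stand-in MT model: maps source-language text to target-language text."""
    translations = {"hallo welt": "hello world"}
    return translations[source_text]

def cascaded_st(audio_path: str) -> str:
    # Cascaded ST pipes the ASR transcript into MT; any ASR error
    # propagates into the translation, which motivates end-to-end ST.
    return mt(asr(audio_path))

print(cascaded_st("hallo_welt.wav"))  # -> hello world
```

A direct or multi-modal end-to-end model instead maps audio to target-language text in a single step, avoiding the error propagation of the cascade.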

High-level instructions for using the repo:

  • Run covost_data_preparation.py to download and preprocess the data.
  • Run the shell script of interest, changing the variables in the script if needed.
    • run_translation_pipeline.sh for single-task models (ASR, MT, ST)
    • cascaded_ST_evaluation.sh evaluates cascaded ST using pretrained ASR and MT models
    • run_translation_multi_modalities_pipeline.sh for multi-task, multi-modality models (including zero-shot)
    • run_zeroshot_with_artificial_data.sh for zero-shot models using data augmentation
    • run_bidirectional_zeroshot.sh for zero-shot models using additional opposite training data
    • run_fine_tunning.sh, run_fine_tunning_fromASR.sh for fine-tuning models on ST data, resulting in few-shot models
    • modality_similarity_svcca.sh, modality_similarity_classifier.sh measure the similarity between text and audio representations

See notebooks/Repo_Instruction.ipynb for more details.
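As one possible workflow, the steps above could be chained as follows. This is a sketch under the assumption that the scripts are run from the repository root with their default variables; adjust the variables inside each script as noted above.

```shell
# Hedged workflow sketch; run from the repository root.
set -e

# 1. Download and preprocess the CoVoST data.
python covost_data_preparation.py

# 2. Train and evaluate a single-task model (ASR, MT, or ST);
#    edit the variables at the top of the script first if needed.
bash run_translation_pipeline.sh

# 3. Evaluate cascaded ST using the pretrained ASR and MT models.
bash cascaded_ST_evaluation.sh
```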

Citation

@INPROCEEDINGS{9746815,
  author={Dinh, Tu Anh and Liu, Danni and Niehues, Jan},
  booktitle={ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)}, 
  title={Tackling Data Scarcity in Speech Translation Using Zero-Shot Multilingual Machine Translation Techniques}, 
  year={2022},
  volume={},
  number={},
  pages={6222-6226},
  doi={10.1109/ICASSP43922.2022.9746815}}
