Tackling data scarcity in Speech Translation using zero-shot multilingual Machine Translation techniques

Last update: Sep 07, 2022

Overview

Tackling data scarcity in Speech Translation using zero-shot multilingual Machine Translation techniques

This repository is derived from the NMTGMinor project at https://github.com/quanpn90/NMTGMinor
The SVCCA calculation is derived from https://github.com/nlp-dke/svcca

Powered by Mediaan.com

Speech Translation (ST) is the task of translating speech audio in a source language into text in a target language. This repository implements and experiments on different approaches for ST:

Cascaded ST, including 2 steps: Automatic Speech Recognition (ASR) and Machine Translation (MT)
Direct ST: models trained only on ST data
(Main contribution) End-to-end ST limiting the use of ST data: multi-modal models leveraging ASR and MT training data for ST task

The Transformer architecture is used as the baseline for the implementation.

High-level instruction to use the repo:

Run covost_data_preparation.py to download and preprocess the data.
Run the shell script of interst, change the variables in the script if needed.
- run_translation_pipeline.sh for single-task models (ASR, MT, ST)
- cascaded_ST_evaluation.sh evaluates cascaded ST using pretrained ASR and MT models
- run_translation_multi_modalities_pipeline.sh for multi-task, multi-modality models (including zero-shot)
- run_zeroshot_with_artificial_data.sh for zero-shot models using data augmentation
- run_bidirectional_zeroshot.sh for zero-shot models using additional opposite training data
- run_fine_tunning.sh, run_fine_tunning_fromASR.sh for fine-tuning models with ST data, resulting in few-shot models
- modality_similarity_svcca.sh, modality_similarity_classifier.sh measure text-audio similarity in representation

See notebooks/Repo_Instruction.ipynb for more details.

Tackling data scarcity in Speech Translation using zero-shot multilingual Machine Translation techniques

Related tags

Overview

Tackling data scarcity in Speech Translation using zero-shot multilingual Machine Translation techniques

Owner

Tu Anh Dinh

GNEE - GAT Neural Event Embeddings

My 1st place solution at Kaggle Hotel-ID 2021

a pytorch implementation of auto-punctuation learned character by character

Practical Blind Denoising via Swin-Conv-UNet and Data Synthesis

PyTorch-LIT is the Lite Inference Toolkit (LIT) for PyTorch which focuses on easy and fast inference of large models on end-devices.

CVPR 2021 - Official code repository for the paper: On Self-Contact and Human Pose.

The code is for the paper "A Self-Distillation Embedded Supervised Affinity Attention Model for Few-Shot Segmentation"

TalkingHead-1KH is a talking-head dataset consisting of YouTube videos

Code for the SIGGRAPH 2022 paper "DeltaConv: Anisotropic Operators for Geometric Deep Learning on Point Clouds."

Deep GPs built on top of TensorFlow/Keras and GPflow

Repo for CReST: A Class-Rebalancing Self-Training Framework for Imbalanced Semi-Supervised Learning

Implementation of SwinTransformerV2 in TensorFlow.

Multi-Objective Reinforced Active Learning

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.

This is an official implementation of the High-Resolution Transformer for Dense Prediction.

GAN-based 3D human pose estimation model for 3DV'17 paper

Cross-modal Deep Face Normals with Deactivable Skip Connections

Repository for MuSiQue: Multi-hop Questions via Single-hop Question Composition

Implementation of "Large Steps in Inverse Rendering of Geometry"

The source code of the ICCV2021 paper "PIRenderer: Controllable Portrait Image Generation via Semantic Neural Rendering"