Tackling data scarcity in Speech Translation using zero-shot multilingual Machine Translation techniques

Last update: Sep 07, 2022

Overview

This repository is derived from the NMTGMinor project at https://github.com/quanpn90/NMTGMinor
The SVCCA calculation is derived from https://github.com/nlp-dke/svcca

Powered by Mediaan.com

Speech Translation (ST) is the task of translating speech audio in a source language into text in a target language. This repository implements and experiments with different approaches to ST:

Cascaded ST, which consists of two steps: Automatic Speech Recognition (ASR) followed by Machine Translation (MT)
Direct ST: models trained only on ST data
(Main contribution) End-to-end ST with limited use of ST data: multi-modal models that leverage ASR and MT training data for the ST task

The Transformer architecture is used as the baseline for the implementation.
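
To make the cascaded approach concrete, here is a minimal sketch of how an ASR step and an MT step compose into one ST system. It is an illustration only: the type aliases, `cascaded_st`, and the toy models are hypothetical names that do not come from this repository, where the real models are Transformer networks.

```python
from typing import Callable, Sequence

# Hypothetical type aliases for illustration; the repository's real models
# are trained Transformer networks, not plain Python callables.
Audio = Sequence[float]
ASRModel = Callable[[Audio], str]  # audio -> source-language transcript
MTModel = Callable[[str], str]     # source-language text -> target-language text
STModel = Callable[[Audio], str]   # audio -> target-language text


def cascaded_st(asr: ASRModel, mt: MTModel) -> STModel:
    """Compose an ASR model and an MT model into a cascaded ST system."""
    def translate(audio: Audio) -> str:
        transcript = asr(audio)  # step 1: recognise the source-language text
        return mt(transcript)    # step 2: translate the transcript
    return translate


# Toy stand-ins so the sketch runs without any trained model.
def toy_asr(audio: Audio) -> str:
    return "hello world" if len(audio) > 0 else ""


def toy_mt(text: str) -> str:
    lexicon = {"hello": "hallo", "world": "welt"}
    return " ".join(lexicon.get(word, word) for word in text.split())


st = cascaded_st(toy_asr, toy_mt)
print(st([0.1, -0.2, 0.05]))
```

The sketch also shows the known weakness of cascading: any error in the transcript is passed on to the MT step unchanged. Direct and end-to-end models avoid this by mapping audio to target text in one model.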

High-level instructions for using the repo:

Run covost_data_preparation.py to download and preprocess the data.
Run the shell script of interest, changing the variables in the script if needed:
- run_translation_pipeline.sh for single-task models (ASR, MT, ST)
- cascaded_ST_evaluation.sh evaluates cascaded ST using pretrained ASR and MT models
- run_translation_multi_modalities_pipeline.sh for multi-task, multi-modality models (including zero-shot)
- run_zeroshot_with_artificial_data.sh for zero-shot models using data augmentation
- run_bidirectional_zeroshot.sh for zero-shot models using additional opposite training data
- run_fine_tunning.sh and run_fine_tunning_fromASR.sh for fine-tuning models with ST data, resulting in few-shot models
- modality_similarity_svcca.sh and modality_similarity_classifier.sh for measuring text-audio similarity in the learned representations
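
The two steps above can be sketched as a small dry-run planner that maps an experiment to the commands to run. The dictionary keys and the `plan` function are hypothetical labels chosen for this example. Running the scripts with `python` and `bash` from the repository root, without arguments, is an assumption; check each script for the variables it expects.

```python
import shlex

# Script names are taken from the list above; the keys are hypothetical labels.
SCRIPTS = {
    "single_task": "run_translation_pipeline.sh",
    "cascaded_eval": "cascaded_ST_evaluation.sh",
    "multi_modal": "run_translation_multi_modalities_pipeline.sh",
    "zeroshot_augmented": "run_zeroshot_with_artificial_data.sh",
    "zeroshot_bidirectional": "run_bidirectional_zeroshot.sh",
    "few_shot": "run_fine_tunning.sh",
    "few_shot_from_asr": "run_fine_tunning_fromASR.sh",
    "similarity_svcca": "modality_similarity_svcca.sh",
    "similarity_classifier": "modality_similarity_classifier.sh",
}


def plan(experiment, prepare_data=True):
    """Return the commands for one experiment without executing them."""
    if experiment not in SCRIPTS:
        raise KeyError(f"unknown experiment: {experiment!r}")
    commands = []
    if prepare_data:
        # Assumed invocation: the data script is run with no arguments.
        commands.append(["python", "covost_data_preparation.py"])
    commands.append(["bash", SCRIPTS[experiment]])
    return commands


for command in plan("multi_modal"):
    print(shlex.join(command))
```

To actually run a plan, each command list could be passed to `subprocess.run(command, check=True)` after editing the variables inside the chosen script.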

See notebooks/Repo_Instruction.ipynb for more details.
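
For readers unfamiliar with SVCCA, the measure behind modality_similarity_svcca.sh, the following is a generic NumPy sketch of the idea. It is not the code from this repository or from nlp-dke/svcca. The function names, the 0.99 variance threshold, and the random stand-in activations are assumptions made for illustration.

```python
import numpy as np


def _svd_reduce(acts, keep_variance=0.99):
    """Centre activations and keep the top singular directions."""
    centred = acts - acts.mean(axis=0, keepdims=True)
    u, s, _ = np.linalg.svd(centred, full_matrices=False)
    explained = np.cumsum(s ** 2) / np.sum(s ** 2)
    # Smallest number of directions whose cumulative variance reaches the threshold.
    k = min(int(np.searchsorted(explained, keep_variance)) + 1, len(s))
    return u[:, :k] * s[:k]


def svcca(x, y, keep_variance=0.99):
    """Mean canonical correlation between two (n_samples, n_features) matrices."""
    x_red = _svd_reduce(x, keep_variance)
    y_red = _svd_reduce(y, keep_variance)
    # CCA step: with orthonormal bases Qx and Qy of the two reduced subspaces,
    # the singular values of Qx^T Qy are the canonical correlations.
    qx, _ = np.linalg.qr(x_red)
    qy, _ = np.linalg.qr(y_red)
    rho = np.linalg.svd(qx.T @ qy, compute_uv=False)
    return float(np.mean(np.clip(rho, 0.0, 1.0)))


# Stand-in data: rows play the role of paired text/audio examples.
rng = np.random.default_rng(0)
text_repr = rng.normal(size=(500, 10))
audio_repr = 2.0 * text_repr[:, ::-1] + 3.0  # same information, re-encoded
unrelated = rng.normal(size=(500, 10))

print("re-encoded:", svcca(text_repr, audio_repr))
print("unrelated: ", svcca(text_repr, unrelated))
```

A score close to 1 means the two sets of representations span nearly the same subspace; unrelated representations score much lower. The measure needs paired inputs, so row i of both matrices must come from the same example.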

Owner

Tu Anh Dinh
