Code for our EMNLP 2021 paper "Learning Kernel-Smoothed Machine Translation with Retrieved Examples"

Last update: Nov 24, 2022

Related tags

Overview

KSTER

Code for our EMNLP 2021 paper "Learning Kernel-Smoothed Machine Translation with Retrieved Examples" [paper].

Usage

Download the processed datasets from this site. You can also download the built databases from this site and download the model checkpoints from this site.

Train a general-domain base model

Take English -> Germain translation for example.

export CUDA_VISIBLE_DEVICES=0,1,2,3
python3 -m joeynmt train configs/transformer_base_wmt14_en2de.yaml

Finetuning trained base model on domain-specific datasets

Take English -> Germain translation in Koran domain for example.

export CUDA_VISIBLE_DEVICES=0,1,2,3
python3 -m joeynmt train configs/transformer_base_koran_en2de.yaml

Build database

Take English -> Germain translation in Koran domain for example, wmt14_en_de.transformer.ckpt is the path of trained general-domain base model checkpoint.

mkdir database/koran_en_de_base
export CUDA_VISIBLE_DEVICES=0
python3 -m joeynmt build_database configs/transformer_base_koran_en2de.yaml \
        --ckpt wmt14_en_de.transformer.ckpt \
        --division train \
        --index_path database/koran_en_de_base/trained.index \
        --token_map_path database/koran_en_de_base/token_map \
        --embedding_path database/koran_en_de_base/embeddings.npy

Train the bandwidth estimator and weight estimator in KSTER

Take English -> Germain translation in Koran domain for example.

export CUDA_VISIBLE_DEVICES=0,1,2,3
python3 -m joeynmt combiner_train configs/transformer_base_koran_en2de.yaml \
        --ckpt wmt14_en_de.transformer.ckpt \
        --combiner dynamic_combiner \
        --top_k 16 \
        --kernel laplacian \
        --index_path database/koran_en_de_base/trained.index \
        --token_map_path database/koran_en_de_base/token_map \
        --embedding_path database/koran_en_de_base/embeddings.npy \
        --in_memory True

Inference

We unify the inference of base model, finetuned or joint-trained model, kNN-MT and KSTER with a concept of combiner (see joeynmt/combiners.py).

Combiner type	Methods	Description
NoCombiner	Base, Finetuning, Joint-training	Directly inference without retrieval.
StaticCombiner	kNN-MT	Retrieve similar examples during inference. mixing_weight and bandwidth are pre-specified.
DynamicCombiner	KSTER	Retrieve similar examples during inference. mixing_weight and bandwidth are dynamically estimated.

Inference with NoCombiner for Base model

Take English -> Germain translation in Koran domain for example.

export CUDA_VISIBLE_DEVICES=0
python3 -m joeynmt test configs/transformer_base_koran_en2de.yaml \
        --ckpt wmt14_en_de.transformer.ckpt \
        --combiner no_combiner

Inference with StaticCombiner for kNN-MT

Take English -> Germain translation in Koran domain for example.

export CUDA_VISIBLE_DEVICES=0
python3 -m joeynmt test configs/transformer_base_koran_en2de.yaml \
        --ckpt wmt14_en_de.transformer.ckpt \
        --combiner static_combiner \
        --top_k 16 \
        --mixing_weight 0.7 \
        --bandwidth 10 \
        --kernel gaussian \
        --index_path database/koran_en_de_base/trained.index \
        --token_map_path database/koran_en_de_base/token_map

Inference with DynamicCombiner for KSTER

Take English -> Germain translation in Koran domain for example, koran_en_de.laplacian.combiner.ckpt is the path of trained bandwidth estimator and weight estimator for Koran domain.
--in_memory option specifies whether to load the example embeddings to memory. Set in_memory == True for faster inference, set in_memory == False for lower memory demand.

export CUDA_VISIBLE_DEVICES=0
python3 -m joeynmt test configs/transformer_base_koran_en2de.yaml \
        --ckpt wmt14_en_de.transformer.ckpt \
        --combiner dynamic_combiner \
        --combiner_path koran_en_de.laplacian.combiner.ckpt \
        --top_k 16 \
        --kernel laplacian \
        --index_path database/koran_en_de_base/trained.index \
        --token_map_path database/koran_en_de_base/token_map \
        --embedding_path database/koran_en_de_base/embeddings.npy \
        --in_memory True

See bash_scripts/test_*.sh for reproducing our results.
See logs/*.log for the logs of our results.

Acknowledgements

We build the models based on the joeynmt codebase.

Code for our EMNLP 2021 paper "Learning Kernel-Smoothed Machine Translation with Retrieved Examples"

Related tags

Overview

KSTER

Usage

Train a general-domain base model

Finetuning trained base model on domain-specific datasets

Build database

Train the bandwidth estimator and weight estimator in KSTER

Inference

Inference with NoCombiner for Base model

Inference with StaticCombiner for kNN-MT

Inference with DynamicCombiner for KSTER

Acknowledgements

Owner

jiangqn

Official implementation for (Show, Attend and Distill: Knowledge Distillation via Attention-based Feature Matching, AAAI-2021)

This is an implementation of Googles Yogi-Optimizer in Keras (tf.keras)

Code for BMVC2021 "MOS: A Low Latency and Lightweight Framework for Face Detection, Landmark Localization, and Head Pose Estimation"

Development Kit for the SoccerNet Challenge

Synthesize photos from PhotoDNA using machine learning 🌱

Adaptive Pyramid Context Network for Semantic Segmentation (APCNet CVPR'2019)

This library is a location of the LegacyLogger for PyTorch Lightning.

Code for the TPAMI paper: "Syntax Customized Video Captioning by Imitating Exemplar Sentences"

DeepSTD: Mining Spatio-temporal Disturbances of Multiple Context Factors for Citywide Traffic Flow Prediction

Image-Stitching - Panorama composition using SIFT Features and a custom implementaion of RANSAC algorithm

Pytorch implementation for "Density-aware Chamfer Distance as a Comprehensive Metric for Point Cloud Completion" (NeurIPS 2021)

Multi-modal Text Recognition Networks: Interactive Enhancements between Visual and Semantic Features

Complementary Patch for Weakly Supervised Semantic Segmentation, ICCV21 (poster)

D²Conv3D: Dynamic Dilated Convolutions for Object Segmentation in Videos

PERIN is Permutation-Invariant Semantic Parser developed for MRP 2020

Curved Projection Reformation

Code for the ICCV 2021 paper "Pixel Difference Networks for Efficient Edge Detection" (Oral).

Finetune SSL models for MOS prediction

Accurate 3D Face Reconstruction with Weakly-Supervised Learning: From Single Image to Image Set (CVPRW 2019). A PyTorch implementation.

Code To Tune or Not To Tune? Zero-shot Models for Legal Case Entailment.