Code for WECHSEL: Effective initialization of subword embeddings for cross-lingual transfer of monolingual language models.

Last update: Dec 29, 2022

WECHSEL

arXiv: https://arxiv.org/abs/2112.06598

Models from the paper are available on the HuggingFace Hub.

Installation

We distribute a Python package via PyPI:

pip install wechsel

Alternatively, clone the repository, install the dependencies listed in requirements.txt, and run the code in wechsel/.

Example usage

Transferring English roberta-base to Swahili:

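import torch
from transformers import AutoModel, AutoTokenizer
from datasets import load_dataset
from wechsel import WECHSEL, load_embeddings

# Load the English source tokenizer and model.
source_tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModel.from_pretrained("roberta-base")

# Train a Swahili tokenizer with the same vocabulary size as the source,
# so the embedding matrix keeps its shape.
target_tokenizer = source_tokenizer.train_new_from_iterator(
    load_dataset("oscar", "unshuffled_deduplicated_sw", split="train")["text"],
    vocab_size=len(source_tokenizer)
)

# Set up WECHSEL with English and Swahili word embeddings
# and the English-Swahili bilingual dictionary.
wechsel = WECHSEL(
    load_embeddings("en"),
    load_embeddings("sw"),
    bilingual_dictionary="swahili"
)

# Compute embeddings for the Swahili vocabulary from the source embeddings.
target_embeddings, info = wechsel.apply(
    source_tokenizer,
    target_tokenizer,
    model.get_input_embeddings().weight.detach().numpy(),
)

# Replace the model's input embeddings with the newly initialized ones.
model.get_input_embeddings().weight.data = torch.from_numpy(target_embeddings)

# use `model` and `target_tokenizer` to continue training in Swahili!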
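
To give a rough intuition for what the initialization step does, the following is a minimal, dependency-free sketch and not the code in wechsel/. It assumes that each target token already has a vector in a static embedding space shared with the source tokens, and it initializes the target token's model embedding as a softmax-weighted mean of the model embeddings of its most similar source tokens. The function names, the toy vectors, and the values of `k` and `temperature` are illustrative assumptions.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def init_target_embedding(target_vec, source_vecs, source_model_embs,
                          k=2, temperature=0.1):
    """Return a softmax-weighted mean of the model embeddings belonging to
    the k source tokens most similar to `target_vec`.

    `target_vec` and `source_vecs` are assumed to live in one shared
    (aligned) static embedding space; `source_model_embs` holds the rows of
    the source model's embedding matrix in the same order as `source_vecs`.
    """
    sims = [cosine(target_vec, s) for s in source_vecs]
    # Indices of the k most similar source tokens.
    nearest = sorted(range(len(sims)), key=lambda i: sims[i], reverse=True)[:k]
    # Temperature softmax over the neighbours' similarities
    # (subtracting the maximum keeps the exponentials stable).
    max_sim = max(sims[i] for i in nearest)
    weights = [math.exp((sims[i] - max_sim) / temperature) for i in nearest]
    total = sum(weights)
    dim = len(source_model_embs[0])
    return [
        sum(w / total * source_model_embs[i][d] for w, i in zip(weights, nearest))
        for d in range(dim)
    ]

# Hypothetical toy data: three source tokens, one target token.
source_vecs = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
source_model_embs = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
target_vec = [1.0, 0.0]

new_row = init_target_embedding(target_vec, source_vecs, source_model_embs)
```

With a low temperature, the result stays close to the embedding of the single most similar source token; a higher temperature blends the neighbours more evenly.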

Bilingual dictionaries

For use with WECHSEL, we distribute 3276 bilingual dictionaries from English to other languages in dicts/.
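
If you want to inspect a dictionary yourself, a small parser along the following lines may help. This is a sketch under an assumption: it treats each line as one tab-separated `english<TAB>target` word pair, which you should verify against the files in dicts/ before relying on it. The function name and the sample lines are hypothetical.

```python
def parse_bilingual_dictionary(lines):
    """Parse lines of the assumed form `source<TAB>target` into a list of
    (source, target) pairs, skipping blank and malformed lines."""
    pairs = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        parts = line.split("\t")
        if len(parts) != 2:
            continue  # skip lines that are not exactly two tab-separated fields
        pairs.append((parts[0], parts[1]))
    return pairs

# Hypothetical sample in the assumed format (English to Swahili).
sample = ["water\tmaji\n", "\n", "book\tkitabu\n", "malformed line\n"]
pairs = parse_bilingual_dictionary(sample)
```

The function accepts any iterable of strings, so an open file handle (opened with `encoding="utf-8"`) can be passed in directly instead of a list.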

Citation

Please cite WECHSEL as

@misc{minixhofer2021wechsel,
      title={WECHSEL: Effective initialization of subword embeddings for cross-lingual transfer of monolingual language models}, 
      author={Benjamin Minixhofer and Fabian Paischer and Navid Rekabsaz},
      year={2021},
      eprint={2112.06598},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

Owner

Institute of Computational Perception
