Source code for "Progressive Transformers for End-to-End Sign Language Production" (ECCV 2020)

Last update: Dec 21, 2022

Overview

Progressive Transformers for End-to-End Sign Language Production

Source code for "Progressive Transformers for End-to-End Sign Language Production" (Ben Saunders, Necati Cihan Camgoz, Richard Bowden - ECCV 2020)

Conference video available at https://twitter.com/BenMSaunders/status/1336638886198521857

Usage

Install required packages using the requirements.txt file.

pip install -r requirements.txt

To run, start __main__.py with the arguments "train" and "./Configs/Base.yaml":

python __main__.py train ./Configs/Base.yaml

An example train.log file can be found at "./Configs/train.log" and an example validation file at "./Configs/validations.txt".

The Back Translation model was created from https://github.com/neccam/slt. Back Translation evaluation code is coming soon.

Data

Pre-processed Phoenix14T data can be requested via email at [email protected]. If you wish to create the data yourself, follow the steps below:

Phoenix14T data can be downloaded from https://www-i6.informatik.rwth-aachen.de/~koller/RWTH-PHOENIX-2014-T/. Skeleton joints can be extracted using OpenPose (https://github.com/CMU-Perceptual-Computing-Lab/openpose) and then lifted to 3D using the 2D to 3D Inverse Kinematics code at https://github.com/gopeith/SignLanguageProcessing, under 3DposeEstimator.

Prepare Phoenix14T (or other sign language dataset) data as .txt files for .skel, .gloss, .txt and .files. Data format should be parallel .txt files for "src", "trg" and "files", with each line representing a new sequence:

The "src" file contains source sentences, with each line representing a new sentence.
The "trg" file contains the skeleton data of each frame, with a space separating frames. The joint values should be divided by 3 to match the scaling used by the author. Each frame contains 150 joint values followed by a counter value, all separated by spaces. Each sequence should be on its own line. If your data contains 150 joints per frame, please ensure that trg_size is set to 150 in the config file.
The "files" file should contain the name of each sequence on a new line.
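
The "trg" layout described above can be sketched in Python as follows. This is a minimal illustration, not the author's pre-processing code: the function names are hypothetical, and the counter is assumed to run linearly from 0 on the first frame to 1 on the last, which the text above does not specify.

```python
from typing import List, Sequence

JOINT_VALUES_PER_FRAME = 150  # should match trg_size in the config file
SCALE_DIVISOR = 3.0           # "divided by 3", as stated above


def format_trg_line(frames: Sequence[Sequence[float]]) -> str:
    """Turn one sequence of raw frames into a single line of the "trg" file."""
    num_frames = len(frames)
    tokens: List[str] = []
    for index, frame in enumerate(frames):
        if len(frame) != JOINT_VALUES_PER_FRAME:
            raise ValueError(
                f"frame {index} has {len(frame)} values, "
                f"expected {JOINT_VALUES_PER_FRAME}"
            )
        # Assumption: the counter is the normalised frame position in [0, 1].
        counter = index / (num_frames - 1) if num_frames > 1 else 0.0
        # Scaled joint values first, then the counter that closes the frame.
        tokens.extend(f"{value / SCALE_DIVISOR:.5f}" for value in frame)
        tokens.append(f"{counter:.5f}")
    return " ".join(tokens)


def parse_trg_line(line: str) -> List[List[float]]:
    """Split one "trg" line back into frames of 150 joint values + 1 counter."""
    numbers = [float(token) for token in line.split()]
    width = JOINT_VALUES_PER_FRAME + 1
    if len(numbers) % width != 0:
        raise ValueError(f"{len(numbers)} values is not a multiple of {width}")
    return [numbers[i:i + width] for i in range(0, len(numbers), width)]
```

Parsing a line back with parse_trg_line is a quick way to confirm that every frame carries exactly 151 values before training.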

Examples can be found in /Data/tmp. The data path must be specified in the config file.
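
Because the "src", "trg" and "files" files are parallel, a mismatch in line counts usually means the data is misaligned. The sketch below is one hypothetical way to check this before training; the helper names are illustrative and are not part of this repository.

```python
from pathlib import Path
from typing import Dict


def count_sequences(path: Path) -> int:
    """Count the non-empty lines in a file (one sequence per line)."""
    with path.open(encoding="utf-8") as handle:
        return sum(1 for line in handle if line.strip())


def check_parallel_files(src: Path, trg: Path, files: Path) -> int:
    """Return the shared sequence count, or raise if the files disagree."""
    counts: Dict[str, int] = {
        str(path): count_sequences(path) for path in (src, trg, files)
    }
    if len(set(counts.values())) != 1:
        raise ValueError(f"Line counts differ: {counts}")
    return next(iter(counts.values()))
```

The check only compares line counts, so it will not catch lines that are present but in the wrong order.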

Pre-Trained Model

A pre-trained Progressive Transformer checkpoint can be downloaded from https://www.dropbox.com/s/l4xmnybp7luz0l3/PreTrained_PTSLP_Model.ckpt?dl=0.

This model has a size of num_layers: 2, num_heads: 4 and embedding_dim: 512, as outlined in ./Configs/Base.yaml. It has been pre-trained on the full PHOENIX14T dataset with the data format described above. The relevant train.log and validations.txt files can be found in ./Configs.

To initialise a model from this checkpoint, pass the --ckpt ./PreTrained_PTSLP_Model.ckpt argument to either train or test modes. Additionally, to initialise the correct src_embed size, the config argument src_vocab: "./Configs/src_vocab.txt" must be set to the location of the src_vocab.txt, found under ./Configs. Please open an issue if this checkpoint cannot be downloaded or loaded.

Reference

If you use this code in your research, please cite the following papers:

@inproceedings{saunders2020progressive,
	title		=	{{Progressive Transformers for End-to-End Sign Language Production}},
	author		=	{Saunders, Ben and Camgoz, Necati Cihan and Bowden, Richard},
	booktitle   	=   	{Proceedings of the European Conference on Computer Vision (ECCV)},
	year		=	{2020}}

@inproceedings{saunders2020adversarial,
	title		=	{{Adversarial Training for Multi-Channel Sign Language Production}},
	author		=	{Saunders, Ben and Camgoz, Necati Cihan and Bowden, Richard},
	booktitle   	=   	{Proceedings of the British Machine Vision Conference (BMVC)},
	year		=	{2020}}

@inproceedings{saunders2021continuous,
	title		=	{{Continuous 3D Multi-Channel Sign Language Production via Progressive Transformers and Mixture Density Networks}},
	author		=	{Saunders, Ben and Camgoz, Necati Cihan and Bowden, Richard},
	booktitle   	=   	{International Journal of Computer Vision (IJCV)},
	year		=	{2021}}

Acknowledgements

This work received funding from the SNSF Sinergia project 'SMILE' (CRSII2 160811), the European Union's Horizon2020 research and innovation programme under grant agreement no. 762021 'Content4All' and the EPSRC project 'ExTOL' (EP/R03298X/1). This work reflects only the authors' view and the Commission is not responsible for any use that may be made of the information it contains. We would also like to thank NVIDIA Corporation for their GPU grant.
