Goal of the project : Detecting Temporal Boundaries in Sign Language videos

Overview

MVA RecVis course final project :

Goal of the project : Detecting Temporal Boundaries in Sign Language videos.

Sign language automatic indexing is an important challenge to develop better communication tools for the deaf community. However, annotated datasets for sign langage are limited, and there are few people with skills to anotate such data, which makes it hard to train performant machine learning models. An important challenge is therefore to :

  • Increase available training datasets.
  • Make labeling easier for professionnals to reduce risks of bad annotations.

In this context, techniques have emerged to perform automatic sign segmentation in videos, by marking the boundaries between individual signs in sign language videos. The developpment of such tools offers the potential to alleviate the limited supply of labelled dataset currently available for sign research.

demo

Previous work and personal contribution :

This repository provides code for the Object Recognition & Computer Vision (RecVis) course Final project. For more details please refer the the project report report.pdf. In this project, we reproduced the results obtained on the following paper (by using the code from this repository) :

We used the pre-extracted frame-level features obtained by applying the I3D model on videos to retrain the MS-TCN architecture for frame-level binary classification and reproduce the papers results. The tests folder proposes a notebook for reproducing the original paper results, with a meanF1B = 68.68 on the evaluation set of the BSL Corpus.

We further implemented new models in order to improve this result. We wanted to try attention based models as they have received recently a huge gain of interest in the vision research community. We first tried to train a Vanilla Transformer Encoder from scratch, but the results were not satisfactory.

  • Attention Is All You Need, Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin: (2018).

We then implemented the ASFormer model (Transformer for Action Segementation), using this code : a hybrid transformer model using some interesting ideas from the MS-TCN architecture. The motivations behind the model and its architecture are detailed in the following paper :

We trained this model on the I3D extracted features and obtained an improvement over the MS-TCN architecture. The results are given in the following table :

ID Model mF1B mF1S
1 MS-TCN 68.68±0.6 47.71±0.8
2 Transformer Encoder 60.28±0.3 42.70±0.2
3 ASFormer 69.79±0.2 49.23±1.2

Contents

Setup

# Clone this repository
git clone https://github.com/loubnabnl/Sign-Segmentation-with-Transformers.git
cd Sign-Segmentation-with-Transformers/
# Create signseg_env environment
conda env create -f environment.yml
conda activate signseg_env

Data and models

You can download the pretrained models (I3D and MS-TCN) (models.zip [302MB]) and data (data.zip [5.5GB]) used in the experiments here or by executing download/download_*.sh. The unzipped data/ and models/ folders should be located on the root directory of the repository (for using the demo downloading the models folder is sufficient).

You can download our best pretrained ASFormer model weights here.

Data:

Please cite the original datasets when using the data: BSL Corpus The authors of github.com/RenzKa/sign-segmentation provided the pre-extracted features and metadata. See here for a detailed description of the data files.

  • Features: data/features/*/*/features.mat
  • Metadata: data/info/*/info.pkl

Models:

  • I3D weights, trained for sign classification: models/i3d/*.pth.tar
  • MS-TCN weights for the demo (see tables below for links to the other models): models/ms-tcn/*.model
  • As_former weights of our best model : models/asformer/*.model

The folder structure should be as below:

sign-segmentation/models/
  i3d/
    i3d_kinetics_bslcp.pth.tar
  ms-tcn/
    mstcn_bslcp_i3d_bslcp.model
  asformer/
    best_asformer_bslcp.model

Demo

The demo folder contains a sample script to estimate the segments of a given sign language video, one can run demo.pyto get a visualization on a sample video.

cd demo
python demo.py

The demo will:

  1. use the models/i3d/i3d_kinetics_bslcp.pth.tar pretrained I3D model to extract features,
  2. use the models/asformer/best_asformer_model.model pretrained ASFormer model to predict the segments out of the features.
  3. save results.

Training

To train I3D please refer to github.com/RenzKa/sign-segmentation. To train ASFormer on the pre-extracted I3D features run main.py, you can change hyperparameters in the arguments inside the file. Or you can run the notebook in the folder test_asformer.

Citation

If you use this code and data, please cite the original papers following:

@inproceedings{Renz2021signsegmentation_a,
    author       = "Katrin Renz and Nicolaj C. Stache and Samuel Albanie and G{\"u}l Varol",
    title        = "Sign Language Segmentation with Temporal Convolutional Networks",
    booktitle    = "ICASSP",
    year         = "2021",
}
@article{yi2021asformer,
  title={Asformer: Transformer for action segmentation},
  author={Yi, Fangqiu and Wen, Hongyu and Jiang, Tingting},
  journal={arXiv preprint arXiv:2110.08568},
  year={2021}
}

License

The license in this repository only covers the code. For data.zip and models.zip we refer to the terms of conditions of original datasets.

Acknowledgements

The code builds on the github.com/RenzKa/sign-segmentation and github.com/ChinaYi/ASFormer repositories.

Owner
Loubna Ben Allal
MVA (Mathematics, Vision, Learning) student at ENS Paris Saclay.
Loubna Ben Allal
Code to reproduce the results for Compositional Attention

Compositional-Attention This repository contains the official implementation for the paper Compositional Attention: Disentangling Search and Retrieval

Sarthak Mittal 58 Nov 30, 2022
docTR by Mindee (Document Text Recognition) - a seamless, high-performing & accessible library for OCR-related tasks powered by Deep Learning.

docTR by Mindee (Document Text Recognition) - a seamless, high-performing & accessible library for OCR-related tasks powered by Deep Learning.

Mindee 1.5k Jan 01, 2023
OpenABC-D: A Large-Scale Dataset For Machine Learning Guided Integrated Circuit Synthesis

OpenABC-D: A Large-Scale Dataset For Machine Learning Guided Integrated Circuit Synthesis Overview OpenABC-D is a large-scale labeled dataset generate

NYU Machine-Learning guided Design Automation (MLDA) 31 Nov 22, 2022
DSL for matching Python ASTs

py-ast-rule-engine This library provides a DSL (domain-specific language) to match a pattern inside a Python AST (abstract syntax tree). The library i

1 Dec 18, 2021
Repository for MuSiQue: Multi-hop Questions via Single-hop Question Composition

🎵 MuSiQue: Multi-hop Questions via Single-hop Question Composition This is the repository for our paper "MuSiQue: Multi-hop Questions via Single-hop

21 Jan 02, 2023
Train emoji embeddings based on emoji descriptions.

emoji2vec This is my attempt to train, visualize and evaluate emoji embeddings as presented by Ben Eisner, Tim Rocktäschel, Isabelle Augenstein, Matko

Miruna Pislar 17 Sep 03, 2022
A rule learning algorithm for the deduction of syndrome definitions from time series data.

README This project provides a rule learning algorithm for the deduction of syndrome definitions from time series data. Large parts of the algorithm a

0 Sep 24, 2021
Capture all information throughout your model's development in a reproducible way and tie results directly to the model code!

Rubicon Purpose Rubicon is a data science tool that captures and stores model training and execution information, like parameters and outcomes, in a r

Capital One 97 Jan 03, 2023
Source code for the NeurIPS 2021 paper "On the Second-order Convergence Properties of Random Search Methods"

Second-order Convergence Properties of Random Search Methods This repository the paper "On the Second-order Convergence Properties of Random Search Me

Adamos Solomou 0 Nov 13, 2021
Compare outputs between layers written in Tensorflow and layers written in Pytorch

Compare outputs of Wasserstein GANs between TensorFlow vs Pytorch This is our testing module for the implementation of improved WGAN in Pytorch Prereq

Hung Nguyen 72 Dec 20, 2022
Planner_backend - Academic planner application designed for students and counselors.

Planner (backend) Academic planner application designed for students and advisors.

2 Dec 31, 2021
Detection of drones using their thermal signatures from thermal camera through YOLO-V3 based CNN with modifications to encapsulate drone motion

Drone Detection using Thermal Signature This repository highlights the work for night-time drone detection using a using an Optris PI Lightweight ther

Chong Yu Quan 6 Dec 31, 2022
M3DSSD: Monocular 3D Single Stage Object Detector

M3DSSD: Monocular 3D Single Stage Object Detector Setup pytorch 0.4.1 Preparation Download the full KITTI detection dataset. Then place a softlink (or

mumianyuxin 64 Dec 27, 2022
TLoL (Python Module) - League of Legends Deep Learning AI (Research and Development)

TLoL-py - League of Legends Deep Learning Library TLoL-py is the Python component of the TLoL League of Legends deep learning library. It provides a s

7 Nov 29, 2022
Keras implementation of Real-Time Semantic Segmentation on High-Resolution Images

Keras-ICNet [paper] Keras implementation of Real-Time Semantic Segmentation on High-Resolution Images. Training in progress! Requisites Python 3.6.3 K

Aitor Ruano 87 Dec 16, 2022
Codebase for the paper titled "Continual learning with local module selection"

This repository contains the codebase for the paper Continual Learning via Local Module Composition. Setting up the environemnt Create a new conda env

Oleksiy Ostapenko 20 Dec 10, 2022
Propagate Yourself: Exploring Pixel-Level Consistency for Unsupervised Visual Representation Learning, CVPR 2021

Propagate Yourself: Exploring Pixel-Level Consistency for Unsupervised Visual Representation Learning By Zhenda Xie*, Yutong Lin*, Zheng Zhang, Yue Ca

Zhenda Xie 293 Dec 20, 2022
Banglore House Prediction Using Flask Server (Python)

Banglore House Prediction Using Flask Server (Python) 🌐 Links 🌐 📂 Repo In this repository, I've implemented a Machine Learning-based Bangalore Hous

Dhyan Shah 1 Jan 24, 2022
[ICLR 2022] Contact Points Discovery for Soft-Body Manipulations with Differentiable Physics

CPDeform Code and data for paper Contact Points Discovery for Soft-Body Manipulations with Differentiable Physics at ICLR 2022 (Spotlight). @InProceed

(Lester) Sizhe Li 29 Nov 29, 2022
functorch is a prototype of JAX-like composable function transforms for PyTorch.

functorch is a prototype of JAX-like composable function transforms for PyTorch.

Facebook Research 1.2k Jan 09, 2023