This repo contains the official code of our work SAM-SLR, which won the CVPR 2021 Challenge on Large Scale Signer Independent Isolated Sign Language Recognition.

Overview

Skeleton Aware Multi-modal Sign Language Recognition

By Songyao Jiang, Bin Sun, Lichen Wang, Yue Bai, Kunpeng Li and Yun Fu.

Smile Lab @ Northeastern University



This repo contains the official code of Skeleton Aware Multi-modal Sign Language Recognition (SAM-SLR), which ranked 1st in the CVPR 2021 Challenge: Looking at People Large Scale Signer Independent Isolated Sign Language Recognition.

Our paper has been accepted to the CVPR 2021 Workshops. A preprint version is available on arXiv. Please cite our paper if you find this repo useful in your research.

News

[2021/04/10] Our workshop paper has been accepted. Citation info updated.

[2021/03/24] A preprint version of our paper is released on arXiv (arXiv:2103.08833).

[2021/03/20] Our work has been verified and announced by the organizers as the 1st place winner of the challenge!

[2021/03/15] The code is released to the public on GitHub.

[2021/03/11] Our team (smilelab2021) ranked 1st in both tracks (RGB and RGBD) on the challenge leaderboards.


Data Preparation

Download AUTSL Dataset.

We processed the dataset into six modalities in total: skeleton, skeleton features, RGB frames, flow color, HHA, and flow depth.

  1. Put the original train, val, and test videos in the data folder as follows:
    data
    ├── train
    │   ├── signer0_sample1_color.mp4
    │   ├── signer0_sample1_depth.mp4
    │   ├── signer0_sample2_color.mp4
    │   ├── signer0_sample2_depth.mp4
    │   └── ...
    ├── val
    │   └── ...
    └── test
        └── ...
  2. Follow data_processs/readme.md to process the data.

  3. Use TPose/data_process to extract whole-body pose features.
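Before running the processing scripts, it can help to check that every sample has both its color and its depth video. The following is a minimal sketch that is not part of this repo: `pair_videos` and `missing_pairs` are hypothetical helpers, and the sketch assumes file names follow the `signer<N>_sample<M>_<color|depth>.mp4` pattern shown in the tree above.

```python
import re
from pathlib import Path
from typing import Dict, Tuple

# Assumed naming scheme, taken from the example tree: signer<N>_sample<M>_<color|depth>.mp4
VIDEO_NAME = re.compile(r"^(signer\d+_sample\d+)_(color|depth)\.mp4$")


def pair_videos(split_dir: str) -> Dict[str, Dict[str, Path]]:
    """Group the videos of one split (e.g. data/train) by sample id."""
    samples: Dict[str, Dict[str, Path]] = {}
    for path in sorted(Path(split_dir).iterdir()):
        match = VIDEO_NAME.match(path.name)
        if match is None:
            continue  # ignore files that do not follow the naming scheme
        sample_id, kind = match.groups()
        samples.setdefault(sample_id, {})[kind] = path
    return samples


def missing_pairs(samples: Dict[str, Dict[str, Path]]) -> Dict[str, Tuple[str, ...]]:
    """Return, for each incomplete sample, which of color/depth is missing."""
    return {
        sample_id: tuple(kind for kind in ("color", "depth") if kind not in files)
        for sample_id, files in samples.items()
        if len(files) < 2
    }
```

For example, `missing_pairs(pair_videos("data/train"))` returns an empty dict when every training sample has both videos.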

Requirements and Docker Image

The code is written using Anaconda Python >= 3.6 and PyTorch 1.7 with OpenCV.

Detailed environment requirements can be found in requirement.txt in each code folder.

For convenience, we provide an NVIDIA Docker image to run our code.

Download Docker Image
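If you set up the environment yourself instead of using the Docker image, a quick sanity check of the versions listed above might look like the sketch below. It is only an illustration: `check_environment` and `version_tuple` are hypothetical helpers, and treating PyTorch 1.7 as a minimum (rather than an exact pin) is an assumption; requirement.txt in each code folder remains the authoritative list.

```python
import importlib
import re
import sys
from typing import Dict, Optional, Tuple

# Assumed lower bounds: PyTorch 1.7 from the text above; any OpenCV (cv2) version.
MINIMUM = {"torch": (1, 7), "cv2": (0,)}


def version_tuple(version: str) -> Tuple[int, ...]:
    """Turn '1.7.1+cu110' into (1, 7, 1), ignoring local and pre-release suffixes."""
    parts = []
    for piece in version.split("+")[0].split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def check_environment(minimum: Dict[str, Tuple[int, ...]] = MINIMUM) -> Dict[str, Optional[str]]:
    """Map each requirement to a problem description, or None if it looks fine."""
    report: Dict[str, Optional[str]] = {}
    report["python"] = None if sys.version_info >= (3, 6) else "Python >= 3.6 required"
    for module_name, required in minimum.items():
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            report[module_name] = "not installed"
            continue
        found = version_tuple(getattr(module, "__version__", "0"))
        report[module_name] = None if found >= required else "version too old"
    return report
```

Calling `check_environment()` returns a dict in which `None` means that requirement looks satisfied.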

Pretrained Models

We provide pretrained models for all modalities to reproduce our submitted results. Please download them from the link below and put them into the corresponding folders.

Download Pretrained Models

Usage

Reproducing the Results Submitted to CVPR21 Challenge

To test our pretrained models, put them under the corresponding code folders and run the test code as instructed below. To ensemble the test results and reproduce our final submission, copy all the resulting .pkl files to ensemble/ and follow the instructions there to ensemble our final outputs.

For step-by-step instructions, please see reproduce.md.

Skeleton Keypoints

The skeleton modality can be trained, finetuned, and tested using the code in the SL-GCN/ folder. Please follow the instructions in SL-GCN/readme.md to prepare the skeleton data as four streams (joint, bone, joint_motion, bone_motion).

Basic usage:

python main.py --config /path/to/config/file

To train, finetune, and test our models, point --config to the corresponding config file. Detailed instructions can be found in SL-GCN/readme.md.
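The four skeleton streams are all derived from the same keypoints. The sketch below follows the convention commonly used by two-stream skeleton GCNs (a bone is a joint minus its parent joint, motion is the difference between consecutive frames). It is an assumption that SL-GCN's data preparation matches this exactly, and the `parents` list and `(C, T, V)` array layout are hypothetical inputs, so please rely on SL-GCN/readme.md for the actual scripts.

```python
from typing import Dict, Sequence

import numpy as np


def bone_stream(joints: np.ndarray, parents: Sequence[int]) -> np.ndarray:
    """joints has shape (C, T, V): C coordinates, T frames, V keypoints.
    parents[v] is the parent keypoint of v; a root lists itself, giving a zero bone."""
    bones = np.zeros_like(joints)
    for v, p in enumerate(parents):
        bones[:, :, v] = joints[:, :, v] - joints[:, :, p]
    return bones


def motion_stream(stream: np.ndarray) -> np.ndarray:
    """Difference between consecutive frames along T; the last frame stays zero."""
    motion = np.zeros_like(stream)
    motion[:, :-1] = stream[:, 1:] - stream[:, :-1]
    return motion


def four_streams(joints: np.ndarray, parents: Sequence[int]) -> Dict[str, np.ndarray]:
    """Build the joint, bone, joint_motion and bone_motion inputs from raw keypoints."""
    bones = bone_stream(joints, parents)
    return {
        "joint": joints,
        "bone": bones,
        "joint_motion": motion_stream(joints),
        "bone_motion": motion_stream(bones),
    }
```

Each stream is then trained as a separate model, which is why the ensemble step later combines four SL-GCN results.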

Skeleton Feature

For the skeleton feature, we propose a Separable Spatial-Temporal Convolution Network (SSTCN) to capture spatio-temporal information from those features.

Please follow the instructions in SSTCN/readme.txt to prepare the data and to train and test the model.

RGB Frames

The RGB frames modality can be trained, finetuned, and tested using the following commands in the Conv3D/ folder.

python Sign_Isolated_Conv3D_clip.py

python Sign_Isolated_Conv3D_clip_finetune.py

python Sign_Isolated_Conv3D_clip_test.py

Detailed instructions can be found in Conv3D/readme.md.

Optical Flow

The RGB optical flow modality can be trained, finetuned, and tested using the following commands in the Conv3D/ folder.

python Sign_Isolated_Conv3D_flow_clip.py

python Sign_Isolated_Conv3D_flow_clip_funtine.py

python Sign_Isolated_Conv3D_flow_clip_test.py

Detailed instructions can be found in Conv3D/readme.md.
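Optical flow fields are usually stored as 8-bit images before being fed to a 3D CNN. A common scheme, shown here as an assumption rather than what the Conv3D scripts require, clips each flow component to a fixed bound (20 pixels here, a hypothetical choice) and maps it linearly to [0, 255]; `flow_to_uint8` and `uint8_to_flow` are illustrative names.

```python
import numpy as np


def flow_to_uint8(flow: np.ndarray, bound: float = 20.0) -> np.ndarray:
    """Clip flow (in pixels) to [-bound, bound] and map it linearly to uint8 [0, 255].
    For a (H, W, 2) field, channel 0 (x) and channel 1 (y) become two grayscale images."""
    clipped = np.clip(flow, -bound, bound)
    scaled = (clipped + bound) * (255.0 / (2.0 * bound))
    return np.round(scaled).astype(np.uint8)


def uint8_to_flow(image: np.ndarray, bound: float = 20.0) -> np.ndarray:
    """Approximate inverse of flow_to_uint8 (exact only up to the quantization step)."""
    return image.astype(np.float32) * (2.0 * bound / 255.0) - bound
```

With this encoding the round trip loses at most about `bound / 255` pixels per component, and motion larger than the bound saturates. The same scheme would apply to flow computed from the depth videos.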

Depth HHA

The Depth HHA modality can be trained, finetuned, and tested using the following commands in the Conv3D/ folder.

python Sign_Isolated_Conv3D_hha_clip_mask.py

python Sign_Isolated_Conv3D_hha_clip_mask_finetune.py

python Sign_Isolated_Conv3D_hha_clip_mask_test.py

Detailed instructions can be found in Conv3D/readme.md.

Depth Flow

The Depth Flow modality can be trained, finetuned, and tested using the following commands in the Conv3D/ folder.

python Sign_Isolated_Conv3D_depth_flow_clip.py

python Sign_Isolated_Conv3D_depth_flow_clip_finetune.py

python Sign_Isolated_Conv3D_depth_flow_clip_test.py

Detailed instructions can be found in Conv3D/readme.md.
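Since all four Conv3D modalities follow the same train, finetune, and test pattern, the stages can be chained with a small driver. This is a hypothetical convenience script, not part of the repo: the modality keys and function names are made up, the script names are copied from the commands above, and it assumes each script runs without extra arguments from inside Conv3D/, as those commands suggest. By default it only prints the commands.

```python
import subprocess
import sys
from typing import Dict, List, Sequence

STAGES = ("train", "finetune", "test")

# Script names copied verbatim from the commands in this README (train, finetune, test).
CONV3D_SCRIPTS: Dict[str, List[str]] = {
    "rgb": [
        "Sign_Isolated_Conv3D_clip.py",
        "Sign_Isolated_Conv3D_clip_finetune.py",
        "Sign_Isolated_Conv3D_clip_test.py",
    ],
    "flow": [
        "Sign_Isolated_Conv3D_flow_clip.py",
        "Sign_Isolated_Conv3D_flow_clip_funtine.py",
        "Sign_Isolated_Conv3D_flow_clip_test.py",
    ],
    "hha": [
        "Sign_Isolated_Conv3D_hha_clip_mask.py",
        "Sign_Isolated_Conv3D_hha_clip_mask_finetune.py",
        "Sign_Isolated_Conv3D_hha_clip_mask_test.py",
    ],
    "depth_flow": [
        "Sign_Isolated_Conv3D_depth_flow_clip.py",
        "Sign_Isolated_Conv3D_depth_flow_clip_finetune.py",
        "Sign_Isolated_Conv3D_depth_flow_clip_test.py",
    ],
}


def conv3d_commands(modality: str, stages: Sequence[str] = STAGES) -> List[List[str]]:
    """Build one command per requested stage, using the current Python interpreter."""
    scripts = dict(zip(STAGES, CONV3D_SCRIPTS[modality]))
    return [[sys.executable, scripts[stage]] for stage in stages]


def run_modality(modality: str, stages: Sequence[str] = STAGES,
                 dry_run: bool = True, cwd: str = "Conv3D") -> List[List[str]]:
    """Print (and, unless dry_run, execute) the stages in order inside cwd."""
    commands = conv3d_commands(modality, stages)
    for command in commands:
        print(" ".join(command))
        if not dry_run:
            subprocess.run(command, cwd=cwd, check=True)  # raises on the first failing stage
    return commands
```

For example, `run_modality("hha", dry_run=False)` would run the three HHA scripts in order and stop at the first one that fails.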

Model Ensemble

For both the RGB and RGBD tracks, the test results of all modalities need to be ensembled to generate the final results.

  1. For the RGB track, we use the results from the skeleton, skeleton feature, RGB, and flow color modalities to ensemble the final results.

    a. Test the model using newly trained weights or provided pretrained weights.

    b. Copy all the test results to the ensemble folder and rename them after their modalities.

    c. Ensemble the SL-GCN results from the joint, bone, joint motion, and bone motion streams in gcn/:

     python ensemble_wo_val.py; python ensemble_finetune.py
    

    d. Copy test_gcn_w_val_finetune.pkl, together with the RGB, TPose, and optical flow results, to ensemble/. Then ensemble the final predictions:

     python ensemble_multimodal_rgb.py
    

    Final predictions are saved in predictions.csv.

  2. For the RGBD track, we use the results from the skeleton, skeleton feature, RGB, flow color, HHA, and flow depth modalities to ensemble the final results.

    a. Copy the HHA and flow depth results to the ensemble/ folder, then run:

     python ensemble_multimodal_rgb.py
    

To reproduce our results in the CVPR21 Challenge, we provide the .pkl files to ensemble and obtain our final submitted predictions. Detailed instructions can be found in ensemble/readme.md.
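Conceptually, each ensemble step combines per-sample class scores from several models. The sketch below shows a weighted score-level fusion under stated assumptions: each .pkl file is assumed to hold a dict mapping sample names to score vectors, the weights are hypothetical, and the two-column CSV layout is a guess, so ensemble_multimodal_rgb.py and ensemble/readme.md define the real formats and weights. The same function would apply to the four SL-GCN streams (joint, bone, joint_motion, bone_motion) by passing them as the modalities.

```python
import csv
import pickle
from typing import Dict, Mapping

import numpy as np

# Assumed .pkl content: {sample_name: class-score vector}
Scores = Mapping[str, np.ndarray]


def load_scores(path: str) -> Scores:
    """Load one modality's test scores from a pickle file (format assumed)."""
    with open(path, "rb") as f:
        return pickle.load(f)


def weighted_ensemble(results: Mapping[str, Scores],
                      weights: Mapping[str, float]) -> Dict[str, int]:
    """Add up each modality's scores times its weight and take the arg-max class.
    Only samples present in every modality are predicted."""
    common = set.intersection(*(set(scores) for scores in results.values()))
    predictions: Dict[str, int] = {}
    for name in sorted(common):
        total = sum(weights[m] * np.asarray(scores[name], dtype=float)
                    for m, scores in results.items())
        predictions[name] = int(np.argmax(total))
    return predictions


def write_predictions(predictions: Mapping[str, int], path: str = "predictions.csv") -> None:
    """Write one 'sample,label' row per sample (column layout assumed)."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        for name, label in predictions.items():
            writer.writerow([name, label])
```

A typical call would be `weighted_ensemble({m: load_scores(f"ensemble/{m}.pkl") for m in ("gcn", "rgb", "flow")}, {"gcn": 1.0, "rgb": 1.0, "flow": 1.0})`, with the hypothetical file names and weights replaced by the ones you actually use.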

License

Licensed under the Creative Commons Zero v1.0 Universal license with the following exceptions:

  • The code is released for academic research use only. Commercial use is prohibited.
  • Published versions (changed or unchanged) must include a reference to the origin of the code.

Citation

If you find this project useful in your research, please cite our paper:

@inproceedings{jiang2021skeleton,
  title={Skeleton Aware Multi-modal Sign Language Recognition},
  author={Jiang, Songyao and Sun, Bin and Wang, Lichen and Bai, Yue and Li, Kunpeng and Fu, Yun},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
  year={2021}
}

@article{jiang2021skeleton_arxiv,
  title={Skeleton Aware Multi-modal Sign Language Recognition},
  author={Jiang, Songyao and Sun, Bin and Wang, Lichen and Bai, Yue and Li, Kunpeng and Fu, Yun},
  journal={arXiv preprint arXiv:2103.08833},
  year={2021}
}

Reference

https://github.com/Sun1992/SSTCN-for-SLR

https://github.com/jin-s13/COCO-WholeBody

https://github.com/open-mmlab/mmpose

https://github.com/0aqz0/SLR

https://github.com/kchengiva/DecoupleGCN-DropGraph

https://github.com/HRNet/HRNet-Human-Pose-Estimation

https://github.com/charlesCXK/Depth2HHA
