A collection of multi-modal transformer architectures, including image transformers, video transformers, image-language transformers, video-language transformers, and related datasets.

Last update: Dec 21, 2022

Overview

Reading list in Transformer

We are a team from the KAUST Vision-CAIR group, focusing on multi-modal representation learning.

This repo aims to collect the recent popular Transformer papers, code, and learning resources in the domains of Vision Transformer, NLP, multi-modal learning, etc.

Recent News

CVPR multi-modal papers are collected here

The code of VisualGPT is open sourced. It can be found here

The code and paper of LeViT are open sourced. They can be found here

The paper MLP-Mixer: An all-MLP Architecture for Vision is available here

The code and paper of MDETR are open sourced. They can be found here

The code and paper of RelTransformer are open sourced. They can be found here

The code and paper of Twins-SVT are open sourced. They can be found here

Vision Transformer for deepfake detection. It can be found here

The code of VideoGPT is open sourced. It can be found here

The code of CoaT is open sourced. It can be found here

The code of Kaleido-BERT is open sourced. It can be found here

The code of TimeSformer is open sourced. It can be found here

The code of SwinTransformer is open sourced. It can be found here

Topics (paper and code)

Review Paper in multi-modal

Video-language

Tutorials and workshop

Datasets

Multi-modal Datasets

Blogs

Lil's blogs

Tools

PyTorchVideo: a deep learning library for video understanding research
Horovod: a tool for multi-GPU parallel processing
Accelerate: an easy API for mixed precision and any kind of distributed computing
Optuna: hyperparameter search
AI Conference Deadlines

Owner

Jun Chen
