A collection of multi-modal transformer architectures, including image transformers, video transformers, image-language transformers, video-language transformers, and related datasets.

Last update: Dec 21, 2022

Overview

Reading list in Transformer

We are a team from the KAUST Vision-CAIR group, and we focus on multi-modal representation learning.

This repo aims to collect recent popular Transformer papers, code, and learning resources across Vision Transformers, NLP, multi-modal learning, and related domains.

Recent News

CVPR multi-modal papers are collected here

The code of VisualGPT is open sourced. It can be found here

The code and paper of LeViT are open sourced. They can be found here

The paper MLP-Mixer: An all-MLP Architecture for Vision is available here

The code and paper of MDETR are open sourced. They can be found here

The code and paper of RelTransformer are open sourced. They can be found here

The code and paper of Twins-SVT are open sourced. They can be found here

A Vision Transformer for deepfake detection. It can be found here

The code of VideoGPT is open sourced. It can be found here

The code of CoaT is open sourced. It can be found here

The code of Kaleido-BERT is open sourced. It can be found here

The code of TimeSformer is open sourced. It can be found here

The code of SwinTransformer is open sourced. It can be found here

Topics (paper and code)

Review Paper in multi-modal

Video-language

Tutorials and workshop

Datasets

Multi-modal Datasets

Blogs

Lil's blogs

Tools

PyTorchVideo: a deep learning library for video understanding research
horovod: a tool for multi-GPU parallel processing
accelerate: an easy API for mixed precision and any kind of distributed computing
optuna: hyperparameter search
AI Conference Deadlines


Owner

Jun Chen
