GluonMM is a library of transformer models for computer vision and multi-modality research

Last update: Dec 02, 2022

Overview

GluonMM

GluonMM is a library of transformer models for computer vision and multi-modality research. It contains reference implementations of widely adopted baseline models as well as research work from Amazon Research.

Install

First, clone the repository locally:

git clone https://github.com/amazon-research/gluonmm.git

Then install the dependencies:

conda create -n gluonmm python=3.7
conda activate gluonmm
conda install pytorch torchvision torchaudio cudatoolkit=10.2 -c pytorch
pip install timm tensorboardX yacs tqdm requests pandas decord scikit-image opencv-python

# Install apex for half-precision training (optional)
git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./

The code has been tested extensively with PyTorch 1.8.1 and torchvision 0.9.1 on CUDA 10.2.
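To check whether an environment matches the tested versions, a small script along these lines may help. It is only a sketch: the helper names (`base_version`, `find_mismatches`, `installed_versions`) are hypothetical and not part of GluonMM, and it assumes each package exposes a `__version__` attribute, as PyTorch and torchvision do.

```python
# Versions listed as tested in the Install section above.
TESTED_VERSIONS = {"torch": "1.8.1", "torchvision": "0.9.1"}


def base_version(version):
    """Drop a local build suffix such as '+cu102' from a version string."""
    return version.split("+", 1)[0]


def find_mismatches(installed):
    """Return {package: (installed, tested)} for packages that differ or are missing."""
    mismatches = {}
    for name, tested in TESTED_VERSIONS.items():
        found = installed.get(name)
        if found is None or base_version(found) != tested:
            mismatches[name] = (found, tested)
    return mismatches


def installed_versions():
    """Collect versions of the tested packages that can be imported here."""
    versions = {}
    for name in TESTED_VERSIONS:
        try:
            module = __import__(name)
        except ImportError:
            continue  # left out, so find_mismatches reports it as missing
        versions[name] = getattr(module, "__version__", None)
    return versions


if __name__ == "__main__":
    for name, (found, tested) in find_mismatches(installed_versions()).items():
        print(f"{name}: found {found}, tested with {tested}")
```

The script prints nothing when both packages match. A mismatch does not necessarily mean the library will fail, only that the combination differs from the one reported as tested.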

Model zoo

Image classification

Video action recognition

VidTr

Usage

For detailed usage, please refer to the README file in each model family. For example, training, evaluation, and model zoo information for the video transformer VidTr can be found in the VidTr README.

Security

See CONTRIBUTING for more information.

License

This project is licensed under the Apache-2.0 License.

Acknowledgement

Parts of the code are heavily derived from pytorch-image-models, DeiT, Swin-transformer, vit-pytorch and vision_transformer.
