Implementation of Multistream Transformers in PyTorch

Last update: Jul 26, 2022

Overview

Multistream Transformers

Implementation of Multistream Transformers in PyTorch.

This repository deviates slightly from the paper: instead of using a skip connection across all streams, it uses attention pooling across the tokens that share the same position in each stream. This produced the best results in my experiments with more than 2 streams.
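
To make the idea of pooling across streams concrete, here is a minimal sketch. It is not the repository's actual code: the class name `StreamAttentionPool`, the single learned scoring layer, and the tensor layout `(batch, num_streams, seq_len, dim)` are all assumptions chosen for illustration. At each position, every stream's token gets a scalar score, the scores are softmaxed over the streams, and the tokens are summed with those weights.

```python
import torch
from torch import nn


class StreamAttentionPool(nn.Module):
    """Hypothetical helper (not the repository's code): pools streams per position."""

    def __init__(self, dim):
        super().__init__()
        # Assumption: one learned scalar score per stream token.
        self.to_score = nn.Linear(dim, 1, bias=False)

    def forward(self, streams):
        # streams: (batch, num_streams, seq_len, dim)
        scores = self.to_score(streams)        # (batch, num_streams, seq_len, 1)
        weights = scores.softmax(dim=1)        # normalize over streams at each position
        return (weights * streams).sum(dim=1)  # (batch, seq_len, dim)


pool = StreamAttentionPool(dim=64)
streams = torch.randn(2, 3, 16, 64)  # 2 sequences, 3 streams, 16 positions
pooled = pool(streams)
print(pooled.shape)
```

Because the weights sum to 1 over the streams, pooling identical streams returns the shared token unchanged, which is a quick sanity check for a layer like this.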

Install

$ pip install multistream-transformers

Usage

import torch
from multistream_transformers import MultistreamTransformer

model = MultistreamTransformer(
    num_tokens = 256,         # number of tokens (vocabulary size)
    dim = 512,                # model dimension
    depth = 4,                # number of layers
    causal = True,            # autoregressive or not
    max_seq_len = 1024,       # maximum sequence length
    num_streams = 2           # number of streams; 1 would make it a regular transformer
)

x = torch.randint(0, 256, (2, 1024))
mask = torch.ones((2, 1024)).bool()

logits = model(x, mask = mask) # (2, 1024, 256)
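
With `causal = True`, logits of this shape are typically trained with a next-token objective. The sketch below is an assumption about how one might do that, not something the repository documents: it uses a random stand-in tensor in place of the model output so that it runs on its own, shifts the targets by one position, and applies PyTorch's standard cross entropy.

```python
import torch
import torch.nn.functional as F

# Stand-ins with the shapes from the usage example; in practice
# `logits` would come from `model(x, mask = mask)`.
x = torch.randint(0, 256, (2, 1024))
logits = torch.randn(2, 1024, 256, requires_grad = True)

# Predict token t+1 from position t: drop the last logit and the first target.
pred = logits[:, :-1]    # (2, 1023, 256)
target = x[:, 1:]        # (2, 1023)

loss = F.cross_entropy(pred.reshape(-1, 256), target.reshape(-1))
loss.backward()
```

The last position has no following token to predict, so it contributes nothing to the loss and receives no gradient.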

Citations

@misc{burtsev2021multistream,
    title   = {Multi-Stream Transformers}, 
    author  = {Mikhail Burtsev and Anna Rumshisky},
    year    = {2021},
    eprint  = {2107.10342},
    archivePrefix = {arXiv},
    primaryClass = {cs.CL}
}


Releases (0.0.4)

0.0.4 (Jul 31, 2021)

0.0.3 (Jul 31, 2021)

0.0.2 (Jul 30, 2021)

0.0.1 (Jul 30, 2021)

Owner

Phil Wang
