You Only Sample (Almost) Once: Linear Cost Self-Attention Via Bernoulli Sampling

Related tags

Deep LearningYOSO
Overview

You Only Sample (Almost) Once: Linear Cost Self-Attention Via Bernoulli Sampling

Transformer-based models are widely used in natural language processing (NLP). Central to the transformer model is the self-attention mechanism, which captures the interactions of token pairs in the input sequences and depends quadratically on the sequence length. Training such models on longer sequences is expensive. In this paper, we show that a Bernoulli sampling attention mechanism based on Locality Sensitive Hash- ing (LSH), decreases the quadratic complexity of such models to linear. We bypass the quadratic cost by considering self-attention as a sum of individual tokens associated with Bernoulli random variables that can, in principle, be sampled at once by a single hash (although in practice, this number may be a small constant). This leads to an efficient sampling scheme to estimate self-attention which relies on specific modifications of LSH (to enable deployment on GPU architectures).

Requirements

docker, nvidia-docker

Start Docker Container

Under YOSO folder, run

docker run --ipc=host --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES= -v "$PWD:/workspace" -it mlpen/transformers:4

For Nvidia's 30 series GPU, run

docker run --ipc=host --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES= -v "$PWD:/workspace" -it mlpen/transformers:5

Then, the YOSO folder is mapped to /workspace in the container.

BERT

Datasets

To be updated

Pre-training

To start pre-training of a specific configuration: create a folder YOSO/BERT/models/ (for example, bert-small) and write YOSO/BERT/models/ /config.json to specify model and training configuration, then under YOSO/BERT folder, run

python3 run_pretrain.py --model 
   

   

The command will create a YOSO/BERT/models/ /model folder holding all checkpoints and log file.

Pre-training from Different Model's Checkpoint

Copy a checkpoint (one of .model or .cp file) from YOSO/BERT/models/ /model folder to YOSO/BERT/models/ folder and add a key-value pair in YOSO/BERT/models/ /config.json : "from_cp": " " . One example is shown in YOSO/BERT/models/bert-small-4096/config.json. This procedure also works for extending the max sequence length of a model (For example, use bert-small pre-trained weights as initialization for bert-small-4096).

GLUE Fine-tuning

Under YOSO/BERT folder, run

python3 run_glue.py --model 
   
     --batch_size 
    
      --lr 
     
       --task 
      
        --checkpoint 
        
       
      
     
    
   

For example,

python3 run_glue.py --model bert-small --batch_size 32 --lr 3e-5 --task MRPC --checkpoint cp-0249.model

The command will create a log file in YOSO/BERT/models/ /model .

Long Range Arena Benchmark

Datasets

To be updated

Run Evaluations

To start evaluation of a specific model on a task in LRA benchmark:

  • Create a folder YOSO/LRA/models/ (for example, softmax)
  • Write YOSO/LRA/models/ /config.json to specify model and training configuration

Under YOSO/LRA folder, run

python3 run_task.py --model 
   
     --task 
    

    
   

For example, run

python3 run_task.py --model softmax --task listops

The command will create a YOSO/LRA/models/ /model folder holding the best validation checkpoint and log file. After completion, the test set accuracy can be found in the last line of the log file.

RoBERTa

Datasets

To be updated

Pre-training

To start pretraining of a specific configuration:

  • Create a folder YOSO/RoBERTa/models/ (for example, bert-small)
  • Write YOSO/RoBERTa/models/ /config.json to specify model and training configuration

Under YOSO/RoBERTa folder, run

python3 run_pretrain.py --model 
   

   

For example, run

python3 run_pretrain.py --model bert-small

The command will create a YOSO/RoBERTa/models/ /model folder holding all checkpoints and log file.

GLUE Fine-tuning

To fine-tune model on GLUE tasks:

Under YOSO/RoBERTa folder, run

python3 run_glue.py --model 
   
     --batch_size 
    
      --lr 
     
       --task 
      
        --checkpoint 
        
       
      
     
    
   

For example,

python3 run_glue.py --model bert-small --batch_size 32 --lr 3e-5 --task MRPC --checkpoint 249

The command will create a log file in YOSO/RoBERTa/models/ /model .

Citation

@article{zeng2021yoso,
  title={You Only Sample (Almost) Once: Linear Cost Self-Attention Via Bernoulli Sampling},
  author={Zhanpeng Zeng, Yunyang Xiong, Sathya N. Ravi, Shailesh Acharya, Glenn Fung, Vikas Singh},
  booktitle={Proceedings of the International Conference on Machine Learning},
  year={2021}
}
Owner
Zhanpeng Zeng
Zhanpeng Zeng
Progressive Coordinate Transforms for Monocular 3D Object Detection

Progressive Coordinate Transforms for Monocular 3D Object Detection This repository is the official implementation of PCT. Introduction In this paper,

58 Nov 06, 2022
Project page for End-to-end Recovery of Human Shape and Pose

End-to-end Recovery of Human Shape and Pose Angjoo Kanazawa, Michael J. Black, David W. Jacobs, Jitendra Malik CVPR 2018 Project Page Requirements Pyt

1.4k Dec 29, 2022
Extension to fastai for volumetric medical data

FAIMED 3D use fastai to quickly train fully three-dimensional models on radiological data Classification from faimed3d.all import * Load data in vari

Keno 26 Aug 22, 2022
Reproducing code of hair style replacement method from Barbershorp.

Barbershorp Reproducing code of hair style replacement method from Barbershorp. Also reproduces II2S, an improved version of Image2StyleGAN. Requireme

1 Dec 24, 2021
A C implementation for creating 2D voronoi diagrams

Branch OSX/Linux Windows master dev jc_voronoi A fast C/C++ header only implementation for creating 2D Voronoi diagrams from a point set Uses Fortune'

Mathias Westerdahl 481 Dec 29, 2022
The final project of "Applying AI to EHR Data" of "AI for Healthcare" nanodegree - Udacity.

Patient Selection for Diabetes Drug Testing Project Overview EHR data is becoming a key source of real-world evidence (RWE) for the pharmaceutical ind

Omar Laham 1 Jan 14, 2022
Label Mask for Multi-label Classification

LM-MLC 一种基于完型填空的多标签分类算法 1 前言 本文主要介绍本人在全球人工智能技术创新大赛【赛道一】设计的一种基于完型填空(模板)的多标签分类算法:LM-MLC,该算法拟合能力很强能感知标签关联性,在多个数据集上测试表明该算法与主流算法无显著性差异,在该比赛数据集上的dev效果很好,但是由

52 Nov 20, 2022
[MedIA2021]MIDeepSeg: Minimally Interactive Segmentation of Unseen Objects from Medical Images Using Deep Learning

MIDeepSeg: Minimally Interactive Segmentation of Unseen Objects from Medical Images Using Deep Learning [MedIA or Arxiv] and [Demo] This repository pr

Healthcare Intelligence Laboratory 92 Dec 08, 2022
Liver segmentation using MONAI and pytorch

Machine Learning use case in the field of Healthcare. In this project MONAI and pytorch frameworks are used for 3D Liver segmentation.

Abhishek Gajbhiye 2 May 30, 2022
Capsule endoscopy detection DACON challenge

capsule_endoscopy_detection (DACON Challenge) Overview Yolov5, Yolor, mmdetection기반의 모델을 사용 (총 11개 모델 앙상블) 모든 모델은 학습 시 Pretrained Weight을 yolov5, yolo

MAILAB 11 Nov 25, 2022
Direct Multi-view Multi-person 3D Human Pose Estimation

Implementation of NeurIPS-2021 paper: Direct Multi-view Multi-person 3D Human Pose Estimation [paper] [video-YouTube, video-Bilibili] [slides] This is

Sea AI Lab 251 Dec 30, 2022
SCALoss: Side and Corner Aligned Loss for Bounding Box Regression (AAAI2022).

SCALoss PyTorch implementation of the paper "SCALoss: Side and Corner Aligned Loss for Bounding Box Regression" (AAAI 2022). Introduction IoU-based lo

TuZheng 20 Sep 07, 2022
Reimplementation of Dynamic Multi-scale filters for Semantic Segmentation.

Paddle implementation of Dynamic Multi-scale filters for Semantic Segmentation.

Hongqiang.Wang 2 Nov 01, 2021
Pytorch Lightning code guideline for conferences

Deep learning project seed Use this seed to start new deep learning / ML projects. Built in setup.py Built in requirements Examples with MNIST Badges

Pytorch Lightning 1k Jan 02, 2023
Cards Against Humanity AI

cah-ai This is a Cards Against Humanity AI implemented using a pre-trained Semantic Search model. How it works A player is described by a combination

Alex Nichol 2 Aug 22, 2022
The official repository for "Revealing unforeseen diagnostic image features with deep learning by detecting cardiovascular diseases from apical four-chamber ultrasounds"

Revealing unforeseen diagnostic image features with deep learning by detecting cardiovascular diseases from apical four-chamber ultrasounds The why Im

3 Mar 29, 2022
A program that uses computer vision to detect hand gestures, used for controlling movie players.

HandGestureDetection This program uses a Haar Cascade algorithm to detect the presence of your hand, and then passes it on to a self-created and self-

2 Nov 22, 2022
Codes to pre-train T5 (Text-to-Text Transfer Transformer) models pre-trained on Japanese web texts

t5-japanese Codes to pre-train T5 (Text-to-Text Transfer Transformer) models pre-trained on Japanese web texts. The following is a list of models that

Kimio Kuramitsu 1 Dec 13, 2021
This repository provides data for the VAW dataset as described in the CVPR 2021 paper titled "Learning to Predict Visual Attributes in the Wild"

Visual Attributes in the Wild (VAW) This repository provides data for the VAW dataset as described in the CVPR 2021 Paper: Learning to Predict Visual

Adobe Research 36 Dec 30, 2022
Original Pytorch Implementation of FLAME: Facial Landmark Heatmap Activated Multimodal Gaze Estimation

FLAME Original Pytorch Implementation of FLAME: Facial Landmark Heatmap Activated Multimodal Gaze Estimation, accepted at the 17th IEEE Internation Co

Neelabh Sinha 19 Dec 17, 2022