Sequencer: Deep LSTM for Image Classification

Related tags

Audiosequencer
Overview

Sequencer: Deep LSTM for Image Classification

arXiv Support Ukraine

Created by

This repository contains implementation for Sequencer.

Abstract

In recent computer vision research, the advent of the Vision Transformer (ViT) has rapidly revolutionized various architectural design efforts: ViT achieved state-of-the-art image classification performance using self-attention found in natural language processing, and MLP-Mixer achieved competitive performance using simple multi-layer perceptrons. In contrast, several studies have also suggested that carefully redesigned convolutional neural networks (CNNs) can achieve advanced performance comparable to ViT without resorting to these new ideas. Against this background, there is growing interest in what inductive bias is suitable for computer vision. Here we propose Sequencer, a novel and competitive architecture alternative to ViT that provides a new perspective on these issues. Unlike ViTs, Sequencer models long-range dependencies using LSTMs rather than self-attention layers. We also propose a two-dimensional version of Sequencer module, where an LSTM is decomposed into vertical and horizontal LSTMs to enhance performance. Despite its simplicity, several experiments demonstrate that Sequencer performs impressively well: Sequencer2D-L, with 54M parameters, realizes 84.6% top-1 accuracy on only ImageNet-1K. Not only that, we show that it has good transferability and the robust resolution adaptability on double resolution-band.

Schematic diagrams

The overall architecture of Sequencer2D is similar to the typical hierarchical ViT and Visual MLP. It uses Sequencer2D blocks instead of Transformer blocks:

Sequencer

Sequencer2D block replaces the Transformer's self-attention layer with an LSTM-based layer like BiLSTM2D layer:

Sequencer2D

BiLSTM2D includes a vertical LSTM and a horizontal LSTM:

BiLSTM2D

Model Zoo

We provide our Sequencer models pretrained on ImageNet-1K:

name arch Params FLOPs [email protected] download
Sequencer2D-S sequencer2d_s 28M 8.4G 82.3 here
Sequencer2D-M sequencer2d_m 38M 11.1G 82.8 here
Sequencer2D-L sequencer2d_l 54M 16.6G 83.4 here

Usage

Requirements

  • torch>=1.10.0
  • torchvision
  • timm==0.5.4
  • Pillow
  • matplotlib
  • scipy
  • etc., see requirements.txt

Data preparation

Download and extract ImageNet images. The directory structure should be as follows.

│imagenet/
├──train/
│  ├── n01440764
│  │   ├── n01440764_10026.JPEG
│  │   ├── n01440764_10027.JPEG
│  │   ├── ......
│  ├── ......
├──val/
│  ├── n01440764
│  │   ├── ILSVRC2012_val_00000293.JPEG
│  │   ├── ILSVRC2012_val_00002138.JPEG
│  │   ├── ......
│  ├── ......

Traning

Command line for training Sequencer models on ImageNet from scratch.

./distributed_train.sh 8 /path/to/imagenet --model sequencer2d_s -b 256 -j 8 --opt adamw --epochs 300 --sched cosine --native-amp --img-size 224 --drop-path 0.1 --lr 2e-3 --weight-decay 0.05 --remode pixel --reprob 0.25 --aa rand-m9-mstd0.5-inc1 --smoothing 0.1 --mixup 0.8 --cutmix 1.0 --warmup-lr 1e-6 --warmup-epochs 20

Command line for fine-tuning a pre-trained model at higher resolution.

./distributed_train.sh 8 /path/to/imagenet --model sequencer2d_l --pretrained -b 64 -j 8 --opt adamw --epochs 30 --sched cosine --native-amp --input-size 3 392 392 --img-size 392 --crop-pct 1.0 --drop-path 0.4 --lr 5e-5 --weight-decay 1e-8 --remode pixel --reprob 0.25 --aa rand-m9-mstd0.5-inc1 --smoothing 0.1 --mixup 0.8 --cutmix 1.0 --warmup-epochs 0 --cooldown-epochs 0

Command line for fine-tuning a pre-trained model on a transfer learning dataset.

./distributed_train.sh 4 /path/to/cifar10 --model sequencer2d_s -b 128 -j 4 --num-classes 10 --dataset torch/cifar10 --pretrained --opt adamw --epochs 200 --sched cosine --native-amp --img-size 224 --clip-grad 1 --drop-path 0.1 --lr 0.0001 --weight-decay 1e-4 --remode pixel --aa rand-m9-mstd0.5-inc1 --smoothing 0.1 --mixup 0.8 --cutmix 1.0 --warmup-lr 1e-6 --warmup-epochs 5

Validation

To evaluate our Sequencer models, run:

python validate.py /path/to/imagenet --model sequencer2d_s -b 16 --input-size 3 224 224 --amp

Reference

You may want to cite:

@article{tatsunami2022sequencer,
  title={Sequencer: Deep LSTM for Image Classification},
  author={Tatsunami, Yuki and Taki, Masato},
  journal={arXiv preprint arXiv:2205.01972},
  year={2022}
}

Acknowledgment

This implementation is based on pytorch-image-models by Ross Wightman. We thank for his brilliant work.

We thank Graduate School of Artificial Intelligence and Science, Rikkyo University (Rikkyo AI) which supports us with computational resources, facilities, and others. logo-rikkyo-ai
AnyTech Co. Ltd. provided valuable comments on the early versions and encouragement. We thank them for their cooperation. In particular, We thank Atsushi Fukuda for organizing discussion opportunities. logo-anytech
You might also like...
Simple-Image-Classification - Simple Image Classification Code (PyTorch)
Simple-Image-Classification - Simple Image Classification Code (PyTorch)

Simple-Image-Classification Simple Image Classification Code (PyTorch) Yechan Kim This repository contains: Python3 / Pytorch code for multi-class ima

Image Classification - A research on image classification and auto insurance claim prediction, a systematic experiments on modeling techniques and approaches

A research on image classification and auto insurance claim prediction, a systematic experiments on modeling techniques and approaches

A resource for learning about deep learning techniques from regression to LSTM and Reinforcement Learning using financial data and the fitness functions of algorithmic trading

A tour through tensorflow with financial data I present several models ranging in complexity from simple regression to LSTM and policy networks. The s

Deep learning based hand gesture recognition using LSTM and MediaPipie.
Deep learning based hand gesture recognition using LSTM and MediaPipie.

Hand Gesture Recognition Deep learning based hand gesture recognition using LSTM and MediaPipie. Demo video using PingPong Robot Files Pretrained mode

Image Captioning using CNN ,LSTM and Attention

Image Captioning using CNN ,LSTM and Attention This is a deeplearning model which tries to summarize an image into a text . Installation Install this

End-to-end image captioning with EfficientNet-b3 + LSTM with Attention

Image captioning End-to-end image captioning with EfficientNet-b3 + LSTM with Attention Model is seq2seq model. In the encoder pretrained EfficientNet

Deep Image Search is an AI-based image search engine that includes deep transfor learning features Extraction and tree-based vectorized search.
Deep Image Search is an AI-based image search engine that includes deep transfor learning features Extraction and tree-based vectorized search.

Deep Image Search - AI-Based Image Search Engine Deep Image Search is an AI-based image search engine that includes deep transfer learning features Ex

The official implementation of the IEEE S&P`22 paper "SoK: How Robust is Deep Neural Network Image Classification Watermarking".

Watermark-Robustness-Toolbox - Official PyTorch Implementation This repository contains the official PyTorch implementation of the following paper to

Automatic deep learning for image classification.

AutoDL AutoDL automates machine learning tasks enabling you to easily achieve strong predictive performance in your applications. With just a few line

paper: Hyperspectral Remote Sensing Image Classification Using Deep Convolutional Capsule Network

DC-CapsNet This is a tensorflow and keras based implementation of DC-CapsNet for HSI in the Remote Sensing Letters R. Lei et al., "Hyperspectral Remot

Using LSTM write Tang poetry
Using LSTM write Tang poetry

本教程将通过一个示例对LSTM进行介绍。通过搭建训练LSTM网络,我们将训练一个模型来生成唐诗。本文将对该实现进行详尽的解释,并阐明此模型的工作方式和原因。并不需要过多专业知识,但是可能需要新手花一些时间来理解的模型训练的实际情况。为了节省时间,请尽量选择GPU进行训练。

Bidirectional LSTM-CRF and ELMo for Named-Entity Recognition, Part-of-Speech Tagging and so on.
Bidirectional LSTM-CRF and ELMo for Named-Entity Recognition, Part-of-Speech Tagging and so on.

anaGo anaGo is a Python library for sequence labeling(NER, PoS Tagging,...), implemented in Keras. anaGo can solve sequence labeling tasks such as nam

Bidirectional LSTM-CRF and ELMo for Named-Entity Recognition, Part-of-Speech Tagging and so on.
Bidirectional LSTM-CRF and ELMo for Named-Entity Recognition, Part-of-Speech Tagging and so on.

anaGo anaGo is a Python library for sequence labeling(NER, PoS Tagging,...), implemented in Keras. anaGo can solve sequence labeling tasks such as nam

Tensorflow-based CNN+LSTM trained with CTC-loss for OCR

Overview This collection demonstrates how to construct and train a deep, bidirectional stacked LSTM using CNN features as input with CTC loss to perfo

CNN+LSTM+CTC based OCR implemented using tensorflow.

CNN_LSTM_CTC_Tensorflow CNN+LSTM+CTC based OCR(Optical Character Recognition) implemented using tensorflow. Note: there is No restriction on the numbe

A small C++ implementation of LSTM networks, focused on OCR.

clstm CLSTM is an implementation of the LSTM recurrent neural network model in C++, using the Eigen library for numerical computations. Status and sco

OHLC Average Prediction of Apple Inc. Using LSTM Recurrent Neural Network
OHLC Average Prediction of Apple Inc. Using LSTM Recurrent Neural Network

Stock Price Prediction of Apple Inc. Using Recurrent Neural Network OHLC Average Prediction of Apple Inc. Using LSTM Recurrent Neural Network Dataset:

Using multidimensional LSTM neural networks to create a forecast for Bitcoin price

Multidimensional LSTM BitCoin Time Series Using multidimensional LSTM neural networks to create a forecast for Bitcoin price. For notes around this co

Multi-layer convolutional LSTM with Pytorch

Convolution_LSTM_pytorch Thanks for your attention. I haven't got time to maintain this repo for a long time. I recommend this repo which provides an

Comments
Releases(weights)
Owner
Yuki Tatsunami
Yuki Tatsunami
A2DP agent for promiscuous/permissive audio sinc.

Promiscuous Bluetooth audio sinc A2DP agent for promiscuous/permissive audio sinc for Linux. Once installed, a Bluetooth client, such as a smart phone

Jasper Aorangi 4 May 27, 2022
:sound: Play and Record Sound with Python :snake:

Play and Record Sound with Python This Python module provides bindings for the PortAudio library and a few convenience functions to play and record Nu

spatialaudio.net 750 Dec 31, 2022
An Amazon Music client for Linux (unpretentious)

Amusiz An Amazon Music client for Linux (unpretentious) ↗️ Install You can install Amusiz in multiple ways, choose your favorite. 🚀 AppImage Here you

Mirko Brombin 25 Nov 08, 2022
Play any song directly into your group voice chat.

Telegram VCPlayer Bot Play any song directly into your group voice chat. Official Bot : VCPlayerBot | Discussion Group : VoiceChat Music Player Suppor

Shubham Kumar 50 Nov 21, 2022
A fast MDCT implementation using SciPy and FFTs

MDCT A fast MDCT implementation using SciPy and FFTs Installation As usual pip install mdct Dependencies NumPy SciPy STFT Usage import mdct spectrum

Nils Werner 43 Sep 02, 2022
DeepMusic is an easy to use Spotify like app to manage and listen to your favorites musics.

DeepMusic is an easy to use Spotify like app to manage and listen to your favorites musics. Technically, this project is an Android Client and its ent

Labrak Yanis 1 Jul 12, 2021
Automatically move or copy files based on metadata associated with the files. For example, file your photos based on EXIF metadata or use MP3 tags to file your music files.

Automatically move or copy files based on metadata associated with the files. For example, file your photos based on EXIF metadata or use MP3 tags to file your music files.

Rhet Turnbull 14 Nov 02, 2022
Scalable audio processing framework written in Python with a RESTful API

TimeSide : scalable audio processing framework and server written in Python TimeSide is a python framework enabling low and high level audio analysis,

Parisson 340 Jan 04, 2023
Pythonic bindings for FFmpeg's libraries.

PyAV PyAV is a Pythonic binding for the FFmpeg libraries. We aim to provide all of the power and control of the underlying library, but manage the gri

PyAV 1.8k Jan 03, 2023
L-SpEx: Localized Target Speaker Extraction

L-SpEx: Localized Target Speaker Extraction The data configuration and simulation of L-SpEx. The code scripts will be released in the future. Data Gen

Meng Ge 20 Jan 02, 2023
❤️ This Is The EzilaXMusicPlayer Advaced Repo 🎵

Telegram EzilaXMusicPlayer Bot 🎵 A bot that can play music on telegram group's voice Chat ❤️ Requirements 📝 FFmpeg NodeJS nodesource.com Python 3.7+

Sadew Jayasekara 11 Nov 12, 2022
Frescobaldi LilyPond Editor

README for Frescobaldi Homepage: http://www.frescobaldi.org/ Main author: Wilbert Berendsen Frescobaldi is a LilyPond sheet music text editor. It aims

Frescobaldi 600 Dec 29, 2022
Audio pitch-shifting & re-sampling utility, based on the EMU SP-1200

Pitcher.py Free & OS emulation of the SP-12 & SP-1200 signal chain (now with GUI) Pitch shift / bitcrush / resample audio files Written and tested in

morgan 13 Oct 03, 2022
digital audio workstation, instrument and effect plugins, wave editor

digital audio workstation, instrument and effect plugins, wave editor

306 Jan 05, 2023
NovaMusic is a music sharing robot. Users can get music and music lyrics using inline queries.

A music sharing telegram robot using Redis database and Telebot python library using Redis database.

Hesam Norin 7 Oct 21, 2022
A simple music player, powered by Python, utilising various libraries such as Tkinter and Pygame

A simple music player, powered by Python, utilising various libraries such as Tkinter and Pygame

PotentialCoding 2 May 12, 2022
SU Music Player — The first open-source PyTgCalls based Pyrogram bot to play music in voice chats

SU Music Player — The first open-source PyTgCalls based Pyrogram bot to play music in voice chats Note Neither this, or PyTgCalls are fully

SU Projects 58 Jan 02, 2023
An AI for Music Generation

An AI for Music Generation

Hao-Wen Dong 1.3k Dec 31, 2022
Spotifyd - An open source Spotify client running as a UNIX daemon.

Spotifyd An open source Spotify client running as a UNIX daemon. Spotifyd streams music just like the official client, but is more lightweight and sup

8.5k Jan 09, 2023
PyAbsorp is a python module that has the main focus to help estimate the Sound Absorption Coefficient.

This is a package developed to be use to find the Sound Absorption Coefficient through some implemented models, like Biot-Allard, Johnson-Champoux and

Michael Markus Ackermann 8 Oct 19, 2022