ViT - Vision Transformer

This is a PyTorch implementation of ViT (Vision Transformer), the model introduced by the Google Research team in the paper "An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale".

Please install PyTorch with CUDA support following this link

ViT Architecture
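The architecture follows the paper. The image is cut into non-overlapping patch_size × patch_size patches, each flattened patch is linearly projected to d_model, a learnable class token is prepended, position embeddings are added, the sequence passes through n_layers Transformer encoder blocks, and a pooled vector feeds the classification head. The dependency-free sketch below traces tensor shapes through that forward pass. It does not read this repository's model.py, the function name `vit_shapes` is hypothetical, and treating `pool="mean"` as averaging over all tokens is an assumption based on common PyTorch ViT implementations.

```python
def vit_shapes(batch_size=128, img_size=224, patch_size=16, channels=3,
               d_model=768, n_head=12, n_layers=12, d_mlp=3072,
               num_class=100, pool="cls"):
    """List (stage, shape) pairs for one ViT forward pass; shapes only, no weights."""
    # The image must tile exactly into patches, and d_model must split evenly
    # across attention heads.
    if img_size % patch_size != 0:
        raise ValueError("img_size must be divisible by patch_size")
    if d_model % n_head != 0:
        raise ValueError("d_model must be divisible by n_head")
    # Assumption: "cls" reads the class token, "mean" averages all tokens.
    if pool not in ("cls", "mean"):
        raise ValueError("pool must be 'cls' or 'mean'")

    grid = img_size // patch_size           # patches per side, e.g. 224 // 16 = 14
    num_patches = grid * grid               # e.g. 14 * 14 = 196 "words"
    patch_dim = channels * patch_size ** 2  # values in one flattened patch
    seq_len = num_patches + 1               # +1 for the prepended class token
    head_dim = d_model // n_head            # width of each attention head

    return [
        ("input image", (batch_size, channels, img_size, img_size)),
        ("flattened patches", (batch_size, num_patches, patch_dim)),
        ("patch embeddings", (batch_size, num_patches, d_model)),
        ("+ class token, + position embeddings", (batch_size, seq_len, d_model)),
        # Inside every encoder block:
        ("per-head queries/keys/values", (batch_size, n_head, seq_len, head_dim)),
        ("MLP hidden layer", (batch_size, seq_len, d_mlp)),
        # After the full stack:
        (f"output of {n_layers} encoder blocks", (batch_size, seq_len, d_model)),
        (f"pooled ({pool})", (batch_size, d_model)),
        ("class logits", (batch_size, num_class)),
    ]


for stage, shape in vit_shapes():
    print(f"{stage:<40} {shape}")
```

With the defaults (the values from config.txt), a 224 × 224 image becomes 196 patch tokens plus one class token, and each of the 12 heads attends in 64 dimensions.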

Configs

You can configure the network through the config.txt file. Each line holds one value followed by a comment naming the parameter:

128     #batch_size
500     #epoch
0.001   #learning_rate
0.0001  #gamma
224     #img_size
16      #patch_size
100     #num_class
768     #d_model
12      #n_head
12      #n_layers
3072    #d_mlp
3       #channels
0.      #dropout
cls     #pool
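How train.py actually reads this file is not shown in this README. As a minimal sketch, assuming each line holds a value, then `#`, then the parameter name (as in the listing above), the file could be loaded into a dictionary as follows. `parse_config`, `load_config` and `_convert` are hypothetical helpers, not functions from this repository.

```python
def _convert(raw):
    """Turn '128' into an int, '0.001' or '0.' into a float, and leave 'cls' as a str."""
    for cast in (int, float):
        try:
            return cast(raw)
        except ValueError:
            pass
    return raw


def parse_config(text):
    """Parse 'value  #name' lines into a {name: value} dict (hypothetical format)."""
    cfg = {}
    for line in text.splitlines():
        if "#" not in line:
            continue  # skip blank lines and lines without a name comment
        value, name = line.split("#", 1)
        value, name = value.strip(), name.strip()  # strips spaces and tabs
        if value and name:
            cfg[name] = _convert(value)
    return cfg


def load_config(path="config.txt"):
    """Read a config file from disk and parse it."""
    with open(path, encoding="utf-8") as f:
        return parse_config(f.read())


sample = "128     #batch_size\n0.      #dropout\ncls     #pool\n"
print(parse_config(sample))  # {'batch_size': 128, 'dropout': 0.0, 'pool': 'cls'}
```

Trying `int` before `float` keeps sizes and counts as integers, so values such as batch_size can be passed directly to code that expects an `int`.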

Training

Currently, you can only train this model on CIFAR-100 with the following commands:

> git clone https://github.com/quanmario0311/ViT_PyTorch.git
> cd ViT_PyTorch
> pip3 install -r requirements.txt
> python3 train.py

Support for other datasets, including custom datasets, will be added later.
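When planning a run, it can help to know how many optimizer steps the default config implies. CIFAR-100 has 50,000 training images and 10,000 test images. The sketch below assumes one optimizer step per batch, every training image used once per epoch, and the final partial batch kept (the default `drop_last=False` of PyTorch's DataLoader). Whether train.py behaves this way is an assumption, and the helper names are hypothetical.

```python
CIFAR100_TRAIN_IMAGES = 50_000  # CIFAR-100 training split size


def steps_per_epoch(num_images, batch_size, drop_last=False):
    """Number of batches in one pass over the data."""
    if drop_last:
        return num_images // batch_size  # incomplete final batch is discarded
    return (num_images + batch_size - 1) // batch_size  # final partial batch is kept


def total_steps(num_images, batch_size, epochs, drop_last=False):
    """Optimizer steps over the whole run, assuming one step per batch."""
    return steps_per_epoch(num_images, batch_size, drop_last) * epochs


print(steps_per_epoch(CIFAR100_TRAIN_IMAGES, 128))   # 391
print(total_steps(CIFAR100_TRAIN_IMAGES, 128, 500))  # 195500
```

With `drop_last=True` the figures become 390 and 195,000. CIFAR-100 images are 32 × 32 pixels, so running with img_size = 224 implies upsampling them before they are split into patches.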
