Source code of AAAI 2022 paper "Towards End-to-End Image Compression and Analysis with Transformers".

Last update: Dec 21, 2022

Overview

Towards End-to-End Image Compression and Analysis with Transformers

Usage

The code has been run with Python 3.7, PyTorch 1.8.1, timm 0.4.9 and CompressAI 1.1.4.

Data preparation

Download and extract the ImageNet train and val images from http://image-net.org/. The directory structure follows the standard layout expected by torchvision's datasets.ImageFolder, with the training and validation data in the train and val folders, respectively:

/path/to/imagenet/
  train/
    class1/
      img1.jpeg
    class2/
      img2.jpeg
  val/
    class1/
      img3.jpeg
    class2/
      img4.jpeg
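torchvision's ImageFolder assigns class indices by sorting the class folder names, so train and val must contain the same set of class folders or the labels of the two splits will not line up. The following is a minimal, stdlib-only sketch for checking a layout before launching a run; the function names check_imagenet_layout and class_to_idx are hypothetical helpers, not part of this repository.

```python
from pathlib import Path


def class_to_idx(split_dir: Path) -> dict:
    """Mimic ImageFolder's indexing: sorted subfolder names map to 0..N-1."""
    classes = sorted(p.name for p in split_dir.iterdir() if p.is_dir())
    return {name: i for i, name in enumerate(classes)}


def check_imagenet_layout(root: str) -> dict:
    """Check that root/train and root/val exist and share identical class folders.

    Returns the shared class-to-index mapping; raises ValueError otherwise.
    """
    root_path = Path(root)
    splits = {}
    for split in ("train", "val"):
        split_dir = root_path / split
        if not split_dir.is_dir():
            raise ValueError(f"missing folder: {split_dir}")
        splits[split] = class_to_idx(split_dir)
    if splits["train"] != splits["val"]:
        raise ValueError("train and val have different class folders")
    if not splits["train"]:
        raise ValueError("no class folders found")
    return splits["train"]
```

Usage: call check_imagenet_layout("/path/to/imagenet/") and compare the number of returned classes with the expected 1000 ImageNet classes.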

Pretrained model

The ./pretrained_model directory provides the pretrained model without compression.

Test

Please adjust --data-path and run sh test.sh:

python main.py --eval --resume ./pretrain_s/checkpoint.pth --model pretrained_model --data-path /path/to/imagenet/ --output_dir ./eval

The checkpoint ./pretrain_s/checkpoint.pth can be downloaded from Baidu Netdisk (access code: aaai).

Train

Please adjust --data-path and run sh train.sh:

python -m torch.distributed.launch --nproc_per_node=8 --use_env main.py --model pretrained_model --no-model-ema --clip-grad 1.0 --batch-size 128 --num_workers 16 --data-path /path/to/imagenet/ --output_dir ./ckp_pretrain
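With torch.distributed.launch, each of the --nproc_per_node processes typically loads its own --batch-size samples, so the command above would train with an effective batch of 8 × 128 = 1024 images per step. Flags such as --clip-grad and --no-model-ema suggest main.py is derived from DeiT, whose training script also rescales the learning rate linearly as lr × global_batch / 512. That rule and the 5e-4 base rate used below are assumptions borrowed from DeiT, not confirmed by this README. A small sketch to recompute both when changing the GPU count or batch size:

```python
def global_batch_size(nproc_per_node: int, batch_size: int, nnodes: int = 1) -> int:
    """Samples per optimizer step when every process loads `batch_size` images."""
    return nnodes * nproc_per_node * batch_size


def linear_scaled_lr(base_lr: float, global_batch: int, reference_batch: int = 512) -> float:
    """DeiT-style linear scaling rule (assumed, not confirmed for this repo)."""
    return base_lr * global_batch / reference_batch


if __name__ == "__main__":
    gb = global_batch_size(nproc_per_node=8, batch_size=128)
    print(gb)  # 1024
    print(linear_scaled_lr(5e-4, gb))  # 0.001
```

If fewer GPUs are available, raising --batch-size to keep the global batch at 1024 is one way to stay close to the released setting, memory permitting.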

Full model

The ./full_model directory provides the full model with compression.

Test

Please adjust --data-path and --resume, then run sh test.sh:

python main.py --eval --resume ./ckp_s_q1/checkpoint.pth --model full_model --no-pretrained --data-path /path/to/imagenet/ --output_dir ./eval

The checkpoints ./ckp_s_q1/checkpoint.pth, ./ckp_s_q2/checkpoint.pth and ./ckp_s_q3/checkpoint.pth can be downloaded from Baidu Netdisk (access code: aaai).
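To evaluate all three released checkpoints in one go, the test command can be repeated with a different --resume path. The following is a minimal sketch that builds one argument list per checkpoint using only the flags from the test command; the helper names eval_commands and run_all and the per-checkpoint output folders (./eval_q1, ...) are hypothetical, and it assumes the ckp_s_qN folders have been downloaded next to main.py.

```python
import subprocess


def eval_commands(data_path: str, qualities=(1, 2, 3)) -> list:
    """One `python main.py --eval ...` argv per released full-model checkpoint."""
    commands = []
    for q in qualities:
        commands.append([
            "python", "main.py", "--eval",
            "--resume", f"./ckp_s_q{q}/checkpoint.pth",
            "--model", "full_model",
            "--no-pretrained",
            "--data-path", data_path,
            # Hypothetical separate output folders keep each run's results apart.
            "--output_dir", f"./eval_q{q}",
        ])
    return commands


def run_all(data_path: str) -> None:
    """Run the evaluations sequentially, stopping at the first failure."""
    for argv in eval_commands(data_path):
        subprocess.run(argv, check=True)
```

Usage: run_all("/path/to/imagenet/"). Passing the argument list directly to subprocess.run avoids shell quoting issues with paths.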

Train

Please download ./pretrain_s/checkpoint.pth from Baidu Netdisk (access code: aaai), then adjust --data-path and --quality. Each quality level corresponds to the following alpha and beta values:

quality   alpha   beta
1         0.1     0.001
2         0.3     0.003
3         0.6     0.006
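The table can be mirrored as a lookup so that a launch script rejects unsupported --quality values early. This is only a sketch: the README does not say how main.py consumes alpha and beta, so the dictionary just records the listed values, and QUALITY_PARAMS and params_for_quality are hypothetical names. In every row, beta equals alpha / 100.

```python
# Values copied from the table above; keys are the --quality levels.
QUALITY_PARAMS = {
    1: {"alpha": 0.1, "beta": 0.001},
    2: {"alpha": 0.3, "beta": 0.003},
    3: {"alpha": 0.6, "beta": 0.006},
}


def params_for_quality(quality: int) -> dict:
    """Return the alpha/beta pair listed for a --quality level."""
    try:
        return QUALITY_PARAMS[quality]
    except KeyError:
        raise ValueError(
            f"--quality must be one of {sorted(QUALITY_PARAMS)}, got {quality}"
        ) from None
```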

Run sh train.sh:

python -m torch.distributed.launch --nproc_per_node=8 --use_env main.py --model full_model --batch-size 128 --num_workers 16 --clip-grad 1.0 --quality 1 --data-path /path/to/imagenet/ --output_dir ./ckp_full
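The training command above writes to ./ckp_full regardless of --quality, so training several quality levels one after another would reuse the same output folder. The following is a minimal sketch that builds one command per quality level with its own --output_dir; the helper train_command and the folder names ./ckp_full_qN are hypothetical, and all other flags are copied from the command above.

```python
import subprocess


def train_command(quality: int, data_path: str, nproc_per_node: int = 8) -> list:
    """argv for one full-model training run, mirroring the README command."""
    if quality not in (1, 2, 3):
        raise ValueError("quality must be 1, 2 or 3")
    return [
        "python", "-m", "torch.distributed.launch",
        f"--nproc_per_node={nproc_per_node}", "--use_env", "main.py",
        "--model", "full_model",
        "--batch-size", "128",
        "--num_workers", "16",
        "--clip-grad", "1.0",
        "--quality", str(quality),
        "--data-path", data_path,
        "--output_dir", f"./ckp_full_q{quality}",  # hypothetical per-quality folder
    ]


def train_all(data_path: str) -> None:
    """Train quality levels 1-3 sequentially, stopping at the first failure."""
    for q in (1, 2, 3):
        subprocess.run(train_command(q, data_path), check=True)
```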

Citation

@InProceedings{Bai2022AAAI,
  title={Towards End-to-End Image Compression and Analysis with Transformers},
  author={Bai, Yuanchao and Yang, Xu and Liu, Xianming and Jiang, Junjun and Wang, Yaowei and Ji, Xiangyang and Gao, Wen},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  year={2022}
}
