Source code of AAAI 2022 paper "Towards End-to-End Image Compression and Analysis with Transformers".

Last update: Dec 21, 2022

Overview

Towards End-to-End Image Compression and Analysis with Transformers

Usage

The code was run with Python 3.7, PyTorch 1.8.1, timm 0.4.9 and CompressAI 1.1.4.
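Before training, it can help to confirm that the local environment matches these versions. The following is a minimal sketch, not part of the repository: the helper names installed_version and compare_versions are hypothetical, and the distribution names torch, timm and compressai are assumed to be the names the packages were installed under.

```python
# Versions listed in this README, keyed by the (assumed) installed distribution name.
EXPECTED = {"torch": "1.8.1", "timm": "0.4.9", "compressai": "1.1.4"}


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        from importlib import metadata  # standard library on Python 3.8+
    except ImportError:
        # Python 3.7 fallback: pkg_resources ships with setuptools.
        import pkg_resources
        try:
            return pkg_resources.get_distribution(dist_name).version
        except pkg_resources.DistributionNotFound:
            return None
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def compare_versions(expected=EXPECTED):
    """Map each distribution name to (wanted, found, matches)."""
    report = {}
    for name, wanted in expected.items():
        found = installed_version(name)
        # Ignore local build tags such as "+cu111" when comparing.
        matches = found is not None and found.split("+")[0] == wanted
        report[name] = (wanted, found, matches)
    return report


if __name__ == "__main__":
    for name, (wanted, found, ok) in compare_versions().items():
        status = "ok" if ok else "differs"
        print(f"{name}: wanted {wanted}, found {found or 'not installed'} ({status})")
```

A mismatch does not necessarily mean the code will fail; it only indicates that the environment differs from the one listed above.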

Data preparation

Download and extract the ImageNet train and val images from http://image-net.org/. The directory structure follows the standard layout for torchvision's datasets.ImageFolder: training images are expected in the train folder and validation images in the val folder, each with one sub-folder per class:

/path/to/imagenet/
  train/
    class1/
      img1.jpeg
    class2/
      img2.jpeg
  val/
    class1/
      img3.jpeg
    class2/
      img4.jpeg
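The layout above can be sanity-checked before launching a long training run. The sketch below uses only the standard library and is not part of the repository; the function names and the IMAGE_EXTENSIONS tuple are assumptions made for illustration (ImageFolder itself accepts more formats and also searches nested folders, which this check does not).

```python
import os

# Assumption: a subset of the image extensions that ImageFolder accepts.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png")


def summarize_split(split_dir):
    """Map each class sub-folder of split_dir to its number of image files."""
    if not os.path.isdir(split_dir):
        raise FileNotFoundError(f"missing split folder: {split_dir}")
    counts = {}
    for entry in sorted(os.listdir(split_dir)):
        class_dir = os.path.join(split_dir, entry)
        if not os.path.isdir(class_dir):
            continue  # stray files next to the class folders are ignored
        counts[entry] = sum(
            1 for name in os.listdir(class_dir)
            if name.lower().endswith(IMAGE_EXTENSIONS)
        )
    return counts


def check_imagenet_layout(root):
    """Return {"train": {class: count}, "val": {class: count}} for root."""
    return {
        split: summarize_split(os.path.join(root, split))
        for split in ("train", "val")
    }


def classes_in_one_split_only(summary):
    """List class names that appear in train or val but not in both."""
    return sorted(set(summary["train"]) ^ set(summary["val"]))
```

For example, check_imagenet_layout("/path/to/imagenet/") returns the per-class image counts for both splits, and classes_in_one_split_only on that result should be empty for a complete ImageNet copy.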

Pretrained model

The ./pretrained_model directory provides the pretrained model without compression.

Test

Please adjust --data-path and run sh test.sh:

python main.py --eval --resume ./pretrain_s/checkpoint.pth --model pretrained_model --data-path /path/to/imagenet/ --output_dir ./eval

The checkpoint ./pretrain_s/checkpoint.pth can be downloaded from Baidu Netdisk with access code aaai.

Train

Please adjust --data-path and run sh train.sh:

python -m torch.distributed.launch --nproc_per_node=8 --use_env main.py --model pretrained_model --no-model-ema --clip-grad 1.0 --batch-size 128 --num_workers 16 --data-path /path/to/imagenet/ --output_dir ./ckp_pretrain

Full model

The ./full_model directory provides the full model with compression.

Test

Please adjust --data-path and --resume, then run sh test.sh:

python main.py --eval --resume ./ckp_s_q1/checkpoint.pth --model full_model --no-pretrained --data-path /path/to/imagenet/ --output_dir ./eval

The checkpoints ./ckp_s_q1/checkpoint.pth, ./ckp_s_q2/checkpoint.pth and ./ckp_s_q3/checkpoint.pth can be downloaded from Baidu Netdisk with access code aaai.

Train

Please download ./pretrain_s/checkpoint.pth from Baidu Netdisk with access code aaai, then adjust --data-path and --quality. The quality levels and their alpha and beta values are:

quality   alpha   beta
1         0.1     0.001
2         0.3     0.003
3         0.6     0.006
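When scripting runs over several quality levels, the table can be kept as a small lookup. This is only a sketch of the table's contents: the names QUALITY_WEIGHTS and weights_for_quality are hypothetical, and how main.py maps --quality to alpha and beta internally is not shown in this README.

```python
# (alpha, beta) for each --quality value, copied from the table above.
QUALITY_WEIGHTS = {
    1: (0.1, 0.001),
    2: (0.3, 0.003),
    3: (0.6, 0.006),
}


def weights_for_quality(quality):
    """Return (alpha, beta) for a listed quality level, else raise ValueError."""
    try:
        return QUALITY_WEIGHTS[quality]
    except KeyError:
        valid = ", ".join(str(q) for q in sorted(QUALITY_WEIGHTS))
        raise ValueError(f"quality must be one of {valid}; got {quality!r}")


if __name__ == "__main__":
    for q in sorted(QUALITY_WEIGHTS):
        alpha, beta = weights_for_quality(q)
        print(f"quality={q}: alpha={alpha}, beta={beta}")
```

In every listed row, beta is alpha divided by 100.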

Run sh train.sh:

python -m torch.distributed.launch --nproc_per_node=8 --use_env main.py --model full_model --batch-size 128 --num_workers 16 --clip-grad 1.0 --quality 1 --data-path /path/to/imagenet/ --output_dir ./ckp_full

Citation

@InProceedings{Bai2022AAAI,
  title={Towards End-to-End Image Compression and Analysis with Transformers},
  author={Bai, Yuanchao and Yang, Xu and Liu, Xianming and Jiang, Junjun and Wang, Yaowei and Ji, Xiangyang and Gao, Wen},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  year={2022}
}
