No-Reference Image Quality Assessment via Transformers, Relative Ranking, and Self-Consistency

Last update: Dec 30, 2022

Overview

This repository contains the implementation for the paper:

No-Reference Image Quality Assessment via Transformers, Relative Ranking, and Self-Consistency (WACV 2022) Video

Create Environment

This code was trained and tested on Ubuntu 16.04 using Anaconda, Python 3.6.6, and PyTorch 1.8.0. To set up the environment, run:

conda env create -f environment.yml

After installing the virtual environment, you should be able to run the following in the terminal and see 1.8.0:

python -c "import torch; print(torch.__version__)"
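The same check can be scripted. The following is a minimal sketch, not part of the repository: the helper names `version_matches` and `torch_version_ok` are hypothetical, and stripping a local build suffix such as `+cu111` is an assumption about how some PyTorch builds report their version.

```python
import importlib.util


def version_matches(found, expected):
    """Compare versions while ignoring a local build suffix such as '+cu111'."""
    return found.split("+", 1)[0] == expected


def torch_version_ok(expected="1.8.0"):
    """Return True or False for an installed torch, or None if torch is missing."""
    if importlib.util.find_spec("torch") is None:
        return None
    import torch  # imported lazily so the script still runs without torch
    return version_matches(torch.__version__, expected)


if __name__ == "__main__":
    print("torch matches 1.8.0:", torch_version_ok())
```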

Datasets

In this work, we use 7 datasets for evaluation (LIVE, CSIQ, TID2013, KADID10K, CLIVE, KonIQ, LIVEFB).

To start training, please make sure to follow the correct folder structure for each of the aforementioned datasets, as provided below:

LIVE

live
    |--fastfading
    |    |  ...     
    |--blur
    |    |  ... 
    |--jp2k
    |    |  ...     
    |--jpeg
    |    |  ...     
    |--wn
    |    |  ...     
    |--refimgs
    |    |  ...     
    |--dmos.mat
    |--dmos_realigned.mat
    |--refnames_all.mat
    |--readme.txt

CSIQ

csiq
    |--dst_imgs_all
    |    |--1600.AWGN.1.png
    |    |  ... (you need to put all the distorted images here)
    |--src_imgs
    |    |--1600.png
    |    |  ...
    |--csiq.DMOS.xlsx
    |--csiq_label.txt

TID2013

tid2013
    |--distorted_images
    |--reference_images
    |--mos.txt
    |--mos_std.txt
    |--mos_with_names.txt
    |--readme

KADID10K

kadid10k
    |--distorted_images
    |    |--I01_01_01.png
    |    |  ...    
    |--reference_images
    |    |--I01.png
    |    |  ...    
    |--dmos.csv
    |--mv.sh.save
    |--mvv.sh

CLIVE

clive
    |--Data
    |    |--I01_01_01.png
    |    |  ...    
    |--Images
    |    |--I01.png
    |    |  ...    
    |--ChallengeDB_release
    |    |--README.txt
    |--dmos.csv
    |--mv.sh.save
    |--mvv.sh

KonIQ

fblive
   |--1024x768
   |    |  992920521.jpg 
   |    |  ... (all the images should be here)     
   |--koniq10k_scores_and_distributions.csv

LIVEFB

fblive
   |--FLIVE
   |    |  AVA__149.jpg    
   |    |  ... (all the images should be here)     
   |--labels_image.csv
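
Before training, it can help to verify that a dataset folder matches the layouts above. The following is a minimal sketch, not part of the repository: `EXPECTED` and `missing_entries` are hypothetical names, the entries are a subset copied from the trees above, and which of them the training code actually reads is not stated here.

```python
from pathlib import Path

# Top-level entries copied from the folder trees above (illustrative subset).
EXPECTED = {
    "live": ["fastfading", "blur", "jp2k", "jpeg", "wn", "refimgs", "dmos.mat"],
    "csiq": ["dst_imgs_all", "src_imgs", "csiq_label.txt"],
    "tid2013": ["distorted_images", "reference_images", "mos_with_names.txt"],
    "kadid10k": ["distorted_images", "reference_images", "dmos.csv"],
}


def missing_entries(root, dataset):
    """Return the expected entries that do not exist under the dataset root."""
    root = Path(root)
    return [name for name in EXPECTED[dataset] if not (root / name).exists()]


if __name__ == "__main__":
    # "/path/to/kadid10k" is a placeholder; replace it with your own path.
    print(missing_entries("/path/to/kadid10k", "kadid10k"))
```

An empty list means every listed entry was found; any names returned are the ones to fix before running the training scripts.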

Training

The training scripts are provided in run.sh. Please change the paths accordingly. Please note that to achieve the same performance, the parameters should match the ones in the run.sh file.

Pretrained models

The pretrained models are provided here.

Acknowledgement

Parts of this code are borrowed from HyperIQA and DETR.

FAQs

What is the difference between self-consistency and ensembling? Will self-consistency increase the inference time?

In ensembling methods, we need several models (with different initializations) and ensemble their results during training and testing. In our self-consistency model, we instead enforce a single network to produce consistent outputs during training when its input undergoes different transformations. At test time, the self-consistency model has the same inference time and number of parameters as the model without self-consistency. In other words, we do not add any new parameters to the network, and inference is not affected.

What is the difference between self-consistency and augmentation?

In augmentation, we augment an input and send it to one network. Although the network becomes robust to different augmentations, it never has the chance to enforce the same output for different versions of an input at the same time. In our self-consistency approach, we force the network to produce a similar output for an image under a different transformation (in our case, horizontal flipping), which leads to more robust performance. Please also note that we still use augmentation during training, so our model benefits from both augmentation and self-consistency. Also, please see Fig. 1 in the main paper, where we show that models that use augmentation alone are sensitive to simple transformations.
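
To make the idea concrete, here is a minimal toy sketch of a self-consistency penalty. It is not the repository's implementation: the absolute difference on final scores, the helper names, and the two toy "models" are all assumptions chosen for illustration, and plain Python lists stand in for tensors.

```python
def hflip(image):
    """Horizontally flip an image stored as a list of pixel rows."""
    return [row[::-1] for row in image]


def self_consistency_loss(model, image):
    """Absolute gap between the scores of an image and its flipped copy."""
    return abs(model(image) - model(hflip(image)))


def mean_model(image):
    """Toy scorer that is invariant to flipping: the mean pixel value."""
    pixels = [p for row in image for p in row]
    return sum(pixels) / len(pixels)


def column_weighted_model(image):
    """Toy scorer that is sensitive to flipping: weights pixels by column index."""
    return sum(col * p for row in image for col, p in enumerate(row))


image = [[1, 2], [3, 4]]
print(self_consistency_loss(mean_model, image))             # 0.0
print(self_consistency_loss(column_weighted_model, image))  # 2
```

The penalty is zero for the flip-invariant scorer and positive for the flip-sensitive one. Because both scores come from the same model, the term only adds a second forward pass during training and leaves the test-time path unchanged.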

Why is the relative ranking loss applied only to the samples with the highest and lowest quality scores? Why not apply it to all the samples?

1) We did not see a significant improvement from applying our ranking loss to all the samples within each batch compared to using only the extreme cases. 2) Considering more samples leads to more gradient back-propagation and therefore more computation, which slows down training.
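
The following is a simplified sketch of a ranking loss restricted to the extreme samples of a batch. It is an illustration only: the triplet-style hinge form, the `margin` parameter, and the function name `extreme_ranking_loss` are assumptions, not the exact loss used in the paper or the code.

```python
def extreme_ranking_loss(preds, labels, margin=0.0):
    """Triplet-style hinge loss using only the extreme samples of a batch.

    preds  : predicted quality scores, one per image
    labels : ground-truth quality scores, one per image
    """
    if len(preds) != len(labels) or len(preds) < 3:
        raise ValueError("need at least 3 samples with matching labels")

    # Sort sample indices by ground-truth quality, lowest first.
    order = sorted(range(len(labels)), key=lambda i: labels[i])
    lo, lo2 = order[0], order[1]      # lowest and second-lowest quality
    hi2, hi = order[-2], order[-1]    # second-highest and highest quality

    # Distance between the two predictions at opposite extremes.
    far = abs(preds[hi] - preds[lo])

    # The best image should be predicted closer to the second best than to the worst.
    top_term = max(0.0, abs(preds[hi] - preds[hi2]) - far + margin)
    # The worst image should be predicted closer to the second worst than to the best.
    bottom_term = max(0.0, abs(preds[lo] - preds[lo2]) - far + margin)
    return top_term + bottom_term


labels = [10, 80, 90]
print(extreme_ranking_loss([0.1, 0.8, 0.9], labels))  # predictions in the right order
print(extreme_ranking_loss([0.9, 0.1, 0.8], labels))  # predictions in the wrong order
```

Only four samples contribute to the loss regardless of batch size, which is why this variant is cheaper than ranking every pair in the batch.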

Citation

If you find this work useful for your research, please cite our paper:

@InProceedings{golestaneh2021no,
  title={No-Reference Image Quality Assessment via Transformers, Relative Ranking, and Self-Consistency},
  author={Golestaneh, S Alireza and Dadsetan, Saba and Kitani, Kris M},
  booktitle={Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision},
  pages={3209--3218},
  year={2022}
}

If you have any questions about our work, please do not hesitate to contact [email protected]

Owner

Alireza Golestaneh
