TRIQ implementation

Overview

TRIQ Implementation

TF-Keras implementation of TRIQ as described in Transformer for Image Quality Assessment.

Installation

  1. Clone this repository.
  2. Install required Python packages. The code is developed by PyCharm in Python 3.7. The requirements.txt document is generated by PyCharm, and the code should also be run in latest versions of the packages.

Training a model

An example of training TRIQ can be seen in train/train_triq.py. Argparser should be used, but the authors prefer to use dictionary with parameters being defined. It is easy to convert to take arguments. In principle, the following parameters can be defined:

args = {}
args['multi_gpu'] = 0 # gpu setting, set to 1 for using multiple GPUs
args['gpu'] = 0  # If having multiple GPUs, specify which GPU to use

args['result_folder'] = r'..\databases\experiments' # Define result path
args['n_quality_levels'] = 5  # Choose between 1 (MOS prediction) and 5 (distribution prediction)

args['transformer_params'] = [2, 32, 8, 64]

args['train_folders'] =  # Define folders containing training images
    [
    r'..\databases\train\koniq_normal',
    r'..\databases\train\koniq_small',
    r'..\databases\train\live'
    ]
args['val_folders'] =  # Define folders containing testing images
    [
    r'..\databases\val\koniq_normal',
    r'..\databases\val\koniq_small',
    r'..\databases\val\live'
    ]
args['koniq_mos_file'] = r'..\databases\koniq10k_images_scores.csv'  # MOS (distribution of scores) file for KonIQ database
args['live_mos_file'] = r'..\databases\live_mos.csv'   # MOS (standard distribution of scores) file for LIVE-wild database

args['backbone'] = 'resnet50' # Choose from ['resnet50', 'vgg16']
args['weights'] = r'...\pretrained_weights\resnet50_weights_tf_dim_ordering_tf_kernels_notop.h5'  # Define the path of ImageNet pretrained weights
args['initial_epoch'] = 0  # Define initial epoch for use in fine-tune

args['lr_base'] = 1e-4 / 2  # Define the back learning rate in warmup and rate decay approach
args['lr_schedule'] = True  # Choose between True and False, indicating if learning rate schedule should be used or not
args['batch_size'] = 32  # Batch size, should choose to fit in the GPU memory
args['epochs'] = 120  # Maximal epoch number, can set early stop in the callback or not

args['image_aug'] = True # Choose between True and False, indicating if image augmentation should be used or not

Predict image quality using the trained model

After TRIQ has been trained, and the weights have been stored in h5 file, it can be used to predict image quality with arbitrary sizes,

    args = {}
    args['n_quality_levels'] = 5
    args['backbone'] = 'resnet50'
    args['weights'] = r'..\\TRIQ.h5'
    model = create_triq_model(n_quality_levels=args['n_quality_levels'],
                              backbone=args['backbone'],])
    model.load_weights(args['weights'])

And then use ModelEvaluation to predict quality of image set.

In the "examples" folder, an example script examples\image_quality_prediction.py is provided to use the trained weights to predict quality of example images. In the "train" folder, an example script train\validation.py is provided to use the trained weights to predict quality of images in folders.

A potential issue is image shape mismatch. For example, if an image is too large, then line 146 in transformer_iqa.py should be changed to increase the pooling size. For example, it can be changed to self.pooling_small = MaxPool2D(pool_size=(4, 4)) or even larger.

Prepare datasets for model training

This work uses two publicly available databases: KonIQ-10k KonIQ-10k: An ecologically valid database for deep learning of blind image quality assessment by V. Hosu, H. Lin, T. Sziranyi, and D. Saupe; and LIVE-wild Massive online crowdsourced study of subjective and objective picture quality by D. Ghadiyaram, and A.C. Bovik

  1. The two databases were merged, and then split to training and testing sets. Please see README in databases for details.

  2. Make MOS files (note: do NOT include head line):

    For database with score distribution available, the MOS file is like this (koniq format):

        image path, voter number of quality scale 1, voter number of quality scale 2, voter number of quality scale 3, voter number of quality scale 4, voter number of quality scale 5, MOS or Z-score
        10004473376.jpg,0,0,25,73,7,3.828571429
        10007357496.jpg,0,3,45,47,1,3.479166667
        10007903636.jpg,1,0,20,73,2,3.78125
        10009096245.jpg,0,0,21,75,13,3.926605505
    

    For database with standard deviation available, the MOS file is like this (live format):

        image path, standard deviation, MOS or Z-score
        t1.bmp,18.3762,63.9634
        t2.bmp,13.6514,25.3353
        t3.bmp,18.9246,48.9366
        t4.bmp,18.2414,35.8863
    

    The format of MOS file ('koniq' or 'live') and the format of MOS or Z-score ('mos' or 'z_score') should also be specified in misc/imageset_handler/get_image_scores.

  3. In the train script in train/train_triq.py the folders containing training and testing images are provided.

  4. Pretrained ImageNet weights can be downloaded (see README in.\pretrained_weights) and pointed to in the train script.

Trained TRIQ weights

TRIQ has been trained on KonIQ-10k and LIVE-wild databases, and the weights file can be downloaded here.

State-of-the-art models

Other three models are also included in the work. The original implementations of metrics are employed, and they can be found below.

Koncept512 KonIQ-10k: An ecologically valid database for deep learning of blind image quality assessment

SGDNet SGDNet: An end-to-end saliency-guided deep neural network for no-reference image quality assessment

CaHDC End-to-end blind image quality prediction with cascaded deep neural network

Comparison results

We have conducted several experiments to evaluate the performance of TRIQ, please see results.pdf for detailed results.

Error report

In case errors/exceptions are encountered, please first check all the paths. After fixing the path isse, please report any errors in Issues.

FAQ

  • To be added

ViT (Vision Transformer) for IQA

This work is heavily inspired by ViT An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. The module vit_iqa contains implementation of ViT for IQA, and mainly followed the implementation of ViT-PyTorch. Pretrained ViT weights can be downloaded here.

Owner
Junyong You
Junyong You
CONetV2: Efficient Auto-Channel Size Optimization for CNNs

CONetV2: Efficient Auto-Channel Size Optimization for CNNs Exciting News! CONetV2: Efficient Auto-Channel Size Optimization for CNNs has been accepted

Mahdi S. Hosseini 3 Dec 13, 2021
Self-Supervised Monocular 3D Face Reconstruction by Occlusion-Aware Multi-view Geometry Consistency[ECCV 2020]

Self-Supervised Monocular 3D Face Reconstruction by Occlusion-Aware Multi-view Geometry Consistency(ECCV 2020) This is an official python implementati

304 Jan 03, 2023
Gym for multi-agent reinforcement learning

PettingZoo is a Python library for conducting research in multi-agent reinforcement learning, akin to a multi-agent version of Gym. Our website, with

Farama Foundation 1.6k Jan 09, 2023
Open-Ended Commonsense Reasoning (NAACL 2021)

Open-Ended Commonsense Reasoning Quick links: [Paper] | [Video] | [Slides] | [Documentation] This is the repository of the paper, Differentiable Open-

(Bill) Yuchen Lin 31 Oct 19, 2022
Propose a principled and practically effective framework for unsupervised accuracy estimation and error detection tasks with theoretical analysis and state-of-the-art performance.

Detecting Errors and Estimating Accuracy on Unlabeled Data with Self-training Ensembles This project is for the paper: Detecting Errors and Estimating

Jiefeng Chen 13 Nov 21, 2022
PolyTrack: Tracking with Bounding Polygons

PolyTrack: Tracking with Bounding Polygons Abstract In this paper, we present a novel method called PolyTrack for fast multi-object tracking and segme

Gaspar Faure 13 Sep 15, 2022
Deep Learning: Architectures & Methods Project: Deep Learning for Audio Super-Resolution

Deep Learning: Architectures & Methods Project: Deep Learning for Audio Super-Resolution Figure: Example visualization of the method and baseline as a

Oliver Hahn 16 Dec 23, 2022
Keras Image Embeddings using Contrastive Loss

Image to Embedding projection in vector space. Implementation in keras and tensorflow of batch all triplet loss for one-shot/few-shot learning.

Shravan Anand K 5 Mar 21, 2022
Image Segmentation with U-Net Algorithm on Carvana Dataset using AWS Sagemaker

Image Segmentation with U-Net Algorithm on Carvana Dataset using AWS Sagemaker This is a full project of image segmentation using the model built with

Htin Aung Lu 1 Jan 04, 2022
(ICCV'21) Official PyTorch implementation of Relational Embedding for Few-Shot Classification

Relational Embedding for Few-Shot Classification (ICCV 2021) Dahyun Kang, Heeseung Kwon, Juhong Min, Minsu Cho [paper], [project hompage] We propose t

Dahyun Kang 82 Dec 24, 2022
Official code base for the poster "On the use of Cortical Magnification and Saccades as Biological Proxies for Data Augmentation" published in NeurIPS 2021 Workshop (SVRHM)

Self-Supervised Learning (SimCLR) with Biological Plausible Image Augmentations Official code base for the poster "On the use of Cortical Magnificatio

Binxu 8 Aug 17, 2022
This repository contains the segmentation user interface from the OpenSurfaces project, extracted as a lightweight tool

OpenSurfaces Segmentation UI This repository contains the segmentation user interface from the OpenSurfaces project, extracted as a lightweight tool.

Sean Bell 66 Jul 11, 2022
CondNet: Conditional Classifier for Scene Segmentation

CondNet: Conditional Classifier for Scene Segmentation Introduction The fully convolutional network (FCN) has achieved tremendous success in dense vis

ycszen 31 Jul 22, 2022
face2comics by Sxela (Alex Spirin) - face2comics datasets

This is a paired face to comics dataset, which can be used to train pix2pix or similar networks.

Alex 164 Nov 13, 2022
This is a Keras implementation of a CNN for estimating age, gender and mask from a camera.

face-detector-age-gender This is a Keras implementation of a CNN for estimating age, gender and mask from a camera. Before run face detector app, expr

Devdreamsolution 2 Dec 04, 2021
Python Implementation of algorithms in Graph Mining, e.g., Recommendation, Collaborative Filtering, Community Detection, Spectral Clustering, Modularity Maximization, co-authorship networks.

Graph Mining Author: Jiayi Chen Time: April 2021 Implemented Algorithms: Network: Scrabing Data, Network Construbtion and Network Measurement (e.g., P

Jiayi Chen 3 Mar 03, 2022
Patch2Pix: Epipolar-Guided Pixel-Level Correspondences [CVPR2021]

Patch2Pix for Accurate Image Correspondence Estimation This repository contains the Pytorch implementation of our paper accepted at CVPR2021: Patch2Pi

Qunjie Zhou 199 Nov 29, 2022
Malware Bypass Research using Reinforcement Learning

Malware Bypass Research using Reinforcement Learning

Bobby Filar 76 Dec 26, 2022
Pixray is an image generation system

Pixray is an image generation system

pixray 883 Jan 07, 2023
Programming with Neural Surrogates of Programs

Programming with Neural Surrogates of Programs

0 Dec 12, 2021