[ICCV2021] IICNet: A Generic Framework for Reversible Image Conversion

Related tags

Deep LearningIICNet
Overview

IICNet - Invertible Image Conversion Net

Official PyTorch Implementation for IICNet: A Generic Framework for Reversible Image Conversion (ICCV2021). Demo Video | Supplements

Introduction

Reversible image conversion (RIC) aims to build a reversible transformation between specific visual content (e.g., short videos) and an embedding image, where the original content can be restored from the embedding when necessary. This work develops Invertible Image Conversion Net (IICNet) as a generic solution to various RIC tasks due to its strong capacity and task-independent design. Unlike previous encoder-decoder based methods, IICNet maintains a highly invertible structure based on invertible neural networks (INNs) to better preserve the information during conversion. We use a relation module and a channel squeeze layer to improve the INN nonlinearity to extract cross-image relations and the network flexibility, respectively. Experimental results demonstrate that IICNet outperforms the specifically-designed methods on existing RIC tasks and can generalize well to various newly-explored tasks. With our generic IICNet, we no longer need to hand-engineer task-specific embedding networks for rapidly occurring visual content.

Installation

Clone this repository and set up the environment.

git clone https://github.com/felixcheng97/IICNet.git
cd IICNet/
conda env create -f iic.yml

Dataset Preparation

We conduct experments on 5 multiple-and-single RIC tasks in the main paper and 2 single-and-single RIC tasks in the supplements. Note that all the datasets are placed under the ./datasets directory.

Task 1: Spatial-Temporal Video Embedding

We use the high-quality DAVIS 2017 video dataset in this task. You could download the Semi-supervised 480p dataset through this link. Unzip, rename, and place them under the dataset directory with the following structure.

.
`-- datasets
    |-- Adobe-Matting
    |-- DAVIS-2017
    |   |-- DAVIS-2017-test-challenge (rename the DAVIS folder from DAVIS-2017-test-challenge-480p.zip)
    |   |-- DAVIS-2017-test-dev       (rename the DAVIS folder from DAVIS-2017-test-dev-480p.zip)
    |   `-- DAVIS-2017-trainval       (rename the DAVIS folder from DAVIS-2017-trainval-480p.zip)
    |-- DIV2K
    |-- flicker
    |-- flicker1024
    |-- Real-Matting
    `-- VOCdevkit

Then run the following scripts for annotation.

cd codes/scripts
python davis_annotation.py

Task 2: Mononizing Binocular Images

We use the Flickr1024 dataset with the official train and test splits in this task. You could download the dataset through this link. Place the dataset under the dataset directory with the following structure.

.
`-- datasets
    |-- Adobe-Matting
    |-- DAVIS-2017
    |-- DIV2K
    |-- flicker
    |-- flicker1024
    |   |-- Test
    |   |-- Train_1
    |   |-- Train_2
    |   |-- Train_3
    |   |-- Train_4
    |   `-- Validation
    |-- Real-Matting
    `-- VOCdevkit

Then run the following scripts for annotation.

cd codes/scripts
python flicker1024_annotation.py

Task 3: Embedding Dual-View Images

We use the DIV2K dataset in this task. You could download the dataset through this link. Download the corresponding datasets and place them under the dataset directory with the following structure.

.
`-- datasets
    |-- Adobe-Matting
    |-- DAVIS-2017
    |-- DIV2K
    |   |-- DIV2K_train_HR
    |   |-- DIV2K_train_LR_bicubic
    |   |   |-- X2
    |   |   |-- X4
    |   |   |-- X8
    |   |-- DIV2K_valid_HR
    |   `-- DIV2K_valid_LR_bicubic
    |       |-- X2
    |       |-- X4
    |       `-- X8
    |-- flicker
    |-- flicker1024
    |-- Real-Matting
    `-- VOCdevkit

Then run the following scripts for annotation.

cd codes/scripts
python div2kddual_annotation.py

Task 4: Embedding Multi-Layer Images / Composition and Decomposition

We use the Adobe Deep Matting dataset and the Real Matting dataset in this task. You could download the Adobe Deep Matting dataset according to their instructions through this link. You could download the Real Matting dataset on its official GitHub page or through this direct link. Place the downloaded datasets under the dataset directory with the following structure.

.
`-- datasets
    |-- Adobe-Matting
    |   |-- Addobe_Deep_Matting_Dataset.zip
    |   |-- train2014.zip
    |   |-- VOC2008test.tar
    |   `-- VOCtrainval_14-Jul-2008.tar
    |-- DAVIS-2017
    |-- DIV2K
    |-- flicker
    |-- flicker1024
    |-- Real-Matting
    |   |-- fixed-camera
    |   `-- hand-held
    `-- VOCdevkit

Then run the following scripts for annotation.

cd codes/scripts

# process the Adobe Matting dataset
python adobe_process.py
python adobe_annotation.py

# process the Real Matting dataset
python real_process.py
python real_annotation.py

Task 5: Hiding Images in an Image

We use the Flicker 2W dataset in this task. You could download the dataset on its official GitHub page through this link. Place the unzipped dataset under the datasets directory with the following structure.

.
`-- datasets
    |-- Adobe-Matting
    |-- DAVIS-2017
    |-- DIV2K
    |-- flicker
    |   `-- flicker_2W_images
    |-- flicker1024
    |-- Real-Matting
    `-- VOCdevkit

Then run the following scripts for annotation.

cd codes/scripts
python flicker_annotation.py

Task 6 (supp): Invertible Grayscale

We use the VOC2012 dataset in this task. You could download the training/validation dataset through this link. Place the unzipped dataset under the datasets directory with the following structure.

.
`-- datasets
    |-- Adobe-Matting
    |-- DAVIS-2017
    |-- DIV2K
    |-- flicker
    |-- flicker1024
    |-- Real-Matting
    `-- VOCdevkit
        `-- VOC2012

Then run the following scripts for annotation

cd codes/scripts
python voc2012_annotation.py

Task 7 (supp): Invertible Image Rescaling

We use the DIV2K dataset in this task. Please check Task 3: Embedding Dual-View Images to download the corresponding dataset. Then run the following scripts for annotation.

cd codes/scripts
python div2ksr_annotation.py

Training

To train a model for a specific task, run the following script:

cd codes
OMP_NUM_THREADS=4 python train.py -opt ./conf/train/<xxx>.yml

To enable distributed training with multiple GPUs for a specific task, simply assign a list of gpu_ids in the yml file and run the following script. Note that since training with multiple GPU is not tested yet, we suggest to train a model with a single GPU.

cd codes
OMP_NUM_THREADS=4 python -m torch.distributed.launch --nproc_per_node=4 --master_port 29501 train.py -opt ./conf/train/<xxx>.yml

Testing

We provide our trained models in our paper for your reference. Download all the pretrained weights of our models from Google Drive or Baidu Drive (extraction code: e377). Unzip the zip file and place pretrained models under the ./experiments directory.

To test a model for a specific task, run the following script:

cd codes
OMP_NUM_THREADS=4 python test.py -opt ./conf/test/<xxx>.yml

Acknowledgement

Some codes of this repository benefits from Invertible Image Rescaling (IRN).

Citation

If you find this work useful, please cite our paper:

@inproceedings{cheng2021iicnet,
    title = {IICNet: A Generic Framework for Reversible Image Conversion}, 
    author = {Ka Leong Cheng and Yueqi Xie and Qifeng Chen},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision},
    year = {2021}
}

Contact

Feel free to open an issue if you have any question. You could also directly contact us through email at [email protected] (Ka Leong Cheng) and [email protected] (Yueqi Xie).

Owner
felixcheng97
felixcheng97
FaceVerse: a Fine-grained and Detail-controllable 3D Face Morphable Model from a Hybrid Dataset (CVPR2022)

FaceVerse FaceVerse: a Fine-grained and Detail-controllable 3D Face Morphable Model from a Hybrid Dataset Lizhen Wang, Zhiyuan Chen, Tao Yu, Chenguang

Lizhen Wang 219 Dec 28, 2022
LegoDNN: a block-grained scaling tool for mobile vision systems

Table of contents 1 Introduction 1.1 Major features 1.2 Architecture 2 Code and Installation 2.1 Code 2.2 Installation 3 Repository of DNNs in vision

41 Dec 24, 2022
Code and Data for the paper: Molecular Contrastive Learning with Chemical Element Knowledge Graph [AAAI 2022]

Knowledge-enhanced Contrastive Learning (KCL) Molecular Contrastive Learning with Chemical Element Knowledge Graph [ AAAI 2022 ]. We construct a Chemi

Fangyin 58 Dec 26, 2022
The project of phase's key role in complex and real NN

Phase-in-NN This is the code for our project at Princeton (co-authors: Yuqi Nie, Hui Yuan). The paper title is: "Neural Network is heterogeneous: Phas

YuqiNie-lab 1 Nov 04, 2021
MMdnn is a set of tools to help users inter-operate among different deep learning frameworks. E.g. model conversion and visualization. Convert models between Caffe, Keras, MXNet, Tensorflow, CNTK, PyTorch Onnx and CoreML.

MMdnn MMdnn is a comprehensive and cross-framework tool to convert, visualize and diagnose deep learning (DL) models. The "MM" stands for model manage

Microsoft 5.7k Jan 09, 2023
TimeSHAP explains Recurrent Neural Network predictions.

TimeSHAP TimeSHAP is a model-agnostic, recurrent explainer that builds upon KernelSHAP and extends it to the sequential domain. TimeSHAP computes even

Feedzai 90 Dec 18, 2022
Yolov5 deepsort inference,使用YOLOv5+Deepsort实现车辆行人追踪和计数,代码封装成一个Detector类,更容易嵌入到自己的项目中

使用YOLOv5+Deepsort实现车辆行人追踪和计数,代码封装成一个Detector类,更容易嵌入到自己的项目中。

813 Dec 31, 2022
Framework to build and train RL algorithms

RayLink RayLink is a RL framework used to build and train RL algorithms. RayLink was used to build a RL framework, and tested in a large-scale multi-a

Bytedance Inc. 32 Oct 07, 2022
Drone-based Joint Density Map Estimation, Localization and Tracking with Space-Time Multi-Scale Attention Network

DroneCrowd Paper Detection, Tracking, and Counting Meets Drones in Crowds: A Benchmark. Introduction This paper proposes a space-time multi-scale atte

VisDrone 98 Nov 16, 2022
An implementation for `Text2Event: Controllable Sequence-to-Structure Generation for End-to-end Event Extraction`

Text2Event An implementation for Text2Event: Controllable Sequence-to-Structure Generation for End-to-end Event Extraction Please contact Yaojie Lu (@

Roger 153 Jan 07, 2023
GalaXC: Graph Neural Networks with Labelwise Attention for Extreme Classification

GalaXC GalaXC: Graph Neural Networks with Labelwise Attention for Extreme Classification @InProceedings{Saini21, author = {Saini, D. and Jain,

Extreme Classification 28 Dec 05, 2022
AITUS - An atomatic notr maker for CYTUS

AITUS an automatic note maker for CYTUS. 利用AI根据指定乐曲生成CYTUS游戏谱面。 效果展示:https://www

GradiusTwinbee 6 Feb 24, 2022
Prompt-BERT: Prompt makes BERT Better at Sentence Embeddings

Prompt-BERT: Prompt makes BERT Better at Sentence Embeddings Results on STS Tasks Model STS12 STS13 STS14 STS15 STS16 STSb SICK-R Avg. unsup-prompt-be

196 Jan 08, 2023
CPT: A Pre-Trained Unbalanced Transformer for Both Chinese Language Understanding and Generation

CPT This repository contains code and checkpoints for CPT. CPT: A Pre-Trained Unbalanced Transformer for Both Chinese Language Understanding and Gener

fastNLP 341 Dec 29, 2022
FlingBot: The Unreasonable Effectiveness of Dynamic Manipulations for Cloth Unfolding

This repository contains code for training and evaluating FlingBot in both simulation and real-world settings on a dual-UR5 robot arm setup for Ubuntu 18.04

Columbia Artificial Intelligence and Robotics Lab 70 Dec 06, 2022
Face Mesh is a face geometry solution that estimates 468 3D face landmarks in real-time even on mobile devices

Face-Mesh Face Mesh is a face geometry solution that estimates 468 3D face landmarks in real-time even on mobile devices. It employs machine learning

Farnam Javadi 9 Dec 21, 2022
WebUAV-3M: A Benchmark Unveiling the Power of Million-Scale Deep UAV Tracking

WebUAV-3M: A Benchmark Unveiling the Power of Million-Scale Deep UAV Tracking [Paper Link] Abstract In this work, we contribute a new million-scale Un

25 Jan 01, 2023
PyTorch code to run synthetic experiments.

Code repository for Invariant Risk Minimization Source code for the paper: @article{InvariantRiskMinimization, title={Invariant Risk Minimization}

Facebook Research 345 Dec 12, 2022
Face Mask Detector by live camera using tensorflow-keras, openCV and Python

Face Mask Detector 😷 by Live Camera Detecting masked or unmasked faces by live camera with percentange of mask occupation About Project: This an Arti

Karan Shingde 2 Apr 04, 2022
CoTr: Efficiently Bridging CNN and Transformer for 3D Medical Image Segmentation

CoTr: Efficient 3D Medical Image Segmentation by bridging CNN and Transformer This is the official pytorch implementation of the CoTr: Paper: CoTr: Ef

218 Dec 25, 2022