A curated list of papers, code and resources pertaining to image composition

Last update: Dec 30, 2022


Awesome Image Composition

A curated list of resources including papers, datasets, and relevant links pertaining to image composition.

Contributing

Contributions are welcome. If you wish to contribute, feel free to send a pull request. If you have suggestions for new sections to be included, please raise an issue and discuss before sending a pull request.

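Table of Contents

Surveys
Papers
  Image blending
  Image harmonization
  Shadow generation
  Object placement and spatial transformation
  Occlusion
Datasets
Other resources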

Surveys

Li Niu, Wenyan Cong, Liu Liu, Yan Hong, Bo Zhang, Jing Liang, Liqing Zhang: "Making Images Real Again: A Comprehensive Survey on Deep Image Composition." arXiv preprint arXiv:2106.14490 (2021). [arXiv]

Papers

Image blending

Huikai Wu, Shuai Zheng, Junge Zhang, Kaiqi Huang: "GP-GAN: Towards Realistic High-Resolution Image Blending." ACM MM (2019) [arXiv] [code]
Lingzhi Zhang, Tarmily Wen, Jianbo Shi: "Deep Image Blending." WACV (2020) [pdf] [arXiv] [code]

Image harmonization

Jun Ling, Han Xue, Li Song, Rong Xie, Xiao Gu: "Region-Aware Adaptive Instance Normalization for Image Harmonization." CVPR (2021) [pdf] [supp] [arXiv] [code]
Zonghui Guo, Haiyong Zheng, Yufeng Jiang, Zhaorui Gu, Bing Zheng: "Intrinsic Image Harmonization." CVPR (2021) [pdf] [supp] [code]
Wenyan Cong, Li Niu, Jianfu Zhang, Jing Liang, Liqing Zhang: "BargainNet: Background-Guided Domain Translation for Image Harmonization." ICME (2021) [arXiv] [code]
Konstantin Sofiiuk, Polina Popenova, Anton Konushin: "Foreground-aware Semantic Representations for Image Harmonization." WACV (2021) [pdf] [supp] [arXiv] [code]
Guoqing Hao, Satoshi Iizuka, Kazuhiro Fukui: "Image Harmonization with Attention-based Deep Feature Modulation." BMVC (2020) [pdf] [supp] [code]
Wenyan Cong, Jianfu Zhang, Li Niu, Liu Liu, Zhixin Ling, Weiyuan Li, Liqing Zhang: "DoveNet: Deep Image Harmonization via Domain Verification." CVPR (2020) [pdf] [supp] [arXiv] [code]
Xiaodong Cun, Chi-Man Pun: "Improving the Harmony of the Composite Image by Spatial-Separated Attention Module." IEEE Trans. Image Process. 29: 4759-4771 (2020) [pdf] [arXiv] [code]
Yi-Hsuan Tsai, Xiaohui Shen, Zhe Lin, Kalyan Sunkavalli, Xin Lu, Ming-Hsuan Yang: "Deep Image Harmonization." CVPR (2017) [pdf] [supp] [arXiv] [code]

Shadow generation

Daquan Liu, Chengjiang Long, Hongpan Zhang, Hanning Yu, Xinzhi Dong, Chunxia Xiao: "ARShadowGAN: Shadow Generative Adversarial Network for Augmented Reality in Single Light Scenes." CVPR (2020) [pdf] [code]
Shuyang Zhang, Runze Liang, Miao Wang: "ShadowGAN: Shadow Synthesis for Virtual Objects with Conditional Adversarial Networks." Computational Visual Media (2019) [pdf]
Fangneng Zhan, Shijian Lu, Changgong Zhang, Feiying Ma, Xuansong Xie: "Adversarial Image Composition with Auxiliary Illumination." ACCV (2020) [pdf]

Object placement and spatial transformation

Lingzhi Zhang, Tarmily Wen, Jie Min, Jiancong Wang, David Han, Jianbo Shi: "Learning Object Placement by Inpainting for Compositional Data Augmentation." ECCV (2020) [pdf]
Samaneh Azadi, Deepak Pathak, Sayna Ebrahimi, Trevor Darrell: "Compositional GAN: Learning Image-Conditional Binary Composition." IJCV (2020) [arXiv] [code]
Song-Hai Zhang, Zhengping Zhou, Bin Liu, Xi Dong, Peter Hall: "What and Where: A Context-based Recommendation System for Object Insertion." Computational Visual Media (2020) [arXiv]
Shashank Tripathi, Siddhartha Chandra, Amit Agrawal, Ambrish Tyagi, James M. Rehg, Visesh Chari: "Learning to Generate Synthetic Data via Compositing." CVPR (2019) [arXiv]
Haoshu Fang, Jianhua Sun, Runzhong Wang, Minghao Gou, Yonglu Li, Cewu Lu: "InstaBoost: Boosting Instance Segmentation via Probability Map Guided Copy-Pasting." ICCV (2019) [arXiv] [code]
Chen-Hsuan Lin, Ersin Yumer, Oliver Wang, Eli Shechtman, Simon Lucey: "ST-GAN: Spatial Transformer Generative Adversarial Networks for Image Compositing." CVPR (2018) [arXiv] [code]
Donghoon Lee, Sifei Liu, Jinwei Gu, Ming-Yu Liu, Ming-Hsuan Yang, Jan Kautz: "Context-Aware Synthesis and Placement of Object Instances." NeurIPS (2018) [arXiv] [code]
Fuwen Tan, Crispin Bernier, Benjamin Cohen, Vicente Ordonez, Connelly Barnes: "Where and Who? Automatic Semantic-Aware Person Composition." WACV (2018) [arXiv] [code]
Tal Remez, Jonathan Huang, Matthew Brown: "Learning to Segment via Cut-and-Paste." ECCV (2018) [arXiv] [code]

Occlusion

Samaneh Azadi, Deepak Pathak, Sayna Ebrahimi, Trevor Darrell: "Compositional GAN: Learning Image-Conditional Binary Composition." IJCV (2020) [arXiv] [code]
Fangneng Zhan, Jiaxing Huang, Shijian Lu: "Hierarchy Composition GAN for High-fidelity Image Synthesis." IEEE Transactions on Cybernetics (2021) [arXiv]

Datasets

iHarmony4 (image harmonization): It contains four sub-datasets (HCOCO, HAdobe5k, HFlickr, and Hday2night) with a total of 73,146 pairs of unharmonized and harmonized images. [pdf] [link]
GMSDataset (image harmonization): It contains 183 images at a resolution of 1940×1440, covering 16 different objects. For each object, one source image and 11 target images were captured under different background scenes and illumination conditions. [pdf] [link] (access code: ekn2)
HVIDIT (image harmonization): An image harmonization dataset built upon the VIDIT (Virtual Image Dataset for Illumination Transfer) dataset. It contains 3,007 images of 276 scenes for training and 329 images of 24 scenes for testing. [pdf] [link]
RHHarmony (image harmonization): A rendered image harmonization dataset containing 15,000 ground-truth rendered images, from which up to 135,000 composite rendered images can be generated. [pdf] [link]
Shadow-AR (shadow generation): It contains 3,000 quintuples. Each quintuple consists of five images at 640×480 resolution: a synthetic image without the virtual object's shadow, the corresponding image containing the virtual object's shadow, a mask of the virtual object, a labeled real-world shadow matting, and its corresponding labeled occluder. [pdf] [link]
DESOBA (shadow generation): It contains 840 training images with a total of 2,999 object-shadow pairs and 160 test images with a total of 624 object-shadow pairs. [pdf] [link]
OPA (object placement): It contains 62,074 training images and 11,396 test images; the foregrounds and backgrounds in the training set do not overlap with those in the test set. The training (resp., test) set contains 21,351 (resp., 3,566) positive samples and 40,724 (resp., 7,830) negative samples. [pdf] [link]
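Several of the datasets above (for example iHarmony4 and Shadow-AR) pair an image with a foreground or object mask. As a minimal sketch, not taken from any of the listed papers, the naive cut-and-paste composite that harmonization, shadow generation, and placement methods start from can be written as out = mask * foreground + (1 - mask) * background, applied per pixel and channel. The function name `composite` and the nested-list image layout (rows of [r, g, b] pixels in [0, 1], plus a per-pixel alpha mask in [0, 1]) are illustrative assumptions.

```python
from typing import List

# Hypothetical layout chosen for illustration only.
Pixel = List[float]          # one pixel, e.g. [r, g, b] with values in [0, 1]
Image = List[List[Pixel]]    # rows of pixels
Mask = List[List[float]]     # per-pixel alpha in [0, 1] (1 = foreground)


def composite(fg: Image, bg: Image, mask: Mask) -> Image:
    """Naive cut-and-paste composite: out = mask * fg + (1 - mask) * bg."""
    if not (len(fg) == len(bg) == len(mask)):
        raise ValueError("foreground, background and mask must have the same height")
    out: Image = []
    for fg_row, bg_row, m_row in zip(fg, bg, mask):
        if not (len(fg_row) == len(bg_row) == len(m_row)):
            raise ValueError("foreground, background and mask rows must have the same width")
        # Blend each channel of each pixel with that pixel's mask value.
        out.append([
            [m * f + (1.0 - m) * b for f, b in zip(fp, bp)]
            for fp, bp, m in zip(fg_row, bg_row, m_row)
        ])
    return out
```

For a 1×2 image with a red foreground, a blue background, and mask [[1.0, 0.0]], the left pixel comes from the foreground and the right pixel from the background. Such a composite usually still has mismatched lighting, missing shadows, or implausible placement, which is what the methods listed above aim to fix.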

Other resources

Awesome-Image-Harmonization

Owner: BCMI
