Official implementation for: Blended Diffusion for Text-driven Editing of Natural Images.

Last update: Dec 30, 2022

Blended Diffusion for Text-driven Editing of Natural Images
Omri Avrahami, Dani Lischinski, Ohad Fried

Abstract: Natural language offers a highly intuitive interface for image editing. In this paper, we introduce the first solution for performing local (region-based) edits in generic natural images, based on a natural language description along with an ROI mask. We achieve our goal by leveraging and combining a pretrained language-image model (CLIP), to steer the edit towards a user-provided text prompt, with a denoising diffusion probabilistic model (DDPM) to generate natural-looking results. To seamlessly fuse the edited region with the unchanged parts of the image, we spatially blend noised versions of the input image with the local text-guided diffusion latent at a progression of noise levels. In addition, we show that adding augmentations to the diffusion process mitigates adversarial results. We compare against several baselines and related methods, both qualitatively and quantitatively, and show that our method outperforms these solutions in terms of overall realism, ability to preserve the background and matching the text. Finally, we show several text-driven editing applications, including adding a new object to an image, removing/replacing/altering existing objects, background replacement, and image extrapolation.
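The spatial blending described in the abstract can be illustrated with a minimal NumPy sketch. This is not the released implementation: the function names (`make_alpha_bar`, `noise_image`, `blend_step`), the linear beta schedule, and the array shapes are assumptions made for illustration. The sketch covers only the blending of one diffusion step, assuming the text-guided foreground latent `x_fg` has already been produced by some CLIP-guided denoising step that is not shown here.

```python
import numpy as np


def make_alpha_bar(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for an assumed linear beta schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)


def noise_image(x0, alpha_bar_t, rng):
    """Sample a noised version of the clean image x0 at the given noise level."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar_t) * x0 + np.sqrt(1.0 - alpha_bar_t) * eps


def blend_step(x_fg, x0, mask, alpha_bar_t, rng):
    """Blend a text-guided latent with a noised copy of the input image.

    x_fg : latent from the (hypothetical) text-guided denoising step
    x0   : original input image, same shape as x_fg
    mask : ROI mask, 1 inside the edited region and 0 elsewhere
    """
    # Background: the input image noised to the same level as x_fg.
    x_bg = noise_image(x0, alpha_bar_t, rng)
    # Keep the guided latent inside the mask, the noised input outside it.
    return mask * x_fg + (1.0 - mask) * x_bg


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    alpha_bar = make_alpha_bar(num_steps=100)

    x0 = rng.uniform(-1.0, 1.0, size=(8, 8, 3))   # stand-in input image
    mask = np.zeros((8, 8, 1))
    mask[2:6, 2:6] = 1.0                          # square ROI

    # Stand-in for the guided latent; a real pipeline would denoise here.
    x_fg = rng.standard_normal(x0.shape)

    x_next = blend_step(x_fg, x0, mask, alpha_bar[50], rng)
    print(x_next.shape)
```

Repeating this blend at every step of the reverse process, from the highest noise level down to the lowest, is what lets the edited region and the unchanged background meet at a progression of noise levels instead of being pasted together once at the end.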

Applications

Multiple synthesis results for the same prompt

Synthesis results for different prompts

Altering part of an existing object

Background replacement

Scribble-guided editing

Text-guided extrapolation

Composing several applications

Code availability

Full code will be released soon.
