Official code for "Towards An End-to-End Framework for Flow-Guided Video Inpainting" (CVPR2022)


E2FGVI (CVPR 2022)


Python 3.7 pytorch 1.6.0


This repository contains the official implementation of the following paper:

Towards An End-to-End Framework for Flow-Guided Video Inpainting
Zhen Li#, Cheng-Ze Lu#, Jianhua Qin, Chun-Le Guo*, Ming-Ming Cheng
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022

[Paper] [Demo Video (Youtube)] [Demo Video (Bilibili)] [Project Page (TBD)] [Poster (TBD)]

You can try our Colab demo here: [Open In Colab]

News

  • 2022.05.15: We release E2FGVI-HQ, which can handle videos of arbitrary resolution. Although it was trained only on 432x240 videos, it generalizes well to much higher resolutions, and it outperforms our original model on both PSNR and SSIM. 🔗 Download links: [Google Drive] [Baidu Disk] 🎥 Demo video: [Youtube] [Bilibili]

  • 2022.04.06: Our code is publicly available.

Demo


More examples:

Coco
Tennis
Space
Motocross

Overview

[Figure: overall structure]

🚀 Highlights:

  • SOTA performance: The proposed E2FGVI achieves significant improvements on all quantitative metrics in comparison with SOTA methods.
  • High efficiency: Our method processes 432 × 240 videos at 0.12 seconds per frame on a Titan XP GPU, nearly 15× faster than previous flow-based methods. It also has the lowest FLOPs among all compared SOTA methods.

Work in Progress

  • Update website page
  • Hugging Face demo
  • Efficient inference

Dependencies and Installation

  1. Clone Repo

    git clone https://github.com/MCG-NKU/E2FGVI.git
  2. Create Conda Environment and Install Dependencies

    conda env create -f environment.yml
    conda activate e2fgvi
    • Python >= 3.7
    • PyTorch >= 1.5
    • CUDA >= 9.2
    • mmcv-full (follow this pipeline to install it)

    If the environment.yml file does not work for you, please follow this issue to solve the problem.
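
To confirm that the activated environment meets the minimums listed above, you can run a short Python check. This is a minimal sketch and is not a script shipped with the repository. The helper names (`parse_version`, `meets_minimum`, `check_environment`) are hypothetical. The sketch relies only on `sys` and on the standard `torch.__version__`, `torch.version.cuda` and `torch.cuda.is_available()` attributes, and it does not check mmcv-full.

```python
import sys


def parse_version(text):
    """Turn a version string like '1.5.1+cu92' into a tuple of ints, e.g. (1, 5, 1)."""
    core = text.split("+")[0]  # drop local build tags such as '+cu92'
    parts = []
    for piece in core.split("."):
        digits = ""
        for ch in piece:  # keep the leading digits, e.g. '0a0' -> '0'
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(found, minimum):
    """True if version string `found` is at least `minimum` (e.g. '1.5')."""
    return parse_version(found) >= parse_version(minimum)


def check_environment():
    """Print whether Python, PyTorch and its CUDA build meet the listed minimums."""
    py = "{}.{}.{}".format(*sys.version_info[:3])
    print("Python", py, "OK" if meets_minimum(py, "3.7") else "too old (need >= 3.7)")
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this environment")
        return
    print("PyTorch", torch.__version__,
          "OK" if meets_minimum(torch.__version__, "1.5") else "too old (need >= 1.5)")
    cuda = torch.version.cuda  # None for CPU-only builds
    if cuda is None:
        print("This PyTorch build has no CUDA support")
    else:
        print("CUDA", cuda, "OK" if meets_minimum(cuda, "9.2") else "too old (need >= 9.2)")
    print("GPU visible:", torch.cuda.is_available())


if __name__ == "__main__":
    check_environment()
```

Run it inside the e2fgvi environment. If the last line reports `GPU visible: False`, the installed PyTorch build or driver usually does not match your GPU.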

Get Started

Prepare pretrained models

Before performing the following steps, please download our pretrained model first.

Model     | 🔗 Download Links           | Support Arbitrary Resolution? | PSNR / SSIM / VFID (DAVIS)
E2FGVI    | [Google Drive] [Baidu Disk] | No                            | 33.01 / 0.9721 / 0.116
E2FGVI-HQ | [Google Drive] [Baidu Disk] | Yes                           | 33.06 / 0.9722 / 0.117

Then, unzip the file and place the models in the release_model directory.

The directory structure should look like this:

release_model
   |- E2FGVI-CVPR22.pth
   |- E2FGVI-HQ-CVPR22.pth
   |- i3d_rgb_imagenet.pt (for evaluating VFID metric)
   |- README.md
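
Before running anything, you can check that the files are where the commands below expect them. This is a minimal sketch that uses only the standard library. The function name `missing_checkpoints` is hypothetical, and the file list simply mirrors the tree above.

```python
from pathlib import Path

# File names copied from the directory listing above; which ones you need
# depends on the model you plan to run.
EXPECTED_FILES = {
    "E2FGVI-CVPR22.pth": "E2FGVI weights",
    "E2FGVI-HQ-CVPR22.pth": "E2FGVI-HQ weights",
    "i3d_rgb_imagenet.pt": "I3D weights (only needed for the VFID metric)",
}


def missing_checkpoints(root="release_model"):
    """Return the expected file names that are not present under `root`."""
    root = Path(root)
    return sorted(name for name in EXPECTED_FILES if not (root / name).is_file())


if __name__ == "__main__":
    missing = missing_checkpoints()
    if not missing:
        print("All expected files are in place.")
    for name in missing:
        print("Missing:", name, "-", EXPECTED_FILES[name])
```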

Quick test

We provide two examples in the examples directory.

Run the following commands to try them:

# The first example (using split video frames)
python test.py --model e2fgvi --video examples/tennis --mask examples/tennis_mask --ckpt release_model/E2FGVI-CVPR22.pth
# or, with E2FGVI-HQ:
python test.py --model e2fgvi_hq --video examples/tennis --mask examples/tennis_mask --ckpt release_model/E2FGVI-HQ-CVPR22.pth
# The second example (using mp4 format video)
python test.py --model e2fgvi --video examples/schoolgirls.mp4 --mask examples/schoolgirls_mask --ckpt release_model/E2FGVI-CVPR22.pth
# or, with E2FGVI-HQ:
python test.py --model e2fgvi_hq --video examples/schoolgirls.mp4 --mask examples/schoolgirls_mask --ckpt release_model/E2FGVI-HQ-CVPR22.pth

The inpainted video will be saved in the results directory. To test more cases, prepare your own mp4 video (or split frames) and frame-wise masks.
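
A static rectangular mask per frame is the simplest way to try the pipeline on your own footage. The sketch below writes one 8-bit grayscale PNG per frame. The files get zero-padded names (00000.png, 00001.png, ...), which is the same pattern this README uses for test_masks in the dataset layout. This README does not confirm the following, so they are assumptions:

* nonzero pixels mark the region to inpaint;
* one mask is needed per video frame;
* masks are matched to frames by sorted file name.

`write_box_masks` and `write_gray_png` are hypothetical helpers. The PNG encoder uses only `struct` and `zlib`, so no imaging library is required.

```python
import struct
import zlib
from pathlib import Path


def _png_chunk(tag, data):
    """Pack one PNG chunk: length, type, data, then CRC over type + data."""
    return (struct.pack(">I", len(data)) + tag + data
            + struct.pack(">I", zlib.crc32(tag + data) & 0xFFFFFFFF))


def write_gray_png(path, rows):
    """Write an 8-bit grayscale PNG; `rows` is a list of equal-length lists of 0..255."""
    height, width = len(rows), len(rows[0])
    # IHDR: width, height, bit depth 8, color type 0 (grayscale), no interlace
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 0, 0, 0, 0)
    # Each scanline starts with filter type 0 (None)
    raw = b"".join(b"\x00" + bytes(row) for row in rows)
    png = (b"\x89PNG\r\n\x1a\n" + _png_chunk(b"IHDR", ihdr)
           + _png_chunk(b"IDAT", zlib.compress(raw)) + _png_chunk(b"IEND", b""))
    Path(path).write_bytes(png)


def write_box_masks(out_dir, num_frames, width, height, box):
    """Write masks 00000.png, 00001.png, ... where the box (x0, y0, x1, y1)
    is 255 (assumed: region to fill) and every other pixel is 0."""
    x0, y0, x1, y1 = box
    rows = [[255 if (x0 <= x < x1 and y0 <= y < y1) else 0 for x in range(width)]
            for y in range(height)]
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for i in range(num_frames):
        write_gray_png(out / "{:05d}.png".format(i), rows)
    return out
```

For example, `write_box_masks("my_video_mask", 50, 432, 240, (166, 80, 266, 160))` writes 50 masks with a centered box. You would then pass `my_video_mask` to test.py via `--mask`. For real object removal, draw or segment the masks instead.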

Note: E2FGVI always rescales the input video to a fixed resolution (432x240), while E2FGVI-HQ keeps the resolution of the input video. To customize the output resolution, use the --set_size flag and set the values of --width and --height.

Example:

# Use this command to output a 720p video
python test.py --model e2fgvi_hq --video <video_path> --mask <mask_path>  --ckpt release_model/E2FGVI-HQ-CVPR22.pth --set_size --width 1280 --height 720
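
The resolution rule in the note can be summarized as a small function. This is a hedged reading of the note, not code taken from test.py. The name `output_size` is hypothetical. The function assumes that --set_size has no effect for E2FGVI, because the note says E2FGVI always rescales to 432x240.

```python
def output_size(model, input_size, set_size=False, width=None, height=None):
    """Return the (width, height) the note above implies for the output video.

    model: "e2fgvi" or "e2fgvi_hq"; input_size: (width, height) of the input.
    """
    if model == "e2fgvi":
        return (432, 240)  # E2FGVI always works at 432x240
    if model != "e2fgvi_hq":
        raise ValueError("unknown model: {}".format(model))
    if set_size:
        if width is None or height is None:
            raise ValueError("--set_size needs both --width and --height")
        return (width, height)  # user-chosen size, e.g. 1280x720
    return tuple(input_size)  # E2FGVI-HQ keeps the input resolution
```

With this reading, the 720p command above corresponds to `output_size("e2fgvi_hq", (w, h), True, 1280, 720)`, which returns `(1280, 720)` whatever the input size.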

Prepare dataset for training and evaluation

Dataset | YouTube-VOS                                          | DAVIS
Details | For training (3,471) and evaluation (508)            | For evaluation (50 of 90 videos)
Images  | [Official Link] (Download train and test all frames) | [Official Link] (2017, 480p, TrainVal)
Masks   | [Google Drive] [Baidu Disk] (For reproducing paper results)

The training and test split files are provided in datasets/<dataset_name>.

For each dataset, place the JPEGImages folder in datasets/<dataset_name>.

Then, run sh datasets/zip_dir.sh (note: edit the folder path in the script accordingly) to compress each video in datasets/<dataset_name>/JPEGImages.
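
If you prefer not to use the shell script, the compression step can be sketched in Python. This sketch rests on assumptions and does not reproduce datasets/zip_dir.sh, whose contents are not shown here. It zips each sub-folder of JPEGImages into <video_name>.zip next to that folder. Frames are stored at the archive root (00000.jpg rather than <video_name>/00000.jpg), which is assumed to be the layout the data loader reads. `zip_video_folders` is a hypothetical name.

```python
import zipfile
from pathlib import Path


def zip_video_folders(jpeg_dir):
    """Compress every sub-folder of `jpeg_dir` into `<video_name>.zip` beside it.

    Frames are written at the archive root (assumed layout), in sorted order.
    Returns the list of created archive paths.
    """
    jpeg_dir = Path(jpeg_dir)
    created = []
    # sorted() materializes the folder list before any archive is written
    for video_dir in sorted(p for p in jpeg_dir.iterdir() if p.is_dir()):
        archive = jpeg_dir / (video_dir.name + ".zip")
        # ZIP_STORED: JPEG frames are already compressed, so skip recompression
        with zipfile.ZipFile(archive, "w", zipfile.ZIP_STORED) as zf:
            for frame in sorted(video_dir.iterdir()):
                if frame.is_file():
                    zf.write(frame, arcname=frame.name)
        created.append(archive)
    return created
```

The original frame folders are left in place. Remove them yourself once you have verified the archives.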

Unzip the downloaded mask files into datasets.

The datasets directory should be arranged as follows (please check it carefully):

datasets
   |- davis
      |- JPEGImages
         |- <video_name>.zip
         |- <video_name>.zip
      |- test_masks
         |- <video_name>
            |- 00000.png
            |- 00001.png   
      |- train.json
      |- test.json
   |- youtube-vos
      |- JPEGImages
         |- <video_id>.zip
         |- <video_id>.zip
      |- test_masks
         |- <video_id>
            |- 00000.png
            |- 00001.png
      |- train.json
      |- test.json   
   |- zip_file.sh
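
A missing archive or mask folder usually shows up only later, as an error inside the data loader. It can therefore help to cross-check the JSON split files against the files on disk. The sketch below is minimal and makes an unconfirmed assumption: train.json and test.json map each video name to its frame count (only the keys are used). `check_dataset` is a hypothetical helper.

```python
import json
from pathlib import Path


def check_dataset(root, name):
    """List videos named in train.json/test.json that lack files on disk."""
    base = Path(root) / name
    problems = []
    for split in ("train", "test"):
        split_file = base / "{}.json".format(split)
        if not split_file.is_file():
            problems.append("missing {}".format(split_file))
            continue
        with open(split_file) as f:
            videos = json.load(f)  # assumed: {"<video_name>": <frame_count>, ...}
        for video in videos:
            if not (base / "JPEGImages" / "{}.zip".format(video)).is_file():
                problems.append("{}: missing JPEGImages/{}.zip".format(split, video))
            if split == "test" and not (base / "test_masks" / video).is_dir():
                problems.append("test: missing test_masks/{}/".format(video))
    return problems


if __name__ == "__main__":
    for dataset in ("davis", "youtube-vos"):
        for line in check_dataset("datasets", dataset) or ["{}: OK".format(dataset)]:
            print(line)
```

Run it from the repository root. Any line other than `<dataset>: OK` names a video (or split file) that the layout above expects but the datasets folder lacks.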

Evaluation

Run one of the following commands for evaluation:

 # For evaluating E2FGVI model
 python evaluate.py --model e2fgvi --dataset <dataset_name> --data_root datasets/ --ckpt release_model/E2FGVI-CVPR22.pth
 # For evaluating E2FGVI-HQ model
 python evaluate.py --model e2fgvi_hq --dataset <dataset_name> --data_root datasets/ --ckpt release_model/E2FGVI-HQ-CVPR22.pth

Evaluating E2FGVI should reproduce the scores reported in the paper. The scores of E2FGVI-HQ are listed in the table in [Prepare pretrained models].

The scores will also be saved in the results/<model_name>_<dataset_name> directory.

Add the --save_results flag if you want to further evaluate the temporal warping error.
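
For reference, PSNR, the first metric in the tables above, is 10 * log10(MAX^2 / MSE) between the ground-truth and inpainted frames. The sketch below is the textbook definition in plain Python. It is not the repository's evaluate.py, which may differ in color space, data range or how frames are averaged, so do not expect it to reproduce the reported numbers exactly. The function names are hypothetical.

```python
import math


def psnr(reference, result, max_value=255.0):
    """Peak signal-to-noise ratio (dB) between two equal-length pixel sequences."""
    if len(reference) != len(result):
        raise ValueError("inputs must have the same number of pixels")
    mse = sum((a - b) ** 2 for a, b in zip(reference, result)) / len(reference)
    if mse == 0:
        return math.inf  # identical inputs
    return 10.0 * math.log10(max_value ** 2 / mse)


def video_psnr(ref_frames, out_frames, max_value=255.0):
    """Mean of per-frame PSNR values, one common way to score a whole video."""
    scores = [psnr(r, o, max_value) for r, o in zip(ref_frames, out_frames)]
    return sum(scores) / len(scores)
```

An error of one gray level on every pixel gives 20 * log10(255), roughly 48.13 dB. Higher values mean the inpainted frames are closer to the ground truth.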

Training

Our training configurations are provided in train_e2fgvi.json (for E2FGVI) and train_e2fgvi_hq.json (for E2FGVI-HQ).

Run one of the following commands for training:

 # For training E2FGVI
 python train.py -c configs/train_e2fgvi.json
 # For training E2FGVI-HQ
 python train.py -c configs/train_e2fgvi_hq.json

To resume training, run the same command again.

The training loss can be monitored by running:

tensorboard --logdir release_model                                                   

You can follow this pipeline to evaluate your model.

Results

Quantitative results

[Figure: quantitative results]

Citation

If you find our repo useful for your research, please consider citing our paper:

@inproceedings{liCvpr22vInpainting,
   title={Towards An End-to-End Framework for Flow-Guided Video Inpainting},
   author={Li, Zhen and Lu, Cheng-Ze and Qin, Jianhua and Guo, Chun-Le and Cheng, Ming-Ming},
   booktitle={IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
   year={2022}
}

Contact

If you have any questions, please feel free to contact us at zhenli1031ATgmail.com or czlu919AToutlook.com.

License

Licensed under a Creative Commons Attribution-NonCommercial 4.0 International license, for non-commercial use only. Any commercial use requires formal permission first.

Acknowledgement

This repository is maintained by Zhen Li and Cheng-Ze Lu.

This code is based on STTN, FuseFormer, Focal-Transformer, and MMEditing.

Comments
  • About custom datasets

    Hello, I'm glad to have come across your model. I was able to train successfully on the DAVIS dataset, but I ran into problems when defining my own dataset. There are 320 zip files in JPEGImages, each containing ten photos. There are 320 mask folders in test_masks, each with ten mask images. test.json is the same as train.json.

    But when we train on our own data, the following error occurs (the message ends with "is invalid for input of size 11272192"):


    Custom dataset directory:

    dataset
       |- ballet
          |- JPEGImages
             |- xxx.zip
             |- .........
          |- test_masks
             |- xxx
          |- train.json
          |- test.json


    Specific error:

    Traceback (most recent call last):
      File "/home/u202080087/data/E2FGVI/train.py", line 84, in <module>
        mp.spawn(main_worker, nprocs=config['world_size'], args=(config, ))
      File "/home/u202080087/.conda/envs/e2f/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 199, in spawn
        return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
      File "/home/u202080087/.conda/envs/e2f/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 157, in start_processes
        while not context.join():
      File "/home/u202080087/.conda/envs/e2f/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 118, in join
        raise Exception(msg)
    Exception:

    -- Process 0 terminated with the following error:
    Traceback (most recent call last):
      File "/home/u202080087/.conda/envs/e2f/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 19, in _wrap
        fn(i, *args)
      File "/home/u202080087/data/E2FGVI/train.py", line 64, in main_worker
        trainer.train()
      File "/home/u202080087/data/E2FGVI/core/trainer.py", line 288, in train
        self._train_epoch(pbar)
      File "/home/u202080087/data/E2FGVI/core/trainer.py", line 307, in _train_epoch
        pred_imgs, pred_flows = self.netG(masked_frames, l_t)
      File "/home/u202080087/.conda/envs/e2f/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
        result = self.forward(*input, **kwargs)
      File "/home/u202080087/.conda/envs/e2f/lib/python3.7/site-packages/torch/nn/parallel/distributed.py", line 619, in forward
        output = self.module(*inputs[0], **kwargs[0])
      File "/home/u202080087/.conda/envs/e2f/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
        result = self.forward(*input, **kwargs)
      File "/home/u202080087/data/E2FGVI/model/e2fgvi_hq.py", line 255, in forward
        trans_feat = self.transformer([trans_feat, fold_output_size])
      File "/home/u202080087/.conda/envs/e2f/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
        result = self.forward(*input, **kwargs)
      File "/home/u202080087/.conda/envs/e2f/lib/python3.7/site-packages/torch/nn/modules/container.py", line 117, in forward
        input = module(input)
      File "/home/u202080087/.conda/envs/e2f/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
        result = self.forward(*input, **kwargs)
      File "/home/u202080087/data/E2FGVI/model/modules/tfocal_transformer_hq.py", line 551, in forward
        attn_windows = self.attn(x_windows_all, mask_all=x_window_masks_all)
      File "/home/u202080087/.conda/envs/e2f/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
        result = self.forward(*input, **kwargs)
      File "/home/u202080087/data/E2FGVI/model/modules/tfocal_transformer_hq.py", line 252, in forward
        0] * self.window_size[1], C // self.num_heads), (q, k, v))
      File "/home/u202080087/data/E2FGVI/model/modules/tfocal_transformer_hq.py", line 248, in <lambda>
        lambda t: window_partition(t, self.window_size).view(
      File "/home/u202080087/data/E2FGVI/model/modules/tfocal_transformer_hq.py", line 132, in window_partition
        window_size[1], C)
    RuntimeError: shape '[2, 8, 6, 5, 4, 9, 512]' is invalid for input of size 11272192


    Looking forward to your reply

    opened by caimaomao315 10
  • Solving environment: failed

    I tried installing on both Windows and Linux, but I get "Solving environment: failed".

    conda env create -f environment.yml
    Collecting package metadata (repodata.json): done
    Solving environment: failed

    ResolvePackageNotFound:

    • libffi==3.3=he6710b0_2
    • lcms2==2.12=h3be6417_0
    • matplotlib-base==3.4.2=py37hab158f2_0
    • tornado==6.1=py37h27cfd23_0
    • brotli==1.0.9=he6710b0_2
    • scipy==1.6.2=py37had2a1c9_1
    • bzip2==1.0.8=h7b6447c_0
    • locket==0.2.1=py37h06a4308_1
    • libpng==1.6.37=hbc83047_0
    • ffmpeg==4.2.2=h20bf706_0
    • freetype==2.10.4=h5ab3b9f_0
    • expat==2.4.1=h2531618_2
    • xz==5.2.5=h7b6447c_0
    • ncurses==6.2=he6710b0_1
    • openh264==2.1.0=hd408876_0
    • qt==5.9.7=h5867ecd_1pt150_0
    • pywavelets==1.1.1=py37h7b6447c_2
    • libgfortran-ng==7.5.0=ha8ba4b0_17
    • libwebp-base==1.2.0=h27cfd23_0
    • pcre==8.45=h295c915_0
    • jpeg==9d=h7f8727e_0
    • ca-certificates==2022.2.1=h06a4308_0
    • certifi==2021.10.8=py37h06a4308_2
    • gstreamer==1.14.0=h28cd5cc_2
    • lame==3.100=h7b6447c_0
    • libtiff==4.2.0=h85742a9_0
    • tk==8.6.11=h1ccaba5_0
    • glib==2.69.1=h5202010_0
    • pillow==8.3.1=py37h2c7a002_0
    • libgcc-ng==9.3.0=h5101ec6_17
    • openssl==1.1.1m=h7f8727e_0
    • libstdcxx-ng==9.3.0=hd4cf53a_17
    • fontconfig==2.13.1=h6c09931_0
    • zstd==1.4.9=haebb681_0
    • zlib==1.2.11=h7b6447c_3
    • _openmp_mutex==4.5=1_gnu
    • pyqt==5.9.2=py37h05f1152_2
    • libvpx==1.7.0=h439df22_0
    • libgomp==9.3.0=h5101ec6_17
    • python==3.7.11=h12debd9_0
    • dbus==1.13.18=hb2f20db_0
    • x264==1!157.20191217=h7b6447c_0
    • openjpeg==2.4.0=h3ad879b_0
    • libtasn1==4.16.0=h27cfd23_0
    • lz4-c==1.9.3=h295c915_1
    • cytoolz==0.11.0=py37h7b6447c_0
    • mkl_fft==1.3.0=py37h42c9631_2
    • sqlite==3.36.0=hc218d9a_0
    • gnutls==3.6.15=he1e5248_0
    • icu==58.2=he6710b0_3
    • pytorch==1.5.1=py3.7_cuda9.2.148_cudnn7.6.3_0
    • libgfortran4==7.5.0=ha8ba4b0_17
    • yaml==0.2.5=h7b6447c_0
    • ninja==1.10.2=hff7bd54_1
    • nettle==3.7.3=hbbd107a_1
    • kiwisolver==1.3.1=py37h2531618_0
    • setuptools==58.0.4=py37h06a4308_0
    • libopus==1.3.1=h7b6447c_0
    • libunistring==0.9.10=h27cfd23_0
    • matplotlib==3.4.2=py37h06a4308_0
    • sip==4.19.8=py37hf484d3e_0
    • gmp==6.2.1=h2531618_2
    • pip==21.2.2=py37h06a4308_0
    • numpy-base==1.20.3=py37h74d4b33_0
    • libidn2==2.3.2=h7f8727e_0
    • pyyaml==5.4.1=py37h27cfd23_1
    • libxcb==1.14=h7b6447c_0
    • gst-plugins-base==1.14.0=h8213a91_2
    • ld_impl_linux-64==2.35.1=h7274673_9
    • mkl-service==2.4.0=py37h7f8727e_0
    • libuuid==1.0.3=h7f8727e_2
    • mkl_random==1.2.2=py37h51133e4_0
    • mkl==2021.3.0=h06a4308_520
    • libxml2==2.9.12=h03d6c58_0
    • intel-openmp==2021.3.0=h06a4308_3350
    • numpy==1.20.3=py37hf144106_0
    good first issue 
    opened by Tobe2d 8
  • Not able to reproduce the results listed in the paper with my trained model

    I ran into mode collapse when the step count exceeded 300K, and with the final model I got, I cannot reproduce the results shown in the paper. Could you share your loss curve? @Paper99

    opened by LigZhong 7
  • Question of Focal Transformer

    Hey, thanks for your wonderful work. I think there may be a bug here: https://github.com/MCG-NKU/E2FGVI/blob/924b56c133fffe37327f9c9b90290fc3d0538581/model/modules/tfocal_transformer.py#L342

    Should we first transpose and then do the view operation?

    Thanks in advance!

    opened by sydney0zq 6
  • Question about learning rate

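    Hello, thank you for your work. I have a question about the learning rate. Your paper states that the initial learning rate is 0.0001 and is reduced by a factor of 10 at 400k. In the compared work FuseFormer, however, the initial learning rate is 0.01 and is reduced by a factor of 10 at 200k, 400k and 450k. Have you tested the difference between the two? What made you choose not to follow FuseFormer's configuration? I hope to hear your answer!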

    opened by unclebuff 5
  • Demo videos to contribute

    Hi,

    Thanks for this great repo and project.

    Not really an issue, more a question: I see the demo video section is TBD. Would you be interested in some in-the-wild test videos run through the model for the README? I am planning to run some anyway, hopefully in the next week or so; let me know and I'll share them.

    It would also be great to have a model trained at a higher resolution to produce better-quality demo videos, but I see it is already on the to-do list.

    opened by Tetsujinfr 5
  • About the pretrained model of discriminator and opt.pth

    Hello, I'm very happy to have come across your work. What fantastic work! I am doing some research that also involves video inpainting, and I'd like to finetune your pretrained model on my new dataset. However, I could only find the generator model in the link given in README.md. Could you please upload the discriminator model as well (and also opt.pth)? Or could you tell me how to access it, in case I missed the download link? Thank you very much!

    opened by nlx0021 4
  • Output encoding settings

    Hello. After a long while of trial and error, I managed to get this software running. It still doesn't run well, giving me OOM with more than 250 frames of 120x144 video. I have an 8GB 3060ti, which should be fine for this, in my opinion. Needing to split tasks many times is a pain, but might be manageable.

    What isn't manageable are the output settings. H.263 is outdated and with tiny input sizes and lengths, lossy is a baffling pick. Maybe I missed a customizing option somewhere? I would like to have lossless h264 or FFV1. In addition, I would like to decide the video's framerate (very important for syncing) and not have the video resized. That causes distortions that look bad.

    Thank you. Looking forward to the high-resolution model.

    opened by Troceleng 4
  • frame_idx and flow_idx

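    [syujung] asked the following question on July 20 (Issue #25): "Hello, your work is excellent and the results are great! I have a question about the code. I compared models/modules/feat_prop.py in your uploaded code with the BasicVSR++ source code. I suspect the optical flow used to obtain cond_n1 during backward_propagation is wrong: shouldn't frame_idx and flow_idx in your for loop keep the same order? That is how BasicVSR++ does it, so I wanted to ask."

    Could you explain specifically how the current code should be modified? Thanks a lot!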

    opened by Roowenliang 3
  • How to generate object-like masks

    Hi authors,

    Thanks for your awesome work!

    I'm wondering whether you used the same 'create_random_shape_with_random_motion' function for both video completion and object removal. If so, can I say this model was trained only once for both tasks?

    Besides, does this moving mask (https://github.com/MCG-NKU/E2FGVI/blob/master/core/utils.py#L209) refer to the object-like masks mentioned in your paper (experiment settings)?

    opened by sczhou 2
  • Request a suggestion for model distillation

    This model is great, but inference is a bit slow. I want to try distilling this model. Could you give some advice, such as which layers can be reduced or removed?

    opened by 980202006 2
  • Error reported in training <IndexError: list index out of range>

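    Hello! I have recently been reading your paper and trying to debug the code, and I have a question. When training the e2fgvi model on the YouTube-VOS dataset, the following problem occurred: the index is out of range.

    Looking at datasets/youtube-vos/train.json, I guess each entry means "video ID: number of frames". For example, "003234408d": 180 would mean that the video with ID 003234408d in YouTube-VOS has 180 frames in total. However, the folder datasets/youtube-vos/JPEGImages does not contain a video with ID 003234408d, so I suspect the dataset I downloaded is wrong. But I followed the "Prepare dataset for training and evaluation" instructions. I downloaded train.zip and test_all_frames.zip of YouTube-VOS 2018 (or the copies on Google Drive), unzipped them, and used the masks provided there. Did I get the wrong dataset?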

    opened by Lynchrocket 0
  • GPU is not working for prediction

    Hi, I ran into a problem when running prediction with E2FGVI-HQ. My CPU and memory are busy the whole time, but the GPU is not used at all. I have made sure CUDA is installed correctly, and the device selected by this code is cuda: device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    opened by tchen0623 0
  • Request for the visualization codes

    Hi, thanks for your wonderful work. I noticed that you visualize the feature maps of the local frames. Could you please provide the visualization code?

    In supp material: To further investigate the effectiveness of the feature propagation module, we visualize averaged local neighboring features with the temporal size of 5 before conducting content hallucination in Fig. 10.

    opened by sydney0zq 0
  • RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED

    RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED occurred when running test.py. The environment was installed according to issue #3, and the specific environment is shown in the attached image. What is causing this problem, and how can I fix it?

    opened by BloodLemonS 0
Owner
Media Computing Group @ Nankai University
Media Computing Group at Nankai University, led by Prof. Ming-Ming Cheng.