Official code for CVPR2022 paper: Depth-Aware Generative Adversarial Network for Talking Head Video Generation

Overview

📖 Depth-Aware Generative Adversarial Network for Talking Head Video Generation (CVPR 2022)

🔥 If DaGAN is helpful for your photos/projects, please help by starring ⭐ it or recommending it to your friends. Thanks! 🔥

[Paper]   [Project Page]   [Demo]   [Poster Video]

Fa-Ting Hong, Longhao Zhang, Li Shen, Dan Xu
The Hong Kong University of Science and Technology

Cartoon Sample

cartoon.mp4

Human Sample

celeb.mp4

Voxceleb1 Dataset

🚩 Updates

  • 🔥 🔥 ✅ May 19, 2022: The depth face model trained on Voxceleb2 is released! (The corresponding DaGAN checkpoint will be released soon.) Click the LINK

  • 🔥 🔥 ✅ April 25, 2022: Integrated into Hugging Face Spaces 🤗 using Gradio. Try out the web demo: Hugging Face Spaces (GPU version will come soon!)

  • 🔥 🔥 ✅ Added the SPADE model, which produces more natural results.

🔧 Dependencies and Installation

Installation

We now provide a clean version of DaGAN, which does not require customized CUDA extensions.

  1. Clone repo

    git clone https://github.com/harlanhong/CVPR2022-DaGAN.git
    cd CVPR2022-DaGAN
  2. Install dependent packages

    pip install -r requirements.txt
    
    ## Install the Face Alignment lib
    cd face-alignment
    pip install -r requirements.txt
    python setup.py install
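
As a quick sanity check (illustrative, not part of the official instructions), you can confirm that the face-alignment package and PyTorch import correctly and that a GPU is visible:

    python -c "import face_alignment, torch; print(face_alignment.__file__, torch.cuda.is_available())"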

⚡ Quick Inference

We take the paper version as an example. More models can be found here.

YAML configs

See config/vox-adv-256.yaml for a description of each parameter.
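
If you prefer to inspect the parameters programmatically, here is a minimal sketch using PyYAML (it only assumes the config is standard YAML; train_params is the section referenced in the Training section below):

python - <<'EOF'
import yaml
with open("config/vox-adv-256.yaml") as f:
    cfg = yaml.safe_load(f)
print(list(cfg.keys()))      # top-level sections of the config
print(cfg["train_params"])   # training hyper-parameters, e.g. the batch size
EOF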

Pre-trained checkpoint

The pre-trained checkpoint of the face depth network and our DaGAN checkpoints can be found at the following link: OneDrive.

Inference! To run a demo, download a checkpoint and run the following command:

CUDA_VISIBLE_DEVICES=0 python demo.py  --config config/vox-adv-256.yaml --driving_video path/to/driving --source_image path/to/source --checkpoint path/to/checkpoint --relative --adapt_scale --kp_num 15 --generator DepthAwareGenerator 

The result will be stored in result.mp4. The driving video and source image should be cropped before they can be used with our method. To obtain semi-automatic crop suggestions, you can use python crop-video.py --inp some_youtube_video.mp4, which generates crop commands using ffmpeg.
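
The generated commands are plain ffmpeg calls; an illustrative example of what such a command might look like is shown below (the timestamps and crop window are placeholders that the script derives from face detection, not fixed values):

    ffmpeg -i some_youtube_video.mp4 -ss 0:00:05 -t 0:00:10 -filter:v "crop=400:400:120:60, scale=256:256" crop.mp4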

💻 Training

Datasets

  1. VoxCeleb. Please follow the instructions at https://github.com/AliaksandrSiarohin/video-preprocessing.

Train on VoxCeleb

To train a model on a specific dataset, run:

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python -m torch.distributed.launch --master_addr="0.0.0.0" --master_port=12348 run.py --config config/vox-adv-256.yaml --name DaGAN --rgbd --batchsize 12 --kp_num 15 --generator DepthAwareGenerator

The code will create a folder in the log directory (each run creates a new name-specific directory). Checkpoints will be saved to this folder. To check the loss values during training, see log.txt. By default, the batch size is tuned to run on 8 GeForce RTX 3090 GPUs (you can obtain the best performance after about 150 epochs). You can change the batch size in train_params in the .yaml file.
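
For example, assuming the run directory under log/ begins with the --name you passed (DaGAN in the command above; the exact suffix is generated per run), you can follow the training loss from the terminal like this (illustrative):

    tail -f log/DaGAN*/log.txt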

🚩 Please use multiple GPUs to train your own model; if you use only one GPU, you will run into the in-place operation problem.

You can also monitor the training loss by running the following command:

tensorboard --logdir log/DaGAN/log

If you kill your process in the middle of training for some reason, a zombie process may remain; you can kill it using our provided tool:

python kill_port.py PORT

Training on your own dataset

  1. Resize all the videos to the same size, e.g. 256x256. The videos can be '.gif' or '.mp4' files, or a folder with images. We recommend the latter: for each video, make a separate folder with all the frames in '.png' format (see the ffmpeg sketch after this list). This format is lossless and has better I/O performance.

  2. Create a folder data/dataset_name with two subfolders, train and test; put training videos in train and testing videos in test.

  3. Create a config config/dataset_name.yaml; in dataset_params, specify the root directory as root_dir: data/dataset_name. Also adjust the number of epochs in train_params.
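
As referenced in step 1, here is a minimal sketch of turning one video into a per-frame PNG folder with ffmpeg (the file and folder names are placeholders for your own data):

    mkdir -p data/dataset_name/train/video_0001
    ffmpeg -i video_0001.mp4 -vf scale=256:256 data/dataset_name/train/video_0001/%07d.png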

📜 Acknowledgement

Our DaGAN implementation is inspired by FOMM. We appreciate the authors of FOMM for making their code publicly available.

📜 BibTeX

@inproceedings{hong2022depth,
  title={Depth-Aware Generative Adversarial Network for Talking Head Video Generation},
  author={Hong, Fa-Ting and Zhang, Longhao and Shen, Li and Xu, Dan},
  booktitle={IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2022}
}

📧 Contact

If you have any questions, please email [email protected].

Issues
  • The generated face remains the same pose

    Thanks for your good work; however, when I tried to run the demo, the generated video tends to keep the same pose as the source image, while in the paper (Figure 2) the generated results have the driving frame's pose (this is also the case for the results in the README). Why is this the case?

    https://user-images.githubusercontent.com/29053705/165462856-da97c242-b091-4609-b122-414c4216f492.mp4

    opened by hallwaypzh 3
  • Error while training on VoxCeleb

    Hi, I am trying to train DaGAN on VoxCeleb. The following error is occurring.

      File "run.py", line 144, in <module>
        train(config, generator, discriminator, kp_detector, opt.checkpoint, log_dir, dataset, opt.local_rank,device,opt,writer)
      File "/home/madhav3101/gan_codes/CVPR2022-DaGAN/train.py", line 66, in train
        losses_generator, generated = generator_full(x)
      File "/home/madhav3101/env_tf/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
        return forward_call(*input, **kwargs)
      File "/home/madhav3101/gan_codes/CVPR2022-DaGAN/modules/model.py", line 189, in forward
        kp_driving = self.kp_extractor(driving)
      File "/home/madhav3101/env_tf/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
        return forward_call(*input, **kwargs)
      File "/home/madhav3101/env_tf/lib/python3.7/site-packages/torch/nn/parallel/distributed.py", line 886, in forward
        output = self.module(*inputs[0], **kwargs[0])
      File "/home/madhav3101/env_tf/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
        return forward_call(*input, **kwargs)
      File "/home/madhav3101/gan_codes/CVPR2022-DaGAN/modules/keypoint_detector.py", line 51, in forward
        feature_map = self.predictor(x) #x bz,4,64,64
      File "/home/madhav3101/env_tf/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
        return forward_call(*input, **kwargs)
      File "/home/madhav3101/gan_codes/CVPR2022-DaGAN/modules/util.py", line 252, in forward
        return self.decoder(self.encoder(x))
      File "/home/madhav3101/env_tf/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
        return forward_call(*input, **kwargs)
      File "/home/madhav3101/gan_codes/CVPR2022-DaGAN/modules/util.py", line 178, in forward
        out = up_block(out)
      File "/home/madhav3101/env_tf/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
        return forward_call(*input, **kwargs)
      File "/home/madhav3101/gan_codes/CVPR2022-DaGAN/modules/util.py", line 92, in forward
        out = self.norm(out)
      File "/home/madhav3101/env_tf/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
        return forward_call(*input, **kwargs)
      File "/home/madhav3101/env_tf/lib/python3.7/site-packages/torch/nn/modules/batchnorm.py", line 745, in forward
        self.eps,
      File "/home/madhav3101/env_tf/lib/python3.7/site-packages/torch/nn/functional.py", line 2283, in batch_norm
        input, weight, bias, running_mean, running_var, training, momentum, eps, torch.backends.cudnn.enabled
     (function _print_stack)
    ^M  0%|          | 0/3965 [00:26<?, ?it/s]
    ^M  0%|          | 0/150 [00:26<?, ?it/s]
    
    Traceback (most recent call last):
      File "run.py", line 144, in <module>
        train(config, generator, discriminator, kp_detector, opt.checkpoint, log_dir, dataset, opt.local_rank,device,opt,writer)
      File "/home/madhav3101/gan_codes/CVPR2022-DaGAN/train.py", line 70, in train
        loss.backward()
      File "/home/madhav3101/env_tf/lib/python3.7/site-packages/torch/_tensor.py", line 307, in backward
        torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
      File "/home/madhav3101/env_tf/lib/python3.7/site-packages/torch/autograd/__init__.py", line 156, in backward
        allow_unreachable=True, accumulate_grad=True)  # allow_unreachable flag
    RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [32]] is at version 4; expected version 3 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!
    /home/madhav3101/env_tf/lib/python3.7/site-packages/torch/distributed/launch.py:186: FutureWarning: The module torch.distributed.launch is deprecated
    and will be removed in future. Use torchrun.
    Note that --use_env is set by default in torchrun.
    If your script expects `--local_rank` argument to be set, please
    change it to read from `os.environ['LOCAL_RANK']` instead. See
    https://pytorch.org/docs/stable/distributed.html#launch-utility for
    further instructions
    
      FutureWarning,
    ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 13113) of binary: /home/madhav3101/env_tf/bin/python
    Traceback (most recent call last):
      File "/home/madhav3101/miniconda3/lib/python3.7/runpy.py", line 193, in _run_module_as_main
        "__main__", mod_spec)
      File "/home/madhav3101/miniconda3/lib/python3.7/runpy.py", line 85, in _run_code
        exec(code, run_globals)
      File "/home/madhav3101/env_tf/lib/python3.7/site-packages/torch/distributed/launch.py", line 193, in <module>
        main()
      File "/home/madhav3101/env_tf/lib/python3.7/site-packages/torch/distributed/launch.py", line 189, in main
        launch(args)
      File "/home/madhav3101/env_tf/lib/python3.7/site-packages/torch/distributed/launch.py", line 174, in launch
        run(args)
      File "/home/madhav3101/env_tf/lib/python3.7/site-packages/torch/distributed/run.py", line 713, in run
        )(*cmd_args)
      File "/home/madhav3101/env_tf/lib/python3.7/site-packages/torch/distributed/launcher/api.py", line 131, in __call__
        return launch_agent(self._config, self._entrypoint, list(args))
      File "/home/madhav3101/env_tf/lib/python3.7/site-packages/torch/distributed/launcher/api.py", line 261, in launch_agent
        failures=result.failures,
    torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
    ============================================================
    run.py FAILED
    ------------------------------------------------------------
    Failures:
      <NO_OTHER_FAILURES>
    ------------------------------------------------------------
    Root Cause (first observed failure):
    [0]:
      time      : 2022-04-25_17:30:13
      host      : gnode90.local
      rank      : 0 (local_rank: 0)
      exitcode  : 1 (pid: 13113)
      error_file: <N/A>
      traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
    ============================================================
    
    
    opened by mdv3101 3
  • Error when training on my own dataset, did anyone have this problem before?

    [W python_anomaly_mode.cpp:104] Warning: Error detected in CudnnBatchNormBackward. Traceback of forward call that caused the error:
      File "run.py", line 144, in <module>
        train(config, generator, discriminator, kp_detector, opt.checkpoint, log_dir, dataset, opt.local_rank,device,opt,writer)
      File "/mnt/users/CVPR2022-DaGAN-master/train.py", line 66, in train
        losses_generator, generated = generator_full(x)

    Meanwhile there's another problem as well:

    Traceback (most recent call last):
      File "run.py", line 144, in <module>
        train(config, generator, discriminator, kp_detector, opt.checkpoint, log_dir, dataset, opt.local_rank,device,opt,writer)
      File "/mnt/users/CVPR2022-DaGAN-master/train.py", line 74, in train
        loss.backward()
      File "/home/anaconda3/envs/DaGAN/lib/python3.7/site-packages/torch/tensor.py", line 221, in backward
        torch.autograd.backward(self, gradient, retain_graph, create_graph)
    RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [32]] is at version 5; expected version 4 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!

    It seems an in-place problem happens, but I couldn't find any in-place code anywhere.

    opened by twilight0718 2
  • Hello, I was wondering what's wrong with my example; I followed the instructions but got a different result

    I followed the instructions and set my parameters as follows:

    CUDA_VISIBLE_DEVICES=7 python demo.py
    --config config/vox-adv-256.yaml --driving_video source/example.mp4 --source_image source/example.png --checkpoint download/SPADE_DaGAN_vox_adv_256.pth.tar --kp_num 15 --generator SPADEDepthAwareGenerator --result_video results/example_out.mp4 --relative --adapt_scale

    https://user-images.githubusercontent.com/37037808/169028484-142f9de3-acd6-45d6-950d-caacae9f593a.mp4

    Is there something wrong with my parameters?

    opened by twilight0718 2
  • Question regarding output of DepthAwareAttention

    In the DepthAwareAttention module, the inputs are the depth_image and the output feature map generated by the occlusion map (line 195).

    The depth_image is stored in 'source', while the output feature is stored in 'feat'.

    There is one variable, gamma (line 66), which is initialized as a zero tensor: self.gamma = nn.Parameter(torch.zeros(1))

    After doing all the operations in the forward pass, you get an output feature map. It is then multiplied by gamma and feat is added (line 87): out = self.gamma*out + feat

    That means all the operations performed during the forward pass are multiplied by zero and the original output features are returned. That makes the entire DepthAwareAttention useless, as the returned attention is also never used in the code.

    Can you please clarify on this?

    opened by mdv3101 2
  • Error when loading the SPADE model

    Nice work! But I have encountered a problem: when I load the SPADE model the same way I load the DaGAN model, the following error occurs.

      File "demo.py", line 191, in <module>
        generator, kp_detector = load_checkpoints(config_path=opt.config, checkpoint_path=opt.checkpoint, cpu=opt.cpu)
      File "demo.py", line 46, in load_checkpoints
        generator.load_state_dict(ckp_generator)
      File "/root/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1051, in load_state_dict
        raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
    RuntimeError: Error(s) in loading state_dict for DepthAwareGenerator:
       Missing key(s) in state_dict: "up_blocks.0.conv.weight", "up_blocks.0.conv.bias",...... "final.bias". 
       Unexpected key(s) in state_dict: "decoder.compress.weight", "decoder.compress.bias", ..."decoder.G_middle_0.norm_0.mlp_beta.weight"
    Any suggestions?
    opened by howardgriffin 2
  • Question about the Eqn.(9) and Fig.10

    Hi, thanks for sharing the good work. After reading the paper, I am somewhat confused about the attention process in Equation (9).

    1. How should we understand the physical meaning of the attention? The query feature comes from the source depth map, while the key and value features come from the warped source feature. Since the depth map has a different pose from the warped feature, and according to QKV attention the re-represented feature should have a spatial structure similar to the query (the depth map here), how is it guaranteed that the refined feature $F_g$ has the pose of the driving image?
    2. Intuitively, features at different positions may have different relations with features at other positions; in Fig. 10, it seems the attention maps from different positions are always similar (i.e., they all attend to the mouth and eyes). How should this be understood?
    opened by JialeTao 2
  • Error occurs when I change model to SPADE_DaGAN_vox_adv_256.pth.tar

    When I use DaGAN_vox_adv_256.pth.tar as my pretrained model, the result is not very good. Therefore I want to switch to SPADE_DaGAN_vox_adv_256.pth.tar, but the following error occurs:

    RuntimeError: Error(s) in loading state_dict for DepthAwareGenerator:
            Missing key(s) in state_dict: "up_blocks.0.conv.weight", "up_blocks.0.conv.bias", "up_blocks.0.norm.weight", "up_blocks.0.norm.bias", "up_blocks.0.norm.running_mean", "up_blocks.0.norm.running_var", "up_blocks.1.conv.weight", "up_blocks.1.conv.bias", "up_blocks.1.norm.weight", "up_blocks.1.norm.bias", "up_blocks.1.norm.running_mean", "up_blocks.1.norm.running_var", "bottleneck.r0.conv1.weight", "bottleneck.r0.conv1.bias", "bottleneck.r0.conv2.weight", "bottleneck.r0.conv2.bias", "bottleneck.r0.norm1.weight", "bottleneck.r0.norm1.bias", "bottleneck.r0.norm1.running_mean", "bottleneck.r0.norm1.running_var", "bottleneck.r0.norm2.weight", "bottleneck.r0.norm2.bias", "bottleneck.r0.norm2.running_mean", "bottleneck.r0.norm2.running_var", "bottleneck.r1.conv1.weight", "bottleneck.r1.conv1.bias", "bottleneck.r1.conv2.weight", "bottleneck.r1.conv2.bias", "bottleneck.r1.norm1.weight", "bottleneck.r1.norm1.bias", "bottleneck.r1.norm1.running_mean", "bottleneck.r1.norm1.running_var", "bottleneck.r1.norm2.weight", "bottleneck.r1.norm2.bias", "bottleneck.r1.norm2.running_mean", "bottleneck.r1.norm2.running_var", "bottleneck.r2.conv1.weight", "bottleneck.r2.conv1.bias", "bottleneck.r2.conv2.weight", "bottleneck.r2.conv2.bias", "bottleneck.r2.norm1.weight", "bottleneck.r2.norm1.bias", "bottleneck.r2.norm1.running_mean", "bottleneck.r2.norm1.running_var", "bottleneck.r2.norm2.weight", "bottleneck.r2.norm2.bias", "bottleneck.r2.norm2.running_mean", "bottleneck.r2.norm2.running_var", "bottleneck.r3.conv1.weight", "bottleneck.r3.conv1.bias", "bottleneck.r3.conv2.weight", "bottleneck.r3.conv2.bias", "bottleneck.r3.norm1.weight", "bottleneck.r3.norm1.bias", "bottleneck.r3.norm1.running_mean", "bottleneck.r3.norm1.running_var", "bottleneck.r3.norm2.weight", "bottleneck.r3.norm2.bias", "bottleneck.r3.norm2.running_mean", "bottleneck.r3.norm2.running_var", "bottleneck.r4.conv1.weight", "bottleneck.r4.conv1.bias", "bottleneck.r4.conv2.weight", "bottleneck.r4.conv2.bias", "bottleneck.r4.norm1.weight", "bottleneck.r4.norm1.bias", "bottleneck.r4.norm1.running_mean", "bottleneck.r4.norm1.running_var", "bottleneck.r4.norm2.weight", "bottleneck.r4.norm2.bias", "bottleneck.r4.norm2.running_mean", "bottleneck.r4.norm2.running_var", "bottleneck.r5.conv1.weight", "bottleneck.r5.conv1.bias", "bottleneck.r5.conv2.weight", "bottleneck.r5.conv2.bias", "bottleneck.r5.norm1.weight", "bottleneck.r5.norm1.bias", "bottleneck.r5.norm1.running_mean", "bottleneck.r5.norm1.running_var", "bottleneck.r5.norm2.weight", "bottleneck.r5.norm2.bias", "bottleneck.r5.norm2.running_mean", "bottleneck.r5.norm2.running_var", "final.weight", "final.bias". 
            Unexpected key(s) in state_dict: "decoder.compress.weight", "decoder.compress.bias", "decoder.fc.weight", "decoder.fc.bias", "decoder.G_middle_0.conv_0.bias", "decoder.G_middle_0.conv_0.weight_orig", "decoder.G_middle_0.conv_0.weight_u", "decoder.G_middle_0.conv_0.weight_v", "decoder.G_middle_0.conv_1.bias", "decoder.G_middle_0.conv_1.weight_orig", "decoder.G_middle_0.conv_1.weight_u", "decoder.G_middle_0.conv_1.weight_v", "decoder.G_middle_0.norm_0.mlp_shared.0.weight", "decoder.G_middle_0.norm_0.mlp_shared.0.bias", "decoder.G_middle_0.norm_0.mlp_gamma.weight", "decoder.G_middle_0.norm_0.mlp_gamma.bias", "decoder.G_middle_0.norm_0.mlp_beta.weight", "decoder.G_middle_0.norm_0.mlp_beta.bias", "decoder.G_middle_0.norm_1.mlp_shared.0.weight", "decoder.G_middle_0.norm_1.mlp_shared.0.bias", "decoder.G_middle_0.norm_1.mlp_gamma.weight", "decoder.G_middle_0.norm_1.mlp_gamma.bias", "decoder.G_middle_0.norm_1.mlp_beta.weight", "decoder.G_middle_0.norm_1.mlp_beta.bias", "decoder.G_middle_1.conv_0.bias", "decoder.G_middle_1.conv_0.weight_orig", "decoder.G_middle_1.conv_0.weight_u", "decoder.G_middle_1.conv_0.weight_v", "decoder.G_middle_1.conv_1.bias", "decoder.G_middle_1.conv_1.weight_orig", "decoder.G_middle_1.conv_1.weight_u", "decoder.G_middle_1.conv_1.weight_v", "decoder.G_middle_1.norm_0.mlp_shared.0.weight", "decoder.G_middle_1.norm_0.mlp_shared.0.bias", "decoder.G_middle_1.norm_0.mlp_gamma.weight", "decoder.G_middle_1.norm_0.mlp_gamma.bias", "decoder.G_middle_1.norm_0.mlp_beta.weight", "decoder.G_middle_1.norm_0.mlp_beta.bias", "decoder.G_middle_1.norm_1.mlp_shared.0.weight", "decoder.G_middle_1.norm_1.mlp_shared.0.bias", "decoder.G_middle_1.norm_1.mlp_gamma.weight", "decoder.G_middle_1.norm_1.mlp_gamma.bias", "decoder.G_middle_1.norm_1.mlp_beta.weight", "decoder.G_middle_1.norm_1.mlp_beta.bias", "decoder.G_middle_2.conv_0.bias", "decoder.G_middle_2.conv_0.weight_orig", "decoder.G_middle_2.conv_0.weight_u", "decoder.G_middle_2.conv_0.weight_v", "decoder.G_middle_2.conv_1.bias", "decoder.G_middle_2.conv_1.weight_orig", "decoder.G_middle_2.conv_1.weight_u", "decoder.G_middle_2.conv_1.weight_v", "decoder.G_middle_2.norm_0.mlp_shared.0.weight", "decoder.G_middle_2.norm_0.mlp_shared.0.bias", "decoder.G_middle_2.norm_0.mlp_gamma.weight", "decoder.G_middle_2.norm_0.mlp_gamma.bias", "decoder.G_middle_2.norm_0.mlp_beta.weight", "decoder.G_middle_2.norm_0.mlp_beta.bias", "decoder.G_middle_2.norm_1.mlp_shared.0.weight", "decoder.G_middle_2.norm_1.mlp_shared.0.bias", "decoder.G_middle_2.norm_1.mlp_gamma.weight", "decoder.G_middle_2.norm_1.mlp_gamma.bias", "decoder.G_middle_2.norm_1.mlp_beta.weight", "decoder.G_middle_2.norm_1.mlp_beta.bias", "decoder.up_0.conv_0.bias", "decoder.up_0.conv_0.weight_orig", "decoder.up_0.conv_0.weight_u", "decoder.up_0.conv_0.weight_v", "decoder.up_0.conv_1.bias", "decoder.up_0.conv_1.weight_orig", "decoder.up_0.conv_1.weight_u", "decoder.up_0.conv_1.weight_v", "decoder.up_0.conv_s.weight_orig", "decoder.up_0.conv_s.weight_u", "decoder.up_0.conv_s.weight_v", "decoder.up_0.norm_0.mlp_shared.0.weight", "decoder.up_0.norm_0.mlp_shared.0.bias", "decoder.up_0.norm_0.mlp_gamma.weight", "decoder.up_0.norm_0.mlp_gamma.bias", "decoder.up_0.norm_0.mlp_beta.weight", "decoder.up_0.norm_0.mlp_beta.bias", "decoder.up_0.norm_1.mlp_shared.0.weight", "decoder.up_0.norm_1.mlp_shared.0.bias", "decoder.up_0.norm_1.mlp_gamma.weight", "decoder.up_0.norm_1.mlp_gamma.bias", "decoder.up_0.norm_1.mlp_beta.weight", "decoder.up_0.norm_1.mlp_beta.bias", 
"decoder.up_0.norm_s.mlp_shared.0.weight", "decoder.up_0.norm_s.mlp_shared.0.bias", "decoder.up_0.norm_s.mlp_gamma.weight", "decoder.up_0.norm_s.mlp_gamma.bias", "decoder.up_0.norm_s.mlp_beta.weight", "decoder.up_0.norm_s.mlp_beta.bias", "decoder.up_1.conv_0.bias", "decoder.up_1.conv_0.weight_orig", "decoder.up_1.conv_0.weight_u", "decoder.up_1.conv_0.weight_v", "decoder.up_1.conv_1.bias", "decoder.up_1.conv_1.weight_orig", "decoder.up_1.conv_1.weight_u", "decoder.up_1.conv_1.weight_v", "decoder.up_1.conv_s.weight_orig", "decoder.up_1.conv_s.weight_u", "decoder.up_1.conv_s.weight_v", "decoder.up_1.norm_0.mlp_shared.0.weight", "decoder.up_1.norm_0.mlp_shared.0.bias", "decoder.up_1.norm_0.mlp_gamma.weight", "decoder.up_1.norm_0.mlp_gamma.bias", "decoder.up_1.norm_0.mlp_beta.weight", "decoder.up_1.norm_0.mlp_beta.bias", "decoder.up_1.norm_1.mlp_shared.0.weight", "decoder.up_1.norm_1.mlp_shared.0.bias", "decoder.up_1.norm_1.mlp_gamma.weight", "decoder.up_1.norm_1.mlp_gamma.bias", "decoder.up_1.norm_1.mlp_beta.weight", "decoder.up_1.norm_1.mlp_beta.bias", "decoder.up_1.norm_s.mlp_shared.0.weight", "decoder.up_1.norm_s.mlp_shared.0.bias", "decoder.up_1.norm_s.mlp_gamma.weight", "decoder.up_1.norm_s.mlp_gamma.bias", "decoder.up_1.norm_s.mlp_beta.weight", "decoder.up_1.norm_s.mlp_beta.bias", "decoder.conv_img.weight", "decoder.conv_img.bias". 
    

    It seems that those two pretrained models have different structures; should I change something in demo.py or vox-adv-256.yaml? Looking forward to your reply, thanks a lot!

    opened by twilight0718 1
  • Question about the background of images

    Thanks for this incredible work! I've looked at the demo gif on the project homepage, and I was wondering why the background moves with the head movement. Is there any way to disentangle the foreground and background?

    opened by KangweiiLiu 1
  • my fault

    Great work. But I got an error when trying the SPADE .pth.tar:

    RuntimeError: Error(s) in loading state_dict for DepthAwareGenerator: Missing key(s) in state_dict: "up_blocks.0.conv.weight", "up_blocks.0.conv.bias",.... Unexpected key(s) in state_dict: "decoder.compress.weight", "decoder.compress.bias",...

    opened by kakao014 0
  • Suggestion: Add automatic face cropping in demo.py

    The output result depends significantly on the input image. Here are a few samples:

    1. Photo as is
    2. Photo with manual crop
    3. Photo converted to video and cropped with crop-video.py

    Please crop the input image inside demo.py automatically.

    https://user-images.githubusercontent.com/84853762/169252080-db016d04-2f9c-4bb3-9d84-d5de0450d2a6.mp4

    https://user-images.githubusercontent.com/84853762/169252087-5a8c9a5a-0eeb-436f-874f-0745683e64b3.mp4

    https://user-images.githubusercontent.com/84853762/169252089-a8b37f66-897b-4092-a109-86295fecbf15.mp4

    opened by Vadim2S 1
  • Depth map from paper not reproducible

    Hi

    Firstly, thank you for this awesome work. However, I tried to reproduce the depth map from the paper using the "demo.py" script and the result is quite different from the one seen in Fig. 9 of the paper.

    Result from the paper: depthmapDaGAN paper

    Result running the script: depthmapDaGAN_myRun

    Corresponding depth map as pointcloud: depthmapDaGAN_myRunPCD

    The depth map looks much smoother, and facial details like the nose or mouth are completely lost.

    opened by mrokuss 2
  • Size of input

    Hello, thanks for your great work! I have a question: does your model support an input resolution higher than 256px, for example 512px? I see that in the code the input video and image are resized to 256px, which causes a loss of visual quality. Is there a way to use a 512x512 image/video without losing quality?

    opened by NikitaKononov 3
  • Did you train depth estimator yourself?

    In your paper, you mentioned that you need to learn a depth estimator first via self-supervision, but in the repo I didn't see the training part for this module. Do you plan to release the training code for that part in the future?

    opened by FusionLi 1