Video Frame Interpolation with Transformer (CVPR2022)

Last update: Dec 16, 2022

Overview

VFIformer

Official PyTorch implementation of our CVPR 2022 paper "Video Frame Interpolation with Transformer".

Dependencies

python >= 3.8
pytorch >= 1.8.0
torchvision >= 0.9.0
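
The minimum versions above can be checked before running anything else. The following is a minimal sketch, not part of this repo: the helper names `parse_version` and `check_environment` are hypothetical, and it assumes the packages are installed under the distribution names `torch` and `torchvision`.

```python
import re
import sys
from importlib import metadata

# Minimum versions taken from the dependency list above.
MINIMUMS = {"torch": "1.8.0", "torchvision": "0.9.0"}


def parse_version(text):
    """Turn a string such as '1.8.0+cu111' into a comparable tuple (1, 8, 0)."""
    parts = []
    for piece in text.split("+")[0].split(".")[:3]:
        match = re.match(r"\d+", piece)  # keep only the leading digits
        parts.append(int(match.group()) if match else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def check_environment():
    """Return {name: bool} telling whether each requirement is satisfied."""
    report = {"python": sys.version_info[:3] >= (3, 8, 0)}
    for package, minimum in MINIMUMS.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            report[package] = False  # not installed at all
            continue
        report[package] = parse_version(installed) >= parse_version(minimum)
    return report


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'ok' if ok else 'missing or too old'}")
```

The check only reads installed package metadata, so it does not import PyTorch or touch the GPU.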

Prepare Dataset

To train on Vimeo90K, first compute the ground-truth flows between frames using Lite-flownet. Clone the Lite-flownet repo, put the compute_flow_vimeo.py script we provide under its main directory, and run it (remember to change the data path):

```
python compute_flow_vimeo.py
```
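
Since the data path has to be edited by hand, it can help to confirm that it points at a complete dataset before computing flows. The sketch below is an illustration only. It assumes the layout of the public Vimeo90K triplet release (a `tri_trainlist.txt` split file and `sequences/<folder>/<clip>/im1.png` to `im3.png`); the function names are hypothetical, and the layout expected by compute_flow_vimeo.py may differ.

```python
from pathlib import Path

# Assumed frame names of the public Vimeo90K triplet release.
FRAME_NAMES = ("im1.png", "im2.png", "im3.png")


def list_triplets(data_root, list_name="tri_trainlist.txt"):
    """Return [(sequence_id, [frame paths])] for every entry in the split file."""
    root = Path(data_root)
    triplets = []
    for line in (root / list_name).read_text().splitlines():
        seq_id = line.strip()
        if not seq_id:
            continue  # skip blank lines in the split file
        seq_dir = root / "sequences" / seq_id
        triplets.append((seq_id, [seq_dir / name for name in FRAME_NAMES]))
    return triplets


def find_missing_frames(triplets):
    """Return the frame paths that are listed but do not exist on disk."""
    return [path for _, frames in triplets for path in frames if not path.is_file()]
```

For example, `find_missing_frames(list_triplets("/path/to/vimeo_triplet"))` returns an empty list when every listed frame is present.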

Get Started

Clone this repo.

```
git clone https://github.com/Jia-Research-Lab/VFIformer.git
cd VFIformer
```

Modify the argument --data_root in train.py according to your Vimeo90K path.

Evaluation

Download the pre-trained models and place them into the pretrained_models/ folder.
- Pre-trained models can be downloaded from Google Drive
  - pretrained_VFIformer: the final model in the main paper
  - pretrained_VFIformerSmall: the smaller version of the model mentioned in the supplementary file

Test on the Vimeo90K testing set.

Set the --data_root argument to your data path, then run:
```
python test.py --data_root [your Vimeo90K path] --testset VimeoDataset --net_name VFIformer --resume ./pretrained_models/pretrained_VFIformer/net_220.pth --save_result
```
To test with the smaller model, change --net_name and --resume accordingly:
```
python test.py --data_root [your Vimeo90K path] --testset VimeoDataset --net_name VFIformerSmall --resume ./pretrained_models/pretrained_VFIformerSmall/net_220.pth --save_result
```
The test results are saved in the test_results/ folder. If you do not want to save the image results, remove the --save_result argument from the commands.

Test on the MiddleBury dataset.

Set the --data_root argument to your data path, then run:

```
python test.py --data_root [your MiddleBury path] --testset MiddleburyDataset --net_name VFIformer --resume ./pretrained_models/pretrained_VFIformer/net_220.pth --save_result
```

Test on the UCF101 dataset.

Set the --data_root argument to your data path, then run:

```
python test.py --data_root [your UCF101 path] --testset UFC101Dataset --net_name VFIformer --resume ./pretrained_models/pretrained_VFIformer/net_220.pth --save_result
```

Test on the SNU-FILM dataset.

Set the --data_root argument to your data path, choose the motion level with the --test_level argument, then run:

```
python FILM_test.py --data_root [your SNU-FILM path] --test_level [easy/medium/hard/extreme] --net_name VFIformer --resume ./pretrained_models/pretrained_VFIformer/net_220.pth
```
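
The evaluation commands above differ only in the script, the --testset value, and the checkpoint, so they can be assembled programmatically when several datasets are evaluated in a row. This is a minimal sketch that uses only the flags and paths shown in this section. The dictionary keys and function names are hypothetical, and running the small model on datasets other than Vimeo90K is an assumption, since the commands above show it only for Vimeo90K.

```python
import shlex

# Checkpoint paths and --testset values copied from the commands in this section.
CHECKPOINTS = {
    "VFIformer": "./pretrained_models/pretrained_VFIformer/net_220.pth",
    "VFIformerSmall": "./pretrained_models/pretrained_VFIformerSmall/net_220.pth",
}
TESTSETS = {
    "vimeo90k": "VimeoDataset",
    "middlebury": "MiddleburyDataset",
    "ucf101": "UFC101Dataset",  # spelled exactly as in the command above
}
FILM_LEVELS = ("easy", "medium", "hard", "extreme")


def build_test_command(dataset, data_root, net_name="VFIformer", save_result=True):
    """Build the test.py argument list for one dataset."""
    cmd = ["python", "test.py", "--data_root", data_root,
           "--testset", TESTSETS[dataset], "--net_name", net_name,
           "--resume", CHECKPOINTS[net_name]]
    if save_result:
        cmd.append("--save_result")
    return cmd


def build_film_command(data_root, test_level, net_name="VFIformer"):
    """Build the FILM_test.py argument list for one SNU-FILM motion level."""
    if test_level not in FILM_LEVELS:
        raise ValueError(f"test_level must be one of {FILM_LEVELS}")
    return ["python", "FILM_test.py", "--data_root", data_root,
            "--test_level", test_level, "--net_name", net_name,
            "--resume", CHECKPOINTS[net_name]]


if __name__ == "__main__":
    # Print the command instead of running it; the data path is a placeholder.
    print(shlex.join(build_test_command("vimeo90k", "/path/to/vimeo_triplet")))
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` from the repo root once the placeholder data paths are replaced.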

Training

First train the flow estimator. (Skipping this step does not significantly affect performance. We keep it here only to be consistent with our paper.)

```
python -m torch.distributed.launch --nproc_per_node=4 --master_port=4174 train.py --launcher pytorch --gpu_ids 0,1,2,3 \
        --loss_flow --use_tb_logger --batch_size 48 --net_name IFNet --name train_IFNet --max_iter 300 --crop_size 192 --save_epoch_freq 5
```

Then train the whole framework.

```
python -m torch.distributed.launch --nproc_per_node=8 --master_port=4175 train.py --launcher pytorch --gpu_ids 0,1,2,3,4,5,6,7 \
        --loss_l1 --loss_ter --loss_flow --use_tb_logger --batch_size 24 --net_name VFIformer --name train_VFIformer --max_iter 300 \
        --crop_size 192 --save_epoch_freq 5 --resume_flownet ./weights/train_IFNet/snapshot/net_final.pth
```

To train the smaller version, run:

```
python -m torch.distributed.launch --nproc_per_node=8 --master_port=4175 train.py --launcher pytorch --gpu_ids 0,1,2,3,4,5,6,7 \
        --loss_l1 --loss_ter --loss_flow --use_tb_logger --batch_size 24 --net_name VFIformerSmall --name train_VFIformerSmall --max_iter 300 \
        --crop_size 192 --save_epoch_freq 5 --resume_flownet ./weights/train_IFNet/snapshot/net_final.pth
```
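
In the launch commands above, --nproc_per_node always matches the number of entries in --gpu_ids. When adapting them to a different machine, a small helper can keep the two in sync. This is a hedged sketch: `build_launch_command` is a hypothetical name, and the argument lists are copied from the commands above. This README does not say whether --batch_size is per GPU or global, so the batch size may need adjusting when the GPU count changes.

```python
import shlex


def build_launch_command(gpu_ids, master_port, train_args):
    """Assemble a torch.distributed.launch command for train.py."""
    gpu_ids = list(gpu_ids)
    if not gpu_ids:
        raise ValueError("at least one GPU id is required")
    return [
        "python", "-m", "torch.distributed.launch",
        f"--nproc_per_node={len(gpu_ids)}",  # one process per listed GPU
        f"--master_port={master_port}",
        "train.py", "--launcher", "pytorch",
        "--gpu_ids", ",".join(str(g) for g in gpu_ids),
        *train_args,
    ]


# Stage 1: flow estimator (arguments copied from the first command above).
FLOW_STAGE_ARGS = [
    "--loss_flow", "--use_tb_logger", "--batch_size", "48",
    "--net_name", "IFNet", "--name", "train_IFNet",
    "--max_iter", "300", "--crop_size", "192", "--save_epoch_freq", "5",
]

# Stage 2: whole framework, resuming from the stage 1 flow network.
FULL_STAGE_ARGS = [
    "--loss_l1", "--loss_ter", "--loss_flow", "--use_tb_logger",
    "--batch_size", "24", "--net_name", "VFIformer", "--name", "train_VFIformer",
    "--max_iter", "300", "--crop_size", "192", "--save_epoch_freq", "5",
    "--resume_flownet", "./weights/train_IFNet/snapshot/net_final.pth",
]

if __name__ == "__main__":
    print(shlex.join(build_launch_command(range(4), 4174, FLOW_STAGE_ARGS)))
    print(shlex.join(build_launch_command(range(8), 4175, FULL_STAGE_ARGS)))
```

Run with the default values, the script prints the same two commands as above, each on a single line.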

Test on your own data

Set the --img0_path and --img1_path arguments to your image paths, then run:

```
python demo.py --img0_path [your img0 path] --img1_path [your img1 path] --save_folder [your save path] --net_name VFIformer --resume ./pretrained_models/pretrained_VFIformer/net_220.pth
```
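
The demo command interpolates between one pair of images. To process a folder of extracted video frames, the command can be generated for every consecutive pair. The sketch below is an illustration under stated assumptions: the function names are hypothetical, frame files are assumed to sort into temporal order by name (for example zero-padded numbers), and each pair gets its own save folder because the output file name written by demo.py is not documented here.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}
DEFAULT_CHECKPOINT = "./pretrained_models/pretrained_VFIformer/net_220.pth"


def consecutive_pairs(frame_dir):
    """Return [(frame_i, frame_i+1)] for the image files in name order."""
    frames = sorted(
        path for path in Path(frame_dir).iterdir()
        if path.suffix.lower() in IMAGE_SUFFIXES
    )
    return list(zip(frames, frames[1:]))


def build_demo_commands(frame_dir, save_root, net_name="VFIformer",
                        checkpoint=DEFAULT_CHECKPOINT):
    """Build one demo.py argument list per consecutive frame pair."""
    commands = []
    for index, (img0, img1) in enumerate(consecutive_pairs(frame_dir)):
        # Separate folder per pair so outputs cannot overwrite each other.
        save_folder = Path(save_root) / f"pair_{index:05d}"
        commands.append([
            "python", "demo.py",
            "--img0_path", str(img0), "--img1_path", str(img1),
            "--save_folder", str(save_folder),
            "--net_name", net_name, "--resume", checkpoint,
        ])
    return commands
```

Each command can then be executed from the repo root, for example with `subprocess.run(cmd, check=True)`.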

Acknowledgement

We borrow some code from RIFE and SwinIR. We thank the authors for their great work.

Citation

Please consider citing our paper in your publications if it is useful for your research.

@inproceedings{lu2022vfiformer,
    title={Video Frame Interpolation with Transformer},
    author={Lu, Liying and Wu, Ruizheng and Lin, Huaijia and Lu, Jiangbo and Jia, Jiaya},
    booktitle={IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
    year={2022},
}

Contact

[email protected]


Owner

DV Lab
