CFNet: Cascade and Fused Cost Volume for Robust Stereo Matching (CVPR 2021)


This is the implementation of the paper CFNet: Cascade and Fused Cost Volume for Robust Stereo Matching, CVPR 2021, by Zhelun Shen, Yuchao Dai, and Zhibo Rao [arXiv].

Our method also obtained 1st place on the stereo task of the Robust Vision Challenge 2020.

The camera-ready version and supplementary materials can be found on the [CVPR official website].

Code has been released.

Abstract

Recently, the ever-increasing capacity of large-scale annotated datasets has led to profound progress in stereo matching. However, most of these successes are limited to a specific dataset and cannot generalize well to other datasets. The main difficulties lie in the large domain differences and unbalanced disparity distribution across a variety of datasets, which greatly limit the real-world applicability of current deep stereo matching models. In this paper, we propose CFNet, a Cascade and Fused cost volume based network to improve the robustness of the stereo matching network. First, we propose a fused cost volume representation to deal with the large domain difference. By fusing multiple low-resolution dense cost volumes to enlarge the receptive field, we can extract robust structural representations for initial disparity estimation. Second, we propose a cascade cost volume representation to alleviate the unbalanced disparity distribution. Specifically, we employ a variance-based uncertainty estimation to adaptively adjust the next stage's disparity search space, in this way driving the network to progressively prune out the space of unlikely correspondences. By iteratively narrowing down the disparity search space and improving the cost volume resolution, the disparity estimation is gradually refined in a coarse-to-fine manner. When trained on the same training images and evaluated on the KITTI, ETH3D, and Middlebury datasets with fixed model parameters and hyperparameters, our proposed method achieves state-of-the-art overall performance and obtains 1st place on the stereo task of Robust Vision Challenge 2020.
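For intuition, the cascade stage can be summarized as follows: each stage turns its cost volume into a per-pixel disparity distribution, and the mean and variance of that distribution define the search range of the next stage. The snippet below is a minimal sketch of this idea, not the released implementation; in particular, the paper uses learnable normalization for the range offsets, which are replaced here by fixed `alpha`/`beta` values for brevity.

```python
import torch
import torch.nn.functional as F

def next_stage_search_range(cost_volume, disparity_values, alpha=1.0, beta=1.0):
    """Sketch of variance-based disparity range narrowing.

    cost_volume:      [B, D, H, W] matching costs of the current stage
    disparity_values: [D] candidate disparities of the current stage
    alpha, beta:      illustrative offsets (learned per stage in the paper)
    """
    prob = F.softmax(cost_volume, dim=1)                        # per-pixel distribution
    disp = disparity_values.view(1, -1, 1, 1)
    d_hat = (prob * disp).sum(dim=1, keepdim=True)              # expected disparity
    var = (prob * (disp - d_hat) ** 2).sum(dim=1, keepdim=True) # uncertainty
    sigma = var.sqrt()
    d_min = d_hat - alpha * sigma - beta                        # next-stage lower bound
    d_max = d_hat + alpha * sigma + beta                        # next-stage upper bound
    return d_hat, d_min, d_max
```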

How to use

Environment

  • Python 3.7.4
  • PyTorch == 1.1.0
  • NumPy == 1.15

Data Preparation

Download Scene Flow Datasets, KITTI 2012, KITTI 2015, ETH3D, Middlebury

KITTI 2015/2012 and Scene Flow

Please place the datasets as described in "./filenames", i.e., "./filenames/sceneflow_train.txt", "./filenames/sceneflow_test.txt", and "./filenames/kitticombine.txt".
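If you need to build your own list, the sketch below shows one way to generate such a file. It assumes a space-separated "left right disparity" layout per line (the convention used by GWCNet-style data loaders); please check the provided files in ./filenames for the exact format expected here.

```python
def write_filelist(pairs, out_txt):
    """Hypothetical helper: write one 'left right disparity' triple per line.

    `pairs` is an iterable of (left_path, right_path, disp_path) tuples,
    all given relative to the dataset root passed via DATAPATH.
    """
    with open(out_txt, "w") as f:
        for left, right, disp in pairs:
            f.write(f"{left} {right} {disp}\n")

# Illustrative usage (paths are placeholders, not actual Scene Flow paths):
# write_filelist([("frames/left/0001.png", "frames/right/0001.png",
#                  "disparity/left/0001.pfm")], "./filenames/my_train.txt")
```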

Middlebury/ETH3D

Our folder structure is as follows:

dataset
├── KITTI2015
├── KITTI2012
├── Middlebury
│   ├── Adirondack
│   │   ├── im0.png
│   │   ├── im1.png
│   │   └── disp0GT.pfm
└── ETH3D
    ├── delivery_area_1l
    │   ├── im0.png
    │   ├── im1.png
    │   └── disp0GT.pfm

Note that we use the full-resolution Middlebury images for training, as the additional training images do not have a half-resolution version; the input images are down-sampled to half resolution during data augmentation. In contrast, we use the half-resolution images and full-resolution disparity maps of Middlebury for testing.
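As a rough illustration of that convention, the sketch below shows one way to down-sample a full-resolution Middlebury image to half resolution. The resampling filter is an assumption; the repository's own augmentation pipeline may differ, and note that ground-truth disparity values would also need to be scaled by the same factor.

```python
from PIL import Image

def to_half_resolution(image_path, out_path):
    """Down-sample a full-resolution Middlebury image to half resolution.

    Bilinear resampling is assumed here for illustration only.
    """
    img = Image.open(image_path)
    half = img.resize((img.width // 2, img.height // 2), Image.BILINEAR)
    half.save(out_path)

# to_half_resolution("dataset/Middlebury/Adirondack/im0.png",
#                    "dataset/Middlebury/Adirondack/im0_half.png")
```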

Training

Scene Flow Datasets Pretraining

Run the script ./scripts/sceneflow.sh to pre-train on the Scene Flow datasets. Please update DATAPATH in the bash file to your training data path.

To reproduce our pretraining details, you may need to replace the Mish activation function with ReLU. Samples are shown in ./models/relu/.
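For reference, Mish is x * tanh(softplus(x)); switching to ReLU simply means swapping the activation module used throughout the network (the modules to edit live under ./models/). A minimal sketch:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Mish(nn.Module):
    """Mish activation: x * tanh(softplus(x))."""
    def forward(self, x):
        return x * torch.tanh(F.softplus(x))

# To match the ReLU-based pretraining variant, the activation is swapped, e.g.:
activation = nn.ReLU(inplace=True)   # instead of Mish()
```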

Finetuning

Run the script ./scripts/robust.sh to jointly finetune the pre-trained model on four datasets, i.e., KITTI 2015, KITTI 2012, ETH3D, and Middlebury. Please update DATAPATH and --loadckpt to your training data path and the pretrained Scene Flow checkpoint file, respectively.

Evaluation

Joint Generalization

Run the scripts ./scripts/eth3d_save.sh, ./scripts/mid_save.sh, and ./scripts/kitti15_save.sh to save PNG predictions on the test sets of the ETH3D, Middlebury, and KITTI 2015 datasets. Note that you may need to update the storage path in save_disp.py, i.e., fn = os.path.join("/home3/raozhibo/jack/shenzhelun/cfnet/pre_picture/", fn.split('/')[-2]).
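As a rough sketch of the final saving step (the released save_disp.py may differ in its details), a predicted disparity map can be written as a 16-bit PNG; scaling by 256 follows the KITTI submission convention, while benchmarks that expect .pfm files would use their own writers instead.

```python
import numpy as np
from PIL import Image

def save_disparity_png(disp, out_path):
    """Save a predicted disparity map (float32, in pixels) as a 16-bit PNG.

    The factor of 256 follows the KITTI submission convention; this is an
    illustrative helper, not the repository's save_disp.py.
    """
    disp_uint16 = np.clip(disp * 256.0, 0, 65535).astype(np.uint16)
    Image.fromarray(disp_uint16).save(out_path)
```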

Cross-domain Generalization

Run the script ./scripts/robust_test.sh to test the cross-domain generalization of the model (Table 3 of the main paper). Please update --loadckpt to the pretrained Scene Flow checkpoint file.

Pretrained Models

Pretraining Model: you can use this checkpoint to reproduce the results we reported in Table 3 of the main paper.

Finetuning Model: you can use this checkpoint to reproduce the results we reported in the stereo task of Robust Vision Challenge 2020.
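The evaluation scripts load a checkpoint via --loadckpt; the sketch below shows one way a downloaded checkpoint could be restored for evaluation. It assumes the weights are stored under a "model" key (as in GWCNet-style training loops); if the file is a bare state_dict, it is loaded directly.

```python
import torch

def load_pretrained(model, ckpt_path):
    """Load a pretrained checkpoint for evaluation (illustrative sketch).

    Assumes either a {'model': state_dict, ...} layout or a bare state_dict.
    """
    state = torch.load(ckpt_path, map_location="cpu")
    model.load_state_dict(state.get("model", state) if isinstance(state, dict) else state)
    model.eval()
    return model
```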

Citation

If you find this code useful in your research, please cite:

@InProceedings{Shen_2021_CVPR,
    author    = {Shen, Zhelun and Dai, Yuchao and Rao, Zhibo},
    title     = {CFNet: Cascade and Fused Cost Volume for Robust Stereo Matching},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2021},
    pages     = {13906-13915}
}

Acknowledgements

Thanks to the excellent work GWCNet, DeepPruner, and HSMNet. Our work is inspired by these works, and parts of our code are migrated from GWCNet, DeepPruner, and HSMNet.
