
🎵PolyphonicFormer: Unified Query Learning for Depth-aware Video Panoptic Segmentation

PolyphonicFormer is the winning method of the ICCV 2021 SemKITTI-DVPS Challenge.

PolyphonicFormer was accepted to ECCV 2022, Tel Aviv, Israel.


Haobo Yuan*, Xiangtai Li*, Yibo Yang, Guangliang Cheng, Jing Zhang, Yunhai Tong, Lefei Zhang, Dacheng Tao.

[pdf] [supp] [arxiv] [code] [poster]

Demo

(Demo GIFs: Demo1, Demo2)

Installation (Optional)

You do not need to set up the environment manually if Docker is available on your machine; we provide a pre-built image on Docker Hub. If you want to build the Docker image yourself, run the following command in scripts/docker_env:

docker build -t polyphonicformer:release . --network=host

If you insist on using conda instead, please refer to the Dockerfile for the environment details.
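
If you would rather pull the pre-built image than build it, something along these lines should work. Note that harboryuan/polyphonicformer:release is an assumed repository name; check Docker Hub for the exact image before pulling.

# assumed Docker Hub name; verify before pulling
docker pull harboryuan/polyphonicformer:release
# retag locally to match the tag used by the scripts below
docker tag harboryuan/polyphonicformer:release polyphonicformer:release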

Dataset Preparation

You can download the Cityscapes-DVPS dataset here and the SemKITTI-DVPS dataset here. Suppose your path to the datasets is DATALOC; extract the zip files and make sure the dataset folder looks like this:

DATALOC
├── cityscapes-dvps
│   ├── video_sequence
│   │   ├── train
│   │   │   ├── 000000_000000_munster_000105_000004_leftImg8bit.png
│   │   │   ├── 000000_000000_munster_000105_000004_gtFine_instanceTrainIds.png
│   │   │   ├── 000000_000000_munster_000105_000004_depth.png
│   │   │   ├── ...
│   │   ├── val
│   │   │   ├── ...
├── semkitti-dvps
│   ├── video_sequence
│   │   ├── train
│   │   │   ├── 000000_000000_leftImg8bit.png
│   │   │   ├── 000000_000000_gtFine_class.png
│   │   │   ├── 000000_000000_gtFine_instance.png
│   │   │   ├── 000000_000000_depth_718.8560180664062.png
│   │   │   ├── ...
│   │   ├── val
│   │   │   ├── ...

Please note that the Cityscapes-DVPS and SemKITTI-DVPS datasets were created by the authors of ViP-DeepLab.
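
As a quick sanity check after downloading, a snippet along these lines extracts the archives and verifies the expected layout; the zip file names are placeholders for whatever the download links actually provide.

DATALOC=/path/to/datasets
mkdir -p "$DATALOC"
# archive names below are placeholders; use the files you downloaded
unzip cityscapes-dvps.zip -d "$DATALOC"
unzip semkitti-dvps.zip -d "$DATALOC"
# both datasets should expose video_sequence/train and video_sequence/val
ls "$DATALOC"/cityscapes-dvps/video_sequence/train | head -n 3
ls "$DATALOC"/semkitti-dvps/video_sequence/train | head -n 3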

Docker Container

After you have prepared the datasets, you can create and enter a Docker container:

DATALOC={/path/to/datafolder} LOGLOC={/path/to/logfolder} bash tools/docker.sh

DATALOC will be mounted at data in the project folder, and LOGLOC will be mounted at /opt/logger.
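
Once inside the container, a minimal check (assuming the links described above) confirms both mounts are in place:

# the data link should contain the two dataset folders
ls data            # expect: cityscapes-dvps  semkitti-dvps
# /opt/logger is where work-dirs, logs, and checkpoints will be written
ls /opt/logger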

Getting Started

Let's get the code 🏃‍♀️ running.

Image training

bash tools/dist_train.sh configs/polyphonic_image/poly_r50_cityscapes_2x.py 8 --seed 0 --work-dir /opt/logger/exp001
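
The training entry point follows the usual mmdetection conventions, so an interrupted run can typically be resumed from the latest checkpoint; the --resume-from flag below is an assumption based on the underlying mmdetection tooling rather than something this repository documents.

# assumed mmdetection-style resume; adjust if the local script differs
bash tools/dist_train.sh configs/polyphonic_image/poly_r50_cityscapes_2x.py 8 --seed 0 --work-dir /opt/logger/exp001 --resume-from /opt/logger/exp001/latest.pth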

Image testing

bash tools/dist_test.sh configs/polyphonic_image/poly_r50_cityscapes_2x.py https://huggingface.co/HarborYuan/PolyphonicFormer/resolve/main/polyphonic_r50_image.pth 8
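
The checkpoint URL can also be downloaded once and reused from disk, which avoids re-fetching it on every run (the /opt/logger destination here is just a convenient mounted path, not a requirement):

# download the released image checkpoint once, then test from the local copy
wget https://huggingface.co/HarborYuan/PolyphonicFormer/resolve/main/polyphonic_r50_image.pth -P /opt/logger
bash tools/dist_test.sh configs/polyphonic_image/poly_r50_cityscapes_2x.py /opt/logger/polyphonic_r50_image.pth 8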

Video training

bash tools/dist_train.sh configs/polyphonic_video/poly_r50_cityscapes_1x.py 8 --seed 0 --work-dir /opt/logger/vid001 --no-validate

Video testing

PYTHONPATH=. python tools/test_video.py configs/polyphonic_video/poly_r50_cityscapes_1x.py https://huggingface.co/HarborYuan/PolyphonicFormer/resolve/main/polyphonic_r50_video.pth --eval-video DVPQ --video-dir ./tmp

To test your own training results, simply replace the online checkpoint with your local checkpoint. For example, for video testing you can run:

PYTHONPATH=. python tools/test_video.py configs/polyphonic_video/poly_r50_cityscapes_1x.py /path/to/checkpoint.pth --eval-video DVPQ --video-dir ./tmp

Acknowledgements

The image segmentation model is based on K-Net, and the datasets are extracted from ViP-DeepLab. Please cite them if you find them useful:

@article{zhang2021k,
  title={K-Net: Towards Unified Image Segmentation},
  author={Zhang, Wenwei and Pang, Jiangmiao and Chen, Kai and Loy, Chen Change},
  journal={NeurIPS},
  year={2021}
}
@inproceedings{qiao2021vip,
  title={ViP-DeepLab: Learning Visual Perception with Depth-aware Video Panoptic Segmentation},
  author={Qiao, Siyuan and Zhu, Yukun and Adam, Hartwig and Yuille, Alan and Chen, Liang-Chieh},
  booktitle={CVPR},
  year={2021}
}

Citation

If you find the code useful in your research, please consider citing PolyphonicFormer:

@inproceedings{yuan2022polyphonicformer,
  title={Polyphonicformer: Unified Query Learning for Depth-aware Video Panoptic Segmentation},
  author={Yuan, Haobo and Li, Xiangtai and Yang, Yibo and Cheng, Guangliang and Zhang, Jing and Tong, Yunhai and Zhang, Lefei and Tao, Dacheng},
  booktitle={ECCV},
  year={2022},
}
