TPH-YOLOv5: Improved YOLOv5 Based on Transformer Prediction Head for Object Detection on Drone-Captured Scenarios

Last update: Dec 22, 2022

Related tags

Overview

TPH-YOLOv5

This repo is the implementation of "TPH-YOLOv5: Improved YOLOv5 Based on Transformer Prediction Head for Object Detection on Drone-Captured Scenarios".
On VisDrone Challenge 2021, TPH-YOLOv5 wins 4th place and achieves well-matched results with 1st place model.
You can get VisDrone-DET2021: The Vision Meets Drone Object Detection Challenge Results for more information.

Install

$ git clone https://github.com/cv516Buaa/tph-yolov5
$ cd tph-yolov5
$ pip install -r requirements.txt

Convert labels

VisDrone2YOLO_lable.py transfer VisDrone annotiations to yolo labels.
You should set the path of VisDrone dataset in VisDrone2YOLO_lable.py first.

$ python VisDrone2YOLO_lable.py

Inference

Datasets : VisDrone
Weights :
- yolov5l-xs-1.pt: | Baidu Drive(pw: vibe). | Google Drive |
- yolov5l-xs-2.pt: | Baidu Drive(pw: vffz). | Google Drive |

val.py runs inference on VisDrone2019-DET-val, using weights trained with TPH-YOLOv5.
(We provide two weights trained by two different models based on YOLOv5l.)

$ python val.py --weights ./weights/yolov5l-xs-1.pt --img 1996 --data ./data/VisDrone.yaml
                                    yolov5l-xs-2.pt
--augment --save-txt  --save-conf --task val --batch-size 8 --verbose --name v5l-xs

Ensemble

If you inference dataset with different models, then you can ensemble the result by weighted boxes fusion using wbf.py.
You should set img path and txt path in wbf.py.

$ python wbf.py

Train

train.py allows you to train new model from strach.

$ python train.py --img 1536 --batch 2 --epochs 80 --data ./data/VisDrone.yaml --weights yolov5l.pt --hy data/hyps/hyp.VisDrone.yaml --cfg models/yolov5l-xs-tr-cbam-spp-bifpn.yaml --name v5l-xs

Description of TPH-yolov5 and citation

If you have any question, please discuss with me by sending email to [email protected]
If you find this code useful please cite:

@inproceedings{zhu2021tph,
  title={TPH-YOLOv5: Improved YOLOv5 Based on Transformer Prediction Head for Object Detection on Drone-captured Scenarios},
  author={Zhu, Xingkui and Lyu, Shuchang and Wang, Xu and Zhao, Qi},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages={2778--2788},
  year={2021}
}

References

Thanks to their great works

TPH-YOLOv5: Improved YOLOv5 Based on Transformer Prediction Head for Object Detection on Drone-Captured Scenarios

Related tags

Overview

TPH-YOLOv5

Install

Convert labels

Inference

Ensemble

Train

Description of TPH-yolov5 and citation

References

Owner

cv516Buaa

Code and real data for the paper "Counterfactual Temporal Point Processes", available at arXiv.

Official implementation of "MetaSDF: Meta-learning Signed Distance Functions"

Spectral normalization (SN) is a widely-used technique for improving the stability and sample quality of Generative Adversarial Networks (GANs)

Semantic Edge Detection with Diverse Deep Supervision

TensorFlow GNN is a library to build Graph Neural Networks on the TensorFlow platform.

mlpack: a scalable C++ machine learning library --

This repository contains a pytorch implementation of "HeadNeRF: A Real-time NeRF-based Parametric Head Model (CVPR 2022)".

Guided Internet-delivered Cognitive Behavioral Therapy Adherence Forecasting

PyTorch implementation for our NeurIPS 2021 Spotlight paper "Long Short-Term Transformer for Online Action Detection".

Machine Learning with JAX Tutorials

Semantic segmentation models, datasets and losses implemented in PyTorch.

[PNAS2021] The neural architecture of language: Integrative modeling converges on predictive processing

A pytorch &keras implementation and demo of Fastformer.

Generate fine-tuning samples & Fine-tuning the model & Generate samples by transferring Note On

Deepparse is a state-of-the-art library for parsing multinational street addresses using deep learning

Emblaze - Interactive Embedding Comparison

An implementation of Geoffrey Hinton's paper "How to represent part-whole hierarchies in a neural network" in Pytorch.

Dynamic Multi-scale Filters for Semantic Segmentation (DMNet ICCV'2019)

Code for the ICML 2021 paper: "ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision"

SSL_SLAM2: Lightweight 3-D Localization and Mapping for Solid-State LiDAR (mapping and localization separated) ICRA 2021