YOLOv4 CrowdHuman Tutorial

This is a tutorial demonstrating how to train a YOLOv4 people detector using Darknet and the CrowdHuman dataset.

Table of contents

  • Setup
    • Preparing training data
  • Training on a local PC
  • Testing the custom-trained yolov4 model
  • Training on Google Colab
  • Deploying onto Jetson Nano
  • Contributions

Setup

If you are going to train the model on Google Colab, you could skip this section and jump straight to Training on Google Colab.

Otherwise, to run training locally, you need an x86_64 PC with a decent GPU. For example, I mainly test the code in this repository using a desktop PC with:

  • NVIDIA GeForce RTX 2080 Ti
  • Ubuntu 18.04.5 LTS (x86_64)
    • CUDA 10.2
    • cuDNN 8.0.1

In addition, you should have OpenCV (including the python3 "cv2" module) installed properly on the local PC, since both the data preparation code and "darknet" require it.
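A quick sanity check for the OpenCV installation (this only verifies that the "cv2" module imports; the version shown in the comment is just an example):

# Verify that the python3 "cv2" module is importable
import cv2
print(cv2.__version__)   # e.g. "4.1.1"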

Preparing training data

For training on the local PC, I use a "608x608" yolov4 model as an example. Note that I use python3 exclusively in this tutorial (python2 might not work). Follow these steps to prepare the "CrowdHuman" dataset for training the yolov4 model.

  1. Clone this repository.

    $ cd ${HOME}/project
    $ git clone https://github.com/jkjung-avt/yolov4_crowdhuman
  2. Run the "prepare_data.sh" script in the "data/" subdirectory. It downloads the "CrowdHuman" dataset, unzips the train/val image files, and generates the YOLO txt files necessary for training. You could refer to data/README.md for more information about the dataset, and to How to train (to detect your custom objects) for an explanation of YOLO txt files.

    $ cd ${HOME}/project/yolov4_crowdhuman/data
    $ ./prepare_data.sh 608x608

    This step could take quite a while, depending on your internet speed. When it is done, all image files and ".txt" files for training would be in the "data/crowdhuman-608x608/" subdirectory. (If interested, you could run python3 verify_txts.py 608x608 to verify the generated txt files.)

    This tutorial is for training the yolov4 model to detect 2 classes of objects: "head" (0) and "person" (1), where the "person" class corresponds to "full body" (including occluded body portions) in the original "CrowdHuman" annotations. Take a look at "data/crowdhuman-608x608.data", "data/crowdhuman.names", and "data/crowdhuman-608x608/" to gain a better understanding of the data files that have been generated/prepared for the training. (A simplified sketch of the annotation conversion follows after this step.)

    A sample jpg from the CrowdHuman dataset
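    To make the YOLO txt format concrete, here is a simplified sketch of the conversion: each line of a CrowdHuman ".odgt" annotation file is a JSON record, and each "hbox" (head) or "fbox" (full body) box becomes one normalized line in the corresponding YOLO txt file. This is only an illustration of the format; the actual conversion is done by the scripts under "data/", which also handle details (ignore regions, box clipping, etc.) omitted here.

    import json

    def crowdhuman_to_yolo_lines(odgt_line, img_w, img_h):
        """Convert one CrowdHuman .odgt record into YOLO txt lines.

        Class 0 = "head" (from "hbox"), class 1 = "person" (from "fbox").
        YOLO txt format: <class> <x_center> <y_center> <width> <height>,
        all normalized to [0, 1] by the image width/height.
        """
        record = json.loads(odgt_line)
        lines = []
        for gt in record["gtboxes"]:
            if gt["tag"] != "person":   # skip "mask" (ignore) regions
                continue
            for cls, key in ((0, "hbox"), (1, "fbox")):
                x, y, w, h = gt[key]    # top-left x/y, width, height in pixels
                cx = (x + w / 2) / img_w
                cy = (y + h / 2) / img_h
                lines.append(f"{cls} {cx:.6f} {cy:.6f} {w / img_w:.6f} {h / img_h:.6f}")
        return lines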

Training on a local PC

Continuing from the steps in the previous section, you will use the "darknet" framework to train the yolov4 model.

  1. Download and build the "darknet" code. (NOTE to self: consider making "darknet" a submodule and automating the build process.)

    $ cd ${HOME}/project/yolov4_crowdhuman
    $ git clone https://github.com/AlexeyAB/darknet.git
    $ cd darknet
    $ vim Makefile  # edit Makefile with your preferred editor (might not be vim)

    Modify the first few lines of the "Makefile" as follows. Please refer to How to compile on Linux (using make) for more information about these settings. Note that, in the example below, CUDA compute "75" is for the RTX 2080 Ti and "61" is for the GTX 1080; you might need to modify these based on the GPU you are using. (If you are not sure which compute capability your GPU has, see the sketch at the end of this step.)

    GPU=1
    CUDNN=1
    CUDNN_HALF=1
    OPENCV=1
    AVX=1
    OPENMP=1
    LIBSO=1
    ZED_CAMERA=0
    ZED_CAMERA_v2_8=0
    
    ......
    
    USE_CPP=0
    DEBUG=0
    
    ARCH= -gencode arch=compute_61,code=[sm_61,compute_61] \
          -gencode arch=compute_75,code=[sm_75,compute_75]
    
    ......
    

    Then do a make to build "darknet".

    $ make

    When it is done, you could (optionally) test the "darknet" executable as follows.

    ### download pre-trained yolov4 coco weights and test with the dog image
    $ wget https://github.com/AlexeyAB/darknet/releases/download/darknet_yolo_v3_optimal/yolov4.weights \
           -q --show-progress --no-clobber
    $ ./darknet detector test cfg/coco.data cfg/yolov4-416.cfg yolov4.weights \
                              data/dog.jpg
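
    If you are not sure which compute capability your GPU has, one quick way to check is with pycuda (assuming you have pycuda installed; you could also just look up your GPU on NVIDIA's CUDA GPUs page):

    # Print the CUDA compute capability of each visible GPU
    import pycuda.driver as cuda

    cuda.init()
    for i in range(cuda.Device.count()):
        dev = cuda.Device(i)
        major, minor = dev.compute_capability()
        # e.g. compute capability 7.5 -> "compute_75"/"sm_75" in ARCH
        print(f"GPU {i}: {dev.name()}, compute capability {major}.{minor}")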
  2. Then copy over all files needed for training and download the pre-trained weights ("yolov4.conv.137").

    $ cd ${HOME}/project/yolov4_crowdhuman
    $ ./prepare_training.sh 608x608
  3. Train the "yolov4-crowdhuman-608x608" model. Please refer to How to train with multi-GPU for tips on fine-tuning your training process. For example, you could specify -gpus 0,1,2,3 to use multiple GPUs to speed up training.

    $ cd ${HOME}/project/yolov4_crowdhuman/darknet
    $ ./darknet detector train data/crowdhuman-608x608.data \
                               cfg/yolov4-crowdhuman-608x608.cfg \
                               yolov4.conv.137 -map -gpus 0

    When the model is being trained, you could monitor its progress on the loss/mAP chart (since the -map option is used). Alternatively, if you are training on a remote PC via ssh, add the -dont_show -mjpeg_port 8090 options so that you could monitor the loss/mAP chart in a web browser (http://{IP address}:8090/). (If you'd rather work from a saved log file, see the parsing sketch after the chart examples below.)

    As a reference, training this "yolov4-crowdhuman-608x608" model with my RTX 2080 Ti GPU takes 17~18 hours.

    My sample loss/mAP chart of the "yolov4-crowdhuman-608x608" model

    Another example: training the "yolov4-tiny-crowdhuman-608x608" model on the same RTX 2080 Ti GPU takes less than 3 hours.

    My sample loss/mAP chart of the "yolov4-tiny-crowdhuman-608x608" model

    And another: training the "yolov4-tiny-3l-crowdhuman-416x416" model on the RTX 2080 Ti GPU takes less than 2 hours.

    My sample loss/mAP chart of the "yolov4-tiny-3l-crowdhuman-416x416" model
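
    If you train headless and save the console output to a file (e.g. appending 2>&1 | tee train.log to the training command), a small script like the sketch below can recover the loss curve from the log afterwards. The regex here matches darknet's per-iteration status line ("<iter>: <loss>, <avg loss> avg loss, ..."); treat that as an assumption and adjust it if your darknet build prints a different format.

    # Extract (iteration, avg loss) pairs from a saved darknet training log
    import re
    import sys

    # darknet status lines look roughly like:
    #   " 1000: 4.293831, 4.797541 avg loss, 0.002610 rate, ..."
    PATTERN = re.compile(r"^\s*(\d+):\s*([\d.]+),\s*([\d.]+) avg loss")

    def parse_log(path):
        points = []
        with open(path) as f:
            for line in f:
                m = PATTERN.match(line)
                if m:
                    iteration, _, avg_loss = m.groups()
                    points.append((int(iteration), float(avg_loss)))
        return points

    if __name__ == "__main__":
        for it, loss in parse_log(sys.argv[1]):
            print(f"{it}\t{loss}")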

Testing the custom-trained yolov4 model

After you have trained the "yolov4-crowdhuman-608x608" model locally, you could test the "best" custom-trained model like this.

$ cd ${HOME}/project/yolov4_crowdhuman/darknet
$ ./darknet detector test data/crowdhuman-608x608.data \
                          cfg/yolov4-crowdhuman-608x608.cfg \
                          backup/yolov4-crowdhuman-608x608_best.weights \
                          data/crowdhuman-608x608/273275,4e9d1000623d182f.jpg \
                          -gpus 0

A sample prediction using the trained "yolov4-crowdhuman-608x608" model
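
Besides the "darknet" executable, you could also load the trained .cfg/.weights pair from python3 with OpenCV's DNN module. The sketch below assumes OpenCV 4.x (the cv2.dnn_DetectionModel API, available since OpenCV 4.1.2) and reuses the file paths from the steps above; it is a minimal example, not a replacement for darknet's own tooling.

# Run the custom-trained model with OpenCV's DNN module
import cv2

net = cv2.dnn.readNetFromDarknet(
    "cfg/yolov4-crowdhuman-608x608.cfg",
    "backup/yolov4-crowdhuman-608x608_best.weights")
model = cv2.dnn_DetectionModel(net)
model.setInputParams(size=(608, 608), scale=1.0 / 255, swapRB=True)

class_names = ["head", "person"]
img = cv2.imread("data/crowdhuman-608x608/273275,4e9d1000623d182f.jpg")
classes, scores, boxes = model.detect(img, confThreshold=0.25, nmsThreshold=0.45)
for cls, score, (x, y, w, h) in zip(classes, scores, boxes):
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.putText(img, f"{class_names[int(cls)]}: {float(score):.2f}",
                (x, y - 4), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 1)
cv2.imwrite("prediction.jpg", img)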

In addition, you could verify the mAP of the "best" model like this.

$ ./darknet detector map data/crowdhuman-608x608.data \
                         cfg/yolov4-crowdhuman-608x608.cfg \
                         backup/yolov4-crowdhuman-608x608_best.weights \
                         -gpus 0

For example, I got mAP@0.50 = 0.814523 when I tested my own custom-trained "yolov4-crowdhuman-608x608" model.

 detections_count = 614280, unique_truth_count = 183365
class_id = 0, name = head, ap = 82.60%           (TP = 65119, FP = 14590)
class_id = 1, name = person, ap = 80.30%         (TP = 72055, FP = 11766)

 for conf_thresh = 0.25, precision = 0.84, recall = 0.75, F1-score = 0.79
 for conf_thresh = 0.25, TP = 137174, FP = 26356, FN = 46191, average IoU = 66.92 %

 IoU threshold = 50 %, used Area-Under-Curve for each unique Recall
 mean average precision (mAP@0.50) = 0.814523, or 81.45 %
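
As a quick sanity check, the precision, recall, and F1 numbers in the output above follow directly from the reported TP/FP/FN counts:

# Recompute precision/recall/F1 from the TP/FP/FN counts above
TP, FP, FN = 137174, 26356, 46191

precision = TP / (TP + FP)                          # 0.8388... -> 0.84
recall = TP / (TP + FN)                             # 0.7481... -> 0.75
f1 = 2 * precision * recall / (precision + recall)  # 0.7909... -> 0.79
print(f"precision={precision:.2f}, recall={recall:.2f}, F1={f1:.2f}")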

Training on Google Colab

For training on Google Colab, I use a "416x416" yolov4 model as an example. I have put all data processing and training commands into an IPython Notebook, so training the "yolov4-crowdhuman-416x416" model on Google Colab is as simple as: (1) opening the Notebook on Google Colab, (2) mounting your Google Drive, and (3) running all cells in the Notebook.

A few words of caution before you begin running the Notebook on Google Colab:

  • Google Colab's GPU runtime is free of charge, but it is neither unlimited nor guaranteed. Even though the Google Colab FAQ states that "virtual machines have maximum lifetimes that can be as much as 12 hours", I often saw my Colab GPU sessions get disconnected after 7~8 hours of non-interactive use.

  • If you connect to GPU instances on Google Colab repeatedly and frequently, you could be temporarily locked out (unable to connect to GPU instances for a couple of days). So I'd suggest connecting to a GPU runtime sparingly and only when needed, and manually terminating your GPU sessions as soon as you no longer need them.

  • It is strongly advised that you read and mind Google Colab's Resource Limits.

Due to the 7~8 hour limit on GPU runtime mentioned above, you won't be able to train a large yolov4 model in a single session. That is why I chose the "416x416" model for this part of the tutorial. Here are the steps:

  1. Open yolov4_crowdhuman.ipynb. This IPython Notebook is on my personal Google Drive. You can review it, but you cannot modify it.

  2. Make a copy of "yolov4_crowdhuman.ipynb" on your own Google Drive by clicking "File -> Save a copy in Drive" on the menu. You should use your own saved copy of the Notebook for the rest of the steps.

    Saving a copy of yolov4_crowdhuman.ipynb

  3. Follow the instructions in the Notebook to train the "yolov4-crowdhuman-416x416" model, i.e.

    • make sure the IPython Notebook has successfully connected to a GPU runtime,
    • mount your Google Drive (for saving the training log and weights; see the snippet after this list),
    • run all cells ("Runtime -> Run all" or "Runtime -> Restart and run all").

    You should have a good chance of finishing training the "yolov4-crowdhuman-416x416" model before the Colab session gets automatically disconnected (expired).
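
    For reference, mounting Google Drive inside a Colab cell boils down to the standard snippet below ("/content/drive" is Colab's default mount point):

    # Mount Google Drive so the training log and weights survive the session
    from google.colab import drive
    drive.mount('/content/drive')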

Instead of opening the Colab Notebook on my Google Drive, you could also go to your own Colab account and use "File -> Upload notebook" to upload yolov4_crowdhuman.ipynb directly.

Refer to my Custom YOLOv4 Model on Google Colab post for additional information about running the IPython Notebook.

Deploying onto Jetson Nano

To deploy the trained "yolov4-crowdhuman-416x416" model onto a Jetson Nano, I'd use my jkjung-avt/tensorrt_demos code to build/deploy it as a TensorRT engine. Here are the detailed steps:

  1. On the Jetson Nano, check out my jkjung-avt/tensorrt_demos code and make sure you are able to run the standard "yolov4-416" TensorRT engine without problems. Please refer to Demo #5: YOLOv4 for details.

    $ cd ${HOME}/project
    $ git clone https://github.com/jkjung-avt/tensorrt_demos.git
    ### Detailed steps omitted: install pycuda, download yolov4-416 model, yolo_to_onnx, onnx_to_tensorrt
    ### ......
    $ cd ${HOME}/project/tensorrt_demos
    $ python3 trt_yolo.py --image ${HOME}/Pictures/dog.jpg -m yolov4-416
  2. Download the "yolov4-crowdhuman-416x416" model. More specifically, get "yolov4-crowdhuman-416x416.cfg" from this repository and download "yolov4-crowdhuman-416x416_best.weights" file from your Google Drive. Rename the .weights file so that it matches the .cfg file.

    $ cd ${HOME}/project/tensorrt_demos/yolo
    $ wget https://raw.githubusercontent.com/jkjung-avt/yolov4_crowdhuman/master/cfg/yolov4-crowdhuman-416x416.cfg
    $ cp ${HOME}/Downloads/yolov4-crowdhuman-416x416_best.weights yolov4-crowdhuman-416x416.weights

    Then build the TensorRT (FP16) engine. Note that the "-c 2" command-line option specifies that the model detects 2 classes of objects.

    $ python3 yolo_to_onnx.py -c 2 -m yolov4-crowdhuman-416x416
    $ python3 onnx_to_tensorrt.py -c 2 -m yolov4-crowdhuman-416x416
  3. Test the TensorRT engine. For example, I tested it with the "Avengers: Infinity War" movie trailer. (You should download and test with your own images or videos.) A sketch of calling the engine from your own python3 code follows at the end of this step.

    $ cd ${HOME}/project/tensorrt_demos
    $ python3 trt_yolo.py --video ${HOME}/Videos/Infinity_War.mp4 \
                          -c 2 -m yolov4-crowdhuman-416x416

    (Click on the image below to see the whole video clip...)

    Testing with the Avengers: Infinity War trailer
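
    If you would rather call the TensorRT engine from your own python3 code than through trt_yolo.py, the detector class in jkjung-avt/tensorrt_demos can be used roughly as sketched below. The class name and constructor arguments reflect my reading of that repository and may differ between versions, so double-check utils/yolo_with_plugins.py in your checkout.

    # Minimal sketch: run the "yolov4-crowdhuman-416x416" TensorRT engine
    # directly (run from the tensorrt_demos directory; the constructor
    # arguments are an assumption and may vary between repo versions)
    import cv2
    from utils.yolo_with_plugins import TrtYOLO

    trt_yolo = TrtYOLO('yolov4-crowdhuman-416x416', category_num=2)
    img = cv2.imread('test.jpg')    # substitute your own test image
    boxes, confs, clss = trt_yolo.detect(img, conf_th=0.3)
    for (x_min, y_min, x_max, y_max), conf, cls in zip(boxes, confs, clss):
        label = 'head' if int(cls) == 0 else 'person'
        print(f'{label} {conf:.2f} at ({x_min}, {y_min}, {x_max}, {y_max})')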

Contributions

@philipp-schmidt: yolov4-tiny models and training charts
