A state of the art of new lightweight YOLO model implemented by TensorFlow 2.

Last update: Dec 21, 2022

Overview

CSL-YOLO: A New Lightweight Object Detection System for Edge Computing

This project provides a SOTA level lightweight YOLO called "Cross-Stage Lightweight YOLO"(CSL-YOLO),

it is achieving better detection performance with only 43% FLOPs and 52% parameters than Tiny-YOLOv4.

Paper Link: https://arxiv.org/abs/2107.04829

Requirements

How to Get Started?

#Predict
python3 main.py -p cfg/predict_coco.cfg

#Train
python3 main.py -t cfg/train_coco.cfg

#Eval
python3 main.py -ce cfg/eval_coco.cfg

WebCam DEMO(on CPU)

This DEMO runs on a pure CPU environment, the CPU is I7-6600U(2.6Ghz~3.4Ghz), the model scale is 224x224, and the FPS is about 10.

Please execute the following script to get this DEMO, the "camera_idx" in the cfg file represents the camera number you specified.

#Camera DEMO
python3 main.py -d cfg/demo_coco.cfg

More Info

Change Model Scale

The model's default scale is 224x224, if you want to change the scale to 320~512,

please go to cfg/XXXX.cfg and change the following two parts:

# input_shape=[512,512,3]
# out_hw_list=[[64,64],[48,48],[32,32],[24,24],[16,16]]
# input_shape=[416,416,3]
# out_hw_list=[[52,52],[39,39],[26,26],[20,20],[13,13]]
# input_shape=[320,320,3]
# out_hw_list=[[40,40],[30,30],[20,20],[15,15],[10,10]]
input_shape=[224,224,3]
out_hw_list=[[28,28],[21,21],[14,14],[10,10],[7,7]]

weight_path=weights/224_nolog.hdf5

                         |
                         | 224 to 320
                         V
                         
# input_shape=[512,512,3]
# out_hw_list=[[64,64],[48,48],[32,32],[24,24],[16,16]]
# input_shape=[416,416,3]
# out_hw_list=[[52,52],[39,39],[26,26],[20,20],[13,13]]
input_shape=[320,320,3]
out_hw_list=[[40,40],[30,30],[20,20],[15,15],[10,10]]
# input_shape=[224,224,3]
# out_hw_list=[[28,28],[21,21],[14,14],[10,10],[7,7]]

weight_path=weights/320_nolog.hdf5

Fully Dataset

The entire MS-COCO data set is too large, here only a few pictures are stored for DEMO,

if you need complete data, please download on this page.

Our Data Format

We did not use the official format of MS-COCO, we expressed a bounding box as following:

[ left_top_x<float>, left_top_y<float>, w<float>, h<float>, confidence<float>, class<str> ]

The bounding boxes contained in a picture are represented by single json file.

For detailed format, please refer to the json file in "data/coco/train/json".

AP Performance on MS-COCO

For detailed COCO report, please refer to "mscoco_result".

TODOs

Improve the calculator script of FLOPs.
Using Focal Loss will cause overfitting, we need to explore the reasons.

A state of the art of new lightweight YOLO model implemented by TensorFlow 2.

Related tags

Overview

CSL-YOLO: A New Lightweight Object Detection System for Edge Computing

Requirements

How to Get Started?

WebCam DEMO(on CPU)

More Info

Change Model Scale

Fully Dataset

Our Data Format

AP Performance on MS-COCO

TODOs

Owner

Miles Zhang

Pop-Out Motion: 3D-Aware Image Deformation via Learning the Shape Laplacian (CVPR 2022)

Randomizes the warps in a stock pokeemerald repo.

Complex Answer Generation For Conversational Search Systems.

Out-of-Town Recommendation with Travel Intention Modeling (AAAI2021)

Definition of a business problem according to Wilson Lower Bound Score and Time Based Average Rating

Frequency Domain Image Translation: More Photo-realistic, Better Identity-preserving

[NeurIPS 2019] Learning Imbalanced Datasets with Label-Distribution-Aware Margin Loss

Shape-Adaptive Selection and Measurement for Oriented Object Detection

Official codes for the paper "Learning Hierarchical Discrete Linguistic Units from Visually-Grounded Speech"

ICON: Implicit Clothed humans Obtained from Normals

Code for "Long-tailed Distribution Adaptation"

A scikit-learn-compatible module for estimating prediction intervals.

Code for ACL'2021 paper WARP 🌀 Word-level Adversarial ReProgramming

Official repository for: Continuous Control With Ensemble DeepDeterministic Policy Gradients

DecoupledNet is semantic segmentation system which using heterogeneous annotations

Notebooks, slides and dataset of the CorrelAid Machine Learning Winter School

A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning

A Planar RGB-D SLAM which utilizes Manhattan World structure to provide optimal camera pose trajectory while also providing a sparse reconstruction containing points, lines and planes, and a dense surfel-based reconstruction.

Research Artifact of USENIX Security 2022 Paper: Automated Side Channel Analysis of Media Software with Manifold Learning

NLG evaluation via Statistical Measures of Similarity: BaryScore, DepthScore, InfoLM