In this tutorial, you will perform inference with 10 well-known pre-trained object detectors, fine-tune a detector on a custom dataset, and design and train your own object detector.

Overview

Object Detection

Object detection is a computer vision task for locating instances of predefined objects in images or videos.

In this tutorial, you will:

  • Perform inference with 10 well-known pre-trained object detectors Open In Colab
  • Fine-tune object detectors on a custom dataset Open In Colab
  • Design and train your own object detector Open In Colab

This work is based on MMDetection: OpenMMLab detection toolbox and benchmark.

10 Object detectors

Detector | Paper
YOLOF | You Only Look One-level Feature (2021)
YOLOX | Exceeding YOLO Series in 2021 (2021)
DETR | End-to-End Object Detection with Transformers (2020)
Deformable DETR | Deformable Transformers for End-to-End Object Detection (2021)
Sparse R-CNN | End-to-End Object Detection with Learnable Proposals (2020)
VarifocalNet | An IoU-aware Dense Object Detector (2020)
PAA | Probabilistic Anchor Assignment with IoU Prediction for Object Detection (2020)
SABL | Side-Aware Boundary Localization for More Precise Object Detection (2020)
ATSS | Bridging the Gap Between Anchor-based and Anchor-free Detection via Adaptive Training Sample Selection (2019)
Double Heads | Rethinking Classification and Localization for Object Detection (2019)

Perform inference

Open In Colab

Here is how to load a pretrained model, perform inference, and visualize the results.

from mmdet.apis import init_detector, inference_detector, show_result_pyplot
# config/checkpoint point to a model-zoo config file and its pretrained weights
model = init_detector(config, checkpoint, device='cuda:0')
result = inference_detector(model, img)
show_result_pyplot(model, img, result, title=m_name, score_thr=0.6)
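
For instance, here is a minimal sketch of running a couple of the detectors above on the same image. The config paths follow MMDetection's configs/ layout, while the checkpoint and image paths are assumptions and must point to files you have downloaded from the model zoo.

from mmdet.apis import init_detector, inference_detector, show_result_pyplot

# detector name -> (config file, local checkpoint); checkpoint paths are assumptions
detectors = {
    'YOLOF': ('configs/yolof/yolof_r50_c5_8x8_1x_coco.py',
              'checkpoints/yolof_r50_c5_8x8_1x_coco.pth'),
    'DETR': ('configs/detr/detr_r50_8x2_150e_coco.py',
             'checkpoints/detr_r50_8x2_150e_coco.pth'),
}
img = 'demo/demo.jpg'  # assumed test image

for m_name, (config, checkpoint) in detectors.items():
    model = init_detector(config, checkpoint, device='cuda:0')
    result = inference_detector(model, img)
    show_result_pyplot(model, img, result, title=m_name, score_thr=0.6)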

Sample YOLOF detection result (res_yolof)

And more examples ...

Detection results from the other detectors (od_res_collage)


Fine-tune object detectors on a custom dataset

Open In Colab

We can select a pre-trained model and fine-tune it on a custom dataset. kitti_tiny, a tiny version of the well-known KITTI dataset, is used here as the custom dataset.

Sample KITTI tiny image with annotations (kitti_sample)
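
Since KITTI tiny is not in COCO or VOC format, the dataset is first registered with MMDetection. The following is a minimal sketch, assuming MMDetection 2.x and its CustomDataset middle format; the class name KittiTinyDataset, the .jpeg suffix, and the image_2/label_2 folder layout are assumptions about the KITTI tiny layout.

import os.path as osp
import mmcv
import numpy as np
from mmdet.datasets.builder import DATASETS
from mmdet.datasets.custom import CustomDataset

@DATASETS.register_module()
class KittiTinyDataset(CustomDataset):
    CLASSES = ('Car', 'Pedestrian', 'Cyclist')

    def load_annotations(self, ann_file):
        cat2label = {name: i for i, name in enumerate(self.CLASSES)}
        image_ids = mmcv.list_from_file(ann_file)  # one image id per line
        data_infos = []
        for image_id in image_ids:
            filename = f'{image_id}.jpeg'
            height, width = mmcv.imread(osp.join(self.img_prefix, filename)).shape[:2]
            # KITTI label lines: "class trunc occl alpha x1 y1 x2 y2 ..."
            label_dir = self.img_prefix.replace('image_2', 'label_2')
            lines = mmcv.list_from_file(osp.join(label_dir, f'{image_id}.txt'))
            gt_bboxes, gt_labels = [], []
            for fields in (line.strip().split(' ') for line in lines):
                if fields[0] in cat2label:  # skip DontCare and other classes
                    gt_labels.append(cat2label[fields[0]])
                    gt_bboxes.append([float(v) for v in fields[4:8]])
            data_infos.append(dict(
                filename=filename, width=width, height=height,
                ann=dict(
                    bboxes=np.array(gt_bboxes, dtype=np.float32).reshape(-1, 4),
                    labels=np.array(gt_labels, dtype=np.int64))))
        return data_infos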

While fine-tuning a pretrained model, we should select a small learning rate, monitor the loss, and make sure it is decreasing (a minimal config sketch follows the list below). For object detection, the loss is composed of 2 main components:

  • Classification loss
  • Bounding box loss
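
Here is a rough sketch of the usual config tweaks before fine-tuning, assuming MMDetection 2.x config conventions; the base config, checkpoint path, learning-rate value, and the attribute path of the head (bbox_head here) all depend on the chosen detector and are assumptions.

from mmcv import Config

cfg = Config.fromfile('configs/yolof/yolof_r50_c5_8x8_1x_coco.py')  # assumed base config
cfg.dataset_type = 'KittiTinyDataset'           # the custom dataset registered above
cfg.data.train.type = cfg.dataset_type
# (cfg.data.train.img_prefix / ann_file also need to point at the KITTI tiny files)
cfg.model.bbox_head.num_classes = 3             # Car, Pedestrian, Cyclist
cfg.load_from = 'checkpoints/yolof_r50_c5_8x8_1x_coco.pth'  # assumed pretrained weights
cfg.optimizer.lr = 0.02 / 8                     # small learning rate for fine-tuning
cfg.lr_config.warmup = None
cfg.log_config.interval = 10                    # print a log line every 10 iterations

During training, both loss components are reported and should trend downward: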
- Epoch [1][10/50]	loss_cls: 0.4056, loss_bbox: 0.3570, loss: 0.7625
- Epoch [1][20/50]	loss_cls: 0.3316, loss_bbox: 0.3062, loss: 0.6378
- Epoch [1][30/50]	loss_cls: 0.5663, loss_bbox: 0.3815, loss: 0.9478
- Epoch [1][40/50]	loss_cls: 0.4196, loss_bbox: 0.3604, loss: 0.7801
- Epoch [1][50/50]	loss_cls: 0.4421, loss_bbox: 0.2568, loss: 0.6989
- Epoch [2][10/50]	loss_cls: 0.2546, loss_bbox: 0.3497, loss: 0.6043
- Epoch [2][20/50]	loss_cls: 0.2809, loss_bbox: 0.4048, loss: 0.6857
- Epoch [2][30/50]	loss_cls: 0.2418, loss_bbox: 0.3069, loss: 0.5487
- Epoch [2][40/50]	loss_cls: 0.2912, loss_bbox: 0.4422, loss: 0.7335
- Epoch [2][50/50]	loss_cls: 0.2818, loss_bbox: 0.2620, loss: 0.5438
- Epoch [3][10/50]	loss_cls: 0.2182, loss_bbox: 0.3185, loss: 0.5367
- Epoch [3][20/50]	loss_cls: 0.2711, loss_bbox: 0.4153, loss: 0.6864
- Epoch [3][30/50]	loss_cls: 0.2528, loss_bbox: 0.2959, loss: 0.5488
- Epoch [3][40/50]	loss_cls: 0.2249, loss_bbox: 0.3188, loss: 0.5437
- Epoch [3][50/50]	loss_cls: 0.2020, loss_bbox: 0.3191, loss: 0.5210
- Epoch [4][10/50]	loss_cls: 0.2135, loss_bbox: 0.3957, loss: 0.6091
- Epoch [4][20/50]	loss_cls: 0.1984, loss_bbox: 0.2902, loss: 0.4886
- Epoch [4][30/50]	loss_cls: 0.1694, loss_bbox: 0.2576, loss: 0.4270
- Epoch [4][40/50]	loss_cls: 0.2733, loss_bbox: 0.3334, loss: 0.6067
- Epoch [4][50/50]	loss_cls: 0.2013, loss_bbox: 0.3498, loss: 0.5512
- Epoch [5][10/50]	loss_cls: 0.2169, loss_bbox: 0.3491, loss: 0.5660
- Epoch [5][20/50]	loss_cls: 0.1879, loss_bbox: 0.2640, loss: 0.4518
- Epoch [5][30/50]	loss_cls: 0.1366, loss_bbox: 0.3346, loss: 0.4712
- Epoch [5][40/50]	loss_cls: 0.1981, loss_bbox: 0.2485, loss: 0.4467
- Epoch [5][50/50]	loss_cls: 0.2038, loss_bbox: 0.2929, loss: 0.4968
- Epoch [6][10/50]	loss_cls: 0.2363, loss_bbox: 0.3392, loss: 0.5755
- Epoch [6][20/50]	loss_cls: 0.1515, loss_bbox: 0.3114, loss: 0.4629
- Epoch [6][30/50]	loss_cls: 0.1903, loss_bbox: 0.3343, loss: 0.5246
- Epoch [6][40/50]	loss_cls: 0.1311, loss_bbox: 0.2742, loss: 0.4053
- Epoch [6][50/50]	loss_cls: 0.1533, loss_bbox: 0.3611, loss: 0.5145
- Epoch [7][10/50]	loss_cls: 0.1615, loss_bbox: 0.2453, loss: 0.4067
- Epoch [7][20/50]	loss_cls: 0.1341, loss_bbox: 0.2708, loss: 0.4049
- Epoch [7][30/50]	loss_cls: 0.1876, loss_bbox: 0.4118, loss: 0.5994
- Epoch [7][40/50]	loss_cls: 0.1071, loss_bbox: 0.1954, loss: 0.3026
- Epoch [7][50/50]	loss_cls: 0.2015, loss_bbox: 0.2969, loss: 0.4985
- Epoch [8][10/50]	loss_cls: 0.1945, loss_bbox: 0.2751, loss: 0.4695
- Epoch [8][20/50]	loss_cls: 0.1245, loss_bbox: 0.1816, loss: 0.3061
- Epoch [8][30/50]	loss_cls: 0.1388, loss_bbox: 0.3588, loss: 0.4976
- Epoch [8][40/50]	loss_cls: 0.1417, loss_bbox: 0.2950, loss: 0.4367
- Epoch [8][50/50]	loss_cls: 0.1369, loss_bbox: 0.2684, loss: 0.4053
- Epoch [9][10/50]	loss_cls: 0.1295, loss_bbox: 0.2353, loss: 0.3648
- Epoch [9][20/50]	loss_cls: 0.1349, loss_bbox: 0.2023, loss: 0.3372

Results of the fine-tuned model (od_ft_res01, od_ft_res02, od_ft_res03)


Design and train your own object detector

Open In Colab

Here, we design, inspect, and train a custom object detection model. We start with the YOLOF detector and then make the following updates:

  1. Backbone: Replace ResNet50 -> Pyramid Vision Transformer (PVT)
# clear the default backbone
cfg.model.backbone.clear()
# add the new backbone
cfg.model.backbone.type='PyramidVisionTransformer'
cfg.model.backbone.num_layers=[2, 2, 2, 2]
cfg.model.backbone.init_cfg=dict(checkpoint='https://github.com/whai362/PVT/releases/download/v2/pvt_tiny.pth')
cfg.model.backbone.out_indices=(3, )
  2. Neck: Replace DilatedEncoder -> Feature Pyramid Network (FPN)
# clear the default neck
cfg.model.neck.clear()

# add the new neck
cfg.model.neck.type='FPN'
cfg.model.neck.in_channels=[512]
cfg.model.neck.out_channels=128
cfg.model.neck.num_outs=1
  3. Head: Update the YOLOF head -> num_classes = 3 and matching input channels
cfg.model.bbox_head.num_classes=3
cfg.model.bbox_head.in_channels=128
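
With the backbone, neck, and head updated, it is worth building the model once and printing it to confirm the pieces fit together. A small sketch, assuming the MMDetection 2.x build API:

from mmdet.models import build_detector

# build the modified detector from the edited config and inspect its structure
model = build_detector(cfg.model,
                       train_cfg=cfg.get('train_cfg'),
                       test_cfg=cfg.get('test_cfg'))
print(model)  # expect PyramidVisionTransformer -> FPN -> YOLOF head with 3 classes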

Train on KITTI tiny
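
A sketch of the training call, assuming the MMDetection 2.x training APIs and the cfg and model prepared above; the work directory is an assumption.

import mmcv
from mmdet.datasets import build_dataset
from mmdet.apis import train_detector

datasets = [build_dataset(cfg.data.train)]
model.CLASSES = datasets[0].CLASSES           # attach class names for visualization
cfg.work_dir = './work_dirs/custom_detector'  # assumed output directory
mmcv.mkdir_or_exist(cfg.work_dir)
train_detector(model, datasets, cfg, distributed=False, validate=True)

This produces a training log like the one below: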

- Epoch [1][10/50]loss_cls: 11.7010, loss_bbox: 1.2835, loss: 12.9846
- Epoch [1][20/50]loss_cls: 1.5840, loss_bbox: 1.2059, loss: 2.7900
- Epoch [1][30/50]loss_cls: 1.9305, loss_bbox: 1.1042, loss: 3.0348
- Epoch [1][40/50]loss_cls: 1.9289, loss_bbox: 1.1813, loss: 3.1102
- Epoch [1][50/50]loss_cls: 1.5479, loss_bbox: 0.8640, loss: 2.4120
- Epoch [2][10/50]loss_cls: 1.7256, loss_bbox: 1.1628, loss: 2.8884
- Epoch [2][20/50]loss_cls: 1.4963, loss_bbox: 1.1485, loss: 2.6448
- Epoch [2][30/50]loss_cls: 1.5202, loss_bbox: 0.9300, loss: 2.4502
- Epoch [2][40/50]loss_cls: 1.3903, loss_bbox: 1.2719, loss: 2.6623
- Epoch [2][50/50]loss_cls: 1.4891, loss_bbox: 1.0604, loss: 2.5496
- Epoch [3][10/50]loss_cls: 1.2993, loss_bbox: 1.0748, loss: 2.3741
- Epoch [3][20/50]loss_cls: 1.2108, loss_bbox: 1.0574, loss: 2.2682
- Epoch [3][30/50]loss_cls: 1.1432, loss_bbox: 1.1037, loss: 2.2469
- Epoch [3][40/50]loss_cls: 1.0749, loss_bbox: 0.9834, loss: 2.0583
- Epoch [3][50/50]loss_cls: 1.0111, loss_bbox: 1.0372, loss: 2.0483
- Epoch [4][10/50]loss_cls: 0.8880, loss_bbox: 1.3213, loss: 2.2093
- Epoch [4][20/50]loss_cls: 0.9726, loss_bbox: 1.0892, loss: 2.0618
- Epoch [4][30/50]loss_cls: 0.9695, loss_bbox: 0.7829, loss: 1.7524
- Epoch [4][40/50]loss_cls: 0.8184, loss_bbox: 1.0882, loss: 1.9067
- Epoch [4][50/50]loss_cls: 0.8585, loss_bbox: 1.1519, loss: 2.0104
- Epoch [5][10/50]loss_cls: 0.7565, loss_bbox: 1.0927, loss: 1.8492
- Epoch [5][20/50]loss_cls: 0.8421, loss_bbox: 0.9804, loss: 1.8226
- Epoch [5][30/50]loss_cls: 0.7459, loss_bbox: 0.9499, loss: 1.6958
- Epoch [5][40/50]loss_cls: 0.6630, loss_bbox: 0.9623, loss: 1.6253
- Epoch [5][50/50]loss_cls: 0.8257, loss_bbox: 1.0594, loss: 1.8851
- Epoch [6][10/50]loss_cls: 0.7017, loss_bbox: 1.1119, loss: 1.8136
- Epoch [6][20/50]loss_cls: 0.7116, loss_bbox: 1.0703, loss: 1.7819
- Epoch [6][30/50]loss_cls: 0.7036, loss_bbox: 1.0633, loss: 1.7669
- Epoch [6][40/50]loss_cls: 0.6957, loss_bbox: 1.1699, loss: 1.8656
- Epoch [6][50/50]loss_cls: 0.6945, loss_bbox: 1.0165, loss: 1.7111
- Epoch [7][10/50]loss_cls: 0.6606, loss_bbox: 0.9504, loss: 1.6110
- Epoch [7][20/50]loss_cls: 0.6879, loss_bbox: 1.0412, loss: 1.7292
- Epoch [7][30/50]loss_cls: 0.6921, loss_bbox: 1.2121, loss: 1.9042
- Epoch [7][40/50]loss_cls: 0.6256, loss_bbox: 0.9307, loss: 1.5562
- Epoch [7][50/50]loss_cls: 0.6127, loss_bbox: 1.1764, loss: 1.7891
- Epoch [8][10/50]loss_cls: 0.5272, loss_bbox: 1.1579, loss: 1.6851
- Epoch [8][20/50]loss_cls: 0.6060, loss_bbox: 0.9147, loss: 1.5207
- Epoch [8][30/50]loss_cls: 0.6159, loss_bbox: 1.0704, loss: 1.6863
- Epoch [8][40/50]loss_cls: 0.7068, loss_bbox: 1.0493, loss: 1.7561
- Epoch [8][50/50]loss_cls: 0.6400, loss_bbox: 0.9694, loss: 1.6094
- Epoch [9][10/50]loss_cls: 0.6859, loss_bbox: 1.0887, loss: 1.7745
- Epoch [9][20/50]loss_cls: 0.5328, loss_bbox: 0.7794, loss: 1.3123

See how the loss is decreasing.


Quick review

YOLOF: You Only Look One-level Feature (2021)

This paper revisits feature pyramids networks (FPN) for one-stage detectors and points out that the success of FPN is due to its divide-and-conquer solution to the optimization problem in object detection rather than multi-scale feature fusion. From the perspective of optimization, we introduce an alternative way to address the problem instead of adopting the complex feature pyramids - utilizing only one-level feature for detection. Based on the simple and efficient solution, we present You Only Look One-level Feature (YOLOF). In our method, two key components, Dilated Encoder and Uniform Matching, are proposed and bring considerable improvements. Extensive experiments on the COCO benchmark prove the effectiveness of the proposed model. Our YOLOF achieves comparable results with its feature pyramids counterpart RetinaNet while being 2.5× faster. Without transformer layers, YOLOF can match the performance of DETR in a single-level feature manner with 7× less training epochs. With an image size of 608×608, YOLOF achieves 44.3 mAP running at 60 fps on 2080Ti, which is 13% faster than YOLOv4.

YOLOX: Exceeding YOLO Series in 2021 (2021)

In this report, we present some experienced improvements to YOLO series, forming a new high-performance detector -- YOLOX. We switch the YOLO detector to an anchor-free manner and conduct other advanced detection techniques, i.e., a decoupled head and the leading label assignment strategy SimOTA to achieve state-of-the-art results across a large scale range of models: For YOLO-Nano with only 0.91M parameters and 1.08G FLOPs, we get 25.3% AP on COCO, surpassing NanoDet by 1.8% AP; for YOLOv3, one of the most widely used detectors in industry, we boost it to 47.3% AP on COCO, outperforming the current best practice by 3.0% AP; for YOLOX-L with roughly the same amount of parameters as YOLOv4-CSP, YOLOv5-L, we achieve 50.0% AP on COCO at a speed of 68.9 FPS on Tesla V100, exceeding YOLOv5-L by 1.8% AP. Further, we won the 1st Place on Streaming Perception Challenge (Workshop on Autonomous Driving at CVPR 2021) using a single YOLOX-L model. We hope this report can provide useful experience for developers and researchers in practical scenes, and we also provide deploy versions with ONNX, TensorRT, NCNN, and Openvino supported.

DETR: End-to-End Object Detection with Transformers (2020)

We present a new method that views object detection as a direct set prediction problem. Our approach streamlines the detection pipeline, effectively removing the need for many hand-designed components like a non-maximum suppression procedure or anchor generation that explicitly encode our prior knowledge about the task. The main ingredients of the new framework, called DEtection TRansformer or DETR, are a set-based global loss that forces unique predictions via bipartite matching, and a transformer encoder-decoder architecture. Given a fixed small set of learned object queries, DETR reasons about the relations of the objects and the global image context to directly output the final set of predictions in parallel. The new model is conceptually simple and does not require a specialized library, unlike many other modern detectors. DETR demonstrates accuracy and run-time performance on par with the well-established and highly-optimized Faster RCNN baseline on the challenging COCO object detection dataset. Moreover, DETR can be easily generalized to produce panoptic segmentation in a unified manner. We show that it significantly outperforms competitive baselines.

Deformable DETR: Deformable Transformers for End-to-End Object Detection (2021)

DETR has been recently proposed to eliminate the need for many hand-designed components in object detection while demonstrating good performance. However, it suffers from slow convergence and limited feature spatial resolution, due to the limitation of Transformer attention modules in processing image feature maps. To mitigate these issues, we proposed Deformable DETR, whose attention modules only attend to a small set of key sampling points around a reference. Deformable DETR can achieve better performance than DETR (especially on small objects) with 10 times less training epochs. Extensive experiments on the COCO benchmark demonstrate the effectiveness of our approach.

Sparse R-CNN: End-to-End Object Detection with Learnable Proposals (2020)

We present Sparse R-CNN, a purely sparse method for object detection in images. Existing works on object detection heavily rely on dense object candidates, such as k anchor boxes pre-defined on all grids of image feature map of size H×W. In our method, however, a fixed sparse set of learned object proposals, total length of N, are provided to object recognition head to perform classification and location. By eliminating HWk (up to hundreds of thousands) hand-designed object candidates to N (e.g. 100) learnable proposals, Sparse R-CNN completely avoids all efforts related to object candidates design and many-to-one label assignment. More importantly, final predictions are directly output without non-maximum suppression post-procedure. Sparse R-CNN demonstrates accuracy, run-time and training convergence performance on par with the well-established detector baselines on the challenging COCO dataset, e.g., achieving 45.0 AP in standard 3× training schedule and running at 22 fps using ResNet-50 FPN model. We hope our work could inspire re-thinking the convention of dense prior in object detectors.

VarifocalNet: An IoU-aware Dense Object Detector (2020)

Accurately ranking the vast number of candidate detections is crucial for dense object detectors to achieve high performance. Prior work uses the classification score or a combination of classification and predicted localization scores to rank candidates. However, neither option results in a reliable ranking, thus degrading detection performance. In this paper, we propose to learn an Iou-aware Classification Score (IACS) as a joint representation of object presence confidence and localization accuracy. We show that dense object detectors can achieve a more accurate ranking of candidate detections based on the IACS. We design a new loss function, named Varifocal Loss, to train a dense object detector to predict the IACS, and propose a new star-shaped bounding box feature representation for IACS prediction and bounding box refinement. Combining these two new components and a bounding box refinement branch, we build an IoU-aware dense object detector based on the FCOS+ATSS architecture, that we call VarifocalNet or VFNet for short. Extensive experiments on MS COCO show that our VFNet consistently surpasses the strong baseline by ∼2.0 AP with different backbones. Our best model VFNet-X-1200 with Res2Net-101-DCN achieves a single-model single-scale AP of 55.1 on COCO test-dev, which is state-of-the-art among various object detectors.

PAA: Probabilistic Anchor Assignment with IoU Prediction for Object Detection (2020)

In object detection, determining which anchors to assign as positive or negative samples, known as anchor assignment, has been revealed as a core procedure that can significantly affect a model's performance. In this paper we propose a novel anchor assignment strategy that adaptively separates anchors into positive and negative samples for a ground truth bounding box according to the model's learning status such that it is able to reason about the separation in a probabilistic manner. To do so we first calculate the scores of anchors conditioned on the model and fit a probability distribution to these scores. The model is then trained with anchors separated into positive and negative samples according to their probabilities. Moreover, we investigate the gap between the training and testing objectives and propose to predict the Intersection-over-Unions of detected boxes as a measure of localization quality to reduce the discrepancy. The combined score of classification and localization qualities serving as a box selection metric in non-maximum suppression well aligns with the proposed anchor assignment strategy and leads significant performance improvements. The proposed methods only add a single convolutional layer to RetinaNet baseline and does not require multiple anchors per location, so are efficient. Experimental results verify the effectiveness of the proposed methods. Especially, our models set new records for single-stage detectors on MS COCO test-dev dataset with various backbones.

SABL: Side-Aware Boundary Localization for More Precise Object Detection (2020)

Current object detection frameworks mainly rely on bounding box regression to localize objects. Despite the remarkable progress in recent years, the precision of bounding box regression remains unsatisfactory, hence limiting performance in object detection. We observe that precise localization requires careful placement of each side of the bounding box. However, the mainstream approach, which focuses on predicting centers and sizes, is not the most effective way to accomplish this task, especially when there exists displacements with large variance between the anchors and the targets. In this paper, we propose an alternative approach, named as Side-Aware Boundary Localization (SABL), where each side of the bounding box is respectively localized with a dedicated network branch. To tackle the difficulty of precise localization in the presence of displacements with large variance, we further propose a two-step localization scheme, which first predicts a range of movement through bucket prediction and then pinpoints the precise position within the predicted bucket. We test the proposed method on both two-stage and single-stage detection frameworks. Replacing the standard bounding box regression branch with the proposed design leads to significant improvements on Faster R-CNN, RetinaNet, and Cascade R-CNN, by 3.0%, 1.7%, and 0.9%, respectively.

ATSS: Bridging the Gap Between Anchor-based and Anchor-free Detection via Adaptive Training Sample Selection (2019)

Object detection has been dominated by anchor-based detectors for several years. Recently, anchor-free detectors have become popular due to the proposal of FPN and Focal Loss. In this paper, we first point out that the essential difference between anchor-based and anchor-free detection is actually how to define positive and negative training samples, which leads to the performance gap between them. If they adopt the same definition of positive and negative samples during training, there is no obvious difference in the final performance, no matter regressing from a box or a point. This shows that how to select positive and negative training samples is important for current object detectors. Then, we propose an Adaptive Training Sample Selection (ATSS) to automatically select positive and negative samples according to statistical characteristics of object. It significantly improves the performance of anchor-based and anchor-free detectors and bridges the gap between them. Finally, we discuss the necessity of tiling multiple anchors per location on the image to detect objects. Extensive experiments conducted on MS COCO support our aforementioned analysis and conclusions. With the newly introduced ATSS, we improve state-of-the-art detectors by a large margin to 50.7% AP without introducing any overhead.

Double Heads: Rethinking Classification and Localization for Object Detection (2019)

Two head structures (i.e. fully connected head and convolution head) have been widely used in R-CNN based detectors for classification and localization tasks. However, there is a lack of understanding of how does these two head structures work for these two tasks. To address this issue, we perform a thorough analysis and find an interesting fact that the two head structures have opposite preferences towards the two tasks. Specifically, the fully connected head (fc-head) is more suitable for the classification task, while the convolution head (conv-head) is more suitable for the localization task. Furthermore, we examine the output feature maps of both heads and find that fc-head has more spatial sensitivity than conv-head. Thus, fc-head has more capability to distinguish a complete object from part of an object, but is not robust to regress the whole object. Based upon these findings, we propose a Double-Head method, which has a fully connected head focusing on classification and a convolution head for bounding box regression. Without bells and whistles, our method gains +3.5 and +2.8 AP on MS COCO dataset from Feature Pyramid Network (FPN) baselines with ResNet-50 and ResNet-101 backbones, respectively.
