SwinTransformer + OBBDet

The sixth-place solution (6/220) in the Fine-grained Object Recognition in High-Resolution Optical Images track of the 2021 Gaofen Challenge on Automated High-Resolution Earth Observation Image Interpretation.

Members

Qi Ming, Junjie Song, Yunpeng Dong.

Solution

Off-line data augmentation
We use random combinations of affine transformation, flipping, scaling, and optical distortion for data augmentation.
Multi-scale training and testing
Images are resized to sizes of 600, 800, and 1024 for both training and testing.
Strong backbone
Swin Transformer is adopted as the backbone in ORCNN and RoI Transformer for better performance.
Model ensemble
We merge the results from RoI Transformer, ORCNN, S2ANet, and ReDet.
Lower confidence
Set the output score threshold to 0.005.
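
The ensemble and low-threshold steps can be illustrated with a minimal sketch. The README does not say how the results were merged, so the pooling strategy below is an assumption, as are the function name `merge_detections` and the `(cx, cy, w, h, angle, score)` row layout. Only the 0.005 threshold comes from the text above. A rotated NMS pass would normally follow the pooling and is omitted here.

```python
import numpy as np

SCORE_THR = 0.005  # output score threshold stated in this README


def merge_detections(per_model_dets, score_thr=SCORE_THR):
    """Pool per-class detections from several models and drop very low scores.

    per_model_dets: list of float arrays, each of shape (N_i, 6), with rows
    (cx, cy, w, h, angle, score). This layout is a hypothetical choice made
    for the sketch, not the format used by the original code.
    """
    arrays = [np.asarray(d, dtype=np.float64) for d in per_model_dets]
    if not arrays:
        return np.zeros((0, 6))
    pooled = np.concatenate(arrays, axis=0)
    # Keep every box whose score reaches the (deliberately low) threshold.
    pooled = pooled[pooled[:, 5] >= score_thr]
    # Sort by descending score so a later NMS step can consume the result.
    order = np.argsort(-pooled[:, 5], kind="stable")
    return pooled[order]
```

A low threshold keeps many low-score boxes. This usually helps mAP, because the metric integrates precision over the full recall range and extra low-score detections can only extend the curve.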

Tried but didn't work

Soft-NMS.
Adjust NMS threshold.
Class-agnostic NMS.
Mosaic and MixUp for data augmentation.
Oversample the categories with fewer instances.
Train the detectors for specific classes with low AP.
Multi-scale training and testing on SwinTransformer-based detectors (mAP even dropped by about 1%).
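
For readers unfamiliar with Soft-NMS, the first item in the list above, here is a minimal sketch of its Gaussian variant. It is not the code used in this solution. For brevity it uses axis-aligned `(x1, y1, x2, y2)` boxes, whereas oriented detection would need a rotated IoU. The function names and the default `sigma` are assumptions made for illustration.

```python
import numpy as np


def iou_one_to_many(box, boxes):
    """IoU between one (x1, y1, x2, y2) box and an array of such boxes."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)


def soft_nms_gaussian(boxes, scores, sigma=0.5, score_thr=0.005):
    """Return (kept_indices, decayed_scores) in selection order."""
    boxes = np.asarray(boxes, dtype=np.float64)
    scores = np.asarray(scores, dtype=np.float64).copy()
    remaining = np.arange(len(scores))
    keep, kept_scores = [], []
    while remaining.size > 0:
        # Select the highest-scoring box among those still in play.
        best = np.argmax(scores[remaining])
        idx = remaining[best]
        keep.append(int(idx))
        kept_scores.append(float(scores[idx]))
        remaining = np.delete(remaining, best)
        if remaining.size == 0:
            break
        # Decay overlapping boxes instead of discarding them outright.
        ious = iou_one_to_many(boxes[idx], boxes[remaining])
        scores[remaining] *= np.exp(-(ious ** 2) / sigma)
        # Drop boxes whose decayed score falls below the threshold.
        remaining = remaining[scores[remaining] >= score_thr]
    return keep, kept_scores
```

Unlike hard NMS, a box that fully overlaps a stronger one is kept with a reduced score rather than removed, as long as that score stays above `score_thr`.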
