🏆 The 1st Place Submission to AICity Challenge 2021 Natural Language-Based Vehicle Retrieval Track (Alibaba-UTS submission)

Last update: Dec 29, 2022

AI City 2021: Connecting Language and Vision for Natural Language-Based Vehicle Retrieval

We maintain two codebases. For the final submission, we ensemble the features extracted by both codebases.

Part One is here: https://github.com/ShuaiBai623/AIC2021-T5-CLV

Part Two is here: https://github.com/layumi/NLP-AICity2021
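The README does not say how the features from the two codebases are fused. As a hedged sketch, one common feature-ensemble scheme (an assumption here, not necessarily the one used for the submission) L2-normalizes each model's embedding and concatenates the results. The cosine similarity of two such concatenated vectors then equals the mean of the per-model cosine similarities. All function names below are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]


def l2_normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def ensemble_feature(per_model: Sequence[Sequence[float]]) -> Vector:
    """Hypothetical feature ensemble: concatenate the L2-normalized feature of each model."""
    fused: Vector = []
    for feat in per_model:
        fused.extend(l2_normalize(feat))
    return fused


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


# Assumed toy example: one query and one track, each described by two models.
query = ensemble_feature([[3.0, 4.0], [1.0, 0.0, 0.0]])
track = ensemble_feature([[4.0, 3.0], [0.0, 1.0, 0.0]])
print(cosine(query, track))  # mean of the per-model similarities 0.96 and 0.0
```

Because every model's feature is normalized before concatenation, neither codebase dominates the fused similarity just because its embeddings have a larger norm or dimension.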

Prepare

Preprocess the dataset to prepare video frames, motion maps, and NLP augmentations.

scripts/extract_vdo_frms.py extracts the video frames.

scripts/get_motion_maps.py generates the motion maps.

scripts/deal_nlpaug.py applies NLP augmentation to the text descriptions.
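As a hedged illustration of what a motion map can look like, the sketch below averages all frames into a static background and pastes the target's box region from each frame back onto it, so the whole trajectory shows up in a single image. This construction, the (x, y, w, h) box format, and the toy grayscale nested-list images are assumptions; scripts/get_motion_maps.py may differ (for example, it works on real RGB frames).

```python
from typing import List, Sequence, Tuple

Image = List[List[float]]        # toy grayscale image, indexed [row][col]
Box = Tuple[int, int, int, int]  # assumed (x, y, w, h) in pixels


def mean_background(frames: Sequence[Image]) -> Image:
    """Pixel-wise mean over all frames, used as a static background."""
    n = len(frames)
    height, width = len(frames[0]), len(frames[0][0])
    return [
        [sum(frame[r][c] for frame in frames) / n for c in range(width)]
        for r in range(height)
    ]


def motion_map(frames: Sequence[Image], boxes: Sequence[Box], stride: int = 1) -> Image:
    """Paste the target region of every `stride`-th frame onto the mean background."""
    canvas = mean_background(frames)
    height, width = len(canvas), len(canvas[0])
    for frame, (x, y, w, h) in list(zip(frames, boxes))[::stride]:
        # Clamp the box to the image bounds before copying pixels.
        for r in range(max(y, 0), min(y + h, height)):
            for c in range(max(x, 0), min(x + w, width)):
                canvas[r][c] = frame[r][c]
    return canvas


frames = [[[0, 0, 0], [0, 0, 0]], [[10, 10, 10], [10, 10, 10]]]
boxes = [(0, 0, 1, 1), (2, 1, 1, 1)]
print(motion_map(frames, boxes))  # [[0, 5.0, 5.0], [5.0, 5.0, 10]]
```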

Download the pretrained Part One models into checkpoints. The checkpoints can be found here. The best single-model score on TestA is 0.1927, achieved by motion_effb3_NOCLS_nlpaug_320.pth.
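For context on the 0.1927 figure: the primary metric of this challenge track is Mean Reciprocal Rank (MRR), the average over queries of 1/rank of the correct track. A minimal sketch, assuming a submission maps each query UUID to a ranked list of track UUIDs and the ground truth maps each query to its single correct track (the function name is hypothetical):

```python
from typing import Dict, List


def mean_reciprocal_rank(submission: Dict[str, List[str]], ground_truth: Dict[str, str]) -> float:
    """Average of 1/rank of the correct track; a query whose track is missing contributes 0."""
    total = 0.0
    for query_id, target in ground_truth.items():
        ranked = submission.get(query_id, [])
        if target in ranked:
            total += 1.0 / (ranked.index(target) + 1)  # ranks are 1-based
    return total / len(ground_truth)


submission = {"q1": ["t2", "t1", "t3"], "q2": ["t3", "t1", "t2"]}
ground_truth = {"q1": "t2", "q2": "t2"}
print(mean_reciprocal_rank(submission, ground_truth))  # (1/1 + 1/3) / 2
```

Since 1/MRR is the harmonic mean of the ranks (when every correct track appears in the ranking), a score of 0.1927 corresponds to a harmonic-mean rank of about 5.2.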

The directory structure of data and checkpoints is as follows:

.
├── checkpoints
│   ├── motion_effb2_1CLS_nlpaug_288.pth
│   ├── motion_effb3_NOCLS_nlpaug_320.pth
│   ├── motion_SE_3CLS_nonlpaug_288.pth
│   ├── motion_SE_NOCLS_nlpaug_288.pth
│   └── motion_SE_NOCLS_nonlpaug_288.pth
└── data
    ├── AIC21_Track5_NL_Retrieval
    │   ├── train
    │   └── validation
    ├── motion_map 
    ├── test-queries.json
    ├── test-queries_nlpaug.json    ## NLP augmentation (Refer to scripts/deal_nlpaug.py)
    ├── test-tracks.json
    ├── train.json
    ├── train_nlpaug.json
    ├── train-tracks.json
    ├── train-tracks_nlpaug.json    ## NLP augmentation (Refer to scripts/deal_nlpaug.py)
    ├── val.json
    └── val_nlpaug.json             ## NLP augmentation (Refer to scripts/deal_nlpaug.py)
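The *_nlpaug.json files sit next to their originals. As a hedged sketch of how such a file could be produced, the code below assumes the AIC21 layout (train-tracks.json maps a track UUID to a dict with "frames", "boxes", and "nl" descriptions; test-queries.json maps a query UUID to a list of descriptions) and appends one augmented copy of every description. It uses plain synonym substitution with a hypothetical synonym table as a stand-in; this is not necessarily what scripts/deal_nlpaug.py does.

```python
import json
from typing import Callable, Dict, Union

Entry = Union[list, dict]

# Hypothetical synonym table, for illustration only.
SYNONYMS = {"car": "vehicle", "sedan": "saloon", "intersection": "junction"}


def synonym_swap(sentence: str) -> str:
    """Replace whitespace-separated words found in SYNONYMS (words with attached punctuation are kept)."""
    return " ".join(SYNONYMS.get(word, word) for word in sentence.split())


def augment_descriptions(entries: Dict[str, Entry], augment: Callable[[str], str]) -> Dict[str, Entry]:
    """Return a new mapping whose descriptions are the originals plus augmented copies."""
    out: Dict[str, Entry] = {}
    for uuid, entry in entries.items():
        if isinstance(entry, list):  # test-queries.json style: uuid -> [sentences]
            out[uuid] = entry + [augment(s) for s in entry]
        else:  # train-tracks.json style: uuid -> {"frames", "boxes", "nl"}
            new_entry = dict(entry)
            new_entry["nl"] = entry["nl"] + [augment(s) for s in entry["nl"]]
            out[uuid] = new_entry
    return out


def augment_file(src: str, dst: str, augment: Callable[[str], str] = synonym_swap) -> None:
    """Read a JSON file such as data/train-tracks.json and write its augmented copy to dst."""
    with open(src) as f:
        entries = json.load(f)
    with open(dst, "w") as f:
        json.dump(augment_descriptions(entries, augment), f, indent=2)
```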

Part One

Modify the data paths in config.py to match your local setup.

Train

The configuration files are in the configs directory.

CUDA_VISIBLE_DEVICES=0,1,2,3 python -u main.py --name your_experiment_name --config your_config_file | tee log

Test

Set RESTORE_FROM in your configuration file to the checkpoint you want to evaluate, then run:

python -u test.py --config your_config_file
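If you switch checkpoints often, the RESTORE_FROM edit can be scripted. A minimal standard-library sketch; it assumes RESTORE_FROM appears exactly once as a `RESTORE_FROM: value` line in the YAML file (matched at any indentation, since the README does not show where the key is nested), and the helper names are hypothetical.

```python
import re
from pathlib import Path

# Matches a `RESTORE_FROM:` line at any indentation and captures everything up to the value.
_RESTORE_FROM = re.compile(r"^([ \t]*RESTORE_FROM[ \t]*:[ \t]*).*$", re.MULTILINE)


def set_restore_from(yaml_text: str, checkpoint: str) -> str:
    """Return yaml_text with the RESTORE_FROM value replaced by the quoted checkpoint path."""
    new_text, count = _RESTORE_FROM.subn(lambda m: f'{m.group(1)}"{checkpoint}"', yaml_text)
    if count != 1:
        raise ValueError(f"expected exactly one RESTORE_FROM entry, found {count}")
    return new_text


def update_config_file(config_path: str, checkpoint: str) -> None:
    """Rewrite a config file in place (hypothetical helper)."""
    path = Path(config_path)
    path.write_text(set_restore_from(path.read_text(), checkpoint))
```

For example, `update_config_file("your_config_file", "checkpoints/motion_effb3_NOCLS_nlpaug_320.pth")` before running test.py. The replacement uses a function rather than a replacement string so that backslashes in Windows paths are not interpreted as regex escapes.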

Extract the visual and text embeddings for each model (the extracted embeddings can also be found here):

python -u test.py --config configs/motion_effb2_1CLS_nlpaug_288.yaml
python -u test.py --config configs/motion_SE_NOCLS_nlpaug_288.yaml
python -u test.py --config configs/motion_effb3_NOCLS_nlpaug_320.yaml
python -u test.py --config configs/motion_SE_3CLS_nonlpaug_288.yaml
python -u test.py --config configs/motion_SE_NOCLS_nonlpaug_288.yaml
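To run all five evaluations in one go, a small driver can loop over the models. This sketch assumes each config file is named after its checkpoint (e.g. configs/motion_SE_3CLS_nonlpaug_288.yaml for checkpoints/motion_SE_3CLS_nonlpaug_288.pth); it only prints the commands unless called with dry_run=False. The helper names are hypothetical.

```python
import subprocess
import sys
from pathlib import Path
from typing import List, Sequence

# Checkpoint names from the directory tree; configs are assumed to share these stems.
CHECKPOINTS = [
    "motion_effb2_1CLS_nlpaug_288.pth",
    "motion_effb3_NOCLS_nlpaug_320.pth",
    "motion_SE_3CLS_nonlpaug_288.pth",
    "motion_SE_NOCLS_nlpaug_288.pth",
    "motion_SE_NOCLS_nonlpaug_288.pth",
]


def config_for(checkpoint: str) -> str:
    """Map checkpoints/<stem>.pth to configs/<stem>.yaml (assumed naming convention)."""
    return f"configs/{Path(checkpoint).stem}.yaml"


def build_eval_commands(checkpoints: Sequence[str]) -> List[List[str]]:
    """One `python -u test.py --config <file>` command per checkpoint."""
    return [[sys.executable, "-u", "test.py", "--config", config_for(c)] for c in checkpoints]


def run_all(checkpoints: Sequence[str] = CHECKPOINTS, dry_run: bool = True) -> None:
    for cmd in build_eval_commands(checkpoints):
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing config


if __name__ == "__main__":
    run_all()  # prints the commands; pass dry_run=False to execute them
```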

Part Two

See https://github.com/layumi/NLP-AICity2021.

Submission

During inference, we average the frame features of the target over each track to obtain the track feature, and average the embeddings of a query's text descriptions to obtain the query feature. Tracks are then ranked by cosine distance to the query to produce the final result.
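The inference step can be sketched with the standard library alone. The sketch assumes the per-frame track features and the per-sentence text embeddings are already extracted as equal-length vectors; the function names are hypothetical. Ranking by ascending cosine distance (1 minus cosine similarity) is equivalent to ranking by descending cosine similarity, which is what the code does.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def mean_vector(vectors: Sequence[Sequence[float]]) -> Vector:
    """Element-wise mean of a non-empty list of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity (0.0 if either vector is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def rank_tracks(sentence_embeddings: Sequence[Vector],
                frame_features: Dict[str, Sequence[Vector]]) -> List[str]:
    """Rank track UUIDs for one query, most similar first."""
    query = mean_vector(sentence_embeddings)  # query feature: mean over its descriptions
    tracks = {uuid: mean_vector(f) for uuid, f in frame_features.items()}  # track feature: mean over frames
    return sorted(tracks, key=lambda uuid: cosine_similarity(query, tracks[uuid]), reverse=True)


ranking = rank_tracks(
    [[1.0, 0.0], [1.0, 0.2]],
    {"track_b": [[0.0, 1.0], [0.1, 0.9]], "track_a": [[1.0, 0.0], [0.9, 0.1]]},
)
print(ranking)  # ['track_a', 'track_b']
```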

To reproduce the best submission, run the following (all extracted embeddings are in the output folder):

python scripts/get_submit.py
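For reference, a submission for this track is assumed here to be a JSON object mapping each query UUID to all test track UUIDs, ranked from most to least similar. A hedged sketch of turning a query-by-track similarity matrix into that structure; it is not scripts/get_submit.py, and the function names are hypothetical.

```python
import json
from typing import Dict, List, Sequence


def build_submission(query_ids: Sequence[str], track_ids: Sequence[str],
                     similarity: Sequence[Sequence[float]]) -> Dict[str, List[str]]:
    """similarity[i][j] scores query i against track j; higher means more similar."""
    submission = {}
    for query_id, row in zip(query_ids, similarity):
        order = sorted(range(len(track_ids)), key=lambda j: row[j], reverse=True)
        submission[query_id] = [track_ids[j] for j in order]
    return submission


def write_submission(submission: Dict[str, List[str]], path: str) -> None:
    """Write the ranked lists as indented JSON."""
    with open(path, "w") as f:
        json.dump(submission, f, indent=2)


result = build_submission(["q1", "q2"], ["t1", "t2", "t3"], [[0.1, 0.9, 0.5], [0.7, 0.2, 0.3]])
print(result)  # {'q1': ['t2', 't3', 't1'], 'q2': ['t1', 't3', 't2']}
```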
