[CVPR 2021] Look Before You Leap: Learning Landmark Features for One-Stage Visual Grounding.

Overview

LBYL-Net

This repo implements the CVPR 2021 paper Look Before You Leap: Learning Landmark Features for One-Stage Visual Grounding.


Getting Started

Prerequisites

  • python 3.7
  • pytorch 1.0
  • cuda 10.0
  • gcc 4.9.2 or above

Installation

  1. Clone the repo and install dependencies.
    git clone https://github.com/svip-lab/LBYLNet.git
    cd LBYLNet
    pip install -r requirements.txt
  2. You also need to install our landmark feature convolution:
    cd ext
    git clone https://github.com/hbb1/landmarkconv.git
    cd landmarkconv/lib/layers
    python setup.py install --user
  3. We follow the dataset structure of DMS and FAOA. For convenience, we have packed the datasets together, including ReferitGame, RefCOCO, RefCOCO+, and RefCOCOg.
    bash data/refer/download_data.sh ./data/refer
  4. Download the generated index files and place them in ./data/refer. They are available at [Gdrive] and [One Drive].
  5. Download the pretrained YOLOv3 weights:
    wget -P ext https://pjreddie.com/media/files/yolov3.weights
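
After these steps, you can sanity-check the setup. Below is a minimal standalone sketch, not a script from this repo; the import name of the compiled landmark convolution extension is an assumption, so replace pconv with whatever name its setup.py registers.

#!/usr/bin/env python
# Installation sanity check -- a standalone sketch, not part of this repo.
import torch

print(f"PyTorch {torch.__version__}, CUDA available: {torch.cuda.is_available()}")

# The module name below is an assumption; use the name registered by the
# landmarkconv setup.py if it differs.
try:
    import pconv  # hypothetical import name for the landmark convolution extension
    print("landmark convolution extension is importable")
except ImportError as err:
    print("landmark convolution extension not found:", err)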

Training and Evaluation

By default, we use 2 GPUs and a batch size of 64 with DDP (distributed data parallel). We provide several configurations and training logs for reproducing our results. If you want to use different hyperparameters or models, you can create your own configs. Here are examples:

  • For distributed training with multiple GPUs:

    CUDA_VISIBLE_DEVICES=0,1 python train.py lbyl_lstm_referit_batch64  --workers 8 --distributed --world_size 1  --dist_url "tcp://127.0.0.1:60006"
  • If you use a single GPU or do not use distributed training (make sure to adjust the batch size in the corresponding config file to match your device):

    CUDA_VISIBLE_DEVICES=0, python train.py lbyl_lstm_referit_batch64  --workers 8
  • For evaluation:

    CUDA_VISIBLE_DEVICES=0, python evaluate.py lbyl_lstm_referit_batch64 --testiter 100 --split val
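
For reference, the --distributed, --world_size, and --dist_url flags follow standard PyTorch distributed initialization. The sketch below shows the usual NCCL wiring they imply; the actual organization inside train.py may differ.

# Typical PyTorch DDP initialization implied by the flags above -- illustrative
# only; train.py's actual wiring may be organized differently.
import torch
import torch.distributed as dist

def setup_distributed(rank, world_size, dist_url="tcp://127.0.0.1:60006"):
    # one process per GPU; NCCL is the standard backend for multi-GPU training
    dist.init_process_group(backend="nccl", init_method=dist_url,
                            rank=rank, world_size=world_size)
    torch.cuda.set_device(rank)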

Trained Models

We provide models retrained with this reorganized codebase, along with their checkpoints and logs, for reproducing the results. To use them, download the checkpoints from [Gdrive] and save them in the cache directory. The expected file path is <LBYLNet dir>/cache/nnet/<config>/<dataset>/<config>_100.pkl
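
To verify a download, you can check that a checkpoint loads. A short sketch, assuming the .pkl files are torch-serialized; the path below merely instantiates the pattern above and may differ on your machine.

# Check that a downloaded checkpoint is readable -- a sketch; the .pkl files
# are assumed to be torch-serialized, and the path is just an example of the
# pattern above.
import torch

ckpt = "cache/nnet/lbyl_lstm_referit_batch64/referit/lbyl_lstm_referit_batch64_100.pkl"
state = torch.load(ckpt, map_location="cpu")
print(type(state))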

Notice: the reproduced performance is occasionally higher or lower (within a reasonable range) than the results reported in the paper.

The performance of our LBYL-Nets is listed below. You can also find details in <LBYLNet dir>/results and <LBYLNet dir>/logs.

  • Performance on ReferitGame (Prec@0.5).

    Dataset      Language  Split  Paper  Reproduce
    ReferitGame  LSTM      test   65.48  65.98
    ReferitGame  BERT      test   67.47  68.48
  • Performance on RefCOCO (Prec@0.5).

    Dataset  Language  Split  Paper  Reproduce
    RefCOCO  LSTM      testA  82.18  82.48
    RefCOCO  LSTM      testB  71.91  71.76
    RefCOCO  BERT      testA  82.91  82.82
    RefCOCO  BERT      testB  74.15  72.82
  • Performance on RefCOCO+ (Prec@0.5).

    Dataset   Language  Split  Paper  Reproduce
    RefCOCO+  LSTM      val    66.64  66.71
    RefCOCO+  LSTM      testA  73.21  72.63
    RefCOCO+  LSTM      testB  56.23  55.88
    RefCOCO+  BERT      val    68.64  68.76
    RefCOCO+  BERT      testA  73.38  73.73
    RefCOCO+  BERT      testB  59.49  59.62
  • Performance on RefCOCOg (Prec@0.5).

    Dataset   Language  Split  Paper  Reproduce
    RefCOCOg  LSTM      val    58.72  60.03
    RefCOCOg  BERT      val    62.70  63.20
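
For reference, Prec@0.5 counts a prediction as correct when its IoU with the ground-truth box is at least 0.5. The sketch below illustrates the computation; it is a standalone example, not the code used by evaluate.py.

# Prec@0.5 -- fraction of predictions whose IoU with the ground truth is >= 0.5.
# A standalone illustration, not the repo's evaluation code.

def iou(a, b):
    # boxes given as (x1, y1, x2, y2)
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def prec_at_05(preds, gts):
    return sum(iou(p, g) >= 0.5 for p, g in zip(preds, gts)) / len(gts)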

Demo

We also provide a demo script to test whether the repo is correctly installed. After installing the repo and downloading the pretrained weights, you should be able to use LBYL-Net to ground your own images.

python demo.py

You can change the model, image, or phrase in demo.py. The output image is written to imgs/demo_out.jpg.

#!/usr/bin/env python
import cv2
from core.test.test import _visualize
from core.groundors import Net

# pick one model config; iter=100 selects the <config>_100.pkl checkpoint
cfg_file = "lbyl_bert_unc+_batch64"
detector = Net(cfg_file, iter=100)

# inference: map an (image, phrase) pair to a bounding box
image = cv2.imread('imgs/demo.jpeg')
phrase = 'the green giant'
bbox = detector(image, phrase)

# draw the predicted box and the phrase, then save the visualization
_visualize(image, pred_bbox=bbox, phrase=phrase, save_path='imgs/demo_out.jpg',
           color=(1, 174, 245), draw_phrase=True)
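
Since the detector is a plain callable, you can ground several phrases on the same image with a loop. A short sketch reusing the detector and image defined above; the phrases are arbitrary examples.

# Ground multiple phrases on the same image, reusing detector and image from above.
for phrase in ['the green giant', 'person on the left']:
    bbox = detector(image, phrase)
    print(phrase, '->', bbox)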

Input: imgs/demo.jpeg

Output: imgs/demo_out.jpg


Acknowledgements

This repo is organized following CornerNet-Lite, and parts of the code are adapted from FAOA (e.g., data preparation) and MAttNet (e.g., LSTM). We thank the authors for their great work.


Citation

If you use any part of this repo in your research, please cite our paper:

@InProceedings{huang2021look,
    title     = {Look Before You Leap: Learning Landmark Features for One-Stage Visual Grounding},
    author    = {Huang, Binbin and Lian, Dongze and Luo, Weixin and Gao, Shenghua},
    booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2021},
}