A Fast and Accurate One-Stage Approach to Visual Grounding, ICCV 2019 (Oral)

One-Stage Visual Grounding

***** New: Our recent work on One-stage VG is available at ReSC.*****

A Fast and Accurate One-Stage Approach to Visual Grounding

by Zhengyuan Yang, Boqing Gong, Liwei Wang, Wenbing Huang, Dong Yu, and Jiebo Luo

IEEE International Conference on Computer Vision (ICCV), 2019, Oral

Introduction

We propose a simple, fast, and accurate one-stage approach to visual grounding. For more details, please refer to our paper.

Citation

@inproceedings{yang2019fast,
  title={A Fast and Accurate One-Stage Approach to Visual Grounding},
  author={Yang, Zhengyuan and Gong, Boqing and Wang, Liwei and Huang, Wenbing and Yu, Dong and Luo, Jiebo},
  booktitle={ICCV},
  year={2019}
}

Prerequisites

  • Python 3.5 (3.6 tested)
  • Pytorch 0.4.1
  • Others (Pytorch-Bert, OpenCV, Matplotlib, scipy, etc.)

Installation

  1. Clone the repository

    git clone https://github.com/zyang-ur/onestage_grounding.git
    
  2. Prepare the submodules and associated data

  • RefCOCO & ReferItGame Dataset: place the data, or a soft link to the dataset folder, under ./ln_data/. We follow the dataset structure of DMS; the download_dataset.sh bash script from DMS can be used to set this up.
    bash ln_data/download_data.sh --path ./ln_data
  • Flickr30K Entities Dataset: download the images from the websites of the Flickr30K Entities Dataset and the original Flickr30k Dataset. Place the images under ./ln_data/Flickr30k/flickr30k_images.

  • Data index: download the generated index files and extract them so that they form the ./data folder. Available at [Gdrive], [One Drive].

    rm -r data
    tar xf data.tar
    
  • Model weights: download the pretrained model of Yolov3 and place the file in ./saved_models.

    sh saved_models/yolov3_weights.sh
    

More pretrained models are available in the performance table ([Gdrive], [One Drive]) and should also be placed in ./saved_models.
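
Before training, it can help to confirm that the folders named in the steps above are in place. The following is a minimal sketch, not part of the repository: the helper name missing_paths and the list EXPECTED_DIRS are hypothetical, and the Flickr30k image folder is only needed if you use the Flickr30K Entities dataset.

```python
from pathlib import Path

# Directories named in the installation steps above.
# The list and the helper below are illustrative, not part of the repository.
EXPECTED_DIRS = [
    "ln_data",
    "ln_data/Flickr30k/flickr30k_images",  # only needed for Flickr30K Entities
    "data",
    "saved_models",
]


def missing_paths(repo_root="."):
    """Return the expected directories that do not exist under repo_root.

    Path.is_dir() follows symbolic links, so a soft-linked ln_data counts.
    """
    root = Path(repo_root)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]


if __name__ == "__main__":
    missing = missing_paths(".")
    if missing:
        print("Missing directories:", ", ".join(missing))
    else:
        print("All expected directories are present.")
```

Run it from the main folder of the repository; any directory it reports still has to be created or linked as described above.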

Training

  1. Train the model. Run the code from the main folder. Use the --lstm flag to select the LSTM encoder (Bert is the default) and the --light flag to select the light model.

    python train_yolo.py --data_root ./ln_data/ --dataset referit \
      --gpu gpu_id --batch_size 32 --resume saved_models/lstm_referit_model.pth.tar \
      --lr 1e-4 --nb_epoch 100 --lstm
    
  2. Evaluate the model. Run the code from the main folder. Use the --test flag to enable test mode.

    python train_yolo.py --data_root ./ln_data/ --dataset referit \
      --gpu gpu_id --resume saved_models/lstm_referit_model.pth.tar \
      --lstm --test
    
  3. Visualizations. The --save_plot flag saves visualizations.
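
The commands above differ only in a handful of flags, so they can be assembled programmatically. The sketch below is a hypothetical helper (build_command is not part of the repository) that uses only the flags shown in this section. Whether --save_plot and --light can be combined freely with the other flags is an assumption; check train_yolo.py before relying on it.

```python
import shlex


def build_command(dataset, gpu, resume=None, lstm=False, light=False,
                  test=False, save_plot=False, data_root="./ln_data/",
                  batch_size=None, lr=None, nb_epoch=None):
    """Assemble a train_yolo.py argument list from the flags shown above.

    Hypothetical helper for illustration; it only formats arguments and
    does not validate them against train_yolo.py.
    """
    cmd = ["python", "train_yolo.py",
           "--data_root", data_root,
           "--dataset", dataset,
           "--gpu", str(gpu)]
    # Optional valued arguments are added only when given.
    for flag, value in (("--batch_size", batch_size),
                        ("--lr", lr),
                        ("--nb_epoch", nb_epoch)):
        if value is not None:
            cmd += [flag, str(value)]
    if resume:
        cmd += ["--resume", resume]
    # Boolean switches: --lstm (LSTM encoder), --light (light model),
    # --test (test mode), --save_plot (save visualizations).
    for flag, enabled in (("--lstm", lstm), ("--light", light),
                          ("--test", test), ("--save_plot", save_plot)):
        if enabled:
            cmd.append(flag)
    return cmd


if __name__ == "__main__":
    eval_cmd = build_command(
        "referit", gpu=0,
        resume="saved_models/lstm_referit_model.pth.tar",
        lstm=True, test=True)
    # Print a shell-ready line; pass the list to subprocess.run to execute it.
    print(" ".join(shlex.quote(arg) for arg in eval_cmd))
```

The returned list can be passed directly to subprocess.run from the main folder, which avoids shell quoting issues.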

Performance and Pre-trained Models

Please check the detailed experiment settings in our paper.

Dataset             Ours-LSTM  Performance (Accu@0.5)  Ours-Bert  Performance (Accu@0.5)
ReferItGame         Gdrive     58.76                   Gdrive     59.30
Flickr30K Entities  One Drive  67.62                   One Drive  68.69
RefCOCO                        val: 73.66                         val: 72.05
                               testA: 75.78                       testA: 74.81
                               testB: 71.32                       testB: 67.59
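
For readers unfamiliar with the metric, Accu@0.5 in visual grounding is commonly the percentage of queries whose predicted box has an intersection-over-union (IoU) of at least 0.5 with the ground-truth box. The sketch below illustrates that definition only. It is not the evaluation code of this repository, and the function names, the (x1, y1, x2, y2) box format, and the inclusive threshold are assumptions.

```python
def iou(box_a, box_b):
    """IoU of two boxes given as (x1, y1, x2, y2) with x1 < x2 and y1 < y2."""
    # Corners of the intersection rectangle.
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def accuracy_at_iou(predictions, ground_truths, threshold=0.5):
    """Percentage of predictions whose IoU with the ground truth is >= threshold.

    Assumes one predicted box per query, paired in order with its ground truth.
    """
    if len(predictions) != len(ground_truths) or not predictions:
        raise ValueError("need equally sized, non-empty box lists")
    hits = sum(iou(p, g) >= threshold for p, g in zip(predictions, ground_truths))
    return 100.0 * hits / len(predictions)


if __name__ == "__main__":
    preds = [(0, 0, 10, 10), (0, 0, 10, 10)]
    gts = [(0, 0, 10, 10), (20, 20, 30, 30)]
    print(accuracy_at_iou(preds, gts))  # 50.0: one exact match, one disjoint box
```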

Credits

Parts of the code and models are from DMS, MAttNet, Yolov3 and Pytorch-yolov3.
