A Fast and Accurate One-Stage Approach to Visual Grounding, ICCV 2019 (Oral)

Overview

One-Stage Visual Grounding

***** New: Our recent work on One-stage VG is available at ReSC.*****

A Fast and Accurate One-Stage Approach to Visual Grounding

by Zhengyuan Yang, Boqing Gong, Liwei Wang, Wenbing Huang, Dong Yu, and Jiebo Luo

IEEE International Conference on Computer Vision (ICCV), 2019, Oral

Introduction

We propose a simple, fast, and accurate one-stage approach to visual grounding. For more details, please refer to our paper.

Citation

@inproceedings{yang2019fast,
  title={A Fast and Accurate One-Stage Approach to Visual Grounding},
  author={Yang, Zhengyuan and Gong, Boqing and Wang, Liwei and Huang
    , Wenbing and Yu, Dong and Luo, Jiebo},
  booktitle={ICCV},
  year={2019}
}

Prerequisites

  • Python 3.5 (3.6 tested)
  • Pytorch 0.4.1
  • Others (Pytorch-Bert, OpenCV, Matplotlib, scipy, etc.)

Installation

  1. Clone the repository

    git clone https://github.com/zyang-ur/onestage_grounding.git
    
  2. Prepare the submodules and associated data

  • RefCOCO & ReferItGame Dataset: place the data or the soft link of dataset folder under ./ln_data/. We follow dataset structure DMS. To accomplish this, the download_dataset.sh bash script from DMS can be used.
    bash ln_data/download_data.sh --path ./ln_data
  • Flickr30K Entities Dataset: please download the images for the dataset on the website for the Flickr30K Entities Dataset and the original Flickr30k Dataset. Images should be placed under ./ln_data/Flickr30k/flickr30k_images.

  • Data index: download the generated index files and place them as the ./data folder. Availble at [Gdrive], [One Drive].

    rm -r data
    tar xf data.tar
    
  • Model weights: download the pretrained model of Yolov3 and place the file in ./saved_models.

    sh saved_models/yolov3_weights.sh
    

More pretrained models are availble in the performance table [Gdrive], [One Drive] and should also be placed in ./saved_models.

Training

  1. Train the model, run the code under main folder. Using flag --lstm to access lstm encoder, Bert is used as the default. Using flag --light to access the light model.

    python train_yolo.py --data_root ./ln_data/ --dataset referit \
      --gpu gpu_id --batch_size 32 --resume saved_models/lstm_referit_model.pth.tar \
      --lr 1e-4 --nb_epoch 100 --lstm
    
  2. Evaluate the model, run the code under main folder. Using flag --test to access test mode.

    python train_yolo.py --data_root ./ln_data/ --dataset referit \
      --gpu gpu_id --resume saved_models/lstm_referit_model.pth.tar \
      --lstm --test
    
  3. Visulizations. Flag --save_plot will save visulizations.

Performance and Pre-trained Models

Please check the detailed experiment settings in our paper.

Dataset Ours-LSTM Performance ([email protected]) Ours-Bert Performance ([email protected])
ReferItGame Gdrive 58.76 Gdrive 59.30
Flickr30K Entities One Drive 67.62 One Drive 68.69
RefCOCO val: 73.66 val: 72.05
testA: 75.78 testA: 74.81
testB: 71.32 testB: 67.59

Credits

Part of the code or models are from DMS, MAttNet, Yolov3 and Pytorch-yolov3.

Owner
Zhengyuan Yang
Zhengyuan Yang
[CVPR 2022] Thin-Plate Spline Motion Model for Image Animation.

[CVPR2022] Thin-Plate Spline Motion Model for Image Animation Source code of the CVPR'2022 paper "Thin-Plate Spline Motion Model for Image Animation"

yoyo-nb 1.4k Dec 30, 2022
Face Mask Detector by live camera using tensorflow-keras, openCV and Python

Face Mask Detector 😷 by Live Camera Detecting masked or unmasked faces by live camera with percentange of mask occupation About Project: This an Arti

Karan Shingde 2 Apr 04, 2022
Neighbor2Seq: Deep Learning on Massive Graphs by Transforming Neighbors to Sequences

Neighbor2Seq: Deep Learning on Massive Graphs by Transforming Neighbors to Sequences This repository is an official PyTorch implementation of Neighbor

DIVE Lab, Texas A&M University 8 Jun 12, 2022
Cross-Image Region Mining with Region Prototypical Network for Weakly Supervised Segmentation

Cross-Image Region Mining with Region Prototypical Network for Weakly Supervised Segmentation The code of: Cross-Image Region Mining with Region Proto

LiuWeide 16 Nov 26, 2022
NAACL'2021: Factual Probing Is [MASK]: Learning vs. Learning to Recall

OptiPrompt This is the PyTorch implementation of the paper Factual Probing Is [MASK]: Learning vs. Learning to Recall. We propose OptiPrompt, a simple

Princeton Natural Language Processing 150 Dec 20, 2022
Tensorflow Implementation of ECCV'18 paper: Multimodal Human Motion Synthesis

MT-VAE for Multimodal Human Motion Synthesis This is the code for ECCV 2018 paper MT-VAE: Learning Motion Transformations to Generate Multimodal Human

Xinchen Yan 36 Oct 02, 2022
quantize aware training package for NCNN on pytorch

ncnnqat ncnnqat is a quantize aware training package for NCNN on pytorch. Table of Contents ncnnqat Table of Contents Installation Usage Code Examples

62 Nov 23, 2022
This repository contains demos I made with the Transformers library by HuggingFace.

Transformers-Tutorials Hi there! This repository contains demos I made with the Transformers library by 🤗 HuggingFace. Currently, all of them are imp

3.5k Jan 01, 2023
Kernel Point Convolutions

Created by Hugues THOMAS Introduction Update 27/04/2020: New PyTorch implementation available. With SemanticKitti, and Windows supported. This reposit

Hugues THOMAS 584 Jan 07, 2023
一个免费开源一键搭建的通用验证码识别平台,大部分常见的中英数验证码识别都没啥问题。

captcha_server 一个免费开源一键搭建的通用验证码识别平台,大部分常见的中英数验证码识别都没啥问题。 使用方法 python = 3.8 以上环境 pip install -r requirements.txt -i https://pypi.douban.com/simple gun

Sml2h3 189 Dec 02, 2022
[ACM MM 2019 Oral] Cycle In Cycle Generative Adversarial Networks for Keypoint-Guided Image Generation

Contents Cycle-In-Cycle GANs Installation Dataset Preparation Generating Images Using Pretrained Model Train and Test New Models Acknowledgments Relat

Hao Tang 67 Dec 14, 2022
Code for "Adversarial attack by dropping information." (ICCV 2021)

AdvDrop Code for "AdvDrop: Adversarial Attack to DNNs by Dropping Information(ICCV 2021)." Human can easily recognize visual objects with lost informa

Ranjie Duan 52 Nov 10, 2022
Regularizing Generative Adversarial Networks under Limited Data (CVPR 2021)

Regularizing Generative Adversarial Networks under Limited Data [Project Page][Paper] Implementation for our GAN regularization method. The proposed r

Google 148 Nov 18, 2022
SmartSim Infrastructure Library.

Home Install Documentation Slack Invite Cray Labs SmartSim SmartSim makes it easier to use common Machine Learning (ML) libraries like PyTorch and Ten

Cray Labs 139 Jan 01, 2023
Convolutional neural network web app trained to track our infant’s sleep schedule using our Google Nest camera.

Machine Learning Sleep Schedule Tracker What is it? Convolutional neural network web app trained to track our infant’s sleep schedule using our Google

g-parki 7 Jul 15, 2022
Image Deblurring using Generative Adversarial Networks

DeblurGAN arXiv Paper Version Pytorch implementation of the paper DeblurGAN: Blind Motion Deblurring Using Conditional Adversarial Networks. Our netwo

Orest Kupyn 2.2k Jan 01, 2023
Here I will explain the flow to deploy your custom deep learning models on Ultra96V2.

Xilinx_Vitis_AI This repo will help you to Deploy your Deep Learning Model on Ultra96v2 Board. Prerequisites Vitis Core Development Kit 2019.2 This co

Amin Mamandipoor 1 Feb 08, 2022
XViT - Space-time Mixing Attention for Video Transformer

XViT - Space-time Mixing Attention for Video Transformer This is the official implementation of the XViT paper: @inproceedings{bulat2021space, title

Adrian Bulat 33 Dec 23, 2022
ThunderGBM: Fast GBDTs and Random Forests on GPUs

Documentations | Installation | Parameters | Python (scikit-learn) interface What's new? ThunderGBM won 2019 Best Paper Award from IEEE Transactions o

Xtra Computing Group 647 Jan 04, 2023
Deep metric learning methods implemented in Chainer

Deep Metric Learning Implementation of several methods for deep metric learning in Chainer v4.2.0. Proxy-NCA: No Fuss Distance Metric Learning using P

ronekko 156 Nov 28, 2022