PyTorch code for: Learning to Generate Grounded Visual Captions without Localization Supervision

Last update: Nov 17, 2022

Overview

Learning to Generate Grounded Visual Captions without Localization Supervision

This is the PyTorch implementation of our paper:

Learning to Generate Grounded Visual Captions without Localization Supervision
Chih-Yao Ma, Yannis Kalantidis, Ghassan AlRegib, Peter Vajda, Marcus Rohrbach, Zsolt Kira
European Conference on Computer Vision (ECCV), 2020

[arXiv] [GitHub] [Project]

10-min YouTube Video

How to start

Clone the repo recursively:

git clone --recursive git@github.com:chihyaoma/cyclical-visual-captioning.git

If you didn't clone with the --recursive flag, then you'll need to manually clone the pybind submodule from the top-level directory:

git submodule update --init --recursive

Installation

The proposed cyclical method can be applied directly to image and video captioning tasks.

Currently, the installation guide and our code for video captioning on the ActivityNet-Entities dataset are provided in anet-video-captioning.

Acknowledgments

Chih-Yao Ma and Zsolt Kira were partly supported by DARPA’s Lifelong Learning Machines (L2M) program, under Cooperative Agreement HR0011-18-2-0019, as part of their affiliation with Georgia Tech. We thank Chia-Jung Hsu for her valuable and artistic help with the figures.

Citation

If you find this repository useful, please cite our paper:

@inproceedings{ma2020learning,
    title={Learning to Generate Grounded Visual Captions without Localization Supervision},
    author={Ma, Chih-Yao and Kalantidis, Yannis and AlRegib, Ghassan and Vajda, Peter and Rohrbach, Marcus and Kira, Zsolt},
    booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
    year={2020},
    url={https://arxiv.org/abs/1906.00283},
}
