Meshed-Memory Transformer for Image Captioning. CVPR 2020

Overview

M²: Meshed-Memory Transformer

This repository contains the reference code for the paper Meshed-Memory Transformer for Image Captioning (CVPR 2020).

Please cite with the following BibTeX:

@inproceedings{cornia2020m2,
  title={{Meshed-Memory Transformer for Image Captioning}},
  author={Cornia, Marcella and Stefanini, Matteo and Baraldi, Lorenzo and Cucchiara, Rita},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2020}
}

Meshed-Memory Transformer

Environment setup

Clone the repository and create the m2release conda environment using the environment.yml file:

conda env create -f environment.yml
conda activate m2release

Then download spacy data by executing the following command:

python -m spacy download en

Note: Python 3.6 is required to run our code.

Data preparation

To run the code, annotations and detection features for the COCO dataset are needed. Please download the annotations file annotations.zip and extract it.

Detection features are computed with the code provided by [1]. To reproduce our result, please download the COCO features file coco_detections.hdf5 (~53.5 GB), in which detections of each image are stored under the _features key. is the id of each COCO image, without leading zeros (e.g. the for COCO_val2014_000000037209.jpg is 37209), and each value should be a (N, 2048) tensor, where N is the number of detections.

Evaluation

To reproduce the results reported in our paper, download the pretrained model file meshed_memory_transformer.pth and place it in the code folder.

Run python test.py using the following arguments:

Argument Possible values
--batch_size Batch size (default: 10)
--workers Number of workers (default: 0)
--features_path Path to detection features file
--annotation_folder Path to folder with COCO annotations

Expected output

Under output_logs/, you may also find the expected output of the evaluation code.

Training procedure

Run python train.py using the following arguments:

Argument Possible values
--exp_name Experiment name
--batch_size Batch size (default: 10)
--workers Number of workers (default: 0)
--m Number of memory vectors (default: 40)
--head Number of heads (default: 8)
--warmup Warmup value for learning rate scheduling (default: 10000)
--resume_last If used, the training will be resumed from the last checkpoint.
--resume_best If used, the training will be resumed from the best checkpoint.
--features_path Path to detection features file
--annotation_folder Path to folder with COCO annotations
--logs_folder Path folder for tensorboard logs (default: "tensorboard_logs")

For example, to train our model with the parameters used in our experiments, use

python train.py --exp_name m2_transformer --batch_size 50 --m 40 --head 8 --warmup 10000 --features_path /path/to/features --annotation_folder /path/to/annotations

Sample Results

References

[1] P. Anderson, X. He, C. Buehler, D. Teney, M. Johnson, S. Gould, and L. Zhang. Bottom-up and top-down attention for image captioning and visual question answering. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018.

Owner
AImageLab
AImageLab
Processed, version controlled history of Minecraft's generated data and assets

mcmeta Processed, version controlled history of Minecraft's generated data and assets Repository structure Each of the following branches has a commit

Misode 75 Dec 28, 2022
Docker containers of baseline agents for the Crafter environment

Crafter Baselines This repository contains Docker containers for running various baselines on the Crafter environment. Reward Agents DreamerV2 based o

Danijar Hafner 17 Sep 25, 2022
Weakly Supervised 3D Object Detection from Point Cloud with Only Image Level Annotation

SCCKTIM Weakly Supervised 3D Object Detection from Point Cloud with Only Image-Level Annotation Our code will be available soon. The class knowledge t

1 Nov 12, 2021
This porject is intented to build the most accurate model for predicting the porbability of loan default

Estimating-Loan-Default-Probability IBA ML2 Mid-project / Kaggle Competition This porject is intented to build the most accurate model for predicting

Adil Gahramanov 1 Jan 24, 2022
Reducing Information Bottleneck for Weakly Supervised Semantic Segmentation (NeurIPS 2021)

Reducing Information Bottleneck for Weakly Supervised Semantic Segmentation (NeurIPS 2021) The implementation of Reducing Infromation Bottleneck for W

Jungbeom Lee 81 Dec 16, 2022
InsightFace: 2D and 3D Face Analysis Project on MXNet and PyTorch

InsightFace: 2D and 3D Face Analysis Project on MXNet and PyTorch

Deep Insight 13.2k Jan 06, 2023
Code to reproduce the results for Compositional Attention

Compositional-Attention This repository contains the official implementation for the paper Compositional Attention: Disentangling Search and Retrieval

Sarthak Mittal 58 Nov 30, 2022
Implementation of Graph Convolutional Networks in TensorFlow

Graph Convolutional Networks This is a TensorFlow implementation of Graph Convolutional Networks for the task of (semi-supervised) classification of n

Thomas Kipf 6.6k Dec 30, 2022
The code release of paper 'Domain Generalization for Medical Imaging Classification with Linear-Dependency Regularization' NIPS 2020.

Domain Generalization for Medical Imaging Classification with Linear Dependency Regularization The code release of paper 'Domain Generalization for Me

Yufei Wang 56 Dec 28, 2022
Information-Theoretic Multi-Objective Bayesian Optimization with Continuous Approximations

Information-Theoretic Multi-Objective Bayesian Optimization with Continuous Approximations Requirements The code is implemented in Python and requires

1 Nov 03, 2021
The self-supervised goal reaching benchmark introduced in Discovering and Achieving Goals via World Models

Lexa-Benchmark Codebase for the self-supervised goal reaching benchmark introduced in 'Discovering and Achieving Goals via World Models'. Setup Create

1 Oct 14, 2021
Libraries, tools and tasks created and used at DeepMind Robotics.

dm_robotics: Libraries, tools, and tasks created and used for Robotics research at DeepMind. Package overview Package Summary Transformations Rigid bo

DeepMind 273 Jan 06, 2023
Creative Applications of Deep Learning w/ Tensorflow

Creative Applications of Deep Learning w/ Tensorflow This repository contains lecture transcripts and homework assignments as Jupyter Notebooks for th

Parag K Mital 1.5k Dec 30, 2022
Flow is a computational framework for deep RL and control experiments for traffic microsimulation.

Flow Flow is a computational framework for deep RL and control experiments for traffic microsimulation. See our website for more information on the ap

867 Jan 02, 2023
Differentiable scientific computing library

xitorch: differentiable scientific computing library xitorch is a PyTorch-based library of differentiable functions and functionals that can be widely

98 Dec 26, 2022
A machine learning benchmark of in-the-wild distribution shifts, with data loaders, evaluators, and default models.

WILDS is a benchmark of in-the-wild distribution shifts spanning diverse data modalities and applications, from tumor identification to wildlife monitoring to poverty mapping.

P-Lambda 437 Dec 30, 2022
Image classification for projects and researches

This is a tool to help you quickly solve classification problems including: data analysis, training, report results and model explanation.

Nguyễn Trường Lâu 2 Dec 27, 2021
The code repository for "PyCIL: A Python Toolbox for Class-Incremental Learning" in PyTorch.

PyCIL: A Python Toolbox for Class-Incremental Learning Introduction • Methods Reproduced • Reproduced Results • How To Use • License • Acknowledgement

Fu-Yun Wang 258 Dec 31, 2022
POCO: Point Convolution for Surface Reconstruction

POCO: Point Convolution for Surface Reconstruction by: Alexandre Boulch and Renaud Marlet Abstract Implicit neural networks have been successfully use

valeo.ai 93 Dec 29, 2022