TensorFlow Implementation of "Show, Attend and Tell"

Overview

Show, Attend and Tell

Update (December 2, 2016) TensorFlow implementation of Show, Attend and Tell: Neural Image Caption Generation with Visual Attention which introduces an attention based image caption generator. The model changes its attention to the relevant part of the image while it generates each word.


alt text


References

Author's theano code: https://github.com/kelvinxu/arctic-captions

Another tensorflow implementation: https://github.com/jazzsaxmafia/show_attend_and_tell.tensorflow


Getting Started

Prerequisites

First, clone this repo and pycocoevalcap in same directory.

$ git clone https://github.com/yunjey/show-attend-and-tell-tensorflow.git
$ git clone https://github.com/tylin/coco-caption.git

This code is written in Python2.7 and requires TensorFlow 1.2. In addition, you need to install a few more packages to process MSCOCO data set. I have provided a script to download the MSCOCO image dataset and VGGNet19 model. Downloading the data may take several hours depending on the network speed. Run commands below then the images will be downloaded in image/ directory and VGGNet19 model will be downloaded in data/ directory.

$ cd show-attend-and-tell-tensorflow
$ pip install -r requirements.txt
$ chmod +x ./download.sh
$ ./download.sh

For feeding the image to the VGGNet, you should resize the MSCOCO image dataset to the fixed size of 224x224. Run command below then resized images will be stored in image/train2014_resized/ and image/val2014_resized/ directory.

$ python resize.py

Before training the model, you have to preprocess the MSCOCO caption dataset. To generate caption dataset and image feature vectors, run command below.

$ python prepro.py

Train the model

To train the image captioning model, run command below.

$ python train.py

(optional) Tensorboard visualization

I have provided a tensorboard visualization for real-time debugging. Open the new terminal, run command below and open http://localhost:6005/ into your web browser.

$ tensorboard --logdir='./log' --port=6005 

Evaluate the model

To generate captions, visualize attention weights and evaluate the model, please see evaluate_model.ipynb.


Results


Training data

(1) Generated caption: A plane flying in the sky with a landing gear down.

alt text

(2) Generated caption: A giraffe and two zebra standing in the field.

alt text

Validation data

(1) Generated caption: A large elephant standing in a dry grass field.

alt text

(2) Generated caption: A baby elephant standing on top of a dirt field.

alt text

Test data

(1) Generated caption: A plane flying over a body of water.

alt text

(2) Generated caption: A zebra standing in the grass near a tree.

alt text

Owner
Yunjey Choi
Yunjey Choi
GraphLily: A Graph Linear Algebra Overlay on HBM-Equipped FPGAs

GraphLily: A Graph Linear Algebra Overlay on HBM-Equipped FPGAs GraphLily is the first FPGA overlay for graph processing. GraphLily supports a rich se

Cornell Zhang Research Group 39 Dec 13, 2022
Using machine learning to predict undergrad college admissions.

College-Prediction Project- Overview: Many have tried, many have failed. Few trailblazers are ambitious enought to chase acceptance into the top 15 un

John H Klinges 1 Jan 05, 2022
Dyalog-apl-docset - Dyalog APL Dash Docset Generator

Dyalog APL Dash Docset Generator o alasa e kili sona kepeken tenpo lili a A Dash

Maciej Goszczycki 1 Jan 10, 2022
This code provides a PyTorch implementation for OTTER (Optimal Transport distillation for Efficient zero-shot Recognition), as described in the paper.

Data Efficient Language-Supervised Zero-Shot Recognition with Optimal Transport Distillation This repository contains PyTorch evaluation code, trainin

Meta Research 45 Dec 20, 2022
Faster RCNN with PyTorch

Faster RCNN with PyTorch Note: I re-implemented faster rcnn in this project when I started learning PyTorch. Then I use PyTorch in all of my projects.

Long Chen 1.6k Dec 23, 2022
Code for the paper Progressive Pose Attention for Person Image Generation in CVPR19 (Oral).

Pose-Transfer Code for the paper Progressive Pose Attention for Person Image Generation in CVPR19(Oral). The paper is available here. Video generation

Tengteng Huang 679 Jan 04, 2023
Official PyTorch implementation of the paper "Deep Constrained Least Squares for Blind Image Super-Resolution", CVPR 2022.

Deep Constrained Least Squares for Blind Image Super-Resolution [Paper] This is the official implementation of 'Deep Constrained Least Squares for Bli

MEGVII Research 141 Dec 30, 2022
Self-Supervised Monocular DepthEstimation with Internal Feature Fusion(arXiv), BMVC2021

DIFFNet This repo is for Self-Supervised Monocular DepthEstimation with Internal Feature Fusion(arXiv), BMVC2021 A new backbone for self-supervised de

Hang 94 Dec 25, 2022
Using CNN to mimic the driver based on training data from Torcs

Behavioural-Cloning-in-autonomous-driving Using CNN to mimic the driver based on training data from Torcs. Approach First, the data was collected from

Sudharshan 2 Jan 05, 2022
[ WSDM '22 ] On Sampling Collaborative Filtering Datasets

On Sampling Collaborative Filtering Datasets This repository contains the implementation of many popular sampling strategies, along with various expli

Noveen Sachdeva 17 Dec 08, 2022
the code for our CVPR 2021 paper Bilateral Grid Learning for Stereo Matching Network [BGNet]

BGNet This repository contains the code for our CVPR 2021 paper Bilateral Grid Learning for Stereo Matching Network [BGNet] Environment Python 3.6.* C

3DCV developer 87 Nov 29, 2022
Graph Analysis From Scratch

Graph Analysis From Scratch Goal In this notebook we wanted to implement some functionalities to analyze a weighted graph only by using algorithms imp

Arturo Ghinassi 0 Sep 17, 2022
PyExplainer: A Local Rule-Based Model-Agnostic Technique (Explainable AI)

PyExplainer PyExplainer is a local rule-based model-agnostic technique for generating explanations (i.e., why a commit is predicted as defective) of J

AI Wizards for Software Management (AWSM) Research Group 14 Nov 13, 2022
HairCLIP: Design Your Hair by Text and Reference Image

Overview This repository hosts the official PyTorch implementation of the paper: "HairCLIP: Design Your Hair by Text and Reference Image". Our single

322 Jan 06, 2023
PyTorch implementation of Convolutional Neural Fabrics http://arxiv.org/abs/1606.02492

PyTorch implementation of Convolutional Neural Fabrics arxiv:1606.02492 There are some minor differences: The raw image is first convolved, to obtain

Anuvabh Dutt 25 Dec 22, 2021
An Open Source Machine Learning Framework for Everyone

Documentation TensorFlow is an end-to-end open source platform for machine learning. It has a comprehensive, flexible ecosystem of tools, libraries, a

170.1k Jan 04, 2023
TResNet: High Performance GPU-Dedicated Architecture

TResNet: High Performance GPU-Dedicated Architecture paperV2 | pretrained models Official PyTorch Implementation Tal Ridnik, Hussam Lawen, Asaf Noy, I

426 Dec 28, 2022
Robotics with GPU computing

Robotics with GPU computing Cupoch is a library that implements rapid 3D data processing for robotics using CUDA. The goal of this library is to imple

Shirokuma 625 Jan 07, 2023
Unofficial PyTorch implementation of Masked Autoencoders Are Scalable Vision Learners

Unofficial PyTorch implementation of Masked Autoencoders Are Scalable Vision Learners This repository is built upon BEiT, thanks very much! Now, we on

Zhiliang Peng 2.3k Jan 04, 2023
The open source code of SA-UNet: Spatial Attention U-Net for Retinal Vessel Segmentation.

SA-UNet: Spatial Attention U-Net for Retinal Vessel Segmentation(ICPR 2020) Overview This code is for the paper: Spatial Attention U-Net for Retinal V

Changlu Guo 151 Dec 28, 2022