Deep-Learning-Image-Captioning - Implementing convolutional and recurrent neural networks in Keras to generate sentence descriptions of images

Last update: Apr 06, 2022

Overview

Deep Learning - Image Captioning with Convolutional and Recurrent Neural Nets

========================================================================

Author: Jonathan Kuo
Python: 3.6.1
TensorFlow: 1.0.1
Keras: 2.0.4

Implementing convolutional and recurrent neural networks in Keras to generate sentence descriptions of images

Introduction

The Keras deep learning architecture of this project was inspired by "Deep Visual-Semantic Alignments for Generating Image Descriptions" by Andrej Karpathy and Fei-Fei Li.

Given a dataset of images and their sentence descriptions, this project defines a Keras (TensorFlow backend) deep learning model that aligns detected regions of an image with segments of its description. This learned alignment allows the model to generate novel descriptions for test images.

Dataset

Microsoft Common Objects in Context (MSCOCO) is an image recognition, segmentation, and captioning dataset. The training data includes 123,000 image-caption pairs. The validation and test sets each contain 5,000 image-caption pairs.
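Before training, the captions have to be paired with their images. The sketch below shows one way this could be done, assuming the annotations follow the COCO captions JSON layout (an "images" list with "id" and "file_name", and an "annotations" list with "image_id" and "caption"). The function name `group_captions` and the sample data are made up for illustration and are not taken from this project's code.

```python
import json
from collections import defaultdict


def group_captions(coco):
    """Map each image file name to the list of its captions.

    `coco` is a dict in the COCO captions annotation layout: an "images"
    list (with "id" and "file_name") and an "annotations" list (with
    "image_id" and "caption").
    """
    file_names = {img["id"]: img["file_name"] for img in coco["images"]}
    captions = defaultdict(list)
    for ann in coco["annotations"]:
        captions[file_names[ann["image_id"]]].append(ann["caption"].strip())
    return dict(captions)


# Tiny in-memory stand-in for a real annotation file (made-up content).
sample = json.loads("""
{
  "images": [{"id": 1, "file_name": "a.jpg"}, {"id": 2, "file_name": "b.jpg"}],
  "annotations": [
    {"image_id": 1, "caption": "A dog runs on grass. "},
    {"image_id": 2, "caption": "Two people ride bikes."},
    {"image_id": 1, "caption": "A brown dog playing outside."}
  ]
}
""")
pairs = group_captions(sample)
```

With a real annotation file, the same function could be applied to the result of `json.load` on that file. An image with several captions then yields several image-caption pairs.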

Architecture

The VGG16 CNN architecture (loaded in Keras) with weights pre-trained on ImageNet is used as the CNN to detect objects in the image. The last dense layer, the 1000-class softmax classifier, is removed so that the 4096-D activations can be passed into the RNN (LSTM). CNN weights are frozen, and RNN weights are updated by backpropagation through time (BPTT). The CNN and LSTM outputs are merged before being passed into a second LSTM that predicts the next word in the sequence. RMSprop is used as the optimizer; its per-parameter adaptive learning rates are intended to help with the vanishing gradient problem.
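Predicting the next word in the sequence implies that each caption is expanded into several training samples (a partial caption and the word that follows it), and that a caption is generated at test time one word at a time. The plain-Python sketch below illustrates both ideas without Keras. It assumes whitespace tokenization, hypothetical `<start>` and `<end>` sentinel tokens, and a hypothetical `predict_next` callable standing in for the trained model. None of these names come from the project itself. In the real model, each partial caption would also be paired with the image's 4096-D feature vector.

```python
START, END = "<start>", "<end>"  # hypothetical sentinel tokens


def next_word_samples(caption):
    """Expand one caption into (partial caption, next word) training pairs."""
    tokens = [START] + caption.lower().split() + [END]
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]


def greedy_caption(predict_next, max_len=20):
    """Generate a caption by repeatedly taking the predicted next word.

    `predict_next` is a hypothetical callable standing in for the trained
    model: it takes the words so far and returns the next word.
    """
    words = [START]
    while len(words) < max_len:
        word = predict_next(words)
        if word == END:
            break
        words.append(word)
    return " ".join(words[1:])


samples = next_word_samples("A dog runs")

# Toy "model" that replays a fixed sentence, just to exercise the loop.
script = ["a", "dog", "runs", END]
caption = greedy_caption(lambda words: script[len(words) - 1])
```

Greedy decoding is only one option. Beam search is a common alternative, and the README does not say which strategy the demo notebook uses.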

Demo

View the demo IPython notebook for model training and prediction on the MSCOCO dataset.
