Image Captioning using CNN, LSTM and Attention

Last update: Dec 16, 2021

Related tags

Deep Learning imagecaptioningproject

Overview

Image Captioning using CNN, LSTM and Attention

This is a deep learning model that tries to summarize an image as text.
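The README does not say which attention variant the model uses, so the following is only a minimal sketch of the general idea, assuming simple dot-product soft attention. At each decoding step, the decoder state scores every image region produced by the CNN, and the softmax-weighted sum of the regions becomes the context passed to the LSTM. The names `attend`, `region_features`, and `decoder_state` are hypothetical and are not taken from the project.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(region_features, decoder_state):
    """Dot-product soft attention (an assumed variant, for illustration only).

    region_features: list of feature vectors, one per image region.
    decoder_state:   the decoder's current hidden state, same dimension.
    Returns the attention weights and the weighted-sum context vector.
    """
    # Score each region by its similarity to the decoder state.
    scores = [
        sum(f * h for f, h in zip(region, decoder_state))
        for region in region_features
    ]
    weights = softmax(scores)
    dim = len(region_features[0])
    # The context vector is the attention-weighted average of the regions.
    context = [
        sum(w * region[d] for w, region in zip(weights, region_features))
        for d in range(dim)
    ]
    return weights, context


if __name__ == "__main__":
    regions = [[1.0, 0.0], [0.0, 1.0]]
    state = [2.0, 0.0]
    weights, context = attend(regions, state)
    print("weights:", weights)
    print("context:", context)
```

Regions that align with the decoder state receive larger weights, so the caption generator can focus on a different part of the image for each word.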

Installation

Install this project with pip3. Use Python version 3.7.

  pip3 install -r requirements.txt
  python3 app.py

These commands apply if you want to try the website on localhost.

You can also install Docker, build an image from the Dockerfile, and run it.

  docker build -f Dockerfile -t imagecaptioning:api .
  docker run -p 8080:8080 -ti imagecaptioning:api

Deployment

To deploy this project on Google Cloud App Engine, first create a project in App Engine. Install the Google Cloud SDK on your local machine so that you can push projects, then run the following commands.

  gcloud init
  gcloud app deploy

Choose the right project and then push the application to the cloud. This is a monolithic application, so a single Docker image is built on App Engine.

Demo

Link to demo: https://lucky-dahlia-333406.el.r.appspot.com/index

FAQ

Why is this project implemented in TensorFlow?

TensorFlow is actively maintained by Google and is very convenient to deploy on a server. It automatically uses a GPU for training if it finds one.
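To check whether TensorFlow can see a GPU on a given machine, a short script along these lines may help. It is a sketch rather than part of the project. The helper name `visible_gpus` is hypothetical, and the fallback to an empty list when TensorFlow is not installed is a choice made here for convenience.

```python
def visible_gpus():
    """Return the names of GPUs that TensorFlow can see.

    Returns an empty list when no GPU is visible or when TensorFlow
    is not installed (assumed fallback, for illustration).
    """
    try:
        import tensorflow as tf
    except ImportError:
        return []
    # list_physical_devices reports the devices TensorFlow detected at startup.
    return [device.name for device in tf.config.list_physical_devices("GPU")]


if __name__ == "__main__":
    gpus = visible_gpus()
    if gpus:
        print("TensorFlow will place supported operations on:", gpus)
    else:
        print("No GPU visible; TensorFlow will run on the CPU.")
```

An empty result on a machine that has a GPU usually points to a driver or CUDA setup problem, or to a TensorFlow build without GPU support.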

What is the BLEU score?

BLEU, or the Bilingual Evaluation Understudy, is a score for comparing a candidate translation of text to one or more reference translations. Although developed for translation, it can be used to evaluate text generated for a suite of natural language processing tasks.

In this project, the BLEU score is used to evaluate and score candidate text, computed with the NLTK library in Python.

Authors

License

MIT

Owner

ASUTOSH GHANTO
