Image captioning

End-to-end image captioning with EfficientNet-b3 + LSTM with Attention

The model is a sequence-to-sequence (seq2seq) model. The encoder uses a pretrained EfficientNet-b3 to extract image features, and the decoder is an LSTM with Bahdanau attention.
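At each decoding step, Bahdanau (additive) attention scores every encoder feature vector against the decoder's previous hidden state. It then normalizes the scores with a softmax and passes the weighted sum of features (the context vector) to the LSTM. The snippet below is a minimal, framework-free sketch of that computation. It uses plain lists instead of tensors, and all names and shapes (`w_enc`, `w_dec`, `v`) are hypothetical, not taken from this repository's code.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]  # row-major: Matrix[i][j]


def matvec(m: Matrix, x: Vector) -> Vector:
    """Multiply a matrix by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in m]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def bahdanau_attention(
    encoder_features: List[Vector],  # one feature vector per image region (hypothetical layout)
    decoder_hidden: Vector,          # previous LSTM hidden state
    w_enc: Matrix,                   # projects encoder features to the attention size
    w_dec: Matrix,                   # projects the decoder state to the attention size
    v: Vector,                       # scoring vector of the attention size
) -> Tuple[Vector, Vector]:
    """Return (attention weights, context vector)."""
    dec_proj = matvec(w_dec, decoder_hidden)
    scores = []
    for feat in encoder_features:
        enc_proj = matvec(w_enc, feat)
        # additive score: v^T tanh(W_enc h_i + W_dec s)
        hidden = [math.tanh(a + b) for a, b in zip(enc_proj, dec_proj)]
        scores.append(sum(vi * hi for vi, hi in zip(v, hidden)))
    weights = softmax(scores)
    # context vector = attention-weighted sum of the encoder features
    dim = len(encoder_features[0])
    context = [
        sum(w * feat[d] for w, feat in zip(weights, encoder_features))
        for d in range(dim)
    ]
    return weights, context


features = [[1.0, 0.0], [0.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
weights, context = bahdanau_attention(features, [1.0, 0.0], identity, identity, [1.0, 1.0])
print(weights, context)
```

In a real model the projections are learned layers and the features come from the EfficientNet-b3 feature map. Only the arithmetic of the attention step is shown here.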

Dataset

The dataset is available on Kaggle and contains 8,000 images, each paired with five different captions.
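Because every image has five captions, a natural first step is to group the captions by image file. The sketch below assumes that `captions.txt` is a CSV file with an `image,caption` header row, which is how the Kaggle release of this dataset is commonly laid out. That layout, and the helper `load_captions`, are assumptions rather than this repository's loader.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List, TextIO


def load_captions(stream: TextIO) -> Dict[str, List[str]]:
    """Group captions by image file name.

    Assumes a CSV with an `image,caption` header row (hypothetical layout).
    """
    captions: Dict[str, List[str]] = defaultdict(list)
    for row in csv.DictReader(stream):
        captions[row["image"]].append(row["caption"].strip())
    return dict(captions)


# Tiny in-memory sample; a quoted field lets a caption contain a comma.
sample = io.StringIO(
    "image,caption\n"
    "a.jpg,A dog runs on the grass .\n"
    "a.jpg,\"A brown dog, running .\"\n"
    "b.jpg,Two children play .\n"
)
print(load_captions(sample))
```

With the default config, the real file would be opened from the data folder, for example `open("data/captions.txt", newline="")`, and passed to the same function.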

Usage

Run in a terminal: python -m img_caption

Config

The user interface consists of a single file:

config.yaml - general configuration with data and model parameters

Default config.yaml:

data:
  path_to_data_folder: "data"
  caption_file_name: "captions.txt"
  images_folder_name: "Images"
  output_folder_name: "output"
  logging_file_name: "logging.txt"
  model_file_name: "model.pt"

batch_size: 32
num_worker: 1
gensim_model_name: "glove-wiki-gigaword-200"

model:
  embedding_dimension: 200
  decoder_hidden_dimension: 300
  learning_rate: 0.0001
  momentum: 0.9
  n_epochs: 50
  clip: 5
  fine_tune_encoder: false
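The default config pairs `gensim_model_name: "glove-wiki-gigaword-200"` (200-dimensional GloVe vectors) with `embedding_dimension: 200`. Presumably the decoder's word embeddings are initialized from these GloVe vectors, so changing one value without the other would break that step. The hypothetical helper below sketches a sanity check for this. It relies on the gensim-data naming convention, where the trailing number is the vector size, and it uses a plain dict in place of the parsed YAML. In practice the file could be loaded with a YAML parser such as PyYAML's `yaml.safe_load`.

```python
from typing import Any, Dict


def check_embedding_dimension(config: Dict[str, Any]) -> None:
    """Raise ValueError if embedding_dimension disagrees with the GloVe vector size.

    Hypothetical helper: assumes names like "glove-wiki-gigaword-<N>",
    where N is the vector size.
    """
    name = config["gensim_model_name"]
    glove_dim = int(name.rsplit("-", 1)[1])
    emb_dim = config["model"]["embedding_dimension"]
    if glove_dim != emb_dim:
        raise ValueError(
            f"{name} has {glove_dim}-d vectors, but embedding_dimension is {emb_dim}"
        )


# A dict mirroring the relevant parts of the default config.yaml
default_config = {
    "gensim_model_name": "glove-wiki-gigaword-200",
    "model": {"embedding_dimension": 200},
}
check_embedding_dimension(default_config)  # passes silently: 200 == 200
```

Switching to, say, `glove-wiki-gigaword-300` would then also require `embedding_dimension: 300`.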

Output

After training the model, the pipeline will return the following files:

model.pt - checkpoint with:
- epoch - last epoch
- model_state_dict - model parameters
- optimizer_state_dict - the state of the optimizer
- train_history - training history of the model
- valid_history - validation history of the model
- best_valid_loss - the best validation loss
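The checkpoint is a dictionary with the keys listed above. The sketch below shows one plausible way those fields relate to each other. It assumes a checkpoint written at the end of every epoch, 0-based epoch numbering, and one loss value per epoch in each history list. Both `build_checkpoint` and `resume_point` are hypothetical helpers, not this repository's code.

```python
import math
from typing import Any, Dict, List


def build_checkpoint(
    epoch: int,
    model_state: Dict[str, Any],
    optimizer_state: Dict[str, Any],
    train_history: List[float],
    valid_history: List[float],
) -> Dict[str, Any]:
    """Assemble a dict with the same keys as model.pt (assumed 0-based epochs)."""
    return {
        "epoch": epoch,                           # last completed epoch
        "model_state_dict": model_state,          # model parameters
        "optimizer_state_dict": optimizer_state,  # optimizer state, e.g. momentum buffers
        "train_history": list(train_history),     # one training loss per epoch
        "valid_history": list(valid_history),     # one validation loss per epoch
        "best_valid_loss": min(valid_history, default=math.inf),
    }


def resume_point(checkpoint: Dict[str, Any]) -> int:
    """Epoch index to continue training from."""
    return checkpoint["epoch"] + 1


ckpt = build_checkpoint(
    epoch=2,
    model_state={"w": [0.1, 0.2]},
    optimizer_state={"lr": 1e-4},
    train_history=[2.9, 2.4, 2.1],
    valid_history=[3.0, 2.6, 2.7],
)
print(ckpt["best_valid_loss"], resume_point(ckpt))
```

With PyTorch, such a dict is typically written with `torch.save(ckpt, path)` and restored with `torch.load(path)`, followed by `model.load_state_dict(ckpt["model_state_dict"])` and `optimizer.load_state_dict(ckpt["optimizer_state_dict"])`.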
