Image captioning

End-to-end image captioning with EfficientNet-b3 + LSTM with Attention

The model is a seq2seq (encoder-decoder) model. The encoder uses a pretrained EfficientNet-b3 to extract image features, and the decoder is an LSTM with Bahdanau attention.
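
To make the attention step concrete, here is a minimal PyTorch sketch of Bahdanau (additive) attention over image features. It is not taken from this repository: the class name AdditiveAttention, the attention_dim value, and the tensor shapes are assumptions chosen for illustration (1536 feature channels for EfficientNet-b3, 100 spatial regions, and the decoder hidden size of 300 from the default config).

```python
import torch
from torch import nn


class AdditiveAttention(nn.Module):
    """Bahdanau-style attention: score = v^T tanh(W_enc * feature + W_dec * hidden)."""

    def __init__(self, encoder_dim, decoder_dim, attention_dim):
        super().__init__()
        self.encoder_proj = nn.Linear(encoder_dim, attention_dim)
        self.decoder_proj = nn.Linear(decoder_dim, attention_dim)
        self.score = nn.Linear(attention_dim, 1)

    def forward(self, features, hidden):
        # features: (batch, num_regions, encoder_dim)
        # hidden:   (batch, decoder_dim), the current LSTM hidden state
        energy = torch.tanh(
            self.encoder_proj(features) + self.decoder_proj(hidden).unsqueeze(1)
        )
        scores = self.score(energy).squeeze(-1)      # (batch, num_regions)
        weights = torch.softmax(scores, dim=1)       # sums to 1 over the regions
        # Weighted sum of region features gives the context vector.
        context = (weights.unsqueeze(-1) * features).sum(dim=1)
        return context, weights


# Illustrative shapes only (assumed, not read from the project code).
attention = AdditiveAttention(encoder_dim=1536, decoder_dim=300, attention_dim=256)
features = torch.randn(2, 100, 1536)
hidden = torch.randn(2, 300)
context, weights = attention(features, hidden)
```

At each decoding step the context vector would typically be concatenated with the word embedding and fed to the LSTM, so the decoder can look at different image regions for different words.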

Dataset

The dataset is available on Kaggle and contains 8,000 images, each paired with five different captions.
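
As a rough sketch of how the image-to-captions pairing can be read, the snippet below groups captions by image name. It assumes the caption file is a CSV with an "image,caption" header row; that layout, the function name group_captions, and the sample rows are hypothetical and should be checked against the actual file.

```python
import csv
import io
from collections import defaultdict


def group_captions(lines):
    """Map each image file name to the list of its captions."""
    captions = defaultdict(list)
    for row in csv.DictReader(lines):
        captions[row["image"]].append(row["caption"].strip())
    return dict(captions)


# Hypothetical sample rows standing in for the real caption file.
sample = io.StringIO(
    "image,caption\n"
    "example_1.jpg,A dog runs across a field .\n"
    "example_1.jpg,\"A brown dog running, seen from the side .\"\n"
    "example_2.jpg,Two children play on a beach .\n"
)
captions_by_image = group_captions(sample)
```

With the real file, the same function can be called on an open file handle, for example open("data/captions.txt", newline="", encoding="utf-8").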

Usage

Run in a terminal: python -m img_caption

Config

The user interface consists of a single file:

config.yaml - general configuration with data and model parameters

Default config.yaml:

data:
  path_to_data_folder: "data"
  caption_file_name: "captions.txt"
  images_folder_name: "Images"
  output_folder_name: "output"
  logging_file_name: "logging.txt"
  model_file_name: "model.pt"

batch_size: 32
num_worker: 1
gensim_model_name: "glove-wiki-gigaword-200"

model:
  embedding_dimension: 200
  decoder_hidden_dimension: 300
  learning_rate: 0.0001
  momentum: 0.9
  n_epochs: 50
  clip: 5
  fine_tune_encoder: false
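
The sketch below shows one plausible way the values in the default config fit together. It is an illustration, not the project's loader: the helper names resolve_paths and embedding_size_from_name are hypothetical, the config is written as a plain dictionary instead of being read from config.yaml, and it assumes that the caption file and image folder live inside path_to_data_folder while the log and checkpoint go into output_folder_name.

```python
from pathlib import Path

# Subset of the default config.yaml, written out as a dictionary.
config = {
    "data": {
        "path_to_data_folder": "data",
        "caption_file_name": "captions.txt",
        "images_folder_name": "Images",
        "output_folder_name": "output",
        "logging_file_name": "logging.txt",
        "model_file_name": "model.pt",
    },
    "gensim_model_name": "glove-wiki-gigaword-200",
    "model": {"embedding_dimension": 200},
}


def resolve_paths(data_cfg):
    """Join folder and file names into full paths (assumed layout)."""
    data_root = Path(data_cfg["path_to_data_folder"])
    output_root = Path(data_cfg["output_folder_name"])
    return {
        "captions": data_root / data_cfg["caption_file_name"],
        "images": data_root / data_cfg["images_folder_name"],
        "log": output_root / data_cfg["logging_file_name"],
        "model": output_root / data_cfg["model_file_name"],
    }


def embedding_size_from_name(model_name):
    """Read the vector size from the trailing number of the model name."""
    return int(model_name.rsplit("-", 1)[-1])


paths = resolve_paths(config["data"])

# The pretrained vectors are 200-dimensional, so the embedding layer
# presumably has to use the same size.
vector_size = embedding_size_from_name(config["gensim_model_name"])
if vector_size != config["model"]["embedding_dimension"]:
    raise ValueError("embedding_dimension does not match the pretrained vectors")
```

If gensim_model_name is changed to a model with a different vector size, embedding_dimension most likely needs to change with it.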

Output

After training, the pipeline produces the following file:

model.pt - checkpoint with:
- epoch - last epoch
- model_state_dict - model parameters
- optimizer_state_dict - the state of the optimizer
- train_history - training history of the model
- valid_history - validation history of the model
- best_valid_loss - the best validation loss
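
For readers who want to resume training or run inference, the following sketch shows how a checkpoint with these six keys can be written and restored in PyTorch. The helper names, the small nn.Linear stand-in model, the SGD optimizer (suggested by the learning_rate and momentum settings, but not confirmed), the placeholder history values, and the choice to resume at epoch + 1 are all assumptions made for illustration.

```python
import io

import torch
from torch import nn


def make_checkpoint(epoch, model, optimizer, train_history, valid_history, best_valid_loss):
    """Bundle the six fields listed above into one dictionary."""
    return {
        "epoch": epoch,
        "model_state_dict": model.state_dict(),
        "optimizer_state_dict": optimizer.state_dict(),
        "train_history": train_history,
        "valid_history": valid_history,
        "best_valid_loss": best_valid_loss,
    }


def restore_checkpoint(checkpoint, model, optimizer):
    """Load weights and optimizer state; return the epoch to resume from."""
    model.load_state_dict(checkpoint["model_state_dict"])
    optimizer.load_state_dict(checkpoint["optimizer_state_dict"])
    return checkpoint["epoch"] + 1


# Stand-in model and optimizer; the real ones come from the project code.
model = nn.Linear(4, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.0001, momentum=0.9)

# In-memory buffer used here in place of the model.pt file on disk.
buffer = io.BytesIO()
placeholder_train = [2.1, 1.8, 1.6]   # placeholder values, not real results
placeholder_valid = [2.3, 2.0, 1.9]   # placeholder values, not real results
torch.save(
    make_checkpoint(3, model, optimizer, placeholder_train, placeholder_valid, 1.9),
    buffer,
)
buffer.seek(0)

checkpoint = torch.load(buffer, map_location="cpu")
fresh_model = nn.Linear(4, 2)
fresh_optimizer = torch.optim.SGD(fresh_model.parameters(), lr=0.0001, momentum=0.9)
next_epoch = restore_checkpoint(checkpoint, fresh_model, fresh_optimizer)
```

To load the real file, pass its path to torch.load in place of the buffer and construct the captioning model and optimizer with the same settings used for training before calling load_state_dict.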
