A transformer-based method for Healthcare Image Captioning in Vietnamese

Last update: May 05, 2022

Overview

vieCap4H Challenge 2021: A transformer-based method for Healthcare Image Captioning in Vietnamese

This GitHub repository contains our solution for the vieCap4H Challenge 2021. We use grid features as the visual representation, and we pre-train a BERT-based language model, initialized from pre-trained PhoBERT, to obtain the language representation. We also identify a suitable training schedule for the self-critical sequence training (SCST) technique to achieve the best results. In our experiments, we obtain an average BLEU score of 30.3% on the public-test round and 28.9% on the private-test round, ranking 3rd and 4th, respectively.
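Grid features treat a CNN backbone's final feature map as a set of H×W grid cells. Each cell is a C-dimensional vector that the Transformer encoder consumes like a token. The minimal sketch below shows only that reshaping step, using plain nested lists. The backbone and the grid size behind the released features are not stated here, so the `flatten_grid` helper is hypothetical.

```python
from typing import List


def flatten_grid(feature_map: List[List[List[float]]]) -> List[List[float]]:
    """Hypothetical helper: turn a C x H x W feature map into H*W vectors of length C.

    Cells are emitted in row-major order (left to right, top to bottom),
    so the result is a sequence the encoder can attend over.
    """
    channels = len(feature_map)
    height = len(feature_map[0])
    width = len(feature_map[0][0])
    return [
        [feature_map[c][y][x] for c in range(channels)]  # one C-dim vector per cell
        for y in range(height)
        for x in range(width)
    ]


if __name__ == "__main__":
    # 2 channels over a 2 x 2 grid -> 4 grid "tokens" of dimension 2
    fm = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
    print(flatten_grid(fm))
```

In a real pipeline the same operation is usually a single tensor reshape and transpose. The nested-list version just makes the cell ordering explicit.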

Figure 1. An overview of our solution based on RSTNet
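SCST fine-tunes the captioner with a policy-gradient objective. Each sampled caption is rewarded by a sentence-level metric, and a baseline is subtracted so that only captions better than the baseline are reinforced. The sketch below computes that loss for the k captions sampled from one image. It uses the mean reward of the samples as the baseline, the variant commonly used in RSTNet-style code; the original SCST used the greedy-decoded caption instead. The reward metric (e.g. CIDEr) and the schedule used in this repository are not specified here, and `scst_loss` is a hypothetical name.

```python
from typing import Sequence


def scst_loss(log_probs: Sequence[float], rewards: Sequence[float]) -> float:
    """Self-critical loss for k captions sampled from the same image.

    log_probs[i]: summed log-probability of sampled caption i under the model.
    rewards[i]:   sentence-level reward of caption i (e.g. CIDEr, assumed here).
    Baseline: mean reward of the k samples.
    """
    if len(log_probs) != len(rewards) or not rewards:
        raise ValueError("need exactly one reward per sampled caption")
    baseline = sum(rewards) / len(rewards)
    # Minimising this raises the log-probability of captions whose reward beats
    # the baseline (positive advantage) and lowers it for the others.
    return -sum((r - baseline) * lp for r, lp in zip(rewards, log_probs)) / len(rewards)


if __name__ == "__main__":
    print(scst_loss([-1.0, -2.0], [1.0, 0.0]))
```

In actual training, `log_probs` would be autograd tensors so the loss can be back-propagated. Plain floats are used here only to make the arithmetic visible.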

1. Data preparation

The grid features of vieCap4H can be downloaded via the links below:

The dataset can be downloaded at https://aihub.vn/competitions/40. The annotations must be converted to COCO format. We have already converted them, and the converted file is available at:

viecap4h-public-train.json.
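If you need to redo the conversion, for example for another split, the sketch below builds a COCO-captions style dictionary with `images` and `annotations` lists. The input record layout (`{"id": <image file name>, "captions": <caption text>}`) is an assumption about the raw vieCap4H files and should be checked against the actual download. The `to_coco_captions` helper is hypothetical.

```python
import json
from typing import Dict, List


def to_coco_captions(records: List[Dict[str, str]]) -> Dict[str, list]:
    """Hypothetical converter: raw {'id': file_name, 'captions': text} records
    (assumed layout) -> COCO-captions style dict."""
    images: List[dict] = []
    annotations: List[dict] = []
    image_ids: Dict[str, int] = {}  # file name -> integer image id
    for record in records:
        file_name = record["id"]
        if file_name not in image_ids:
            image_ids[file_name] = len(image_ids)
            images.append({"id": image_ids[file_name], "file_name": file_name})
        # One annotation per caption; several captions may share an image.
        annotations.append({
            "id": len(annotations),
            "image_id": image_ids[file_name],
            "caption": record["captions"],
        })
    return {"images": images, "annotations": annotations}


if __name__ == "__main__":
    sample = [{"id": "img_001.jpg", "captions": "Bác sĩ đang khám cho bệnh nhân."}]
    print(json.dumps(to_coco_captions(sample), ensure_ascii=False, indent=2))
```

`ensure_ascii=False` keeps the Vietnamese diacritics readable in the written JSON instead of escaping them.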

2. Training

Pre-train the BERT-based language model, initialized from PhoBERT, with the command below:

python train_language.py \
--img_path <images path> \
--features_path <features path> \
--annotation_folder <annotations folder> \
--batch_size 40

The weights of the BERT-based model should appear in the folder saved_language_models.

Then, train the Transformer model with the command below:

python train_transformer.py \
--img_path <images path> \
--features_path <features path> \
--annotation_folder <annotations folder> \
--batch_size 40

The weights of the Transformer-based model should appear in the folder saved_transformer_rstnet_models.

Here, <images path> is the data folder, <features path> is the path to the grid features folder, and <annotations folder> is the path to the folder that contains the file viecap4h-public-train.json.
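The sketch below fills in these placeholders from Python. It also checks that the annotation file is where the scripts expect it before a long training run starts. Only the flags shown in the commands above are used. The helper names and the example paths (`data/images`, `data/grid_features`, `data/annotations`) are hypothetical.

```python
import shlex
from pathlib import Path
from typing import List


def build_train_command(script: str, img_path: str, features_path: str,
                        annotation_folder: str, batch_size: int = 40) -> List[str]:
    """Hypothetical helper: argv for train_language.py or train_transformer.py."""
    return [
        "python", script,
        "--img_path", img_path,
        "--features_path", features_path,
        "--annotation_folder", annotation_folder,
        "--batch_size", str(batch_size),
    ]


def has_annotations(annotation_folder: str) -> bool:
    """True if the COCO-format training file is present in the folder."""
    return (Path(annotation_folder) / "viecap4h-public-train.json").is_file()


if __name__ == "__main__":
    paths = ("data/images", "data/grid_features", "data/annotations")  # hypothetical
    if not has_annotations(paths[2]):
        print("viecap4h-public-train.json not found in", paths[2])
    # Stage 1: language model pre-training; stage 2: Transformer training.
    for script in ("train_language.py", "train_transformer.py"):
        print(shlex.join(build_train_command(script, *paths)))
```

Passing each list to `subprocess.run(cmd, check=True)` would run the two stages in order and stop if the first one fails.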

3. Inference

The results can be obtained with the command below:

python test_viecap.py
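The leaderboard score is described only as an average of BLEU. For local sanity checks on a held-out split, the sketch below computes corpus-level BLEU-1 to BLEU-4 from scratch and averages them. Several choices here are assumptions: the averaging over n = 1..4, whitespace tokenization, and the lack of smoothing. The numbers therefore will not necessarily match the official vieCap4H scorer.

```python
import math
from collections import Counter
from typing import List, Sequence


def _ngrams(tokens: Sequence[str], n: int) -> Counter:
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(candidates: List[List[str]], references: List[List[List[str]]],
                max_n: int = 4) -> float:
    """Corpus BLEU-max_n with uniform weights and no smoothing."""
    matches, totals = [0] * max_n, [0] * max_n
    cand_len = ref_len = 0
    for cand, refs in zip(candidates, references):
        cand_len += len(cand)
        # Closest reference length (ties go to the shorter one).
        ref_len += min((abs(len(r) - len(cand)), len(r)) for r in refs)[1]
        for n in range(1, max_n + 1):
            cand_counts = _ngrams(cand, n)
            max_ref: Counter = Counter()
            for r in refs:
                max_ref |= _ngrams(r, n)  # per-n-gram max count over references
            # Clipped matches: an n-gram counts at most as often as in a reference.
            matches[n - 1] += sum(min(c, max_ref[g]) for g, c in cand_counts.items())
            totals[n - 1] += sum(cand_counts.values())
    if min(matches) == 0:
        return 0.0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    brevity = 1.0 if cand_len > ref_len else math.exp(1 - ref_len / cand_len)
    return brevity * math.exp(log_precision)


def average_bleu(candidates: List[List[str]], references: List[List[List[str]]]) -> float:
    """Assumed leaderboard-style score: mean of BLEU-1..BLEU-4."""
    return sum(corpus_bleu(candidates, references, n) for n in range(1, 5)) / 4


if __name__ == "__main__":
    hyp = "bác sĩ đang khám cho bệnh nhân".split()
    print(average_bleu([hyp], [[hyp]]))
```

Because BLEU is unsmoothed here, any order n with zero matches drives that BLEU-n to 0. That keeps the sketch faithful to the textbook definition, but it is harsh on very short captions.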

4. Pre-trained model

To reproduce our leaderboard results, the two pre-trained models (one for the BERT-based model and one for the Transformer model) can be downloaded via the links below:

Updating...


Owner

Doanh B C
