Simple is not Easy: A Simple Strong Baseline for TextVQA and TextCaps

This repository contains the code for the ssbaseline model. We also provide OCR results, OCR features, and pretrained models. The code is built on top of M4C; more detailed information can be found in the M4C repository.

Citation

If you use ssbaseline in your work, please cite:

@article{zhu2020simple,
  title={Simple is not Easy: A Simple Strong Baseline for TextVQA and TextCaps},
  author={Zhu, Qi and Gao, Chenyu and Wang, Peng and Wu, Qi},
  journal={arXiv preprint arXiv:2012.05153},
  year={2020}
}

Installation

First, clone and install the repository:

git clone https://github.com/ZephyrZhuQi/ssbaseline.git ~/ssbaseline
cd ~/ssbaseline
python setup.py build develop

Getting Data

We provide SBD-Trans OCR results for the TextVQA and ST-VQA datasets. The corresponding OCR Faster R-CNN features and Recog-CNN features are also released.

Datasets	ImDBs	Object Faster R-CNN Features	OCR Faster R-CNN Features	OCR Recog-CNN Features
TextVQA	TextVQA ImDB	Open Images	TextVQA SBD-Trans OCRs	TextVQA SBD-Trans OCRs
ST-VQA	ST-VQA ImDB	ST-VQA Objects	ST-VQA SBD-Trans OCRs	ST-VQA SBD-Trans OCRs
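Before training, it can help to inspect a downloaded ImDB file. The sketch below is a minimal, hypothetical helper (not part of this repository) that assumes the M4C-style ImDB layout as we understand it: a NumPy `.npy` file that holds either a dict with `metadata` and `data` keys, or an array of per-question dicts whose first element may be a header without an `image_id` key. The field names `question`, `ocr_tokens`, and `answers` used in the toy sample are assumptions for illustration; check them against the real files.

```python
import os
import tempfile

import numpy as np


def load_imdb_entries(imdb_path):
    """Return the per-question entries of an M4C-style ImDB (hypothetical helper).

    Assumed layouts: a dict with "metadata"/"data" keys, or an array of dicts
    whose first element may be a header lacking an "image_id" key.
    """
    db = np.load(imdb_path, allow_pickle=True)  # ImDBs store Python dicts, so pickling is required
    if isinstance(db, np.ndarray) and db.ndim == 0:
        db = db.item()  # a dict saved with np.save comes back as a 0-d object array
    if isinstance(db, dict):
        return list(db.get("data", []))
    entries = list(db)
    # Older array-style ImDBs keep a header as the first element; skip it.
    if entries and "image_id" not in entries[0]:
        entries = entries[1:]
    return entries


# Build tiny synthetic ImDBs with made-up values (assumed field names).
sample = {
    "image_id": "img_0",
    "question": "what brand is this?",
    "ocr_tokens": ["acme", "cola"],
    "answers": ["acme"] * 10,
}

# Array-style ImDB: header dict followed by one sample.
array_path = os.path.join(tempfile.mkdtemp(), "toy_imdb.npy")
np.save(array_path, np.array([{"version": 1}, sample], dtype=object))
entries = load_imdb_entries(array_path)
print(len(entries), entries[0]["ocr_tokens"])  # 1 ['acme', 'cola']

# Dict-style ImDB: {"metadata": ..., "data": [...]}.
dict_path = os.path.join(tempfile.mkdtemp(), "toy_dict_imdb.npy")
np.save(dict_path, {"metadata": {"version": 1}, "data": [sample]})
dict_entries = load_imdb_entries(dict_path)
```

Pointing `load_imdb_entries` at a real TextVQA or ST-VQA ImDB should show whether these assumptions hold; if the keys differ, the M4C data-loading code is the authoritative reference.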

Pretrained Models

We release the following pretrained models for ssbaseline on TextVQA.

For the TextVQA dataset, we release ssbaseline trained with ST-VQA as additional data and SBD-Trans OCRs (our best model).

Datasets	Config Files (under `configs/vqa/`)	Pretrained Models	Metrics	Notes
TextVQA (`m4c_textvqa`)	`m4c_textvqa/m4c_with_stvqa.yml`	`ssbaseline_with_stvqa`	val accuracy - 45.53%; test accuracy - 45.66%	SBD-Trans OCRs; ST-VQA as additional data
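TextVQA accuracy is the soft VQA-style accuracy computed against the 10 human answers per question: a prediction counts as fully correct when at least 3 annotators gave it. The sketch below illustrates that metric, assuming 10 answers per question and only lowercase/whitespace normalization (the official evaluation also normalizes punctuation, articles, and number words). The function names are hypothetical and not part of this repository.

```python
def vqa_accuracy(prediction, gt_answers):
    """Soft accuracy for one question (hypothetical helper).

    For each human answer, leave it out and score
    min(#matches among the remaining answers / 3, 1); return the mean.
    """
    pred = prediction.strip().lower()
    answers = [a.strip().lower() for a in gt_answers]
    scores = []
    for i in range(len(answers)):
        others = answers[:i] + answers[i + 1:]
        matches = sum(1 for a in others if a == pred)
        scores.append(min(matches / 3.0, 1.0))
    return sum(scores) / len(scores)


def dataset_accuracy(predictions, gt_answer_lists):
    """Mean soft accuracy over all questions, as a percentage."""
    per_question = [vqa_accuracy(p, g) for p, g in zip(predictions, gt_answer_lists)]
    return 100.0 * sum(per_question) / len(per_question)


# Toy example: 3 of 10 annotators agree on "acme" (score 0.9); "cola" matches none (score 0.0).
preds = ["acme", "cola"]
gts = [["acme"] * 3 + ["other"] * 7, ["pepsi"] * 10]
print(round(dataset_accuracy(preds, gts), 2))  # 45.0
```

The leave-one-out averaging is why an answer given by exactly 3 annotators scores 0.9 rather than 1.0.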

Training and Evaluation

Please follow the M4C README for training and evaluation on each dataset.

The paper was published at AAAI 2021; the BibTeX entry above cites the arXiv version.