Simple is not Easy: A Simple Strong Baseline for TextVQA and TextCaps

This repository contains the code for the ssbaseline model. We also provide OCR results, features, and pretrained models. The code is built on top of M4C, where more detailed information can be found.

Citation

If you use ssbaseline in your work, please cite:

@article{zhu2020simple,
  title={Simple is not Easy: A Simple Strong Baseline for TextVQA and TextCaps},
  author={Zhu, Qi and Gao, Chenyu and Wang, Peng and Wu, Qi},
  journal={arXiv preprint arXiv:2012.05153},
  year={2020}
}

Installation

First install the repo using

git clone https://github.com/ZephyrZhuQi/ssbaseline.git ~/ssbaseline
cd ~/ssbaseline
python setup.py build develop

Getting Data

We provide SBD-Trans OCR for TextVQA and ST-VQA datasets. The corresponding OCR Faster R-CNN features and Recog-CNN features are also released.

Datasets	ImDBs	Object Faster R-CNN Features	OCR Faster R-CNN Features	OCR Recog-CNN Features
TextVQA	TextVQA ImDB	Open Images	TextVQA SBD-Trans OCRs	TextVQA SBD-Trans OCRs
ST-VQA	ST-VQA ImDB	ST-VQA Objects	ST-VQA SBD-Trans OCRs	ST-VQA SBD-Trans OCRs
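Before training, it can help to sanity-check a downloaded ImDB. The sketch below is only an illustration and rests on assumptions this README does not confirm. It assumes the M4C-style layout: entry 0 is a metadata header, and each later entry is a dict with `ocr_tokens` and, for train/val splits, `answers`. It reports how many questions have at least one human answer that exactly matches a single OCR token. Multi-word answers never match, so this is only a rough lower bound on how often the answer could be copied from the OCR.

```python
from typing import Any, Dict, Sequence


def summarize_imdb(imdb: Sequence[Dict[str, Any]]) -> Dict[str, float]:
    """Summarize an M4C-style ImDB (layout and key names are assumptions).

    Assumed: imdb[0] is a metadata header; each later entry is a dict with
    'ocr_tokens' (list of str) and optionally 'answers' (list of str).
    """
    entries = list(imdb[1:])  # skip the assumed metadata header
    n = len(entries)
    if n == 0:
        return {"num_questions": 0, "avg_ocr_tokens": 0.0, "answer_in_ocr_rate": 0.0}

    total_ocr = 0
    copyable = 0
    for entry in entries:
        ocr = [t.lower() for t in entry.get("ocr_tokens", [])]
        total_ocr += len(ocr)
        ocr_set = set(ocr)
        answers = [a.lower() for a in entry.get("answers", [])]
        # A question counts as "copyable" if any answer equals one OCR token.
        if any(a in ocr_set for a in answers):
            copyable += 1

    return {
        "num_questions": n,
        "avg_ocr_tokens": total_ocr / n,
        "answer_in_ocr_rate": copyable / n,
    }
```

To try it on a real file, you could load the ImDB with `numpy.load(path, allow_pickle=True)` and pass the resulting object array to `summarize_imdb`. The path and the key names are hypothetical here, so check them against the actual files.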

Pretrained Models

We release the following pretrained model for ssbaseline on TextVQA.

For the TextVQA dataset, we release ssbaseline trained with ST-VQA as additional data and SBD-Trans OCRs (our best model).

Datasets	Config Files (under `configs/vqa/`)	Pretrained Models	Metrics	Notes
TextVQA (`m4c_textvqa`)	`m4c_textvqa/m4c_with_stvqa.yml`	`ssbaseline_with_stvqa`	val accuracy: 45.53%; test accuracy: 45.66%	SBD-Trans OCRs; ST-VQA as additional data
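The accuracies above are TextVQA accuracies. TextVQA uses the soft VQA accuracy computed over 10 human answers per question. The sketch below illustrates that metric. For each of the 10 leave-one-out subsets of 9 answers, a prediction scores min(1, matches / 3), and the per-question score is the mean over subsets. It is a simplification: it only lowercases and strips answers, while the official evaluator applies fuller answer normalization (punctuation, articles, number words). It is not the evaluation code used in this repository.

```python
from typing import List, Sequence


def _normalize(answer: str) -> str:
    # Simplified normalization; the official evaluator does more than this.
    return answer.strip().lower()


def vqa_accuracy(prediction: str, gt_answers: List[str]) -> float:
    """Soft VQA accuracy for one question, averaged over leave-one-out subsets."""
    pred = _normalize(prediction)
    gts = [_normalize(a) for a in gt_answers]
    scores = []
    for i in range(len(gts)):
        others = gts[:i] + gts[i + 1:]  # drop the i-th human answer
        matches = sum(1 for a in others if a == pred)
        scores.append(min(1.0, matches / 3.0))
    return sum(scores) / len(scores)


def dataset_accuracy(predictions: Sequence[str], gt_lists: Sequence[List[str]]) -> float:
    """Mean per-question accuracy, as a percentage."""
    per_question = [vqa_accuracy(p, g) for p, g in zip(predictions, gt_lists)]
    return 100.0 * sum(per_question) / len(per_question)
```

For example, a prediction that matches 3 of the 10 human answers scores (3 × 2/3 + 7 × 1) / 10 = 0.9 rather than 1.0. That is why the leave-one-out averaging matters when answers are split.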

Training and Evaluation

Please follow the M4C README for the training and evaluation of the M4C model on each dataset.

Simple is not Easy: A Simple Strong Baseline for TextVQA and TextCaps[AAAI2021]