Simple is not Easy: A Simple Strong Baseline for TextVQA and TextCaps[AAAI2021]

Overview

Simple is not Easy: A Simple Strong Baseline for TextVQA and TextCaps

Here is the code for ssbassline model. We also provide OCR results/features/models. The code is built on top of M4C, where more detailed information can also be found.

Citation

If you use ssbaseline in your work, please cite:

@article{zhu2020simple,
  title={Simple is not Easy: A Simple Strong Baseline for TextVQA and TextCaps},
  author={Zhu, Qi and Gao, Chenyu and Wang, Peng and Wu, Qi},
  journal={arXiv preprint arXiv:2012.05153},
  year={2020}
}

Installation

First install the repo using

git clone https://github.com/ZephyrZhuQi/ssbaseline.git ~/ssbaseline
cd ~/ssbaseline
python setup.py build develop

Getting Data

We provide SBD-Trans OCR for TextVQA and ST-VQA datasets. The corresponding OCR Faster R-CNN features and Recog-CNN features are also released.

Datasets ImDBs Object Faster R-CNN Features OCR Faster R-CNN Features OCR Recog-CNN Features
TextVQA TextVQA ImDB Open Images TextVQA SBD-Trans OCRs TextVQA SBD-Trans OCRs
ST-VQA ST-VQA ImDB ST-VQA Objects ST-VQA SBD-Trans OCRs ST-VQA SBD-Trans OCRs

Pretrained Models

We release the following pretrained models for ssbaseline on TextVQA.

For the TextVQA dataset, we release: ssbaseline trained with ST-VQA as additional data (our best model) with SBD-Trans.

Datasets Config Files (under configs/vqa/) Pretrained Models Metrics Notes
TextVQA (m4c_textvqa) m4c_textvqa/m4c_with_stvqa.yml ssbaseline_with_stvqa val accuracy - 45.53%; test accuracy - 45.66% SBD-Trans OCRs; ST-VQA as additional data

Training and Evaluation

Please follow the M4C README for the training and evaluation of the M4C model on each dataset.

Owner
ZephyrZhuQi
Visual and linguistic reasoning.
ZephyrZhuQi
An open software package to develop BCI based brain and cognitive computing technology for recognizing user's intention using deep learning

An open software package to develop BCI based brain and cognitive computing technology for recognizing user's intention using deep learning

deepbci 272 Jan 08, 2023
Python Library for Signal/Image Data Analysis with Transport Methods

PyTransKit Python Transport Based Signal Processing Toolkit Website and documentation: https://pytranskit.readthedocs.io/ Installation The library cou

24 Dec 23, 2022
PyTorch implementation for COMPLETER: Incomplete Multi-view Clustering via Contrastive Prediction (CVPR 2021)

Completer: Incomplete Multi-view Clustering via Contrastive Prediction This repo contains the code and data of the following paper accepted by CVPR 20

XLearning Group 72 Dec 07, 2022
FairEdit: Preserving Fairness in Graph Neural Networks through Greedy Graph Editing

FairEdit Relevent Publication FairEdit: Preserving Fairness in Graph Neural Networks through Greedy Graph Editing

5 Feb 04, 2022
I explore rock vs. mine prediction using a SONAR dataset

I explore rock vs. mine prediction using a SONAR dataset. Using a Logistic Regression Model for my prediction algorithm, I intend on predicting what an object is based on supervised learning.

Jeff Shen 1 Jan 11, 2022
Autoencoder - Reducing the Dimensionality of Data with Neural Network

autoencoder Implementation of the Reducing the Dimensionality of Data with Neural Network – G. E. Hinton and R. R. Salakhutdinov paper. Notes Aim to m

Jordan Burgess 13 Nov 17, 2022
Repo for the paper Extrapolating from a Single Image to a Thousand Classes using Distillation

Extrapolating from a Single Image to a Thousand Classes using Distillation by Yuki M. Asano* and Aaqib Saeed* (*Equal Contribution) Extrapolating from

Yuki M. Asano 16 Nov 04, 2022
nextPARS, a novel Illumina-based implementation of in-vitro parallel probing of RNA structures.

nextPARS, a novel Illumina-based implementation of in-vitro parallel probing of RNA structures. Here you will find the scripts necessary to produce th

Jesse Willis 0 Jan 20, 2022
Deep Learning Pipelines for Apache Spark

Deep Learning Pipelines for Apache Spark The repo only contains HorovodRunner code for local CI and API docs. To use HorovodRunner for distributed tra

Databricks 2k Jan 08, 2023
Vertex AI: Serverless framework for MLOPs (ESP / ENG)

Vertex AI: Serverless framework for MLOPs (ESP / ENG) Español Qué es esto? Este repo contiene un pipeline end to end diseñado usando el SDK de Kubeflo

Hernán Escudero 2 Apr 28, 2022
A Python library for adversarial machine learning focusing on benchmarking adversarial robustness.

ARES This repository contains the code for ARES (Adversarial Robustness Evaluation for Safety), a Python library for adversarial machine learning rese

Tsinghua Machine Learning Group 377 Dec 20, 2022
Scrutinizing XAI with linear ground-truth data

This repository contains all the experiments presented in the corresponding paper: "Scrutinizing XAI using linear ground-truth data with suppressor va

braindata lab 2 Oct 04, 2022
(ICCV 2021) PyTorch implementation of Paper "Progressive Correspondence Pruning by Consensus Learning"

CLNet (ICCV 2021) PyTorch implementation of Paper "Progressive Correspondence Pruning by Consensus Learning" [project page] [paper] Citing CLNet If yo

Chen Zhao 22 Aug 26, 2022
Predict and time series avocado hass

RECOMMENDER SYSTEM MARKETING TỔNG QUAN VỀ HỆ THỐNG DỮ LIỆU 1. Giới thiệu - Tiki là một hệ sinh thái thương mại "all in one", trong đó có tiki.vn, là

hieulmsc 3 Jan 10, 2022
The Simplest DCGAN Implementation

DCGAN in TensorLayer This is the TensorLayer implementation of Deep Convolutional Generative Adversarial Networks. Looking for Text to Image Synthesis

TensorLayer Community 310 Dec 13, 2022
Get 2D point positions (e.g., facial landmarks) projected on 3D mesh

points2d_projection_mesh Input 2D points (e.g. facial landmarks) on an image Camera parameters (extrinsic and intrinsic) of the image Aligned 3D mesh

5 Dec 08, 2022
Potato Disease Classification - Training, Rest APIs, and Frontend to test.

Potato Disease Classification Setup for Python: Install Python (Setup instructions) Install Python packages pip3 install -r training/requirements.txt

codebasics 95 Dec 21, 2022
Relative Positional Encoding for Transformers with Linear Complexity

Stochastic Positional Encoding (SPE) This is the source code repository for the ICML 2021 paper Relative Positional Encoding for Transformers with Lin

Antoine Liutkus 48 Nov 16, 2022
Implementation of 'lightweight' GAN, proposed in ICLR 2021, in Pytorch. High resolution image generations that can be trained within a day or two

512x512 flowers after 12 hours of training, 1 gpu 256x256 flowers after 12 hours of training, 1 gpu Pizza 'Lightweight' GAN Implementation of 'lightwe

Phil Wang 1.5k Jan 02, 2023
[3DV 2021] A Dataset-Dispersion Perspective on Reconstruction Versus Recognition in Single-View 3D Reconstruction Networks

dispersion-score Official implementation of 3DV 2021 Paper A Dataset-dispersion Perspective on Reconstruction versus Recognition in Single-view 3D Rec

Yefan 7 May 28, 2022