PyTorch implementation of Value Retrieval with Arbitrary Queries for Form-like Documents.

Overview

Value Retrieval with Arbitrary Queries for Form-like Documents

Environment

CUDA="11.0"
CUDNN="8"
UBUNTU="18.04"

Install

bash install.sh
git clone https://github.com/NVIDIA/apex && cd apex
pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
pip install .
# under our project root folder
pip install .

Data Preparation

Our model is pre-trained on the IIT-CDIP dataset, fine-tuned on the FUNSD training set, and evaluated on the FUNSD and INV-CDIP test sets.

  • Download our processed OCR results for IIT-CDIP using hocr_list_addr.txt and put them under PRETRAIN_DATA_FOLDER/.

  • Download our processed FUNSD and INV-CDIP datasets and put them under DATA_DIR/.

Reproduce Our Results

  • Download our model fine-tuned on FUNSD here.

  • Run inference as follows:

# $MODEL_PATH here is where you save the fine-tuned model.
# DATASET_NAME is FUNSD or INV-CDIP.
bash reproduce_results.sh $MODEL_PATH $DATA_DIR/DATASET_NAME
  • You should get the following results.
Dataset     Precision    Recall    F1
FUNSD       60.4         60.9      60.7
INV-CDIP    50.5         47.6      49.0
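
To evaluate on both test sets in one go, the inference command above can be wrapped in a loop. This is only a sketch: the default paths are placeholders, and the `RUN` dry-run switch and the `evaluate_all` function are illustrative names, not part of the repository. By default the commands are printed rather than executed.

```shell
# Dry run by default: commands are printed, not executed.
# Set RUN to an empty string (RUN= ) to actually run them.
RUN="${RUN-echo}"

# Placeholder locations (assumptions); point these at your own paths.
MODEL_PATH="${MODEL_PATH:-/path/to/finetuned_model}"
DATA_DIR="${DATA_DIR:-/path/to/data}"

# Call reproduce_results.sh once per dataset folder under $DATA_DIR.
evaluate_all() {
  for DATASET_NAME in FUNSD INV-CDIP; do
    $RUN bash reproduce_results.sh "$MODEL_PATH" "$DATA_DIR/$DATASET_NAME"
  done
}

evaluate_all
```

Once the printed commands look right, rerun with `RUN=` set to an empty value from the project root folder.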

Pre-training

  • You can skip the following steps by downloading our pre-trained SimpleDLM model here.

  • Or download layoutlm-base-uncased.

  • Run pre-training as follows:

# $NUM_GPUS is the number of GPUs to pre-train on. To reproduce the paper's results, we recommend using 8 GPUs.
# $MODEL_PATH is where you saved the LayoutLM model.
# $PRETRAIN_DATA_FOLDER is the folder containing the IIT-CDIP hOCR files.

python -m torch.distributed.launch --nproc_per_node=$NUM_GPUS pretraining.py \
--model_name_or_path $MODEL_PATH  --data_dir $PRETRAIN_DATA_FOLDER \
--output_dir $OUTPUT_DIR
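
The launch command above relies on four shell variables ($NUM_GPUS, $MODEL_PATH, $PRETRAIN_DATA_FOLDER, and $OUTPUT_DIR, the last of which is used but not described in the comments). Because a long multi-GPU job is costly to restart, it may help to check that they are set before launching. The helper below is a minimal sketch; `require_vars` is a hypothetical name, not something the repository provides.

```shell
# Hypothetical helper: fail early if any named variable is unset or empty.
require_vars() {
  for name in "$@"; do
    # Read the variable whose name is stored in $name (empty if unset).
    eval "value=\${$name-}"
    if [ -z "$value" ]; then
      echo "error: \$$name is not set" >&2
      return 1
    fi
  done
}

# Example use, guarding the pre-training command from the README:
#   require_vars NUM_GPUS MODEL_PATH PRETRAIN_DATA_FOLDER OUTPUT_DIR && \
#   python -m torch.distributed.launch --nproc_per_node=$NUM_GPUS pretraining.py \
#     --model_name_or_path $MODEL_PATH --data_dir $PRETRAIN_DATA_FOLDER \
#     --output_dir $OUTPUT_DIR
```

The same check could guard the fine-tuning command with MODEL_PATH, DATA_DIR, and OUTPUT_DIR.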

Fine-tuning

  • Run fine-tuning as follows:
# $MODEL_PATH is where you saved the pre-trained SimpleDLM model.

CUDA_VISIBLE_DEVICES=0 python run_query_value_retrieval.py --model_type simpledlm --model_name_or_path $MODEL_PATH \
--data_dir $DATA_DIR/FUNSD/ --output_dir $OUTPUT_DIR --do_train --evaluate_during_training

Citation

If you find this codebase useful, please cite our paper:

@article{gao2021value,
  title={Value Retrieval with Arbitrary Queries for Form-like Documents},
  author={Gao, Mingfei and Xue, Le and Ramaiah, Chetan and Xing, Chen and Xu, Ran and Xiong, Caiming},
  journal={arXiv preprint arXiv:2112.07820},
  year={2021}
}

Contact

Please send an email to [email protected] or [email protected] if you have questions.
