PICK: Processing Key Information Extraction from Documents using Improved Graph Learning-Convolutional Networks

Overview

PICK-PyTorch

***** Updated on Feb 6th, 2021: Train Ticket dataset is now available for academic research. You can download from Google Drive or OneDrive. It contains 1,530 synthetic images and 320 real images for training, and 80 real images for testing. Please refer to our paper for more details about how to sample training/testing set from EATEN and generate the corresponding annotations.*****

***** Updated on Sep 17th, 2020: A training example on the large-scale document understanding dataset, DocBank, is now available. Please refer to examples/DocBank/README.md for more details. Thanks TengQi Ye for this contribution.*****

PyTorch reimplementation of "PICK: Processing Key Information Extraction from Documents using Improved Graph Learning-Convolutional Networks" (ICPR 2020). This project is different from our original implementation.

Introduction

PICK is a framework for Key Information Extraction (KIE) that handles complex document layouts effectively and robustly. It combines graph learning with graph convolution operations, yielding a richer semantic representation that contains textual features, visual features, and the global layout without ambiguity. The overall architecture is shown below.

[Figure: overall architecture]

Requirements

  • python = 3.6
  • torchvision = 0.6.1
  • tabulate = 0.8.7
  • overrides = 3.0.0
  • opencv_python = 4.3.0.36
  • numpy = 1.16.4
  • pandas = 1.0.5
  • allennlp = 1.0.0
  • torchtext = 0.6.0
  • tqdm = 4.47.0
  • torch = 1.5.1
pip install -r requirements.txt

Usage

Distributed training with config files

Modify the configurations in config.json and dist_train.sh files, then run:

bash dist_train.sh

The application will be launched via launch.py on a 4-GPU node with one process per GPU (recommended).

This is equivalent to

python -m torch.distributed.launch --nnodes=1 --node_rank=0 --nproc_per_node=4 \
--master_addr=127.0.0.1 --master_port=5555 \
train.py -c config.json -d 1,2,3,4 --local_world_size 4

It is also equivalent to specifying the indices of the available GPUs with CUDA_VISIBLE_DEVICES instead of the -d argument:

CUDA_VISIBLE_DEVICES=1,2,3,4 python -m torch.distributed.launch --nnodes=1 --node_rank=0 --nproc_per_node=4 \
--master_addr=127.0.0.1 --master_port=5555 \
train.py -c config.json --local_world_size 4

Similarly, it can be launched with a single process that spans all 4 GPUs (if the node has 4 available GPUs), although this is not recommended:

CUDA_VISIBLE_DEVICES=1,2,3,4 python -m torch.distributed.launch --nnodes=1 --node_rank=0 --nproc_per_node=1 \
--master_addr=127.0.0.1 --master_port=5555 \
train.py -c config.json --local_world_size 1
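
The equivalence between the -d argument and CUDA_VISIBLE_DEVICES described above can be pictured with a small sketch. This is an illustration only, not the project's actual argument handling: the helper name `visible_devices_from_arg` and the rule that an explicit -d value takes precedence over the environment variable are assumptions made for the example.

```python
import os


def visible_devices_from_arg(device_arg, env=None):
    """Return the GPU indices implied by a -d style argument.

    Hypothetical helper: if device_arg is given (e.g. "1,2,3,4") it is used;
    otherwise the value of CUDA_VISIBLE_DEVICES in env is used instead.
    """
    env = os.environ if env is None else env
    raw = device_arg if device_arg else env.get("CUDA_VISIBLE_DEVICES", "")
    # Split the comma-separated list and ignore empty fragments.
    return [int(tok) for tok in raw.split(",") if tok.strip()]


# Both ways of selecting GPUs resolve to the same index list.
from_flag = visible_devices_from_arg("1,2,3,4", env={})
from_env = visible_devices_from_arg(None, env={"CUDA_VISIBLE_DEVICES": "1,2,3,4"})
print(from_flag == from_env)
```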

Using Multiple Nodes

You can enable multi-node multi-GPU training by setting the nnodes and node_rank arguments on the command line of every node. For example, 2 nodes with 4 GPUs each run as follows.

Node 1 (IP: 192.168.0.10). Run the following on node 1:

CUDA_VISIBLE_DEVICES=1,2,3,4 python -m torch.distributed.launch --nnodes=2 --node_rank=0 --nproc_per_node=4 \
--master_addr=192.168.0.10 --master_port=5555 \
train.py -c config.json --local_world_size 4  

Node 2 (IP: 192.168.0.15). Run the following on node 2:

CUDA_VISIBLE_DEVICES=2,4,6,7 python -m torch.distributed.launch --nnodes=2 --node_rank=1 --nproc_per_node=4 \
--master_addr=192.168.0.10 --master_port=5555 \
train.py -c config.json --local_world_size 4  
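
To see how the nnodes, node_rank, and nproc_per_node values in the commands above combine, the sketch below computes the global rank of each process and the total world size, following the usual torch.distributed.launch convention (global rank = node_rank * nproc_per_node + local rank). The function names are chosen for this illustration and are not part of the project.

```python
def global_rank(node_rank, nproc_per_node, local_rank):
    """Global rank of a process under the node_rank * nproc_per_node + local_rank rule."""
    return node_rank * nproc_per_node + local_rank


def world_size(nnodes, nproc_per_node):
    """Total number of processes across all nodes."""
    return nnodes * nproc_per_node


# The 2-node, 4-GPU-per-node example from the commands above.
for node in range(2):
    ranks = [global_rank(node, 4, local) for local in range(4)]
    print(f"node_rank={node}: global ranks {ranks}")
print("world size:", world_size(2, 4))
```

With these settings, node 1 (node_rank=0) hosts global ranks 0 to 3 and node 2 (node_rank=1) hosts global ranks 4 to 7, for a world size of 8.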

Resuming from checkpoints

You can resume from a previously saved checkpoint by:

python -m torch.distributed.launch --nnodes=1 --node_rank=0 --nproc_per_node=4 \
--master_addr=127.0.0.1 --master_port=5555 \
train.py -d 1,2,3,4 --local_world_size 4 --resume path/to/checkpoint

Debug mode: single GPU/CPU training with config files

This training mode lets you debug the code without distributed training. -dist must be set to false to turn off distributed mode, and -d specifies which single GPU to use.

python train.py -c config.json -d 1 -dist false

Testing from checkpoints

You can test from a previously saved checkpoint by:

python test.py --checkpoint path/to/checkpoint --boxes_transcripts path/to/boxes_transcripts \
               --images_path path/to/images_path --output_folder path/to/output_folder \
               --gpu 0 --batch_size 2

Customization

Training custom datasets

You can train your own datasets following the steps outlined below.

  1. Prepare your files in the format used in the data folder.
    • See data/README.md for instructions on preparing the data in the format PICK requires.
  2. Modify train_dataset and validation_dataset args in config.json file, including files_name, images_folder, boxes_and_transcripts_folder, entities_folder, iob_tagging_type and resized_image_size.
  3. Modify Entities_list in utils/entities_list.py file according to the entity type of your dataset.
  4. Modify utils/keys.txt if needed, according to the vocabulary of your dataset.
  5. Modify MAX_BOXES_NUM and MAX_TRANSCRIPT_LEN in data_utils/documents.py if needed.
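
Step 2 above can be scripted rather than edited by hand. The following is a minimal sketch under stated assumptions: the nesting `config[section]["args"]` is inferred from the phrase "train_dataset and validation_dataset args" and may not match the real config.json, the helper `set_dataset_args` is hypothetical, and all paths are placeholders. Only field names mentioned in step 2 are used.

```python
import copy
import json

# Stand-in for the parsed config.json; the real file contains many more fields.
example_config = {
    "train_dataset": {"args": {}},
    "validation_dataset": {"args": {}},
}


def set_dataset_args(config, section, **fields):
    """Return a copy of config with the given fields set under config[section]['args'].

    Hypothetical helper; the nesting is an assumption about the config layout.
    """
    updated = copy.deepcopy(config)
    updated[section]["args"].update(fields)
    return updated


new_config = set_dataset_args(
    example_config,
    "train_dataset",
    files_name="path/to/train_samples_list.csv",        # placeholder path
    images_folder="path/to/images",                     # placeholder path
    boxes_and_transcripts_folder="path/to/boxes_and_transcripts",
    entities_folder="path/to/entities",
)
print(json.dumps(new_config["train_dataset"], indent=2))
```

Returning a copy rather than mutating the input keeps the original configuration available for comparison before writing anything back to disk.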

Note: The self-built datasets used in our paper cannot be shared due to patient privacy and proprietary issues.

Checkpoints

You can specify the name of the training session in config.json files:

"name": "PICK_Default",
"run_id": "test"

The checkpoints will be saved in save_dir/name/run_id_timestamp/checkpoint_epoch_n, with timestamp in mmdd_HHMMSS format.

A copy of config.json file will be saved in the same folder.
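
The directory layout described above can be reproduced with a few lines of standard-library code. This is a sketch of the naming scheme only, not the project's own implementation; the function name `checkpoint_dir` and the `saved` directory used in the example call are assumptions.

```python
from datetime import datetime
from pathlib import Path


def checkpoint_dir(save_dir, name, run_id, now=None):
    """Build save_dir/name/run_id_timestamp with an mmdd_HHMMSS timestamp."""
    now = datetime.now() if now is None else now
    timestamp = now.strftime("%m%d_%H%M%S")  # month, day, hour, minute, second
    return Path(save_dir) / name / f"{run_id}_{timestamp}"


# Example with a fixed time so the result is reproducible.
example = checkpoint_dir("saved", "PICK_Default", "test", datetime(2021, 2, 6, 13, 5, 9))
print(example.as_posix())  # saved/PICK_Default/test_0206_130509
```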

Note: checkpoints contain:

{
  'arch': arch,
  'epoch': epoch,
  'state_dict': self.model.state_dict(),
  'optimizer': self.optimizer.state_dict(),
  'monitor_best': self.monitor_best,
  'config': self.config
}
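
Before resuming or testing, it can be useful to confirm that a loaded checkpoint has the entries listed above. The sketch below checks a plain dictionary so that it runs without PyTorch; in practice the dictionary would typically come from `torch.load(path, map_location="cpu")`. The helper name `missing_checkpoint_keys` is hypothetical.

```python
# Keys taken from the checkpoint structure listed above.
EXPECTED_KEYS = ("arch", "epoch", "state_dict", "optimizer", "monitor_best", "config")


def missing_checkpoint_keys(checkpoint):
    """Return the expected keys that are absent from a checkpoint dictionary."""
    return [key for key in EXPECTED_KEYS if key not in checkpoint]


# Stand-in for a loaded checkpoint; real values would be tensors and state dicts.
fake_checkpoint = {
    "arch": "PICKModel",  # placeholder value for illustration
    "epoch": 10,
    "state_dict": {},
    "optimizer": {},
    "monitor_best": 0.0,
}
print(missing_checkpoint_keys(fake_checkpoint))  # ['config']
```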

Tensorboard Visualization

This project supports Tensorboard visualization by using either torch.utils.tensorboard or TensorboardX.

  1. Install

    If you are using PyTorch 1.1 or higher, install TensorBoard with pip install "tensorboard>=1.14.0" (the quotes keep the shell from treating >= as a redirect).

    Otherwise, install tensorboardX by following the installation guide in TensorboardX.

  2. Run training

    Make sure that the tensorboard option in the config file is turned on.

     "tensorboard" : true
    
  3. Open Tensorboard server

    Run tensorboard --logdir saved/log/ at the project root; the server will then be available at http://localhost:6006

By default, loss values are logged. If you need more visualizations, call add_scalar('tag', data), add_image('tag', image), etc. in the trainer._train_epoch method. The add_something() methods in this project are essentially wrappers around those of the tensorboardX.SummaryWriter and torch.utils.tensorboard.SummaryWriter modules.

Note: You don't have to specify current steps, since WriterTensorboard class defined at logger/visualization.py will track current steps.
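
The idea of a writer that tracks the current step can be sketched as follows. This is not the code in logger/visualization.py: the class names `StepTrackingWriter` and `RecordingWriter` are hypothetical, and the recording stub stands in for a real SummaryWriter so that the example runs without TensorBoard installed.

```python
class StepTrackingWriter:
    """Wrap a writer and supply the current step to every add_scalar call."""

    def __init__(self, writer):
        self.writer = writer
        self.step = 0

    def set_step(self, step):
        # Called once per iteration or epoch by the training loop.
        self.step = step

    def add_scalar(self, tag, value):
        # The caller never passes a step; the wrapper fills it in.
        self.writer.add_scalar(tag, value, self.step)


class RecordingWriter:
    """Stub backend that records calls instead of writing event files."""

    def __init__(self):
        self.calls = []

    def add_scalar(self, tag, value, global_step=None):
        self.calls.append((tag, value, global_step))


backend = RecordingWriter()
writer = StepTrackingWriter(backend)
for step, loss in enumerate([0.9, 0.7, 0.5]):
    writer.set_step(step)
    writer.add_scalar("loss", loss)
print(backend.calls)  # [('loss', 0.9, 0), ('loss', 0.7, 1), ('loss', 0.5, 2)]
```

Keeping the step inside the wrapper means logging calls inside the training loop stay short, at the cost of having to remember to call set_step whenever the step advances.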

Results on Train Ticket

[Figure: example result]

TODOs

  • Dataset cache mechanism to speed up training loop
  • Multi-node multi-gpu setup (DistributedDataParallel)

Citations

If you find this code useful please cite our paper:

@inproceedings{Yu2020PICKPK,
  title={{PICK}: Processing Key Information Extraction from Documents using 
  Improved Graph Learning-Convolutional Networks},
  author={Wenwen Yu and Ning Lu and Xianbiao Qi and Ping Gong and Rong Xiao},
  booktitle={2020 25th International Conference on Pattern Recognition (ICPR)},
  year={2020}
}

License

This project is licensed under the MIT License. See LICENSE for more details.

Acknowledgements

The structure of this project is based on the PyTorch Template Project.

Owner
Wenwen Yu
Ph.D. student at Huazhong University of Science and Technology