Replication of Pix2Seq with Pretrained Model

Last update: Nov 22, 2022

Pretrained-Pix2Seq

We provide a pre-trained Pix2Seq model. This version includes new data augmentation. The model is trained for 300 epochs and achieves 37 mAP on COCO without beam search or nucleus sampling.
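For readers unfamiliar with the decoding terms above, here is a minimal sketch of the difference between picking the single most likely token and nucleus (top-p) filtering of a next-token distribution. It is illustrative only and not taken from this repository; the function names and the example probabilities are hypothetical.

```python
def greedy_pick(probs):
    """Return the index of the most likely token (plain argmax decoding)."""
    return max(range(len(probs)), key=lambda i: probs[i])


def nucleus_filter(probs, top_p=0.9):
    """Keep the smallest set of most-likely tokens whose cumulative
    probability reaches top_p, then renormalize the kept probabilities.

    Returns a dict mapping token index -> renormalized probability.
    """
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, total = [], 0.0
    for i in ranked:
        kept.append(i)
        total += probs[i]
        if total >= top_p:
            break
    return {i: probs[i] / total for i in kept}


# Hypothetical next-token distribution over four tokens.
example_probs = [0.5, 0.3, 0.15, 0.05]
best_token = greedy_pick(example_probs)             # index of the 0.5 entry
nucleus = nucleus_filter(example_probs, top_p=0.9)  # drops the 0.05 tail token
```

A sampler would then draw the next token from `nucleus` instead of the full distribution.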

Installation

Install PyTorch 1.5+ and torchvision 0.6+ (recommended: torch 1.8.1, torchvision 0.8.0).
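If you are unsure whether an existing environment meets these minimums, a small check like the following may help. This is a sketch, not part of the repository: `parse_version`, `meets_minimum`, and `check_environment` are hypothetical helper names, and the only library attributes used are `torch.__version__` and `torchvision.__version__`.

```python
import re


def parse_version(text):
    """Turn a version string such as '1.8.1+cu111' into (1, 8, 1).

    Build or local suffixes after the numeric part are ignored.
    """
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    return tuple(int(part or 0) for part in match.groups())


def meets_minimum(installed, minimum):
    """Return True if the installed version is at least the minimum."""
    return parse_version(installed) >= parse_version(minimum)


def check_environment():
    """Print whether the installed torch/torchvision meet the stated minimums."""
    # Imported lazily so the helpers above work without PyTorch installed.
    import torch
    import torchvision

    print("torch", torch.__version__, "ok:", meets_minimum(torch.__version__, "1.5"))
    print("torchvision", torchvision.__version__, "ok:",
          meets_minimum(torchvision.__version__, "0.6"))
```

Call `check_environment()` from a Python shell inside the environment you plan to train in.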

Install pycocotools (for evaluation on COCO):

pip install -U 'git+https://github.com/cocodataset/cocoapi.git#subdirectory=PythonAPI'

That's it; you should now be able to train and evaluate detection models.

Data preparation

Download and extract COCO 2017 train and val images with annotations from http://cocodataset.org. We expect the directory structure to be the following:

path/to/coco/
  annotations/  # annotation json files
  train2017/    # train images
  val2017/      # val images
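Before starting a long training run, it can be worth confirming that the dataset folder matches the layout above. The following is a minimal sketch; `missing_coco_dirs` is a hypothetical helper, not something the repository provides, and it only checks that the three subdirectories exist, not their contents.

```python
from pathlib import Path

# Subdirectories named in the expected layout above.
EXPECTED_SUBDIRS = ("annotations", "train2017", "val2017")


def missing_coco_dirs(root):
    """Return the expected subdirectories that are absent under `root`.

    Path.is_dir() follows symbolic links, so a symlinked dataset root
    is checked the same way as a real directory.
    """
    root = Path(root)
    return [name for name in EXPECTED_SUBDIRS if not (root / name).is_dir()]
```

For example, `missing_coco_dirs("path/to/coco")` returns an empty list when all three folders are present.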

Training

First, link the COCO dataset to the project folder:

ln -s /path/to/coco ./coco

Then start training:

sh train.sh --model pix2seq --output_dir /path/to/save

Evaluation

sh train.sh --model pix2seq --output_dir /path/to/save --resume /path/to/checkpoints --eval

COCO

Method	Backbone	Epochs	Batch Size	AP	AP50	AP75	Weights
Pix2Seq	R50	300	32	37.0	53.4	39.4	weight
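In COCO's detection metrics, AP50 and AP75 count a prediction as correct when its intersection-over-union (IoU) with a ground-truth box is at least 0.50 or 0.75 respectively, while AP averages over IoU thresholds from 0.50 to 0.95. The sketch below shows how IoU is computed for one pair of boxes. It assumes boxes in `(x1, y1, x2, y2)` corner format, which is a choice made for this example (COCO annotation files store `[x, y, width, height]`); `box_iou` is a hypothetical helper and not the evaluation code used by pycocotools.

```python
def box_iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap region, clamped at zero when disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


# Hypothetical example: the prediction covers 60% of the ground-truth box.
ground_truth = (0.0, 0.0, 10.0, 10.0)
prediction = (0.0, 0.0, 10.0, 6.0)
iou = box_iou(ground_truth, prediction)
matches_at_50 = iou >= 0.50  # would count toward AP50
matches_at_75 = iou >= 0.75  # would not count toward AP75
```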

Contributors

Qiu Han, Peng Gao, Jingqiu Zhou (Beam Search)

Acknowledgement

Pix2Seq, DETR
