Image-generation-baseline - MUGE Text To Image Generation Baseline

Last update: Oct 17, 2022

Overview

MUGE Text To Image Generation Baseline

Requirements and Installation

For more details, see fairseq. In brief:

python == 3.6.4
pytorch == 1.7.1

Install fairseq and the other requirements:

git clone https://github.com/MUGE-2021/image-generation-baseline
cd image-generation-baseline/
pip install -r requirements.txt
cd fairseq/
pip install --editable .

Download the data and place it in the dataset/ directory. The file structure is:

text2image-baseline
    - dataset
        - ECommerce-T2I
            - T2I_train.img.tsv
            - T2I_train.text.tsv
            - ...
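
Before training, it can help to confirm that the data landed in the right place. The following is a minimal sketch, not part of the repository: the helper name missing_dataset_files is hypothetical, and it only checks the two file names listed above, since the "..." entry means the full file list is not given here.

```python
from pathlib import Path

# File names taken from the layout above. The "..." entry means further files
# exist that are not listed, so this list is deliberately incomplete.
EXPECTED_FILES = ["T2I_train.img.tsv", "T2I_train.text.tsv"]


def missing_dataset_files(root="dataset"):
    """Return the listed files that are absent under <root>/ECommerce-T2I."""
    data_dir = Path(root) / "ECommerce-T2I"
    return [name for name in EXPECTED_FILES if not (data_dir / name).is_file()]


if __name__ == "__main__":
    # Run from the repository root so that "dataset" resolves correctly.
    missing = missing_dataset_files()
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All listed dataset files are present.")
```

Run it from the repository root; an empty result only means the two listed files exist, not that the dataset is complete.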

Getting Started

The model is a BART-like encoder-decoder that uses VQGAN as an image tokenizer. See models/t2i_baseline.py for the detailed model structure.
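
To illustrate what "image tokenizer" means here: a VQGAN-style tokenizer encodes an image into a grid of latent vectors and replaces each vector with the index of its nearest codebook entry, so the image becomes a sequence of discrete tokens that a sequence-to-sequence model can predict from text. The toy sketch below shows only that nearest-entry lookup. It is not the code in models/t2i_baseline.py; the function names, the 2-D vectors, and the four-entry codebook are all invented for illustration.

```python
def nearest_code(vector, codebook):
    """Return the index of the codebook entry closest to `vector` (squared L2)."""
    def sq_dist(entry):
        return sum((a - b) ** 2 for a, b in zip(vector, entry))
    return min(range(len(codebook)), key=lambda i: sq_dist(codebook[i]))


def tokenize_grid(latents, codebook):
    """Map a 2-D grid of latent vectors to a flat, row-major list of token ids."""
    return [nearest_code(vec, codebook) for row in latents for vec in row]


# Toy values, invented for illustration only. A real tokenizer uses a learned
# codebook with many more, higher-dimensional entries.
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
latents = [
    [(0.1, -0.1), (0.9, 0.2)],
    [(0.2, 0.8), (1.1, 0.9)],
]

tokens = tokenize_grid(latents, codebook)
print(tokens)  # [0, 1, 2, 3]
```

At generation time the process runs in reverse: the decoder emits image token ids conditioned on the text, and the tokenizer's decoder turns those ids back into pixels.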

Training

cd run_scripts/; bash train_t2i_vqgan.sh

Model training takes about 5 hours.

Inference

cd run_scripts/; bash generate_t2i_vqgan.sh

Generated results are written to the results/ directory.

Reference

@inproceedings{M6,
  author    = {Junyang Lin and
               Rui Men and
               An Yang and
               Chang Zhou and
               Ming Ding and
               Yichang Zhang and
               Peng Wang and
               Ang Wang and
               Le Jiang and
               Xianyan Jia and
               Jie Zhang and
               Jianwei Zhang and
               Xu Zou and
               Zhikang Li and
               Xiaodong Deng and
               Jie Liu and
               Jinbao Xue and
               Huiling Zhou and
               Jianxin Ma and
               Jin Yu and
               Yong Li and
               Wei Lin and
               Jingren Zhou and
               Jie Tang and
               Hongxia Yang},
  title     = {{M6:} {A} Chinese Multimodal Pretrainer},
  year      = {2021},
  booktitle = {Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery \& Data Mining},
  pages     = {3251--3261},
  numpages  = {11},
  location  = {Virtual Event, Singapore},
}

@article{M6-T,
  author    = {An Yang and
               Junyang Lin and
               Rui Men and
               Chang Zhou and
               Le Jiang and
               Xianyan Jia and
               Ang Wang and
               Jie Zhang and
               Jiamang Wang and
               Yong Li and
               Di Zhang and
               Wei Lin and
               Lin Qu and
               Jingren Zhou and
               Hongxia Yang},
  title     = {{M6-T:} Exploring Sparse Expert Models and Beyond},
  journal   = {CoRR},
  volume    = {abs/2105.15082},
  year      = {2021}
}
