MUGE Text To Image Generation Baseline

Requirements and Installation

More details see fairseq. Briefly,

python == 3.6.4
pytorch == 1.7.1

Installing fairseq and other requirements

git clone https://github.com/MUGE-2021/image-caption-baseline
cd muge_baseline/
pip install -r requirements.txt
cd fairseq/
pip install --editable .

Downloading data and place to dataset/ directory, file structure is

text2image-baseline
    - dataset
        - ECommerce-T2I
            - T2I_train.img.tsv
            - T2I_train.text.tsv
            - ...

Getting Started

The model is a BART-like model with vqgan as a image tokenizer, please see models/t2i_baseline.py for detailed model structure.

Training

cd run_scripts/; bash train_t2i_vqgan.sh

Model training takes about 5 hours.

Inference

cd run_scripts/; bash generate_t2i_vqgan.sh

See results in results/ directory.

Reference

@inproceedings{M6,
  author    = {Junyang Lin and
               Rui Men and
               An Yang and
               Chang Zhou and
               Ming Ding and
               Yichang Zhang and
               Peng Wang and
               Ang Wang and
               Le Jiang and
               Xianyan Jia and
               Jie Zhang and
               Jianwei Zhang and
               Xu Zou and
               Zhikang Li and
               Xiaodong Deng and
               Jie Liu and
               Jinbao Xue and
               Huiling Zhou and
               Jianxin Ma and
               Jin Yu and
               Yong Li and
               Wei Lin and
               Jingren Zhou and
               Jie Tang and
               Hongxia Yang},
  title     = {{M6:} {A} Chinese Multimodal Pretrainer},
  year      = {2021},
  booktitle = {Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining},
  pages     = {3251–3261},
  numpages  = {11},
  location  = {Virtual Event, Singapore},
}

@article{M6-T,
  author    = {An Yang and
               Junyang Lin and
               Rui Men and
               Chang Zhou and
               Le Jiang and
               Xianyan Jia and
               Ang Wang and
               Jie Zhang and
               Jiamang Wang and
               Yong Li and
               Di Zhang and
               Wei Lin and
               Lin Qu and
               Jingren Zhou and
               Hongxia Yang},
  title     = {{M6-T:} Exploring Sparse Expert Models and Beyond},
  journal   = {CoRR},
  volume    = {abs/2105.15082},
  year      = {2021}
}

Image-generation-baseline - MUGE Text To Image Generation Baseline

Related tags

Overview

MUGE Text To Image Generation Baseline

Requirements and Installation

Getting Started

Training

Inference

Reference

Owner

This repository attempts to replicate the SqueezeNet architecture and implement the same on an image classification task.

The implementation of "Shuffle Transformer: Rethinking Spatial Shuffle for Vision Transformer"

Geometry-Free View Synthesis: Transformers and no 3D Priors

Official Pytorch implementation for "End2End Occluded Face Recognition by Masking Corrupted Features, TPAMI 2021"

Implementation of "Learning to Match Features with Seeded Graph Matching Network" ICCV2021

Code release to accompany paper "Geometry-Aware Gradient Algorithms for Neural Architecture Search."

ACL'2021: LM-BFF: Better Few-shot Fine-tuning of Language Models

Quasi-Dense Similarity Learning for Multiple Object Tracking, CVPR 2021 (Oral)

Predicting path with preference based on user demonstration using Maximum Entropy Deep Inverse Reinforcement Learning in a continuous environment

Simple-System-Convert--C--F - Simple System Convert With Python

Code and data for ACL2021 paper Cross-Lingual Abstractive Summarization with Limited Parallel Resources.

Meta Learning for Semi-Supervised Few-Shot Classification

This repository contains the code and models for the following paper.

Parsing, analyzing, and comparing source code across many languages

Used to record WKU's utility bills on a regular basis.

OMAMO: orthology-based model organism selection

A community run, 5-day PyTorch Deep Learning Bootcamp

Generalized Decision Transformer for Offline Hindsight Information Matching

A library for answering questions using data you cannot see

Library for machine learning stacking generalization.