🇰🇷 Text to Image in Korean

Overview

KoDALLE


KoDALLE uses a pretrained language model's token embedding layer and position embedding layer as DALLE's text encoder.
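A minimal PyTorch sketch of this idea, assuming the pretrained tables are already available as tensors. With Hugging Face `transformers` they would typically be `model.embeddings.word_embeddings.weight` and `model.embeddings.position_embeddings.weight` of a `RobertaModel` loaded from `klue/roberta-large`. The class name and the `position_offset` parameter are hypothetical; the repository's own model code may wire this differently.

```python
import torch
import torch.nn as nn


class PretrainedEmbeddingTextEncoder(nn.Module):
    """Text encoder made only of a pretrained LM's token (wte) and
    position (wpe) embedding tables (illustrative sketch)."""

    def __init__(self, wte: torch.Tensor, wpe: torch.Tensor,
                 position_offset: int = 0, freeze: bool = False):
        super().__init__()
        # Copy the pretrained tables into new embedding layers;
        # freeze=False keeps them trainable together with DALLE.
        self.wte = nn.Embedding.from_pretrained(wte.clone(), freeze=freeze)
        self.wpe = nn.Embedding.from_pretrained(wpe.clone(), freeze=freeze)
        # RoBERTa-style tables reserve the first rows for padding, so the
        # first real position may need an offset (hypothetical parameter).
        self.position_offset = position_offset

    def forward(self, input_ids: torch.Tensor) -> torch.Tensor:
        # input_ids: (batch, seq_len) ids produced by the LM's own tokenizer
        seq_len = input_ids.size(1)
        positions = torch.arange(seq_len, device=input_ids.device) + self.position_offset
        # Token embeddings (batch, seq_len, dim) plus position embeddings
        # broadcast from (1, seq_len, dim)
        return self.wte(input_ids) + self.wpe(positions).unsqueeze(0)
```

Summing token and position embeddings is how BERT-style encoders build their input sequence; whether KoDALLE also keeps RoBERTa's token-type embeddings or its embedding LayerNorm is not stated in the source.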

Background

  • Training a DALLE model from scratch demands a large paired dataset of images and captions. For example, OpenAI's DALLE was trained on more than 250 million text-image pairs.
  • If the dataset isn't large enough or is limited to a specific domain, the vocabulary of the trained DALLE model is insufficient. For instance, the 1 million text captions of the K-Fashion dataset consist of only about 300 distinct tokens.
  • Therefore, inference with such DALLE models can be problematic if the query sentence is unrelated to the captions in the original training dataset.
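To make the vocabulary problem concrete, here is a toy sketch. It assumes a word-level vocabulary built by whitespace splitting and uses made-up English captions; real DALLE models use subword tokenizers, but the effect is the same: any word never seen in the training captions collapses to an unknown token, so the model has nothing to condition on.

```python
from collections import Counter


def build_vocab(captions, min_count=1):
    """Word-level vocabulary built from training captions (illustrative)."""
    counts = Counter(tok for cap in captions for tok in cap.split())
    vocab = {"[PAD]": 0, "[UNK]": 1}
    for tok, count in sorted(counts.items()):
        if count >= min_count:
            vocab[tok] = len(vocab)
    return vocab


def encode(text, vocab):
    # Tokens absent from the training captions map to [UNK].
    return [vocab.get(tok, vocab["[UNK]"]) for tok in text.split()]


def unk_rate(text, vocab):
    """Fraction of query tokens the vocabulary cannot represent."""
    ids = encode(text, vocab)
    return sum(i == vocab["[UNK]"] for i in ids) / max(len(ids), 1)
```

For example, a vocabulary built from `["white blouse", "navy skinny jeans", "khaki loose coat"]` covers `"white coat"` fully but gives `"red floral dress"` an unknown-token rate of 1.0.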

KoDALLE's Result on Small Size Fashion Dataset

| | OpenAI's DALLE | KoDALLE of HappyFace |
|---|---|---|
| Train Dataset Size | 250 Million Pairs | 0.8 Million Pairs |
| #Params | 12 Billion | 428 Million |
| #Layers | 64 Layers | 16 Layers |
| Computing Resource | 1024 x V100 16GB | 1 x V100 32GB |
| Text Encoder | 16384 Vocab x 512 Dim BPE | 32000 Vocab x 1024 Dim klue/roberta-large |
| Image Encoder | VQVAE | VQGAN |
| Optimizer | AdamW | AdamW |
| Learning Rate | 4.5e-5 | 3.0e-5 |
| Weight Decay | 4.5e-3 | 3.0e-3 |
| LR Scheduler | ReduceLROnPlateau | - |
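A quick back-of-the-envelope check on the Text Encoder row: the size of each token-embedding table is vocabulary size times embedding dimension. This counts only the token table (not position embeddings) and uses the 428M total from the comparison above.

```python
# Token-embedding table sizes (vocabulary size x embedding dimension)
openai_text_emb = 16384 * 512     # BPE vocab x dim -> 8,388,608
kodalle_text_emb = 32000 * 1024   # klue/roberta-large vocab x hidden -> 32,768,000

print(f"OpenAI DALLE text embedding: {openai_text_emb:,} parameters")
print(f"KoDALLE text embedding:      {kodalle_text_emb:,} parameters")
# Share of KoDALLE's 428M parameters held by the token table -> 7.7%
print(f"Share of 428M: {kodalle_text_emb / 428e6:.1%}")
```

So KoDALLE's reused token table is roughly four times larger than OpenAI DALLE's, which is consistent with the goal of covering a broader vocabulary than the fashion captions alone provide.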

The team built a Korean text-to-fashion-design DALLE model with fewer than 100k sampled text-image pairs.

Caption: 하의에서 색상은 스카이블루이다. 상의에서 기장은 롱이다. 색상은 화이트이다. 카테고리는 블라우스이다. 디테일에는 셔링이다. 소매기장은 반팔이다. 소재에는 실크이다. 프린트에는 무지이다. 넥라인은 브이넥이다. 핏은 노멀
Translation: Bottoms: the color is sky blue. Top: the length is long. The color is white. The category is blouse. The detail is shirring. The sleeve length is short. The material is silk. The print is plain (solid). The neckline is V-neck. The fit is normal
Generated Image: [image]
Caption: 아우터는 색상이 카키 소재가 우븐 핏이 루즈인 코트이다. 하의는 색상이 네이비 소재가 데님 핏이 스키니인 청바지이다.
Translation: The outerwear is a khaki, woven, loose-fit coat. The bottoms are navy, denim, skinny-fit jeans.
Generated Image: [image]
Caption: 하의에서 기장은 발목이다. 색상은 블루이다. 카테고리는 스커트이다. 소재에는 데님이다. 핏은 와이드이다. 상의에서 색상은 화이트이다. 카테고리는 블라우스이다. 디테일에는 셔링이다. 소매기장은 반팔이다. 소재에는 우븐이다.
Translation: Bottoms: the length is ankle. The color is blue. The category is skirt. The material is denim. The fit is wide. Top: the color is white. The category is blouse. The detail is shirring. The sleeve length is short. The material is woven.
Generated Image: [image]
Caption: 상의에서 기장은 노멀이다. 상의에서 색상은 화이트이다. 상의에서 서브색상은 블랙이다. 상의에서 카테고리는 티셔츠이다. 상의에서 소매기장은 반팔이다. 상의에서 소재에는 저지이다. 상의에서 프린트에는 레터링이다. 상의에서 넥라인은 라운드넥이다. 상의에서 핏은 루즈이다.
Translation: Top: the length is normal. Top: the color is white. Top: the secondary color is black. Top: the category is T-shirt. Top: the sleeve length is short. Top: the material is jersey. Top: the print is lettering. Top: the neckline is round neck. Top: the fit is loose.
Generated Image: [image]
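The fourth caption follows a regular "part에서 attribute value이다." pattern ("In the part, the attribute is value."). The helper below is a hypothetical sketch of how such captions could be assembled from attribute labels; the template is inferred from the samples only, and the source does not say how the captions were actually generated. Attribute names are passed with their Korean particle already attached (for example `색상은`, `소재에는`), because the samples use both forms.

```python
def build_caption(part, attributes):
    """Join (attribute, value) pairs into 'part에서 attribute value이다.' sentences.

    part: garment part, e.g. "상의" (top) or "하의" (bottoms).
    attributes: list of (attribute-with-particle, value) pairs,
    e.g. ("색상은", "화이트") -> "상의에서 색상은 화이트이다."
    """
    return " ".join(f"{part}에서 {attr} {value}이다." for attr, value in attributes)


caption = build_caption("상의", [("기장은", "노멀"), ("색상은", "화이트")])
```

Here `caption` equals the first two sentences of the fourth sample caption.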

Methodology

Experiments were conducted with the embedding layers of several Korean Transformer models. Considering model size, the team selected klue/roberta-large as the baseline for this repository.

KoDALLE using klue/roberta-large's wpe (position embeddings) and wte (token embeddings) is trainable on a 16GB GPU in the Google Colab environment. The hyperparameters related to DALLE's model size are as follows:

'BATCH_SIZE': 32
'DEPTH': 2
'TEXT_SEQ_LEN': 128
'VOCAB_SIZE': 32000
'MODEL_DIM': 1024
'ATTN_TYPES': 'full'
'DIM_HEAD': 64
'HEADS': 8
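Because the wte and wpe tables are reused as-is, VOCAB_SIZE and MODEL_DIM must match the pretrained tables exactly, and TEXT_SEQ_LEN must fit in the position table. The sketch below checks this. The shapes in the usage line reflect our understanding of klue/roberta-large's published config (vocab 32000, hidden size 1024, 514 positions) and the offset of 2 assumes RoBERTa-style padding rows; the function name is hypothetical. In lucidrains' DALLE-pytorch these keys appear to map onto the `DALLE` constructor's `dim`, `num_text_tokens`, `text_seq_len`, `depth`, `heads`, `dim_head` and `attn_types` arguments, though the source does not confirm this.

```python
# Hyperparameters from the list above
CONFIG = {
    "BATCH_SIZE": 32,
    "DEPTH": 2,
    "TEXT_SEQ_LEN": 128,
    "VOCAB_SIZE": 32000,
    "MODEL_DIM": 1024,
    "ATTN_TYPES": "full",
    "DIM_HEAD": 64,
    "HEADS": 8,
}


def check_embedding_compat(cfg, wte_shape, wpe_shape, position_offset=0):
    """Raise ValueError if pretrained wte/wpe tables cannot be reused as-is.

    wte_shape: (vocab_size, hidden_size) of the token embedding table.
    wpe_shape: (max_positions, hidden_size) of the position embedding table.
    """
    vocab, dim = wte_shape
    max_pos, pos_dim = wpe_shape
    if cfg["VOCAB_SIZE"] != vocab:
        raise ValueError(f"VOCAB_SIZE {cfg['VOCAB_SIZE']} != tokenizer vocab {vocab}")
    if cfg["MODEL_DIM"] != dim or dim != pos_dim:
        raise ValueError(f"MODEL_DIM {cfg['MODEL_DIM']} must equal hidden size {dim}")
    if cfg["TEXT_SEQ_LEN"] + position_offset > max_pos:
        raise ValueError("TEXT_SEQ_LEN exceeds the available position embeddings")
    return True


# Attention inner width is HEADS x DIM_HEAD = 512, which differs from MODEL_DIM.
attn_inner_dim = CONFIG["HEADS"] * CONFIG["DIM_HEAD"]

check_embedding_compat(CONFIG, (32000, 1024), (514, 1024), position_offset=2)
```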

Significance

  • Offers promising results for training from scratch on specific domains with small datasets.
  • Introduces a way to make domain-specific DALLE & CLIP models robust to varied input sentences.
  • Recommends an adequate text-to-image model size for a given computing resource.
  • Suggests an effortless method of creating DALLE & CLIP models for one's own language if a pretrained language model is available.

WIP

  • Add an image-caption reranker (EfficientNet + klue/roberta-large).
  • Train a model with 500k text-image pairs.
  • Modularize into Python code.
  • Update the inference code.
  • Report FID and IS metrics on the test and validation datasets.
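The planned reranker would score each generated image against its caption and keep the best ones. The sketch below assumes CLIP-style reranking: an image tower (such as EfficientNet) and a text tower (such as klue/roberta-large) already project into a shared embedding space, and images are ranked by cosine similarity. The embeddings are plain lists here, and the function names are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (assumed non-zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rerank(text_emb, image_embs, top_k=None):
    """Return image indices sorted from most to least similar to the caption.

    text_emb: caption embedding from the text tower.
    image_embs: one embedding per generated image from the image tower.
    """
    scores = [cosine(text_emb, img) for img in image_embs]
    order = sorted(range(len(image_embs)), key=lambda i: scores[i], reverse=True)
    return order if top_k is None else order[:top_k]
```

With a caption embedding `[1, 0]` and image embeddings `[[0, 1], [1, 0], [1, 1]]`, the ranking is `[1, 2, 0]`.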

Comments
  • Koclip apply in KoDALLE

    Changes

    add) model.py

    A file in which Hyeonsoo's KoCLIP code was modified so that it works with DALLE Roberta.

    It still needs to be revised by comparing it with the model.py in the dev branch.

    add) generate.ipynb

    Code that lets you see KoCLIP in action.

    opened by JoonHong-Kim
  • add: KoCLIP codes

    Changes:

    refactor) clipmodel.py

    • Updated CLIPModel to its final version
    • Moved it to the clip folder

    add) clip/train_clip.py

    • Code used to train the CLIP model

    add) clip/dataloader.py

    • The dataloader function used to train the CLIP model
    opened by shawnhyeonsoo
  • add skip_sample in TextImageDataset

    Changes

    modify) loader.py

    • Handles the error raised when TextImageDataset loads texts and images and the data is missing
    • When an error occurs, uses the skip_sample function to skip to a random index or the next index
    • Modified based on the existing train_dalle_gpt_roberta.py
    opened by jjonhwa
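The skip_sample behaviour described in the last pull request can be sketched in plain Python. This is an illustration of the described logic, not the repository's loader.py: the class name is hypothetical, a `None` text or image stands in for missing data, and at least one valid record is assumed (otherwise the skipping never terminates).

```python
import random


class TextImageDatasetSketch:
    """Stand-in text-image dataset that skips records with missing data.

    records: list of (text, image) pairs; None marks a missing field.
    shuffle: if True, a broken record is replaced by a random one,
    otherwise by the next index (wrapping around to 0).
    """

    def __init__(self, records, shuffle=False, seed=0):
        self.records = records
        self.shuffle = shuffle
        self.rng = random.Random(seed)

    def __len__(self):
        return len(self.records)

    def skip_sample(self, ind):
        if self.shuffle:
            return self[self.rng.randrange(len(self))]
        return self[(ind + 1) % len(self)]

    def __getitem__(self, ind):
        text, image = self.records[ind]
        if text is None or image is None:
            # Missing caption or image: move on instead of raising an error.
            return self.skip_sample(ind)
        return text, image


ds = TextImageDatasetSketch([("a", "img_a"), (None, "img_b"), ("c", None), ("d", "img_d")])
```

Here `ds[1]` skips the two broken records and returns `("d", "img_d")`. A PyTorch `Dataset` subclass would follow the same pattern inside its own `__getitem__`.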
Releases (v0.1.0-beta)