Finetune the base 64 px GLIDE-text2im model from OpenAI on your own image-text dataset

Last update: Oct 13, 2022

Related tags

Overview

glide-finetune

Finetune the base 64 px GLIDE-text2im model from OpenAI on your own image-text dataset.

Installation

git clone https://github.com/afiaka87/glide-finetune.git
cd glide-finetune/
python3 -m venv .venv # create a virtual environment to keep global install clean.
source .venv/bin/activate
(.venv) # optionally install pytorch manually for your own specific env first...
(.venv) python -m pip install -r requirements.txt

Usage

(.venv) python glide-finetune.py 
    --data_dir=./data \
    --batch_size=1 \
    --grad_acc=1 \
    --guidance_scale=4.0 \
    --learning_rate=2e-5 \
    --dropout=0.1 \
    --timestep_respacing=1000 \
    --side_x=64 \
    --side_y=64 \
    --resume_ckpt='' \
    --checkpoints_dir='./glide_checkpoints/' \
    --use_fp16 \
    --device='' \
    --freeze_transformer \
    --freeze_diffusion \
    --weight_decay=0.0 \
    --project_name='glide-finetune'

Known issues:

batching isn't handled in the dataloader
NaN/Inf errors
Resizing doesn't handle non-square aspect ratios properly
some of the code is messy, needs refactoring.

Comments

Fixed a couple of minor issues
Pinned webdataset version to work with python 3.7 which is the version being used in Colab, Kaggle. A new version of this module is releaed few days back which only works with 3.8/9

Fixed an issue with data_dir arg not getting picked up.
opened by vanga 1
Fix NameError when using --data_dir
Hello and thank you for your great work.

Right now using a local data folder with --data_dir results in

Traceback (most recent call last): File "/content/glide-finetune/train_glide.py", line 292, in <module> data_dir=data_dir, NameError: name 'data_dir' is not defined

This PR fixes that.
opened by tillfalko 0

mention mpi4py dependency

mpi4py installation will fail unless the user has this package installed. Since MPI is not a ubiquitous dependency it should probably be mentioned. Edit: Since torch==1.10.1 is a requirement, and torch versions come with their own cuda versions (torch 1.10.1 uses cuda 10.2), I don't see a reason not to just include bitsandbytes-cuda102 in requirements.txt.

$ py -m venv .venv
$ source .venv/bin/activate
$ pip install torch==1.10.1
Collecting torch==1.10.1
  Downloading torch-1.10.1-cp39-cp39-manylinux1_x86_64.whl (881.9 MB)
     |████████████████████████████████| 881.9 MB 15 kB/s
Collecting typing-extensions
  Downloading typing_extensions-4.0.1-py3-none-any.whl (22 kB)
Installing collected packages: typing-extensions, torch
Successfully installed torch-1.10.1 typing-extensions-4.0.1
$ py -c "import torch; print(torch.__version__)"
1.10.1+cu102

opened by tillfalko 0

Fixed half precision optimizer bug

Problem

In half precision, after the first iteration nan values start appearing regardless of input data or gradients since the adam optimizer breaks in float16. The discussion for that can be viewed here.

Solution

This can be fixed by setting the eps variable to 1e-4 instead of the default 1e-8. This is the only thing this pr does

opened by isamu-isozaki 0
Training on half precision leads to nan values

I was training my model and I noticed that after just the first iteration I was running into nan values. As it turns out my gradients and input values/images were all normal but the adam optimizer by pytorch does has some weird behavior on float16 precision where it produces nans probably because of a divide by 0 error. A discussion can be found below

https://discuss.pytorch.org/t/adam-half-precision-nans/1765/4

I hear changing the epison parameter for the adam weights parameter when on half precisions works but I haven't tested it yet. Will make one once I tested.

And also let me say thanks for this repo. I wanted to fine tune the glide model and this made it so much easier.

opened by isamu-isozaki 1
Where is the resume_ckpt

Hi, thanks for your job.

I noticed to finetune the glide, we should have a base_model, namely "resume_ckpt". --resume_ckpt 'ckpt_to_resume_from.pt'
Where can we get this model? Because I find Glide also didn't provide any checkpoint. Thanks for your help.

opened by zhaobingbingbing 0

Releases(v0.0.1)

v0.0.1(Feb 20, 2022)
Having some experience with finetuning GLIDE on laion/alamy, etc. I think this code works great now and hope as many people can use it as possible. Please file bugs - I know there may be a few.

New additions:

dataloader for LAION400M

dataloader for alamy

train the upsample model instead of just the base model

(early) code for training the released noisy CLIP. still a WIP.

Source code(tar.gz)
Source code(zip)

Owner

Clay Mullis

Software engineer working with multi-modal deep learning.

GitHub Repository

Scripts and misc. stuff related to the PortSwigger Web Academy

PortSwigger Web Academy Notes Mostly scripts to automate the exploits. Going in the order of the recomended learning path - starting with SQLi. Commun

17 Dec 30, 2022

A toolkit for document-level event extraction, containing some SOTA model implementations

❤️ A Toolkit for Document-level Event Extraction with & without Triggers Hi, there 👋 . Thanks for your stay in this repo. This project aims at buildi

159 Dec 22, 2022

I3-master-layout - Simple master and stack layout script

Simple master and stack layout script | ------ | ----- | | | | | Ma

18 Dec 05, 2022

Hierarchical Memory Matching Network for Video Object Segmentation (ICCV 2021)

Hierarchical Memory Matching Network for Video Object Segmentation Hongje Seong, Seoung Wug Oh, Joon-Young Lee, Seongwon Lee, Suhyeon Lee, Euntai Kim

72 Dec 14, 2022

Code for "SRHEN: Stepwise-Refining Homography Estimation Network via Parsing Geometric Correspondences in Deep Latent Space"

SRHEN This is a better and simpler implementation for "SRHEN: Stepwise-Refining Homography Estimation Network via Parsing Geometric Correspondences in

1 Oct 28, 2022

Perform Linear Classification with Multi-way Data

MultiwayClassification This is an R package to perform linear classification for data with multi-way structure. The distance-weighted discrimination (

2 Dec 15, 2020

Physics-Aware Training (PAT) is a method to train real physical systems with backpropagation.

Physics-Aware Training (PAT) is a method to train real physical systems with backpropagation. It was introduced in Wright, Logan G. & Onodera, Tatsuhiro et al. (2021)1 to train Physical Neural Networ

230 Jan 05, 2023

classify fashion-mnist dataset with pytorch

Fashion-Mnist Classifier with PyTorch Inference 1- clone this repository: git clone https://github.com/Jhamed7/Fashion-Mnist-Classifier.git 2- Instal

1 Jan 14, 2022

The repository is for safe reinforcement learning baselines.

Safe-Reinforcement-Learning-Baseline The repository is for Safe Reinforcement Learning (RL) research, in which we investigate various safe RL baseline

172 Dec 19, 2022

Spatial color quantization in Rust

rscolorq Rust port of Derrick Coetzee's scolorq, based on the 1998 paper "On spatial quantization of color images" by Jan Puzicha, Markus Held, Jens K

37 Dec 22, 2022

VisionKG: Vision Knowledge Graph

VisionKG: Vision Knowledge Graph Official Repository of VisionKG by Anh Le-Tuan, Trung-Kien Tran, Manh Nguyen-Duc, Jicheng Yuan, Manfred Hauswirth and

9 Jun 23, 2022

A Fast Knowledge Distillation Framework for Visual Recognition

FKD: A Fast Knowledge Distillation Framework for Visual Recognition Official PyTorch implementation of paper A Fast Knowledge Distillation Framework f

129 Dec 24, 2022

Face Transformer for Recognition

Face-Transformer This is the code of Face Transformer for Recognition (https://arxiv.org/abs/2103.14803v2). Recently there has been great interests of

153 Nov 30, 2022

CoRe: Contrastive Recurrent State-Space Models

CoRe: Contrastive Recurrent State-Space Models This code implements the CoRe model and reproduces experimental results found in Robust Robotic Control

21 Aug 11, 2022

CVPR2021: Temporal Context Aggregation Network for Temporal Action Proposal Refinement

Temporal Context Aggregation Network - Pytorch This repo holds the pytorch-version codes of paper: "Temporal Context Aggregation Network for Temporal

63 Sep 27, 2022

Using knowledge-informed machine learning on the PRONOSTIA (FEMTO) and IMS bearing data sets. Predict remaining-useful-life (RUL).

Knowledge Informed Machine Learning using a Weibull-based Loss Function Exploring the concept of knowledge-informed machine learning with the use of a

43 Dec 14, 2022

[CVPR2021] Domain Consensus Clustering for Universal Domain Adaptation

[CVPR2021] Domain Consensus Clustering for Universal Domain Adaptation [Paper] Prerequisites To install requirements: pip install -r requirements.txt

84 Dec 26, 2022

Implementation of Transformer in Transformer, pixel level attention paired with patch level attention for image classification, in Pytorch

Transformer in Transformer Implementation of Transformer in Transformer, pixel level attention paired with patch level attention for image c

272 Dec 23, 2022

Prososdy Morph: A python library for manipulating pitch and duration in an algorithmic way, for resynthesizing speech.

ProMo (Prosody Morph) Questions? Comments? Feedback? Chat with us on gitter! A library for manipulating pitch and duration in an algorithmic way, for

71 Jan 02, 2023

Implementation of CVPR 2021 paper "Spatially-invariant Style-codes Controlled Makeup Transfer"

SCGAN Implementation of CVPR 2021 paper "Spatially-invariant Style-codes Controlled Makeup Transfer" Prepare The pre-trained model is avaiable at http

118 Dec 12, 2022