Finetune the base 64 px GLIDE-text2im model from OpenAI on your own image-text dataset

Overview

glide-finetune

Installation

git clone https://github.com/afiaka87/glide-finetune.git
cd glide-finetune/
python3 -m venv .venv # create a virtual environment to keep global install clean.
source .venv/bin/activate
(.venv) # optionally install pytorch manually for your own specific env first...
(.venv) python -m pip install -r requirements.txt

Usage

(.venv) python glide-finetune.py \
    --data_dir=./data \
    --batch_size=1 \
    --grad_acc=1 \
    --guidance_scale=4.0 \
    --learning_rate=2e-5 \
    --dropout=0.1 \
    --timestep_respacing=1000 \
    --side_x=64 \
    --side_y=64 \
    --resume_ckpt='' \
    --checkpoints_dir='./glide_checkpoints/' \
    --use_fp16 \
    --device='' \
    --freeze_transformer \
    --freeze_diffusion \
    --weight_decay=0.0 \
    --project_name='glide-finetune'

Known issues:

  • Batching isn't handled in the dataloader.
  • NaN/Inf errors.
  • Resizing doesn't handle non-square aspect ratios properly.
  • Some of the code is messy and needs refactoring.
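
One possible way to work around the non-square resizing issue listed above is to center-crop each image to a square before resizing it to side_x by side_y. The sketch below only computes the crop box. `center_crop_box` is a hypothetical helper, not a function from this repository, and the returned tuple follows the (left, upper, right, lower) convention that Pillow's `Image.crop` accepts.

```python
from typing import Tuple


def center_crop_box(width: int, height: int) -> Tuple[int, int, int, int]:
    """Return (left, upper, right, lower) for the largest centered square.

    Hypothetical helper for illustration; not part of glide-finetune.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    side = min(width, height)          # edge length of the square crop
    left = (width - side) // 2         # horizontal offset of the crop
    upper = (height - side) // 2       # vertical offset of the crop
    return (left, upper, left + side, upper + side)


print(center_crop_box(640, 480))  # (80, 0, 560, 480)
```

Cropping first keeps the aspect ratio of the content intact, at the cost of discarding the margins of wide or tall images.
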
Comments
  • Fixed a couple of minor issues

    • Pinned the webdataset version so that it works with Python 3.7, the version used in Colab and Kaggle. A new version of this module, released a few days ago, only works with Python 3.8/3.9.
    • Fixed an issue with data_dir arg not getting picked up.
    opened by vanga 1
  • Fix NameError when using --data_dir

    Hello and thank you for your great work.

    Right now using a local data folder with --data_dir results in

    Traceback (most recent call last):
      File "/content/glide-finetune/train_glide.py", line 292, in <module>
        data_dir=data_dir,
    NameError: name 'data_dir' is not defined
    

    This PR fixes that.

    opened by tillfalko 0
  • mention mpi4py dependency

    Installing mpi4py fails unless an MPI implementation is already installed on the system. Since MPI is not a ubiquitous dependency, it should probably be mentioned. Edit: Since torch==1.10.1 is a requirement, and torch versions come with their own CUDA versions (torch 1.10.1 uses CUDA 10.2), I don't see a reason not to just include bitsandbytes-cuda102 in requirements.txt.

    $ py -m venv .venv
    $ source .venv/bin/activate
    $ pip install torch==1.10.1
    Collecting torch==1.10.1
      Downloading torch-1.10.1-cp39-cp39-manylinux1_x86_64.whl (881.9 MB)
         |████████████████████████████████| 881.9 MB 15 kB/s
    Collecting typing-extensions
      Downloading typing_extensions-4.0.1-py3-none-any.whl (22 kB)
    Installing collected packages: typing-extensions, torch
    Successfully installed torch-1.10.1 typing-extensions-4.0.1
    $ py -c "import torch; print(torch.__version__)"
    1.10.1+cu102
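
    A quick pre-flight check along these lines can tell a user whether an MPI toolchain is present before pip tries to build mpi4py. This is a sketch, not part of the repository: `mpi_build_tools` is a hypothetical helper, and looking for `mpicc` and `mpiexec` on PATH is an assumption about how the MPI implementation was installed.

```python
import shutil
from typing import Dict, Optional, Sequence


def mpi_build_tools(
    names: Sequence[str] = ("mpicc", "mpiexec"),
) -> Dict[str, Optional[str]]:
    """Map each tool name to its location on PATH, or None if it is missing.

    Hypothetical helper for illustration; the tool names are assumptions.
    """
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in mpi_build_tools().items():
        print(f"{name}: {path or 'not found'}")
```
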
    
    opened by tillfalko 0
  • Fixed half precision optimizer bug

    Problem

    In half precision, NaN values start appearing after the first iteration regardless of input data or gradients, because the Adam optimizer breaks in float16. The discussion can be viewed at https://discuss.pytorch.org/t/adam-half-precision-nans/1765/4.

    Solution

    This can be fixed by setting the optimizer's eps argument to 1e-4 instead of the default 1e-8. That is the only change in this PR.
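
    A likely explanation for why the default epsilon fails: 1e-8 is smaller than the smallest value float16 can represent, so it rounds to zero and no longer guards the division in the Adam update, while 1e-4 survives the rounding. The sketch below checks this with Python's standard `struct` module, whose `e` format is IEEE 754 half precision. `to_fp16` is a helper written for this illustration, not code from the PR.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


default_eps = to_fp16(1e-8)  # underflows to zero in float16
patched_eps = to_fp16(1e-4)  # remains a small positive number

print(default_eps == 0.0)  # True
print(patched_eps > 0.0)   # True
```

    In PyTorch, the corresponding change is to pass eps=1e-4 when constructing the optimizer, for example `torch.optim.Adam(model.parameters(), eps=1e-4)`, where `model` stands in for whatever module is being trained.
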

    opened by isamu-isozaki 0
  • Training on half precision leads to nan values

    I was training my model and noticed that I was running into NaN values after just the first iteration. My gradients and input values/images were all normal, but PyTorch's Adam optimizer behaves oddly in float16 precision and produces NaNs, probably because of a divide-by-zero error. A discussion can be found below:

    https://discuss.pytorch.org/t/adam-half-precision-nans/1765/4

    I hear that changing the epsilon parameter of the Adam optimizer works in half precision, but I haven't tested it yet. I will open a PR once I have tested it.

    And also let me say thanks for this repo. I wanted to fine tune the glide model and this made it so much easier.

    opened by isamu-isozaki 1
  • Where is the resume_ckpt

    Hi, thanks for your work.

    I noticed that to finetune GLIDE we need a base model, namely "resume_ckpt": --resume_ckpt 'ckpt_to_resume_from.pt'
    Where can we get this model? As far as I can tell, GLIDE didn't provide any checkpoint either. Thanks for your help.

    opened by zhaobingbingbing 0
Releases(v0.0.1)
  • v0.0.1(Feb 20, 2022)

    After some experience finetuning GLIDE on LAION, Alamy, and other datasets, I think this code works well now, and I hope as many people as possible can use it. Please file bugs; I know there may be a few.

    New additions:

    • dataloader for LAION400M
    • dataloader for alamy
    • train the upsample model instead of just the base model
    • (early) code for training the released noisy CLIP. still a WIP.
Owner
Clay Mullis
Software engineer working with multi-modal deep learning.