RuDOLPH: One Hyper-Modal Transformer can be creative as DALL-E and smart as CLIP

Related tags

Deep Learningru-dolph
Overview

[Paper] [Хабр] [Model Card] [Colab] [Kaggle]

RuDOLPH 🦌 🎄 ☃️

One Hyper-Modal Transformer can be creative as DALL-E and smart as CLIP


Russian Diffusion On Language Picture Hyper-modality (RuDOLPH) is a fast and light text-image-text transformer (350M GPT-3) designed for a quick and easy fine-tuning setup for the solution of various tasks: from generating images by text description and image classification to visual question answering and more. This model demonstrates the power of Hyper-modality Transformers.

(!!!) Hyper-modality means generalized multi-modal, e.g., model that consists of two multi-modal parts: text-2-image and image-2-text becomes text and image hyper-modality model

Sparse Attention Mask

row - col - row - [last] conv

Models

Installing

pip install rudolph==0.0.1rc1

Usage

Init models

from rudalle import get_tokenizer, get_vae
from rudalle.utils import seed_everything
from rudalle.image_prompts import ImagePrompts

from rudolph.model import get_rudolph_model
from rudolph.pipelines import zs_clf, generate_codebooks, self_reranking_by_image, self_reranking_by_text, show, generate_captions, generate_texts
from rudolph import utils

device = 'cuda'
model = get_rudolph_model('350M', fp16=True, device=device)
model.to(device);
tokenizer = get_tokenizer()
vae = get_vae(dwt=False).to(device)

Text Generation

generate_texts(
    tokenizer,
    model,
    template='красивый пейзаж ',
    top_k=32, top_p=0.6, texts_num=32, bs=32, seed=42
)[:8]

[{'text': 'красивый пейзаж с лесом и рекой. вид с воздуха на сельскую местность. пейзаж с лесом и рекой. вид на горы с беспилотника', 'ppl': 82.94},
 {'text': 'красивый пейзаж в стиле реализм, автор которой сергей владимирович дорофеев', 'ppl': 112.73},
 {'text': 'красивый пейзаж с рекой и озером - обои для рабочего стола, картинки, фото', 'ppl': 125.55},
 {'text': 'красивый пейзаж с рекой и мостом через реку в сумерках', 'ppl': 170.83},
 {'text': 'красивый пейзаж с горами в тумане - горы в тумане', 'ppl': 180.72},
 {'text': 'красивый пейзаж с лесом и лугом в сумерках', 'ppl': 185.84},
 {'text': 'красивый пейзаж с озером и лесом на заднем плане', 'ppl': 199.84},
 {'text': 'красивый пейзаж с видом на горы в таиланде', 'ppl': 219.86}]

Setup for Fast Image Generation

text = 'рисунок кота'
bs, images_num = 48, 48
top_k, top_p = 512, 0.9
with torch.no_grad():
    codebooks = generate_codebooks(text, tokenizer, model, top_k=top_k, images_num=images_num, top_p=top_p, bs=bs)
    ppl_text, ppl_image = self_reranking_by_text(text, codebooks, tokenizer, model, bs=bs)
    images = vae.decode(codebooks[ppl_text.argsort()[:4]])
images = torchvision.utils.make_grid(images, nrow=2)
img = torchvision.transforms.functional.to_pil_image(images)
img

Image Generation + Self Reranking

text = 'красивый пейзаж с озером и лесом на заднем плане'
images_num = 256
seed_everything(42)
codebooks = []
for top_k, top_p, images_num in [
    (2048, 0.99, images_num),
    (1024, 0.99, images_num),
    (1024, 0.98, images_num),
]:
    codebooks.append(generate_codebooks(text, tokenizer, model, top_k=top_k, images_num=images_num, top_p=top_p, bs=32))

codebooks = torch.cat(codebooks)

ppl_text, ppl_image = self_reranking_by_text(text, codebooks, tokenizer, model, bs=32)
with torch.no_grad():
    images = vae.decode(codebooks[ppl_text.argsort()[:16]])

pil_images = utils.torch_tensors_to_pil_list(images)
show(pil_images, 8)

text = 'зимнее время года'

ppl_text, ppl_image = self_reranking_by_text(text, codebooks, tokenizer, model, bs=32)
with torch.no_grad():
    images = vae.decode(codebooks[ppl_text.argsort()[:16]])

pil_images = utils.torch_tensors_to_pil_list(images)
show(pil_images, 8)

text = 'ночное время суток'

ppl_text, ppl_image = self_reranking_by_text(text, codebooks, tokenizer, model, bs=32)
with torch.no_grad():
    images = vae.decode(codebooks[ppl_text.argsort()[:16]])

pil_images = utils.torch_tensors_to_pil_list(images)
show(pil_images, 8)

Image Prompt (like Inpainting)

text = 'лодка с алыми парусами'

images_num = 1024
bs = 32

borders = {'up': 6, 'left': 4, 'right': 6, 'down': 2}
image_prompts = ImagePrompts(pil_img, borders, vae, device, crop_first=True)

seed_everything(42)
codebooks = []
for top_k, top_p, images_num in [
    (1024, 0.99, images_num),
]:
    codebooks.append(
        generate_codebooks(text, tokenizer, model, top_k=top_k, images_num=images_num, top_p=top_p, bs=bs, image_prompts=image_prompts)
    )

codebooks = torch.cat(codebooks)

ppl_text, ppl_image = self_reranking_by_text(
    text,
    codebooks,
    tokenizer,
    model,
    bs=bs,
)
with torch.no_grad():
    images = vae.decode(codebooks[ppl_text.argsort()[:16]])

pil_images = utils.torch_tensors_to_pil_list(images)
show(pil_images, 8)

Diffusion (TODO, see Colab)

Image Captioning + Self Reranking

texts = generate_captions(pil_img, tokenizer, model, vae, template='на картинке ', top_k=8, captions_num=128, bs=32, top_p=0.6, seed=42)
ppl_text, ppl_image = self_reranking_by_image(texts, pil_img, tokenizer, model, vae, bs=32, seed=42)
for idx in ppl_image.argsort()[:8]:
    print(f'-{texts[idx]}')

-на картинке я хочу увидеть как выглядит дом в горах
-на картинке нарисована лодка с каяком и лесом
-на картинке нарисован дом с бассейном
-на картинке – пейзаж – горы – одна из самых красивых мест на планете
-на картинке: в норвегии
-на картинке в горах
-на картинке я хочу нарисовать дом
-на картинке изображен домик на горе

-на картинке изображен рыжий пес. на фото изображен рыжий пес
-на картинке собака с длинным носом и длинным носом и короткой шерстью
-на картинке собака с длинными ушами и короткой шерстью
-на картинке изображена собака с большими глазами и длинным носом
-на картинке изображен белый медведь
-на картинке собака похожа на стаффорда и бультерьера. фото, на котором
-на картинке собака похожа на бигля и на собаку
-на картинке собака с длинными ушами и длинными ушами и

-на картинке изображена улица с светофором
-на картинке изображен дом на участке ижс
-на картинке изображена дорога с двумя автомобилями
-на картинке изображен вид с воздуха на жилой район, который находится на улице и в районе жилого комплекса
-на картинке изображен вид на здание с окнами и окнами
-на картинке изображена дорога с светофором
-на картинке изображен дом напротив станции
-на картинке изображен жилой дом

-на картинке изображен мотоцикл иж юпитер
-на картинке изображена молодая женщина с каре на фоне деревянного дома
-на картинке изображён мотоцикл
-на картинке изображен велогонщик
-на картинке изображена мотокультиватор
-на картинке изображено здание
-на картинке изображена девушка с велосипедом
-на картинке изображен мопед

Zero-Shot Image Classification using PPL

import base64
import requests
from PIL import Image
from io import BytesIO

bs4_urls = requests.get('https://raw.githubusercontent.com/sberbank-ai/ru-dolph/master/pics/pipelines/cats_vs_dogs_bs4.json').json()

f, ax = plt.subplots(2,4, figsize=(12,6))

for i, bs4_url in enumerate(bs4_urls):
    pil_img = Image.open(BytesIO(base64.b64decode(bs4_url)))
    
    classes = ['кошка', 'собака']
    preds = zs_clf(
        pil_img, 
        classes,
        model, 
        tokenizer,
        vae,
        template = 'на фото изображена', 
    )
    ax[i//4, i%4].imshow(pil_img)
    ax[i//4, i%4].set_title(preds['class'])

Linear Probe (TODO, see Colab)

Authors:

Drawing Drawing

Citation

@article{shonenkov2022ruDolph,
  title         = {RuDOLPH: One Hyper-Modal Transformer can be creative as DALL-E and smart as CLIP},
  author        = {Alex Shonenkov and Michael Konstantinov},
  year          = {2022},
  eprint        = {...},
  archivePrefix = {arXiv},
  primaryClass  = {cs.CL}
}
@misc{github2022ruDolph,
  title         = {RuDOLPH: One Hyper-Modal Transformer can be creative as DALL-E and smart as CLIP},
  author        = {Alex Shonenkov and Michael Konstantinov},
  year          = {2022},
  howpublished  = {\url{https://github.com/sberbank-ai/ru-dolph}},
}

Supported by



Owner
Sber AI
Sber AI
Constraint-based geometry sketcher for blender

Constraint-based sketcher addon for Blender that allows to create precise 2d shapes by defining a set of geometric constraints like tangent, distance,

1.7k Dec 31, 2022
A Pytorch implementation of SMU: SMOOTH ACTIVATION FUNCTION FOR DEEP NETWORKS USING SMOOTHING MAXIMUM TECHNIQUE

SMU_pytorch A Pytorch Implementation of SMU: SMOOTH ACTIVATION FUNCTION FOR DEEP NETWORKS USING SMOOTHING MAXIMUM TECHNIQUE arXiv https://arxiv.org/ab

Fuhang 36 Dec 24, 2022
LBK 20 Dec 02, 2022
Repositório criado para abrigar os notebooks com a listas de exercícios propostos pelo professor Gustavo Guanabara do canal Curso em Vídeo do YouTube durante o Curso de Python 3

Curso em Vídeo - Exercícios de Python 3 Sobre o repositório Este repositório contém os notebooks com a listas de exercícios propostos pelo professor G

João Pedro Pereira 9 Oct 15, 2022
Deep Markov Factor Analysis (NeurIPS2021)

Deep Markov Factor Analysis (DMFA) Codes and experiments for deep Markov factor analysis (DMFA) model accepted for publication at NeurIPS2021: A. Farn

Sarah Ostadabbas 2 Dec 16, 2022
Algorithmic trading using machine learning.

Algorithmic Trading This machine learning algorithm was built using Python 3 and scikit-learn with a Decision Tree Classifier. The program gathers sto

Sourav Biswas 101 Nov 10, 2022
Tutorial page of the Climate Hack, the greatest hackathon ever

Tutorial page of the Climate Hack, the greatest hackathon ever

UCL Artificial Intelligence Society 12 Jul 02, 2022
PyTorch implementation of Barlow Twins.

Barlow Twins: Self-Supervised Learning via Redundancy Reduction PyTorch implementation of Barlow Twins. @article{zbontar2021barlow, title={Barlow Tw

Facebook Research 839 Dec 29, 2022
CSE-519---Project - Job Title Analysis (Project for CSE 519 - Data Science Fundamentals)

A Multifaceted Approach to Job Title Analysis CSE 519 - Data Science Fundamentals Project Description Project consists of three parts: Salary Predicti

Jimit Dholakia 1 Jan 04, 2022
A MNIST-like fashion product database. Benchmark

Fashion-MNIST Table of Contents Why we made Fashion-MNIST Get the Data Usage Benchmark Visualization Contributing Contact Citing Fashion-MNIST License

Zalando Research 10.5k Jan 08, 2023
pytorch implementation of openpose including Hand and Body Pose Estimation.

pytorch-openpose pytorch implementation of openpose including Body and Hand Pose Estimation, and the pytorch model is directly converted from openpose

Hzzone 1.4k Jan 07, 2023
Automatically align face images 🙃→🙂. Can also do windowing and warping.

Automatic Face Alignment (AFA) Carl M. Gaspar & Oliver G.B. Garrod You have lots of photos of faces like this: But you want to line up all of the face

Carl Michael Gaspar 15 Dec 12, 2022
AutoML library for deep learning

Official Website: autokeras.com AutoKeras: An AutoML system based on Keras. It is developed by DATA Lab at Texas A&M University. The goal of AutoKeras

Keras 8.7k Jan 08, 2023
The 2nd Version Of Slothybot

SlothyBot Go to this website: "https://bitly.com/SlothyBot" The 2nd Version Of Slothybot. The Bot Has Many Features, Such As: Moderation Commands; Kic

Slothy 0 Jun 01, 2022
Code for the paper: Audio-Visual Scene Analysis with Self-Supervised Multisensory Features

[Paper] [Project page] This repository contains code for the paper: Andrew Owens, Alexei A. Efros. Audio-Visual Scene Analysis with Self-Supervised Mu

Andrew Owens 202 Dec 13, 2022
an implementation of softmax splatting for differentiable forward warping using PyTorch

softmax-splatting This is a reference implementation of the softmax splatting operator, which has been proposed in Softmax Splatting for Video Frame I

Simon Niklaus 338 Dec 28, 2022
TagLab: an image segmentation tool oriented to marine data analysis

TagLab: an image segmentation tool oriented to marine data analysis TagLab was created to support the activity of annotation and extraction of statist

Visual Computing Lab - ISTI - CNR 49 Dec 29, 2022
Official PyTorch implemention of our paper "Learning to Rectify for Robust Learning with Noisy Labels".

WarPI The official PyTorch implemention of our paper "Learning to Rectify for Robust Learning with Noisy Labels". Run python main.py --corruption_type

Haoliang Sun 3 Sep 03, 2022
Official PyTorch implementation of the paper "Recycling Discriminator: Towards Opinion-Unaware Image Quality Assessment Using Wasserstein GAN", accepted to ACM MM 2021 BNI Track.

RecycleD Official PyTorch implementation of the paper "Recycling Discriminator: Towards Opinion-Unaware Image Quality Assessment Using Wasserstein GAN

Yunan Zhu 23 Nov 05, 2022
Convert Table data to approximate values with GUI

Table_Editor Convert Table data to approximate values with GUIs... usage - Import methods for extension Tables. Imported method supposed to have only

CLJ 1 Jan 10, 2022