MAGMA - a GPT-style multimodal model that can understand any combination of images and language

Related tags

Deep Learningmagma
Overview

MAGMA -- Multimodal Augmentation of Generative Models through Adapter-based Finetuning

Authors

repo (alphabetical)

Constantin (CoEich), Mayukh (Mayukhdeb), Sid (sdtblck)

paper

Constantin Eichenberg, Sidney Black, Samuel Weinbach, Aleph Alpha

Letitia Parcalabescu, Anette Frank, Heidelberg University

Abstract

Large-scale pretraining is fast becoming the norm in Vision-Language (VL) modeling. However, prevailing VL approaches are limited by the requirement for labeled data and the use of complex multi-step pretraining objectives. We present MAGMA - a simple method for augmenting generative language models with additional modalities using adapter-based finetuning. Building on Frozen, we train a series of VL models that autoregressively generate text from arbitrary combinations of visual and textual input. The pretraining is entirely end-to-end using a single language modeling objective, simplifying optimization compared to previous approaches. Importantly, the language model weights remain unchanged during training, allowing for transfer of encyclopedic knowledge and in-context learning abilities from language pretraining. MAGMA outperforms Frozen on open-ended generative tasks, achieving state of the art results on the OKVQA benchmark and competitive results on a range of other popular VL benchmarks, while pretraining on 0.2% of the number of samples used to train SimVLM.

Paper on arXiv: https://arxiv.org/abs/2112.05253

Examples (via Aleph Alpha playground)

Photos Text & Technical
A man covering a woman's eyes to hide a present A hand drawn treasure map
A fallen tree is blocking a road A software architecture

Model design

MAGMA model design

About the repository

In this repository we share the main parts of the codebase for training and inference of our MAGMA VL model. The main use of the repo is for downloading our pretrained weights and interacting with the model. We include a script for data parallel training with Deepspeed for finetuning our models or training a MAGMA model from scratch.

Installation

Make sure PyTorch (Ver >= 1.9.0) and Torchvision are installed. See https://pytorch.org/get-started/locally/.

You can pip install from the git repository with:

pip install git+https://github.com/Aleph-Alpha/magma.git

Make sure that you also download the config:

mkdir configs; wget -O configs/MAGMA_v1.yml https://raw.githubusercontent.com/Aleph-Alpha/magma/add-setup/configs/MAGMA_v1.yml

Or if you've cloned the repo, you can install all further requirements by:

pip install -r requirements.txt

Checkpoint

We also publish the model checkpoint that has been used for the publication. It is hosted on our infrastructure and downloads automatically. It can be downloaded manually here: https://bit.ly/aleph_alpha_magma_download

This checkpoint can also be played around with on a space managed by Heath Mitchell, AK, and Stella Biderman. (This is a 3rd party space, not managed by Aleph Alpha.)

Loading a model for inference

Downloads the checkpoint file into checkpoint_path if it's not already present.

from magma import Magma
from magma.image_input import ImageInput

model = Magma.from_checkpoint(
    config_path = "configs/MAGMA_v1.yml",
    checkpoint_path = "./mp_rank_00_model_states.pt",
    device = 'cuda:0'
)

inputs =[
    ## supports urls and path/to/image
    ImageInput('https://www.art-prints-on-demand.com/kunst/thomas_cole/woods_hi.jpg'),
    'Describe the painting:'
]

## returns a tensor of shape: (1, 149, 4096)
embeddings = model.preprocess_inputs(inputs)  

## returns a list of length embeddings.shape[0] (batch size)
output = model.generate(
    embeddings = embeddings,
    max_steps = 6,
    temperature = 0.7,
    top_k = 0,
)  

print(output[0]) ##  A cabin on a lake

Converting datasets to our format

To convert an image-caption dataset to our dataset class magma.datasets.ImgCptDataset, we suggest:

from magma.datasets.convert_datasets import convert_dataset

def my_dataset_iterator():
    """
    Implement an iterator for your dataset that for every datapoint yields a tuple
    image_path, {"captions": [...], "metadata": {...}, }, where image_path is the path to the image as a Path object, captions is a list of caption strings and metadata is an optional field.
    """

if __name__ == "__main__":
    convert_dataset(data_dir="/target/directory", ds_iterator=my_dataset_iterator())

How to train MAGMA

Run the training with:

deepspeed train.py --config path_to_my_config

To continue training from a deepspeed checkpoint, provide the checkpoint directory in the "load" config parameter.

WARNING: By default, instantiating magma via the init method instead of from_checkpoint loads the pretrained CLIP weights but not the pretrained gpt-j weights. For training MAGMA from scratch, download the gpt-j weights from this repo: https://github.com/finetuneanon/transformers and include them in the state dict after initializing the MAGMA model.

Owner
Aleph Alpha GmbH
Aleph Alpha GmbH
Creating Artificial Life with Reinforcement Learning

Although Evolutionary Algorithms have shown to result in interesting behavior, they focus on learning across generations whereas behavior could also be learned during ones lifetime.

Maarten Grootendorst 49 Dec 21, 2022
PyTorch implementation of SmoothGrad: removing noise by adding noise.

SmoothGrad implementation in PyTorch PyTorch implementation of SmoothGrad: removing noise by adding noise. Vanilla Gradients SmoothGrad Guided backpro

SSKH 143 Jan 05, 2023
PyTorch implementation for the visual prior component (i.e. perception module) of the Visually Grounded Physics Learner [Li et al., 2020].

VGPL-Visual-Prior PyTorch implementation for the visual prior component (i.e. perception module) of the Visually Grounded Physics Learner (VGPL). Give

Toru 8 Dec 29, 2022
Knowledgeable Prompt-tuning: Incorporating Knowledge into Prompt Verbalizer for Text Classification

Knowledgeable Prompt-tuning: Incorporating Knowledge into Prompt Verbalizer for Text Classification

DingDing 143 Jan 01, 2023
A python package for generating, analyzing and visualizing building shadows

pybdshadow Introduction pybdshadow is a python package for generating, analyzing and visualizing building shadows from large scale building geographic

Qing Yu 13 Nov 30, 2022
DeepStruc is a Conditional Variational Autoencoder which can predict the mono-metallic nanoparticle from a Pair Distribution Function.

ChemRxiv | [Paper] XXX DeepStruc Welcome to DeepStruc, a Deep Generative Model (DGM) that learns the relation between PDF and atomic structure and the

Emil Thyge Skaaning Kjær 13 Aug 01, 2022
Code for our ALiBi method for transformer language models.

Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation This repository contains the code and models for our paper Tra

Ofir Press 211 Dec 31, 2022
SCU OlympicsRunning Baseline

Competition 1v1 running Environment check details in Jidi Competition RLChina2021智能体竞赛 做出的修改: 奖励重塑:修改了环境,重新设置了奖励的分配,使得奖励组成不只有零和博弈,还有探索环境的奖励。 算法微调:修改了官

ZiSeoi Wong 2 Nov 23, 2021
Efficient Sparse Attacks on Videos using Reinforcement Learning

EARL This repository provides a simple implementation of the work "Efficient Sparse Attacks on Videos using Reinforcement Learning" Example: Demo: Her

12 Dec 05, 2021
Image-to-image translation with conditional adversarial nets

pix2pix Project | Arxiv | PyTorch Torch implementation for learning a mapping from input images to output images, for example: Image-to-Image Translat

Phillip Isola 9.3k Jan 08, 2023
A medical imaging framework for Pytorch

Welcome to MedicalTorch MedicalTorch is an open-source framework for PyTorch, implementing an extensive set of loaders, pre-processors and datasets fo

Christian S. Perone 799 Jan 03, 2023
CaFM-pytorch ICCV ACCEPT Introduction of dataset VSD4K

CaFM-pytorch ICCV ACCEPT Introduction of dataset VSD4K Our dataset VSD4K includes 6 popular categories: game, sport, dance, vlog, interview and city.

96 Jul 05, 2022
Official repo of the paper "Surface Form Competition: Why the Highest Probability Answer Isn't Always Right"

Surface Form Competition This is the official repo of the paper "Surface Form Competition: Why the Highest Probability Answer Isn't Always Right" We p

Peter West 46 Dec 23, 2022
RetinaFace: Deep Face Detection Library in TensorFlow for Python

RetinaFace is a deep learning based cutting-edge facial detector for Python coming with facial landmarks.

Sefik Ilkin Serengil 512 Dec 29, 2022
Region-aware Contrastive Learning for Semantic Segmentation, ICCV 2021

Region-aware Contrastive Learning for Semantic Segmentation, ICCV 2021 Abstract Recent works have made great success in semantic segmentation by explo

Hanzhe Hu 30 Dec 29, 2022
OstrichRL: A Musculoskeletal Ostrich Simulation to Study Bio-mechanical Locomotion.

OstrichRL This is the repository accompanying the paper OstrichRL: A Musculoskeletal Ostrich Simulation to Study Bio-mechanical Locomotion. It contain

Vittorio La Barbera 51 Nov 17, 2022
Rethinking Transformer-based Set Prediction for Object Detection

Rethinking Transformer-based Set Prediction for Object Detection Here are the code for the ICCV paper. The code is adapted from Detectron2 and AdelaiD

Zhiqing Sun 62 Dec 03, 2022
Real-time object detection on Android using the YOLO network with TensorFlow

TensorFlow YOLO object detection on Android Source project android-yolo is the first implementation of YOLO for TensorFlow on an Android device. It is

Nataniel Ruiz 624 Jan 03, 2023
An auto discord account and token generator. Automatically verifies the phone number. Works without proxy. Bypasses captcha.

JOIN DISCORD SERVER https://discord.gg/uAc3agBY FREE HCAPTCHA SOLVING API Discord-Token-Gen An auto discord token generator. Auto verifies phone numbe

3kp 271 Jan 01, 2023
A transformer which can randomly augment VOC format dataset (both image and bbox) online.

VocAug It is difficult to find a script which can augment VOC-format dataset, especially the bbox. Or find a script needs complex requirements so it i

Coder.AN 1 Mar 05, 2022