Code for the paper "Multitask-Finetuning of Zero-shot Vision-Language Models"

Last update: Jul 15, 2022

Overview

Downloading our datasets

Download the archive from https://drive.google.com/file/d/1CfomsX6qmdCLfFutptqrQnp1RlaJEpXh/view?usp=sharing
Extract it and place the /data folder under the same root as /src.

Dataset structure

Each dataset may have several subdatasets (most of them only have one)

|...
    |dataset/
        -|...
            -|...
            -|...
        -|...
        ...
    |pickled/
        -|tensor_dict.pt

The pickle file tensor_dict.pt has the following format:

{
    'subdataset_1':{
        'label_1':{
            'image_tensors':np.array((N,3,224,224)), # N: number of images
            'input_ids':np.array(S), # S: token length of the filled template text
            'attention_masks':np.array(S),
            'template_input_ids':np.array(S_), # S_: token length of the un-filled template text
            'template_attention_masks':np.array(S_),
        },
        'label_2':{
            ...
        }
    },
    ...
}
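
The nested layout above can be checked with a small helper before training. The following is a minimal sketch, not part of the repository: the function name `summarize_tensor_dict` is hypothetical, and it assumes the dictionary has already been loaded into memory. It relies only on `len()`, so plain lists stand in for the arrays in the toy example.

```python
# Keys that every label entry is expected to carry, per the format above.
EXPECTED_KEYS = {
    "image_tensors",
    "input_ids",
    "attention_masks",
    "template_input_ids",
    "template_attention_masks",
}


def summarize_tensor_dict(tensor_dict):
    """Map (subdataset, label) to (N, S, S_) after basic consistency checks.

    Hypothetical helper: N is the number of images, S the filled-template
    token length, S_ the un-filled-template token length.
    """
    summary = {}
    for subdataset, labels in tensor_dict.items():
        for label, entry in labels.items():
            missing = EXPECTED_KEYS - set(entry)
            if missing:
                raise KeyError(f"{subdataset}/{label} is missing {sorted(missing)}")
            # Token ids and their attention masks must have the same length.
            if len(entry["input_ids"]) != len(entry["attention_masks"]):
                raise ValueError(f"{subdataset}/{label}: filled-template lengths differ")
            if len(entry["template_input_ids"]) != len(entry["template_attention_masks"]):
                raise ValueError(f"{subdataset}/{label}: template lengths differ")
            summary[(subdataset, label)] = (
                len(entry["image_tensors"]),
                len(entry["input_ids"]),
                len(entry["template_input_ids"]),
            )
    return summary


# Toy stand-in for a loaded tensor_dict (lists replace the real arrays).
toy = {
    "subdataset_1": {
        "label_1": {
            "image_tensors": [None, None],           # N = 2 images
            "input_ids": [1, 2, 3, 4, 5],            # S = 5
            "attention_masks": [1, 1, 1, 1, 1],
            "template_input_ids": [1, 2, 3, 4],      # S_ = 4
            "template_attention_masks": [1, 1, 1, 1],
        }
    }
}
summary = summarize_tensor_dict(toy)
print(summary)
```

Running the same helper on a real dictionary would give one `(N, S, S_)` triple per label, which makes missing keys or mismatched mask lengths easy to spot.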

The ABO dataset contains an additional label_to_text.json file, which provides a text template for each subdataset and label.

A list of available datasets and subdatasets

Dataset	dataset name (-i)	subdataset name (-d)
Clevr Counting	`ClevrCounting`	`counting`
Amazon Berkeley Objects (ABO)	`ABO`	`material`,`color`
Caltech-UCSD Birds 200 (CUB)	`CUB`	`classification`
Fungi	`Fungi`	`classification`
Mini-imagenet	`mini`	`classification`
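
The table can also be written as a lookup for validating a dataset/subdataset pair before launching a run. This is only an illustrative sketch: the names `AVAILABLE` and `check_pair` are hypothetical and do not come from the repository; the entries themselves are copied from the table above.

```python
# Dataset name (-i) mapped to its subdataset names (-d), copied from the table.
AVAILABLE = {
    "ClevrCounting": ("counting",),
    "ABO": ("material", "color"),
    "CUB": ("classification",),
    "Fungi": ("classification",),
    "mini": ("classification",),
}


def check_pair(dataset, subdataset):
    """Return the pair unchanged if it appears in the table, else raise ValueError."""
    if dataset not in AVAILABLE:
        raise ValueError(f"unknown dataset {dataset!r}; choose from {sorted(AVAILABLE)}")
    if subdataset not in AVAILABLE[dataset]:
        raise ValueError(
            f"{dataset!r} has no subdataset {subdataset!r}; "
            f"choose from {list(AVAILABLE[dataset])}"
        )
    return dataset, subdataset


pair = check_pair("ABO", "color")
print(pair)
```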

Training with provided datasets

run.sh provides example commands for training and meta-testing on our datasets.

Output format

Each model checkpoint directory contains two files:

step1.ckpt: model checkpoint after training phase
dev_test_results.json: scores on each task configuration on dev and test set during meta-testing
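
A quick way to confirm that a run finished is to check that both files exist. The sketch below is an assumption-laden illustration: `missing_outputs` is a hypothetical helper, and a temporary directory stands in for a real checkpoint directory.

```python
from pathlib import Path
import tempfile

# The two files each checkpoint directory is expected to contain.
EXPECTED_FILES = ("step1.ckpt", "dev_test_results.json")


def missing_outputs(checkpoint_dir):
    """Return the expected output files that are absent from checkpoint_dir."""
    root = Path(checkpoint_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


# Demonstration: a temporary directory holding only the checkpoint file.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "step1.ckpt").touch()
    demo_missing = missing_outputs(tmp)
print(demo_missing)
```

An empty list means both the training phase and meta-testing wrote their outputs.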

Loading checkpoint

Here is an example snippet for loading step1.ckpt from multitask-finetuning/classical-finetuning/zeroshot models:

    # load_from_checkpoint is a classmethod, so call it on the class
    model = MultitaskFinetuneCLIP.load_from_checkpoint(checkpoint_path="<checkpoint_dir>/step1.ckpt")

Here is an example snippet for loading step1.ckpt from FOMAML models:

    import torch
    import learn2learn as l2l

    model = LightningCLIP()
    model = l2l.algorithms.MAML(model, lr=1e-5, first_order=True)
    model.load_state_dict(torch.load("<checkpoint_dir>/step1.ckpt"))

Training with custom datasets

Preprocess dataset

Put your new dataset into data/ in the same format as the provided datasets.
Specify template_function or the path to a label_to_text JSON file (an example can be found at /data/ABO/label_to_text.json) at lines 350 and 355 in data.py.
preprocess.sh provides an example of running data.py to create the pickle file for your new dataset.
Add your dataset to construct_dataset(): line 77 in train.py and line 80 in train_MAML.py.
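
To illustrate the two ways of supplying label text mentioned above, here is a hedged sketch. Everything in it is an assumption for illustration: the actual signature of template_function in data.py and the exact layout of label_to_text.json are not shown in this README, so the single-argument function, the template wording, and the nested `{subdataset: {label: text}}` mapping are hypothetical.

```python
import json

# Hypothetical un-filled template; the wording is an illustration only.
TEMPLATE = "a photo of a {}."


def template_function(label):
    """Fill the template with a readable label (assumed signature)."""
    return TEMPLATE.format(label.replace("_", " "))


def build_label_to_text(labels_by_subdataset):
    """Build a nested {subdataset: {label: text}} mapping (assumed layout)."""
    return {
        subdataset: {label: template_function(label) for label in labels}
        for subdataset, labels in labels_by_subdataset.items()
    }


label_to_text = build_label_to_text(
    {"classification": ["house_sparrow", "barn_owl"]}
)
# Serialized form, as it might be written to a JSON file.
print(json.dumps(label_to_text, indent=2))
```

Compare the result against /data/ABO/label_to_text.json before relying on this layout, since that file is the authoritative example.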

Train

Modify run.sh to train and meta-test on your own dataset.
Refer to train.py and train_MAML.py for the default and tunable hyperparameters of each algorithm.

Owner

Zhenhailong Wang
