Code for paper Multitask-Finetuning of Zero-shot Vision-Language Models

Overview

Downloading our datasets

Dataset structure

  • Each dataset may have several subdatasets (most of them only have one)
|
   
   
    
    
    |dataset/
        -|
    
    
     
     
            -|
     
     
      
      
            -|
      
      
       
       
        -|
       
       
         ... |pickled/ -|tensor_dict.pt 
       
      
      
     
     
    
    
   
   
  • The pickle file tensor_dict.pt has the following format:
{
    'subdataset_1':{
        'label_1':{
            'image_tensors':np.array((N,3,224,224)), # N: image number
            'input_ids':np.array(S), # S: token length of the filled template text
            'attention_masks':np.array(S),
            'template_input_ids':np.array(S_), # S_: token length of the un-filled template text
            'template_attention_masks':np.array(S_),
        },
        'label_2':{
            ...
        }
    },
    ...
}
  • ABO dataset contains an additional label_to_text.json file, which provides text template for each subdataset and label.

A list of available datasets and subdatasets

Dataset dataset name (-i) subdataset name (-d)
Clevr Counting ClevrCounting counting
Amazon Berkeley Objects (ABO) ABO material,color
Caltech-UCSD Birds 200 (CUB) CUB classification
Fungi Fungi classification
Mini-imagenet mini classification

Training with provided datasets

run.sh provided example code for performing training and meta-testing on our datasets.

Output format

Each model checkpoint dir contains two files:

  • step1.ckpt: model checkpoint after training phase
  • dev_test_results.json: scores on each task configuration on dev and test set during meta-testing

Loading checkpoint

  • Here is an example snippet for loading step1.ckpt from multitask-finetuning/classical-finetuning/zeroshot models:
/step1.ckpt")">
    model = MultitaskFinetuneCLIP()
    model = model.load_from_checkpoint(checkpoint_path="
    
    
     
     /step1.ckpt")

    
    
  • Here is an example snippet for loading step1.ckpt from fomaml models:
/step1.ckpt"))">
    model = LightningCLIP()
    model = l2l.algorithms.MAML(model, lr=1e-5 first_order=True)
    model.load_state_dict(torch.load("
    
    
     
     /step1.ckpt"))

    
    

Training with custom datasets

preprocess dataset

  • put your new dataset in the same format as provided dataset into data/
  • Specify template_function or the path to label_to_text json file (an example file can be found in /data/ABO/label_to_text.json) at line 350 and 355 in data.py
  • preprocess.sh provides an example of running data.py to create pickle file for your new dataset
  • add your dataset into construct_dataset(): line 77 in train.py and line 80 in train_MAML.py

train

  • modify run.sh to train and meta-test on your own dataset
  • refer to train.py and train_MAML.py for default and tuning hyperparameters for each algorithm

Citation

Owner
Zhenhailong Wang
MSCS at UIUC, Research Assistant at BLENDER lab advised by Prof. Heng Ji
Zhenhailong Wang
A sentence aligner for comparable corpora

About Yalign is a tool for extracting parallel sentences from comparable corpora. Statistical Machine Translation relies on parallel corpora (eg.. eur

Machinalis 128 Aug 24, 2022
HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis

HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis Jungil Kong, Jaehyeon Kim, Jaekyoung Bae In our paper, we p

Jungil Kong 1.1k Jan 02, 2023
NLP tool to extract emotional phrase from tweets 🤩

Emotional phrase extractor Extract phrase in the given text that is used to express the sentiment. Capturing sentiment in language is important in the

Shahul ES 38 Oct 17, 2022
✔👉A Centralized WebApp to Ensure Road Safety by checking on with the activities of the driver and activating label generator using NLP.

AI-For-Road-Safety Challenge hosted by Omdena Hyderabad Chapter Original Repo Link : https://github.com/OmdenaAI/omdena-india-roadsafety Final Present

Prathima Kadari 7 Nov 29, 2022
Data and code to support "Applied Natural Language Processing" (INFO 256, Fall 2021, UC Berkeley)

anlp21 Course materials for "Applied Natural Language Processing" (INFO 256, Fall 2021, UC Berkeley) Syllabus: http://people.ischool.berkeley.edu/~dba

David Bamman 48 Dec 06, 2022
spaCy plugin for Transformers , Udify, ELmo, etc.

Camphr - spaCy plugin for Transformers, Udify, Elmo, etc. Camphr is a Natural Language Processing library that helps in seamless integration for a wid

342 Nov 21, 2022
A framework for evaluating Knowledge Graph Embedding Models in a fine-grained manner.

A framework for evaluating Knowledge Graph Embedding Models in a fine-grained manner.

NEC Laboratories Europe 13 Sep 08, 2022
Tool to check whether a GCP bucket is public or not.

Tool to check publicly accessible GCP bucket. Blog https://justm0rph3u5.medium.com/gcp-inspector-auditing-publicly-exposed-gcp-bucket-ac6cad55618c Wha

DIVYANSHU SHUKLA 7 Nov 24, 2022
BERN2: an advanced neural biomedical namedentity recognition and normalization tool

BERN2 We present BERN2 (Advanced Biomedical Entity Recognition and Normalization), a tool that improves the previous neural network-based NER tool by

DMIS Laboratory - Korea University 99 Jan 06, 2023
ADCS cert template modification and ACL enumeration

Purpose This tool is designed to aid an operator in modifying ADCS certificate templates so that a created vulnerable state can be leveraged for privi

Fortalice Solutions, LLC 78 Dec 12, 2022
Prompt-learning is the latest paradigm to adapt pre-trained language models (PLMs) to downstream NLP tasks

Prompt-learning is the latest paradigm to adapt pre-trained language models (PLMs) to downstream NLP tasks, which modifies the input text with a textual template and directly uses PLMs to conduct pre

THUNLP 2.3k Jan 08, 2023
Fake Shakespearean Text Generator

Fake Shakespearean Text Generator This project contains an impelementation of stateful Char-RNN model to generate fake shakespearean texts. Files and

Recep YILDIRIM 1 Feb 15, 2022
A minimal Conformer ASR implementation adapted from ESPnet.

Conformer ASR A minimal Conformer ASR implementation adapted from ESPnet. Introduction I want to use the pre-trained English ASR model provided by ESP

Niu Zhe 3 Jan 24, 2022
Toolkit for Machine Learning, Natural Language Processing, and Text Generation, in TensorFlow. This is part of the CASL project: http://casl-project.ai/

Texar is a toolkit aiming to support a broad set of machine learning, especially natural language processing and text generation tasks. Texar provides

ASYML 2.3k Jan 07, 2023
Unofficial implementation of Google's FNet: Mixing Tokens with Fourier Transforms

FNet: Mixing Tokens with Fourier Transforms Pytorch implementation of Fnet : Mixing Tokens with Fourier Transforms. Citation: @misc{leethorp2021fnet,

Rishikesh (ऋषिकेश) 217 Dec 05, 2022
Basic Utilities for PyTorch Natural Language Processing (NLP)

Basic Utilities for PyTorch Natural Language Processing (NLP) PyTorch-NLP, or torchnlp for short, is a library of basic utilities for PyTorch NLP. tor

Michael Petrochuk 2.1k Jan 01, 2023
Facebook AI Research Sequence-to-Sequence Toolkit written in Python.

Fairseq(-py) is a sequence modeling toolkit that allows researchers and developers to train custom models for translation, summarization, language mod

20.5k Jan 08, 2023
Download videos from YouTube/Twitch/Twitter right in the Windows Explorer, without installing any shady shareware apps

youtube-dl and ffmpeg Windows Explorer Integration Download videos from YouTube/Twitch/Twitter and more (any platform that is supported by youtube-dl)

Wolfgang 226 Dec 30, 2022
(ACL 2022) The source code for the paper "Towards Abstractive Grounded Summarization of Podcast Transcripts"

Towards Abstractive Grounded Summarization of Podcast Transcripts We provide the source code for the paper "Towards Abstractive Grounded Summarization

10 Jul 01, 2022
A design of MIDI language for music generation task, specifically for Natural Language Processing (NLP) models.

MIDI Language Introduction Reference Paper: Pop Music Transformer: Beat-based Modeling and Generation of Expressive Pop Piano Compositions: code This

Robert Bogan Kang 3 May 25, 2022