Exploiting a Zoo of Checkpoints for Unseen Tasks

Last update: Sep 06, 2022

This repo includes code to reproduce all results in the above NeurIPS paper, authored by Jiaji Huang, Qiang Qiu and Kenneth Church.

Dependencies

We used Python 3.8.5, but nearby versions should also work. Install all required packages with:

pip install --upgrade pip
pip install -r requirements.txt

We used CUDA 10.2.89, but any version that meets PyTorch's requirements should also work.

Highlight of Results

We highlight some major results, so that readers do not have to read the paper to grasp the main ideas. Concisely, the paper tries to answer the question:

"Can we use a checkpoint zoo to build something that better adapts to unseen tasks?"

To answer the question, we first need to understand the geometry of the task space.

Characterize the Task Space

In the paper, we model the tasks as following a Gaussian process. Its covariance is computed by applying kernel alignment to extracted features. The features are obtained by feeding probe data into the checkpoints, each of which was trained for one task. For example, using 34 checkpoints from Hugging Face models, we can estimate the 34x34 covariance of their corresponding tasks.

To reproduce the above figure, refer to LMs/README.md.
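The covariance estimate described in this section can be sketched roughly as follows. This is a minimal illustration, not the repo's implementation: it assumes linear centered kernel alignment (CKA) as the alignment measure, and the names `linear_cka` and `task_covariance` are hypothetical. Each entry of `features` stands for the output of one checkpoint on the same probe examples.

```python
import numpy as np


def linear_cka(X, Y):
    """Linear centered kernel alignment between two feature matrices.

    X has shape (n, d1) and Y has shape (n, d2); row i of both matrices
    must come from the same probe example. Assumption: the linear variant
    is used here for illustration, the paper may use a different kernel.
    """
    # Center every feature dimension over the probe examples.
    X = X - X.mean(axis=0, keepdims=True)
    Y = Y - Y.mean(axis=0, keepdims=True)
    cross = np.linalg.norm(X.T @ Y, "fro") ** 2
    norm_x = np.linalg.norm(X.T @ X, "fro")
    norm_y = np.linalg.norm(Y.T @ Y, "fro")
    return cross / (norm_x * norm_y)


def task_covariance(features):
    """Build a k x k alignment matrix from k per-checkpoint feature matrices."""
    k = len(features)
    cov = np.eye(k)  # a checkpoint is perfectly aligned with itself
    for i in range(k):
        for j in range(i + 1, k):
            cov[i, j] = cov[j, i] = linear_cka(features[i], features[j])
    return cov
```

With 34 feature matrices, one per checkpoint, `task_covariance` would return a 34x34 symmetric matrix with ones on the diagonal.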

Exploit the Task Space

We hypothesize that representative tasks are more generalizable to new tasks. This hypothesis still needs a rigorous mathematical proof, but empirically we find that it holds, as indicated by the experiments on NLP and vision tasks.

So, how do we identify representative tasks? They should convey the most information about the rest of the task space. We formulate the problem as a Max-Mutual-Information (MMI) objective. The solver takes the covariance as input and greedily picks representative tasks.
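A greedy solver of this kind might look like the sketch below. It assumes the standard greedy mutual-information rule for Gaussian variables: at each step, pick the task whose variance given the already selected tasks is large relative to its variance given all remaining tasks. Whether the repo's solver uses exactly this rule is an assumption, and `conditional_variance`, `greedy_mmi` and the `jitter` parameter are hypothetical names.

```python
import numpy as np


def conditional_variance(cov, y, cond, jitter=1e-8):
    """Variance of task y given the tasks in `cond`, for a zero-mean Gaussian."""
    if len(cond) == 0:
        return cov[y, y]
    idx = np.asarray(cond)
    # A small jitter keeps the solve stable when the covariance is near-singular.
    K = cov[np.ix_(idx, idx)] + jitter * np.eye(len(idx))
    k = cov[idx, y]
    return cov[y, y] - k @ np.linalg.solve(K, k)


def greedy_mmi(cov, num_select, jitter=1e-8):
    """Greedily select task indices that are informative about the other tasks."""
    n = cov.shape[0]
    selected = []
    for _ in range(num_select):
        best, best_score = None, -np.inf
        for y in range(n):
            if y in selected:
                continue
            rest = [j for j in range(n) if j != y and j not in selected]
            # Numerator: uncertainty about y left after the selected tasks.
            # Denominator: uncertainty about y left after all other tasks.
            num = conditional_variance(cov, y, selected, jitter)
            den = conditional_variance(cov, y, rest, jitter)
            score = num / max(den, jitter)
            if score > best_score:
                best, best_score = y, score
        selected.append(best)
    return selected
```

Calling `greedy_mmi(cov, 5)` on a 34x34 covariance would return five task indices in the order they were picked.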

Linguistic Tasks

Using the 34x34 covariance matrix, we identify the 5 most representative tasks as those corresponding to roberta-base, distilbert-base-uncased, t5-base, bert-base-cased and bart-large. Combining these checkpoints yields superior results on 8 new linguistic tasks. Below is an example on the chunking task.

To reproduce full results, check LMs/README.md for details.

Computer Vision Tasks

The observation holds for vision tasks too. Below is an experiment set up on CIFAR-100. MMI shows a steady gain over random selection and outperforms another baseline.

To reproduce all results, check vision/README.md for details.

Additional Comments

Note: This project requires running many small jobs, so a cluster managed by Slurm, which can launch jobs in parallel, is very useful. The job-launching scripts contain multiple commands like

sbatch -p $partition --gres=gpu:1 --wrap "python run.py" -o $job_log_path

If you do not have such a cluster, just use

python run.py > $job_log_path

instead.
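If there are many such commands, a small wrapper can run them one after another without Slurm. The sketch below is only an illustration: `run_jobs_locally` is a hypothetical helper that is not part of the repo, and unlike the plain `>` redirect it also writes stderr to the log.

```python
import subprocess
import sys
from pathlib import Path


def run_jobs_locally(jobs):
    """Run (command, log_path) pairs sequentially and return their exit codes.

    Each command is a list of arguments. Its stdout and stderr are both
    written to the given log file (the shell redirect above captures
    stdout only).
    """
    return_codes = []
    for command, log_path in jobs:
        log_path = Path(log_path)
        # Create the log directory if it does not exist yet.
        log_path.parent.mkdir(parents=True, exist_ok=True)
        with log_path.open("w") as log_file:
            result = subprocess.run(
                command, stdout=log_file, stderr=subprocess.STDOUT
            )
        return_codes.append(result.returncode)
    return return_codes
```

For example, `run_jobs_locally([([sys.executable, "run.py"], "logs/job0.log")])` would run one job and write its output to `logs/job0.log`, where the log path is a made-up example.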

Owner

Baidu Research
