Code for "Training Neural Networks with Fixed Sparse Masks" (NeurIPS 2021).

Last update: Dec 30, 2022

Overview

Fisher Induced Sparse uncHanging (FISH) Mask

This repo contains the code for Fisher Induced Sparse uncHanging (FISH) Mask training, from "Training Neural Networks with Fixed Sparse Masks" by Yi-Lin Sung, Varun Nair, and Colin Raffel. To appear in Neural Information Processing Systems (NeurIPS) 2021.

Abstract: During typical gradient-based training of deep neural networks, all of the model's parameters are updated at each iteration. Recent work has shown that it is possible to update only a small subset of the model's parameters during training, which can alleviate storage and communication requirements. In this paper, we show that it is possible to induce a fixed sparse mask on the model’s parameters that selects a subset to update over many iterations. Our method constructs the mask out of the parameters with the largest Fisher information as a simple approximation as to which parameters are most important for the task at hand. In experiments on parameter-efficient transfer learning and distributed training, we show that our approach matches or exceeds the performance of other methods for training with sparse updates while being more efficient in terms of memory usage and communication costs.
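The mask construction described in the abstract can be illustrated with a minimal, framework-free sketch. This is not the code from this repo: the function names (`fisher_scores`, `top_k_mask`, `masked_sgd_step`) are hypothetical, parameters are flattened into plain Python lists, and the per-sample gradients are assumed to be already computed (how they are obtained is outside the sketch). The Fisher information is approximated here by the average squared gradient per parameter.

```python
import heapq


def fisher_scores(per_sample_grads):
    """Average squared gradient per parameter (a diagonal Fisher estimate).

    per_sample_grads: list of gradient vectors, one per sample (assumed given).
    """
    n = len(per_sample_grads)
    dim = len(per_sample_grads[0])
    return [sum(g[i] ** 2 for g in per_sample_grads) / n for i in range(dim)]


def top_k_mask(scores, keep_fraction):
    """Return a 0/1 mask that keeps the keep_fraction highest-scoring parameters."""
    k = max(1, int(len(scores) * keep_fraction))
    keep = set(heapq.nlargest(k, range(len(scores)), key=scores.__getitem__))
    return [1 if i in keep else 0 for i in range(len(scores))]


def masked_sgd_step(params, grads, mask, lr):
    """Plain SGD step that only changes parameters where mask is 1."""
    return [p - lr * g * m for p, g, m in zip(params, grads, mask)]
```

The mask is computed once and then reused for every later update step, which is what makes it "fixed".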

Setup

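Run the following from the repository root. The first command installs the copy of transformers bundled with this repo from its local directory rather than from PyPI.

pip install transformers/.
pip install datasets torch==1.8.0 tqdm torchvision==0.9.0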

FISH Mask: GLUE Experiments

Parameter-Efficient Transfer Learning

To run the FISH Mask on a GLUE dataset, use the following command format:

$ bash transformers/examples/text-classification/scripts/run_sparse_updates.sh <dataset-name> <seed> <top_k_percentage> <num_samples_for_fisher>

The example command below, used to generate Table 1 in the paper, runs all GLUE tasks with a seed of 0 and a FISH mask sparsity of 0.5% (a <top_k_percentage> of 0.005), using 1024 samples to compute the Fisher information.

$ bash transformers/examples/text-classification/scripts/run_sparse_updates.sh "qqp mnli rte cola stsb sst2 mrpc qnli" 0 0.005 1024
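Note that the example passes 0.005 for the 0.5% sparsity level, so the value is a fraction rather than a percentage. The short sketch below shows the arithmetic; the helper name `num_trainable` and the model size of 1,000,000 parameters are made up for illustration and do not come from this repo.

```python
def num_trainable(total_params, top_k_fraction):
    """Number of parameters a mask keeps trainable, given the kept fraction."""
    if not 0.0 < top_k_fraction <= 1.0:
        raise ValueError("top_k_fraction must be in (0, 1]")
    return round(total_params * top_k_fraction)


# Hypothetical model with 1,000,000 parameters and a 0.5% mask.
print(num_trainable(1_000_000, 0.005))  # 5000
```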

Distributed Training

To use the FISH mask on the GLUE tasks in a distributed setting, use the following command:

$ bash transformers/examples/text-classification/scripts/distributed_training.sh <dataset-name> <seed> <num_workers> <training_epochs> <gpu_id>

Note that <dataset-name> here can contain only one task. An example command is:

$ bash transformers/examples/text-classification/scripts/distributed_training.sh "mnli" 0 2 3.5 0

FISH Mask: CIFAR10 Experiments

To run the FISH mask on CIFAR10, use the command formats below.

Distributed Training

$ bash cifar10-fast/scripts/distributed_training_fish.sh <num_samples_for_fisher> <top_k_percentage> <training_epochs> <worker_updates> <learning_rate> <num_workers>

For example, in the paper we compute the FISH mask at the 0.5% sparsity level from 256 samples and distribute the job across 2 workers for a total of 50 training epochs. With <worker_updates> set to 2 and a learning rate of 0.4, the command is:

$ bash cifar10-fast/scripts/distributed_training_fish.sh 256 0.005 50 2 0.4 2
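To illustrate why a fixed mask reduces communication in distributed training, here is a minimal sketch of exchanging only the masked parameter changes. It is an assumption-laden illustration, not the logic of `distributed_training_fish.sh`: the helper names `sparse_delta` and `apply_average` are hypothetical, parameters are flat Python lists, and workers are assumed to share the same mask indices and to be combined by simple averaging.

```python
def sparse_delta(old_params, new_params, mask_indices):
    """Change in each masked parameter after a worker's local training."""
    return {i: new_params[i] - old_params[i] for i in mask_indices}


def apply_average(params, worker_deltas):
    """Add the average of the workers' sparse deltas to the shared parameters.

    All workers are assumed to report the same indices (the mask is fixed).
    """
    out = list(params)
    for i in worker_deltas[0]:
        out[i] += sum(d[i] for d in worker_deltas) / len(worker_deltas)
    return out
```

Each worker only sends one value per masked index, so at a 0.5% mask the message covers 0.5% of the parameters instead of the whole model.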

Efficient Checkpointing

$ bash cifar10-fast/scripts/small_checkpoints_fish.sh <num_samples_for_fisher> <top_k_percentage> <training_epochs> <learning_rate> <fix_mask>

The hyperparameters are mostly the same as for distributed training. The additional <fix_mask> argument indicates whether to keep the mask fixed; valid values are 0 and 1, where 1 fixes the mask.
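The idea behind small checkpoints can be sketched as follows: when the mask is fixed, only the masked parameters ever differ from the starting parameters, so a checkpoint needs to store only those indices and values. This is a simplified illustration with hypothetical helper names (`save_sparse`, `restore`) and flat Python lists, not the format written by `small_checkpoints_fish.sh`.

```python
def save_sparse(params, mask_indices):
    """Keep only the masked parameters: their indices and current values."""
    indices = list(mask_indices)
    return {"indices": indices, "values": [params[i] for i in indices]}


def restore(base_params, checkpoint):
    """Rebuild the full parameter list from the base parameters plus a sparse checkpoint."""
    out = list(base_params)
    for i, v in zip(checkpoint["indices"], checkpoint["values"]):
        out[i] = v
    return out
```

Restoring requires the original (base) parameters, which are stored once and shared by every checkpoint.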

Replicating Results

To replicate the tables and figures in the paper, run the following:

# Table 1 - Parameter Efficient Fine-Tuning on GLUE

$ bash transformers/examples/text-classification/scripts/run_table_1.sh

# Figure 2 - Mask Sparsity Ablation and Sample Ablation

$ bash transformers/examples/text-classification/scripts/run_figure_2.sh

# Table 2 - Distributed Training on GLUE

$ bash transformers/examples/text-classification/scripts/run_table_2.sh

# Table 3 - Distributed Training on CIFAR10

$ bash cifar10-fast/scripts/distributed_training.sh

# Table 4 - Efficient Checkpointing

$ bash cifar10-fast/scripts/small_checkpoints.sh

Notes

For reproduction of Diff Pruning results from Table 1, see code here.

Acknowledgements

We thank Yoon Kim, Michael Matena, and Demi Guo for helpful discussions.

Owner

Varun Nair
