An experimental technique for efficiently exploring neural architectures.

Last update: Aug 04, 2022

Overview

SMASH: One-Shot Model Architecture Search through HyperNetworks

This repository contains code for the SMASH paper and video.

SMASH bypasses the need for fully training candidate models by learning an auxiliary HyperNet to approximate model weights, allowing for rapid comparison of a wide range of network architectures at the cost of a single training run.
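The control flow behind that idea can be shown with a toy sketch. Nothing here is taken from this repository: `hypernet`, `main_net`, `smash_score`, and the data are hypothetical stand-ins, and the "HyperNet" is a fixed linear map rather than a trained network. The point is only that every candidate is scored with generated weights, so no candidate is trained on its own.

```python
def hypernet(arch_code, hyper_weights):
    """Map an architecture encoding to main-network weights (toy linear map)."""
    return [sum(h * c for h, c in zip(row, arch_code)) for row in hyper_weights]

def main_net(x, weights, arch_code):
    """Toy 'network': weighted sum over the inputs that arch_code switches on."""
    return sum(w * xi for w, xi, on in zip(weights, x, arch_code) if on)

def smash_score(arch_code, hyper_weights, val_data):
    """Mean squared validation error using generated, not trained, weights."""
    weights = hypernet(arch_code, hyper_weights)
    errors = [(main_net(x, weights, arch_code) - y) ** 2 for x, y in val_data]
    return sum(errors) / len(errors)

# Assumed, hand-picked values standing in for a HyperNet trained once.
HYPER_WEIGHTS = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]
# Toy validation set where the target is x[0] + x[1].
VAL_DATA = [((1.0, 2.0, 3.0), 3.0), ((0.5, 1.0, 2.0), 1.5)]
# Candidate architectures, encoded as on/off switches per input.
CANDIDATES = [(1, 0, 0), (1, 1, 0), (1, 1, 1)]

# Rank all candidates by their score under the shared HyperNet (lower is better).
ranking = sorted(CANDIDATES, key=lambda a: smash_score(a, HYPER_WEIGHTS, VAL_DATA))
```

In the real method the ranking is only a proxy: the top-ranked architecture is then trained normally from scratch.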

Installation

To run this script, you will need PyTorch and a CUDA-capable GPU. If you wish to run it on CPU, just remove all the .cuda() calls.

Note that this code was written in PyTorch 0.1.12, and is not guaranteed to work on 0.2 until next week, when I get a chance to update my own version. Please also be aware that, while thoroughly commented, this is research code for a heckishly complex project. I'll be doing more cleanup work to improve legibility soon.

Running

To run with default parameters, simply call

python train.py

This will by default train a SMASH net with nominally the same parametric budget as a WRN-40-4. Note that validation scores during training are calculated using a different random architecture for each batch, so they are an average over sampled architectures rather than the score of any single one.
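Why the validation number behaves like an average can be seen in a small sketch. This is not the repository's validation loop: `validate`, `sample_depth`, and `toy_error` are hypothetical names, and the error function is made up for illustration.

```python
import random

def validate(batches, sample_arch, error_fn, rng):
    """Mean error over batches, drawing a fresh random architecture per batch."""
    errors = []
    for batch in batches:
        arch = sample_arch(rng)  # a new architecture for every batch
        errors.append(error_fn(arch, batch))
    return sum(errors) / len(errors)

def sample_depth(rng):
    """Hypothetical architecture space: just a depth picked at random."""
    return rng.choice((2, 4, 8))

def toy_error(arch, batch):
    """Made-up error that depends only on the sampled architecture."""
    return 1.0 / arch

# The result mixes the errors of all sampled depths, so it describes the
# population of architectures, not any particular one.
avg = validate(range(100), sample_depth, toy_error, random.Random(0))
```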

After training, to sample and evaluate SMASH scores, call

python eval.py --SMASH=YOUR_MODEL_NAME_HERE_.pth

This will by default sample 500 random architectures, then perturb the best-found architecture 100 times, then employ a sort of Markov chain to further perturb the best-found architecture.
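The three stages can be sketched as follows. This is a minimal illustration under stated assumptions, not the logic in eval.py: the architecture encoding, `perturb`, `toy_score`, the chain length `n_chain`, and the accept-only-on-improvement rule are all hypothetical. Only the counts 500 and 100 come from the description above.

```python
import random

CHOICES = (1, 2, 3, 4)

def random_arch(rng, n_layers=4):
    """Hypothetical encoding: one integer choice per layer."""
    return tuple(rng.choice(CHOICES) for _ in range(n_layers))

def perturb(arch, rng):
    """Resample the choice at one randomly selected layer."""
    i = rng.randrange(len(arch))
    new = list(arch)
    new[i] = rng.choice(CHOICES)
    return tuple(new)

def toy_score(arch):
    """Stand-in for a SMASH validation error (lower is better)."""
    return sum(abs(c - 3) for c in arch)

def search(score, rng, n_random=500, n_perturb=100, n_chain=50):
    # Stage 1: sample random architectures and keep the best one.
    best = min((random_arch(rng) for _ in range(n_random)), key=score)
    # Stage 2: perturb that single best architecture n_perturb times.
    candidates = [perturb(best, rng) for _ in range(n_perturb)]
    best = min(candidates + [best], key=score)
    # Stage 3: chain of perturbations; move only when the score improves.
    for _ in range(n_chain):
        candidate = perturb(best, rng)
        if score(candidate) < score(best):
            best = candidate
    return best

best_arch = search(toy_score, random.Random(0))
```

Because later stages only ever replace the current best with something that scores at least as well, the final architecture never scores worse than the best random sample from the first stage.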

To select the best architecture and train a resulting net, then call

python train.py --SMASH=YOUR_MODEL_NAME_HERE_archs.npz

This will by default take the best architecture.

There are lots of different options, including a number of experimental settings such as architectural gradient descent by proxy, in-op multiplicative gating, variable nonlinearities, and setting specific op configuration types. Take a look at the train_parser in utils.py for details, though note that some of the weirder ones may be deprecated.

This code has boilerplate for loading Imagenet32x32 and ModelNet, but doesn't download or preprocess them on its own. It supports model parallelism on a single node, and half-precision training, though simple weightnorm is unstable in FP16 so you probably can't train a SMASH network with it.

Notes

This README doc is in very early stages, and will be updated soon.

Acknowledgments

Training and Progress code acquired in a drunken game of SpearPong with Jan Schlüter: https://github.com/Lasagne/Recipes/tree/master/papers/densenet
Metrics Logging code extracted from ancient diary of Daniel Maturana: https://github.com/dimatura/voxnet

Owner

Andy Brock
