Keras + Hyperopt: A very simple wrapper for convenient hyperparameter optimization

Overview

This project is now archived. It's been fun working on it, but it's time for me to move on. Thank you for all the support and feedback over the last couple of years. If someone is interested in taking ownership, let's discuss. ✌️

Hyperas

A very simple convenience wrapper around hyperopt for fast prototyping with keras models. Hyperas lets you use the power of hyperopt without having to learn the syntax of it. Instead, just define your keras model as you are used to, but use a simple template notation to define hyper-parameter ranges to tune.

Installation

pip install hyperas

Quick start

Assume you have data generated as follows:

def data():
    x_train = np.zeros(100)
    x_test = np.zeros(100)
    y_train = np.zeros(100)
    y_test = np.zeros(100)
    return x_train, y_train, x_test, y_test

and an existing keras model like the following:

def create_model(x_train, y_train, x_test, y_test):
    model = Sequential()
    model.add(Dense(512, input_shape=(784,)))
    model.add(Activation('relu'))
    model.add(Dropout(0.2))
    model.add(Dense(512))
    model.add(Activation('relu'))
    model.add(Dropout(0.2))
    model.add(Dense(10))
    model.add(Activation('softmax'))

    # ... model fitting

    return model

To do hyper-parameter optimization on this model, just wrap the parameters you want to optimize into double curly brackets and choose a distribution over which to run the algorithm.

In the above example, let's say we want to optimize for the best dropout probability in both dropout layers. Choosing a uniform distribution over the interval [0,1], this translates into the following definition. Note that before returning the model, to optimize, we also have to define which evaluation metric of the model is important to us. For example, in the following, we optimize for accuracy.

Note: In the following code we use 'loss': -accuracy, i.e. the negative of accuracy. That's because under the hood hyperopt will always minimize whatever metric you provide. If instead you actually want to minimize a metric, say MSE or another loss function, you keep a positive sign (e.g. 'loss': mse).

from hyperas.distributions import uniform

def create_model(x_train, y_train, x_test, y_test):
    model = Sequential()
    model.add(Dense(512, input_shape=(784,)))
    model.add(Activation('relu'))
    model.add(Dropout({{uniform(0, 1)}}))
    model.add(Dense(512))
    model.add(Activation('relu'))
    model.add(Dropout({{uniform(0, 1)}}))
    model.add(Dense(10))
    model.add(Activation('softmax'))

    # ... model fitting

    score = model.evaluate(x_test, y_test, verbose=0)
    accuracy = score[1]
    return {'loss': -accuracy, 'status': STATUS_OK, 'model': model}
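The sign convention from the note above can be captured in a small helper. This is only a sketch: to_loss and higher_is_better are hypothetical names that are not part of hyperas or hyperopt, and the metric values are made up for illustration.

```python
def to_loss(value, higher_is_better):
    """Return a quantity that a minimizer can work with."""
    # Metrics that should go up (accuracy) are negated;
    # metrics that should go down (MSE) are passed through unchanged.
    return -value if higher_is_better else value


# 'ok' is the string value behind hyperopt's STATUS_OK constant.
acc_result = {'loss': to_loss(0.93, higher_is_better=True), 'status': 'ok'}
mse_result = {'loss': to_loss(0.25, higher_is_better=False), 'status': 'ok'}

print(acc_result['loss'])  # -0.93
print(mse_result['loss'])  # 0.25
```

Either dictionary has the shape that create_model() is expected to return, apart from the optional 'model' entry.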

The last step is to actually run the optimization, which is done as follows:

best_run, best_model = optim.minimize(model=create_model,
                                      data=data,
                                      algo=tpe.suggest,
                                      max_evals=10,
                                      trials=Trials())

In this example we use at most 10 evaluation runs and the TPE algorithm from hyperopt for optimization.

Check the "complete example" below for more details.

Complete example

Note: It is important to wrap your data and model into functions as shown below, and then pass them as parameters to the minimizer. data() returns the data that create_model() needs. An extended version of the above example in one script reads as follows. This example shows many potential use cases of hyperas, including:

  • Varying dropout probabilities, sampling from a uniform distribution
  • Different layer output sizes
  • Different optimization algorithms to use
  • Varying choices of activation functions
  • Conditionally adding layers depending on a choice
  • Swapping whole sets of layers

from __future__ import print_function
import numpy as np

from hyperopt import Trials, STATUS_OK, tpe
from keras.datasets import mnist
from keras.layers.core import Dense, Dropout, Activation
from keras.models import Sequential
from keras.utils import np_utils

from hyperas import optim
from hyperas.distributions import choice, uniform


def data():
    """
    Data providing function:

    This function is separated from create_model() so that hyperopt
    won't reload data for each evaluation run.
    """
    (x_train, y_train), (x_test, y_test) = mnist.load_data()
    x_train = x_train.reshape(60000, 784)
    x_test = x_test.reshape(10000, 784)
    x_train = x_train.astype('float32')
    x_test = x_test.astype('float32')
    x_train /= 255
    x_test /= 255
    nb_classes = 10
    y_train = np_utils.to_categorical(y_train, nb_classes)
    y_test = np_utils.to_categorical(y_test, nb_classes)
    return x_train, y_train, x_test, y_test


def create_model(x_train, y_train, x_test, y_test):
    """
    Model providing function:

    Create Keras model with double curly brackets dropped-in as needed.
    Return value has to be a valid python dictionary with two customary keys:
        - loss: Specify a numeric evaluation metric to be minimized
        - status: Just use STATUS_OK and see hyperopt documentation if not feasible
    The last one is optional, though recommended, namely:
        - model: specify the model just created so that we can later use it again.
    """
    model = Sequential()
    model.add(Dense(512, input_shape=(784,)))
    model.add(Activation('relu'))
    model.add(Dropout({{uniform(0, 1)}}))
    model.add(Dense({{choice([256, 512, 1024])}}))
    model.add(Activation({{choice(['relu', 'sigmoid'])}}))
    model.add(Dropout({{uniform(0, 1)}}))

    # If we choose 'four', add an additional fourth layer
    if {{choice(['three', 'four'])}} == 'four':
        model.add(Dense(100))

        # We can also choose between complete sets of layers

        model.add({{choice([Dropout(0.5), Activation('linear')])}})
        model.add(Activation('relu'))

    model.add(Dense(10))
    model.add(Activation('softmax'))

    model.compile(loss='categorical_crossentropy', metrics=['accuracy'],
                  optimizer={{choice(['rmsprop', 'adam', 'sgd'])}})

    result = model.fit(x_train, y_train,
                       batch_size={{choice([64, 128])}},
                       epochs=2,
                       verbose=2,
                       validation_split=0.1)
    # Get the highest validation accuracy of the training epochs
    # (newer Keras versions name this key 'val_accuracy').
    validation_acc = np.amax(result.history['val_acc'])
    print('Best validation acc of epoch:', validation_acc)
    return {'loss': -validation_acc, 'status': STATUS_OK, 'model': model}


if __name__ == '__main__':
    best_run, best_model = optim.minimize(model=create_model,
                                          data=data,
                                          algo=tpe.suggest,
                                          max_evals=5,
                                          trials=Trials())
    X_train, Y_train, X_test, Y_test = data()
    print("Evaluation of best performing model:")
    print(best_model.evaluate(X_test, Y_test))
    print("Best performing model chosen hyper-parameters:")
    print(best_run)

FAQ

Here is a list of a few common errors.

TypeError: require string label

You're probably trying to execute the model creation code, with the templates, directly in Python. That fails because Python cannot run the templating in the braces, e.g. {{uniform..}}. With the templates in place, def create_model(...) is in fact no longer a valid Python function.

You need to wrap your code in a def create_model(...): ... function, and then call it from optim.minimize(model=create_model,... like in the example.

The reason for this is that hyperas works by doing template replacement of everything in the {{...}} into a separate temporary file, and then running the model with the replaced braces (think jinja templating).

This is the basis of how hyperas simplifies usage of hyperopt by being a "very simple wrapper".
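To make the template idea concrete, here is a minimal sketch of double-brace replacement using a regular expression. It is not the hyperas implementation: extract_space, the p0, p1, ... names and the space[...] lookup are assumptions chosen for illustration only.

```python
import re

# Matches {{ ... }} and captures the expression between the braces.
TEMPLATE = re.compile(r"\{\{(.+?)\}\}")


def extract_space(source):
    """Replace each {{...}} with space['pN'] and collect the expressions."""
    expressions = []

    def substitute(match):
        name = "p%d" % len(expressions)
        expressions.append((name, match.group(1).strip()))
        return "space['%s']" % name

    return TEMPLATE.sub(substitute, source), expressions


line = "model.add(Dropout({{uniform(0, 1)}}))"
rewritten, found = extract_space(line)
print(rewritten)  # model.add(Dropout(space['p0']))
print(found)      # [('p0', 'uniform(0, 1)')]
```

The rewritten line is ordinary Python again, while the collected expressions describe the search space. That is why the templated function only makes sense once it has passed through optim.minimize.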

TypeError: 'generator' object is not subscriptable

This is currently a known issue.

As a workaround, run pip install networkx==1.11

NameError: global name 'X_train' is not defined

Maybe you forgot to return X_train from the data() function, so it is not available in create_model(X_train, ...). Check that the variable names match, including case.

You are not restricted to the same list of arguments as in the example. Any arguments you return from data() will be passed to create_model()

Notebook adjustment

If you see an error such as "No such file or directory" or OSError (Errno 22), you may need to pass notebook_name='simple_notebook' (assuming your current notebook is named simple_notebook) to optim.minimize, like this:

best_run, best_model = optim.minimize(model=model,
                                      data=data,
                                      algo=tpe.suggest,
                                      max_evals=5,
                                      trials=Trials(),
                                      notebook_name='simple_notebook')

How does hyperas work?

All we do is parse the data and model templates and translate them into proper hyperopt by reconstructing the space object that's then passed to fmin. Most of the relevant code is found in optim.py and utils.py.

How to read the output of a hyperas model?

Hyperas translates your script into hyperopt-compliant code; see here for some guidance on how to interpret the result.
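One thing that often surprises people is that hyperopt usually reports a choice parameter as the index of the selected option rather than the option itself. The sketch below maps such indices back to values. decode_choices and the key names in best_run are hypothetical, so check the keys your own run prints.

```python
def decode_choices(best_run, choices):
    """Map index-valued entries back to the option they select."""
    decoded = dict(best_run)
    for name, options in choices.items():
        if name in decoded:
            decoded[name] = options[int(decoded[name])]
    return decoded


# Hypothetical result and option lists, for illustration only.
best_run = {'Dropout': 0.37, 'Dense': 2, 'optimizer': 1}
choices = {'Dense': [256, 512, 1024],
           'optimizer': ['rmsprop', 'adam', 'sgd']}

readable = decode_choices(best_run, choices)
print(readable)  # {'Dropout': 0.37, 'Dense': 1024, 'optimizer': 'adam'}
```

Continuous parameters such as the dropout rate are reported as plain numbers and pass through unchanged.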

How to pass arguments to data?

Suppose you want your data function to take an argument. Specify it like this, using positional arguments only (not keyword arguments):

import pickle
def data(fname):
    with open(fname,'rb') as fh:
        return pickle.load(fh)

Note that your arguments must be implemented such that repr can show them in their entirety (such as strings and numbers). If you want more complex objects, use the passed arguments to build them inside the data function.

And when you run your trials, pass a tuple of arguments to be substituted in as data_args:

best_run, best_model = optim.minimize(
    model=model,
    data=data,
    algo=tpe.suggest,
    max_evals=64,
    trials=Trials(),
    data_args=('my_file.pkl',)
)
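Since the arguments have to survive a repr round trip, it can help to check them before starting a long run. The helper below is a sketch under that assumption. survives_repr is a hypothetical name, and the check is stricter than necessary because it only accepts plain Python literals.

```python
import ast


def survives_repr(value):
    """True if repr(value) can be read back into an equal literal."""
    try:
        return ast.literal_eval(repr(value)) == value
    except (ValueError, SyntaxError):
        return False


print(survives_repr(('my_file.pkl', 0.1)))  # True
print(survives_repr((object(),)))           # False
```

Strings and numbers pass, while an arbitrary object does not, which matches the advice to build complex objects inside the data function.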

What if I need more flexibility loading data and adapting my model?

Hyperas is a convenience wrapper around Hyperopt that has some limitations. If it's not convenient to use in your situation, simply don't use it -- and choose Hyperopt instead. All you can do with Hyperas you can also do with Hyperopt, it's just a different way of defining your model. If you want to squeeze some flexibility out of Hyperas anyway, take a look here.

Running hyperas in parallel?

You can use hyperas to run multiple models in parallel with the use of mongodb (which you'll need to install and set up users for). Here's a short example using MNIST:

  1. Copy and modify examples/mnist_distributed.py (bump up max_evals if you like).

  2. Run python mnist_distributed.py. It will create a temp_model.py file. Copy this file to any machines that will be evaluating models. It will then begin waiting for evaluation results.

  3. On your other machines (make sure they have a python installed with all your dependencies, ideally with the same versions) run:

    export PYTHONPATH=/path/to/temp_model.py
    hyperopt-mongo-worker --exp-key='mnist_test' --mongo='mongo://username:<password>@<mongodb-host>:27017/jobs'

  4. Once max_evals have been completed, you should get an output with your best model. You can also look through your mongodb and examine the results. To get the best model out and run it, do:

    from pymongo import MongoClient
    from keras.models import load_model
    import tempfile
    c = MongoClient('mongodb://username:<password>@<mongodb-host>:27017/jobs')
    # hyperopt minimizes the loss, so sort ascending to get the lowest loss first
    best_model = c['jobs']['jobs'].find_one({'exp_key': 'mnist_test'}, sort=[('result.loss', 1)])
    temp_name = tempfile.gettempdir()+'/'+next(tempfile._get_candidate_names()) + '.h5'
    with open(temp_name, 'wb') as outfile:
        outfile.write(best_model['result']['model_serial'])
    model = load_model(temp_name)

Owner
Max Pumperla
Data Science Professor, Data Scientist & Engineer. DL4J core developer, Hyperopt maintainer, Keras contributor. Author of "Deep Learning and the Game of Go"