Use evolutionary algorithms instead of gridsearch in scikit-learn

Overview

sklearn-deap

Use evolutionary algorithms instead of gridsearch in scikit-learn. This allows you to reduce the time required to find the best parameters for your estimator. Instead of trying out every possible combination of parameters, evolve only the combinations that give the best results.

Here is an ipython notebook comparing EvolutionaryAlgorithmSearchCV against GridSearchCV and RandomizedSearchCV.

It's implemented using deap library: https://github.com/deap/deap

Install

To install the library use pip:

pip install sklearn-deap

or clone the repo and just type the following on your shell:

python setup.py install

Usage examples

Example of usage:

import sklearn.datasets
import numpy as np
import random

data = sklearn.datasets.load_digits()
X = data["data"]
y = data["target"]

from sklearn.svm import SVC
from sklearn.model_selection import StratifiedKFold

paramgrid = {"kernel": ["rbf"],
             "C"     : np.logspace(-9, 9, num=25, base=10),
             "gamma" : np.logspace(-9, 9, num=25, base=10)}

random.seed(1)

from evolutionary_search import EvolutionaryAlgorithmSearchCV
cv = EvolutionaryAlgorithmSearchCV(estimator=SVC(),
                                   params=paramgrid,
                                   scoring="accuracy",
                                   cv=StratifiedKFold(n_splits=4),
                                   verbose=1,
                                   population_size=50,
                                   gene_mutation_prob=0.10,
                                   gene_crossover_prob=0.5,
                                   tournament_size=3,
                                   generations_number=5,
                                   n_jobs=4)
cv.fit(X, y)

Output:

    Types [1, 2, 2] and maxint [0, 24, 24] detected
    --- Evolve in 625 possible combinations ---
    gen	nevals	avg     	min    	max
    0  	50    	0.202404	0.10128	0.962716
    1  	26    	0.383083	0.10128	0.962716
    2  	31    	0.575214	0.155259	0.962716
    3  	29    	0.758308	0.105732	0.976071
    4  	22    	0.938086	0.158041	0.976071
    5  	26    	0.934201	0.155259	0.976071
    Best individual is: {'kernel': 'rbf', 'C': 31622.776601683792, 'gamma': 0.001}
    with fitness: 0.976071229827

Example for maximizing just some function:

from evolutionary_search import maximize

def func(x, y, m=1., z=False):
    return m * (np.exp(-(x**2 + y**2)) + float(z))

param_grid = {'x': [-1., 0., 1.], 'y': [-1., 0., 1.], 'z': [True, False]}
args = {'m': 1.}
best_params, best_score, score_results, _, _ = maximize(func, param_grid, args, verbose=False)

Output:

best_params = {'x': 0.0, 'y': 0.0, 'z': True}
best_score  = 2.0
score_results = (({'x': 1.0, 'y': -1.0, 'z': True}, 1.1353352832366128),
 ({'x': -1.0, 'y': 1.0, 'z': True}, 1.3678794411714423),
 ({'x': 0.0, 'y': 1.0, 'z': True}, 1.3678794411714423),
 ({'x': -1.0, 'y': 0.0, 'z': True}, 1.3678794411714423),
 ({'x': 1.0, 'y': 1.0, 'z': True}, 1.1353352832366128),
 ({'x': 0.0, 'y': 0.0, 'z': False}, 2.0),
 ({'x': -1.0, 'y': -1.0, 'z': False}, 0.36787944117144233),
 ({'x': 1.0, 'y': 0.0, 'z': True}, 1.3678794411714423),
 ({'x': -1.0, 'y': -1.0, 'z': True}, 1.3678794411714423),
 ({'x': 0.0, 'y': -1.0, 'z': False}, 1.3678794411714423),
 ({'x': 1.0, 'y': -1.0, 'z': False}, 1.1353352832366128),
 ({'x': 0.0, 'y': 0.0, 'z': True}, 2.0),
 ({'x': 0.0, 'y': -1.0, 'z': True}, 2.0))
Comments
  • Added cv_results. Fixed some documentation.

    Added cv_results. Fixed some documentation.

    In init.py I added cv_results_ based on the logbook generated in _fit. This is a compatability feature with sklearn GridSearch and the like in interest of consistency.

    Other than that, I added a test file I used outside of ipython notebook which could eventually use the true python test library, and fixed some errors in the notebook which look like simple version errors.

    opened by ryanpeach 16
  • `.cv_results_` does not include info from first generation

    `.cv_results_` does not include info from first generation

    I think there's a fenceposting/off-by-one error somewhere.

    When I pass in generations_number = 1, it's actually 0-indexed, and gives me 2 generations. Similarly, if I pass in 2 generations, I actually get 3.

    Then, when I examine the cv_results_ property, I noticed that I only get the results from all generations after the first generation (the 0-indexed generation).

    This is most apparently if you set generations_number = 1.

    I looked through the code quickly, but didn't see any obvious source of it. Hopefully someone who knows the library can find it more easily!

    opened by ClimbsRocks 12
  • Better Parallelism

    Better Parallelism

    I wrote this because parallelism wasn't working on my Windows laptop. So I did some reading and found out, at least on windows, you need to declare your Pool from within a if __name__=="__main__" structure in order to prevent recurrent execution. Deap also identifies other kinds of multiprocessing maps you may want to pass to it, so now the user has every option to implement parallelism however they want by passing their "map" function to pmap.

    Yes, it's divergent from sklearn, but sklearn has a fully implemented special parallelism library for their n_jobs parameters that would be both a challenge and potentially incompatible with deap, so what I have implemented is deap's way of doing things.

    opened by ryanpeach 8
  • Error Message While Calling fit() Method

    Error Message While Calling fit() Method

    AttributeError: can't set attribute

    It pointed out the error come from fit( ) method as

    def fit(self, X, y=None): self.best_estimator_ = None --> self.best_score_ = -1 self.best_params_ = None for possible_params in self.possible_params: self.fit(X, y, possible_params) if self.refit: self.best_estimator = clone(self.estimator) self.best_estimator_.set_params(**self.best_params_) self.best_estimator_.fit(X, y)

    opened by tasyacute 8
  • Can't get attribute 'Individual'

    Can't get attribute 'Individual'

    Trying to test example code on Indian Pima Diabetes dataset in Jupyter notebook 5.0.0, Python 3.6, I'm getting an error. Kernel is busy but no processes are running. Turning on debag mode shows: ... File "c:\users\szymon\anaconda3\envs\tensorflow\lib\multiprocessing\queues.py", line 345, in get return ForkingPickler.loads(res) AttributeError: Can't get attribute 'Individual' on <module 'deap.creator' from 'c:\\users\\szymon\\anaconda3\\envs\\tensorflow\\lib\\site-packages\\deap\\creator.py'> File "c:\users\szymon\anaconda3\envs\tensorflow\lib\multiprocessing\pool.py", line 108, in worker task = get()

    opened by szymonk92 7
  • Doubts about encoding correctness

    Doubts about encoding correctness

    I have some doubts about current parameter encoding (to chromosome) correctness.

    Let's assume that we have 2 categorical parameters f1 and f2:

    Enc f1  f2
    0000 a 1
    0001 a 2
    0010 a 3
    0011 a 4
    0100 a 5
    0101 b 1
    0110 b 2
    0111 b 3
    1000 b 4
    1001 b 5
    1010 c 1
    1011 c 2
    1100 c 3
    1101 c 4
    1110 c 5
    

    If we use any crossover operator, for example let's do 2 points crossover between some points:

    (a, 4) 0011    0111 (b, 3)
                x
    (c, 3) 1100    1000 (b, 4)
    

    After crossover we've got b, but both parents don't have b as first parameter.

    opened by olologin 7
  • Can't instantiate abstract class EvolutionaryAlgorithmSearchCV with abstract methods _run_search

    Can't instantiate abstract class EvolutionaryAlgorithmSearchCV with abstract methods _run_search

    I use Python 2.7.15 to run test.py and I found an error TypeError: Can't instantiate abstract class EvolutionaryAlgorithmSearchCV with abstract methods _run_search

    Would you please help correct anything I missed, bellow is all packages I installed

    Package                            Version
    ---------------------------------- -----------
    appdirs                            1.4.3
    appnope                            0.1.0
    asn1crypto                         0.24.0
    attrs                              18.2.0
    Automat                            0.7.0
    backports-abc                      0.5
    backports.shutil-get-terminal-size 1.0.0
    bleach                             2.1.4
    certifi                            2018.8.24
    cffi                               1.11.5
    configparser                       3.5.0
    constantly                         15.1.0
    cryptography                       2.3.1
    Cython                             0.28.5
    deap                               1.2.2
    decorator                          4.3.0
    entrypoints                        0.2.3
    enum34                             1.1.6
    functools32                        3.2.3.post2
    futures                            3.2.0
    html5lib                           1.0.1
    hyperlink                          18.0.0
    idna                               2.7
    incremental                        17.5.0
    ipaddress                          1.0.22
    ipykernel                          4.10.0
    ipython                            5.8.0
    ipython-genutils                   0.2.0
    ipywidgets                         7.4.2
    Jinja2                             2.10
    jsonschema                         2.6.0
    jupyter                            1.0.0
    jupyter-client                     5.2.3
    jupyter-console                    5.2.0
    jupyter-core                       4.4.0
    MarkupSafe                         1.0
    mistune                            0.8.3
    mkl-fft                            1.0.6
    mkl-random                         1.0.1
    nbconvert                          5.3.1
    nbformat                           4.4.0
    notebook                           5.6.0
    numpy                              1.15.2
    pandas                             0.23.4
    pandocfilters                      1.4.2
    pathlib2                           2.3.2
    pexpect                            4.6.0
    pickleshare                        0.7.4
    pip                                10.0.1
    prometheus-client                  0.3.1
    prompt-toolkit                     1.0.15
    ptyprocess                         0.6.0
    pyasn1                             0.4.4
    pyasn1-modules                     0.2.2
    pycparser                          2.19
    Pygments                           2.2.0
    pyOpenSSL                          18.0.0
    python-dateutil                    2.7.3
    pytz                               2018.5
    pyzmq                              17.1.2
    qtconsole                          4.4.1
    scandir                            1.9.0
    scikit-learn                       0.20.0
    scipy                              1.1.0
    Send2Trash                         1.5.0
    service-identity                   17.0.0
    setuptools                         40.2.0
    simplegeneric                      0.8.1
    singledispatch                     3.4.0.3
    six                                1.11.0
    sklearn-deap                       0.2.2
    terminado                          0.8.1
    testpath                           0.3.1
    tornado                            5.1.1
    traitlets                          4.3.2
    Twisted                            17.5.0
    wcwidth                            0.1.7
    webencodings                       0.5.1
    wheel                              0.31.1
    widgetsnbextension                 3.4.2
    zope.interface                     4.5.0
    
    opened by dongchirua 6
  • Sklearn Depreciation

    Sklearn Depreciation

    cross_validation has been replaced with model_selection and will soon be depreciated. Already getting a warning. Tried to simply change this but they have moved a few other things around and also changed how some functions seem to fundamentally work.

    opened by ryanpeach 6
  • What does it take to parallelize the search?

    What does it take to parallelize the search?

    Great tool! Allows me to drastically expand the search space over using GridSearchCV. Really promising for deep learning, as well as standard scikit-learn interfaced ML models.

    Because I'm searching over a large space, this obviously involves training a bunch of models, and doing a lot of computations. scikit-learn's model training parallelizes this to ease the pain somewhat.

    I tried using the toolbox.register('map', pool.map) approach as described out by deap, but didn't see any parallelization.

    Is there a different approach I should take instead? Or is that a feature that hasn't been built yet? If so, what are the steps needed to get parallelization working?

    opened by ClimbsRocks 5
  • What's wrong with my datas ?

    What's wrong with my datas ?

    With the following code 👍 paramgrid = {"n_jobs": -1, "max_features":['auto','log2'], "n_estimators":[10,100,500,1000], "min_samples_split" : [2,5,10], "max_leaf_nodes" : [1,5,10,20,50] }

    #min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_features='auto', max_leaf_nodes=None

    cv = EvolutionaryAlgorithmSearchCV(estimator=RandomForestClassifier(), params=paramgrid, scoring="accuracy", cv=StratifiedKFold(y, n_folds=10), verbose=True, population_size=50, gene_mutation_prob=0.10, tournament_size=3, generations_number=10 )

    cv.fit(X, y)

    and having the followning error :

    TypeErrorTraceback (most recent call last) in () 20 ) 21 ---> 22 cv.fit(X,y)

    /root/anaconda2/lib/python2.7/site-packages/evolutionary_search/init.pyc in fit(self, X, y) 276 self.best_params_ = None 277 for possible_params in self.possible_params: --> 278 self.fit(X, y, possible_params) 279 if self.refit: 280 self.best_estimator = clone(self.estimator)

    /root/anaconda2/lib/python2.7/site-packages/evolutionary_search/init.pyc in _fit(self, X, y, parameter_dict) 301 toolbox = base.Toolbox() 302 --> 303 name_values, gene_type, maxints = _get_param_types_maxint(parameter_dict) 304 if self.gene_type is None: 305 self.gene_type = gene_type

    /root/anaconda2/lib/python2.7/site-packages/evolutionary_search/init.pyc in _get_param_types_maxint(params) 33 types = [] 34 for _, possible_values in name_values: ---> 35 if isinstance(possible_values[0], float): 36 types.append(param_types.Numerical) 37 else:

    TypeError: 'int' object has no attribute 'getitem'

    opened by M4k34B3tt3rW0r1D 4
  • Python3 compatibility is broken

    Python3 compatibility is broken

    There are two old-style print statements in __init__.py that break compatibility with Python 3.

    I added brackets to turn them into function calls and that seemed to fix it, but I have not done extensive testing to see if there are any other compatibility issues.

    opened by davekirby 4
  • ValueError when calling cv.fit() for optimising a neural network

    ValueError when calling cv.fit() for optimising a neural network

    Hi,

    I am trying to optimise a neural network (Keras, TensorFlow), but I'm getting an error: ValueError: Input contains NaN, infinity or a value too large for dtype('float32').

    I have checked my input data for NaNs, infities and large or small values. There aren't any. I have forced the input data to be np.float32 before passing it to .fit().

    I've used this algorithm before without any problems or special data prep, so I'm not sure where there error is creeping in.

    the relavent bit of the code is: codetxt.txt

    I should also say that when I manually try to just .fit() to my model, it works fine. The issue is something to do with how the cross valdation is working.

    The full traceback is:

    Traceback (most recent call last): File "/home/users/hf832176/.conda/envs/tb_env6/lib/python3.6/multiprocessing/pool.py", line 119, in worker result = (True, func(*args, **kwds)) File "/home/users/hf832176/.conda/envs/tb_env6/lib/python3.6/multiprocessing/pool.py", line 44, in mapstar return list(map(*args)) File "/home/users/hf832176/.conda/envs/tb_env6/lib/python3.6/site-packages/evolutionary_search/cv.py", line 104, in _evalFunction error_score=error_score)[0] File "/home/users/hf832176/.conda/envs/tb_env6/lib/python3.6/site-packages/sklearn/model_selection/_validation.py", line 568, in _fit_and_score test_scores = _score(estimator, X_test, y_test, scorer, is_multimetric) File "/home/users/hf832176/.conda/envs/tb_env6/lib/python3.6/site-packages/sklearn/model_selection/_validation.py", line 610, in _score score = scorer(estimator, X_test, y_test) File "/home/users/hf832176/.conda/envs/tb_env6/lib/python3.6/site-packages/sklearn/metrics/scorer.py", line 98, in call **self._kwargs) File "/home/users/hf832176/.conda/envs/tb_env6/lib/python3.6/site-packages/sklearn/metrics/regression.py", line 239, in mean_squared_error y_true, y_pred, multioutput) File "/home/users/hf832176/.conda/envs/tb_env6/lib/python3.6/site-packages/sklearn/metrics/regression.py", line 77, in _check_reg_targets y_pred = check_array(y_pred, ensure_2d=False) File "/home/users/hf832176/.conda/envs/tb_env6/lib/python3.6/site-packages/sklearn/utils/validation.py", line 573, in check_array allow_nan=force_all_finite == 'allow-nan') File "/home/users/hf832176/.conda/envs/tb_env6/lib/python3.6/site-packages/sklearn/utils/validation.py", line 56, in _assert_all_finite raise ValueError(msg_err.format(type_err, X.dtype)) ValueError: Input contains NaN, infinity or a value too large for dtype('float32'). """ The above exception was the direct cause of the following exception: Traceback (most recent call last): File "NN_GSCV-DL2.py", line 308, in grid_result = cv.fit(X_train, y_train) File "/home/users/hf832176/.conda/envs/tb_env6/lib/python3.6/site-packages/evolutionary_search/cv.py", line 363, in fit self._fit(X, y, possible_params) File "/home/users/hf832176/.conda/envs/tb_env6/lib/python3.6/site-packages/evolutionary_search/cv.py", line 453, in _fit halloffame=hof, verbose=self.verbose) File "/home/users/hf832176/.conda/envs/tb_env6/lib/python3.6/site-packages/deap/algorithms.py", line 150, in eaSimple fitnesses = toolbox.map(toolbox.evaluate, invalid_ind) File "/home/users/hf832176/.conda/envs/tb_env6/lib/python3.6/multiprocessing/pool.py", line 266, in map return self._map_async(func, iterable, mapstar, chunksize).get() File "/home/users/hf832176/.conda/envs/tb_env6/lib/python3.6/multiprocessing/pool.py", line 644, in get raise self._value ValueError: Input contains NaN, infinity or a value too large for dtype('float32').

    opened by tbloch1 0
  • Does not work with pipelines

    Does not work with pipelines

    For tuning a single estimator this tool is awesome. But the standard gridsearch can actually accept a pipeline as an estimator, which allows you to evaluate different classifiers as parameters.

    For some reason, this breaks with EvolutionaryAlgorithmSearchCV.

    For example, set a pipeline like this: pipe = Pipeline([ ('imputer', SimpleImputer(strategy='median')), ('scaler' , StandardScaler()), ('classify', LogisticRegression()) ])

    Then define a parameter grid to include different classifiers: param_grid_rf_big = [ {'classify': [RandomForestClassifier(),ExtraTreesClassifier()], 'classify__n_estimators': [500], 'classify__max_features': ['log2', 'sqrt', None], 'classify__min_samples_split': [2,3], 'classify__min_samples_leaf': [1,2,3], 'classify__criterion': ['gini',] } ]

    When you pass this to EvolutionaryAlgorithmSearchCV you should be able to set the estimator to 'pipe' and and the params to 'param_grid_rf_big' and let it evaluate. This works with gridsearchcv, but not with EvolutionaryAlgorithmSearchCV.

    opened by dth5 4
  • stuck after gen 1...

    stuck after gen 1...

    image

    I have some datasets where the search get stuck for ever on gen 1 for instance.. does it happen to you too? how can I figure out what is the problem? python is still running and using a lot of CPU... but after hours nothing happens. any idea what could be the issue?

    opened by fcoppey 1
Releases(0.3.0)
Owner
rsteca
rsteca
SoGCN: Second-Order Graph Convolutional Networks

SoGCN: Second-Order Graph Convolutional Networks This is the authors' implementation of paper "SoGCN: Second-Order Graph Convolutional Networks" in Py

Yuehao 7 Aug 16, 2022
A lightweight face-recognition toolbox and pipeline based on tensorflow-lite

FaceIDLight 📘 Description A lightweight face-recognition toolbox and pipeline based on tensorflow-lite with MTCNN-Face-Detection and ArcFace-Face-Rec

Martin Knoche 16 Dec 07, 2022
Automatic self-diagnosis program (python required)Automatic self-diagnosis program (python required)

auto-self-checker 자동으로 자가진단 해주는 프로그램(python 필요) 중요 이 프로그램이 실행될때에는 절대로 마우스포인터를 움직이거나 키보드를 건드리면 안된다(화면인식, 마우스포인터로 직접 클릭) 사용법 프로그램을 구동할 폴더 내의 cmd창에서 pip

1 Dec 30, 2021
MutualGuide is a compact object detector specially designed for embedded devices

Introduction MutualGuide is a compact object detector specially designed for embedded devices. Comparing to existing detectors, this repo contains two

ZHANG Heng 103 Dec 13, 2022
A project to make Amazon Echo respond to sign language using your webcam

Making Alexa respond to Sign Language using Tensorflow.js Try the live demo Read the Blog Post on Tensorflow's Blog Coming Soon Watch the video This p

Abhishek Singh 444 Jan 03, 2023
Defending against Model Stealing via Verifying Embedded External Features

Defending against Model Stealing Attacks via Verifying Embedded External Features This is the official implementation of our paper Defending against M

20 Dec 30, 2022
CT Based COVID 19 Diagnose by Image Processing and Deep Learning

This project proposed the deep learning and image processing method to undertake the diagnosis on 2D CT image and 3D CT volume.

1 Feb 08, 2022
Repositorio oficial del curso IIC2233 Programación Avanzada 🚀✨

IIC2233 - Programación Avanzada Evaluación Las evaluaciones serán efectuadas por medio de actividades prácticas en clases y tareas. Se calculará la no

IIC2233 @ UC 47 Sep 06, 2022
Python implementation of O-OFDMNet, a deep learning-based optical OFDM system,

O-OFDMNet This includes Python implementation of O-OFDMNet, a deep learning-based optical OFDM system, which uses neural networks for signal processin

Thien Luong 4 Sep 09, 2022
torchsummaryDynamic: support real FLOPs calculation of dynamic network or user-custom PyTorch ops

torchsummaryDynamic Improved tool of torchsummaryX. torchsummaryDynamic support real FLOPs calculation of dynamic network or user-custom PyTorch ops.

Bohong Chen 1 Jan 07, 2022
MPI Interest Group on Algorithms on 1st semester 2021

MPI Algorithms Interest Group Introduction Lecturer: Steve Yan Location: TBA Time Schedule: TBA Semester: 1 Useful URLs Typora: https://typora.io Goog

Ex10si0n 13 Sep 08, 2022
Random Forests for Regression with Missing Entries

Random Forests for Regression with Missing Entries These are specific codes used in the article: On the Consistency of a Random Forest Algorithm in th

Irving Gómez-Méndez 1 Nov 15, 2021
Open source person re-identification library in python

Open-ReID Open-ReID is a lightweight library of person re-identification for research purpose. It aims to provide a uniform interface for different da

Tong Xiao 1.3k Jan 01, 2023
PolyGlot, a fuzzing framework for language processors

PolyGlot, a fuzzing framework for language processors Build We tested PolyGlot on Ubuntu 18.04. Get the source code: git clone https://github.com/s3te

Software Systems Security Team at Penn State University 79 Dec 27, 2022
PyTorch implementation of deep GRAph Contrastive rEpresentation learning (GRACE).

GRACE The official PyTorch implementation of deep GRAph Contrastive rEpresentation learning (GRACE). For a thorough resource collection of self-superv

Big Data and Multi-modal Computing Group, CRIPAC 186 Dec 27, 2022
diablo2 resurrected loot filter

Only For Chinese and Traditional Chinese The filter only for Chinese and Traditional Chinese, i didn't change it for other language.Maybe you could mo

elmagnifico 249 Dec 04, 2022
[CVPR'22] COAP: Learning Compositional Occupancy of People

COAP: Compositional Articulated Occupancy of People Paper | Video | Project Page This is the official implementation of the CVPR 2022 paper COAP: Lear

Marko Mihajlovic 111 Dec 11, 2022
In generative deep geometry learning, we often get many obj files remain to be rendered

a python prompt cli script for blender batch render In deep generative geometry learning, we always get many .obj files to be rendered. Our rendered i

Tian-yi Liang 1 Mar 20, 2022
This is the source code for generating the ASL-Skeleton3D and ASL-Phono datasets. Check out the README.md for more details.

ASL-Skeleton3D and ASL-Phono Datasets Generator The ASL-Skeleton3D contains a representation based on mapping into the three-dimensional space the coo

Cleison Amorim 5 Nov 20, 2022
Barlow Twins and HSIC

Barlow Twins and HSIC Unofficial Pytorch implementation for Barlow Twins and HSIC_SSL on small datasets (CIFAR10, STL10, and Tiny ImageNet). Correspon

Yao-Hung Hubert Tsai 49 Nov 24, 2022