MLBox is a powerful Automated Machine Learning python library.

Overview


MLBox is a powerful Automated Machine Learning python library. It provides the following features:

  • Fast reading and distributed data preprocessing/cleaning/formatting
  • Highly robust feature selection and leak detection
  • Accurate hyper-parameter optimization in high-dimensional space
  • State-of-the-art predictive models for classification and regression (Deep Learning, Stacking, LightGBM, ...)
  • Prediction with model interpretation

For more details, please refer to the official documentation.


How to Contribute

MLBox has been developed and used by many active community members. Your help is very valuable to make it better for everyone.

  • Check out the call for contributions to see what can be improved, or open an issue if you want something.
  • Contribute to the tests to make the library more reliable.
  • Contribute to the documentation to make it clearer for everyone.
  • Contribute to the examples to share your experience with other users.
  • Open an issue if you run into problems during development.

For more details, please refer to CONTRIBUTING.

Comments
  • Trying to install and getting xgboost errors

    The system is a Kaggle kernel, which is Ubuntu-based and seems to be the intended environment.

    I ran this:

    !apt-get install build-essential
    !pip install cmake
    !pip install xgboost>=0.6a2
    !pip install lightgbm>=2.0.2
    !pip install mlbox
    

    Resulting in this:

    Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-install-xib6_1h7/xgboost/

    Can you please help me out? I see your examples are also Kaggle based, but they don't have the install steps. Do you somehow install packages from setup within the kernel???
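
    A side note on the commands above, separate from the build failure itself: in a POSIX shell, the unquoted `>` in `xgboost>=0.6a2` is treated as output redirection, so the version specifier may never reach pip. The sketch below sidesteps the shell by passing each requirement as a single argument. The helper names `pip_install_command` and `pip_install` are hypothetical and exist only for illustration.

    ```python
    import subprocess
    import sys


    def pip_install_command(requirement):
        """Build an argv list that hands `requirement` to pip as one argument."""
        # No shell is involved, so characters such as '>' are not interpreted
        # as redirection operators.
        return [sys.executable, "-m", "pip", "install", requirement]


    def pip_install(requirement):
        """Run pip for one requirement; raises CalledProcessError on failure."""
        subprocess.check_call(pip_install_command(requirement))


    if __name__ == "__main__":
        # Only print the commands here; call pip_install() to actually run them.
        for req in ("xgboost>=0.6a2", "lightgbm>=2.0.2"):
            print(" ".join(pip_install_command(req)))
    ```

    Quoting the requirement in the notebook cell (for example `!pip install "xgboost>=0.6a2"`) should have the same effect. Neither change addresses the `egg_info` error reported here.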

    opened by TimusLetap 19
  • Setup script exited with usage: setup.py [global_opts] error: no commands supplied

    Hi AxeldeRomblay,

    System Information Ubuntu 16.04

    I was trying to install MLBox to give it a try. The steps I took were:

    1. Clone the repository.
    2. Run the setup.py file.

    The build reaches 100% but then throws a couple of errors. The actual output is below.

    [100%] Linking CXX shared library /tmp/easy_install-1gz4zwpu/lightgbm-2.0.2/lightgbm/lib_lightgbm.so
    [100%] Built target _lightgbm
    Install lib_lightgbm from: ['lightgbm/lib_lightgbm.so']
    error: Setup script exited with usage: setup.py [global_opts] cmd1 [cmd1_opts] [cmd2 [cmd2_opts] ...]
       or: setup.py --help [cmd1 cmd2 ...]
       or: setup.py --help-commands
       or: setup.py cmd --help

    error: no commands supplied

    opened by NeerajSarwan 15
  • Install fails. Can't tell if it is MLbox or XGboost that doesnt work

    Hi Axel,

    We're trying to install MLbox and get the following error :

    Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-build-ob6fq6jv/xgboost/
    Error installing the following packages:
    ['xgboost==0.6a2']
    Please install them manually
    

    Now, given that xgboost is installed and works, I suppose two issues can cause the error :

    1. The version of xgboost is too old. We're running 0.6.
    2. MLbox can't find the available xgboost version

    Indeed, MLBox seems to try to install xgboost 0.6a2 even though a version is already installed, which is surprising.

    Maybe it is me. Thank you for your help.

    opened by brcacrm 13
  • Hub connection request timed out

    I tried to run the code in the picture below, but I got the error saying TimeoutError: Hub connection request timed out. I'm using Python2.7 under Ubuntu 16.04

    [Screenshot: hub connection time out]

    Thanks for your help

    opened by ilyes495 10
  • Cleaning takes too long time on multi-cores cpu

    Cleaning takes 276 s for the house price dataset on an Intel E5-2683 v3, which has 14 cores and 28 threads. I guess the problem may be caused by n_jobs=-1 here:

    if (self.verbose):
        print("cleaning data ...")

    df = pd.concat(Parallel(n_jobs=-1)(delayed(convert_list)(df[col]) for col in df.columns),
                   axis=1)

    df = pd.concat(Parallel(n_jobs=-1)(delayed(convert_float_and_dates)(df[col]) for col in df.columns),
                   axis=1)

    I don't know how to fix it. Maybe add an n_jobs argument to the Reader class? Looking forward to your response. Thank you.
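
    The suggestion of a configurable n_jobs could look roughly like the sketch below. It is not MLBox's Reader: `resolve_n_jobs` and `clean_columns` are hypothetical names, and the standard-library ThreadPoolExecutor stands in for joblib so the example has no third-party dependencies. The negative-value handling follows joblib's documented convention (-1 means all CPUs, -2 all but one).

    ```python
    import os
    from concurrent.futures import ThreadPoolExecutor


    def resolve_n_jobs(n_jobs, n_tasks):
        """Turn a joblib-style n_jobs value into a concrete worker count."""
        cpus = os.cpu_count() or 1
        if n_jobs is None:
            workers = 1
        elif n_jobs > 0:
            workers = n_jobs
        elif n_jobs < 0:
            # joblib convention: -1 means all CPUs, -2 all but one, and so on.
            workers = max(1, cpus + 1 + n_jobs)
        else:
            raise ValueError("n_jobs == 0 has no meaning")
        # Never start more workers than there are columns to process.
        return max(1, min(workers, n_tasks))


    def clean_columns(columns, convert, n_jobs=1):
        """Apply `convert` to every column, using at most n_jobs workers."""
        names = list(columns)
        workers = resolve_n_jobs(n_jobs, len(names))
        if workers == 1:
            # Sequential path: avoids pool start-up cost for small inputs.
            return {name: convert(columns[name]) for name in names}
        with ThreadPoolExecutor(max_workers=workers) as pool:
            results = pool.map(convert, (columns[name] for name in names))
            return dict(zip(names, results))


    if __name__ == "__main__":
        cols = {"price": ["1", "2.5"], "rooms": ["3", "4"]}
        print(clean_columns(cols, lambda v: [float(x) for x in v], n_jobs=2))
    ```

    Capping the worker count at the number of columns is one way to limit scheduling overhead on machines with many cores. Whether that overhead accounts for the 276 s reported here is not established by this sketch.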

    opened by a1a2y3 8
  • FYI: ColumnTransformer

    We'll have a ColumnTransformer in sklearn pretty soon that will make it easier to treat different columns differently. That should make it much simpler to have different pipelines for categorical and continuous data, which seems to be one of the big issues MLBox addresses.

    opened by amueller 8
  • Code implementation frozen

    Hello,

    I tried implementing the code in https://www.analyticsvidhya.com/blog/2017/07/mlbox-library-automated-machine-learning/. The engines start but the code implementation is frozen (still running but no task is done). I get the following message on my screen:

    [Screenshot: screen]

    I tried adding time.sleep but it doesn't change anything. I'm on Windows 10 Pro, Python 3.5 with Anaconda. Do you have any idea why?

    opened by yousseferahim 7
  • Testing with Predicting Blood Donation challenge

    Hi, I am doing some tests with this challenge: https://www.drivendata.org/competitions/2/warm-up-predict-blood-donations/

    With minimal understanding I rank around 700 out of 2400! I still need to write up some questions on how to get feature importances and how to set up stacking.

    Regards, Bruno Seznec

    opened by brunosez 7
  • TypeError: 'generator' object is not subscriptable

    When running in a Python 3.6 environment in a Jupyter notebook on Ubuntu 14.04, I get the following:

    from mlbox.preprocessing import *
    from mlbox.optimisation import *
    from mlbox.prediction import *

    paths = ["train.csv", "test.csv"]
    target_name = "target"

    data = Reader(sep=",").train_test_split(paths, target_name)  # reading

    space = {

        'ne__numerical_strategy' : {"space" : [0, 'mean']},

        'ce__strategy' : {"space" : ["label_encoding", "random_projection", "entity_embedding"]},

        'fs__strategy' : {"space" : ["variance", "rf_feature_importance"]},
        'fs__threshold': {"search" : "choice", "space" : [0.1, 0.2, 0.3]},

        'est__strategy' : {"space" : ["XGBoost"]},
        'est__max_depth' : {"search" : "choice", "space" : [5,6]},
        'est__subsample' : {"search" : "uniform", "space" : [0.6,0.9]}

        }

    opt = Optimiser(scoring = 'roc_auc', n_folds = 4)

    best = opt.optimise(space, data, max_evals = 5)

    TypeError                                 Traceback (most recent call last)
    in ()
         16 opt = Optimiser(scoring = 'roc_auc', n_folds = 4)
         17
    ---> 18 best = opt.optimise(space, data, max_evals = 5)
         19

    ~/anaconda2/envs/insurance_v2/lib/python3.6/site-packages/mlbox/optimisation/optimiser.py in optimise(self, space, df, max_evals)
        565                 space=hyper_space,
        566                 algo=tpe.suggest,
    --> 567                 max_evals=max_evals)
        568
        569 # Displaying best_params

    ~/anaconda2/envs/insurance_v2/lib/python3.6/site-packages/hyperopt/fmin.py in fmin(fn, space, algo, max_evals, trials, rstate, allow_trials_fmin, pass_expr_memo_ctrl, catch_eval_exceptions, verbose, return_argmin)
        312
        313     domain = base.Domain(fn, space,
    --> 314                          pass_expr_memo_ctrl=pass_expr_memo_ctrl)
        315
        316     rval = FMinIter(algo, domain, trials, max_evals=max_evals,

    ~/anaconda2/envs/insurance_v2/lib/python3.6/site-packages/hyperopt/base.py in __init__(self, fn, expr, workdir, pass_expr_memo_ctrl, name, loss_target)
        784         before = pyll.dfs(self.expr)
        785         # -- raises exception if expr contains cycles
    --> 786         pyll.toposort(self.expr)
        787         vh = self.vh = VectorizeHelper(self.expr, self.s_new_ids)
        788         # -- raises exception if v_expr contains cycles

    ~/anaconda2/envs/insurance_v2/lib/python3.6/site-packages/hyperopt/pyll/base.py in toposort(expr)
        713     G.add_edges_from([(n_in, node) for n_in in node.inputs()])
        714     order = nx.topological_sort(G)
    --> 715     assert order[-1] == expr
        716     return order
        717
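
    The last frame indexes the return value of nx.topological_sort. In networkx 2.x that function returns a generator rather than a list, which would explain the TypeError in the title. Below is a minimal reproduction with a stand-in generator, plus the usual workaround of materialising the result first. `fake_topological_sort` is hypothetical and only mimics the return type; it is not networkx or hyperopt code.

    ```python
    def fake_topological_sort(nodes):
        """Stand-in that yields nodes lazily, like a generator-returning API."""
        for node in nodes:
            yield node


    def last_node_broken(nodes):
        order = fake_topological_sort(nodes)
        return order[-1]  # raises TypeError: generators cannot be indexed


    def last_node_fixed(nodes):
        order = list(fake_topological_sort(nodes))  # materialise first
        return order[-1]


    if __name__ == "__main__":
        try:
            last_node_broken(["a", "b", "c"])
        except TypeError as exc:
            print("broken:", exc)
        print("fixed:", last_node_fixed(["a", "b", "c"]))
    ```

    If this is indeed the cause, the two directions to check are a networkx release that still returns a list and a hyperopt release that already handles the generator.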

    opened by NickBuchny 6
  • Redundant results and model overfitting

    1. We are getting the same results irrespective of the number of max_evals and of seed changes.
    2. We have increased n_folds and also reduced max_evals to see if we get different results. For any combination of parameter settings we get the same results. I think the model is overfitting during training. Is there another way to check for this and stop it, so that we get better results?
    3. In our use case we have not used the ne, ce, and fs params in the 'space' settings. Is there a way to use stacking regression without these? We have not been able to resolve the errors we get when using stacking with only the params related to algorithm selection in the regression strategy.

    I would be grateful if you could help me resolve the above issues. Thanks.

    opened by mahatibharadwaj 5
  • Error while computing the cross validation mean score.

    Hi,

    I am interested in MLBox and tried it on a Kaggle classification project. When I reached the hyperparameter optimisation step, I got the error message 'An error occurred while computing the cross validation mean score. Check the parameter values and your scoring function.'

    Here's the code I used:

    # imports assumed; they are not shown in the original report
    import numpy as np
    from mlbox.preprocessing import *
    from mlbox.optimisation import *

    paths = ['train_path', 'test_path']
    target_name = 'target_name'

    rd = Reader(sep = ",")
    df = rd.train_test_split(paths, target_name)

    dft = Drift_thresholder()
    df = dft.fit_transform(df)

    space = {'ne__numerical_strategy':{"search":"choice", "space":['mean','median']},

         'ne__categorical_strategy':{"search":"choice",
                                     "space":[np.NaN]},

         'ce__strategy':{"search":"choice",
                         "space":['label_encoding','entity_embedding','random_projection']},

        'est__strategy':{"search":"choice",
                                  "space":["LightGBM"]},
        'est__n_estimators':{"search":"choice",
                                  "space":[150]},
        'est__colsample_bytree':{"search":"uniform",
                                  "space":[0.8,0.95]},
        'est__subsample':{"search":"uniform",
                                  "space":[0.8,0.95]},
        'est__max_depth':{"search":"choice",
                                  "space":[5,6,7,8,9]},
        'est__learning_rate':{"search":"choice",
                                  "space":[0.07]}

        }

    opt = Optimiser(scoring = "roc_auc", n_folds = 5)
    best_params = opt.optimise(space, df, 15)

    Can you help me fix it? Thanks!
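
    For readers unfamiliar with the `space` dictionary in this report: each key names a pipeline parameter, "search" selects how values are drawn, and "space" lists the candidates (for "choice") or the bounds (for "uniform"). The sketch below illustrates those semantics with the standard-library random module. It is a stand-in, not MLBox's optimiser; `sample_space` is a hypothetical name, and treating a missing "search" key as "choice" is an assumption.

    ```python
    import random


    def sample_space(space, rng=None):
        """Draw one configuration from an MLBox-style search space dict."""
        rng = rng or random.Random()
        config = {}
        for name, spec in space.items():
            values = spec["space"]
            # Assumption: entries without a "search" key behave like "choice".
            search = spec.get("search", "choice")
            if search == "choice":
                # Pick one of the listed candidates.
                config[name] = rng.choice(values)
            elif search == "uniform":
                # Draw a float between the two bounds.
                low, high = values
                config[name] = rng.uniform(low, high)
            else:
                raise ValueError("unsupported search type: %r" % search)
        return config


    if __name__ == "__main__":
        demo_space = {
            "est__strategy": {"search": "choice", "space": ["LightGBM"]},
            "est__max_depth": {"search": "choice", "space": [5, 6, 7, 8, 9]},
            "est__subsample": {"search": "uniform", "space": [0.8, 0.95]},
        }
        print(sample_space(demo_space, random.Random(0)))
    ```

    Sampling a few configurations this way and fitting the estimator on each by hand can help narrow down which parameter value triggers a cross-validation failure.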

    opened by YAOLI0407 5
  • Import error while using MLBox inside Google Colab

    I'm facing an abrupt import error, not able to figure out why it is occurring.

    Here is the problem

    • Google Colab discards the latest versions of MLBox due to dependency failures
    • it automatically downgrades the dependencies
    • not all of the import * statements can be fulfilled
    • a type error is thrown when manually installing the new MLBox version

    [Screenshot of the issue: Screenshot 2022-12-11 122655]

    Sklearn version: 1.0.2
    MLBox version: 0.5.1

    Please look into it and help me resolve this issue.

    opened by prathikshetty2002 0
  • Bump tensorflow from 2.0.0 to 2.9.3

    Bumps tensorflow from 2.0.0 to 2.9.3.

    Release notes

    Sourced from tensorflow's releases.

    TensorFlow 2.9.3

    Release 2.9.3

    This release introduces several vulnerability fixes:

    TensorFlow 2.9.2

    Release 2.9.2

    This release introduces several vulnerability fixes:

    ... (truncated)

    Changelog

    Sourced from tensorflow's changelog.

    Release 2.9.3

    This release introduces several vulnerability fixes:

    Release 2.8.4

    This release introduces several vulnerability fixes:

    ... (truncated)

    Commits
    • a5ed5f3 Merge pull request #58584 from tensorflow/vinila21-patch-2
    • 258f9a1 Update py_func.cc
    • cd27cfb Merge pull request #58580 from tensorflow-jenkins/version-numbers-2.9.3-24474
    • 3e75385 Update version numbers to 2.9.3
    • bc72c39 Merge pull request #58482 from tensorflow-jenkins/relnotes-2.9.3-25695
    • 3506c90 Update RELEASE.md
    • 8dcb48e Update RELEASE.md
    • 4f34ec8 Merge pull request #58576 from pak-laura/c2.99f03a9d3bafe902c1e6beb105b2f2417...
    • 6fc67e4 Replace CHECK with returning an InternalError on failing to create python tuple
    • 5dbe90a Merge pull request #58570 from tensorflow/r2.9-7b174a0f2e4
    • Additional commits viewable in compare view

    dependencies 
    opened by dependabot[bot] 0
  • Project dependencies may have API risk issues

    Hi, in MLBox, inappropriate dependency version constraints can cause risks.

    Below are the dependencies and version constraints that the project is using:

    numpy==1.18.2
    scipy==1.4.1
    matplotlib==3.0.3
    hyperopt==0.2.3
    pandas==0.25.3
    joblib==0.14.1
    scikit-learn==0.22.1
    tensorflow==2.0.0
    lightgbm==2.3.1
    tables==3.5.2
    xlrd==1.2.0
    

    The version constraint == introduces a risk of dependency conflicts because the allowed range is too strict. Constraints with no upper bound, or *, introduce a risk of missing-API errors because the latest version of a dependency may have removed some APIs.

    After further analysis, in this project, the version constraint of dependency matplotlib can be changed to >=1.3.0,<=3.0.3. The version constraint of dependency joblib can be changed to ==0.7.0d. The version constraint of dependency joblib can be changed to >=0.3.6.dev,<=1.1.0. The version constraint of dependency scikit-learn can be changed to >=0.20rc1,<=0.20.4.

    These suggested changes reduce dependency conflicts as much as possible while still allowing the latest versions that do not cause call errors in the project.
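
    To see how exact pins like the ones listed above interact with an existing environment, a small check along the following lines can help. It is a sketch using only the standard library (importlib.metadata, Python 3.8+); `parse_pins`, `installed_version` and `find_mismatches` are hypothetical helper names, and only name==version lines are handled.

    ```python
    from importlib import metadata


    def parse_pins(text):
        """Parse 'name==version' lines into a {name: version} dict."""
        pins = {}
        for raw in text.splitlines():
            line = raw.strip()
            if not line or line.startswith("#"):
                continue  # skip blank lines and comments
            name, sep, version = line.partition("==")
            if not sep:
                raise ValueError("not an exact pin: %r" % line)
            pins[name.strip()] = version.strip()
        return pins


    def installed_version(name):
        """Return the installed version of `name`, or None if it is absent."""
        try:
            return metadata.version(name)
        except metadata.PackageNotFoundError:
            return None


    def find_mismatches(pins, lookup=installed_version):
        """List (name, pinned, found) for each package that differs from its pin."""
        return [
            (name, pinned, lookup(name))
            for name, pinned in pins.items()
            if lookup(name) != pinned
        ]


    if __name__ == "__main__":
        pins = parse_pins("numpy==1.18.2\nscipy==1.4.1\njoblib==0.14.1\n")
        for name, pinned, found in find_mismatches(pins):
            print("%s: pinned %s, installed %s" % (name, pinned, found))
    ```

    Every mismatch reported this way is a package that an == pin would force pip to reinstall or that would conflict with another project's requirement, which is the risk described above.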

    The invocation of the current project includes all the following methods.

    The calling methods from the matplotlib
    matplotlib.use
    
    The calling methods from the joblib
    joblib.delayed
    joblib.Parallel
    
    The calling methods from the scikit-learn
    sklearn.tree.DecisionTreeRegressor
    sklearn.ensemble.RandomForestRegressor
    sklearn.linear_model.LinearRegression
    sklearn.linear_model.Ridge
    sklearn.ensemble.ExtraTreesRegressor
    sklearn.ensemble.AdaBoostClassifier
    sklearn.preprocessing.LabelEncoder
    joblib.delayed
    sklearn.ensemble.RandomForestClassifier
    sklearn.impute.SimpleImputer
    sklearn.preprocessing.LabelEncoder.fit_transform
    sklearn.tree.DecisionTreeClassifier
    sklearn.ensemble.BaggingClassifier
    sklearn.ensemble.AdaBoostRegressor
    sklearn.linear_model.LogisticRegression
    joblib.Parallel
    sklearn.ensemble.ExtraTreesClassifier
    sklearn.ensemble.BaggingRegressor
    sklearn.linear_model.Lasso
    sklearn.metrics.roc_auc_score
    sklearn.metrics.make_scorer
    
    The calling methods from the all methods
    self.__Lnum.df.fillna
    self.fit_transform
    x.col.self.__Enc.col.df.apply.tolist
    self.set_params
    readme_file.read
    col.df_train.apply
    setattr
    drift.DriftThreshold.get_support
    estimator.fit
    encoding.categorical_encoder.Categorical_encoder
    pandas.datetime
    i.col.x.get_embeddings.col.df.apply.tolist
    clf.predict_proba
    numpy.arange
    print
    y_train.drop.drop
    est.get_params.items
    tensorflow.keras.layers.Dense
    mlbox.preprocessing.Reader.train_test_split
    serie_to_df.hour.astype
    self.transform
    y.value_counts
    warnings.warn
    pandas.datetime.serie.pandas.DatetimeIndex.total_seconds
    pandas.Series.describe
    mlbox.optimisation.make_scorer
    self.get_estimator
    convert_list
    sklearn.ensemble.RandomForestRegressor.fit
    numpy.shape
    self.__cv.split
    classifier.Classifier
    col.df.apply
    self.clean
    model.regression.feature_selector.Reg_feature_selector
    self.__classifier.score
    serie.pandas.DatetimeIndex.dayofweek.astype
    tensorflow.keras.layers.concatenate
    pandas.read_csv
    pipe.append
    self.__regressor.predict
    len
    self.fit
    mlbox.preprocessing.Drift_thresholder
    tensorflow.keras.models.Model.get_weights
    self.__classifier.get_params.keys
    hyperopt.hp.choice
    self.__set_regressor
    pp.set_params.predict
    y_train.drop.apply
    matplotlib.pyplot.savefig
    pandas.read_json
    model.get_estimator.get_params.items
    selected_col.append
    lightgbm.LGBMRegressor
    tensorflow.keras.layers.Reshape
    df_train.drop_duplicates.keys
    path.split
    drift_estimator.DriftEstimator
    pandas.DataFrame.head
    sorted.remove
    col.df_train.dropna.unique
    dropout1.Dropout
    keepList.append
    hyperopt.hp.uniform
    y_train.pd.get_dummies.astype
    pandas.Series.value_counts
    time.time
    sklearn.metrics.roc_auc_score
    open.close
    zip
    d.copy
    sklearn.linear_model.LinearRegression
    sklearn.ensemble.ExtraTreesRegressor
    pandas.DatetimeIndex
    regressor.Regressor
    pandas.Series.nunique
    convert_float_and_dates.delayed
    tuples.dict.items
    col.df_train.dropna
    pandas.concat.to_hdf
    y.apply
    str
    ValueError
    version_file.read
    self.__K.values
    serie_to_df.dayofweek.astype
    self.__plot_feature_importances
    space.keys
    self.__classifier.get_params
    df_train.drop_duplicates.drop_duplicates
    numpy.exp
    p.startswith
    numpy.intersect1d
    range
    mock.Mock
    numpy.random.seed
    self.level_estimator.predict
    self.__regress_params.items
    params.keys
    numpy.abs
    sklearn.pipeline.Pipeline
    serie_to_df.second.astype
    self.__classif_params.items
    df.value_counts
    pandas.DataFrame
    sklearn.model_selection.cross_val_score
    serie_to_df.minute.astype
    sklearn.pipeline.Pipeline.fit
    self.level_estimator.predict_proba
    self.__regressor.fit
    reg.fit
    serie.pandas.DatetimeIndex.minute.astype
    filter
    y_train.drop.value_counts
    lightgbm.LGBMClassifier
    self.__regressor.transform
    mlbox.prediction.Predictor
    self.get_params
    tensorflow.keras.layers.Embedding
    est.get_estimator.get_params
    col.self.__K.Reshape
    os.mkdir
    drift.DriftThreshold.fit
    sklearn.model_selection.StratifiedKFold.split
    model.regression.regressor.Regressor.get_params
    pickle.load
    tensorflow.keras.layers.Dropout
    numpy.int
    sum
    model.regression.stacking_regressor.StackingRegressor
    reg.get_params
    pp.set_params.set_params
    numpy.sort
    sklearn.model_selection.cross_val_predict
    matplotlib.pyplot.yticks
    serie.apply.tolist
    pandas.concat.keys
    self.__classifier.predict
    fh.read.splitlines
    params.items
    reg.predict
    matplotlib.pyplot.barh
    params.update
    est.feature_importances.values
    pandas.DataFrame.idxmax
    encoding.na_encoder.NA_encoder.get_params
    list.x.type.serie.apply.sum
    pandas.SparseDataFrame
    model.classification.feature_selector.Clf_feature_selector
    convert_list.delayed
    self.__imp.transform
    self.__set_classifier
    self.__classifier.fit
    sklearn.linear_model.Ridge
    self.n_jobs.Parallel
    open.write
    operator.itemgetter
    ds.drifts.items
    dropList.append
    numpy.sum
    sorted
    sklearn.ensemble.ExtraTreesClassifier
    df_train.shape.df_train.isnull.sum.sort_values.max
    model.get_params.items
    serie.pandas.DatetimeIndex.second.astype
    drift_estimator.DriftEstimator.score
    self.get_estimator.estimator_weights_.sum
    mlbox.prediction.Predictor.fit_predict
    model.classification.stacking_classifier.StackingClassifier
    self.__Lcat.df.fillna
    col.df_train.nunique
    df_train.sample
    self.__regressor.score
    model.regression.feature_selector.Reg_feature_selector.get_params
    pandas.concat
    pandas.concat.values
    sklearn.metrics.SCORERS.keys
    matplotlib.pyplot.show
    sklearn.ensemble.BaggingClassifier
    model.get_estimator.get_params
    model.classification.classifier.Classifier
    S.append
    pp.set_params.fit
    stck.STCK.get_params.copy
    open
    est.get_estimator.get_params.items
    fh.read
    tensorflow.keras.models.Model.compile
    clf.fit
    max
    numpy.log
    sklearn.ensemble.AdaBoostClassifier
    sklearn.preprocessing.LabelEncoder
    importance_bag.append
    serie_to_df.month.astype
    int
    enumerate
    self.__cross_val_predict_proba
    get_embeddings
    self.__imp.fit
    df_train.shape.df_train.isnull.sum.sort_values
    sync_fit
    y_train.nunique.Dense
    sklearn.linear_model.LogisticRegression
    serie_to_df.day.astype
    sklearn.linear_model.Lasso
    min
    set
    df_test.sample
    df_train.drop_duplicates.isnull
    df_train.std
    numpy.random.shuffle
    hyperopt.fmin.items
    tensorflow.keras.layers.Input
    serie.pandas.DatetimeIndex.month.astype
    pandas.get_dummies
    pandas.to_datetime
    pandas.Series
    mlbox.optimisation.Optimiser.optimise
    self.__classifier.predict_proba
    sklearn.ensemble.RandomForestClassifier.fit
    self.level_estimator.fit
    sys.path.insert
    self.__regressor.get_params
    model.regression.regressor.Regressor.get_estimator
    serie.pandas.DatetimeIndex.hour.astype
    setuptools.setup
    df_test.index.nunique
    self.__Lcat.df_train.isnull
    sklearn.impute.SimpleImputer
    sklearn.preprocessing.LabelEncoder.fit_transform
    pandas.DataFrame.to_csv
    pandas.datetime.serie_to_df.total_seconds
    embeddings.append
    list
    col.df_train.unique
    self.__regressor.get_params.keys
    stck.STCK.get_params
    mlbox.optimisation.Optimiser
    encoding.na_encoder.NA_encoder
    pickle.dump
    Mock
    sklearn.ensemble.RandomForestRegressor
    col.self.__K.col.self.__Enc.len.Embedding
    self.get_params.keys
    sklearn.ensemble.RandomForestClassifier
    joblib.delayed
    tensorflow.keras.models.Model.fit
    df.drop
    df_train.isnull.sum
    numpy.zeros
    self.__Lcat.df_train.isnull.sum
    sys.modules.update
    serie.apply.apply
    copy.copy
    mlbox.preprocessing.Drift_thresholder.fit_transform
    df_train.drop_duplicates.to_hdf
    joblib.Parallel
    numpy.round
    tensorflow.keras.models.Model
    serie.pandas.DatetimeIndex.year.astype
    col.self.__Enc.keys
    sklearn.pipeline.Pipeline.transform
    df_train.drop_duplicates.values
    sklearn.model_selection.StratifiedKFold
    df_train.index.nunique
    sync_fit.delayed
    sklearn.tree.DecisionTreeRegressor
    pandas.read_hdf
    os.path.dirname
    self.clean.drop_duplicates
    drift.DriftThreshold.drifts
    df.name.pred.apply
    pandas.Series.values
    matplotlib.use
    serie.pandas.DatetimeIndex.day.astype
    pandas.read_excel
    sklearn.tree.DecisionTreeClassifier
    type
    dict
    drift_estimator.DriftEstimator.fit
    numpy.std
    numpy.mean
    sklearn.ensemble.BaggingRegressor
    clf.get_params
    os.getcwd
    estimator.predict_proba
    serie_to_df.year.astype
    self.level_estimator.get_params
    model.get_params
    matplotlib.pyplot.grid
    callable
    self.__save_feature_importances
    min.Dense
    mlbox.optimisation.Optimiser.evaluate
    model.regression.regressor.Regressor.feature_importances
    hyperopt.fmin
    drift.DriftThreshold
    sklearn.ensemble.AdaBoostRegressor
    matplotlib.pyplot.text
    y_train.nunique
    y_train.index.nunique
    matplotlib.pyplot.title
    stck.STCK.get_params.copy.keys
    pp.set_params.predict_proba
    sklearn.metrics.make_scorer
    matplotlib.pyplot.close
    model.regression.regressor.Regressor
    pickle.load.inverse_transform
    dropout2.Dropout
    drift.DriftThreshold.transform
    target_name.df.isnull
    var.df_train.nunique
    self.__classifier.predict_log_proba
    numpy.percentile
    sklearn.model_selection.KFold
    model.get_estimator
    int.Dense
    self.fit_transform.drop
    matplotlib.pyplot.figure
    df.apply
    self.evaluate
    is_null.df.drop
    p.split
    convert_float_and_dates
    mlbox.preprocessing.Reader
    col.df_train.mode
    getattr
    df_test.df_train.pd.concat.drop
    inputs.append
    

    @developer Could you please help me check this issue? May I open a pull request to fix it? Thank you very much.

    opened by PyDeps 0
  • Bump joblib from 0.14.1 to 1.2.0

    Bumps joblib from 0.14.1 to 1.2.0.

    Changelog

    Sourced from joblib's changelog.

    Release 1.2.0

    • Fix a security issue where eval(pre_dispatch) could potentially run arbitrary code. Now only basic numerics are supported. joblib/joblib#1327

    • Make sure that joblib works even when multiprocessing is not available, for instance with Pyodide joblib/joblib#1256

    • Avoid unnecessary warnings when workers and main process delete the temporary memmap folder contents concurrently. joblib/joblib#1263

    • Fix memory alignment bug for pickles containing numpy arrays. This is especially important when loading the pickle with mmap_mode != None as the resulting numpy.memmap object would not be able to correct the misalignment without performing a memory copy. This bug would cause invalid computation and segmentation faults with native code that would directly access the underlying data buffer of a numpy array, for instance C/C++/Cython code compiled with older GCC versions or some old OpenBLAS written in platform specific assembly. joblib/joblib#1254

    • Vendor cloudpickle 2.2.0 which adds support for PyPy 3.8+.

    • Vendor loky 3.3.0 which fixes several bugs including:

      • robustly forcibly terminating worker processes in case of a crash (joblib/joblib#1269);

      • avoiding leaking worker processes in case of nested loky parallel calls;

      • reliably spawn the correct number of reusable workers.

    Release 1.1.0

    • Fix byte order inconsistency issue during deserialization using joblib.load in cross-endian environment: the numpy arrays are now always loaded to use the system byte order, independently of the byte order of the system that serialized the pickle. joblib/joblib#1181

    • Fix joblib.Memory bug with the ignore parameter when the cached function is a decorated function.

    ... (truncated)

    Commits
    • 5991350 Release 1.2.0
    • 3fa2188 MAINT cleanup numpy warnings related to np.matrix in tests (#1340)
    • cea26ff CI test the future loky-3.3.0 branch (#1338)
    • 8aca6f4 MAINT: remove pytest.warns(None) warnings in pytest 7 (#1264)
    • 067ed4f XFAIL test_child_raises_parent_exits_cleanly with multiprocessing (#1339)
    • ac4ebd5 MAINT add back pytest warnings plugin (#1337)
    • a23427d Test child raises parent exits cleanly more reliable on macos (#1335)
    • ac09691 [MAINT] various test updates (#1334)
    • 4a314b1 Vendor loky 3.2.0 (#1333)
    • bdf47e9 Make test_parallel_with_interactively_defined_functions_default_backend timeo...
    • Additional commits viewable in compare view

    dependencies 
    opened by dependabot[bot] 0
  • Bump numpy from 1.18.2 to 1.22.0

    Bumps numpy from 1.18.2 to 1.22.0.

    Release notes

    Sourced from numpy's releases.

    v1.22.0

    NumPy 1.22.0 Release Notes

    NumPy 1.22.0 is a big release featuring the work of 153 contributors spread over 609 pull requests. There have been many improvements; the highlights are:

    • Annotations of the main namespace are essentially complete. Upstream is a moving target, so there will likely be further improvements, but the major work is done. This is probably the most user visible enhancement in this release.
    • A preliminary version of the proposed Array-API is provided. This is a step in creating a standard collection of functions that can be used across applications such as CuPy and JAX.
    • NumPy now has a DLPack backend. DLPack provides a common interchange format for array (tensor) data.
    • New methods for quantile, percentile, and related functions. The new methods provide a complete set of the methods commonly found in the literature.
    • A new configurable allocator for use by downstream projects.

    These are in addition to the ongoing work to provide SIMD support for commonly used functions, improvements to F2PY, and better documentation.

    The Python versions supported in this release are 3.8-3.10; Python 3.7 has been dropped. Note that 32-bit wheels are only provided for Python 3.8 and 3.9 on Windows; all other wheels are 64-bit on account of Ubuntu, Fedora, and other Linux distributions dropping 32-bit support. All 64-bit wheels are also linked with 64-bit integer OpenBLAS, which should fix the occasional problems encountered by folks using truly huge arrays.

    Expired deprecations

    Deprecated numeric style dtype strings have been removed

    Using the strings "Bytes0", "Datetime64", "Str0", "Uint32", and "Uint64" as a dtype will now raise a TypeError.

    (gh-19539)

    Expired deprecations for loads, ndfromtxt, and mafromtxt in npyio

    numpy.loads was deprecated in v1.15, with the recommendation that users use pickle.loads instead. ndfromtxt and mafromtxt were both deprecated in v1.17 - users should use numpy.genfromtxt instead with the appropriate value for the usemask parameter.

    (gh-19615)
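
For code that still calls the removed functions, the migration described in the note above might look like the following minimal sketch. It is not taken from the release notes: the sample array and the CSV text are made up for illustration, and only `pickle.loads` and `numpy.genfromtxt` (with its `usemask` parameter) are relied on.

```python
import io
import pickle

import numpy as np

# Replacement for the removed numpy.loads: plain pickle round-trips arrays.
arr = np.arange(6).reshape(2, 3)          # illustrative data, not from the source
restored = pickle.loads(pickle.dumps(arr))

# Illustrative CSV text with one missing field in the second row.
text = "1,2,3\n4,,6\n"

# Replacement for the removed ndfromtxt: genfromtxt with the default usemask=False.
plain = np.genfromtxt(io.StringIO(text), delimiter=",")

# Replacement for the removed mafromtxt: genfromtxt with usemask=True,
# which returns a masked array with the missing field masked.
masked = np.genfromtxt(io.StringIO(text), delimiter=",", usemask=True)

print(restored)
print(plain)
print(masked)
```

With the default `usemask=False` the missing field is read as `nan`; with `usemask=True` it is masked instead, which is the behaviour `mafromtxt` used to provide.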

    ... (truncated)

    dependencies 
    opened by dependabot[bot] 0
  • ModuleNotFoundError: No module named 'mlbox.preprocessing'

    Hi,

    Even after installing with !pip install mlbox, I get an error message when using mlbox.

    1. The package shows up in the installation list from !pip list:

    mlbox 0.8.5

    2. Error message:

    ModuleNotFoundError Traceback (most recent call last) in
          1 #https://mlbox.readthedocs.io/en/latest/index.html
          2 # importing the required libraries
    ----> 3 from mlbox.preprocessing import *
          4 from mlbox.optimisation import *
          5 from mlbox.prediction import *

    ModuleNotFoundError: No module named 'mlbox.preprocessing'
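
One way to narrow down an error like this is to ask Python where it would load `mlbox` from. This is a diagnostic sketch using only the standard library, not a confirmed fix. The helper name `describe_module` is hypothetical. A frequent cause of this kind of error in general is that the notebook kernel runs a different interpreter than the one `pip` installed into, or that a local file named `mlbox.py` shadows the package, but whether either applies here is an assumption.

```python
import importlib.util
import sys


def describe_module(name):
    """Return where `name` would be imported from, or None if it cannot be found.

    Hypothetical helper for illustration; it is not part of mlbox.
    """
    try:
        spec = importlib.util.find_spec(name)
    except ImportError:
        # Raised when the parent package is missing or is not a package
        # (for example, a stray local file named mlbox.py).
        return None
    if spec is None:
        return None
    # Regular modules have an origin; namespace packages only have search paths.
    return spec.origin or str(list(spec.submodule_search_locations or []))


# The interpreter running this code; compare it with the one pip installed into.
print("interpreter:", sys.executable)
for name in ("mlbox", "mlbox.preprocessing"):
    print(name, "->", describe_module(name))
```

If `mlbox` resolves to a path outside `site-packages`, or `mlbox.preprocessing` prints `None` while `mlbox` is found, the import is probably picking up something other than the installed package.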

    opened by mrajkumar18 0
Releases(v0.8.1)
Visualize classified time series data with interactive Sankey plots in Google Earth Engine

sankee Visualize changes in classified time series data with interactive Sankey plots in Google Earth Engine Contents Description Installation Using P

Aaron Zuspan 76 Dec 15, 2022
Implemented four supervised learning Machine Learning algorithms

Implemented four supervised learning Machine Learning algorithms from an algorithmic family called Classification and Regression Trees (CARTs); for details, see README_Report.

Teng (Elijah) Xue 0 Jan 31, 2022
Cryptocurrency price prediction and exceptions in python

Cryptocurrency price prediction and exceptions in Python. This is coursework for a foundations of computing module. Through this coursework I worked on m

Panagiotis Sotirellos 1 Nov 07, 2021
A Multipurpose Library for Synthetic Time Series Generation in Python

TimeSynth Multipurpose Library for Synthetic Time Series Please cite as: J. R. Maat, A. Malali, and P. Protopapas, “TimeSynth: A Multipurpose Library

278 Dec 26, 2022
Project: Machine Learning: Programming Languages 2004-2001

Project: Machine Learning: Programming Languages 2004-2001. A Data Science and Machine Learning project analyzing programming languages from 2

Victor Hugo Negrisoli 0 Jun 29, 2021
Using Logistic Regression and classifiers of the dataset to produce an accurate recall, f-1 and precision score

Using Logistic Regression and classifiers of the dataset to produce an accurate recall, f-1 and precision score

Thines Kumar 1 Jan 31, 2022
Fundamentals of Machine Learning

Fundamentals-of-Machine-Learning This repository introduces the basics of machine learning algorithms for preprocessing, regression and classification

Happy N. Monday 3 Feb 15, 2022
Kaggle Tweet Sentiment Extraction Competition: 1st place solution (Dark of the Moon team)

Kaggle Tweet Sentiment Extraction Competition: 1st place solution (Dark of the Moon team)

Artsem Zhyvalkouski 64 Nov 30, 2022
Responsible Machine Learning with Python

Examples of techniques for training interpretable ML models, explaining ML models, and debugging ML models for accuracy, discrimination, and security.

ph_ 624 Jan 06, 2023
Napari sklearn decomposition

napari-sklearn-decomposition A simple plugin to use with napari This napari plug

1 Sep 01, 2022
Machine-learning-dell - Repository with the activities developed in the Machine Learning course

📚 Description In this Dell course we deepen our knowledge of Machine Learning. 🖥️ Classes (in progress) 1.1 - Python applied to Data Science 1.2

Claudia dos Anjos 1 Jan 05, 2022
Deploy AutoML as a service using Flask

AutoML Service Deploy automated machine learning (AutoML) as a service using Flask, for both pipeline training and pipeline serving. The framework imp

Chris Rawles 221 Nov 04, 2022
ArviZ is a Python package for exploratory analysis of Bayesian models

ArviZ (pronounced "AR-vees") is a Python package for exploratory analysis of Bayesian models. Includes functions for posterior analysis, data storage, model checking, comparison and diagnostics

ArviZ 1.3k Jan 05, 2023
This machine learning model was developed for House Prices

This machine learning model was developed for House Prices - Advanced Regression Techniques competition in Kaggle by using several machine learning models such as Random Forest, XGBoost and LightGBM.

serhat_derya 1 Mar 02, 2022
Reggy - Regressions with arbitrarily complex regularization terms

reggy Regressions with arbitrarily complex regularization terms. Currently suppo

Kim 1 Jan 20, 2022
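Webshell detection with machine learning

ai-webshell-detect Webshell detection with machine learning, using a TextCNN plus a simple binary classification network, based on Keras; built in seven days. Detection principle: features are extracted from file entropy, file length, and file statements; the entropy and length are fed into the binary classification network, and the statements are fed into the TextCNN. Project principles, introduction, and how it was built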

Huoji's 56 Dec 14, 2022
Scikit learn library models to account for data and concept drift.

liquid_scikit_learn Scikit learn library models to account for data and concept drift. This python library focuses on solving data drift and concept d

7 Nov 18, 2021
ML Kaggle Titanic Problem using LogisticRegrission

-ML-Kaggle-Titanic-Problem-using-LogisticRegrission here you will find the solution for the titanic problem on kaggle with comments and step by step c

Mahmoud Nasser Abdulhamed 3 Oct 23, 2022
Pytools is an open source library containing general machine learning and visualisation utilities for reuse

pytools is an open source library containing general machine learning and visualisation utilities for reuse, including: Basic tools for API developmen

BCG Gamma 26 Nov 06, 2022
LILLIE: Information Extraction and Database Integration Using Linguistics and Learning-Based Algorithms

LILLIE: Information Extraction and Database Integration Using Linguistics and Learning-Based Algorithms Based on the work by Smith et al. (2021) Query

5 Aug 06, 2022