MLBox is a powerful Automated Machine Learning Python library.

Last update: Jan 06, 2023

Overview

MLBox is a powerful Automated Machine Learning Python library. It provides the following features:

Fast reading and distributed data preprocessing/cleaning/formatting
Highly robust feature selection and leak detection
Accurate hyper-parameter optimization in high-dimensional space
State-of-the-art predictive models for classification and regression (Deep Learning, Stacking, LightGBM, ...)
Prediction with model interpretation

For more details, please refer to the official documentation.

How to Contribute

MLBox has been developed and used by many active community members. Your help is very valuable to make it better for everyone.

Check out the call for contributions to see what can be improved, or open an issue if you want something.
Contribute to the tests to make it more reliable.
Contribute to the documentation to make it clearer for everyone.
Contribute to the examples to share your experience with other users.
Open an issue if you run into problems during development.

For more details, please refer to CONTRIBUTING.

Comments

Trying to install and getting xgboost errors
The system is a Kaggle kernel, which is Ubuntu-based and seems to be the desired environment.

I ran this:

!apt-get install build-essential
!pip install cmake
!pip install xgboost>=0.6a2
!pip install lightgbm>=2.0.2
!pip install mlbox

Resulting in this:

Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-install-xib6_1h7/xgboost/

Can you please help me out? I see your examples are also Kaggle based, but they don't have the install steps. Do you somehow install packages from setup within the kernel???
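
One detail in the commands above is worth checking, although it may be unrelated to the egg_info failure: in a POSIX shell an unquoted > is treated as output redirection, so a line such as pip install xgboost>=0.6a2 can end up installing an unpinned xgboost and writing pip's output to a file named =0.6a2. A minimal sketch of one way to sidestep this from Python is to pass the requirement strings as an argument list so that no shell parses them. The helper names pip_install_command and pip_install are hypothetical.

```python
import subprocess
import sys


def pip_install_command(requirements):
    """Build a pip command as an argument list so no shell parses the specifiers."""
    return [sys.executable, "-m", "pip", "install", *requirements]


def pip_install(requirements):
    """Run pip in a subprocess; raises CalledProcessError if pip fails."""
    subprocess.run(pip_install_command(requirements), check=True)


if __name__ == "__main__":
    # Version specifiers copied from the commands in the report above.
    print(pip_install_command(["xgboost>=0.6a2", "lightgbm>=2.0.2", "mlbox"]))
```

Quoting each specifier in the notebook cell (for example !pip install "xgboost>=0.6a2") should have the same effect.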
opened by TimusLetap 19
Setup script exited with usage: setup.py [global_opts] error: no commands supplied

Hi AxeldeRomblay,

System information: Ubuntu 16.04

I was trying to install MLBox to give it a try. The steps I took were:

1. Clone the repository.
2. Run the setup.py file.

The build runs to 100% but then throws a couple of errors. I am posting the actual error below.

[100%] Linking CXX shared library /tmp/easy_install-1gz4zwpu/lightgbm-2.0.2/lightgbm/lib_lightgbm.so
[100%] Built target _lightgbm
Install lib_lightgbm from: ['lightgbm/lib_lightgbm.so']
error: Setup script exited with usage: setup.py [global_opts] cmd1 [cmd1_opts] [cmd2 [cmd2_opts] ...]
   or: setup.py --help [cmd1 cmd2 ...]
   or: setup.py --help-commands
   or: setup.py cmd --help

error: no commands supplied.

opened by NeerajSarwan 15
Install fails. Can't tell if it is MLBox or XGBoost that doesn't work
Hi Axel,

We're trying to install MLBox and get the following error:

Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-build-ob6fq6jv/xgboost/
Error installing the following packages: ['xgboost==0.6a2']
Please install them manually

Given that xgboost is installed and works, I suppose two issues could cause the error:

1. The version of xgboost is too old. We're running 0.6.
2. MLBox can't find the installed xgboost version.

Indeed, MLBox seems to try to install xgboost 0.6a2 even though a version is already installed, which is surprising.

Maybe it is me. Thank you for your help.
opened by brcacrm 13
Hub connection request timed out

I tried to run the code in the picture below, but I got the error TimeoutError: Hub connection request timed out. I'm using Python 2.7 on Ubuntu 16.04.

Thanks for your help

opened by ilyes495 10
Cleaning takes too long on multi-core CPUs
Cleaning takes 276 s for the house price dataset on an Intel E5-2683 v3. As the E5-2683 v3 has 14 cores and 28 threads, I guess the problem may be caused by n_jobs=-1 here:

`
if (self.verbose):
    print("cleaning data ...")

df = pd.concat(Parallel(n_jobs=-1)(delayed(convert_list)(df[col]) for col in df.columns), axis=1)
df = pd.concat(Parallel(n_jobs=-1)(delayed(convert_float_and_dates)(df[col]) for col in df.columns), axis=1)
`

I don't know how to fix it. Maybe add an n_jobs argument to the Reader class? Looking forward to your response. Thank you.
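
The suggestion above could look roughly like the following sketch. It is not MLBox's Reader: ColumnCleaner, convert_column and resolve_n_jobs are hypothetical names, and the standard library's ThreadPoolExecutor stands in for joblib's Parallel so that the example has no third-party dependencies. The handling of negative values follows joblib's documented convention (-1 means all CPUs, -2 all but one).

```python
import os
from concurrent.futures import ThreadPoolExecutor


def resolve_n_jobs(n_jobs):
    """Map a joblib-style n_jobs value to a positive worker count."""
    if n_jobs == 0:
        raise ValueError("n_jobs must be non-zero")
    cpu_count = os.cpu_count() or 1
    if n_jobs < 0:
        # joblib convention: -1 -> all CPUs, -2 -> all but one, ...
        return max(1, cpu_count + 1 + n_jobs)
    return n_jobs


def convert_column(values):
    """Hypothetical per-column cleaner: cast numeric-looking entries to float."""
    cleaned = []
    for value in values:
        try:
            cleaned.append(float(value))
        except (TypeError, ValueError):
            cleaned.append(value)  # leave non-numeric entries untouched
    return cleaned


class ColumnCleaner:
    """Hypothetical reader-like class with a user-controlled n_jobs."""

    def __init__(self, n_jobs=1):
        self.n_jobs = n_jobs

    def clean(self, columns):
        """Clean a dict mapping column names to lists of raw values."""
        names = list(columns)
        workers = resolve_n_jobs(self.n_jobs)
        with ThreadPoolExecutor(max_workers=workers) as pool:
            cleaned = list(pool.map(convert_column, (columns[n] for n in names)))
        return dict(zip(names, cleaned))


raw = {"price": ["1", "2.5"], "city": ["Paris", None]}
print(ColumnCleaner(n_jobs=2).clean(raw))
```

With a constructor argument like this, a user on a many-core machine could try a small worker count and compare timings, rather than always paying the start-up cost of one worker per core.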
opened by a1a2y3 8
FYI: ColumnTransformer

We'll have a ColumnTransformer in sklearn pretty soon that will make it easier to treat different columns differently. That should make it much simpler to have different pipelines for categorical and continuous data, which seems to be one of the big issues MLBox addresses.
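
For reference, ColumnTransformer is available as sklearn.compose.ColumnTransformer in scikit-learn 0.20 and later. A minimal sketch of the idea described above, assuming a toy array whose first two columns are numeric and whose third is categorical (the data and the step names "num" and "cat" are made up for illustration):

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy data: age, income, city.
X = np.array(
    [[25, 50000.0, "paris"],
     [35, 60000.0, "lyon"],
     [45, 70000.0, "paris"]],
    dtype=object,
)

preprocess = ColumnTransformer(
    transformers=[
        ("num", StandardScaler(), [0, 1]),                     # continuous columns
        ("cat", OneHotEncoder(handle_unknown="ignore"), [2]),  # categorical column
    ],
    sparse_threshold=0,  # always return a dense array
)

Xt = preprocess.fit_transform(X)
print(Xt.shape)
```

Each transformer only sees the columns assigned to it, and the outputs are concatenated side by side, so the numeric and categorical branches can evolve independently.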

opened by amueller 8
Code implementation frozen

Hello,

I tried implementing the code in https://www.analyticsvidhya.com/blog/2017/07/mlbox-library-automated-machine-learning/. The engines start but the code implementation is frozen (still running but no task is done). I get the following message on my screen:

I tried adding time.sleep, but it doesn't change anything. I'm on Windows 10 Pro, Python 3.5 with Anaconda. Do you have any idea why?

opened by yousseferahim 7
Testing with Predicting Blood Donation challenge

Hi, I am doing some tests with this challenge: https://www.drivendata.org/competitions/2/warm-up-predict-blood-donations/

With minimal understanding I rank around 700 out of 2400! I still have some questions to document: how to get feature importances, and how to set up stacking.

Rgds Bruno Seznec

opened by brunosez 7
TypeError: 'generator' object is not subscriptable
When running in a Python 3.6 environment in a Jupyter notebook on Ubuntu 14.04, I get the following:

`
from mlbox.preprocessing import *
from mlbox.optimisation import *
from mlbox.prediction import *

paths = ["train.csv", "test.csv"]
target_name = "target"

data = Reader(sep=",").train_test_split(paths, target_name)  # reading

space = {
    'ne__numerical_strategy': {"space": [0, 'mean']},
    'ce__strategy': {"space": ["label_encoding", "random_projection", "entity_embedding"]},
    'fs__strategy': {"space": ["variance", "rf_feature_importance"]},
    'fs__threshold': {"search": "choice", "space": [0.1, 0.2, 0.3]},
    'est__strategy': {"space": ["XGBoost"]},
    'est__max_depth': {"search": "choice", "space": [5, 6]},
    'est__subsample': {"search": "uniform", "space": [0.6, 0.9]}
}

opt = Optimiser(scoring = 'roc_auc', n_folds = 4)

best = opt.optimise(space, data, max_evals = 5)
`

`
TypeError Traceback (most recent call last)
in ()
     16 opt = Optimiser(scoring = 'roc_auc', n_folds = 4)
     17
---> 18 best = opt.optimise(space, data, max_evals = 5)
     19

~/anaconda2/envs/insurance_v2/lib/python3.6/site-packages/mlbox/optimisation/optimiser.py in optimise(self, space, df, max_evals)
    565 space=hyper_space,
    566 algo=tpe.suggest,
--> 567 max_evals=max_evals)
    568
    569 # Displaying best_params

~/anaconda2/envs/insurance_v2/lib/python3.6/site-packages/hyperopt/fmin.py in fmin(fn, space, algo, max_evals, trials, rstate, allow_trials_fmin, pass_expr_memo_ctrl, catch_eval_exceptions, verbose, return_argmin)
    312
    313 domain = base.Domain(fn, space,
--> 314 pass_expr_memo_ctrl=pass_expr_memo_ctrl)
    315
    316 rval = FMinIter(algo, domain, trials, max_evals=max_evals,

~/anaconda2/envs/insurance_v2/lib/python3.6/site-packages/hyperopt/base.py in __init__(self, fn, expr, workdir, pass_expr_memo_ctrl, name, loss_target)
    784 before = pyll.dfs(self.expr)
    785 # -- raises exception if expr contains cycles
--> 786 pyll.toposort(self.expr)
    787 vh = self.vh = VectorizeHelper(self.expr, self.s_new_ids)
    788 # -- raises exception if v_expr contains cycles

~/anaconda2/envs/insurance_v2/lib/python3.6/site-packages/hyperopt/pyll/base.py in toposort(expr)
    713 G.add_edges_from([(n_in, node) for n_in in node.inputs()])
    714 order = nx.topological_sort(G)
--> 715 assert order[-1] == expr
    716 return order
    717
`
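
The last frame shows where the error comes from: order is the value returned by nx.topological_sort(G), and the error in the issue title says it is a generator, which cannot be indexed with order[-1]. Below is a minimal standalone sketch of the same failure, and of the usual remedy of materialising the generator into a list first. The function topological_order is a hypothetical stand-in and does not call networkx or hyperopt.

```python
def topological_order(nodes):
    """Hypothetical stand-in that yields nodes lazily, like a generator-based API."""
    for node in nodes:
        yield node


order = topological_order(["a", "b", "expr"])

try:
    order[-1]  # indexing a generator raises TypeError
except TypeError as exc:
    message = str(exc)
    print(message)

# Materialising the generator into a list makes indexing possible.
order = list(topological_order(["a", "b", "expr"]))
last = order[-1]
print(last)
```

Since the failing line lives inside hyperopt rather than in the user's code, this likely points to a version mismatch between the installed hyperopt and networkx rather than to a problem with the search space itself.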
opened by NickBuchny 6
Redundant results and model overfitting
We are getting the same results irrespective of the number of max_evals and of seed changes.

We have increased n_folds and also reduced max_evals to see if we get different results. For any combination of parameter settings we get the same results. I think the model is overfitting during training. Is there any other way to check for this and stop it, to get better results?

In our use case we have not used the ne, ce, and fs params in the 'space' settings. Is there a way to use stacking regression without these? We are not able to resolve the errors we get when using stacking with only the params related to algorithm selection in the regression strategy.

I will be grateful if you can help me resolve the above issues. Thanks.
opened by mahatibharadwaj 5

Error while computing the cross validation mean score.

Hi,

I am interested in MLBox and tried it on a Kaggle classification project. At the hyperparameter optimisation step, the following error message was shown: 'An error occurred while computing the cross validation mean score. Check the parameter values and your scoring function.'

Here's the code I used:

`
paths = ['train_path', 'test_path']
target_name = 'target_name'

rd = Reader(sep = ",")
df = rd.train_test_split(paths, target_name)

dft = Drift_thresholder()
df = dft.fit_transform(df)

space = {'ne__numerical_strategy':{"search":"choice", "space":['mean','median']},

     'ne__categorical_strategy':{"search":"choice",
                                 "space":[np.NaN]},
     
     'ce__strategy':{"search":"choice",
                     "space":['label_encoding','entity_embedding','random_projection']},
     
    'est__strategy':{"search":"choice",
                              "space":["LightGBM"]},    
    'est__n_estimators':{"search":"choice",
                              "space":[150]},    
    'est__colsample_bytree':{"search":"uniform",
                              "space":[0.8,0.95]},
    'est__subsample':{"search":"uniform",
                              "space":[0.8,0.95]},
    'est__max_depth':{"search":"choice",
                              "space":[5,6,7,8,9]},
    'est__learning_rate':{"search":"choice",
                              "space":[0.07]} 

    }

opt = Optimiser(scoring = "roc_auc", n_folds = 5)
best_params = opt.optimise(space, df, 15)
`

Can you help me fix it? Thanks!
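
For readers unfamiliar with the search-space format used in the report above, each entry maps a parameter name to a dict with a "space" list and an optional "search" type. The following sketch draws one configuration from such a dict using only the standard library, which can be a quick way to eyeball the values a space can produce. It illustrates the format only and is not MLBox's optimiser: sample_space is a hypothetical helper, and treating a missing "search" key as "choice" is an assumption based on the examples on this page.

```python
import random


def sample_space(space, rng=None):
    """Draw one configuration from an MLBox-style search-space dict."""
    rng = rng or random.Random(0)
    config = {}
    for name, spec in space.items():
        search = spec.get("search", "choice")  # assumption: default is "choice"
        values = spec["space"]
        if search == "choice":
            config[name] = rng.choice(values)
        elif search == "uniform":
            low, high = values  # "uniform" expects [low, high]
            config[name] = rng.uniform(low, high)
        else:
            raise ValueError(f"unknown search type for {name!r}: {search!r}")
    return config


space = {
    "est__strategy": {"search": "choice", "space": ["LightGBM"]},
    "est__subsample": {"search": "uniform", "space": [0.8, 0.95]},
    "est__max_depth": {"search": "choice", "space": [5, 6, 7, 8, 9]},
}

config = sample_space(space)
print(config)
```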

opened by YAOLI0407 5

Import error while using MLBox inside Google Colab
I'm facing an abrupt import error and am not able to figure out why it occurs.

Here is the problem:

Google Colab discards the latest versions of MLBox due to dependency failures
It automatically downgrades the dependencies
It can't fulfill all the import * actions
It throws a type error when manually installing the new MLBox version

Screenshot of the issue:

Sklearn version: 1.0.2
MLBox version: 0.5.1

Please look into it and help me resolve this issue.
opened by prathikshetty2002 0
Bump tensorflow from 2.0.0 to 2.9.3
Bumps tensorflow from 2.0.0 to 2.9.3.

Release notes

Sourced from tensorflow's releases.

TensorFlow 2.9.3

Release 2.9.3

This release introduces several vulnerability fixes:

Fixes an overflow in tf.keras.losses.poisson (CVE-2022-41887)

Fixes a heap OOB failure in ThreadUnsafeUnigramCandidateSampler caused by missing validation (CVE-2022-41880)

Fixes a segfault in ndarray_tensor_bridge (CVE-2022-41884)

Fixes an overflow in FusedResizeAndPadConv2D (CVE-2022-41885)

Fixes an overflow in ImageProjectiveTransformV2 (CVE-2022-41886)

Fixes an FPE in tf.image.generate_bounding_box_proposals on GPU (CVE-2022-41888)

Fixes a segfault in pywrap_tfe_src caused by invalid attributes (CVE-2022-41889)

Fixes a CHECK fail in BCast (CVE-2022-41890)

Fixes a segfault in TensorListConcat (CVE-2022-41891)

Fixes a CHECK_EQ fail in TensorListResize (CVE-2022-41893)

Fixes an overflow in CONV_3D_TRANSPOSE on TFLite (CVE-2022-41894)

Fixes a heap OOB in MirrorPadGrad (CVE-2022-41895)

Fixes a crash in Mfcc (CVE-2022-41896)

Fixes a heap OOB in FractionalMaxPoolGrad (CVE-2022-41897)

Fixes a CHECK fail in SparseFillEmptyRowsGrad (CVE-2022-41898)

Fixes a CHECK fail in SdcaOptimizer (CVE-2022-41899)

Fixes a heap OOB in FractionalAvgPool and FractionalMaxPool (CVE-2022-41900)

Fixes a CHECK_EQ in SparseMatrixNNZ (CVE-2022-41901)

Fixes an OOB write in grappler (CVE-2022-41902)

Fixes an overflow in ResizeNearestNeighborGrad (CVE-2022-41907)

Fixes a CHECK fail in PyFunc (CVE-2022-41908)

Fixes a segfault in CompositeTensorVariantToComponents (CVE-2022-41909)

Fixes an invalid char to bool conversion in printing a tensor (CVE-2022-41911)

Fixes a heap overflow in QuantizeAndDequantizeV2 (CVE-2022-41910)

Fixes a CHECK failure in SobolSample via missing validation (CVE-2022-35935)

Fixes a CHECK fail in TensorListScatter and TensorListScatterV2 in eager mode (CVE-2022-35935)

TensorFlow 2.9.2

Release 2.9.2

This release introduces several vulnerability fixes:

Fixes a CHECK failure in tf.reshape caused by overflows (CVE-2022-35934)

Fixes a CHECK failure in SobolSample caused by missing validation (CVE-2022-35935)

Fixes an OOB read in Gather_nd op in TF Lite (CVE-2022-35937)

Fixes a CHECK failure in TensorListReserve caused by missing validation (CVE-2022-35960)

Fixes an OOB write in Scatter_nd op in TF Lite (CVE-2022-35939)

Fixes an integer overflow in RaggedRangeOp (CVE-2022-35940)

Fixes a CHECK failure in AvgPoolOp (CVE-2022-35941)

Fixes a CHECK failure in UnbatchGradOp (CVE-2022-35952)

Fixes a segfault in the TFLite converter on per-channel quantized transposed convolutions (CVE-2022-36027)

Fixes a CHECK failure in AvgPool3DGrad (CVE-2022-35959)

Fixes a CHECK failure in FractionalAvgPoolGrad (CVE-2022-35963)

Fixes a segfault in BlockLSTMGradV2 (CVE-2022-35964)

Fixes a segfault in LowerBound and UpperBound (CVE-2022-35965)

... (truncated)

Changelog

Sourced from tensorflow's changelog.

Release 2.8.4

This release introduces several vulnerability fixes:

Fixes a heap OOB failure in ThreadUnsafeUnigramCandidateSampler caused by missing validation (CVE-2022-41880)

Fixes a segfault in ndarray_tensor_bridge (CVE-2022-41884)

Fixes an overflow in FusedResizeAndPadConv2D (CVE-2022-41885)

Fixes an overflow in ImageProjectiveTransformV2 (CVE-2022-41886)

Fixes an FPE in tf.image.generate_bounding_box_proposals on GPU (CVE-2022-41888)

Fixes a segfault in pywrap_tfe_src caused by invalid attributes (CVE-2022-41889)

Fixes a CHECK fail in BCast (CVE-2022-41890)

Fixes a segfault in TensorListConcat (CVE-2022-41891)

Fixes a CHECK_EQ fail in TensorListResize (CVE-2022-41893)

Fixes an overflow in CONV_3D_TRANSPOSE on TFLite (CVE-2022-41894)

Fixes a heap OOB in MirrorPadGrad (CVE-2022-41895)

Fixes a crash in Mfcc (CVE-2022-41896)

Fixes a heap OOB in FractionalMaxPoolGrad (CVE-2022-41897)

Fixes a CHECK fail in SparseFillEmptyRowsGrad (CVE-2022-41898)

Fixes a CHECK fail in SdcaOptimizer (CVE-2022-41899)

... (truncated)

Commits

a5ed5f3 Merge pull request #58584 from tensorflow/vinila21-patch-2

258f9a1 Update py_func.cc

cd27cfb Merge pull request #58580 from tensorflow-jenkins/version-numbers-2.9.3-24474

3e75385 Update version numbers to 2.9.3

bc72c39 Merge pull request #58482 from tensorflow-jenkins/relnotes-2.9.3-25695

3506c90 Update RELEASE.md

8dcb48e Update RELEASE.md

4f34ec8 Merge pull request #58576 from pak-laura/c2.99f03a9d3bafe902c1e6beb105b2f2417...

6fc67e4 Replace CHECK with returning an InternalError on failing to create python tuple

5dbe90a Merge pull request #58570 from tensorflow/r2.9-7b174a0f2e4

Additional commits viewable in compare view

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.

Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

@dependabot rebase will rebase this PR

@dependabot recreate will recreate this PR, overwriting any edits that have been made to it

@dependabot merge will merge this PR after your CI passes on it

@dependabot squash and merge will squash and merge this PR after your CI passes on it

@dependabot cancel merge will cancel a previously requested merge and block automerging

@dependabot reopen will reopen this PR if it is closed

@dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually

@dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)

@dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)

@dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

@dependabot use these labels will set the current labels as the default for future PRs for this repo and language

@dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language

@dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language

@dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

You can disable automated security fix PRs for this repo from the Security Alerts page.

dependencies
opened by dependabot[bot] 0

Project dependencies may have API risk issues

Hi, In MLBox, inappropriate dependency versioning constraints can cause risks.

Below are the dependencies and version constraints that the project is using

numpy==1.18.2
scipy==1.4.1
matplotlib==3.0.3
hyperopt==0.2.3
pandas==0.25.3
joblib==0.14.1
scikit-learn==0.22.1
tensorflow==2.0.0
lightgbm==2.3.1
tables==3.5.2
xlrd==1.2.0

The == version constraint introduces a risk of dependency conflicts because the allowed range is too strict. Constraints with no upper bound, or *, introduce a risk of missing-API errors because the latest version of a dependency may remove some APIs.

After further analysis of this project:

The version constraint of matplotlib can be changed to >=1.3.0,<=3.0.3.
The version constraint of joblib can be changed to ==0.7.0d.
The version constraint of joblib can be changed to >=0.3.6.dev,<=1.1.0.
The version constraint of scikit-learn can be changed to >=0.20rc1,<=0.20.4.

These suggested changes reduce dependency conflicts as much as possible and allow the latest versions that can be used without call errors in the project.
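
To make the difference between an exact pin and a range concrete, the sketch below checks a few version strings against the matplotlib constraints mentioned above. It assumes the third-party packaging library is installed. The candidate versions are chosen arbitrarily for illustration.

```python
from packaging.specifiers import SpecifierSet

pinned = SpecifierSet("==3.0.3")          # current exact pin
ranged = SpecifierSet(">=1.3.0,<=3.0.3")  # suggested range

candidates = ["2.2.5", "3.0.3", "3.1.0"]

# Versions accepted by each constraint.
accepted_by_pin = [v for v in candidates if pinned.contains(v)]
accepted_by_range = [v for v in candidates if ranged.contains(v)]

print(accepted_by_pin)
print(accepted_by_range)
```

The pin accepts a single version, so any other package that needs a different matplotlib conflicts with it. The range accepts every release between its bounds while still excluding newer versions.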

The invocation of the current project includes all the following methods.

Methods called from matplotlib

matplotlib.use

Methods called from joblib

joblib.delayed
joblib.Parallel

Methods called from scikit-learn

sklearn.tree.DecisionTreeRegressor
sklearn.ensemble.RandomForestRegressor
sklearn.linear_model.LinearRegression
sklearn.linear_model.Ridge
sklearn.ensemble.ExtraTreesRegressor
sklearn.ensemble.AdaBoostClassifier
sklearn.preprocessing.LabelEncoder
joblib.delayed
sklearn.ensemble.RandomForestClassifier
sklearn.impute.SimpleImputer
sklearn.preprocessing.LabelEncoder.fit_transform
sklearn.tree.DecisionTreeClassifier
sklearn.ensemble.BaggingClassifier
sklearn.ensemble.AdaBoostRegressor
sklearn.linear_model.LogisticRegression
joblib.Parallel
sklearn.ensemble.ExtraTreesClassifier
sklearn.ensemble.BaggingRegressor
sklearn.linear_model.Lasso
sklearn.metrics.roc_auc_score
sklearn.metrics.make_scorer

All called methods

self.__Lnum.df.fillna
self.fit_transform
x.col.self.__Enc.col.df.apply.tolist
self.set_params
readme_file.read
col.df_train.apply
setattr
drift.DriftThreshold.get_support
estimator.fit
encoding.categorical_encoder.Categorical_encoder
pandas.datetime
i.col.x.get_embeddings.col.df.apply.tolist
clf.predict_proba
numpy.arange
print
y_train.drop.drop
est.get_params.items
tensorflow.keras.layers.Dense
mlbox.preprocessing.Reader.train_test_split
serie_to_df.hour.astype
self.transform
y.value_counts
warnings.warn
pandas.datetime.serie.pandas.DatetimeIndex.total_seconds
pandas.Series.describe
mlbox.optimisation.make_scorer
self.get_estimator
convert_list
sklearn.ensemble.RandomForestRegressor.fit
numpy.shape
self.__cv.split
classifier.Classifier
col.df.apply
self.clean
model.regression.feature_selector.Reg_feature_selector
self.__classifier.score
serie.pandas.DatetimeIndex.dayofweek.astype
tensorflow.keras.layers.concatenate
pandas.read_csv
pipe.append
self.__regressor.predict
len
self.fit
mlbox.preprocessing.Drift_thresholder
tensorflow.keras.models.Model.get_weights
self.__classifier.get_params.keys
hyperopt.hp.choice
self.__set_regressor
pp.set_params.predict
y_train.drop.apply
matplotlib.pyplot.savefig
pandas.read_json
model.get_estimator.get_params.items
selected_col.append
lightgbm.LGBMRegressor
tensorflow.keras.layers.Reshape
df_train.drop_duplicates.keys
path.split
drift_estimator.DriftEstimator
pandas.DataFrame.head
sorted.remove
col.df_train.dropna.unique
dropout1.Dropout
keepList.append
hyperopt.hp.uniform
y_train.pd.get_dummies.astype
pandas.Series.value_counts
time.time
sklearn.metrics.roc_auc_score
open.close
zip
d.copy
sklearn.linear_model.LinearRegression
sklearn.ensemble.ExtraTreesRegressor
pandas.DatetimeIndex
regressor.Regressor
pandas.Series.nunique
convert_float_and_dates.delayed
tuples.dict.items
col.df_train.dropna
pandas.concat.to_hdf
y.apply
str
ValueError
version_file.read
self.__K.values
serie_to_df.dayofweek.astype
self.__plot_feature_importances
space.keys
self.__classifier.get_params
df_train.drop_duplicates.drop_duplicates
numpy.exp
p.startswith
numpy.intersect1d
range
mock.Mock
numpy.random.seed
self.level_estimator.predict
self.__regress_params.items
params.keys
numpy.abs
sklearn.pipeline.Pipeline
serie_to_df.second.astype
self.__classif_params.items
df.value_counts
pandas.DataFrame
sklearn.model_selection.cross_val_score
serie_to_df.minute.astype
sklearn.pipeline.Pipeline.fit
self.level_estimator.predict_proba
self.__regressor.fit
reg.fit
serie.pandas.DatetimeIndex.minute.astype
filter
y_train.drop.value_counts
lightgbm.LGBMClassifier
self.__regressor.transform
mlbox.prediction.Predictor
self.get_params
tensorflow.keras.layers.Embedding
est.get_estimator.get_params
col.self.__K.Reshape
os.mkdir
drift.DriftThreshold.fit
sklearn.model_selection.StratifiedKFold.split
model.regression.regressor.Regressor.get_params
pickle.load
tensorflow.keras.layers.Dropout
numpy.int
sum
model.regression.stacking_regressor.StackingRegressor
reg.get_params
pp.set_params.set_params
numpy.sort
sklearn.model_selection.cross_val_predict
matplotlib.pyplot.yticks
serie.apply.tolist
pandas.concat.keys
self.__classifier.predict
fh.read.splitlines
params.items
reg.predict
matplotlib.pyplot.barh
params.update
est.feature_importances.values
pandas.DataFrame.idxmax
encoding.na_encoder.NA_encoder.get_params
list.x.type.serie.apply.sum
pandas.SparseDataFrame
model.classification.feature_selector.Clf_feature_selector
convert_list.delayed
self.__imp.transform
self.__set_classifier
self.__classifier.fit
sklearn.linear_model.Ridge
self.n_jobs.Parallel
open.write
operator.itemgetter
ds.drifts.items
dropList.append
numpy.sum
sorted
sklearn.ensemble.ExtraTreesClassifier
df_train.shape.df_train.isnull.sum.sort_values.max
model.get_params.items
serie.pandas.DatetimeIndex.second.astype
drift_estimator.DriftEstimator.score
self.get_estimator.estimator_weights_.sum
mlbox.prediction.Predictor.fit_predict
model.classification.stacking_classifier.StackingClassifier
self.__Lcat.df.fillna
col.df_train.nunique
df_train.sample
self.__regressor.score
model.regression.feature_selector.Reg_feature_selector.get_params
pandas.concat
pandas.concat.values
sklearn.metrics.SCORERS.keys
matplotlib.pyplot.show
sklearn.ensemble.BaggingClassifier
model.get_estimator.get_params
model.classification.classifier.Classifier
S.append
pp.set_params.fit
stck.STCK.get_params.copy
open
est.get_estimator.get_params.items
fh.read
tensorflow.keras.models.Model.compile
clf.fit
max
numpy.log
sklearn.ensemble.AdaBoostClassifier
sklearn.preprocessing.LabelEncoder
importance_bag.append
serie_to_df.month.astype
int
enumerate
self.__cross_val_predict_proba
get_embeddings
self.__imp.fit
df_train.shape.df_train.isnull.sum.sort_values
sync_fit
y_train.nunique.Dense
sklearn.linear_model.LogisticRegression
serie_to_df.day.astype
sklearn.linear_model.Lasso
min
set
df_test.sample
df_train.drop_duplicates.isnull
df_train.std
numpy.random.shuffle
hyperopt.fmin.items
tensorflow.keras.layers.Input
serie.pandas.DatetimeIndex.month.astype
pandas.get_dummies
pandas.to_datetime
pandas.Series
mlbox.optimisation.Optimiser.optimise
self.__classifier.predict_proba
sklearn.ensemble.RandomForestClassifier.fit
self.level_estimator.fit
sys.path.insert
self.__regressor.get_params
model.regression.regressor.Regressor.get_estimator
serie.pandas.DatetimeIndex.hour.astype
setuptools.setup
df_test.index.nunique
self.__Lcat.df_train.isnull
sklearn.impute.SimpleImputer
sklearn.preprocessing.LabelEncoder.fit_transform
pandas.DataFrame.to_csv
pandas.datetime.serie_to_df.total_seconds
embeddings.append
list
col.df_train.unique
self.__regressor.get_params.keys
stck.STCK.get_params
mlbox.optimisation.Optimiser
encoding.na_encoder.NA_encoder
pickle.dump
Mock
sklearn.ensemble.RandomForestRegressor
col.self.__K.col.self.__Enc.len.Embedding
self.get_params.keys
sklearn.ensemble.RandomForestClassifier
joblib.delayed
tensorflow.keras.models.Model.fit
df.drop
df_train.isnull.sum
numpy.zeros
self.__Lcat.df_train.isnull.sum
sys.modules.update
serie.apply.apply
copy.copy
mlbox.preprocessing.Drift_thresholder.fit_transform
df_train.drop_duplicates.to_hdf
joblib.Parallel
numpy.round
tensorflow.keras.models.Model
serie.pandas.DatetimeIndex.year.astype
col.self.__Enc.keys
sklearn.pipeline.Pipeline.transform
df_train.drop_duplicates.values
sklearn.model_selection.StratifiedKFold
df_train.index.nunique
sync_fit.delayed
sklearn.tree.DecisionTreeRegressor
pandas.read_hdf
os.path.dirname
self.clean.drop_duplicates
drift.DriftThreshold.drifts
df.name.pred.apply
pandas.Series.values
matplotlib.use
serie.pandas.DatetimeIndex.day.astype
pandas.read_excel
sklearn.tree.DecisionTreeClassifier
type
dict
drift_estimator.DriftEstimator.fit
numpy.std
numpy.mean
sklearn.ensemble.BaggingRegressor
clf.get_params
os.getcwd
estimator.predict_proba
serie_to_df.year.astype
self.level_estimator.get_params
model.get_params
matplotlib.pyplot.grid
callable
self.__save_feature_importances
min.Dense
mlbox.optimisation.Optimiser.evaluate
model.regression.regressor.Regressor.feature_importances
hyperopt.fmin
drift.DriftThreshold
sklearn.ensemble.AdaBoostRegressor
matplotlib.pyplot.text
y_train.nunique
y_train.index.nunique
matplotlib.pyplot.title
stck.STCK.get_params.copy.keys
pp.set_params.predict_proba
sklearn.metrics.make_scorer
matplotlib.pyplot.close
model.regression.regressor.Regressor
pickle.load.inverse_transform
dropout2.Dropout
drift.DriftThreshold.transform
target_name.df.isnull
var.df_train.nunique
self.__classifier.predict_log_proba
numpy.percentile
sklearn.model_selection.KFold
model.get_estimator
int.Dense
self.fit_transform.drop
matplotlib.pyplot.figure
df.apply
self.evaluate
is_null.df.drop
p.split
convert_float_and_dates
mlbox.preprocessing.Reader
col.df_train.mode
getattr
df_test.df_train.pd.concat.drop
inputs.append

@developer Could you please help me check this issue? May I open a pull request to fix it? Thank you very much.

opened by PyDeps 0

Bump joblib from 0.14.1 to 1.2.0
Bumps joblib from 0.14.1 to 1.2.0.

Changelog

Sourced from joblib's changelog.

Release 1.2.0

Fix a security issue where eval(pre_dispatch) could potentially run arbitrary code. Now only basic numerics are supported. joblib/joblib#1327

Make sure that joblib works even when multiprocessing is not available, for instance with Pyodide joblib/joblib#1256

Avoid unnecessary warnings when workers and main process delete the temporary memmap folder contents concurrently. joblib/joblib#1263

Fix memory alignment bug for pickles containing numpy arrays. This is especially important when loading the pickle with mmap_mode != None as the resulting numpy.memmap object would not be able to correct the misalignment without performing a memory copy. This bug would cause invalid computation and segmentation faults with native code that would directly access the underlying data buffer of a numpy array, for instance C/C++/Cython code compiled with older GCC versions or some old OpenBLAS written in platform specific assembly. joblib/joblib#1254

Vendor cloudpickle 2.2.0 which adds support for PyPy 3.8+.

Vendor loky 3.3.0 which fixes several bugs including:

robustly forcibly terminating worker processes in case of a crash (joblib/joblib#1269);

avoiding leaking worker processes in case of nested loky parallel calls;

reliably spawning the correct number of reusable workers.

Release 1.1.0

Fix byte order inconsistency issue during deserialization using joblib.load in cross-endian environment: the numpy arrays are now always loaded to use the system byte order, independently of the byte order of the system that serialized the pickle. joblib/joblib#1181

Fix joblib.Memory bug with the ignore parameter when the cached function is a decorated function.

... (truncated)

Commits

5991350 Release 1.2.0

3fa2188 MAINT cleanup numpy warnings related to np.matrix in tests (#1340)

cea26ff CI test the future loky-3.3.0 branch (#1338)

8aca6f4 MAINT: remove pytest.warns(None) warnings in pytest 7 (#1264)

067ed4f XFAIL test_child_raises_parent_exits_cleanly with multiprocessing (#1339)

ac4ebd5 MAINT add back pytest warnings plugin (#1337)

a23427d Test child raises parent exits cleanly more reliable on macos (#1335)

ac09691 [MAINT] various test updates (#1334)

4a314b1 Vendor loky 3.2.0 (#1333)

bdf47e9 Make test_parallel_with_interactively_defined_functions_default_backend timeo...

Additional commits viewable in compare view

dependencies
opened by dependabot[bot] 0
Bump numpy from 1.18.2 to 1.22.0
Bumps numpy from 1.18.2 to 1.22.0.

Release notes

Sourced from numpy's releases.

v1.22.0

NumPy 1.22.0 Release Notes

NumPy 1.22.0 is a big release featuring the work of 153 contributors spread over 609 pull requests. There have been many improvements; highlights are:

Annotations of the main namespace are essentially complete. Upstream is a moving target, so there will likely be further improvements, but the major work is done. This is probably the most user visible enhancement in this release.

A preliminary version of the proposed Array-API is provided. This is a step in creating a standard collection of functions that can be used across applications such as CuPy and JAX.

NumPy now has a DLPack backend. DLPack provides a common interchange format for array (tensor) data.

New methods for quantile, percentile, and related functions. The new methods provide a complete set of the methods commonly found in the literature.

A new configurable allocator for use by downstream projects.

These are in addition to the ongoing work to provide SIMD support for commonly used functions, improvements to F2PY, and better documentation.
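As an illustration of the quantile highlight, the sketch below computes the median of a tiny made-up sample with several values of the `method` keyword of `numpy.quantile`. It assumes NumPy 1.22 or later, where that keyword is available.

```python
import numpy as np

data = np.array([1.0, 2.0, 3.0, 4.0])

# `method` selects the estimation rule used between data points.
methods = ["linear", "lower", "higher", "midpoint", "inverted_cdf", "weibull"]
medians = {m: float(np.quantile(data, 0.5, method=m)) for m in methods}

for name, value in medians.items():
    print(f"{name:>12}: {value}")
```

With an even number of samples the methods disagree on the median, which is why the choice matters when comparing results against other statistical software.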

The Python versions supported in this release are 3.8-3.10, Python 3.7 has been dropped. Note that 32 bit wheels are only provided for Python 3.8 and 3.9 on Windows, all other wheels are 64 bits on account of Ubuntu, Fedora, and other Linux distributions dropping 32 bit support. All 64 bit wheels are also linked with 64 bit integer OpenBLAS, which should fix the occasional problems encountered by folks using truly huge arrays.

Expired deprecations

Deprecated numeric style dtype strings have been removed

Using the strings "Bytes0", "Datetime64", "Str0", "Uint32", and "Uint64" as a dtype will now raise a TypeError.

(gh-19539)
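A quick way to see this change is to test which spellings `numpy.dtype` accepts. The helper `is_valid_dtype` below is a hypothetical name introduced for illustration; on NumPy 1.22 or later the capitalized names are expected to be rejected, while the lowercase ones remain valid.

```python
import numpy as np

def is_valid_dtype(name):
    """Return True if NumPy accepts `name` as a dtype string."""
    try:
        np.dtype(name)
        return True
    except TypeError:
        return False

for name in ["uint32", "Uint32", "uint64", "Uint64", "datetime64", "Datetime64"]:
    print(f"{name:>10}: {is_valid_dtype(name)}")
```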

Expired deprecations for loads, ndfromtxt, and mafromtxt in npyio

numpy.loads was deprecated in v1.15, with the recommendation that users use pickle.loads instead. ndfromtxt and mafromtxt were both deprecated in v1.17 - users should use numpy.genfromtxt instead with the appropriate value for the usemask parameter.

(gh-19615)
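The replacements named above can be sketched as follows. The sample data is made up; `usemask=True` gives the masked-array behaviour that `mafromtxt` used to provide, and the default `usemask=False` corresponds to `ndfromtxt`.

```python
import io
import pickle
import numpy as np

# Instead of numpy.loads: use pickle.loads on pickled bytes.
arr = np.arange(4)
restored = pickle.loads(pickle.dumps(arr))
print(restored)

csv_text = "1,,3\n4,5,6"  # the second field of the first row is missing

# Instead of mafromtxt: genfromtxt with usemask=True masks missing fields.
masked = np.genfromtxt(io.StringIO(csv_text), delimiter=",", usemask=True)
print(masked.mask)

# Instead of ndfromtxt: genfromtxt with the default usemask=False, where
# missing float fields become nan.
plain = np.genfromtxt(io.StringIO(csv_text), delimiter=",")
print(plain)
```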

... (truncated)

Commits

4adc87d Merge pull request #20685 from charris/prepare-for-1.22.0-release

fd66547 REL: Prepare for the NumPy 1.22.0 release.

125304b wip

c283859 Merge pull request #20682 from charris/backport-20416

5399c03 Merge pull request #20681 from charris/backport-20954

f9c45f8 Merge pull request #20680 from charris/backport-20663

794b36f Update armccompiler.py

d93b14e Update test_public_api.py

7662c07 Update init.py

311ab52 Update armccompiler.py

Additional commits viewable in compare view

dependencies
opened by dependabot[bot] 0
ModuleNotFoundError: No module named 'mlbox.preprocessing'
Hi,

Even after installing with !pip install mlbox, I get an error message when using mlbox.

I can see it in the list of installed packages using the !pip list command:

mlbox 0.8.5

Error Message

ModuleNotFoundError                       Traceback (most recent call last)
in
      1 #https://mlbox.readthedocs.io/en/latest/index.html
      2 # importing the required libraries
----> 3 from mlbox.preprocessing import *
      4 from mlbox.optimisation import *
      5 from mlbox.prediction import *

ModuleNotFoundError: No module named 'mlbox.preprocessing'
opened by mrajkumar18 0
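One way to narrow down an error like this is to check what the name mlbox actually resolves to in the running interpreter. The sketch below uses only the standard library; `describe_module` is a hypothetical helper, and the idea that a local file or a different environment is shadowing the installed package is an assumption to test, not a confirmed cause of this report.

```python
import importlib.util

def describe_module(name):
    """Report whether `name` can be found and where it resolves from."""
    try:
        # For dotted names, find_spec imports the parent package first,
        # so a missing or non-package parent raises an ImportError here.
        spec = importlib.util.find_spec(name)
    except (ImportError, ValueError):
        spec = None
    if spec is None:
        return {"name": name, "found": False, "origin": None, "is_package": False}
    return {
        "name": name,
        "found": True,
        "origin": spec.origin,
        # Packages expose search locations; plain modules do not.
        "is_package": spec.submodule_search_locations is not None,
    }

for candidate in ("mlbox", "mlbox.preprocessing"):
    print(describe_module(candidate))
```

If mlbox is found but `is_package` is False, or `origin` points at a file in the working directory rather than site-packages, the import is picking up something other than the installed library.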

Releases(v0.8.1)

v0.8.1(Aug 25, 2019)
support for python 3.5, 3.6 & 3.7

update package dependencies

Source code(tar.gz)
Source code(zip)
v0.7.0(Jun 27, 2019)
add support for Mac OS & Windows

update support for python versions

improve setup

add tests

improve documentation & examples

minor changes in the package architecture

Source code(tar.gz)
Source code(zip)
v0.5.0(Aug 25, 2017)

Parallelisation issues are now fixed! Importing MLBox is now immediate...
Source code(tar.gz)
Source code(zip)
v0.4.0(Jul 18, 2017)

MLBox will automatically save and reload fitted models while fitting a pipeline configuration
Source code(tar.gz)
Source code(zip)
v0.3.1(Jul 12, 2017)

MLBox is now out on PyPI!
Source code(tar.gz)
Source code(zip)
v0.3.0(Jul 11, 2017)

MLBox is now compatible with Python 2.7, 3.4, 3.5 and 3.6
Source code(tar.gz)
Source code(zip)
v0.2.2(Jul 10, 2017)

Compatible with Python 2.7
Source code(tar.gz)
Source code(zip)

Owner

Axel

Factorization machines in python

Mars is a tensor-based unified framework for large-scale data computation which scales numpy, pandas, scikit-learn and Python functions.

machine learning model deployment project of Iris classification model in a minimal UI using flask web framework and deployed it in Azure cloud using Azure app service

Pandas-method-chaining is a plugin for flake8 that provides method chaining linting for pandas code

Exemplary lightweight and ready-to-deploy machine learning project

Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on single machine, Hadoop, Spark, Dask, Flink and DataFlow

Convoys is a simple library that fits a few statistical model useful for modeling time-lagged conversions.

scikit-learn: machine learning in Python

Python module for performing linear regression for data with measurement errors and intrinsic scatter

This is the code repository for Interpretable Machine Learning with Python, published by Packt.

Combines Bayesian analyses from many datasets.

pandas, scikit-learn, xgboost and seaborn integration

ZenML 🙏: MLOps framework to create reproducible ML pipelines for production machine learning.

Predicting Keystrokes using an Audio Side-Channel Attack and Machine Learning

Built various Machine Learning algorithms (Logistic Regression, Random Forest, KNN, Gradient Boosting and XGBoost. etc)

Stats, linear algebra and einops for xarray

Scikit-Learn useful pre-defined Pipelines Hub

A collection of neat and practical data science and machine learning projects

A repository for collating all the resources such as articles, blogs, papers, and books related to Bayesian Statistics.

This is the material used in my free Persian course: Machine Learning with Python
