abess: Fast Best-Subset Selection in Python and R

Last update: Dec 21, 2022

Overview

The abess (Adaptive BEst Subset Selection) library aims to solve the general best subset selection problem, i.e., to find a small subset of predictors such that the resulting model is expected to have the highest accuracy. Best subset selection is valuable in scientific research and practical applications. For example, clinicians may want to know whether a patient is healthy based on the expression levels of a few important genes.

This library implements a generic algorithm framework to find the optimal solution quickly. The framework currently supports best subset detection under: linear regression, classification (binary or multi-class), counting-response modeling, censored-response modeling, multi-response modeling (multi-task learning), etc. It also supports variants of best subset selection such as group best subset selection and nuisance penalized regression. In particular, the time complexity of (group) best subset selection for linear regression is certifiably polynomial.
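To make the problem statement concrete, here is a minimal sketch of best subset selection for linear regression by exhaustive search, written in plain Python. It is not the abess algorithm: it simply tries every subset of a given size and keeps the one with the smallest residual sum of squares. The helper names (`solve_least_squares`, `best_subset`) and the toy data are assumptions made for illustration only.

```python
from itertools import combinations


def solve_least_squares(columns, y):
    """Solve the normal equations (X^T X) b = X^T y by Gauss-Jordan elimination."""
    k = len(columns)
    # Augmented matrix [X^T X | X^T y].
    aug = [
        [sum(a * b for a, b in zip(columns[i], columns[j])) for j in range(k)]
        + [sum(a * b for a, b in zip(columns[i], y))]
        for i in range(k)
    ]
    for col in range(k):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [aug[i][k] / aug[i][i] for i in range(k)]


def residual_sum_of_squares(columns, coefs, y):
    fitted = [sum(c * col[i] for c, col in zip(coefs, columns)) for i in range(len(y))]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))


def best_subset(columns, y, size):
    """Try every subset of `size` predictors; return (support, rss) of the best one."""
    best = None
    for support in combinations(range(len(columns)), size):
        chosen = [columns[j] for j in support]
        coefs = solve_least_squares(chosen, y)
        rss = residual_sum_of_squares(chosen, coefs, y)
        if best is None or rss < best[1]:
            best = (support, rss)
    return best


# Toy data (hypothetical): y depends only on predictors 0 and 2, without noise.
x = [
    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    [1.0, -1.0, 2.0, -2.0, 3.0, -3.0],
    [0.5, 0.1, -0.3, 0.8, -0.6, 0.2],
    [2.0, 0.0, 1.0, 0.0, 2.0, 1.0],
]
y = [2.0 * a - 3.0 * c for a, c in zip(x[0], x[2])]
support, rss = best_subset(x, y, size=2)
print("selected predictors:", support)
```

The number of candidate subsets grows combinatorially with the number of predictors, so this approach is only feasible for tiny problems; that cost is what a polynomial-time method is meant to avoid.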

Quick start

abess provides both Python and R interfaces. A quick start is given here; for more details, see Installation.

Python package

Install the stable version of the Python package from PyPI with:

$ pip install abess

Best subset selection for linear regression on a simulated dataset in Python:

from abess.linear import abessLm
from abess.datasets import make_glm_data
sim_dat = make_glm_data(n = 300, p = 1000, k = 10, family = "gaussian")
model = abessLm()
model.fit(sim_dat.x, sim_dat.y)

See more examples analyzed with Python in the Python tutorials.

R package

Install the stable version of the R package from CRAN with:

install.packages("abess")

Best subset selection for linear regression on a simulated dataset in R:

library(abess)
sim_dat <- generate.data(n = 300, p = 1000)
abess(x = sim_dat[["x"]], y = sim_dat[["y"]])

See more examples analyzed with R in the R tutorials.

Runtime Performance

To show the computational power of abess, we measure its CPU execution time (in seconds) on synthetic datasets and compare it with state-of-the-art variable selection methods. The variable selection and estimation results are deferred to Python performance and R performance. All computations are conducted on an Ubuntu platform with an Intel(R) Core(TM) i9-9940X CPU @ 3.30GHz and 48 RAM.

Python package

We compare the abess Python package with scikit-learn on linear regression and logistic regression. Results are presented in the figure below:

It can be seen that abess uses the least runtime to find the solution. These results can be reproduced by running the following command in a shell:

$ python abess/docs/simulation/Python/timings.py

R package

We compare the abess R package with three widely used R packages: glmnet, ncvreg, and L0Learn. The runtime comparison results are as follows:

Compared with the other packages, abess shows competitive computational efficiency, and it is the most efficient when the variables are highly correlated.

The above results can be reproduced by running the following command in a shell:

$ Rscript abess/docs/simulation/R/timings.R

Open source software

abess is free software, and its source code is publicly available on GitHub. The core framework is programmed in C++, and user-friendly R and Python interfaces are offered. You can redistribute it and/or modify it under the terms of the GPL-v3 License. We welcome contributions to abess, especially those that extend abess to other best subset selection problems.

Citation

If you use abess or reference our tutorials in a presentation or publication, we would appreciate citations of our library.

Jin Zhu, Liyuan Hu, Junhao Huang, Kangkang Jiang, Yanhang Zhang, Shiyun Lin, Junxian Zhu, Xueqin Wang (2021). “abess: A Fast Best Subset Selection Library in Python and R.” arXiv:2110.09697.

The corresponding BibTeX entry:

@article{zhu-abess-arxiv,
  author    = {Jin Zhu and Liyuan Hu and Junhao Huang and Kangkang Jiang and Yanhang Zhang and Shiyun Lin and Junxian Zhu and Xueqin Wang},
  title     = {abess: A Fast Best Subset Selection Library in Python and R},
  journal   = {arXiv:2110.09697},
  year      = {2021},
}

References

Junxian Zhu, Canhong Wen, Jin Zhu, Heping Zhang, and Xueqin Wang (2020). A polynomial algorithm for best-subset selection problem. Proceedings of the National Academy of Sciences, 117(52):33117-33123.
Sebastian Pölsterl (2020). scikit-survival: A Library for Time-to-Event Analysis Built on Top of scikit-learn. Journal of Machine Learning Research, 21(212):1-6.
Yanhang Zhang, Junxian Zhu, Jin Zhu, and Xueqin Wang (2021). Certifiably Polynomial Algorithm for Best Group Subset Selection. arXiv preprint arXiv:2104.12576.
Qiang Sun and Heping Zhang (2020). Targeted Inference Involving High-Dimensional Data Using Nuisance Penalized Regression, Journal of the American Statistical Association, DOI: 10.1080/01621459.2020.1737079.

Comments

Import error

When I import abess, every Python version I have tried (3.5, 3.6, 3.7) reports the following error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\ProgramData\Anaconda3\envs\cp3710\lib\site-packages\abess\__init__.py", line 9, in <module>
    from abess.linear import abessLogistic, abessLm, abessCox, abessPoisson, abessMultigaussian, abessMultinomial
  File "C:\ProgramData\Anaconda3\envs\cp3710\lib\site-packages\abess\linear.py", line 3, in <module>
    from .bess_base import bess_base
  File "C:\ProgramData\Anaconda3\envs\cp3710\lib\site-packages\abess\bess_base.py", line 7, in <module>
    from abess.cabess import pywrap_abess
  File "C:\ProgramData\Anaconda3\envs\cp3710\lib\site-packages\abess\cabess.py", line 13, in <module>
    from . import _cabess
ImportError: DLL load failed: The specified module could not be found.

I have tried everything I could find on the web, but nothing works. Could anyone help me?

opened by bored2020 12

Problem of computing GIC

I want to compute GIC to select the true model, but I get different results from the abess package and from manual calculation.

    set.seed(2)
    p = 250
    N = 2500
    X = matrix(rnorm(N * p), ncol = p)
    A = sort(sample(p, 10))
    beta = rep(0, p)
    beta = replace(beta, A, rnorm(10, mean = 6))
    xbeta <- X %*% beta
    Y <- xbeta + rnorm(N)

Compute the estimator with the abess package.


    C = abess(X, Y, family = "gaussian", tune.path="sequence",tune.type = "gic")
    k = C$best.size
    mid=coef(abess(X, Y, family = "gaussian",support.size =k))
    Central =mid[2:(p+1)]
    intercept=mid[1]
    #compute GIC[10]=131.3686
    GIC= N*log(1/(2*N)*t(Y-X%*%Central-intercept)%*%(Y-X%*%Central-intercept))+k*log(p)*(log(log(N)))
    #GIC=-1601.499
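
For readers who want to check the manual calculation outside R, the last formula above can be written out as a small Python function. This is a sketch of the expression exactly as typed in this issue, N * log(RSS / (2N)) + k * log(p) * log(log(N)); whether abess computes its reported GIC the same way is not established here. The function name and the `halve` switch are assumptions added for illustration.

```python
import math


def gic_from_issue(residuals, support_size, n_features, halve=True):
    """GIC as written in the issue: N*log(RSS/(2N)) + k*log(p)*log(log(N)).

    With halve=False the loss term uses RSS/N instead of RSS/(2N).
    """
    n = len(residuals)
    rss = sum(r * r for r in residuals)
    scale = 2 * n if halve else n
    loss_term = n * math.log(rss / scale)
    penalty = support_size * math.log(n_features) * math.log(math.log(n))
    return loss_term + penalty


# Hypothetical residuals: N = 2500 values of +1 or -1, so RSS = N.
residuals = [1.0, -1.0] * 1250
with_half = gic_from_issue(residuals, support_size=10, n_features=250)
without_half = gic_from_issue(residuals, support_size=10, n_features=250, halve=False)
print(with_half, without_half, without_half - with_half)
```

Two properties follow directly from the formula. The loss term is negative whenever RSS / (2N) is below 1, so a negative GIC is not by itself a sign of an error. Dropping the factor 1/2 shifts every value by the constant N * log(2), which does not change which support size minimizes the criterion but does change the reported number.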

good first issue invalid

opened by yannstory 9

zsh: illegal hardware instruction python (mac m1 silicon)

When I run the following code in the terminal, I get the error "zsh: illegal hardware instruction python".

Python 3.9.7 (default, Sep 16 2021, 08:50:36) 
[Clang 10.0.0 ] :: Anaconda, Inc. on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from abess.linear import abessLogistic
>>> from sklearn.datasets import load_breast_cancer
>>> from sklearn.pipeline import Pipeline
>>> from sklearn.metrics import make_scorer, roc_auc_score
>>> from sklearn.preprocessing import PolynomialFeatures
>>> from sklearn.model_selection import GridSearchCV
>>> pipe = Pipeline([('poly', PolynomialFeatures(include_bias=False)), ('alogistic', abessLogistic())])
>>> param_grid = {'poly__interaction_only': [True, False],'poly__degree': [1, 2, 3]}
>>> scorer = make_scorer(roc_auc_score, greater_is_better=True)
>>> grid_search = GridSearchCV(pipe, param_grid, scoring=scorer, cv=5)
>>> X, y = load_breast_cancer(return_X_y=True)
>>> grid_search.fit(X, y)

documentation

opened by JiaqiHu2021 6

cross validation in R package

Describe the bug: The cross-validation result returned by abess is not the same as the result computed manually in R.

The code to reproduce

library(abess)
n <- 100
p <- 200
support.size <- 3
dataset <- generate.data(n, p, support.size, seed = 1)
ss <- 0:10

nfolds <- 5
foldid <- rep(1:nfolds, ceiling(n / nfolds))[1:n]
abess_fit <- abess(dataset[["x"]], dataset[["y"]], 
                   tune.type = "cv", nfolds = nfolds, 
                   foldid = foldid, support.size = ss, num.threads = 1)

cv <- rep(0, length(ss))
for (k in 1:nfolds) {
  abess_fit_k <- abess(dataset[["x"]][foldid != k, ], 
                       dataset[["y"]][foldid != k], support.size = ss)
  y_hat_k <- predict(abess_fit_k, dataset[["x"]][foldid == k, ], 
                     support.size = ss)
  fold_cv <- apply(y_hat_k, 2, function(yh) {
    mean((dataset[["y"]][foldid == k] - yh)^2)
  })
  fold_cv <- round(fold_cv, digits = 2)
  print(fold_cv)
  cv <- cv + fold_cv
}
cv <- cv / nfolds
names(cv) <- NULL
all.equal(cv, abess_fit$tune.value, digits = 2)

Expected behavior: The output of all.equal(cv, abess_fit$tune.value, digits = 2) should be TRUE. However, the actual output is "Mean relative difference: 0.0008444762".
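
To interpret the reported number, it may help to see how a "mean relative difference" of this kind is computed. The sketch below follows the definition used by R's all.equal for numeric vectors, sum(|target - current|) / sum(|target|), and compares it with the default tolerance of about 1.5e-8. The per-size error values are hypothetical, and the issue does not confirm that rounding is the cause of the mismatch; the example only shows that rounding to two digits alone can produce a difference of this magnitude.

```python
def mean_relative_difference(target, current):
    """sum(|target - current|) / sum(|target|), as reported by R's all.equal."""
    numerator = sum(abs(t - c) for t, c in zip(target, current))
    denominator = sum(abs(t) for t in target)
    return numerator / denominator


# Hypothetical cross-validation errors for four support sizes.
exact = [12.3449, 3.1251, 1.0449, 1.0551]
# Rounding each value to two digits, similar to round(fold_cv, digits = 2) above.
rounded = [round(v, 2) for v in exact]

diff = mean_relative_difference(exact, rounded)
default_tolerance = 1.5e-8
print(diff, diff > default_tolerance)
```

Under this definition, a comparison only passes when the difference is below the tolerance, so a check that is meant to accept two-digit agreement would need a correspondingly larger tolerance.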

System info

platform       x86_64-apple-darwin17.0     
arch           x86_64                      
os             darwin17.0                  
system         x86_64, darwin17.0          
status                                     
major          4                           
minor          1.0                         
year           2021                        
month          05                          
day            18                          
svn rev        80317                       
language       R                           
version.string R version 4.1.0 (2021-05-18)
nickname       Camp Pontanezen

wontfix

opened by Mamba413 5

Could not install python package on m1 pro silicon

I ran pip install abess in the terminal:

Last login: Tue Jan 18 20:32:17 on ttys000
(base) [email protected] ~ % pip install abess

Collecting abess
  Using cached abess-0.3.6.tar.gz (1.5 MB)
Requirement already satisfied: numpy in ./opt/anaconda3/lib/python3.9/site-packages (from abess) (1.20.3)
Requirement already satisfied: scipy in ./opt/anaconda3/lib/python3.9/site-packages (from abess) (1.7.1)
Requirement already satisfied: scikit-learn>=0.24 in ./opt/anaconda3/lib/python3.9/site-packages (from abess) (0.24.2)
Requirement already satisfied: joblib>=0.11 in ./opt/anaconda3/lib/python3.9/site-packages (from scikit-learn>=0.24->abess) (1.1.0)
Requirement already satisfied: threadpoolctl>=2.0.0 in ./opt/anaconda3/lib/python3.9/site-packages (from scikit-learn>=0.24->abess) (2.2.0)
Building wheels for collected packages: abess
  Building wheel for abess (setup.py) ... error
  ERROR: Command errored out with exit status 1:
   command: /Users/jiaqihu/opt/anaconda3/bin/python -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/private/var/folders/pj/c4q3qfkx2119wyj2vbfr_yhr0000gn/T/pip-install-ofm5rwzp/abess_e1c5333de72248a2bdb93137c36fb890/setup.py'"'"'; __file__='"'"'/private/var/folders/pj/c4q3qfkx2119wyj2vbfr_yhr0000gn/T/pip-install-ofm5rwzp/abess_e1c5333de72248a2bdb93137c36fb890/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' bdist_wheel -d /private/var/folders/pj/c4q3qfkx2119wyj2vbfr_yhr0000gn/T/pip-wheel-j5ai4o5i
       cwd: /private/var/folders/pj/c4q3qfkx2119wyj2vbfr_yhr0000gn/T/pip-install-ofm5rwzp/abess_e1c5333de72248a2bdb93137c36fb890/
  Complete output (19 lines):
  bash: /private/var/folders/pj/c4q3qfkx2119wyj2vbfr_yhr0000gn/T/pip-install-ofm5rwzp/abess_e1c5333de72248a2bdb93137c36fb890/copy_src.sh: No such file or directory
  running bdist_wheel
  running build
  running build_py
  creating build
  creating build/lib.macosx-10.9-x86_64-3.9
  creating build/lib.macosx-10.9-x86_64-3.9/abess
  copying /private/var/folders/pj/c4q3qfkx2119wyj2vbfr_yhr0000gn/T/pip-install-ofm5rwzp/abess_e1c5333de72248a2bdb93137c36fb890/abess/metrics.py -> build/lib.macosx-10.9-x86_64-3.9/abess
  copying /private/var/folders/pj/c4q3qfkx2119wyj2vbfr_yhr0000gn/T/pip-install-ofm5rwzp/abess_e1c5333de72248a2bdb93137c36fb890/abess/linear.py -> build/lib.macosx-10.9-x86_64-3.9/abess
  copying /private/var/folders/pj/c4q3qfkx2119wyj2vbfr_yhr0000gn/T/pip-install-ofm5rwzp/abess_e1c5333de72248a2bdb93137c36fb890/abess/cabess.py -> build/lib.macosx-10.9-x86_64-3.9/abess
  copying /private/var/folders/pj/c4q3qfkx2119wyj2vbfr_yhr0000gn/T/pip-install-ofm5rwzp/abess_e1c5333de72248a2bdb93137c36fb890/abess/datasets.py -> build/lib.macosx-10.9-x86_64-3.9/abess
  copying /private/var/folders/pj/c4q3qfkx2119wyj2vbfr_yhr0000gn/T/pip-install-ofm5rwzp/abess_e1c5333de72248a2bdb93137c36fb890/abess/__init__.py -> build/lib.macosx-10.9-x86_64-3.9/abess
  copying /private/var/folders/pj/c4q3qfkx2119wyj2vbfr_yhr0000gn/T/pip-install-ofm5rwzp/abess_e1c5333de72248a2bdb93137c36fb890/abess/bess_base.py -> build/lib.macosx-10.9-x86_64-3.9/abess
  copying /private/var/folders/pj/c4q3qfkx2119wyj2vbfr_yhr0000gn/T/pip-install-ofm5rwzp/abess_e1c5333de72248a2bdb93137c36fb890/abess/pca.py -> build/lib.macosx-10.9-x86_64-3.9/abess
  running build_ext
  building 'abess._cabess' extension
  swigging /private/var/folders/pj/c4q3qfkx2119wyj2vbfr_yhr0000gn/T/pip-install-ofm5rwzp/abess_e1c5333de72248a2bdb93137c36fb890/src/pywrap.i to /private/var/folders/pj/c4q3qfkx2119wyj2vbfr_yhr0000gn/T/pip-install-ofm5rwzp/abess_e1c5333de72248a2bdb93137c36fb890/src/pywrap_wrap.cpp
  swig -python -c++ -o /private/var/folders/pj/c4q3qfkx2119wyj2vbfr_yhr0000gn/T/pip-install-ofm5rwzp/abess_e1c5333de72248a2bdb93137c36fb890/src/pywrap_wrap.cpp /private/var/folders/pj/c4q3qfkx2119wyj2vbfr_yhr0000gn/T/pip-install-ofm5rwzp/abess_e1c5333de72248a2bdb93137c36fb890/src/pywrap.i
  error: command 'swig' failed: No such file or directory
  ----------------------------------------
  ERROR: Failed building wheel for abess
  Running setup.py clean for abess
Failed to build abess
Installing collected packages: abess
    Running setup.py install for abess ... error
    ERROR: Command errored out with exit status 1:
     command: /Users/jiaqihu/opt/anaconda3/bin/python -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/private/var/folders/pj/c4q3qfkx2119wyj2vbfr_yhr0000gn/T/pip-install-ofm5rwzp/abess_e1c5333de72248a2bdb93137c36fb890/setup.py'"'"'; __file__='"'"'/private/var/folders/pj/c4q3qfkx2119wyj2vbfr_yhr0000gn/T/pip-install-ofm5rwzp/abess_e1c5333de72248a2bdb93137c36fb890/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /private/var/folders/pj/c4q3qfkx2119wyj2vbfr_yhr0000gn/T/pip-record-6eee5eyp/install-record.txt --single-version-externally-managed --compile --install-headers /Users/jiaqihu/opt/anaconda3/include/python3.9/abess
         cwd: /private/var/folders/pj/c4q3qfkx2119wyj2vbfr_yhr0000gn/T/pip-install-ofm5rwzp/abess_e1c5333de72248a2bdb93137c36fb890/
    Complete output (19 lines):
    bash: /private/var/folders/pj/c4q3qfkx2119wyj2vbfr_yhr0000gn/T/pip-install-ofm5rwzp/abess_e1c5333de72248a2bdb93137c36fb890/copy_src.sh: No such file or directory
    running install
    running build
    running build_py
    creating build
    creating build/lib.macosx-10.9-x86_64-3.9
    creating build/lib.macosx-10.9-x86_64-3.9/abess
    copying /private/var/folders/pj/c4q3qfkx2119wyj2vbfr_yhr0000gn/T/pip-install-ofm5rwzp/abess_e1c5333de72248a2bdb93137c36fb890/abess/metrics.py -> build/lib.macosx-10.9-x86_64-3.9/abess
    copying /private/var/folders/pj/c4q3qfkx2119wyj2vbfr_yhr0000gn/T/pip-install-ofm5rwzp/abess_e1c5333de72248a2bdb93137c36fb890/abess/linear.py -> build/lib.macosx-10.9-x86_64-3.9/abess
    copying /private/var/folders/pj/c4q3qfkx2119wyj2vbfr_yhr0000gn/T/pip-install-ofm5rwzp/abess_e1c5333de72248a2bdb93137c36fb890/abess/cabess.py -> build/lib.macosx-10.9-x86_64-3.9/abess
    copying /private/var/folders/pj/c4q3qfkx2119wyj2vbfr_yhr0000gn/T/pip-install-ofm5rwzp/abess_e1c5333de72248a2bdb93137c36fb890/abess/datasets.py -> build/lib.macosx-10.9-x86_64-3.9/abess
    copying /private/var/folders/pj/c4q3qfkx2119wyj2vbfr_yhr0000gn/T/pip-install-ofm5rwzp/abess_e1c5333de72248a2bdb93137c36fb890/abess/__init__.py -> build/lib.macosx-10.9-x86_64-3.9/abess
    copying /private/var/folders/pj/c4q3qfkx2119wyj2vbfr_yhr0000gn/T/pip-install-ofm5rwzp/abess_e1c5333de72248a2bdb93137c36fb890/abess/bess_base.py -> build/lib.macosx-10.9-x86_64-3.9/abess
    copying /private/var/folders/pj/c4q3qfkx2119wyj2vbfr_yhr0000gn/T/pip-install-ofm5rwzp/abess_e1c5333de72248a2bdb93137c36fb890/abess/pca.py -> build/lib.macosx-10.9-x86_64-3.9/abess
    running build_ext
    building 'abess._cabess' extension
    swigging /private/var/folders/pj/c4q3qfkx2119wyj2vbfr_yhr0000gn/T/pip-install-ofm5rwzp/abess_e1c5333de72248a2bdb93137c36fb890/src/pywrap.i to /private/var/folders/pj/c4q3qfkx2119wyj2vbfr_yhr0000gn/T/pip-install-ofm5rwzp/abess_e1c5333de72248a2bdb93137c36fb890/src/pywrap_wrap.cpp
    swig -python -c++ -o /private/var/folders/pj/c4q3qfkx2119wyj2vbfr_yhr0000gn/T/pip-install-ofm5rwzp/abess_e1c5333de72248a2bdb93137c36fb890/src/pywrap_wrap.cpp /private/var/folders/pj/c4q3qfkx2119wyj2vbfr_yhr0000gn/T/pip-install-ofm5rwzp/abess_e1c5333de72248a2bdb93137c36fb890/src/pywrap.i
    error: command 'swig' failed: No such file or directory
    ----------------------------------------
ERROR: Command errored out with exit status 1: /Users/jiaqihu/opt/anaconda3/bin/python -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/private/var/folders/pj/c4q3qfkx2119wyj2vbfr_yhr0000gn/T/pip-install-ofm5rwzp/abess_e1c5333de72248a2bdb93137c36fb890/setup.py'"'"'; __file__='"'"'/private/var/folders/pj/c4q3qfkx2119wyj2vbfr_yhr0000gn/T/pip-install-ofm5rwzp/abess_e1c5333de72248a2bdb93137c36fb890/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /private/var/folders/pj/c4q3qfkx2119wyj2vbfr_yhr0000gn/T/pip-record-6eee5eyp/install-record.txt --single-version-externally-managed --compile --install-headers /Users/jiaqihu/opt/anaconda3/include/python3.9/abess Check the logs for full command output.
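
The log above ends both build attempts with "error: command 'swig' failed: No such file or directory", which suggests that the source build expects a swig executable on the PATH. A minimal sketch for checking this before retrying the installation, using only the Python standard library, is shown below. The function name is an assumption, and the list of tools is limited to what the log itself mentions.

```python
import platform
import shutil


def missing_build_tools(tools=("swig",)):
    """Return the tools from `tools` that cannot be found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


missing = missing_build_tools()
if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("All listed build tools were found.")

# The build directories in the log are named macosx-10.9-x86_64, so it can
# also be useful to see which architecture the running interpreter reports.
print("Interpreter architecture:", platform.machine())
```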

documentation

opened by JiaqiHu2021 4

Can't find cpp function
In my fork, the development branch "bbayukari/abess/develop-ordinal" worked well before, but the new version in "bbayukari/abess/ordinal", which I want to submit as a PR to upstream, does not work (not only my part fails; none of the models can find the API). In the new version, I placed the C++ code in abess/src, which is outside R-package; that is the only difference between the two versions.

After cloning the code in "bbayukari/abess/ordinal" and running "install and restart", I ran a simple test:

dataset <- generate.data(150, 100, 3)
abess(dataset[["x"]], dataset[["y"]])

This gives:

Error in abessGLM_API(x = x, y = y, n = nobs, p = nvars, normalize_type = normalize, : could not find function "abessGLM_API"
Called from: abess.default(dataset[["x"]], dataset[["y"]])
bug
opened by bbayukari 4

Have problem in abessMultinomial

I want to use abess for multi-class classification, so I used abessMultinomial and ran the example code:

from abess.linear import abessMultinomial
from abess.datasets import make_multivariate_glm_data
import numpy as np
np.random.seed(12345)
data = make_multivariate_glm_data(n = 100, p = 50, k = 10, M = 3, family = 'multinomial')
model = abessMultinomial(support_size = [10])
model.fit(data.x, data.y)
model.predict(data.x)

I get the integer 47, but I think it should return an array of class indicators such as:

array([[1., 0., 0.],
       [0., 0., 1.],
       [1., 0., 0.],
       [1., 0., 0.],
       [0., 0., 1.],
       [0., 1., 0.],
       [0., 1., 0.],
       [1., 0., 0.],
       [0., 1., 0.],
       [1., 0., 0.],
       [1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.],
       [0., 1., 0.],

I'm sorry to bother you again, and thanks a lot for your help.

bug

opened by bored2020 4

Update something
delete unused parameters

pack specific parameters (add function: initial_setting & clear_setting)

pack restore_for_normal

restructure gs_path & gs support CV results output

rewrite screening (now use algorithmXXX.fit() and unlink with model_fit.h)

API update

simplify algorithm pointer (use algorithm_list_xxx uniformly)

move beta_size to Algorithm class
opened by oooo26 4
Unified GLM algorithm
Add _abessGLM in AlgorithmGLM.h as the base GLM class

Rewrite Logistic/Poisson/Gamma Regression on the basis of _abessGLM, which would be much simpler

Fix zero-weight sample bug #458

enhancement
opened by oooo26 3
Use same keyword arguments for sample weight as sklearn

First of all: Thank you for the great package, it has been very helpful. Now to my suggestion:

abess uses weight: https://abess.readthedocs.io/en/latest/Python-package/linear/Logistic.html?highlight=score#abess.linear.LogisticRegression.fit
sklearn uses sample_weight: https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html#sklearn.linear_model.LogisticRegression.fit

I'm using both in a project and it would be helpful if abess followed the sklearn convention.
enhancement

opened by lindnemi 3
update R package Matrix API 'as()'

change methods::as(sparse.loading, "dgCMatrix") into as(as(as(sparse.loading, "dMatrix"), "generalMatrix"), "CsparseMatrix") in R-package/R/generate.spc.matrix.R

opened by bbayukari 3
Cross Validation Principle of Cox model

Hello, I'm using the package to analyze some real survival data. I find that cross-validation can only use the deviance on the test cohort to determine the support size. Could you add the C-index as a criterion for the Cox model's cross-validation?
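
For reference, the criterion being requested can be sketched in a few lines. The function below computes a basic version of Harrell's concordance index: among pairs where the subject with the shorter time had an observed event, it counts the fraction in which that subject also has the higher predicted risk, with tied risk scores counted as one half. It is a plain illustration that ignores tied times, not code from abess, and the example data are hypothetical.

```python
def concordance_index(times, events, risk_scores):
    """Basic Harrell's C-index; higher risk should go with shorter survival time."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a pair is only comparable if the earlier time is an observed event
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical held-out fold: the third subject is censored (event = 0).
times = [2.0, 4.0, 6.0, 8.0]
events = [1, 1, 0, 1]
risk = [0.9, 0.7, 0.4, 0.1]

c_good = concordance_index(times, events, risk)
c_bad = concordance_index(times, events, list(reversed(risk)))
print(c_good, c_bad)
```

In a cross-validation loop, a score like this would be computed on each held-out fold from the fitted model's linear predictor and averaged across folds, with larger values being better, in contrast to deviance where smaller is better.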

opened by EQUIWDH 0
Some problems about algorithm for cox model when the dimension is ultra-high

Hello, I am doing some real data analysis with a high-dimensional Cox model. My real dataset has shape 240*7000. When I use abess.CoxPHSurvivalAnalysis() with cv, it does not select any features, so I have to apply screening before running abess for the Cox model. I also ran a simulation test of only the screening method in the abess package and found that the screened set does not contain all the true features generated by make_glm_data. So I have doubts about the screening algorithm in this package and hope you can adjust it. Thank you!
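
As background for this report, the sketch below shows how a marginal screening rule, one that ranks each predictor by its own association with the response, can drop a predictor that truly belongs to the model. It assumes a simple correlation-based rule for a linear response; this is a common choice, but it is not confirmed here to be what abess uses, and it is not a Cox model. All names and data are hypothetical.

```python
def centered(values):
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def abs_correlation(a, b):
    a, b = centered(a), centered(b)
    num = sum(p * q for p, q in zip(a, b))
    den = (sum(p * p for p in a) * sum(q * q for q in b)) ** 0.5
    return abs(num / den)


def marginal_screen(columns, y, keep):
    """Keep the `keep` predictors with the largest absolute correlation with y."""
    ranked = sorted(
        range(len(columns)),
        key=lambda j: abs_correlation(columns[j], y),
        reverse=True,
    )
    return sorted(ranked[:keep])


# Three mutually orthogonal, mean-zero building blocks.
u = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
v = [1.0, 1.0, -1.0, -1.0, 1.0, 1.0, -1.0, -1.0]
w = [1.0, 1.0, 1.0, 1.0, -1.0, -1.0, -1.0, -1.0]

x0 = u                                      # true predictor
x1 = [a + b for a, b in zip(u, v)]          # true predictor, correlated with x0
x2 = [b + 0.5 * c for b, c in zip(v, w)]    # not used to generate y
y = [a - b for a, b in zip(x0, x1)]         # y = x0 - x1, which equals -v

kept = marginal_screen([x0, x1, x2], y, keep=2)
print("kept after screening:", kept)
```

Here x0 is part of the generating model but is uncorrelated with y on its own, so a marginal rule keeps x1 and x2 and discards x0. Correlated true predictors of this kind are one situation in which screening with a small retained set can miss true features.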
help wanted

opened by EQUIWDH 2
Illegal instruction with conda-forge package
In the HPC environment (Intel, Linux), even a very simple script gives Illegal Instruction errors. The following suffices to generate the error:

import numpy as np
import abess
model = abess.LinearRegression()
model.fit(np.random.rand(2, 2), np.random.rand(2))

However, this occurs only with the latest abess binary from conda-forge. If I use pip to install the package, everything runs smoothly.
bug
opened by lindnemi 0

memory out when combining abess with auto-sklearn for classification.

Describe the bug: I'm running some experiments that combine abess with auto-sklearn. When using MultinomialRegression for classification, memory usage increases very quickly, so much that the result cannot be displayed on a web page. For LinearRegression, there is no similar out-of-memory problem.

Code for Reproduction

My code is given as follows:

from pprint import pprint

from ConfigSpace.configuration_space import ConfigurationSpace
from ConfigSpace.hyperparameters import CategoricalHyperparameter, \
    UniformIntegerHyperparameter, UniformFloatHyperparameter

import sklearn.metrics
import autosklearn.classification
import autosklearn.pipeline.components.classification
from autosklearn.pipeline.components.base \
    import AutoSklearnClassificationAlgorithm
from autosklearn.pipeline.constants import DENSE, SIGNED_DATA, UNSIGNED_DATA, \
    PREDICTIONS

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
import openml
from abess import MultinomialRegression
from sklearn.ensemble import RandomForestClassifier
import numpy as np
import pandas as pd
from sklearn.datasets import fetch_openml
from sklearn import preprocessing
from sklearn.tree import DecisionTreeClassifier

import time
import matplotlib.pyplot as plt

import warnings
warnings.filterwarnings("ignore")

class AbessClassifier(AutoSklearnClassificationAlgorithm):

    def __init__(self, exchange_num, random_state=None):
        self.exchange_num = exchange_num
        self.random_state = random_state
        self.estimator = None

    def fit(self, X, y):
        from abess import MultinomialRegression
        self.estimator = MultinomialRegression()
        self.estimator.fit(X, y)
        return self

    def predict(self, X):
        if self.estimator is None:
            raise NotImplementedError
        return self.estimator.predict(X)
    
    def predict_proba(self, X):
        if self.estimator is None:
            raise NotImplementedError()
        return self.estimator.predict_proba(X)

    @staticmethod
    def get_properties(dataset_properties=None):
        return {
            'shortname': 'abess Classifier',
            'name': 'abess logistic Classifier',
            'handles_regression': False,
            'handles_classification': True,
            'handles_multiclass': True,
            'handles_multilabel': False,
            'handles_multioutput': False,
            'is_deterministic': False,
            # Both input and output must be tuple(iterable)
            'input': [DENSE, SIGNED_DATA, UNSIGNED_DATA],
            'output': [PREDICTIONS]
        }
    
    @staticmethod
    def get_hyperparameter_search_space(dataset_properties=None):
        cs = ConfigurationSpace() 
        exchange_num=UniformIntegerHyperparameter(
            name='exchange_num', lower=4, upper=6, default_value=5
        )
        cs.add_hyperparameters([exchange_num])
        return cs
    
# Add abess logistic classifier component to auto-sklearn.
autosklearn.pipeline.components.classification.add_classifier(AbessClassifier)
cs = AbessClassifier.get_hyperparameter_search_space()
print(cs)

dataset = fetch_openml(data_id = int(29),as_frame=True)#507,183,44136
X=dataset.data
y=dataset.target
X.replace([np.inf,-np.inf],np.NaN,inplace=True)
## Remove rows with NaN or Inf values
inx=X[X.isna().values==True].index.unique()
X.drop(inx,inplace=True)
y.drop(inx,inplace=True)
## Use dummy variables to replace categorical variables
X = pd.get_dummies(X)
## Keep only numeric columns
X = X.select_dtypes(np.number)
## Remove columns with NaN or Inf values
nan = np.isnan(X).any()[np.isnan(X).any() == True]
inf = np.isinf(X).any()[np.isinf(X).any() == True]
X = X.drop(columns = list(nan.index))
X = X.drop(columns = list(inf.index))
## Encode target labels with values between 0 and n_classes-1
le = preprocessing.LabelEncoder()
y = le.fit_transform(y)
X_train, X_test, y_train, y_test = train_test_split(X, y)
print(X_train.shape) #number of initial features
print(X_test.shape) #number of initial features

cls = autosklearn.classification.AutoSklearnClassifier(
    time_left_for_this_task=60,
    per_run_time_limit=10,
    include={
            'classifier': ['AbessClassifier'],
            'feature_preprocessor': ['polynomial']
        },
    memory_limit=6144,
    ensemble_size=1,
)
cls.fit(X_train, y_train, X_test, y_test)
predictions = cls.predict(X_test)
print("Accuracy score", sklearn.metrics.accuracy_score(y_test, predictions))

After running this code, the memory reaches about 159MB, which makes the .ipynb difficult for users to open. Again, regression does not encounter the out-of-memory problem.

bug

opened by belzheng 0

Cross validation is slower in version 0.4.5 than in 0.4.0

Describe the bug

In my experiments, after updating abess from 0.4.0 to 0.4.5, I found that the CV procedure gets slower in some cases. The following code provides an example.

Code for Reproduction

library(microbenchmark)
library(abess)
n <- 3000
p <- 500
support.size <- 10

sim_once <- function(seed) {
  dataset <- generate.data(n, p, support.size, family = "binomial", seed = seed)

  time_cv <- microbenchmark(
    abess_fit <- abess(dataset[["x"]], dataset[["y"]], family = "binomial", tune.type = "cv", nfolds = 10),
    times = 1
  ) [["time"]] / 10^9

  time_cv
}

# average time
time <- sapply(1:5, sim_once)
mean(time)

invalid

opened by brtang63 1

Releases(0.4.5)

0.4.5(Mar 23, 2022)

Source code(tar.gz)
Source code(zip)
0.4.4(Mar 12, 2022)

Source code(tar.gz)
Source code(zip)
0.4.3(Mar 11, 2022)

Source code(tar.gz)
Source code(zip)
0.4.2(Mar 10, 2022)

Source code(tar.gz)
Source code(zip)
0.4.1(Mar 9, 2022)

Source code(tar.gz)
Source code(zip)
v0.4.0(Jan 24, 2022)

Source code: abess-0.4.0.tar.gz Wheel packages: abess-0.4.0-win_amd64_whl.zip
Source code(tar.gz)
Source code(zip)
0.3.6(Dec 22, 2021)

Source code(tar.gz)
Source code(zip)
0.3.5(Dec 22, 2021)

Source code(tar.gz)
Source code(zip)
0.3.4(Dec 22, 2021)

Source code(tar.gz)
Source code(zip)
0.3.0(Sep 5, 2021)
Add new features to speed up computation

Significantly improve project management, including documentation, and code coverage, etc

Source code(tar.gz)
Source code(zip)
abess-0.3.0-cp35-cp35m-win_amd64.whl(416.73 KB)
abess-0.3.0-cp36-cp36m-win_amd64.whl(416.79 KB)
abess-0.3.0-cp37-cp37m-win_amd64.whl(416.83 KB)
abess-0.3.0-cp38-cp38-win_amd64.whl(416.87 KB)
abess-0.3.0-cp39-cp39-win_amd64.whl(417.00 KB)
abess-0.3.0.tar.gz(1.39 MB)
abess_0.3.0.tar.gz(752.41 KB)
abess_0.3.0.tgz(11.20 MB)
abess_0.3.0.zip(1.13 MB)
0.2.2(Aug 7, 2021)

Python package 0.0.3 and R package 0.2.0
Source code(tar.gz)
Source code(zip)
0.1.0(Apr 22, 2021)

See ChangeLog in website
Source code(tar.gz)
Source code(zip)
abess_0.1.0.tar.gz(264.14 KB)

Owner

Team developing polynomial algorithms for best subset selection

GitHub Repository https://abess.readthedocs.io/
