abess: Fast Best-Subset Selection in Python and R

Overview

The abess (Adaptive BEst Subset Selection) library aims to solve the general best-subset selection problem, i.e., to find a small subset of predictors such that the resulting model is expected to have the highest accuracy. Best-subset selection is valuable in scientific research and in practical applications. For example, clinicians want to know whether a patient is healthy based on the expression levels of a few important genes.
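
To make the problem statement concrete, the sketch below solves best-subset selection for least squares by exhaustive search over all subsets of a given size. It only illustrates the objective and is not the algorithm abess implements: exhaustive search is exponential in the number of predictors. The function name best_subset_exhaustive and the simulated data are assumptions made for this example.

```python
from itertools import combinations

import numpy as np


def best_subset_exhaustive(X, y, k):
    """Return (support, rss) of the size-k subset with the smallest residual sum of squares."""
    n_features = X.shape[1]
    best_support, best_rss = None, np.inf
    for support in combinations(range(n_features), k):
        Xs = X[:, list(support)]
        # Ordinary least squares restricted to the candidate subset.
        coef, *_ = np.linalg.lstsq(Xs, y, rcond=None)
        rss = float(np.sum((y - Xs @ coef) ** 2))
        if rss < best_rss:
            best_support, best_rss = support, rss
    return best_support, best_rss


# Illustrative data: only predictors 1 and 5 carry signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))
y = 3.0 * X[:, 1] - 2.0 * X[:, 5] + 0.1 * rng.normal(size=100)

support, rss = best_subset_exhaustive(X, y, k=2)
print(support, rss)
```

With 8 predictors and k = 2 the loop visits 28 subsets; the count grows combinatorially, which is why a polynomial-time method matters for problems such as p = 1000.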

This library implements a generic algorithmic framework that finds the optimal solution quickly. The framework currently supports best-subset detection under linear regression, classification (binary or multi-class), count-response modeling, censored-response modeling, and multi-response modeling (multi-task learning), among others. It also supports variants of best-subset selection such as group best-subset selection and nuisance penalized regression. In particular, the time complexity of (group) best-subset selection for linear regression is certifiably polynomial.

Quick start

The abess software provides both Python and R interfaces. A quick start is given here; for more details, please see: Installation.

Python package

Install the stable version of the Python package from PyPI with:

$ pip install abess

Best subset selection for linear regression on a simulated dataset in Python:

from abess.linear import abessLm
from abess.datasets import make_glm_data
sim_dat = make_glm_data(n = 300, p = 1000, k = 10, family = "gaussian")
model = abessLm()
model.fit(sim_dat.x, sim_dat.y)

See more examples analyzed with Python in the Python tutorials.

R package

Install the stable version of the R package from CRAN with:

install.packages("abess")

Best subset selection for linear regression on a simulated dataset in R:

library(abess)
sim_dat <- generate.data(n = 300, p = 1000)
abess(x = sim_dat[["x"]], y = sim_dat[["y"]])

See more examples analyzed with R in the R tutorials.

Runtime Performance

To demonstrate the computational power of abess, we measure its CPU execution time (in seconds) on synthetic datasets and compare it with state-of-the-art variable selection methods. The variable selection and estimation results are deferred to Python performance and R performance. All computations are conducted on an Ubuntu platform with an Intel(R) Core(TM) i9-9940X CPU @ 3.30GHz and 48 RAM.

Python package

We compare the abess Python package with scikit-learn on linear regression and logistic regression. The results are presented in the figure below:

It can be seen that abess uses the least runtime to find the solution. These results can be reproduced by running the following command in a shell:

$ python abess/docs/simulation/Python/timings.py

R package

We compare the abess R package with three widely used R packages: glmnet, ncvreg, and L0Learn. The runtime comparison results are as follows:

Compared with the other packages, abess shows competitive computational efficiency, and it is the most efficient when the variables are highly correlated.

Running the following command in a shell reproduces the above results in R:

$ Rscript abess/docs/simulation/R/timings.R

Open source software

abess is free software and its source code is publicly available on GitHub. The core framework is written in C++, and user-friendly R and Python interfaces are offered. You can redistribute it and/or modify it under the terms of the GPL-v3 License. We welcome contributions to abess, especially those extending abess to other best-subset selection problems.

Citation

If you use abess or reference our tutorials in a presentation or publication, we would appreciate citations of our library.

Jin Zhu, Liyuan Hu, Junhao Huang, Kangkang Jiang, Yanhang Zhang, Shiyun Lin, Junxian Zhu, Xueqin Wang (2021). “abess: A Fast Best Subset Selection Library in Python and R.” arXiv:2110.09697.

The corresponding BibTeX entry:

@article{zhu-abess-arxiv,
  author    = {Jin Zhu and Liyuan Hu and Junhao Huang and Kangkang Jiang and Yanhang Zhang and Shiyun Lin and Junxian Zhu and Xueqin Wang},
  title     = {abess: A Fast Best Subset Selection Library in Python and R},
  journal   = {arXiv:2110.09697},
  year      = {2021},
}

References

  • Junxian Zhu, Canhong Wen, Jin Zhu, Heping Zhang, and Xueqin Wang (2020). A polynomial algorithm for best-subset selection problem. Proceedings of the National Academy of Sciences, 117(52):33117-33123.

  • Sebastian Pölsterl (2020). scikit-survival: A Library for Time-to-Event Analysis Built on Top of scikit-learn. Journal of Machine Learning Research, 21(212):1-6.

  • Yanhang Zhang, Junxian Zhu, Jin Zhu, and Xueqin Wang (2021). Certifiably Polynomial Algorithm for Best Group Subset Selection. arXiv preprint arXiv:2104.12576.

  • Qiang Sun and Heping Zhang (2020). Targeted Inference Involving High-Dimensional Data Using Nuisance Penalized Regression. Journal of the American Statistical Association. DOI: 10.1080/01621459.2020.1737079.

Comments
  • Import error

    When I import abess, every Python version I have tried (3.5, 3.6, 3.7) reports the following error:

    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "C:\ProgramData\Anaconda3\envs\cp3710\lib\site-packages\abess\__init__.py", line 9, in <module>
        from abess.linear import abessLogistic, abessLm, abessCox, abessPoisson, abessMultigaussian, abessMultinomial
      File "C:\ProgramData\Anaconda3\envs\cp3710\lib\site-packages\abess\linear.py", line 3, in <module>
        from .bess_base import bess_base
      File "C:\ProgramData\Anaconda3\envs\cp3710\lib\site-packages\abess\bess_base.py", line 7, in <module>
        from abess.cabess import pywrap_abess
      File "C:\ProgramData\Anaconda3\envs\cp3710\lib\site-packages\abess\cabess.py", line 13, in <module>
        from . import _cabess
    ImportError: DLL load failed: The specified module could not be found.
    

    I have tried everything I could find on the web, but nothing works. Could anyone help me?

    opened by bored2020 12
  • Problem of computing GIC

    I want to compute GIC to select the true model, but I get different results from the abess package and from manual calculation.

       set.seed(2)
        p = 250
        N = 2500
        X = matrix(rnorm(N * p), ncol = p)
        A = sort(sample(p, 10))
        beta = rep(0, p)
        beta = replace(beta, A, rnorm(10, mean = 6))
        xbeta <- X %*% beta
        Y <- xbeta + rnorm(N)
    
    

    Compute the estimator with the abess package.

    
        C = abess(X, Y, family = "gaussian", tune.path="sequence",tune.type = "gic")
        k = C$best.size
        mid=coef(abess(X, Y, family = "gaussian",support.size =k))
        Central =mid[2:(p+1)]
        intercept=mid[1]
        #compute GIC[10]=131.3686
        GIC= N*log(1/(2*N)*t(Y-X%*%Central-intercept)%*%(Y-X%*%Central-intercept))+k*log(p)*(log(log(N)))
        #GIC=-1601.499
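
For readers following along in Python, the manual calculation above can be written as a small function. This is only a transcription of the formula as typed in the issue, N * log(RSS / (2N)) + k * log(p) * log(log(N)); it makes no claim about what abess computes internally. The name gic_manual and the example inputs are assumptions for illustration.

```python
import math


def gic_manual(rss, n, p, k):
    """GIC as written in the issue: n*log(rss/(2n)) + k*log(p)*log(log(n)).

    rss: residual sum of squares of the fitted model
    n:   number of observations
    p:   number of candidate predictors
    k:   support size (number of selected predictors)
    """
    goodness_of_fit = n * math.log(rss / (2 * n))
    penalty = k * math.log(p) * math.log(math.log(n))
    return goodness_of_fit + penalty


# Hypothetical input: with unit noise variance, rss is close to n.
value = gic_manual(rss=2500.0, n=2500, p=250, k=10)
print(value)
```

Writing the criterion as a sum of a fit term and a penalty term makes it easier to compare each part separately against the value a package reports.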
         
    
    good first issue invalid 
    opened by yannstory 9
  • zsh: illegal hardware instruction  python (mac m1 silicon)

    When I run the following code in the terminal, I get the error "zsh: illegal hardware instruction python".

    Python 3.9.7 (default, Sep 16 2021, 08:50:36) 
    [Clang 10.0.0 ] :: Anaconda, Inc. on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> from abess.linear import abessLogistic
    >>> from sklearn.datasets import load_breast_cancer
    >>> from sklearn.pipeline import Pipeline
    >>> from sklearn.metrics import make_scorer, roc_auc_score
    >>> from sklearn.preprocessing import PolynomialFeatures
    >>> from sklearn.model_selection import GridSearchCV
    >>> pipe = Pipeline([('poly', PolynomialFeatures(include_bias=False)), ('alogistic', abessLogistic())])
    >>> param_grid = {'poly__interaction_only': [True, False],'poly__degree': [1, 2, 3]}
    >>> scorer = make_scorer(roc_auc_score, greater_is_better=True)
    >>> grid_search = GridSearchCV(pipe, param_grid, scoring=scorer, cv=5)
    >>> X, y = load_breast_cancer(return_X_y=True)
    >>> grid_search.fit(X, y)
    
    documentation 
    opened by JiaqiHu2021 6
  • cross validation in R package

    Describe the bug: the cross-validation result returned by abess is not the same as the result of the equivalent procedure written by hand in R.

    The code to reproduce

    library(abess)
    n <- 100
    p <- 200
    support.size <- 3
    dataset <- generate.data(n, p, support.size, seed = 1)
    ss <- 0:10
    
    nfolds <- 5
    foldid <- rep(1:nfolds, ceiling(n / nfolds))[1:n]
    abess_fit <- abess(dataset[["x"]], dataset[["y"]], 
                       tune.type = "cv", nfolds = nfolds, 
                       foldid = foldid, support.size = ss, num.threads = 1)
    
    cv <- rep(0, length(ss))
    for (k in 1:nfolds) {
      abess_fit_k <- abess(dataset[["x"]][foldid != k, ], 
                           dataset[["y"]][foldid != k], support.size = ss)
      y_hat_k <- predict(abess_fit_k, dataset[["x"]][foldid == k, ], 
                         support.size = ss)
      fold_cv <- apply(y_hat_k, 2, function(yh) {
        mean((dataset[["y"]][foldid == k] - yh)^2)
      })
      fold_cv <- round(fold_cv, digits = 2)
      print(fold_cv)
      cv <- cv + fold_cv
    }
    cv <- cv / nfolds
    names(cv) <- NULL
    all.equal(cv, abess_fit$tune.value, digits = 2)
    

    Expected behavior: the output of all.equal(cv, abess_fit$tune.value, digits = 2) should be TRUE. However, the actual output is "Mean relative difference: 0.0008444762".
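
One factor worth ruling out in a comparison like this is the per-fold rounding in the reproduction script: rounding each fold's error to two digits before averaging shifts the mean by an amount far larger than the default tolerance of R's all.equal (about 1.5e-8). The sketch below shows only the size of that effect. The per-fold errors are hypothetical numbers chosen for illustration, not values from this issue, and the sketch does not establish that rounding is the actual cause of the reported difference.

```python
# Hypothetical per-fold mean squared errors (illustrative values only).
fold_errors = [1.0149, 0.9951, 1.2449, 0.8751, 1.1049]

# Average of the unrounded fold errors.
exact_mean = sum(fold_errors) / len(fold_errors)

# Average after rounding each fold to two digits, as the script above does.
rounded_mean = sum(round(e, 2) for e in fold_errors) / len(fold_errors)

# Relative difference between the two averages.
rel_diff = abs(exact_mean - rounded_mean) / abs(exact_mean)
print(exact_mean, rounded_mean, rel_diff)
```

Dropping the rounding step, or comparing with an explicit tolerance, would separate this effect from any real discrepancy between the two computations.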

    System info

    platform       x86_64-apple-darwin17.0     
    arch           x86_64                      
    os             darwin17.0                  
    system         x86_64, darwin17.0          
    status                                     
    major          4                           
    minor          1.0                         
    year           2021                        
    month          05                          
    day            18                          
    svn rev        80317                       
    language       R                           
    version.string R version 4.1.0 (2021-05-18)
    nickname       Camp Pontanezen             
    
    wontfix 
    opened by Mamba413 5
  • Could not install python package on m1 pro silicon

    I run pip install abess in the terminal

    Last login: Tue Jan 18 20:32:17 on ttys000 (base) [email protected] ~ % pip install abess

    Collecting abess
      Using cached abess-0.3.6.tar.gz (1.5 MB)
    Requirement already satisfied: numpy in ./opt/anaconda3/lib/python3.9/site-packages (from abess) (1.20.3)
    Requirement already satisfied: scipy in ./opt/anaconda3/lib/python3.9/site-packages (from abess) (1.7.1)
    Requirement already satisfied: scikit-learn>=0.24 in ./opt/anaconda3/lib/python3.9/site-packages (from abess) (0.24.2)
    Requirement already satisfied: joblib>=0.11 in ./opt/anaconda3/lib/python3.9/site-packages (from scikit-learn>=0.24->abess) (1.1.0)
    Requirement already satisfied: threadpoolctl>=2.0.0 in ./opt/anaconda3/lib/python3.9/site-packages (from scikit-learn>=0.24->abess) (2.2.0)
    Building wheels for collected packages: abess
      Building wheel for abess (setup.py) ... error
      ERROR: Command errored out with exit status 1:
       command: /Users/jiaqihu/opt/anaconda3/bin/python -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/private/var/folders/pj/c4q3qfkx2119wyj2vbfr_yhr0000gn/T/pip-install-ofm5rwzp/abess_e1c5333de72248a2bdb93137c36fb890/setup.py'"'"'; __file__='"'"'/private/var/folders/pj/c4q3qfkx2119wyj2vbfr_yhr0000gn/T/pip-install-ofm5rwzp/abess_e1c5333de72248a2bdb93137c36fb890/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' bdist_wheel -d /private/var/folders/pj/c4q3qfkx2119wyj2vbfr_yhr0000gn/T/pip-wheel-j5ai4o5i
           cwd: /private/var/folders/pj/c4q3qfkx2119wyj2vbfr_yhr0000gn/T/pip-install-ofm5rwzp/abess_e1c5333de72248a2bdb93137c36fb890/
      Complete output (19 lines):
      bash: /private/var/folders/pj/c4q3qfkx2119wyj2vbfr_yhr0000gn/T/pip-install-ofm5rwzp/abess_e1c5333de72248a2bdb93137c36fb890/copy_src.sh: No such file or directory
      running bdist_wheel
      running build
      running build_py
      creating build
      creating build/lib.macosx-10.9-x86_64-3.9
      creating build/lib.macosx-10.9-x86_64-3.9/abess
      copying /private/var/folders/pj/c4q3qfkx2119wyj2vbfr_yhr0000gn/T/pip-install-ofm5rwzp/abess_e1c5333de72248a2bdb93137c36fb890/abess/metrics.py -> build/lib.macosx-10.9-x86_64-3.9/abess
      copying /private/var/folders/pj/c4q3qfkx2119wyj2vbfr_yhr0000gn/T/pip-install-ofm5rwzp/abess_e1c5333de72248a2bdb93137c36fb890/abess/linear.py -> build/lib.macosx-10.9-x86_64-3.9/abess
      copying /private/var/folders/pj/c4q3qfkx2119wyj2vbfr_yhr0000gn/T/pip-install-ofm5rwzp/abess_e1c5333de72248a2bdb93137c36fb890/abess/cabess.py -> build/lib.macosx-10.9-x86_64-3.9/abess
      copying /private/var/folders/pj/c4q3qfkx2119wyj2vbfr_yhr0000gn/T/pip-install-ofm5rwzp/abess_e1c5333de72248a2bdb93137c36fb890/abess/datasets.py -> build/lib.macosx-10.9-x86_64-3.9/abess
      copying /private/var/folders/pj/c4q3qfkx2119wyj2vbfr_yhr0000gn/T/pip-install-ofm5rwzp/abess_e1c5333de72248a2bdb93137c36fb890/abess/__init__.py -> build/lib.macosx-10.9-x86_64-3.9/abess
      copying /private/var/folders/pj/c4q3qfkx2119wyj2vbfr_yhr0000gn/T/pip-install-ofm5rwzp/abess_e1c5333de72248a2bdb93137c36fb890/abess/bess_base.py -> build/lib.macosx-10.9-x86_64-3.9/abess
      copying /private/var/folders/pj/c4q3qfkx2119wyj2vbfr_yhr0000gn/T/pip-install-ofm5rwzp/abess_e1c5333de72248a2bdb93137c36fb890/abess/pca.py -> build/lib.macosx-10.9-x86_64-3.9/abess
      running build_ext
      building 'abess._cabess' extension
      swigging /private/var/folders/pj/c4q3qfkx2119wyj2vbfr_yhr0000gn/T/pip-install-ofm5rwzp/abess_e1c5333de72248a2bdb93137c36fb890/src/pywrap.i to /private/var/folders/pj/c4q3qfkx2119wyj2vbfr_yhr0000gn/T/pip-install-ofm5rwzp/abess_e1c5333de72248a2bdb93137c36fb890/src/pywrap_wrap.cpp
      swig -python -c++ -o /private/var/folders/pj/c4q3qfkx2119wyj2vbfr_yhr0000gn/T/pip-install-ofm5rwzp/abess_e1c5333de72248a2bdb93137c36fb890/src/pywrap_wrap.cpp /private/var/folders/pj/c4q3qfkx2119wyj2vbfr_yhr0000gn/T/pip-install-ofm5rwzp/abess_e1c5333de72248a2bdb93137c36fb890/src/pywrap.i
      error: command 'swig' failed: No such file or directory
      ----------------------------------------
      ERROR: Failed building wheel for abess
      Running setup.py clean for abess
    Failed to build abess
    Installing collected packages: abess
        Running setup.py install for abess ... error
        ERROR: Command errored out with exit status 1:
         command: /Users/jiaqihu/opt/anaconda3/bin/python -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/private/var/folders/pj/c4q3qfkx2119wyj2vbfr_yhr0000gn/T/pip-install-ofm5rwzp/abess_e1c5333de72248a2bdb93137c36fb890/setup.py'"'"'; __file__='"'"'/private/var/folders/pj/c4q3qfkx2119wyj2vbfr_yhr0000gn/T/pip-install-ofm5rwzp/abess_e1c5333de72248a2bdb93137c36fb890/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /private/var/folders/pj/c4q3qfkx2119wyj2vbfr_yhr0000gn/T/pip-record-6eee5eyp/install-record.txt --single-version-externally-managed --compile --install-headers /Users/jiaqihu/opt/anaconda3/include/python3.9/abess
             cwd: /private/var/folders/pj/c4q3qfkx2119wyj2vbfr_yhr0000gn/T/pip-install-ofm5rwzp/abess_e1c5333de72248a2bdb93137c36fb890/
        Complete output (19 lines):
        bash: /private/var/folders/pj/c4q3qfkx2119wyj2vbfr_yhr0000gn/T/pip-install-ofm5rwzp/abess_e1c5333de72248a2bdb93137c36fb890/copy_src.sh: No such file or directory
        running install
        running build
        running build_py
        creating build
        creating build/lib.macosx-10.9-x86_64-3.9
        creating build/lib.macosx-10.9-x86_64-3.9/abess
        copying /private/var/folders/pj/c4q3qfkx2119wyj2vbfr_yhr0000gn/T/pip-install-ofm5rwzp/abess_e1c5333de72248a2bdb93137c36fb890/abess/metrics.py -> build/lib.macosx-10.9-x86_64-3.9/abess
        copying /private/var/folders/pj/c4q3qfkx2119wyj2vbfr_yhr0000gn/T/pip-install-ofm5rwzp/abess_e1c5333de72248a2bdb93137c36fb890/abess/linear.py -> build/lib.macosx-10.9-x86_64-3.9/abess
        copying /private/var/folders/pj/c4q3qfkx2119wyj2vbfr_yhr0000gn/T/pip-install-ofm5rwzp/abess_e1c5333de72248a2bdb93137c36fb890/abess/cabess.py -> build/lib.macosx-10.9-x86_64-3.9/abess
        copying /private/var/folders/pj/c4q3qfkx2119wyj2vbfr_yhr0000gn/T/pip-install-ofm5rwzp/abess_e1c5333de72248a2bdb93137c36fb890/abess/datasets.py -> build/lib.macosx-10.9-x86_64-3.9/abess
        copying /private/var/folders/pj/c4q3qfkx2119wyj2vbfr_yhr0000gn/T/pip-install-ofm5rwzp/abess_e1c5333de72248a2bdb93137c36fb890/abess/__init__.py -> build/lib.macosx-10.9-x86_64-3.9/abess
        copying /private/var/folders/pj/c4q3qfkx2119wyj2vbfr_yhr0000gn/T/pip-install-ofm5rwzp/abess_e1c5333de72248a2bdb93137c36fb890/abess/bess_base.py -> build/lib.macosx-10.9-x86_64-3.9/abess
        copying /private/var/folders/pj/c4q3qfkx2119wyj2vbfr_yhr0000gn/T/pip-install-ofm5rwzp/abess_e1c5333de72248a2bdb93137c36fb890/abess/pca.py -> build/lib.macosx-10.9-x86_64-3.9/abess
        running build_ext
        building 'abess._cabess' extension
        swigging /private/var/folders/pj/c4q3qfkx2119wyj2vbfr_yhr0000gn/T/pip-install-ofm5rwzp/abess_e1c5333de72248a2bdb93137c36fb890/src/pywrap.i to /private/var/folders/pj/c4q3qfkx2119wyj2vbfr_yhr0000gn/T/pip-install-ofm5rwzp/abess_e1c5333de72248a2bdb93137c36fb890/src/pywrap_wrap.cpp
        swig -python -c++ -o /private/var/folders/pj/c4q3qfkx2119wyj2vbfr_yhr0000gn/T/pip-install-ofm5rwzp/abess_e1c5333de72248a2bdb93137c36fb890/src/pywrap_wrap.cpp /private/var/folders/pj/c4q3qfkx2119wyj2vbfr_yhr0000gn/T/pip-install-ofm5rwzp/abess_e1c5333de72248a2bdb93137c36fb890/src/pywrap.i
        error: command 'swig' failed: No such file or directory
        ----------------------------------------
    ERROR: Command errored out with exit status 1: /Users/jiaqihu/opt/anaconda3/bin/python -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/private/var/folders/pj/c4q3qfkx2119wyj2vbfr_yhr0000gn/T/pip-install-ofm5rwzp/abess_e1c5333de72248a2bdb93137c36fb890/setup.py'"'"'; __file__='"'"'/private/var/folders/pj/c4q3qfkx2119wyj2vbfr_yhr0000gn/T/pip-install-ofm5rwzp/abess_e1c5333de72248a2bdb93137c36fb890/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /private/var/folders/pj/c4q3qfkx2119wyj2vbfr_yhr0000gn/T/pip-record-6eee5eyp/install-record.txt --single-version-externally-managed --compile --install-headers /Users/jiaqihu/opt/anaconda3/include/python3.9/abess Check the logs for full command output.
    
    documentation 
    opened by JiaqiHu2021 4
  • Can't find cpp function

    In my fork, the development branch "bbayukari/abess/develop-ordinal" worked well, but the new version in "bbayukari/abess/ordinal", which I want to submit as a PR upstream, does not work (not only my part: none of the models can find the API). In the new version I placed the C++ code in abess/src, which is not inside R-package; that is the only difference between the two versions.

    After cloning the code in "bbayukari/abess/ordinal", I ran "install and restart" and tested with something simple like:

    dataset <- generate.data(150,100,3)
    abess(dataset[["x"]],dataset[["y"]])
    

    then,

    Error in abessGLM_API(x = x, y = y, n = nobs, p = nvars, normalize_type = normalize,  : 
      could not find function "abessGLM_API"
    Called from: abess.default(dataset[["x"]], dataset[["y"]])
    
    bug 
    opened by bbayukari 4
  • Have problem in abessMultinomial

    To use abess for multi-class classification, I use abessMultinomial and run the example code:

    from abess.linear import abessMultinomial
    from abess.datasets import make_multivariate_glm_data
    import numpy as np
    np.random.seed(12345)
    data = make_multivariate_glm_data(n = 100, p = 50, k = 10, M = 3, family = 'multinomial')
    model = abessMultinomial(support_size = [10])
    model.fit(data.x, data.y)
    model.predict(data.x)
    
    

    I get the integer 47, but I think it should return an array of class indicators, such as:

    array([[1., 0., 0.],
           [0., 0., 1.],
           [1., 0., 0.],
           [1., 0., 0.],
           [0., 0., 1.],
           [0., 1., 0.],
           [0., 1., 0.],
           [1., 0., 0.],
           [0., 1., 0.],
           [1., 0., 0.],
           [1., 0., 0.],
           [0., 1., 0.],
           [0., 0., 1.],
           [0., 1., 0.],
    

    I'm sorry to bother you again, and thanks a lot for your help.

    bug 
    opened by bored2020 4
  • Update something

    • delete unused parameters
    • pack specific parameters (add function: initial_setting & clear_setting)
    • pack restore_for_normal
    • restructure gs_path & gs support CV results output
    • rewrite screening (now use algorithmXXX.fit() and unlink with model_fit.h)
    • API update
    • simplify algorithm pointer (use algorithm_list_xxx uniformly)
    • move beta_size to Algorithm class
    opened by oooo26 4
  • Unified GLM algorithm

    • Add _abessGLM in AlgorithmGLM.h as the base GLM class
    • Rewrite Logistic/Poisson/Gamma Regression on the basis of _abessGLM, which would be much simpler
    • Fix zero-weight sample bug #458
    enhancement 
    opened by oooo26 3
  • Use same keyword arguments for sample weight as sklearn

    First of all: Thank you for the great package, it has been very helpful. Now to my suggestion:

    abess uses weight: https://abess.readthedocs.io/en/latest/Python-package/linear/Logistic.html?highlight=score#abess.linear.LogisticRegression.fit
    sklearn uses sample_weight: https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html#sklearn.linear_model.LogisticRegression.fit

    I'm using both in a project and it would be helpful if abess followed the sklearn convention.

    enhancement 
    opened by lindnemi 3
  • update R package Matrix API 'as()'

    Change methods::as(sparse.loading, "dgCMatrix") to as(as(as(sparse.loading, "dMatrix"), "generalMatrix"), "CsparseMatrix") in R-package/R/generate.spc.matrix.R.

    opened by bbayukari 3
  • Cross Validation Principle of Cox model

    Hello, I'm using the package to analyze some real survival data. I find that cross-validation can only use the deviance on the test cohort to determine the support size. Could you add the C-index criterion for cross-validation of the Cox model?
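
For context on what the requested criterion measures, here is a minimal pure-Python sketch of Harrell's concordance index: among comparable pairs (the subject with the shorter time had an observed event), it is the fraction in which the subject with the shorter time also has the higher predicted risk, counting tied risks as one half. The function name concordance_index is an assumption for this example and this is not the abess implementation; pairs with tied times are simply skipped here.

```python
def concordance_index(time, event, risk):
    """Harrell's C-index for right-censored data.

    time:  observed times
    event: 1/True if the event was observed, 0/False if censored
    risk:  predicted risk scores (higher means earlier expected event)
    """
    concordant = 0.0
    comparable = 0
    n = len(time)
    for i in range(n):
        if not event[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            if time[i] < time[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


times = [1, 2, 3, 4]
events = [1, 1, 0, 1]
print(concordance_index(times, events, [4, 3, 2, 1]))
print(concordance_index(times, events, [1, 2, 3, 4]))
```

A value of 1 means the risk ordering agrees with every comparable pair, 0.5 corresponds to random ordering, and unlike the deviance, larger is better, so a cross-validation loop would select the support size that maximizes it.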

    opened by EQUIWDH 0
  • Some problems about algorithm for cox model when the dimension is ultra-high

    Hello, I am doing some real data analysis with a high-dimensional Cox model. My real dataset has a shape of about 240*7000. However, when I use abess.CoxPHSurvivalAnalysis() with cv, it does not select any features, so I must apply screening before running abess for the Cox model. I also ran a simulation test of the screening method alone in the abess package and found that the screened set does not contain all the true features generated by make_glm_data. I therefore doubt the screening algorithm in this package and hope you can improve it. Thank you!

    help wanted 
    opened by EQUIWDH 2
  • Illegal instruction with conda-forge package

    In an HPC environment (Intel, Linux), even a very simple script gives Illegal Instruction errors. The following suffices to generate the error:

    import abess
    import numpy as np
    model = abess.LinearRegression()
    model.fit(np.random.rand(2,2), np.random.rand(2))
    

    However, this occurs only with the latest abess binary from conda-forge. If I use pip to install the package, everything runs smoothly.

    bug 
    opened by lindnemi 0
  • memory out when combining abess with auto-sklearn for classification.

    Describe the bug: I'm running experiments that combine abess with auto-sklearn. When using MultinomialRegression for classification, memory usage increases very quickly, so much that the result cannot be displayed on a web page. For LinearRegression there is no similar out-of-memory problem.

    Code for Reproduction

    My code is given as follows:

    from pprint import pprint
    
    from ConfigSpace.configuration_space import ConfigurationSpace
    from ConfigSpace.hyperparameters import CategoricalHyperparameter, \
        UniformIntegerHyperparameter, UniformFloatHyperparameter
    
    import sklearn.metrics
    import autosklearn.classification
    import autosklearn.pipeline.components.classification
    from autosklearn.pipeline.components.base \
        import AutoSklearnClassificationAlgorithm
    from autosklearn.pipeline.constants import DENSE, SIGNED_DATA, UNSIGNED_DATA, \
        PREDICTIONS
    
    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import train_test_split
    import openml
    from abess import MultinomialRegression
    from sklearn.ensemble import RandomForestClassifier
    import numpy as np
    import pandas as pd
    from sklearn.datasets import fetch_openml
    from sklearn import preprocessing
    from sklearn.tree import DecisionTreeClassifier
    
    import time
    import matplotlib.pyplot as plt
    
    import warnings
    warnings.filterwarnings("ignore")
    
    class AbessClassifier(AutoSklearnClassificationAlgorithm):
    
        def __init__(self, exchange_num, random_state=None):
            self.exchange_num = exchange_num
            self.random_state = random_state
            self.estimator = None
    
        def fit(self, X, y):
            from abess import MultinomialRegression
            self.estimator = MultinomialRegression()
            self.estimator.fit(X, y)
            return self
    
        def predict(self, X):
            if self.estimator is None:
                raise NotImplementedError
            return self.estimator.predict(X)
        
        def predict_proba(self, X):
            if self.estimator is None:
                raise NotImplementedError()
            return self.estimator.predict_proba(X)
    
        @staticmethod
        def get_properties(dataset_properties=None):
            return {
                'shortname': 'abess Classifier',
                'name': 'abess logistic Classifier',
                'handles_regression': False,
                'handles_classification': True,
                'handles_multiclass': True,
                'handles_multilabel': False,
                'handles_multioutput': False,
                'is_deterministic': False,
                # Both input and output must be tuple(iterable)
                'input': [DENSE, SIGNED_DATA, UNSIGNED_DATA],
                'output': [PREDICTIONS]
            }
        
        @staticmethod
        def get_hyperparameter_search_space(dataset_properties=None):
            cs = ConfigurationSpace() 
            exchange_num=UniformIntegerHyperparameter(
                name='exchange_num', lower=4, upper=6, default_value=5
            )
            cs.add_hyperparameters([exchange_num])
            return cs
        
    # Add abess logistic classifier component to auto-sklearn.
    autosklearn.pipeline.components.classification.add_classifier(AbessClassifier)
    cs = AbessClassifier.get_hyperparameter_search_space()
    print(cs)
    
    dataset = fetch_openml(data_id = int(29),as_frame=True)#507,183,44136
    X=dataset.data
    y=dataset.target
    X.replace([np.inf,-np.inf],np.NaN,inplace=True)
    ## Remove rows with NaN or Inf values
    inx=X[X.isna().values==True].index.unique()
    X.drop(inx,inplace=True)
    y.drop(inx,inplace=True)
    ##use dummy variables to replace classification variables:
    X = pd.get_dummies(X)
    ## Keep only numeric columns
    X = X.select_dtypes(np.number)
    ## Remove columns with NaN or Inf values
    nan = np.isnan(X).any()[np.isnan(X).any() == True]
    inf = np.isinf(X).any()[np.isinf(X).any() == True]
    X = X.drop(columns = list(nan.index))
    X = X.drop(columns = list(inf.index))
    ##Encode target labels with value between 0 and 1
    le = preprocessing.LabelEncoder()
    y = le.fit_transform(y)
    X_train, X_test, y_train, y_test = train_test_split(X, y)
    print(X_train.shape) #number of initial features
    print(X_test.shape) #number of initial features
    
    cls = autosklearn.classification.AutoSklearnClassifier(
        time_left_for_this_task=60,
        per_run_time_limit=10,
        include={
                'classifier': ['AbessClassifier'],
                'feature_preprocessor': ['polynomial']
            },
        memory_limit=6144,
        ensemble_size=1,
    )
    cls.fit(X_train, y_train, X_test, y_test)
    predictions = cls.predict(X_test)
    print("Accuracy score", sklearn.metrics.accuracy_score(y_test, predictions))
    

    After running this code, the memory reaches about 159MB, which makes the .ipynb file unfriendly for users to open. Again, regression does not encounter this out-of-memory problem.

    bug 
    opened by belzheng 0
  • Cross validation is slower in version 0.4.5 than in 0.4.0

    Describe the bug

    In my experiments, after updating abess from 0.4.0 to 0.4.5, I found that the cv procedure gets slower in some cases. The following code provides an example.

    Code for Reproduction

    library(microbenchmark)
    library(abess)
    n <- 3000
    p <- 500
    support.size <- 10
    
    sim_once <- function(seed) {
      dataset <- generate.data(n, p, support.size, family = "binomial", seed = seed)
    
      time_cv <- microbenchmark(
        abess_fit <- abess(dataset[["x"]], dataset[["y"]], family = "binomial", tune.type = "cv", nfolds = 10),
        times = 1
      ) [["time"]] / 10^9
    
      time_cv
    }
    
    # average time
    time <- sapply(1:5, sim_once)
    mean(time)
    

    invalid 
    opened by brtang63 1
Owner
Team developing polynomial algorithms for best subset selection
Self-Adaptable Point Processes with Nonparametric Time Decays

NPPDecay This is our implementation for the paper Self-Adaptable Point Processes with Nonparametric Time Decays, by Zhimeng Pan, Zheng Wang, Jeff M. P

zpan 2 Sep 24, 2022
GEP (GDB Enhanced Prompt) - a GDB plug-in for GDB command prompt with fzf history search, fish-like autosuggestions, auto-completion with floating window, partial string matching in history, and more!

GEP (GDB Enhanced Prompt) GEP (GDB Enhanced Prompt) is a GDB plug-in which make your GDB command prompt more convenient and flexibility. Why I need th

Alan Li 23 Dec 21, 2022
Scripts and misc. stuff related to the PortSwigger Web Academy

PortSwigger Web Academy Notes Mostly scripts to automate the exploits. Going in the order of the recomended learning path - starting with SQLi. Commun

pageinsec 17 Dec 30, 2022
Code for "Multi-View Multi-Person 3D Pose Estimation with Plane Sweep Stereo"

Multi-View Multi-Person 3D Pose Estimation with Plane Sweep Stereo This repository includes the source code for our CVPR 2021 paper on multi-view mult

Jiahao Lin 66 Jan 04, 2023
Keyhole Imaging: Non-Line-of-Sight Imaging and Tracking of Moving Objects Along a Single Optical Path

Keyhole Imaging Code & Dataset Code associated with the paper "Keyhole Imaging: Non-Line-of-Sight Imaging and Tracking of Moving Objects Along a Singl

Stanford Computational Imaging Lab 20 Feb 03, 2022
PyTorch implementation of ICLR 2022 paper PiCO: Contrastive Label Disambiguation for Partial Label Learning

PiCO: Contrastive Label Disambiguation for Partial Label Learning This is a PyTorch implementation of ICLR 2022 paper PiCO: Contrastive Label Disambig

王皓波 147 Jan 07, 2023
A high-performance distributed deep learning system targeting large-scale and automated distributed training.

HETU Documentation | Examples Hetu is a high-performance distributed deep learning system targeting trillions of parameters DL model training, develop

DAIR Lab 150 Dec 21, 2022
The Environment I built to study Reinforcement Learning + Pokemon Showdown

pokemon-showdown-rl-environment The Environment I built to study Reinforcement Learning + Pokemon Showdown Been a while since I ran this. Think it is

3 Jan 16, 2022
Semantic Segmentation Suite in TensorFlow

Semantic Segmentation Suite in TensorFlow. Implement, train, and test new Semantic Segmentation models easily!

George Seif 2.5k Jan 06, 2023
General Virtual Sketching Framework for Vector Line Art (SIGGRAPH 2021)

General Virtual Sketching Framework for Vector Line Art - SIGGRAPH 2021 Paper | Project Page Outline Dependencies Testing with Trained Weights Trainin

Haoran MO 118 Dec 27, 2022