scikit-survival is a Python module for survival analysis built on top of scikit-learn.

Overview



scikit-survival

scikit-survival is a Python module for survival analysis built on top of scikit-learn. It lets you perform survival analysis while making use of scikit-learn, e.g., for pre-processing or cross-validation.

About Survival Analysis

The objective in survival analysis (also referred to as time-to-event or reliability analysis) is to establish a connection between covariates and the time of an event. What distinguishes survival analysis from traditional machine learning is that parts of the training data can only be partially observed: they are censored.

For instance, in a clinical study, patients are often monitored for a particular time period, and events occurring in this period are recorded. If a patient experiences an event, the exact time of the event can be recorded, and the patient's record is uncensored. In contrast, right-censored records refer to patients who remained event-free during the study period; it is unknown whether an event occurred after the study ended. Consequently, survival analysis requires models that take this unique characteristic of such datasets into account.
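To make censoring concrete, here is a minimal sketch in plain NumPy. It stores each record as an (event, time) pair in a structured array, which is the layout scikit-survival expects for `y` (a boolean event indicator followed by a float time; `sksurv.util.Surv.from_arrays` can build one for you), and computes the Kaplan-Meier estimate by hand. The five records are made up for illustration; in practice you would call `sksurv.nonparametric.kaplan_meier_estimator` rather than this hand-written loop.

```python
import numpy as np

# Made-up follow-up data: event=True means the event was observed,
# event=False means the record is right censored at that time.
status = np.array([True, False, True, True, False])
days = np.array([72.0, 411.0, 228.0, 126.0, 118.0])

# Structured array: boolean event field first, float time field second
y = np.empty(len(days), dtype=[("event", "?"), ("time", "<f8")])
y["event"] = status
y["time"] = days


def kaplan_meier(y):
    """Return the distinct event times and the Kaplan-Meier survival estimate at each."""
    event, time = y["event"], y["time"]
    event_times = np.unique(time[event])
    survival = np.empty(len(event_times))
    s = 1.0
    for k, t in enumerate(event_times):
        # Censored records stay in the risk set until their censoring time
        n_at_risk = np.sum(time >= t)
        n_events = np.sum((time == t) & event)
        s *= 1.0 - n_events / n_at_risk
        survival[k] = s
    return event_times, survival


times, surv = kaplan_meier(y)
print(times, surv)
```

The record censored at day 118 shrinks the risk set at day 126 without counting as an event. Simply dropping censored records would instead pull the estimate down, because those patients are known to have survived at least until they were censored.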

Requirements

  • Python 3.7 or later
  • ecos
  • joblib
  • numexpr
  • numpy 1.16 or later
  • osqp
  • pandas 0.25 or later
  • scikit-learn 0.24
  • scipy 1.0 or later
  • C/C++ compiler

Installation

The easiest way to install scikit-survival is to use Anaconda by running:

conda install -c sebp scikit-survival

Alternatively, you can install scikit-survival from source following this guide.

Examples

The user guide provides in-depth information on the key concepts of scikit-survival, an overview of available survival models, and hands-on examples in the form of Jupyter notebooks.

Help and Support

Documentation

Bug reports

  • If you encounter a problem, please submit a bug report.

Questions

  • If you have a question on how to use scikit-survival, please use GitHub Discussions.
  • For general theoretical or methodological questions on survival analysis, please use Cross Validated.

Contributing

New contributors are always welcome. Please have a look at the contributing guidelines on how to get started and to make sure your code complies with our guidelines.

References

Please cite the following paper if you are using scikit-survival.

S. Pölsterl, "scikit-survival: A Library for Time-to-Event Analysis Built on Top of scikit-learn," Journal of Machine Learning Research, vol. 21, no. 212, pp. 1–6, 2020.
@article{sksurv,
  author  = {Sebastian P{\"o}lsterl},
  title   = {scikit-survival: A Library for Time-to-Event Analysis Built on Top of scikit-learn},
  journal = {Journal of Machine Learning Research},
  year    = {2020},
  volume  = {21},
  number  = {212},
  pages   = {1-6},
  url     = {http://jmlr.org/papers/v21/20-729.html}
}
Comments
  • CoxPH SurvivalAnalysis and Singular Matrix Error

    I'm going through the tutorial using the veterans' lung cancer study, and I am using the same code for Cox regression on my own dataset. My problem is predicting the days to graft failure after a transplant; after encoding and other preprocessing steps, the dataset has about 900 features and 130K rows. I prepared the data for Cox regression (data_x is a DataFrame and data_y is a NumPy array of status and survival_in_days) and took a sample of it to run. However, when I run the Cox regression, I get the error LinAlgError: Matrix is singular. I have manipulated my data in different ways, but I could not understand what the problem is or how to solve it.
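A common cause of a singular matrix in Cox regression, though not the only one, is a design matrix without full column rank: constant columns, or a complete set of one-hot dummies that always sums to one, which the partial likelihood cannot tell apart from a constant. The sketch below is a hypothetical helper (not part of scikit-survival) that flags constant columns and compares the rank of the centered matrix with the number of columns; run it on the same sample you fit on, after removing missing values. If it reports a deficit, dropping one dummy per categorical variable or adding a ridge penalty through the `alpha` parameter of `CoxPHSurvivalAnalysis` are typical remedies.

```python
import numpy as np
import pandas as pd


def diagnose_design_matrix(X: pd.DataFrame) -> dict:
    """Hypothetical helper: report why X might not have full column rank."""
    # Columns with a single distinct value carry no information for a Cox model
    is_constant = (X.nunique(dropna=False) <= 1).to_numpy()
    constant = X.columns[is_constant].tolist()

    # Centering removes any constant direction, so a constant column or a full
    # dummy set that sums to 1 shows up as a rank deficit here
    values = X.to_numpy(dtype=float)
    centered = values - values.mean(axis=0)
    rank = int(np.linalg.matrix_rank(centered))
    return {"n_columns": X.shape[1], "rank": rank, "constant_columns": constant}


# Toy data: treat_a + treat_b == 1 in every row and "site" never varies
X = pd.DataFrame({
    "age": [69.0, 64.0, 38.0, 63.0, 65.0, 48.0],
    "score": [60.0, 70.0, 60.0, 60.0, 70.0, 40.0],
    "treat_a": [1, 0, 1, 0, 1, 0],
    "treat_b": [0, 1, 0, 1, 0, 1],
    "site": [1, 1, 1, 1, 1, 1],
})
report = diagnose_design_matrix(X)
print(report)
```

Here five columns carry only three independent directions: `site` is constant and `treat_b` is fully determined by `treat_a`.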

    awaiting response 
    opened by sarahysh12 22
  • Explain how to interpret output of .predict() in API doc

    (I also posted this as a question on Stack Overflow: https://stackoverflow.com/q/47274356/1870832 )

    I'm confused how to interpret the output of .predict from a fitted CoxnetSurvivalAnalysis model in scikit-survival. I've read through the notebook Intro to Survival Analysis in scikit-survival and the API reference, but can't find an explanation. Below is a minimal example of what leads to my confusion:

    import pandas as pd
    from sksurv.datasets import load_veterans_lung_cancer
    from sksurv.linear_model import CoxnetSurvivalAnalysis
    
    # load data
    data_X, data_y = load_veterans_lung_cancer()
    
    # one-hot-encode categorical columns in X
    categorical_cols = ['Celltype', 'Prior_therapy', 'Treatment']
    
    X = data_X.copy()
    for c in categorical_cols:
        dummy_matrix = pd.get_dummies(X[c], prefix=c, drop_first=False)
        X = pd.concat([X, dummy_matrix], axis=1).drop(c, axis=1)
    
    # display final X to fit Cox Elastic Net model on
    del data_X
    print(X.head(3))
    
    

    so here's the X going into the model:

       Age_in_years  Celltype  Karnofsky_score  Months_from_Diagnosis  \
    0          69.0  squamous             60.0                    7.0   
    1          64.0  squamous             70.0                    5.0   
    2          38.0  squamous             60.0                    3.0   
    
      Prior_therapy Treatment  
    0            no  standard  
    1           yes  standard  
    2            no  standard  
    
    

    ...moving on to fitting model and generating predictions:

    # Fit Model
    coxnet = CoxnetSurvivalAnalysis()
    coxnet.fit(X, data_y)

    # What are these predictions?
    preds = coxnet.predict(X)
    
    

    preds has same number of records as X, but their values are wayyy different than the values in data_y, even when predicted on the same data they were fit on.

    print(preds.mean()) 
    print(data_y['Survival_in_days'].mean())
    

    output:

    -0.044114643249153422
    121.62773722627738
    
    

    So what exactly are preds? Clearly .predict means something pretty different here than in scikit-learn, but I can't figure out what. The API Reference says it returns "The predicted decision function," but what does that mean? And how do I get to the predicted estimate in months yhat for a given X? I'm new to survival analysis so I'm obviously missing something.
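For Cox-type models, `.predict` returns a risk score rather than a time. As we understand it, the score is the linear predictor x·β: larger values mean a higher hazard and shorter expected survival, and on its own only the ordering between samples is meaningful. Turning the score into a time-scale quantity requires an estimate of the baseline hazard. The sketch below does this in plain NumPy with the Breslow estimator, using made-up data and a hypothetical coefficient `beta`; it illustrates the idea and is not scikit-survival's implementation (for fitted models, `CoxPHSurvivalAnalysis.predict_survival_function` serves this purpose).

```python
import numpy as np


def breslow_survival(time, event, train_risk, new_risk):
    """Breslow estimate of S(t | x) = exp(-H0(t) * exp(x @ beta)).

    train_risk / new_risk are linear predictors x @ beta. Returns the distinct
    event times and a survival matrix of shape (len(new_risk), n_event_times).
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    exp_risk = np.exp(train_risk)
    event_times = np.unique(time[event])
    # Baseline hazard increment: events at t / sum of exp(risk) over the risk set
    increments = np.array([
        np.sum(event & (time == t)) / exp_risk[time >= t].sum() for t in event_times
    ])
    baseline_cum_hazard = np.cumsum(increments)
    survival = np.exp(-np.outer(np.exp(new_risk), baseline_cum_hazard))
    return event_times, survival


def median_survival_time(event_times, survival):
    """First time where S(t) <= 0.5 for each row; inf if the curve never gets there."""
    medians = np.full(survival.shape[0], np.inf)
    for i, row in enumerate(survival):
        below = np.flatnonzero(row <= 0.5)
        if below.size:
            medians[i] = event_times[below[0]]
    return medians


# Made-up training data with one covariate and a hypothetical fitted coefficient
time = np.array([5.0, 8.0, 12.0, 20.0, 25.0, 30.0])
event = np.array([True, True, False, True, True, False])
x = np.array([1.0, 2.0, 0.5, 0.0, -1.0, 0.3])
beta = 0.5

train_risk = beta * x                 # the kind of risk score predict returns
new_risk = np.array([-1.0, 1.0])      # a low-risk and a high-risk patient
times, surv = breslow_survival(time, event, train_risk, new_risk)
meds = median_survival_time(times, surv)
print(meds)
```

With these numbers, the high-risk patient's curve drops below 0.5 at t = 8, whereas the low-risk patient's curve stays above 0.5 throughout follow-up, so that median is reported as `inf`.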

    opened by MaxPowerWasTaken 21
  • During install: error: command '/usr/bin/clang' failed with exit code 1

    Python version: Python 3.10.3

    OS: OSX 12.4 (Proc: M1 chip)

    When trying to pip install (tried versions 0.17 and 0.18):

          222 warnings and 4 errors generated.
          error: command '/usr/bin/clang' failed with exit code 1
          [end of output]
    

    The errors seem to be:

          In file included from sksurv/linear_model/_coxnet.cpp:801:
          In file included from sksurv/linear_model/src/coxnet_wrapper.h:21:
          sksurv/linear_model/src/coxnet/coxnet.h:139:23: error: expected unqualified-id
                      if (!std::isfinite(exp_xw[k])) {
                                ^
    

          In file included from sksurv/linear_model/src/coxnet/coxnet.h:18:
          In file included from sksurv/linear_model/src/eigen/Eigen/Core:374:
          sksurv/linear_model/src/eigen/Eigen/src/Core/MathFunctions.h:753:12: error: reference to unresolved using declaration
              return isnan EIGEN_NOT_A_MACRO (x);
                     ^
    

          In file included from sksurv/linear_model/src/coxnet/coxnet.h:18:
          In file included from sksurv/linear_model/src/eigen/Eigen/Core:374:
          sksurv/linear_model/src/eigen/Eigen/src/Core/MathFunctions.h:738:12: error: reference to unresolved using declaration
              return isinf EIGEN_NOT_A_MACRO (x);
                     ^
    

          In file included from sksurv/linear_model/src/coxnet/coxnet.h:18:
          In file included from sksurv/linear_model/src/eigen/Eigen/Core:374:
          sksurv/linear_model/src/eigen/Eigen/src/Core/MathFunctions.h:723:12: error: reference to unresolved using declaration
              return isfinite EIGEN_NOT_A_MACRO (x);
                     ^
    

    Happy to provide more details if needed

    opened by tpilewicz 13
  • 0.12.0: from sksurv.ensemble import RandomSurvivalForest fails

    Upon upgrading to 0.12.0

    >>> from sksurv.ensemble import RandomSurvivalForest
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/Users/gchu/miniconda3/envs/dev/lib/python3.6/site-packages/sksurv/ensemble/__init__.py", line 2, in <module>
        from .forest import RandomSurvivalForest  # noqa: F401
      File "/Users/gchu/miniconda3/envs/dev/lib/python3.6/site-packages/sksurv/ensemble/forest.py", line 14, in <module>
        from ..tree import SurvivalTree
      File "/Users/gchu/miniconda3/envs/dev/lib/python3.6/site-packages/sksurv/tree/__init__.py", line 1, in <module>
        from .tree import SurvivalTree  # noqa: F401
      File "/Users/gchu/miniconda3/envs/dev/lib/python3.6/site-packages/sksurv/tree/tree.py", line 14, in <module>
        from ._criterion import LogrankCriterion
      File "_splitter.pxd", line 34, in init sksurv.tree._criterion
    ValueError: sklearn.tree._splitter.Splitter size changed, may indicate binary incompatibility. Expected 368 from C header, got 360 from PyObject
    >>>
    
    opened by gregchu 13
  • Fix a variety of build problems.

    Checklist

    • [x] py.test passes
    • [x] documentation renders correctly

    What does this implement/fix? Explain your changes

    When building with LLVM, this project was not compiling properly. With these changes, the project seems to compile fine.

    opened by llpamies 10
  • viz of ensemble models

    Hi!

    Would you have any advice on how to visualize the decision paths / decision trees of the ensemble survival models (either random survival forest or gradient boosting)?

    opened by ad05bzag 10
  • Different results of CoxPHSurvivalAnalysis and CoxnetSurvivalAnalysis

    The documentation of CoxPHSurvivalAnalysis says:

    Cox proportional hazards model.

    And the documentation of CoxnetSurvivalAnalysis says:

    Cox's proportional hazard's model with elastic net penalty.

    So I assume the two classes implement the same model, and should return the same results when set with the same model parameters and given the same data. However, I see different results. Why? Also, what are the differences between them?

    Codes:

    from sksurv.linear_model import CoxPHSurvivalAnalysis, CoxnetSurvivalAnalysis
    from sksurv.datasets import load_veterans_lung_cancer
    from sksurv.preprocessing import OneHotEncoder
    
    X_, y = load_veterans_lung_cancer()
    X = OneHotEncoder().fit_transform(X_)
    
    # try to match the model parameters wherever possible
    f = CoxPHSurvivalAnalysis(alpha=0.5, n_iter=100000)
    g = CoxnetSurvivalAnalysis(alphas=[0.5], alpha_min_ratio=1, n_alphas=1, 
                               l1_ratio=1e-16, tol=1e-09, normalize=False)
    
    print(f)
    print(g)
    
    f.fit(X, y)
    g.fit(X, y)
    
    print(f.coef_)
    print(g.coef_[:,0])
    

    Output:

    CoxPHSurvivalAnalysis(alpha=0.5, n_iter=100000, tol=1e-09, verbose=0)
    CoxnetSurvivalAnalysis(alpha_min_ratio=0.0001, alphas=[0.5], copy_X=True,
                l1_ratio=1e-16, max_iter=100000, n_alphas=1, normalize=False,
                penalty_factor=None, tol=1e-09, verbose=False)
    [-8.34518623e-03 -7.21105070e-01 -2.80434400e-01 -1.11234345e+00
     -3.26083027e-02 -1.93213436e-04  6.22726190e-02  2.90289950e-01]
    [-0.00346722 -0.05117406  0.06044394 -0.16433136 -0.03300373  0.0003172
     -0.00881617  0.06956854]
    

    What I've gathered:

    • CoxPHSurvivalAnalysis is sksurv's own implementation of Cox Proportional Hazard model, and supports ridge (L2) regularization.
    • CoxnetSurvivalAnalysis is a wrapper of some C++ extension codes used by R's glmnet package, and supports elastic net (L1 and L2) regularization.
    • In the test files, CoxPHSurvivalAnalysis is tested with the Rossi dataset, while CoxnetSurvivalAnalysis is tested with the Breast Cancer dataset.
    • The two classes have different constructor signatures and methods (eg, only CoxPHSurvivalAnalysis has predict_survival_function).

    Would it be a nice feature to have consolidated constructor signatures and methods for the two classes, and to test them on the same dataset for validation or comparison?
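One likely source of the discrepancy is how the two objectives scale the penalty. Assuming `CoxPHSurvivalAnalysis` adds 0.5 * alpha * ||beta||^2 to the summed negative log partial likelihood, while `CoxnetSurvivalAnalysis` follows glmnet's convention of averaging the log partial likelihood over the n samples before adding alpha * (l1_ratio * ||beta||_1 + 0.5 * (1 - l1_ratio) * ||beta||^2), the two objectives differ only by the constant factor n (which does not move the minimizer) when the Coxnet alpha equals the CoxPH alpha divided by n, e.g. `alphas=[0.5 / X.shape[0]]` in the code above. Please check both API references before relying on this. The NumPy sketch below only verifies the algebra on random data, using the Breslow handling of ties.

```python
import numpy as np


def neg_log_partial_likelihood(beta, X, time, event):
    """Negative Cox log partial likelihood (Breslow ties), summed over events."""
    eta = X @ beta
    total = 0.0
    for i in np.flatnonzero(event):
        at_risk = time >= time[i]
        total += eta[i] - np.log(np.exp(eta[at_risk]).sum())
    return -total


def coxph_style_objective(beta, X, time, event, alpha):
    # Assumed CoxPHSurvivalAnalysis scaling: ridge penalty on the summed likelihood
    return neg_log_partial_likelihood(beta, X, time, event) + 0.5 * alpha * beta @ beta


def glmnet_style_objective(beta, X, time, event, alpha, l1_ratio=0.0):
    # glmnet scaling: likelihood averaged over n, elastic net penalty added on top
    n = X.shape[0]
    penalty = l1_ratio * np.abs(beta).sum() + 0.5 * (1.0 - l1_ratio) * beta @ beta
    return neg_log_partial_likelihood(beta, X, time, event) / n + alpha * penalty


rng = np.random.default_rng(0)
X = rng.normal(size=(30, 3))
time = rng.exponential(scale=10.0, size=30)
event = rng.random(30) < 0.7
beta = rng.normal(size=3)

alpha_coxph = 0.5
alpha_coxnet = alpha_coxph / X.shape[0]
lhs = coxph_style_objective(beta, X, time, event, alpha_coxph)
rhs = X.shape[0] * glmnet_style_objective(beta, X, time, event, alpha_coxnet)
print(np.isclose(lhs, rhs))
```

Passing the same alpha to both, as in the example above, makes the Coxnet penalty effectively n times stronger, which is consistent with its much smaller coefficients.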

    Thanks.

    opened by leihuang 10
  • Add `apply` and `decision_path` to `SurvivalTree`

    Checklist

    • [x] closes #290
    • [x] py.test passes
    • [x] tests are included
    • [x] code is well formatted
    • [x] documentation renders correctly

    What does this implement/fix? Explain your changes

    Add apply and decision_path to SurvivalTree, which also enables the same methods for RandomSurvivalForest and ExtraSurvivalTrees.

    opened by Vincent-Maladiere 8
  • RandomSurvivalForest - predict_survival_function

    Describe the bug

    1. I am trying to predict the survival function for my data using RandomSurvivalForest. Although the method works, it doesn't return the time points of the steps of the survival function. Each array containing a survival function has a length equal to or lower than the number of unique times in y, hence we can't deduce which point in time each step belongs to.

    2. Additionally, the example given in the documentation of RandomSurvivalForest fails with an error (see below). The first point can be reproduced as follows:

    import numpy as np
    from sksurv.datasets import load_whas500
    from sksurv.ensemble import RandomSurvivalForest

    X, y = load_whas500()
    times = sorted(np.unique(y["lenfol"]))
    n_times = len(times)
    # n_times =  395

    estimator = RandomSurvivalForest().fit(X, y)
    surv_funcs = estimator.predict_survival_function(X.iloc[:5])

    surv_funcs[0]
    # array([0.9975    , 0.9975    , 0.9975    , 0.9975    , 0.9975    ,
    #       0.9975    , 0.9975    , 0.995     , 0.98883333, 0.98883333,...

    len(surv_funcs[0])
    # 162
    
    

    Regarding the second point: the example in the documentation of RandomSurvivalForest fails because predict_survival_function returns 1D arrays, unlike the same method in CoxnetSurvivalAnalysis or CoxPHSurvivalAnalysis. This is the error you get:

    import matplotlib.pyplot as plt
    from sksurv.datasets import load_whas500
    from sksurv.ensemble import RandomSurvivalForest

    X, y = load_whas500()
    estimator = RandomSurvivalForest().fit(X, y)
    surv_funcs = estimator.predict_survival_function(X.iloc[:5])
    for fn in surv_funcs:
        plt.step(fn.x, fn(fn.x), where="post")

    plt.ylim(0, 1)
    plt.show()
    
    AttributeError: 'numpy.ndarray' object has no attribute 'x'
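Until the return type matches the other estimators, one workaround is to pair each returned array with the time points it refers to and wrap the pair in a small step-function object that has the `x` attribute and call signature the documented plotting loop relies on. The sketch below assumes each array is aligned with the forest's distinct event times (check which fitted attribute your version exposes for them); `StepSurvival` is a hypothetical name, not a scikit-survival class. It sorts its inputs and returns 1 before the first time point.

```python
import numpy as np


class StepSurvival:
    """Hypothetical right-continuous step function S(t) built from (times, values)."""

    def __init__(self, x, y):
        x = np.asarray(x, dtype=float)
        order = np.argsort(x)  # sort so the inputs may arrive in any order
        self.x = x[order]
        self.y = np.asarray(y, dtype=float)[order]

    def __call__(self, t):
        # Index of the last time point <= t; -1 means t lies before the first one
        idx = np.searchsorted(self.x, np.asarray(t, dtype=float), side="right") - 1
        return np.where(idx >= 0, self.y[np.clip(idx, 0, None)], 1.0)


# Made-up values standing in for one row of predict_survival_function output
event_times = np.array([3.0, 1.0, 2.0])
surv_values = np.array([0.5, 0.9, 0.7])
fn = StepSurvival(event_times, surv_values)
print(fn(fn.x))
```

With a list of such objects, the plotting loop from the documentation (`plt.step(fn.x, fn(fn.x), where="post")`) works unchanged.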
    
    opened by felipe0216 8
  • Error when using PIP to install scikit-survival 0.13 that uses PEP 517

    Describe the bug

    A clear and concise description of what the bug is.

    Code Sample to Reproduce the Bug

    # Insert your code here that produces the bug.
    # This example should be as succinct as possible and self-contained,
    # i.e., not rely on external data.
    # We are going to copy-paste your code and we expect to get the same result as you.
    # It should run in a fresh python session, and so include all relevant imports.
    

    Expected Results A clear and concise description of what you expected to happen.

    Actual Results Please paste or specifically describe the actual output or traceback.

    Versions Please execute the following snippet and paste the output below.

    import sklearn; sklearn.show_versions()
    import sksurv; print("sksurv:", sksurv.__version__)
    import cvxopt; print("cvxopt:", cvxopt.__version__)
    import cvxpy; print("cvxpy:", cvxpy.__version__)
    import numexpr; print("numexpr:", numexpr.__version__)
    import osqp; print("osqp:", osqp.OSQP().version())
    
    opened by SurajitTest 8
  • Loss Function "ipcwls" in GradientBoostingSurvivalAnalysis leads to error

    Hi

    I was trying to train a time-to-failure model using machine sensor data. I chose the loss function 'ipcwls', which, according to the docs, weights the observations by their censoring weights. Although I'm not aware of the theory behind it, it seemed like a reasonable choice. However, the code fails when calling fit() with the error message "Input contains NaN, infinity or a value too large for dtype('float64')".

    FYI, all of my X variables are scaled and take continuous values within a ±50 range. Quite a few have small values close to zero (5-6 decimal places). Is the choice of loss function leading to a division by zero? I need some clarity on this and on when this loss function should not be used.
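For context on the division-by-zero question: the inverse-probability-of-censoring weight of an uncensored sample is 1 / G(t), where G is the Kaplan-Meier estimate of the censoring distribution (censorings treated as events), and censored samples get weight 0. If the estimate of G were zero at some event time, the weight would become infinite, which would plausibly surface as a NaN/infinity error like this one. Note that these weights depend only on the observed times and event indicators, not on the scale of the features. The NumPy sketch below is an illustration, not scikit-survival's code: it evaluates G just before each event time, G(t-), a common convention under which every weight is finite, because a sample with an event at t was still under observation at every earlier censoring time.

```python
import numpy as np


def censoring_survival(time, event):
    """Kaplan-Meier estimate of the censoring survival function G at each distinct time."""
    distinct = np.unique(time)
    g = np.empty(len(distinct))
    value = 1.0
    for k, t in enumerate(distinct):
        n_at_risk = np.sum(time >= t)
        n_censored = np.sum((time == t) & ~event)  # censorings play the role of events
        value *= 1.0 - n_censored / n_at_risk
        g[k] = value
    return distinct, g


def ipcw_weights(time, event):
    """Weights 1 / G(t-) for uncensored samples and 0 for censored ones."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    distinct, g = censoring_survival(time, event)
    # Position of the last distinct time strictly before each sample's time
    idx = np.searchsorted(distinct, time, side="left") - 1
    g_left = np.where(idx >= 0, g[np.clip(idx, 0, None)], 1.0)
    weights = np.zeros(len(time))
    weights[event] = 1.0 / g_left[event]
    return weights


time = np.array([2.0, 3.0, 3.0, 5.0, 8.0, 8.0])
event = np.array([True, False, True, True, True, False])
w = ipcw_weights(time, event)
print(w)
```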

    Thanks, Soham

    opened by Soham2112 8
  • n_iter_no_change in GradientBoostingSurvivalAnalysis

    Describe the bug

    The documentation for the parameter "n_estimators_" of GradientBoostingSurvivalAnalysis says "The number of estimators as selected by early stopping (if n_iter_no_change is specified)." However, GradientBoostingSurvivalAnalysis does not accept n_iter_no_change as an argument.

    Code Sample to Reproduce the Bug

    from sksurv.ensemble import GradientBoostingSurvivalAnalysis
    GradientBoostingSurvivalAnalysis(n_iter_no_change = 10)
    

    Actual Results

    TypeError: GradientBoostingSurvivalAnalysis.__init__() got an unexpected keyword argument 'n_iter_no_change'
    

    Versions System: python: 3.10.8 (main, Nov 24 2022, 14:13:03) [GCC 11.2.0] machine: Linux-5.15.0-56-generic-x86_64-with-glibc2.35

    Python dependencies: sklearn: 1.2.0 pip: 22.3.1 setuptools: 65.5.0 numpy: 1.23.4 scipy: None Cython: 0.29.32 pandas: 1.5.1 matplotlib: 3.6.2 joblib: 1.2.0 threadpoolctl: 3.1.0

    opened by TristanFauvel 0
  • Added conditional property to expose time scale predictions

    Checklist

    • [X] closes #324
    • [X] py.test passes
    • [ ] tests are included
    • [X] code is well formatted
    • [X] documentation renders correctly

    Added a decorator for properties that are only available if a check returns true. Unfortunately, the decorator provided by scikit-learn only works for functions.

    @sebp I am not sure what to test exactly, maybe a test which tests whether pipelines correctly patch the property and functions through? I also think this should not show up in the documentation, as it is internal?
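The idea of a property that only exists when a check passes can be sketched as a small descriptor. Raising `AttributeError` from `__get__` makes `hasattr` return False and lets `getattr(obj, name, default)` fall back to the default, which is exactly what the `_predict_risk_score` check in the related Pipeline issue relies on. All names here (`conditional_property`, `MiniPipeline`, the toy estimators) are hypothetical; this is neither the PR's code nor scikit-learn's `available_if`.

```python
class _ConditionalProperty:
    """Descriptor that behaves like @property only while check(instance) is true."""

    def __init__(self, fget, check):
        self.fget = fget
        self.check = check
        self.__doc__ = fget.__doc__

    def __get__(self, obj, owner=None):
        if obj is None:
            return self  # accessed on the class itself
        if not self.check(obj):
            # AttributeError makes hasattr() False and lets getattr() use its default
            raise AttributeError(f"{self.fget.__name__} is not available")
        return self.fget(obj)


def conditional_property(check):
    def decorator(fget):
        return _ConditionalProperty(fget, check)
    return decorator


class TimeScaleModel:
    _predict_risk_score = False  # predictions are on the time scale


class PlainModel:
    pass  # does not define the attribute at all


class MiniPipeline:
    def __init__(self, steps):
        self.steps = steps

    @property
    def _final_estimator(self):
        return self.steps[-1][1]

    @conditional_property(lambda self: hasattr(self._final_estimator, "_predict_risk_score"))
    def _predict_risk_score(self):
        """Forwarded from the final step, if that step defines it."""
        return self._final_estimator._predict_risk_score


pipe = MiniPipeline([("model", TimeScaleModel())])
print(not getattr(pipe, "_predict_risk_score", True))  # time-scale check from the issue
```

A possible test, as suggested above, is to assert that the wrapped and unwrapped estimators answer this check identically.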

    opened by Finesim97 5
  • SciKit-Learn Pipeline not patched with "_predict_risk_score"

    Describe the bug

    In my own evaluation code, I used the check for '_predict_risk_score' to see whether models return their predictions on the time scale or the risk scale, but this doesn't work when the estimator is wrapped in a pipeline.

    # Insert your code here that produces the bug.
    from sklearn.pipeline import Pipeline
    from sksurv.linear_model.aft import IPCRidge
    from sksurv.datasets import load_veterans_lung_cancer
    from sksurv.preprocessing import OneHotEncoder
    from sksurv.base import SurvivalAnalysisMixin
    
    
    data_x, data_y = load_veterans_lung_cancer()
    
    
    data_x_prep = OneHotEncoder().fit_transform(data_x)
    model_direct = IPCRidge().fit(data_x_prep, data_y)
    
    
    pipe = Pipeline([('encode', OneHotEncoder()),
                     ('model', IPCRidge())])
    pipe.fit(data_x, data_y)
    
    
    # Are equal
    print(model_direct.predict(data_x_prep.head()))
    print(pipe.predict(data_x.head()))
    
    
    # Steal super method
    # This does not match, because ...
    print(SurvivalAnalysisMixin.score(model_direct, data_x_prep, data_y))
    print(SurvivalAnalysisMixin.score(pipe, data_x, data_y))
    
    
    # ... the property is not patched through
    # if this returns true, the scores are treated as being on the time scale
    print(not getattr(model_direct, "_predict_risk_score", True))
    print(not getattr(pipe, "_predict_risk_score", True))
    
    
    # The second one should also be true!
    

    Expected Results: A Pipeline object should also have the corresponding property set; otherwise, evaluation code may break.

    Actual Results: The property is not available. It should be possible to just add it to the __init__.py, but I am not sure how well that works together with the @property decorator. I am currently finishing my master's thesis, but I should be able to work out a PR, including tests of the behaviour, on the 5th of December.

    Versions (Not running the newest version cough)

    System:
        python: 3.9.9 | packaged by conda-forge | (main, Dec 20 2021, 02:41:03)  [GCC 9.4.0]
    executable: /home/jovyan/master-thesis/env/bin/python
       machine: Linux-5.10.0-15-amd64-x86_64-with-glibc2.35
    
    Python dependencies:
          sklearn: 1.1.2
              pip: 22.2.2
       setuptools: 65.4.0
            numpy: 1.23.3
            scipy: 1.9.1
           Cython: None
           pandas: 1.5.0
       matplotlib: 3.6.0
           joblib: 1.2.0
    threadpoolctl: 3.1.0
    
    Built with OpenMP: True
    
    threadpoolctl info:
           user_api: openmp
       internal_api: openmp
             prefix: libgomp
           filepath: /home/jovyan/master-thesis/env/lib/python3.9/site-packages/scikit_learn.libs/libgomp-a34b3233.so.1.0.0
            version: None
        num_threads: 48
    
           user_api: blas
       internal_api: openblas
             prefix: libopenblas
           filepath: /home/jovyan/master-thesis/env/lib/libopenblasp-r0.3.21.so
            version: 0.3.21
    threading_layer: pthreads
       architecture: Zen
        num_threads: 48
    
           user_api: blas
       internal_api: openblas
             prefix: libopenblas
           filepath: /home/jovyan/master-thesis/env/lib/python3.9/site-packages/scipy.libs/libopenblasp-r0-9f9f5dbc.3.18.so
            version: 0.3.18
    threading_layer: pthreads
       architecture: Zen
        num_threads: 48
    sksurv: 0.18.0
    
    enhancement 
    opened by Finesim97 2
  • Bug in nonparametric.py when calling IPCRidge

    Bug in nonparametric.py when calling IPCRidge

    Describe the bug

    Running IPCRidge stops with the following message

    assert (Ghat > 0).all()

    and nothing after. I found that changing the option to reverse=False in the call to kaplan_meier_estimator inside the function ipc_weights in nonparametric.py, as shown below, makes the error go away. Error message:

    AssertionError                            Traceback (most recent call last)
    Input In [74], in <cell line: 5>()
          2 set_config(display="text")  # displays text representation of estimators
          4 estimator = IPCRidge(alpha = 0.5,fit_intercept=True)
    ----> 5 estimator.fit(data_x,data_y)
    
    File /opt/homebrew/Caskroom/miniforge/base/envs/teaching_env/lib/python3.10/site-packages/sksurv/linear_model/aft.py:90, in IPCRidge.fit(self, X, y)
         72 """Build an accelerated failure time model.
         73 
         74 Parameters
       (...)
         86 self
         87 """
         88 event, time = check_array_survival(X, y)
    ---> 90 weights = ipc_weights(event, time)
         91 super().fit(X, numpy.log(time), sample_weight=weights)
         93 return self
    
    File /opt/homebrew/Caskroom/miniforge/base/envs/teaching_env/lib/python3.10/site-packages/sksurv/nonparametric.py:323, in ipc_weights(event, time)
        320 idx = numpy.searchsorted(unique_time, time[event])
        321 Ghat = p[idx]
    --> 323 assert (Ghat > 0).all()
        325 weights = numpy.zeros(time.shape[0])
        326 weights[event] = 1.0 / Ghat
    
    AssertionError: 
    

    Code Sample to Reproduce the Bug

    used code:

    estimator = IPCRidge(alpha = 0.5,fit_intercept=True)
    estimator.fit(data_x,data_y)
    
    

    Here is what I changed in nonparametric.py, in the line unique_time, p = kaplan_meier_estimator(event, time, reverse=False) (I changed True to False):

    def ipc_weights(event, time):
        """Compute inverse probability of censoring weights
    
        Parameters
        ----------
        event : array, shape = (n_samples,)
            Boolean event indicator.
    
        time : array, shape = (n_samples,)
            Time when a subject experienced an event or was censored.
    
        Returns
        -------
        weights : array, shape = (n_samples,)
            inverse probability of censoring weights
    
        See also
        --------
        CensoringDistributionEstimator
            An estimator interface for estimating inverse probability
            of censoring weights for unseen time points.
        """
        if event.all():
            return np.ones(time.shape[0])
    
        unique_time, p = kaplan_meier_estimator(event, time, reverse=False)
    
        idx = np.searchsorted(unique_time, time[event])
        Ghat = p[idx]
    
        assert (Ghat > 0).all()
    
        weights = np.zeros(time.shape[0])
        weights[event] = 1.0 / Ghat
    
        return weights
    

    Machine and packages versions used:

    Last updated: 2022-11-08T08:59:04.111247-05:00
    
    Python implementation: CPython
    Python version       : 3.10.5
    IPython version      : 8.4.0
    
    Compiler    : Clang 13.0.1 
    OS          : Darwin
    Release     : 21.6.0
    Machine     : arm64
    Processor   : arm
    CPU cores   : 10
    Architecture: 64bit
    
    matplotlib: 3.5.2
    numpy     : 1.22.4
    pandas    : 1.4.4
    json      : 2.0.9
    
    bug 
    opened by fbarfi 4
  • Suggestions for StepFunction

    Suggestions for StepFunction

    I have 2 minor suggestions for StepFunction that I would like to see:

    1. Use a different argument name for 'x' in __init__ and __call__. In addition, the API reference for StepFunction is currently missing.
    2. Sort the arrays inside the function.

    Thanks.

    awaiting response 
    opened by drproduck 1
  • KM_variance_estimator

    Checklist

    • [x] py.test passes
    • [x] tests are included
    • [x] code is well formatted
    • [ ] documentation renders correctly

    What does this implement/fix? Explain your changes

    Hi @sebp, I added Greenwood's estimate of the KM variance to nonparametric.py (this is a prerequisite for implementing some goodness-of-fit tests). NB: I ran tox -e py310-docs, but for some reason the new function does not appear in the API doc. Best,
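For reference, Greenwood's formula estimates the variance of the Kaplan-Meier estimator as Var[S(t)] ≈ S(t)^2 * sum over t_i <= t of d_i / (n_i * (n_i - d_i)), with d_i events and n_i samples at risk at event time t_i. The NumPy sketch below computes both quantities. It is independent of the PR's actual code and leaves the variance undefined (NaN) once the estimate reaches zero, because the last term then divides by zero.

```python
import numpy as np


def kaplan_meier_greenwood(time, event):
    """Kaplan-Meier estimate and Greenwood variance at each distinct event time."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    event_times = np.unique(time[event])
    n_at_risk = np.array([np.sum(time >= t) for t in event_times], dtype=float)
    n_events = np.array([np.sum((time == t) & event) for t in event_times], dtype=float)

    survival = np.cumprod(1.0 - n_events / n_at_risk)
    with np.errstate(divide="ignore", invalid="ignore"):
        # Running sum of d_i / (n_i (n_i - d_i)); inf once everyone at risk has the event
        greenwood_sum = np.cumsum(n_events / (n_at_risk * (n_at_risk - n_events)))
        variance = survival ** 2 * greenwood_sum
    return event_times, survival, variance


times, surv, var = kaplan_meier_greenwood([1, 2, 2, 3, 4, 5], [True] * 6)
print(surv, var)
```

Without censoring, the formula reduces to the binomial variance S(t)(1 - S(t))/n, which makes a convenient sanity check for a test.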

    opened by TristanFauvel 3
Releases(v0.19.0.post1)
Owner
Sebastian Pölsterl