Supervised domain-agnostic prediction framework for probabilistic modelling

Overview

skpro

A supervised domain-agnostic framework that allows for probabilistic modelling, namely the prediction of probability distributions for individual data points.

The package offers a variety of features and specifically allows for

  • the implementation of probabilistic prediction strategies in supervised contexts
  • comparison of frequentist and Bayesian prediction methods
  • strategy optimization through hyperparameter tuning and ensemble methods (e.g. bagging)
  • workflow automation

List of developers and contributors

Documentation

The full documentation is available here.

Installation

Installation is easy using Python's package manager:

$ pip install skpro

Contributing & Citation

We welcome contributions to the skpro project. Please read our contribution guide.

If you use skpro in a scientific publication, we would appreciate citations.

Comments
  • Distributions as return objects

    Re-opening the sub-issue opened in #3 and commented upon by @murphyk

    Question: should skpro's predict methods return a vector of distribution objects? For example, using the distributions from scipy.stats which implement methods pdf, cdf, mean, var, etc.

    Pro:

    • this would be using an existing, consolidated, and well-supported interface
    • it might be easier to use
    • it might be easier to understand

    Contra:

    • mixture types are not supported
    • l2 norm is not supported (as would be needed for squared/Gneiting loss)
    • mixed distributions on the reals, especially empirical distributions (weighted sum of deltas) which are returned by Bayesian packages are not supported
    • vectors of distributions are not supported, alternatively Cartesian products of distributions
    • this is not the status quo
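To make the question concrete, here is a minimal sketch of what "a vector of distribution objects" could look like with frozen scipy.stats distributions. The variable names and numbers are made up for illustration, and this is not skpro's current return type; it only shows the scipy.stats interface mentioned above (pdf, cdf, mean, var).

```python
import numpy as np
from scipy.stats import norm

# Hypothetical per-point predictions: one location and one scale per test point.
mu = np.array([0.0, 1.0, 2.5])
sigma = np.array([1.0, 0.5, 2.0])

# One frozen scipy.stats distribution object per data point.
y_pred = [norm(loc=m, scale=s) for m, s in zip(mu, sigma)]

# Every element exposes the usual scipy.stats methods.
means = np.array([d.mean() for d in y_pred])
variances = np.array([d.var() for d in y_pred])
cdf_at_mean = np.array([d.cdf(m) for d, m in zip(y_pred, mu)])
pdf_at_mean = np.array([d.pdf(m) for d, m in zip(y_pred, mu)])
```

A plain Python list is used here on purpose: it keeps each prediction a separate object, at the cost of looping in Python rather than vectorizing.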
    help wanted 
    opened by fkiraly 11
  • documentation: np.mean(y_pred) does not work

    I'm following along with this intro example. However, this line fails:

    (numpy.mean(y_pred) * 2).shape
    

    The error is below. It seems to be because Distribution objects don't support the mean() function but instead call it point.

    np.mean(y_pred)
    Traceback (most recent call last):
    
      File "<ipython-input-38-19819be87ab5>", line 1, in <module>
        np.mean(y_pred)
    
      File "/home/kpmurphy/anaconda3/lib/python3.7/site-packages/numpy/core/fromnumeric.py", line 2920, in mean
        out=out, **kwargs)
    
      File "/home/kpmurphy/anaconda3/lib/python3.7/site-packages/numpy/core/_methods.py", line 75, in _mean
        ret = umr_sum(arr, axis, dtype, out, keepdims)
    
    TypeError: unsupported operand type(s) for +: 'Distribution' and 'Distribution'
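The failure in the traceback can be reproduced with a toy class, which may help when deciding how to fix the documentation. `ToyDistribution` below is a hypothetical stand-in, not skpro's Distribution class; the sketch assumes only that `point()` returns a point prediction, as the report suggests. Taking the mean of the point predictions, rather than of the objects themselves, avoids the error.

```python
import numpy as np


class ToyDistribution:
    """Hypothetical stand-in for a predicted distribution (not skpro's class)."""

    def __init__(self, loc, scale):
        self.loc = loc
        self.scale = scale

    def point(self):
        # Point prediction; in this toy class simply the location parameter.
        return self.loc


y_pred = [ToyDistribution(1.0, 0.5), ToyDistribution(3.0, 1.0)]

# np.mean builds an object array and tries to add the elements,
# which fails because the class defines no __add__.
try:
    np.mean(y_pred)
    failed = False
except TypeError:
    failed = True

# Workaround: average the point predictions instead of the objects.
point_mean = np.mean([d.point() for d in y_pred])
```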
    
    opened by murphyk 3
  • First example: 'utils' not found

    The first example in your documentation (DensityBaseline) does not run right on my machine: it throws a 'module not found' exception at the call to 'utils'.

    This might be a Python version problem (I am using 3.6), so perhaps it's not an error in the normal sense, though I don't see any specification that the package requires a particular Python version. Apologies if I missed it. In any case, I fixed it by importing matplotlib instead, i.e.

    import matplotlib.pyplot as plt
    plt.scatter(y_test, y_pred)

    instead of:

    import utils
    utils.plot_performance(y_test, y_pred)

    opened by Thomas-M-H-Hope 2
  • problem in loading the skpro

    I have been trying to import skpro for 2 days, but I cannot. I keep getting this error:

    cannot import name 'six' from 'sklearn.externals' (C:\Users\My Book\anaconda3\lib\site-packages\sklearn\externals\__init__.py)
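One possible cause, stated here as an assumption rather than a diagnosis: newer scikit-learn releases no longer ship the vendored `sklearn.externals.six` module, so code that still imports it fails in this way. A fallback import along the following lines is a common pattern for that situation. The helper name `import_first_available` is hypothetical, and the change would have to be made inside the package that performs the import, not in user code.

```python
import importlib


def import_first_available(candidates):
    """Return the first module in `candidates` that can be imported."""
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError:
            # Covers ModuleNotFoundError as well; try the next candidate.
            continue
    raise ImportError(f"none of {candidates!r} could be imported")


# Intended use (requires the standalone `six` package to be installed):
# six = import_first_available(["sklearn.externals.six", "six"])
```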

    opened by honestee 1
  • (wish)list of probabilistic regressors to implement or to interface

    A wishlist for probabilistic regression methods to implement or interface. This is partly copied from the R counterpart https://github.com/mlr-org/mlr3proba/issues/32 . Number of stars at the end is estimated difficulty or time investment.

    GLM

    • [ ] generalized linear model(s) with regression link, e.g., Gaussian *
    • [ ] generalized linear model(s) with count link, e.g., Poisson *
    • [ ] heteroscedastic linear regression ***
    • [ ] Bayesian GLM where conjugate priors are available, e.g., GLM with Gaussian link ***

    KRR aka Gaussian process regression

    • [ ] vanilla kernel ridge regression with fixed kernel parameters and variance *
    • [ ] kernel ridge regression with MLE for kernel parameters and regularization parameter **
    • [ ] heteroscedastic KRR or Gaussian processes ***

    CDE

    • [ ] variants of conditional density estimation (Nadaraya-Watson type) **
    • [ ] reduction to density estimation by binning of input variables, then apply unconditional density estimation **

    Tree-based

    • [ ] probabilistic regression trees **

    Neural networks

    • [ ] interface tensorflow probability - some hard-coded NN architectures **
    • [ ] generic tensorflow probability interface - some hard-coded NN architectures ***

    Bayesian toolboxes

    • [ ] generic pymc3 interface ***
    • [ ] generic pyro interface ****
    • [ ] generic Stan interface ****
    • [ ] generic JAGS interface ****
    • [ ] generic BUGS interface ****
    • [ ] generic Bayesian interface - prior-valued hyperparameters *****

    Pipeline elements for target transformation

    • [ ] distr fixed target transformation **
    • [ ] distr predictive target calibration **

    Composite techniques, reduction to deterministic regression

    • [ ] stick mean, sd, from a deterministic regressor which already has these as return types into some location/scale distr family (Gaussian, Laplace) *
    • [ ] use model 1 for the mean, model 2 fit to residuals (squared, absolute, or log), put this in some location/scale distr family (Gaussian, Laplace) **
    • [ ] upper/lower thresholder for a regression prediction, to use as a pipeline element for a forced lower variance bound **
    • [ ] generic parameter prediction by elicitation, output being plugged into parameters of a distr object not necessarily scale/location ****
    • [ ] reduction via bootstrapped sampling of a deterministic regressor **
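As an illustration of the "model 1 for the mean, model 2 fit to residuals" item, here is a minimal sketch using only the standard library. Both models are one-feature least-squares lines, the residuals are squared, and the result is plugged into a Gaussian. The function names and the variance floor `min_var` are assumptions made for this example, not part of skpro.

```python
from statistics import NormalDist, fmean


def fit_line(x, y):
    """Ordinary least squares for y ~ a + b * x with a single feature."""
    x_bar, y_bar = fmean(x), fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


def fit_two_stage(x, y, min_var=1e-6):
    """Model 1 predicts the mean, model 2 predicts the squared residual."""
    a, b = fit_line(x, y)
    sq_res = [(yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y)]
    c, d = fit_line(x, sq_res)

    def predict(x_new):
        mean = a + b * x_new
        # The floor keeps the predicted variance strictly positive.
        var = max(c + d * x_new, min_var)
        return NormalDist(mu=mean, sigma=var ** 0.5)

    return predict
```

The floor on the variance plays the role of the lower thresholder mentioned in the list: without it, the second model can predict a zero or negative variance.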

    Ensembling type pipeline elements and compositors

    • [ ] simple bagging, averaging of pdf/cdf **
    • [ ] probabilistic boosting ***
    • [ ] probabilistic stacking ***
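The "simple bagging, averaging of pdf/cdf" item could be sketched as an equal-weight mixture of the ensemble members' predictions. `AveragedPrediction` is a hypothetical name, and `statistics.NormalDist` stands in for whatever distribution objects the members return; any object with `pdf` and `cdf` methods would work here.

```python
from statistics import NormalDist


class AveragedPrediction:
    """Equal-weight average of the members' predictive pdfs and cdfs."""

    def __init__(self, members):
        self.members = list(members)

    def pdf(self, x):
        return sum(m.pdf(x) for m in self.members) / len(self.members)

    def cdf(self, x):
        return sum(m.cdf(x) for m in self.members) / len(self.members)


# Two ensemble members predicting for the same data point.
members = [NormalDist(0.0, 1.0), NormalDist(2.0, 1.0)]
bagged = AveragedPrediction(members)
```

Note that the averaged prediction is a mixture, so it is generally not a member of the same parametric family as its components.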

    Baselines

    • [ ] always predict a Gaussian with mean = training mean, var = training var *
    • [ ] IMPORTANT as featureless baseline: reduction to distr/density estimation to produce an unconditional probabilistic regressor **
    • [ ] IMPORTANT as deterministic style baseline: reduction to deterministic regression, mean = prediction by det.regressor, var = training sample var, distr type = Gaussian (or Laplace) **
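The first baseline can be sketched in a few lines. `GaussianBaseline` is a hypothetical class with scikit-learn style `fit`/`predict` methods, not an existing skpro estimator, and the choice of the sample variance (rather than the population variance) for "training var" is an assumption.

```python
from statistics import NormalDist, fmean, variance


class GaussianBaseline:
    """Featureless baseline: ignores X and predicts N(training mean, training var)."""

    def fit(self, X, y):
        self.mean_ = fmean(y)
        # Sample variance of the training targets (assumption, see lead-in).
        self.std_ = variance(y) ** 0.5
        return self

    def predict(self, X):
        # The same Gaussian is returned for every row of X.
        return [NormalDist(self.mean_, self.std_) for _ in X]


X_train = [[0.0], [1.0], [2.0], [3.0], [4.0]]
y_train = [1.0, 2.0, 3.0, 4.0, 5.0]
preds = GaussianBaseline().fit(X_train, y_train).predict([[10.0], [20.0]])
```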

    Other reduction from/to probabilistic regression

    • [ ] reducing deterministic regression to probabilistic regression - take mean, median or mode **
    • [ ] reduction(s) to quantile regression, use predictive quantiles to make a distr ***
    • [ ] reducing deterministic (quantile) regression to probabilistic regression - take quantile(s) **
    • [ ] reducing interval regression to probabilistic regression - take mean/sd, or take quantile(s) **
    • [ ] reduction to survival, as the sub-case of no censoring **
    • [ ] reduction to classification, by binning ***
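For the reduction to quantile regression, one simple way to turn predictive quantiles into a distribution is to interpolate the cdf linearly between them. This is a sketch under stated assumptions: `QuantileDistribution` is a hypothetical name, the quantiles must be strictly increasing, and the probability levels must start at 0 and end at 1 so that the support is bounded by the lowest and highest quantile.

```python
from bisect import bisect_right


class QuantileDistribution:
    """Distribution whose cdf is piecewise linear between given quantiles."""

    def __init__(self, probs, quantiles):
        # Assumes probs[0] == 0, probs[-1] == 1 and strictly increasing quantiles.
        self.probs = list(probs)
        self.quantiles = list(quantiles)

    def cdf(self, x):
        q, p = self.quantiles, self.probs
        if x <= q[0]:
            return 0.0
        if x >= q[-1]:
            return 1.0
        i = bisect_right(q, x)  # index with q[i - 1] <= x < q[i]
        w = (x - q[i - 1]) / (q[i] - q[i - 1])
        return p[i - 1] + w * (p[i] - p[i - 1])


# Quantiles as they might come from a quantile regressor for one data point.
dist = QuantileDistribution([0.0, 0.25, 0.5, 0.75, 1.0], [0.0, 1.0, 2.0, 4.0, 8.0])
```

Linear interpolation of the cdf corresponds to a piecewise-constant density; other interpolation schemes or parametric tails are equally possible.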
    good first issue 
    opened by fkiraly 0
  • skpro-refactoring (version-2)

    The coming refactoring covers the following:

    • Distribution classes, refactored in a more OOD way (see skpro->distribution)
    • Loss functions (see metrics->distribution)
    • Estimators (see metrics->distribution)

    Some descriptive notebooks (in docs->notebooks) and a full set of unit tests (in tests) are also available.

    opened by jesellier 24
Releases (v1.0.1-beta)
Owner
The Alan Turing Institute
The UK's national institute for data science and artificial intelligence.