A Python 3 package for state-of-the-art statistical dimension reduction methods

Related tags

Deep Learningdirepack
Overview

direpack: a Python 3 library for state-of-the-art statistical dimension reduction techniques

This package delivers a scikit-learn compatible Python 3 package for sundry state-of-the art multivariate statistical methods, with a focus on dimension reduction.

The categories of methods delivered in this package, are:

  • Projection pursuit dimension reduction (ppdire)
  • Sufficient dimension reduction (sudire)
  • Robust M-estimators for dimension reduction (sprm) each of which are presented as scikit-learn compatible objects in the corresponding folders.

We hope that this package leads to scientific success. If it does so, we kindly ask to cite the direpack vignette [0], as well as the original publication of the corresponding method.

The package also contains a set of tools for pre- and postprocessing:

  • The preprocessing folder provides classical and robust centring and scaling, as well as spatial sign transforms [4]
  • The dicomo folder contains a versatile class to access a wide variety of moment and co-moment statistics, and statistics derived from those. Check out the dicomo Documentation file and the dicomo Examples Notebook.
  • Plotting utilities in the plot folder
  • Cross-validation utilities in the cross-validation folder

AIG sprm score space

Methods in the sprm folder

  • The estimator (sprm.py) [1]
  • The Sparse NIPALS (SNIPLS) estimator [3](snipls.py)
  • Robust M regression estimator (rm.py)
  • Ancillary functions for M-estimation (_m_support_functions.py)

Methods in the ppdire folder

The ppdire class will give access to a wide range of projection pursuit dimension reduction techniques. These include slower approximate estimates for well-established methods such as PCA, PLS and continuum regression. However, the class provides unique access to a set of robust options, such as robust continuum regression (RCR) [5], through its native grid optimization algorithm, first published for RCR as well [6]. Moreover, ppdire is also a great gateway to calculate generalized betas, using the CAPI projection index [7].

The code is orghanized in

  • ppdire.py - the main PP dimension reduction class
  • capi.py - the co-moment analysis projection index.

Methods in the sudire folder

The sudire folder gives access to an extensive set of methods that resort under the umbrella of sufficient dimension reduction. These range from meanwhile long-standing, well-accepted approaches, such as sliced inverse regression (SIR) and the closely related SAVE [8,9], through methods such as directional regression [10] and principal Hessian directions [11], and more. However, the package also contains some of the most recently developed, state-of-the-art sufficient dimension reduction techniques, that require no distributional assumptions. The options provided in this category are based on energy statistics (distance covariance [12] or martingale difference divergence [13]) and ball statistics (ball covariance) [14]. All of these options can be called by setting the corresponding parameters in the sudire class, cf. the docs. Note: the ball covariance option will require some lines to be uncommented as indicated. We decided not to make that option generally available, since it depends on the Ball package that seems to be difficult to install on certain architectures.

How to install

The package is distributed through PyPI, so install through:

    pip install direpack

Note that some of the key methods in the sudire subpackage rely on the IPOPT optimization package, which according to their recommendation, can best be installed directly as:

    conda install -c conda-forge cyipopt

Documentation

  • Detailed documentation can be found in the ReadTheDocs page.
  • A more extensive description on the background is presented in the direpack vignette.
  • Examples on how to use each of the dicomo, ppdire, sprm and sudire classes are presented as Jupyter notebooks in the examples folder
  • Furthemore, the docs folder contains a few markdown files on usage of the classes.

References

  1. direpack: A Python 3 package for state-of-the-art statistical dimension reduction methods
  2. Sparse partial robust M regression, Irene Hoffmann, Sven Serneels, Peter Filzmoser, Christophe Croux, Chemometrics and Intelligent Laboratory Systems, 149 (2015), 50-59.
  3. Partial robust M regression, Sven Serneels, Christophe Croux, Peter Filzmoser, Pierre J. Van Espen, Chemometrics and Intelligent Laboratory Systems, 79 (2005), 55-64.
  4. Sparse and robust PLS for binary classification, I. Hoffmann, P. Filzmoser, S. Serneels, K. Varmuza, Journal of Chemometrics, 30 (2016), 153-162.
  5. Spatial Sign Preprocessing:  A Simple Way To Impart Moderate Robustness to Multivariate Estimators, Sven Serneels, Evert De Nolf, Pierre J. Van Espen, Journal of Chemical Information and Modeling, 46 (2006), 1402-1409.
  6. Robust Continuum Regression, Sven Serneels, Peter Filzmoser, Christophe Croux, Pierre J. Van Espen, Chemometrics and Intelligent Laboratory Systems, 76 (2005), 197-204.
  7. Robust Multivariate Methods: The Projection Pursuit Approach, Peter Filzmoser, Sven Serneels, Christophe Croux and Pierre J. Van Espen, in: From Data and Information Analysis to Knowledge Engineering, Spiliopoulou, M., Kruse, R., Borgelt, C., Nuernberger, A. and Gaul, W., eds., Springer Verlag, Berlin, Germany, 2006, pages 270--277.
  8. Projection pursuit based generalized betas accounting for higher order co-moment effects in financial market analysis, Sven Serneels, in: JSM Proceedings, Business and Economic Statistics Section. Alexandria, VA: American Statistical Association, 2019, 3009-3035.
  9. Sliced Inverse Regression for Dimension Reduction Li K-C, Journal of the American Statistical Association (1991), 86, 316-327.
  10. Sliced Inverse Regression for Dimension Reduction: Comment, R.D. Cook, and Sanford Weisberg, Journal of the American Statistical Association (1991), 86, 328-332.
  11. On directional regression for dimension reduction , B. Li and S.Wang, Journal of the American Statistical Association (2007), 102:997–1008.
  12. On principal hessian directions for data visualization and dimension reduction:Another application of stein’s lemma, K.-C. Li. , Journal of the American Statistical Association(1992)., 87,1025–1039.
  13. Sufficient Dimension Reduction via Distance Covariance, Wenhui Sheng and Xiangrong Yin in: Journal of Computational and Graphical Statistics (2016), 25, issue 1, pages 91-104.
  14. A martingale-difference-divergence-based estimation of central mean subspace, Yu Zhang, Jicai Liu, Yuesong Wu and Xiangzhong Fang, in: Statistics and Its Interface (2019), 12, number 3, pages 489-501.
  15. Robust Sufficient Dimension Reduction Via Ball Covariance Jia Zhang and Xin Chen, Computational Statistics and Data Analysis 140 (2019) 144–154

Release Notes can be checked out in the repository.

A list of possible topics for further development is provided as well. Additions and comments are welcome!

You might also like...
Code for paper "A Critical Assessment of State-of-the-Art in Entity Alignment" (https://arxiv.org/abs/2010.16314)

A Critical Assessment of State-of-the-Art in Entity Alignment This repository contains the source code for the paper A Critical Assessment of State-of

Quickly comparing your image classification models with the state-of-the-art models (such as DenseNet, ResNet, ...)
Quickly comparing your image classification models with the state-of-the-art models (such as DenseNet, ResNet, ...)

Image Classification Project Killer in PyTorch This repo is designed for those who want to start their experiments two days before the deadline and ki

State of the art Semantic Sentence Embeddings

Contrastive Tension State of the art Semantic Sentence Embeddings Published Paper · Huggingface Models · Report Bug Overview This is the official code

LaneDet is an open source lane detection toolbox based on PyTorch that aims to pull together a wide variety of state-of-the-art lane detection models
LaneDet is an open source lane detection toolbox based on PyTorch that aims to pull together a wide variety of state-of-the-art lane detection models

LaneDet is an open source lane detection toolbox based on PyTorch that aims to pull together a wide variety of state-of-the-art lane detection models. Developers can reproduce these SOTA methods and build their own methods.

Deep Text Search is an AI-powered multilingual text search and recommendation engine with state-of-the-art transformer-based multilingual text embedding (50+ languages).
Deep Text Search is an AI-powered multilingual text search and recommendation engine with state-of-the-art transformer-based multilingual text embedding (50+ languages).

Deep Text Search - AI Based Text Search & Recommendation System Deep Text Search is an AI-powered multilingual text search and recommendation engine w

State-of-the-art data augmentation search algorithms in PyTorch
State-of-the-art data augmentation search algorithms in PyTorch

MuarAugment Description MuarAugment is a package providing the easiest way to a state-of-the-art data augmentation pipeline. How to use You can instal

This is the unofficial code of  Deep Dual-resolution Networks for Real-time and Accurate Semantic Segmentation of Road Scenes. which achieve state-of-the-art trade-off between accuracy and speed on cityscapes and camvid, without using inference acceleration and extra data
A selection of State Of The Art research papers (and code) on human locomotion (pose + trajectory) prediction (forecasting)

A selection of State Of The Art research papers (and code) on human trajectory prediction (forecasting). Papers marked with [W] are workshop papers.

A state of the art of new lightweight YOLO model implemented by TensorFlow 2.
A state of the art of new lightweight YOLO model implemented by TensorFlow 2.

CSL-YOLO: A New Lightweight Object Detection System for Edge Computing This project provides a SOTA level lightweight YOLO called "Cross-Stage Lightwe

Comments
  • `p` should never be smaller than `n_components` in `sprm.fit`

    `p` should never be smaller than `n_components` in `sprm.fit`

    The variable p should never be smaller than n_components in sprm.fit otherwise an error occurs. This is checked for at the top of fit but p can be redefined at line 185.

    Inserting as line 186:

                self.n_components = min(p, self.n_components)
    

    ...appears to fix the issue, but I have not done extensive testing. It may also be advisable to raise a warning if n_components is reduced in this way.

    opened by MattWenham 5
  • gsspp.GenSpatialSignPrePprocessor().transform() is not working

    gsspp.GenSpatialSignPrePprocessor().transform() is not working

    Dear sirs,

    I like to make spatial sign transform for my data when I come across your module and found it won't work. My codes is as the following:

    scaler = gsspp.GenSpatialSignPrePprocessor(center = 'kstepLTS', fun = 'ball').fit(X_train) X_scaled = scaler.transform(X_train)

    It won't work for scaler don't have the transform method due to no object type is defined which makes it no attribute or method bestowed upon. The error message is as the following:

    AttributeError: 'NoneType' object has no attribute 'transform'

    maurice

    opened by shinhongwu 2
  • coef_ attribute expected but missing when using ppdire

    coef_ attribute expected but missing when using ppdire

    Below is a reproducible code for the error. The cells with # NB code are code blocks while the other are jupyter outputs.

    # NB code
    import numpy as np
    from direpack import dicomo, ppdire
    
    X = np.random.rand(5,5)
    
    reducer = ppdire(
        projection_index = dicomo,
        # mode of projection_index class defines dim reduction 'method'
        pi_arguments = {'mode' : 'var'},
        n_components=4,
        optimizer='SLSQP'
    )
    reducer.fit(X)
    reducer.x_loadings_
    
    array([[-0.36157257,  0.59084429,  0.31816485, -0.13799567],
           [-0.59046145, -0.14633256,  0.28087908, -0.57627361],
           [ 0.52330409,  0.27622013, -0.27929959, -0.75601132],
           [ 0.09839508,  0.72132604,  0.11781207,  0.27450752],
           [-0.48692072,  0.18133122, -0.85322337,  0.04425411]])
    
    # NB code
    reducer.transform(X)
    
    
    ---------------------------------------------------------------------------
    AttributeError                            Traceback (most recent call last)
    /tmp/ipykernel_63144/911793123.py in <module>
    ----> 1 reducer.transform(X)
    
    ~/.conda/envs/prod3/lib/python3.9/site-packages/direpack/ppdire/ppdire.py in transform(self, Xn)
        759         Xn = convert_X_input(Xn)
        760         (n,p) = Xn.shape
    --> 761         if p!= self.coef_.shape[0]:
        762             raise(ValueError('New data must have seame number of columns as the ones the model has been trained with'))
        763         Xnc = scale_data(Xn,self.x_loc_,self.x_sca_)
    
    AttributeError: 'ppdire' object has no attribute 'coef_'
    

    I looked into the code and the issue seems to come from this attribute only being created in there is no flag one-block.

    but a data check on the transform and predict functions uses that attribute.

    opened by nikml 1
  • A possible mistake in the estimation basis of SDR

    A possible mistake in the estimation basis of SDR

    Thanks for the package you provide, and I found a confusing problem. in src/direpack/sudire/sudire.py Line 489. When using scale, x_loadings should be set to N2 multiply P, not P, because we do scale. I notice you intended to do so in Line225 in src/direpack/sudire/_sudire_utils.py (take SIR for example), but x passed to this function has already been scaled, so variable "signsqrt" in this function is always identity matrix, which can not function as we want.

    opened by I-zhouqh 1
Releases(1.0.25)
Owner
Sven Serneels
I Presently manage a team on stats, machine learning and AI. On the side, avid method developer for high dimensional stats and machine learning.
Sven Serneels
Serving PyTorch 1.0 Models as a Web Server in C++

Serving PyTorch Models in C++ This repository contains various examples to perform inference using PyTorch C++ API. Run git clone https://github.com/W

Onur Kaplan 223 Jan 04, 2023
Teaching end to end workflow of deep learning

Deep-Education This repository is now available for public use for teaching end to end workflow of deep learning. This implies that learners/researche

Data Lab at College of William and Mary 2 Sep 26, 2022
A benchmark dataset for mesh multi-label-classification based on cube engravings introduced in MeshCNN

Double Cube Engravings This script creates a dataset for multi-label mesh clasification, with an intentionally difficult setup for point cloud classif

Yotam Erel 1 Nov 30, 2021
GLIP: Grounded Language-Image Pre-training

GLIP: Grounded Language-Image Pre-training Updates 12/06/2021: GLIP paper on arxiv https://arxiv.org/abs/2112.03857. Code and Model are under internal

Microsoft 862 Jan 01, 2023
A Free and Open Source Python Library for Multiobjective Optimization

Platypus What is Platypus? Platypus is a framework for evolutionary computing in Python with a focus on multiobjective evolutionary algorithms (MOEAs)

Project Platypus 424 Dec 18, 2022
QMagFace: Simple and Accurate Quality-Aware Face Recognition

Quality-Aware Face Recognition 26.11.2021 start readme QMagFace: Simple and Accurate Quality-Aware Face Recognition Research Paper Implementation - To

Philipp Terhörst 59 Jan 04, 2023
Neural Point-Based Graphics

Neural Point-Based Graphics Project   Video   Paper Neural Point-Based Graphics Kara-Ali Aliev1 Artem Sevastopolsky1,2 Maria Kolos1,2 Dmitry Ulyanov3

Ali Aliev 252 Dec 13, 2022
Towards Flexible Blind JPEG Artifacts Removal (FBCNN, ICCV 2021)

Towards Flexible Blind JPEG Artifacts Removal (FBCNN, ICCV 2021)

Jiaxi Jiang 282 Jan 02, 2023
Open source annotation tool for machine learning practitioners.

doccano doccano is an open source text annotation tool for humans. It provides annotation features for text classification, sequence labeling and sequ

7.1k Jan 01, 2023
DrWhy is the collection of tools for eXplainable AI (XAI). It's based on shared principles and simple grammar for exploration, explanation and visualisation of predictive models.

Responsible Machine Learning With Great Power Comes Great Responsibility. Voltaire (well, maybe) How to develop machine learning models in a responsib

Model Oriented 590 Dec 26, 2022
3DIAS: 3D Shape Reconstruction with Implicit Algebraic Surfaces (ICCV 2021)

3DIAS_Pytorch This repository contains the official code to reproduce the results from the paper: 3DIAS: 3D Shape Reconstruction with Implicit Algebra

Mohsen Yavartanoo 21 Dec 12, 2022
[ACM MM 2021] Diverse Image Inpainting with Bidirectional and Autoregressive Transformers

Diverse Image Inpainting with Bidirectional and Autoregressive Transformers Installation pip install -r requirements.txt Dataset Preparation Given the

Yingchen Yu 25 Nov 09, 2022
Network Enhancement implementation in pytorch

network_enahncement_pytorch Network Enhancement implementation in pytorch Research paper Network Enhancement: a general method to denoise weighted bio

Yen 1 Nov 12, 2021
Code for "Steerable Pyramid Transform Enables Robust Left Ventricle Quantification"

Code for "Steerable Pyramid Transform Enables Robust Left Ventricle Quantification" This is an end-to-end framework for accurate and robust left ventr

2 Jul 09, 2022
High performance Cross-platform Inference-engine, you could run Anakin on x86-cpu,arm, nv-gpu, amd-gpu,bitmain and cambricon devices.

Anakin2.0 Welcome to the Anakin GitHub. Anakin is a cross-platform, high-performance inference engine, which is originally developed by Baidu engineer

514 Dec 28, 2022
LineBoard - Python+React+MySQL-白板即時系統改善人群行為

LineBoard-白板即時系統改善人群行為 即時顯示實驗室的使用狀況,並遠端預約排隊,以此來改善人們的工作效率 程式架構 運作流程 使用者先至該實驗室網站預約

Bo-Jyun Huang 1 Feb 22, 2022
Semi-Supervised Semantic Segmentation with Pixel-Level Contrastive Learning from a Class-wise Memory Bank

This repository provides the official code for replicating experiments from the paper: Semi-Supervised Semantic Segmentation with Pixel-Level Contrast

Iñigo Alonso Ruiz 58 Dec 15, 2022
Code for ICLR2018 paper: Improving GAN Training via Binarized Representation Entropy (BRE) Regularization - Y. Cao · W Ding · Y.C. Lui · R. Huang

code for "Improving GAN Training via Binarized Representation Entropy (BRE) Regularization" (ICLR2018 paper) paper: https://arxiv.org/abs/1805.03644 G

21 Oct 12, 2020
Adversarial vulnerability of powerful near out-of-distribution detection

Adversarial vulnerability of powerful near out-of-distribution detection by Stanislav Fort In this repository we're collecting replications for the ke

Stanislav Fort 9 Aug 30, 2022
使用深度学习框架提取视频硬字幕;docker容器免安装深度学习库,使用本地api接口使得界面和后端识别分离;

extract-video-subtittle 使用深度学习框架提取视频硬字幕; 本地识别无需联网; CPU识别速度可观; 容器提供API接口; 运行环境 本项目运行环境非常好搭建,我做好了docker容器免安装各种深度学习包; 提供windows界面操作; 容器为CPU版本; 视频演示 https

歌者 16 Aug 06, 2022