Improving Generalization Bounds for VC Classes Using the Hypergeometric Tail Inversion

Overview

Improving Generalization Bounds for VC Classes Using the Hypergeometric Tail Inversion

Preface

This directory provides an implementation of the algorithms used to compute the hypergeometric tail pseudo-inverse, as well as the code used to produce all figures of the paper "Improving Generalization Bounds for VC Classes Using the Hypergeometric Tail Inversion" by Leboeuf, LeBlanc and Marchand.

Installation

To run the scripts, one must first install the package and its requirements. To do so, run the following command from the root directory:

pip install .

Doing so will also provide you with the package hypergeo, which implements an algorithm to compute the hypergeometric tail pseudo-inverses.

Requirements

The code was written to run on Python 3.8 or more recent version. The requirements are shown in the file requirements.txt and can be installed using the command:

pip install -r requirements.txt

The code

The code is split into 2 parts: the 'hypergeo' package and the 'scripts' directory.

The hypergeo package implements the utilities regarding the hypergeometric distribution (to compute the tail and its inverse), the binomial distribution (reimplementing the inverse as the scipy version suffered from numerical unstabilities) and some generalization bounds.

The scripts files produce the figures found in the paper using the hypergeo package. All figures are generated directly in LaTeX using the package python2latex. To run a script, navigate from the command line to the directory root directory of the project and run the command

/ .py" ">
python "./scripts/
     
      /
      
       .py"

      
     

The code does not provide command line control on the parameters of each script. However, each script is fairly simple, and parameters can be directly changed in the __main__ part of the script.

Scripts used in the body of the paper

  • Section 3.3: The ghost sample trade-off. In this section, we claim that optimizing m' gives relative gain between 8% and 10%. To obtain these number, you need to run the file mprime_tradeoff/generate_mprime_data.py to first generate the data, and then run mprime_tradeoff/stats.py.

  • Section 5: Numerical comparison. Figure 1a and 1b are obtain by executing the scripts bounds_comparison/bounds_comparison_risk.py and bounds_comparison/bounds_comparison_d.py respectively. Figure 2a and 2b are obtain by executing the scripts bounds_comparison/bounds_comparison_m.py, the first setting the variable risk to 0, the second by setting it equal to 0.1.

Scripts used in the appendices of the paper

  • Appendix B: Overview of the hypergeometric distribution. Figure 3 is generated from hypergeometric_tail/hyp_tail_plot.py. Figure 4 is generated from hypergeometric_tail/hyp_tail_inv_plot.py. Algorithm 1 is implemented in the hypergeo file hypergeo/hypergeometric_distribution.py as the function hypergeometric_tail_inverse. Algorithm 2 is implemented in the hypergeo file hypergeo/hypergeometric_distribution.py as the function berkopec_hypergeometric_tail_inverse.

  • Appendix D: In-depth analysis of the ghost sample trade-off. Figure 5 is generated from mprime_tradeoff/plot_epsilon_comp.py. Figure 6 is generated from mprime_tradeoff/plot_mprime_best.py.

  • Appendix E: The hypergeometric tail inversion relative deviation bound. To generate Figure 7 and 8, you must first run the file relative_deviation_mprime_tradeoff/mprime_tradeoff_relative_deviation.py to generate the data, then run the script relative_deviation_mprime_tradeoff/plot_epsilon_comp.py to produce Figure 7 and relative_deviation_comparison/plot_mprime_best.py to produce Figure 8.

  • Appendix G: The hypergeometric tail lower bound . Figure 9 is generated from lower_bound/lower_bound_comparison_risk.py.

  • Appendix F: Further numerical comparisons. Figure 10 and 12a are generated from bounds_comparison/bounds_comparison_risk.py by changing the parameters of the scripts. Figure 11 and 12b is generated from bounds_comparison/bounds_comparison_m.py by changing the parameters of the scripts. Figure 13a and 13b are generated from bounds_comparison/sample_compression_comparison_risk.py and bounds_comparison/sample_compression_comparison_m.py respectively.

Other

The script pseudo-inverse_benchmarking/pseudo-inverse_benchmarking.py benchmarks the various algorithms used to invert the hypergeometric tail. The 'tests' directory contains unit tests using the package pytest.

Owner
Jean-Samuel Leboeuf
PhD candidate in Computer Sciences (Machine Learning). MSc in Theoretical Physics.
Jean-Samuel Leboeuf
Use CLIP to represent video for Retrieval Task

A Straightforward Framework For Video Retrieval Using CLIP This repository contains the basic code for feature extraction and replication of results.

Jesus Andres Portillo Quintero 54 Dec 22, 2022
code from "Tensor decomposition of higher-order correlations by nonlinear Hebbian plasticity"

Code associated with the paper "Tensor decomposition of higher-order correlations by nonlinear Hebbian learning," Ocker & Buice, Neurips 2021. "plot_f

Gabriel Koch Ocker 4 Oct 16, 2022
Model-free Vehicle Tracking and State Estimation in Point Cloud Sequences

Model-free Vehicle Tracking and State Estimation in Point Cloud Sequences 1. Introduction This project is for paper Model-free Vehicle Tracking and St

TuSimple 92 Jan 03, 2023
Using python and scikit-learn to make stock predictions

MachineLearningStocks in python: a starter project and guide EDIT as of Feb 2021: MachineLearningStocks is no longer actively maintained MachineLearni

Robert Martin 1.3k Dec 29, 2022
xitorch: differentiable scientific computing library

xitorch is a PyTorch-based library of differentiable functions and functionals that can be widely used in scientific computing applications as well as deep learning.

24 Apr 15, 2021
PClean: A Domain-Specific Probabilistic Programming Language for Bayesian Data Cleaning

PClean: A Domain-Specific Probabilistic Programming Language for Bayesian Data Cleaning Warning: This is a rapidly evolving research prototype.

MIT Probabilistic Computing Project 190 Dec 27, 2022
Seeing Dynamic Scene in the Dark: High-Quality Video Dataset with Mechatronic Alignment (ICCV2021)

Seeing Dynamic Scene in the Dark: High-Quality Video Dataset with Mechatronic Alignment This is a pytorch project for the paper Seeing Dynamic Scene i

DV Lab 21 Nov 28, 2022
Pytorch code for our paper "Feedback Network for Image Super-Resolution" (CVPR2019)

Feedback Network for Image Super-Resolution [arXiv] [CVF] [Poster] Update: Our proposed Gated Multiple Feedback Network (GMFN) will appear in BMVC2019

Zhen Li 539 Jan 06, 2023
Implementation of "Learning Multi-Granular Hypergraphs for Video-Based Person Re-Identification"

hypergraph_reid Implementation of "Learning Multi-Granular Hypergraphs for Video-Based Person Re-Identification" If you find this help your research,

62 Dec 21, 2022
Cave Generation using metaballs in Blender. Originally created by sdfgeoff, Edited by Myself (Archie Jaskowicz).

Blender-Cave-Generation Cave Generation using metaballs in Blender. Originally created by sdfgeoff, Edited by Myself (Archie Jaskowicz). Installation

2 Dec 28, 2022
This is the official implementation of the paper "Object Propagation via Inter-Frame Attentions for Temporally Stable Video Instance Segmentation".

ObjProp Introduction This is the official implementation of the paper "Object Propagation via Inter-Frame Attentions for Temporally Stable Video Insta

Anirudh S Chakravarthy 6 May 03, 2022
Performance Analysis of Multi-user NOMA Wireless-Powered mMTC Networks: A Stochastic Geometry Approach

Performance Analysis of Multi-user NOMA Wireless-Powered mMTC Networks: A Stochastic Geometry Approach Thanh Luan Nguyen, Tri Nhu Do, Georges Kaddoum

Thanh Luan Nguyen 2 Oct 10, 2022
Real-time Joint Semantic Reasoning for Autonomous Driving

MultiNet MultiNet is able to jointly perform road segmentation, car detection and street classification. The model achieves real-time speed and state-

Marvin Teichmann 518 Dec 12, 2022
An open source app to help calm you down when needed.

By: Seanpm2001, Et; Al. Top README.md Read this article in a different language Sorted by: A-Z Sorting options unavailable ( af Afrikaans Afrikaans |

Sean P. Myrick V19.1.7.2 2 Oct 24, 2022
This is the repo for the paper `SumGNN: Multi-typed Drug Interaction Prediction via Efficient Knowledge Graph Summarization'. (published in Bioinformatics'21)

SumGNN: Multi-typed Drug Interaction Prediction via Efficient Knowledge Graph Summarization This is the code for our paper ``SumGNN: Multi-typed Drug

Yue Yu 58 Dec 21, 2022
Training BERT with Compute/Time (Academic) Budget

Training BERT with Compute/Time (Academic) Budget This repository contains scripts for pre-training and finetuning BERT-like models with limited time

Intel Labs 263 Jan 07, 2023
Aquarius - Enabling Fast, Scalable, Data-Driven Virtual Network Functions

Aquarius Aquarius - Enabling Fast, Scalable, Data-Driven Virtual Network Functions NOTE: We are currently going through the open-source process requir

Zhiyuan YAO 0 Jun 02, 2022
The repository offers the official implementation of our paper in PyTorch.

Cloth Interactive Transformer (CIT) Cloth Interactive Transformer for Virtual Try-On Bin Ren1, Hao Tang1, Fanyang Meng2, Runwei Ding3, Ling Shao4, Phi

Bingoren 49 Dec 01, 2022
LSTM Neural Networks for Spectroscopic Studies of Type Ia Supernovae

Package Description The difficulties in acquiring spectroscopic data have been a major challenge for supernova surveys. snlstm is developed to provide

7 Oct 11, 2022
Not All Points Are Equal: Learning Highly Efficient Point-based Detectors for 3D LiDAR Point Clouds (CVPR 2022, Oral)

Not All Points Are Equal: Learning Highly Efficient Point-based Detectors for 3D LiDAR Point Clouds (CVPR 2022, Oral) This is the official implementat

Yifan Zhang 259 Dec 25, 2022