Learned model to estimate number of distinct values (NDV) of a population using a small sample.

Last update: Nov 21, 2022

Overview

Learned NDV estimator

A learned model that estimates the number of distinct values (NDV) of a population from a small sample. The model approximates the maximum likelihood estimate of the NDV, which is difficult to obtain analytically. See our VLDB 2022 paper, "Learning to be a Statistician: Learned Estimator for Number of Distinct Values", for more details.

How to use

Install the package:

   pip install estndv

Import and create an instance:

   from estndv import ndvEstimator
   estimator = ndvEstimator()

Assume your sample is S=[1,1,1,3,5,5,12] and the population size is N=100000. You can estimate the population NDV by:

   ndv = estimator.sample_predict(S=[1,1,1,3,5,5,12], N=100000)

If you have the sample profile, e.g. f=[2,1,1], you can estimate the population NDV by:

   ndv = estimator.profile_predict(f=[2,1,1], N=100000)
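The profile f=[2,1,1] is consistent with the sample S=[1,1,1,3,5,5,12] if the j-th entry of f counts the distinct values that occur exactly j times in the sample (two values occur once, one occurs twice, one occurs three times). Assuming that definition, a minimal sketch for deriving a profile from a raw sample could look like this; sample_profile is a hypothetical helper for illustration, not part of estndv.

```python
from collections import Counter


def sample_profile(sample):
    """Return f where f[j-1] is the number of distinct values seen exactly j times.

    Hypothetical helper, assuming the frequency-of-frequencies definition
    of a sample profile; it is not part of the estndv package.
    """
    value_counts = Counter(sample)                 # value -> occurrences in the sample
    freq_of_freq = Counter(value_counts.values())  # occurrences -> number of distinct values
    max_count = max(freq_of_freq, default=0)       # 0 for an empty sample
    return [freq_of_freq.get(j, 0) for j in range(1, max_count + 1)]


S = [1, 1, 1, 3, 5, 5, 12]
f = sample_profile(S)
print(f)  # [2, 1, 1]
```

The resulting list can then be passed as f to estimator.profile_predict together with the population size N.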

If you have multiple samples or profiles from multiple populations, you can estimate the population NDV for all of them in one batch with estimator.sample_predict_batch() or estimator.profile_predict_batch().
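The argument names of the batch methods are not shown here, so the sketch below does not call them. It is a plain loop that relies only on the sample_predict(S=..., N=...) call shown above; predict_many is a hypothetical helper, not part of estndv, and the batch methods are presumably the better choice when throughput matters.

```python
def predict_many(estimator, samples, population_sizes):
    """Estimate the NDV for several (sample, N) pairs, one call per pair.

    Hypothetical helper: `estimator` is any object exposing
    sample_predict(S=..., N=...), such as an ndvEstimator instance.
    """
    if len(samples) != len(population_sizes):
        raise ValueError("samples and population_sizes must have the same length")
    return [
        estimator.sample_predict(S=S, N=N)
        for S, N in zip(samples, population_sizes)
    ]


# Usage with the real package (requires `pip install estndv`):
#   from estndv import ndvEstimator
#   ndvs = predict_many(ndvEstimator(),
#                       [[1, 1, 1, 3, 5, 5, 12], [4, 4, 9]],
#                       [100000, 5000])
```

Each sample is paired with its own population size, so populations of different sizes can be mixed in one call.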

How to train the NDV estimator

You can use our PyPI package directly on your own datasets, because the pre-trained model is workload-agnostic. If you still want to train the model from scratch, follow these steps:

Go to the model_training folder:

   cd model_training

Install the requirements:

   pip install -r requirements.txt

Generate the training data (this uses a lot of memory):

   python training_data_generation.py

Train the model:

   python model_training.py

Save the trained PyTorch model parameters in NumPy format. This generates a file named model_paras.npy:

   python torch2npy.py

Test with your own model parameters by passing the path to your model_paras.npy:

   estimator = ndvEstimator(para_path="path/to/model_paras.npy")

Citation

If you use our work or find it useful, please cite our paper:

@article{wu2022learning,
   author = {Wu, Renzhi and Ding, Bolin and Chu, Xu and Wei, Zhewei and Dai, Xiening and Guan, Tao and Zhou, Jingren},
   title = {Learning to Be a Statistician: Learned Estimator for Number of Distinct Values},
   year = {2021},
   issue_date = {October 2021},
   publisher = {VLDB Endowment},
   volume = {15},
   number = {2},
   issn = {2150-8097},
   url = {https://doi.org/10.14778/3489496.3489508},
   doi = {10.14778/3489496.3489508},
   journal = {Proc. VLDB Endow.},
   month = {oct},
   pages = {272–284},
   numpages = {13}
}
