Simple and fast histogramming in Python accelerated with OpenMP.

Overview

pygram11

Simple and fast histogramming in Python accelerated with OpenMP with help from pybind11.

pygram11 provides functions for very fast calculation of histograms (and the variance in each bin) in one and two dimensions. The API is deliberately simple; the project documentation describes it and also includes some benchmarks.

Installing

From PyPI

Binary wheels are provided for Linux and macOS. They can be installed from PyPI via pip:

pip install pygram11

From conda-forge

pygram11 is also available from conda-forge for installation with the conda package manager:

conda install pygram11 -c conda-forge

Please note that on macOS the OpenMP libraries from LLVM (libomp) and Intel (libiomp) may clash if your conda environment includes the Intel Math Kernel Library (MKL) package distributed by Anaconda. You may need to install the nomkl package to prevent the clash (Intel MKL accelerates many linear algebra operations, but does not impact pygram11):

conda install nomkl ## sometimes necessary fix (macOS only)

From Source

You need a C++14 compiler and OpenMP. If you are using a relatively modern GCC release on Linux, you probably don't have to worry about the OpenMP dependency. On macOS, you can install libomp from Homebrew (pygram11 compiles on Apple Silicon devices with Python version >= 3.9 and libomp installed from Homebrew). With those dependencies met, simply run:

git clone https://github.com/douglasdavis/pygram11.git --recurse-submodules
cd pygram11
pip install .

Or let pip handle the cloning procedure:

pip install git+https://github.com/douglasdavis/pygram11.git

Tests are run on Python versions >= 3.7 and binary wheels are provided for those versions.

In Action

A histogram (with fixed bin width) of weighted data in one dimension:

>>> import numpy as np
>>> import pygram11
>>> rng = np.random.default_rng(123)
>>> x = rng.standard_normal(10000)
>>> w = rng.uniform(0.8, 1.2, x.shape[0])
>>> h, err = pygram11.histogram(x, bins=40, range=(-4, 4), weights=w)
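
To make the two return values concrete, here is a slow pure-Python reference for what a fixed-width weighted histogram computes: the sum of weights in each bin and, as the uncertainty, the square root of the sum of squared weights (the convention stated in the 0.5.0 release notes). The function name `fixed_width_hist` and the choice to drop entries at or above the upper edge are assumptions made for this sketch; it is not pygram11's implementation.

```python
import math


def fixed_width_hist(x, weights, nbins, xmin, xmax):
    """Slow reference: per-bin sum of weights and sqrt(sum of weights squared)."""
    sumw = [0.0] * nbins
    sumw2 = [0.0] * nbins
    norm = nbins / (xmax - xmin)  # number of bins per unit of x
    for xi, wi in zip(x, weights):
        if xi < xmin or xi >= xmax:
            continue  # assumption: out-of-range entries are dropped
        # Clamp guards against floating-point round-up just below xmax.
        i = min(int((xi - xmin) * norm), nbins - 1)
        sumw[i] += wi
        sumw2[i] += wi * wi
    err = [math.sqrt(v) for v in sumw2]
    return sumw, err


counts, err = fixed_width_hist(
    x=[-0.5, 0.1, 0.2, 0.9, 5.0],
    weights=[1.0, 2.0, 2.0, 0.5, 1.0],
    nbins=2,
    xmin=-1.0,
    xmax=1.0,
)
```

The bin index is a single multiplication and truncation, which is why fixed-width binning needs no search over edges.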

A histogram with fixed bin width which saves the under and overflow in the first and last bins:

>>> x = rng.standard_normal(1000000)
>>> h, __ = pygram11.histogram(x, bins=20, range=(-3, 3), flow=True)

where we've used __ to catch the None returned when weights are absent.

A histogram in two dimensions with variable width bins:

>>> x = rng.standard_normal(1000)
>>> y = rng.standard_normal(1000)
>>> xbins = [-2.0, -1.0, -0.5, 1.5, 2.0, 3.1]
>>> ybins = [-3.0, -1.5, -0.1, 0.8, 2.0, 2.8]
>>> h, err = pygram11.histogram2d(x, y, bins=[xbins, ybins])
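
With variable-width bins the bin index can no longer be computed arithmetically; each value has to be located among the edges. A minimal pure-Python sketch of that idea, using a binary search from the standard library, is shown here. The helper names `find_bin` and `var_width_hist2d` are hypothetical, and the half-open bin convention `[lo, hi)` is an assumption of this sketch rather than a statement about pygram11's behavior at the last edge.

```python
from bisect import bisect_right


def find_bin(edges, value):
    """Return the bin index for value, or None if it falls outside the edges."""
    if value < edges[0] or value >= edges[-1]:
        return None
    return bisect_right(edges, value) - 1


def var_width_hist2d(x, y, xedges, yedges):
    """Slow reference: unweighted 2D counts for variable-width bins."""
    counts = [[0] * (len(yedges) - 1) for _ in range(len(xedges) - 1)]
    for xi, yi in zip(x, y):
        i = find_bin(xedges, xi)
        j = find_bin(yedges, yi)
        if i is not None and j is not None:
            counts[i][j] += 1
    return counts


xbins = [-2.0, -1.0, -0.5, 1.5, 2.0, 3.1]
ybins = [-3.0, -1.5, -0.1, 0.8, 2.0, 2.8]
h = var_width_hist2d([-1.5, 0.0, 0.0, 9.0], [0.0, 0.0, 2.5, 0.0], xbins, ybins)
```

The result is indexed as `h[x_bin][y_bin]`; the last entry is skipped because its x value lies outside the x edges.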

Manually controlling OpenMP acceleration with context managers:

>>> with pygram11.omp_disabled():  # disable all thresholds.
...     result, _ = pygram11.histogram(x, bins=10, range=(-3, 3))
...
>>> with pygram11.omp_forced(key="thresholds.var1d"):  # force a single threshold.
...     result, _ = pygram11.histogram(x, bins=[-3, -2, 0, 2, 3])
...
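
The general pattern behind such context managers can be sketched with the standard library alone: a size threshold decides whether the parallel code path is taken, and the context manager temporarily overrides that threshold and restores it on exit. Everything in this sketch is hypothetical (the `THRESHOLDS` dictionary, the helper names, and the default of 5000, which is borrowed from the cutoff mentioned in the 0.8.0 release notes); it illustrates the mechanism and is not pygram11's configuration code.

```python
import sys
from contextlib import contextmanager

# Hypothetical threshold table; the key mirrors the example above, but the
# dictionary and its default value are assumptions made for this sketch.
THRESHOLDS = {"thresholds.var1d": 5000}


def would_parallelize(key, n):
    """Size check of the kind a threshold-gated backend might perform."""
    return n > THRESHOLDS[key]


@contextmanager
def forced(key):
    """Temporarily set one threshold to 0 so any non-empty input qualifies."""
    previous = THRESHOLDS[key]
    THRESHOLDS[key] = 0
    try:
        yield
    finally:
        THRESHOLDS[key] = previous  # restore even if the body raises


@contextmanager
def disabled():
    """Temporarily raise every threshold so no input size qualifies."""
    previous = dict(THRESHOLDS)
    for key in THRESHOLDS:
        THRESHOLDS[key] = sys.maxsize
    try:
        yield
    finally:
        THRESHOLDS.update(previous)


with forced("thresholds.var1d"):
    small_input_parallel = would_parallelize("thresholds.var1d", 100)
```

Restoring the previous values in a `finally` clause keeps the override scoped to the `with` block even when the body raises.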

Histogramming multiple weight variations for the same data, then putting the result in a DataFrame (the input pandas DataFrame will be interpreted as a NumPy array):

>>> import pandas as pd
>>> N = 10000
>>> weights = pd.DataFrame({"weight_a": np.abs(rng.standard_normal(N)),
...                         "weight_b": rng.uniform(0.5, 0.8, N),
...                         "weight_c": rng.uniform(0.0, 1.0, N)})
>>> data = rng.standard_normal(N)
>>> count, err = pygram11.histogram(data, bins=20, range=(-3, 3), weights=weights, flow=True)
>>> count_df = pd.DataFrame(count, columns=weights.columns)
>>> err_df = pd.DataFrame(err, columns=weights.columns)
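
The benefit of passing several weight variations at once is that the bin index of each entry is computed a single time and then reused for every weight column. A slow pure-Python sketch of that loop structure follows; the function name `multiweight_hist` and the dictionary-of-lists layout are assumptions made for illustration, and unlike the call above the sketch drops out-of-range entries instead of handling flow.

```python
import math


def multiweight_hist(x, weight_columns, nbins, xmin, xmax):
    """Slow reference: one pass over x, filling every weight variation at once.

    weight_columns maps a variation name to a list of per-entry weights.
    Returns (counts, errors), each a dict of per-bin lists keyed by name.
    """
    names = list(weight_columns)
    sumw = {name: [0.0] * nbins for name in names}
    sumw2 = {name: [0.0] * nbins for name in names}
    norm = nbins / (xmax - xmin)
    for row, xi in enumerate(x):
        if xi < xmin or xi >= xmax:
            continue  # assumption: no flow handling in this sketch
        i = min(int((xi - xmin) * norm), nbins - 1)  # computed once per entry
        for name in names:
            w = weight_columns[name][row]
            sumw[name][i] += w
            sumw2[name][i] += w * w
    errors = {name: [math.sqrt(v) for v in sumw2[name]] for name in names}
    return sumw, errors


counts, errors = multiweight_hist(
    x=[0.25, 0.75, 0.8],
    weight_columns={"weight_a": [1.0, 2.0, 3.0], "weight_b": [0.5, 0.5, 0.5]},
    nbins=2,
    xmin=0.0,
    xmax=1.0,
)
```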

I also wrote a blog post with some simple examples.

Other Libraries

  • boost-histogram provides Pythonic object oriented histograms.
  • Simple and fast histogramming in Python using the NumPy C API: fast-histogram (no variance or overflow support).
  • To calculate histograms in Python on a GPU, see cupy.histogram.

If there is something you'd like to see in pygram11, please open an issue or pull request.

Comments
  • Where is pygram11 in performance compared to fast-histogram?

    Basically the title.

    I want to go for raw speed only and don't care about much more than being able to specify bin edges and getting back a NumPy array from the function that calculates the histogram.

    To give a little more context: it could be that I will have to calculate a histogram of 1'000'000 bins, where each bin is 1000 integers wide. I would have to fill 1000 values into those bins. Can you tell me if this is possible with pygram11?

    opened by tim-hilt 7
  • Config adjustments

    • Rewrite configuration; now controllable by a module, context managers, and decorator
    • modularize things a bit (separate numpy API from pygram11 core API)
    opened by douglasdavis 0
  • WIP: Large overhaul of backend, reorganize internal (non-public) API

    • [x] OpenMP required to build (not a difficult dependency to satisfy) - toggled via if statements on the input data size.
      • [x] omp function argument deprecated
    • [x] Backend C++ code for 1D histograms rewritten (use Python and NumPy C APIs directly, no pybind11)
    • [x] Rewrite multiweight histograms (still via pybind11) to support more types (avoid conversions)
    • [x] Python code moved to the top level module (__init__.py) (without changing public API)
    • [x] Documentation updates
    opened by douglasdavis 0
  • Improve OpenMP inspection; docs improvements; pybind11 update

    • Improve API for accessing information about OpenMP availability and properties
    • Docs improvements
    • Update to latest pybind11 release (2.4.3)
    • Clean up some old scripts
    opened by douglasdavis 0
  • Development for v0.5

    • [x] implement C++ back end for multi weight histogramming
    • [x] redesign one-dimensional histogramming backend, simplify code by using more of the pybind11 array API (front end API unchanged)
    opened by douglasdavis 0
  • Allow over/under flow for 1D histograms

    • extend the number of bins in memory by 2
    • fill bin 0 for data less than xmin
    • fill bin nbins + 1 for data larger than xmax
    • first real bin is now index 1
    • last real bin is now index nbins
    • add named arguments to python functions which dictate how to slice result so we return the desired bin counts.
    opened by douglasdavis 0
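
The overflow scheme listed in the last item above can be sketched in pure Python as follows: the histogram is allocated with two extra slots, out-of-range entries go to slot 0 or slot nbins + 1, and the result is then either sliced to the real bins or folded so that the first and last real bins absorb the flow. The function names are hypothetical, and treating values equal to xmax as overflow is an assumption of this sketch.

```python
def hist_with_flow(x, nbins, xmin, xmax):
    """Slow reference: counts in nbins + 2 slots, with flow at both ends."""
    padded = [0] * (nbins + 2)  # slot 0: underflow, slot nbins + 1: overflow
    norm = nbins / (xmax - xmin)
    for xi in x:
        if xi < xmin:
            padded[0] += 1
        elif xi >= xmax:
            padded[nbins + 1] += 1
        else:
            # Real bins occupy slots 1..nbins, hence the offset of 1.
            padded[1 + min(int((xi - xmin) * norm), nbins - 1)] += 1
    return padded


def fold_flow(padded):
    """Add the underflow and overflow slots into the first and last real bins."""
    real = padded[1:-1]
    real[0] += padded[0]
    real[-1] += padded[-1]
    return real


padded = hist_with_flow([-9.0, -0.5, 0.5, 0.6, 7.0, 8.0], nbins=2, xmin=-1.0, xmax=1.0)
without_flow = padded[1:-1]
with_flow = fold_flow(padded)
```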
Releases(0.13.2)
  • 0.13.2(Dec 5, 2022)

  • 0.13.1(Aug 30, 2022)

  • 0.13.0(Aug 24, 2022)

  • 0.12.2(May 25, 2022)

  • 0.12.1(Oct 19, 2021)

  • 0.12.0(Apr 16, 2021)

    • OpenMP configuration redesigned.
      • constants have been removed in favor of new pygram11.config module.
      • Decorators and context managers added to control toggling OpenMP thresholds
        • pygram11.without_omp decorator
        • pygram11.with_omp decorator
        • pygram11.disable_omp context manager
        • pygram11.force_omp context manager
      • See the documentation for more
    • New keyword argument added to histogramming functions (cons_var) for returning variance instead of standard error.
  • 0.11.2(Feb 27, 2021)

    Renamed internal Python files hist.py and misc.py to _hist.py and _misc.py, respectively.

    The contents of these modules are brought in to the main pygram11 module namespace by imports in __init__.py (the submodules themselves are not meant to be part of the public API). This avoids tab completions of pygram11.hi<tab> yielding pygram11.hist when we actually want pygram11.histogram.

  • 0.11.1(Feb 25, 2021)

    Two convenience functions added to the pygram11 namespace:

    • bin_centers: returns an array representing the centers of the histogram bins, given either a total number of bins and an axis range or existing bin edges.
    • bin_edges: returns an array representing bin edges given a total number of bins and an axis range.
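
As an illustration of what these two quantities are, the following pure-Python sketch builds equally spaced edges from a bin count and range, and midpoints from any list of edges. The helper names `edges_from_range` and `centers_from_edges` are hypothetical stand-ins chosen for this sketch; they are not the pygram11 functions and their signatures are not meant to match them.

```python
def edges_from_range(nbins, xmin, xmax):
    """Return nbins + 1 equally spaced edges spanning [xmin, xmax]."""
    width = (xmax - xmin) / nbins
    # Append xmax explicitly so the last edge is exact despite rounding.
    return [xmin + i * width for i in range(nbins)] + [xmax]


def centers_from_edges(edges):
    """Return the midpoint of each adjacent pair of edges."""
    return [0.5 * (lo + hi) for lo, hi in zip(edges[:-1], edges[1:])]


edges = edges_from_range(4, -2.0, 2.0)
centers = centers_from_edges(edges)
variable_centers = centers_from_edges([0.0, 1.0, 3.0])
```

Computing centers from edges works for variable-width bins as well, which is why the second helper takes edges and not a range.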
  • 0.11.0(Feb 11, 2021)

    • API change: function calls with weights=None now return None as the second return value. Previously the uncertainty was returned (which is just the square root of the bin heights); now users can take the square root themselves, and the back-end does not waste cycles tracking the uncertainty since it's trivial for unweighted histograms.
    • More types are supported without conversions (previously np.float64 and np.float32 were the only supported array types, and we converted non-floating point input). Now signed and unsigned integer inputs (both 64 and 32 bit) are supported.
      • If unsupported array types are used TypeError is now raised. This library prioritizes performance; hidden performance hits are explicitly avoided.
    • Configurable thresholds have been introduced to configure when OpenMP acceleration is used (described in the documentation).
    • The back-end was refactored with some help from boost::mp11 to aid in adding more type support without introducing more boilerplate. We now vendor boost::mp11 as a submodule.
    • Bumped the vendored pybind11 submodule to v2.6.2.
    • C++14 now required to build from source.
    • Added Apple Silicon support for Python 3.9 with libomp from Homebrew installed at /opt/homebrew.
    • Documentation improvements
    • Renamed master branch to main.
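
For the API change in the first item above, recovering the uncertainty of an unweighted histogram is a one-liner. The sketch below uses made-up bin heights in place of a real return value, so it runs without pygram11 installed.

```python
import math

# Bin heights as they might come back from an unweighted call (made-up values).
counts = [4, 9, 0, 16]

# For unweighted data every entry has weight 1, so the sum of squared weights
# in a bin equals its count and the uncertainty is simply sqrt(count).
uncertainty = [math.sqrt(c) for c in counts]
```

With NumPy arrays the same step would typically be written as `np.sqrt(counts)`.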
  • 0.11.0rc1(Feb 10, 2021)

  • 0.10.3(Oct 9, 2020)

  • 0.10.2(Oct 9, 2020)

    • Vendor pybind11 2.6 series.
    • Some changes to backend types (std::size_t -> pybind11::ssize_t for sizes) (https://github.com/pybind/pybind11/pull/2293)
    • Improvements to CI/CD: using cibuildwheel to build wheels.
    • Python 3.9 is now tested and compatible wheels are built.
  • 0.10.2rc1(Oct 9, 2020)

  • 0.10.1(Sep 5, 2020)

  • 0.10.0(Jun 30, 2020)

    Renamed internal Python module from histogram to hist. This avoids a clash with the module function of the same name. Some IDE features were confused.

  • 0.9.1(May 26, 2020)

  • 0.9.0(May 26, 2020)

  • 0.8.2(May 18, 2020)

  • 0.8.1(Apr 26, 2020)

  • 0.8.0(Feb 25, 2020)

    Public API changes:

    • Remove previously deprecated omp function argument

    Backend changed:

    • moved 1D backend back to pybind11 (template flexibility)
    • parallelization cutoffs consistently at array size of 5000 for all calculations
    • all non-floating point inputs are converted

    Other:

    • Improved tests
  • 0.7.3(Feb 11, 2020)

  • 0.7.2(Feb 6, 2020)

  • 0.7.1(Feb 6, 2020)

  • 0.7.0(Feb 5, 2020)

    • OpenMP is now required to build from source (not a difficult dependency to satisfy)
      • omp function argument deprecated
      • omp_max_threads renamed to omp_get_max_threads to mirror OpenMP C API
      • omp_available function removed
    • Backend C++ code for 1D histograms rewritten (use Python and NumPy C APIs directly, no pybind11), more types supported (avoid conversions)
    • Rewrite multiweight histograms (still via pybind11) to support more types (avoid conversions)
    • Python code moved to the top level module (__init__.py) (without changing public API)
  • 0.6.1(Oct 20, 2019)

    • OpenMP inspection improved; new functions replace pygram11.OPENMP:
      • pygram11.omp_available() -> bool checks for availability
      • pygram11.omp_max_threads() -> int checks for the maximum available threads
    • some documentation improvements
    • bump pybind11 version to 2.4.3
    • pygram11.OPENMP will be removed in a future release
  • 0.6.0(Oct 17, 2019)

  • 0.5.2(Aug 20, 2019)

  • 0.5.1(Aug 10, 2019)

    • Fix setup.py such that setuptools doesn't try to install numpy via easy_install (remove unnecessary setup_requires argument).
    • Add _max_threads attribute to pygram11._core.
    • Fixed macOS wheels (use delocate to ensure OpenMP symbols are bundled).
  • 0.5.0(Jul 29, 2019)

    New features

    • histogramming multiple weight variations in a single function call is now possible via fix1dmw and var1dmw
    • In the NumPy like API, passing a 2 dimensional array to the weights argument will use this feature as well.
    • Binary wheels for Linux and macOS are provided on PyPI (conda-forge binaries are of course still available as well)

    Other updates

    • All functions now return the sqrt(sum-of-weights-squared) instead of sum-of-weights; before v0.5.x the 2D functions returned the sum of weights squared.
  • 0.5.0.a2(Jul 29, 2019)

Owner
Doug Davis
OSS Engineer @anaconda