Simple and fast histogramming in Python accelerated with OpenMP.

Overview

pygram11

Simple and fast histogramming in Python accelerated with OpenMP with help from pybind11.

pygram11 provides functions for very fast histogram calculations (and the variance in each bin) in one and two dimensions. The API is very simple; documentation can be found here (you'll also find some benchmarks there).

Installing

From PyPI

Binary wheels are provided for Linux and macOS. They can be installed from PyPI via pip:

pip install pygram11

From conda-forge

pygram11 is available from conda-forge for installation with the conda package manager:

conda install pygram11 -c conda-forge

Please note that on macOS the OpenMP libraries from LLVM (libomp) and Intel (libiomp) may clash if your conda environment includes the Intel Math Kernel Library (MKL) package distributed by Anaconda. You may need to install the nomkl package to prevent the clash (Intel MKL accelerates many linear algebra operations, but does not impact pygram11):

conda install nomkl ## sometimes necessary fix (macOS only)

From Source

All you need is a C++14 compiler and OpenMP. If you are using a relatively modern GCC release on Linux then you probably don't have to worry about the OpenMP dependency. If you are on macOS, you can install libomp from Homebrew (pygram11 does compile on Apple Silicon devices with Python version >= 3.9 and libomp installed from Homebrew). With those dependencies met, simply run:

git clone https://github.com/douglasdavis/pygram11.git --recurse-submodules
cd pygram11
pip install .

Or let pip handle the cloning procedure:

pip install git+https://github.com/douglasdavis/pygram11.git

Tests are run on Python versions >= 3.7 and binary wheels are provided for those versions.

In Action

A histogram (with fixed bin width) of weighted data in one dimension:

>>> import numpy as np
>>> import pygram11
>>> rng = np.random.default_rng(123)
>>> x = rng.standard_normal(10000)
>>> w = rng.uniform(0.8, 1.2, x.shape[0])
>>> h, err = pygram11.histogram(x, bins=40, range=(-4, 4), weights=w)
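For intuition, the same quantities can be reproduced (much more slowly) with plain NumPy. This is only a sketch and not pygram11's implementation; it assumes the second return value is the square root of the per-bin sum of squared weights, which is how the 0.5.0 release notes describe it. The helper name reference_histogram is made up for illustration.

```python
import numpy as np

def reference_histogram(x, bins, lo, hi, weights):
    """NumPy-only reference: weighted bin heights and per-bin uncertainty."""
    edges = np.linspace(lo, hi, bins + 1)
    # Sum of weights in each bin.
    sumw, _ = np.histogram(x, bins=edges, weights=weights)
    # Sum of squared weights in each bin (the per-bin variance).
    sumw2, _ = np.histogram(x, bins=edges, weights=weights**2)
    return sumw, np.sqrt(sumw2)

rng = np.random.default_rng(123)
x = rng.standard_normal(10000)
w = rng.uniform(0.8, 1.2, x.shape[0])
h_ref, err_ref = reference_histogram(x, 40, -4.0, 4.0, w)
```

Entries outside the range are simply dropped here, which matches a call without flow=True.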

A histogram with fixed bin width which saves the under and overflow in the first and last bins:

>>> x = rng.standard_normal(1000000)
>>> h, __ = pygram11.histogram(x, bins=20, range=(-3, 3), flow=True)

where we've used __ to catch the None returned when weights are absent.

A histogram in two dimensions with variable width bins:

>>> x = rng.standard_normal(1000)
>>> y = rng.standard_normal(1000)
>>> xbins = [-2.0, -1.0, -0.5, 1.5, 2.0, 3.1]
>>> ybins = [-3.0, -1.5, -0.1, 0.8, 2.0, 2.8]
>>> h, err = pygram11.histogram2d(x, y, bins=[xbins, ybins])
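With variable width bins, the bin index can no longer be computed arithmetically; a common approach is a binary search over the edges. The NumPy sketch below illustrates that idea for the unweighted two-dimensional case. It is an assumption about how such a lookup can be done, not a description of pygram11's backend, and reference_histogram2d is a hypothetical helper.

```python
import numpy as np

def reference_histogram2d(x, y, xedges, yedges):
    """Unweighted 2D histogram with variable width bins via binary search."""
    xedges = np.asarray(xedges, dtype=float)
    yedges = np.asarray(yedges, dtype=float)
    # Bin index i satisfies edges[i] <= value < edges[i + 1].
    ix = np.searchsorted(xedges, x, side="right") - 1
    iy = np.searchsorted(yedges, y, side="right") - 1
    nx, ny = len(xedges) - 1, len(yedges) - 1
    # Keep only entries inside both axis ranges (no under/overflow here).
    keep = (ix >= 0) & (ix < nx) & (iy >= 0) & (iy < ny)
    counts = np.zeros((nx, ny), dtype=np.int64)
    # np.add.at accumulates correctly when the same bin appears many times.
    np.add.at(counts, (ix[keep], iy[keep]), 1)
    return counts

rng = np.random.default_rng(123)
x = rng.standard_normal(1000)
y = rng.standard_normal(1000)
xbins = [-2.0, -1.0, -0.5, 1.5, 2.0, 3.1]
ybins = [-3.0, -1.5, -0.1, 0.8, 2.0, 2.8]
h_ref = reference_histogram2d(x, y, xbins, ybins)
```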

Manually controlling OpenMP acceleration with context managers:

>>> with pygram11.omp_disabled():  # disable all thresholds.
...     result, _ = pygram11.histogram(x, bins=10, range=(-3, 3))
...
>>> with pygram11.omp_forced(key="thresholds.var1d"):  # force a single threshold.
...     result, _ = pygram11.histogram(x, bins=[-3, -2, 0, 2, 3])
...

Histogramming multiple weight variations for the same data, then putting the result in a DataFrame (the input pandas DataFrame will be interpreted as a NumPy array):

>>> import pandas as pd
>>> N = 10000
>>> weights = pd.DataFrame({"weight_a": np.abs(rng.standard_normal(N)),
...                         "weight_b": rng.uniform(0.5, 0.8, N),
...                         "weight_c": rng.uniform(0.0, 1.0, N)})
>>> data = rng.standard_normal(N)
>>> count, err = pygram11.histogram(data, bins=20, range=(-3, 3), weights=weights, flow=True)
>>> count_df = pd.DataFrame(count, columns=weights.columns)
>>> err_df = pd.DataFrame(err, columns=weights.columns)
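The shape of the multi-weight result can be checked against a slow NumPy reference: one column of output per weight variation. This sketch assumes that flow=True folds out-of-range entries into the first and last bins and that the uncertainty is the square root of the summed squared weights; reference_multiweight is a hypothetical helper, and a plain 2D array stands in for the DataFrame.

```python
import numpy as np

def reference_multiweight(x, bins, lo, hi, weights):
    """weights has shape (n_entries, n_variations); one histogram per column."""
    edges = np.linspace(lo, hi, bins + 1)
    # Emulate flow=True: clipping pulls out-of-range entries onto the outer
    # edges, and np.histogram counts the right-most edge in the last bin.
    xc = np.clip(x, lo, hi)
    nvar = weights.shape[1]
    sumw = np.empty((bins, nvar))
    sumw2 = np.empty((bins, nvar))
    for j in range(nvar):
        sumw[:, j], _ = np.histogram(xc, bins=edges, weights=weights[:, j])
        sumw2[:, j], _ = np.histogram(xc, bins=edges, weights=weights[:, j] ** 2)
    return sumw, np.sqrt(sumw2)

rng = np.random.default_rng(123)
N = 10000
w2d = np.column_stack([np.abs(rng.standard_normal(N)),
                       rng.uniform(0.5, 0.8, N),
                       rng.uniform(0.0, 1.0, N)])
data = rng.standard_normal(N)
count_ref, err_ref = reference_multiweight(data, 20, -3.0, 3.0, w2d)
```

Looping over columns in Python re-bins the data once per variation; computing all variations in a single call avoids that repeated work.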

I also wrote a blog post with some simple examples.

Other Libraries

  • boost-histogram provides Pythonic object oriented histograms.
  • Simple and fast histogramming in Python using the NumPy C API: fast-histogram (no variance or overflow support).
  • To calculate histograms in Python on a GPU, see cupy.histogram.

If there is something you'd like to see in pygram11, please open an issue or pull request.

Comments
  • Where is pygram11 in performance compared to fast-histogram?

    Basically the title.

    I want to go for raw speed only and don't care for much more than being able to specify bin-edges and getting back a numpy-array from the function that calculates the histogram.

    To give a little more context: I may have to calculate a histogram with 1,000,000 bins, where each bin is 1000 integers wide, and fill 1000 values into those bins. Can you tell me if this is possible with pygram11?

    opened by tim-hilt 7
  • Config adjustments

    • Rewrite configuration; now controllable by a module, context managers, and decorator
    • modularize things a bit (separate numpy API from pygram11 core API)
    opened by douglasdavis 0
  • WIP: Large overhaul of backend, reorganize internal (non-public) API

    • [x] OpenMP required to build (not a difficult dependency to satisfy) - toggled via if statements on the input data size.
      • [x] omp function argument deprecated
    • [x] Backend C++ code for 1D histograms rewritten (use Python and NumPy C APIs directly, no pybind11)
    • [x] Rewrite multiweight histograms (still via pybind11) to support more types (avoid conversions)
    • [x] Python code moved to the top level module (__init__.py) (without changing public API)
    • [x] Documentation updates
    opened by douglasdavis 0
  • Improve OpenMP inspection; docs improvements; pybind11 update

    • Improve API for accessing information about OpenMP availability and properties
    • Docs improvements
    • Update to latest pybind11 release (2.4.3)
    • Clean up some old scripts
    opened by douglasdavis 0
  • Development for v0.5

    • [x] implement C++ back end for multi weight histogramming
    • [x] redesign one-dimensional histogramming backend, simplify code by using more of the pybind11 array API (front end API unchanged)
    opened by douglasdavis 0
  • Allow over/under flow for 1D histograms

    • extend the number of bins in memory by 2
    • fill bin 0 for data less than xmin
    • fill bin nbins + 1 for data larger than xmax
    • first real bin is now index 1
    • last real bin is now index nbins
    • add named arguments to python functions which dictate how to slice result so we return the desired bin counts.
    opened by douglasdavis 0
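The bookkeeping described in the list above can be sketched in NumPy as follows. This is an illustration of the scheme, not the project's C++ code: the function name and the flow argument are chosen for the sketch, and treating a value exactly equal to xmax as overflow is an assumption.

```python
import numpy as np

def fixed_width_with_flow(x, nbins, xmin, xmax, flow=False):
    """Fixed-width 1D counts using two extra in-memory bins for under/overflow."""
    x = np.asarray(x, dtype=float)
    counts = np.zeros(nbins + 2, dtype=np.int64)  # bin 0 and bin nbins + 1 are flow bins
    norm = nbins / (xmax - xmin)
    under = x < xmin
    over = x >= xmax  # assumption: the upper edge itself counts as overflow
    inside = ~(under | over)
    idx = np.empty(x.shape, dtype=np.int64)
    idx[under] = 0
    idx[over] = nbins + 1
    # Real bins occupy indices 1..nbins; the minimum guards against rounding.
    raw = ((x[inside] - xmin) * norm).astype(np.int64)
    idx[inside] = np.minimum(raw, nbins - 1) + 1
    np.add.at(counts, idx, 1)
    if flow:
        # Fold the flow bins into the first and last real bins.
        counts[1] += counts[0]
        counts[nbins] += counts[nbins + 1]
    # Slice away the two flow bins so only the real bins are returned.
    return counts[1:-1]

values = np.array([-5.0, -0.5, 0.5, 0.5, 2.9, 3.0, 10.0])
no_flow = fixed_width_with_flow(values, 3, -3.0, 3.0)
with_flow = fixed_width_with_flow(values, 3, -3.0, 3.0, flow=True)
```

Keeping the flow bins in the same array means every entry gets a valid index, so the fill loop needs no branch that skips out-of-range data; the decision to keep or discard the flow content is made once, at the end.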
Releases(0.13.2)
  • 0.13.2(Dec 5, 2022)

  • 0.13.1(Aug 30, 2022)

  • 0.13.0(Aug 24, 2022)

  • 0.12.2(May 25, 2022)

  • 0.12.1(Oct 19, 2021)

  • 0.12.0(Apr 16, 2021)

    • OpenMP configuration redesigned.
      • constants have been removed in favor of new pygram11.config module.
      • Decorators and context managers added to control toggling OpenMP thresholds
        • pygram11.without_omp decorator
        • pygram11.with_omp decorator
        • pygram11.disable_omp context manager
        • pygram11.force_omp context manager
      • See the documentation for more
    • New keyword argument added to histogramming functions (cons_var) for returning variance instead of standard error.
  • 0.11.2(Feb 27, 2021)

    Renamed internal Python files hist.py and misc.py to _hist.py and _misc.py, respectively.

    The contents of these modules are brought in to the main pygram11 module namespace by imports in __init__.py (the submodules themselves are not meant to be part of the public API). This avoids tab completions of pygram11.hi<tab> yielding pygram11.hist when we actually want pygram11.histogram.

  • 0.11.1(Feb 25, 2021)

    Two convenience functions added to the pygram11 namespace:

    • bin_centers: returns an array representing the center of histogram bins given a total number of bins and an axis range or given existing bin edges.
    • bin_edges: returns an array representing bin edges given a total number of bins and an axis range.
  • 0.11.0(Feb 11, 2021)

    • API change: function calls with weights=None now return None as the second return. Previously the uncertainty was returned (which is just the square-root of the bin heights); now users can take the square-root themselves, and the back-end does not waste cycles tracking the uncertainty since it's trivial for unweighted histograms.
    • More types are supported without conversions (previously np.float64 and np.float32 were the only supported array types, and we converted non-floating point input). Now signed and unsigned integer inputs (both 64 and 32 bit) are supported.
      • If unsupported array types are used TypeError is now raised. This library prioritizes performance; hidden performance hits are explicitly avoided.
    • Configurable thresholds have been introduced to configure when OpenMP acceleration is used (described in the documentation).
    • The back-end was refactored with some help from boost::mp11 to aid in adding more type support without introducing more boilerplate. We now vendor boost::mp11 as a submodule.
    • Bumped the vendored pybind11 submodule to v2.6.2.
    • C++14 now required to build from source.
    • Added Apple Silicon support for Python 3.9 with libomp from Homebrew installed at /opt/homebrew.
    • Documentation improvements
    • Renamed master branch to main.
  • 0.11.0rc1(Feb 10, 2021)

  • 0.10.3(Oct 9, 2020)

  • 0.10.2(Oct 9, 2020)

    • Vendor pybind11 2.6 series.
    • Some changes to backend types (std::size_t -> pybind11::ssize_t for sizes) (https://github.com/pybind/pybind11/pull/2293)
    • Improvements to CI/CD: using cibuildwheel to build wheels.
    • Python 3.9 is now tested and compatible wheels are built.
  • 0.10.2rc1(Oct 9, 2020)

  • 0.10.1(Sep 5, 2020)

  • 0.10.0(Jun 30, 2020)

    Renamed internal Python module from histogram to hist. This avoids a clash with the module function of the same name. Some IDE features were confused.

  • 0.9.1(May 26, 2020)

  • 0.9.0(May 26, 2020)

  • 0.8.2(May 18, 2020)

  • 0.8.1(Apr 26, 2020)

  • 0.8.0(Feb 25, 2020)

    Public API changes:

    • Remove previously deprecated omp function argument

    Backend changed:

    • moved 1D backend back to pybind11 (template flexibility)
    • parallelization cutoffs consistently at array size of 5000 for all calculations
    • all non-floating point inputs are converted

    Other:

    • Improved tests
  • 0.7.3(Feb 11, 2020)

  • 0.7.2(Feb 6, 2020)

  • 0.7.1(Feb 6, 2020)

  • 0.7.0(Feb 5, 2020)

    • OpenMP is now required to build from source (not a difficult dependency to satisfy)
      • omp function argument deprecated
      • omp_max_threads renamed to omp_get_max_threads to mirror OpenMP C API
      • omp_available function removed
    • Backend C++ code for 1D histograms rewritten (use Python and NumPy C APIs directly, no pybind11), more types supported (avoid conversions)
    • Rewrite multiweight histograms (still via pybind11) to support more types (avoid conversions)
    • Python code moved to the top level module (__init__.py) (without changing public API)
  • 0.6.1(Oct 20, 2019)

    • OpenMP inspection improved; new functions replace pygram11.OPENMP:
      • pygram11.omp_available() -> bool checks for availability
      • pygram11.omp_max_threads() -> int checks for the maximum available threads
    • some documentation improvements
    • bump pybind11 version to 2.4.3
    • pygram11.OPENMP will be removed in a future release
  • 0.6.0(Oct 17, 2019)

  • 0.5.2(Aug 20, 2019)

  • 0.5.1(Aug 10, 2019)

    • Fix setup.py such that setuptools doesn't try to install numpy via easy_install (remove unnecessary setup_requires argument).
    • Add _max_threads attribute to pygram11._core.
    • Fixed macOS wheels (use delocate to ensure OpenMP symbols are bundled).
  • 0.5.0(Jul 29, 2019)

    New features

    • histogramming multiple weight variations in a single function call is now possible via fix1dmw and var1dmw
    • In the NumPy like API, passing a 2 dimensional array to the weights argument will use this feature as well.
    • Binary wheels for Linux and macOS are provided on PyPI (conda-forge binaries are of course still available as well)

    Other updates

    • All functions now return the sqrt(sum-of-weights-squared) instead of sum-of-weights; before v0.5.x the 2D functions returned the sum of weights squared.
  • 0.5.0.a2(Jul 29, 2019)

Owner
Doug Davis
OSS Engineer @anaconda