Tuple-sum-filter - Library to play with filtering numeric sequences by sums of their pairs, triplets, etc. With a bonus CLI demo

Overview

Tuple Sum Filter

A library to play with filtering numeric sequences by sums of their pairs, triplets, etc.

Comes with a bonus CLI to demo the functionality.

Requires (and CI tests on) Python 3.8 to 3.10. If you need to use Python 3.7, try replacing math.prod(some_iterable) with functools.reduce(lambda x, y: x * y, some_iterable).
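One caveat with that swap: math.prod returns 1 for an empty iterable, while functools.reduce without an initial value raises TypeError. A minimal sketch of a drop-in fallback follows; the helper name prod is our own choice for illustration and is not part of this library.

```python
import functools
import operator


def prod(iterable, start=1):
    """Multiply the items together, mirroring math.prod (Python 3.8+).

    Passing `start` as the initial value means an empty iterable
    returns `start` instead of raising TypeError.
    """
    return functools.reduce(operator.mul, iterable, start)


print(prod((1721, 299)))  # 514579
print(prod([]))           # 1
```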

Approach

We think of this mostly as a library, with the CLI for demo purposes only. Ways you can see this in the code:

  • logging should really be handled by the consumer,
    • our get_logger should be something that is passed into the lib
  • the CLI is pretty light on automated tests
  • we use pretty loose production dependency pinning
    • rather than pip freeze > requirements.txt of a deployed app
    • we want to keep things loose so that consumers can keep installing us alongside other things
    • we should probably set up tox/nox test runs against v.latest of our dependencies

Running the demo

In a fresh virtualenv (Python >= 3.8):

# install project and deps
pip install git+https://github.com/lbillingham/tuple-sum-filter.git

# create a suitable input file
printf "1721\n979\n366\n299\n675\n1456\n" > example.txt

# run the demo
filter_demo --input_file=example.txt --sum_target=2020 --dimension=2

You should see output like:

checking for pairs of numbers that sum to 2020 in example.txt
Results pair (1721, 299) match: sum to 2020 and multiply to 514579

Consuming the library

The main filtering functions are pairs_that_sum_to and triplets_that_sum_to. Both have the signature (numbers: Sequence[int|float], sum_target: int|float) -> list[tuple], returning the tuples that passed the filter.

There is also a file-reading helper numbers_in_file exported at the top level.
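To make the signature concrete, here is a minimal, self-contained sketch of functions with that shape. These are stand-ins written for illustration with itertools.combinations, not the library's actual code, and the shared helper _tuples_that_sum_to is hypothetical. To use the real functions, import them from the installed package instead.

```python
import itertools
from typing import List, Sequence, Tuple, Union

Number = Union[int, float]


def _tuples_that_sum_to(
    numbers: Sequence[Number], sum_target: Number, dimension: int
) -> List[Tuple[Number, ...]]:
    """Keep every `dimension`-sized combination whose sum equals the target."""
    return [
        combo
        for combo in itertools.combinations(numbers, dimension)
        if sum(combo) == sum_target
    ]


def pairs_that_sum_to(numbers, sum_target):
    """Stand-in with the documented signature: filter pairs by sum."""
    return _tuples_that_sum_to(numbers, sum_target, dimension=2)


def triplets_that_sum_to(numbers, sum_target):
    """Stand-in with the documented signature: filter triplets by sum."""
    return _tuples_that_sum_to(numbers, sum_target, dimension=3)


numbers = [1721, 979, 366, 299, 675, 1456]
print(pairs_that_sum_to(numbers, 2020))     # [(1721, 299)]
print(triplets_that_sum_to(numbers, 2020))  # [(979, 366, 675)]
```

The input is the same as the example.txt used in the demo, so the pair result matches the demo output above.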

Developing

Run the following to install the project (and dev dependencies) into your active virtualenv:

make dev_install

Day-to-day development tasks can be orchestrated via make:

  • dependency management
  • test/lint/typecheck running
  • coverage reporting
  • run make without any arguments to see a list

There is a CI suite which runs lint and test on several Python versions. We don't run typechecking as a gate in CI because we think that turns a sometimes-useful tool into a Goodhart target.

Performance

We have not been optimizing for performance and it kind of shows.

When we run the benchmarking suite we see ~0.4 seconds fairly consistently for the triplet/3D problem.

We have at least 3 ideas for speeding things up; several of them involve dropping floating-point support.

$ make benchmark

tests/performance_check.py ..                                                                                                                                [100%]


------------------------------------------------------------------------------------- benchmark: 2 tests ------------------------------------------------------------------------------------
Name (time in ms)             Min                 Max                Mean            StdDev              Median               IQR            Outliers       OPS            Rounds  Iterations
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_input1_pairs          5.4665 (1.0)        6.2297 (1.0)        5.6687 (1.0)      0.1018 (1.0)        5.6575 (1.0)      0.1289 (1.0)          47;3  176.4077 (1.0)         172           1
test_input1_triplets     384.6154 (70.36)    386.5000 (62.04)    385.4776 (68.00)    0.8287 (8.14)     385.4333 (68.13)    1.5047 (11.67)         2;0    2.5942 (0.01)          5           1
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

--

🍪 ✂️ cookiecut from lbillingham's python-cli-template

Comments
  • Perf: ⚡ merge if you want to go faster but don't need float support

    This moves away from the shared itertools implementations for finding pairs and triplets of the input numbers that sum to a given target.

    Instead, we

    • first expose the underlying $\sim O(N^{\text{dimensions}})$ nested loops, for $N$ input numbers
    • trade some extra memory and some $O(1)$ lookups to give us $\sim O(N^{\text{dimensions}-1})$
    • get a >= 170x speedup in our benchmarks
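The lookup trade described in the list above can be sketched as follows. This is a minimal illustration of the general technique (remember values in a set, then test for the complement), assuming integer input; the function names are hypothetical and the branch's actual code may differ.

```python
def pairs_via_lookup(numbers, sum_target):
    """Find pairs summing to the target in one pass using a set of seen values."""
    seen = set()
    matches = []
    for number in numbers:
        complement = sum_target - number
        if complement in seen:  # average O(1) hash lookup replaces the inner loop
            matches.append((complement, number))
        seen.add(number)
    return matches


def triplets_via_lookup(numbers, sum_target):
    """Find triplets with two nested loops; the third is replaced by a lookup."""
    matches = []
    for i, first in enumerate(numbers):
        seen = set()
        for second in numbers[i + 1:]:
            third = sum_target - first - second
            if third in seen:
                matches.append((first, third, second))
            seen.add(second)
    return matches


numbers = [1721, 979, 366, 299, 675, 1456]
print(pairs_via_lookup(numbers, 2020))     # [(1721, 299)]
print(triplets_via_lookup(numbers, 2020))  # [(979, 366, 675)]
```

The extra memory is the seen set, which can grow to hold every input number.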

    However, we:

    • lose the ability to properly work with floating point input
      • the fast lookup uses hashing and hashing floats gets weird due to floating point equality
    • can't trivially extend to higher-dimension problems: 4-element-tuples etc.
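A short illustration of the float problem, using only the standard library: a set lookup needs exact equality, but a float sum can differ from the expected value by a rounding error, so the lookup misses a value that a tolerance-based comparison would accept.

```python
import math

total = 0.1 + 0.2  # stored as 0.30000000000000004

print(total == 0.3)              # False: exact equality fails
print(0.3 in {total})            # False: so a hash-based lookup misses it
print(math.isclose(total, 0.3))  # True: a tolerance check would match
```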

    I've moved the float input tests out to their own file and away from the CI test path.

    Performance benchmarks

    with these changes:

    $ make benchmark
    tests/performance_check.py ..                                                                                                                             [100%]
    
    -------------------------------------------------------------------------------------------- benchmark: 2 tests --------------------------------------------------------------------------------------------
    Name (time in us)               Min                   Max                  Mean              StdDev                Median                 IQR            Outliers          OPS            Rounds  Iterations
    ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    test_input1_pairs           22.1660 (1.0)        166.3790 (1.0)         23.6089 (1.0)        5.0170 (1.0)         23.0000 (1.0)        0.5123 (1.0)        87;521  42,356.8183 (1.0)        7677           1
    test_input1_triplets     1,994.8000 (89.99)    3,561.1120 (21.40)    2,152.1272 (91.16)    204.1428 (40.69)    2,033.1040 (88.40)    299.1878 (584.07)       41;4     464.6565 (0.01)        341           1
    ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    

    itertools version, but without float support:

    $ make benchmark
    tests/performance_check.py ..                                                                                                      [100%]
    
    ------------------------------------------------------------------------------------- benchmark: 2 tests ------------------------------------------------------------------------------------
    Name (time in ms)             Min                 Max                Mean            StdDev              Median               IQR            Outliers       OPS            Rounds  Iterations
    ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    test_input1_pairs          2.8727 (1.0)        4.2386 (1.0)        3.1265 (1.0)      0.1638 (1.0)        3.1067 (1.0)      0.1888 (1.0)          78;9  319.8414 (1.0)         326           1
    test_input1_triplets     211.6325 (73.67)    213.3950 (50.35)    212.4042 (67.94)    0.6555 (4.00)     212.2717 (68.33)    0.8081 (4.28)          2;0    4.7080 (0.01)          5           1
    ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    =========================================================== 2 passed in 3.59s ============================================================
    

    Note the change in units: this branch is in microseconds, the itertools version is in milliseconds.

    opened by lbillingham 0
Releases(v0.0.1)
  • v0.0.1(Feb 17, 2022)

    Initial release. Lib allows filtering by sum over pairs and triplets of numbers loaded from a local file. Plus a bonus CLI app that can be used for demoing the lib.

    Solution is itertools-y and rather slow (probably $O(N^{n})$ for $N$ input numbers, where pairs -> n=2 and triplets -> n=3).

    This is the version shown to MM

Owner
Laurence Billingham
+ sustainable software + data science