PipeChain is a utility library for creating functional pipelines.

Overview

PipeChain

Motivation

Let's start with a motivating example. We have a list of Australian phone numbers from our users, and we need to clean this data before we insert it into the database. With PipeChain, we can do this whole process in one neat pipeline:

from pipechain import PipeChain, PLACEHOLDER as _

nums = [
    "493225813",
    "0491 570 156",
    "55505488",
    "Barry",
    "02 5550 7491",
    "491570156",
    "",
    "1800 975 707"
]

PipeChain(
    nums
).pipe(
    # Remove spaces
    map, lambda x: x.replace(" ", ""), _
).pipe(
    # Remove non-numeric entries
    filter, lambda x: x.isnumeric(), _
).pipe(
    # Add the mobile code to the start of 8-digit numbers
    map, lambda x: "04" + x if len(x) == 8 else x, _
).pipe(
    # Add the 0 to the start of 9-digit numbers
    map, lambda x: "0" + x if len(x) == 9 else x, _
).pipe(
    # Convert to a set to remove duplicates
    set
).eval()
{'0255507491', '0455505488', '0491570156', '0493225813', '1800975707'}

Without PipeChain, we would have to horrifically nest our code, or else use a lot of temporary variables:

set(
    map(
        lambda x: "0" + x if len(x) == 9 else x,
        map(
            lambda x: "04" + x if len(x) == 8 else x,
            filter(
                lambda x: x.isnumeric(),
                map(
                    lambda x: x.replace(" ", ""),
                    nums
                )
            )
        )
    )
)
{'0255507491', '0455505488', '0491570156', '0493225813', '1800975707'}
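
For comparison, the temporary-variable alternative mentioned above might look like the sketch below. It uses only built-ins, and the intermediate names (`stripped`, `numeric`, and so on) are illustrative and not part of PipeChain.

```python
nums = [
    "493225813",
    "0491 570 156",
    "55505488",
    "Barry",
    "02 5550 7491",
    "491570156",
    "",
    "1800 975 707"
]

# Every step needs its own throwaway name
stripped = map(lambda x: x.replace(" ", ""), nums)
numeric = filter(lambda x: x.isnumeric(), stripped)
with_mobile_code = map(lambda x: "04" + x if len(x) == 8 else x, numeric)
with_leading_zero = map(lambda x: "0" + x if len(x) == 9 else x, with_mobile_code)

# Convert to a set to remove duplicates
result = set(with_leading_zero)
```

This reads top to bottom like the pipeline does, but each intermediate name stays in scope after it is no longer needed.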

Installation

pip install pipechain

Usage

Basic Usage

PipeChain has only two exports: PipeChain, and PLACEHOLDER.

PipeChain is a class that defines a pipeline. You create an instance of the class, and then call .pipe() to add another function onto the pipeline:

from pipechain import PipeChain, PLACEHOLDER
PipeChain(1).pipe(str)
PipeChain(arg=1, pipes=[functools.partial(<class 'str'>)])

Finally, you call .eval() to run the pipeline and return the result:

PipeChain(1).pipe(str).eval()
'1'

You can "feed" the pipe at either end: during construction, with PipeChain("foo"), or during evaluation, with .eval("foo"):

PipeChain().pipe(str).eval(1)
'1'

Each call to .pipe() takes a function, and any additional arguments you provide, both positional and keyword, will be forwarded to the function:

PipeChain(["b", "a", "c"]).pipe(sorted, reverse=True).eval()
['c', 'b', 'a']

Argument Position

By default, the previous value is passed as the first positional argument to the function:

PipeChain(2).pipe(pow, 3).eval()
8

The only magic here is the PLACEHOLDER variable: if you pass it as an argument to .pipe(), the pipeline replaces it with the output of the previous pipe at runtime, instead of passing that output as the first argument:

PipeChain(2).pipe(pow, 3, PLACEHOLDER).eval()
9

Note that you can rename PLACEHOLDER to something shorter using Python's import ... as syntax, e.g.

from pipechain import PLACEHOLDER as _
PipeChain(2).pipe(pow, 3, _).eval()
9
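
To make the argument rules concrete, here is a minimal sketch of how a pipeline with these semantics could be built. `MiniChain` and its `PLACEHOLDER` sentinel are hypothetical stand-ins written for illustration; this is not PipeChain's actual source, and it treats `None` as "no value supplied" for simplicity.

```python
from functools import partial

# Hypothetical sentinel standing in for pipechain.PLACEHOLDER
PLACEHOLDER = object()


class MiniChain:
    """Toy stand-in for PipeChain, for illustration only."""

    def __init__(self, arg=None):
        self.arg = arg
        self.pipes = []

    def pipe(self, func, *args, **kwargs):
        # Remember the function together with its extra arguments
        self.pipes.append(partial(func, *args, **kwargs))
        return self

    def eval(self, arg=None):
        # Assumption: None means "not supplied", so fall back to the constructor value
        value = self.arg if arg is None else arg
        for step in self.pipes:
            extras = (*step.args, *step.keywords.values())
            uses_placeholder = any(a is PLACEHOLDER for a in extras)
            # Swap every placeholder for the previous step's output
            args = [value if a is PLACEHOLDER else a for a in step.args]
            kwargs = {
                k: (value if v is PLACEHOLDER else v)
                for k, v in step.keywords.items()
            }
            if not uses_placeholder:
                # Default rule: the previous value goes first
                args.insert(0, value)
            value = step.func(*args, **kwargs)
        return value


eight = MiniChain(2).pipe(pow, 3).eval()              # calls pow(2, 3)
nine = MiniChain(2).pipe(pow, 3, PLACEHOLDER).eval()  # calls pow(3, 2)
```

Storing each step as a `functools.partial` keeps the function and its extra arguments together until evaluation, when the previous value is finally known.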

Methods

It might not seem like methods will play that well with this pipe convention, but after all, they are just functions. You can access any object's method as a plain function by looking it up on that object's class, in which case the object itself becomes the first argument. In the example below, str is the class of "":

"".join(["a", "b", "c"])
'abc'
PipeChain(["a", "b", "c"]).pipe(str.join, "", _).eval()
'abc'
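
The equivalence this relies on can be checked in plain Python, without PipeChain. The sketch below uses only built-in `str` methods, and the variable names are illustrative.

```python
words = ["a", "b", "c"]

# Bound call: the separator "" is the object the method is called on
joined_bound = "".join(words)

# Unbound call: the same method looked up on the class,
# with the separator passed explicitly as the first argument
joined_unbound = str.join("", words)

# The same pattern works for any method, e.g. str.upper
upper_bound = "abc".upper()
upper_unbound = str.upper("abc")
```

In `str.join`, the list is the second argument, which is why the pipeline above needs the placeholder. For a method such as `str.upper`, the object being transformed is already the first argument.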

Operators

The same goes for operators such as +, *, and []. We just have to use the operator module from the standard library:

from operator import add, mul, getitem

PipeChain(5).pipe(mul, 3).eval()
15
PipeChain(5).pipe(add, 3).eval()
8
PipeChain(["a", "b", "c"]).pipe(getitem, 1).eval()
'b'
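
Because the previous value is passed as the first argument by default, each piped call above corresponds to a plain call with that value on the left. A short sketch using only the standard library (`sub` is added here to show that argument order matters for non-commutative operators):

```python
from operator import add, mul, getitem, sub

product = mul(5, 3)                  # same as 5 * 3
total = add(5, 3)                    # same as 5 + 3
item = getitem(["a", "b", "c"], 1)   # same as ["a", "b", "c"][1]

# Order matters for non-commutative operators
five_minus_three = sub(5, 3)         # same as 5 - 3
three_minus_five = sub(3, 5)         # same as 3 - 5
```

Getting the second ordering from a pipeline that holds 5 would mean putting the piped value in the second position, which is the case the placeholder exists for.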

Test Suite

Note: you will need Poetry installed.

To run the test suite, use:

git clone https://github.com/multimeric/PipeChain.git
cd PipeChain
poetry install
poetry run pytest test/test.py
Owner
Michael Milton