Pipetools enables function composition similar to using Unix pipes.

Overview

Pipetools

tests-badge coverage-badge pypi-badge

Complete documentation

pipetools enables function composition similar to using Unix pipes.

It allows forward-composition and piping of arbitrary functions - no need to decorate them or do anything extra.

It also packs a bunch of utils that make common operations more convenient and readable.

Source is on github.

Why?

Piping and function composition are some of the most natural operations there are for plenty of programming tasks. Yet Python doesn't have a built-in way of performing them. That forces you to either deep nesting of function calls or adding extra glue code.

Example

Say you want to create a list of python files in a given directory, ordered by filename length, as a string, each file on one line and also with line numbers:

>>> print(pyfiles_by_length('../pipetools'))
1. ds_builder.py
2. __init__.py
3. compat.py
4. utils.py
5. main.py

All the ingredients are already there, you just have to glue them together. You might write it like this:

def pyfiles_by_length(directory):
    all_files = os.listdir(directory)
    py_files = [f for f in all_files if f.endswith('.py')]
    sorted_files = sorted(py_files, key=len, reverse=True)
    numbered = enumerate(py_files, 1)
    rows = ("{0}. {1}".format(i, f) for i, f in numbered)
    return '\n'.join(rows)

Or perhaps like this:

def pyfiles_by_length(directory):
    return '\n'.join('{0}. {1}'.format(*x) for x in enumerate(reversed(sorted(
        [f for f in os.listdir(directory) if f.endswith('.py')], key=len)), 1))

Or, if you're a mad scientist, you would probably do it like this:

pyfiles_by_length = lambda d: (reduce('{0}\n{1}'.format,
    map(lambda x: '%d. %s' % x, enumerate(reversed(sorted(
        filter(lambda f: f.endswith('.py'), os.listdir(d)), key=len))))))

But there should be one -- and preferably only one -- obvious way to do it.

So which one is it? Well, to redeem the situation, pipetools give you yet another possibility!

pyfiles_by_length = (pipe
    | os.listdir
    | where(X.endswith('.py'))
    | sort_by(len).descending
    | (enumerate, X, 1)
    | foreach("{0}. {1}")
    | '\n'.join)

Why would I do that, you ask? Comparing to the native Python code, it's

  • Easier to read -- minimal extra clutter
  • Easier to understand -- one-way data flow from one step to the next, nothing else to keep track of
  • Easier to change -- want more processing? just add a step to the pipeline
  • Removes some bug opportunities -- did you spot the bug in the first example?

Of course it won't solve all your problems, but a great deal of code can be expressed as a pipeline, giving you the above benefits. Read on to see how it works!

Installation

$ pip install pipetools

Uh, what's that?

Usage

The pipe

The pipe object can be used to pipe functions together to form new functions, and it works like this:

from pipetools import pipe

f = pipe | a | b | c

# is the same as:
def f(x):
    return c(b(a(x)))

A real example, sum of odd numbers from 0 to x:

from functools import partial
from pipetools import pipe

odd_sum = pipe | range | partial(filter, lambda x: x % 2) | sum

odd_sum(10)  # -> 25

Note that the chain up to the sum is lazy.

Automatic partial application in the pipe

As partial application is often useful when piping things together, it is done automatically when the pipe encounters a tuple, so this produces the same result as the previous example:

odd_sum = pipe | range | (filter, lambda x: x % 2) | sum

As of 0.1.9, this is even more powerful, see X-partial.

Built-in tools

Pipetools contain a set of pipe-utils that solve some common tasks. For example there is a shortcut for the filter class from our example, called where():

from pipetools import pipe, where

odd_sum = pipe | range | where(lambda x: x % 2) | sum

Well that might be a bit more readable, but not really a huge improvement, but wait!

If a pipe-util is used as first or second item in the pipe (which happens quite often) the pipe at the beginning can be omitted:

odd_sum = range | where(lambda x: x % 2) | sum

See pipe-utils' documentation.

OK, but what about the ugly lambda?

where(), but also foreach(), sort_by() and other pipe-utils can be quite useful, but require a function as an argument, which can either be a named function -- which is OK if it does something complicated -- but often it's something simple, so it's appropriate to use a lambda. Except Python's lambdas are quite verbose for simple tasks and the code gets cluttered...

X object to the rescue!

from pipetools import where, X

odd_sum = range | where(X % 2) | sum

How 'bout that.

Read more about the X object and it's limitations.

Automatic string formatting

Since it doesn't make sense to compose functions with strings, when a pipe (or a pipe-util) encounters a string, it attempts to use it for (advanced) formatting:

>>> countdown = pipe | (range, 1) | reversed | foreach('{}...') | ' '.join | '{} boom'
>>> countdown(5)
'4... 3... 2... 1... boom'

Feeding the pipe

Sometimes it's useful to create a one-off pipe and immediately run some input through it. And since this is somewhat awkward (and not very readable, especially when the pipe spans multiple lines):

result = (pipe | foo | bar | boo)(some_input)

It can also be done using the > operator:

result = some_input > pipe | foo | bar | boo

Note

Note that the above method of input won't work if the input object defines __gt__ for any object - including the pipe. This can be the case for example with some objects from math libraries such as NumPy. If you experience strange results try falling back to the standard way of passing input into a pipe.

But wait, there is more

Checkout the Maybe pipe, partial application on steroids or automatic data structure creation in the full documentation.

Import, connect and transform data into Excel

xlwings_query Import, connect and transform data into Excel. Description The concept is to apply data transformations to a main query object. When the

George Karakostas 1 Jan 19, 2022
A simple and efficient tool to parallelize Pandas operations on all available CPUs

Pandaral·lel Without parallelization With parallelization Installation $ pip install pandarallel [--upgrade] [--user] Requirements On Windows, Pandara

Manu NALEPA 2.8k Dec 31, 2022
Important dataframe statistics with a single command

quick_eda Receiving dataframe statistics with one command Project description A python package for Data Scientists, Students, ML Engineers and anyone

Sven Eschlbeck 2 Dec 19, 2021
Projeto para realizar o RPA Challenge . Utilizando Python e as bibliotecas Selenium e Pandas.

RPA Challenge in Python Projeto para realizar o RPA Challenge (www.rpachallenge.com), utilizando Python. O objetivo deste desafio é criar um fluxo de

Henrique A. Lourenço 1 Apr 12, 2022
Pyspark Spotify ETL

This is my first Data Engineering project, it extracts data from the user's recently played tracks using Spotify's API, transforms data and then loads it into Postgresql using SQLAlchemy engine. Data

16 Jun 09, 2022
VHub - An API that permits uploading of vulnerability datasets and return of the serialized data

VHub - An API that permits uploading of vulnerability datasets and return of the serialized data

André Rodrigues 2 Feb 14, 2022
Churn prediction with PySpark

It is expected to develop a machine learning model that can predict customers who will leave the company.

3 Aug 13, 2021
Projects that implement various aspects of Data Engineering.

DATAWAREHOUSE ON AWS The purpose of this project is to build a datawarehouse to accomodate data of active user activity for music streaming applicatio

2 Oct 14, 2021
Very useful and necessary functions that simplify working with data

Additional-function-for-pandas Very useful and necessary functions that simplify working with data random_fill_nan(module_name, nan) - Replaces all sp

Alexander Goldian 2 Dec 02, 2021
Pizza Orders Data Pipeline Usecase Solved by SQL, Sqoop, HDFS, Hive, Airflow.

PizzaOrders_DataPipeline There is a Tony who is owning a New Pizza shop. He knew that pizza alone was not going to help him get seed funding to expand

Melwin Varghese P 4 Jun 05, 2022
Show you how to integrate Zeppelin with Airflow

Introduction This repository is to show you how to integrate Zeppelin with Airflow. The philosophy behind the ingtegration is to make the transition f

Jeff Zhang 11 Dec 30, 2022
High Dimensional Portfolio Selection with Cardinality Constraints

High-Dimensional Portfolio Selecton with Cardinality Constraints This repo contains code for perform proximal gradient descent to solve sample average

Du Jinhong 2 Mar 22, 2022
Efficient matrix representations for working with tabular data

Efficient matrix representations for working with tabular data

QuantCo 70 Dec 14, 2022
Evaluation of a Monocular Eye Tracking Set-Up

Evaluation of a Monocular Eye Tracking Set-Up As part of my master thesis, I implemented a new state-of-the-art model that is based on the work of Che

Pascal 19 Dec 17, 2022
Sensitivity Analysis Library in Python (Numpy). Contains Sobol, Morris, Fractional Factorial and FAST methods.

Sensitivity Analysis Library (SALib) Python implementations of commonly used sensitivity analysis methods. Useful in systems modeling to calculate the

SALib 663 Jan 05, 2023
A columnar data container that can be compressed.

Unmaintained Package Notice Unfortunately, and due to lack of resources, the Blosc Development Team is unable to maintain this package anymore. During

944 Dec 09, 2022
NFCDS Workshop Beginners Guide Bioinformatics Data Analysis

Genomics Workshop FIXME: overview of workshop Code of Conduct All participants s

Elizabeth Brooks 2 Jun 13, 2022
Analysis scripts for QG equations

qg-edgeofchaos Analysis scripts for QG equations FIle/Folder Structure eigensolvers.py - Spectral and finite-difference solvers for Rossby wave eigenf

Norman Cao 2 Sep 27, 2022
PySpark bindings for H3, a hierarchical hexagonal geospatial indexing system

h3-pyspark: Uber's H3 Hexagonal Hierarchical Geospatial Indexing System in PySpark PySpark bindings for the H3 core library. For available functions,

Kevin Schaich 12 Dec 24, 2022