Data pipelines built with polars

Last update: Jan 03, 2023

Overview

valves

Warning: the project is very much a work in progress.

Valves is a collection of functions for your data .pipe()-lines.

This project aims to host a few performant implementations of functions that are common in industry. This gives us an opportunity to share sensible implementations, and it also lets us compare performance across libraries. For now the project mainly targets polars, pandas and dask.
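
To make the `.pipe()` idea concrete, here is a minimal sketch using pandas' `DataFrame.pipe`. The step functions `drop_missing` and `add_total` and the sample columns are hypothetical and are not part of the Valves API; they only illustrate the chaining style that such functions are meant to fit into.

```python
import pandas as pd


def drop_missing(df: pd.DataFrame, column: str) -> pd.DataFrame:
    """Hypothetical step: drop rows where `column` is null."""
    return df.dropna(subset=[column])


def add_total(df: pd.DataFrame, price: str, qty: str) -> pd.DataFrame:
    """Hypothetical step: add a `total` column without mutating the input."""
    return df.assign(total=df[price] * df[qty])


# Small made-up frame with one missing price.
raw = pd.DataFrame({"price": [2.0, None, 5.0], "qty": [3, 1, 2]})

# Each step takes a DataFrame and returns a new one, so steps chain cleanly.
clean = (
    raw
    .pipe(drop_missing, column="price")
    .pipe(add_total, price="price", qty="qty")
)
```

Because every step returns a new frame, `raw` is left untouched, and the steps can be reordered or reused independently. polars DataFrames also offer a `.pipe()` method, so the same style should carry over there.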

Flexible HDF5 saving/loading and other data science tools from the University of Chicago

deepdish Flexible HDF5 saving/loading and other data science tools from the University of Chicago. This repository also hosts a Deep Learning blog: htt

255 Dec 10, 2022

A stock analysis app with streamlit

StockAnalysisApp A stock analysis app with streamlit. You select the ticker of the stock and the app runs a series of analyses by using the price cha

50 Nov 27, 2022

Advanced Pandas Vault — Utilities, Functions and Snippets (by @firmai).

PandasVault ⁠— Advanced Pandas Functions and Code Snippets The only Pandas utility package you would ever need. It has no exotic external dependencies

374 Jan 07, 2023

WAL enables programmable waveform analysis.

This repo introduces the Waveform Analysis Language (WAL). The initial paper on WAL will appear at ASPDAC'22 and can be downloaded here: https://www.

40 Dec 13, 2022

Cleaning and analysing aggregated UK political polling data.

Analysing aggregated UK polling data. The tweet collection & storage pipeline used in email-service is also used to collect tweets from @britainelects.

0 Dec 22, 2021

Office365 (Microsoft365) audit log analysis tool

Office365 (Microsoft365) audit log analysis tool. The header describes it all. WHY?? The first line of code was written a long time before other colleague

1 Jul 27, 2022

peptides.py is a pure-Python package to compute common descriptors for protein sequences

peptides.py Physicochemical properties and indices for amino-acid sequences. 🗺️ Overview peptides.py is a pure-Python package to compute common descr

32 Dec 31, 2022

Flood modeling by 2D shallow water equation

hydraulicmodel Flood modeling by 2D shallow water equation. Refer to Hunter et al (2005), Bates et al. (2010). Diffusive wave approximation Local iner

6 Nov 30, 2022

Datashredder is a simple data corruption engine written in Python. You can corrupt anything: text, images and video.

Datashredder is a simple data corruption engine written in Python. You can corrupt anything: text, images and video. You can choose the cha

2 Jul 22, 2022

Visions provides an extensible suite of tools to support common data analysis operations

Visions And these visions of data types, they kept us up past the dawn. Visions provides an extensible suite of tools to support common data analysis

168 Dec 28, 2022

Finds, downloads, parses, and standardizes public bikeshare data into a standard pandas dataframe format

Finds, downloads, parses, and standardizes public bikeshare data into a standard pandas dataframe format.

2 Dec 01, 2021

CleanX is an open source python library for exploring, cleaning and augmenting large datasets of X-rays, or certain other types of radiological images.

cleanX CleanX is an open source python library for exploring, cleaning and augmenting large datasets of X-rays, or certain other types of radiological

20 Jan 05, 2023

A Python package for the mathematical modeling of infectious diseases via compartmental models

Top 50 best selling books on amazon

Important dataframe statistics with a single command

Python script to automate the plotting and analysis of percentage depth dose and dose profile simulations in TOPAS.

PandaPy has the speed of NumPy and the usability of Pandas 10x to 50x faster (by @firmai)

Randomisation-based inference in Python based on data resampling and permutation.

This is an analysis and prediction project for house prices in King County, USA based on certain features of the house

Python data processing, analysis, visualization, and data operations