Convert monolithic Jupyter notebooks into Ploomber pipelines.

Last update: Dec 16, 2022

Overview

Soorgeon

Note: Soorgeon is in alpha; help us make it better.

Install

pip install soorgeon

Usage

# refactor notebook
soorgeon refactor nb.ipynb

# store all variables with the df prefix in CSV files
soorgeon refactor nb.ipynb --df-format csv
# store all variables with the df prefix in Parquet files
soorgeon refactor nb.ipynb --df-format parquet

# store task output in 'some-directory' (if missing, this defaults to 'output')
soorgeon refactor nb.ipynb --product-prefix some-directory

# generate tasks in .py format
soorgeon refactor nb.ipynb --file-format py
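
The options above are shown one at a time. The following is a minimal sketch of a pre-flight wrapper, assuming the flags can be combined in a single invocation (the commands above do not confirm this). The function name refactor_if_ready and its return codes are hypothetical and not part of Soorgeon.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Soorgeon): check the prerequisites,
# then refactor the notebook with several of the options shown above.
refactor_if_ready() {
    local notebook="${1:-nb.ipynb}"

    # Fail early if the notebook does not exist
    if [ ! -f "$notebook" ]; then
        echo "notebook not found: $notebook" >&2
        return 1
    fi

    # Fail early if the soorgeon command is not on PATH
    if ! command -v soorgeon >/dev/null 2>&1; then
        echo "soorgeon is not installed; run: pip install soorgeon" >&2
        return 2
    fi

    # Assumption: --df-format and --file-format may be passed together
    soorgeon refactor "$notebook" --df-format parquet --file-format py
}
```

Calling `refactor_if_ready nb.ipynb` from the notebook's directory runs the refactor only when both checks pass. Otherwise it prints the reason to standard error and returns a non-zero status.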

To learn more, check out our guide.

Examples

git clone https://github.com/ploomber/soorgeon
cd soorgeon

Exploratory data analysis notebook:

cd examples/exploratory
soorgeon refactor nb.ipynb

# to run the pipeline
pip install -r requirements.txt
ploomber build

Machine learning notebook:

cd examples/machine-learning
soorgeon refactor nb.ipynb

# to run the pipeline
pip install -r requirements.txt
ploomber build

Owner

Ploomber
