Lightweight Python library for fast and reproducible experimentation :microscope:

Last update: Jul 10, 2022

Overview

Steppy

What is Steppy?

Steppy is a lightweight, open-source Python 3 library for fast and reproducible experimentation.
Steppy lets data scientists focus on data science, not on software development issues.
Steppy's minimal interface does not impose constraints, yet it enables clean machine learning pipeline design.

What problem does Steppy solve?

Problems

In the course of a project, a data scientist faces two problems:

Difficulties with reproducibility in data science / machine learning projects.
Lack of the ability to prepare or extend experiments quickly.

Solution

Steppy addresses both problems by introducing two simple abstractions: Step and Transformer. We consider them a minimal interface for building machine learning pipelines.

Step is a wrapper over the transformer that handles multiple aspects of pipeline execution, such as saving intermediate results (if needed), checkpointing the model during training, and much more.
Transformer, in turn, is a purely computational, data scientist-defined piece that takes input data and produces output data. Typical Transformers are neural networks, machine learning algorithms, and pre- or post-processing routines.
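
The split between the two abstractions can be illustrated with a minimal sketch. It does not use steppy itself: `MeanCenterer` and `SketchStep` are hypothetical stand-ins written for this example, and the on-disk layout (`outputs/<name>`) is an assumption, not steppy's actual behaviour. The transformer only computes; the wrapper decides whether to reuse a saved intermediate result.

```python
import pickle
import tempfile
from pathlib import Path


class MeanCenterer:
    """Hypothetical transformer: pure computation, no I/O or bookkeeping."""

    def fit(self, X):
        self.mean_ = sum(X) / len(X)
        return self

    def transform(self, X):
        # The output is returned as a dict keyed by output name.
        return {"X": [x - self.mean_ for x in X]}


class SketchStep:
    """Simplified stand-in for a step: wraps a transformer and saves its output."""

    def __init__(self, name, transformer, experiment_directory):
        self.name = name
        self.transformer = transformer
        # Assumed layout for this sketch only.
        self.output_path = Path(experiment_directory) / "outputs" / name

    def fit_transform(self, data):
        if self.output_path.exists():
            # Reuse the saved intermediate result instead of recomputing it.
            with self.output_path.open("rb") as f:
                return pickle.load(f)
        output = self.transformer.fit(**data).transform(**data)
        self.output_path.parent.mkdir(parents=True, exist_ok=True)
        with self.output_path.open("wb") as f:
            pickle.dump(output, f)
        return output


experiment_dir = tempfile.mkdtemp()
step = SketchStep("center", MeanCenterer(), experiment_dir)
result = step.fit_transform({"X": [1.0, 2.0, 3.0]})
```

Because the transformer knows nothing about files or caching, it can be swapped for another one without touching the execution logic.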

Start using steppy

Installation

Steppy requires Python 3.5 or above.

pip3 install steppy

(You probably want to install it in your virtualenv.)

Resources

📒 Documentation
💻 Source
📛 Bug reports
🚀 Feature requests
🌟 Tutorial notebooks (their repository):
- ▶️ Getting started
- ▶️ Steps with multiple inputs
- ▶️ Advanced adapters
- ▶️ Caching and persistence
- ▶️ Steppy with Keras

Feature Requests

Please send us your ideas on how to improve the steppy library! We are looking for your comments here: Feature requests.

Roadmap

⏩ At this point, steppy is an early-stage library heavily tested on multiple machine learning challenges (data-science-bowl, toxic-comment-classification-challenge, mapping-challenge) and educational projects (minerva-advanced-data-scientific-training).

⏩ We are developing steppy into a practical tool for data scientists, who can run their experiments easily and change their pipelines with just a few manipulations in the code.

Related projects

We are also building steppy-toolkit, a collection of high-quality implementations of the top deep learning architectures, all of them with the same intuitive interface.

Contributing

You are welcome to contribute to the Steppy library. Please check CONTRIBUTING for more information.

Terms of use

Steppy is MIT-licensed.

Comments

Concat features

How is it possible to do the following Step in the new version (using pandas_concat_inputs)?

                                    transformer=GroupbyAggregationsFeatures(AGGREGATION_RECIPIES),
                                    input_steps=[df_step],
                                    input_data=['input'],
                                    adapter=Adapter({
                                        'X': ([('input', 'X'),
                                               (df_step.name, 'X')],
                                              pandas_concat_inputs)
                                    }),
                                    cache_dirpath=config.env.cache_dirpath)

opened by denyslazarenko 8

Docs3
Pull Request template

Doc contributions

Contributing.html FAQ.html intro.html testdoc.html

tested by running in docs/

>>> (Steppy) sphinx-apidoc -o generated/ -d 4 -fMa ../steppy
>>> (Steppy) clear;make clean;make html

Regards Bruce

core contributors to the minerva.ml
opened by bcottman 6
How to evaluate each step only once?

I have the following structure of my steps. The problem is that many steps are called more than once, which makes training very slow. Is it possible to simplify it somehow? More precisely, how can I optimize this part? I would like to compute input_missing just once.

opened by denyslazarenko 4
Difference between cache and persist

I do not really get the difference between these two things. Both of them cache the result of execution on disk. Is it a good idea to add cache_output to all the Steps to avoid executing anything twice? In some of your examples you use both cache and persist at the same time; I think it would be better to use just one of them...

opened by denyslazarenko 2
ENH: Adds id to support output caching
Fixes https://github.com/neptune-ml/steppy/issues/39

This PR adds an optional id field to the data dictionary. When cache_output is set to True, the id field is appended to step.name to distinguish between output caches produced by different data dictionaries.

For example:

data_train = {
    'id': 'data_train',
    'input': {
        'features': np.array([
            [1, 6],
            [2, 5],
            [3, 4]
        ]),
        'labels': np.array([2, 5, 3]),
    }
}

step = Step(
    name='test_cache_output_with_key',
    transformer=IdentityOperation(),
    input_data=['input'],
    experiment_directory='/exp_dir',
    cache_output=True
)
step.fit_transform(data_train)

This will produce an output cache file at /exp_dir/cache/test_cache_output_with_key__data_train.
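
The naming rule described in this PR can be sketched as follows. This is not the code from the PR: `cache_filepath` is a hypothetical helper, and the fallback to the bare step name when no id is given is an assumption. Only the `cache` subdirectory and the double-underscore separator are taken from the example path above.

```python
from pathlib import PurePosixPath


def cache_filepath(experiment_directory, step_name, data):
    """Build the output cache path, appending data['id'] to the step name if present."""
    data_id = data.get("id")
    # Assumption: without an id, the step name alone is used as the file name.
    filename = step_name if data_id is None else f"{step_name}__{data_id}"
    return PurePosixPath(experiment_directory) / "cache" / filename


step_name = "test_cache_output_with_key"
train_path = cache_filepath("/exp_dir", step_name, {"id": "data_train", "input": {}})
valid_path = cache_filepath("/exp_dir", step_name, {"id": "data_valid", "input": {}})
```

With distinct ids, two data dictionaries passed through the same step no longer overwrite each other's cached output.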
opened by thomasjpfan 2
Simplified adapter syntax

This is my idea for simplifying adapter syntax. The benefit is that importing the extractor E from the adapter module is no longer needed. On the other hand, the rules for deciding if something is an atomic recipe or part of a larger recipe or even a constant get more complicated.
feature-request API-design

opened by mromaniukcdl 2
refactor adapter.py
Problem: currently the user must write from steppy.adapter import Adapter, E in order to use adapters.

Refactor so that:

User does not have to import E

add Example to docstrings

Refactor is comprehensive, so that:

correct the code

correct tests

correct docstrings

feature-request API-design
opened by kamil-kaczmarek 2
PyTorch model is never saved as checkpoint after first epoch

Look here: https://github.com/minerva-ml/gradus/blob/dev/steps/pytorch/callbacks.py#L266 If self.epoch_id is equal to 0, then loss_sum is equal to self.best_score and model is not saved. I think it should be fixed, because sometimes we want to have model after first epoch saved.
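
The behaviour described here can be reproduced with a small sketch. It is reconstructed only from the description in this issue and is not the code from callbacks.py; the function names and the loss values are made up for illustration. The first function initialises the best score from the first epoch and then uses a strict comparison, so epoch 0 can never be saved; the second starts from infinity instead.

```python
def epochs_saved_buggy(losses):
    """Return the epochs at which a checkpoint would be written (issue behaviour)."""
    saved, best = [], None
    for epoch_id, loss in enumerate(losses):
        if epoch_id == 0:
            best = loss
        if loss < best:  # never true at epoch 0, because loss == best
            best = loss
            saved.append(epoch_id)
    return saved


def epochs_saved_fixed(losses):
    """Same loop, but the best score starts at infinity so epoch 0 is saved."""
    saved, best = [], float("inf")
    for epoch_id, loss in enumerate(losses):
        if loss < best:
            best = loss
            saved.append(epoch_id)
    return saved


losses = [0.9, 0.7, 0.8, 0.6]  # illustrative per-epoch losses
buggy = epochs_saved_buggy(losses)
fixed = epochs_saved_fixed(losses)
```

If the loss never improves after the first epoch, the first variant writes no checkpoint at all, which is the case the issue is most concerned about.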
bug feature-request

opened by apyskir 2
Unintuitive adapter syntax
Current syntax for adapters has some peculiarities. Consider the following example.

step = Step(
    name='ensembler',
    transformer=Dummy(),
    input_data=['input_1'],
    adapter={'X': [('input_1', 'features')]},
    cache_dirpath='.cache'
)

This step basically extracts one element of the input. It seems redundant to write brackets and parentheses. Doing adapter={'X': ('input_1', 'features')}, should be sufficient.

Moreover, to my surprise, adapter={'X': [('input_1', 'features'), ('input_2', 'extra_features')]} is incorrect and currently leads to ValueError: too many values to unpack (expected 2).

My suggestions to make the syntax consistent are:

adapter={'X': ('input_1', 'features')} should map X to extracted features.

adapter={'X': [...]} should map X to a list of extracted objects (specified by elements of the list). In particular adapter={'X': [('input_1', 'features')]} should map X to a one-element list with extracted features.

adapter={'X': ([...], func)} should extract appropriate objects and put them on the list, then func should be called on that list, and X should map to the result of that call.
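
The three proposed rules can be sketched as a small interpreter. This is only an illustration of the suggestion, not steppy's adapter implementation; `adapt` and the sample inputs are hypothetical.

```python
def adapt(adapter, all_inputs):
    """Apply the three proposed adapter rules to a dict of step inputs."""

    def extract(recipe):
        source, key = recipe
        return all_inputs[source][key]

    adapted = {}
    for name, recipe in adapter.items():
        if isinstance(recipe, list):
            # Rule 2: a list of recipes maps to a list of extracted objects.
            adapted[name] = [extract(r) for r in recipe]
        elif isinstance(recipe, tuple) and isinstance(recipe[0], list):
            # Rule 3: ([...], func) extracts the objects, then calls func on the list.
            recipes, func = recipe
            adapted[name] = func([extract(r) for r in recipes])
        else:
            # Rule 1: a bare (source, key) pair maps to the extracted object.
            adapted[name] = extract(recipe)
    return adapted


inputs = {
    "input_1": {"features": [1, 2]},
    "input_2": {"extra_features": [3]},
}
pairs = [("input_1", "features"), ("input_2", "extra_features")]

single = adapt({"X": ("input_1", "features")}, inputs)
as_list = adapt({"X": pairs}, inputs)
combined = adapt({"X": (pairs, lambda parts: sum(parts, []))}, inputs)
```

Under these rules the two-element list that currently raises ValueError is simply treated as a list of extractions.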

API-design
opened by grzes314 2
2nd version docs for steppy

Pull Request template

Doc contributions

This represents 0.01, where we/you were at 0.0? As you should be able to see, I was able to use 95% of what was there previously. Redid index.rst, redid conf.py, added directory docs.nbdocs.

Needs more work, about days' worth, before pushing out to Read the Docs.

I found the docstrings very strong.

I (not very strongly) suggest that steppy-toolkit and steppy-examples be merged into one project.

I see you use Google-style docstrings. I will switch from numpy-style.

Regards Bruce

opened by bcottman 1
FAQ DOC

Started. I intend on the first pass to fill it with my (naive/embarrassing) discoveries and really good (i.e. incredibly stupid) questions and enlightening answers from gaggle.

opened by bcottman 1
Let's make it possible to transform based on checkpoints

Hi! Let's assume I'm training a huge network for a lot of epochs and it saves checkpoints in the checkpoints folder. I suggest making it possible to run transform on a pipeline when the transformer is not in experiment_dir/transformers but a checkpoint is available in the checkpoints folder. What do you think?

opened by apyskir 0
Structure of steps - ideas for making it cleaner
@kamil-kaczmarek, @jakubczakon I know it is a bunch of different ideas and suggestions clustered in one issue. Let me know which of those are compatible with the current roadmap. (I am happy to contribute/collaborate on some.)

default data folder (e.g. ./.steppy/step_name/) or to be configurable if needed; overriding only when strictly necessary

no input_data; it complicates things for no obvious reason!

names optional, automatically generated from class names + number

more explicit job structure (steps = Sequence([step1, step2])); vide Keras API

adapters as inheriting from BaseTrainers, step = Rename({'a': 'aaa', 'b': 'bbb'}), vide rename in Pandas

how to separate persist-data vs persist-parameters? (e.g. for image preprocessing, it may be time-saving to save once processed images)

built-in data tests (e.g. len(X) == len(Y)), in def test

built-in test if persist->load is correct (i.e. loaded data is the same as saved)
opened by stared 2
Do all Steps execute parallel?

Is it necessary to split the executions inside my class into separate threads, or just divide them between Steps? For example, I can fit KNN and PCA in one class method and parallelize them, or create two separate classes for them...

opened by denyslazarenko 2
Maybe load_saved_input?

Hi, I have a proposal: let's make it possible to dump the adapted input of a step to disk. It's very handy when you are working on the 5th or 10th step in a pipeline that has 2, 3 or more input steps. Now you have to set the flag load_saved_output=True on each of the input steps to be able to work on your beloved step. If you could just set load_saved_input=True (adapted or not adapted, I think it's worth discussing) on the step you are currently working on, it would be much easier. What do you think?

opened by apyskir 0

Releases(v0.1.16)

v0.1.16(Nov 23, 2018)
check that output from transformer is dict

check that input to step.fit_transform() and step.transform() is dict (None is Ok though).

v0.1.15(Oct 18, 2018)

v0.1.14(Oct 10, 2018)

bug was reported here: https://github.com/neptune-ml/steppy-examples/issues/9
v0.1.13(Oct 10, 2018)

v0.1.12(Oct 5, 2018)

v0.1.11(Oct 5, 2018)

v0.1.10(Sep 27, 2018)

v0.1.9(Sep 20, 2018)

v0.1.8(Sep 20, 2018)


v0.1.7(Sep 18, 2018)

removed the cache dir; the cache is now stored in memory as self.output
name is not obligatory; by default, transformer.__class__.__name__ is used
by default, the suffix '_0' (or 1, 2, 3, ...) is appended to the name
now you can clean_cache or clean_cache_upstream_steps
added name validation
new defaults:
    is_fittable=True
    force_fitting=True
    cache_output=False
    persist_output=True
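
The default naming described in these notes could look roughly like the sketch below. `default_step_name` is a hypothetical helper, not steppy's function, and picking the first unused suffix is an assumption; the notes only say that the transformer's class name is used and that a numeric suffix starting at '_0' is added.

```python
def default_step_name(transformer, taken_names):
    """Derive a step name from the transformer's class name plus a numeric suffix."""
    base = transformer.__class__.__name__
    suffix = 0
    # Assumption: take the first suffix that is not already in use.
    while f"{base}_{suffix}" in taken_names:
        suffix += 1
    return f"{base}_{suffix}"


class Scaler:
    """Placeholder transformer used only to supply a class name."""


taken = set()
first = default_step_name(Scaler(), taken)
taken.add(first)
second = default_step_name(Scaler(), taken)
```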


v0.1.6(Jul 23, 2018)

v0.1.5(Jun 27, 2018)

v0.1.4(Jun 18, 2018)

v0.1.3(Jun 15, 2018)

Non-technical release that introduces proper readme.md on github and index.rst on readthedocs
v0.1.2(Jun 7, 2018)
In this release we are breaking backward compatibility.

v0.1.1(Jun 2, 2018)
release notes

pip3 install steppy

read the docs up and running

basic classes and functions documented

API after first review

end-to-end tutorial on steppy-examples

see also

https://github.com/minerva-ml/steppy/milestone/1

Owner

minerva.ml

