Lightweight Machine Learning Experiment Logging 📖

Last update: Dec 08, 2022

Overview

A Lightweight Logger for ML Experiments 📖

Simple logging of statistics, model checkpoints, plots and other objects for your Machine Learning Experiments (MLE). Furthermore, the MLELogger supports aggregating results across multiple random seeds and combining runs from multiple configurations. For a quickstart, check out the notebook blog 🚀

The API 🎮

from mle_logging import MLELogger

# Instantiate logging to experiment_dir
log = MLELogger(time_to_track=['num_updates', 'num_epochs'],
                what_to_track=['train_loss', 'test_loss'],
                experiment_dir="experiment_dir/",
                model_type='torch')

time_tic = {'num_updates': 10, 'num_epochs': 1}
stats_tic = {'train_loss': 0.1234, 'test_loss': 0.1235}

# Update the log with collected data & save it to .hdf5
log.update(time_tic, stats_tic)
log.save()
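Each `update` call receives two dicts. One is keyed by the `time_to_track` names and the other by the `what_to_track` names. `save` then persists the accumulated rows. The following is a minimal, dependency-free sketch of that bookkeeping. `ToyLogger` is hypothetical and is not the mle_logging implementation: it writes JSON where MLELogger writes .hdf5.

```python
import json
from typing import Dict, List


class ToyLogger:
    """Hypothetical stand-in for MLELogger's update/save contract (not the library code)."""

    def __init__(self, time_to_track: List[str], what_to_track: List[str]):
        self.time_to_track = time_to_track
        self.what_to_track = what_to_track
        # One list of values per tracked variable, appended on every update
        self.time: Dict[str, list] = {k: [] for k in time_to_track}
        self.stats: Dict[str, list] = {k: [] for k in what_to_track}

    def update(self, time_tic: Dict[str, float], stats_tic: Dict[str, float]) -> None:
        # Reject ticks whose keys do not match what was declared at construction
        if set(time_tic) != set(self.time_to_track):
            raise KeyError(f"time keys {sorted(time_tic)} != {sorted(self.time_to_track)}")
        if set(stats_tic) != set(self.what_to_track):
            raise KeyError(f"stats keys {sorted(stats_tic)} != {sorted(self.what_to_track)}")
        for k, v in time_tic.items():
            self.time[k].append(v)
        for k, v in stats_tic.items():
            # Store as float so every statistic column has one consistent dtype
            self.stats[k].append(float(v))

    def save(self, path: str) -> None:
        # JSON instead of .hdf5 keeps the sketch dependency-free
        with open(path, "w") as f:
            json.dump({"time": self.time, "stats": self.stats}, f)


log = ToyLogger(['num_updates', 'num_epochs'], ['train_loss', 'test_loss'])
log.update({'num_updates': 10, 'num_epochs': 1}, {'train_loss': 0.1234, 'test_loss': 0.1235})
log.update({'num_updates': 20, 'num_epochs': 2}, {'train_loss': 0.1001, 'test_loss': 0.1100})
print(log.stats['train_loss'])  # [0.1234, 0.1001]
```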

You can also log model checkpoints, matplotlib figures and other .pkl compatible objects.

# Save a model (torch, tensorflow, sklearn, jax, numpy)
import torchvision.models as models
model = models.resnet18()
log.save_model(model)

# Save a matplotlib figure as .png
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
log.save_plot(fig)

# You can also save (somewhat) arbitrary objects as .pkl
some_dict = {"hi" : "there"}
log.save_extra(some_dict)

Or do everything in a single line...

log.update(time_tic, stats_tic, model, fig, some_dict, save=True)
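Saving "extra" objects as .pkl boils down to a pickle round trip. Below is a rough stdlib sketch of that round trip, using the `{"hi": "there"}` dict from above. The helper names and the `extra_1.pkl` file name are hypothetical and do not reflect mle_logging's actual naming scheme.

```python
import pickle
import tempfile
from pathlib import Path


def save_extra_pkl(obj, extra_dir: Path, tag: str) -> Path:
    """Hypothetical helper: pickle an object into an extra/ directory."""
    extra_dir.mkdir(parents=True, exist_ok=True)
    path = extra_dir / f"{tag}.pkl"
    with open(path, "wb") as f:
        pickle.dump(obj, f)
    return path


def load_extra_pkl(path: Path):
    """Hypothetical helper: read a pickled object back from disk."""
    with open(path, "rb") as f:
        return pickle.load(f)


with tempfile.TemporaryDirectory() as tmp:
    p = save_extra_pkl({"hi": "there"}, Path(tmp) / "extra", "extra_1")
    restored = load_extra_pkl(p)

print(p.name, restored)  # extra_1.pkl {'hi': 'there'}
```

Objects that `pickle` cannot serialize (lambdas, open file handles, many framework-internal objects) raise an error. That is presumably why the README says "(somewhat) arbitrary".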

File Structure & Re-Loading 📚

The MLELogger will create a nested directory, which looks as follows:

experiment_dir
├── extra: Stores saved .pkl object files
├── figures: Stores saved .png figures
├── logs: Stores .hdf5 log files (meta, stats, time)
├── models: Stores different model checkpoints
│   ├── final: Stores most recent checkpoint
│   ├── every_k: Stores every k-th checkpoint provided in update
│   └── top_k: Stores portfolio of top-k checkpoints based on performance
├── tboards: Stores tensorboards for model checkpointing
└── .json: Copy of configuration file (if provided)
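The directory part of the layout above can be reproduced with `pathlib`, for example to pre-create or inspect an experiment directory. This sketch assumes exactly the sub-directories listed in the tree and skips the .json config copy. `make_experiment_dirs` is a hypothetical helper, not part of mle_logging.

```python
import tempfile
from pathlib import Path

# Sub-directories taken from the tree above; the helper itself is hypothetical
SUBDIRS = [
    "extra",
    "figures",
    "logs",
    "models/final",
    "models/every_k",
    "models/top_k",
    "tboards",
]


def make_experiment_dirs(root: Path) -> list:
    """Create the nested layout and return all directories relative to root."""
    for sub in SUBDIRS:
        (root / sub).mkdir(parents=True, exist_ok=True)
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_dir())


with tempfile.TemporaryDirectory() as tmp:
    created = make_experiment_dirs(Path(tmp) / "experiment_dir")

print(created)
```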

For visualization and post-processing, load the results via:

from mle_logging import load_log
log_out = load_log("experiment_dir/")

# The results can be accessed via meta, stats and time keys
# >>> log_out.meta.keys()
# odict_keys(['experiment_dir', 'extra_storage_paths', 'fig_storage_paths', 'log_paths', 'model_ckpt', 'model_type'])
# >>> log_out.stats.keys()
# odict_keys(['test_loss', 'train_loss'])
# >>> log_out.time.keys()
# odict_keys(['time', 'num_epochs', 'num_updates', 'time_elapsed'])
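The `odict_keys(...)` output suggests that `meta`, `stats` and `time` are ordered dictionaries that also allow attribute access, as in `meta_log.config_1.stats.test_loss` further down. A hypothetical `DotDict` sketch of that access pattern follows, with made-up values. The library's actual container class may differ.

```python
from collections import OrderedDict


class DotDict(OrderedDict):
    """Hypothetical: an ordered dict whose keys can also be read as attributes."""

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails
        try:
            value = self[name]
        except KeyError as err:
            raise AttributeError(name) from err
        # Wrap nested plain dicts so chains like log_out.stats.train_loss work
        if isinstance(value, dict) and not isinstance(value, DotDict):
            return DotDict(value)
        return value


log_out = DotDict(
    meta={"experiment_dir": "experiment_dir/", "model_type": "torch"},
    stats={"train_loss": [0.1234], "test_loss": [0.1235]},
    time={"num_updates": [10], "num_epochs": [1]},
)
print(list(log_out.stats.keys()))  # ['train_loss', 'test_loss']
print(log_out.time.num_updates)    # [10]
```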

If an experiment was aborted, you can reload and continue the previous run via the reload=True option:

log = MLELogger(time_to_track=['num_updates', 'num_epochs'],
                what_to_track=['train_loss', 'test_loss'],
                experiment_dir="experiment_dir/",
                model_type='torch',
                reload=True)

Installation ⏳

A PyPI installation is available via:

pip install mle-logging

Alternatively, you can clone this repository and afterwards 'manually' install it:

git clone https://github.com/RobertTLange/mle-logging.git
cd mle-logging
pip install -e .

Advanced Options 🚴

Merging Multiple Logs 👫

Merging Multiple Random Seeds 🌱 + 🌱

from mle_logging import merge_seed_logs, load_log
merge_seed_logs("multi_seed.hdf", "experiment_dir/")
log_out = load_log("experiment_dir/")
# >>> log_out.eval_ids
# ['seed_1', 'seed_2']
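Conceptually, merging seeds nests each run's statistics under a `seed_<id>` key, and `eval_ids` lists those keys. Below is a pure-Python sketch under that assumption, using made-up loss values. `merge_seed_runs` is hypothetical and does not read or write .hdf5 files.

```python
from typing import Dict, List


def merge_seed_runs(runs: Dict[int, Dict[str, List[float]]]) -> Dict[str, Dict[str, List[float]]]:
    """Hypothetical: nest each seed's statistics under a 'seed_<id>' key."""
    # All seeds must track the same statistics to be comparable
    tracked = {tuple(sorted(stats)) for stats in runs.values()}
    if len(tracked) != 1:
        raise ValueError("all seeds must track the same statistics")
    merged = {}
    for seed_id in sorted(runs):
        merged[f"seed_{seed_id}"] = runs[seed_id]
    return merged


runs = {
    2: {"train_loss": [0.30, 0.20], "test_loss": [0.35, 0.25]},
    1: {"train_loss": [0.32, 0.21], "test_loss": [0.36, 0.27]},
}
merged = merge_seed_runs(runs)
eval_ids = list(merged.keys())
print(eval_ids)  # ['seed_1', 'seed_2']
```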

Merging Multiple Configurations 🔖 + 🔖

from mle_logging import merge_config_logs, load_meta_log
merge_config_logs(experiment_dir="experiment_dir/",
                  all_run_ids=["config_1", "config_2"])
meta_log = load_meta_log("multi_config_dir/meta_log.hdf5")
# >>> meta_log.eval_ids
# ['config_2', 'config_1']
# >>> meta_log.config_1.stats.test_loss.keys()
# odict_keys(['mean', 'std', 'p50', 'p10', 'p25', 'p75', 'p90'])
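The seven keys listed above (`mean`, `std`, `p50`, `p10`, `p25`, `p75`, `p90`) suggest per-time-step summary statistics across seeds. Below is a stdlib sketch of how such a summary could be computed from made-up curves. The function name, the population standard deviation and the "inclusive" percentile method are assumptions, not taken from mle_logging.

```python
import statistics
from typing import Dict, List


def aggregate_over_seeds(curves: List[List[float]]) -> Dict[str, List[float]]:
    """Summarize one metric across seeds at every logged time step.

    `curves` holds one equally long list of values per seed.
    """
    summary = {k: [] for k in ["mean", "std", "p50", "p10", "p25", "p75", "p90"]}
    for step_values in zip(*curves):
        summary["mean"].append(statistics.mean(step_values))
        # Population std; the library may use a different convention
        summary["std"].append(statistics.pstdev(step_values))
        # quantiles(n=20) returns the 19 cut points at 5%, 10%, ..., 95%
        q = statistics.quantiles(step_values, n=20, method="inclusive")
        summary["p10"].append(q[1])
        summary["p25"].append(q[4])
        summary["p50"].append(q[9])
        summary["p75"].append(q[14])
        summary["p90"].append(q[17])
    return summary


# Made-up test_loss curves: 5 seeds x 2 logged time steps
test_loss_per_seed = [[0.40, 0.30], [0.42, 0.31], [0.38, 0.29], [0.41, 0.33], [0.39, 0.27]]
summary = aggregate_over_seeds(test_loss_per_seed)
print(summary["p50"])
```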

Plotting of Logs 🧑‍🎨

from mle_logging import load_meta_log
meta_log = load_meta_log("multi_config_dir/meta_log.hdf5")
meta_log.plot("train_loss", "num_updates")

Storing Checkpoint Portfolios 📂

Logging every k-th checkpoint update ❗ ⏩ ... ⏩ ❗

# Save every second checkpoint provided in log.update (stored in models/every_k)
log = MLELogger(time_to_track=['num_updates', 'num_epochs'],
                what_to_track=['train_loss', 'test_loss'],
                experiment_dir='every_k_dir/',
                model_type='torch',
                ckpt_time_to_track='num_updates',
                save_every_k_ckpt=2)
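The every-k policy can be sketched in a few lines. This sketch assumes the counter runs over the update calls that carry a model and keeps the k-th, 2k-th, ... one. Both that convention and the `every_k/ckpt_<num_updates>.pt` file names are hypothetical and not confirmed for mle_logging.

```python
from typing import List


def every_k_checkpoints(ckpt_times: List[int], k: int) -> List[str]:
    """Return the checkpoint file names a keep-every-k-th policy would store.

    `ckpt_times` holds the tracked time value (here num_updates) of each
    update call that provided a model, in call order.
    """
    kept = []
    for count, num_updates in enumerate(ckpt_times, start=1):
        if count % k == 0:
            # Name files by the tracked time variable (cf. ckpt_time_to_track)
            kept.append(f"every_k/ckpt_{num_updates}.pt")
    return kept


print(every_k_checkpoints([100, 200, 300, 400, 500], k=2))
# ['every_k/ckpt_200.pt', 'every_k/ckpt_400.pt']
```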

Logging top-k checkpoints based on metric 🔱

# Save top-3 checkpoints provided in log.update (stored in models/top_k)
# Based on minimizing the test_loss metric
log = MLELogger(time_to_track=['num_updates', 'num_epochs'],
                what_to_track=['train_loss', 'test_loss'],
                experiment_dir="top_k_dir/",
                model_type='torch',
                ckpt_time_to_track='num_updates',
                save_top_k_ckpt=3,
                top_k_metric_name="test_loss",
                top_k_minimize_metric=True)
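Maintaining a top-k portfolio is a classic bounded-heap problem. A min-heap holds the k best scores, so the worst kept checkpoint sits at the root and can be evicted in O(log k). Below is a sketch of that bookkeeping with hypothetical names and made-up losses. It is not mle_logging's implementation and does not write or delete any files.

```python
import heapq
from typing import List, Tuple


class TopKPortfolio:
    """Hypothetical top-k checkpoint bookkeeping (not mle_logging's implementation).

    Keeps the k checkpoints with the best metric; with minimize=True lower is better.
    """

    def __init__(self, k: int, minimize: bool = True):
        self.k = k
        # Flip the sign so that a higher score always means a better checkpoint
        self.sign = -1.0 if minimize else 1.0
        # Min-heap of (score, step, name); the root is the worst kept checkpoint
        self._heap: List[Tuple[float, int, str]] = []

    def offer(self, metric: float, step: int, name: str) -> bool:
        """Return True if the checkpoint enters the portfolio."""
        score = self.sign * metric
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, (score, step, name))
            return True
        if score > self._heap[0][0]:
            # Evict the current worst; a real logger would also delete its file
            heapq.heapreplace(self._heap, (score, step, name))
            return True
        return False

    def ranked(self) -> List[str]:
        """Checkpoint names from best to worst."""
        return [name for _, _, name in sorted(self._heap, reverse=True)]


portfolio = TopKPortfolio(k=3, minimize=True)
for num_updates, test_loss in [(100, 0.50), (200, 0.40), (300, 0.45), (400, 0.30), (500, 0.60)]:
    portfolio.offer(test_loss, num_updates, f"top_k/ckpt_{num_updates}.pt")
print(portfolio.ranked())
# ['top_k/ckpt_400.pt', 'top_k/ckpt_200.pt', 'top_k/ckpt_300.pt']
```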

Development & Milestones for Next Release

You can run the test suite via python -m pytest -vv tests/. If you find a bug or are missing your favourite feature, feel free to contact me @RobertTLange or create an issue 🤗. Here are some features I want to implement for the next release:

Add a progress bar if total number of updates is specified
Add Weights and Biases Backend Support
Extend Tensorboard logging (for JAX/TF models)

Comments

Make `pickle5` requirement Python version dependent

The pickle5 dependency forces Python < 3.8. If I understand it correctly, pickle5 is only there to backport pickle features that were added in Python 3.8, right? I modified the dependency to only apply for Python < 3.8. With this I was able to install mle-logging in my Python 3.9 environment.

I also modified the only place where pickle5 was used. I didn't test anything; I was hoping this PR would trigger some tests to make sure I didn't break anything (I didn't want to install all those test dependencies locally :P).
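A change like this usually has two parts. The first is a PEP 508 environment marker, so pip only installs the backport on old interpreters. The second is a version-gated import. Below is a sketch of that common pattern. The PR's actual diff is not shown here, so its exact shape is an assumption.

```python
import sys

# Requirement line with a PEP 508 environment marker (setup.py / requirements.txt):
#     pickle5; python_version < "3.8"
if sys.version_info >= (3, 8):
    import pickle  # protocol 5 is built in from Python 3.8 on
else:
    import pickle5 as pickle  # backport, only installed on older interpreters

# Round-trip an object with the highest protocol the active module supports
payload = pickle.dumps({"hi": "there"}, protocol=pickle.HIGHEST_PROTOCOL)
restored = pickle.loads(payload)
print(pickle.HIGHEST_PROTOCOL >= 5)  # True
```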

opened by denisalevi

Missing sample json config files break colab demo

Hello!

I just read your blog post and am ~50% of the way through the colab demo, and I have to say that so far this project looks like it has the potential to be profoundly clarifying in how it simplifies and abstracts key pieces of experiment logic that otherwise suffer from unnecessary complexity. As a PhD student who has had to refactor my whole experimental configuration workflow more times than I would like to admit even to myself, I'm super excited to try out your logger!

I'd also like to commend you on how to-the-point your explanatory examples in the blog post were. Too many frameworks fill their docs with overly simplistic toy problems and fail to bridge the gap between these and a real experimental situation (e.g. the elegant layout of your multi-seed, multi-config experiment).

That said, my progress through your demo was interrupted once I reached the section "Log Different Random Seeds for Same Configuration". This code cell references a file called "config_1.json", which doesn't exist. While I'm sure I could figure out a simple JSON file with 1-2 example items, this kind of guesswork distracts from the otherwise very elegant flow from simple to complex that you've set up. I also assume your target audience extends beyond experienced coders, so providing a simple demo config file to shorten the time from reading to coding seems worthwhile.

tl;dr: the colab needs 1-2 demo config JSON files
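For readers who hit the same problem, here is a sketch that writes two small demo config files. Only the `{"train_config": {"lrate": ...}}` nesting is taken from the `config_dict` example in the v0.0.3 release notes. The file contents are otherwise hypothetical, and the colab may expect additional keys.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical demo configs; only the {"train_config": {"lrate": ...}} nesting
# mirrors the config_dict example from the v0.0.3 release notes
DEMO_CONFIGS = {
    "config_1.json": {"train_config": {"lrate": 0.01}},
    "config_2.json": {"train_config": {"lrate": 0.001}},
}


def write_demo_configs(target_dir: Path) -> list:
    """Write each demo config as pretty-printed JSON and return the paths."""
    target_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for fname, cfg in DEMO_CONFIGS.items():
        path = target_dir / fname
        path.write_text(json.dumps(cfg, indent=2))
        written.append(path)
    return written


with tempfile.TemporaryDirectory() as tmp:
    paths = write_demo_configs(Path(tmp))
    loaded = [json.loads(p.read_text()) for p in paths]

print([p.name for p in paths])  # ['config_1.json', 'config_2.json']
```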

opened by JacobARose

Add `wandb` support

I want to add a Weights & Biases backend that performs automatic grouping across seeds/search experiments. The credentials can be passed as options when initializing MLELogger, and a WandbLogger object has to be added.

Calling log.update will then automatically forward all info to W&B, grouped by project/search/config/seed.

Think about how to integrate gradients/weights from flax/jax models in a natural way (tree flattening?).
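One way to approach the tree-flattening idea is to turn a nested parameter dict into flat "a/b/c" names that map naturally onto per-metric names in a dashboard. Below is a pure-Python sketch. The `Dense_0`-style layer names and values are illustrative, and in a JAX codebase one would likely use `jax.tree_util` instead of this hand-written recursion.

```python
from typing import Any, Dict


def flatten_tree(tree: Any, prefix: str = "", sep: str = "/") -> Dict[str, Any]:
    """Flatten nested dicts (e.g. a Flax-style params tree) into 'a/b/c' keys.

    Pure-Python stand-in for jax.tree_util-based flattening; leaves are kept as-is.
    """
    if not isinstance(tree, dict):
        return {prefix: tree}
    flat = {}
    for key, subtree in tree.items():
        name = f"{prefix}{sep}{key}" if prefix else str(key)
        flat.update(flatten_tree(subtree, name, sep))
    return flat


params = {"params": {"Dense_0": {"kernel": [[0.1, 0.2]], "bias": [0.0]},
                     "Dense_1": {"bias": [0.5]}}}
print(flatten_tree(params))
# {'params/Dense_0/kernel': [[0.1, 0.2]], 'params/Dense_0/bias': [0.0], 'params/Dense_1/bias': [0.5]}
```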

opened by RobertTLange

Merge `experiment_dir` for different seeds into a single one

I would like to have utilities for merging two experiments which are identical except for the seed_id they used (probably only for the multiple-configs case). Steps should include something like this:

Check that experiments are actually identical.

Identify different seeds.

Create new results directory.

Copy over extra/, figures/ for different seeds.

Open both logs (for all configs) and combine them.

Clean-up old directories for different experiments.
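Step 1 (checking that two experiments differ only in their seed) can be sketched as a recursive diff over nested config dicts. The function names and the placement of the `seed_id` key are assumptions for illustration. Note that configs with no differences at all also pass this check.

```python
from typing import Any, Dict, List


def differing_keys(cfg_a: Dict[str, Any], cfg_b: Dict[str, Any], prefix: str = "") -> List[str]:
    """Return the dotted keys whose values differ between two nested configs.

    A key missing on one side is compared as None.
    """
    diffs = []
    for key in sorted(set(cfg_a) | set(cfg_b)):
        name = f"{prefix}.{key}" if prefix else key
        a, b = cfg_a.get(key), cfg_b.get(key)
        if isinstance(a, dict) and isinstance(b, dict):
            diffs.extend(differing_keys(a, b, name))
        elif a != b:
            diffs.append(name)
    return diffs


def identical_except_seed(cfg_a: Dict[str, Any], cfg_b: Dict[str, Any], seed_key: str = "seed_id") -> bool:
    """True if every difference between the configs is a seed entry."""
    return all(d == seed_key or d.endswith("." + seed_key)
               for d in differing_keys(cfg_a, cfg_b))


run_a = {"seed_id": 1, "train_config": {"lrate": 0.01}}
run_b = {"seed_id": 2, "train_config": {"lrate": 0.01}}
run_c = {"seed_id": 3, "train_config": {"lrate": 0.1}}
print(identical_except_seed(run_a, run_b))  # True
print(identical_except_seed(run_a, run_c))  # False
```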
opened by RobertTLange

[Bug] "OSError: Can't write data" if `what_to_track` has certain Types

Code to recreate:

from mle_logging import MLELogger

# Instantiate logging to experiment_dir
log = MLELogger(time_to_track=['num_updates', 'num_epochs'],
                what_to_track=['train_loss', 'test_loss'],
                experiment_dir="experiment_dir/",
                config_dict={"train_config": {"lrate": 0.01}},
                use_tboard=False,
                model_type='torch',
                print_every_k_updates=1,
                verbose=True)

# Save some time series statistics
time_tic = {'num_updates': 10, 'num_epochs': 1}
stats_tic = {'train_loss': 1, 'test_loss': 1}

# Update the log with collected data & save it to .hdf5
log.update(time_tic, stats_tic)
log.save()

Output from the console:

Traceback (most recent call last):
  File "mle-log-test.py", line 19, in <module>
    log.save()
  File "/home/luc/.local/lib/python3.8/site-packages/mle_logging/mle_logger.py", line 417, in save
    write_to_hdf5(
  File "/home/luc/.local/lib/python3.8/site-packages/mle_logging/utils.py", line 74, in write_to_hdf5
    h5f.create_dataset(
  File "/home/luc/.local/lib/python3.8/site-packages/h5py/_hl/group.py", line 149, in create_dataset
    dsid = dataset.make_new_dset(group, shape, dtype, data, name, **kwds)
  File "/home/luc/.local/lib/python3.8/site-packages/h5py/_hl/dataset.py", line 143, in make_new_dset
    dset_id.write(h5s.ALL, h5s.ALL, data)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5d.pyx", line 232, in h5py.h5d.DatasetID.write
  File "h5py/_proxy.pyx", line 114, in h5py._proxy.dset_rw
OSError: Can't write data (no appropriate function for conversion path)

The above code is essentially the Getting Started code with the what_to_track Float values swapped out for Ints. If only 1 of the Floats is swapped for an Int, it still works (I guess it casts the Int to a Float?). I also found the same issue if the what_to_track values are Floats from a DeviceArray.

Please let me know if you have any suggestions or questions!
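Until the dtype handling is fixed, a plausible workaround is to hand the logger only plain Python floats, e.g. `log.update(time_tic, to_float_stats(stats_tic))`. This is an assumption, not a fix confirmed by the maintainer. `to_float_stats` is a hypothetical helper.

```python
from typing import Any, Dict


def to_float_stats(stats_tic: Dict[str, Any]) -> Dict[str, float]:
    """Cast every tracked statistic to a plain Python float before log.update.

    float() also accepts NumPy/JAX scalars and 0-d arrays, which should cover
    the DeviceArray case from the report (assumption: values are scalars).
    """
    return {name: float(value) for name, value in stats_tic.items()}


stats_tic = {'train_loss': 1, 'test_loss': 1}
print(to_float_stats(stats_tic))  # {'train_loss': 1.0, 'test_loss': 1.0}
```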

opened by DiamonDiva

Releases (v0.0.4)

v0.0.4 (Dec 7, 2021)
[x] Add plot details (title, labels) to meta_log.plot()

[x] Get rid of time string in sub directories

[x] Make log merging more robust

[x] Small fixes for mle-monitor release

[x] Fix overwrite and make verbose warning

v0.0.3 (Sep 11, 2021)
🎉 Mini-release getting rid of small bugs and adding functionality (🐛 & 📈):

Add function to store initial model checkpoint for post-processing via log.save_init_model(model).

Fix byte decoding for strings stored as arrays in .hdf5 log file. Previously this only worked for multi seed/config settings.

MLELogger got a new optional argument: config_dict, which allows you to provide a (nested) configuration of your experiment. It will be stored as a .yaml file if you don't provide a path to an alternative configuration file. The file can either be a .json or a .yaml:

log = MLELogger(time_to_track=['num_updates', 'num_epochs'],
                what_to_track=['train_loss', 'test_loss'],
                experiment_dir="experiment_dir/",
                config_dict={"train_config": {"lrate": 0.01}},
                model_type='torch',
                verbose=True)

The config_dict (or the data loaded from config_fname) will be stored in the meta data of the loaded log and can easily be retrieved:

from mle_logging import load_log
log = load_log("experiment_dir/")
log.meta.config_dict

v0.0.2 (Aug 23, 2021)

v0.0.1 (Aug 18, 2021)

First release of mle-logging utilities.

Owner

Robert Lange

Deep Something @ TU Berlin 🕵️


MLflow App Using React, Hooks, RabbitMQ, FastAPI Server, Celery, Microservices

BigDL: Distributed Deep Learning Framework for Apache Spark

ml4ir: Machine Learning for Information Retrieval

Warren - Stock Price Predictor

Python ML pipeline that showcases mltrace functionality.

Simple Machine Learning Tool Kit

Backprop makes it simple to use, finetune, and deploy state-of-the-art ML models.

An AutoML survey focusing on practical systems.

BioPy is a collection (in-progress) of biologically-inspired algorithms written in Python

A simple machine learning python sign language detection project.

Files for the online course on statistics for data science and machine learning.

Course files for "Ocean/Atmosphere Time Series Analysis"

Module for statistical learning, with a particular emphasis on time-dependent modelling

Management of exclusive GPU access for distributed machine learning workloads

This repository contains the code to predict house price using Linear Regression Method

🚪✊Knock Knock: Get notified when your training ends with only two additional lines of code

A Powerful Serverless Analysis Toolkit That Takes Trial And Error Out of Machine Learning Projects

Python Automated Machine Learning library for tabular data.

Continuously evaluated, functional, incremental, time-series forecasting

Create large-scale ML-driven multiscale simulation ensembles to study the interactions