On Anytime Learning At Macroscale

Learning from sequential data dumps

(key) Requirements

Python 3.7
PyTorch 1.9.0
Hydra 1.1.0 (pip install hydra-core and pip install hydra-submitit-launcher)

Structure

├── crlapi           
  ├── benchmark.py    # Creates the data stream, feeds it to the model and evaluates it
  ├── core.py         # Abstract classes for 
  ├── logger.py   
  ├── sl
    ├── architectures
      ├── ...         # NN architectures used in this project
    ├── clmodels
      ├── ...         # Models (e.g. Single, gEns, ...)
    ├── streams
      ├── ...         # CIFAR and MNIST stream implementations

Running Experiments

To run an experiment, call the dataset-specific run file and pass it the configuration for the run. The configurations are placed in the parent directory (../configs), which is structured as follows:

    ├── configs
        ├── mnist
           ├── run.py                 # run file
           ├── test_usage_gmoe.yaml   # This is the "gMoE" model
           ├── test_finetune_mlp.yaml # This is the "Single Model"
           ... 
        ├── cifar
           ├── run.py                 # run file
           ├── test_finetune_vgg.yaml # This is the "Single Model"
           ├── test_usage_gmoe.yaml   # This is the "gMoE" model
           ...

To launch, for example, an MNIST gMoE run, use the following command from the directory just above this one (so cd .. first):

PYTHONPATH=./ python configs/mnist/run.py -cn test_usage_gmoe n_megabatches=2 replay=1 clmodel.max_epochs=200
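When launching many runs, it can be convenient to assemble this command from Python. The following is a minimal sketch, not part of the repository: `build_command` and `launch` are hypothetical helper names, and the only things taken from this README are the run-file path, the `-cn` flag, the `PYTHONPATH=./` setting and the `key=value` override syntax.

```python
import os
import subprocess
import sys


def build_command(dataset, config_name, overrides):
    """Assemble the argument list for configs/<dataset>/run.py.

    `overrides` maps config keys (e.g. "clmodel.max_epochs") to values.
    Hypothetical helper, not part of the repository.
    """
    cmd = [sys.executable, f"configs/{dataset}/run.py", "-cn", config_name]
    cmd += [f"{key}={value}" for key, value in overrides.items()]
    return cmd


def launch(cmd, repo_root="."):
    """Run the command from `repo_root` with PYTHONPATH=./ set."""
    env = dict(os.environ, PYTHONPATH="./")
    return subprocess.run(cmd, cwd=repo_root, env=env, check=True)


cmd = build_command(
    "mnist",
    "test_usage_gmoe",
    {"n_megabatches": 2, "replay": 1, "clmodel.max_epochs": 200},
)
# Print everything after the interpreter path; this matches the command above.
print(" ".join(cmd[1:]))
```

Calling `launch(cmd)` would then start the run; it is left out of the sketch so that the snippet can be executed without the repository present.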

Important arguments

n_megabatches : number of megabatches in the data stream. n_megabatches=1 corresponds to regular training on the full dataset
replay : whether to use replay
clmodel.init_from_scratch : whether to reinitialize the model at every megabatch (MB). Should only be used when replay=1
device : cuda or cpu, depending on your hardware
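To make the interaction between n_megabatches and replay concrete, here is a small illustrative sketch in plain Python. It is not the repository's implementation: the function names are hypothetical, the equal-sized contiguous split is an assumption, and "replay" is assumed here to mean training on all data received so far rather than on the newest megabatch only.

```python
def split_into_megabatches(samples, n_megabatches):
    """Split `samples` into `n_megabatches` contiguous, near-equal chunks.

    Assumption: the stream is cut into equal parts in order.
    """
    size, extra = divmod(len(samples), n_megabatches)
    chunks, start = [], 0
    for i in range(n_megabatches):
        end = start + size + (1 if i < extra else 0)
        chunks.append(samples[start:end])
        start = end
    return chunks


def training_sets(samples, n_megabatches, replay):
    """Yield the data the learner trains on when each megabatch arrives."""
    seen = []
    for megabatch in split_into_megabatches(samples, n_megabatches):
        seen.extend(megabatch)
        # Assumed semantics: with replay, train on everything received so far;
        # without it, train on the newest megabatch only.
        yield list(seen) if replay else list(megabatch)


data = list(range(6))
print(list(training_sets(data, n_megabatches=3, replay=0)))
# [[0, 1], [2, 3], [4, 5]]
print(list(training_sets(data, n_megabatches=3, replay=1)))
# [[0, 1], [0, 1, 2, 3], [0, 1, 2, 3, 4, 5]]
print(list(training_sets(data, n_megabatches=1, replay=0)))
# [[0, 1, 2, 3, 4, 5]]
```

Under these assumptions the restriction on clmodel.init_from_scratch is easy to see: without replay, a model reinitialized at every megabatch would only ever be trained on the latest chunk and would lose everything learned from the earlier ones.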

License

alma is released under the MIT license. See LICENSE for additional details about it. See also our Terms of Use and Privacy Policy.

Owner

Meta Research
