Torchrecipes provides a set of reproducible, reusable, ready-to-run recipes for training different types of models, across multiple domains, on PyTorch Lightning.

Last update: Dec 28, 2022

Overview

torchrecipes

This library is currently under heavy development. If you have suggestions on the API or use cases you'd like to see covered, please open a GitHub issue or reach out. We'd love to hear about how you're using torchrecipes.

torchrecipes is a prototype built on top of PyTorch that provides a set of reproducible, reusable, ready-to-run recipes for training different types of models, across multiple domains, on PyTorch Lightning.

It aims to provide "applications" built on top of PyTorch that perform well and are easy to reproduce. Because this project builds on the PyTorch ecosystem and requires significant investment, we'd love to hear from and work with early adopters to shape the design. Please reach out on the issue tracker if you're interested in using this for your project.

Why `torchrecipes`?

The primary goal of torchrecipes is to make ML development 10x faster by providing standard blueprints for training production-ready ML models across environments (from local development to cluster deployment).

Requirements

PyTorch Recipes (torchrecipes):

python3 (3.8+)
torch
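
As a quick sanity check before running anything, the two requirements above can be verified from Python. This is a minimal sketch using only the standard library; the helper name missing_requirements is hypothetical and is not part of torchrecipes.

```python
import importlib.util
import sys


def missing_requirements(min_python=(3, 8), modules=("torch",)):
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    if sys.version_info < min_python:
        wanted = ".".join(str(part) for part in min_python)
        problems.append(f"Python {wanted}+ is required")
    for name in modules:
        # find_spec returns None when a top-level module cannot be located.
        if importlib.util.find_spec(name) is None:
            problems.append(f"module '{name}' is not installed")
    return problems


if __name__ == "__main__":
    for problem in missing_requirements():
        print(problem)
```

The check only looks for the module on the import path; it does not import torch or validate its version.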

Running

The easiest way to run torchrecipes is to use torchx. You can install it directly (if not already included as part of our requirements.txt) with:

pip install torchx

Then go to torchrecipes/launcher/ and create a file torchx_app.py:

# 'torchrecipes/launcher/torchx_app.py'

import torchx.specs as specs

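# Arguments for the Python entrypoint: run the top-level run.py as a module
# and select the image classification config via Hydra's CLI flags.
image_classification_args = [
    "-m", "run",
    "--config-name",
    "train_app",
    "--config-path",
    "torchrecipes/vision/image_classification/conf",
]

def torchx_app(image: str = "run.py:latest", *job_args: str) -> specs.AppDef:
    """Define a single-role torchx app; extra job_args are appended to the args."""
    return specs.AppDef(
        name="run",
        roles=[
            specs.Role(
                name="run",
                image=image,
                entrypoint="python",
                args=[*image_classification_args, *job_args],
                env={
                    "CONFIG_MODULE": "torchrecipes.vision.image_classification.conf",
                    "MODE": "prod",
                    "HYDRA_FULL_ERROR": "1",  # ask Hydra for full stack traces
                },
            )
        ],
    )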

This app defines the entrypoint, args and image for launching.
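
To make the role definition more concrete, the sketch below shows how the entrypoint, the fixed recipe arguments, and any extra job arguments combine into one command line. It uses only the standard library and does not call torchx; build_command is a hypothetical helper, and how a given scheduler actually executes the role may differ.

```python
import shlex

# Same values as in torchx_app.py.
image_classification_args = [
    "-m", "run",
    "--config-name",
    "train_app",
    "--config-path",
    "torchrecipes/vision/image_classification/conf",
]


def build_command(entrypoint, base_args, job_args):
    """Join entrypoint, fixed args, and extra job args into one shell-style string."""
    return shlex.join([entrypoint, *base_args, *job_args])


command = build_command("python", image_classification_args, ["trainer.fast_dev_run=True"])
print(command)
```

Anything passed after the app name on the torchx command line ends up in job_args, so it is appended after the fixed recipe arguments.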

Now that we have created a torchx app, we are (almost) ready for launching a job!

First, create a symlink to torchrecipes/launcher/run.py at the top level of the repo:

ln -s torchrecipes/launcher/run.py ./run.py

Then we are ready to go! Simply launch the image_classification recipe with the following command:

torchx run --scheduler local_cwd torchrecipes/launcher/torchx_app.py:torchx_app trainer.fast_dev_run=True trainer.checkpoint_callback=False +tb_save_dir=/tmp/
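
The trailing arguments in this command are Hydra overrides: a plain key=value changes a value that already exists in the config, while a leading + adds a key that is not in the config yet. The sketch below groups the three overrides accordingly. It is a simplified illustration, not Hydra's own parser; split_overrides is a hypothetical helper that handles only these two forms and keeps values as strings.

```python
def split_overrides(overrides):
    """Group Hydra-style overrides into changed and added keys (simplified)."""
    changed, added = {}, {}
    for item in overrides:
        key, _, value = item.partition("=")
        if key.startswith("+"):
            # "+key=value" adds a key that the config does not define yet.
            added[key[1:]] = value
        else:
            # "key=value" overrides a key the config already defines.
            changed[key] = value
    return changed, added


changed, added = split_overrides([
    "trainer.fast_dev_run=True",
    "trainer.checkpoint_callback=False",
    "+tb_save_dir=/tmp/",
])
print("changed:", changed)
print("added:", added)
```

Here the two trainer.* settings modify existing trainer options, and tb_save_dir is introduced as a new key.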

Release

# install torchrecipes
pip install torchrecipes

Contributing

We welcome PRs! See the CONTRIBUTING file.

License

torchrecipes is BSD licensed, as found in the LICENSE file.

Owner

Meta Research
