Accelerated deep learning R&D

Last update: Jan 06, 2023

Overview

PyTorch framework for Deep Learning research and development. It focuses on reproducibility, rapid experimentation, and codebase reuse so you can create something new rather than write another regular train loop.
Break the cycle - use the Catalyst!

Project manifest. Part of PyTorch Ecosystem. Part of Catalyst Ecosystem:

Alchemy - experiments logging & visualization
Catalyst - accelerated deep learning R&D
Reaction - convenient deep learning models serving

Catalyst at AI Landscape.

Getting started

pip install -U catalyst

import os
import torch
from torch.nn import functional as F
from torch.utils.data import DataLoader
from catalyst import dl, metrics
from catalyst.contrib.data.cv import ToTensor
from catalyst.contrib.datasets import MNIST

model = torch.nn.Linear(28 * 28, 10)
optimizer = torch.optim.Adam(model.parameters(), lr=0.02)

loaders = {
    "train": DataLoader(MNIST(os.getcwd(), train=True, download=True, transform=ToTensor()), batch_size=32),
    "valid": DataLoader(MNIST(os.getcwd(), train=False, download=True, transform=ToTensor()), batch_size=32),
}

class CustomRunner(dl.Runner):

    def predict_batch(self, batch):
        # model inference step
        return self.model(batch[0].to(self.device).view(batch[0].size(0), -1))

    def _handle_batch(self, batch):
        # model train/valid step
        x, y = batch
        y_hat = self.model(x.view(x.size(0), -1))

        loss = F.cross_entropy(y_hat, y)
        accuracy01, accuracy03 = metrics.accuracy(y_hat, y, topk=(1, 3))
        self.batch_metrics.update(
            {"loss": loss, "accuracy01": accuracy01, "accuracy03": accuracy03}
        )

        if self.is_train_loader:
            loss.backward()
            self.optimizer.step()
            self.optimizer.zero_grad()

runner = CustomRunner()
# model training
runner.train(
    model=model,
    optimizer=optimizer,
    loaders=loaders,
    logdir="./logs",
    num_epochs=5,
    verbose=True,
    load_best_on_end=True,
)
# model inference
for prediction in runner.predict_loader(loader=loaders["valid"]):
    assert prediction.detach().cpu().numpy().shape[-1] == 10
# model tracing
traced_model = runner.trace(loader=loaders["valid"])

Step by step guide

Start with Catalyst 101 — Accelerated PyTorch introduction.
Check minimal examples.
Try notebook tutorials with Google Colab.
Read blogposts with use-cases and guides.
Learn machine learning with our "Deep Learning with Catalyst" course.
If you would like to contribute to the project, follow our contribution guidelines.
If you want to support the project, feel free to donate on our Patreon page or write to us with your proposals.
And do not forget to join our Slack for collaboration.

Overview
Catalyst
- Tutorials
- Blogposts
- Docs
- Projects
- Talks
Community

Overview

Catalyst helps you write compact but full-featured Deep Learning pipelines in a few lines of code. You get a training loop with metrics, early-stopping, model checkpointing and other features without the boilerplate.
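
To make the "without the boilerplate" point concrete, here is a minimal, framework-free sketch of the bookkeeping that early stopping and best-checkpoint tracking usually need when written by hand. The `EarlyStopper` class, its parameters, and the loss values are hypothetical and only for illustration; they are not part of the Catalyst API.

```python
class EarlyStopper:
    """Tracks the best validation metric and signals when to stop (hypothetical helper)."""

    def __init__(self, patience=3, minimize=True):
        self.patience = patience
        self.minimize = minimize
        self.best_value = None
        self.best_epoch = None
        self.bad_epochs = 0

    def update(self, epoch, value):
        """Record a metric value; return True if training should stop."""
        if self.best_value is None:
            improved = True
        elif self.minimize:
            improved = value < self.best_value
        else:
            improved = value > self.best_value

        if improved:
            self.best_value, self.best_epoch = value, epoch
            self.bad_epochs = 0  # a hand-written loop would save a checkpoint here
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# made-up validation losses, for illustration only
valid_losses = [0.90, 0.70, 0.65, 0.66, 0.68, 0.67, 0.64]
stopper = EarlyStopper(patience=3, minimize=True)
stopped_at = None
for epoch, loss in enumerate(valid_losses):
    if stopper.update(epoch, loss):
        stopped_at = epoch
        break
```

With these values the loop stops after three epochs without improvement, and the best epoch is the one a checkpoint would be restored from. This is the kind of logic the training loop takes off your hands.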

Installation

Common installation:

pip install -U catalyst

Specific versions with additional requirements

pip install catalyst[ml]         # installs ML-based Catalyst
pip install catalyst[cv]         # installs CV-based Catalyst
pip install catalyst[nlp]        # installs NLP-based Catalyst
pip install catalyst[tune]       # installs Catalyst+Optuna
pip install catalyst[ecosystem]  # installs Catalyst.Ecosystem
# master version installation
pip install git+https://github.com/catalyst-team/catalyst@master --upgrade

Catalyst is compatible with: Python 3.6+. PyTorch 1.1+.
Tested on Ubuntu 16.04/18.04/20.04, macOS 10.15, Windows 10 and Windows Subsystem for Linux.
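
If you want to check an environment against these minimums before installing, a small sketch like the following may help. The helper names `version_tuple` and `meets_minimum` are hypothetical; only `sys.version_info` and `torch.__version__` are real attributes.

```python
import re
import sys


def version_tuple(version):
    """Turn a version string such as '1.7.1+cu110' into (1, 7, 1)."""
    public_part = version.split("+")[0]  # drop a local suffix like '+cu110'
    numbers = re.findall(r"\d+", public_part)
    return tuple(int(n) for n in numbers[:3])


def meets_minimum(version, minimum):
    """True if the version string is at least the given minimum tuple."""
    return version_tuple(version) >= tuple(minimum)


python_ok = sys.version_info[:2] >= (3, 6)

try:
    import torch

    torch_ok = meets_minimum(torch.__version__, (1, 1))
except ImportError:
    torch_ok = False  # PyTorch is not installed in this environment
```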

Minimal Examples

ML - linear regression

import torch
from torch.utils.data import DataLoader, TensorDataset
from catalyst.dl import SupervisedRunner

# data
num_samples, num_features = int(1e4), int(1e1)
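# targets are shaped (num_samples, 1) to match the model output and avoid broadcasting in MSELoss
X, y = torch.rand(num_samples, num_features), torch.rand(num_samples, 1)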
dataset = TensorDataset(X, y)
loader = DataLoader(dataset, batch_size=32, num_workers=1)
loaders = {"train": loader, "valid": loader}

# model, criterion, optimizer, scheduler
model = torch.nn.Linear(num_features, 1)
criterion = torch.nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters())
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, [3, 6])

# model training
runner = SupervisedRunner()
runner.train(
    model=model,
    criterion=criterion,
    optimizer=optimizer,
    scheduler=scheduler,
    loaders=loaders,
    logdir="./logdir",
    num_epochs=8,
    verbose=True,
)

ML - multiclass classification

import torch
from torch.utils.data import DataLoader, TensorDataset
from catalyst import dl

# sample data
num_samples, num_features, num_classes = int(1e4), int(1e1), 4
X = torch.rand(num_samples, num_features)
y = (torch.rand(num_samples, ) * num_classes).to(torch.int64)

# pytorch loaders
dataset = TensorDataset(X, y)
loader = DataLoader(dataset, batch_size=32, num_workers=1)
loaders = {"train": loader, "valid": loader}

# model, criterion, optimizer, scheduler
model = torch.nn.Linear(num_features, num_classes)
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters())
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, [2])

# model training
runner = dl.SupervisedRunner()
runner.train(
    model=model,
    criterion=criterion,
    optimizer=optimizer,
    scheduler=scheduler,
    loaders=loaders,
    logdir="./logdir",
    num_epochs=3,
    callbacks=[dl.AccuracyCallback(num_classes=num_classes)]
)

ML - multilabel classification

import torch
from torch.utils.data import DataLoader, TensorDataset
from catalyst import dl

# sample data
num_samples, num_features, num_classes = int(1e4), int(1e1), 4
X = torch.rand(num_samples, num_features)
y = (torch.rand(num_samples, num_classes) > 0.5).to(torch.float32)

# pytorch loaders
dataset = TensorDataset(X, y)
loader = DataLoader(dataset, batch_size=32, num_workers=1)
loaders = {"train": loader, "valid": loader}

# model, criterion, optimizer, scheduler
model = torch.nn.Linear(num_features, num_classes)
criterion = torch.nn.BCEWithLogitsLoss()
optimizer = torch.optim.Adam(model.parameters())
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, [2])

# model training
runner = dl.SupervisedRunner()
runner.train(
    model=model,
    criterion=criterion,
    optimizer=optimizer,
    scheduler=scheduler,
    loaders=loaders,
    logdir="./logdir",
    num_epochs=3,
    callbacks=[dl.MultiLabelAccuracyCallback(threshold=0.5)]
)

CV - MNIST classification

import os
import torch
from torch.nn import functional as F
from torch.utils.data import DataLoader
from catalyst import dl, metrics
from catalyst.contrib.data.cv import ToTensor
from catalyst.contrib.datasets import MNIST

model = torch.nn.Linear(28 * 28, 10)
optimizer = torch.optim.Adam(model.parameters(), lr=0.02)

loaders = {
    "train": DataLoader(MNIST(os.getcwd(), train=True, download=True, transform=ToTensor()), batch_size=32),
    "valid": DataLoader(MNIST(os.getcwd(), train=False, download=True, transform=ToTensor()), batch_size=32),
}

class CustomRunner(dl.Runner):

    def _handle_batch(self, batch):
        x, y = batch
        y_hat = self.model(x.view(x.size(0), -1))

        loss = F.cross_entropy(y_hat, y)
        accuracy01, accuracy03, accuracy05 = metrics.accuracy(y_hat, y, topk=(1, 3, 5))
        self.batch_metrics = {
            "loss": loss,
            "accuracy01": accuracy01,
            "accuracy03": accuracy03,
            "accuracy05": accuracy05,
        }
        
        if self.is_train_loader:
            loss.backward()
            self.optimizer.step()
            self.optimizer.zero_grad()

runner = CustomRunner()
runner.train(
    model=model, 
    optimizer=optimizer, 
    loaders=loaders, 
    verbose=True,
)
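
The `topk=(1, 3, 5)` argument in the runner above asks for top-k accuracy: a prediction counts as correct when the true label is among the k highest-scoring classes. Below is a framework-free sketch under that definition. `topk_accuracy` is a hypothetical helper with toy scores, not the `metrics.accuracy` implementation.

```python
def topk_accuracy(scores, targets, k=1):
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    hits = 0
    for row, target in zip(scores, targets):
        # indices of the k largest scores in this row
        top = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        hits += int(target in top)
    return hits / len(targets)


# toy scores for 3 samples and 4 classes (illustrative values)
scores = [
    [0.1, 0.6, 0.2, 0.1],  # target 1 has the highest score: top-1 hit
    [0.5, 0.1, 0.3, 0.1],  # target 2 has the second highest score
    [0.2, 0.3, 0.4, 0.1],  # target 3 has the lowest score
]
targets = [1, 2, 3]

top1 = topk_accuracy(scores, targets, k=1)
top3 = topk_accuracy(scores, targets, k=3)
```

Top-k accuracy can only stay the same or grow as k increases, which is why `accuracy01 <= accuracy03 <= accuracy05` in the logged metrics.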

CV - classification with AutoEncoder

import os
import torch
from torch import nn
from torch.nn import functional as F
from torch.utils.data import DataLoader
from catalyst import dl, metrics
from catalyst.contrib.data.cv import ToTensor
from catalyst.contrib.datasets import MNIST

class ClassifyAE(nn.Module):

    def __init__(self, in_features, hid_features, out_features):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_features, hid_features), nn.Tanh())
        self.decoder = nn.Sequential(nn.Linear(hid_features, in_features), nn.Sigmoid())
        self.clf = nn.Linear(hid_features, out_features)

    def forward(self, x):
        z = self.encoder(x)
        y_hat = self.clf(z)
        x_ = self.decoder(z)
        return y_hat, x_

model = ClassifyAE(28 * 28, 128, 10)
optimizer = torch.optim.Adam(model.parameters(), lr=0.02)

loaders = {
    "train": DataLoader(MNIST(os.getcwd(), train=True, download=True, transform=ToTensor()), batch_size=32),
    "valid": DataLoader(MNIST(os.getcwd(), train=False, download=True, transform=ToTensor()), batch_size=32),
}

class CustomRunner(dl.Runner):

    def _handle_batch(self, batch):
        x, y = batch
        x = x.view(x.size(0), -1)
        y_hat, x_ = self.model(x)

        loss_clf = F.cross_entropy(y_hat, y)
        loss_ae = F.mse_loss(x_, x)
        loss = loss_clf + loss_ae
        accuracy01, accuracy03, accuracy05 = metrics.accuracy(y_hat, y, topk=(1, 3, 5))
        self.batch_metrics = {
            "loss_clf": loss_clf,
            "loss_ae": loss_ae,
            "loss": loss,
            "accuracy01": accuracy01,
            "accuracy03": accuracy03,
            "accuracy05": accuracy05,
        }

        if self.is_train_loader:
            loss.backward()
            self.optimizer.step()
            self.optimizer.zero_grad()

runner = CustomRunner()
runner.train(
    model=model,
    optimizer=optimizer,
    loaders=loaders,
    verbose=True,
)

CV - classification with Variational AutoEncoder

import os
import numpy as np
import torch
from torch import nn
from torch.nn import functional as F
from torch.utils.data import DataLoader
from catalyst import dl, metrics
from catalyst.contrib.data.cv import ToTensor
from catalyst.contrib.datasets import MNIST

LOG_SCALE_MAX = 2
LOG_SCALE_MIN = -10

def normal_sample(loc, log_scale):
    scale = torch.exp(0.5 * log_scale)
    return loc + scale * torch.randn_like(scale)

class ClassifyVAE(torch.nn.Module):

    def __init__(self, in_features, hid_features, out_features):
        super().__init__()
        self.encoder = nn.Linear(in_features, hid_features * 2)
        self.decoder = nn.Sequential(nn.Linear(hid_features, in_features), nn.Sigmoid())
        self.clf = nn.Linear(hid_features, out_features)

    def forward(self, x, deterministic=False):
        z = self.encoder(x)
        bs, z_dim = z.shape

        loc, log_scale = z[:, :z_dim // 2], z[:, z_dim // 2:]
        log_scale = torch.clamp(log_scale, LOG_SCALE_MIN, LOG_SCALE_MAX)

        z_ = loc if deterministic else normal_sample(loc, log_scale)
        z_ = z_.view(bs, -1)
        x_ = self.decoder(z_)

        y_hat = self.clf(z_)

        return y_hat, x_, loc, log_scale

model = ClassifyVAE(28 * 28, 64, 10)
optimizer = torch.optim.Adam(model.parameters(), lr=0.02)

loaders = {
    "train": DataLoader(MNIST(os.getcwd(), train=True, download=True, transform=ToTensor()), batch_size=32),
    "valid": DataLoader(MNIST(os.getcwd(), train=False, download=True, transform=ToTensor()), batch_size=32),
}

class CustomRunner(dl.Runner):

    def _handle_batch(self, batch):
        x, y = batch
        x = x.view(x.size(0), -1)
        y_hat, x_, loc, log_scale = self.model(x, deterministic=not self.is_train_loader)

        loss_clf = F.cross_entropy(y_hat, y)
        loss_ae = F.mse_loss(x_, x)
        loss_kld = (-0.5 * torch.sum(1 + log_scale - loc.pow(2) - log_scale.exp(), dim=1)).mean()
        loss = loss_clf + loss_ae + loss_kld
        accuracy01, accuracy03, accuracy05 = metrics.accuracy(y_hat, y, topk=(1, 3, 5))
        self.batch_metrics = {
            "loss_clf": loss_clf,
            "loss_ae": loss_ae,
            "loss_kld": loss_kld,
            "loss": loss,
            "accuracy01": accuracy01,
            "accuracy03": accuracy03,
            "accuracy05": accuracy05,
        }

        if self.is_train_loader:
            loss.backward()
            self.optimizer.step()
            self.optimizer.zero_grad()

runner = CustomRunner()
runner.train(
    model=model,
    optimizer=optimizer,
    loaders=loaders,
    verbose=True,
)
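
For reference, the `loss_kld` term in the runner above matches the closed-form KL divergence between a diagonal Gaussian and a standard normal prior, under the reading that `log_scale` holds the log-variance. That reading follows from `normal_sample`, which uses `exp(0.5 * log_scale)` as the standard deviation. With $\mu$ standing for `loc`, $s$ for `log_scale`, and $d$ for the latent dimension:

```latex
\sigma_j = \exp\!\left(\tfrac{1}{2} s_j\right), \qquad
D_{\mathrm{KL}}\!\left(\mathcal{N}\!\left(\mu, \operatorname{diag}(\sigma^2)\right) \,\middle\|\, \mathcal{N}(0, I)\right)
= -\frac{1}{2} \sum_{j=1}^{d} \left(1 + s_j - \mu_j^2 - e^{s_j}\right)
```

The code computes this sum per sample (`dim=1`) and then averages over the batch.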

CV - segmentation with classification auxiliary task

import os
import torch
from torch import nn
from torch.nn import functional as F
from torch.utils.data import DataLoader
from catalyst import dl, metrics
from catalyst.contrib.data.cv import ToTensor
from catalyst.contrib.datasets import MNIST

class ClassifyUnet(nn.Module):

    def __init__(self, in_channels, in_hw, out_features):
        super().__init__()
        self.encoder = nn.Sequential(nn.Conv2d(in_channels, in_channels, 3, 1, 1), nn.Tanh())
        self.decoder = nn.Conv2d(in_channels, in_channels, 3, 1, 1)
        self.clf = nn.Linear(in_channels * in_hw * in_hw, out_features)

    def forward(self, x):
        z = self.encoder(x)
        z_ = z.view(z.size(0), -1)
        y_hat = self.clf(z_)
        x_ = self.decoder(z)
        return y_hat, x_

model = ClassifyUnet(1, 28, 10)
optimizer = torch.optim.Adam(model.parameters(), lr=0.02)

loaders = {
    "train": DataLoader(MNIST(os.getcwd(), train=True, download=True, transform=ToTensor()), batch_size=32),
    "valid": DataLoader(MNIST(os.getcwd(), train=False, download=True, transform=ToTensor()), batch_size=32),
}

class CustomRunner(dl.Runner):

    def _handle_batch(self, batch):
        x, y = batch
        x_noise = (x + torch.rand_like(x)).clamp_(0, 1)
        y_hat, x_ = self.model(x_noise)

        loss_clf = F.cross_entropy(y_hat, y)
        iou = metrics.iou(x_, x).mean()
        loss_iou = 1 - iou
        loss = loss_clf + loss_iou
        accuracy01, accuracy03, accuracy05 = metrics.accuracy(y_hat, y, topk=(1, 3, 5))
        self.batch_metrics = {
            "loss_clf": loss_clf,
            "loss_iou": loss_iou,
            "loss": loss,
            "iou": iou,
            "accuracy01": accuracy01,
            "accuracy03": accuracy03,
            "accuracy05": accuracy05,
        }
        
        if self.is_train_loader:
            loss.backward()
            self.optimizer.step()
            self.optimizer.zero_grad()

runner = CustomRunner()
runner.train(
    model=model, 
    optimizer=optimizer, 
    loaders=loaders, 
    verbose=True,
)

CV - MNIST with Metric Learning

from torch.optim import Adam
from torch.utils.data import DataLoader

from catalyst import data, dl, utils
from catalyst.contrib import datasets, models, nn
import catalyst.contrib.data.cv.transforms.torch as t


# 1. train and valid datasets
dataset_root = "."
transforms = t.Compose([t.ToTensor(), t.Normalize((0.1307,), (0.3081,))])

dataset_train = datasets.MnistMLDataset(root=dataset_root, download=True, transform=transforms)
sampler = data.BalanceBatchSampler(labels=dataset_train.get_labels(), p=5, k=10)
train_loader = DataLoader(dataset=dataset_train, sampler=sampler, batch_size=sampler.batch_size)

dataset_val = datasets.MnistQGDataset(root=dataset_root, transform=transforms, gallery_fraq=0.2)
val_loader = DataLoader(dataset=dataset_val, batch_size=1024)

# 2. model and optimizer
model = models.SimpleConv(features_dim=16)
optimizer = Adam(model.parameters(), lr=0.001)

# 3. criterion with triplets sampling
sampler_inbatch = data.HardTripletsSampler(norm_required=False)
criterion = nn.TripletMarginLossWithSampler(margin=0.5, sampler_inbatch=sampler_inbatch)

# 4. training with catalyst Runner
callbacks = [
    dl.ControlFlowCallback(dl.CriterionCallback(), loaders="train"),
    dl.ControlFlowCallback(dl.CMCScoreCallback(topk_args=[1]), loaders="valid"),
    dl.PeriodicLoaderCallback(valid=100),
]

runner = dl.SupervisedRunner(device=utils.get_device())
runner.train(
    model=model,
    criterion=criterion,
    optimizer=optimizer,
    callbacks=callbacks,
    loaders={"train": train_loader, "valid": val_loader},
    minimize_metric=False,
    verbose=True,
    valid_loader="valid",
    num_epochs=200,
    main_metric="cmc01",
)

GAN - MNIST, flatten version

import os
import torch
from torch import nn
from torch.nn import functional as F
from torch.utils.data import DataLoader
from catalyst import dl
from catalyst.contrib.data.cv import ToTensor
from catalyst.contrib.datasets import MNIST
from catalyst.contrib.nn.modules import Flatten, GlobalMaxPool2d, Lambda

latent_dim = 128
generator = nn.Sequential(
    # Project the latent vector to 128 * 7 * 7 coefficients, reshaped below into a 128x7x7 map
    nn.Linear(latent_dim, 128 * 7 * 7),
    nn.LeakyReLU(0.2, inplace=True),
    Lambda(lambda x: x.view(x.size(0), 128, 7, 7)),
    nn.ConvTranspose2d(128, 128, (4, 4), stride=(2, 2), padding=1),
    nn.LeakyReLU(0.2, inplace=True),
    nn.ConvTranspose2d(128, 128, (4, 4), stride=(2, 2), padding=1),
    nn.LeakyReLU(0.2, inplace=True),
    nn.Conv2d(128, 1, (7, 7), padding=3),
    nn.Sigmoid(),
)
discriminator = nn.Sequential(
    nn.Conv2d(1, 64, (3, 3), stride=(2, 2), padding=1),
    nn.LeakyReLU(0.2, inplace=True),
    nn.Conv2d(64, 128, (3, 3), stride=(2, 2), padding=1),
    nn.LeakyReLU(0.2, inplace=True),
    GlobalMaxPool2d(),
    Flatten(),
    nn.Linear(128, 1)
)

model = {"generator": generator, "discriminator": discriminator}
optimizer = {
    "generator": torch.optim.Adam(generator.parameters(), lr=0.0003, betas=(0.5, 0.999)),
    "discriminator": torch.optim.Adam(discriminator.parameters(), lr=0.0003, betas=(0.5, 0.999)),
}
loaders = {
    "train": DataLoader(MNIST(os.getcwd(), train=True, download=True, transform=ToTensor()), batch_size=32),
}

class CustomRunner(dl.Runner):

    def _handle_batch(self, batch):
        real_images, _ = batch
        batch_metrics = {}
        
        # Sample random points in the latent space
        batch_size = real_images.shape[0]
        random_latent_vectors = torch.randn(batch_size, latent_dim).to(self.device)
        
        # Decode them to fake images
        generated_images = self.model["generator"](random_latent_vectors).detach()
        # Combine them with real images
        combined_images = torch.cat([generated_images, real_images])
        
        # Assemble labels discriminating real from fake images (1 = generated, 0 = real)
        labels = torch.cat([
            torch.ones((batch_size, 1)), torch.zeros((batch_size, 1))
        ]).to(self.device)
        # Add random noise to the labels - important trick!
        labels += 0.05 * torch.rand(labels.shape).to(self.device)
        
        # Train the discriminator
        predictions = self.model["discriminator"](combined_images)
        batch_metrics["loss_discriminator"] = \
          F.binary_cross_entropy_with_logits(predictions, labels)
        
        # Sample random points in the latent space
        random_latent_vectors = torch.randn(batch_size, latent_dim).to(self.device)
        # Assemble labels that say "all real images"
        misleading_labels = torch.zeros((batch_size, 1)).to(self.device)
        
        # Train the generator
        generated_images = self.model["generator"](random_latent_vectors)
        predictions = self.model["discriminator"](generated_images)
        batch_metrics["loss_generator"] = \
          F.binary_cross_entropy_with_logits(predictions, misleading_labels)
        
        self.batch_metrics.update(**batch_metrics)

runner = CustomRunner()
runner.train(
    model=model, 
    optimizer=optimizer,
    loaders=loaders,
    callbacks=[
        dl.OptimizerCallback(
            optimizer_key="generator", 
            metric_key="loss_generator"
        ),
        dl.OptimizerCallback(
            optimizer_key="discriminator", 
            metric_key="loss_discriminator"
        ),
    ],
    main_metric="loss_generator",
    num_epochs=20,
    verbose=True,
    logdir="./logs_gan",
)

ML - multiclass classification (fp16 training version)

# pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" git+https://github.com/NVIDIA/apex
import torch
from torch.utils.data import DataLoader, TensorDataset
from catalyst import dl

# sample data
num_samples, num_features, num_classes = int(1e4), int(1e1), 4
X = torch.rand(num_samples, num_features)
y = (torch.rand(num_samples, ) * num_classes).to(torch.int64)

# pytorch loaders
dataset = TensorDataset(X, y)
loader = DataLoader(dataset, batch_size=32, num_workers=1)
loaders = {"train": loader, "valid": loader}

# model, criterion, optimizer, scheduler
model = torch.nn.Linear(num_features, num_classes)
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters())
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, [2])

# model training
runner = dl.SupervisedRunner()
runner.train(
    model=model,
    criterion=criterion,
    optimizer=optimizer,
    scheduler=scheduler,
    loaders=loaders,
    logdir="./logdir",
    num_epochs=3,
    callbacks=[dl.AccuracyCallback(num_classes=num_classes)],
    fp16=True,
)

ML - multiclass classification (advanced fp16 training version)

# pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" git+https://github.com/NVIDIA/apex
import torch
from torch.utils.data import DataLoader, TensorDataset
from catalyst import dl

# sample data
num_samples, num_features, num_classes = int(1e4), int(1e1), 4
X = torch.rand(num_samples, num_features)
y = (torch.rand(num_samples, ) * num_classes).to(torch.int64)

# pytorch loaders
dataset = TensorDataset(X, y)
loader = DataLoader(dataset, batch_size=32, num_workers=1)
loaders = {"train": loader, "valid": loader}

# model, criterion, optimizer, scheduler
model = torch.nn.Linear(num_features, num_classes)
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters())
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, [2])

# model training
runner = dl.SupervisedRunner()
runner.train(
    model=model,
    criterion=criterion,
    optimizer=optimizer,
    scheduler=scheduler,
    loaders=loaders,
    logdir="./logdir",
    num_epochs=3,
    callbacks=[dl.AccuracyCallback(num_classes=num_classes)],
    fp16=dict(apex=True, opt_level="O1"),
)

ML - Linear Regression (distributed training version)

#!/usr/bin/env python
import torch
from torch.utils.data import TensorDataset
from catalyst.dl import SupervisedRunner, utils

def datasets_fn(num_features: int):
    X = torch.rand(int(1e4), num_features)
    y = torch.rand(X.shape[0], 1)  # shape (N, 1) matches the model output
    dataset = TensorDataset(X, y)
    return {"train": dataset, "valid": dataset}

def train():
    num_features = int(1e1)
    # model, criterion, optimizer, scheduler
    model = torch.nn.Linear(num_features, 1)
    criterion = torch.nn.MSELoss()
    optimizer = torch.optim.Adam(model.parameters())
    scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, [3, 6])

    runner = SupervisedRunner()
    runner.train(
        model=model,
        datasets={
            "batch_size": 32,
            "num_workers": 1,
            "get_datasets_fn": datasets_fn,
            "num_features": num_features,  # will be passed to datasets_fn
        },
        criterion=criterion,
        optimizer=optimizer,
        scheduler=scheduler,
        logdir="./logs/example_distributed_ml",
        num_epochs=8,
        verbose=True,
        distributed=False,
    )

utils.distributed_cmd_run(train)

CV - classification with AutoEncoder (distributed training version)

#!/usr/bin/env python
import os
import torch
from torch import nn
from torch.nn import functional as F
from catalyst import dl, metrics, utils
from catalyst.contrib.data.cv import ToTensor
from catalyst.contrib.datasets import MNIST

class ClassifyAE(nn.Module):

    def __init__(self, in_features, hid_features, out_features):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_features, hid_features), nn.Tanh())
        self.decoder = nn.Linear(hid_features, in_features)
        self.clf = nn.Linear(hid_features, out_features)

    def forward(self, x):
        z = self.encoder(x)
        y_hat = self.clf(z)
        x_ = self.decoder(z)
        return y_hat, x_

class CustomRunner(dl.Runner):

    def _handle_batch(self, batch):
        x, y = batch
        x = x.view(x.size(0), -1)
        y_hat, x_ = self.model(x)

        loss_clf = F.cross_entropy(y_hat, y)
        loss_ae = F.mse_loss(x_, x)
        loss = loss_clf + loss_ae
        accuracy01, accuracy03, accuracy05 = metrics.accuracy(y_hat, y, topk=(1, 3, 5))
        self.batch_metrics = {
            "loss_clf": loss_clf,
            "loss_ae": loss_ae,
            "loss": loss,
            "accuracy01": accuracy01,
            "accuracy03": accuracy03,
            "accuracy05": accuracy05,
        }

        if self.is_train_loader:
            loss.backward()
            self.optimizer.step()
            self.optimizer.zero_grad()

def datasets_fn():
    dataset = MNIST(os.getcwd(), train=False, download=True, transform=ToTensor())
    return {"train": dataset, "valid": dataset}

def train():
    model = ClassifyAE(28 * 28, 128, 10)
    optimizer = torch.optim.Adam(model.parameters(), lr=0.02)

    runner = CustomRunner()
    runner.train(
        model=model,
        optimizer=optimizer,
        datasets={
            "batch_size": 32,
            "num_workers": 1,
            "get_datasets_fn": datasets_fn,
        },
        logdir="./logs/distributed_ae",
        num_epochs=8,
        verbose=True,
    )

utils.distributed_cmd_run(train)

ML - multiclass classification (TPU version)

import torch
from torch.utils.data import DataLoader, TensorDataset
from catalyst import dl, utils

# sample data
num_samples, num_features, num_classes = int(1e4), int(1e1), 4
X = torch.rand(num_samples, num_features)
y = (torch.rand(num_samples, ) * num_classes).to(torch.int64)

# pytorch loaders
dataset = TensorDataset(X, y)
loader = DataLoader(dataset, batch_size=32, num_workers=1)
loaders = {"train": loader, "valid": loader}

# device (TPU > GPU > CPU)
device = utils.get_device()  # <--------- TPU device

# model, criterion, optimizer, scheduler
model = torch.nn.Linear(num_features, num_classes).to(device)
criterion = torch.nn.CrossEntropyLoss().to(device)
optimizer = torch.optim.Adam(model.parameters())

# model training
runner = dl.SupervisedRunner(device=device)
runner.train(
    model=model,
    criterion=criterion,
    optimizer=optimizer,
    loaders=loaders,
    logdir="./logdir",
    num_epochs=3,
    callbacks=[dl.AccuracyCallback(num_classes=num_classes)]
)

AutoML - hyperparameters optimization with Optuna

import os
import optuna
import torch
from torch import nn
from torch.utils.data import DataLoader
from catalyst import dl
from catalyst.contrib.data.cv import ToTensor
from catalyst.contrib.datasets import MNIST
from catalyst.contrib.nn import Flatten
    

def objective(trial):
    lr = trial.suggest_loguniform("lr", 1e-3, 1e-1)
    num_hidden = int(trial.suggest_loguniform("num_hidden", 32, 128))

    loaders = {
        "train": DataLoader(MNIST(os.getcwd(), train=True, download=True, transform=ToTensor()), batch_size=32),
        "valid": DataLoader(MNIST(os.getcwd(), train=False, download=True, transform=ToTensor()), batch_size=32),
    }
    model = nn.Sequential(
        Flatten(), nn.Linear(784, num_hidden), nn.ReLU(), nn.Linear(num_hidden, 10)
    )
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    criterion = nn.CrossEntropyLoss()

    runner = dl.SupervisedRunner()
    runner.train(
        model=model,
        loaders=loaders,
        criterion=criterion,
        optimizer=optimizer,
        callbacks=[
            dl.OptunaCallback(trial),
            dl.AccuracyCallback(num_classes=10),
        ],
        num_epochs=10,
        main_metric="accuracy01",
        minimize_metric=False,
    )
    return runner.best_valid_metrics[runner.main_metric]

study = optuna.create_study(
    direction="maximize",
    pruner=optuna.pruners.MedianPruner(
        n_startup_trials=1, n_warmup_steps=0, interval_steps=1
    ),
)
study.optimize(objective, n_trials=10, timeout=300)
print(study.best_value, study.best_params)

Features

Universal train/inference loop.
Configuration files for model/data hyperparameters.
Reproducibility – all source code and environment variables will be saved.
Callbacks – reusable train/inference pipeline parts with easy customization.
Training stages support.
Deep Learning best practices - SWA, AdamW, Ranger optimizer, OneCycle, and more.
Development best practices - fp16 support, distributed training, slurm support.
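
To illustrate the callback idea from the list above, here is a minimal sketch of a loop that fires events and lets small, reusable objects react to them. Everything in it (`Callback`, `LossAverager`, `HistoryLogger`, `run`, the event names, and the loss values) is hypothetical and simplified; it is not the Catalyst callback interface.

```python
class Callback:
    """Base class: override only the events you need (hypothetical interface)."""

    def on_epoch_start(self, state):
        pass

    def on_batch_end(self, state):
        pass

    def on_epoch_end(self, state):
        pass


class LossAverager(Callback):
    """Averages the per-batch loss over each epoch."""

    def on_epoch_start(self, state):
        state["loss_sum"], state["num_batches"] = 0.0, 0

    def on_batch_end(self, state):
        state["loss_sum"] += state["batch_loss"]
        state["num_batches"] += 1

    def on_epoch_end(self, state):
        state["epoch_loss"] = state["loss_sum"] / state["num_batches"]


class HistoryLogger(Callback):
    """Stores the epoch loss computed by an earlier callback."""

    def __init__(self):
        self.history = []

    def on_epoch_end(self, state):
        self.history.append(state["epoch_loss"])


def run(batches_per_epoch, callbacks):
    """Generic loop: the loop itself knows nothing about metrics or logging."""
    state = {}
    for epoch, batch_losses in enumerate(batches_per_epoch):
        state["epoch"] = epoch
        for callback in callbacks:
            callback.on_epoch_start(state)
        for loss in batch_losses:
            state["batch_loss"] = loss  # stands in for a real forward pass
            for callback in callbacks:
                callback.on_batch_end(state)
        for callback in callbacks:
            callback.on_epoch_end(state)
    return state


logger = HistoryLogger()
# callbacks run in list order, so the averager must come before the logger
final_state = run([[1.0, 0.5], [0.4, 0.2]], [LossAverager(), logger])
```

The loop stays fixed while behavior is added or removed by editing the callback list, which is what makes pipeline parts reusable across experiments.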

Structure

callbacks - a variety of callbacks for your train-loop customization.
contrib - additional modules contributed by Catalyst users.
core - framework core with main abstractions - Experiment, Runner and Callback.
data - useful tools and scripts for data processing.
dl - entrypoint for your deep learning experiments.
experiments - a number of useful experiments extensions for Notebook and Config API.
metrics – classic ML and CV/NLP/RecSys metrics.
registry - Catalyst global registry for Config API.
runners - runners extensions for different deep learning tasks.
tools - extra tools for Deep Learning research, class-based helpers.
utils - typical utils for Deep Learning research, function-based helpers.
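
As a rough picture of what a registry for a config-driven API does, the sketch below maps names to classes so that objects can be built from plain dictionaries. The `Registry` class, its `add` and `build` methods, and `LinearModel` are hypothetical and simplified; the actual Catalyst registry may work differently.

```python
class Registry:
    """Maps string names to factories so objects can be built from config dicts."""

    def __init__(self):
        self._factories = {}

    def add(self, factory):
        """Register a class or function under its own name; usable as a decorator."""
        self._factories[factory.__name__] = factory
        return factory

    def build(self, config):
        """Create an object from a dict with a 'name' key plus keyword arguments."""
        params = dict(config)  # copy so the caller's dict is left untouched
        name = params.pop("name")
        if name not in self._factories:
            raise KeyError(f"unknown component: {name!r}")
        return self._factories[name](**params)


MODELS = Registry()


@MODELS.add
class LinearModel:
    """Placeholder model that only stores its sizes."""

    def __init__(self, in_features, out_features):
        self.in_features = in_features
        self.out_features = out_features


# this dict could equally come from a YAML or JSON configuration file
config = {"name": "LinearModel", "in_features": 10, "out_features": 4}
model = MODELS.build(config)
```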

Tests

All Catalyst code, features and pipelines are fully tested, and the code is checked against our own catalyst-codestyle.

In fact, we train a number of different models for a variety of tasks: image classification, image segmentation, text classification, GAN training and much more. During the tests, we compare their convergence metrics in order to verify the correctness of the training procedure and its reproducibility.

As a result, Catalyst provides fully tested and reproducible best practices for your deep learning research.
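
The idea behind such a convergence check can be sketched without any framework: train the same tiny model twice with the same seed, then assert that the loss curves are identical and that the loss went down. The example below is an illustration under those assumptions, not code from the Catalyst test suite; `train_linear`, its data, and its thresholds are hypothetical.

```python
import random


def train_linear(seed, num_epochs=30, lr=0.01):
    """Fit y = w * x with per-sample SGD on synthetic data; return the MSE per epoch."""
    rng = random.Random(seed)  # all randomness flows from this one seed
    xs = [rng.uniform(-1, 1) for _ in range(64)]
    ys = [3.0 * x + rng.gauss(0, 0.01) for x in xs]
    w = rng.uniform(-1, 1)  # seed-dependent initialization

    history = []
    for _ in range(num_epochs):
        for x, y in zip(xs, ys):
            grad = 2 * (w * x - y) * x  # derivative of the squared error w.r.t. w
            w -= lr * grad
        mse = sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
        history.append(mse)
    return history


run_a = train_linear(seed=42)
run_b = train_linear(seed=42)

reproducible = run_a == run_b      # same seed, same loss curve
converged = run_a[-1] < run_a[0]   # the loss decreased over training
```

A real pipeline test would apply the same two checks to the logged metrics of an actual training run, with the framework's random seeds fixed.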

Catalyst

Tutorials

Customizing what happens in train
Demo with minimal examples for ML, CV, NLP, GANs and RecSys
Detailed classification tutorial
Advanced segmentation tutorial
Metric Learning tutorial
Catalyst with Google TPU

Blogposts

Docs

Projects

Examples, notebooks and starter kits

CamVid Segmentation Example - Example of semantic segmentation for CamVid dataset
Notebook API tutorial for segmentation in Understanding Clouds from Satellite Images Competition
Catalyst.RL - NeurIPS 2019: Learn to Move - Walk Around – starter kit
Catalyst.RL - NeurIPS 2019: Animal-AI Olympics - starter kit
Inria Segmentation Example - An example of training a segmentation model for the Inria Satellite Segmentation Challenge
iglovikov_segmentation - Semantic segmentation pipeline using Catalyst

Competitions

Kaggle Quick, Draw! Doodle Recognition Challenge - 11th place solution
Catalyst.RL - NeurIPS 2018: AI for Prosthetics Challenge – 3rd place solution
Kaggle Google Landmark 2019 - 30th place solution
iMet Collection 2019 - FGVC6 - 24th place solution
ID R&D Anti-spoofing Challenge - 14th place solution
NeurIPS 2019: Recursion Cellular Image Classification - 4th place solution
MICCAI 2019: Automatic Structure Segmentation for Radiotherapy Planning Challenge 2019
- 3rd place solution for Task 3: Organ-at-risk segmentation from chest CT scans
- and 4th place solution for Task 4: Gross Target Volume segmentation of lung cancer
Kaggle Severstal steel detection - 5th place solution
RSNA Intracranial Hemorrhage Detection - 5th place solution
APTOS 2019 Blindness Detection – 7th place solution
Catalyst.RL - NeurIPS 2019: Learn to Move - Walk Around – 2nd place solution
xView2 Damage Assessment Challenge - 3rd place solution

Paper implementations

Tools and pipelines

Catalyst.RL – A Distributed Framework for Reproducible RL Research by Scitator
Catalyst.Classification - Comprehensive classification pipeline with Pseudo-Labeling by Bagxi and Pdanilov
Catalyst.Segmentation - Segmentation pipelines - binary, semantic and instance, by Bagxi
Catalyst.Detection - Anchor-free detection pipeline by Avi2011class and TezRomacH
Catalyst.GAN - Reproducible GANs pipelines by Asmekal
Catalyst.Neuro - Brain image analysis project, in collaboration with TReNDS Center
MLComp – distributed DAG framework for machine learning with UI by Lightforever
Pytorch toolbelt - PyTorch extensions for fast R&D prototyping and Kaggle farming by BloodAxe
Helper functions - An unstructured set of helper functions by Ternaus
BERT Distillation with Catalyst by elephantmipt

Talks

Catalyst-team YouTube channel
Catalyst.RL – reproducible RL research framework at Stachka
Catalyst.DL – reproducible DL research framework (rus) and slides (eng) at RIF
Catalyst.DL – reproducible DL research framework (rus) and slides (eng) at AI-Journey
Catalyst.DL – fast & reproducible DL at Datastart
Catalyst.RL - NeurIPS 2019: Learn to Move - Walk Around and slides (eng) at RL reading group Meetup
Catalyst – accelerated DL & RL (rus) and slides (eng) at Facebook Developer Circle: Moscow | ML & AI Meetup
Catalyst.RL - Learn to Move - Walk Around 2nd place solution at NeurIPS competition track
Open Source ML 2019 edition at Datafest.elka

Community

Contribution guide

We appreciate all contributions. If you are planning to contribute back bug-fixes, please do so without any further discussion. If you plan to contribute new features, utility functions or extensions, please first open an issue and discuss the feature with us.

Please see the contribution guide for more information.
By participating in this project, you agree to abide by its Code of Conduct.

User feedback

We have created [email protected] for "user feedback".

If you like the project and want to say thanks, this is the right place.
If you would like to start a collaboration between your team and Catalyst team to do better Deep Learning R&D - you are always welcome.
If you just don't like GitHub issues and this way suits you better - feel free to email us.
Finally, if you do not like something, please, share it with us and we can see how to improve it.

We appreciate any type of feedback. Thank you!

Acknowledgments

Since the beginning of Catalyst's development, many people have influenced it in many different ways.

Catalyst.Team

Eugene Kachan (bagxi) - Config API improvements and CV pipelines
Dmytro Doroshenko (ditwoo) - best ever test cases
Artem Zolkin (arquestro) - documentation grandmaster
David Kuryakin (dkuryakin) - Reaction design

Catalyst - Metric Learning team

Catalyst.Contributors

Evgeny Semyonov (lightforever) - MLComp creator
Andrey Zharkov (asmekal) - Catalyst.GAN initiative
Aleksey Grinchuk (alexgrinch) and Valentin Khrulkov (khrulkovv) - many RL collaborations
Alex Gaziev (gazay) - a bunch of Config API improvements and our Config API wizard support
Eugene Khvedchenya (bloodaxe) - Pytorch-toolbelt library maintainer
Yury Kashnitsky (yorko) - Catalyst.NLP initiative

Catalyst.Friends

Vladimir Iglovikov (ternaus) - Kaggle grandmaster advice
Nguyen Xuan Bac (ngxbac) - Kaggle competitions support
Ivan Stepanenko - awesome Catalyst.Ecosystem design

Trusted by

Awecom
Researchers@Center for Translational Research in Neuroimaging and Data Science (TReNDS)
Deep Learning School
Researchers@Emory University
Evil Martians
Researchers@Georgia Institute of Technology
Researchers@Georgia State University
Helios
HPCD Lab
iFarm
Kinoplan
Researchers@Moscow Institute of Physics and Technology
Neuromation
Poteha Labs
Provectus
Researchers@Skolkovo Institute of Science and Technology
SoftConstruct
Researchers@Tinkoff
Researchers@Yandex.Research

Supported by

Citation

Please use this bibtex if you want to cite this repository in your publications:

@misc{catalyst,
    author = {Kolesnikov, Sergey},
    title = {Accelerated deep learning R&D},
    year = {2018},
    publisher = {GitHub},
    journal = {GitHub repository},
    howpublished = {\url{https://github.com/catalyst-team/catalyst}},
}

Comments

Version/19.03
catalyst-dl 19.02 proposal

main goal:

from catalyst import Runner
from expdir import MyExperiment

Runner.run(MyExperiment)  # train/infer

typical run:

mode (train/infer) -> stage -> epoch -> loader -> batch

during stage - model/etc are the same, between stages - can be easily replaced

main entities:

Registry - Factory for registering user extensions

Experiment - keeper of the config, knows how to create model / etc, but does not keep them

State - all information about what is in the experiment now, in the current stage

Runner - runner, responsible for main logic

Callbacks - additional user extensions for changing runner’s work a bit
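
The run hierarchy and the entities listed above can be pictured with a small sketch. Everything in it is hypothetical and written only for illustration (the class names mirror the proposal, but the methods, the config layout and RecordingCallback are assumptions, and Registry is left out); it is not the Catalyst API.

```python
from typing import Dict, Iterable, List


class State:
    """Holds what is happening in the experiment right now."""

    def __init__(self) -> None:
        self.stage = None
        self.epoch = None
        self.loader = None
        self.batch = None


class Callback:
    """User extension point; override only the events you need."""

    def on_batch_end(self, state: State) -> None:
        pass


class RecordingCallback(Callback):
    """Illustrative callback that records where each batch came from."""

    def __init__(self) -> None:
        self.seen: List[tuple] = []

    def on_batch_end(self, state: State) -> None:
        self.seen.append((state.stage, state.epoch, state.loader, state.batch))


class Experiment:
    """Keeps the config and knows how to build things, but does not keep them."""

    def __init__(self, config: Dict[str, dict]) -> None:
        self.config = config

    @property
    def stages(self) -> List[str]:
        return list(self.config)

    def get_num_epochs(self, stage: str) -> int:
        return self.config[stage]["num_epochs"]

    def get_loaders(self, stage: str) -> Dict[str, Iterable]:
        return self.config[stage]["loaders"]


class Runner:
    """Owns the main loop: stage -> epoch -> loader -> batch."""

    def __init__(self, callbacks: List[Callback]) -> None:
        self.callbacks = callbacks
        self.state = State()

    def run(self, experiment: Experiment) -> "Runner":
        for stage in experiment.stages:
            self.state.stage = stage
            # loaders are rebuilt per stage, so they can change between stages
            loaders = experiment.get_loaders(stage)
            for epoch in range(experiment.get_num_epochs(stage)):
                self.state.epoch = epoch
                for name, loader in loaders.items():
                    self.state.loader = name
                    for batch in loader:
                        self.state.batch = batch
                        for callback in self.callbacks:
                            callback.on_batch_end(self.state)
        return self


config = {
    "stage1": {"num_epochs": 2, "loaders": {"train": [0, 1], "valid": [0]}},
    "stage2": {"num_epochs": 1, "loaders": {"train": [0]}},
}
recorder = RecordingCallback()
Runner([recorder]).run(Experiment(config))
```

In this sketch the Runner never builds anything itself: it asks the Experiment per stage, which is one way to read "during stage - model/etc are the same, between stages - can be easily replaced".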

WIP
opened by Scitator 43
WandB batch metrics logging error
🐛 Bug Report

In wandb all batch metrics are logged as single value per epoch.

Expected behavior

Batch metrics must be logged once per step.

Catalyst version: 21.7

Additional context

The problem is here:

https://github.com/catalyst-team/catalyst/blob/master/catalyst/loggers/wandb.py#L115

Step must be equal to global_sample_step, not global_epoch_step.
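
A rough sketch of why the step choice matters, assuming a backend that keeps one value per metric name and step. FakeBackend and run are hypothetical helpers written for this illustration; they are not wandb or Catalyst code, and the loss values are made up.

```python
class FakeBackend:
    """Hypothetical metrics store: one value per (metric name, step) pair."""

    def __init__(self):
        self.points = {}

    def log(self, name, value, step):
        # logging twice with the same step overwrites the earlier value
        self.points[(name, step)] = value


def run(backend, use_sample_step, num_epochs=2, batches_per_epoch=3, batch_size=4):
    global_sample_step = 0
    for global_epoch_step in range(1, num_epochs + 1):
        for _ in range(batches_per_epoch):
            global_sample_step += batch_size
            loss = 1.0 / global_sample_step  # placeholder metric value
            step = global_sample_step if use_sample_step else global_epoch_step
            backend.log("train/loss", loss, step=step)
    return backend


per_epoch = run(FakeBackend(), use_sample_step=False)
per_batch = run(FakeBackend(), use_sample_step=True)
print(len(per_epoch.points), len(per_batch.points))
```

With the epoch counter as the step, six batch values collapse into two stored points; with the sample counter, all six survive.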
bug help wanted
opened by ivan-chai 20

Evaluate for Runner

🚀 Feature Request

The evaluate_loader method for Python API. Similar to .train and .predict_loader

Motivation

Proposal

Possible use case

import os
from torch import nn, optim
from torch.utils.data import DataLoader
from catalyst import dl, utils
from catalyst.data.transforms import ToTensor
from catalyst.contrib.datasets import MNIST

model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.02)

loaders = {
    "train": DataLoader(
        MNIST(os.getcwd(), train=True, download=True, transform=ToTensor()), batch_size=32
    ),
    "valid": DataLoader(
        MNIST(os.getcwd(), train=False, download=True, transform=ToTensor()), batch_size=32
    ),
}

runner = dl.SupervisedRunner(
    input_key="features", output_key="logits", target_key="targets", loss_key="loss"
)
# model training
runner.train(
    model=model,
    criterion=criterion,
    optimizer=optimizer,
    loaders=loaders,
    num_epochs=1,
    callbacks=[
        dl.AccuracyCallback(input_key="logits", target_key="targets", topk_args=(1, 3, 5)),
        dl.PrecisionRecallF1SupportCallback(
            input_key="logits", target_key="targets", num_classes=10
        ),
        dl.AUCCallback(input_key="logits", target_key="targets"),
        # catalyst[ml] required ``pip install catalyst[ml]``
        # dl.ConfusionMatrixCallback(input_key="logits", target_key="targets", num_classes=10),
    ],
    logdir="./logs",
    valid_loader="valid",
    valid_metric="loss",
    minimize_valid_metric=True,
    verbose=True,
    load_best_on_end=True,
)

loader_metrics = runner.evaluate_loader(
    loader=loaders["valid"],
    callbacks=[
        dl.AccuracyCallback(input_key="logits", target_key="targets", topk_args=(1, 3, 5)),
        dl.PrecisionRecallF1SupportCallback(
            input_key="logits", target_key="targets", num_classes=10
        ),
    ])

Alternatives

The whole method could easily be done with the .train approach, but for a more user-friendly API – why shouldn't we add a simplified alias?

Additional context

Checklist

[x] feature proposal description
[x] motivation
[x] extra proposal context / proposal alternatives review

FAQ

Please review the FAQ before submitting an issue:

[x] I have read the documentation and FAQ
[x] I have reviewed the minimal examples section
[x] I have checked the changelog for main framework updates
[x] I have read the contribution guide
[x] I have joined Catalyst slack (#__questions channel) for issue discussion

enhancement help wanted good first issue

opened by Scitator 17

Naming inconsistency

Describe the bug I found that some argument names in the framework aren't consistent. For example:

class SupervisedRunner(Runner):
    """Runner for experiments with supervised model."""

    _experiment_fn: Callable = SupervisedExperiment

    def __init__(
        self,
        model: Model = None,
        device: Device = None,
        input_key: Any = "features", 
        output_key: Any = "logits",
        input_target_key: str = "targets", # This argument corresponds to input_key argument in CriterionCallback
    ):

class CriterionCallback(_MetricCallback):
    """Callback for that measures loss with specified criterion."""

    def __init__(
        self,
        input_key: Union[str, List[str], Dict[str, str]] = "targets", # This argument corresponds to input_target_key argument in SupervisedRunner
        output_key: Union[str, List[str], Dict[str, str]] = "logits",
        prefix: str = "loss",
        criterion_key: str = None,
        multiplier: float = 1.0,
        **metric_kwargs,
    ):

To Reproduce Steps to reproduce the behavior:

Check files: catalyst.core.callback.metric.py and catalyst.dl.runner.supervised.py

Expected behavior I expect names to be consistent across the framework and to mean the same thing.

enhancement help wanted good first issue question wontfix

opened by ogvalt 17

Update ce.py
Description

Implementation of Symmetric Cross Entropy

Related Issue

https://github.com/catalyst-team/catalyst/issues/479

Type of Change

[ ] Examples / docs / tutorials / contributors update

[ ] Bug fix (non-breaking change which fixes an issue)

[x] Improvement (non-breaking change which improves an existing feature)

[x] New feature (non-breaking change which adds functionality)

[ ] Breaking change (fix or feature that would cause existing functionality to change)

Checklist

[x] I have read the Code of Conduct document.

[x] I have read the Contributing guide.

[ ] I have checked the code-style using make check-style.

[x] I have written the docstring in Google format for all the methods and classes that I used.

[ ] I have checked the docs using make check-docs.

enhancement good first issue WIP
opened by KyloRen1 17
Triplet loss epic

best triplet loss ever

https://github.com/adambielski/siamese-triplet
https://github.com/andreasveit/triplet-network-pytorch
https://github.com/CoinCheung/triplet-reid-pytorch
https://discuss.pytorch.org/t/triplet-loss-in-pytorch/30634
enhancement good first issue

opened by ermakovpetr 16

Add support for WandbLogger

Before submitting (checklist)

[x] Was this discussed/approved via a Github issue? (no need for typos and docs improvements)
[x] Did you read the contribution guide?
[x] Did you check the code style? catalyst-make-codestyle && catalyst-check-codestyle (pip install -U catalyst-codestyle).
[x] Did you make sure to update the docs? We use Google format for all the methods and classes.
[x] Did you check the docs with make check-docs?
[x] Did you write any new necessary tests?
[x] Did you check that your code passes the unit tests pytest . ?
[x] Did you add your new functionality to the docs?
[x] Did you update the CHANGELOG?
[ ] Did you run colab minimal CI/CD with latest and minimal requirements?

Description

This PR adds support for WandbLogger that enables logging metrics and media to W&B dashboard

Related Issue

Type of Change

[ ] Examples / docs / tutorials / contributors update
[ ] Bug fix (non-breaking change which fixes an issue)
[ ] Improvement (non-breaking change which improves an existing feature)
[x] New feature (non-breaking change which adds functionality)
[ ] Breaking change (fix or feature that would cause existing functionality to change)

PR review

Anyone in the community is free to review the PR once the tests have passed. If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

Additional Details:

The minimum tests colab seemed stuck while running tests, but the tests passed on my machine. I'll update this thread with test results
I've made this draft PR as I still want to confirm the hyperparameter logging behavior. On running the test_finetune2.py train method, no hyperparameters are being logged.

Test logs

Code style --> catalyst-make-codestyle && catalyst-check-codestyle

python3.8/site-packages/isort/settings.py:619: UserWarning: Failed to pull configuration information from /home/saksham/Desktop/catalyst/setup.cfg
  warn(f"Failed to pull configuration information from {potential_config_file}")
Skipped 55 files
python3.8/site-packages/isort/main.py:1000: UserWarning: W0501: The following deprecated CLI flags were used and ignored: --apply!
  warn(python3.8/site-packages/isort/main.py:1004: UserWarning: W0500: Please see the 5.0.0 Upgrade guide: https://pycqa.github.io/isort/docs/upgrade_guides/5.0.0/
  warn(
All done! ✨ 🍰 ✨
350 files left unchanged.
python3.8/site-packages/isort/settings.py:619: UserWarning: Failed to pull configuration information from /home/catalyst/setup.cfg
  warn(f"Failed to pull configuration information from {potential_config_file}")
Skipped 55 files
All done! ✨ 🍰 ✨
350 files would be left unchanged.
Failed to pull configuration information from home/catalyst/setup.cfg
0

Docs check -->rm -rf ./builds; REMOVE_BUILDS=0 make check-docs

reading sources... [100%] tutorials/ddp                                                                                                                                                              
looking for now-outdated files... none found
pickling environment... done
checking consistency... done
preparing documents... done
writing output... [100%] tutorials/ddp                                                                                                                                                               
generating indices...  genindex py-modindexdone
highlighting module code... [100%] torch.utils.data.sampler                                                                                                                                          
writing additional pages...  search/home/saksham/anaconda3/envs/catalyst_dev/lib/python3.8/site-packages/catalyst_sphinx_theme/search.html:21: RemovedInSphinx30Warning: To modify script_files in the theme is deprecated. Please insert a <script> tag directly in your theme instead.
  {% trans %}Please activate JavaScript to enable the search
done
copying static files... ... done
copying extra files... done
dumping search index in English (code: en)... done
dumping object inventory... done
build succeeded.

The HTML pages are in builds.
#### CODE: 0 ####

Tests --> pytest .

337 passed, 134 skipped, 2 xfailed, 93 warnings in 439.09s (0:07:19)

@Scitator Let me know if I missed any steps here

FAQ

Please review the FAQ before submitting an issue:

[ ] I have read the documentation and FAQ
[ ] I have reviewed the minimal examples section
[ ] I have checked the changelog for main framework updates
[ ] I have read the contribution guide
[ ] I have joined Catalyst slack (#__questions channel) for issue discussion

opened by AyushExel 15

updated dl_cpu(workflows)- For passing CI-Tests
Before submitting (checklist)

[ ] Was this discussed/approved via a Github issue? (no need for typos and docs improvements)

[ ] Did you read the contribution guide?

[ ] Did you check the code style? catalyst-make-codestyle && catalyst-check-codestyle (pip install -U catalyst-codestyle).

[ ] Did you make sure to update the docs? We use Google format for all the methods and classes.

[ ] Did you check the docs with make check-docs?

[ ] Did you write any new necessary tests?

[ ] Did you check that your code passes the unit tests pytest . ?

[ ] Did you add your new functionality to the docs?

[ ] Did you update the CHANGELOG?

Description

Related Issue

Type of Change

[ ] Examples / docs / tutorials / contributors update

[ ] Bug fix (non-breaking change which fixes an issue)

[ ] Improvement (non-breaking change which improves an existing feature)

[ ] New feature (non-breaking change which adds functionality)

[ ] Breaking change (fix or feature that would cause existing functionality to change)

PR review

Anyone in the community is free to review the PR once the tests have passed. If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

PS

[x] I know, that I could join slack for pull request discussion.

Note

Mentioned in a comment on #1131, so that the CI tests run properly.
opened by Atharva-Phatak 15
Fixed OneCycleLRWithWarmup
Before submitting

[x] Was this discussed/approved via a Github issue? (no need for typos and docs improvements)

[x] Did you read the contribution guide?

[x] Did you check the code style? catalyst-make-codestyle && catalyst-check-codestyle (pip install -U catalyst-codestyle). I was not able to check: the shell reports that 'catalyst-make-codestyle' is not recognized as an internal or external command. Please suggest how to do this; I am in a Windows 10 environment.

[ ] Did you make sure to update the docs? We use Google format for all the methods and classes.

[x] Did you check the docs with make check-docs?

[x] Did you write any new necessary tests?

[ ] Did you add your new functionality to the docs?

[x] Did you update the CHANGELOG?

[x] You can use 'Login as guest' to see Teamcity build logs.

Description

OneCycleLRWithWarmup starts ahead of initial LR (does not start with init_lr)

Related Issue

https://github.com/catalyst-team/catalyst/issues/851

Type of Change

[ ] Examples / docs / tutorials / contributors update

[x] Bug fix (non-breaking change which fixes an issue)

[ ] Improvement (non-breaking change which improves an existing feature)

[ ] New feature (non-breaking change which adds functionality)

[ ] Breaking change (fix or feature that would cause existing functionality to change)

PR review

Anyone in the community is free to review the PR once the tests have passed. If we didn't discuss your PR in Github issues there's a high chance it will not be merged.
opened by lokeshkvn 15
Fixed gradient tracking
Description

Fixed storing gradients in OptimizerCallback

Related Issue

Type of Change

[ ] Examples / docs / tutorials / contributors update

[x] Bug fix (non-breaking change which fixes an issue)

[ ] Improvement (non-breaking change which improves an existing feature)

[ ] New feature (non-breaking change which adds functionality)

[ ] Breaking change (fix or feature that would cause existing functionality to change)

Checklist

[x] I have read the Code of Conduct document.

[x] I have read the Contributing guide.

[x] I have checked the code-style using make check-codestyle.

[x] I have written tests for all new methods and classes that I created.

[x] I have written the docstring in Google format for all the methods and classes that I used.

[x] I have checked the docs using make check-docs.

[x] I have read I need to click 'Login as guest' to see Teamcity build logs.

enhancement WIP
opened by pdanilov 15
Accumulate gradient
I was trying to use the gradient accumulation feature but ran into an error. The training works without OptimizerCallback(accumulation_steps=2).

runner.train(
    model=model,
    criterion=criterion,
    optimizer=optimizer,
    scheduler=scheduler,
    loaders=loaders,
    callbacks=[
        DiceCallback(),
        EarlyStoppingCallback(patience=5, min_delta=0.001),
        OptimizerCallback(accumulation_steps=2),
    ],
    logdir=logdir,
    num_epochs=num_epochs,
    verbose=True,
)

FYI, the error message:

0/60 * Epoch (train): 0% 0/624 [00:00<?, ?it/s]

TypeError Traceback (most recent call last) in 9 logdir=logdir, 10 num_epochs=num_epochs, ---> 11 verbose=True 12 )

~/.conda/envs/mmdet_cloud/lib/python3.6/site-packages/catalyst/dl/runner/supervised.py in train(self, model, criterion, optimizer, loaders, logdir, callbacks, scheduler, resume, num_epochs, valid_loader, main_metric, minimize_metric, verbose, state_kwargs, checkpoint_data, fp16, monitoring_params, check) 195 monitoring_params=monitoring_params 196 ) --> 197 self.run_experiment(experiment, check=check) 198 199 def infer(

~/.conda/envs/mmdet_cloud/lib/python3.6/site-packages/catalyst/dl/core/runner.py in run_experiment(self, experiment, check) 229 except (Exception, KeyboardInterrupt) as ex: 230 self.state.exception = ex --> 231 self._run_event("exception") 232 233 return self

~/.conda/envs/mmdet_cloud/lib/python3.6/site-packages/catalyst/dl/core/runner.py in _run_event(self, event) 100 101 if self.state is not None and hasattr(self.state, f"on_{event}_post"): --> 102 getattr(self.state, f"on_{event}_post")() 103 104 @abstractmethod

~/.conda/envs/mmdet_cloud/lib/python3.6/site-packages/catalyst/dl/core/state.py in on_exception_post(self) 183 def on_exception_post(self): 184 for logger in self.loggers.values(): --> 185 logger.on_exception(self) 186 187

~/.conda/envs/mmdet_cloud/lib/python3.6/site-packages/catalyst/dl/callbacks/logging.py in on_exception(self, state) 194 195 if state.need_reraise_exception: --> 196 raise exception 197 198

~/.conda/envs/mmdet_cloud/lib/python3.6/site-packages/catalyst/dl/core/runner.py in run_experiment(self, experiment, check) 226 try: 227 for stage in self.experiment.stages: --> 228 self._run_stage(stage) 229 except (Exception, KeyboardInterrupt) as ex: 230 self.state.exception = ex

~/.conda/envs/mmdet_cloud/lib/python3.6/site-packages/catalyst/dl/core/runner.py in _run_stage(self, stage) 199 200 self._run_event("epoch_start") --> 201 self._run_epoch(loaders) 202 self._run_event("epoch_end") 203

~/.conda/envs/mmdet_cloud/lib/python3.6/site-packages/catalyst/dl/core/runner.py in _run_epoch(self, loaders) 186 self._run_event("loader_start") 187 with torch.set_grad_enabled(self.state.need_backward): --> 188 self._run_loader(loader) 189 self._run_event("loader_end") 190

~/.conda/envs/mmdet_cloud/lib/python3.6/site-packages/catalyst/dl/core/runner.py in _run_loader(self, loader) 148 149 for i, batch in enumerate(loader): --> 150 self._run_batch(batch) 151 152 self.state.timer.reset()

~/.conda/envs/mmdet_cloud/lib/python3.6/site-packages/catalyst/dl/core/runner.py in _run_batch(self, batch) 130 self.state.timer.stop("_timers/model_time") 131 self.state.timer.stop("_timers/batch_time") --> 132 self._run_event("batch_end") 133 134 def _run_loader(self, loader):

~/.conda/envs/mmdet_cloud/lib/python3.6/site-packages/catalyst/dl/core/runner.py in _run_event(self, event) 97 if self.callbacks is not None: 98 for callback in self.callbacks.values(): ---> 99 getattr(callback, f"on_{event}")(self.state) 100 101 if self.state is not None and hasattr(self.state, f"on_{event}_post"):

~/.conda/envs/mmdet_cloud/lib/python3.6/site-packages/catalyst/dl/callbacks/optimizer.py in on_batch_end(self, state) 117 return 118 --> 119 loss = self._get_loss(state) 120 121 self._accumulation_counter += 1

~/.conda/envs/mmdet_cloud/lib/python3.6/site-packages/catalyst/dl/callbacks/optimizer.py in _get_loss(self, state) 91 92 def _get_loss(self, state) -> torch.Tensor: ---> 93 loss = state.get_key(key="loss", inner_key=self.loss_key) 94 95 if isinstance(loss, list):

~/.conda/envs/mmdet_cloud/lib/python3.6/site-packages/catalyst/dl/core/state.py in get_key(self, key, inner_key) 114 return getattr(self, key) 115 else: --> 116 return getattr(self, key)[inner_key] 117 118 def set_key(self, value, key, inner_key=None):

TypeError: 'NoneType' object is not subscriptable
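
The last frame of the traceback indexes getattr(self, key)[inner_key], so the error suggests that state.loss was still None when OptimizerCallback looked it up. A minimal sketch of that failure mode follows; MiniState and try_get_loss are hypothetical stand-ins written for illustration, not Catalyst code, and the sketch does not explain why the loss was missing in the reported run.

```python
class MiniState:
    """Hypothetical stand-in for the runner state seen in the traceback."""

    def __init__(self, loss=None):
        # stays None if nothing stored a loss for the current batch
        self.loss = loss

    def get_key(self, key, inner_key=None):
        # mirrors the lookup shown in the last traceback frame
        if inner_key is None:
            return getattr(self, key)
        return getattr(self, key)[inner_key]


def try_get_loss(state, loss_key):
    """Return the loss, or the error text if the lookup fails."""
    try:
        return state.get_key(key="loss", inner_key=loss_key)
    except TypeError as error:
        return f"TypeError: {error}"


empty = try_get_loss(MiniState(loss=None), "loss")
filled = try_get_loss(MiniState(loss={"loss": 0.25}), "loss")
print(empty)
print(filled)
```

If this reading is right, the thing to check is whether a criterion callback (or equivalent) writes the loss into the state before the optimizer callback runs.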
bug
opened by wmmxk 14
No utils.initialization file

🐛 Bug Report

The initialization file under the utils folder does not exist in this repo, nor after installation, so the error below is raised whenever I try to import utils.initialization: AttributeError: module 'catalyst.utils' has no attribute 'initialization'

Screenshots

Expected behavior
bug help wanted

opened by Klins101 2

Multi Criterion Training

Error in Multi Criterion Training

weights = [0.2,0.3]
class_weights = torch.FloatTensor(weights).to(device) #.cuda()
criterion = {"CE_Loss1": nn.CrossEntropyLoss(weight=class_weights),"CE_Loss2": nn.CrossEntropyLoss()} 
....
....
loss1 = self.criterion["CE_Loss1"](self.batch["logits1"], self.batch["targets1"])
loss2 = self.criterion["CE_Loss2"](self.batch["logits2"], self.batch["targets2"])
loss_ce1ce2 = loss1 + loss2
self.batch_metrics.update({"loss_ce1": loss1, 
                           "loss_ce2": loss2, 
                           "loss_ce1ce2": loss_ce1ce2})

for key in ["loss_ce1", "loss_ce2", "loss_ce1ce2"]:
    self.meters[key].update(self.batch_metrics[key].item(), self.batch_size)

if self.is_train_loader:
    self.engine.backward(loss_ce1ce2) #causing problem
    self.optimizer.step()
    self.optimizer.zero_grad()

Hi, I am trying to train a model using multiple criteria. The part of the code that computes the loss is shown above. When I run it, I get the following error.

RuntimeError: Trying to backward through the graph a second time (or directly access saved tensors after they have already been freed). Saved intermediate values of the graph are freed when you call .backward() or autograd.grad(). Specify retain_graph=True if you need to backward through the graph a second time or if you need to access saved tensors after calling backward.

Can anyone please check if I am doing this the correct way?

help wanted question

opened by GirinChutia 3

utils.process_model_params

I got this error when running my program: AttributeError: module 'catalyst.utils' has no attribute 'process_model_params'. How can I use Catalyst to process model parameters, as with catalyst.utils.process_model_params?
help wanted question wontfix

opened by Chris-Ran 2
Crashes on 2xT4 GPUs
🐛 Bug Report

Catalyst fails on 2xT4 GPUs.

We install Catalyst in the Kaggle base image. This week we wanted to release a new image with upgraded packages. It doesn't look like Catalyst was upgraded, but Accelerate was (from 0.12 to 0.13.1).

How To Reproduce

Steps to reproduce the behavior: Run this unit test on a 2xT4 GPU.

Code sample

https://github.com/Kaggle/docker-python/blob/main/tests/test_catalyst.py

Screenshots

Expected behavior

The test passes on a P100 GPU.

Environment

https://gist.github.com/Philmod/0349a2cf16d76e8d20e960d750962241

Checklist

[x] bug description

[x] steps to reproduce

[x] expected behavior

[x] environment

[x] code sample / screenshots

FAQ

Please review the FAQ before submitting an issue:

[x] I have read the documentation and FAQ

[x] I have reviewed the minimal examples section

[x] I have checked the changelog for main framework updates

[x] I have read the contribution guide

bug help wanted wontfix
opened by Philmod 3
Custom loader stages
🚀 Feature Request

In addition to loader ∈ [train, valid, infer], a user should be able to define a custom loader stage.

Motivation

The purpose of loader is to switch between datasets that will be fed into the pipeline. Therefore, the natural use cases are:

Running inference on multiple datasets (e.g. infer_coco, infer_kodak, infer_vimeo90k, ...) and tracking their metrics in the same way as infer.

Other custom stages that involve analysis using subsets of data and data loaders.

Proposal

self.{train/valid/infer} variables need to be converted to functions/dictionaries. For example:

def handle_batch(batch):
    # Previously:
    if self.is_infer_coco_loader: ...

    # Proposed:
    if self.loader == "infer_coco": ...
    if self.loader == "infer_vimeo90k": ...

    # Or, perhaps less Pythonically:
    if self.is_loader("infer_coco"): ...

self.is_infer_loader and similar can be kept, and perhaps even later deprecated.

Type-hints should be naturally converted via the type transformation T -> Mapping[str, T].
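
One way the proposal could look in practice is sketched below. MiniRunner is a hypothetical class written for this illustration, not the Catalyst runner; in particular, treating every loader whose name starts with "infer" as an inference loader is an assumption, and the float "batches" stand in for real data.

```python
from typing import Dict, Iterable, List, Mapping


class MiniRunner:
    """Hypothetical runner with free-form loader names."""

    def __init__(self) -> None:
        self.loader: str = ""
        # T -> Mapping[str, T]: one metrics list per loader name
        self.loader_metrics: Dict[str, List[float]] = {}

    def is_loader(self, name: str) -> bool:
        return self.loader == name

    @property
    def is_infer_loader(self) -> bool:
        # kept for backward compatibility (assumed rule: name prefix)
        return self.loader.startswith("infer")

    def handle_batch(self, batch: float) -> None:
        # metrics are tracked the same way for every loader, custom or not
        self.loader_metrics.setdefault(self.loader, []).append(batch)

    def run(self, loaders: Mapping[str, Iterable[float]]) -> None:
        for name, loader in loaders.items():
            self.loader = name
            for batch in loader:
                self.handle_batch(batch)


runner = MiniRunner()
runner.run({"train": [0.9, 0.8], "infer_coco": [0.5], "infer_kodak": [0.4, 0.3]})
print(runner.loader_metrics)
```

Keeping the loader name as plain data means adding infer_vimeo90k requires no new attribute on the runner, only a new key in the loaders mapping.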

Alternatives

Implementing entire "infer" loop from scratch in on_epoch_end for each dataset/loader. Or chaining dataloaders (kind of weird, too). These wouldn't really be clean, and would not generalize to other non-standard use cases.

Additional context

N/A

Checklist

[x] feature proposal description

[x] motivation

[x] extra proposal context / proposal alternatives review

FAQ

Please review the FAQ before submitting an issue:

[x] I have read the documentation and FAQ

[x] I have reviewed the minimal examples section

[x] I have checked the changelog for main framework updates

[x] I have read the contribution guide

[x] I have joined Catalyst slack (#__questions channel) for issue discussion

enhancement help wanted
opened by YodaEmbedding 1

Releases(v22.04)

v22.04(Apr 29, 2022)
[22.04] - 2022-04-29

Added

catalyst-tune for Config API added #1411

tests for python 3.9 and 3.10 #1414

Fixed

catalyst compatibility with python 3.10 #1409

Source code(tar.gz)
Source code(zip)
v22.02.1(Feb 27, 2022)
[22.02.1] - 2022-02-27

"Few fixes and Config API v22 MVP".

Added

catalyst-run for Config API support added #1406

Fixed

Logger API naming #1405

Source code(tar.gz)
Source code(zip)
v22.02(Feb 13, 2022)
[22.02] - 2022-02-13

Tl;dr

Catalyst architecture simplification.

#1395, #1396, #1397, #1398, #1399, #1400, #1401, #1402, #1403.

Added

Additional tests for different hardware accelerators setups. Please check out the tests/pipelines folder for more information.

BackwardCallback and BackwardCallbackOrder as an abstraction on top of loss.backward. Now you could easily log model gradients or transform them before OptimizerCallback.

CheckpointCallbackOrder for ICheckpointCallback.
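
The idea of an ordered backward step that runs before the optimizer step can be sketched in plain Python. All names below (Order, Backward, ClipGradients, Optimizer, run_batch_end) and the numeric order values are hypothetical and chosen for illustration; they are not the Catalyst callback classes, and the "gradient" is computed by hand for loss = sum(w ** 2).

```python
from enum import IntEnum


class Order(IntEnum):
    """Hypothetical ordering values; only their relative order matters."""

    BACKWARD = 10
    GRADIENTS = 15  # slot for user hooks that inspect or transform gradients
    OPTIMIZER = 20


class Backward:
    order = Order.BACKWARD

    def on_batch_end(self, state):
        # gradient of loss = sum(w ** 2) with respect to each w
        state["grads"] = [2.0 * w for w in state["weights"]]


class ClipGradients:
    order = Order.GRADIENTS

    def __init__(self, limit):
        self.limit = limit

    def on_batch_end(self, state):
        state["grads"] = [
            max(-self.limit, min(self.limit, g)) for g in state["grads"]
        ]


class Optimizer:
    order = Order.OPTIMIZER

    def __init__(self, lr):
        self.lr = lr

    def on_batch_end(self, state):
        state["weights"] = [
            w - self.lr * g for w, g in zip(state["weights"], state["grads"])
        ]


def run_batch_end(callbacks, state):
    # callbacks run by order, regardless of how the user listed them
    for callback in sorted(callbacks, key=lambda cb: cb.order):
        callback.on_batch_end(state)
    return state


state = run_batch_end(
    [Optimizer(lr=0.5), ClipGradients(limit=1.0), Backward()],
    {"weights": [3.0, -0.25]},
)
print(state)
```

Because the backward step is its own ordered unit, a hook such as the clipping step above can sit between it and the optimizer without either of them knowing about it.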

Changed

Minimal python version moved to 3.7, minimal PyTorch version moved to 1.4.0.

Engines were rewritten on top of Accelerate. First, we found these two abstractions very close to each other. Second, Accelerate provides a more user-friendly and more stable API than "Nvidia APEX" and "Facebook Fairscale", which it does not support.

SelfSupervisedRunner moved to the examples folder from the Catalyst API. Due to their consistency, the only Runner APIs that will be supported in the future are IRunner, Runner, ISupervisedRunner, and SupervisedRunner. If you are interested in any other Runner API - feel free to write your own CustomRunner and use SelfSupervisedRunner as an example.

Runner.{global/stage}_{batch/loader/epoch}_metrics renamed to Runner.{batch/loader/epoch}_metrics

CheckpointCallback rewritten from scratch.

Catalyst registry moved to full-imports-paths only.

Logger API changed to receive IRunner for all log_* methods.

Metric API: topk_args renamed to topk.

Contrib API: init imports from catalyst.contrib - removed, use from catalyst.contrib.{smth} import {smth}. Could be changed to full-imports-only in future versions for stability.

All quickstarts, minimal examples, notebooks, and pipelines moved to the new version.

Codestyle moved to 89 right margin. Honestly speaking, it's much easier to maintain Catalyst with 89 right margin on MBP'16.

Removed

ITrial removed.

Stages support removed. While we embrace stages in deep learning experiments, current hardware accelerators are not prepared well for such setups. Additionally, ~95% of dl pipelines are single-stage. Multi-stage runner support is under review. For multi-stage support, please define a CustomRunner with rewritten API.

Config/Hydra API support removed. Config API is under review. For now, you could write your own Config API with hydra-slayer if needed.

catalyst-dl scripts removed. Without Config API we don't need them anymore.

Nvidia Apex, Fairscale, Albumentations, Nifti, Hydra requirements removed.

OnnxCallback, PruningCallback, QuantizationCallback, TracingCallback removed from callbacks API. These callbacks are under review now.

If you have any questions on the Catalyst 22 edition updates, please join Catalyst slack for discussion.
Source code(tar.gz)
Source code(zip)
v22.02rc0(Feb 7, 2022)
[22.02rc0] - 2022-02-07

Tl;dr

Beta version of Catalyst 22 edition.

core architecture moved to Animus-like (stages were removed)

engines moved to Accelerate

config/hydra APIs deprecated in favor of hydra-slayer-custom config runners

dl-based scripts removed from the API

self-supervised runner moved to examples - it's better to have custom still

contrib and utils - truncated

requirements - simplified

codestyle moved to -l 89 (better view on 16'' screen ;) )

v21.12(Dec 28, 2021)
[21.12] - 2021-12-28

Tl;dr

Distributed engines update (multi-node support) and many other improvements.

Added

MNIST dataset for SSL benchmark (#1368)

MovieLens 20M dataset (#1336)

logger property for logging customization (#1372)

MacridVAE example (#1363)

SSL benchmark results (#1374)

Neptune example (#1377)

multi-node support for engines (#1364)

Changed

RL examples update to last version (#1370)

DDPLoaderWrapper updated to new version (#1385)

num_classes for classification metrics became optional (#1379)

colab ci/cd update to new version

Removed

Fixed

requests requirements for catalyst[cv] added (#1371)

loader step counter (#1374)

detection example data preprocessing (#1369)

gradient clipping with fp16 runs (#1378)

config API fix for DDP runs (#1383)

checkpoint creation for fp16 engines (#1382)

Contributors ❤️

@bagxi @ditwoo @MrNightSky @Nimrais @y-ksenia @sergunya17 @Thiefwerty @zkid18
v21.11(Nov 30, 2021)
[21.11] - 2021-11-30

Tl;dr

Framework architecture simplification and speedup + SSL & RecSys extensions.

Added

MultiVAE RecSys example (#1340)

Returned resume support - resolved #1193 (#1349)

Smoothing dice loss to contrib (#1344)

profile flag for runner.train (#1348)

MultiDAE RecSys example (#1356)

SETTINGS.log_batch_metrics, SETTINGS.log_epoch_metrics, SETTINGS.compute_per_class_metrics for framework-wise Metric & Logger APIs specification (#1357)

log_batch_metrics and log_epoch_metrics options for all available Loggers (#1357)

compute_per_class_metrics option for all available multiclass/label metrics (#1357)

pytorch benchmark script and simplified MNIST (#1360)

Changed

A few framework simplifications were made (#1346):

catalyst-contrib scripts reduced to collect-env and project-embeddings only

catalyst-dl scripts reduced to run and tune only

transforms. prefix deprecated for Catalyst-based transforms

catalyst.tools moved to catalyst.extras

task-dependent extensions from catalyst.data moved to catalyst.contrib.data

catalyst.data.transforms moved to catalyst.contrib.data.transforms

Normalize, ToTensor transforms renamed to NormalizeImage, ImageToTensor

metric learning extensions moved to catalyst.contrib.data

catalyst.contrib moved to code-as-a-documentation development

catalyst[cv] and catalyst[ml] extensions moved to flatten architecture design; examples: catalyst.contrib.data.dataset_cv, catalyst.contrib.data.dataset_ml

catalyst.contrib moved to flatten architecture design; examples: catalyst.contrib.data, catalyst.contrib.datasets, catalyst.contrib.layers, catalyst.contrib.models, catalyst.contrib.optimizers, catalyst.contrib.schedulers

internal functionality moved to ***._misc modules

catalyst.utils.mixup moved to catalyst.utils.torch

catalyst.utils.numpy moved to catalyst.contrib.utils.numpy

default logging logic moved from "batch & epoch" to "epoch"-only to save computation time during logging; to respecify, please use:

SETTINGS.log_batch_metrics=True/False or os.environ["CATALYST_LOG_BATCH_METRICS"]

SETTINGS.log_epoch_metrics=True/False or os.environ["CATALYST_LOG_EPOCH_METRICS"]

default metrics computation moved from "per-class & aggregations" to "aggregations"-only to save computation time during logging; to respecify, please use:

SETTINGS.compute_per_class_metrics=True/False or os.environ["CATALYST_COMPUTE_PER_CLASS_METRICS"]

no transformations required for MNIST contrib dataset (#1360)
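
The logging and per-class switches listed above can be set either through environment variables or on the SETTINGS object. The sketch below shows both forms. Assumptions: the variables are read when Catalyst is imported, "1" is an accepted truthy value, and SETTINGS is importable from catalyst.settings; none of these are confirmed by this changelog, so check the settings module of your installed version.

```python
import os

# Assumption: Catalyst reads these variables at import time, so set them first.
# Assumption: "1" is parsed as True; the exact accepted values may differ.
os.environ["CATALYST_LOG_BATCH_METRICS"] = "1"
os.environ["CATALYST_LOG_EPOCH_METRICS"] = "1"
os.environ["CATALYST_COMPUTE_PER_CLASS_METRICS"] = "1"

try:
    # Assumed import path; only works when Catalyst is installed.
    from catalyst.settings import SETTINGS
except ImportError:
    SETTINGS = None

# Attribute form named in the changelog; guarded so older versions are untouched.
if SETTINGS is not None and hasattr(SETTINGS, "log_batch_metrics"):
    SETTINGS.log_batch_metrics = True
    SETTINGS.log_epoch_metrics = True
    SETTINGS.compute_per_class_metrics = True
```

The environment-variable form is convenient for CI or launcher scripts, while the attribute form keeps the choice next to the training code.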

Removed

A few framework simplifications were made (#1346):

catalyst.contrib.pandas

catalyst.contrib.parallel

catalyst.contrib.models.cv

a few catalyst.utils.misc functions

catalyst.extras removed from the public documentation

Fixed

documentation search error (21.10 only) (#1346)

docs examples (#1362)

Self-Supervised benchmark: (#1365), (#1361)

Contributors ❤️

@asteyo @Dokholyan @Nimrais @y-ksenia @sergunya17
v21.10(Oct 30, 2021)
[21.10] - 2021-10-30

Tl;dr

Readmes and tutorials with a few ddp fixes.

Added

RSquareLoss (#1313)

Self-Supervised example updates: (#1305), (#1322), (#1325), (#1335)

Albert training example (#1326)

YOLO-X (new) detection example and refactoring (#1324)

TopKMetric abstraction (#1330)

Changed

simplified readme (#1312)

improved DDP tutorial (#1327)

CMCMetric renamed from <prefix>cmc<suffix><k> to <prefix>cmc<k><suffix> (#1330)
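
The CMCMetric key change above only moves where k sits in the metric name. A small sketch of the two patterns follows; the helper names, the example prefix and suffix, and the two-digit zero padding (modeled on keys such as accuracy01) are assumptions for illustration, not Catalyst code.

```python
def cmc_key_old(prefix: str, suffix: str, k: int) -> str:
    """Old pattern: <prefix>cmc<suffix><k> (hypothetical helper)."""
    return f"{prefix}cmc{suffix}{k:02d}"


def cmc_key_new(prefix: str, suffix: str, k: int) -> str:
    """New pattern: <prefix>cmc<k><suffix> (hypothetical helper)."""
    return f"{prefix}cmc{k:02d}{suffix}"


# Hypothetical prefix and suffix values, chosen only to make the order visible.
old_key = cmc_key_old("reid_", "_score", 5)
new_key = cmc_key_new("reid_", "_score", 5)
```

Code that reads metrics by key (for example, a checkpoint or early-stopping metric name) would need the new spelling after upgrading.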

Removed

Fixed

Zero seed error (#1329)

updated codestyle issues (#1331)

TopK metrics: (#1330), (#1334), (#1339)

--expdir param for catalyst-dl run (#1338)

ControlFlowCallback for distributed setup (#1341)

v21.09(Sep 30, 2021)
[21.09] - 2021-09-30

Added

CometLogger support (#1283)

CometLogger examples (#1287)

XLA docs (#1288)

Contrastive loss functions: NTXentLoss (#1278), SupervisedContrastiveLoss (#1293)

Self supervised learning: ISelfSupervisedRunner, SelfSupervisedConfigRunner, SelfSupervisedRunner, SelfSupervisedDatasetWrapper (#1278)

SimCLR example (#1278)

Supervised Contrastive example (#1293)

extra warnings for runner-callbacks interaction (#1295)

CategoricalRegressionLoss and QuantileRegressionLoss to the contrib (#1295)

R2 score metric (#1274)

Changed

Improved WandbLogger to support artifacts and fix logging steps (#1309)

full Runner cleanup, with callbacks and loaders destruction, moved to PipelineParallelFairScaleEngine only (#1295)

HuberLoss renamed to HuberLossV0 for the PyTorch compatibility (#1295)

codestyle update (#1298)

BalanceBatchSampler - deprecated (#1303)

Removed

Fixed

CI/CD (#1292), (#1299), (#1304), (#1306)

Optuna configs (#1296), (#1296)

Contributors ❤️

@asteyo @AyushExel @bagxi @DN6 @gr33n-made @Nimrais @Podidiving @y-ksenia
v21.09rc1(Sep 27, 2021)

v21.09rc0(Sep 27, 2021)

Hi guys, nice project!

This is the test case release to check out our updated infrastructure.
v21.08(Aug 31, 2021)
[21.08] - 2021-08-31

Added

RecSys loss functions: AdaptiveHingeLoss, BPRLoss, HingeLoss, LogisticLoss, RocStarLoss, WARPLoss (#1269, #1282)

object detection examples (#1271)

SklearnModelCallback (#1261)

Barlow Twins example (#1261)

TPU/XLA support (#1275)

with updated example

native sync_bn support for all available engines (#1275)

Torch, AMP, Apex, FairScale

Changed

Registry moved to hydra-slayer (#1264)

(#1275)

batch metrics sync removed from ddp-runs to speedup training process

AccumulationMetric renamed to AccumulativeMetric

moved from catalyst.metrics._metric to catalyst.metrics._accumulative

accumulative_fields renamed to keys

Removed

Fixed

PeriodicLoaderCallback docstring (#1279)

matplotlib issue (#1272)

sample counter for the loader (#1285)

Contributors ❤️

@bagxi @Casyfill @ditwoo @Nimrais @penguinflys @sergunya17 @zkid18
v21.07(Jul 29, 2021)
[21.07] - 2021-07-29

Added

added pre-commit hook to run codestyle checker on commit (#1257)

on publish github action for docker and docs added (#1260)

MixupCallback and utils.mixup_batch (#1241)

Barlow twins loss (#1259)

BatchBalanceClassSampler (#1262)

Changed

Removed

Fixed

make expdir in catalyst-dl run optional (#1249)

Bump neptune-client from 0.9.5 to 0.9.8 in requirements-neptune.txt (#1251)

automatic merge for master (with Mergify) fixed (#1250)

Evaluate loader custom model bug was fixed (#1254)

BatchPrefetchLoaderWrapper issue with batch-based PyTorch samplers (#1262)

Adapted MlflowLogger for new config hierarchy (#1263)

Contributors ❤️

@AlekseySh @bagxi @Casyfill @Dokholyan @leoromanovich @Nimrais @y-ksenia
v21.06(Jun 29, 2021)
[21.06] - 2021-06-29

Added

(#1230)

FairScale support

DeepSpeed support

utils.ddp_sync_run function for synchronous ddp run

CIFAR10 and CIFAR100 datasets from torchvision (no cv-based requirements)

Catalyst Engines demo

dataset_from_params support in config API (#1231)

transform from params support for config API added (#1236)

samplers from params support for config API added (#1240)

recursive registry.get_from_params added (#1241)

albumentations integration (#1238)

Profiler callback (#1226)

Changed

(#1230)

loaders creation is now wrapped with utils.ddp_sync_run for synchronized data preparation

runner supports stage cleanup: loaders and callbacks will be deleted on the stage end

Apex-based engines now support both APEXEngine and ApexEngine registry names

Fixed

multiprocessing in minimal tests hotfix (#1232)

Tracing callback hotfix (#1234)

Engine hotfix for predict_loader (#1235)

(#1230)

Hydra hotfix due to 1.1.0 version changes

HuberLoss name conflict for pytorch 1.9 hotfix (#1239)

Contributors ❤️

@bagxi @y-ksenia @ditwoo @BorNick @Inkln
v21.05(May 31, 2021)
[21.05] - 2021-05-31

Added

Reinforcement learning tutorials (#1205)

customization demo (#1207)

FAQ docs: multiple input and output keys, engine tutorial (#1202)

minimal Config API example (#1215)

Distributed RL example (Catalyst.RL 2.0 concepts) (#1224)

SklearnCallback as integration of sklearn metrics (#1198)

Changed

tests moved to tests folder (#1208)

pipeline tests moved to tests/pipelines (#1215)

updated NeptuneLogger docstrings (#1223)

Removed

Fixed

customizing what happens in train() notebook (#1203)

transforms imports under catalyst.data (#1211)

change layerwise to layerwise_params (#1210)

add torch metrics support (#1195)

add Config API support for BatchTransformCallback (#1209)

BONUS: Catalyst workshop videos!
v21.04.2(Apr 30, 2021)
[21.04.2] - 2021-04-30

Added

Weights and Biases Logger (WandbLogger) (#1176)

Neptune Logger (NeptuneLogger) (#1196)

log_artifact method for logging arbitrary files like audio, video, or model weights to ILogger and IRunner (#1196)

v21.04.1(Apr 19, 2021)
a small hotfix for catalyst.contrib module

v21.04(Apr 17, 2021)
[21.04] - 2021-04-17

Added

Nifti Reader (NiftiReader) (#1151)

CMC score and callback for ReID task (ReidCMCMetric and ReidCMCScoreCallback) (#1170)

Market1501 metric learning datasets (Market1501MLDataset and Market1501QGDataset) (#1170)

extra kwargs support for Engines (#1156)

engines exception for unknown model type (#1174)

a few docs to the supported loggers (#1174)

Changed

TensorboardLogger switched from global_batch_step counter to global_sample_step one (#1174)

TensorboardLogger logs loader metric on_loader_end rather than on_epoch_end (#1174)

prefix renamed to metric_key for MetricAggregationCallback (#1174)

micro, macro and weighted aggregations renamed to _micro, _macro and _weighted (#1174)

BatchTransformCallback updated (#1153)
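
The TensorboardLogger change from global_batch_step to global_sample_step affects the x-axis of logged curves. The sketch below contrasts the two counters on a toy loader. The counter names come from the entries above; how Catalyst updates them internally is an assumption here.

```python
# Toy loader of 100 samples with batch_size=32: the last batch is smaller.
batch_sizes = [32, 32, 32, 4]

global_batch_step = 0   # assumed to grow by 1 per batch
global_sample_step = 0  # assumed to grow by the number of samples per batch

batch_axis, sample_axis = [], []
for batch_size in batch_sizes:
    global_batch_step += 1
    global_sample_step += batch_size
    batch_axis.append(global_batch_step)
    sample_axis.append(global_sample_step)
```

A sample-based axis keeps curves comparable across runs that use different batch sizes, which a batch-based axis does not.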

Removed

auto torch.sigmoid usage for metrics.AUCMetric and metrics.auc (#1174)

Fixed

hitrate calculation issue (#1155)

ILoader wrapper usage issue with Runner (#1174)

counters for ddp case (#1174)

v21.03.2(Mar 29, 2021)
Fixed

minimal requirements issue (#1147)

v21.03.1(Mar 28, 2021)
[21.03.1] - 2021-03-28

Added

Additive Margin SoftMax (AMSoftmax) (#1125)

Generalized Mean Pooling (GeM) (#1084)

Key-value support for CriterionCallback (#1130)

Engine configuration through cmd (#1134)

Extra utils for thresholds (#1134)

Added gradient clipping function to optimizer callback (#1124)

FactorizedLinear to contrib (#1142)

Extra init params for ConsoleLogger (#1142)

Tracing, Quantization, Onnx, Pruning Callbacks (#1127)

_key_value for schedulers in case of multiple optimizers fixed (#1146)

Changed

CriterionCallback now inherits from BatchMetricCallback (#1130)

united metrics computation logic

Removed

Config API deprecated parsing logic (#1142) (#1138)

Fixed

Data-Model device sync and Engine logic during runner.predict_loader (#1134)

BatchLimitLoaderWrapper logic for loaders with shuffle flag (#1136)

config description in the examples (#1142)

Config API deprecated parsing logic (#1142) (#1138)

RecSys metrics Top_k calculations (#1140)

v21.03(Mar 13, 2021)
The v20 is dead, long live the v21!

[21.03] - 2021-03-13 (#1095)

Added

Engine abstraction to support various hardware backends and accelerators: CPU, GPU, multi GPU, distributed GPU, TPU, Apex, and AMP half-precision training.

Logger abstraction to support various monitoring tools: console, tensorboard, MLflow, etc.

Trial abstraction to support various hyperoptimization tools: Optuna, Ray, etc.

Metric abstraction to support various machine learning metrics: classification, segmentation, RecSys and NLP.

Full support for Hydra API.

Full DDP support for Python API.

MLflow support for metrics logging.

United API for model post-processing: tracing, quantization, pruning, onnx-exporting.

United API for metrics: classification, segmentation, RecSys, and NLP with full DDP and micro/macro/weighted/etc aggregations support.
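
As a reminder of what the micro/macro/weighted aggregations mean, here is a framework-free sketch for precision computed from per-class counts. The function aggregate_precision and the toy counts are hypothetical; the key names follow the _micro, _macro, _weighted naming mentioned in 21.04, but this is not Catalyst's implementation.

```python
from typing import Dict, List


def aggregate_precision(
    tp: List[int], fp: List[int], support: List[int]
) -> Dict[str, float]:
    """Aggregate per-class precision three ways (hypothetical helper)."""
    per_class = [t / (t + f) if (t + f) > 0 else 0.0 for t, f in zip(tp, fp)]
    total_predicted = sum(tp) + sum(fp)
    return {
        # micro: pool the counts over classes, then compute one ratio
        "_micro": sum(tp) / total_predicted if total_predicted > 0 else 0.0,
        # macro: unweighted mean of the per-class values
        "_macro": sum(per_class) / len(per_class),
        # weighted: per-class values weighted by the number of true samples
        "_weighted": sum(p * s for p, s in zip(per_class, support)) / sum(support),
    }


# Two classes: true positives, false positives, and true samples per class.
result = aggregate_precision(tp=[8, 1], fp=[2, 1], support=[9, 3])
```

Here the per-class precisions are 0.8 and 0.5, so the three aggregations differ: micro pools counts, macro treats classes equally, and weighted favors the larger class.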

Changed

Experiment abstraction merged into Runner one.

Runner, SupervisedRunner, ConfigRunner, HydraRunner architectures and dependencies redesigned.

Internal settings and registry mechanisms refactored to be simpler, user-friendly and more extendable.

Bunch of Config API tests replaced with Python API and pytest.

Codestyle now supports up to 99 symbols per line :)

All callbacks/runners moved from contrib to the library core where possible.

Runner abstraction simplified to store only the current state of the experiment run: all validation logic was moved to the callbacks (this way, you can easily select the best model on several metrics simultaneously).

Runner.input and Runner.output merged into united Runner.batch storage for simplicity.

All metrics moved from catalyst.utils.metrics to catalyst.metrics.

All metrics now work on scores/metric-defined-input rather than logits (!).

Logging logic moved from Callbacks to appropriate Loggers.

KorniaCallbacks refactored to BatchTransformCallback.

Removed

Lots of unnecessary contrib extensions.

Transforms configuration support through Config API (could be returned in next releases).

Integrated Python cmd command for model pruning, swa, etc (should be returned in next releases).

CallbackOrder.Validation and CallbackOrder.Logging

All 2020 year backward compatibility fixes and legacy support.

Fixed

Docs rendering simplified.

LrFinderCallback.

Release docs, Python API minimal examples, Config/Hydra API example.
v21.01rc0(Jan 30, 2021)

v20.12(Dec 20, 2020)
[20.12] - 2020-12-20

Added

CSV Logger (#1005)

DrawMasksCallback (#999)

(#1002)

a few docs

(#998)

reciprocal_rank metric

unified recsys metrics preprocessing

(#1018)

readme examples for all supported metrics under catalyst.metrics

wrap_metric_fn_with_activation for model outputs wrapping with activation

extra tests for metrics

(#1039)

per_class=False option for metrics callbacks

PrecisionCallback, RecallCallback for multiclass problems

extra docs

Changed

docs update (#1000)

AMPOptimizerCallback and OptimizerCallback were merged (#1007)

(#1017)

fixed bug in SchedulerCallback

Log LRs and momentums for all param groups, not only for the first one

(#1002)

tensorboard, ipython, matplotlib, pandas, scikit-learn moved to optional requirements

PerplexityMetricCallback moved to catalyst.callbacks from catalyst.contrib.callbacks

PerplexityMetricCallback renamed to PerplexityCallback

catalyst.contrib.utils.confusion_matrix renamed to catalyst.contrib.utils.torch_extra

many parts of catalyst.data moved to catalyst.contrib.data

catalyst.data.scripts moved to catalyst.contrib.scripts

catalyst.utils, catalyst.data.utils and catalyst.contrib.utils restructured

ReaderSpec renamed to IReader

SupervisedExperiment renamed to AutoCallbackExperiment

gain functions renamed for dcg/ndcg metrics (#998)

(#1014)

requirements respecification: catalyst[cv], catalyst[dev], catalyst[log], catalyst[ml], catalyst[nlp], catalyst[tune]

settings respecification

extra tests for settings

contrib refactoring

iou and dice metrics moved to per-class computation (#1031)

Removed

(#1002)

KNNMetricCallback

sklearn mode for ConfusionMatrixLogger

catalyst.data.utils

unnecessary catalyst.tools.meters

todos for unnecessary docs

(#1014)

transformers-based contrib (too unstable)

(#1018)

ClasswiseIouCallback/ClasswiseJaccardCallback as deprecated ones (should be refactored in future releases)

Fixed

prevented modifying config during the experiment and runner initialization (#1004)

a few tests for RecSys MAP computation (#1018)

leave batch size the same for default distributed training (#1023)

(#1032)

Apex: now you can use apex for multiple models training

Apex: DataParallel is allowed for opt_level other than "O1"

v20.11(Dec 20, 2020)
[20.11] - 2020-11-12

Added

DCG, nDCG metrics (#881)

MAP calculations (#968)

hitrate calculations (#975)

extra functions for classification metrics (#966)

OneOf and OneOfV2 batch transforms (#951)

precision_recall_fbeta_support metric (#971)

Pruning tutorial (#987)

BatchPrefetchLoaderWrapper (#986)

DynamicBalanceClassSampler (#954)

Changed

update Catalyst version to 20.10.1 for tutorials (#967)

added link to dl-course (#967)

IRunner -> simplified IRunner (#984)

docs were restructured (#985)

set_global_seed moved from utils.seed to utils.misc (#986)

Removed

several deprecated tutorials (#967)

several deprecated functions from utils.misc (#986)

Fixed

BatchTransformCallback - add nn.Module transforms support (#951)

moved to contiguous view for accuracy computation (#982)

fixed torch warning on optimizer.py:140 (#979)

v20.10.1(Dec 20, 2020)
[20.10.1] - 2020-10-15

Added

MRR metrics calculation (#886)

docs for MetricCallbacks (#947)

SoftMax, CosFace, ArcFace layers to contrib (#939)

ArcMargin layer to contrib (#957)

AdaCos to contrib (#958)

Manual SWA to utils (#945)
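
For readers unfamiliar with the MRR metric added in this release, the following is a small worked example of mean reciprocal rank in plain Python. It shows the textbook definition only; the function names are hypothetical and the code is not taken from Catalyst.

```python
from typing import List, Sequence


def reciprocal_rank(ranked_relevance: Sequence[int]) -> float:
    """1 / position of the first relevant item, or 0.0 if none is relevant."""
    for position, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(queries: List[Sequence[int]]) -> float:
    """Average the reciprocal rank over all queries."""
    return sum(reciprocal_rank(q) for q in queries) / len(queries)


# Each row is one query: 1 marks a relevant item, in ranked order.
queries = [
    [0, 1, 0],  # first relevant item at position 2
    [1, 0, 0],  # first relevant item at position 1
    [0, 0, 0],  # no relevant item
]
mrr = mean_reciprocal_rank(queries)
```

The three queries contribute 1/2, 1, and 0, so the mean over the three is 0.5.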

Changed

fixed path to CHANGELOG.md file and added information about unit tests to PULL_REQUEST_TEMPLATE.md (#955)

catalyst-dl tune config specification - now optuna params are grouped under study_params (#947)

IRunner._prepare_for_stage logic moved to IStageBasedRunner.prepare_for_stage (#947)

now we create components in the following order: datasets/loaders, model, criterion, optimizer, scheduler, callbacks

MnistMLDataset and MnistQGDataset data split logic - now targets of the datasets are disjoint (#949)

architecture redesign (#953)

experiments, runners, callbacks grouped by primitives under catalyst.experiments/catalyst.runners/catalyst.callbacks respectively

settings and typing moved from catalyst.tools.* to catalyst.*

utils moved from catalyst.*.utils to catalyst.utils

swa moved to catalyst.utils (#963)

Removed

Fixed

AMPOptimizerCallback - fix grad clip fn support (#948)

removed deprecated docs types (#947) (#952)

docs for a few files (#952)

extra backward compatibility fixes (#963)

v20.09.1(Dec 20, 2020)
[20.09.1] - 2020-09-25

Added

Runner registry support for Config API (#936)

catalyst-dl tune command - Optuna with Config API integration for AutoML hyperparameters optimization (#937)

OptunaPruningCallback alias for OptunaCallback (#937)

AdamP and SGDP to catalyst.contrib.nn.criterion (#942)

Changed

Config API components preparation logic moved to utils.prepare_config_api_components (#936)

Removed

Fixed

Logging double logging :) (#936)

CMCCallback (#941)

v20.09(Dec 20, 2020)
[20.09] - 2020-09-07

Added

MovieLens dataset loader (#903)

force and bert-level keywords to catalyst-data text2embedding (#917)

OptunaCallback to catalyst.contrib (#915)

DynamicQuantizationCallback and catalyst-dl quantize script for fast quantization of your model (#890)

Multi-scheduler support for multi-optimizer case (#923)

Native mixed-precision training support (#740)

OptimizerCallback - flag use_fast_zero_grad for faster (and hacky) version of optimizer.zero_grad() (#927)

IOptimizerCallback, ISchedulerCallback, ICheckpointCallback, ILoggerCallback as core abstractions for Callbacks (#933)

flag USE_AMP for PyTorch AMP usage (#933)

Changed

Pruning moved to catalyst.dl (#933)

default USE_APEX changed to 0 (#933)

Removed

Fixed

autoresume option for Config API (#907)

a few issues with TF projector (#917)

batch sampler speed issue (#921)

add apex key-value optimizer support (#924)

runtime warning for PyTorch 1.6 (#920)

Apex syncbn usage (#920)

Catalyst dependency on system git (#922)

v20.08(Dec 20, 2020)
[20.08] - 2020-08-09

Added

CMCScoreCallback (#880)

kornia augmentations BatchTransformCallback (#862)

average_precision and mean_average_precision metrics (#883)

MultiLabelAccuracyCallback, AveragePrecisionCallback and MeanAveragePrecisionCallback callbacks (#883)

minimal examples for multiclass and multilabel classification (#883)

experimental TPU support (#893)

add Imagenette, Imagewoof, and Imagewang datasets (#902)

IMetricCallback, IBatchMetricCallback, ILoaderMetricCallback, BatchMetricCallback, LoaderMetricCallback abstractions (#897)

HardClusterSampler inbatch sampler (#888)

Changed

all registries merged to one catalyst.registry (#883)

mean_average_precision logic merged with average_precision (#897)

all imports moved to absolute (#905)

catalyst.contrib.data merged to catalyst.data (#905)

{breaking} Catalyst transform ToTensor was renamed to ImageToTensor (#905)

TracerCallback moved to catalyst.dl (#905)

ControlFlowCallback, PeriodicLoaderCallback moved to catalyst.core (#905)

Removed

average_accuracy and mean_average_accuracy metrics (#883)

MultiMetricCallback abstraction (#897)

Fixed

utils.tokenize_text typo with punctuation (#880)

ControlFlowCallback logic (#892)

docs (#897)

v20.07(Dec 20, 2020)
[20.07] - 2020-07-06

Added

log parameter to WandbLogger (#836)

hparams experiment property (#839)

add docs build on push to master branch (#844)

WrapperCallback and ControlFlowCallback (#842)

BatchOverfitCallback (#869)

overfit flag for Config API (#869)

InBatchSamplers: AllTripletsSampler and HardTripletsSampler (#825)

Changed

Renaming (#837)

SqueezeAndExcitation -> cSE

ChannelSqueezeAndSpatialExcitation -> sSE

ConcurrentSpatialAndChannelSqueezeAndChannelExcitation -> scSE

_MetricCallback -> IMetricCallback

dl.Experiment.process_loaders -> dl.Experiment._get_loaders

LRUpdater became an abstract class (#837)

calculate_confusion_matrix_from_arrays changed params order (#837)

dl.Runner.predict_loader uses _prepare_inner_state and cleans experiment (#863)

toml to the dependencies (#872)

Removed

crc32c dependency (#872)

Fixed

workflows/deploy_push.yml failed to push some refs (#864)

.dependabot/config.yml contained invalid details (#781)

LanguageModelingDataset (#841)

global_* counters in Runner (#858)

EarlyStoppingCallback considers first epoch as bad (#854)

annoying numpy warning (#860)

PeriodicLoaderCallback overwrites best state (#867)

OneCycleLRWithWarmup (#851)

v20.06(Jun 4, 2020)
[20.06] - 2020-06-04

Added

Mergify (#831)

PerplexityMetricCallback (#819)

PeriodicLoaderRunnerCallback (#818)

Changed

docs structure was updated (#822)

utils.process_components moved from utils.distributed to utils.components (#822)

catalyst.core.state.State merged to catalyst.core.runner._Runner (#823) (backward compatibility included)

catalyst.core.callback.Callback now works directly with catalyst.core.runner._Runner

state_kwargs renamed to stage_kwargs

Removed

Fixed

added missing dashes in docker prefixes (#828)

[20.05.1] - 2020-05-23

Added

Circle loss implementation (#802)

BatchBalanceSampler for metric learning and classification (#806)

CheckpointCallback: new argument load_on_stage_start which accepts str and Dict[str, str] (#797)

LanguageModelingDataset to catalyst[nlp] (#808)

Extra counters for batches, loaders and epochs (#809)

TracerCallback (#789)

Changed

CheckpointCallback: additional logic for argument load_on_stage_end - accepts str and Dict[str, str] (#797)

counters names for batches, loaders and epochs (#809)

utils.trace_model: changed logic - runner argument was changed to predict_fn (#789)

redesigned contrib.data and contrib.datasets (#820)

catalyst.utils.meters moved to catalyst.tools (#820)

catalyst.contrib.utils.tools.tensorboard moved to catalyst.contrib.tools (#820)

Removed

Fixed

device selection fix for #798 (#815)

batch size counting fix for #799 and #755 issues (#809)

v20.06.rc1(Jun 1, 2020)



Error in Multi Criterion Training

Hi, I am trying to train a model using multi-criterion. Part of code for computing the loss is shown above. Doing so I am getting the following error.

🐛 Bug Report

How To Reproduce

Code sample