A library built upon PyTorch for building embeddings on discrete event sequences using self-supervision

Last update: Dec 17, 2022


Overview

pytorch-lifestream is a library built upon PyTorch for building embeddings on discrete event sequences using self-supervision. It can process terabyte-size volumes of raw events such as game history events, clickstream data, purchase history, or card transactions.

It supports various methods of self-supervised training, adapted for event sequences:

Contrastive Learning for Event Sequences (CoLES)
Contrastive Predictive Coding (CPC)
Replaced Token Detection (RTD) from ELECTRA
Next Sequence Prediction (NSP) from BERT
Sequences Order Prediction (SOP) from ALBERT
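CoLES builds positive pairs by taking several sub-sequences of the same event sequence, while sub-sequences of other sequences in the batch act as negatives. The sampling step can be sketched in plain Python as below. `sample_slices` and its parameters are hypothetical names chosen for illustration, not the ptls API, and real training operates on tensors of event features rather than a list of ints.

```python
import random


def sample_slices(seq, n_slices, min_len, max_len, rng=None):
    """Return n_slices random contiguous sub-sequences of seq.

    Hypothetical helper (not part of ptls): every slice comes from the same
    sequence, so any two of them form a CoLES-style positive pair.
    """
    rng = rng or random.Random()
    max_len = min(max_len, len(seq))  # a slice cannot be longer than the sequence
    if max_len < min_len:
        raise ValueError("sequence is shorter than min_len")
    slices = []
    for _ in range(n_slices):
        length = rng.randint(min_len, max_len)     # inclusive bounds
        start = rng.randint(0, len(seq) - length)  # any valid start offset
        slices.append(seq[start:start + length])
    return slices


events = [5, 3, 8, 1, 9, 2, 7, 4]
positives = sample_slices(events, n_slices=3, min_len=3, max_len=5, rng=random.Random(0))
for s in positives:
    print(s)
```

Slices of different sequences in the same batch would then serve as negatives for the contrastive losses listed below.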

It supports several types of encoders, including Transformer and RNN. It also supports many types of self-supervised losses.

The following variants of the contrastive losses are supported:

Contrastive loss (paper)
Triplet loss (paper)
Binomial deviance loss (paper)
Histogram loss (paper)
Margin loss (paper)
VICReg loss (paper)
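As an illustration of the first item, the classic margin-based contrastive loss penalizes the squared distance d^2 between embeddings of the same sequence and max(0, m - d)^2 between embeddings of different sequences, where m is the margin. The sketch below averages this over all pairs in a small batch, assuming Euclidean distance and plain Python lists. The function name and the `margin` default are assumptions; ptls's own loss classes work on tensors and may select pairs differently.

```python
import math
from itertools import combinations


def contrastive_loss(embeddings, labels, margin=1.0):
    """Mean margin-based contrastive loss over all pairs in a batch.

    embeddings: list of equal-length float vectors
    labels: sequence ids; equal ids mark a positive pair
    """
    losses = []
    for i, j in combinations(range(len(embeddings)), 2):
        d = math.dist(embeddings[i], embeddings[j])   # Euclidean distance
        if labels[i] == labels[j]:
            losses.append(d ** 2)                     # pull positives together
        else:
            losses.append(max(0.0, margin - d) ** 2)  # push negatives beyond the margin
    return sum(losses) / len(losses)


batch = [[0.0, 0.0], [0.0, 1.0], [0.0, 3.0]]
ids = [0, 0, 1]  # the first two vectors are slices of the same sequence
print(contrastive_loss(batch, ids, margin=2.0))
```

Here only the positive pair contributes, because both negatives already lie at least `margin` apart.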

Install from PyPI

pip install pytorch-lifestream

Install from source

# Ubuntu 20.04

sudo apt install python3.8 python3-venv
pip3 install pipenv

pipenv sync --dev  # install packages exactly as specified in Pipfile.lock
pipenv shell
pytest

Demo notebooks

Self-supervised training and embeddings for downstream task notebook
Self-supervised embeddings in CatBoost notebook
Self-supervised training and fine-tuning notebook
PySpark and Parquet for data preprocessing notebook

Experiments on public datasets

Experiments using pytorch-lifestream on several public event datasets are available in a separate repo.

Comments

torch.stack in def collate_feature_dict

ptls/data_load/utils.py

Hello!

If the dataloader has a feature called target and the dataset length is not a multiple of the batch size, the last batch fails with the error "Sizes of tensors must match except in dimension 0". This is caused by the use of torch.stack when processing features whose names start with 'target'.

opened by Ivanich-spb 11
Multi-GPU option of pytorch_lightning.Trainer is not supported

When I set Trainer(gpus=[0,1]) while using PtlsDataModule as the data module, I get this error:

AttributeError: Can't pickle local object 'PtlsDataModule.__init__.<locals>.train_dataloader'

opened by mazitovs 1
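The traceback points at pickling: when a multi-GPU strategy starts worker processes by spawning them (which this error suggests), the data module is serialized with pickle, and pickle cannot serialize a function defined inside another function, such as a `train_dataloader` created in `__init__`. The contrast is reproduced below with two hypothetical classes; neither is the real `PtlsDataModule`, and whether moving the factories to regular methods is the right fix inside ptls is an assumption.

```python
import pickle


class ClosureDataModule:
    """Hypothetical stand-in that builds its dataloader factory as a closure in __init__."""

    def __init__(self, batch_size):
        def train_dataloader():
            return list(range(batch_size))
        self.train_dataloader = train_dataloader  # local function: not picklable


class MethodDataModule:
    """Hypothetical stand-in that stores plain data and defines a regular method."""

    def __init__(self, batch_size):
        self.batch_size = batch_size  # only picklable state is kept on the instance

    def train_dataloader(self):
        return list(range(self.batch_size))


def is_picklable(obj):
    """Return True if pickle can serialize obj, as spawned worker processes would need."""
    try:
        pickle.dumps(obj)
        return True
    except (AttributeError, TypeError, pickle.PicklingError) as err:
        print(f"{type(obj).__name__}: {err}")
        return False


print(is_picklable(ClosureDataModule(4)))
print(is_picklable(MethodDataModule(4)))
```

On the Python versions that produce the message quoted in this issue, the first call prints a "Can't pickle local object 'ClosureDataModule.__init__.<locals>.train_dataloader'" error of the same shape.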
Correct seq_len for feature dict
rec = { 'mcc': [0, 1, 2, 3], 'target_distribution': [0.1, 0.2, 0.4, 0.1, 0.1, 0.0], }

How do we get the correct seq_len? The true length is 4, but the possible lengths are 4 and 6. 'target_distribution' is the wrong field to take the length from: it is not a sequence, it is an array.
opened by ivkireev86 1
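One way to resolve this is to take the length only from fields that are sequential and to ignore fields whose names mark them as targets. A minimal sketch, assuming that every name starting with 'target' denotes a per-record value and that sequences are plain Python lists; `infer_seq_len` is a hypothetical helper, not the function ptls ships.

```python
def infer_seq_len(rec, skip_prefixes=("target",)):
    """Infer the sequence length of a feature dict.

    Hypothetical helper: fields whose names start with one of skip_prefixes
    are treated as per-record targets (scalars or fixed-size arrays), not
    as event sequences, so they are ignored.
    """
    lengths = {
        name: len(value)
        for name, value in rec.items()
        if isinstance(value, (list, tuple)) and not name.startswith(skip_prefixes)
    }
    if not lengths:
        raise ValueError("no sequential features found")
    distinct = set(lengths.values())
    if len(distinct) != 1:
        raise ValueError(f"sequential features disagree on length: {lengths}")
    return distinct.pop()


rec = {
    'mcc': [0, 1, 2, 3],
    'target_distribution': [0.1, 0.2, 0.4, 0.1, 0.1, 0.0],
}
print(infer_seq_len(rec))  # 4
```

Raising on disagreeing lengths, rather than picking one, surfaces malformed records early.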
Save category encodings along with model weights in demos

The fitted preprocessor and the train/test split must be saved together with the trained model. Otherwise the category encodings may shift, and the saved pretrained model becomes useless.

opened by ivkireev86 1
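The risk is that category ids depend on the data the preprocessor was fitted on, so refitting later can map the same category to a different id than the one the model was trained with. The sketch below uses a hypothetical first-appearance encoder and JSON persistence to show the effect. It covers only the encodings, not the train/test split, and the file name and helpers are illustrative assumptions rather than how ptls preprocessors store their state.

```python
import json
import os
import tempfile


def fit_category_encoder(values):
    """Map each category to an integer id in order of first appearance.

    Hypothetical stand-in for a fitted preprocessor; id 0 is reserved
    for categories unseen during fitting.
    """
    mapping = {}
    for v in values:
        if v not in mapping:
            mapping[v] = len(mapping) + 1
    return mapping


def encode(values, mapping):
    return [mapping.get(v, 0) for v in values]


encoder = fit_category_encoder(["grocery", "fuel", "grocery", "cafe"])

# Persist the encoder next to the model weights (the file name is illustrative).
out_dir = tempfile.mkdtemp()
path = os.path.join(out_dir, "category_encoder.json")
with open(path, "w") as f:
    json.dump(encoder, f)

# Refitting on differently ordered data silently changes the ids ...
refit = fit_category_encoder(["cafe", "fuel", "grocery"])
# ... while the saved encoder reproduces the ids the model was trained on.
with open(path) as f:
    restored = json.load(f)

print(encode(["fuel", "cafe"], encoder))   # [2, 3]
print(encode(["fuel", "cafe"], refit))     # [2, 1]
print(encode(["fuel", "cafe"], restored))  # [2, 3]
```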
Documentation index
A prototype of the main documentation page. Three sections:

a description of the library's models

a guide on how to use the library

how to write your own components

Each section has a short description and links to detailed pages (which we will write later).

The module description proposes a structure for the library. The plan is to create these modules in the near future and move the corresponding classes from the library into them. Old modules that become empty will be removed. From then on, we will follow the layout described in this document.

For review: please check the proposed library structure, the module names, and the descriptive text of the document itself.
opened by ivkireev86 1
KL cyclostationarity test tools

The test provides a histogram of self-sample similarity vs. random-sample similarity, which shows whether the data is compatible with CoLES.

Consider similar tests for other frameworks.

opened by ivkireev86 0
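A rough sketch of such a check, assuming each sequence is already represented by embeddings of several of its slices: for every sequence it compares two of its own slices (self-similarity) and one of its slices against a slice of another sequence (random similarity), then bins both sets of cosine similarities. All names are hypothetical and the toy vectors stand in for encoder outputs. Presumably, if self-similarities do not sit clearly to the right of random ones, the same-sequence positives that CoLES relies on carry little signal.

```python
import math
import random


def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def histogram(values, bins=10, lo=-1.0, hi=1.0):
    """Count values into equal-width bins over [lo, hi]."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        idx = int((v - lo) / width)
        counts[max(0, min(idx, bins - 1))] += 1  # clamp edge values into range
    return counts


def self_vs_random_similarity(slice_embeddings, rng):
    """slice_embeddings: dict mapping sequence id -> embeddings of its slices (>= 2 each)."""
    ids = list(slice_embeddings)
    self_sims, random_sims = [], []
    for sid in ids:
        a, b = rng.sample(slice_embeddings[sid], 2)       # two slices of one sequence
        other = rng.choice([o for o in ids if o != sid])  # a different sequence
        self_sims.append(cosine(a, b))
        random_sims.append(cosine(a, rng.choice(slice_embeddings[other])))
    return self_sims, random_sims


toy = {
    0: [[1.0, 0.0], [1.0, 0.1]],
    1: [[0.0, 1.0], [0.1, 1.0]],
    2: [[-1.0, 0.0], [-1.0, -0.1]],
}
self_sims, random_sims = self_vs_random_similarity(toy, random.Random(0))
print("self:  ", histogram(self_sims, bins=4))
print("random:", histogram(random_sims, bins=4))
```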
Repair pyspark tests
def test_dt_to_timestamp():
    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame(data=[
        {'dt': '1970-01-01 00:00:00'},
        {'dt': '2012-01-01 12:01:16'},
        {'dt': '2021-12-30 00:00:00'}
    ])

    df = df.withColumn('ts', dt_to_timestamp('dt'))
    ts = [rec.ts for rec in df.select('ts').collect()]

    assert ts == [0, 1325419276, 1640822400]

E   assert [-10800, 1325...6, 1640811600] == [0, 1325419276, 1640822400]
E     At index 0 diff: -10800 != 0
E     Use -v to get more diff

ptls_tests/test_preprocessing/test_pyspark/test_event_time.py:16: AssertionError

def test_datetime_to_timestamp():
    t = DatetimeToTimestamp(col_name_original='dt')
    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame(data=[
        {'dt': '1970-01-01 00:00:00', 'rn': 1},
        {'dt': '2012-01-01 12:01:16', 'rn': 2},
        {'dt': '2021-12-30 00:00:00', 'rn': 3}
    ])
    df = t.fit_transform(df)
    et = [rec.event_time for rec in df.select('event_time').collect()]

    assert et[0] == 0

E   assert -10800 == 0

ptls_tests/test_preprocessing/test_pyspark/test_event_time.py:48: AssertionError
opened by ikretus 0
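Every failing value is exactly 10800 seconds (3 hours) below the expected one, which suggests the naive datetime strings were interpreted in a UTC+3 session time zone rather than in UTC. The expected and observed numbers can be reproduced with the Python standard library alone; `to_unix` is a hypothetical helper, and Spark is not involved here.

```python
from datetime import datetime, timedelta, timezone

FMT = "%Y-%m-%d %H:%M:%S"
dts = ["1970-01-01 00:00:00", "2012-01-01 12:01:16", "2021-12-30 00:00:00"]


def to_unix(dt_str, tz):
    """Interpret a naive datetime string in time zone tz and return Unix seconds."""
    return int(datetime.strptime(dt_str, FMT).replace(tzinfo=tz).timestamp())


utc = [to_unix(s, timezone.utc) for s in dts]
utc_plus_3 = [to_unix(s, timezone(timedelta(hours=3))) for s in dts]

print(utc)         # [0, 1325419276, 1640822400]  values the tests expect
print(utc_plus_3)  # [-10800, 1325408476, 1640811600]  values the tests got
```

If this is the cause, setting the session time zone explicitly, for example with SparkSession.builder.config("spark.sql.session.timeZone", "UTC").getOrCreate(), should make the tests independent of the machine's local time zone.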
docs. Development guide (for demo notebooks)
Add the current patterns:

When model training starts, print the message "Model training started, please wait. See TensorBoard to track progress." Use it together with enable_progress=False.

documentation user feedback
opened by ivkireev86 0

Releases(v0.5.1)

v0.5.1(Dec 28, 2022)
What's Changed

fixed cpc import by @ArtyomVorobev in https://github.com/dllllb/pytorch-lifestream/pull/90

add softmaxloss and tests by @ArtyomVorobev in https://github.com/dllllb/pytorch-lifestream/pull/87

MLM NSP Module by @mazitovs in https://github.com/dllllb/pytorch-lifestream/pull/88

fix test dropout error by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/91

New Contributors

@ArtyomVorobev made their first contribution in https://github.com/dllllb/pytorch-lifestream/pull/90

@mazitovs made their first contribution in https://github.com/dllllb/pytorch-lifestream/pull/88

Full Changelog: https://github.com/dllllb/pytorch-lifestream/compare/v0.5.0...v0.5.1
v0.5.0(Nov 9, 2022)
What's Changed

Fix metrics reset by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/72

Pandas preprocessing without df copy, faster preprocessing for large datasets by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/73

fix in supervised-sequence-to-target.ipynb by @blinovpd in https://github.com/dllllb/pytorch-lifestream/pull/74

ptls.nn.PBDropout by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/75

tanh for rnn starter by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/76

Auc regr metric by @ikretus in https://github.com/dllllb/pytorch-lifestream/pull/78

spatial dropout for NoisyEmbedding, LastMaxAvgEncoder, warning for bidir RnnEncoder by @justalge in https://github.com/dllllb/pytorch-lifestream/pull/80

Hparam tuning demo. hydra, optuna, tensorboard by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/81

tabformer by @justalge in https://github.com/dllllb/pytorch-lifestream/pull/83

Supervised Coles Module, trx_encoder refactoring by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/84

New Contributors

@blinovpd made their first contribution in https://github.com/dllllb/pytorch-lifestream/pull/74

Full Changelog: https://github.com/dllllb/pytorch-lifestream/compare/v0.4.0...v0.5.0
v0.4.0(Jul 27, 2022)
What's Changed

Seq encoder refactoring by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/29

regr.task ZILNLoss, RMSE, BucketAccuracy by @ikretus in https://github.com/dllllb/pytorch-lifestream/pull/36

lighting modules and nn layers refactoring by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/34

Demo colab by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/40

Fix drop target arrays by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/42

feature naming by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/43

Update abs_module.py by @justalge in https://github.com/dllllb/pytorch-lifestream/pull/37

Extended inference demo by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/45

fix import path by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/46

Experiments sync by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/50

Experiments sync by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/52

Target dist by @ikretus in https://github.com/dllllb/pytorch-lifestream/pull/58

Data load refactoring by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/60

doc update by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/62

doc update by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/63

New Contributors

@ikretus made their first contribution in https://github.com/dllllb/pytorch-lifestream/pull/36

Full Changelog: https://github.com/dllllb/pytorch-lifestream/compare/v0.3.0...v0.4.0

v0.3.0(Jun 12, 2022)
More Pythonic Core API: constructor arguments instead of config objects

What's Changed

cpc params by @justalge in https://github.com/dllllb/pytorch-lifestream/pull/9

All modules by @justalge in https://github.com/dllllb/pytorch-lifestream/pull/15

Mlm pretrain by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/13

all encoders and get rid of get_loss by @justalge in https://github.com/dllllb/pytorch-lifestream/pull/19

init by @justalge in https://github.com/dllllb/pytorch-lifestream/pull/20

Documentation index by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/8

Demos api update by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/18

loss output correction by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/22

Test fixes by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/23

readme_demo_link by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/25

init by @justalge in https://github.com/dllllb/pytorch-lifestream/pull/26

work without logger by @justalge in https://github.com/dllllb/pytorch-lifestream/pull/7

trx_encoder refactoring by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/28

Full Changelog: https://github.com/dllllb/pytorch-lifestream/compare/v0.1.2...v0.3.0

Owner

Dmitri Babaev

