A PyTorch-based library for building embeddings of discrete event sequences using self-supervision

Overview

pytorch-lifestream is a PyTorch-based library for building embeddings of discrete event sequences using self-supervision. It can process terabyte-scale volumes of raw events such as game history events, clickstream data, purchase history, or card transactions.

It supports various methods of self-supervised training, adapted for event sequences:

  • Contrastive Learning for Event Sequences (CoLES)
  • Contrastive Predictive Coding (CPC)
  • Replaced Token Detection (RTD) from ELECTRA
  • Next Sequence Prediction (NSP) from BERT
  • Sequence Order Prediction (SOP) from ALBERT

It supports several types of encoders, including Transformer and RNN, and many types of self-supervised losses, including several variants of contrastive loss. A minimal sketch of the contrastive setup is shown below.
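
The sketch below illustrates the CoLES-style idea in plain PyTorch: encode two random "views" of each user's event sequence and pull together embeddings of the same user with an in-batch contrastive (InfoNCE-style) loss. The encoder and loss here are illustrative stand-ins, not the ptls API.

    # Minimal, library-agnostic sketch of contrastive pretraining on event sequences.
    # SimpleEventEncoder and in_batch_contrastive_loss are illustrative names, not ptls classes.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class SimpleEventEncoder(nn.Module):
        """Embeds a categorical event stream and pools it with a GRU."""
        def __init__(self, n_event_types, emb_dim=16, hidden=64):
            super().__init__()
            self.emb = nn.Embedding(n_event_types, emb_dim)
            self.rnn = nn.GRU(emb_dim, hidden, batch_first=True)

        def forward(self, events):          # events: (batch, time) int64
            _, h = self.rnn(self.emb(events))
            return h[-1]                    # (batch, hidden)

    def in_batch_contrastive_loss(z1, z2, temperature=0.1):
        """InfoNCE-style loss: row i of z1 should match row i of z2."""
        z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
        logits = z1 @ z2.t() / temperature  # (batch, batch) similarity matrix
        target = torch.arange(z1.size(0))
        return F.cross_entropy(logits, target)

    encoder = SimpleEventEncoder(n_event_types=100)
    view_a = torch.randint(0, 100, (8, 30))   # two random sub-sequences of the same 8 users
    view_b = torch.randint(0, 100, (8, 30))
    loss = in_batch_contrastive_loss(encoder(view_a), encoder(view_b))
    loss.backward()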

Install from PyPi

pip install pytorch-lifestream

Install from source

# Ubuntu 20.04
sudo apt install python3.8 python3-venv
pip3 install pipenv

git clone https://github.com/dllllb/pytorch-lifestream.git
cd pytorch-lifestream

pipenv sync --dev  # install packages exactly as specified in Pipfile.lock
pipenv shell
pytest             # run the test suite to verify the installation

Demo notebooks

  • Self-supervised training and embeddings for downstream tasks notebook
  • Self-supervised embeddings in CatBoost notebook
  • Self-supervised training and fine-tuning notebook
  • PySpark and Parquet for data preprocessing notebook

Experiments on public datasets

Experiments applying pytorch-lifestream to several public event datasets are available in a separate repository.

Comments
  • torch.stack in def collate_feature_dict

    ptls/data_load/utils.py

    Hello!

    If the dataloader has a feature called target and the batch size is not a multiple of the dataset length, an error is raised on the last batch: "Sizes of tensors must match except in dimension 0". This happens because torch.stack is used when processing features whose names start with 'target'. A generic reproduction of the failure mode is sketched below.
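
    The sketch below is not the ptls collate code itself; it only reproduces the general failure mode (torch.stack requires equally sized tensors) and shows padding as one possible workaround.

        import torch
        from torch.nn.utils.rnn import pad_sequence

        targets = [torch.tensor([1, 0, 1]), torch.tensor([0, 1])]  # per-sample target tensors of unequal length

        try:
            batch = torch.stack(targets)                 # fails: all tensors must have the same shape
        except RuntimeError as e:
            print(e)

        batch = pad_sequence(targets, batch_first=True)  # possible workaround: pad instead of stack
        print(batch.shape)                               # torch.Size([2, 3])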

    opened by Ivanich-spb 11
  • Not supported multiGPU option from pytorchlightning.Trainer

    Setting Trainer(gpus=[0, 1]) while using PtlsDataModule as the data module raises the following error (a possible workaround is sketched after the traceback):

    AttributeError: Can't pickle local object 'PtlsDataModule.__init__.<locals>.train_dataloader'
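
    Such pickling errors usually come from a launch strategy that must pickle the data module (e.g. ddp_spawn), since the dataloader methods are defined as local functions inside PtlsDataModule.__init__. A hedged workaround, assuming the PyTorch Lightning 1.x API, is to launch with the script-based ddp strategy instead; whether this fully resolves the ptls case is an assumption.

        import pytorch_lightning as pl

        # "ddp" launches separate processes from the script and does not need to
        # pickle locally defined dataloaders, unlike "ddp_spawn".
        trainer = pl.Trainer(gpus=[0, 1], strategy="ddp")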

    opened by mazitovs 1
  • Correct seq_len for feature dict

        rec = {
            'mcc': [0, 1, 2, 3],
            'target_distribution': [0.1, 0.2, 0.4, 0.1, 0.1, 0.0],
        }

    How can we get the correct seq_len? The true length is 4, but both 4 and 6 are possible from the fields above. 'target_distribution' is the wrong field to derive the length from: it is not a sequence, it is a fixed-size array. An illustrative helper is sketched below.
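
    A minimal sketch (not ptls API): derive seq_len only from true sequence fields, skipping array-like fields such as those prefixed with 'target'.

        def get_seq_len(rec, non_seq_prefixes=('target',)):
            seq_fields = {k: v for k, v in rec.items()
                          if not k.startswith(non_seq_prefixes)}
            lengths = {len(v) for v in seq_fields.values()}
            assert len(lengths) == 1, f'inconsistent sequence lengths: {lengths}'
            return lengths.pop()

        rec = {
            'mcc': [0, 1, 2, 3],
            'target_distribution': [0.1, 0.2, 0.4, 0.1, 0.1, 0.0],
        }
        print(get_seq_len(rec))  # 4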

    opened by ivkireev86 1
  • Save categories encodings along with model weights in demos

    The fitted preprocessor and the train/test split should be saved together with the trained model. Otherwise the category encodings may drift, and the saved pretrained model becomes useless.

    opened by ivkireev86 1
  • Documentation index

    A prototype of the documentation landing page. Three sections:

    • a description of the library's models
    • a guide on how to use the library
    • how to write your own components

    Each section has a short description and links to detailed pages (to be written later).

    The module descriptions propose a structure for the library. The idea is to create these modules soon and move the corresponding classes into them; old modules that become empty will be removed. From then on we will follow the scheme described in this document.

    For review: please check the proposed library structure, the module names, and the descriptive text of the document itself.

    opened by ivkireev86 1
  • KL cyclostationarity test tools

    The test produces a histogram comparing self-sample similarity against random-sample similarity and shows compatibility with CoLES; an illustrative sketch of that comparison follows below.

    Think about tests for other frameworks.
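
    A minimal sketch of the underlying comparison (illustrative only, not the proposed ptls tool): cosine similarity of embedding pairs from the same sequence vs. random pairs, whose distributions can then be plotted as two histograms.

        import numpy as np

        def cosine(a, b):
            return (a * b).sum(-1) / (np.linalg.norm(a, axis=-1) * np.linalg.norm(b, axis=-1))

        rng = np.random.default_rng(0)
        emb_a = rng.normal(size=(1000, 64))                 # embeddings of view 1, one row per user
        emb_b = emb_a + 0.1 * rng.normal(size=(1000, 64))   # embeddings of view 2 of the same users
        self_sim = cosine(emb_a, emb_b)                     # same-user pairs
        rand_sim = cosine(emb_a, emb_b[rng.permutation(1000)])  # random pairs

        # The two similarity distributions should separate if the embeddings capture identity.
        print(self_sim.mean(), rand_sim.mean())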

    opened by ivkireev86 0
  • Repair pyspark tests

        def test_dt_to_timestamp():
            spark = SparkSession.builder.getOrCreate()
            df = spark.createDataFrame(data=[
                {'dt': '1970-01-01 00:00:00'},
                {'dt': '2012-01-01 12:01:16'},
                {'dt': '2021-12-30 00:00:00'},
            ])

            df = df.withColumn('ts', dt_to_timestamp('dt'))
            ts = [rec.ts for rec in df.select('ts').collect()]

            assert ts == [0, 1325419276, 1640822400]

    E   assert [-10800, 1325...6, 1640811600] == [0, 1325419276, 1640822400]
    E   At index 0 diff: -10800 != 0
    E   Use -v to get more diff

    ptls_tests/test_preprocessing/test_pyspark/test_event_time.py:16: AssertionError


        def test_datetime_to_timestamp():
            t = DatetimeToTimestamp(col_name_original='dt')
            spark = SparkSession.builder.getOrCreate()
            df = spark.createDataFrame(data=[
                {'dt': '1970-01-01 00:00:00', 'rn': 1},
                {'dt': '2012-01-01 12:01:16', 'rn': 2},
                {'dt': '2021-12-30 00:00:00', 'rn': 3},
            ])
            df = t.fit_transform(df)
            et = [rec.event_time for rec in df.select('event_time').collect()]

            assert et[0] == 0

    E   assert -10800 == 0

    ptls_tests/test_preprocessing/test_pyspark/test_event_time.py:48: AssertionError
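
    The -10800 offset suggests the local UTC+3 timezone is leaking into the datetime-to-timestamp conversion. One way to make such tests deterministic (a standard Spark setting, not necessarily the fix adopted in ptls) is to pin the session timezone to UTC:

        from pyspark.sql import SparkSession

        spark = SparkSession.builder.getOrCreate()
        spark.conf.set('spark.sql.session.timeZone', 'UTC')  # make timestamp conversion timezone-independent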

    opened by ikretus 0
  • docs. Development guide (for demo notebooks)

    • add current patterns
    • when model training starts, print the message "model training started, please wait. See TensorBoard to track progress"; use it with enable_progress=False (a minimal sketch follows below)

    Labels: documentation, user feedback
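
    A minimal sketch of the suggested pattern, assuming the standard PyTorch Lightning flag enable_progress_bar (the issue's enable_progress name is kept as written above):

        import pytorch_lightning as pl

        print('Model training started, please wait. See TensorBoard to track progress.')
        trainer = pl.Trainer(max_epochs=10, enable_progress_bar=False)
        # trainer.fit(model, datamodule=dm)  # model and datamodule defined elsewhere
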
    opened by ivkireev86 0
Releases (v0.5.1)
  • v0.5.1 (Dec 28, 2022)

    What's Changed

    • fixed cpc import by @ArtyomVorobev in https://github.com/dllllb/pytorch-lifestream/pull/90
    • add softmaxloss and tests by @ArtyomVorobev in https://github.com/dllllb/pytorch-lifestream/pull/87
    • MLM NSP Module by @mazitovs in https://github.com/dllllb/pytorch-lifestream/pull/88
    • fix test dropout error by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/91

    New Contributors

    • @ArtyomVorobev made their first contribution in https://github.com/dllllb/pytorch-lifestream/pull/90
    • @mazitovs made their first contribution in https://github.com/dllllb/pytorch-lifestream/pull/88

    Full Changelog: https://github.com/dllllb/pytorch-lifestream/compare/v0.5.0...v0.5.1

  • v0.5.0 (Nov 9, 2022)

    What's Changed

    • Fix metrics reset by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/72
    • Pandas preprocessing without df copy, faster preprocessing for large datasets by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/73
    • fix in supervised-sequence-to-target.ipynb by @blinovpd in https://github.com/dllllb/pytorch-lifestream/pull/74
    • ptls.nn.PBDropout by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/75
    • tanh for rnn starter by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/76
    • Auc regr metric by @ikretus in https://github.com/dllllb/pytorch-lifestream/pull/78
    • spatial dropout for NoisyEmbedding, LastMaxAvgEncoder, warning for bidir RnnEncoder by @justalge in https://github.com/dllllb/pytorch-lifestream/pull/80
    • Hparam tuning demo. hydra, optuna, tensorboard by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/81
    • tabformer by @justalge in https://github.com/dllllb/pytorch-lifestream/pull/83
    • Supervised Coles Module, trx_encoder refactoring by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/84

    New Contributors

    • @blinovpd made their first contribution in https://github.com/dllllb/pytorch-lifestream/pull/74

    Full Changelog: https://github.com/dllllb/pytorch-lifestream/compare/v0.4.0...v0.5.0

  • v0.4.0 (Jul 27, 2022)

    What's Changed

    • Seq encoder refactoring by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/29
    • regr.task ZILNLoss, RMSE, BucketAccuracy by @ikretus in https://github.com/dllllb/pytorch-lifestream/pull/36
    • lighting modules and nn layers refactoring by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/34
    • Demo colab by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/40
    • Fix drop target arrays by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/42
    • feature naming by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/43
    • Update abs_module.py by @justalge in https://github.com/dllllb/pytorch-lifestream/pull/37
    • Extended inference demo by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/45
    • fix import path by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/46
    • Experiments sync by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/50
    • Experiments sync by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/52
    • Target dist by @ikretus in https://github.com/dllllb/pytorch-lifestream/pull/58
    • Data load refactoring by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/60
    • doc update by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/62
    • doc update by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/63

    New Contributors

    • @ikretus made their first contribution in https://github.com/dllllb/pytorch-lifestream/pull/36

    Full Changelog: https://github.com/dllllb/pytorch-lifestream/compare/v0.3.0...v0.4.0

  • v0.3.0 (Jun 12, 2022)

    More Pythonic Core API: constructor arguments instead of config objects

    What's Changed

    • cpc params by @justalge in https://github.com/dllllb/pytorch-lifestream/pull/9
    • All modules by @justalge in https://github.com/dllllb/pytorch-lifestream/pull/15
    • Mlm pretrain by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/13
    • all encoders and get rid of get_loss by @justalge in https://github.com/dllllb/pytorch-lifestream/pull/19
    • init by @justalge in https://github.com/dllllb/pytorch-lifestream/pull/20
    • Documentation index by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/8
    • Demos api update by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/18
    • loss output correction by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/22
    • Test fixes by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/23
    • readme_demo_link by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/25
    • init by @justalge in https://github.com/dllllb/pytorch-lifestream/pull/26
    • work without logger by @justalge in https://github.com/dllllb/pytorch-lifestream/pull/7
    • trx_encoder refactoring by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/28

    Full Changelog: https://github.com/dllllb/pytorch-lifestream/compare/v0.1.2...v0.3.0

Owner
Dmitri Babaev