A library built upon PyTorch for building embeddings on discrete event sequences using self-supervision

Overview

pytorch-lifestream a library built upon PyTorch for building embeddings on discrete event sequences using self-supervision. It can process terabyte-size volumes of raw events like game history events, clickstream data, purchase history or card transactions.

It supports various methods of self-supervised training, adapted for event sequences:

  • Contrastive Learning for Event Sequences (CoLES)
  • Contrastive Predictive Coding (CPC)
  • Replaced Token Detection (RTD) from ELECTRA
  • Next Sequence Prediction (NSP) from BERT
  • Sequences Order Prediction (SOP) from ALBERT

It supports several types of encoders, including Transformer and RNN. It also supports many types of self-supervised losses.

The following variants of the contrastive losses are supported:

Install from PyPi

pip install pytorch-lifestream

Install from source

# Ubuntu 20.04

sudo apt install python3.8 python3-venv
pip3 install pipenv

pipenv sync  --dev # install packages exactly as specified in Pipfile.lock
pipenv shell
pytest

Demo notebooks

  • Self-supervided training and embeddings for downstream task notebook
  • Self-supervided embeddings in CatBoost notebook
  • Self-supervided training and fine-tuning notebook
  • PySpark and Parquet for data preprocessing notebook

Experiments on public datasets

pytorch-lifestream usage experiments on several public event datasets are available in the separate repo

Comments
  • torch.stack in def collate_feature_dict

    torch.stack in def collate_feature_dict

    ptls/data_load/utils.py

    Hello!

    If the dataloader has a feature called target. And the batchsize is not a multiple of the length of the dataset, then an error pops up on the last batch: "Sizes of tensors must match except in dimension 0". Due to the use of torch.staсk when processing a feature startwith 'target'.

    opened by Ivanich-spb 11
  • Not supported multiGPU option from pytorchlightning.Trainer

    Not supported multiGPU option from pytorchlightning.Trainer

    Try to set Trainer(gpus=[0,1]), while using PtlsDataModule as data module, get such error:

    AttributeError: Can't pickle local object 'PtlsDataModule.__init__.<locals>.train_dataloader'

    opened by mazitovs 1
  • Correct seq_len for feature dict

    Correct seq_len for feature dict

    rec = {
        'mcc': [0, 1, 2, 3],
        'target_distribution': [0.1, 0.2, 0.4, 0.1, 0.1, 0.0],
    }
    

    How to get correct seq_len. true len: 4 possible length: 4, 6 'target_distribution' is incorrect field to get length, this is not a sequence, this is an array

    opened by ivkireev86 1
  • Save categories encodings along with model weights in demos

    Save categories encodings along with model weights in demos

    Вместе с обученной моделью необходимо сохранять обученный препроцессор и разбивку на трейн-тест. Иначе категории могут поехать и сохраненная предобученная модель станет бесполезной.

    opened by ivkireev86 1
  • Documentation index

    Documentation index

    Прототип главной страницы документации. Три секции:

    • описание моделей библиотеки
    • гайд как использовать библиотеку
    • как писать свои компоненты

    Есть краткое описание и ссылки на подробные (которые напишем потом).

    В описании модулей предложена структура библиотеки. Предполагается, что мы эти модули в ближайшее создадим и перетащим туда соответсвующие классы из библиотеки. Старые, модули, которые станут пустыми, удалим. Далее будем придерживаться схемы, описанной в этом документе.

    На ревью предлагается чекнуть предлагаемую структуру библиотеки, названия модулей ну и сам описательный текст документа.

    opened by ivkireev86 1
  • KL cyclostationarity test tools

    KL cyclostationarity test tools

    Test provides a hystogram with self-samples similarity vs. random sample similarity. Shows compatibility with CoLES.

    Think about tests for other frameworks.

    opened by ivkireev86 0
  • Repair pyspark tests

    Repair pyspark tests

    def test_dt_to_timestamp(): spark = SparkSession.builder.getOrCreate() df = spark.createDataFrame(data=[ {'dt': '1970-01-01 00:00:00'}, {'dt': '2012-01-01 12:01:16'}, {'dt': '2021-12-30 00:00:00'} ])

        df = df.withColumn('ts', dt_to_timestamp('dt'))
        ts = [rec.ts for rec in df.select('ts').collect()]
    
      assert ts == [0, 1325419276, 1640822400]
    

    E assert [-10800, 1325...6, 1640811600] == [0, 1325419276, 1640822400] E At index 0 diff: -10800 != 0 E Use -v to get more diff

    ptls_tests/test_preprocessing/test_pyspark/test_event_time.py:16: AssertionError


    def test_datetime_to_timestamp(): t = DatetimeToTimestamp(col_name_original='dt') spark = SparkSession.builder.getOrCreate() df = spark.createDataFrame(data=[ {'dt': '1970-01-01 00:00:00', 'rn': 1}, {'dt': '2012-01-01 12:01:16', 'rn': 2}, {'dt': '2021-12-30 00:00:00', 'rn': 3} ]) df = t.fit_transform(df) et = [rec.event_time for rec in df.select('event_time').collect()]

      assert et[0] == 0
    

    E assert -10800 == 0

    ptls_tests/test_preprocessing/test_pyspark/test_event_time.py:48: AssertionError

    opened by ikretus 0
  • docs. Development guide (for demo notebooks)

    docs. Development guide (for demo notebooks)

    • add current patterns
    • when model training start print message "model training stats, please wait. See tensorboard to track progress", use it with enable_progress=False
    documentation user feedback 
    opened by ivkireev86 0
Releases(v0.5.1)
  • v0.5.1(Dec 28, 2022)

    What's Changed

    • fixed cpc import by @ArtyomVorobev in https://github.com/dllllb/pytorch-lifestream/pull/90
    • add softmaxloss and tests by @ArtyomVorobev in https://github.com/dllllb/pytorch-lifestream/pull/87
    • MLM NSP Module by @mazitovs in https://github.com/dllllb/pytorch-lifestream/pull/88
    • fix test dropout error by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/91

    New Contributors

    • @ArtyomVorobev made their first contribution in https://github.com/dllllb/pytorch-lifestream/pull/90
    • @mazitovs made their first contribution in https://github.com/dllllb/pytorch-lifestream/pull/88

    Full Changelog: https://github.com/dllllb/pytorch-lifestream/compare/v0.5.0...v0.5.1

    Source code(tar.gz)
    Source code(zip)
  • v0.5.0(Nov 9, 2022)

    What's Changed

    • Fix metrics reset by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/72
    • Pandas preprocessing without df copy, faster preprocessing for large datasets by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/73
    • fix in supervised-sequence-to-target.ipynb by @blinovpd in https://github.com/dllllb/pytorch-lifestream/pull/74
    • ptls.nn.PBDropout by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/75
    • tanh for rnn starter by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/76
    • Auc regr metric by @ikretus in https://github.com/dllllb/pytorch-lifestream/pull/78
    • spatial dropout for NoisyEmbedding, LastMaxAvgEncoder, warning for bidir RnnEncoder by @justalge in https://github.com/dllllb/pytorch-lifestream/pull/80
    • Hparam tuning demo. hydra, optuna, tensorboard by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/81
    • tabformer by @justalge in https://github.com/dllllb/pytorch-lifestream/pull/83
    • Supervised Coles Module, trx_encoder refactoring by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/84

    New Contributors

    • @blinovpd made their first contribution in https://github.com/dllllb/pytorch-lifestream/pull/74

    Full Changelog: https://github.com/dllllb/pytorch-lifestream/compare/v0.4.0...v0.5.0

    Source code(tar.gz)
    Source code(zip)
  • v0.4.0(Jul 27, 2022)

    What's Changed

    • Seq encoder refactoring by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/29
    • regr.task ZILNLoss, RMSE, BucketAccuracy by @ikretus in https://github.com/dllllb/pytorch-lifestream/pull/36
    • lighting modules and nn layers refactoring by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/34
    • Demo colab by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/40
    • Fix drop target arrays by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/42
    • feature naming by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/43
    • Update abs_module.py by @justalge in https://github.com/dllllb/pytorch-lifestream/pull/37
    • Extended inference demo by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/45
    • fix import path by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/46
    • Experiments sync by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/50
    • Experiments sync by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/52
    • Target dist by @ikretus in https://github.com/dllllb/pytorch-lifestream/pull/58
    • Data load refactoring by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/60
    • doc update by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/62
    • doc update by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/63

    New Contributors

    • @ikretus made their first contribution in https://github.com/dllllb/pytorch-lifestream/pull/36

    Full Changelog: https://github.com/dllllb/pytorch-lifestream/compare/v0.3.0...v0.4.0

    What's Changed

    • Seq encoder refactoring by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/29
    • regr.task ZILNLoss, RMSE, BucketAccuracy by @ikretus in https://github.com/dllllb/pytorch-lifestream/pull/36
    • lighting modules and nn layers refactoring by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/34
    • Demo colab by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/40
    • Fix drop target arrays by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/42
    • feature naming by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/43
    • Update abs_module.py by @justalge in https://github.com/dllllb/pytorch-lifestream/pull/37
    • Extended inference demo by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/45
    • fix import path by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/46
    • Experiments sync by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/50
    • Experiments sync by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/52
    • Target dist by @ikretus in https://github.com/dllllb/pytorch-lifestream/pull/58
    • Data load refactoring by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/60
    • doc update by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/62
    • doc update by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/63

    New Contributors

    • @ikretus made their first contribution in https://github.com/dllllb/pytorch-lifestream/pull/36

    Full Changelog: https://github.com/dllllb/pytorch-lifestream/compare/v0.3.0...v0.4.0

    What's Changed

    • Seq encoder refactoring by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/29
    • regr.task ZILNLoss, RMSE, BucketAccuracy by @ikretus in https://github.com/dllllb/pytorch-lifestream/pull/36
    • lighting modules and nn layers refactoring by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/34
    • Demo colab by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/40
    • Fix drop target arrays by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/42
    • feature naming by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/43
    • Update abs_module.py by @justalge in https://github.com/dllllb/pytorch-lifestream/pull/37
    • Extended inference demo by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/45
    • fix import path by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/46
    • Experiments sync by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/50
    • Experiments sync by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/52
    • Target dist by @ikretus in https://github.com/dllllb/pytorch-lifestream/pull/58
    • Data load refactoring by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/60
    • doc update by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/62
    • doc update by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/63

    New Contributors

    • @ikretus made their first contribution in https://github.com/dllllb/pytorch-lifestream/pull/36

    Full Changelog: https://github.com/dllllb/pytorch-lifestream/compare/v0.3.0...v0.4.0

    Source code(tar.gz)
    Source code(zip)
  • v0.3.0(Jun 12, 2022)

    More Pythonic Core API: constructor arguments instead of config objects

    What's Changed

    • cpc params by @justalge in https://github.com/dllllb/pytorch-lifestream/pull/9
    • All modules by @justalge in https://github.com/dllllb/pytorch-lifestream/pull/15
    • Mlm pretrain by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/13
    • all encoders and get rid of get_loss by @justalge in https://github.com/dllllb/pytorch-lifestream/pull/19
    • init by @justalge in https://github.com/dllllb/pytorch-lifestream/pull/20
    • Documentation index by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/8
    • Demos api update by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/18
    • loss output correction by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/22
    • Test fixes by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/23
    • readme_demo_link by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/25
    • init by @justalge in https://github.com/dllllb/pytorch-lifestream/pull/26
    • work without logger by @justalge in https://github.com/dllllb/pytorch-lifestream/pull/7
    • trx_encoder refactoring by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/28

    Full Changelog: https://github.com/dllllb/pytorch-lifestream/compare/v0.1.2...v0.3.0

    Source code(tar.gz)
    Source code(zip)
Owner
Dmitri Babaev
Dmitri Babaev
Temporal Segment Networks (TSN) in PyTorch

TSN-Pytorch We have released MMAction, a full-fledged action understanding toolbox based on PyTorch. It includes implementation for TSN as well as oth

1k Jan 03, 2023
The implementation for paper Joint t-SNE for Comparable Projections of Multiple High-Dimensional Datasets.

Joint t-sne This is the implementation for paper Joint t-SNE for Comparable Projections of Multiple High-Dimensional Datasets. abstract: We present Jo

IDEAS Lab 7 Dec 18, 2022
Generate high quality pictures. GAN. Generative Adversarial Networks

ESRGAN generate high quality pictures. GAN. Generative Adversarial Networks """ Super-resolution of CelebA using Generative Adversarial Networks. The

Lieon 1 Dec 14, 2021
Add gui for YoloV5 using PyQt5

HEAD 更新2021.08.16 **添加图片和视频保存功能: 1.图片和视频按照当前系统时间进行命名 2.各自检测结果存放入output文件夹 3.摄像头检测的默认设备序号更改为0,减少调试报错 温馨提示: 1.项目放置在全英文路径下,防止项目报错 2.默认使用cpu进行检测,自

Ruihao Wang 65 Dec 27, 2022
Implementation of ECCV20 paper: the devil is in classification: a simple framework for long-tail object detection and instance segmentation

Implementation of our ECCV 2020 paper The Devil is in Classification: A Simple Framework for Long-tail Instance Segmentation This repo contains code o

twang 98 Sep 17, 2022
EDCNN: Edge enhancement-based Densely Connected Network with Compound Loss for Low-Dose CT Denoising

EDCNN: Edge enhancement-based Densely Connected Network with Compound Loss for Low-Dose CT Denoising By Tengfei Liang, Yi Jin, Yidong Li, Tao Wang. Th

workingcoder 115 Jan 05, 2023
The official implementation of CircleNet: Anchor-free Detection with Circle Representation, MICCAI 2030

CircleNet: Anchor-free Detection with Circle Representation The official implementation of CircleNet, MICCAI 2020 [PyTorch] [project page] [MICCAI pap

The Biomedical Data Representation and Learning Lab 45 Nov 18, 2022
PyTorch version implementation of DORN

DORN_PyTorch This is a PyTorch version implementation of DORN Reference H. Fu, M. Gong, C. Wang, K. Batmanghelich and D. Tao: Deep Ordinal Regression

Zilin.Zhang 3 Apr 27, 2022
The official implementation of the CVPR 2021 paper FAPIS: a Few-shot Anchor-free Part-based Instance Segmenter

FAPIS The official implementation of the CVPR 2021 paper FAPIS: a Few-shot Anchor-free Part-based Instance Segmenter Introduction This repo is primari

Khoi Nguyen 8 Dec 11, 2022
The repository contain code for building compiler using puthon.

Building Compiler This is a python implementation of JamieBuild's "Super Tiny Compiler" Overview JamieBuilds developed a wonderfully educative compile

Shyam Das Shrestha 1 Nov 21, 2021
A distributed, plug-n-play algorithm for multi-robot applications with a priori non-computable objective functions

A distributed, plug-n-play algorithm for multi-robot applications with a priori non-computable objective functions Kapoutsis, A.C., Chatzichristofis,

Athanasios Ch. Kapoutsis 5 Oct 15, 2022
Numbering permanent and deciduous teeth via deep instance segmentation in panoramic X-rays

Numbering permanent and deciduous teeth via deep instance segmentation in panoramic X-rays In this repo, you will find the instructions on how to requ

Intelligent Vision Research Lab 4 Jul 21, 2022
Generate pixel-style avatars with python.

face2pixel Generate pixel-style avatars with python. Run: Clone the project: git clone https://github.com/theodorecooper/face2pixel install requiremen

Theodore Cooper 2 May 11, 2022
Official implementation of "Refiner: Refining Self-attention for Vision Transformers".

RefinerViT This repo is the official implementation of "Refiner: Refining Self-attention for Vision Transformers". The repo is build on top of timm an

101 Dec 29, 2022
codes for "Scheduled Sampling Based on Decoding Steps for Neural Machine Translation" (long paper of EMNLP-2022)

Scheduled Sampling Based on Decoding Steps for Neural Machine Translation (EMNLP-2021 main conference) Contents Overview Background Quick to Use Furth

Adaxry 13 Jul 25, 2022
Automatically download the cwru data set, and then divide it into training data set and test data set

Automatically download the cwru data set, and then divide it into training data set and test data set.自动下载cwru数据集,然后分训练数据集和测试数据集

6 Jun 27, 2022
A static analysis library for computing graph representations of Python programs suitable for use with graph neural networks.

python_graphs This package is for computing graph representations of Python programs for machine learning applications. It includes the following modu

Google Research 258 Dec 29, 2022
Official repository of IMPROVING DEEP IMAGE MATTING VIA LOCAL SMOOTHNESS ASSUMPTION.

IMPROVING DEEP IMAGE MATTING VIA LOCAL SMOOTHNESS ASSUMPTION This is the official repository of IMPROVING DEEP IMAGE MATTING VIA LOCAL SMOOTHNESS ASSU

电线杆 14 Dec 15, 2022
A Pose Estimator for Dense Reconstruction with the Structured Light Illumination Sensor

Phase-SLAM A Pose Estimator for Dense Reconstruction with the Structured Light Illumination Sensor This open source is written by MATLAB Run Mode Open

Xi Zheng 14 Dec 19, 2022
Axel - 3D printed robotic hands and they controll with Raspberry Pi and Arduino combo

Axel It's our graduation project about 3D printed robotic hands and they control

0 Feb 14, 2022