Official repository for "Exploiting Session Information in BERT-based Session-aware Sequential Recommendation", SIGIR 2022 short.

Related tags

Deep Learningpytorch
Overview

Session-aware BERT4Rec

Official repository for "Exploiting Session Information in BERT-based Session-aware Sequential Recommendation", SIGIR 2022 short.

Everything in the paper is implemented (including vanilla BERT4Rec and SASRec), and can be reproduced.

Usage

1. Build Docker

./scripts/build.sh

2. Download dataset

Download corresponding datasets into some directory, such as ./roughs.

For Steam dataset, use version 2.

Rename datasets: 'ml1m' for MovieLens-1M, 'ml20m' for MovieLens-2M, 'steam2' for Steam.

3. Preprocess

  • --rough_root: for original dataset files
  • --data_root: for processed data files
python preprocess.py prepare ml1m --data_root ./data --rough_root ./roughs
python preprocess.py prepare ml20m --data_root ./data --rough_root ./roughs
python preprocess.py prepare steam2 --data_root ./data --rough_root ./roughs

For some stats:

python preprocess.py count stats --data_root ./data --rough_root ./roughs > dstats.tsv

4. Run

See default configuration setting in entry.py.

To modify configuration, make some directory under runs/ like ./runs/ml1m/bert4rec/vanilla/, and create config.json.

Sample Run Script

My x0.sh file that uses GPU No. 0:

runpy () {
    docker run \
        -it \
        --rm \
        --init \
        --gpus '"device=0"' \
        --shm-size 16G \
        --volume="$HOME/.cache/torch:/root/.cache/torch" \
        --volume="$PWD:/workspace" \
        session-aware-bert4rec \
        python "$@"
}

runpy entry.py ml1m/bert4rec/vanilla

Terminologies

The df_ prefix always means DataFrame from Pandas.

  • uid (str|int): User ID (unique).
  • iid (str|int): Item ID (unique).
  • sid (str|int): Session ID (unique), used only for session separation.
  • uindex (int): mapped index number of User ID, 1 ~ n.
  • iindex (int): mapped index number of Item ID, 1 ~ m.
  • timestamp (int): UNIX timestamp.

Data Files

After preprocessing, we'll have followings in each data/:dataset_name/ directory.

  • uid2uindex.pkl (dict): {uiduindex}.
  • iid2iindex.pkl (dict): {iidiindex}.
  • df_rows.pkl (df): column of (uindex, iindex, sid, timestamp), with no index.
  • train.pkl (dict): {uindex → [list of (iindex, sid, timestamp)]}.
  • valid.pkl (dict): {uindex → [list of (iindex, sid, timestamp)]}.
  • test.pkl (dict): {uindex → [list of (iindex, sid, timestamp)]}.
  • ns_random.pkl (dict): {uindex -> [list of iindex]}.
  • ns_popular.pkl (dict): {uindex -> [list of iindex]}.

Code References

Some toy examples of score matching algorithms written in PyTorch

toy_gradlogp This repo implements some toy examples of the following score matching algorithms in PyTorch: ssm-vr: sliced score matching with variance

Ending Hsiao 21 Dec 26, 2022
Tensorflow 2 implementations of the C-SimCLR and C-BYOL self-supervised visual representation methods from "Compressive Visual Representations" (NeurIPS 2021)

Compressive Visual Representations This repository contains the source code for our paper, Compressive Visual Representations. We developed informatio

Google Research 30 Nov 23, 2022
Neural-net-from-scratch - A simple Neural Network from scratch in Python using the Pymathrix library

A Simple Neural Network from scratch A Simple Neural Network from scratch in Pyt

Youssef Chafiqui 2 Jan 07, 2022
Invasive Plant Species Identification

Invasive_Plant_Species_Identification Used LiDAR Odometry and Mapping (LOAM) to create a 3D point cloud map which can be used to identify invasive pla

2 May 12, 2022
AI创造营 :Metaverse启动机之重构现世,结合PaddlePaddle 和 Wechaty 创造自己的聊天机器人

paddle-wechaty-Zodiac AI创造营 :Metaverse启动机之重构现世,结合PaddlePaddle 和 Wechaty 创造自己的聊天机器人 12星座若穿越科幻剧,会拥有什么超能力呢?快来迎接你的专属超能力吧! 现在很多年轻人都喜欢看科幻剧,像是复仇者系列,里面有很多英雄、超

105 Dec 22, 2022
Riemann Noise Injection With PyTorch

Riemann Noise Injection - PyTorch A module for modeling GAN noise injection based on Riemann geometry, as described in Ruili Feng, Deli Zhao, and Zhen

2 May 27, 2022
Generate saved_model, tfjs, tf-trt, EdgeTPU, CoreML, quantized tflite and .pb from .tflite.

tflite2tensorflow Generate saved_model, tfjs, tf-trt, EdgeTPU, CoreML, quantized tflite and .pb from .tflite. 1. Supported Layers No. TFLite Layer TF

Katsuya Hyodo 214 Dec 29, 2022
[ ICCV 2021 Oral ] Our method can estimate camera poses and neural radiance fields jointly when the cameras are initialized at random poses in complex scenarios (outside-in scenes, even with less texture or intense noise )

GNeRF This repository contains official code for the ICCV 2021 paper: GNeRF: GAN-based Neural Radiance Field without Posed Camera. This implementation

Quan Meng 191 Dec 26, 2022
Unofficial pytorch implementation for Self-critical Sequence Training for Image Captioning. and others.

An Image Captioning codebase This is a codebase for image captioning research. It supports: Self critical training from Self-critical Sequence Trainin

Ruotian(RT) Luo 906 Jan 03, 2023
FCOSR: A Simple Anchor-free Rotated Detector for Aerial Object Detection

FCOSR: A Simple Anchor-free Rotated Detector for Aerial Object Detection FCOSR: A Simple Anchor-free Rotated Detector for Aerial Object Detection arXi

59 Nov 29, 2022
An SE(3)-invariant autoencoder for generating the periodic structure of materials

Crystal Diffusion Variational AutoEncoder This software implementes Crystal Diffusion Variational AutoEncoder (CDVAE), which generates the periodic st

Tian Xie 94 Dec 10, 2022
Implement slightly different caffe-segnet in tensorflow

Tensorflow-SegNet Implement slightly different (see below for detail) SegNet in tensorflow, successfully trained segnet-basic in CamVid dataset. Due t

Tseng Kuan Lun 364 Oct 27, 2022
Reference code for the paper "Cross-Camera Convolutional Color Constancy" (ICCV 2021)

Cross-Camera Convolutional Color Constancy, ICCV 2021 (Oral) Mahmoud Afifi1,2, Jonathan T. Barron2, Chloe LeGendre2, Yun-Ta Tsai2, and Francois Bleibe

Mahmoud Afifi 76 Jan 07, 2023
HTSeq is a Python library to facilitate processing and analysis of data from high-throughput sequencing (HTS) experiments.

HTSeq DEVS: https://github.com/htseq/htseq DOCS: https://htseq.readthedocs.io A Python library to facilitate programmatic analysis of data from high-t

HTSeq 57 Dec 20, 2022
Image Segmentation Evaluation

Image Segmentation Evaluation Martin Keršner, [email protected] Evaluation

Martin Kersner 273 Oct 28, 2022
MINIROCKET: A Very Fast (Almost) Deterministic Transform for Time Series Classification

MINIROCKET: A Very Fast (Almost) Deterministic Transform for Time Series Classification

187 Dec 26, 2022
Online-compatible Unsupervised Non-resonant Anomaly Detection Repository

Online-compatible Unsupervised Non-resonant Anomaly Detection Repository Repository containing all scripts used in the studies of Online-compatible Un

0 Nov 09, 2021
Caffe implementation for Hu et al. Segmentation for Natural Language Expressions

Segmentation from Natural Language Expressions This repository contains the Caffe reimplementation of the following paper: R. Hu, M. Rohrbach, T. Darr

10 Jul 27, 2021
Official git repo for the CHIRP project

CHIRP Project This is the official git repository for the CHIRP project. Pull requests are accepted here, but for the moment, the main repository is s

Dan Smith 77 Jan 08, 2023
MultiTaskLearning - Multi Task Learning for 3D segmentation

Multi Task Learning for 3D segmentation Perception stack of an Autonomous Drivin

2 Sep 22, 2022