[ WSDM '22 ] On Sampling Collaborative Filtering Datasets

Last update: Dec 08, 2022

On Sampling Collaborative Filtering Datasets

This repository contains implementations of many popular sampling strategies, along with various explicit/implicit/sequential feedback recommendation algorithms. The code accompanies the paper "On Sampling Collaborative Filtering Datasets" [ACM] [Public PDF], where we compare how well different sampling strategies preserve the performance of various recommendation algorithms.
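One way to make "preserving the performance" concrete is to check whether a sample keeps the same relative ordering of algorithms as the full dataset. The sketch below uses Kendall's tau for that purpose; this choice of measure, the algorithm names, and the scores are illustrative assumptions, so please refer to the paper for the exact evaluation protocol.

```python
from itertools import combinations

def kendall_tau(scores_a, scores_b):
    """Kendall's tau-a between two {name: score} dicts over the same names."""
    keys = sorted(scores_a)
    if len(keys) < 2 or set(keys) != set(scores_b):
        raise ValueError("need at least two shared names")
    concordant = discordant = 0
    for x, y in combinations(keys, 2):
        # Positive product: both dicts order the pair (x, y) the same way
        s = (scores_a[x] - scores_a[y]) * (scores_b[x] - scores_b[y])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    n_pairs = len(keys) * (len(keys) - 1) // 2
    return (concordant - discordant) / n_pairs

# Made-up toy scores for three hypothetical algorithms
full = {"algo_A": 0.30, "algo_B": 0.20, "algo_C": 0.25}
sampled = {"algo_A": 0.22, "algo_B": 0.15, "algo_C": 0.24}
print(round(kendall_tau(full, sampled), 3))  # 0.333
```

A value of 1 means the sample ranks the algorithms exactly as the full data does; here one of the three pairs is swapped, so the value drops to 1/3.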

We also provide code for Data-Genie, which can automatically predict how well any sampling strategy will perform on a given collaborative filtering dataset. We refer the reader to the full paper for more details. Kindly send me an email if you're interested in obtaining access to the pre-trained weights of Data-Genie.

If you find any module of this repository helpful for your own research, please consider citing the WSDM '22 paper below. Thanks!

@inproceedings{sampling_cf,
  author = {Noveen Sachdeva and Carole-Jean Wu and Julian McAuley},
  title = {On Sampling Collaborative Filtering Datasets},
  url = {https://doi.org/10.1145/3488560.3498439},
  booktitle = {Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining},
  series = {WSDM '22},
  year = {2022}
}

Code Author: Noveen Sachdeva ([email protected])

Setup

Environment Setup

$ pip install -r requirements.txt

Data Setup

Once you've set up the Python environment, run the command below. It creates the required data/experiment directories and downloads & preprocesses the Amazon Magazine and MovieLens-100K datasets. To use other datasets, download them from http://jmcauley.ucsd.edu/data/amazon/ and adjust the setup.sh and preprocess.py files accordingly.

$ ./setup.sh

How to train a model on a sampled/complete CF-dataset?

Edit the hyper_params.py file, which lists all config parameters, including which type of model to run. Currently supported sampling strategies:

Sampling Strategy	What is sampled?	Paper Link
Random	Interactions
Stratified	Interactions
Temporal	Interactions
SVP-CF w/ MF	Interactions	LINK & LINK
SVP-CF w/ Bias-only	Interactions	LINK & LINK
SVP-CF-Prop w/ MF	Interactions	LINK & LINK
SVP-CF-Prop w/ Bias-only	Interactions	LINK & LINK
Random	Users
Head	Users
SVP-CF w/ MF	Users	LINK & LINK
SVP-CF w/ Bias-only	Users	LINK & LINK
SVP-CF-Prop w/ MF	Users	LINK & LINK
SVP-CF-Prop w/ Bias-only	Users	LINK & LINK
Centrality	Graph	LINK
Random-Walk	Graph	LINK
Forest-Fire	Graph	LINK
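To illustrate the difference between sampling interactions and sampling users in the table above, here is a minimal sketch of three of the simpler strategies. It is not the repository's implementation: the function names, the (user, item, timestamp) tuple layout, and the exact definitions (for example, taking "Temporal" to mean the most recent interactions per user) are assumptions made for illustration.

```python
import random
from collections import defaultdict

def sample_interactions_random(interactions, fraction, seed=0):
    """Keep a uniformly random `fraction` of all interactions."""
    rng = random.Random(seed)
    k = round(len(interactions) * fraction)
    return rng.sample(interactions, k)

def sample_interactions_temporal(interactions, fraction):
    """Per user, keep the most recent `fraction` of interactions (assumed definition)."""
    by_user = defaultdict(list)
    for row in interactions:
        by_user[row[0]].append(row)
    kept = []
    for rows in by_user.values():
        rows.sort(key=lambda r: r[2])  # oldest first
        k = max(1, round(len(rows) * fraction))
        kept.extend(rows[-k:])
    return kept

def sample_users_head(interactions, fraction):
    """Keep all interactions of the most active `fraction` of users (assumed definition)."""
    counts = defaultdict(int)
    for user, _, _ in interactions:
        counts[user] += 1
    # Most active first; ties broken by user id for determinism
    ranked = sorted(counts, key=lambda u: (-counts[u], u))
    k = max(1, round(len(ranked) * fraction))
    head = set(ranked[:k])
    return [row for row in interactions if row[0] in head]
```

Interaction-level samplers thin out every user's history, while user-level samplers keep complete histories for a subset of users, which changes the sparsity profile of the resulting dataset.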

Finally, type the following command to run:

$ CUDA_VISIBLE_DEVICES=<SOME_GPU_ID> python main.py

Alternatively, to train several recommendation algorithms on several CF datasets/subsets, edit the configuration in grid_search.py and then run:

$ python grid_search.py
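Conceptually, a grid search of this kind enumerates every combination of the configured options and launches one training run per combination. The sketch below shows only that enumeration step; the key names and values are hypothetical and do not reflect the actual contents of grid_search.py.

```python
from itertools import product

def expand_grid(grid):
    """Yield one config dict per combination of the option lists in `grid`."""
    keys = list(grid)
    for values in product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values))

# Hypothetical search space; names and values are illustrative only
search_space = {
    "dataset": ["ml-100k", "magazine"],
    "sampling": ["interaction_random", "user_head"],
    "sampling_percent": [20, 50],
    "model": ["bias_only", "MF"],
}

configs = list(expand_grid(search_space))
print(len(configs))  # 2 * 2 * 2 * 2 = 16
```

Each resulting dict could then be passed to a single training run, much like editing hyper_params.py by hand for one configuration.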

How to train Data-Genie?

Edit the data_genie/data_genie_config.py file, which lists all config parameters, including which datasets, CF scenarios, and samplers to train Data-Genie on.
Then use the following command to train Data-Genie:

$ python data_genie.py

License

MIT

Owner

Noveen Sachdeva
