๐Ÿงฎ Matrix Factorization for Collaborative Filtering is just Solving an Adjoint Latent Dirichlet Allocation Model after All

Overview

LDA4Rec

Project generated with PyScaffold

Accompanying source code to the paper "Matrix Factorization for Collaborative Filtering is just Solving an Adjoint Latent Dirichlet Allocation Model After All" by Florian Wilhelm. The preprint can be found here along with the following statement:

"ยฉ Florian Wilhelm 2021. This is the author's version of the work. It is posted here for your personal use. Not for redistribution. The definitive version was published in RecSys '21: Fifteenth ACM Conference on Recommender Systems Proceedings, https://doi.org/10.1145/3460231.3474266."

Installation

In order to set up the necessary environment:

  1. review and uncomment what you need in environment.yml and create an environment lda4rec with the help of conda:
    conda env create -f environment.yml
    
  2. activate the new environment with:
    conda activate lda4rec
    
  3. (optionally) get a free neptune.ai account for experiment tracking and save the api token under ~/.neptune_api_token (default).

Running Experiments

First check out and adapt the default experiment config configs/default.yaml and run it with:

lda4rec -c configs/default.yaml run

A config like configs/default.yaml can also be used as a template to create an experiment set with:

lda4rec -c configs/default.yaml create -ds movielens-100k

using the Movielens-100k dataset. Check out cli.py for more details.

Cloud Setup

Commands for setting up an Ubuntu 20.10 VM with at least 20 GiB of HD on e.g. a GCP c2-standard-30 instance:

tmux
sudo apt-get install -y build-essential
curl https://sh.rustup.rs -sSf | sh
source $HOME/.cargo/env
cargo install pueue
curl https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O
sh Miniconda3-latest-Linux-x86_64.sh
source ~/.bashrc
git clone https://github.com/FlorianWilhelm/lda4rec.git
cd lda4rec
conda env create -f environment.yml
conda activate lda4rec
vim ~/.neptune_api_token # and copy it over

Then create and run all experiments for full control over parallelism with pueue:

pueued -d # only once to start the daemon
pueue parallel 10
export OMP_NUM_THREADS=4  # to limit then number of threads per model
lda4rec -c configs/default.yaml create # to create the config files
find ./configs -maxdepth 1 -name "exp_*.yaml" -exec pueue add "lda4rec -c {} run" \; -exec sleep 30 \;

Remark: -exec sleep 30 avoids race condition when reading datasets if parallelism is too high.

Dependency Management & Reproducibility

  1. Always keep your abstract (unpinned) dependencies updated in environment.yml and eventually in setup.cfg if you want to ship and install your package via pip later on.
  2. Create concrete dependencies as environment.lock.yml for the exact reproduction of your environment with:
    conda env export -n lda4rec -f environment.lock.yml
    For multi-OS development, consider using --no-builds during the export.
  3. Update your current environment with respect to a new environment.lock.yml using:
    conda env update -f environment.lock.yml --prune

Project Organization

โ”œโ”€โ”€ AUTHORS.md              <- List of developers and maintainers.
โ”œโ”€โ”€ CHANGELOG.md            <- Changelog to keep track of new features and fixes.
โ”œโ”€โ”€ LICENSE.txt             <- License as chosen on the command-line.
โ”œโ”€โ”€ README.md               <- The top-level README for developers.
โ”œโ”€โ”€ configs                 <- Directory for configurations of model & application.
โ”œโ”€โ”€ data                    <- Downloaded datasets will be stored here.
โ”œโ”€โ”€ docs                    <- Directory for Sphinx documentation in rst or md.
โ”œโ”€โ”€ environment.yml         <- The conda environment file for reproducibility.
โ”œโ”€โ”€ notebooks               <- Jupyter notebooks. Naming convention is a number (for
โ”‚                              ordering), the creator's initials and a description,
โ”‚                              e.g. `1.0-fw-initial-data-exploration`.
โ”œโ”€โ”€ logs                    <- Generated logs are collected here.
โ”œโ”€โ”€ results                 <- Results as exported from neptune.ai.
โ”œโ”€โ”€ setup.cfg               <- Declarative configuration of your project.
โ”œโ”€โ”€ setup.py                <- Use `python setup.py develop` to install for development or
โ”‚                              or create a distribution with `python setup.py bdist_wheel`.
โ”œโ”€โ”€ src
โ”‚   โ””โ”€โ”€ lda4rec             <- Actual Python package where the main functionality goes.
โ”œโ”€โ”€ tests                   <- Unit tests which can be run with `py.test`.
โ”œโ”€โ”€ .coveragerc             <- Configuration for coverage reports of unit tests.
โ”œโ”€โ”€ .isort.cfg              <- Configuration for git hook that sorts imports.
โ””โ”€โ”€ .pre-commit-config.yaml <- Configuration of pre-commit git hooks.

How to Cite

Please cite LDA4Rec if it helps your research. You can use the following BibTeX entry:

@inproceedings{wilhelm2021lda4rec,
author = {Wilhelm, Florian},
title = {Matrix Factorization for Collaborative Filtering Is Just Solving an Adjoint Latent Dirichlet Allocation Model After All},
year = {2021},
month = sep,
isbn = {978-1-4503-8458-2/21/09},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3460231.3474266},
doi = {10.1145/3460231.3474266},
booktitle = {Fifteenth ACM Conference on Recommender Systems},
numpages = {8},
location = {Amsterdam, Netherlands},
series = {RecSys '21}
}

License

This sourcecode is AGPL-3-only licensed. If you require a more permissive licence, e.g. for commercial reasons, contact me to obtain a licence for your business.

Acknowledgement

Special thanks goes to Du Phan and Fritz Obermeyer from the (Num)Pyro project for their kind help and helpful comments on my code.

Note

This project has been set up using PyScaffold 4.0 and the dsproject extension 0.6. Some source code was taken from Spotlight (MIT-licensed) by Maciej Kula as well as lrann (MIT-licensed) by Florian Wilhelm and Marcel Kurovski.

Owner
Florian Wilhelm
Data Scientist with a mathematical background.
Florian Wilhelm
Final project for machine learning (CSC 590). Detection of hepatitis C and progression through blood samples.

Hepatitis C Blood Based Detection Final project for machine learning (CSC 590). Dataset from Kaggle. Using data from previous hepatitis C blood panels

Jennefer Maldonado 1 Dec 28, 2021
Implementation of the Paper: "Parameterized Hypercomplex Graph Neural Networks for Graph Classification" by Tuan Le, Marco Bertolini, Frank Noรฉ and Djork-Arnรฉ Clevert

Parameterized Hypercomplex Graph Neural Networks (PHC-GNNs) PHC-GNNs (Le et al., 2021): https://arxiv.org/abs/2103.16584 PHM Linear Layer Illustration

Bayer AG 26 Aug 11, 2022
This repo is duplication of jwyang/faster-rcnn.pytorch

Faster RCNN Pytorch This repo is duplication of jwyang/faster-rcnn.pytorch C/C++ code are removed and easier to study. Python 3.8.5 Ubuntu 20.04.1 LTS

Kim Jihwan 1 Jan 14, 2022
The personal repository of the work: *DanceNet3D: Music Based Dance Generation with Parametric Motion Transformer*.

DanceNet3D The personal repository of the work: DanceNet3D: Music Based Dance Generation with Parametric Motion Transformer. Dataset and Results Pleas

ๅ—ๅ˜‰Nanga 36 Dec 21, 2022
Data and code from COVID-19 machine learning paper

Machine learning approaches for localized lockdown, subnotification analysis and cases forecasting in Sรฃo Paulo state counties during COVID-19 pandemi

Sara Malvar 4 Dec 22, 2022
CV backbones including GhostNet, TinyNet and TNT, developed by Huawei Noah's Ark Lab.

CV Backbones including GhostNet, TinyNet, TNT (Transformer in Transformer) developed by Huawei Noah's Ark Lab. GhostNet Code TinyNet Code TNT Code Pyr

HUAWEI Noah's Ark Lab 3k Jan 08, 2023
Code for paper: Towards Tokenized Human Dynamics Representation

Video Tokneization Codebase for video tokenization, based on our paper Towards Tokenized Human Dynamics Representation. Prerequisites (tested under Py

Kenneth Li 20 May 31, 2022
Based on the paper "Geometry-aware Instance-reweighted Adversarial Training" ICLR 2021 oral

Geometry-aware Instance-reweighted Adversarial Training This repository provides codes for Geometry-aware Instance-reweighted Adversarial Training (ht

Jingfeng 47 Dec 22, 2022
A hue shift helper for OBS

obs-hue-shift A hue shift helper for OBS This is a repo based on the really nice script Hegemege made. The original script can be found https://gist.g

Alexis Tyler 1 Jan 10, 2022
Speech Enhancement Generative Adversarial Network Based on Asymmetric AutoEncoder

ASEGAN: Speech Enhancement Generative Adversarial Network Based on Asymmetric AutoEncoder ไธญๆ–‡็‰ˆ็ฎ€ไป‹ Readme with English Version ไป‹็ป ๅŸบไบŽSEGANๆจกๅž‹็š„ๆ”น่ฟ›็‰ˆๆœฌ๏ผŒไฝฟ็”จ่‡ชไธป่ฎพ่ฎก็š„้ž

Nitin 53 Nov 17, 2022
Implementation for Homogeneous Unbalanced Regularized Optimal Transport

HUROT: An Homogeneous formulation of Unbalanced Regularized Optimal Transport. This repository provides code related to this preprint. This is an alph

Thรฉo Lacombe 1 Feb 17, 2022
Face recognize and crop them

Face Recognize Cropping Module Source ์•„์ด๋””์–ด Face Alignment with OpenCV and Python Requirement ํ•„์š” ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ imutil dlib python-opence (cv2) Usage ์‚ฌ์šฉ ๋ฐฉ๋ฒ• open

Cho Moon Gi 1 Feb 15, 2022
Learning hierarchical attention for weakly-supervised chest X-ray abnormality localization and diagnosis

Hierarchical Attention Mining (HAM) for weakly-supervised abnormality localization This is the official PyTorch implementation for the HAM method. Pap

Xi Ouyang 22 Jan 02, 2023
Code for the paper titled "Prabhupadavani: A Code-mixed Speech Translation Data for 25 languages"

Prabhupadavani: A Code-mixed Speech Translation Data for 25 languages Code for the paper titled "Prabhupadavani: A Code-mixed Speech Translation Data

Ayush Daksh 12 Dec 01, 2022
Code and Experiments for ACL-IJCNLP 2021 Paper Mind Your Outliers! Investigating the Negative Impact of Outliers on Active Learning for Visual Question Answering.

Code and Experiments for ACL-IJCNLP 2021 Paper Mind Your Outliers! Investigating the Negative Impact of Outliers on Active Learning for Visual Question Answering.

Sidd Karamcheti 50 Nov 16, 2022
RCT-ART is an NLP pipeline built with spaCy for converting clinical trial result sentences into tables through jointly extracting intervention, outcome and outcome measure entities and their relations.

Randomised controlled trial abstract result tabulator RCT-ART is an NLP pipeline built with spaCy for converting clinical trial result sentences into

2 Sep 16, 2022
Official code repository for A Simple Long-Tailed Rocognition Baseline via Vision-Language Model.

This is the official code repository for A Simple Long-Tailed Rocognition Baseline via Vision-Language Model.

peng gao 42 Nov 26, 2022
Simple and ready-to-use tutorials for TensorFlow

TensorFlow World To support maintaining and upgrading this project, please kindly consider Sponsoring the project developer. Any level of support is a

Amirsina Torfi 4.5k Dec 23, 2022
Negative Interactions for Improved Collaborative Filtering:

Negative Interactions for Improved Collaborative Filtering: Donโ€™t go Deeper, go Higher This notebook provides an implementation in Python 3 of the alg

Harald Steck 21 Mar 05, 2022
DynaTune: Dynamic Tensor Program Optimization in Deep Neural Network Compilation

DynaTune: Dynamic Tensor Program Optimization in Deep Neural Network Compilation This repository is the implementation of DynaTune paper. This folder

4 Nov 02, 2022