Cloud-based recommendation system

Overview

Cloud-based recommendation system

This project is based on cloud services to create data lake, ETL process, train and deploy learning model to implement a recommendation system.

Purpose

One Web app can return if the consumer will buy the product or not when providing user ID and corresponding product SKU.

Services

This project will use services:

AWS: lambda function, Step functions, Glue (job,notebook,crawler), Athena, SNS, S3, Sagemaker, IAM, Dynamodb, API Gateway.

Confluent cloud (kafka) for streaming data.

Project description

  1. Create a bucket on S3 as the storage location of the data lake, store the raw data in the bucket (raw data zone), and then return the data after ETL to the same bucket (curated zone).

  2. Preview the data, determine the data is useful and meaningful for our project. Use AWS Glue crawler to grab corresponding data catalog (in created database and generated table info). Use Athena to do SQL query. This like Apache Hive, it does not change raw data, but do operations above the raw data.

  3. Create and store stream data. Create a kafka topic on Clonfluent cloud and set schema registry for the corresponding stream data, schema sets as confluent_cloud_kafka-->confluent_kafka_topic_schema.json. Set the kafka producer as confluent_cloud_kafka-->confluent_kafka_producer_lambda.py to push stream data to corresponding kafka topic in different partitions (because this project does not have exact source giving real stream data, we produce stream data manually). Set the consumer (confluent connector with AWS lambda) as confluent_cloud_kafka-->confluent_kafka_consumer_lambda.py to poll the stream data in kafka topic and store them in Dynamodb table.

  4. ETL process. Use lambda function to do data transformation operations based on SQL, corresponding scripts in file lambda_functions(ETL). Create Glue job to integrate new dataset and store in curated zone in data lake, scripts is in glue_job-->glue_job_ETL.py. Use step fuctions to orchestrate ETL workflow based on above lambda functions, ASL script is in step_function(workflow)-->step_functions_for_curated.json.

    This part is based on spark, and it is similar with the project in repo: https://github.com/Yi-Ding111/spark-ETL-based-databricks-aws.

  5. Train learning model (XGBoost). Use sagemaker notebook instance to do some kinds more operations like: EDA and feature engineering, use XGBoost framework to train the data, adjust parameters and try different attributes combinations to find the best one. Scripts is in sagemaker-->xgboost_deploy_sagemaker.ipynb.

  6. Deploy learning model. Get deploy endpoint after machine learning. Create lambda function to invoke the sagemaker endpoint to use the trained model, scripts is in sagemaker-->endpoint_interact_lambda.py. Let the lambda function integrate with API gatway (proxy integration) as the backend. Deploy the API gatewat and use the invoked URL for web applications to do interactions.

  7. Store the application output. Use SNS to publish the output to lambda and update the information into Dynamodb table, scripts is in sagemaker-->prediction_store_dynamodb.py


Acknowledgement

This project is completed with the guidance from Leo Lee (JR academy)


Author: YI DING, Leo Lee

Created at: Dec 2021

Contact: [email protected]

Owner
Yi Ding
Yi Ding
Jointly Learning Explainable Rules for Recommendation with Knowledge Graph

Jointly Learning Explainable Rules for Recommendation with Knowledge Graph

57 Nov 03, 2022
Respiratory Health Recommendation System

Respiratory-Health-Recommendation-System Respiratory Health Recommendation System based on Air Quality Index Forecasts This project aims to provide pr

Abhishek Gawabde 1 Jan 29, 2022
reXmeX is recommender system evaluation metric library.

A general purpose recommender metrics library for fair evaluation.

AstraZeneca 258 Dec 22, 2022
A Python implementation of LightFM, a hybrid recommendation algorithm.

LightFM Build status Linux OSX (OpenMP disabled) Windows (OpenMP disabled) LightFM is a Python implementation of a number of popular recommendation al

Lyst 4.2k Jan 02, 2023
This library intends to be a reference for recommendation engines in Python

Crab - A Python Library for Recommendation Engines

Marcel Caraciolo 85 Oct 04, 2021
Attentive Social Recommendation: Towards User And Item Diversities

ASR This is a Tensorflow implementation of the paper: Attentive Social Recommendation: Towards User And Item Diversities Preprint, https://arxiv.org/a

Dongsheng Luo 1 Nov 14, 2021
A recommendation system for suggesting new books given similar books.

Book Recommendation System A recommendation system for suggesting new books given similar books. Datasets Dataset Kaggle Dataset Notebooks goodreads-E

Sam Partee 2 Jan 06, 2022
Collaborative variational bandwidth auto-encoder (VBAE) for recommender systems.

Collaborative Variational Bandwidth Auto-encoder The codes are associated with the following paper: Collaborative Variational Bandwidth Auto-encoder f

Yaochen Zhu 14 Dec 11, 2022
基于个性化推荐的音乐播放系统

MusicPlayer 基于个性化推荐的音乐播放系统 Hi, 这是我在大四的时候做的毕设,现如今将该项目开源。 本项目是基于Python的tkinter和pygame所著。 该项目总体来说,代码比较烂(因为当时水平很菜)。 运行的话安装几个基本库就能跑,只不过里面的数据还没有上传至Github。 先

Cedric Niu 6 Nov 19, 2022
A TensorFlow recommendation algorithm and framework in Python.

TensorRec A TensorFlow recommendation algorithm and framework in Python. NOTE: TensorRec is not under active development TensorRec will not be receivi

James Kirk 1.2k Jan 04, 2023
Accuracy-Diversity Trade-off in Recommender Systems via Graph Convolutions

Accuracy-Diversity Trade-off in Recommender Systems via Graph Convolutions This repository contains the code of the paper "Accuracy-Diversity Trade-of

2 Sep 16, 2022
Code for ICML2019 Paper "Compositional Invariance Constraints for Graph Embeddings"

Dependencies NOTE: This code has been updated, if you were using this repo earlier and experienced issues that was due to an outaded codebase. Please

Avishek (Joey) Bose 43 Nov 25, 2022
QRec: A Python Framework for quick implementation of recommender systems (TensorFlow Based)

QRec is a Python framework for recommender systems (Supported by Python 3.7.4 and Tensorflow 1.14+) in which a number of influential and newly state-of-the-art recommendation models are implemented.

Yu 1.4k Dec 27, 2022
Temporal Meta-path Guided Explainable Recommendation (WSDM2021)

Temporal Meta-path Guided Explainable Recommendation (WSDM2021) TMER Code of paper "Temporal Meta-path Guided Explainable Recommendation". Requirement

Yicong Li 13 Nov 30, 2022
fastFM: A Library for Factorization Machines

Citing fastFM The library fastFM is an academic project. The time and resources spent developing fastFM are therefore justified by the number of citat

1k Dec 24, 2022
A Python scikit for building and analyzing recommender systems

Overview Surprise is a Python scikit for building and analyzing recommender systems that deal with explicit rating data. Surprise was designed with th

Nicolas Hug 5.7k Jan 01, 2023
Continuous-Time Sequential Recommendation with Temporal Graph Collaborative Transformer

Introduction This is the repository of our accepted CIKM 2021 paper "Continuous-Time Sequential Recommendation with Temporal Graph Collaborative Trans

SeqRec 29 Dec 09, 2022
ToR[e]cSys is a PyTorch Framework to implement recommendation system algorithms

ToR[e]cSys is a PyTorch Framework to implement recommendation system algorithms, including but not limited to click-through-rate (CTR) prediction, learning-to-ranking (LTR), and Matrix/Tensor Embeddi

LI, Wai Yin 90 Oct 08, 2022
Code for MB-GMN, SIGIR 2021

MB-GMN Code for MB-GMN, SIGIR 2021 For Beibei data, run python .\labcode.py For Tmall data, run python .\labcode.py --data tmall --rank 2 For IJCAI

32 Dec 04, 2022
Implementation of a hadoop based movie recommendation system

Implementation-of-a-hadoop-based-movie-recommendation-system 通过编写代码,设计一个基于Hadoop的电影推荐系统,通过此推荐系统的编写,掌握在Hadoop平台上的文件操作,数据处理的技能。windows 10 hadoop 2.8.3 p

汝聪(Ricardo) 5 Oct 02, 2022