Netflix-recommendation-system

NLP, Machine learning

About

Recommendation algorithms are at the core of the Netflix product. They provide members with personalized suggestions that reduce the time and frustration of finding great content to watch. Because recommendations are so important, Netflix continually seeks to improve them by advancing the state of the art in the field. It does this by using data about what members watch and enjoy, along with how they interact with the service, to get better at predicting what the next great movie or TV show for each member will be.

Types

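The categories under "Trending Now" and "New Releases" come from a non-personalized recommendation system: every member sees the same titles.
The categories under "Because you watched" come from a personalized recommendation system: the titles depend on what the individual member has watched.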

NLP

Natural language processing is a subfield of linguistics, computer science, and artificial intelligence concerned with the interactions between computers and human language, in particular how to program computers to process and analyze large amounts of natural language data.

#1 Tokenization

Tokenization is the process of breaking sentences or paragraphs down into smaller units, usually words, called tokens.
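
As a rough illustration of this step, here is a minimal sketch that tokenizes with a regular expression. The `tokenize` helper and the sample sentence are assumptions made for this example, not code taken from the project, which may use a library tokenizer instead.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    # Keep runs of letters, digits, and apostrophes; drop punctuation and spaces.
    return re.findall(r"[a-z0-9']+", text.lower())

tokens = tokenize("A young wizard discovers his powers.")
print(tokens)  # ['a', 'young', 'wizard', 'discovers', 'his', 'powers']
```

Lowercasing first means "Wizard" and "wizard" end up as the same token, which keeps the later steps from treating them as different words.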

#2 Stop Words Removal

Some words, such as "and" or "am", can be removed without changing the meaning of a sentence. These are called stop words, and they should be removed before the text is fed to any algorithm. A dataset may also contain words that are not standard stop words but still repeat very frequently. Those words should be removed too, so that they do not bias the result of the algorithm.
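
Both kinds of removal can be sketched in a few lines. The stop-word set, the helper names, and the 80% document-frequency cutoff below are all assumptions chosen for illustration; a real project would typically use a fuller stop-word list and tune the cutoff to its data.

```python
from collections import Counter

# A tiny illustrative stop-word set (assumption); real lists are much longer.
STOP_WORDS = {"a", "an", "and", "am", "the", "is", "in", "of", "to"}

def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Drop every token that appears in the stop-word set."""
    return [t for t in tokens if t not in stop_words]

def frequent_words(documents, max_doc_fraction=0.8):
    """Return words that occur in more than max_doc_fraction of the documents."""
    doc_counts = Counter()
    for tokens in documents:
        doc_counts.update(set(tokens))  # count each word once per document
    limit = max_doc_fraction * len(documents)
    return {word for word, count in doc_counts.items() if count > limit}

documents = [
    ["i", "am", "a", "wizard", "movie"],
    ["the", "space", "movie"],
    ["a", "funny", "movie"],
]
too_common = frequent_words(documents)  # {'movie'}: it appears in every document
cleaned = [remove_stop_words(doc, STOP_WORDS | too_common) for doc in documents]
print(cleaned)  # [['i', 'wizard'], ['space'], ['funny']]
```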

#3 Vectorization

After tokenization and stop-word removal, our "content" is still in string format. We need to convert those strings to numbers that reflect their importance (features). We use TF-IDF (term frequency-inverse document frequency) vectorization to convert the text to a vector of importance scores. With TF-IDF we can extract the important words in our data: it assigns a high number to words that occur in few documents and a very low number to words that occur in most documents.
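
To make the weighting concrete, here is a minimal pure-Python sketch of one common TF-IDF variant: term frequency is the word count divided by the document length, and inverse document frequency is log(N / df) with no smoothing. The `tf_idf` function and the sample documents are assumptions for this example. Library implementations usually apply smoothing and normalization, so their exact numbers will differ.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Return one {word: weight} dict per tokenized document."""
    n_docs = len(documents)
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))  # number of documents containing each word

    vectors = []
    for tokens in documents:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append({
            # term frequency * inverse document frequency
            word: (count / total) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        })
    return vectors

documents = [
    ["wizard", "school", "magic"],
    ["space", "war", "magic"],
    ["magic", "heist", "crew"],
]
vectors = tf_idf(documents)
print(vectors[0]["magic"])   # 0.0, because "magic" appears in every document
print(vectors[0]["wizard"])  # positive, because "wizard" appears in only one
```

In this variant a word that appears in every document gets a weight of exactly zero, which is the "very low number" behaviour described above taken to its limit.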

Owner

Harshith VH
