Netflix-recommendation-system

NLP, Machine learning

About

Recommendation algorithms are at the core of the Netflix product. They provide members with personalized suggestions, reducing the time and frustration of finding something great to watch. Because recommendations are so important, Netflix continually seeks to improve them by advancing the state of the art in the field. It does this by using data about what members watch and enjoy, along with how they interact with the service, to get better at predicting the next great movie or TV show for each member.

Types

The categories under "Trending Now" and "New Releases" are examples of a non-personalized recommendation system.
The categories under "Because you watched" are examples of a personalized recommendation system.

NLP

Natural language processing is a subfield of linguistics, computer science, and artificial intelligence concerned with the interactions between computers and human language, in particular how to program computers to process and analyze large amounts of natural language data.

#1 Tokenization

Tokenization is the process of breaking sentences or paragraphs down into smaller units, usually words, called tokens.
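
As a rough illustration of the idea, the sketch below splits a sentence into lowercase word tokens with a regular expression. The `tokenize` function and the sample sentence are assumptions made for this example, not code from the project, and real tokenizers handle punctuation and contractions more carefully.

```python
import re

def tokenize(text):
    """Lowercase the text and return its runs of letters, digits, and apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())

# Hypothetical movie description used only for illustration.
tokens = tokenize("A young wizard discovers his magical heritage.")
print(tokens)
```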

#2 Stop Words Removal

Some words, such as "and" and "am", can be removed without changing the meaning of a sentence. These words are called stop words and should be removed before the text is fed to any algorithm. In some datasets, certain non-stop words also repeat very frequently. Those words should be removed as well so that they do not bias the algorithm's results.
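
Both kinds of removal can be sketched as follows. The stop-word set, the `max_doc_ratio` threshold, and the sample documents are all assumptions chosen for illustration. A real project would more likely use a fuller stop-word list from an NLP library.

```python
from collections import Counter

# Tiny illustrative stop-word list (an assumption, not the project's list).
STOP_WORDS = {"a", "am", "an", "and", "is", "of", "the", "to"}

def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Keep only the tokens that are not stop words."""
    return [t for t in tokens if t not in stop_words]

def remove_frequent_words(docs, max_doc_ratio=0.8):
    """Drop tokens that appear in more than max_doc_ratio of the documents."""
    doc_freq = Counter(t for doc in docs for t in set(doc))
    limit = max_doc_ratio * len(docs)
    return [[t for t in doc if doc_freq[t] <= limit] for doc in docs]

# Hypothetical tokenized descriptions.
docs = [
    ["a", "movie", "about", "space", "and", "robots"],
    ["a", "movie", "about", "love"],
    ["the", "movie", "is", "a", "comedy"],
]
filtered = remove_frequent_words([remove_stop_words(d) for d in docs])
print(filtered)
```

Here "movie" appears in every description, so it is dropped along with the stop words, while "about" (two of three descriptions) stays under the assumed 0.8 threshold.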

#3 Vectorization

After tokenization and stop word removal, our "content" is still in string format. We need to convert those strings to numbers based on their importance (features). We use TF-IDF vectorization to convert the text into a vector of importance scores. With TF-IDF we can extract the important words in our data: it assigns a high number to words that occur rarely across the dataset and a very low number to words that occur frequently.
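
To make the weighting concrete, here is a minimal sketch of one common TF-IDF formulation: term frequency within a document multiplied by the logarithm of (number of documents / number of documents containing the term). The `tf_idf` function and the sample documents are assumptions for illustration. Library implementations such as scikit-learn's `TfidfVectorizer` use a smoothed variant, so their numbers will differ.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document."""
    n_docs = len(docs)
    # Number of documents in which each term appears at least once.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors

# Hypothetical tokenized descriptions, already cleaned of stop words.
docs = [
    ["space", "robots", "adventure"],
    ["space", "love", "story"],
    ["space", "comedy", "story"],
]
vectors = tf_idf(docs)
print(vectors[0])
```

In this toy data "space" occurs in every document and gets a weight of zero, "story" occurs in two documents and gets a small weight, and "robots" occurs in only one and gets the largest weight.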


Owner

Harshith VH
