Implementation of the TF-IDF algorithm to find document similarity using cosine similarity

Last update: Aug 25, 2022

Overview

NLP learning

Trying to learn NLP to use in my projects!

Table of Contents

About The Project
- Built With
Getting Started
- Requirements
- Run
Usage
License
Contact

About The Project

There are many ways and algorithms for machines to understand language, but first of all we should convert our words to vectors, because we need to do some calculations on them.

Here are some NLP topics that I have learned so far:

Using classic AI algorithms like Naive Bayes
Using TF-IDF to convert words to vectors
Using word2vec to convert words to vectors

Of course, the list above is not complete, but we will expand it in the future.
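To make the "words to vectors" step concrete, here is a minimal sketch of TF-IDF vectorization with NumPy. It is not necessarily how main.py does it: the function name `tfidf_matrix`, the whitespace tokenizer, and the plain `log(N / df)` weighting are assumptions chosen for illustration, and other TF-IDF variants (smoothed or normalized) are equally common.

```python
from collections import Counter

import numpy as np


def tfidf_matrix(docs):
    """Return (vocab, matrix); matrix[i, j] is the TF-IDF weight of term j in document i.

    Hypothetical helper for illustration. Assumes every document is non-empty.
    """
    # Naive tokenization: lowercase and split on whitespace.
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({term for tokens in tokenized for term in tokens})
    index = {term: j for j, term in enumerate(vocab)}

    # Term frequency: occurrences of the term divided by the document length.
    tf = np.zeros((len(docs), len(vocab)))
    for i, tokens in enumerate(tokenized):
        for term, count in Counter(tokens).items():
            tf[i, index[term]] = count / len(tokens)

    # Inverse document frequency: log(N / df), where df is the number of
    # documents that contain the term. Every vocabulary term has df >= 1.
    df = np.count_nonzero(tf, axis=0)
    idf = np.log(len(docs) / df)
    return vocab, tf * idf


docs = ["the cat sat", "the dog sat", "the cat ran"]
vocab, matrix = tfidf_matrix(docs)
```

With this weighting, a term that appears in every document (here "the") gets a weight of zero, while a term found in only one document gets the largest weight.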

(back to top)

Built With

The project is written in Python and uses NumPy for array and math operations.

(back to top)

Getting Started

To get a local copy up and running, follow these simple steps.

Requirements

We use NumPy for its array and math functions.

numpy
```
pip install numpy
```

Run

```
python3 main.py
```

(back to top)

Usage

With the TF-IDF algorithm implemented, you can measure the similarity between different documents, so you can use it in chatbots and search engines.
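As a rough sketch of the search-engine use case, the snippet below ranks documents by cosine similarity to a query vector. The function names `cosine_similarity` and `rank_documents` and the numbers in `doc_matrix` are made up for illustration and are not taken from this repository; in practice the vectors would come from the TF-IDF step.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    norm = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / norm) if norm else 0.0


def rank_documents(query_vec, doc_matrix):
    """Return (order, scores): document indices from most to least similar, plus raw scores."""
    scores = [cosine_similarity(query_vec, row) for row in doc_matrix]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order, scores


# Made-up TF-IDF vectors over a three-term vocabulary, for illustration only.
doc_matrix = np.array([
    [0.5, 0.0, 0.1],
    [0.0, 0.7, 0.0],
    [0.4, 0.1, 0.0],
])
query_vec = np.array([1.0, 0.0, 0.0])

order, scores = rank_documents(query_vec, doc_matrix)
```

Cosine similarity compares direction rather than magnitude, so a long document and a short one that use terms in similar proportions still score as similar.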

For more examples, please refer to the Documentation.

(back to top)

License

Distributed under the MIT License. See LICENSE.md for more information.

(back to top)

Contact

Faraz Farangizadeh - [email protected]

Project Link: https://github.com/farazff/NLP-Learning

(back to top)


Owner

Faraz Farangizadeh
