The following links briefly explain the idea of semantic search and how search systems work using the retrieve-and-rerank approach.

Last update: Jan 28, 2022

Related tags

Text Data & NLP information_retrieval

Overview

Main Idea

The following links briefly explain the idea of semantic search and how search systems work using the retrieve-and-rerank approach.
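As a rough illustration of retrieve and rerank, the sketch below runs both stages on a tiny in-memory corpus. It is not this package's implementation: `embed` is a toy bag-of-words stand-in for a bi-encoder, `cross_score` is a toy word-pair overlap stand-in for a cross-encoder, and all function names here are hypothetical. The point is only the shape of the pipeline: a cheap first stage scores every document independently of the query, then a more careful second stage rescores the few surviving candidates by looking at query and document together.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for a bi-encoder: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, corpus, top_k=3):
    """Stage 1: rank all documents by embedding similarity, keep the top_k."""
    query_vec = embed(query)
    scored = [(cosine(query_vec, embed(doc)), doc) for doc in corpus]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_k]]


def cross_score(query, doc):
    """Toy stand-in for a cross-encoder: reads query and document jointly.

    Returns the fraction of adjacent query word pairs found in the document,
    so word order matters, unlike in the bag-of-words first stage.
    """
    q = query.lower().split()
    d = doc.lower().split()
    query_pairs = set(zip(q, q[1:]))
    doc_pairs = set(zip(d, d[1:]))
    if not query_pairs:
        return 0.0
    return len(query_pairs & doc_pairs) / len(query_pairs)


def rerank(query, corpus, top_k=3):
    """Stage 2: rescore only the retrieved candidates with the joint scorer."""
    candidates = retrieve(query, corpus, top_k)
    return sorted(candidates, key=lambda doc: cross_score(query, doc), reverse=True)


corpus = [
    "bella vida es la",
    "la vida es bella para todos",
    "el perro come pan",
    "la vida en el campo es tranquila",
]
first_stage = retrieve("la vida es bella", corpus, top_k=2)
second_stage = rerank("la vida es bella", corpus, top_k=2)
```

In this toy corpus the first stage puts the scrambled sentence first because it contains exactly the query words, while the second stage promotes the sentence that preserves the query's word order.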

Setup

Download trained models

Two models were trained for Spanish: a bi-encoder and a cross-encoder. Together they form the retrieval system, following the retrieve-and-rerank idea:

make setup
pip install -r requirements.txt

Basic usage

Set up the Elasticsearch index with semantic vectors. This step assumes that a set of JSON files is in a folder. Each JSON file can contain several optional fields but must contain the id and text fields.

from information_retrieval import SemanticEmbedder, CrossEncoder, Prepare, Search

data_folder = 'data/'
text_field = "texto_parrafo"
id_field = "id_parrafo"
elastic_index_name = "sentencias_2.0"

# Read the JSON files, compute embeddings, and upload them to Elasticsearch
P = Prepare(data_folder, text_field, id_field, elastic_index_name)
P.prepare()
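To make the expected input concrete, here is a minimal sketch that writes sample files into a folder and checks that each one has the two required fields. It assumes one JSON object per file and names each file after its id; neither detail is confirmed by this package. The helpers `write_document` and `missing_fields` and the optional field `fuente` are hypothetical, and only the field names `id_parrafo` and `texto_parrafo` come from the example above.

```python
import json
import tempfile
from pathlib import Path

ID_FIELD = "id_parrafo"
TEXT_FIELD = "texto_parrafo"


def write_document(folder, doc):
    """Write one document as its own JSON file, named after its id (assumed layout)."""
    path = Path(folder) / f"{doc[ID_FIELD]}.json"
    path.write_text(json.dumps(doc, ensure_ascii=False), encoding="utf-8")
    return path


def missing_fields(path, required=(ID_FIELD, TEXT_FIELD)):
    """Return the required fields that a JSON file lacks."""
    doc = json.loads(Path(path).read_text(encoding="utf-8"))
    return [field for field in required if field not in doc]


# Use a temporary folder so the sketch does not touch real data.
data_folder = tempfile.mkdtemp()

# A valid document: both required fields plus one optional field.
good = write_document(
    data_folder,
    {"id_parrafo": "p-001", "texto_parrafo": "La vida es bella.", "fuente": "ejemplo"},
)

# An invalid document: the text field is missing.
bad = write_document(data_folder, {"id_parrafo": "p-002", "fuente": "ejemplo"})

good_problems = missing_fields(good)
bad_problems = missing_fields(bad)
```

Running a check like this before indexing can catch files that lack a required field before any embeddings are computed.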

Make queries to retrieve documents:

from information_retrieval import SearchEngine

query = "la vida es bella"
S = SearchEngine(elastic_index_name)
S.retrieve(query) # Only semantic search

S.rerank(query) # Retrieve and rerank

Owner

Sergio Arnaud Gomez
