Healthsea is a spaCy pipeline for analyzing user reviews of supplementary products for their effects on health.

Overview

Welcome to Healthsea

Create better access to health with spaCy.

Healthsea is a pipeline for analyzing user reviews to supplement products by extracting their effects on health.

Learn more about Healthsea in our blog post!

💉 Creating better access to health

Healthsea aims to analyze user-written reviews of supplements in relation to their effects on health. Based on this analysis, we try to provide product recommendations. For many people, supplements are an addition to maintaining health and achieving personal goals. Due to their rising popularity, consumers have increasing access to a variety of products.

However, it's likely that most of the products on the market are redundant or produced in a "quantity over quality" fashion to maximize profit. The resulting white noise of products makes it hard to find the right supplements.

Healthsea automizes the analysis and provides information in a more digestible way.


🟢 Requirements

To run this project you need:

spacy>=3.2.0
benepar>=0.2.0
torch>=1.6.0
spacy-transformers>=1.1.2

You can install them in the project folder via spacy project run install

📖 Documentation

Documentation
🧭 Usage How to use the pipeline
⚙️ Pipeline Learn more about the architecture of the pipeline
🪐 spaCy project Introduction to the spaCy project
Demos Introduction to the Healthsea demos

🧭 Usage

The pipeline processes reviews to supplements and returns health effects for every found health aspect.

You can either train the pipeline yourself with the provided datasets in the spaCy project or directly download the trained Healthsea pipeline from Huggingface via pip install https://huggingface.co/explosion/en_healthsea/resolve/main/en_healthsea-any-py3-none-any.whl

import spacy

nlp = spacy.load("en_healthsea")
doc = nlp("This is great for joint pain.")

# Clause Segmentation & Blinding
print(doc._.clauses)

>    {"split_indices": [0, 7],
>    "has_ent": true,
>    "ent_indices": [4, 6],
>    "blinder": "_CONDITION_",
>    "ent_name": "joint pain",
>    "cats": {
>        "POSITIVE": 0.9824668169021606,
>        "NEUTRAL": 0.017364952713251114,
>        "NEGATIVE": 0.00002889777533710003,
>        "ANAMNESIS": 0.0001394189748680219
>    },
>    "prediction_text": ["This", "is", "great", "for", "_CONDITION_", "!"]}

# Aggregated results
print(doc._.health_effects)

>    {"joint_pain": {
>        "effects": ["POSITIVE"],
>        "effect": "POSITIVE",
>        "label": "CONDITION",
>        "text": "joint pain"
>    }}


⚙️ Pipeline

The pipeline consists of the following components:

pipeline = [sentencizer, tok2vec, ner, benepar, segmentation, clausecat, aggregation]

It uses Named Entity Recognition to detect two types of entities Condition and Benefit.

Condition entities are defined as health aspects that are improved by decreasing them. They include diseases, symptoms and general health problems (e.g. pain in back). Benefit entities on the other hand, are desired states of health (muscle recovery, glowing skin) that improve by increasing them.

The pipeline uses a modified model that performs Clause Segmentation based on the benepar parser, Entity Blinding and Text Classification. It predicts four exclusive effects: Positive, Negative, Neutral, and Anamnesis.


🪐 spaCy project

The project folder contains a spaCy project with all the training data and workflows.

Use spacy project run inside the project folder to get an overview of all commands and assets. For more detailed documentation, visit the project folders readme.

Use spacy project run install to install dependencies needed for the pipeline.

Demo

Healthsea Demo

A demo for exploring the results of Healthsea on real data can be found at Hugging Face Spaces.

Healthsea Pipeline

A demo for exploring the Healthsea pipeline with its individual processing steps can be found at Hugging Face Spaces.

Owner
Explosion
A software company specializing in developer tools for Artificial Intelligence and Natural Language Processing
Explosion
Materials (slides, code, assignments) for the NYU class I teach on NLP and ML Systems (Master of Engineering).

FREE_7773 Repo containing material for the NYU class (Master of Engineering) I teach on NLP, ML Sys etc. For context on what the class is trying to ac

Jacopo Tagliabue 90 Dec 19, 2022
Google and Stanford University released a new pre-trained model called ELECTRA

Google and Stanford University released a new pre-trained model called ELECTRA, which has a much compact model size and relatively competitive performance compared to BERT and its variants. For furth

Yiming Cui 1.2k Dec 30, 2022
InferSent sentence embeddings

InferSent InferSent is a sentence embeddings method that provides semantic representations for English sentences. It is trained on natural language in

Facebook Research 2.2k Dec 27, 2022
Conditional probing: measuring usable information beyond a baseline

Conditional probing: measuring usable information beyond a baseline

John Hewitt 20 Dec 15, 2022
spaCy-wrap: For Wrapping fine-tuned transformers in spaCy pipelines

spaCy-wrap: For Wrapping fine-tuned transformers in spaCy pipelines spaCy-wrap is minimal library intended for wrapping fine-tuned transformers from t

Kenneth Enevoldsen 32 Dec 29, 2022
Use PaddlePaddle to reproduce the paper:mT5: A Massively Multilingual Pre-trained Text-to-Text Transformer

MT5_paddle Use PaddlePaddle to reproduce the paper:mT5: A Massively Multilingual Pre-trained Text-to-Text Transformer English | 简体中文 mT5: A Massively

2 Oct 17, 2021
PeCo: Perceptual Codebook for BERT Pre-training of Vision Transformers

PeCo: Perceptual Codebook for BERT Pre-training of Vision Transformers

Microsoft 105 Jan 08, 2022
Open-source offline translation library written in Python. Uses OpenNMT for translations

Open source neural machine translation in Python. Designed to be used either as a Python library or desktop application. Uses OpenNMT for translations and PyQt for GUI.

Argos Open Tech 1.6k Jan 01, 2023
MMDA - multimodal document analysis

MMDA - multimodal document analysis

AI2 75 Jan 04, 2023
🗣️ NALP is a library that covers Natural Adversarial Language Processing.

NALP: Natural Adversarial Language Processing Welcome to NALP. Have you ever wanted to create natural text from raw sources? If yes, NALP is for you!

Gustavo Rosa 21 Aug 12, 2022
Text vectorization tool to outperform TFIDF for classification tasks

WHAT: Supervised text vectorization tool Textvec is a text vectorization tool, with the aim to implement all the "classic" text vectorization NLP meth

186 Dec 29, 2022
A relatively simple python program to generate one of those reddit text to speech videos dominating youtube.

Reddit text to speech generator A basic reddit tts video generator Current functionality Generate videos for subs based on comments,(askreddit) so rea

Aadvik 17 Dec 19, 2022
Free and Open Source Machine Translation API. 100% self-hosted, offline capable and easy to setup.

LibreTranslate Try it online! | API Docs | Community Forum Free and Open Source Machine Translation API, entirely self-hosted. Unlike other APIs, it d

3.4k Dec 27, 2022
Repository for Project Insight: NLP as a Service

Project Insight NLP as a Service Contents Introduction Features Installation Setup and Documentation Project Details Demonstration Directory Details H

Abhishek Kumar Mishra 286 Dec 06, 2022
File-based TF-IDF: Calculates keywords in a document, using a word corpus.

File-based TF-IDF Calculates keywords in a document, using a word corpus. Why? Because I found myself with hundreds of plain text files, with no way t

Jakob Lindskog 1 Feb 11, 2022
Azure Text-to-speech service for Home Assistant

Azure Text-to-speech service for Home Assistant The Azure text-to-speech platform uses online Azure Text-to-Speech cognitive service to read a text wi

Yassine Selmi 2 Aug 06, 2022
Deeply Supervised, Layer-wise Prediction-aware (DSLP) Transformer for Non-autoregressive Neural Machine Translation

Non-Autoregressive Translation with Layer-Wise Prediction and Deep Supervision Training Efficiency We show the training efficiency of our DSLP model b

Chenyang Huang 37 Jan 04, 2023
A minimal code for fairseq vq-wav2vec model inference.

vq-wav2vec inference A minimal code for fairseq vq-wav2vec model inference. Runs without installing the fairseq toolkit and its dependencies. Usage ex

Vladimir Larin 7 Nov 15, 2022
CCF BDCI BERT系统调优赛题baseline(Pytorch版本)

CCF BDCI BERT系统调优赛题baseline(Pytorch版本) 此版本基于Pytorch后端的huggingface进行实现。由于此实现使用了Oneflow的dataloader作为数据读入的方式,因此也需要安装Oneflow。其它框架的数据读取可以参考OneflowDataloade

Ziqi Zhou 9 Oct 13, 2022
Repo for Enhanced Seq2Seq Autoencoder via Contrastive Learning for Abstractive Text Summarization

ESACL: Enhanced Seq2Seq Autoencoder via Contrastive Learning for AbstractiveText Summarization This repo is for our paper "Enhanced Seq2Seq Autoencode

Rachel Zheng 14 Nov 01, 2022