Semantic similarity computation with different state-of-the-art metrics

Related tags

Deep LearningTaxoSS
Overview

Semantic similarity computation with different state-of-the-art metrics

DescriptionInstallationUsageLicense


Description

TaxoSS is a semantic similarity library for Python which implements the state-of-the-art semantic similarity metrics like Resnik, JCN, and HSS.

Requirements

  • Python 3.6 or later
  • NLTK
  • NumPy
  • Pandas

Installation

TaxoSS can be installed through pip (the Python package manager) in the following way:

pip install taxoss

Usage

Semantic similarity functions

You can compute the semantic similarity in the following way:

from TaxoSS.functions import semantic_similarity
semantic_similarity('brother', 'sister', 'hss')

3.353513521371089

The function semantic_similarity(word1, word2, kind, ic) has these options for the argument kind:

  • hss -> HSS (default)
  • wup -> WUP
  • lcs -> LC
  • path_sim -> Shortest Path
  • resnik -> Resnik
  • jcn -> Jiang-Conrath
  • lin -> Lin
  • seco -> Seco

For the argument ic see the following section.

Information Content

Using a Wikipedia copus for calculating the Information Content (default of the argument ic):

from TaxoSS.functions import semantic_similarity
semantic_similarity('cat', 'dog', 'resnik')

6.169410755220327

Calculating Information Conent from a given corpus:

from TaxoSS.calculate_IC import calculate_IC
from TaxoSS.functions import semantic_similarity

calculate_IC(path_to_corpus, path_to_save_IC_file)
semantic_similarity('cat', 'dog', 'resnik', path_to_save_IC_file)

with path_to_save_IC_file a path into the virtual environment TaxoSS package, e.g. venv/lib/python3.6/site-packages/TaxoSS/data/prova_IC.csv.

Benchmark

HSS (ours) HSS (ours) WUP WUP LC LC Shortest Path Shortest Path Resnik Resnik Jiang-Conrath Jiang-Conrath Lin Lin Seco Seco
Pearson Spearman Pearson Spearman Pearson Spearman Pearson Spearman Pearson Spearman Pearson Spearman Pearson Spearman Pearson Spearman
MEN 0.41 0.33 0.36 0.33 0.14 0.05 0.07 0.03 0.05 0.03 -0.05 -0.04 0.05 0.04 -0.01 0.03
MC30 0.74 0.69 0.74 0.73 0.33 0.21 0.22 0.3 0.13 0.03 -0.06 -0.01 0.05 0.01 0.13 -0.09
WSS 0.68 0.65 0.58 0.59 0.36 0.23 0.16 0.1 0.02 -0.03 0.04 0.06 0.03 0.06 -0.01 -0.04
Simlex999 0.4 0.38 0.45 0.43 0.26 0.15 0.2 0.16 -0.04 -0.04 0.12 0.14 0.12 0.14 -0.02 -0.08
MT287 0.46 0.31 0.4 0.28 0.26 0.12 0.11 0.11 0.03 0.04 0.18 0.16 0.22 0.17 0 -0.06
MT771 0.44 0.4 0.43 0.49 0.06 0.02 0.1 0.13 0 -0.01 0 0 0 0 -0.05 -0.03
Time per pair (s) 0.0007 0.0007 0.008 0.008 0.0055 0.0055 0.0064 0.0064 0.5586 0.5586 0.551 0.551 0.5866 0.5866 0.0013 0.0013
Training data extraction on GPT-2

Training data extraction from GPT-2 This repository contains code for extracting training data from GPT-2, following the approach outlined in the foll

Florian Tramer 62 Dec 07, 2022
[NeurIPS 2020] Code for the paper "Balanced Meta-Softmax for Long-Tailed Visual Recognition"

Balanced Meta-Softmax Code for the paper Balanced Meta-Softmax for Long-Tailed Visual Recognition Jiawei Ren, Cunjun Yu, Shunan Sheng, Xiao Ma, Haiyu

Jiawei Ren 65 Dec 21, 2022
Realtime segmentation with ENet, the fast and accurate segmentation net.

Enet This is a realtime segmentation net with almost 22 fps on GTX1080 ti, and the model size is very small with only 28M. This repo contains the infe

JinTian 14 Aug 30, 2022
TipToiDog - Tip Toi Dog With Python

TipToiDog Was ist dieses Projekt? Meine 5-jährige Tochter spielt sehr gerne das

1 Feb 07, 2022
An exploration of log domain "alternative floating point" for hardware ML/AI accelerators.

This repository contains the SystemVerilog RTL, C++, HLS (Intel FPGA OpenCL to wrap RTL code) and Python needed to reproduce the numerical results in

Facebook Research 373 Dec 31, 2022
CLADE - Efficient Semantic Image Synthesis via Class-Adaptive Normalization (TPAMI 2021)

Efficient Semantic Image Synthesis via Class-Adaptive Normalization (Accepted by TPAMI)

tzt 49 Nov 17, 2022
Adjusting for Autocorrelated Errors in Neural Networks for Time Series

Adjusting for Autocorrelated Errors in Neural Networks for Time Series This repository is the official implementation of the paper "Adjusting for Auto

Fan-Keng Sun 51 Nov 05, 2022
Object detection, 3D detection, and pose estimation using center point detection:

Objects as Points Object detection, 3D detection, and pose estimation using center point detection: Objects as Points, Xingyi Zhou, Dequan Wang, Phili

Xingyi Zhou 6.7k Jan 03, 2023
Code for the Paper: Alexandra Lindt and Emiel Hoogeboom.

Discrete Denoising Flows This repository contains the code for the experiments presented in the paper Discrete Denoising Flows [1]. To give a short ov

Alexandra Lindt 3 Oct 09, 2022
K-Means Clustering and Hierarchical Clustering Unsupervised Learning Solution in Python3.

Unsupervised Learning - K-Means Clustering and Hierarchical Clustering - The Heritage Foundation's Economic Freedom Index Analysis 2019 - By David Sal

David Salako 1 Jan 12, 2022
It's a implement of this paper:Relation extraction via Multi-Level attention CNNs

Relation Classification via Multi-Level Attention CNNs It's a implement of this paper:Relation Classification via Multi-Level Attention CNNs. Training

Aybss 2 Nov 04, 2022
🎁 3,000,000+ Unsplash images made available for research and machine learning

The Unsplash Dataset The Unsplash Dataset is made up of over 250,000+ contributing global photographers and data sourced from hundreds of millions of

Unsplash 2k Jan 03, 2023
FACIAL: Synthesizing Dynamic Talking Face With Implicit Attribute Learning. ICCV, 2021.

FACIAL: Synthesizing Dynamic Talking Face with Implicit Attribute Learning PyTorch implementation for the paper: FACIAL: Synthesizing Dynamic Talking

226 Jan 08, 2023
Repo for the Video Person Clustering dataset, and code for the associated paper

Video Person Clustering Repo for the Video Person Clustering dataset, and code for the associated paper. This reporsitory contains the Video Person Cl

Andrew Brown 47 Nov 02, 2022
D-NeRF: Neural Radiance Fields for Dynamic Scenes

D-NeRF: Neural Radiance Fields for Dynamic Scenes [Project] [Paper] D-NeRF is a method for synthesizing novel views, at an arbitrary point in time, of

Albert Pumarola 291 Jan 02, 2023
Quadruped-command-tracking-controller - Quadruped command tracking controller (flat terrain)

Quadruped command tracking controller (flat terrain) Prepare Install RAISIM link

Yunho Kim 4 Oct 20, 2022
Original Pytorch Implementation of FLAME: Facial Landmark Heatmap Activated Multimodal Gaze Estimation

FLAME Original Pytorch Implementation of FLAME: Facial Landmark Heatmap Activated Multimodal Gaze Estimation, accepted at the 17th IEEE Internation Co

Neelabh Sinha 19 Dec 17, 2022
MegEngine implementation of YOLOX

Introduction YOLOX is an anchor-free version of YOLO, with a simpler design but better performance! It aims to bridge the gap between research and ind

旷视天元 MegEngine 77 Nov 22, 2022
Codebase of deep learning models for inferring stability of mRNA molecules

Kaggle OpenVaccine Models Codebase of deep learning models for inferring stability of mRNA molecules, corresponding to the Kaggle Open Vaccine Challen

Eternagame 40 Dec 29, 2022
Learning an Adaptive Meta Model-Generator for Incrementally Updating Recommender Systems

Learning an Adaptive Meta Model-Generator for Incrementally Updating Recommender Systems This is our experimental code for RecSys 2021 paper "Learning

11 Jul 28, 2022