Quantify the difference between two arbitrary curves in space

Overview

similaritymeasures

Downloads a month similaritymeasures ci codecov

Quantify the difference between two arbitrary curves

Curves in this case are:

  • discretized by inidviudal data points
  • ordered from a beginning to an ending

Consider the following two curves. We want to quantify how different the Numerical curve is from the Experimental curve. Notice how there are no concurrent Stress or Strain values in the two curves. Additionally one curve has more data points than the other curves.

Image of two different curves

In the ideal case the Numerical curve would match the Experimental curve exactly. This means that the two curves would appear directly on top of each other. Our measures of similarity would return a zero distance between two curves that were on top of each other.

Methods covered

This library includes the following methods to quantify the difference (or similarity) between two curves:

  • Partial Curve Mappingx (PCM) method: Matches the area of a subset between the two curves [1]
  • Area methodx: An algorithm for calculating the Area between two curves in 2D space [2]
  • Discrete Frechet distancey: The shortest distance in-between two curves, where you are allowed to very the speed at which you travel along each curve independently (walking dog problem) [3, 4, 5, 6, 7, 8]
  • Curve Lengthx method: Assumes that the only true independent variable of the curves is the arc-length distance along the curve from the origin [9, 10]
  • Dynamic Time Warpingy (DTW): A non-metric distance between two time-series curves that has been proven useful for a variety of applications [11, 12, 13, 14, 15, 16]

x denotes methods created specifically for material parameter identification

y denotes that the method implemented in this library supports N-D data!

Installation

Install with pip

[sudo] pip install similaritymeasures

or clone and install from source.

git clone https://github.com/cjekel/similarity_measures
[sudo] pip install ./similarity_measures

Example usage

This shows you how to compute the various similarity measures

import numpy as np
import similaritymeasures
import matplotlib.pyplot as plt

# Generate random experimental data
x = np.random.random(100)
y = np.random.random(100)
exp_data = np.zeros((100, 2))
exp_data[:, 0] = x
exp_data[:, 1] = y

# Generate random numerical data
x = np.random.random(100)
y = np.random.random(100)
num_data = np.zeros((100, 2))
num_data[:, 0] = x
num_data[:, 1] = y

# quantify the difference between the two curves using PCM
pcm = similaritymeasures.pcm(exp_data, num_data)

# quantify the difference between the two curves using
# Discrete Frechet distance
df = similaritymeasures.frechet_dist(exp_data, num_data)

# quantify the difference between the two curves using
# area between two curves
area = similaritymeasures.area_between_two_curves(exp_data, num_data)

# quantify the difference between the two curves using
# Curve Length based similarity measure
cl = similaritymeasures.curve_length_measure(exp_data, num_data)

# quantify the difference between the two curves using
# Dynamic Time Warping distance
dtw, d = similaritymeasures.dtw(exp_data, num_data)

# print the results
print(pcm, df, area, cl, dtw)

# plot the data
plt.figure()
plt.plot(exp_data[:, 0], exp_data[:, 1])
plt.plot(num_data[:, 0], num_data[:, 1])
plt.show()

If you are interested in setting up an optimization problem using these measures, check out this Jupyter Notebook which replicates Section 3.2 from [2].

Changelog

Version 0.3.0: Frechet distance now supports N-D data! See CHANGELOG.md for full details.

Documenation

Each function includes a descriptive docstring, which you can view online here.

References

[1] Katharina Witowski and Nielen Stander. Parameter Identification of Hysteretic Models Using Partial Curve Mapping. 12th AIAA Aviation Technology, Integration, and Op- erations (ATIO) Conference and 14th AIAA/ISSMO Multidisciplinary Analysis and Optimization Conference, sep 2012. doi: doi:10.2514/6.2012-5580.

[2] Jekel, C. F., Venter, G., Venter, M. P., Stander, N., & Haftka, R. T. (2018). Similarity measures for identifying material parameters from hysteresis loops using inverse analysis. International Journal of Material Forming. https://doi.org/10.1007/s12289-018-1421-8

[3] M Maurice Frechet. Sur quelques points du calcul fonctionnel. Rendiconti del Circol Matematico di Palermo (1884-1940), 22(1):1–72, 1906.

[4] Thomas Eiter and Heikki Mannila. Computing discrete Frechet distance. Technical report, 1994.

[5] Anne Driemel, Sariel Har-Peled, and Carola Wenk. Approximating the Frechet Distance for Realistic Curves in Near Linear Time. Discrete & Computational Geometry, 48(1): 94–127, 2012. ISSN 1432-0444. doi: 10.1007/s00454-012-9402-z. URL http://dx.doi.org/10.1007/s00454-012-9402-z.

[6] K Bringmann. Why Walking the Dog Takes Time: Frechet Distance Has No Strongly Subquadratic Algorithms Unless SETH Fails, 2014.

[7] Sean L Seyler, Avishek Kumar, M F Thorpe, and Oliver Beckstein. Path Similarity Analysis: A Method for Quantifying Macromolecular Pathways. PLOS Computational Biology, 11(10):1–37, 2015. doi: 10.1371/journal.pcbi.1004568. URL https://doi.org/10.1371/journal.pcbi.1004568.

[8] Helmut Alt and Michael Godau. Computing the Frechet Distance Between Two Polyg- onal Curves. International Journal of Computational Geometry & Applications, 05 (01n02):75–91, 1995. doi: 10.1142/S0218195995000064.

[9] A Andrade-Campos, R De-Carvalho, and R A F Valente. Novel criteria for determina- tion of material model parameters. International Journal of Mechanical Sciences, 54 (1):294–305, 2012. ISSN 0020-7403. doi: https://doi.org/10.1016/j.ijmecsci.2011.11.010. URL http://www.sciencedirect.com/science/article/pii/S0020740311002451.

[10] J Cao and J Lin. A study on formulation of objective functions for determin- ing material models. International Journal of Mechanical Sciences, 50(2):193–204, 2008. ISSN 0020-7403. doi: https://doi.org/10.1016/j.ijmecsci.2007.07.003. URL http://www.sciencedirect.com/science/article/pii/S0020740307001178.

[11] Donald J Berndt and James Clifford. Using Dynamic Time Warping to Find Pat- terns in Time Series. In Proceedings of the 3rd International Conference on Knowledge Discovery and Data Mining, AAAIWS’94, pages 359–370. AAAI Press, 1994. URL http://dl.acm.org/citation.cfm?id=3000850.3000887.

[12] François Petitjean, Alain Ketterlin, and Pierre Gançarski. A global averaging method for dynamic time warping, with applications to clustering. Pattern Recognition, 44 (3):678–693, 2011. ISSN 0031-3203. doi: https://doi.org/10.1016/j.patcog.2010.09.013. URL http://www.sciencedirect.com/science/article/pii/S003132031000453X.

[13] Toni Giorgino. Computing and Visualizing Dynamic Time Warping Alignments in R: The dtw Package. Journal of Statistical Software; Vol 1, Issue 7 (2009), aug 2009. URL http://dx.doi.org/10.18637/jss.v031.i07.

[14] Stan Salvador and Philip Chan. Toward Accurate Dynamic Time Warping in Linear Time and Space. Intell. Data Anal., 11(5):561–580, oct 2007. ISSN 1088-467X. URL http://dl.acm.org/citation.cfm?id=1367985.1367993.

[15] Paolo Tormene, Toni Giorgino, Silvana Quaglini, and Mario Stefanelli. Matching incomplete time series with dynamic time warping: an algorithm and an applica- tion to post-stroke rehabilitation. Artificial Intelligence in Medicine, 45(1):11–34, 2009. ISSN 0933-3657. doi: https://doi.org/10.1016/j.artmed.2008.11.007. URL http://www.sciencedirect.com/science/article/pii/S0933365708001772.

[16] Senin, P., 2008. Dynamic time warping algorithm review. Information and Computer Science Department University of Hawaii at Manoa Honolulu, USA, 855, pp.1-23. http://seninp.github.io/assets/pubs/senin_dtw_litreview_2008.pdf

Contributions welcome!

This is by no means a complete list of all possible similarity measures. For instance the SciPy Hausdorff distance is an alternative similarity measure useful if you don't know the beginning and ending of each curve. There are many more possible functions out there. Feel free to send PRs for other functions in literature!

Requirements for adding new method to this library:

  • all methods should be able to quantify the difference between two curves
  • method must support the case where each curve may have a different number of data points
  • follow the style of existing functions
  • reference to method details, or descriptive docstring of the method
  • include test(s) for your new method
  • minimum Python dependencies (try to stick to SciPy/numpy functions if possible)

Please cite

If you've found this information or library helpful please cite the following paper. You should also cite the papers of any methods that you have used.

Jekel, C. F., Venter, G., Venter, M. P., Stander, N., & Haftka, R. T. (2018). Similarity measures for identifying material parameters from hysteresis loops using inverse analysis. International Journal of Material Forming. https://doi.org/10.1007/s12289-018-1421-8

@article{Jekel2019,
author = {Jekel, Charles F and Venter, Gerhard and Venter, Martin P and Stander, Nielen and Haftka, Raphael T},
doi = {10.1007/s12289-018-1421-8},
issn = {1960-6214},
journal = {International Journal of Material Forming},
month = {may},
title = {{Similarity measures for identifying material parameters from hysteresis loops using inverse analysis}},
url = {https://doi.org/10.1007/s12289-018-1421-8},
year = {2019}
}
Comments
  • frechet_dist input size is bounded by maximum recursion depth

    frechet_dist input size is bounded by maximum recursion depth

    Consider the followings:

    max_len = 1000
    a = [[1,2,3] for i in range(max_len)]
    b = [[1,6,3] for i in range(max_len)]
    frechet_dist(a,b)
    
    

    While running this code on a 32GB RAM machine it raises a stack-overflow error. I would suggest to switch the recursion based computations to iterative based computations using Queue's.

    Is anyone currently working on optimizing the memory usage of frechet_dist ?

    Thank you for your work, Arbel Amir

    opened by ArbelAmir 8
  • discrete Frechet distance between lists or 1D arrays

    discrete Frechet distance between lists or 1D arrays

    my question might sounds a little dumb. but is it possible to use similaritymeasures.frechet_dist() for lists or 1D arrays? i tried to calculate similarity between a list and other multiple list (which also contains the first list )but the most similar output was not the first argument.which i expect to return it since they are exactly the same.but it works when implemented in real coordinates with lat ,lon like trajectories and the most similar output is the first given argument.i'm trying to use factors other than distance for calculating similarity between two thing and those parametrs are just a numerical values and i'm wondering how can i use this frechet _dist for list arrays.

    opened by miladad8 6
  • Regarding code update in

    Regarding code update in "is_simple_quad" function on Aug 18,2019

    Dear Authors,

    Thanks for your contribution in the form of "simialritymeasures" library for quantifying the difference between the curves. I have been using it for finding the area between the curves. But, since your update in the code to check if the quadrilateral is simple or not [ in "is_simple_quad" function on Aug 18,2019], the output for area between the curves is not correct. (However, if I use the previous code the area returned is correct). Specifically, the "if condition" which checks the number of cross products with same sign, should be: sum(crossTF) > 2 instead of sum(crossTF) == 2

    The same can be checked from the following code which tries to find the area between two simple curves. Running the following prints : area1 : 0.0

    while using the previous code give correct area (4 in this case)

    import matplotlib.pyplot as plt
    import similaritymeasures
    
    xaxis=[0,1, 2, 3, 4]
    curve1=[0,0,0,0,0]
    curve2=[1,1,1,1,1]
    exp_data = np.zeros((len(xaxis), 2))
    num_data = np.zeros((len(xaxis), 2))
    
    exp_data[:, 0] = xaxis
    exp_data[:, 1] = curve1
    num_data[:, 0] = xaxis
    num_data[:, 1] = curve2
    
    plt.figure()
    plt.scatter(xaxis, curve1)
    plt.scatter(xaxis, curve2)
    plt.show()
            
    area1=similaritymeasures.area_between_two_curves(exp_data, num_data)
    print("area1 : "+str(area1) )```
    opened by aanchalMongia 5
  • Problem during pip install:

    Problem during pip install: "UnicodeDecodeError: 'gbk' codec can't decode byte"

    pip install similaritymeasures gives

    (base) D:\repositories\joinTracks>pip install similaritymeasures
    Collecting similaritymeasures
      Using cached similaritymeasures-0.4.3.tar.gz (397 kB)
        ERROR: Command errored out with exit status 1:
         command: 'C:\Users\s00557672\Anaconda3\python.exe' -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'C:\\Users\\S00557~1\\AppData\\Local\\Temp\\pip-install-d5tndazp\\similaritymeasures\\setup.py'"'"';
    __file__='"'"'C:\\Users\\S00557~1\\AppData\\Local\\Temp\\pip-install-d5tndazp\\similaritymeasures\\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'
    "');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base 'C:\Users\S00557~1\AppData\Local\Temp\pip-install-d5tndazp\similaritymeasures\pip-egg-info'
             cwd: C:\Users\S00557~1\AppData\Local\Temp\pip-install-d5tndazp\similaritymeasures\
        Complete output (5 lines):
        Traceback (most recent call last):
          File "<string>", line 1, in <module>
          File "C:\Users\S00557~1\AppData\Local\Temp\pip-install-d5tndazp\similaritymeasures\setup.py", line 12, in <module>
            long_description=open('README.md').read(),
        UnicodeDecodeError: 'gbk' codec can't decode byte 0x93 in position 5204: illegal multibyte sequence
        ----------------------------------------
    ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.
    
    

    How can I fix it?

    opened by sergorl 4
  • Added MAE and MSE

    Added MAE and MSE

    Added extra functions to find the Mean Absolute Distance (MAE) and the Mean Squared Distance (MSE) between the two curves. It works with all the distance measures in scipy.spatial.distance.cdist.

    opened by HarshRaoD 2
  • Similarity between two curves which have different number of data points

    Similarity between two curves which have different number of data points

    Hello, I would like to know how to compute the similarity of two curves which own various number of data points? Such like that you referred to on the main page: Also, which methods support this? And what is the meaning of the results? Looking for your reply.

    opened by xiaobrnbrn 2
  • pcm may be wrong

    pcm may be wrong

    A user has pointed out that i have potentially an incorrect pcm implementation because I divide the distances by a max value.

    It is possible that it is a mistake on my part, where I was trying to combine code for the curve_length and pcm methods. It is also possible that I thought xmax and ymax would always be one, so it wouldn't matter. The curve_length method needs the max values because there is no other normalization.

    The line in question: https://github.com/cjekel/similarity_measures/blob/master/similaritymeasures/similaritymeasures.py#L352

    I'm pretty sure that line is correct for the curve_length_measure method.

    bug help wanted 
    opened by cjekel 0
  • Improv perf

    Improv perf

    Good afternoon, I hope this message finds you well, and I compliment and thank you for the code.

    I worked with frechet and dtw, and I found that computational performances of the frechet function were subpar since they didn't use the cdist function from scipy (which i found by far more performing than the minkowski_distance one).

    I took the freedom to change the code and propose you a pull request (i also modified tests code in order for it not to import the already installed package).

    On my day to day job I also use cython to improve performances of python programs, and I was thinking that maybe it could benefit some of the loops in the code.

    Best of wishes for whatever!

    Nuc

    opened by nucccc 3
  • Similarity between two curves using PyTorch

    Similarity between two curves using PyTorch

    Hi guys,

    I want to implement in my trainer a measure of similarity between my predicted trajectory and the GT trajectory. Here is an example:

    imagen

    The GT is the red line, my observation is the yellow line (almost hidden by the other participants) and the green line is my prediction. The other agents are not used at this moment.

    Now, in order to train my DL based Motion Prediction algorithm I am using the ADE, FDE and NLL losses w.r.t. the GT. Nevertheless, I think that if my prediction does not match exactly the GT but it is in the same centerline (but driving with a different velocity, for example) it will be better. E.g.

    imagen

    This prediction does not match the GT (until the red diamond at the bottom), but at least the shapes of both curves are more or less the same.

    How could I do that?

    opened by Cram3r95 20
  • Incorrect measurement of area between intersecting 2D curves

    Incorrect measurement of area between intersecting 2D curves

    It appears that either my understanding of area between the curves or its calculation in the library is incorrect (the referenced paper is paywalled). In the following example, I have two plots, where grey line is original data, and there are two different blue splines. Visually, you can clearly see that the area between two lines on the left plot is several times larger than the area on the right plot, but the calculation with similaritymeasures.area_between_two_curves shows only a 2x difference. image image

    Here is the GitHub gist, where I present the calculation (the GDrive zip with airfoils is public, so the whole thing can be ran in Colab or elsewhere if you modify the path in the 2nd cell): https://gist.github.com/rafalszulejko/2c9ff645b448d60d857975a8f7965045#file-wing-optimization2-ipynb

    opened by rafalszulejko 11
  • Faster DTW

    Faster DTW

    Hello,

    Thanks for a really nice repo with an easy-to-use API for quickly generating some metrics on curve similarities. I just thought I would let you know that there is a much faster DTW implementation than the one you are using in this repo which if it covers your needs you should consider replacing with the current implementation:

    Link to faster DTW implementation

    Carry on the great work! :)

    opened by vancromy 2
  • Add other interpolation methods to the area between curves method

    Add other interpolation methods to the area between curves method

    Right now the area between curves method uses bisection of largest gap to add artificial data points. This method was used to minimize the number of artificial quads/points. However, this can have some negative effects in some cases, specifically when the sampling rate is artificial and does not match (e.g. one curve is just a straight line with few points).

    A potential alternative it to use the arc length projection of one curve's points onto the other. This would preserve the sampling rate, and may make for more uniform quads. This is similar to what's done in PCM method.

    When another interpolation method is added, give users the choice of which interpolation method to use. Changing the interpolation method is anticipated to change the results.

    opened by cjekel 0
Releases(0.6.0)
  • 0.6.0(Oct 8, 2022)

    • similaritymeasures.pcm now produces different values! This was done to better follow the original algorithm. To get the same results from previous versions, set norm_seg_length=True. What this option does is scale each segment length by the maximum values of the curve (borrowed from the curve_length_measure). This scaling should not be needed with the PCM method because both curves are always scaled initially.
    • Fix docstring documentation for returns in similaritymeasures.dtw and similaritymeasures.curve_length_measure
    Source code(tar.gz)
    Source code(zip)
  • v0.5.0(Aug 6, 2022)

Owner
Charles Jekel
Charles Jekel
This repository contains datasets and baselines for benchmarking Chinese text recognition.

Benchmarking-Chinese-Text-Recognition This repository contains datasets and baselines for benchmarking Chinese text recognition. Please see the corres

FudanVI Lab 254 Dec 30, 2022
This git repo contains the implementation of my ML project on Heart Disease Prediction

Introduction This git repo contains the implementation of my ML project on Heart Disease Prediction. This is a real-world machine learning model/proje

Aryan Dutta 1 Feb 02, 2022
[ICLR 2021] Rank the Episodes: A Simple Approach for Exploration in Procedurally-Generated Environments.

[ICLR 2021] RAPID: A Simple Approach for Exploration in Reinforcement Learning This is the Tensorflow implementation of ICLR 2021 paper Rank the Episo

Daochen Zha 48 Nov 21, 2022
ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators

ELECTRA Introduction ELECTRA is a method for self-supervised language representation learning. It can be used to pre-train transformer networks using

Google Research 2.1k Dec 28, 2022
SmartSim Infrastructure Library.

Home Install Documentation Slack Invite Cray Labs SmartSim SmartSim makes it easier to use common Machine Learning (ML) libraries like PyTorch and Ten

Cray Labs 139 Jan 01, 2023
Implementation of CaiT models in TensorFlow and ImageNet-1k checkpoints. Includes code for inference and fine-tuning.

CaiT-TF (Going deeper with Image Transformers) This repository provides TensorFlow / Keras implementations of different CaiT [1] variants from Touvron

Sayak Paul 9 Jun 26, 2022
Solution to the Weather4cast 2021 challenge

This code was used for the entry by the team "antfugue" for the Weather4cast 2021 Challenge. Below, you can find the instructions for generating predi

Jussi Leinonen 13 Jan 03, 2023
FCOSR: A Simple Anchor-free Rotated Detector for Aerial Object Detection

FCOSR: A Simple Anchor-free Rotated Detector for Aerial Object Detection FCOSR: A Simple Anchor-free Rotated Detector for Aerial Object Detection arXi

59 Nov 29, 2022
This repo tries to recognize faces in the dataset you created

YÜZ TANIMA SİSTEMİ Bu repo oluşturacağınız yüz verisetlerini tanımaya çalışan ma

Mehdi KOŞACA 2 Dec 30, 2021
Memory Defense: More Robust Classificationvia a Memory-Masking Autoencoder

Memory Defense: More Robust Classificationvia a Memory-Masking Autoencoder Authors: - Eashan Adhikarla - Dan Luo - Dr. Brian D. Davison Abstract Many

Eashan Adhikarla 4 Dec 25, 2022
EncT5: Fine-tuning T5 Encoder for Non-autoregressive Tasks

EncT5 (Unofficial) Pytorch Implementation of EncT5: Fine-tuning T5 Encoder for Non-autoregressive Tasks About Finetune T5 model for classification & r

Jangwon Park 34 Jan 01, 2023
Code to go with the paper "Decentralized Bayesian Learning with Metropolis-Adjusted Hamiltonian Monte Carlo"

dblmahmc Code to go with the paper "Decentralized Bayesian Learning with Metropolis-Adjusted Hamiltonian Monte Carlo" Requirements: https://github.com

1 Dec 17, 2021
NL-Augmenter 🦎 → 🐍 A Collaborative Repository of Natural Language Transformations

NL-Augmenter 🦎 → 🐍 The NL-Augmenter is a collaborative effort intended to add transformations of datasets dealing with natural language. Transformat

684 Jan 09, 2023
Yet Another Robotics and Reinforcement (YARR) learning framework for PyTorch.

Yet Another Robotics and Reinforcement (YARR) learning framework for PyTorch.

Stephen James 51 Dec 27, 2022
Unofficial pytorch implementation of 'Image Inpainting for Irregular Holes Using Partial Convolutions'

pytorch-inpainting-with-partial-conv Official implementation is released by the authors. Note that this is an ongoing re-implementation and I cannot f

Naoto Inoue 525 Jan 01, 2023
Python Interview Questions

Python Interview Questions Clone the code to your computer. You need to understand the code in main.py and modify the content in if __name__ =='__main

ClassmateLin 575 Dec 28, 2022
GNEE - GAT Neural Event Embeddings

GNEE - GAT Neural Event Embeddings This repository contains source code for the GNEE (GAT Neural Event Embeddings) method introduced in the paper: "Se

João Pedro Rodrigues Mattos 0 Sep 15, 2021
Machine Learning Platform for Kubernetes

Reproduce, Automate, Scale your data science. Welcome to Polyaxon, a platform for building, training, and monitoring large scale deep learning applica

polyaxon 3.2k Dec 23, 2022
Constrained Logistic Regression - How to apply specific constraints to logistic regression's coefficients

Constrained Logistic Regression Sample implementation of constructing a logistic regression with given ranges on each of the feature's coefficients (v

1 Dec 29, 2021
An attempt at the implementation of GLOM, Geoffrey Hinton's paper for emergent part-whole hierarchies from data

GLOM TensorFlow This Python package attempts to implement GLOM in TensorFlow, which allows advances made by several different groups transformers, neu

Rishit Dagli 32 Feb 21, 2022