Quantify the difference between two arbitrary curves in space

Last update: Jan 08, 2023

Overview

similaritymeasures

Quantify the difference between two arbitrary curves

Curves in this case are:

discretized by inidviudal data points
ordered from a beginning to an ending

Consider the following two curves. We want to quantify how different the Numerical curve is from the Experimental curve. Notice how there are no concurrent Stress or Strain values in the two curves. Additionally one curve has more data points than the other curves.

In the ideal case the Numerical curve would match the Experimental curve exactly. This means that the two curves would appear directly on top of each other. Our measures of similarity would return a zero distance between two curves that were on top of each other.

Methods covered

This library includes the following methods to quantify the difference (or similarity) between two curves:

Partial Curve Mapping^x (PCM) method: Matches the area of a subset between the two curves [1]
Area method^x: An algorithm for calculating the Area between two curves in 2D space [2]
Discrete Frechet distance^y: The shortest distance in-between two curves, where you are allowed to very the speed at which you travel along each curve independently (walking dog problem) [3, 4, 5, 6, 7, 8]
Curve Length^x method: Assumes that the only true independent variable of the curves is the arc-length distance along the curve from the origin [9, 10]
Dynamic Time Warping^y (DTW): A non-metric distance between two time-series curves that has been proven useful for a variety of applications [11, 12, 13, 14, 15, 16]

^x denotes methods created specifically for material parameter identification

^y denotes that the method implemented in this library supports N-D data!

Installation

Install with pip

[sudo] pip install similaritymeasures

or clone and install from source.

git clone https://github.com/cjekel/similarity_measures
[sudo] pip install ./similarity_measures

Example usage

This shows you how to compute the various similarity measures

import numpy as np
import similaritymeasures
import matplotlib.pyplot as plt

# Generate random experimental data
x = np.random.random(100)
y = np.random.random(100)
exp_data = np.zeros((100, 2))
exp_data[:, 0] = x
exp_data[:, 1] = y

# Generate random numerical data
x = np.random.random(100)
y = np.random.random(100)
num_data = np.zeros((100, 2))
num_data[:, 0] = x
num_data[:, 1] = y

# quantify the difference between the two curves using PCM
pcm = similaritymeasures.pcm(exp_data, num_data)

# quantify the difference between the two curves using
# Discrete Frechet distance
df = similaritymeasures.frechet_dist(exp_data, num_data)

# quantify the difference between the two curves using
# area between two curves
area = similaritymeasures.area_between_two_curves(exp_data, num_data)

# quantify the difference between the two curves using
# Curve Length based similarity measure
cl = similaritymeasures.curve_length_measure(exp_data, num_data)

# quantify the difference between the two curves using
# Dynamic Time Warping distance
dtw, d = similaritymeasures.dtw(exp_data, num_data)

# print the results
print(pcm, df, area, cl, dtw)

# plot the data
plt.figure()
plt.plot(exp_data[:, 0], exp_data[:, 1])
plt.plot(num_data[:, 0], num_data[:, 1])
plt.show()

If you are interested in setting up an optimization problem using these measures, check out this Jupyter Notebook which replicates Section 3.2 from [2].

Changelog

Version 0.3.0: Frechet distance now supports N-D data! See CHANGELOG.md for full details.

Documenation

Each function includes a descriptive docstring, which you can view online here.

References

[1] Katharina Witowski and Nielen Stander. Parameter Identification of Hysteretic Models Using Partial Curve Mapping. 12th AIAA Aviation Technology, Integration, and Op- erations (ATIO) Conference and 14th AIAA/ISSMO Multidisciplinary Analysis and Optimization Conference, sep 2012. doi: doi:10.2514/6.2012-5580.

[2] Jekel, C. F., Venter, G., Venter, M. P., Stander, N., & Haftka, R. T. (2018). Similarity measures for identifying material parameters from hysteresis loops using inverse analysis. International Journal of Material Forming. https://doi.org/10.1007/s12289-018-1421-8

[3] M Maurice Frechet. Sur quelques points du calcul fonctionnel. Rendiconti del Circol Matematico di Palermo (1884-1940), 22(1):1–72, 1906.

[4] Thomas Eiter and Heikki Mannila. Computing discrete Frechet distance. Technical report, 1994.

[5] Anne Driemel, Sariel Har-Peled, and Carola Wenk. Approximating the Frechet Distance for Realistic Curves in Near Linear Time. Discrete & Computational Geometry, 48(1): 94–127, 2012. ISSN 1432-0444. doi: 10.1007/s00454-012-9402-z. URL http://dx.doi.org/10.1007/s00454-012-9402-z.

[6] K Bringmann. Why Walking the Dog Takes Time: Frechet Distance Has No Strongly Subquadratic Algorithms Unless SETH Fails, 2014.

[7] Sean L Seyler, Avishek Kumar, M F Thorpe, and Oliver Beckstein. Path Similarity Analysis: A Method for Quantifying Macromolecular Pathways. PLOS Computational Biology, 11(10):1–37, 2015. doi: 10.1371/journal.pcbi.1004568. URL https://doi.org/10.1371/journal.pcbi.1004568.

[8] Helmut Alt and Michael Godau. Computing the Frechet Distance Between Two Polyg- onal Curves. International Journal of Computational Geometry & Applications, 05 (01n02):75–91, 1995. doi: 10.1142/S0218195995000064.

[9] A Andrade-Campos, R De-Carvalho, and R A F Valente. Novel criteria for determina- tion of material model parameters. International Journal of Mechanical Sciences, 54 (1):294–305, 2012. ISSN 0020-7403. doi: https://doi.org/10.1016/j.ijmecsci.2011.11.010. URL http://www.sciencedirect.com/science/article/pii/S0020740311002451.

[10] J Cao and J Lin. A study on formulation of objective functions for determin- ing material models. International Journal of Mechanical Sciences, 50(2):193–204, 2008. ISSN 0020-7403. doi: https://doi.org/10.1016/j.ijmecsci.2007.07.003. URL http://www.sciencedirect.com/science/article/pii/S0020740307001178.

[11] Donald J Berndt and James Clifford. Using Dynamic Time Warping to Find Pat- terns in Time Series. In Proceedings of the 3rd International Conference on Knowledge Discovery and Data Mining, AAAIWS’94, pages 359–370. AAAI Press, 1994. URL http://dl.acm.org/citation.cfm?id=3000850.3000887.

[12] François Petitjean, Alain Ketterlin, and Pierre Gançarski. A global averaging method for dynamic time warping, with applications to clustering. Pattern Recognition, 44 (3):678–693, 2011. ISSN 0031-3203. doi: https://doi.org/10.1016/j.patcog.2010.09.013. URL http://www.sciencedirect.com/science/article/pii/S003132031000453X.

[13] Toni Giorgino. Computing and Visualizing Dynamic Time Warping Alignments in R: The dtw Package. Journal of Statistical Software; Vol 1, Issue 7 (2009), aug 2009. URL http://dx.doi.org/10.18637/jss.v031.i07.

[14] Stan Salvador and Philip Chan. Toward Accurate Dynamic Time Warping in Linear Time and Space. Intell. Data Anal., 11(5):561–580, oct 2007. ISSN 1088-467X. URL http://dl.acm.org/citation.cfm?id=1367985.1367993.

[15] Paolo Tormene, Toni Giorgino, Silvana Quaglini, and Mario Stefanelli. Matching incomplete time series with dynamic time warping: an algorithm and an applica- tion to post-stroke rehabilitation. Artificial Intelligence in Medicine, 45(1):11–34, 2009. ISSN 0933-3657. doi: https://doi.org/10.1016/j.artmed.2008.11.007. URL http://www.sciencedirect.com/science/article/pii/S0933365708001772.

[16] Senin, P., 2008. Dynamic time warping algorithm review. Information and Computer Science Department University of Hawaii at Manoa Honolulu, USA, 855, pp.1-23. http://seninp.github.io/assets/pubs/senin_dtw_litreview_2008.pdf

Contributions welcome!

This is by no means a complete list of all possible similarity measures. For instance the SciPy Hausdorff distance is an alternative similarity measure useful if you don't know the beginning and ending of each curve. There are many more possible functions out there. Feel free to send PRs for other functions in literature!

Requirements for adding new method to this library:

all methods should be able to quantify the difference between two curves
method must support the case where each curve may have a different number of data points
follow the style of existing functions
reference to method details, or descriptive docstring of the method
include test(s) for your new method
minimum Python dependencies (try to stick to SciPy/numpy functions if possible)

Please cite

If you've found this information or library helpful please cite the following paper. You should also cite the papers of any methods that you have used.

Jekel, C. F., Venter, G., Venter, M. P., Stander, N., & Haftka, R. T. (2018). Similarity measures for identifying material parameters from hysteresis loops using inverse analysis. International Journal of Material Forming. https://doi.org/10.1007/s12289-018-1421-8

@article{Jekel2019,
author = {Jekel, Charles F and Venter, Gerhard and Venter, Martin P and Stander, Nielen and Haftka, Raphael T},
doi = {10.1007/s12289-018-1421-8},
issn = {1960-6214},
journal = {International Journal of Material Forming},
month = {may},
title = {{Similarity measures for identifying material parameters from hysteresis loops using inverse analysis}},
url = {https://doi.org/10.1007/s12289-018-1421-8},
year = {2019}
}

Comments

frechet_dist input size is bounded by maximum recursion depth
Consider the followings:

max_len = 1000 a = [[1,2,3] for i in range(max_len)] b = [[1,6,3] for i in range(max_len)] frechet_dist(a,b)

While running this code on a 32GB RAM machine it raises a stack-overflow error. I would suggest to switch the recursion based computations to iterative based computations using Queue's.

Is anyone currently working on optimizing the memory usage of frechet_dist ?

Thank you for your work, Arbel Amir
opened by ArbelAmir 8
discrete Frechet distance between lists or 1D arrays

my question might sounds a little dumb. but is it possible to use similaritymeasures.frechet_dist() for lists or 1D arrays? i tried to calculate similarity between a list and other multiple list (which also contains the first list )but the most similar output was not the first argument.which i expect to return it since they are exactly the same.but it works when implemented in real coordinates with lat ,lon like trajectories and the most similar output is the first given argument.i'm trying to use factors other than distance for calculating similarity between two thing and those parametrs are just a numerical values and i'm wondering how can i use this frechet _dist for list arrays.

opened by miladad8 6
Regarding code update in "is_simple_quad" function on Aug 18,2019
Dear Authors,

Thanks for your contribution in the form of "simialritymeasures" library for quantifying the difference between the curves. I have been using it for finding the area between the curves. But, since your update in the code to check if the quadrilateral is simple or not [ in "is_simple_quad" function on Aug 18,2019], the output for area between the curves is not correct. (However, if I use the previous code the area returned is correct). Specifically, the "if condition" which checks the number of cross products with same sign, should be: sum(crossTF) > 2 instead of sum(crossTF) == 2

The same can be checked from the following code which tries to find the area between two simple curves. Running the following prints : area1 : 0.0

while using the previous code give correct area (4 in this case)

import matplotlib.pyplot as plt import similaritymeasures xaxis=[0,1, 2, 3, 4] curve1=[0,0,0,0,0] curve2=[1,1,1,1,1] exp_data = np.zeros((len(xaxis), 2)) num_data = np.zeros((len(xaxis), 2)) exp_data[:, 0] = xaxis exp_data[:, 1] = curve1 num_data[:, 0] = xaxis num_data[:, 1] = curve2 plt.figure() plt.scatter(xaxis, curve1) plt.scatter(xaxis, curve2) plt.show() area1=similaritymeasures.area_between_two_curves(exp_data, num_data) print("area1 : "+str(area1) )```
opened by aanchalMongia 5

Problem during pip install: "UnicodeDecodeError: 'gbk' codec can't decode byte"

pip install similaritymeasures gives

(base) D:\repositories\joinTracks>pip install similaritymeasures
Collecting similaritymeasures
  Using cached similaritymeasures-0.4.3.tar.gz (397 kB)
    ERROR: Command errored out with exit status 1:
     command: 'C:\Users\s00557672\Anaconda3\python.exe' -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'C:\\Users\\S00557~1\\AppData\\Local\\Temp\\pip-install-d5tndazp\\similaritymeasures\\setup.py'"'"';
__file__='"'"'C:\\Users\\S00557~1\\AppData\\Local\\Temp\\pip-install-d5tndazp\\similaritymeasures\\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'
"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base 'C:\Users\S00557~1\AppData\Local\Temp\pip-install-d5tndazp\similaritymeasures\pip-egg-info'
         cwd: C:\Users\S00557~1\AppData\Local\Temp\pip-install-d5tndazp\similaritymeasures\
    Complete output (5 lines):
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "C:\Users\S00557~1\AppData\Local\Temp\pip-install-d5tndazp\similaritymeasures\setup.py", line 12, in <module>
        long_description=open('README.md').read(),
    UnicodeDecodeError: 'gbk' codec can't decode byte 0x93 in position 5204: illegal multibyte sequence
    ----------------------------------------
ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.

How can I fix it?

opened by sergorl 4

Added MAE and MSE

Added extra functions to find the Mean Absolute Distance (MAE) and the Mean Squared Distance (MSE) between the two curves. It works with all the distance measures in scipy.spatial.distance.cdist.

opened by HarshRaoD 2
Similarity between two curves which have different number of data points

Hello, I would like to know how to compute the similarity of two curves which own various number of data points? Such like that you referred to on the main page: Also, which methods support this? And what is the meaning of the results? Looking for your reply.

opened by xiaobrnbrn 2
pcm may be wrong

A user has pointed out that i have potentially an incorrect pcm implementation because I divide the distances by a max value.

It is possible that it is a mistake on my part, where I was trying to combine code for the curve_length and pcm methods. It is also possible that I thought xmax and ymax would always be one, so it wouldn't matter. The curve_length method needs the max values because there is no other normalization.

The line in question: https://github.com/cjekel/similarity_measures/blob/master/similaritymeasures/similaritymeasures.py#L352

I'm pretty sure that line is correct for the curve_length_measure method.
bug help wanted

opened by cjekel 0
Improv perf

Good afternoon, I hope this message finds you well, and I compliment and thank you for the code.

I worked with frechet and dtw, and I found that computational performances of the frechet function were subpar since they didn't use the cdist function from scipy (which i found by far more performing than the minkowski_distance one).

I took the freedom to change the code and propose you a pull request (i also modified tests code in order for it not to import the already installed package).

On my day to day job I also use cython to improve performances of python programs, and I was thinking that maybe it could benefit some of the loops in the code.

Best of wishes for whatever!

Nuc

opened by nucccc 3
Similarity between two curves using PyTorch

Hi guys,

I want to implement in my trainer a measure of similarity between my predicted trajectory and the GT trajectory. Here is an example:

The GT is the red line, my observation is the yellow line (almost hidden by the other participants) and the green line is my prediction. The other agents are not used at this moment.

Now, in order to train my DL based Motion Prediction algorithm I am using the ADE, FDE and NLL losses w.r.t. the GT. Nevertheless, I think that if my prediction does not match exactly the GT but it is in the same centerline (but driving with a different velocity, for example) it will be better. E.g.

This prediction does not match the GT (until the red diamond at the bottom), but at least the shapes of both curves are more or less the same.

How could I do that?

opened by Cram3r95 20
Incorrect measurement of area between intersecting 2D curves

It appears that either my understanding of area between the curves or its calculation in the library is incorrect (the referenced paper is paywalled). In the following example, I have two plots, where grey line is original data, and there are two different blue splines. Visually, you can clearly see that the area between two lines on the left plot is several times larger than the area on the right plot, but the calculation with similaritymeasures.area_between_two_curves shows only a 2x difference.

Here is the GitHub gist, where I present the calculation (the GDrive zip with airfoils is public, so the whole thing can be ran in Colab or elsewhere if you modify the path in the 2nd cell): https://gist.github.com/rafalszulejko/2c9ff645b448d60d857975a8f7965045#file-wing-optimization2-ipynb

opened by rafalszulejko 11
Faster DTW

Hello,

Thanks for a really nice repo with an easy-to-use API for quickly generating some metrics on curve similarities. I just thought I would let you know that there is a much faster DTW implementation than the one you are using in this repo which if it covers your needs you should consider replacing with the current implementation:

Link to faster DTW implementation

Carry on the great work! :)

opened by vancromy 2
Add other interpolation methods to the area between curves method

Right now the area between curves method uses bisection of largest gap to add artificial data points. This method was used to minimize the number of artificial quads/points. However, this can have some negative effects in some cases, specifically when the sampling rate is artificial and does not match (e.g. one curve is just a straight line with few points).

A potential alternative it to use the arc length projection of one curve's points onto the other. This would preserve the sampling rate, and may make for more uniform quads. This is similar to what's done in PCM method.

When another interpolation method is added, give users the choice of which interpolation method to use. Changing the interpolation method is anticipated to change the results.

opened by cjekel 0

Releases(0.6.0)

0.6.0(Oct 8, 2022)
similaritymeasures.pcm now produces different values! This was done to better follow the original algorithm. To get the same results from previous versions, set norm_seg_length=True. What this option does is scale each segment length by the maximum values of the curve (borrowed from the curve_length_measure). This scaling should not be needed with the PCM method because both curves are always scaled initially.

Fix docstring documentation for returns in similaritymeasures.dtw and similaritymeasures.curve_length_measure

Source code(tar.gz)
Source code(zip)
v0.5.0(Aug 6, 2022)

Adds mean squared error (mse) and mean absolute error (mae) methods thanks to @HarshRaoD
Source code(tar.gz)
Source code(zip)

Owner

Charles Jekel

GitHub Repository

Sematic-Segmantation - Semantic Segmentation on MIT ADE20K dataset in PyTorch

Semantic Segmentation on MIT ADE20K dataset in PyTorch This is a PyTorch impleme

4 Mar 22, 2022

PyTorch implementation of "VRT: A Video Restoration Transformer"

VRT: A Video Restoration Transformer Jingyun Liang, Jiezhang Cao, Yuchen Fan, Kai Zhang, Rakesh Ranjan, Yawei Li, Radu Timofte, Luc Van Gool Computer

837 Jan 09, 2023

A Next Generation ConvNet by FaceBookResearch Implementation in PyTorch(Original) and TensorFlow.

ConvNeXt A Next Generation ConvNet by FaceBookResearch Implementation in PyTorch(Original) and TensorFlow. A FacebookResearch Implementation on A Conv

2 Feb 14, 2022

Offcial implementation of "A Hybrid Video Anomaly Detection Framework via Memory-Augmented Flow Reconstruction and Flow-Guided Frame Prediction, ICCV-2021".

HF2-VAD Offcial implementation of "A Hybrid Video Anomaly Detection Framework via Memory-Augmented Flow Reconstruction and Flow-Guided Frame Predictio

76 Dec 21, 2022

PyTorch implementation for Partially View-aligned Representation Learning with Noise-robust Contrastive Loss (CVPR 2021)

2021-CVPR-MvCLN This repo contains the code and data of the following paper accepted by CVPR 2021 Partially View-aligned Representation Learning with

33 Nov 01, 2022

Annotated, understandable, and visually interpretable PyTorch implementations of: VAE, BIRVAE, NSGAN, MMGAN, WGAN, WGANGP, LSGAN, DRAGAN, BEGAN, RaGAN, InfoGAN, fGAN, FisherGAN

Overview PyTorch 0.4.1 | Python 3.6.5 Annotated implementations with comparative introductions for minimax, non-saturating, wasserstein, wasserstein g

471 Dec 16, 2022

NEO: Non Equilibrium Sampling on the orbit of a deterministic transform

NEO: Non Equilibrium Sampling on the orbit of a deterministic transform Description of the code This repo describes the NEO estimator described in the

0 Dec 01, 2021

Labels4Free: Unsupervised Segmentation using StyleGAN

Labels4Free: Unsupervised Segmentation using StyleGAN ICCV 2021 Figure: Some segmentation masks predicted by Labels4Free Framework on real and synthet

70 Dec 23, 2022

Code for IntraQ, PyTorch implementation of our paper under review

IntraQ: Learning Synthetic Images with Intra-Class Heterogeneity for Zero-Shot Network Quantization paper Requirements Python = 3.7.10 Pytorch == 1.7

1 Nov 19, 2021

An implementation for the ICCV 2021 paper Deep Permutation Equivariant Structure from Motion.

Deep Permutation Equivariant Structure from Motion Paper | Poster This repository contains an implementation for the ICCV 2021 paper Deep Permutation

72 Dec 27, 2022

Colossal-AI: A Unified Deep Learning System for Large-Scale Parallel Training

ColossalAI An integrated large-scale model training system with efficient parallelization techniques. arXiv: Colossal-AI: A Unified Deep Learning Syst

7.9k Jan 08, 2023

Just playing with getting CLIP Guided Diffusion running locally, rather than having to use colab.

CLIP-Guided-Diffusion Just playing with getting CLIP Guided Diffusion running locally, rather than having to use colab. Original colab notebooks by Ka

336 Dec 09, 2022

MonoScene: Monocular 3D Semantic Scene Completion

MonoScene: Monocular 3D Semantic Scene Completion MonoScene: Monocular 3D Semantic Scene Completion] [arXiv + supp] | [Project page] Anh-Quan Cao, Rao

298 Jan 08, 2023

Repo for "TableParser: Automatic Table Parsing with Weak Supervision from Spreadsheets" at [email protected]

TableParser Repo for "TableParser: Automatic Table Parsing with Weak Supervision from Spreadsheets" at DS3 Lab 11 Dec 13, 2022

A repository for storing njxzc final exam review material

文档地址，请戳我 👈 👈 👈 ☀️ 1.Reason 大三上期末复习软件工程的时候，发现其他高校在GitHub上开源了他们学校的期末试题，我很受触动。期末

2 Jan 18, 2022

Easy genetic ancestry predictions in Python

ezancestry Easily visualize your direct-to-consumer genetics next to 2500+ samples from the 1000 genomes project. Evaluate the performance of a custom

38 Jan 02, 2023

Raster Vision is an open source Python framework for building computer vision models on satellite, aerial, and other large imagery sets

Raster Vision is an open source Python framework for building computer vision models on satellite, aerial, and other large imagery sets (including obl

1.7k Dec 22, 2022