Extensible, parallel implementations of t-SNE

Overview

openTSNE


openTSNE is a modular Python implementation of t-Distributed Stochastic Neighbor Embedding (t-SNE) [1], a popular dimensionality-reduction algorithm for visualizing high-dimensional data sets. openTSNE incorporates the latest improvements to the t-SNE algorithm, including the ability to add new data points to existing embeddings [2], massive speed improvements [3] [4] that enable t-SNE to scale to millions of data points, and various tricks to improve the global alignment of the resulting visualizations [5].

[Figure: Macosko 2015 mouse retina t-SNE embedding]

A visualization of 44,808 single-cell transcriptomes obtained from the mouse retina [6], embedded using the multiscale kernel trick to better preserve the global alignment of the clusters.

Installation

openTSNE requires Python 3.6 or higher.

Conda

openTSNE can be easily installed from conda-forge with

conda install --channel conda-forge opentsne


PyPI

openTSNE is also available through pip and can be installed with

pip install opentsne


Installing from source

If you wish to install openTSNE from source, please run

python setup.py install

in the root directory to install the appropriate dependencies and compile the necessary binary files.

Please note that openTSNE requires a C/C++ compiler to be available on the system. Additionally, numpy must be pre-installed in the active environment.

In order for openTSNE to utilize multiple threads, the C/C++ compiler must support OpenMP. In practice, almost all compilers implement this, with the exception of older versions of clang on macOS systems.

To squeeze the most out of openTSNE, you may also consider installing FFTW3 prior to installation. FFTW3 implements the Fast Fourier Transform, which is heavily used in openTSNE. If FFTW3 is not available, openTSNE will use numpy’s implementation of the FFT, which is slightly slower than FFTW. The difference is only noticeable with large data sets containing millions of data points.
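
For example, on conda-based setups, FFTW3 is typically available from conda-forge (assuming the conda-forge package is named fftw):

conda install --channel conda-forge fftw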

A hello world example

Getting started with openTSNE is very simple. First, we'll load up some data using scikit-learn

from sklearn import datasets

iris = datasets.load_iris()
x, y = iris["data"], iris["target"]

then we'll import TSNE and fit the embedding

from openTSNE import TSNE

embedding = TSNE().fit(x)
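
The resulting embedding object behaves like an N x 2 NumPy array, so it can be plotted directly, and it exposes a transform method for positioning new points on the existing embedding. A minimal sketch, assuming matplotlib is installed (the slice of x stands in for genuinely new data):

import matplotlib.pyplot as plt

plt.scatter(embedding[:, 0], embedding[:, 1], c=y, s=5)
plt.show()

# Position additional points on the existing (fixed) embedding
new_embedding = embedding.transform(x[:10])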

Citation

If you make use of openTSNE for your work, we would appreciate it if you would cite the paper

@article {Poli{\v c}ar731877,
    author = {Poli{\v c}ar, Pavlin G. and Stra{\v z}ar, Martin and Zupan, Bla{\v z}},
    title = {openTSNE: a modular Python library for t-SNE dimensionality reduction and embedding},
    year = {2019},
    doi = {10.1101/731877},
    publisher = {Cold Spring Harbor Laboratory},
    URL = {https://www.biorxiv.org/content/early/2019/08/13/731877},
    eprint = {https://www.biorxiv.org/content/early/2019/08/13/731877.full.pdf},
    journal = {bioRxiv}
}

openTSNE implements two efficient algorithms for t-SNE. Please consider citing the original authors of the algorithm that you use. If you use FIt-SNE (default), then the citation is [4] below, but if you use Barnes-Hut the citation is [3].

References

[1] Van Der Maaten, Laurens, and Hinton, Geoffrey. “Visualizing data using t-SNE.” Journal of Machine Learning Research 9.Nov (2008): 2579-2605.
[2] Poličar, Pavlin G., Martin Stražar, and Blaž Zupan. “Embedding to Reference t-SNE Space Addresses Batch Effects in Single-Cell Classification.” BioRxiv (2019): 671404.
[3] Van Der Maaten, Laurens. “Accelerating t-SNE using tree-based algorithms.” Journal of Machine Learning Research 15.1 (2014): 3221-3245.
[4] Linderman, George C., et al. “Fast interpolation-based t-SNE for improved visualization of single-cell RNA-seq data.” Nature Methods 16.3 (2019): 243.
[5] Kobak, Dmitry, and Berens, Philipp. “The art of using t-SNE for single-cell transcriptomics.” Nature Communications 10, 5416 (2019).
[6] Macosko, Evan Z., et al. “Highly parallel genome-wide expression profiling of individual cells using nanoliter droplets.” Cell 161.5 (2015): 1202-1214.
Comments
  • A bunch of comments and questions

    Hi Pavlin! Great work. I did not know about Orange but I am working with scRNA-seq data myself (cf. your Zeisel2018 example) and I am using Python, so it's interesting to see developments in that direction.

    I have a couple of scattered comments/questions that I will just dump here. This isn't a real "issue".

    1. You say that BH is much faster than FFT for smaller datasets. That's interesting; I did not notice this. What kind of numbers are you talking about here? I was under impression that with n<10k both methods are so fast (I guess all 1000 iterations under 1 min?) that the exact time does not really matter...

    2. Any specific reason to use "Python/Numba implementation of nearest neighbor descent" for approximate nearest neighbours? There are some popular libraries, e.g. annoy. Is your implementation much faster than that? Because otherwise it could be easier to use a well-known established library... I think Leland McInnes is using something similar (Numba implementation of nearest neighbor descent) in his UMAP; did you follow him here?

    3. I did not look at the actual code, but from the description on the main page it sounds that you don't have a vanilla t-SNE implementation in here. Is it true? I think it would be nice to have vanilla t-SNE in here too. For datasets with n=1k-2k it's pretty fast and I guess many people would prefer to use vanilla t-SNE if possible.

    4. I noticed you writing this in one of the closed issues:

      we allow new data to be added into the existing embedding by direct optimization. To my knowledge, no other library does this. It's sometimes difficult to get nice embeddings like this, but it may have potential.

      That's interesting. How exactly are you doing this? You fix the existing embedding, compute all the affinities for the extended dataset (original data + new data) and then optimize the cost by allowing only the positions of the new points to change? Something like that? (A sketch of this follows after the list.)

    5. George sped up his code quite a bit by adding multithreading to the F_attr computations. He is now implementing multithreading for the repulsive forces too. See https://github.com/KlugerLab/FIt-SNE/pull/32, and the discussion there. This might be interesting for you too. Or are you already using multithreading during gradient descent?

    6. I am guessing that your Zeisel2018 plot is colored using the same 16 "megaclusters" that Zeisel et al. use in Figure 1B (https://www.cell.com/cms/attachment/f1754f20-890c-42f5-aa27-bbb243127883/gr1_lrg.jpg). If so, it would be great if you used the same colors as in their figure; this would ease the comparison. Of course you are not trying to make comparisons here, but this is something that would be interesting to me personally :)
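
    A minimal sketch of the direct-optimization approach asked about in question 4, assuming openTSNE's transform/prepare_partial API (x_train and x_new are hypothetical arrays; keyword values are illustrative, not the library defaults):

    from openTSNE import TSNE

    embedding = TSNE().fit(x_train)        # reference embedding, kept fixed

    # One call: affinities of the new points to the reference data are
    # computed, then only the new points' coordinates are optimized
    new_points = embedding.transform(x_new)

    # Roughly equivalent two-step form exposing the intermediate object
    partial = embedding.prepare_partial(x_new, perplexity=5)
    partial = partial.optimize(n_iter=100)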

    opened by dkobak 37
  • Runtime and RAM usage compared to FIt-SNE

    I understand that openTSNE is expected to be slower than FIt-SNE, but I'd like to understand how much slower it is in typical situations. As I reported earlier, when I run it on 70000x50 PCA-reduced MNIST data with default parameters and n_jobs=-1, I get ~60 seconds with FIt-SNE and ~120 seconds with openTSNE. Every 50 iterations take around 2s vs around 4s.

    I did not check for this specific case, but I suspect that FFT takes only a small fraction of this time, and the computational bottleneck is formed by the attractive forces. Can one profile openTSNE and see how much time is taken by different steps, such as repulsive/attractive computations?

    Apart from that, and possibly even more worryingly, I replicated the data 6x and added some noise, to get a 420000x50 data matrix. It takes FIt-SNE around 1 GB of RAM to allocate the space for the kNN matrix, so it works just fine on my laptop. However, openTSNE rapidly took >7 GB of RAM and crashed the kernel (I have 16 GB but around half was taken by other processes). This happened in the first seconds, so I assume it happens during the kNN search. Does pynndescent eat up so much memory in this case?

    discussion 
    opened by dkobak 25
  • Why does transform() have exaggeration=2 by default?

    The parameters of the transform function are

    def transform(self, X, perplexity=5, initialization="median", k=25,
                  learning_rate=100, n_iter=100, exaggeration=2, momentum=0,
                  max_grad_norm=0.05):

    so it has exaggeration=2 by default. Why? This looks unintuitive to me: exaggeration is a slightly "weird" trick that can arguably be very useful for huge data sets, but I would expect the out-of-sample embedding to work just fine without it. Am I missing something?

    I am also curious why momentum is set to 0 (unlike in normal tSNE optimization), but here I don't have any intuition for what it should be.

    Another question is: will this function work with n_iter=0 if one just wants to get an embedding using medians of k nearest neighbours? That would be handy. Or is there another way to get this? Perhaps from prepare_partial?

    And lastly, when transform() is applied to points from a very different data set (imagine positioning Smart-seq2 cells onto a 10x Chromium reference), I prefer to use correlation distances because I suspect Euclidean distances might be completely off (even when the original tSNE was done using Euclidean distances). I think openTSNE currently does not support this, right? Did you have any problems with that? One could perhaps allow transform() to take a metric argument (is correlation among the supported metrics, btw?). The downside is that if this metric is different from the metric used to prepare the embedding, then the nearest neighbours object will have to be recomputed, so it will suddenly become much slower. Let me know if I should post it as a separate issue.
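
    For reference, a minimal sketch of overriding these defaults on a per-call basis, assuming embedding is a fitted openTSNE embedding and x_new is a hypothetical array (the values are illustrative):

    # Disable exaggeration and give the new points a bit more optimization
    new_embedding = embedding.transform(
        x_new, exaggeration=1, momentum=0.5, n_iter=250, perplexity=30,
    )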

    question 
    opened by dkobak 25
  • Implement auto early exaggeration

    Implements #218.

    First, early_exaggeration="auto" is now set to max(12, exaggeration).

    Second, the learning rate. We have various functions that currently take learning_rate="auto" and set it to max(200, N/12). I did not change this, because those functions usually do not know what the early exaggeration was. So I kept it as is. I only changed the behaviour of the base class: there learning_rate="auto" is now set to max(200, N/early_exaggeration).

    This works as intended:

    X = np.random.randn(10000,10)
    
    TSNE(verbose=True).fit(X)
    # Prints 
    # TSNE(early_exaggeration=12, verbose=True)
    # Uses lr=833.33
    
    TSNE(verbose=True, exaggeration=5).fit(X)
    # Prints
    # TSNE(early_exaggeration=12, exaggeration=5, verbose=True)
    # Uses lr=833.33
    
    TSNE(verbose=True, exaggeration=20).fit(X)
    # Prints
    # TSNE(early_exaggeration=20, exaggeration=20, verbose=True)
    # Uses lr=500.00
    

    (Note that the learning rate is currently not printed by the repr(self) because it's kept as "auto" at construction time and only set later. That's also how we had it before.)

    opened by dkobak 24
  • Add spectral initialization using diffusion maps

    Description of changes

    Fixes #110.

    I ended up implementing only diffusion maps because, computationally, finding the leading eigenvectors is much faster than finding the smallest eigenvectors, and of the various spectral methods, diffusion maps are the only ones that require the leading ones. I checked what UMAP does - it uses the symmetric normalized Laplacian for initialization - but they manually set the Lanczos iteration limit, which I don't understand. This seemed like the better option.

    @dkobak Do you want to take a look at this? I implemented this using scipy.sparse.linalg.svds because it turns out to be faster than scipy.sparse.linalg.eigsh; eigsh also seemed to produce strange results when I increased the error tolerance, while the svds results seemed reasonable.
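
    For intuition, a rough sketch of a diffusion-map-style initialization computed with svds; this is an illustrative reconstruction under my own assumptions (symmetric normalization, dropping the trivial eigenvector, t-SNE-style rescaling), not necessarily the code in this PR:

    import numpy as np
    import scipy.sparse as sp
    from scipy.sparse.linalg import svds

    def diffusion_map_init(P, n_components=2):
        # Symmetrically normalize the affinities: K = D^{-1/2} P D^{-1/2}
        d = np.asarray(P.sum(axis=1)).ravel()
        d_inv_sqrt = sp.diags(1.0 / np.sqrt(d))
        K = d_inv_sqrt @ P @ d_inv_sqrt
        # K is symmetric, so its top singular vectors coincide (up to sign)
        # with its leading eigenvectors; svds finds the largest ones quickly
        U, s, _ = svds(K, k=n_components + 1)
        U = U[:, np.argsort(s)[::-1]]  # sort by decreasing singular value
        # Drop the trivial leading eigenvector, map back through D^{-1/2},
        # and rescale to the small standard deviation t-SNE initializations use
        init = d_inv_sqrt @ U[:, 1:]
        return init / np.std(init[:, 0]) * 1e-4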

    Includes
    • [X] Code changes
    • [ ] Tests
    • [ ] Documentation
    opened by pavlin-policar 22
  • `pynndescent` has recently changed

    Expected behaviour

    Return the embedding

    Actual behaviour

    Return the embedding with one warning : .../miniconda3/lib/python3.7/site-packages/openTSNE/nearest_neighbors.py:181: UserWarning: pynndescent has recently changed which distance metrics are supported, and openTSNE.nearest_neighbors has not been updated. Please notify the developers of this change. "pynndescent has recently changed which distance metrics are supported, "

    Steps to reproduce the behavior

    Hello World steps

    opened by VallinP 18
  • Added Annoy support

    Added Annoy support as per #101. Annoy is used by default if it supports the given metric and if the input data is not scipy.sparse (otherwise Pynndescent is used).

    This requires Annoy to be installed (I installed it from https://anaconda.org/conda-forge/python-annoy), but I wasn't sure where to add this dependency.

    opened by dkobak 16
  • FFT parameters and runtime for very expanded embeddings

    I have been doing some experiments on convergence and running t-SNE for many more iterations than I normally do. And I again noticed something that I used to see every now and then: the runtime jumps wildly between "epochs" of 50 iterations. This only happens when the embedding is very expanded and so FFT gets really slow. Look:

    Iteration   50, KL divergence 4.8674, 50 iterations in 1.8320 sec
    Iteration  100, KL divergence 4.3461, 50 iterations in 1.8760 sec
    Iteration  150, KL divergence 4.0797, 50 iterations in 2.6252 sec
    Iteration  200, KL divergence 3.9082, 50 iterations in 4.5062 sec
    Iteration  250, KL divergence 3.7864, 50 iterations in 5.4258 sec
    Iteration  300, KL divergence 3.6957, 50 iterations in 7.2500 sec
    Iteration  350, KL divergence 3.6259, 50 iterations in 9.0705 sec
    Iteration  400, KL divergence 3.5711, 50 iterations in 10.1077 sec
    Iteration  450, KL divergence 3.5271, 50 iterations in 12.2412 sec
    Iteration  500, KL divergence 3.4909, 50 iterations in 13.6440 sec
    Iteration  550, KL divergence 3.4604, 50 iterations in 14.6127 sec
    Iteration  600, KL divergence 3.4356, 50 iterations in 17.2364 sec
    Iteration  650, KL divergence 3.4143, 50 iterations in 17.6973 sec
    Iteration  700, KL divergence 3.3986, 50 iterations in 27.9720 sec
    Iteration  750, KL divergence 3.3914, 50 iterations in 34.0480 sec
    Iteration  800, KL divergence 3.3863, 50 iterations in 34.4572 sec
    Iteration  850, KL divergence 3.3820, 50 iterations in 36.9247 sec
    Iteration  900, KL divergence 3.3779, 50 iterations in 47.0994 sec
    Iteration  950, KL divergence 3.3737, 50 iterations in 40.8424 sec
    Iteration 1000, KL divergence 3.3696, 50 iterations in 62.1549 sec
    Iteration 1050, KL divergence 3.3653, 50 iterations in 30.6310 sec
    Iteration 1100, KL divergence 3.3613, 50 iterations in 44.9781 sec
    Iteration 1150, KL divergence 3.3571, 50 iterations in 36.9257 sec
    Iteration 1200, KL divergence 3.3531, 50 iterations in 66.3830 sec
    Iteration 1250, KL divergence 3.3493, 50 iterations in 37.7215 sec
    Iteration 1300, KL divergence 3.3457, 50 iterations in 33.7942 sec
    Iteration 1350, KL divergence 3.3421, 50 iterations in 33.7507 sec
    Iteration 1400, KL divergence 3.3387, 50 iterations in 59.2065 sec
    Iteration 1450, KL divergence 3.3354, 50 iterations in 36.3713 sec
    Iteration 1500, KL divergence 3.3323, 50 iterations in 39.1894 sec
    Iteration 1550, KL divergence 3.3293, 50 iterations in 67.3239 sec
    Iteration 1600, KL divergence 3.3265, 50 iterations in 33.9837 sec
    Iteration 1650, KL divergence 3.3238, 50 iterations in 63.5015 sec
    

    For the record, this is on full MNIST with uniform k=15 affinity, n_jobs=-1. Note that after it gets to 30 seconds / 50 iterations, it starts fluctuating between 30 and 60. This does not make sense.

    I suspect it may be related to how interpolation params are chosen depending on the grid size. Can it be that those heuristics may need improvement?

    Incidentally, can it be that the interpolation params can be relaxed once the embedding becomes very large (e.g. span larger than [-100,100]) so that optimisation runs faster without -- perhaps! -- compromising the approximation too much?

    CCing to @linqiaozhi.

    opened by dkobak 15
  • Pynndescent build/query

    We discussed this before, but I've been playing around with some sparse data now and wanted to report some runtimes.

    When using pynndescent, openTSNE runs build() with n_neighbors=15 and then query() with n_neighbors=3*perplexity. At the same time, Leland said that that's not efficient and the recommended way to use pynndescent is to run build() with the desired number of neighbors and then simply take its constructed kNN graph without querying. You said that you ran some benchmarks and found your way to be faster. Here are the runtimes I got on a sparse X of size (100000, 9630).

    nn = NNDescent(X, metric='cosine', n_neighbors=15)      # Wall time: 39 s
    nn.query(X, k=15)                                         # Wall time: 1min 57s
    nn.query(X, k=90)                                         # Wall time: 3min 21s
    nn90 = NNDescent(X, metric='cosine', n_neighbors=90)  # Wall time: 7min 45s
    nn90.query(X, k=90)                                     # Wall time: 57min 53s
    

    For k=90 it is indeed faster to build with k=15 and then query with k=90, so I can confirm your observation.

    My only suggestion would be to modify the NNDescent class so that if the desired k is less than some threshold then build is done with k+1 and then the constructed tree is returned without query. We can simply use 15 as the threshold. I did this locally and can PR.
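
    A rough sketch of that suggestion, assuming pynndescent's NNDescent class and its neighbor_graph property (the threshold of 15 is simply the value from this discussion):

    from pynndescent import NNDescent

    K_THRESHOLD = 15

    def knn_via_pynndescent(X, k, metric="cosine"):
        if k <= K_THRESHOLD:
            # Build with k+1 and read the constructed graph directly; the
            # first neighbour of each point is typically the point itself
            index = NNDescent(X, metric=metric, n_neighbors=k + 1)
            indices, distances = index.neighbor_graph
            return indices[:, 1:], distances[:, 1:]
        # For larger k, build a small index and query it, as openTSNE does now
        index = NNDescent(X, metric=metric, n_neighbors=15)
        indices, distances = index.query(X, k=k + 1)
        return indices[:, 1:], distances[:, 1:]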

    opened by dkobak 15
  • Cannot pass random_state to PerplexityBasedNN when using Annoy

    Hi Pavlin,

    this is quite a minuscule bug, but I noticed that PerplexityBasedNN fails when you pass it a numpy RandomState instance, because it uses that value directly in the AnnoyIndex(...).set_seed(seed) call. Since the documentation says that it accepts both an integer and a numpy random state, I guess this is a (tiny) bug.

    Expected behaviour

    It sets a seed for the internal random state of annoy.

    Actual behaviour

    It crashes with a TypeError:

      File "/home/jnb/dev/openTSNE/openTSNE/nearest_neighbors.py", line 276, in build
        self.index.set_seed(self.random_state)
    TypeError: an integer is required (got type numpy.random.mtrand.RandomState)
    
    Steps to reproduce the behavior
    import numpy as np
    from openTSNE import PerplexityBasedNN
    
    random_state = np.random.default_rng(333)
    data = random_state.uniform(size=(10000,10))
    PerplexityBasedNN(data, random_state=random_state)
    

    Fix

    In nearest_neighbors.py, line 275 can be changed from self.index.set_seed(self.random_state) to

            if isinstance(self.random_state, int):
                self.index.set_seed(self.random_state)
            else: # has to be a numpy RandomState
                self.index.set_seed(self.random_state.randint(-(2 ** 31), 2 ** 31))
    

    Let me know if it should come as a pull request or if you'll just incorporate it like this. Cheers

    opened by jnboehm 14
  • Workaround for -1 in pynndescent index

    Fixes #130.

    Changes:

    • Query() is only used for k>15.
    • n_jobs fixed to 1 for sparse inputs to avoid a pynndescent bug
    • find all points where index contains -1 values, and let them randomly attract each other.
    opened by dkobak 14
  • Switching spectral initialization to sklearn.manifold.SpectralEmbedding

    A student in our lab is currently looking into spectral initialization, and she found out that openTSNE.init.spectral(tol=0) in some cases does not agree with sklearn.manifold.SpectralEmbedding(affinity='precomputed'). In some cases it agrees perfectly or near-perfectly, but we have an example where the result is very different, and SpectralEmbedding gives what seems like a more meaningful result.

    I looked at the math, and it seems that they should conceptually be computing the same thing (SpectralEmbedding finds eigenvectors of L_sym, whereas init.spectral finds generalized eigenvectors of W and D, but that should be equivalent, as per https://jlmelville.github.io/smallvis/spectral.html @jlmelville). We don't know what the difference is due to. It may be numerical.

    However, conceptually, it seems sensible if openTSNE would simply outsource the computation to sklearn.

    A related issue is that init.spectral is not reproducible and gives different results with each run. Apparently the way we initialize v0 still leaves ARPACK with some randomness. Sklearn gets around this by initializing v0 differently. I guess openTSNE should do the same -- but of course if we end up simply calling SpectralEmbedding then it won't matter.
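
    For what it's worth, a small comparison sketch under my assumptions about the two APIs (that openTSNE.initialization.spectral accepts an affinity matrix plus n_components/tol, and that affinity objects expose the symmetric matrix as .P):

    import numpy as np
    from sklearn.manifold import SpectralEmbedding
    from openTSNE import initialization
    from openTSNE.affinity import PerplexityBasedNN

    x = np.random.randn(1000, 10)
    A = PerplexityBasedNN(x, perplexity=30).P  # symmetric sparse affinities

    emb_open = initialization.spectral(A, n_components=2, tol=0)
    emb_sk = SpectralEmbedding(n_components=2, affinity="precomputed").fit_transform(A)

    # The components should agree up to sign and scale when the two methods match
    for i in range(2):
        r = np.corrcoef(emb_open[:, i], emb_sk[:, i])[0, 1]
        print(f"component {i}: |corr| = {abs(r):.2f}")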

    opened by dkobak 10
  • Online Documentation Not Rendering Python Code

    Expected behaviour

    [screenshot: Python code example rendered correctly]

    Actual behaviour

    [screenshot: Python code example not rendered]

    Steps to reproduce the behavior

    Go to: https://opentsne.readthedocs.io/en/latest/examples/02_advanced_usage/02_advanced_usage.html

    opened by MattScicluna 0
  • Missed reference

    I think you missed our paper in your citations.

    Zhirong Yang, Jaakko Peltonen and Samuel Kaski. Scalable Optimization of Neighbor Embedding for Visualization. In ICML 2013.

    opened by rozyangno 2
  • Memory collapses with precomputed block matrix

    Expected behaviour

    When I run t-SNE on a symmetric 200x200 block matrix such as this one [attached distance matrix image], I expect TSNE to return 4 distinct clusters (actually 4 points only). Sklearn yields this.

    Actual behaviour

    Using openTSNE, the terminal crashes with full memory (50% of the time). If it survives, the clusters are visible; however, the result is not as satisfying.

    Steps to reproduce the behavior

    matrix = ...  # the block matrix described above
    tsne = TSNE(metric='precomputed', initialization='spectral', negative_gradient_method='bh')
    embedding = tsne.fit(matrix)

    NOTE: I am using the direct installation from GitHub this morning.

    bug 
    opened by fsvbach 17
Releases (v0.6.2)
  • v0.6.2 (Mar 18, 2022)

    Changes

    • By default, we now use the MultiscaleMixture affinity model, enabling us to pass in a list of perplexities instead of a single perplexity value. This is fully backwards compatible (see the sketch after this list).
    • Previously, perplexity values would be adjusted based on the dataset: e.g. passing perplexity=100 with N=150 would set TSNE.perplexity to 50. Now the value is kept as given, and an effective_perplexity_ attribute (following the scikit-learn convention) stores the corrected perplexity value.
    • Fix bug where interpolation grid was being prepared even when using BH optimization during transform.
    • Enable calling .transform with precomputed distances. In this case, the data matrix will be assumed to be a distance matrix.
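
    A minimal sketch of the list-of-perplexities usage from the first item above (x is a hypothetical data matrix; the exact keyword handling is my reading of the release note):

    from openTSNE import TSNE

    # Multiscale affinities: combine a local and a more global perplexity
    embedding = TSNE(perplexity=[50, 500]).fit(x)

    # A single value keeps working exactly as before
    embedding = TSNE(perplexity=30).fit(x)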

    Build changes

    • Build with oldest-supported-numpy
    • Build linux wheels on manylinux2014 instead of manylinux2010, following numpy's example
    • Build MacOS wheels on macOS-10.15 instead of macos-10.14 Azure VM
    • Fix a potential problem with clang-13, which, under the -ffast-math flag, performs optimizations that assume infinities never occur
  • v0.6.0 (Apr 25, 2021)

    Changes:

    • Remove affinities from TSNE construction; allow custom affinities and initialization in the .fit method. This improves the API when dealing with non-tabular data. This is not backwards compatible.
    • Add metric="precomputed". This includes the addition of openTSNE.nearest_neighbors.PrecomputedDistanceMatrix and openTSNE.nearest_neighbors.PrecomputedNeighbors.
    • Add knn_index parameter to openTSNE.affinity classes.
    • Add (less-than-ideal) workaround for pickling Annoy objects.
    • Extend the range of recommended FFTW boxes up to 1000.
    • Remove deprecated openTSNE.nearest_neighbors.BallTree.
    • Remove deprecated openTSNE.callbacks.ErrorLogger.
    • Remove deprecated TSNE.neighbors_method property.
    • Add negative_gradient_method="auto" and set it as the default.
  • v0.5.0 (Dec 24, 2020)

  • v0.4.0 (May 4, 2020)

    Major changes:

    • Remove numba dependency, switch over to using Annoy nearest neighbor search. Pynndescent is now optional and can be used if installed manually.
    • Massively speed up transform by keeping the reference interpolation grid fixed. Limit new points to a circle centered around the reference embedding.
    • Implement variable degrees of freedom.

    Minor changes:

    • Add spectral initialization using diffusion maps.
    • Replace cumbersome ErrorLogger callback with the verbose flag.
    • Change the default number of iterations to 750.
    • Add learning_rate="auto" option.
    • Remove the min_grad_norm parameter.

    Bugfixes:

    • Fix case where KL divergence was sometimes reported as NaN.
  • v0.2.0 (Sep 11, 2018)

    In order to make usage as simple as possible and remove any external dependency on FFTW (which previously needed to be installed locally), this update replaces FFTW with numpy's FFT.

Owner
Pavlin Poličar
PhD student working on applying machine learning methods to biomedical and scRNA-seq data.