PyClustering is a Python, C++ data mining library.

Overview

PyClustering

pyclustering is a Python and C++ data mining library covering clustering algorithms, oscillatory networks, and neural networks. The library provides Python and C++ implementations (the C++ pyclustering library) of each algorithm or model. The C++ pyclustering library is part of pyclustering and is supported on Linux, Windows, and MacOS.

Version: 0.11.dev

License: The 3-Clause BSD License

E-Mail: [email protected]

Documentation: https://pyclustering.github.io/docs/0.10.1/html/

Homepage: https://pyclustering.github.io/

PyClustering Wiki: https://github.com/annoviko/pyclustering/wiki

Dependencies

Required packages: scipy, matplotlib, numpy, Pillow

Python version: >=3.6 (32-bit, 64-bit)

C++ version: >= 14 (32-bit, 64-bit)

Performance

Each algorithm is implemented in both Python and C/C++. The C/C++ implementation is used when your platform supports it; otherwise the Python implementation is used. The implementation can also be chosen explicitly with the ccore flag, which defaults to 'True' (meaning that C/C++ is used), for example:

from pyclustering.cluster.xmeans import xmeans

# data_points and start_centers are assumed to be prepared beforehand.

# Explicit default - C/C++ part of the library is used
xmeans_instance_1 = xmeans(data_points, start_centers, 20, ccore=True)

# The same - C/C++ part of the library is used by default
xmeans_instance_2 = xmeans(data_points, start_centers, 20)

# Switch off core - Python is used
xmeans_instance_3 = xmeans(data_points, start_centers, 20, ccore=False)

Installation

Installation using pip3 tool:

$ pip3 install pyclustering

Manual installation from official repository using Makefile:

# get sources of the pyclustering library, for example, from repository
$ mkdir pyclustering
$ cd pyclustering/
$ git clone https://github.com/annoviko/pyclustering.git .

# compile CCORE library (core of the pyclustering library).
$ cd ccore/
$ make ccore_64bit      # build for 64-bit OS

# $ make ccore_32bit    # build for 32-bit OS

# return to parent folder of the pyclustering library
$ cd ../

# install pyclustering library
$ python3 setup.py install

# optionally - test the library
$ python3 setup.py test

Manual installation using CMake:

# get sources of the pyclustering library, for example, from repository
$ mkdir pyclustering
$ cd pyclustering/
$ git clone https://github.com/annoviko/pyclustering.git .

# generate build files.
$ mkdir build
$ cd build/
$ cmake ..

# build pyclustering-shared target depending on what was generated (Makefile or MSVC solution)
# if Makefile has been generated then
$ make pyclustering-shared

# return to parent folder of the pyclustering library
$ cd ../

# install pyclustering library
$ python3 setup.py install

# optionally - test the library
$ python3 setup.py test

Manual installation using Microsoft Visual Studio solution:

  1. Clone repository from: https://github.com/annoviko/pyclustering.git
  2. Open folder pyclustering/ccore
  3. Open Visual Studio project ccore.sln
  4. Select solution platform: x86 or x64
  5. Build pyclustering-shared project.
  6. Add pyclustering folder to python path or install it using setup.py

# install pyclustering library
$ python3 setup.py install

# optionally - test the library
$ python3 setup.py test

Proposals, Questions, Bugs

For questions, proposals, or bug reports related to pyclustering, please contact [email protected] or create an issue at https://github.com/annoviko/pyclustering/issues.

Cite the Library

If you use the pyclustering library in a scientific paper, please cite it:

Novikov, A., 2019. PyClustering: Data Mining Library. Journal of Open Source Software, 4(36), p.1230. Available at: http://dx.doi.org/10.21105/joss.01230.

BibTeX entry:

@article{Novikov2019,
    doi         = {10.21105/joss.01230},
    url         = {https://doi.org/10.21105/joss.01230},
    year        = 2019,
    month       = {apr},
    publisher   = {The Open Journal},
    volume      = {4},
    number      = {36},
    pages       = {1230},
    author      = {Andrei Novikov},
    title       = {{PyClustering}: Data Mining Library},
    journal     = {Journal of Open Source Software}
}

Brief Overview of the Library Content

Clustering algorithms and methods (module pyclustering.cluster):

Algorithm Python C++
Agglomerative
BANG  
BIRCH  
BSAS
CLARANS  
CLIQUE
CURE
DBSCAN
Elbow
EMA  
Fuzzy C-Means
GA (Genetic Algorithm)
G-Means
HSyncNet
K-Means
K-Means++
K-Medians
K-Medoids
MBSAS
OPTICS
ROCK
Silhouette
SOM-SC
SyncNet
Sync-SOM  
TTSAS
X-Means

Oscillatory networks and neural networks (module pyclustering.nnet):

Model Python C++
CNN (Chaotic Neural Network)  
fSync (Oscillatory network based on Landau-Stuart equation and Kuramoto model)  
HHN (Oscillatory network based on Hodgkin-Huxley model)
Hysteresis Oscillatory Network  
LEGION (Local Excitatory Global Inhibitory Oscillatory Network)
PCNN (Pulse-Coupled Neural Network)
SOM (Self-Organized Map)
Sync (Oscillatory network based on Kuramoto model)
SyncPR (Oscillatory network for pattern recognition)
SyncSegm (Oscillatory network for image segmentation)

Graph Coloring Algorithms (module pyclustering.gcolor):

Algorithm Python C++
DSatur  
Hysteresis  
GColorSync  

Containers (module pyclustering.container):

Algorithm Python C++
KD Tree
CF Tree  

Examples in the Library

The library contains examples for each algorithm and oscillatory network model:

Clustering examples: pyclustering/cluster/examples

Graph coloring examples: pyclustering/gcolor/examples

Oscillatory network examples: pyclustering/nnet/examples

Code Examples

Data clustering by CURE algorithm

from pyclustering.cluster import cluster_visualizer
from pyclustering.cluster.cure import cure
from pyclustering.utils import read_sample
from pyclustering.samples.definitions import FCPS_SAMPLES

# Input data in following format [ [0.1, 0.5], [0.3, 0.1], ... ].
input_data = read_sample(FCPS_SAMPLES.SAMPLE_LSUN)

# Allocate three clusters.
cure_instance = cure(input_data, 3)
cure_instance.process()
clusters = cure_instance.get_clusters()

# Visualize allocated clusters.
visualizer = cluster_visualizer()
visualizer.append_clusters(clusters, input_data)
visualizer.show()
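
The clusters returned by get_clusters() are lists of point indices into the input data, which is why the visualizer receives both the clusters and the data. When one label per point is more convenient, the conversion can be sketched as below. The helper name clusters_to_labels and the -1 marker for unassigned points are assumptions made for this illustration, not part of the library.

```python
def clusters_to_labels(clusters, n_points, unassigned=-1):
    """Convert clusters given as lists of point indices into one label per point."""
    labels = [unassigned] * n_points
    for label, cluster in enumerate(clusters):
        for index in cluster:
            labels[index] = label
    return labels


# Hypothetical result for six points: two clusters, point 5 left unassigned.
example_clusters = [[0, 2, 4], [1, 3]]
print(clusters_to_labels(example_clusters, 6))  # [0, 1, 0, 1, 0, -1]
```

Points that appear in no cluster (for example, noise) keep the unassigned marker.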

Data clustering by K-Means algorithm

from pyclustering.cluster.kmeans import kmeans, kmeans_visualizer
from pyclustering.cluster.center_initializer import kmeans_plusplus_initializer
from pyclustering.samples.definitions import FCPS_SAMPLES
from pyclustering.utils import read_sample

# Load list of points for cluster analysis.
sample = read_sample(FCPS_SAMPLES.SAMPLE_TWO_DIAMONDS)

# Prepare initial centers using K-Means++ method.
initial_centers = kmeans_plusplus_initializer(sample, 2).initialize()

# Create instance of K-Means algorithm with prepared centers.
kmeans_instance = kmeans(sample, initial_centers)

# Run cluster analysis and obtain results.
kmeans_instance.process()
clusters = kmeans_instance.get_clusters()
final_centers = kmeans_instance.get_centers()

# Visualize obtained results
kmeans_visualizer.show_clusters(sample, clusters, final_centers)
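
The K-Means++ step in this example chooses starting centers that tend to be spread out: the first center is drawn uniformly from the data, and each further center is drawn with probability proportional to its squared distance from the nearest center chosen so far. A minimal pure-Python sketch of that idea follows. The names kmeans_pp_seed and squared_distance and the seed argument are illustrative assumptions; this is not the library's kmeans_plusplus_initializer implementation.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points given as sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_seed(points, k, seed=None):
    """Pick k starting centers from points using K-Means++ style sampling."""
    rng = random.Random(seed)
    centers = [rng.choice(points)]
    while len(centers) < k:
        # Weight of each point: squared distance to its nearest chosen center.
        weights = [min(squared_distance(p, c) for c in centers) for p in points]
        if sum(weights) == 0:
            # Every point coincides with a chosen center; fall back to uniform choice.
            centers.append(rng.choice(points))
        else:
            centers.append(rng.choices(points, weights=weights, k=1)[0])
    return centers


points = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
centers = kmeans_pp_seed(points, 2, seed=1)
print(centers)
```

Because an already chosen point has weight zero, it is not drawn again while other points remain at a nonzero distance.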

Data clustering by OPTICS algorithm

from pyclustering.cluster import cluster_visualizer
from pyclustering.cluster.optics import optics, ordering_analyser, ordering_visualizer
from pyclustering.samples.definitions import FCPS_SAMPLES
from pyclustering.utils import read_sample

# Read sample for clustering from some file
sample = read_sample(FCPS_SAMPLES.SAMPLE_LSUN)

# Run cluster analysis where connectivity radius is bigger than real
radius = 2.0
neighbors = 3
amount_of_clusters = 3
optics_instance = optics(sample, radius, neighbors, amount_of_clusters)

# Performs cluster analysis
optics_instance.process()

# Obtain results of clustering
clusters = optics_instance.get_clusters()
noise = optics_instance.get_noise()
ordering = optics_instance.get_ordering()

# Visualize ordering diagram
analyser = ordering_analyser(ordering)
ordering_visualizer.show_ordering_diagram(analyser, amount_of_clusters)

# Visualize clustering results
visualizer = cluster_visualizer()
visualizer.append_clusters(clusters, sample)
visualizer.show()
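
The ordering diagram plots one reachability distance per point, in the order OPTICS visited the points; dense clusters appear as valleys separated by peaks. As a rough illustration of how a cut-off radius turns such a diagram into a cluster count, the sketch below counts the runs of values that stay under a threshold. The function name count_valleys and the sample values are made up for this example, and the simplification is an assumption, not the logic of the library's ordering_analyser.

```python
def count_valleys(reachability, radius):
    """Count maximal runs of consecutive values strictly below radius."""
    count = 0
    inside_valley = False
    for value in reachability:
        if value < radius:
            if not inside_valley:
                count += 1
                inside_valley = True
        else:
            inside_valley = False
    return count


# Made-up reachability values: three valleys separated by peaks above 1.0.
ordering_values = [2.0, 0.3, 0.2, 0.25, 1.5, 0.4, 0.3, 1.8, 0.2, 0.22]
print(count_valleys(ordering_values, 1.0))  # 3
```

Raising the radius above every peak merges the valleys into one, and lowering it below every value leaves none.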

Simulation of oscillatory network PCNN

from pyclustering.nnet.pcnn import pcnn_network, pcnn_visualizer

# Create Pulse-Coupled neural network with 10 oscillators.
net = pcnn_network(10)

# Perform simulation during 50 steps using binary external stimulus.
dynamic = net.simulate(50, [1, 1, 1, 0, 0, 0, 0, 1, 1, 1])

# Allocate synchronous ensembles from the output dynamic.
ensembles = dynamic.allocate_sync_ensembles()

# Show output dynamic.
pcnn_visualizer.show_output_dynamic(dynamic, ensembles)

Simulation of chaotic neural network CNN

from pyclustering.cluster import cluster_visualizer
from pyclustering.samples.definitions import SIMPLE_SAMPLES
from pyclustering.utils import read_sample
from pyclustering.nnet.cnn import cnn_network, cnn_visualizer

# Load stimulus from file.
stimulus = read_sample(SIMPLE_SAMPLES.SAMPLE_SIMPLE3)

# Create chaotic neural network, amount of neurons should be equal to amount of stimulus.
network_instance = cnn_network(len(stimulus))

# Perform simulation during 100 steps.
steps = 100
output_dynamic = network_instance.simulate(steps, stimulus)

# Display output dynamic of the network.
cnn_visualizer.show_output_dynamic(output_dynamic)

# Display dynamic matrix and observation matrix to show clustering phenomenon.
cnn_visualizer.show_dynamic_matrix(output_dynamic)
cnn_visualizer.show_observation_matrix(output_dynamic)

# Visualize clustering results.
clusters = output_dynamic.allocate_sync_ensembles(10)
visualizer = cluster_visualizer()
visualizer.append_clusters(clusters, stimulus)
visualizer.show()

Illustrations

Cluster allocation on FCPS dataset collection by DBSCAN:

[Figure: Clustering by DBSCAN]

Cluster allocation by OPTICS using cluster-ordering diagram:

[Figure: Clustering by OPTICS]

Partial synchronization (clustering) in Sync oscillatory network:

[Figure: Partial synchronization in Sync oscillatory network]

Cluster visualization by SOM (Self-Organized Feature Map):

[Figure: Cluster visualization by SOM]

Comments
  • Performance Issue - OPTICS

    I am running the OPTICS algorithm on 50k data points; since the data is text, it has around 5k features. The time taken to run the program seems huge. I tried using ccore, but it doesn't seem to improve. Is there any way that I could improve performance?

    Investigation Optimization 
    opened by swetha0613 19
  • ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

    Thank you for your library, it is very useful for me and the data mining community. I wanted to run the birch algorithm but I got this error from cftree.py: if (merged_entry.get_diameter() > self.__threshold): ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all().

    Also, when I use the parameter diameter when instantiating the birch algorithm, I get this error: birch_instance = birch(x,3,diameter=0.1) TypeError: __init__() got an unexpected keyword argument 'diameter'.

    One last question, would it be possible to leave the parameter number_clusters optional to let the user use other clustering algorithms in the last step of birch instead of the hierarchical method?

    Bug Question 
    opened by nabilEM 13
  • How to use pyclustering kmedoids using gower distance matrix?

    Hi,

    Not sure if this has already been asked but I have a dataframe consisting of categorical and numerical data. I want to cluster this data to extract features. I use the following code from https://sourceforge.net/projects/gower-distance-4python/files/ to calculate the gower distance.

    My code is as follows:

    import pyclustering

    from sklearn.metrics.pairwise import pairwise_distances
    import numpy as np
    from pyclustering.cluster.kmedoids import kmedoids
    from pyclustering.utils import read_sample
    from pyclustering.cluster.center_initializer import kmeans_plusplus_initializer
    from pyclustering.cluster.elbow import elbow
    from pyclustering.cluster.kmeans import kmeans
    from pyclustering.cluster.encoder import cluster_encoder, type_encoding

    D = gower_distances(filtOrdersGower_subset)
    initial_medoids = kmeans_plusplus_initializer(D, 4).initialize(return_index=True)
    kmedoids_instance = kmedoids(D, initial_medoids, data_type='distance_matrix')

    kmedoids_instance.process()
    clusters = kmedoids_instance.get_clusters()
    

    how do i plot these clusters/ get what features in my data are most important? New to pyclustering @annoviko

    Question 
    opened by zahs123 13
  • k-medioids with custom distance

    I am new to pyclustering. Rummaging through the source code I didn't see how I could insert custom distance (either by passing a callable that computes pairwise distance or a precomputed distance matrix). Could you help? Thanks.

    To be more specific, the following is the sort of thing I'm talking about:

    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster
    
    def my_dist(u,v): # exemplifying using a weird distance metric.
        return (u + v).sum()
    
    data = np.array([[1,2,3,4],
                     [5,6,7,8]])
    clust = linkage(data, method='average', metric=my_dist)
    prediction = fcluster(clust, 2, criterion='maxclust')
    
    Question Proposal 
    opened by suwangcompling 13
  • Anyway to lose the matplotlib dependency or make it optional?

    I'm getting the following:

    Traceback (most recent call last):
      File "/Users/alex/dev/something/extractor/ml.py", line 8, in <module>
        from pyclustering.cluster.kmeans import kmeans
      File "/Users/alex/dev/venv/content-extractor-3pp1FAW1/lib/python3.6/site-packages/pyclustering/cluster/__init__.py", line 26, in <module>
        import matplotlib.pyplot as plt;
      File "/Users/alex/dev/venv/content-extractor-3pp1FAW1/lib/python3.6/site-packages/matplotlib/pyplot.py", line 115, in <module>
        _backend_mod, new_figure_manager, draw_if_interactive, _show = pylab_setup()
      File "/Users/alex/dev/venv/content-extractor-3pp1FAW1/lib/python3.6/site-packages/matplotlib/backends/__init__.py", line 62, in pylab_setup
        [backend_name], 0)
      File "/Users/alex/dev/venv/content-extractor-3pp1FAW1/lib/python3.6/site-packages/matplotlib/backends/backend_macosx.py", line 17, in <module>
        from matplotlib.backends import _macosx
    
    RuntimeError: Python is not installed as a framework. The Mac OS X
    backend will not be able to function correctly if Python is not
    installed as a framework. See the Python documentation for more
    information on installing Python as a framework on Mac OS X. Please
    either reinstall Python as a framework, or try one of the other
    backends. If you are using (Ana)Conda please install python.app and
    replace the use of 'python' with 'pythonw'. See 'Working with
    Matplotlib on OSX' in the Matplotlib FAQ for more information.
    

    Don't need plotting, just the clusters :-/ Perhaps move the

    import matplotlib.pyplot as plt;
    import matplotlib.gridspec as gridspec;
    

    inside the show() function?

    Investigation Optimization 
    opened by awhillas 10
  • Missing labels_ and predict function for K-Means

    Great work. but k-means missing labels_ and predict function like sklearn ~ https://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html#sklearn.cluster.KMeans

    Enhancement Proposal 
    opened by liufsd 9
  • [ccore]ccore.so can not find

    When I run "python xmeans_examples.py", it runs correctly at first, but after a few seconds I get this problem: [image] Could you help me? Thanks so much!

    Question 
    opened by paulinsider 9
  • G-Means: Setting maximum number of clusters like for X-Means

    Hi,

    I wanted to ask if it is possible to add a k_max parameter to the call to gmeans? So Similar to xmeans, which support this parameter. The reason is that gmeans returns for some datasets a really large number of clusters (sometimes it is even the same of the size of the dataset, which is the worst case). I do not know the reason behind this, but it would be nice if I could limit the number of clusters as I can do for xmeans.

    Enhancement Proposal 
    opened by tschechlovdev 8
  • [pyclustering.cluster.kmedians] exception: access violation reading 0x(Memory Address)

    Hi,

    When using kmedians, i get an error related to an access violation when reading a memory address. This happens if i use ccore=True. If i use ccore=False kmedians_obj.process() returns no medians or clusters. My guess is that it is related to the number of clusters (and maybe tolerance), although i am not sure. It happens when the ratio number of clusters to points is below 1/10 (The number of values and clusters i was using when got it was between 380-450 for 45 clusters). However, it might be interesting to try to capture the error so it is more informative.

    Thanks a lot for your work!

    Bug Question 
    opened by jordiarjona 8
  • [pyclustering.cluster.rock] Use ROCK for clustering data set.

    Hi,

    I am trying to use the Robust Clustering Algorithm for Categorical Attributes (ROCK) algorithm on a data set containing categorical attributes but getting an error that data can not be str. How can I use this method with categorical data set.

    Thanks, Naser

    Question 
    opened by NaserMonsefi 8
  • kmedoids returns empty cluster lists for version 0.10.1

    Hi,

    Previously, code working on one server with version 0.9.3.1 worked as expected. However, the same code run on a different server with version 0.10.1 returned some empty clusters for the same dataset and initial medoids.

    initial_medoids = [0, 1, 2, 3]
    kmedoids_instance = kmedoids(df2, initial_medoids, metric=metric)
    kmedoids_instance.process()
    clusters = kmedoids_instance.get_clusters()
    medoids = kmedoids_instance.get_medoids()
    print(clusters)

    The above would return indices for clusters 0 and 1 but empty lists for clusters 2 and 3, despite there not being any missing in my data df2. I would expect at the very least, the medoids themselves to be in clusters 2 and 3.

    Thank you, this is a great package, I really appreciate it.

    Lauren

    Bug 
    opened by laurenleesc 7
  • Reference for the "Elbow length" method?

    The documentation of the elbow package suggests this is based on the reference Thorndike 1953: https://github.com/annoviko/pyclustering/blob/bf4f51a472622292627ec8c294eb205585e50f52/pyclustering/cluster/elbow.py#L4 https://github.com/annoviko/pyclustering/blob/bf4f51a472622292627ec8c294eb205585e50f52/docs/citation.bib#L552-L556 Yet, I cannot find the "Elbow length" equation used in this reference, in fact he appears very skeptical that such elbows can be reliably identified (for a good reason...). Is there another reference for this particular method?

    opened by kno10 0
  • xmeans does not agree to paper?

    The last term, p * 0.5 * log(N), should be in the sum only once IMHO. It is in the top BIC equation (j is the model index, not the cluster index), not in the l(Dn) equation where n is the cluster index) in https://web.cs.dal.ca/~shepherd/courses/csci6403/clustering/xmeans.pdf No guarantees that everything else is fine.

    I also rename sigma_sqrt to sigma_sq because it is supposed to be sigma square, not square root.

    Note that if sigma_multiplier = float('-inf'), the result will always be infinity, won't it?

    opened by kno10 0
  • Build failed: 'numeric_limits' is not a member of 'std'

    platform: Arch Linux
    gcc version 12.1.1 20220730 (GCC) 
    

    When building the package, gcc throws an error:

    In file included from src/cluster/bsas.cpp:10:
    ./include/pyclustering/cluster/bsas.hpp:92:44: error: 'numeric_limits' is not a member of 'std'
       92 |         double        m_distance    = std::numeric_limits<double>::max();   /**< Distance between the cluster and a specific point. */
          |                                            ^~~~~~~~~~~~~~
    ./include/pyclustering/cluster/bsas.hpp:92:59: error: expected primary-expression before 'double'
       92 |         double        m_distance    = std::numeric_limits<double>::max();   /**< Distance between the cluster and a specific point. */
          |                                                           ^~~~~~
    make[1]: *** [ccore.mk:154: obj/ccore/64-bit/cluster/bsas.o] Error 1
    make[1]: *** Waiting for unfinished jobs....
    In file included from ./include/pyclustering/cluster/mbsas.hpp:12,
                     from src/cluster/mbsas.cpp:10:
    ./include/pyclustering/cluster/bsas.hpp:92:44: error: 'numeric_limits' is not a member of 'std'
       92 |         double        m_distance    = std::numeric_limits<double>::max();   /**< Distance between the cluster and a specific point. */
          |                                            ^~~~~~~~~~~~~~
    ./include/pyclustering/cluster/bsas.hpp:92:59: error: expected primary-expression before 'double'
       92 |         double        m_distance    = std::numeric_limits<double>::max();   /**< Distance between the cluster and a specific point. */
          |                                                           ^~~~~~
    make[1]: *** [ccore.mk:154: obj/ccore/64-bit/cluster/mbsas.o] Error 1
    src/cluster/agglomerative.cpp: In member function 'void pyclustering::clst::agglomerative::merge_by_average_link()':
    src/cluster/agglomerative.cpp:89:44: error: 'numeric_limits' is not a member of 'std'
       89 |     double minimum_average_distance = std::numeric_limits<double>::max();
          |                                            ^~~~~~~~~~~~~~
    src/cluster/agglomerative.cpp:89:59: error: expected primary-expression before 'double'
       89 |     double minimum_average_distance = std::numeric_limits<double>::max();
          |                                                           ^~~~~~
    src/cluster/agglomerative.cpp: In member function 'void pyclustering::clst::agglomerative::merge_by_centroid_link()':
    src/cluster/agglomerative.cpp:123:44: error: 'numeric_limits' is not a member of 'std'
      123 |     double minimum_average_distance = std::numeric_limits<double>::max();
          |                                            ^~~~~~~~~~~~~~
    src/cluster/agglomerative.cpp:123:59: error: expected primary-expression before 'double'
      123 |     double minimum_average_distance = std::numeric_limits<double>::max();
          |                                                           ^~~~~~
    src/cluster/agglomerative.cpp: In member function 'void pyclustering::clst::agglomerative::merge_by_complete_link()':
    src/cluster/agglomerative.cpp:149:45: error: 'numeric_limits' is not a member of 'std'
      149 |     double minimum_complete_distance = std::numeric_limits<double>::max();
          |                                             ^~~~~~~~~~~~~~
    src/cluster/agglomerative.cpp:149:60: error: expected primary-expression before 'double'
      149 |     double minimum_complete_distance = std::numeric_limits<double>::max();
          |                                                            ^~~~~~
    src/cluster/agglomerative.cpp: In member function 'void pyclustering::clst::agglomerative::merge_by_signle_link()':
    src/cluster/agglomerative.cpp:184:43: error: 'numeric_limits' is not a member of 'std'
      184 |     double minimum_single_distance = std::numeric_limits<double>::max();
          |                                           ^~~~~~~~~~~~~~
    src/cluster/agglomerative.cpp:184:58: error: expected primary-expression before 'double'
      184 |     double minimum_single_distance = std::numeric_limits<double>::max();
          |                                                          ^~~~~~
    src/cluster/agglomerative.cpp:193:54: error: 'numeric_limits' is not a member of 'std'
      193 |             double candidate_minimum_distance = std::numeric_limits<double>::max();
          |                                                      ^~~~~~~~~~~~~~
    src/cluster/agglomerative.cpp:193:69: error: expected primary-expression before 'double'
      193 |             double candidate_minimum_distance = std::numeric_limits<double>::max();
          |                                                                     ^~~~~~
    make[1]: *** [ccore.mk:154: obj/ccore/64-bit/cluster/agglomerative.o] Error 1
    make[1]: Leaving directory '/tmp/makepkg/python-pyclustering-git/src/pyclustering/ccore'
    make: *** [makefile:53: ccore_64bit] Error 2
    
    opened by Catty2014 0
  • predict error for kmeans

    from pyclustering.cluster.kmeans import kmeans, kmeans_visualizer
    from pyclustering.cluster.center_initializer import kmeans_plusplus_initializer
    from pyclustering.samples.definitions import FCPS_SAMPLES
    from pyclustering.utils import read_sample

    samples = read_sample(FCPS_SAMPLES.SAMPLE_TWO_DIAMONDS)
    initial_centers = kmeans_plusplus_initializer(samples, 2).initialize()
    kmeans_instance = kmeans(samples, initial_centers)
    kmeans_instance.process()
    clusters = kmeans_instance.get_clusters()
    final_centers = kmeans_instance.get_centers()

    kmeans_instance.predict(samples)

    and I get this:

    AttributeError Traceback (most recent call last)
    /tmp/ipykernel_20827/3994711565.py in <module>
    ----> 1 kmeans_instance.predict(samples)

    ~/envs/envs/spark_seg/lib/python3.7/site-packages/pyclustering/cluster/kmeans.py in predict(self, points)
        441         for index_point in range(len(nppoints)):
        442             if self.__metric.get_type() != type_metric.USER_DEFINED:
    --> 443                 differences[index_point] = self.__metric(nppoints[index_point], self.__centers)
        444             else:
        445                 differences[index_point] = [self.__metric(nppoints[index_point], center) for center in self.__centers]

    ~/envs/envs/spark_seg/lib/python3.7/site-packages/pyclustering/utils/metric.py in __call__(self, point1, point2)
        130
        131         """
    --> 132         return self.__calculator(point1, point2)
        133
        134

    ~/envs/envs/spark_seg/lib/python3.7/site-packages/pyclustering/utils/metric.py in euclidean_distance_square_numpy(object1, object2)
        368
        369     """
    --> 370     if len(object1.shape) > 1 or len(object2.shape) > 1:
        371         return numpy.sum(numpy.square(object1 - object2), axis=1).T
        372     else:

    AttributeError: 'list' object has no attribute 'shape'

    opened by BeHappyForMe 0
  • (Minor issue) Typo in repository description (pyclustring)

    There is an "e" missing in the word "pyclustring" in the repository description. It should say "PyClustering" instead of "pyclustring".

    typo

    opened by 99991 0
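
Two of the questions above ask how to run K-Medoids with a custom or precomputed distance. One snippet above already passes a matrix with data_type='distance_matrix', so a custom metric can be applied by building that matrix first. The sketch below does this in pure Python. The helper names build_distance_matrix and make_mixed_distance are hypothetical, and the mixed distance is only a simplified Gower-style measure (range-scaled absolute difference for numbers, 0/1 mismatch for categories), not the Gower implementation referenced in the question.

```python
def build_distance_matrix(rows, distance):
    """Return a symmetric matrix D with D[i][j] = distance(rows[i], rows[j])."""
    n = len(rows)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = distance(rows[i], rows[j])
            matrix[i][j] = d
            matrix[j][i] = d
    return matrix


def make_mixed_distance(rows):
    """Build a simplified Gower-style distance for rows mixing numbers and strings."""
    n_features = len(rows[0])
    ranges = []
    for f in range(n_features):
        column = [row[f] for row in rows]
        if isinstance(column[0], (int, float)):
            ranges.append(max(column) - min(column))  # numeric feature
        else:
            ranges.append(None)  # categorical feature

    def mixed_distance(a, b):
        total = 0.0
        for f, value_range in enumerate(ranges):
            if value_range is None:
                total += 0.0 if a[f] == b[f] else 1.0
            elif value_range > 0:
                total += abs(a[f] - b[f]) / value_range
        return total / n_features

    return mixed_distance


rows = [[1.0, "red"], [3.0, "red"], [5.0, "blue"]]
D = build_distance_matrix(rows, make_mixed_distance(rows))
for line in D:
    print(line)
```

The resulting D has the same shape as the matrix that the question's snippet passes to kmedoids with data_type='distance_matrix'.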
Releases(0.10.1.2)
  • 0.10.1.2(Nov 25, 2020)

  • 0.10.1.1(Nov 24, 2020)

  • 0.10.1(Nov 19, 2020)

    pyclustering 0.10.1 library is a collection of clustering algorithms, oscillatory networks, etc.

    GENERAL CHANGES:

    • The library is distributed under BSD-3-Clause license. See: https://github.com/annoviko/pyclustering/issues/517

    • C++ pyclustering can be built using CMake. See: https://github.com/annoviko/pyclustering/issues/603

    • Supported dumping and loading for DBSCAN algorithm via pickle (Python: pyclustering.cluster.dbscan). See: https://github.com/annoviko/pyclustering/issues/650

    • Package installer resolves all required dependencies automatically. See: https://github.com/annoviko/pyclustering/issues/647

    • Introduced human-readable error for genetic clustering algorithm in case of non-normalized data (Python: pyclustering.cluster.ga). See: https://github.com/annoviko/pyclustering/issues/597

    • Optimized windows implementation parallel_for and parallel_for_each by using pyclustering::parallel instead of PPL that affects all algorithms which use these functions (C++: pyclustering::parallel). See: https://github.com/annoviko/pyclustering/issues/642

    • Optimized parallel_for algorithm for short cycles that affects all algorithms which use parallel_for (C++: pyclustering::parallel). See: https://github.com/annoviko/pyclustering/issues/642

    • Introduced kstep parameter for elbow algorithm to use custom K search steps (Python: pyclustering.cluster.elbow, C++: pyclustering::clst::elbow). See: https://github.com/annoviko/pyclustering/issues/489

    • Introduced p_step parameter for parallel_for function (C++: pyclustering::parallel). See: https://github.com/annoviko/pyclustering/issues/640

    • Optimized python implementation of K-Medoids algorithm (Python: pyclustering.cluster.kmedoids). See: https://github.com/annoviko/pyclustering/issues/526

    • C++ pyclustering CLIQUE interface returns human-readable errors (Python: pyclustering.cluster.clique). See: https://github.com/annoviko/pyclustering/issues/635 See: https://github.com/annoviko/pyclustering/issues/634

    • Introduced metric parameter for X-Means algorithm to use custom metric for clustering (Python: pyclustering.cluster.xmeans; C++: pyclustering::clst::xmeans). See: https://github.com/annoviko/pyclustering/issues/619

    • Introduced alpha and beta probabilistic bounds for MNDL splitting criteria for X-Means algorithm (Python: pyclustering.cluster.xmeans; C++: pyclustering::clst::xmeans). See: https://github.com/annoviko/pyclustering/issues/624

    CORRECTED MAJOR BUGS:

    • Corrected bug with a command python3 -m pyclustering.tests that was using the current folder to find tests to run (Python: pyclustering). See: https://github.com/annoviko/pyclustering/issues/648

    • Corrected bug with Elbow algorithm where kmax is not used to calculate K (Python: pyclustering.cluster.elbow; C++: pyclustering::clst::elbow). See: https://github.com/annoviko/pyclustering/issues/639

    • Corrected implementation of K-Medoids (PAM) algorithm so that it is aligned with the original algorithm (Python: pyclustering.cluster.kmedoids; C++: pyclustering::clst::kmedoids). See: https://github.com/annoviko/pyclustering/issues/503

    • Corrected literature references for the K-Medoids (PAM) implementation (Python: pyclustering.cluster.kmedoids). See: https://github.com/annoviko/pyclustering/pull/572

    • Corrected bug when K-Medoids updates input parameter initial_medoids that were provided to the algorithm (Python: pyclustering.cluster.kmedoids). See: https://github.com/annoviko/pyclustering/issues/630

    • Corrected bug with Euclidean distance when numpy is used (Python: pyclustering.utils.metric). See: https://github.com/annoviko/pyclustering/issues/625

    • Corrected bug with Minkowski distance when numpy is used (Python: pyclustering.utils.metric). See: https://github.com/annoviko/pyclustering/issues/626

    • Corrected bug with Gower distance when numpy calculation is used and data shape is bigger than 1 (Python: pyclustering.utils.metric). See: https://github.com/annoviko/pyclustering/issues/627

    • Corrected MNDL splitting criteria for X-Means algorithm (Python: pyclustering.cluster.xmeans; C++: pyclustering::clst::xmeans). See: https://github.com/annoviko/pyclustering/issues/623
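
    The metric-related entries above can be illustrated with a small plain-Python sketch of the Minkowski distance, of which the Euclidean and Manhattan distances are the degree-2 and degree-1 cases. The function name and signature are illustrative assumptions and are not taken from pyclustering.utils.metric; the sketch only shows the kind of two-argument callable a custom metric typically is.

```python
def minkowski_distance(a, b, degree=2):
    """Minkowski distance of the given degree between two points.

    Hypothetical helper for illustration only; it is not the library code.
    """
    if len(a) != len(b):
        raise ValueError("points must have the same dimension")
    return sum(abs(x - y) ** degree for x, y in zip(a, b)) ** (1.0 / degree)


# degree=2 gives the Euclidean distance, degree=1 the Manhattan distance.
euclidean = minkowski_distance([0.0, 0.0], [3.0, 4.0], degree=2)
manhattan = minkowski_distance([0.0, 0.0], [3.0, 4.0], degree=1)
```

    A callable of this shape can be checked on a few hand-computed pairs before it is handed to a clustering algorithm.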

  • 0.10.0.1(Aug 17, 2020)

    pyclustering 0.10.0.1 library is a collection of clustering algorithms and methods, oscillatory networks, etc.

    GENERAL CHANGES:

    • Metadata of the library is updated. See: no reference

    • Supported command test for setup.py script (Python: pyclustering). See: https://github.com/annoviko/pyclustering/issues/607

    • Introduced parameter random_seed for algorithms/models to control the seed of the random functionality: kmeans++, random_center_initializer, ga, gmeans, xmeans, som, somsc, elbow, silhouette_ksearch (Python: pyclustering.cluster; C++: pyclustering::clst). See: https://github.com/annoviko/pyclustering/issues/578

    • Introduced parameter k_max to G-Means algorithm to use it as an optional stop condition for the algorithm (Python: pyclustering.cluster.gmeans; C++: pyclustering::clst::gmeans). See: https://github.com/annoviko/pyclustering/issues/602

    • Implemented method save() for cluster_visualizer and cluster_visualizer_multidim to save visualization to file (Python: pyclustering.cluster). See: https://github.com/annoviko/pyclustering/issues/601

    • Optimization of CURE algorithm using balanced KD-tree (Python: pyclustering.cluster.cure; C++: pyclustering::clst::cure). See: https://github.com/annoviko/pyclustering/issues/589

    • Optimization of OPTICS algorithm using balanced KD-tree (Python: pyclustering.cluster.optics; C++: pyclustering::clst::optics). See: https://github.com/annoviko/pyclustering/issues/588

    • Optimization of DBSCAN algorithm using balanced KD-tree (Python: pyclustering.cluster.dbscan; C++: pyclustering::clst::dbscan). See: https://github.com/annoviko/pyclustering/issues/587

    • Implemented new optimized balanced KD-tree kdtree_balanced (Python: pyclustering.container.kdtree; C++: pyclustering::container::kdtree_balanced). See: https://github.com/annoviko/pyclustering/issues/379

    • Implemented KD-tree graphical visualizer kdtree_visualizer for KD-trees with 2-dimensional data (Python: pyclustering.container.kdtree). See: https://github.com/annoviko/pyclustering/issues/586

    • Updated interface of each clustering algorithm in C/C++ pyclustering: cluster_data is substituted by concrete classes (C++: pyclustering::clst). See: https://github.com/annoviko/pyclustering/issues/577

    CORRECTED MAJOR BUGS:

    • Bug with wrong data type for scores in Silhouette K-search algorithm in case of using C++ (Python: pyclustering.cluster.silhouette). See: https://github.com/annoviko/pyclustering/issues/606

    • Bug with a random distribution in the random center initializer (Python: pyclustering.cluster.center_initializer). See: https://github.com/annoviko/pyclustering/issues/573

    • Bug with incorrect conversion of Index List and Object List to Labeling when clusters do not contain one or more points from the input data (Python: pyclustering.cluster.encoder). See: https://github.com/annoviko/pyclustering/issues/596

    • Bug with an exception in case of using user-defined metric for K-Means algorithm (Python: pyclustering.cluster.kmeans). See: https://github.com/annoviko/pyclustering/pull/600

    • Memory leakage in the interface between python and C++ pyclustering library in case of CURE algorithm usage (C++ pyclustering). See: https://github.com/annoviko/pyclustering/issues/581
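
    The effect of a random_seed parameter can be sketched in plain Python: a private generator seeded with the given value makes a random center initialization repeatable. The function below is a hypothetical stand-in written for illustration; it is not the library's random_center_initializer.

```python
import random


def random_centers(data, amount, random_seed=None):
    """Pick `amount` distinct points from `data` as initial centers.

    Hypothetical helper: a private generator is used, so the global
    random state is left untouched.
    """
    generator = random.Random(random_seed)
    return generator.sample(data, amount)


points = [[0.1, 0.2], [0.9, 0.8], [0.4, 0.5], [0.7, 0.1], [0.3, 0.9]]

# The same seed yields the same centers on every call.
first = random_centers(points, 2, random_seed=1000)
second = random_centers(points, 2, random_seed=1000)
```

    Leaving random_seed as None in this sketch seeds the generator from system entropy, so repeated calls may differ.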

  • 0.10.0(Aug 17, 2020)

    pyclustering 0.10.0 library is a collection of clustering algorithms and methods, oscillatory networks, etc.

    GENERAL CHANGES:

    • Supported command test for setup.py script (Python: pyclustering). See: https://github.com/annoviko/pyclustering/issues/607

    • Introduced parameter random_seed for algorithms/models to control the seed of the random functionality: kmeans++, random_center_initializer, ga, gmeans, xmeans, som, somsc, elbow, silhouette_ksearch (Python: pyclustering.cluster; C++: pyclustering::clst). See: https://github.com/annoviko/pyclustering/issues/578

    • Introduced parameter k_max to G-Means algorithm to use it as an optional stop condition for the algorithm (Python: pyclustering.cluster.gmeans; C++: pyclustering::clst::gmeans). See: https://github.com/annoviko/pyclustering/issues/602

    • Implemented method save() for cluster_visualizer and cluster_visualizer_multidim to save visualization to file (Python: pyclustering.cluster). See: https://github.com/annoviko/pyclustering/issues/601

    • Optimization of CURE algorithm using balanced KD-tree (Python: pyclustering.cluster.cure; C++: pyclustering::clst::cure). See: https://github.com/annoviko/pyclustering/issues/589

    • Optimization of OPTICS algorithm using balanced KD-tree (Python: pyclustering.cluster.optics; C++: pyclustering::clst::optics). See: https://github.com/annoviko/pyclustering/issues/588

    • Optimization of DBSCAN algorithm using balanced KD-tree (Python: pyclustering.cluster.dbscan; C++: pyclustering::clst::dbscan). See: https://github.com/annoviko/pyclustering/issues/587

    • Implemented new optimized balanced KD-tree kdtree_balanced (Python: pyclustering.container.kdtree; C++: pyclustering::container::kdtree_balanced). See: https://github.com/annoviko/pyclustering/issues/379

    • Implemented KD-tree graphical visualizer kdtree_visualizer for KD-trees with 2-dimensional data (Python: pyclustering.container.kdtree). See: https://github.com/annoviko/pyclustering/issues/586

    • Updated interface of each clustering algorithm in C/C++ pyclustering: cluster_data is substituted by concrete classes (C++: pyclustering::clst). See: https://github.com/annoviko/pyclustering/issues/577

    CORRECTED MAJOR BUGS:

    • Bug with wrong data type for scores in Silhouette K-search algorithm in case of using C++ (Python: pyclustering.cluster.silhouette). See: https://github.com/annoviko/pyclustering/issues/606

    • Bug with a random distribution in the random center initializer (Python: pyclustering.cluster.center_initializer). See: https://github.com/annoviko/pyclustering/issues/573

    • Bug with incorrect conversion of Index List and Object List to Labeling when clusters do not contain one or more points from the input data (Python: pyclustering.cluster.encoder). See: https://github.com/annoviko/pyclustering/issues/596

    • Bug with an exception in case of using user-defined metric for K-Means algorithm (Python: pyclustering.cluster.kmeans). See: https://github.com/annoviko/pyclustering/pull/600

    • Memory leakage in the interface between python and C++ pyclustering library in case of CURE algorithm usage (C++ pyclustering). See: https://github.com/annoviko/pyclustering/issues/581
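
    The idea behind a balanced KD-tree can be sketched as follows: the points are sorted along the current axis and the median becomes the node, so both subtrees receive roughly half of the points. This is a minimal illustration with hypothetical names and a dict-based node layout; it is not the kdtree_balanced implementation.

```python
def build_balanced_kdtree(points, depth=0):
    """Build a KD-tree as nested dicts by splitting on the median point."""
    if not points:
        return None
    axis = depth % len(points[0])          # cycle through the dimensions
    ordered = sorted(points, key=lambda point: point[axis])
    median = len(ordered) // 2
    return {
        "point": ordered[median],
        "axis": axis,
        "left": build_balanced_kdtree(ordered[:median], depth + 1),
        "right": build_balanced_kdtree(ordered[median + 1:], depth + 1),
    }


def height(node):
    """Number of levels in the tree (0 for an empty tree)."""
    if node is None:
        return 0
    return 1 + max(height(node["left"]), height(node["right"]))


def size(node):
    """Number of points stored in the tree."""
    if node is None:
        return 0
    return 1 + size(node["left"]) + size(node["right"])


points = [[x, (x * 7) % 5] for x in range(15)]
tree = build_balanced_kdtree(points)
```

    With 15 points the median split produces a perfectly balanced tree, which keeps nearest-neighbor queries close to logarithmic depth.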

  • 0.9.3.1(Dec 24, 2019)

  • 0.9.3(Dec 23, 2019)

    pyclustering 0.9.3 library is a collection of clustering algorithms and methods, oscillatory networks, etc.

    GENERAL CHANGES:

    • Introduced get_cf_clusters and get_cf_entries methods for BIRCH algorithm to get CF-entry encoding information (pyclustering.cluster.birch). See: https://github.com/annoviko/pyclustering/issues/569

    • Introduced predict method for SOMSC algorithm to find closest clusters for specified points (pyclustering.cluster.somsc). See: https://github.com/annoviko/pyclustering/issues/546

    • Parallel optimization of C++ pyclustering compilation process. See: https://github.com/annoviko/pyclustering/issues/553

    • Include folder for easy integration to other C++ projects. See: https://github.com/annoviko/pyclustering/issues/554

    • Introduced new targets to build static libraries on Windows platform. See: https://github.com/annoviko/pyclustering/issues/555

    • Introduced new targets to build static libraries on Linux/MacOS platforms. See: https://github.com/annoviko/pyclustering/issues/556

    CORRECTED MAJOR BUGS:

    • Bug with incorrect finding of closest CF-entry (pyclustering.container.cftree). See: https://github.com/annoviko/pyclustering/issues/564

    • Bug with incorrect BIRCH clustering due to incorrect leaf analysis (pyclustering.cluster.birch). See: https://github.com/annoviko/pyclustering/issues/563

    • Bug with incorrect search procedure of farthest nodes in CF-tree (pyclustering.container.cftree). See: https://github.com/annoviko/pyclustering/issues/551

    • Bug with crash during clustering with the same points in case of BIRCH (pyclustering.cluster.birch). See: https://github.com/annoviko/pyclustering/issues/561
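
    For readers unfamiliar with CF-entries, the sketch below shows the clustering feature used by BIRCH as it is usually described: a triple of point count, linear sum and square sum that can be merged by simple addition. The tuple layout and function names are assumptions made for this illustration and do not mirror the classes in pyclustering.container.cftree.

```python
def make_cf(point):
    """Clustering feature of a single point: (count, linear sum, square sum)."""
    return (1, list(point), sum(x * x for x in point))


def merge_cf(first, second):
    """CF-entries are additive, so merging is component-wise addition."""
    return (
        first[0] + second[0],
        [x + y for x, y in zip(first[1], second[1])],
        first[2] + second[2],
    )


def centroid(cf):
    """Centroid of the points summarized by a CF-entry."""
    count, linear_sum, _ = cf
    return [x / count for x in linear_sum]


entry = merge_cf(make_cf([1.0, 2.0]), make_cf([3.0, 4.0]))
center = centroid(entry)
```

    Because the triple is additive, a CF-tree can summarize a cluster without keeping its individual points.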

  • 0.9.2(Oct 10, 2019)

    pyclustering 0.9.2 library is a collection of clustering algorithms and methods, oscillatory networks, etc.

    GENERAL CHANGES:

    • Introduced checking of input arguments for clustering algorithm to provide human-readable errors (pyclustering.cluster). See: https://github.com/annoviko/pyclustering/issues/548

    • Implemented functionality to perform Anderson-Darling test for Gaussian distribution (ccore.stats). See: https://github.com/annoviko/pyclustering/issues/550

    • Implemented new clustering algorithm G-Means (pyclustering.cluster.gmeans, ccore.clst.gmeans). See: https://github.com/annoviko/pyclustering/issues/506

    • Introduced parameter repeat to improve parameters in X-Means algorithm (pyclustering.cluster.xmeans, ccore.clst.xmeans). See: https://github.com/annoviko/pyclustering/issues/525

    • Introduced new distance metric: Gower (pyclustering.utils.metric, ccore.utils.metric). See: https://github.com/annoviko/pyclustering/issues/544

    • Introduced sampling algorithms reservoir_r and reservoir_x (pyclustering.utils.sampling). See: https://github.com/annoviko/pyclustering/issues/542

    • Introduced parameter data_type to Silhouette method to use distance matrix (pyclustering.cluster.silhouette, ccore.clst.silhouette). See: https://github.com/annoviko/pyclustering/issues/543

    • Optimization of HHN (Hodgkin-Huxley Neural Network) by parallel processing (ccore.nnet.hhn). See: https://github.com/annoviko/pyclustering/issues/541

    • Introduced get_total_wce method for xmeans algorithm to find WCE (pyclustering.cluster.xmeans). See: https://github.com/annoviko/pyclustering/issues/508

    CORRECTED MAJOR BUGS:

    • Incorrect center initialization in K-Means++ when candidates are not farthest (pyclustering.cluster.center_initializer). See: https://github.com/annoviko/pyclustering/issues/549
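
    The reservoir sampling mentioned above can be sketched with the classic Algorithm R, which keeps a uniform sample of n items from a stream of unknown length. The function below is a hypothetical plain-Python version written for illustration; its name and signature are not taken from pyclustering.utils.sampling.

```python
import random


def reservoir_sample(stream, n, random_seed=None):
    """Uniformly sample `n` items from an iterable using Algorithm R."""
    generator = random.Random(random_seed)
    reservoir = []
    for index, item in enumerate(stream):
        if index < n:
            # Fill the reservoir with the first n items.
            reservoir.append(item)
        else:
            # Replace a stored item with probability n / (index + 1).
            position = generator.randint(0, index)  # both ends inclusive
            if position < n:
                reservoir[position] = item
    return reservoir


sample = reservoir_sample(range(1000), 10, random_seed=7)
```

    The stream is traversed once and only n items are held in memory at any time.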
  • 0.9.1(Sep 4, 2019)

    pyclustering 0.9.1 library is a collection of clustering algorithms and methods, oscillatory networks, neural networks, etc.

    GENERAL CHANGES:

    • Introduced predict method for X-Means algorithm to find closest clusters for particular points (pyclustering.cluster.xmeans). See: https://github.com/annoviko/pyclustering/issues/540

    • Optimization of OPTICS algorithm by reducing complexity (ccore.clst.optics). See: https://github.com/annoviko/pyclustering/issues/521

    • Optimization of K-Medians algorithm by parallel processing (ccore.clst.kmedians). See: https://github.com/annoviko/pyclustering/issues/529

    • Introduced predict method for K-Medoids algorithm to find closest clusters for particular points (pyclustering.cluster.kmedoids). See: https://github.com/annoviko/pyclustering/issues/527

    • Introduced predict method for K-Means algorithm to find closest clusters for particular points (pyclustering.cluster.kmeans). See: https://github.com/annoviko/pyclustering/issues/515

    • Parallel optimization of Elbow method (ccore.clst.elbow). See: https://github.com/annoviko/pyclustering/issues/511
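
    What a predict method does for center-based algorithms can be sketched in a few lines: each new point is assigned to the index of its closest center. The function below is an illustrative assumption (squared Euclidean distance, plain lists) and is not the library's predict implementation.

```python
def predict(centers, points):
    """Return, for each point, the index of the closest center."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    labels = []
    for point in points:
        closest = min(
            range(len(centers)),
            key=lambda index: squared_distance(centers[index], point),
        )
        labels.append(closest)
    return labels


centers = [[0.0, 0.0], [10.0, 10.0]]
labels = predict(centers, [[1.0, 1.0], [9.0, 8.0], [4.0, 4.0]])
```

    Squared distances are sufficient here because only the ordering of the distances matters.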

  • 0.9.0(Apr 18, 2019)

    pyclustering 0.9.0 library is a collection of clustering algorithms and methods, oscillatory networks, neural networks, etc.

    GENERAL CHANGES:

    • CCORE (pyclustering core) is supported for MacOS. See: https://github.com/annoviko/pyclustering/issues/486

    • Introduced parallel Fuzzy C-Means algorithm (pyclustering.cluster.fcm, ccore.clst.fcm). See: https://github.com/annoviko/pyclustering/issues/386

    • Introduced new 'itermax' parameter for K-Means, K-Medians, K-Medoids algorithm to control maximum amount of iterations (pyclustering.cluster, ccore.clst). See: https://github.com/annoviko/pyclustering/issues/496

    • Implemented Silhouette and Silhouette K-Search algorithm for CCORE (ccore.clst.silhouette, ccore.clst.silhouette_ksearch). See: https://github.com/annoviko/pyclustering/issues/490

    • Implemented CLIQUE algorithms (pyclustering.cluster.clique, ccore.clst.clique). See: https://github.com/annoviko/pyclustering/issues/381

    • Introduced new distance metrics: Canberra and Chi Square (pyclustering.utils.metric, ccore.utils.metric). See: https://github.com/annoviko/pyclustering/issues/482

    • Optimization of CURE algorithm (C++ implementation) by using heap (multiset) instead of list to store clusters in queue (ccore.clst.cure). See: https://github.com/annoviko/pyclustering/issues/479

    CORRECTED MAJOR BUGS:

    • Bug with crossover mask generation for genetic clustering algorithm (pyclustering.cluster.ga). See: https://github.com/annoviko/pyclustering/pull/474

    • Bug with hanging of K-Medians algorithm for some cases when algorithm is initialized by wrong amount of centers (ccore.clst.kmedians). See: https://github.com/annoviko/pyclustering/issues/498

    • Bug with incorrect center initialization, when the same point can be placed to result more than once (pyclustering.cluster.center_initializer, ccore.clst.kmeans_plus_plus). See: https://github.com/annoviko/pyclustering/issues/497

    • Bug with incorrect clustering in case of CURE python implementation when clusters are allocated incorrectly (pyclustering.cluster.cure). See: https://github.com/annoviko/pyclustering/issues/483

    • Bug with incorrect distance calculation for kmeans++ in case of index representation for centers (pyclustering.cluster.center_initializer). See: https://github.com/annoviko/pyclustering/issues/485
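
    The two metrics named above can be sketched in plain Python. The Canberra distance follows its usual definition; for the Chi Square distance several variants exist, and the form below (squared difference divided by the sum of absolute values, without a factor of one half) is an assumption chosen for illustration. Terms whose denominator is zero are skipped in both functions.

```python
def canberra_distance(a, b):
    """Sum of |x - y| / (|x| + |y|); terms with a zero denominator are skipped."""
    total = 0.0
    for x, y in zip(a, b):
        denominator = abs(x) + abs(y)
        if denominator != 0:
            total += abs(x - y) / denominator
    return total


def chi_square_distance(a, b):
    """Sum of (x - y)^2 / (|x| + |y|); terms with a zero denominator are skipped."""
    total = 0.0
    for x, y in zip(a, b):
        denominator = abs(x) + abs(y)
        if denominator != 0:
            total += (x - y) ** 2 / denominator
    return total


canberra = canberra_distance([1.0, 2.0], [3.0, 2.0])
chi_square = chi_square_distance([1.0, 2.0], [3.0, 2.0])
```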

  • 0.8.2-joss(Apr 11, 2019)

    pyclustering 0.8.2-joss library is a collection of clustering algorithms and methods, oscillatory networks, neural networks, etc.

    It is a special release for JOSS (The Journal of Open Source Software). This version contains only cosmetic changes related to documentation and project description that were introduced after the JOSS review.

  • 0.8.2(Nov 19, 2018)

    pyclustering 0.8.2 library is a collection of clustering algorithms and methods, oscillatory networks, neural networks, etc.

    GENERAL CHANGES:

    • Implemented Silhouette method and Silhouette KSearcher to find out proper amount of clusters (pyclustering.cluster.silhouette). See: https://github.com/annoviko/pyclustering/issues/416

    • Introduced new 'return_index' parameter for kmeans_plus_plus and random_center_initializer algorithms (method 'initialize') to initialize initial medoids (pyclustering.cluster.center_initializer). See: https://github.com/annoviko/pyclustering/issues/421

    • Display warning instead of throwing error if matplotlib or Pillow cannot be imported (MAC OS X problems). See: https://github.com/annoviko/pyclustering/issues/455

    • Implemented Random Center Initializer for CCORE (ccore.clst.random_center_initializer). See: no reference.

    • Implemented Elbow method to find out proper amount of clusters in dataset (pyclustering.cluster.elbow, ccore.clst.elbow). See: https://github.com/annoviko/pyclustering/issues/416

    • Introduced new method 'get_optics_objects' for OPTICS algorithm to obtain detailed information about ordering (pyclustering.cluster.optics, ccore.clst.optics). See: https://github.com/annoviko/pyclustering/issues/464

    • Added new clustering answers for SAMPLE SIMPLE data collections (pyclustering.samples). See: https://github.com/annoviko/pyclustering/issues/459

    • Implemented multidimensional cluster visualizer (pyclustering.cluster). See: https://github.com/annoviko/pyclustering/issues/450

    • Parallel optimization of K-Medoids algorithm (ccore.clst.kmedoids). See: https://github.com/annoviko/pyclustering/issues/447

    • Parallel optimization of K-Means and X-Means (that uses K-Means) algorithms (ccore.clst.kmeans, ccore.clst.xmeans). See: https://github.com/annoviko/pyclustering/issues/451

    • Introduced new threshold parameter 'amount of block points' to BANG algorithm to allocate outliers more precisely (pyclustering.cluster.bang). See: https://github.com/annoviko/pyclustering/issues/446

    • Optimization of conveying results from C++ to Python for K-Medians and K-Medoids (pyclustering.cluster.kmedoids, pyclustering.cluster.kmedians). See: https://github.com/annoviko/pyclustering/issues/445

    • Implemented cluster generator (pyclustering.cluster.generator). See: https://github.com/annoviko/pyclustering/issues/444

    • Implemented BANG animator to render animation of clustering process (pyclustering.cluster.bang). See: https://github.com/annoviko/pyclustering/issues/442

    • Optimization of CURE algorithm by using Euclidean Square distance (pyclustering.cluster.cure, ccore.clst.cure). See: https://github.com/annoviko/pyclustering/issues/439

    • Supported numpy.ndarray points in KD-tree (pyclustering.container.kdtree). See: https://github.com/annoviko/pyclustering/issues/438

    CORRECTED MAJOR BUGS:

    • Bug with clustering failure in case of non-numpy user defined metric for K-Means algorithm (pyclustering.cluster.kmeans). See: https://github.com/annoviko/pyclustering/issues/471

    • Bug with animation of correlation matrix in case of new versions of matplotlib (pyclustering.nnet.sync). See: no reference.

    • Bug with SOM and pickle when it was not possible to store and load network using pickle (pyclustering.nnet.som). See: https://github.com/annoviko/pyclustering/issues/456

    • Bug with DBSCAN when points are marked as a noise (pyclustering.cluster.dbscan). See: https://github.com/annoviko/pyclustering/issues/462

    • Bug with randomly enabled connection weights in case of SyncNet based algorithms using CCORE interface (pyclustering.nnet.syncnet). See: https://github.com/annoviko/pyclustering/issues/452

    • Bug with calculation weighted connection for Sync based clustering algorithms in C++ implementation (ccore.nnet.syncnet). See: no reference

    • Bug with failure in case of numpy.ndarray data type in python part of CURE algorithm (pyclustering.cluster.cure). See: https://github.com/annoviko/pyclustering/issues/438

    • Bug with BANG algorithm with empty dimensions - when data contains column with the same values (pyclustering.cluster.bang). See: https://github.com/annoviko/pyclustering/issues/449
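
    The Silhouette method introduced in this release scores how well each point fits its cluster. The sketch below follows the textbook definition, s = (b - a) / max(a, b), where a is the mean distance to the other points of the own cluster and b is the smallest mean distance to any other cluster. It is a plain-Python illustration with hypothetical names, assumes at least two non-empty clusters given as lists of point indexes, and is not the code in pyclustering.cluster.silhouette.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def silhouette_scores(data, clusters):
    """Return one Silhouette score per point of `data`."""
    def mean_distance(index, members):
        others = [j for j in members if j != index]
        if not others:
            return 0.0
        return sum(euclidean(data[index], data[j]) for j in others) / len(others)

    scores = [0.0] * len(data)
    for own in clusters:
        if len(own) < 2:
            continue  # a singleton cluster keeps the score 0 by convention
        for index in own:
            a = mean_distance(index, own)
            b = min(mean_distance(index, other)
                    for other in clusters if other is not own)
            scores[index] = (b - a) / max(a, b) if max(a, b) > 0 else 0.0
    return scores


data = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
scores = silhouette_scores(data, [[0, 1], [2, 3]])
```

    Averaging the scores for several candidate values of K and keeping the best one is the idea behind a K-search on top of this measure.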

  • 0.8.1(May 29, 2018)

    pyclustering 0.8.1 library is a collection of clustering algorithms, oscillatory networks, neural networks, etc.

    GENERAL CHANGES:

    • Implemented feature to use specific metric for distance calculation in K-Means algorithm (pyclustering.cluster.kmeans, ccore.clst.kmeans). See: https://github.com/annoviko/pyclustering/issues/434

    • Implemented BANG-clustering algorithm with result visualizer (pyclustering.cluster.bang). See: https://github.com/annoviko/pyclustering/issues/424

    • Implemented feature to use specific metric for distance calculation in K-Medians algorithm (pyclustering.cluster.kmedians, ccore.clst.kmedians). See: https://github.com/annoviko/pyclustering/issues/429

    • Supported new type of input data for K-Medoids - distance matrix (pyclustering.cluster.kmedoids, ccore.clst.kmedoids). See: https://github.com/annoviko/pyclustering/issues/418

    • Implemented TTSAS algorithm (pyclustering.cluster.ttsas, ccore.clst.ttsas). See: https://github.com/annoviko/pyclustering/issues/398

    • Implemented MBSAS algorithm (pyclustering.cluster.mbsas, ccore.clst.mbsas). See: https://github.com/annoviko/pyclustering/issues/398

    • Implemented BSAS algorithm (pyclustering.cluster.bsas, ccore.clst.bsas). See: https://github.com/annoviko/pyclustering/issues/398

    • Implemented feature to use specific metric for distance calculation in K-Medoids algorithm (pyclustering.cluster.kmedoids, ccore.clst.kmedoids). See: https://github.com/annoviko/pyclustering/issues/417

    • Implemented distance metric collection (pyclustering.utils.metric, ccore.utils.metric). See: no reference.

    • Supported new type of input data for OPTICS - distance matrix (pyclustering.cluster.optics, ccore.clst.optics). See: https://github.com/annoviko/pyclustering/issues/412

    • Supported new type of input data for DBSCAN - distance matrix (pyclustering.cluster.dbscan, ccore.clst.dbscan). See: no reference.

    • Implemented K-Means observer and visualizer to visualize and animate clustering results (pyclustering.cluster.kmeans, ccore.clst.kmeans). See: no reference.

    CORRECTED MAJOR BUGS:

    • Bug with out of range in K-Medians (pyclustering.cluster.kmedians, ccore.clst.kmedians). See: https://github.com/annoviko/pyclustering/issues/428

    • Bug with fast linking in PCNN (python implementation only) that wasn't used despite the corresponding option (pyclustering.nnet.pcnn). See: https://github.com/annoviko/pyclustering/issues/419
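
    The BSAS algorithm (Basic Sequential Algorithmic Scheme) listed above can be sketched in plain Python under the usual textbook description: points are processed one by one, a point that is farther than a threshold from every cluster representative starts a new cluster while the cluster limit allows it, and otherwise it joins the nearest cluster, whose representative is updated as a running mean. Names and return values are assumptions for illustration, not the pyclustering.cluster.bsas interface.

```python
import math


def bsas(data, max_clusters, threshold):
    """Return (clusters as index lists, cluster representatives)."""
    if not data:
        return [], []

    clusters = [[0]]
    representatives = [list(data[0])]

    for index in range(1, len(data)):
        point = data[index]
        distances = [
            math.sqrt(sum((x - r) ** 2 for x, r in zip(point, rep)))
            for rep in representatives
        ]
        nearest = min(range(len(distances)), key=distances.__getitem__)

        if distances[nearest] > threshold and len(clusters) < max_clusters:
            # Too far from every cluster: start a new one.
            clusters.append([index])
            representatives.append(list(point))
        else:
            # Join the nearest cluster and update its mean incrementally.
            clusters[nearest].append(index)
            count = len(clusters[nearest])
            representatives[nearest] = [
                (r * (count - 1) + x) / count
                for r, x in zip(representatives[nearest], point)
            ]
    return clusters, representatives


data = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0], [0.0, 0.1]]
clusters, representatives = bsas(data, max_clusters=2, threshold=1.0)
```

    The result of such a sequential scheme depends on the order in which the points are presented.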

  • 0.8.0(Feb 23, 2018)

    pyclustering 0.8.0 library is a collection of clustering algorithms, oscillatory networks, neural networks, etc.

    GENERAL CHANGES:

    • Optimization K-Means++ algorithm using numpy (pyclustering.cluster.center_initializer). See: no reference.

    • Implemented K-Means++ initializer for CCORE (ccore.clst.kmeans_plus_plus). See: https://github.com/annoviko/pyclustering/issues/382

    • Optimization of X-Means clustering process by using KMeans++ for initial centers of split regions (pyclustering.cluster.xmeans, ccore.clst.xmeans). See: https://github.com/annoviko/pyclustering/issues/382

    • Implemented parallel Sync-family algorithms for C/C++ implementation (CCORE) only (ccore.sync). See: https://github.com/annoviko/pyclustering/issues/170

    • C/C++ implementation is used by default to increase performance. See: https://github.com/annoviko/pyclustering/issues/393

    • Ignore 'ccore' flag to use C/C++ if platform is not supported (pyclustering.core). See: https://github.com/annoviko/pyclustering/issues/393

    • Optimization of python implementation of the K-Means algorithm using numpy (pyclustering.cluster.kmeans). See: https://github.com/annoviko/pyclustering/issues/403

    • Implemented dynamic visualizer for oscillatory networks (pyclustering.nnet.dynamic_visualizer). See: no reference.

    • Implemented C/C++ Hodgkin-Huxley oscillatory network for image segmentation in CCORE to increase performance (ccore.hhn, pyclustering.nnet.hhn). See: https://github.com/annoviko/pyclustering/issues/217

    • Performance optimization for CCORE on linux platform. See: no reference.

    • 32-bit platform of CCORE is supported for Linux OS. See: https://github.com/annoviko/pyclustering/issues/253

    • 32-bit platform of CCORE is supported for Windows OS. See: https://github.com/annoviko/pyclustering/issues/253

    • Implemented method 'get_probabilities()' to obtain membership probabilities in EM-algorithm (pyclustering.cluster.ema). See: https://github.com/annoviko/pyclustering/issues/387

    • Python implementation of CURE algorithm method 'get_clusters()' returns list of indexes (pyclustering.cluster.cure). See: https://github.com/annoviko/pyclustering/issues/384

    • Implemented parallel processing for X-Means algorithm (ccore.clst.xmeans). See: https://github.com/annoviko/pyclustering/issues/372

    • Implemented thread pool for parallel processing (ccore.parallel). See: https://github.com/annoviko/pyclustering/issues/383

    • Optimization of OPTICS algorithm using KD-tree for searching nearest neighbors (pyclustering.cluster.optics, ccore.optics). See: https://github.com/annoviko/pyclustering/issues/370

    • Optimization of DBSCAN algorithm using KD-tree for searching nearest neighbors (pyclustering.cluster.dbscan, ccore.dbscan). See: https://github.com/annoviko/pyclustering/issues/369

    CORRECTED MAJOR BUGS:

    • Incorrect type of medoid's index in K-Medoids algorithm in case of Python 2.x (pyclustering.cluster.kmedoids). See: https://github.com/annoviko/pyclustering/issues/415

    • Hanging of method 'find_node' in KD-tree if it does not contain node with specified point and payload (pyclustering.container.kdtree). See: no reference.

    • Incorrect clustering by CURE algorithm in some cases when data have a lot of identical points (pyclustering.cluster.cure). See: https://github.com/annoviko/pyclustering/issues/414

    • Segmentation fault in CURE algorithm in some cases when data have a lot of identical points (ccore.clst.cure). See: no reference.

    • Incorrect segmentation by Python version of syncsegm - oscillatory network based on sync for image segmentation (pyclustering.nnet.syncsegm). See: https://github.com/annoviko/pyclustering/issues/409

    • Zero value of sigma under logarithm function in Python version of pyclustering X-Means algorithm (pyclustering.cluster.xmeans). See: https://github.com/annoviko/pyclustering/issues/407

    • Amplitude threshold is ignored during synchronous ensembles allocation for amplitude output dynamic 'allocate_sync_ensembles' - affects HHN, LEGION (pyclustering.utils). See: no reference.

    • Wrong indexes can be returned during synchronous ensembles allocation for amplitude output dynamic 'allocate_sync_ensembles' - affects HHN, LEGION (pyclustering.utils). See: no reference.

    • Amount of allocated clusters can differ from amount of centers in X-Means algorithm (ccore.clst.xmeans). See: https://github.com/annoviko/pyclustering/issues/389

    • Amount of allocated clusters can be bigger than kmax in X-Means algorithm (pyclustering.cluster.xmeans, ccore.clst.xmeans). See: https://github.com/annoviko/pyclustering/issues/388

    • Corrected bug with returned nullptr in method 'kdtree_searcher::find_nearest_node()' (ccore.container.kdtree). See: no reference.
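
    The K-Means++ initialization referenced in this release can be sketched as follows: the first center is chosen uniformly at random, and every further center is drawn with probability proportional to the squared distance to the closest center chosen so far. This is a plain-Python illustration with hypothetical names, not the kmeans_plus_plus_initializer code, and it assumes the data contains at least `amount` distinct points.

```python
import random


def kmeans_plus_plus(data, amount, random_seed=None):
    """Choose `amount` initial centers from `data` by D^2 weighting."""
    generator = random.Random(random_seed)

    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    centers = [generator.choice(data)]
    while len(centers) < amount:
        # Weight of a point: squared distance to its closest chosen center.
        # Already chosen points get weight 0 and are not drawn again.
        weights = [
            min(squared_distance(point, center) for center in centers)
            for point in data
        ]
        centers.append(generator.choices(data, weights=weights, k=1)[0])
    return centers


data = [[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9], [9.0, 0.0], [9.1, 0.2]]
centers = kmeans_plus_plus(data, 3, random_seed=1)
```

    Spreading the initial centers this way usually reduces the number of iterations K-Means or X-Means needs afterwards.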

  • 0.7.2(Oct 23, 2017)

    pyclustering 0.7.2 library is a collection of clustering algorithms, oscillatory networks, neural networks, etc.

    GENERAL CHANGES (pyclustering):

    • Correction for setup failure with PKG-INFO.rst.
  • 0.7.1(Oct 19, 2017)

    pyclustering 0.7.1 library is a collection of clustering algorithms, oscillatory networks, neural networks, etc.

    GENERAL CHANGES (pyclustering):

    • Metadata of the package is updated.
  • 0.7.0(Oct 16, 2017)

    pyclustering 0.7.0 library is a collection of clustering algorithms, oscillatory networks, neural networks, etc.

    GENERAL CHANGES (pyclustering):

    • Implemented Expectation-Maximization clustering algorithm for Gaussian Mixture Model and clustering visualizer for this particular algorithm (pyclustering.cluster.ema). See: https://github.com/annoviko/pyclustering/issues/16

    • Implemented Genetic Clustering Algorithm (GCA) and clustering visualizer for this particular algorithm (pyclustering.cluster.ga). See: https://github.com/annoviko/pyclustering/issues/360

    • Implemented feature to obtain and visualize evolution of order parameter and local order parameter for Sync network and Sync-based algorithms (pyclustering.nnet.sync). See: https://github.com/annoviko/pyclustering/issues/355

    • Implemented K-Means++ method for initialization of initial centers for algorithms like K-Means or X-Means (pyclustering.cluster.center_initializer). See: https://github.com/annoviko/pyclustering/issues/354

    • Implemented fSync oscillatory network that is based on Landau-Stuart equation and Kuramoto model (pyclustering.nnet.fsync). See: https://github.com/annoviko/pyclustering/issues/168

    • Optimization of the pyclustering client for the core library 'CCORE' (pyclustering.core). See: https://github.com/annoviko/pyclustering/issues/289 See: https://github.com/annoviko/pyclustering/issues/351

    • Implemented feature to show network structure of Sync family oscillatory networks in case of 'ccore' usage. See: https://github.com/annoviko/pyclustering/issues/344

    • Implemented feature to colorize OPTICS ordering diagram when amount of clusters is specified. See: no reference.

    • Improved clustering results when the MNDL splitting criterion is used for small datasets. See: https://github.com/annoviko/pyclustering/issues/328

    • Feature to display connectivity radius on cluster-ordering diagram by ordering_visualizer (pyclustering.cluster.optics). See: https://github.com/annoviko/pyclustering/issues/314

    • Feature to use CCORE implementation of OPTICS algorithm to take advantage of its performance (pyclustering.cluster.optics). See: https://github.com/annoviko/pyclustering/issues/120

    • Implemented feature to show animation of the pattern recognition process performed by the SyncPR oscillatory network: method 'animate_pattern_recognition()' of class 'syncpr_visualizer' (pyclustering.nnet.syncpr). See: https://www.youtube.com/watch?v=Ro7KbApL4MQ See: https://www.youtube.com/watch?v=iIusOsGehoY

    • Implemented feature to obtain nodes of specified level of CF-tree. Method 'get_level_nodes()' of class 'cftree' (pyclustering.container.cftree). See: no reference.

    • Implemented feature to allocate/display/animate phase matrix: 'allocate_phase_matrix()', 'show_phase_matrix()', 'animate_phase_matrix()' (pyclustering.nnet.sync). See: no reference.

    • Implemented chaotic neural network where clustering phenomenon can be observed: 'cnn_network', 'cnn_dynamic', 'cnn_visualizer' (pyclustering.nnet.cnn). See: https://github.com/annoviko/pyclustering/issues/301

    • Implemented feature to analyse ordering diagram using the amount of clusters that should be allocated as an input parameter to calculate the correct connectivity radius for clustering (pyclustering.cluster.optics). See: https://github.com/annoviko/pyclustering/issues/307

    • Implemented feature to omit usage of initial centers - X-Means starts processing from random initial center (pyclustering.cluster.xmeans). See: no reference.

    • Implemented feature for cluster visualizer: cluster attributes (pyclustering.cluster). See: https://github.com/annoviko/pyclustering/issues/295

    • Implemented SOM-SC algorithm (SOM Simple Clustering) (pyclustering.cluster.somsc). See: https://github.com/annoviko/pyclustering/issues/321
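
    The order parameter mentioned in the Sync-related entries above can be illustrated with the standard Kuramoto definition, r = |(1/N) * sum(exp(i * theta_j))|, which is 1 for fully synchronized phases and close to 0 for phases spread evenly over the circle. The function below is a small sketch of that formula with a hypothetical name; it is not the estimator implemented in pyclustering.nnet.sync, and the local order parameter is not covered.

```python
import cmath
import math


def order_parameter(phases):
    """Kuramoto order parameter of a list of oscillator phases (radians)."""
    if not phases:
        raise ValueError("at least one phase is required")
    return abs(sum(cmath.exp(1j * phase) for phase in phases)) / len(phases)


synchronized = order_parameter([0.5, 0.5, 0.5, 0.5])
desynchronized = order_parameter([0.0, math.pi / 2, math.pi, 3 * math.pi / 2])
```

    Tracking this value over simulation time gives the evolution of synchronization in the network.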

    GENERAL CHANGES (ccore):

    • Implemented feature to obtain and visualize evolution of order parameter and local order parameter for Sync network and Sync-based algorithms (ccore.nnet.sync). See: https://github.com/annoviko/pyclustering/issues/355

    • Cygwin x64 platform is supported (ccore). See: https://github.com/annoviko/pyclustering/issues/353

    • Optimization of CCORE library interface (ccore.interface). See: https://github.com/annoviko/pyclustering/issues/289

    • Implemented MNDL splitting crinterion for X-Means algorithm (ccore.cluster_analysis.xmeans). See: https://github.com/annoviko/pyclustering/issues/159

    • Implemented OPTICS algorithm and interface for client that results all clustering results (ccore.cluster_analysis.optics). See: https://github.com/annoviko/pyclustering/issues/120

    • Implemented packing of connectivity matrix of Sync family oscillatory networks (ccore.interface.sync_interface). See: https://github.com/annoviko/pyclustering/issues/344

    CORRECTED MAJOR BUGS:

    • Bug with segmentation fault during 'free()' on some Linux operating systems. See: no reference.

    • Bug with sending the first element to a cluster in OPTICS even if it is a noise element. See: no reference.

    • Bug with amount of allocated clusters by K-Medoids algorithm in Python implementation and CCORE (pyclustering.cluster.kmedoids, ccore.cluster.medoids). See: https://github.com/annoviko/pyclustering/issues/366 See: https://github.com/annoviko/pyclustering/issues/367

    • Bug with getting neighbors and getting information about connections in Sync-based network and algorithms when CCORE is used. See: no reference.

    • Bug with calculation of number of oscillations for output dynamics. See: no reference.

    • Memory leakage in LEGION in case of CCORE usage - API function 'legion_destroy()' was not called (pyclustering.nnet.legion). See: no reference.

    • Bug with crash of antmeans algorithm for Python version 3.6.0:414df79263a11, Dec 23 2016 [MSC v.1900 64 bit (AMD64)] (pyclustering.cluster.antmeans). See: https://github.com/annoviko/pyclustering/issues/350

    • Memory leakage in destructor of 'pyclustering_package' - exchange mechanism between ccore and pyclustering (ccore.interface.pyclustering_package). See: https://github.com/annoviko/pyclustering/issues/347

    • Bug with losing the initial state of hSync output dynamic when CCORE is used (ccore.cluster.hsyncnet). See: https://github.com/annoviko/pyclustering/issues/346

    • Bug with hSync output dynamic that was displayed with discontinuous parts as a set of rectangles (pyclustering.cluster.hsyncnet). See: https://github.com/annoviko/pyclustering/issues/345

    • Bug with visualization of CNN network in case of 3D data (pyclustering.nnet.cnn). See: https://github.com/annoviko/pyclustering/issues/338

    • Bug with CCORE wrapper crashing after returning value from CCORE (pyclustering.core). See: https://github.com/annoviko/pyclustering/issues/337

    • Bug with calculation of BIC splitting criterion for X-Means algorithm (pyclustering.cluster.xmeans). See: https://github.com/annoviko/pyclustering/issues/326

    • Bug with calculation of MNDL splitting criterion for X-Means algorithm (pyclustering.cluster.xmeans). See: https://github.com/annoviko/pyclustering/issues/328

    • Bug with loss of CF-nodes in CF-tree during inserting that leads to an unbalanced CF-tree (pyclustering.container.cftree). See: https://github.com/annoviko/pyclustering/issues/304

    • Bug with time stamps for each iteration in hsyncnet algorithm (ccore.cluster.hsyncnet). See: https://github.com/annoviko/pyclustering/issues/306

    • Bug with memory occupation by CCORE DBSCAN implementation due to adjacency matrix usage (ccore.cluster.dbscan). See: https://github.com/annoviko/pyclustering/issues/309

    • Bug with CURE: always finds max two representative points (pyclustering.cluster.cure). See: https://github.com/annoviko/pyclustering/issues/310

    • Bug with infinite loop in case of incorrect number of clusters 'ordering_analyser' (pyclustering.cluster.optics). See: https://github.com/annoviko/pyclustering/issues/317

    • Bug with incorrect connectivity radius for allocation of the specified amount of clusters 'ordering_analyser' (pyclustering.cluster.optics). See: https://github.com/annoviko/pyclustering/issues/316

    • Bug with clusters being allocated in the homogeneous ordering 'ordering_analyser' (pyclustering.cluster.optics). See: https://github.com/annoviko/pyclustering/issues/315

  • 0.6.6(Oct 7, 2016)

    pyclustering 0.6.6 library is a collection of clustering algorithms, oscillatory networks, neural networks, etc.

    GENERAL CHANGES (pyclustering):

    • Implemented phase oscillatory network syncpr (pyclustering.nnet.syncpr). See: https://github.com/annoviko/pyclustering/issues/208
    • Feature for pyclustering.nnet.syncpr that allows using the ccore library for solving. See: https://github.com/annoviko/pyclustering/issues/232
    • Optimized simulation algorithm for sync oscillatory network (pyclustering.nnet.sync) when collecting results is not requested. See: https://github.com/annoviko/pyclustering/issues/233
    • Images of English alphabet 100x100. See: https://github.com/annoviko/pyclustering/commit/aa28f1a8a363fbeb5f074d22ec1e8258a1dd0579
    • Implemented feature to use rectangular network structures in oscillatory networks. See: https://github.com/annoviko/pyclustering/issues/259
    • Implemented CLARANS algorithm (pyclustering.cluster.clarans). See: https://github.com/annoviko/pyclustering/issues/52
    • Implemented feature to analyse and visualize results of hysteresis oscillatory network (pyclustering.nnet.hysteresis). See: https://github.com/annoviko/pyclustering/issues/75
    • Implemented feature to analyse and visualize results of graph coloring algorithm based on hysteresis oscillatory network (pyclustering.gcolor.hysteresis). See: https://github.com/annoviko/pyclustering/issues/75
    • Implemented ant colony based algorithm for TSP problem (pyclustering.tsp.antcolony). See: https://github.com/annoviko/pyclustering/pull/277
    • Implemented feature to use CCORE K-Medians algorithm using argument 'ccore' to ensure high performance (pyclustering.cluster.kmedians). See: https://github.com/annoviko/pyclustering/issues/231
    • Implemented feature to place several plots on each row using parameter 'maximum number of rows' for cluster visualizer (pyclustering.cluster). See: https://github.com/annoviko/pyclustering/issues/274
    • Implemented feature to specify initial number of neighbors to calculate initial connectivity radius and increase percent of number of neighbors (or radius if total number of object is exceeded) on each step (pyclustering.cluster.hsyncnet). See: https://github.com/annoviko/pyclustering/issues/284
    • Implemented double-layer oscillatory network based on modified Kuramoto model for image segmentation (pyclustering.nnet.syncsegm). See: no reference
    • Added new examples and demos. See: no reference
    • Implemented feature to use CCORE K-Medoids algorithm using argument 'ccore' to ensure high performance (pyclustering.cluster.kmedoids). See: https://github.com/annoviko/pyclustering/issues/230
    • Implemented feature for CURE algorithm that provides additional information about clustering results: representative points and mean point of each cluster (pyclustering.cluster.cure). See: https://github.com/annoviko/pyclustering/issues/292
    • Implemented feature to animate analysed output dynamic of Sync family oscillatory networks (sync_visualizer, syncnet_visualizer): correlation matrix, phase coordinates, cluster allocation (pyclustering.nnet.sync, pyclustering.cluster.syncnet). See: https://www.youtube.com/watch?v=5S5mFYVihso See: https://www.youtube.com/watch?v=Vd-ww9PcZvI See: https://www.youtube.com/watch?v=QYPqWoyNHO8 See: https://www.youtube.com/watch?v=RA0MiC2WlbY
    • Improved SYNC-SOM algorithm: accuracy of clustering and calculation are improved in line with the proof of concept, where connections between oscillators in the second layer (represented by the self-organized feature map) should be created using the classical radius as in SyncNet, but indirectly: if objects that correspond to two different neurons can be connected, then the neurons should also be connected with each other (pyclustering.cluster.syncsom). See: https://github.com/annoviko/pyclustering/issues/297
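
The indirect connection rule described for SYNC-SOM can be illustrated with a minimal sketch. This is not the library's implementation: the function name 'connect_neurons', its parameters, and the use of Euclidean distance are assumptions made only for illustration.

```python
import math


def connect_neurons(captured_objects, data, radius):
    """Build a neuron-to-neuron connection matrix (hypothetical helper).

    captured_objects[i] holds indices of the data objects captured by neuron i.
    Two neurons are connected if at least one pair of their captured objects
    lies within the connectivity radius (Euclidean distance is assumed).
    """
    amount = len(captured_objects)
    connections = [[False] * amount for _ in range(amount)]

    for i in range(amount):
        for j in range(i + 1, amount):
            linked = any(
                math.dist(data[a], data[b]) <= radius
                for a in captured_objects[i]
                for b in captured_objects[j]
            )
            # The relation is symmetric, so fill both halves of the matrix.
            connections[i][j] = connections[j][i] = linked

    return connections


# Three neurons, each capturing a single object; only the first two are close.
sample = [(0.0, 0.0), (0.5, 0.0), (5.0, 5.0)]
matrix = connect_neurons([[0], [1], [2]], sample, radius=1.0)
```

The check is quadratic in the number of captured objects per neuron pair; a spatial index could reduce that cost, but it is omitted here to keep the rule visible.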

    GENERAL CHANGES (ccore):

    • Implemented phase oscillatory network for pattern recognition syncpr (ccore.cluster.syncpr). See: https://github.com/annoviko/pyclustering/issues/232
    • Implemented agglomerative algorithm for cluster analysis (ccore.cluster.agglomerative). See: https://github.com/annoviko/pyclustering/issues/212
    • Implemented feature to use rectangular network structures in oscillatory networks. See: https://github.com/annoviko/pyclustering/issues/259
    • Implemented ant colony based algorithm for TSP problem (ccore.tsp.antcolony). See: https://github.com/annoviko/pyclustering/pull/277
    • Implemented K-Medians algorithm for cluster analysis (ccore.cluster.kmedians). See: https://github.com/annoviko/pyclustering/issues/231
    • Implemented feature to specify initial number of neighbors to calculate initial connectivity radius and increase percent of number of neighbors (or radius if total number of object is exceeded) on each step (ccore.cluster.hsyncnet). See: https://github.com/annoviko/pyclustering/issues/284
    • Implemented K-Medoids algorithm for cluster analysis (ccore.cluster.kmedoids). See: https://github.com/annoviko/pyclustering/issues/230
    • Implemented feature for CURE algorithm that provides additional information about clustering results: representative points and mean point of each cluster (ccore.cluster.cure). See: https://github.com/annoviko/pyclustering/issues/293
    • Implemented new class collection for constructing oscillatory and neural networks. See: https://github.com/annoviko/pyclustering/issues/264
    • Memory usage optimization for ROCK algorithm. See: no reference

    CORRECTED MAJOR BUGS:

    • Bug with callback methods in ccore library in syncnet (ccore.cluster.syncnet) and hsyncnet (ccore.cluster.hsyncnet) that may lead to loss of accuracy.
    • Bug with division by zero in kmeans algorithm (ccore.kmeans, pyclustering.cluster.kmeans) when cluster after center updating is not able to capture object. See: https://github.com/annoviko/pyclustering/issues/238
    • Bug with stack overflow in KD tree in case of big data (pyclustering.container.kdtree, ccore.container.kdtree). See: https://github.com/annoviko/pyclustering/pull/239 See: https://github.com/annoviko/pyclustering/issues/255 See: https://github.com/annoviko/pyclustering/issues/254
    • Bug with incorrect clustering in case of the same elements in cure algorithm (pyclustering.cluster.cure). See: https://github.com/annoviko/pyclustering/pull/239
    • Bug with execution failure in case of wrong number of initial medians and in case of the same objects with several initial medians (pyclustering.cluster.kmedians). See: https://github.com/annoviko/pyclustering/issues/256
    • Bug with calculation of synchronous ensembles near zero: oscillators with phases 2*pi and 0 are considered as different (pyclustering.nnet.sync, ccore.nnet.sync). See: https://github.com/annoviko/pyclustering/issues/263
    • Bug with cluster allocation in kmedoids algorithm in case of the same objects with several initial medoids (pyclustering.cluster.kmedoids). See: https://github.com/annoviko/pyclustering/issues/269
    • Bug with visualization of clusters in 3D (pyclustering.cluster). See: https://github.com/annoviko/pyclustering/issues/273
    • Bug with obtaining nearest entry for absorbing during inserting node (pyclustering.container.cftree). See: https://github.com/annoviko/pyclustering/issues/282
    • Bug with SOM method show_network() when CCORE is used (pyclustering.nnet.som). See: https://github.com/annoviko/pyclustering/issues/283
    • Bug with cluster allocation in case of switched off dynamic collecting (pyclustering.cluster.hsyncnet). See: https://github.com/annoviko/pyclustering/issues/285
    • Bug with execution failure during clustering data with rough values of initial medians (pyclustering.cluster.kmedians). See: https://github.com/annoviko/pyclustering/issues/286
    • Bug with memory leakage on interface between CCORE and pyclustering (ccore). See: no reference
    • Bug with allocation of correlation matrix when CCORE is used (pyclustering.nnet.sync). See: https://github.com/annoviko/pyclustering/issues/288
    • Bug with memory leakage in CURE algorithm - deallocation of representative points (ccore.cluster.cure). See: https://github.com/annoviko/pyclustering/issues/294
    • Bug with cluster visualization in case of 1D input data (pyclustering.cluster). See: https://github.com/annoviko/pyclustering/issues/296
    • Bug with loss of CF-nodes in CF-tree during inserting that leads to an unbalanced CF-tree (pyclustering.container.cftree). See: https://github.com/annoviko/pyclustering/issues/304
    • Bug with time stamps for each iteration in hsyncnet algorithm (ccore.cluster.hsyncnet). See: https://github.com/annoviko/pyclustering/issues/306
    • Bug with memory occupation by CCORE DBSCAN implementation due to adjacency matrix usage (ccore.cluster.dbscan). See: https://github.com/annoviko/pyclustering/issues/309
    • Bug with CURE: always finds max two representative points (pyclustering.cluster.cure). See: https://github.com/annoviko/pyclustering/issues/310
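
The synchronous ensemble issue listed above (phases 0 and 2*pi treated as different) comes down to comparing phases on a circle rather than on a line. The sketch below shows one way to do that; it is an illustration under stated assumptions, not the fix applied in the library. The names 'phase_distance' and 'allocate_ensembles' and the tolerance value are hypothetical.

```python
import math


def phase_distance(first, second):
    """Shortest angular distance between two phases on the unit circle."""
    diff = abs(first - second) % (2.0 * math.pi)
    return min(diff, 2.0 * math.pi - diff)


def allocate_ensembles(phases, tolerance=0.01):
    """Group oscillator indices whose phases coincide within the tolerance.

    Each oscillator is compared with the first member of every existing
    ensemble (a simplifying assumption of this sketch).
    """
    ensembles = []
    for index, phase in enumerate(phases):
        for ensemble in ensembles:
            if phase_distance(phase, phases[ensemble[0]]) < tolerance:
                ensemble.append(index)
                break
        else:
            # No existing ensemble is close enough, so start a new one.
            ensembles.append([index])
    return ensembles


# Phases 0 and (2*pi - 0.001) are neighbors on the circle.
print(allocate_ensembles([0.0, 2.0 * math.pi - 0.001, math.pi]))  # [[0, 1], [2]]
```

A plain absolute difference would report a gap of almost 2*pi between the first two oscillators and place them in separate ensembles.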
Owner
Andrei Novikov
PhD in Computer Science. Software Scientist at ThermoFisher Scientific.