PyClustering is a Python, C++ data mining library.

Overview

PyClustering

pyclustering is a Python and C++ data mining library (clustering algorithms, oscillatory networks, neural networks). The library provides Python and C++ implementations (the C++ pyclustering library) of each algorithm or model. The C++ pyclustering library is part of pyclustering and is supported on Linux, Windows and MacOS.

Version: 0.11.dev

License: The 3-Clause BSD License

E-Mail: [email protected]

Documentation: https://pyclustering.github.io/docs/0.10.1/html/

Homepage: https://pyclustering.github.io/

PyClustering Wiki: https://github.com/annoviko/pyclustering/wiki

Dependencies

Required packages: scipy, matplotlib, numpy, Pillow

Python version: >=3.6 (32-bit, 64-bit)

C++ version: >= 14 (32-bit, 64-bit)

Performance

Each algorithm is implemented in both Python and C/C++. The C/C++ implementation is used when your platform is supported; otherwise the library falls back to the Python implementation. The implementation can be chosen with the ccore flag (it is 'True' by default, which means that C/C++ is used), for example:

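from pyclustering.cluster.xmeans import xmeans

# data_points and start_centers are assumed to be prepared beforehand,
# e.g. lists of points such as [[0.1, 0.5], [0.3, 0.1], ...].

# Explicit default - the C/C++ part of the library is used
xmeans_instance_1 = xmeans(data_points, start_centers, 20, ccore=True)

# The same - the C/C++ part of the library is used by default
xmeans_instance_2 = xmeans(data_points, start_centers, 20)

# Switch off the core - Python is used
xmeans_instance_3 = xmeans(data_points, start_centers, 20, ccore=False)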

Installation

Installation using pip3 tool:

$ pip3 install pyclustering

Manual installation from official repository using Makefile:

# get sources of the pyclustering library, for example, from repository
$ mkdir pyclustering
$ cd pyclustering/
$ git clone https://github.com/annoviko/pyclustering.git .

# compile CCORE library (core of the pyclustering library).
$ cd ccore/
$ make ccore_64bit      # build for 64-bit OS

# $ make ccore_32bit    # build for 32-bit OS

# return to parent folder of the pyclustering library
$ cd ../

# install pyclustering library
$ python3 setup.py install

# optionally - test the library
$ python3 setup.py test

Manual installation using CMake:

# get sources of the pyclustering library, for example, from repository
$ mkdir pyclustering
$ cd pyclustering/
$ git clone https://github.com/annoviko/pyclustering.git .

# generate build files in a separate build folder.
$ mkdir build
$ cd build
$ cmake ..

# build pyclustering-shared target depending on what was generated (Makefile or MSVC solution)
# if Makefile has been generated then
$ make pyclustering-shared

# return to parent folder of the pyclustering library
$ cd ../

# install pyclustering library
$ python3 setup.py install

# optionally - test the library
$ python3 setup.py test

Manual installation using Microsoft Visual Studio solution:

  1. Clone repository from: https://github.com/annoviko/pyclustering.git
  2. Open folder pyclustering/ccore
  3. Open Visual Studio project ccore.sln
  4. Select solution platform: x86 or x64
  5. Build pyclustering-shared project.
  6. Add the pyclustering folder to the Python path or install it using setup.py:

# install pyclustering library
$ python3 setup.py install

# optionally - test the library
$ python3 setup.py test

Proposals, Questions, Bugs

For any questions, proposals or bugs related to pyclustering, please contact [email protected] or create an issue at https://github.com/annoviko/pyclustering/issues.

Cite the Library

If you use the pyclustering library in a scientific paper, please cite it:

Novikov, A., 2019. PyClustering: Data Mining Library. Journal of Open Source Software, 4(36), p.1230. Available at: http://dx.doi.org/10.21105/joss.01230.

BibTeX entry:

@article{Novikov2019,
    doi         = {10.21105/joss.01230},
    url         = {https://doi.org/10.21105/joss.01230},
    year        = 2019,
    month       = {apr},
    publisher   = {The Open Journal},
    volume      = {4},
    number      = {36},
    pages       = {1230},
    author      = {Andrei Novikov},
    title       = {{PyClustering}: Data Mining Library},
    journal     = {Journal of Open Source Software}
}

Brief Overview of the Library Content

Clustering algorithms and methods (module pyclustering.cluster):

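Algorithm                 Python   C++
Agglomerative             yes      yes
BANG                      yes      -
BIRCH                     yes      -
BSAS                      yes      yes
CLARANS                   yes      -
CLIQUE                    yes      yes
CURE                      yes      yes
DBSCAN                    yes      yes
Elbow                     yes      yes
EMA                       yes      -
Fuzzy C-Means             yes      yes
GA (Genetic Algorithm)    yes      yes
G-Means                   yes      yes
HSyncNet                  yes      yes
K-Means                   yes      yes
K-Means++                 yes      yes
K-Medians                 yes      yes
K-Medoids                 yes      yes
MBSAS                     yes      yes
OPTICS                    yes      yes
ROCK                      yes      yes
Silhouette                yes      yes
SOM-SC                    yes      yes
SyncNet                   yes      yes
Sync-SOM                  yes      -
TTSAS                     yes      yes
X-Means                   yes      yes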

Oscillatory networks and neural networks (module pyclustering.nnet):

Model                                                                            Python   C++
CNN (Chaotic Neural Network)                                                     yes      -
fSync (Oscillatory network based on Landau-Stuart equation and Kuramoto model)   yes      -
HHN (Oscillatory network based on Hodgkin-Huxley model)                          yes      yes
Hysteresis Oscillatory Network                                                   yes      -
LEGION (Local Excitatory Global Inhibitory Oscillatory Network)                  yes      yes
PCNN (Pulse-Coupled Neural Network)                                              yes      yes
SOM (Self-Organized Map)                                                         yes      yes
Sync (Oscillatory network based on Kuramoto model)                               yes      yes
SyncPR (Oscillatory network for pattern recognition)                             yes      yes
SyncSegm (Oscillatory network for image segmentation)                            yes      yes

Graph Coloring Algorithms (module pyclustering.gcolor):

Algorithm     Python   C++
DSatur        yes      -
Hysteresis    yes      -
GColorSync    yes      -

Containers (module pyclustering.container):

Container   Python   C++
KD Tree     yes      yes
CF Tree     yes      -

Examples in the Library

The library contains examples for each algorithm and oscillatory network model:

Clustering examples: pyclustering/cluster/examples

Graph coloring examples: pyclustering/gcolor/examples

Oscillatory network examples: pyclustering/nnet/examples

Code Examples

Data clustering by CURE algorithm

from pyclustering.cluster import cluster_visualizer
from pyclustering.cluster.cure import cure
from pyclustering.utils import read_sample
from pyclustering.samples.definitions import FCPS_SAMPLES

# Input data in following format [ [0.1, 0.5], [0.3, 0.1], ... ].
input_data = read_sample(FCPS_SAMPLES.SAMPLE_LSUN)

# Allocate three clusters.
cure_instance = cure(input_data, 3)
cure_instance.process()
clusters = cure_instance.get_clusters()

# Visualize allocated clusters.
visualizer = cluster_visualizer()
visualizer.append_clusters(clusters, input_data)
visualizer.show()

Data clustering by K-Means algorithm

from pyclustering.cluster.kmeans import kmeans, kmeans_visualizer
from pyclustering.cluster.center_initializer import kmeans_plusplus_initializer
from pyclustering.samples.definitions import FCPS_SAMPLES
from pyclustering.utils import read_sample

# Load list of points for cluster analysis.
sample = read_sample(FCPS_SAMPLES.SAMPLE_TWO_DIAMONDS)

# Prepare initial centers using K-Means++ method.
initial_centers = kmeans_plusplus_initializer(sample, 2).initialize()

# Create instance of K-Means algorithm with prepared centers.
kmeans_instance = kmeans(sample, initial_centers)

# Run cluster analysis and obtain results.
kmeans_instance.process()
clusters = kmeans_instance.get_clusters()
final_centers = kmeans_instance.get_centers()

# Visualize obtained results
kmeans_visualizer.show_clusters(sample, clusters, final_centers)

Data clustering by OPTICS algorithm

from pyclustering.cluster import cluster_visualizer
from pyclustering.cluster.optics import optics, ordering_analyser, ordering_visualizer
from pyclustering.samples.definitions import FCPS_SAMPLES
from pyclustering.utils import read_sample

# Read sample for clustering from some file
sample = read_sample(FCPS_SAMPLES.SAMPLE_LSUN)

# Run cluster analysis with a connectivity radius larger than the real one
radius = 2.0
neighbors = 3
amount_of_clusters = 3
optics_instance = optics(sample, radius, neighbors, amount_of_clusters)

# Performs cluster analysis
optics_instance.process()

# Obtain results of clustering
clusters = optics_instance.get_clusters()
noise = optics_instance.get_noise()
ordering = optics_instance.get_ordering()

# Visualize ordering diagram
analyser = ordering_analyser(ordering)
ordering_visualizer.show_ordering_diagram(analyser, amount_of_clusters)

# Visualize clustering results
visualizer = cluster_visualizer()
visualizer.append_clusters(clusters, sample)
visualizer.show()

Simulation of oscillatory network PCNN

from pyclustering.nnet.pcnn import pcnn_network, pcnn_visualizer

# Create Pulse-Coupled neural network with 10 oscillators.
net = pcnn_network(10)

# Perform simulation during 50 steps using binary external stimulus.
dynamic = net.simulate(50, [1, 1, 1, 0, 0, 0, 0, 1, 1, 1])

# Allocate synchronous ensembles from the output dynamic.
ensembles = dynamic.allocate_sync_ensembles()

# Show output dynamic.
pcnn_visualizer.show_output_dynamic(dynamic, ensembles)

Simulation of chaotic neural network CNN

from pyclustering.cluster import cluster_visualizer
from pyclustering.samples.definitions import SIMPLE_SAMPLES
from pyclustering.utils import read_sample
from pyclustering.nnet.cnn import cnn_network, cnn_visualizer

# Load stimulus from file.
stimulus = read_sample(SIMPLE_SAMPLES.SAMPLE_SIMPLE3)

# Create chaotic neural network, amount of neurons should be equal to amount of stimulus.
network_instance = cnn_network(len(stimulus))

# Perform simulation during 100 steps.
steps = 100
output_dynamic = network_instance.simulate(steps, stimulus)

# Display output dynamic of the network.
cnn_visualizer.show_output_dynamic(output_dynamic)

# Display dynamic matrix and observation matrix to show clustering phenomenon.
cnn_visualizer.show_dynamic_matrix(output_dynamic)
cnn_visualizer.show_observation_matrix(output_dynamic)

# Visualize clustering results.
clusters = output_dynamic.allocate_sync_ensembles(10)
visualizer = cluster_visualizer()
visualizer.append_clusters(clusters, stimulus)
visualizer.show()

Illustrations

Cluster allocation on FCPS dataset collection by DBSCAN (image omitted).

Cluster allocation by OPTICS using cluster-ordering diagram (image omitted).

Partial synchronization (clustering) in Sync oscillatory network (image omitted).

Cluster visualization by SOM (Self-Organized Feature Map) (image omitted).

Comments
  • Performance Issue - OPTICS

    I am running the OPTICS algorithm on 50k data points; since the data is text, it has around 5k features. The time taken to run the program seems huge. I tried using ccore, but it doesn't seem to improve things. Is there any way I could improve performance?

    Investigation Optimization 
    opened by swetha0613 19
  • ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

    Thank you for your library, it is very useful for me and the data mining community. I wanted to run the BIRCH algorithm, but I got this error from cftree.py: if (merged_entry.get_diameter() > self.__threshold): ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all().

    Also, when I pass the diameter parameter while instantiating the birch algorithm, I get this error: birch_instance = birch(x,3,diameter=0.1) TypeError: __init__() got an unexpected keyword argument 'diameter'.

    One last question, would it be possible to leave the parameter number_clusters optional to let the user use other clustering algorithms in the last step of birch instead of the hierarchical method?

    Bug Question 
    opened by nabilEM 13
  • How to use pyclustering kmedoids using gower distance matrix?

    Hi,

    Not sure if this has already been asked but I have a dataframe consisting of categorical and numerical data. I want to cluster this data to extract features. I use the following code from https://sourceforge.net/projects/gower-distance-4python/files/ to calculate the gower distance.

    My code is as follows:

    import pyclustering

    from sklearn.metrics.pairwise import pairwise_distances
    import numpy as np
    from pyclustering.cluster.kmedoids import kmedoids
    from pyclustering.utils import read_sample
    from pyclustering.cluster.center_initializer import kmeans_plusplus_initializer
    from pyclustering.cluster.elbow import elbow
    from pyclustering.cluster.kmeans import kmeans
    from pyclustering.cluster.encoder import cluster_encoder, type_encoding

    # gower_distances comes from the external code linked above
    D = gower_distances(filtOrdersGower_subset)
    initial_medoids = kmeans_plusplus_initializer(D, 4).initialize(return_index=True)
    kmedoids_instance = kmedoids(D, initial_medoids, data_type='distance_matrix')

    kmedoids_instance.process()
    clusters = kmedoids_instance.get_clusters()

    How do I plot these clusters and find out which features in my data are most important? I am new to pyclustering. @annoviko

    Question 
    opened by zahs123 13
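
The post above relies on a gower_distances helper that is external to pyclustering. For readers who do not have it, here is a minimal standard-library sketch of the usual Gower formulation: range-normalized absolute difference for numeric columns, simple mismatch (0 or 1) for categorical columns, averaged over all columns. The function name gower_matrix and the sample records are assumptions made for illustration, not part of any library.

```python
def gower_matrix(rows, categorical):
    """Return a symmetric Gower distance matrix for mixed-type records.

    rows:        list of equal-length records.
    categorical: set of column indices that hold categorical values.
    """
    n_cols = len(rows[0])

    # Range of every numeric column, used to normalize differences to [0, 1].
    ranges = {}
    for col in range(n_cols):
        if col not in categorical:
            values = [row[col] for row in rows]
            ranges[col] = max(values) - min(values)

    def distance(a, b):
        total = 0.0
        for col in range(n_cols):
            if col in categorical:
                # Simple mismatch for categorical values.
                total += 0.0 if a[col] == b[col] else 1.0
            elif ranges[col] > 0:
                total += abs(a[col] - b[col]) / ranges[col]
            # A constant numeric column contributes nothing.
        return total / n_cols

    return [[distance(a, b) for b in rows] for a in rows]


# Illustrative records: one numeric column, one categorical column.
records = [
    [1.0, "red"],
    [3.0, "red"],
    [5.0, "blue"],
]
matrix = gower_matrix(records, categorical={1})
print(matrix)
```

A matrix built this way has the same shape as D in the post, so it could be passed to kmedoids with data_type='distance_matrix' in the same manner, with the initial medoids given as row indices.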
  • k-medioids with custom distance

    I am new to pyclustering. Rummaging through the source code I didn't see how I could insert custom distance (either by passing a callable that computes pairwise distance or a precomputed distance matrix). Could you help? Thanks.

    To be more specific, the following is the sort of thing I'm talking about:

    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster
    
    def my_dist(u,v): # exemplifying using a weird distance metric.
        return (u + v).sum()
    
    data = np.array([[1,2,3,4],
                     [5,6,7,8]])
    clust = linkage(data, method='average', metric=my_dist)
    prediction = fcluster(clust, 2, criterion='maxclust')
    
    Question Proposal 
    opened by suwangcompling 13
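
Other posts on this page call kmedoids with data_type='distance_matrix' and with a metric argument, so both routes appear to exist. The sketch below covers only the precomputed-matrix route, using the standard library: it applies a user-supplied callable to every pair of points. The helper name pairwise_matrix is hypothetical, and my_dist mirrors the unusual distance from the post (note that it is not a true metric, since its diagonal is not zero).

```python
def pairwise_matrix(points, dist):
    """Build a full distance matrix by applying dist to every pair of points."""
    return [[dist(a, b) for b in points] for a in points]


def my_dist(u, v):
    # Same unusual "distance" as in the post: sum of the element-wise sums.
    return sum(a + b for a, b in zip(u, v))


data = [[1, 2, 3, 4],
        [5, 6, 7, 8]]
matrix = pairwise_matrix(data, my_dist)
print(matrix)  # [[20, 36], [36, 52]]
```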
  • Anyway to lose the matplotlib dependency or make it optional?

    I'm getting the following:

    Traceback (most recent call last):
      File "/Users/alex/dev/something/extractor/ml.py", line 8, in <module>
        from pyclustering.cluster.kmeans import kmeans
      File "/Users/alex/dev/venv/content-extractor-3pp1FAW1/lib/python3.6/site-packages/pyclustering/cluster/__init__.py", line 26, in <module>
        import matplotlib.pyplot as plt;
      File "/Users/alex/dev/venv/content-extractor-3pp1FAW1/lib/python3.6/site-packages/matplotlib/pyplot.py", line 115, in <module>
        _backend_mod, new_figure_manager, draw_if_interactive, _show = pylab_setup()
      File "/Users/alex/dev/venv/content-extractor-3pp1FAW1/lib/python3.6/site-packages/matplotlib/backends/__init__.py", line 62, in pylab_setup
        [backend_name], 0)
      File "/Users/alex/dev/venv/content-extractor-3pp1FAW1/lib/python3.6/site-packages/matplotlib/backends/backend_macosx.py", line 17, in <module>
        from matplotlib.backends import _macosx
    
    RuntimeError: Python is not installed as a framework. The Mac OS X
    backend will not be able to function correctly if Python is not
    installed as a framework. See the Python documentation for more
    information on installing Python as a framework on Mac OS X. Please
    either reinstall Python as a framework, or try one of the other
    backends. If you are using (Ana)Conda please install python.app and
    replace the use of 'python' with 'pythonw'. See 'Working with
    Matplotlib on OSX' in the Matplotlib FAQ for more information.
    

    Don't need plotting, just the clusters :-/ Perhaps move the

    import matplotlib.pyplot as plt;
    import matplotlib.gridspec as gridspec;
    

    inside the show() function?

    Investigation Optimization 
    opened by awhillas 10
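
The suggestion in this post, moving the matplotlib imports inside show(), is the deferred-import pattern. Below is a minimal sketch of that pattern; the class lazy_cluster_visualizer is a hypothetical stand-in written for illustration and is not the library's cluster_visualizer. Creating the object and appending clusters needs no plotting backend; matplotlib is imported only if show() is actually called.

```python
class lazy_cluster_visualizer:
    """Hypothetical visualizer that defers the matplotlib import to show()."""

    def __init__(self):
        self._clusters = []

    def append_clusters(self, clusters, data):
        # Storing clusters needs no plotting backend.
        self._clusters.append((clusters, data))

    def show(self):
        # Imported here, so a missing or misconfigured backend only
        # matters when a figure is actually requested.
        import matplotlib.pyplot as plt

        figure, axes = plt.subplots()
        for clusters, data in self._clusters:
            for cluster in clusters:
                xs = [data[index][0] for index in cluster]
                ys = [data[index][1] for index in cluster]
                axes.scatter(xs, ys)
        plt.show()


# Works without touching matplotlib as long as show() is not called.
visualizer = lazy_cluster_visualizer()
visualizer.append_clusters([[0, 1], [2]], [[0.0, 0.0], [0.1, 0.2], [5.0, 5.0]])
```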
  • Missing labels_ and predict function for K-Means

    Great work, but k-means is missing labels_ and a predict function like sklearn: https://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html#sklearn.cluster.KMeans

    Enhancement Proposal 
    opened by liufsd 9
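
As other posts on this page indicate, get_clusters() returns one list of point indices per cluster. A flat, sklearn-style label list can be derived from that representation with a few lines of standard-library Python. This is a minimal sketch; the function name clusters_to_labels is an assumption for illustration (the library also ships pyclustering.cluster.encoder, imported in one of the posts above, which may cover the same need).

```python
def clusters_to_labels(clusters, n_points):
    """Convert index-list clusters into one label per point.

    Points that belong to no cluster (for example, noise) keep the label -1.
    """
    labels = [-1] * n_points
    for label, cluster in enumerate(clusters):
        for index in cluster:
            labels[index] = label
    return labels


clusters = [[0, 2, 3], [1, 4]]
labels = clusters_to_labels(clusters, 5)
print(labels)  # [0, 1, 0, 0, 1]
```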
  • [ccore]ccore.so can not find

    When I run "python xmeans_examples.py", it runs correctly at first, but after a few seconds I get this problem: (image) Could you help me? Thanks so much!

    Question 
    opened by paulinsider 9
  • G-Means: Setting maximum number of clusters like for X-Means

    Hi,

    I wanted to ask if it is possible to add a k_max parameter to the call to gmeans, similar to xmeans, which supports this parameter. The reason is that gmeans returns a really large number of clusters for some datasets (sometimes it is even the same as the size of the dataset, which is the worst case). I do not know the reason behind this, but it would be nice if I could limit the number of clusters as I can do for xmeans.

    Enhancement Proposal 
    opened by tschechlovdev 8
  • [pyclustering.cluster.kmedians] exception: access violation reading 0x(Memory Address)

    Hi,

    When using kmedians, I get an error related to an access violation when reading a memory address. This happens if I use ccore=True. If I use ccore=False, kmedians_obj.process() returns no medians or clusters. My guess is that it is related to the number of clusters (and maybe tolerance), although I am not sure. It happens when the ratio of clusters to points is below 1/10 (I was using between 380 and 450 values for 45 clusters when I got it). In any case, it might be worth capturing the error so that it is more informative.

    Thanks a lot for your work!

    Bug Question 
    opened by jordiarjona 8
  • [pyclustering.cluster.rock] Use ROCK for clustering data set.

    Hi,

    I am trying to use the Robust Clustering Algorithm for Categorical Attributes (ROCK) on a data set containing categorical attributes, but I am getting an error that data cannot be str. How can I use this method with a categorical data set?

    Thanks, Naser

    Question 
    opened by NaserMonsefi 8
  • kmedoids returns empty cluster lists for version 0.10.1

    Hi,

    Previously, code running on one server with version 0.9.3.1 worked as expected. However, the same code run on a different server with version 0.10.1 returned some empty clusters for the same dataset and initial medoids.

    initial_medoids = [0, 1, 2, 3]
    kmedoids_instance = kmedoids(df2, initial_medoids, metric=metric)
    kmedoids_instance.process()
    clusters = kmedoids_instance.get_clusters()
    medoids = kmedoids_instance.get_medoids()
    print(clusters)

    The above would return indices for clusters 0 and 1 but empty lists for clusters 2 and 3, despite there not being any missing values in my data df2. I would expect, at the very least, the medoids themselves to be in clusters 2 and 3.

    Thank you, this is a great package, I really appreciate it.

    Lauren

    Bug 
    opened by laurenleesc 7
  • Reference for the "Elbow length" method?

    The documentation of the elbow package suggests this is based on the reference Thorndike 1953: https://github.com/annoviko/pyclustering/blob/bf4f51a472622292627ec8c294eb205585e50f52/pyclustering/cluster/elbow.py#L4 https://github.com/annoviko/pyclustering/blob/bf4f51a472622292627ec8c294eb205585e50f52/docs/citation.bib#L552-L556 Yet, I cannot find the "Elbow length" equation used in this reference, in fact he appears very skeptical that such elbows can be reliably identified (for a good reason...). Is there another reference for this particular method?

    opened by kno10 0
  • xmeans does not agree to paper?

    The last term, p * 0.5 * log(N), should be in the sum only once, IMHO. It appears in the top-level BIC equation (where j is the model index, not the cluster index), not in the l(Dn) equation (where n is the cluster index) in https://web.cs.dal.ca/~shepherd/courses/csci6403/clustering/xmeans.pdf No guarantees that everything else is fine.

    I also renamed sigma_sqrt to sigma_sq because it is supposed to be sigma squared, not a square root.

    Note that if sigma_multiplier = float('-inf'), the result will always be infinity, won't it?

    opened by kno10 0
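
To make the point of this post concrete, here is a minimal standard-library sketch of the BIC score as it is commonly read from the linked X-Means paper: the per-cluster log-likelihood l(Dn) is summed over clusters, and the penalty p * 0.5 * log(R) is subtracted once per model. It uses R for the number of points where the post writes N. The function name bic, the error handling and the sample data are assumptions for illustration; this is not pyclustering's implementation.

```python
import math


def bic(clusters, centers):
    """BIC of a spherical-Gaussian k-means model (higher is better).

    clusters: list of clusters, each a list of points (lists of floats).
    centers:  one center per cluster.
    """
    K = len(clusters)
    M = len(centers[0])                       # dimensionality
    R = sum(len(cluster) for cluster in clusters)

    # Pooled variance estimate shared by all clusters.
    squared_error = sum(
        (x - c) ** 2
        for cluster, center in zip(clusters, centers)
        for point in cluster
        for x, c in zip(point, center)
    )
    if R <= K or squared_error == 0.0:
        raise ValueError("variance estimate is undefined for this model")
    sigma_sq = squared_error / (R - K)

    # Log-likelihood: one l(Dn) term per cluster (n is the cluster index).
    log_likelihood = 0.0
    for cluster in clusters:
        Rn = len(cluster)
        if Rn == 0:
            continue  # an empty cluster contributes nothing
        log_likelihood += (
            -Rn / 2.0 * math.log(2.0 * math.pi)
            - Rn * M / 2.0 * math.log(sigma_sq)
            - (Rn - K) / 2.0
            + Rn * math.log(Rn)
            - Rn * math.log(R)
        )

    # Penalty applied once per model, not once per cluster.
    p = (K - 1) + M * K + 1
    return log_likelihood - p / 2.0 * math.log(R)


group_a = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
group_b = [[10.0, 10.0], [10.0, 11.0], [11.0, 10.0], [11.0, 11.0]]

one_cluster = bic([group_a + group_b], [[5.5, 5.5]])
two_clusters = bic([group_a, group_b], [[0.5, 0.5], [10.5, 10.5]])
print(two_clusters > one_cluster)  # True
```

For these two well-separated groups the two-cluster model scores higher, which is the behavior a splitting criterion of this kind is expected to show.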
  • Build failed: 'numeric_limits' is not a member of 'std'

    platform: Arch Linux
    gcc version 12.1.1 20220730 (GCC) 
    

    When building the package, gcc throws an error:

    In file included from src/cluster/bsas.cpp:10:
    ./include/pyclustering/cluster/bsas.hpp:92:44: error: 'numeric_limits' is not a member of 'std'
       92 |         double        m_distance    = std::numeric_limits<double>::max();   /**< Distance between the cluster and a specific point. */
          |                                            ^~~~~~~~~~~~~~
    ./include/pyclustering/cluster/bsas.hpp:92:59: error: expected primary-expression before 'double'
       92 |         double        m_distance    = std::numeric_limits<double>::max();   /**< Distance between the cluster and a specific point. */
          |                                                           ^~~~~~
    make[1]: *** [ccore.mk:154: obj/ccore/64-bit/cluster/bsas.o] Error 1
    make[1]: *** Waiting for unfinished jobs....
    In file included from ./include/pyclustering/cluster/mbsas.hpp:12,
                     from src/cluster/mbsas.cpp:10:
    ./include/pyclustering/cluster/bsas.hpp:92:44: error: 'numeric_limits' is not a member of 'std'
       92 |         double        m_distance    = std::numeric_limits<double>::max();   /**< Distance between the cluster and a specific point. */
          |                                            ^~~~~~~~~~~~~~
    ./include/pyclustering/cluster/bsas.hpp:92:59: error: expected primary-expression before 'double'
       92 |         double        m_distance    = std::numeric_limits<double>::max();   /**< Distance between the cluster and a specific point. */
          |                                                           ^~~~~~
    make[1]: *** [ccore.mk:154: obj/ccore/64-bit/cluster/mbsas.o] Error 1
    src/cluster/agglomerative.cpp: In member function 'void pyclustering::clst::agglomerative::merge_by_average_link()':
    src/cluster/agglomerative.cpp:89:44: error: 'numeric_limits' is not a member of 'std'
       89 |     double minimum_average_distance = std::numeric_limits<double>::max();
          |                                            ^~~~~~~~~~~~~~
    src/cluster/agglomerative.cpp:89:59: error: expected primary-expression before 'double'
       89 |     double minimum_average_distance = std::numeric_limits<double>::max();
          |                                                           ^~~~~~
    src/cluster/agglomerative.cpp: In member function 'void pyclustering::clst::agglomerative::merge_by_centroid_link()':
    src/cluster/agglomerative.cpp:123:44: error: 'numeric_limits' is not a member of 'std'
      123 |     double minimum_average_distance = std::numeric_limits<double>::max();
          |                                            ^~~~~~~~~~~~~~
    src/cluster/agglomerative.cpp:123:59: error: expected primary-expression before 'double'
      123 |     double minimum_average_distance = std::numeric_limits<double>::max();
          |                                                           ^~~~~~
    src/cluster/agglomerative.cpp: In member function 'void pyclustering::clst::agglomerative::merge_by_complete_link()':
    src/cluster/agglomerative.cpp:149:45: error: 'numeric_limits' is not a member of 'std'
      149 |     double minimum_complete_distance = std::numeric_limits<double>::max();
          |                                             ^~~~~~~~~~~~~~
    src/cluster/agglomerative.cpp:149:60: error: expected primary-expression before 'double'
      149 |     double minimum_complete_distance = std::numeric_limits<double>::max();
          |                                                            ^~~~~~
    src/cluster/agglomerative.cpp: In member function 'void pyclustering::clst::agglomerative::merge_by_signle_link()':
    src/cluster/agglomerative.cpp:184:43: error: 'numeric_limits' is not a member of 'std'
      184 |     double minimum_single_distance = std::numeric_limits<double>::max();
          |                                           ^~~~~~~~~~~~~~
    src/cluster/agglomerative.cpp:184:58: error: expected primary-expression before 'double'
      184 |     double minimum_single_distance = std::numeric_limits<double>::max();
          |                                                          ^~~~~~
    src/cluster/agglomerative.cpp:193:54: error: 'numeric_limits' is not a member of 'std'
      193 |             double candidate_minimum_distance = std::numeric_limits<double>::max();
          |                                                      ^~~~~~~~~~~~~~
    src/cluster/agglomerative.cpp:193:69: error: expected primary-expression before 'double'
      193 |             double candidate_minimum_distance = std::numeric_limits<double>::max();
          |                                                                     ^~~~~~
    make[1]: *** [ccore.mk:154: obj/ccore/64-bit/cluster/agglomerative.o] Error 1
    make[1]: Leaving directory '/tmp/makepkg/python-pyclustering-git/src/pyclustering/ccore'
    make: *** [makefile:53: ccore_64bit] Error 2
    
    opened by Catty2014 0
  • predict error for kmeans

    from pyclustering.cluster.kmeans import kmeans, kmeans_visualizer
    from pyclustering.cluster.center_initializer import kmeans_plusplus_initializer
    from pyclustering.samples.definitions import FCPS_SAMPLES
    from pyclustering.utils import read_sample

    samples = read_sample(FCPS_SAMPLES.SAMPLE_TWO_DIAMONDS)
    initial_centers = kmeans_plusplus_initializer(samples, 2).initialize()
    kmeans_instance = kmeans(samples, initial_centers)
    kmeans_instance.process()
    clusters = kmeans_instance.get_clusters()
    final_centers = kmeans_instance.get_centers()

    kmeans_instance.predict(samples)

    and I get this:

    AttributeError                            Traceback (most recent call last)
    /tmp/ipykernel_20827/3994711565.py in <module>
    ----> 1 kmeans_instance.predict(samples)

    ~/envs/envs/spark_seg/lib/python3.7/site-packages/pyclustering/cluster/kmeans.py in predict(self, points)
        441         for index_point in range(len(nppoints)):
        442             if self.__metric.get_type() != type_metric.USER_DEFINED:
    --> 443                 differences[index_point] = self.__metric(nppoints[index_point], self.__centers)
        444             else:
        445                 differences[index_point] = [self.__metric(nppoints[index_point], center) for center in self.__centers]

    ~/envs/envs/spark_seg/lib/python3.7/site-packages/pyclustering/utils/metric.py in __call__(self, point1, point2)
        130
        131         """
    --> 132         return self.__calculator(point1, point2)
        133
        134

    ~/envs/envs/spark_seg/lib/python3.7/site-packages/pyclustering/utils/metric.py in euclidean_distance_square_numpy(object1, object2)
        368
        369     """
    --> 370     if len(object1.shape) > 1 or len(object2.shape) > 1:
        371         return numpy.sum(numpy.square(object1 - object2), axis=1).T
        372     else:

    AttributeError: 'list' object has no attribute 'shape'

    opened by BeHappyForMe 0
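
The traceback ends in a .shape access on a plain list, which suggests that one of the arguments reaching the NumPy-based metric (plausibly the stored centers) was not a NumPy array. Independently of how that is resolved in the library, nearest-center prediction is simple to do outside it. The following is a minimal standard-library sketch; the function name predict_nearest and the sample values are assumptions for illustration.

```python
def predict_nearest(points, centers):
    """Return, for each point, the index of the closest center.

    Closeness is measured by squared Euclidean distance.
    """
    labels = []
    for point in points:
        distances = [
            sum((x - c) ** 2 for x, c in zip(point, center))
            for center in centers
        ]
        labels.append(distances.index(min(distances)))
    return labels


centers = [[0.0, 0.0], [10.0, 10.0]]
new_points = [[1.0, 0.5], [9.0, 11.0], [4.0, 4.0]]
print(predict_nearest(new_points, centers))  # [0, 1, 0]
```

In the scenario from the post, centers would be the list returned by kmeans_instance.get_centers().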
  • (Minor issue) Typo in repository description (pyclustring)

    There is an "e" missing in the word "pyclustring" in the repository description. It should say "PyClustering" instead of "pyclustring".

    typo

    opened by 99991 0
Releases(0.10.1.2)
  • 0.10.1.2(Nov 25, 2020)

  • 0.10.1.1(Nov 24, 2020)

  • 0.10.1(Nov 19, 2020)

    pyclustering 0.10.1 library is a collection of clustering algorithms, oscillatory networks, etc.

    GENERAL CHANGES:

    • The library is distributed under the BSD-3-Clause license. See: https://github.com/annoviko/pyclustering/issues/517

    • C++ pyclustering can be built using CMake. See: https://github.com/annoviko/pyclustering/issues/603

    • Supported dumping and loading for DBSCAN algorithm via pickle (Python: pyclustering.cluster.dbscan). See: https://github.com/annoviko/pyclustering/issues/650

    • Package installer resolves all required dependencies automatically. See: https://github.com/annoviko/pyclustering/issues/647

    • Introduced human-readable error for genetic clustering algorithm in case of non-normalized data (Python: pyclustering.cluster.ga). See: https://github.com/annoviko/pyclustering/issues/597

    • Optimized the Windows implementation of parallel_for and parallel_for_each by using pyclustering::parallel instead of PPL, which affects all algorithms that use these functions (C++: pyclustering::parallel). See: https://github.com/annoviko/pyclustering/issues/642

    • Optimized parallel_for algorithm for short cycles that affects all algorithms which use parallel_for (C++: pyclustering::parallel). See: https://github.com/annoviko/pyclustering/issues/642

    • Introduced kstep parameter for elbow algorithm to use custom K search steps (Python: pyclustering.cluster.elbow, C++: pyclustering::cluster::elbow). See: https://github.com/annoviko/pyclustering/issues/489

    • Introduced p_step parameter for parallel_for function (C++: pyclustering::parallel). See: https://github.com/annoviko/pyclustering/issues/640

    • Optimized python implementation of K-Medoids algorithm (Python: pyclustering.cluster.kmedoids). See: https://github.com/annoviko/pyclustering/issues/526

    • C++ pyclustering CLIQUE interface returns human-readable errors (Python: pyclustering.cluster.clique). See: https://github.com/annoviko/pyclustering/issues/635 See: https://github.com/annoviko/pyclustering/issues/634

    • Introduced metric parameter for X-Means algorithm to use custom metric for clustering (Python: pyclustering.cluster.xmeans; C++ pyclustering::clst::xmeans). See: https://github.com/annoviko/pyclustering/issues/619

    • Introduced alpha and beta probabilistic bounds for MNDL splitting criteria for X-Means algorithm (Python: pyclustering.cluster.xmeans; C++: pyclustering::clst::xmeans). See: https://github.com/annoviko/pyclustering/issues/624

    CORRECTED MAJOR BUGS:

    • Corrected bug with a command python3 -m pyclustering.tests that was using the current folder to find tests to run (Python: pyclustering). See: https://github.com/annoviko/pyclustering/issues/648

    • Corrected bug with Elbow algorithm where kmax is not used to calculate K (Python: pyclustering.cluster.elbow; C++: pyclustering::clst::elbow). See: https://github.com/annoviko/pyclustering/issues/639

    • Corrected implementation of K-Medoids (PAM) algorithm so that it is aligned with the original algorithm (Python: pyclustering.cluster.kmedoids; C++: pyclustering::clst::kmedoids). See: https://github.com/annoviko/pyclustering/issues/503

    • Corrected literature references for the K-Medoids (PAM) implementation (Python: pyclustering.cluster.kmedoids). See: https://github.com/annoviko/pyclustering/pull/572

    • Corrected bug when K-Medoids updates input parameter initial_medoids that were provided to the algorithm (Python: pyclustering.cluster.kmedoids). See: https://github.com/annoviko/pyclustering/issues/630

    • Corrected bug with Euclidean distance when numpy is used (Python: pyclustering.utils.metric). See: https://github.com/annoviko/pyclustering/issues/625

    • Corrected bug with Minkowski distance when numpy is used (Python: pyclustering.utils.metric). See: https://github.com/annoviko/pyclustering/issues/626

    • Corrected bug with Gower distance when numpy calculation is used and data shape is bigger than 1 (Python: pyclustering.utils.metric). See: https://github.com/annoviko/pyclustering/issues/627

    • Corrected MNDL splitting criteria for X-Means algorithm (Python: pyclustering.cluster.xmeans; C++: pyclustering::clst::xmeans). See: https://github.com/annoviko/pyclustering/issues/623
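
    The Euclidean and Minkowski corrections listed above concern the numpy-based code paths. As a reference point, here is a minimal pure-Python sketch of both distances; the function names are illustrative and are not the library's implementation.

```python
def minkowski_distance(point1, point2, degree=2):
    """Minkowski distance of the given degree between two equal-length points."""
    if len(point1) != len(point2):
        raise ValueError("points must have the same dimension")
    total = sum(abs(a - b) ** degree for a, b in zip(point1, point2))
    return total ** (1.0 / degree)


def euclidean_distance(point1, point2):
    """Euclidean distance is the Minkowski distance of degree 2."""
    return minkowski_distance(point1, point2, degree=2)
```

    A plain implementation like this can serve as a baseline when checking a vectorized version on small inputs.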

  • 0.10.0.1(Aug 17, 2020)

    pyclustering 0.10.0.1 library is a collection of clustering algorithms and methods, oscillatory networks, etc.

    GENERAL CHANGES:

    • Metadata of the library is updated. See: no reference

    • Supported command test for setup.py script (Python: pyclustering). See: https://github.com/annoviko/pyclustering/issues/607

    • Introduced parameter random_seed for algorithms/models to control the seed of the random functionality: kmeans++, random_center_initializer, ga, gmeans, xmeans, som, somsc, elbow, silhouette_ksearch (Python: pyclustering.cluster; C++: pyclustering::clst). See: https://github.com/annoviko/pyclustering/issues/578

    • Introduced parameter k_max to G-Means algorithm to use it as an optional stop condition for the algorithm (Python: pyclustering.cluster.gmeans; C++: pyclustering::clst::gmeans). See: https://github.com/annoviko/pyclustering/issues/602

    • Implemented method save() for cluster_visualizer and cluster_visualizer_multidim to save visualization to file (Python: pyclustering.cluster). See: https://github.com/annoviko/pyclustering/issues/601

    • Optimization of CURE algorithm using balanced KD-tree (Python: pyclustering.cluster.cure; C++: pyclustering::clst::cure). See: https://github.com/annoviko/pyclustering/issues/589

    • Optimization of OPTICS algorithm using balanced KD-tree (Python: pyclustering.cluster.optics; C++: pyclustering::clst::optics). See: https://github.com/annoviko/pyclustering/issues/588

    • Optimization of DBSCAN algorithm using balanced KD-tree (Python: pyclustering.cluster.dbscan; C++: pyclustering::clst::dbscan). See: https://github.com/annoviko/pyclustering/issues/587

    • Implemented new optimized balanced KD-tree kdtree_balanced (Python: pyclustering.container.kdtree; C++: pyclustering::container::kdtree_balanced). See: https://github.com/annoviko/pyclustering/issues/379

    • Implemented KD-tree graphical visualizer kdtree_visualizer for KD-trees with 2-dimensional data (Python: pyclustering.container.kdtree). See: https://github.com/annoviko/pyclustering/issues/586

    • Updated interface of each clustering algorithm in C/C++ pyclustering: cluster_data is substituted by concrete classes (C++: pyclustering::clst). See: https://github.com/annoviko/pyclustering/issues/577
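
    Several entries above rely on a balanced KD-tree. The following is a minimal sketch of one common way to build such a tree, by splitting on the median along a cycling axis. The names Node, build_balanced_kdtree and height are hypothetical and do not reflect the kdtree_balanced interface.

```python
class Node:
    """A KD-tree node holding one point and two optional subtrees."""

    def __init__(self, point, left=None, right=None):
        self.point = point
        self.left = left
        self.right = right


def build_balanced_kdtree(points, depth=0):
    """Build a balanced KD-tree by splitting on the median of the current axis."""
    if not points:
        return None
    axis = depth % len(points[0])          # cycle through the dimensions
    ordered = sorted(points, key=lambda p: p[axis])
    median = len(ordered) // 2             # the median point becomes the node
    return Node(
        ordered[median],
        build_balanced_kdtree(ordered[:median], depth + 1),
        build_balanced_kdtree(ordered[median + 1:], depth + 1),
    )


def height(node):
    """Number of levels in the tree rooted at node."""
    return 0 if node is None else 1 + max(height(node.left), height(node.right))
```

    Because each split halves the remaining points, the height grows logarithmically with the number of points, which is what keeps nearest-neighbor searches cheap.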

    CORRECTED MAJOR BUGS:

    • Bug with wrong data type for scores in Silhouette K-search algorithm in case of using C++ (Python: pyclustering.cluster.silhouette). See: https://github.com/annoviko/pyclustering/issues/606

    • Bug with a random distribution in the random center initializer (Python: pyclustering.cluster.center_initializer). See: https://github.com/annoviko/pyclustering/issues/573

    • Bug with incorrect conversion of Index List and Object List to Labeling when clusters do not contain one or more points from the input data (Python: pyclustering.cluster.encoder). See: https://github.com/annoviko/pyclustering/issues/596

    • Bug with an exception in case of using user-defined metric for K-Means algorithm (Python: pyclustering.cluster.kmeans). See: https://github.com/annoviko/pyclustering/pull/600

    • Memory leak in the interface between Python and C++ pyclustering library in case of CURE algorithm usage (C++: pyclustering). See: https://github.com/annoviko/pyclustering/issues/581
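
    The encoder fix above deals with points that belong to no cluster. A minimal sketch of converting an index list to a labeling while tolerating such points could look like this; the function name and the use of None as the marker for unassigned points are assumptions made for illustration, not the library's behavior.

```python
def index_list_to_labeling(clusters, amount_points, unassigned=None):
    """Convert clusters given as lists of point indexes to one label per point.

    Points that appear in no cluster keep the 'unassigned' marker.
    """
    labels = [unassigned] * amount_points
    for label, cluster in enumerate(clusters):
        for index in cluster:
            labels[index] = label
    return labels
```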

  • 0.10.0(Aug 17, 2020)

    pyclustering 0.10.0 library is a collection of clustering algorithms and methods, oscillatory networks, etc.

    GENERAL CHANGES:

    • Supported command test for setup.py script (Python: pyclustering). See: https://github.com/annoviko/pyclustering/issues/607

    • Introduced parameter random_seed for algorithms/models to control the seed of the random functionality: kmeans++, random_center_initializer, ga, gmeans, xmeans, som, somsc, elbow, silhouette_ksearch (Python: pyclustering.cluster; C++: pyclustering::clst). See: https://github.com/annoviko/pyclustering/issues/578

    • Introduced parameter k_max to G-Means algorithm to use it as an optional stop condition for the algorithm (Python: pyclustering.cluster.gmeans; C++: pyclustering::clst::gmeans). See: https://github.com/annoviko/pyclustering/issues/602

    • Implemented method save() for cluster_visualizer and cluster_visualizer_multidim to save visualization to file (Python: pyclustering.cluster). See: https://github.com/annoviko/pyclustering/issues/601

    • Optimization of CURE algorithm using balanced KD-tree (Python: pyclustering.cluster.cure; C++: pyclustering::clst::cure). See: https://github.com/annoviko/pyclustering/issues/589

    • Optimization of OPTICS algorithm using balanced KD-tree (Python: pyclustering.cluster.optics; C++: pyclustering::clst::optics). See: https://github.com/annoviko/pyclustering/issues/588

    • Optimization of DBSCAN algorithm using balanced KD-tree (Python: pyclustering.cluster.dbscan; C++: pyclustering::clst::dbscan). See: https://github.com/annoviko/pyclustering/issues/587

    • Implemented new optimized balanced KD-tree kdtree_balanced (Python: pyclustering.container.kdtree; C++: pyclustering::container::kdtree_balanced). See: https://github.com/annoviko/pyclustering/issues/379

    • Implemented KD-tree graphical visualizer kdtree_visualizer for KD-trees with 2-dimensional data (Python: pyclustering.container.kdtree). See: https://github.com/annoviko/pyclustering/issues/586

    • Updated interface of each clustering algorithm in C/C++ pyclustering: cluster_data is substituted by concrete classes (C++: pyclustering::clst). See: https://github.com/annoviko/pyclustering/issues/577
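
    The random_seed parameter introduced above makes randomized steps reproducible. The sketch below shows the general idea with a hypothetical random_centers helper that owns a private generator; it only illustrates the pattern and is not how the library implements seeding.

```python
import random


def random_centers(data, amount, random_seed=None):
    """Pick 'amount' distinct points from data as initial centers.

    A private generator is used, so the global random state stays untouched
    and the same seed always yields the same centers.
    """
    rng = random.Random(random_seed)
    return rng.sample(data, amount)
```

    Passing the same random_seed twice returns the same centers, which makes clustering experiments repeatable.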

    CORRECTED MAJOR BUGS:

    • Bug with wrong data type for scores in Silhouette K-search algorithm in case of using C++ (Python: pyclustering.cluster.silhouette). See: https://github.com/annoviko/pyclustering/issues/606

    • Bug with a random distribution in the random center initializer (Python: pyclustering.cluster.center_initializer). See: https://github.com/annoviko/pyclustering/issues/573

    • Bug with incorrect conversion of Index List and Object List to Labeling when clusters do not contain one or more points from the input data (Python: pyclustering.cluster.encoder). See: https://github.com/annoviko/pyclustering/issues/596

    • Bug with an exception in case of using user-defined metric for K-Means algorithm (Python: pyclustering.cluster.kmeans). See: https://github.com/annoviko/pyclustering/pull/600

    • Memory leak in the interface between Python and C++ pyclustering library in case of CURE algorithm usage (C++: pyclustering). See: https://github.com/annoviko/pyclustering/issues/581

  • 0.9.3.1(Dec 24, 2019)

  • 0.9.3(Dec 23, 2019)

    pyclustering 0.9.3 library is a collection of clustering algorithms and methods, oscillatory networks, etc.

    GENERAL CHANGES:

    • Introduced get_cf_clusters and get_cf_entries methods for BIRCH algorithm to get CF-entry encoding information (pyclustering.cluster.birch). See: https://github.com/annoviko/pyclustering/issues/569

    • Introduced predict method for SOMSC algorithm to find closest clusters for specified points (pyclustering.cluster.somsc). See: https://github.com/annoviko/pyclustering/issues/546

    • Parallel optimization of C++ pyclustering compilation process. See: https://github.com/annoviko/pyclustering/issues/553

    • Include folder for easy integration to other C++ projects. See: https://github.com/annoviko/pyclustering/issues/554

    • Introduced new targets to build static libraries on Windows platform. See: https://github.com/annoviko/pyclustering/issues/555

    • Introduced new targets to build static libraries on Linux/MacOS platforms. See: https://github.com/annoviko/pyclustering/issues/556

    CORRECTED MAJOR BUGS:

    • Bug with incorrect finding of closest CF-entry (pyclustering.container.cftree). See: https://github.com/annoviko/pyclustering/issues/564

    • Bug with incorrect BIRCH clustering due to incorrect leaf analysis (pyclustering.cluster.birch). See: https://github.com/annoviko/pyclustering/issues/563

    • Bug with incorrect search procedure of farthest nodes in CF-tree (pyclustering.container.cftree). See: https://github.com/annoviko/pyclustering/issues/551

    • Bug with crash during clustering with the same points in case of BIRCH (pyclustering.cluster.birch). See: https://github.com/annoviko/pyclustering/issues/561
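
    The BIRCH entries above revolve around CF-entries (clustering features). As background, a CF-entry in the BIRCH literature stores the number of points, their linear sum and their square sum, and two entries merge by simple addition. The sketch below illustrates that idea and a centroid-based search for the closest entry; the class and function names are hypothetical and do not mirror pyclustering.container.cftree.

```python
import math


class CFEntry:
    """Clustering feature: point count, linear sum and square sum."""

    def __init__(self, n, linear_sum, square_sum):
        self.n = n
        self.linear_sum = linear_sum
        self.square_sum = square_sum

    @classmethod
    def from_point(cls, point):
        return cls(1, list(point), sum(x * x for x in point))

    def merge(self, other):
        """CF-entries are additive: merging is a component-wise sum."""
        linear = [a + b for a, b in zip(self.linear_sum, other.linear_sum)]
        return CFEntry(self.n + other.n, linear, self.square_sum + other.square_sum)

    def centroid(self):
        return [s / self.n for s in self.linear_sum]


def centroid_distance(entry1, entry2):
    """Euclidean distance between the centroids of two CF-entries."""
    pairs = zip(entry1.centroid(), entry2.centroid())
    return math.sqrt(sum((a - b) ** 2 for a, b in pairs))


def closest_entry(entries, target):
    """Return the entry whose centroid is nearest to the target entry."""
    return min(entries, key=lambda entry: centroid_distance(entry, target))
```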

  • 0.9.2(Oct 10, 2019)

    pyclustering 0.9.2 library is a collection of clustering algorithms and methods, oscillatory networks, etc.

    GENERAL CHANGES:

    • Introduced checking of input arguments for clustering algorithm to provide human-readable errors (pyclustering.cluster). See: https://github.com/annoviko/pyclustering/issues/548

    • Implemented functionality to perform Anderson-Darling test for Gaussian distribution (ccore.stats). See: https://github.com/annoviko/pyclustering/issues/550

    • Implemented new clustering algorithm G-Means (pyclustering.cluster.gmeans, ccore.clst.gmeans). See: https://github.com/annoviko/pyclustering/issues/506

    • Introduced parameter repeat, which sets how many times K-Means is run to improve parameters in X-Means algorithm (pyclustering.cluster.xmeans, ccore.clst.xmeans). See: https://github.com/annoviko/pyclustering/issues/525

    • Introduced new distance metric: Gower (pyclustering.utils.metric, ccore.utils.metric). See: https://github.com/annoviko/pyclustering/issues/544

    • Introduced sampling algorithms reservoir_r and reservoir_x (pyclustering.utils.sampling). See: https://github.com/annoviko/pyclustering/issues/542

    • Introduced parameter data_type to Silhouette method to use distance matrix (pyclustering.cluster.silhouette, ccore.clst.silhouette). See: https://github.com/annoviko/pyclustering/issues/543

    • Optimization of HHN (Hodgkin-Huxley Neural Network) by parallel processing (ccore.nnet.hhn). See: https://github.com/annoviko/pyclustering/issues/541

    • Introduced get_total_wce method for xmeans algorithm to find WCE (pyclustering.cluster.xmeans). See: https://github.com/annoviko/pyclustering/issues/508
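
    Among the changes above are the sampling algorithms reservoir_r and reservoir_x. The following is a minimal sketch of the classic reservoir sampling "Algorithm R", which draws k items uniformly from a stream in one pass. The signature, including the random_seed argument, is an assumption for illustration and may differ from pyclustering.utils.sampling.

```python
import random


def reservoir_r(stream, k, random_seed=None):
    """Uniformly sample k items from an iterable in a single pass (Algorithm R)."""
    rng = random.Random(random_seed)
    reservoir = []
    for index, item in enumerate(stream):
        if index < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Replace a random slot with probability k / (index + 1).
            slot = rng.randint(0, index)   # both bounds are inclusive
            if slot < k:
                reservoir[slot] = item
    return reservoir
```

    If the stream holds fewer than k items, the whole stream is returned.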

    CORRECTED MAJOR BUGS:

    • Incorrect center initialization in K-Means++ when candidates are not farthest (pyclustering.cluster.center_initializer). See: https://github.com/annoviko/pyclustering/issues/549
  • 0.9.1(Sep 4, 2019)

    pyclustering 0.9.1 library is a collection of clustering algorithms and methods, oscillatory networks, neural networks, etc.

    GENERAL CHANGES:

    • Introduced predict method for X-Means algorithm to find closest clusters for particular points (pyclustering.cluster.xmeans). See: https://github.com/annoviko/pyclustering/issues/540

    • Optimization of OPTICS algorithm by reducing complexity (ccore.clst.optics). See: https://github.com/annoviko/pyclustering/issues/521

    • Optimization of K-Medians algorithm by parallel processing (ccore.clst.kmedians). See: https://github.com/annoviko/pyclustering/issues/529

    • Introduced predict method for K-Medoids algorithm to find closest clusters for particular points (pyclustering.cluster.kmedoids). See: https://github.com/annoviko/pyclustering/issues/527

    • Introduced predict method for K-Means algorithm to find closest clusters for particular points (pyclustering.cluster.kmeans). See: https://github.com/annoviko/pyclustering/issues/515

    • Parallel optimization of Elbow method (ccore.clst.elbow). See: https://github.com/annoviko/pyclustering/issues/511
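
    The predict methods introduced above find the closest cluster for new points. For center-based algorithms the idea reduces to a nearest-center lookup, sketched below with a hypothetical standalone function (the library exposes predict as a method of the algorithm objects, and its internals may differ).

```python
def predict(points, centers):
    """Return, for every point, the index of the nearest center."""

    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return [
        min(range(len(centers)), key=lambda i: squared_distance(point, centers[i]))
        for point in points
    ]
```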

  • 0.9.0(Apr 18, 2019)

    pyclustering 0.9.0 library is a collection of clustering algorithms and methods, oscillatory networks, neural networks, etc.

    GENERAL CHANGES:

    • CCORE (pyclustering core) is supported for MacOS. See: https://github.com/annoviko/pyclustering/issues/486

    • Introduced parallel Fuzzy C-Means algorithm (pyclustering.cluster.fcm, ccore.clst.fcm). See: https://github.com/annoviko/pyclustering/issues/386

    • Introduced new 'itermax' parameter for K-Means, K-Medians, K-Medoids algorithms to control the maximum number of iterations (pyclustering.cluster, ccore.clst). See: https://github.com/annoviko/pyclustering/issues/496

    • Implemented Silhouette and Silhouette K-Search algorithms for CCORE (ccore.clst.silhouette, ccore.clst.silhouette_ksearch). See: https://github.com/annoviko/pyclustering/issues/490

    • Implemented CLIQUE algorithm (pyclustering.cluster.clique, ccore.clst.clique). See: https://github.com/annoviko/pyclustering/issues/381

    • Introduced new distance metrics: Canberra and Chi Square (pyclustering.utils.metric, ccore.utils.metric). See: https://github.com/annoviko/pyclustering/issues/482

    • Optimization of CURE algorithm (C++ implementation) by using heap (multiset) instead of list to store clusters in queue (ccore.clst.cure). See: https://github.com/annoviko/pyclustering/issues/479
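
    The Canberra and Chi Square metrics introduced above can be sketched in a few lines. The version below uses one common form of each formula and skips coordinates where both values are zero; that zero handling is an assumption, and the library's exact definitions may differ.

```python
def canberra_distance(point1, point2):
    """Canberra distance: sum of |a - b| / (|a| + |b|) over all coordinates."""
    distance = 0.0
    for a, b in zip(point1, point2):
        divider = abs(a) + abs(b)
        if divider != 0.0:                 # skip 0/0 terms
            distance += abs(a - b) / divider
    return distance


def chi_square_distance(point1, point2):
    """Chi-square distance: sum of (a - b)^2 / (|a| + |b|) over all coordinates."""
    distance = 0.0
    for a, b in zip(point1, point2):
        divider = abs(a) + abs(b)
        if divider != 0.0:                 # skip 0/0 terms
            distance += (a - b) ** 2 / divider
    return distance
```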

    CORRECTED MAJOR BUGS:

    • Bug with crossover mask generation for genetic clustering algorithm (pyclustering.cluster.ga). See: https://github.com/annoviko/pyclustering/pull/474

    • Bug with hanging of K-Medians algorithm for some cases when algorithm is initialized by wrong amount of centers (ccore.clst.kmedians). See: https://github.com/annoviko/pyclustering/issues/498

    • Bug with incorrect center initialization, when the same point can be placed to result more than once (pyclustering.cluster.center_initializer, ccore.clst.kmeans_plus_plus). See: https://github.com/annoviko/pyclustering/issues/497

    • Bug with incorrect clustering in case of CURE python implementation when clusters are allocated incorrectly (pyclustering.cluster.cure). See: https://github.com/annoviko/pyclustering/issues/483

    • Bug with incorrect distance calculation for kmeans++ in case of index representation for centers (pyclustering.cluster.center_initializer). See: https://github.com/annoviko/pyclustering/issues/485

  • 0.8.2-joss(Apr 11, 2019)

    pyclustering 0.8.2-joss library is a collection of clustering algorithms and methods, oscillatory networks, neural networks, etc.

    It is a special release for JOSS (The Journal of Open Source Software). This version contains only cosmetic changes related to documentation and project description that were introduced after the JOSS review.

  • 0.8.2(Nov 19, 2018)

    pyclustering 0.8.2 library is a collection of clustering algorithms and methods, oscillatory networks, neural networks, etc.

    GENERAL CHANGES:

    • Implemented Silhouette method and Silhouette KSearcher to find out proper amount of clusters (pyclustering.cluster.silhouette). See: https://github.com/annoviko/pyclustering/issues/416

    • Introduced new 'return_index' parameter for kmeans_plus_plus and random_center_initializer algorithms (method 'initialize') to initialize initial medoids (pyclustering.cluster.center_initializer). See: https://github.com/annoviko/pyclustering/issues/421

    • Display warning instead of throwing error if matplotlib or Pillow cannot be imported (MAC OS X problems). See: https://github.com/annoviko/pyclustering/issues/455

    • Implemented Random Center Initializer for CCORE (ccore.clst.random_center_initializer). See: no reference.

    • Implemented Elbow method to find out proper amount of clusters in dataset (pyclustering.cluster.elbow, ccore.clst.elbow). See: https://github.com/annoviko/pyclustering/issues/416

    • Introduced new method 'get_optics_objects' for OPTICS algorithm to obtain detailed information about ordering (pyclustering.cluster.optics, ccore.clst.optics). See: https://github.com/annoviko/pyclustering/issues/464

    • Added new clustering answers for SAMPLE SIMPLE data collections (pyclustering.samples). See: https://github.com/annoviko/pyclustering/issues/459

    • Implemented multidimensional cluster visualizer (pyclustering.cluster). See: https://github.com/annoviko/pyclustering/issues/450

    • Parallel optimization of K-Medoids algorithm (ccore.clst.kmedoids). See: https://github.com/annoviko/pyclustering/issues/447

    • Parallel optimization of K-Means and X-Means (that uses K-Means) algorithms (ccore.clst.kmeans, ccore.clst.xmeans). See: https://github.com/annoviko/pyclustering/issues/451

    • Introduced new threshold parameter 'amount of block points' to BANG algorithm to allocate outliers more precisely (pyclustering.cluster.bang). See: https://github.com/annoviko/pyclustering/issues/446

    • Optimization of conveying results from C++ to Python for K-Medians and K-Medoids (pyclustering.cluster.kmedoids, pyclustering.cluster.kmedians). See: https://github.com/annoviko/pyclustering/issues/445

    • Implemented cluster generator (pyclustering.cluster.generator). See: https://github.com/annoviko/pyclustering/issues/444

    • Implemented BANG animator to render animation of clustering process (pyclustering.cluster.bang). See: https://github.com/annoviko/pyclustering/issues/442

    • Optimization of CURE algorithm by using Euclidean Square distance (pyclustering.cluster.cure, ccore.clst.cure). See: https://github.com/annoviko/pyclustering/issues/439

    • Supported numpy.ndarray points in KD-tree (pyclustering.container.kdtree). See: https://github.com/annoviko/pyclustering/issues/438
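
    The Silhouette method introduced above scores how well each point fits its cluster. A minimal sketch of the standard silhouette coefficient, s = (b - a) / max(a, b), follows. It assumes at least two non-empty clusters given as lists of point indexes and assigns 0 to points in single-point clusters; the function names are illustrative and not the library's interface.

```python
import math


def euclidean(p, q):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))


def silhouette_scores(data, clusters):
    """Silhouette coefficient per point; clusters are lists of point indexes."""
    scores = [0.0] * len(data)
    for own_id, own in enumerate(clusters):
        for i in own:
            if len(own) == 1:
                continue                   # convention: score 0 for singletons
            # a: mean distance to the other points of the same cluster
            a = sum(euclidean(data[i], data[j]) for j in own if j != i) / (len(own) - 1)
            # b: smallest mean distance to the points of any other cluster
            b = min(
                sum(euclidean(data[i], data[j]) for j in other) / len(other)
                for other_id, other in enumerate(clusters)
                if other_id != own_id and other
            )
            largest = max(a, b)
            scores[i] = 0.0 if largest == 0.0 else (b - a) / largest
    return scores
```

    Scores close to 1 indicate well-separated clusters, while negative scores suggest that a point sits closer to another cluster than to its own.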

    CORRECTED MAJOR BUGS:

    • Bug with clustering failure in case of non-numpy user defined metric for K-Means algorithm (pyclustering.cluster.kmeans). See: https://github.com/annoviko/pyclustering/issues/471

    • Bug with animation of correlation matrix in case of new versions of matplotlib (pyclustering.nnet.sync). See: no reference.

    • Bug with SOM and pickle when it was not possible to store and load network using pickle (pyclustering.nnet.som). See: https://github.com/annoviko/pyclustering/issues/456

    • Bug with DBSCAN when points are marked as a noise (pyclustering.cluster.dbscan). See: https://github.com/annoviko/pyclustering/issues/462

    • Bug with randomly enabled connection weights in case of SyncNet based algorithms using CCORE interface (pyclustering.nnet.syncnet). See: https://github.com/annoviko/pyclustering/issues/452

    • Bug with calculation weighted connection for Sync based clustering algorithms in C++ implementation (ccore.nnet.syncnet). See: no reference

    • Bug with failure in case of numpy.ndarray data type in python part of CURE algorithm (pyclustering.cluster.cure). See: https://github.com/annoviko/pyclustering/issues/438

    • Bug with BANG algorithm with empty dimensions - when data contains column with the same values (pyclustering.cluster.bang). See: https://github.com/annoviko/pyclustering/issues/449

  • 0.8.1(May 29, 2018)

    pyclustering 0.8.1 library is a collection of clustering algorithms, oscillatory networks, neural networks, etc.

    GENERAL CHANGES:

    • Implemented feature to use specific metric for distance calculation in K-Means algorithm (pyclustering.cluster.kmeans, ccore.clst.kmeans). See: https://github.com/annoviko/pyclustering/issues/434

    • Implemented BANG-clustering algorithm with result visualizer (pyclustering.cluster.bang). See: https://github.com/annoviko/pyclustering/issues/424

    • Implemented feature to use specific metric for distance calculation in K-Medians algorithm (pyclustering.cluster.kmedians, ccore.clst.kmedians). See: https://github.com/annoviko/pyclustering/issues/429

    • Supported new type of input data for K-Medoids - distance matrix (pyclustering.cluster.kmedoids, ccore.clst.kmedoids). See: https://github.com/annoviko/pyclustering/issues/418

    • Implemented TTSAS algorithm (pyclustering.cluster.ttsas, ccore.clst.ttsas). See: https://github.com/annoviko/pyclustering/issues/398

    • Implemented MBSAS algorithm (pyclustering.cluster.mbsas, ccore.clst.mbsas). See: https://github.com/annoviko/pyclustering/issues/398

    • Implemented BSAS algorithm (pyclustering.cluster.bsas, ccore.clst.bsas). See: https://github.com/annoviko/pyclustering/issues/398

    • Implemented feature to use specific metric for distance calculation in K-Medoids algorithm (pyclustering.cluster.kmedoids, ccore.clst.kmedoids). See: https://github.com/annoviko/pyclustering/issues/417

    • Implemented distance metric collection (pyclustering.utils.metric, ccore.utils.metric). See: no reference.

    • Supported new type of input data for OPTICS - distance matrix (pyclustering.cluster.optics, ccore.clst.optics). See: https://github.com/annoviko/pyclustering/issues/412

    • Supported new type of input data for DBSCAN - distance matrix (pyclustering.cluster.dbscan, ccore.clst.dbscan). See: no reference.

    • Implemented K-Means observer and visualizer to visualize and animate clustering results (pyclustering.cluster.kmeans, ccore.clst.kmeans). See: no reference.
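
    BSAS, MBSAS and TTSAS, introduced above, belong to the family of sequential clustering schemes. A minimal sketch of the basic scheme (BSAS) is shown below: each point either joins the nearest cluster or opens a new one when it is farther than a threshold and the cluster limit is not reached. The signature and the use of running means as representatives are assumptions for illustration, not the library's implementation.

```python
import math


def bsas(data, max_clusters, threshold):
    """Basic Sequential Algorithmic Scheme over a list of points.

    Returns (clusters, representatives), where clusters hold point indexes
    and each representative is the running mean of its cluster.
    """
    if not data:
        return [], []

    def distance(p, q):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))

    clusters = [[0]]
    representatives = [list(data[0])]
    for index in range(1, len(data)):
        point = data[index]
        distances = [distance(point, rep) for rep in representatives]
        nearest = min(range(len(distances)), key=distances.__getitem__)
        if distances[nearest] > threshold and len(clusters) < max_clusters:
            # Too far from every cluster: open a new one.
            clusters.append([index])
            representatives.append(list(point))
        else:
            # Join the nearest cluster and update its running mean.
            clusters[nearest].append(index)
            size = len(clusters[nearest])
            representatives[nearest] = [
                (rep * (size - 1) + x) / size
                for rep, x in zip(representatives[nearest], point)
            ]
    return clusters, representatives
```

    The result depends on the order in which points arrive, which is a known property of sequential schemes.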

    CORRECTED MAJOR BUGS:

    • Bug with out of range in K-Medians (pyclustering.cluster.kmedians, ccore.clst.kmedians). See: https://github.com/annoviko/pyclustering/issues/428

    • Bug with fast linking in PCNN (python implementation only) that wasn't used despite the corresponding option (pyclustering.nnet.pcnn). See: https://github.com/annoviko/pyclustering/issues/419

  • 0.8.0(Feb 23, 2018)

    pyclustering 0.8.0 library is a collection of clustering algorithms, oscillatory networks, neural networks, etc.

    GENERAL CHANGES:

    • Optimization of K-Means++ algorithm using numpy (pyclustering.cluster.center_initializer). See: no reference.

    • Implemented K-Means++ initializer for CCORE (ccore.clst.kmeans_plus_plus). See: https://github.com/annoviko/pyclustering/issues/382

    • Optimization of X-Means clustering process by using KMeans++ for initial centers of split regions (pyclustering.cluster.xmeans, ccore.clst.xmeans). See: https://github.com/annoviko/pyclustering/issues/382

    • Implemented parallel Sync-family algorithms for C/C++ implementation (CCORE) only (ccore.sync). See: https://github.com/annoviko/pyclustering/issues/170

    • C/C++ implementation is used by default to increase performance. See: https://github.com/annoviko/pyclustering/issues/393

    • Ignore 'ccore' flag to use C/C++ if platform is not supported (pyclustering.core). See: https://github.com/annoviko/pyclustering/issues/393

    • Optimization of python implementation of the K-Means algorithm using numpy (pyclustering.cluster.kmeans). See: https://github.com/annoviko/pyclustering/issues/403

    • Implemented dynamic visualizer for oscillatory networks (pyclustering.nnet.dynamic_visualizer). See: no reference.

    • Implemented C/C++ Hodgkin-Huxley oscillatory network for image segmentation in CCORE to increase performance (ccore.hhn, pyclustering.nnet.hhn). See: https://github.com/annoviko/pyclustering/issues/217

    • Performance optimization for CCORE on linux platform. See: no reference.

    • 32-bit platform of CCORE is supported for Linux OS. See: https://github.com/annoviko/pyclustering/issues/253

    • 32-bit platform of CCORE is supported for Windows OS. See: https://github.com/annoviko/pyclustering/issues/253

    • Implemented method 'get_probabilities()' to obtain membership probabilities in the EM algorithm (pyclustering.cluster.ema). See: https://github.com/annoviko/pyclustering/issues/387

    • Python implementation of CURE algorithm method 'get_clusters()' returns list of indexes (pyclustering.cluster.cure). See: https://github.com/annoviko/pyclustering/issues/384

    • Implemented parallel processing for X-Means algorithm (ccore.clst.xmeans). See: https://github.com/annoviko/pyclustering/issues/372

    • Implemented pool threads for parallel processing (ccore.parallel). See: https://github.com/annoviko/pyclustering/issues/383

    • Optimization of OPTICS algorithm using KD-tree for searching nearest neighbors (pyclustering.cluster.optics, ccore.optics). See: https://github.com/annoviko/pyclustering/issues/370

    • Optimization of DBSCAN algorithm using KD-tree for searching nearest neighbors (pyclustering.cluster.dbscan, ccore.dbscan). See: https://github.com/annoviko/pyclustering/issues/369

    CORRECTED MAJOR BUGS:

    • Incorrect type of medoid's index in K-Medoids algorithm in case of Python 2.x (pyclustering.cluster.kmedoids). See: https://github.com/annoviko/pyclustering/issues/415

    • Hanging of method 'find_node' in KD-tree if it does not contain node with specified point and payload (pyclustering.container.kdtree). See: no reference.

    • Incorrect clustering by CURE algorithm in some cases when data have a lot of identical points (pyclustering.cluster.cure). See: https://github.com/annoviko/pyclustering/issues/414

    • Segmentation fault in CURE algorithm in some cases when data have a lot of identical points (ccore.clst.cure). See: no reference.

    • Incorrect segmentation by Python version of syncsegm - oscillatory network based on sync for image segmentation (pyclustering.nnet.syncsegm). See: https://github.com/annoviko/pyclustering/issues/409

    • Zero value of sigma under logarithm function in Python version of pyclustering X-Means algorithm (pyclustering.cluster.xmeans). See: https://github.com/annoviko/pyclustering/issues/407

    • Amplitude threshold is ignored during synchronous ensembles allocation for amplitude output dynamic 'allocate_sync_ensembles' - affects HHN, LEGION (pyclustering.utils). See: no reference.

    • Wrong indexes can be returned during synchronous ensembles allocation for amplitude output dynamic 'allocate_sync_ensembles' - affects HHN, LEGION (pyclustering.utils). See: no reference.

    • Amount of allocated clusters can differ from amount of centers in X-Means algorithm (ccore.clst.xmeans). See: https://github.com/annoviko/pyclustering/issues/389

    • Amount of allocated clusters can be bigger than kmax in X-Means algorithm (pyclustering.cluster.xmeans, ccore.clst.xmeans). See: https://github.com/annoviko/pyclustering/issues/388

    • Corrected bug with returned nullptr in method 'kdtree_searcher::find_nearest_node()' (ccore.container.kdtree). See: no reference.

  • 0.7.2(Oct 23, 2017)

    pyclustering 0.7.2 library is a collection of clustering algorithms, oscillatory networks, neural networks, etc.

    GENERAL CHANGES (pyclustering):

    • Correction for setup failure with PKG-INFO.rst.
  • 0.7.1(Oct 19, 2017)

    pyclustering 0.7.1 library is a collection of clustering algorithms, oscillatory networks, neural networks, etc.

    GENERAL CHANGES (pyclustering):

    • Metadata of the package is updated.
  • 0.7.0(Oct 16, 2017)

    pyclustering 0.7.0 library is a collection of clustering algorithms, oscillatory networks, neural networks, etc.

    GENERAL CHANGES (pyclustering):

    • Implemented Expectation-Maximization clustering algorithm for Gaussian Mixture Model and clustering visualizer for this particular algorithm (pyclustering.cluster.ema). See: https://github.com/annoviko/pyclustering/issues/16

    • Implemented Genetic Clustering Algorithm (GCA) and clustering visualizer for this particular algorithm (pyclustering.cluster.ga). See: https://github.com/annoviko/pyclustering/issues/360

    • Implemented feature to obtain and visualize evolution of order parameter and local order parameter for Sync network and Sync-based algorithms (pyclustering.nnet.sync). See: https://github.com/annoviko/pyclustering/issues/355

    • Implemented K-Means++ method for initialization of initial centers for algorithms like K-Means or X-Means (pyclustering.cluster.center_initializer). See: https://github.com/annoviko/pyclustering/issues/354

    • Implemented fSync oscillatory network that is based on Landau-Stuart equation and Kuramoto model (pyclustering.nnet.fsync). See: https://github.com/annoviko/pyclustering/issues/168

    • Optimization of pyclustering client for the core library 'CCORE' (pyclustering.core). See: https://github.com/annoviko/pyclustering/issues/289 See: https://github.com/annoviko/pyclustering/issues/351

    • Implemented feature to show network structure of Sync family oscillatory networks when 'ccore' is used. See: https://github.com/annoviko/pyclustering/issues/344

    • Implemented feature to colorize OPTICS ordering diagram when amount of clusters is specified. See: no reference.

    • Improved clustering results when the MNDL splitting criterion is used for small datasets. See: https://github.com/annoviko/pyclustering/issues/328

    • Feature to display connectivity radius on cluster-ordering diagram by ordering_visualizer (pyclustering.cluster.optics). See: https://github.com/annoviko/pyclustering/issues/314

    • Feature to use CCORE implementation of OPTICS algorithm to take advantage of its performance (pyclustering.cluster.optics). See: https://github.com/annoviko/pyclustering/issues/120

    • Implemented feature to show animation of the pattern recognition process performed by the SyncPR oscillatory network. Method 'animate_pattern_recognition()' of class 'syncpr_visualizer' (pyclustering.nnet.syncpr). See: https://www.youtube.com/watch?v=Ro7KbApL4MQ See: https://www.youtube.com/watch?v=iIusOsGehoY

    • Implemented feature to obtain nodes of specified level of CF-tree. Method 'get_level_nodes()' of class 'cftree' (pyclustering.container.cftree). See: no reference.

    • Implemented feature to allocate/display/animate phase matrix: 'allocate_phase_matrix()', 'show_phase_matrix()', 'animate_phase_matrix()' (pyclustering.nnet.sync). See: no reference.

    • Implemented chaotic neural network where clustering phenomenon can be observed: 'cnn_network', 'cnn_dynamic', 'cnn_visualizer' (pyclustering.nnet.cnn). See: https://github.com/annoviko/pyclustering/issues/301

    • Implemented feature to analyse ordering diagram using the amount of clusters that should be allocated as an input parameter to calculate the correct connectivity radius for clustering (pyclustering.cluster.optics). See: https://github.com/annoviko/pyclustering/issues/307

    • Implemented feature to omit usage of initial centers - X-Means starts processing from random initial center (pyclustering.cluster.xmeans). See: no reference.

    • Implemented feature for cluster visualizer: cluster attributes (pyclustering.cluster). See: https://github.com/annoviko/pyclustering/issues/295

    • Implemented SOM-SC algorithm (SOM Simple Clustering) (pyclustering.cluster.somsc). See: https://github.com/annoviko/pyclustering/issues/321
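
    The K-Means++ initializer introduced above picks initial centers that are spread out over the data. A minimal sketch of the usual scheme follows: the first center is chosen uniformly, and each next one is drawn with probability proportional to the squared distance to the nearest center chosen so far. The function name and the random_seed argument are assumptions for illustration and do not mirror pyclustering.cluster.center_initializer.

```python
import random


def kmeans_plus_plus(data, amount, random_seed=None):
    """Choose 'amount' initial centers from data using K-Means++ weighting."""
    rng = random.Random(random_seed)

    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    centers = [data[rng.randrange(len(data))]]
    while len(centers) < amount:
        # Weight of a point: squared distance to its nearest chosen center.
        weights = [min(squared_distance(p, c) for c in centers) for p in data]
        if sum(weights) == 0:
            raise ValueError("not enough distinct points to choose centers from")
        # Already chosen points have weight 0, so they cannot be drawn again.
        centers.append(rng.choices(data, weights=weights, k=1)[0])
    return centers
```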

    GENERAL CHANGES (ccore):

    • Implemented feature to obtain and visualize evolution of order parameter and local order parameter for Sync network and Sync-based algorithms (ccore.nnet.sync). See: https://github.com/annoviko/pyclustering/issues/355

    • Cygwin x64 platform is supported (ccore). See: https://github.com/annoviko/pyclustering/issues/353

    • Optimization of CCORE library interface (ccore.interface). See: https://github.com/annoviko/pyclustering/issues/289

    • Implemented MNDL splitting crinterion for X-Means algorithm (ccore.cluster_analysis.xmeans). See: https://github.com/annoviko/pyclustering/issues/159

    • Implemented OPTICS algorithm and interface for client that results all clustering results (ccore.cluster_analysis.optics). See: https://github.com/annoviko/pyclustering/issues/120

    • Implemented packing of connectivity matrix of Sync family oscillatory networks (ccore.interface.sync_interface). See: https://github.com/annoviko/pyclustering/issues/344

    CORRECTED MAJOR BUGS:

    • Bug with segmentation fault during 'free()' on some linux operating systems. See: no reference.

    • Bug with sending the first element to cluster in OPTICS even if it is noise element. See: no reference.

    • Bug with amount of allocated clusters by K-Medoids algorithm in Python implementation and CCORE (pyclustering.cluster.kmedoids, ccore.cluster.medoids). See: https://github.com/annoviko/pyclustering/issues/366 See: https://github.com/annoviko/pyclustering/issues/367

    • Bug with getting neighbors and getting information about connections in Sync-based network and algorithms when CCORE is used. See: no reference.

    • Bug with calculation of number of oscillations for output dynamics. See: no reference.

    • Memory leakage in LEGION in case of CCORE usage - API function 'legion_destroy()' was not called (pyclustering.nnet.legion). See: no reference.

    • Bug with crash of antmeans algorithm for python version 3.6.0:414df79263a11, Dec 23 2016 [MSC v.1900 64 bit (AMD64)] (pyclustering.cluster.antmeans). See: https://github.com/annoviko/pyclustering/issues/350

    • Memory leakage in destructor of 'pyclustering_package' - exchange mechanism between ccore and pyclustering (ccore.interface.pyclustering_package). See: https://github.com/annoviko/pyclustering/issues/347

    • Bug with loss of the initial state of hSync output dynamic in case of CCORE usage (ccore.cluster.hsyncnet). See: https://github.com/annoviko/pyclustering/issues/346

    • Bug with hSync output dynamic that was displayed with discontinuous parts as a set of rectangles (pyclustering.cluster.hsyncnet). See: https://github.com/annoviko/pyclustering/issues/345

    • Bug with visualization of CNN network in case of 3D data (pyclustering.nnet.cnn). See: https://github.com/annoviko/pyclustering/issues/338

    • Bug with CCORE wrapper crashing after returning value from CCORE (pyclustering.core). See: https://github.com/annoviko/pyclustering/issues/337

    • Bug with calculation of BIC splitting criterion for X-Means algorithm (pyclustering.cluster.xmeans). See: https://github.com/annoviko/pyclustering/issues/326

    • Bug with calculation of MNDL splitting criterion for X-Means algorithm (pyclustering.cluster.xmeans). See: https://github.com/annoviko/pyclustering/issues/328

    • Bug with loss of CF-nodes in CF-tree during insertion that leads to an unbalanced CF-tree (pyclustering.container.cftree). See: https://github.com/annoviko/pyclustering/issues/304

    • Bug with time stamps for each iteration in hsyncnet algorithm (ccore.cluster.hsyncnet). See: https://github.com/annoviko/pyclustering/issues/306

    • Bug with memory occupation by CCORE DBSCAN implementation due to adjacency matrix usage (ccore.cluster.dbscan). See: https://github.com/annoviko/pyclustering/issues/309

    • Bug with CURE: always finds max two representative points (pyclustering.cluster.cure). See: https://github.com/annoviko/pyclustering/issues/310

    • Bug with infinite loop in case of incorrect number of clusters 'ordering_analyser' (pyclustering.cluster.optics). See: https://github.com/annoviko/pyclustering/issues/317

    • Bug with incorrect connectivity radius for allocation of the specified amount of clusters in 'ordering_analyser' (pyclustering.cluster.optics). See: https://github.com/annoviko/pyclustering/issues/316

    • Bug where clusters are allocated in a homogeneous ordering in 'ordering_analyser' (pyclustering.cluster.optics). See: https://github.com/annoviko/pyclustering/issues/315

  • 0.6.6 (Oct 7, 2016)

    pyclustering 0.6.6 library is a collection of clustering algorithms, oscillatory networks, neural networks, etc.

    GENERAL CHANGES (pyclustering):

    • Implemented phase oscillatory network syncpr (pyclustering.nnet.syncpr). See: https://github.com/annoviko/pyclustering/issues/208
    • Implemented feature for pyclustering.nnet.syncpr that allows the ccore library to be used for solving. See: https://github.com/annoviko/pyclustering/issues/232
    • Optimized simulation algorithm for sync oscillatory network (pyclustering.nnet.sync) when collection of results is not requested. See: https://github.com/annoviko/pyclustering/issues/233
    • Added images of the English alphabet (100x100). See: https://github.com/annoviko/pyclustering/commit/aa28f1a8a363fbeb5f074d22ec1e8258a1dd0579
    • Implemented feature to use rectangular network structures in oscillatory networks. See: https://github.com/annoviko/pyclustering/issues/259
    • Implemented CLARANS algorithm (pyclustering.cluster.clarans). See: https://github.com/annoviko/pyclustering/issues/52
    • Implemented feature to analyse and visualize results of hysteresis oscillatory network (pyclustering.nnet.hysteresis). See: https://github.com/annoviko/pyclustering/issues/75
    • Implemented feature to analyse and visualize results of graph coloring algorithm based on hysteresis oscillatory network (pyclustering.gcolor.hysteresis). See: https://github.com/annoviko/pyclustering/issues/75
    • Implemented ant colony based algorithm for TSP problem (pyclustering.tsp.antcolony). See: https://github.com/annoviko/pyclustering/pull/277
    • Implemented feature to use CCORE K-Medians algorithm using argument 'ccore' to ensure high performance (pyclustering.cluster.kmedians). See: https://github.com/annoviko/pyclustering/issues/231
    • Implemented feature to place several plots on each row using parameter 'maximum number of rows' for cluster visualizer (pyclustering.cluster). See: https://github.com/annoviko/pyclustering/issues/274
    • Implemented feature to specify initial number of neighbors to calculate initial connectivity radius and increase percent of number of neighbors (or radius if total number of object is exceeded) on each step (pyclustering.cluster.hsyncnet). See: https://github.com/annoviko/pyclustering/issues/284
    • Implemented double-layer oscillatory network based on modified Kuramoto model for image segmentation (pyclustering.nnet.syncsegm). See: no reference
    • Added new examples and demos. See: no reference
    • Implemented feature to use CCORE K-Medoids algorithm using argument 'ccore' to ensure high performance (pyclustering.cluster.kmedoids). See: https://github.com/annoviko/pyclustering/issues/230
    • Implemented feature for CURE algorithm that provides additional information about clustering results: representative points and mean point of each cluster (pyclustering.cluster.cure). See: https://github.com/annoviko/pyclustering/issues/292
    • Implemented feature to animate analysed output dynamic of Sync family oscillatory networks (sync_visualizer, syncnet_visualizer): correlation matrix, phase coordinates, cluster allocation (pyclustering.nnet.sync, pyclustering.cluster.syncnet). See: https://www.youtube.com/watch?v=5S5mFYVihso See: https://www.youtube.com/watch?v=Vd-ww9PcZvI See: https://www.youtube.com/watch?v=QYPqWoyNHO8 See: https://www.youtube.com/watch?v=RA0MiC2WlbY
    • Improved algorithm SYNC-SOM: accuracy of clustering and calculation are improved in line with the proof of concept, where connections between oscillators in the second layer (represented by the self-organized feature map) should be created using the classical radius as in SyncNet, but indirectly: if objects that correspond to two different neurons can be connected, then the neurons should also be connected with each other (pyclustering.cluster.syncsom). See: https://github.com/annoviko/pyclustering/issues/297

    GENERAL CHANGES (ccore):

    • Implemented phase oscillatory network for pattern recognition syncpr (ccore.cluster.syncpr). See: https://github.com/annoviko/pyclustering/issues/232
    • Implemented agglomerative algorithm for cluster analysis (ccore.cluster.agglomerative). See: https://github.com/annoviko/pyclustering/issues/212
    • Implemented feature to use rectangular network structures in oscillatory networks. See: https://github.com/annoviko/pyclustering/issues/259
    • Implemented ant colony based algorithm for TSP problem (ccore.tsp.antcolony). See: https://github.com/annoviko/pyclustering/pull/277
    • Implemented K-Medians algorithm for cluster analysis (ccore.cluster.kmedians). See: https://github.com/annoviko/pyclustering/issues/231
    • Implemented feature to specify initial number of neighbors to calculate initial connectivity radius and increase percent of number of neighbors (or radius if total number of object is exceeded) on each step (ccore.cluster.hsyncnet). See: https://github.com/annoviko/pyclustering/issues/284
    • Implemented K-Medoids algorithm for cluster analysis (ccore.cluster.kmedoids). See: https://github.com/annoviko/pyclustering/issues/230
    • Implemented feature for CURE algorithm that provides additional information about clustering results: representative points and mean point of each cluster (ccore.cluster.cure). See: https://github.com/annoviko/pyclustering/issues/293
    • Implemented new collection of classes for constructing oscillatory and neural networks. See: https://github.com/annoviko/pyclustering/issues/264
    • Memory usage optimization for ROCK algorithm. See: no reference

    CORRECTED MAJOR BUGS:

    • Bug with callback methods in ccore library in syncnet (ccore.cluster.syncnet) and hsyncnet (ccore.cluster.hsyncnet) that may lead to loss of accuracy.
    • Bug with division by zero in kmeans algorithm (ccore.kmeans, pyclustering.cluster.kmeans) when cluster after center updating is not able to capture object. See: https://github.com/annoviko/pyclustering/issues/238
    • Bug with stack overflow in KD tree in case of big data (pyclustering.container.kdtree, ccore.container.kdtree). See: https://github.com/annoviko/pyclustering/pull/239 See: https://github.com/annoviko/pyclustering/issues/255 See: https://github.com/annoviko/pyclustering/issues/254
    • Bug with incorrect clustering in case of the same elements in cure algorithm (pyclustering.cluster.cure). See: https://github.com/annoviko/pyclustering/pull/239
    • Bug with execution fail in case of wrong number of initial medians and in case of the same objects with several initial medians (pyclustering.cluster.kmedians). See: https://github.com/annoviko/pyclustering/issues/256
    • Bug with calculation of synchronous ensembles near zero: oscillators with phases 2*pi and 0 are considered as different (pyclustering.nnet.sync, ccore.nnet.sync). See: https://github.com/annoviko/pyclustering/issues/263
    • Bug with cluster allocation in kmedoids algorithm in case of the same objects with several initial medoids (pyclustering.cluster.kmedoids). See: https://github.com/annoviko/pyclustering/issues/269
    • Bug with visualization of clusters in 3D (pyclustering.cluster). See: https://github.com/annoviko/pyclustering/issues/273
    • Bug with obtaining nearest entry for absorbing during inserting node (pyclustering.container.cftree). See: https://github.com/annoviko/pyclustering/issues/282
    • Bug with SOM method show_network() when CCORE is used (pyclustering.nnet.som). See: https://github.com/annoviko/pyclustering/issues/283
    • Bug with cluster allocation in case of switched off dynamic collecting (pyclustering.cluster.hsyncnet). See: https://github.com/annoviko/pyclustering/issues/285
    • Bug with execution fail during clustering data with rough values of initial medians (pyclustering.cluster.kmedians). See: https://github.com/annoviko/pyclustering/issues/286
    • Bug with memory leakage on interface between CCORE and pyclustering (ccore). See: no reference
    • Bug with allocation of correlation matrix when CCORE is used (pyclustering.nnet.sync). See: https://github.com/annoviko/pyclustering/issues/288
    • Bug with memory leakage in CURE algorithm - deallocation of representative points (ccore.cluster.cure). See: https://github.com/annoviko/pyclustering/issues/294
    • Bug with cluster visualization in case of 1D input data (pyclustering.cluster). See: https://github.com/annoviko/pyclustering/issues/296
    • Bug with loss of CF-nodes in CF-tree during insertion that leads to an unbalanced CF-tree (pyclustering.container.cftree). See: https://github.com/annoviko/pyclustering/issues/304
    • Bug with time stamps for each iteration in hsyncnet algorithm (ccore.cluster.hsyncnet). See: https://github.com/annoviko/pyclustering/issues/306
    • Bug with memory occupation by CCORE DBSCAN implementation due to adjacency matrix usage (ccore.cluster.dbscan). See: https://github.com/annoviko/pyclustering/issues/309
    • Bug with CURE: always finds max two representative points (pyclustering.cluster.cure). See: https://github.com/annoviko/pyclustering/issues/310
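The synchronous-ensemble fix listed above (phases 2*pi and 0 treated as different) comes down to comparing phases on a circle instead of on a line. The following is a minimal sketch of that idea, not the library's code: `phase_distance`, `allocate_ensembles`, and the `tolerance` parameter are hypothetical names chosen for illustration.

```python
import math


def phase_distance(a, b):
    """Shortest angular distance between two phases, in the range [0, pi]."""
    diff = abs(a - b) % (2.0 * math.pi)
    return min(diff, 2.0 * math.pi - diff)


def allocate_ensembles(phases, tolerance=0.1):
    """Group oscillator indexes whose phases lie within tolerance of the
    first member of an ensemble, measured on the circle."""
    ensembles = []
    for index, phase in enumerate(phases):
        for ensemble in ensembles:
            if phase_distance(phase, phases[ensemble[0]]) <= tolerance:
                ensemble.append(index)
                break
        else:
            # No existing ensemble is close enough: start a new one.
            ensembles.append([index])
    return ensembles


# Phases 0 and (2*pi - 0.01) are neighbours on the circle.
phases = [0.0, 2.0 * math.pi - 0.01, math.pi, math.pi + 0.02]
ensembles = allocate_ensembles(phases)
```

With a plain `abs(a - b)` comparison the first two oscillators would fall into separate ensembles, which is the behaviour the bug report describes.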
Owner
Andrei Novikov
PhD in Computer Science. Software Scientist at ThermoFisher Scientific.