Clustering is a popular approach to detect patterns in unlabeled data

Last update: Nov 11, 2022

Related tags

Overview

Visual Clustering

Clustering is a popular approach to detect patterns in unlabeled data. Existing clustering methods typically treat samples in a dataset as points in a metric space and compute distances to group together similar points. Visual Clustering a different way of clustering points in 2-dimensional space, inspired by how humans "visually" cluster data. The algorithm is based on trained neural networks that perform instance segmentation on plotted data.

For more details, see the accompanying paper: "Clustering Plotted Data by Image Segmentation", arXiv preprint, and please use the citation below.

@article{naous2021clustering,
  title={Clustering Plotted Data by Image Segmentation},
  author={Naous, Tarek and Sarkar, Srinjay and Abid, Abubakar and Zou, James},
  journal={arXiv preprint arXiv:2110.05187},
  year={2021}
}

Installation

pip install visual-clustering

Usage

The algorithm can be used the same way as the classical clustering algorithms in scikit-learn:
You first import the class VisualClustering and create an instance of it.

from visual_clustering import VisualClustering

model = VisualClustering(median_filter_size = 1, max_filter_size= 1)

The parameters median_filter_size and max_filter_size are set to 1 by default.
You can experiment with different values to see what works best for your dataset !

Let's create a simple synthetic dataset of blobs.

from sklearn import datasets

data = datasets.make_blobs(n_samples=50000, centers=6, random_state=23,center_box=(-30, 30))
plt.scatter(data[0][:, 0], data[0][:, 1], s=1, c='black')

To cluster the dataset, use the fit function of the model:

predictions = model.fit(data[0])

Visualizing the results

You can visualize the results using matplotlib as you would normally do with classical clustering algorithms:

import matplotlib.pyplot as plt
from itertools import cycle, islice
import numpy as np

colors = np.array(list(islice(cycle(["#000000", '#377eb8', '#ff7f00', '#4daf4a', '#f781bf', '#a65628', '#984ea3']), int(max(predictions) + 1))))
#Black color for outliers (if any)
colors = np.append(colors, ["#000000"])
plt.scatter(data[0][:, 0], data[0][:, 1], s=10, color=colors[predictions.astype('int8')])

Run this code inside a colab notebook:
https://colab.research.google.com/drive/1DcZXhKnUpz1GDoGaJmpS6VVNXVuaRmE5?usp=sharing

Dependencies

Make sure that you have the following libraries installed:

transformers 4.15.0
scipy 1.4.1
tensorflow 2.7.0
keras 2.7.0
numpy 1.19.5
cv2 4.1.2
skimage 0.18.3

Contact

Clustering is a popular approach to detect patterns in unlabeled data

Related tags

Overview

Visual Clustering

Installation

Usage

Visualizing the results

Dependencies

Contact

Owner

Tarek Naous

9th place solution in "Santa 2020 - The Candy Cane Contest"

Code for paper "Learning to Reweight Examples for Robust Deep Learning"

Implementation of gaze tracking and demo

Title: Heart-Failure-Classification

A deep-learning pipeline for segmentation of ambiguous microscopic images.

NER for Indian languages

Code for paper "ASAP-Net: Attention and Structure Aware Point Cloud Sequence Segmentation"

[ICCV2021] Official code for "Channel-wise Topology Refinement Graph Convolution for Skeleton-Based Action Recognition"

Plotting points that lie on the intersection of the given curves using gradient descent.

[ICLR2021] Unlearnable Examples: Making Personal Data Unexploitable

True Few-Shot Learning with Language Models

Winning solution of the Indoor Location & Navigation Kaggle competition

learning and feeling SLAM together with hands-on-experiments

Implementation of ICCV 2021 oral paper -- A Novel Self-Supervised Learning for Gaussian Mixture Model

Mahadi-Now - This Is Pakistani Just Now Login Tools

Satellite labelling tool for manual labelling of storm top features such as overshooting tops, above-anvil plumes, cold U/Vs, rings etc.

The Generic Manipulation Driver Package - Implements a ROS Interface over the robotics toolbox for Python

Implementation of Shape Generation and Completion Through Point-Voxel Diffusion

Cascading Feature Extraction for Fast Point Cloud Registration (BMVC 2021)

The codes reproduce the figures and statistics in the paper, "Controlling for multiple covariates," by Mark Tygert.