MiniSom is a minimalistic implementation of the Self Organizing Maps

Overview

MiniSom

Self Organizing Maps

MiniSom is a minimalistic, NumPy-based implementation of Self Organizing Maps (SOM). A SOM is a type of Artificial Neural Network able to convert complex, nonlinear statistical relationships between high-dimensional data items into simple geometric relationships on a low-dimensional display. MiniSom is designed to allow researchers to easily build on top of it and to give students the ability to quickly grasp its details.

Updates about MiniSom are posted on Twitter.

Installation

Just use pip:

pip install minisom

or download MiniSom to a directory of your choice and use the setup script:

git clone https://github.com/JustGlowing/minisom.git
cd minisom
python setup.py install

How to use it

In order to use MiniSom you need your data organized as a NumPy matrix where each row corresponds to an observation, or as a list of lists like the following:

data = [[ 0.80,  0.55,  0.22,  0.03],
        [ 0.82,  0.50,  0.23,  0.03],
        [ 0.80,  0.54,  0.22,  0.03],
        [ 0.80,  0.53,  0.26,  0.03],
        [ 0.79,  0.56,  0.22,  0.03],
        [ 0.75,  0.60,  0.25,  0.03],
        [ 0.77,  0.59,  0.22,  0.03]]      

Then you can train MiniSom just as follows:

from minisom import MiniSom    
som = MiniSom(6, 6, 4, sigma=0.3, learning_rate=0.5) # initialization of 6x6 SOM
som.train(data, 100) # trains the SOM with 100 iterations

You can obtain the position of the winning neuron on the map for a given sample as follows:

som.winner(data[0])
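
As a rough illustration of what the winning neuron is, the sketch below finds the grid position whose weight vector is closest to a sample. It assumes a Euclidean distance and uses random weights as a stand-in for a trained map; the function name `winner` here is a standalone helper written for this example, not MiniSom's own code.

```python
import numpy as np

def winner(weights, x):
    """Return the (row, col) grid position of the unit closest to x."""
    # weights has shape (rows, cols, n_features); x has shape (n_features,)
    distances = np.linalg.norm(weights - x, axis=-1)
    return np.unravel_index(np.argmin(distances), distances.shape)

rng = np.random.default_rng(0)
weights = rng.random((6, 6, 4))   # assumption: stand-in for a trained 6x6 map
sample = weights[2, 3].copy()     # a sample identical to the unit at (2, 3)
position = winner(weights, sample)
print(position)
```

Because the sample coincides with the weight vector at (2, 3), that unit has distance zero and is returned as the winner.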

For an overview of all the features implemented in minisom you can browse the following examples: https://github.com/JustGlowing/minisom/tree/master/examples

Export a SOM and load it again

A model can be saved using pickle as follows:

import pickle
som = MiniSom(7, 7, 4)

# ...train the som here

# saving the som in the file som.p
with open('som.p', 'wb') as outfile:
    pickle.dump(som, outfile)

and can be loaded as follows:

with open('som.p', 'rb') as infile:
    som = pickle.load(infile)

Note that if a lambda function is used to define the decay factor, MiniSom will not be picklable anymore.
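
The limitation comes from pickle itself, which stores functions by name and cannot look up an anonymous lambda. A minimal sketch using only the standard library; the decay formula and the helper `is_picklable` are hypothetical and only serve the illustration:

```python
import pickle

def is_picklable(obj):
    """Return True if obj survives pickle.dumps, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False

# Hypothetical decay written as a lambda: pickle cannot find it by name.
lambda_decay = lambda learning_rate, t, max_iter: learning_rate / (1 + t / max_iter)

print(is_picklable(lambda_decay))   # the lambda is rejected
print(is_picklable([0.5, 0.3]))     # plain data pickles fine
```

Defining the decay as a named, module-level function instead of a lambda is the usual way to keep an object picklable.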

Explore parameters

You can use this dashboard to explore the effect of the parameters on a sample dataset: https://share.streamlit.io/justglowing/minisom/dashboard/dashboard.py

Examples

Here are some of the charts you'll see how to generate in the examples:

  • Seeds map
  • Class assignment
  • Handwritten digits mapping
  • Hexagonal topology
  • Color quantization
  • Outliers detection

Other tutorials

How to cite MiniSom

@misc{vettigliminisom,
  title={MiniSom: minimalistic and NumPy-based implementation of the Self Organizing Map},
  author={Giuseppe Vettigli},
  year={2018},
  url={https://github.com/JustGlowing/minisom/},
}

Who uses MiniSom?

Guidelines to contribute

  1. In the description of your Pull Request, explain clearly what it implements or fixes and what you changed. If possible, give an example in the description of the PR. If the PR is about a code speedup, report a reproducible example and quantify the speedup.
  2. Give your pull request a helpful title that summarises what your contribution does.
  3. Write unit tests for your code and make sure the existing tests are up to date. pytest can be used for this:
pytest minisom.py
  4. Make sure that there are no stylistic issues using pycodestyle:
pycodestyle minisom.py
  5. Make sure your code is properly commented and documented. Each public method needs to be documented like the existing ones.
Comments
  • Introducing possibility to train the SOM so that learning_rate and sigma are constant during one epoch.

    This pull request introduces the possibility to train the SOM so that learning_rate and sigma are only being decreased after each epoch. During one epoch the SOM is updated once per given input vector (=len(data) times) with constant learning_rate and sigma. This should lead to a greater independence between the order of the input vectors and the resulting SOM.

    In order to use this feature, one only has to use train_epochs() instead of train().

    learning_rate and sigma could (should?) technically be updated only once every epoch, but in order to change as little code as possible those parameters are still updated every time update() gets called (but with constant parameters during one epoch). This could be 'optimised' if desired.

    opened by jriege555 22
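
The idea of keeping a parameter constant within an epoch can be sketched as follows. This is an illustration only, not the code of the pull request: the decay formula is an assumed asymptotic form, and `per_epoch_schedule` is a hypothetical helper.

```python
def asymptotic_decay(initial_value, t, max_iter):
    """Assumed decay form: the value is halved at the midpoint of training."""
    return initial_value / (1 + t / (max_iter / 2))

def per_epoch_schedule(initial_value, n_samples, n_epochs):
    """Value used at every update when the decay only advances once per epoch."""
    return [asymptotic_decay(initial_value, epoch, n_epochs)
            for epoch in range(n_epochs)
            for _ in range(n_samples)]

schedule = per_epoch_schedule(0.5, n_samples=3, n_epochs=2)
print(schedule)   # [0.5, 0.5, 0.5, 0.25, 0.25, 0.25]
```

Every sample in the same epoch is presented with the same value, so the order of the samples within an epoch no longer changes how strongly each one is weighted.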
  • Fixed topographic_error() and quantization_error()

    Problems:

    • The previous topographic_error() method is incorrect. bmu_1 and bmu_2 are not the coordinates of the best two matching units.
    • The previous topographic_error() and quantization_error() uses explicit for-loops, which is very slow.

    Fixes:

    • Fixed incorrect implementation of topographic_error() method.
    • Changed the topographic_error() and quantization_error() methods with vectorized implementation.
    opened by wei-zhang-thz 17
  • quantization error (theoretical question)

    I have a question about the interpretability of the quantization error.

    How can we know that the SOM is reliable? Does the quantization error need to be lower than a certain value?

    For example, in my case, I have a quantization error of 7.0, which is quite high in comparison to the example given in the documentation. Does that mean my SOM is not reliable?

    question 
    opened by lachhebo 13
  • Do you know why nodes change completely when I reran the same setup with varying number of iterations?

    Hey :-)

    First of all thank you for providing this tool, it seems very handy! I am using SOM with geopotential height anomalies over a given region as input variables to cluster meteorological circulation patterns (ca. 2000 observations). What is really strange is that the SOM nodes differ completely when I rerun the same setup with more iterations (e.g. doubling from 10000 to 20000). It produces nodes not only in a different order, but also such that have no analogue in the new SOM... Is there anything I am doing wrong?

    Thank you very much - below some details about the setup

    The example I am using most often is sigma=1 (Gaussian), lr=0.5, SOM sizes between 2x4 to 4x5. The problem occurs no matter the initialization (pca or random) and no matter the training (single, batch, random). My code is basically only:

    SOM

    som = MiniSom(som_m, som_n, ndims, sigma=sigma, learning_rate=lr, neighborhood_function='gaussian')
    som.pca_weights_init(somarr)
    som.train_batch(somarr, 10000, verbose=True)

    ...

    plot

    for m in range(som_m):
        for n in range(som_n):
            ax...
            pltarr = som.get_weights()[m,n,:].reshape((nsomlat,nsomlon))
            p = ax.contourf(somlons,somlats,pltarr,cmap='seismic', transform=ccrs.PlateCarree())

    question 
    opened by michel039 12
  • Vectorized the _activate function

    Great library, but I noticed that the training code for your SOMs is not vectorized. You use the fast_norm function a lot, which may be faster than linalg.norm for 1D arrays, but iterating over every spot in the SOM is a lot slower than just calling linalg.norm.

    This pull request replaces fast_norm with linalg.norm in 2 places where I saw iteration over the whole SOM. Some simple testing with a 100x100 SOM showed ~40x speedup on my laptop.

    After making the changes, the unit tests failed, which I believe is caused by incorrectly setting up the testing weights as a 2D array rather than a 3D array. So I changed that too, and now the unit tests pass. I also did a few rough tests of my own, and the results of self.winner(x) and the training seem to be the same as before.

    opened by AustinT 11
  • Time Series

    Hello! I am trying to use my time series data for the example uploaded, but I encounter this error when initializing pca. Also, the second image is the error that I encounter when I use random initialization.

    [image: error when initializing with PCA] [image: error when using random initialization]

    opened by jaybhiesantos 10
  • How to cluster images?

    I would like to know how to cluster images instead of reading CSV I want to read all images from disk and cluster those images using SOM.

    Can you please share some examples?

    opened by balavenkatesh3322 10
  • Example: Hexagonal Topology bokeh

    Summary

    This branch addresses https://github.com/JustGlowing/minisom/issues/86 by adding to the existing examples/HexagonalTopology.ipynb notebook an interactive bokeh example of the equivalent matplotlib plot.

    The purpose of adding interactivity was so that further exploration could be conducted on the plot to see where the original data points are mapped to in the SOM space.

    Check

    • [x] This branch adds value to the main repository, so it is worthwhile to include.
    • [x] The bokeh plot is equivalent to the matplotlib plot.
    • [ ] The code is error free and works on your machine.
    • [x] The logic of showing data points in the hover tooltip is sound.

    Note

    This "closes #86".

    opened by avisionh 10
  • speed up in update method

    Hi! Thanks for sharing the library! I noticed that if you change the loop in the update method with an einsum operation you can speed up the training by some amount. Hope you find it useful. Christos

    opened by Sourmpis 10
  • Add topographic error calculation for hexagonal grid

    This PR adds the functionality for Topographic Error calculation, computed by finding the first-best-matching and second-best-matching neurons in the hexagonal grid.

    [image: screenshot of the topographic error equation]

    The topographic error calculation is based on the above equation, which considers if the first-best-matching and second-best-matching neurons are neighbors in the SOM grid.

    opened by TharindaDilshan 9
  • new visualizations

    Hi, I have implemented a number of visualizations in the BasicUsage file. Additionally, I made some minor changes (mainly typos) in some other files. As this is my first use of GitHub, I do not know how to separate both topics and make two pull requests... I hope this works out!

    opened by bijae 9
  • Topographic error wrong for hexagonal topography with rectangular grid

    Hi,

    I am trying to get the topographic error from a SOM with 11x7 neurons, hexagonal topography.

    When I do, I get this error:

         21     return (-1, -1)
         22 y = som._weights.shape[1]
    ---> 23 coords = som.convert_map_to_euclidean((index % y, int(index/y)))
         24 return coords
    
    File ~/.local/lib/python3.8/site-packages/minisom.py:243, in MiniSom.convert_map_to_euclidean(self, xy)
        237 def convert_map_to_euclidean(self, xy):
        238     """Converts map coordinates into euclidean coordinates
        239     that reflects the chosen topology.
        240 
        241     Only useful if the topology chosen is not rectangular.
        242     """
    --> 243     return self._xx.T[xy], self._yy.T[xy]
    
    IndexError: index 8 is out of bounds for axis 1 with size 7
    

    I don't think this line of code makes sense:

    coords = som.convert_map_to_euclidean((index % y, int(index/y)))

    Shouldn't the parameters be inverted, e.g.:

    coords = som.convert_map_to_euclidean((int(index/y), index % y))

    Anyway, thanks for the amazing work!

    bug 
    opened by mbarison 6
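
The index arithmetic discussed in this report can be checked in isolation. The sketch below assumes the flat index comes from a row-major flattening of the 11x7 grid; the value 61 is a hypothetical index chosen for illustration, not one taken from the report.

```python
import numpy as np

x, y = 11, 7                 # grid shape used in the report
index = 61                   # hypothetical flat index into the 11x7 grid

row, col = index // y, index % y      # row-major convention
swapped = (index % y, index // y)     # order used in the failing line

print((row, col), swapped)
print(np.unravel_index(index, (x, y)))
```

With the swapped order, the second coordinate becomes 8, which exceeds the 7 columns of the grid and is consistent with the IndexError shown in the traceback. `np.unravel_index` avoids writing the arithmetic by hand.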
  • Matching Matlab hyperparameters

    Hi there! Thank you for this great work!

    I switched from the Matlab version of SOM to Python. However, I found that the results were quite different: I could get a perfect 100% in Matlab but somehow only get a 19% F1-score here.

    The only thing I changed from the default settings in Matlab is using a 10*10 map. These are my settings in MiniSom:

    som = MiniSom(10, 10, 4096, sigma=1.5, learning_rate=0.7, activation_distance='euclidean', neighborhood_function='gaussian', topology='hexagonal', random_seed=10)

    Any suggestions so I could maybe recreate the result from Matlab?

    Thank you in advance!

    question 
    opened by AmousQiu 3
  • Is there a way to obtain a distance of each point to its BMU?

    Hi, first and foremost, thank you for your great work and for making the SOM algorithm so convenient to use. I wanted to ask whether it is possible to obtain a list of the distances between each point and its Best Matching Unit (node) on a trained SOM grid. I have read the documentation and seen the different attributes of the SOM object, but it appears to me that none of them returns the (Euclidean) distance to the BMU. Thanks in advance for your support!

    question 
    opened by JMiklaszewski 1
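
One possible way to compute this outside the library is sketched below with plain NumPy. It assumes Euclidean distance and a weight array of shape (rows, cols, n_features), such as the one returned by som.get_weights(); the helper `bmu_and_distance` is hypothetical and random weights stand in for a trained map.

```python
import numpy as np

def bmu_and_distance(weights, data):
    """Return, per sample, the BMU grid position and the distance to it."""
    rows, cols, n_features = weights.shape
    flat = weights.reshape(rows * cols, n_features)
    # Pairwise Euclidean distances, shape (n_samples, rows * cols)
    dists = np.linalg.norm(data[:, None, :] - flat[None, :, :], axis=-1)
    bmu_flat = dists.argmin(axis=1)
    positions = np.column_stack(np.unravel_index(bmu_flat, (rows, cols)))
    return positions, dists[np.arange(len(data)), bmu_flat]

rng = np.random.default_rng(0)
weights = rng.random((4, 5, 3))      # assumption: stand-in for som.get_weights()
data = weights[[0, 3], [1, 4]]       # two samples equal to units (0, 1) and (3, 4)
positions, distances = bmu_and_distance(weights, data)
print(positions, distances)
```

The same function also yields the BMU position for every sample, so it covers the case where only the first best matching unit is wanted.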
  • Is there an option to obtain the BMU value directly?

    Hi there,

    I am trying to use BMU values as a metric to classify my data. The features are seismic attributes. Your function “distance_from_weights” was my first guess, but it does not export the BMUs directly. We have to manipulate it to remove the second BMU.

    np.argsort(distance_from_weights(data), axis=1)[:, :2] -----> np.argsort(distance_from_weights(data), axis=1)[:, :1]

    Would you mind building that function?

    question 
    opened by akol67 1
  • Wrong value in topographic error function?

    So a topographic error occurs when the two BMUs of a sample are not adjacent. Shouldn't t = 1 then? If the BMUs are two hops apart in a corner, their Euclidean distance is sqrt(2) = 1.4142. So with distance > 1.42 this doesn't count as an error. Or am I missing something?

    question 
    opened by SandroMartens 0
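
The effect of the threshold can be checked with a few lines of standard-library Python. This is a sketch of the comparison described in the question, not the library's implementation; `is_topographic_error` is a hypothetical helper.

```python
import math

def is_topographic_error(bmu_1, bmu_2, t):
    """Flag a sample whose two best units are farther apart than t on the grid."""
    return math.dist(bmu_1, bmu_2) > t

diagonal = ((0, 0), (1, 1))     # grid distance sqrt(2)
two_apart = ((0, 0), (0, 2))    # grid distance 2

for t in (1.0, 1.42):
    print(t, is_topographic_error(*diagonal, t), is_topographic_error(*two_apart, t))
```

With t = 1.42 diagonal units count as neighbours, while with t = 1 only units sharing an edge do; units two steps apart along a row are flagged under both thresholds.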
  • Example spatio-temporal climate data

    This pull request adds an example notebook that applies a SOM to climate data, which is usually a 2D field evolving in time (time, lat, lon).

    I've been looking a lot into SOM examples, and it's hard to find examples on climate data...so I hope this notebook can help future users (and also me, if you find something wrong on the use).

    For the example, I've used the tutorial dataset from Xarray.

    opened by carocamargo 2
Releases(2.3.0)
Owner
Giuseppe Vettigli
Data Scientist, teaching fellow, Python enthusiast, fearless visionarist, lateral thinker.