TriMap: Large-scale Dimensionality Reduction Using Triplets

Related tags

Deep Learningtrimap
Overview

TriMap

TriMap is a dimensionality reduction method that uses triplet constraints to form a low-dimensional embedding of a set of points. The triplet constraints are of the form "point i is closer to point j than point k". The triplets are sampled from the high-dimensional representation of the points and a weighting scheme is used to reflect the importance of each triplet.

TriMap provides a significantly better global view of the data than the other dimensionality reduction methods such t-SNE, LargeVis, and UMAP. The global structure includes relative distances of the clusters, multiple scales in the data, and the existence of possible outliers. We define a global score to quantify the quality of an embedding in reflecting the global structure of the data.

CIFAR-10 dataset (test set) passed through a CNN (n = 10,000, d = 1024): Notice the semantic structure unveiled by TriMap.

Visualizations of the CIFAR-10 dataset

The following implementation is in Python. Further details and more experimental results are available in the paper.

How to use TriMap

TriMap has a transformer API similar to other sklearn libraries. To use TriMap with the default parameters, simply do:

import trimap
from sklearn.datasets import load_digits

digits = load_digits()

embedding = trimap.TRIMAP().fit_transform(digits.data)

To find the embedding using a precomputed pairwise distance matrix D, pass D as input and set use_dist_matrix to True:

embedding = trimap.TRIMAP(use_dist_matrix=True).fit_transform(D)

You can also pass the precomputed k-nearest neighbors and their corresponding distances as a tuple (knn_nbrs, knn_distances). Note that the rows must be in order, starting from point 0 to n-1. This feature also requires X to compute the embedding

embedding = trimap.TRIMAP(knn_tuple=(knn_nbrs, knn_distances)).fit_transform(X)

To calculate the global score, do:

gs = trimap.TRIMAP(verbose=False).global_score(digits.data, embedding)
print("global score %2.2f" % gs)

Parameters

The list of parameters is given blow:

  • n_dims: Number of dimensions of the embedding (default = 2)
  • n_inliers: Number of nearest neighbors for forming the nearest neighbor triplets (default = 10).
  • n_outliers: Number of outliers for forming the nearest neighbor triplets (default = 5).
  • n_random: Number of random triplets per point (default = 5).
  • distance: Distance measure ('euclidean' (default), 'manhattan', 'angular', 'hamming')
  • weight_adj: The value of gamma for the log-transformation (default = 500.0).
  • lr: Learning rate (default = 1000.0).
  • n_iters: Number of iterations (default = 400).

The other parameters include:

  • knn_tuple: Use the precomputed nearest-neighbors information in form of a tuple (knn_nbrs, knn_distances) (default = None)
  • use_dist_matrix: Use the precomputed pairwise distance matrix (default = False)
  • apply_pca: Reduce the number of dimensions of the data to 100 if necessary before applying the nearest-neighbor search (default = True).
  • opt_method: Optimization method {'sd' (steepest descent), 'momentum' (GD with momentum), 'dbd' (delta-bar-delta, default)}.
  • verbose: Print the progress report (default = True).
  • return_seq: Store the intermediate results and return the results in a tensor (default = False).

An example of adjusting these parameters:

import trimap
from sklearn.datasets import load_digits

digits = load_digits()

embedding = trimap.TRIMAP(n_inliers=20,
                          n_outliers=10,
                          n_random=10,
                          weight_adj=1000.0).fit_transform(digits.data)

The nearest-neighbor calculation is performed using ANNOY.

Examples

The following are some of the results on real-world datasets. The values of nearest-neighbor accuracy and global score are shown as a pair (NN, GS) on top of each figure. For more results, please refer to our paper.

USPS Handwritten Digits (n = 11,000, d = 256)

Visualizations of the USPS dataset

20 News Groups (n = 18,846, d = 100)

Visualizations of the 20 News Groups dataset

Tabula Muris (n = 53,760, d = 23,433)

Visualizations of the Tabula Muris Mouse Tissues dataset

MNIST Handwritten Digits (n = 70,000, d = 784)

Visualizations of the MNIST dataset

Fashion MNIST (n = 70,000, d = 784)

Visualizations of the  Fashion MNIST dataset

TV News (n = 129,685, d = 100)

Visualizations of the  TV News dataset

Runtime of t-SNE, LargeVis, UMAP, and TriMap in the hh:mm:ss format on a single machine with 2.6 GHz Intel Core i5 CPU and 16 GB of memory is given in the following table. We limit the runtime of each method to 12 hours. Also, UMAP runs out of memory on datasets larger than ~4M points.

Runtime of TriMap compared to other methods

Installing

Requirements:

  • numpy
  • scikit-learn
  • numba
  • annoy

Installing annoy

If you are having trouble with installing annoy on macOS using the command:

pip3 install annoy

you can alternatively try:

pip3 install git+https://github.com/sutao/[email protected]

Install Options

If you have all the requirements installed, you can use pip:

sudo pip install trimap

Please regularly check for updates and make sure you are using the most recent version. If you have TriMap installed and would like to upgrade to the newer version, you can use the command:

sudo pip install --upgrade --force-reinstall trimap

An alternative is to install the dependencies manually using anaconda and using pip to install TriMap:

conda install numpy
conda install scikit-learn
conda install numba
conda install annoy
pip install trimap

For a manual install get this package:

wget https://github.com/eamid/trimap/archive/master.zip
unzip master.zip
rm master.zip
cd trimap-master

Install the requirements

sudo pip install -r requirements.txt

or

conda install scikit-learn numba annoy

Install the package

python setup.py install

Support and Contribution

This implementation is still a work in progress. Any comments/suggestions/bug-reports are highly appreciated. Please feel free contact me at: [email protected]. If you would like to contribute to the code, please fork the project and send me a pull request.

Citation

If you use TriMap in your publications, please cite our current reference on arXiv:

@article{2019TRIMAP,
     author = {{Amid}, Ehsan and {Warmuth}, Manfred K.},
     title = "{TriMap: Large-scale Dimensionality Reduction Using Triplets}",
     journal = {arXiv preprint arXiv:1910.00204},
     archivePrefix = "arXiv",
     eprint = {1910.00204},
     year = 2019,
}

License

Please see the LICENSE file.

Owner
Ehsan Amid
Research Scientist at Google Mountain View
Ehsan Amid
Jaxtorch (a jax nn library)

Jaxtorch (a jax nn library) This is my jax based nn library. I created this because I was annoyed by the complexity and 'magic'-ness of the popular ja

nshepperd 17 Dec 08, 2022
This project uses ViT to perform image classification tasks on DATA set CIFAR10.

Vision-Transformer-Multiprocess-DistributedDataParallel-Apex Introduction This project uses ViT to perform image classification tasks on DATA set CIFA

Kaicheng Yang 3 Jun 03, 2022
Dynamic Graph Event Detection

DyGED Dynamic Graph Event Detection Get Started pip install -r requirements.txt TODO Paper link to arxiv, and how to cite. Twitter Weather dataset tra

Mert Koşan 3 May 09, 2022
Implementation EfficientDet: Scalable and Efficient Object Detection in PyTorch

Implementation EfficientDet: Scalable and Efficient Object Detection in PyTorch

tonne 1.4k Dec 29, 2022
Yolov5+SlowFast: Realtime Action Detection Based on PytorchVideo

Yolov5+SlowFast: Realtime Action Detection A realtime action detection frame work based on PytorchVideo. Here are some details about our modification:

WuFan 181 Dec 30, 2022
Given a 2D triangle mesh, we could randomly generate cloud points that fill in the triangle mesh

generate_cloud_points Given a 2D triangle mesh, we could randomly generate cloud points that fill in the triangle mesh. Run python disp_mesh.py Or you

Peng Yu 2 Dec 24, 2021
CONditionals for Ordinal Regression and classification in PyTorch

CONDOR pytorch implementation for ordinal regression with deep neural networks. Documentation: https://GarrettJenkinson.github.io/condor_pytorch About

7 Jul 25, 2022
ICNet and PSPNet-50 in Tensorflow for real-time semantic segmentation

Real-Time Semantic Segmentation in TensorFlow Perform pixel-wise semantic segmentation on high-resolution images in real-time with Image Cascade Netwo

Oles Andrienko 219 Nov 21, 2022
This is the face keypoint train code of project face-detection-project

face-key-point-pytorch 1. Data structure The structure of landmarks_jpg is like below: |--landmarks_jpg |----AFW |------AFW_134212_1_0.jpg |------AFW_

I‘m X 3 Nov 27, 2022
A comprehensive and up-to-date developer education platform for Urbit.

curriculum A comprehensive and up-to-date developer education platform for Urbit. This project organizes developer capabilities into a hierarchy of co

Sigilante 36 Oct 04, 2022
Unofficial implementation of Proxy Anchor Loss for Deep Metric Learning

Proxy Anchor Loss for Deep Metric Learning Unofficial pytorch, tensorflow and mxnet implementations of Proxy Anchor Loss for Deep Metric Learning. Not

Geonmo Gu 3 Jun 09, 2021
⚡ H2G-Net for Semantic Segmentation of Histopathological Images

H2G-Net This repository contains the code relevant for the proposed design H2G-Net, which was introduced in the manuscript "Hybrid guiding: A multi-re

André Pedersen 8 Nov 24, 2022
Implementation of ToeplitzLDA for spatiotemporal stationary time series data.

Code for the ToeplitzLDA classifier proposed in here. The classifier conforms sklearn and can be used as a drop-in replacement for other LDA classifiers. For in-depth usage refer to the learning from

Jan Sosulski 5 Nov 07, 2022
Fashion Landmark Estimation with HRNet

HRNet for Fashion Landmark Estimation (Modified from deep-high-resolution-net.pytorch) Introduction This code applies the HRNet (Deep High-Resolution

SVIP Lab 91 Dec 26, 2022
Implementation of: "Exploring Randomly Wired Neural Networks for Image Recognition"

RandWireNN Unofficial PyTorch Implementation of: Exploring Randomly Wired Neural Networks for Image Recognition. Results Validation result on Imagenet

Seung-won Park 684 Nov 02, 2022
I will implement Fastai in each projects present in this repository.

DEEP LEARNING FOR CODERS WITH FASTAI AND PYTORCH The repository contains a list of the projects which I have worked on while reading the book Deep Lea

Thinam Tamang 43 Dec 20, 2022
Convolutional Neural Network for 3D meshes in PyTorch

MeshCNN in PyTorch SIGGRAPH 2019 [Paper] [Project Page] MeshCNN is a general-purpose deep neural network for 3D triangular meshes, which can be used f

Rana Hanocka 1.4k Jan 04, 2023
Cold Brew: Distilling Graph Node Representations with Incomplete or Missing Neighborhoods

Cold Brew: Distilling Graph Node Representations with Incomplete or Missing Neighborhoods Introduction Graph Neural Networks (GNNs) have demonstrated

37 Dec 15, 2022
A CNN model to detect hand gestures.

Software Used python - programming language used, tested on v3.8 miniconda - for managing virtual environment Libraries Used opencv - pip install open

Shivanshu 6 Jul 14, 2022
A library for efficient similarity search and clustering of dense vectors.

Faiss Faiss is a library for efficient similarity search and clustering of dense vectors. It contains algorithms that search in sets of vectors of any

Meta Research 18.8k Jan 08, 2023