Neighbourhood Retrieval with Distance Correlation

Assign pseudo class labels to data points in the latent space.

NNDC is a slim wrapper around FAISS.
It transforms the input space so that the FAISS inner-product index (IndexFlatIP) computes the Distance Correlation between vectors.
It also supports KernelPCA (non-linear PCA) for dimensionality reduction.
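
To make the idea concrete, here is a minimal NumPy sketch of one transformation under which a plain inner product equals the squared sample distance correlation. It is an illustration under stated assumptions, not necessarily how NNDC implements it: the helper names dc_embed and distance_correlation_sq are hypothetical, and each vector's components are treated as the samples.

```python
import numpy as np

def _centered_distances(x):
    # Pairwise absolute differences between components, then double-centering
    # (subtract row means and column means, add back the grand mean).
    x = np.asarray(x, dtype=np.float64)
    a = np.abs(x[:, None] - x[None, :])
    return a - a.mean(axis=0, keepdims=True) - a.mean(axis=1, keepdims=True) + a.mean()

def dc_embed(x):
    # Hypothetical helper: flatten the centered distance matrix and scale it
    # to unit length, so an inner product acts like a correlation.
    flat = _centered_distances(x).ravel()
    norm = np.linalg.norm(flat)
    return flat / norm if norm > 0 else flat

def distance_correlation_sq(x, y):
    # Direct computation of the squared sample distance correlation, for comparison.
    a, b = _centered_distances(x), _centered_distances(y)
    denom = np.sqrt((a * a).mean() * (b * b).mean())
    return float((a * b).mean() / denom) if denom > 0 else 0.0

rng = np.random.default_rng(0)
x, y = rng.normal(size=16), rng.normal(size=16)
print(float(dc_embed(x) @ dc_embed(y)), distance_correlation_sq(x, y))
```

Note that this embedding has in_dim squared components, which is one reason reducing the dimensionality first can matter.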

Installation

pip install git+https://github.com/The-Learning-Machines/nndc

Usage

import nndc
import numpy as np

dim = 128
n = 20000

index = nndc.DCIndex(
    in_dim=dim, # Dimensionality of the input vectors
    threshold=0.2, # How far from a vector its neighbourhood extends
    out_dim=32, # Dimensionality of the vectors after PCA (only needed if using PCA)
    use_pca=True, # Use KernelPCA
    verbose=True,
    kernel="rbf" # Use Radial Basis Function as the kernel for KernelPCA
)

# Generate Random data
np.random.seed(1234)             
xb = np.random.random((n, dim)).astype('float32')
xb[:, 0] += np.arange(n) / 1000.
xq = np.random.random((100, dim)).astype('float32')
xq[:, 0] += np.arange(100) / 1000.

# Fit KernelPCA
index.add_pca_training_data(xb[:1000, :])
index.fit_pca()

# Add vectors to the Index
vector_ids = np.arange(xb.shape[0])
index.add(xb, vector_ids)

# Build a neighbourhood graph
index.build_neighbourhood()

# Query the neighbours of vector with ID=0
neighbour_ids, neighbour_similarity = index[0]
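
For intuition about what a thresholded neighbourhood lookup returns, the following brute-force NumPy sketch may help. It is not NNDC's code: the function threshold_neighbourhoods is hypothetical, and it assumes that rows are unit-norm embeddings and that the threshold is a maximum distance measured as 1 minus the similarity, which is a guess at the parameter's meaning.

```python
import numpy as np

def threshold_neighbourhoods(embeddings, threshold):
    # Hypothetical helper. For each row, collect the other rows whose
    # distance (1 - inner-product similarity) is at most `threshold`.
    sims = embeddings @ embeddings.T
    neighbourhoods = {}
    for i in range(sims.shape[0]):
        mask = (1.0 - sims[i]) <= threshold
        mask[i] = False  # a vector is not its own neighbour
        ids = np.flatnonzero(mask)
        order = np.argsort(-sims[i, ids])  # most similar first
        neighbourhoods[i] = (ids[order], sims[i, ids][order])
    return neighbourhoods

# Three unit-norm vectors: the first two are close, the third is far from both.
emb = np.array([[1.0, 0.0], [0.9, np.sqrt(1 - 0.81)], [0.0, 1.0]])
neighbour_ids, neighbour_similarity = threshold_neighbourhoods(emb, 0.2)[0]
print(neighbour_ids, neighbour_similarity)
```

A brute-force similarity matrix like this needs memory quadratic in the number of vectors, so it is only suitable for small examples.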

Owner

The Learning Machines
