Torch-based tool for quantizing high-dimensional vectors using additive codebooks

Last update: Jan 07, 2023

Overview

Trainable multi-codebook quantization

This repository implements a PyTorch utility, ideally run on GPUs, for training an efficient quantizer based on multiple single-byte codebooks. The typical scenario is that you have a distribution over vectors of some dimension, say 512, perhaps coming from a neural net embedding, and you want to encode each vector into a short sequence of bytes (say, 4 or 8 bytes) from which the vector can be reconstructed with minimal expected loss, measured as squared distance, i.e. squared l2 loss.
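As a rough illustration of the idea above, the sketch below shows how a short byte code can index several codebooks whose selected entries are summed to approximate a vector. The function names `additive_decode` and `greedy_encode`, the random (untrained) codebooks, and the greedy residual search are all assumptions made for illustration; they are not this repository's API or its training method.

```python
import torch

def additive_decode(codebooks: torch.Tensor, codes: torch.Tensor) -> torch.Tensor:
    """codebooks: (num_codebooks, 256, dim); codes: (N, num_codebooks), uint8.
    Returns (N, dim): the sum of one selected entry per codebook."""
    num_codebooks = codebooks.shape[0]
    idx = codes.long()  # uint8 tensors cannot be used directly as indices
    # Pick entry codes[n, k] from codebook k for every n, k.
    selected = codebooks[torch.arange(num_codebooks), idx]  # (N, num_codebooks, dim)
    return selected.sum(dim=1)

def greedy_encode(codebooks: torch.Tensor, x: torch.Tensor) -> torch.Tensor:
    """Hypothetical greedy encoder: for each codebook in turn, choose the entry
    closest to the current residual, then subtract it."""
    residual = x.clone()
    codes = []
    for cb in codebooks:  # cb: (256, dim)
        nearest = torch.cdist(residual, cb).argmin(dim=1)  # (N,)
        codes.append(nearest)
        residual = residual - cb[nearest]
    return torch.stack(codes, dim=1).to(torch.uint8)  # one byte per codebook

torch.manual_seed(0)
dim, num_codebooks = 16, 4
codebooks = torch.randn(num_codebooks, 256, dim)  # untrained, for shape illustration only
x = torch.randn(8, dim)
codes = greedy_encode(codebooks, x)            # (8, 4), dtype=uint8
x_approx = additive_decode(codebooks, codes)   # (8, 16)
```

With 4 codebooks of 256 entries each, every vector is stored in 4 bytes. A trained quantizer would learn the codebook entries so that the reconstruction error is small for the data distribution; the random codebooks here only demonstrate the shapes involved.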

This repository provides a Quantizer object that performs this quantization, and an associated QuantizerTrainer object that you use to train the Quantizer. For example, you might train the QuantizerTrainer on 20,000 minibatches of vectors.

Usage

Installation

python3 setup.py install

Example

import torch
import quantization

trainer = quantization.QuantizerTrainer(dim=256, bytes_per_frame=4,
                                        device=torch.device('cuda'))
while not trainer.done():
    # x must be a tensor of shape (*, dim) to train on, and should differ
    # on each minibatch. The random tensor below is only a placeholder:
    # replace it with a fresh minibatch of your own vectors.
    x = torch.randn(1024, 256, device=torch.device('cuda'))
    trainer.step(x)
quantizer = trainer.get_quantizer()

# x is again some tensor of shape (*, dim)
encoded = quantizer.encode(x)  # (*, 4), dtype=uint8
x_approx = quantizer.decode(encoded)
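To judge how well x_approx reconstructs x, one option is to compute the squared l2 error relative to the energy of the input. The helper below is a minimal sketch; the name `relative_squared_error` is hypothetical and not part of this package.

```python
import torch

def relative_squared_error(x: torch.Tensor, x_approx: torch.Tensor) -> float:
    """Sum of squared reconstruction errors divided by the sum of squares of x.
    0.0 means perfect reconstruction; 1.0 is what an all-zero output gives."""
    num = ((x - x_approx) ** 2).sum()
    den = (x ** 2).sum()
    return (num / den).item()
```

Calling `relative_squared_error(x, x_approx)` on held-out vectors, rather than on the training minibatches, gives a fairer picture of the quantizer's expected loss.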

To avoid versioning issues and so on, it may be easier to just include quantization.py in your repository directly (and add its requirements to your requirements.txt).

Owner

Daniel Povey
