simsity

Simsity is a Super Simple Similarities Service[tm].
It's all about building a neighborhood. Literally!

This repository contains simple tools for similarity retrieval, built as a convenient wrapper around hnswlib. Typical use cases include early-stage bulk labelling and duplicate discovery.

Install

You can install simsity via pip.

python -m pip install simsity

The goal of simsity is to be minimal, to make rapid prototyping very easy and to be "just enough" for medium-sized datasets. You will mainly interact with these two functions:

from simsity import create_index, load_index

As their names imply, you can use these to create an index or to load one from disk.

Quickstart

from simsity import create_index, load_index

# Let's fetch some demo data
from simsity.datasets import fetch_recipes
df_recipes = fetch_recipes()
recipes = df_recipes["text"]

# Let's use embetter for embeddings 
from embetter.text import SentenceEncoder
encoder = SentenceEncoder()

# Populate the ANN vector index and use it. 
index = create_index(recipes, encoder)
texts, dists = index.query("pork")

# You can also query using vectors
v_pork = encoder.transform(["pork"])[0]
texts, dists = index.query_vector(v_pork)
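
To get a feel for what a query hands back, here is a minimal brute-force sketch in plain Python. It is not how simsity works internally (simsity wraps hnswlib for approximate search); `toy_encode`, `cosine_distance` and `brute_force_query` are hypothetical helpers made up for this illustration, and the choice of cosine distance is an assumption.

```python
import math
from collections import Counter


def toy_encode(text):
    """Hypothetical stand-in for an encoder: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine_distance(a, b):
    """1 - cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def brute_force_query(texts, query, n=3):
    """Return the n closest texts and their distances, closest first."""
    q = toy_encode(query)
    # Compare the query against every text; an ANN index avoids this full scan.
    scored = sorted((cosine_distance(q, toy_encode(t)), t) for t in texts)
    top = scored[:n]
    return [t for _, t in top], [d for d, _ in top]


recipes = [
    "pork chops with apple sauce",
    "slow roasted pork shoulder",
    "vegan lentil soup",
    "apple pie",
]
texts, dists = brute_force_query(recipes, "pork", n=2)
print(texts, dists)
```

The result has the same shape as in the quickstart: a list of texts and a matching list of distances, smallest distance first. A real encoder and an ANN index do the same job with better vectors and without scanning every item.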

You can also provide a path. The index is then stored on disk and can be loaded again later.

# Make an index with a path
index = create_index(recipes, encoder, path="demo")

# Load an index from a path
reloaded_index = load_index(path="demo", encoder=encoder)
texts, dists = reloaded_index.query("pork")
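
Duplicate discovery, one of the use cases mentioned at the top, could be sketched roughly as follows. The only thing assumed about the index is that `query(text)` returns `(texts, dists)` as in the quickstart. `find_near_duplicates`, the `threshold` value and `ToyIndex` are hypothetical and exist only so the sketch runs without simsity installed; a sensible threshold depends on your encoder and distance metric.

```python
def find_near_duplicates(index, texts, threshold=0.1):
    """Collect pairs of distinct texts whose query distance is below threshold."""
    pairs = set()
    for text in texts:
        neighbours, dists = index.query(text)
        for other, dist in zip(neighbours, dists):
            if other != text and dist < threshold:
                # Sort the pair so (a, b) and (b, a) count once.
                pairs.add(tuple(sorted((text, other))))
    return sorted(pairs)


class ToyIndex:
    """Hypothetical stand-in for a simsity index, for illustration only."""

    def __init__(self, texts):
        self.texts = list(texts)

    def query(self, text):
        # Jaccard distance on lowercased word sets, closest first.
        words = set(text.lower().split())
        scored = []
        for candidate in self.texts:
            other = set(candidate.lower().split())
            union = words | other
            dist = 1.0 - len(words & other) / len(union) if union else 0.0
            scored.append((dist, candidate))
        scored.sort()
        return [t for _, t in scored], [d for d, _ in scored]


docs = ["Roast pork with apples", "roast pork with apples", "lentil soup"]
pairs = find_near_duplicates(ToyIndex(docs), docs, threshold=0.1)
print(pairs)
```

With a real index you would pass the object returned by `create_index` or `load_index` instead of `ToyIndex`, and review the candidate pairs by hand before deleting anything.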

That's it! Happy hacking!
