Non-Metric Space Library (NMSLIB): An efficient similarity search library and a toolkit for evaluation of k-NN methods for generic non-metric spaces.

Overview


Non-Metric Space Library (NMSLIB)

Important Notes

  • NMSLIB is generic yet fast; see the results of the ANN benchmarks.
  • A standalone implementation of our fastest method, HNSW, also exists as a header-only library.
  • All the documentation (including use of the Python bindings and the query server, descriptions of methods and spaces, instructions for building the library, etc.) can be found on this page.
  • For generic questions/inquiries, please use the Gitter chat: the GitHub issues page is for bugs and feature requests.

Objectives

Non-Metric Space Library (NMSLIB) is an efficient cross-platform similarity search library and a toolkit for evaluation of similarity search methods. The core library does not have any third-party dependencies. It has been gaining popularity recently. In particular, it has become a part of Amazon Elasticsearch Service.

The goal of the project is to create an effective and comprehensive toolkit for searching in generic and non-metric spaces. Even though the library contains a variety of metric-space access methods, our main focus is on generic and approximate search methods, in particular, on methods for non-metric spaces. NMSLIB is possibly the first library with principled support for non-metric space searching.

NMSLIB is an extendible library, which means that it is possible to add new search methods and distance functions. NMSLIB can be used directly in C++ and Python (via Python bindings). In addition, it is also possible to build a query server, which can be used from Java (or other languages supported by Apache Thrift, version 0.12). Java has a native client, i.e., it works on many platforms without requiring a C++ library to be installed.
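
For a quick impression of the Python API, here is a minimal sketch (the parameter values are illustrative only; the calls mirror those used in the examples and issues below):

    import nmslib
    import numpy as np

    # 10,000 random 16-dimensional vectors as toy data
    data = np.random.randn(10000, 16).astype(np.float32)

    # Build an HNSW index for cosine similarity
    index = nmslib.init(method='hnsw', space='cosinesimil')
    index.addDataPointBatch(data)
    index.createIndex({'M': 16, 'efConstruction': 100}, print_progress=True)

    # k-NN search for a single query and for a batch of queries
    ids, distances = index.knnQuery(data[0], k=10)
    batch_results = index.knnQueryBatch(data[:100], k=10)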

Authors: Bilegsaikhan Naidan, Leonid Boytsov, Yury Malkov, David Novak. With contributions from Ben Frederickson, Lawrence Cayton, Wei Dong, Avrelin Nikita, Dmitry Yashunin, Bob Poekert, @orgoro, @gregfriedland, Scott Gigante, Maxim Andreev, Daniel Lemire, Nathan Kurz, Alexander Ponomarenko.

Brief History

NMSLIB started as a personal project of Bilegsaikhan Naidan, who created the initial code base and the Python bindings, and participated in earlier evaluations. The most successful class of methods, neighborhood/proximity graphs, is represented by the Hierarchical Navigable Small World Graph (HNSW) due to Malkov and Yashunin (see the publications below). Other particularly useful methods include a modification of the VP-tree due to Boytsov and Naidan (2013), a Neighborhood APProximation index (NAPP) proposed by Tellez et al. (2013) and improved by David Novak, as well as a vanilla uncompressed inverted file.

Credits and Citing

If you find this library useful, feel free to cite our SISAP paper [BibTex] as well as the other papers listed at the end. One crucial contribution to cite is the Hierarchical Navigable Small World graph (HNSW) method [BibTex]. Please also check out the stand-alone HNSW implementation by Yury Malkov, which is released as the header-only HNSWLib library.

License

The code is released under the Apache License Version 2.0 http://www.apache.org/licenses/. Older versions of the library included the following additional components, which have different licenses (this does not apply to NMSLIB 2.x):

  • The LSHKIT, which is embedded in our library, is distributed under the GNU General Public License, see http://www.gnu.org/licenses/.
  • The k-NN graph construction algorithm NN-Descent due to Dong et al. 2011 (see the links below), which is also embedded in our library, seems to be covered by a free-to-use license, similar to Apache 2.
  • The FALCONN library is distributed under the MIT license.

Funding

Leonid Boytsov was supported by the Open Advancement of Question Answering Systems (OAQA) group and the following NSF grant #1618159: "Matching and Ranking via Proximity Graphs: Applications to Question Answering and Beyond". Bileg was supported by the iAd Center.

Related Publications

The most important related papers are listed below in chronological order:

Comments
  • Add support to build aarch64 wheels

    Travis-CI allows for the creation of aarch64 wheels.

    Build: https://travis-ci.com/github/janaknat/nmslib/builds/205780637

    There are 8-9 failures when testing hnsw. Any suggestions on how to fix these? A majority of the failures are due to expected=0.99 and calculated=~0.98.

    Tagging @jmazanec15 since he added ARM compatibility.

    opened by janaknat 33
  • Speed up pip install

    Currently pip installing is slow, since there is a compile step. Is there any way to speed it up? On my macbook:

    time pip install --no-cache nmslib
    Collecting nmslib
      Downloading https://files.pythonhosted.org/packages/e1/95/1f7c90d682b79398c5ee3f9296be8d2640fa41de24226bcf5473c801ada6/nmslib-1.7.3.6.tar.gz (255kB)
        100% |████████████████████████████████| 256kB 8.8MB/s 
    Requirement already satisfied: pybind11>=2.0 in .../virtualenv/python3.6/lib/python3.6/site-packages (from nmslib) (2.2.4)
    Requirement already satisfied: numpy in .../virtualenv/python3.6/lib/python3.6/site-packages (from nmslib) (1.15.4)
    Installing collected packages: nmslib
      Running setup.py install for nmslib ... -
    done
    Successfully installed nmslib-1.7.3.6
    
    real	3m11.091s
    

    Would it be a good idea to provide pre-compiled wheels over pip? That would also simplify the process of finding the pybind11 headers (I had to do something special to copy them in for pip when running with a --target dir).

    opened by matthen 33
  • Can't load index?

    Hi, this might be more of a question than a problem with the library. I have created an index with NAPP and saved it using saveIndex. However, when I load it with loadIndex I get the following error:

    Check failed: A previously saved index is apparently used with a different data set, a different data set split, and/or a different gold standard file! (detected an object index >= #of data points

    Am I doing something wrong?

    Thanks for the help.

    EDIT: The message doesn't make sense to me because I'm not "using the index with a data set", I'm just loading it.

    EDIT2: I'm using the Python interface.
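
    A guess at what is going on (based on the wording of a similar error further down this page, "Did you forget to re-load data?"): the saved file may contain only the index structure and not the data points, so the same data may need to be re-added before calling loadIndex. A hedged sketch, assuming 'data' holds the same points, in the same order, as when the index was built:

    import nmslib

    index = nmslib.init(method='napp', space='cosinesimil')
    index.addDataPointBatch(data)  # re-add the original data first (assumption)
    index.loadIndex('napp.index')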

    enhancement 
    opened by zommerfelds 31
  • Custom Metrics

    Hello,

    I wanted to perform NN search on a dataset of genomes. For this task, the distance between two data points is calculated by a custom script. Is there a way I can incorporate this without having to create the entire NN search algorithm myself, modifying only some parts of your code?

    opened by Chokerino 30
  • Python process crashes: 'pybind11::error_already_set'

    nmslib is the only lib in our project that relies on pybind11, and we could narrow the crash down to the Dask nodes that use nmslib. When we disable the nodes that use nmslib, it doesn't crash.

    terminate called after throwing an instance of 'pybind11::error_already_set'
      what():  TypeError: '>=' not supported between instances of 'int' and 'NoneType'
    
    At:
      /opt/conda/envs/jobnet-env/lib/python3.6/logging/__init__.py(1546): isEnabledFor
      /opt/conda/envs/jobnet-env/lib/python3.6/logging/__init__.py(1293): debug
    
    /usr/local/bin/entrypoint.sh: line 46:    21 Aborted                 (core dumped) python scripts/cli.py "${@:2}"
    

    Version:

    - nmslib~=1.7.2
    - pybind11=2.2
    
    opened by lukin0110 28
  • Make failed in linking Boost library

    Hello,

    I am facing an error in this step:

    [ 75%] Linking CXX executable ../release/experiment

    All of the errors look like this:

    undefined reference to `boost::program_options:

    I installed the latest library versions and checked that libboost 1.58 is compatible with g++ 4.9. I think it may be related to C++11; however, the error occurs with both g++ 4.9 and 4.7.

    This is my system information:

    -- The C compiler identification is GNU 4.9.3
    -- The CXX compiler identification is GNU 4.9.3
    -- Check for working C compiler: /usr/bin/cc
    -- Check for working C compiler: /usr/bin/cc -- works
    -- Detecting C compiler ABI info
    -- Detecting C compiler ABI info - done
    -- Detecting C compile features
    -- Detecting C compile features - done
    -- Check for working CXX compiler: /usr/bin/c++
    -- Check for working CXX compiler: /usr/bin/c++ -- works
    -- Detecting CXX compiler ABI info
    -- Detecting CXX compiler ABI info - done
    -- Detecting CXX compile features
    -- Detecting CXX compile features - done
    -- Build type: Release
    -- GSL using gsl-config /usr/bin/gsl-config
    -- Using GSL from /usr
    -- Found GSL.
    -- Found Eigen3: /usr/include/eigen3 (Required is at least version "3")
    -- Found Eigen3.
    -- Boost version: 1.58.0
    -- Found the following Boost libraries:
    --   system
    --   filesystem
    --   program_options
    -- Found BOOST.

    I also installed Clang and LLDB 3.6. I searched for many possible solutions but could not fix it :(.

    opened by nguyenv7 26
  • Python wrapper crashes while retrieving nearest neighbors when M>100

    Hi, I am working on a problem where I need to retrieve ~500 nearest neighbors out of a million points. I am using the Python wrapper for the HNSW method. The code works perfectly well if I set the parameter M <= 100, but setting it greater than 100 makes the code crash while retrieving nearest neighbors (no issues while building the model) with an "invalid next size" error. Any idea why this might be happening? Thanks, Himanshu

    bug 
    opened by hjain689 25
  • Incorrect distances returned for all-zero query

    An all-zero query vector will result in NMSLib incorrectly reporting a distance of zero for its nearest neighbours (see example below). Is this related to #187? Is there a suggested workaround?

    # Training set (CSR sparse matrix)
    X.todense()
    # Out:
    # matrix([[4., 2., 3., 1., 0., 0., 0., 0., 0.],
    #         [2., 1., 0., 0., 3., 0., 1., 2., 1.],
    #         [4., 2., 0., 0., 3., 1., 0., 0., 0.]], dtype=float32)
    
    # Query vector (CSR sparse matrix)
    r.todense()
    # Out:
    # matrix([[0., 0., 0., 0., 0., 0., 0., 0., 0.]], dtype=float32)
    
    # Train and query
    import nmslib
    index = nmslib.init(
        method='hnsw',
        space='cosinesimil_sparse_fast',
        data_type=nmslib.DataType.SPARSE_VECTOR,
        dtype=nmslib.DistType.FLOAT)
    index.addDataPointBatch(X)
    index.createIndex()
    index.knnQueryBatch(r, k=3)
    # Out:
    # [(array([2, 1, 0], dtype=int32), array([0., 0., 0.], dtype=float32))]
    
    # Note that distances are all 0, which is incorrect!
    # Same result for dense training & query vectors.
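
    One possible client-side guard (a suggestion, not an official fix): cosine similarity is undefined for an all-zero vector, so zero-norm query rows can be filtered out before querying:

    import numpy as np

    # A CSR row is non-zero iff it owns at least one stored entry
    nonzero_rows = np.diff(r.indptr) > 0
    results = index.knnQueryBatch(r[nonzero_rows], k=3)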
    
    bug 
    opened by lsorber 24
  • Jaccard for the HNSW method with sparse features

    Hi,

    I want to know if HNSW provides Jaccard (similarity or distance, it does not matter), besides cosine, for sparse features. There are scenarios in which Jaccard outperforms.

    The Python notebooks provided show the following metrics: l2, l2sqr_sift, cosinesimil_sparse.

    According to space_sparse_scalar.h, the following metrics seem to be implemented, or in preparation, for sparse features:

        #define SPACE_SPARSE_COSINE_SIMILARITY "cosinesimil_sparse"
        #define SPACE_SPARSE_ANGULAR_DISTANCE "angulardist_sparse"
        #define SPACE_SPARSE_NEGATIVE_SCALAR "negdotprod_sparse"
        #define SPACE_SPARSE_QUERY_NORM_NEGATIVE_SCALAR "querynorm_negdotprod_sparse"

    What does each of these metrics mean? I also saw cosinesimil_sparse_fast in a few files. What is it, and how does it compare to cosinesimil_sparse? Is it ready for use?

    I can provide a Jaccard implementation for sparse vectors, given two vectors implemented as hash tables, but I haven't found out how to integrate it into the code. It would also be preferable to check which metrics are already available. The closest clue I got was to extend the following files: distcomp_scalar.cc, hnsw.cc and hnsw_distfunc_opt.cc, but I am not sure which steps to take. I saw some mentions of Jaccard in space_sparse_jaccard.cc and distcomp.h, but no examples are given.

    Thanks in advance.
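
    Not an authoritative answer, but the space_sparse_jaccard.cc file mentioned above suggests a Jaccard space may already be registered; the space name jaccard_sparse and the string-based input format below are guesses based on the library's naming conventions:

    import nmslib

    # Guessed usage: each data point is a string of space-separated sorted IDs
    index = nmslib.init(method='hnsw', space='jaccard_sparse',
                        data_type=nmslib.DataType.OBJECT_AS_STRING)
    index.addDataPointBatch(['1 3 5', '2 3 7', '1 2 3 8'])
    index.createIndex()
    ids, distances = index.knnQuery('1 3 8', k=2)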

    opened by icarocd 24
  • pybind11.h not found when installing using pip

    I'm trying to install the Python bindings on an Ubuntu 16.04 machine:

    $ pip3 install pybind11 nmslib
    Collecting nmslib
      Using cached https://files.pythonhosted.org/packages/de/eb/28b2060bb1750426c5618e3ad6ce830ac3cfd56cb3eccfb799e52d6064db/nmslib-1.7.2.tar.gz
    Requirement already satisfied: pybind11>=2.0 in /homes/alexandrov/.virtualenvs/pytorch/lib/python3.5/site-packages (from nmslib) (2.2.2)
    Requirement already satisfied: numpy in /homes/alexandrov/.virtualenvs/pytorch/lib/python3.5/site-packages (from nmslib) (1.14.2)
    Building wheels for collected packages: nmslib
      Running setup.py bdist_wheel for nmslib ... error
      Complete output from command /homes/alexandrov/.virtualenvs/pytorch/bin/python3 -u -c "import setuptools, tokenize;__file__='/tmp/pip-install-0y71oxa4/nmslib/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" bdist_wheel -d /tmp/pip-wheel-916r1rr9 --python-tag cp35:
      running bdist_wheel
      running build
      running build_ext
      creating tmp
      x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/usr/include/python3.5m -I/homes/alexandrov/.virtualenvs/pytorch/include/python3.5m -c /tmp/tmpwekdswov.cpp -o tmp/tmpwekdswov.o -std=c++14
      cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
      x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/usr/include/python3.5m -I/homes/alexandrov/.virtualenvs/pytorch/include/python3.5m -c /tmp/tmpyyphh022.cpp -o tmp/tmpyyphh022.o -fvisibility=hidden
      cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
      building 'nmslib' extension
      creating build
      creating build/temp.linux-x86_64-3.5
      creating build/temp.linux-x86_64-3.5/nmslib
      creating build/temp.linux-x86_64-3.5/nmslib/similarity_search
      creating build/temp.linux-x86_64-3.5/nmslib/similarity_search/src
      creating build/temp.linux-x86_64-3.5/nmslib/similarity_search/src/method
      creating build/temp.linux-x86_64-3.5/nmslib/similarity_search/src/space
      x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I./nmslib/similarity_search/include -Iinclude -Iinclude -I/homes/alexandrov/.virtualenvs/pytorch/lib/python3.5/site-packages/numpy/core/include -I/usr/include/python3.5m -I/homes/alexandrov/.virtualenvs/pytorch/include/python3.5m -c nmslib.cc -o build/temp.linux-x86_64-3.5/nmslib.o -O3 -march=native -fopenmp -DVERSION_INFO="1.7.2" -std=c++14 -fvisibility=hidden
      cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
      nmslib.cc:16:31: fatal error: pybind11/pybind11.h: No such file or directory
      compilation terminated.
      error: command 'x86_64-linux-gnu-gcc' failed with exit status 1
    

    Clearly, the pybind11 headers were not installed on my machine. This library is not packaged for apt-get (at least not for Ubuntu 16.04), so I needed to install it manually from source.

    It would be nice if the nmslib install script took care of this.

    opened by taketwo 23
  • Optimized index raises RuntimeError on load when saved with `negdotprod` space

    Basically, this is what I am trying to do

    import nmslib
    
    space = 'negdotprod'
    
    vectors = [[1, 2], [3, 4], [5, 6]]
    
    index = nmslib.init(space=space, method='hnsw')
    index.addDataPointBatch(vectors)
    index.createIndex(
        {'M': 15, 'efConstruction': 200, 'skip_optimized_index': 0, 'post': 0}
    )
    index.saveIndex('test.index')
    
    new_index = nmslib.init(space=space, method='hnsw')
    new_index.loadIndex('test.index')
    

    and it raises

    Check failed: totalElementsStored_ == this->data_.size() The number of stored elements 3 doesn't match the number of data points ! Did you forget to re-load data?
    Traceback (most recent call last):
      File "8.py", line 15, in <module>
        new_index.loadIndex('test.index')
    RuntimeError: Check failed: The number of stored elements 3 doesn't match the number of data points ! Did you forget to re-load data?
    

    If I change the space variable to cosinesimil, it works just fine. It seems that the data points are not stored, even though the hnsw method with skip_optimized_index=0 is used.
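
    A possible workaround, assuming the save_data/load_data flags mentioned in the v1.8 release notes below (efficient save/load from the Python bindings, #356) apply here:

    index.saveIndex('test.index', save_data=True)

    new_index = nmslib.init(space=space, method='hnsw')
    new_index.loadIndex('test.index', load_data=True)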

    opened by chomechome 22
  • Unable to pip install nmslib, including historic versions

    Hey sorry to bother you,

    I've been trying to install scispacy via pip on Windows 10 using Python 3.10.0, and it keeps failing due to errors about nmslib. I've tried pip installing nmslib versions 1.7.3.6, 1.8, and 2.1.1.

    None of them have worked, though, curiously. I've had a long look around scispacy's GitHub and yours, but nothing I've read has given me any solutions.

    I've also flagged it with scispacy on their GitHub. Anyway, I have no idea what's going on but just thought I'd let you know. Cheers, kind regards, Chris

    opened by Cbezz 5
  • Strict typing is needed: Using wrong input can cause distances to be all one, e.g., with cosinesimil_sparse/HNSW when calling knnQueryBatch on a dense array

    Hey, I'm trying to use nmslib's HNSW with a csr_matrix containing sparse vectors.

    Creating the index works fine, adding the data and setting query time params too:

        items = ["foo is a kind of thing", "bar is another one", "this bar is a real one!", "I prefer to use a foo"] # etc, len=3000
        similar_items_index = nmslib.init(
            space="cosinesimil_sparse",
            method="hnsw",
            data_type=nmslib.DataType.SPARSE_VECTOR,
            dtype=nmslib.DistType.FLOAT,
        )
        vectorizer = TfidfVectorizer(dtype=np.float32, token_pattern=r"\S+")
        embeddings: csr_matrix = vectorizer.fit_transform(items)
        similar_items_index.addDataPointBatch(embeddings)
        similar_items_index.createIndex({"M": 128, "efConstruction": 32, "post": 2}, print_progress=False)
        similar_items_index.setQueryTimeParams({"ef": 512})
    

    But when I search with knnQueryBatch, all the returned distances are equal to 1:

    similar_items_index.knnQueryBatch([query_embedding], 5)[0]
    

    -> Knn results: ids, with distances all set to 1

    Am I missing something in the proper usage of HNSW with sparse vector data?
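
    For what it's worth, a sketch of what the issue title suggests is the correct usage: keep the query as a CSR matrix from the same vectorizer instead of converting it to a dense array:

    # The query stays a scipy.sparse CSR matrix, matching the
    # SPARSE_VECTOR data type the index was created with
    query_embedding = vectorizer.transform(["bar is another one"])
    ids, distances = similar_items_index.knnQueryBatch(query_embedding, k=5)[0]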

    Setup for reproduction
    • This uses the text-similarity data from Kaggle, downloaded in /tmp/. Any other text dataset should be fine, as computing similarity scores is not required to see the problem with returned distances.
    
    import csv
    from typing import Dict
    
    import nmslib
    import numpy as np
    from scipy.sparse import csr_matrix
    from sklearn.feature_extraction.text import TfidfVectorizer
    
    CSV_PATH = "/tmp/data/"
    
    
    def main():
        similar_items_index = nmslib.init(
            space="cosinesimil_sparse",
            method="hnsw",
            data_type=nmslib.DataType.SPARSE_VECTOR,
            dtype=nmslib.DistType.FLOAT,
        )
        items = set()
        ids: Dict[str, int] = {}
        rids: Dict[int, str] = {}
        similarities = {}
        for file in [
            f"{CSV_PATH}/similarity-test.csv",
            f"{CSV_PATH}/similarity-train.csv",
        ]:
            with open(file) as f:
                reader = csv.reader(f, delimiter=",", quotechar="|")
                header = next(reader)
                for i, l in enumerate(reader):
                    desc_x = l[header.index("description_x")]
                    desc_y = l[header.index("description_y")]
                    similar = bool(l[header.index("same_security")])
                    id = len(items)
                    if desc_x not in items:
                        items.add(desc_x)
                        ids[desc_x] = id
                        rids[id] = desc_x
                        id_x = id
                        id += 1
                    else:
                        id_x = ids[desc_x]
                    if desc_y not in items:
                        items.add(desc_y)
                        ids[desc_y] = id
                        rids[id] = desc_y
                        id_y = id
                        id += 1
                    else:
                        id_y = ids[desc_y]
                    if similar:
                        similarities[id_x] = id_y
                        similarities[id_y] = id_x
             print(f"Loaded {len(items)}, total {len(similarities)/2} pairs of similar queries.")
             vectorizer = TfidfVectorizer(dtype=np.float32, token_pattern=r"\S+")
        embeddings: csr_matrix = vectorizer.fit_transform(items)
        print("Embedded items, adding datapoints..")
        similar_items_index.addDataPointBatch(embeddings)
        print("Creating index..")
        similar_items_index.createIndex({"M": 128, "efConstruction": 32, "post": 2}, print_progress=False)
        print("Setting index query params..")
        similar_items_index.setQueryTimeParams({"ef": 512})
        print("Searching...")
        score = 0
        total_similar = 0
        for item_id, item in enumerate(items):
            query_embedding = vectorizer.transform([item]).getrow(0).toarray()
            top_50, distances = similar_items_index.knnQueryBatch([query_embedding], 50)[0]
            top_50_texts = [rids[t] for t in top_50]
            try:
                expected = similarities[item_id]
                expected_text = rids[expected]
                if expected:
                    score += 1 if expected in top_50 else 0
            except KeyError:
                continue  # No similar noted on this item.
            total_similar += 1
        print(
            f"After querying {len(items)} of which {total_similar}, we found the similar item in the top50 {score} times."
        )
    
    
    if __name__ == "__main__":
        main()
    
    opened by PLNech 6
  • More encompassing approach for Mac M1 chips

    On a Mac, platform.processor may return i386 even when running on a Mac M1. The code below should be more accurate. See the linked Stack Overflow comments and post for more information / validation that the uname approach is more comprehensive.

    I was personally running into this problem and the following fix solved it for me.

    This PR is a slightly edited solution to what is contained in https://github.com/nmslib/nmslib/pull/485 with many thanks to @netj for getting this started.
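
    For reference, a sketch of the uname-based check (a reconstruction of the idea, not the exact PR diff):

    import platform

    def is_apple_silicon() -> bool:
        # platform.processor() can report 'i386' under Rosetta on a Mac M1,
        # while uname's machine field reports 'arm64'
        return platform.system() == "Darwin" and platform.uname().machine == "arm64"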

    opened by JewlsIOB 3
  • Calling setQueryTimeParams results in a SIGSEGV

    Hi there! Trying to perform knnQuery on an indexed csr_matrix, I got the issue reported in #480 from this code:

            model = TfidfVectorizer(dtype=np.float32, token_pattern=r"\S+")
            embeddings = model.fit_transform(corpus_tfidf)
            logger.info(f"Creating vector index from a {len(corpus_tfidf)} corpus embedded as {embeddings.shape}...")
            index = nmslib.init(method="hnsw", space="cosinesimil_sparse", data_type=nmslib.DataType.SPARSE_VECTOR, dtype=nmslib.DistType.FLOAT)
            logger.info("Adding datapoints to index...")
            index.addDataPointBatch(embeddings)
            logger.info("Creating final index...")
            index.createIndex()
    
            logger.info(f"Search neightbors for first embedding {embeddings[0]})
            index.knnQuery(embeddings[0])
    

    As described in #480, this results in an IndexError: tuple index out of range.

    When trying to apply the index.setQueryTimeParams({'efSearch': efS, 'algoType': 'old'}) workaround mentioned in another issue, it results in a segmentation fault.

    I can reproduce it with the following minimal example; it looks like the call errors even without arguments:

    index = nmslib.init(method="hnsw", space="cosinesimil_sparse", data_type=nmslib.DataType.SPARSE_VECTOR, dtype=nmslib.DistType.FLOAT)
    print("Setting index queryParams...")
    index.setQueryTimeParams()
    print("Adding datapoints to index...")
    

    ->

    Setting index queryParams...
    Process finished with exit code 139 (interrupted by signal 11: SIGSEGV)
    

    Env info

    • python -V -> Python 3.7.11
    • pip freeze | grep nmslib -> nmslib==2.1.1
    opened by PLNech 3
  • NMSLIB doesn't work on Windows 11

    Hello,

    We use nmslib as the default engine for TensorFlow Similarity due to its broad compatibility with various OSes. We got multiple reports, which I was able to confirm, that nmslib doesn't install on Windows 11, potentially related to issue #498.

    Do you have any idea if/when you will be able to take a look at this? With the increased adoption of Windows 11, it is becoming problematic for us.

    Thanks :)

    opened by ebursztein 15
Releases (v2.1.1)
  • v2.1.1(Feb 3, 2021)

    Note: We unfortunately had deployment issues. As a result, we had to delete several versions between 2.0.6 and 2.1.1. If you installed one of these versions, please delete it and install a more recent version (>=2.1.1).

    The current build focuses on:

    1. Providing more efficient ("optimized") implementations for spaces: negdotprod, l1, linf.
    2. Binaries for ARM 64 (aarch64).
    Source code(tar.gz)
    Source code(zip)
  • v2.0.6(Apr 16, 2020)

  • v2.0.5(Nov 7, 2019)

    The main objective of this release is to provide binary wheels. For compatibility reasons, we need to stick to basic SSE2 instructions. However, when the Python library is imported, it prints a message suggesting that a more efficient version can be installed from sources (and tells how to do this).

    Furthermore, this release removes a lot of old code, which speeds up compilation by 70%:

    1. Non-performing methods
    2. Double-indices

    This is a step towards a more lightweight NMSLIB library.

    Source code(tar.gz)
    Source code(zip)
  • v1.8.1(Jun 23, 2019)

  • v1.8(Jun 6, 2019)

    This is a clean-up release focusing on several important issues:

    1. Fixing a bug with knnQuery #370
    2. Added the possibility to save/load data efficiently from the Python bindings (and the query server) #356; the Python notebooks are updated accordingly
    3. We now have a bit Jaccard space (many thanks @gregfriedland)
    4. Upgraded the query server to use a recent Apache Thrift
    5. Importantly, the documentation is reorganized quite a bit:
      1. There is now a single entry point for all the docs.
      2. Most of the docs are now online; only the fairly technical description of search spaces and methods remains in the PDF manual.
    Source code(tar.gz)
    Source code(zip)
  • v1.7.3.6(Oct 4, 2018)

  • v1.7.3.4(Aug 6, 2018)

  • v1.7.3.2(Jul 13, 2018)

  • v1.7.3.1(Jul 9, 2018)

  • v1.7.2(Feb 20, 2018)

    1. Improving concurrency in Python (preventing hanging in a certain situation https://github.com/searchivarius/nmslib/issues/291)
    2. Improving ParallelFor : passing thread ID and not starting threads in a single-thread mode.
    Source code(tar.gz)
    Source code(zip)
  • v1.7(Feb 4, 2018)

  • v1.6(Dec 15, 2016)

    Here is the list of changes for version 1.6 (the manual isn't updated yet):

    We especially thank the following people for the fixes:

    • Bileg Naidan (@bileg)
    • Bob Poekert (@bobpoekert)
    • @orgoro
    1. We simplified the build by excluding the code that requires third-party libraries from the core library. In other words, the core library does not have any third-party dependencies (not even Boost). To build the full version of the library, you have to run cmake as follows: cmake . -DWITH_EXTRAS=1
    2. It should now be possible to build on a Mac.
    3. We improved the Python bindings (thanks to @bileg) and their installation process (thanks to @bobpoekert):
      1. We merged our generic and vector bindings into a single module. We upgraded to a more standard installation process via distutils. You can run: python setup.py build and then sudo python setup.py install.
      2. We improved our support for sparse spaces: you can pass data in the form of a numpy sparse array (see the sketch after this list)!
      3. There are now batch multi-threaded querying and addition of data.
      4. addDataPoint* functions return the position of an inserted entry. This can be useful if you use the function getDataPoint.
      5. For examples of using the Python API, please see the *.py files in the folder python_bindings.
      6. Note that to execute unit tests you need: python-numpy, python-scipy, and python-pandas.
    4. Because we got rid of Boost, we unfortunately no longer support command-line options WITHOUT arguments. Instead, you have to pass the value 0 or 1.
    5. However, the utility experiment (experiment.exe) now accepts the option recallOnly. If this option has the argument 1, then the only effectiveness metric computed is recall. This is useful for the evaluation of HNSW, because (for efficiency reasons) HNSW does not return proper distance values (e.g., for L2 it returns the squared distance, not the original one). This makes it impossible to compute effectiveness metrics other than recall (returning wrong distance values would also lead to the experiment terminating with an error message).
    6. Additional spaces:
      1. negdotprod_sparse: negative inner (dot) product. This is a sparse space.
      2. querynorm_negdotprod_sparse: query-normalized inner (dot) product, i.e., the dot product divided by the query norm.
      3. renyi_diverg: Renyi divergence. It has the parameter alpha.
      4. ab_diverg: α-β-divergence. It has two parameters: alpha and beta.
    7. Additional search methods:
      1. simple_invindx: A classical inverted index with document-at-a-time processing (via a priority queue). It doesn't have parameters, but it works only with the sparse space negdotprod_sparse.
      2. falconn: we ported (created a wrapper for) a June 2016 version of the FALCONN library.
        1. Unlike the original implementation, our wrapper works directly with sparse vector spaces as well as with dense vector spaces.
        2. However, our wrapper has to store the data twice, so this method is useful mostly as a benchmark.
        3. Our wrapper directly supports a data centering trick, which can boost performance sometimes.
        4. Most parameters (hash_family, cross_polytope, hyperplane, storage_hash_table, num_hash_bits, num_hash_tables, num_probes, num_rotations, seed, feature_hashing_dimension) merely map to FALCONN parameters.
        5. Setting the additional parameters norm_data and center_data tells the wrapper to center and normalize the data. Our implementation of the centering for sparse data (which is unfortunately applied before the hashing trick) is horribly inefficient, so we wouldn't recommend using it. Besides, it doesn't seem to improve results. Just in case, the number of sparse dimensions used for centering is controlled by the parameter max_sparse_dim_to_center.
        6. Our FALCONN wrapper would normally use the distance provided by NMSLIB, but you can force the use of FALCONN's distance function implementation by setting use_falconn_dist to 1.
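
    To illustrate the sparse-input support from item 3.2 above, here is a minimal sketch (an illustrative example; it mirrors the usage shown in the issues earlier on this page):

        import nmslib
        import numpy as np
        from scipy.sparse import csr_matrix

        # A random sparse binary matrix: 100 points in 50 dimensions
        data = csr_matrix(np.random.rand(100, 50) > 0.8, dtype=np.float32)

        index = nmslib.init(method='hnsw', space='cosinesimil_sparse',
                            data_type=nmslib.DataType.SPARSE_VECTOR)
        index.addDataPointBatch(data)
        index.createIndex()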
    Source code(tar.gz)
    Source code(zip)
  • v1.5.3(Jul 11, 2016)

  • v1.5.2(Jul 2, 2016)

  • v1.5.1(Jun 1, 2016)

  • v1.5(May 20, 2016)

    1. A new efficient method: a hierarchical (navigable) small-world graph (HNSW), contributed by Yury Malkov (@yurymalkov). It works with g++, Visual Studio, and the Intel Compiler, but doesn't work with Clang yet.
    2. A query server, which can have clients in C++, Java, Python, and other languages supported by Apache Thrift.
    3. Python bindings for vector and non-vector spaces
    4. Improved performance of two core methods SW-graph and NAPP
    5. Better handling of the gold standard data in the benchmarking utility experiment
    6. Updated API that permits search methods to serialize indices
    7. Improved documentation (e.g., we added tuning guidelines for best methods)
    Source code(tar.gz)
    Source code(zip)
Medical Image Segmentation using Squeeze-and-Expansion Transformers

Medical Image Segmentation using Squeeze-and-Expansion Transformers Introduction This repository contains the code of the IJCAI'2021 paper 'Medical Im

askerlee 172 Dec 20, 2022
Self-supervised spatio-spectro-temporal represenation learning for EEG analysis

EEG-Oriented Self-Supervised Learning and Cluster-Aware Adaptation This repository provides a tensorflow implementation of a submitted paper: EEG-Orie

Wonjun Ko 4 Jun 09, 2022
[CVPR 2022] Thin-Plate Spline Motion Model for Image Animation.

[CVPR2022] Thin-Plate Spline Motion Model for Image Animation Source code of the CVPR'2022 paper "Thin-Plate Spline Motion Model for Image Animation"

yoyo-nb 1.4k Dec 30, 2022
PFENet: Prior Guided Feature Enrichment Network for Few-shot Segmentation (TPAMI).

PFENet This is the implementation of our paper PFENet: Prior Guided Feature Enrichment Network for Few-shot Segmentation that has been accepted to IEE

DV Lab 230 Dec 31, 2022
code from "Tensor decomposition of higher-order correlations by nonlinear Hebbian plasticity"

Code associated with the paper "Tensor decomposition of higher-order correlations by nonlinear Hebbian learning," Ocker & Buice, Neurips 2021. "plot_f

Gabriel Koch Ocker 4 Oct 16, 2022
Multi-modal Text Recognition Networks: Interactive Enhancements between Visual and Semantic Features

Multi-modal Text Recognition Networks: Interactive Enhancements between Visual and Semantic Features | paper | Official PyTorch implementation for Mul

48 Dec 28, 2022
iNAS: Integral NAS for Device-Aware Salient Object Detection

iNAS: Integral NAS for Device-Aware Salient Object Detection Introduction Integral search design (jointly consider backbone/head structures, design/de

顾宇超 77 Dec 02, 2022
JugLab 33 Dec 30, 2022
Combining Automatic Labelers and Expert Annotations for Accurate Radiology Report Labeling Using BERT

CheXbert: Combining Automatic Labelers and Expert Annotations for Accurate Radiology Report Labeling Using BERT CheXbert is an accurate, automated dee

Stanford Machine Learning Group 51 Dec 08, 2022
Generating Images with Recurrent Adversarial Networks

Generating Images with Recurrent Adversarial Networks Python (Theano) implementation of Generating Images with Recurrent Adversarial Networks code pro

Daniel Jiwoong Im 121 Sep 08, 2022
Volumetric parameterization of the placenta to a flattened template

placenta-flattening A MATLAB algorithm for volumetric mesh parameterization. Developed for mapping a placenta segmentation derived from an MRI image t

Mazdak Abulnaga 12 Mar 14, 2022
Federated learning on graph, especially on graph neural networks (GNNs), knowledge graph, and private GNN.

Federated learning on graph, especially on graph neural networks (GNNs), knowledge graph, and private GNN.

keven 198 Dec 20, 2022
All-in-one Docker container that allows a user to explore Nautobot in a lab environment.

Nautobot Lab This container is not for production use! Nautobot Lab is an all-in-one Docker container that allows a user to quickly get an instance of

Nautobot 29 Sep 16, 2022
A Fast Knowledge Distillation Framework for Visual Recognition

FKD: A Fast Knowledge Distillation Framework for Visual Recognition Official PyTorch implementation of paper A Fast Knowledge Distillation Framework f

Zhiqiang Shen 129 Dec 24, 2022
Rest API Written In Python To Classify NSFW Images.

Rest API Written In Python To Classify NSFW Images.

Wahyusaputra 2 Dec 23, 2021
Simple node deletion tool for onnx.

snd4onnx Simple node deletion tool for onnx. I only test very miscellaneous and limited patterns as a hobby. There are probably a large number of bugs

Katsuya Hyodo 6 May 15, 2022
This is an official implementation of the High-Resolution Transformer for Dense Prediction.

High-Resolution Transformer for Dense Prediction Introduction This is the official implementation of High-Resolution Transformer (HRT). We present a H

HRNet 403 Dec 13, 2022
A highly modular PyTorch framework with a focus on Neural Architecture Search (NAS).

UniNAS A highly modular PyTorch framework with a focus on Neural Architecture Search (NAS). under development (which happens mostly on our internal Gi

Cognitive Systems Research Group 19 Nov 23, 2022
Representing Long-Range Context for Graph Neural Networks with Global Attention

Graph Augmentation Graph augmentation/self-supervision/etc. Algorithms gcn gcn+virtual node gin gin+virtual node PNA GraphTrans Augmentation methods N

UC Berkeley RISE 67 Dec 30, 2022
This is the source code of deeplabv3-plus-pytorch, which can be used to train your own models.

DeepLabv3+: Implementation of the Encoder-Decoder with Atrous Separable Convolution semantic segmentation model in PyTorch. Contents: Performance, Environment, Attention, Download, Training steps

Bubbliiiing 350 Dec 28, 2022