🎐 A Python library for doing approximate and phonetic matching of strings.

Last update: Dec 21, 2022

Overview

jellyfish

Jellyfish is a Python library for doing approximate and phonetic matching of strings.

Written by James Turk and Michael Stephens.

See https://github.com/jamesturk/jellyfish/graphs/contributors for contributors.

See http://jellyfish.readthedocs.io for documentation.

Source is available at http://github.com/jamesturk/jellyfish.

Jellyfish >= 0.7 supports only Python 3. If you need Python 2, please use 0.6.x.

Included Algorithms

String comparison:

Levenshtein Distance
Damerau-Levenshtein Distance
Jaro Distance
Jaro-Winkler Distance
Match Rating Approach Comparison
Hamming Distance
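
To make the first entry in this list concrete, here is a minimal pure-Python sketch of Levenshtein distance using the standard row-by-row dynamic-programming approach. It is an illustration only, not Jellyfish's implementation, and the name levenshtein_sketch is hypothetical.

```python
def levenshtein_sketch(a: str, b: str) -> int:
    """Count the single-character insertions, deletions, and
    substitutions needed to turn a into b.

    Illustrative sketch only; not the code Jellyfish ships.
    """
    # previous[j] is the distance between the processed prefix of a
    # and the first j characters of b. Only one earlier row is kept.
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty string
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or keep on a match
            ))
        previous = current
    return previous[-1]


print(levenshtein_sketch("jellyfish", "smellyfish"))
```

Keeping a single previous row bounds memory by the length of the second string, while the running time stays proportional to the product of the two lengths.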

Phonetic encoding:

American Soundex
Metaphone
NYSIIS (New York State Identification and Intelligence System)
Match Rating Codex
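
As an illustration of what a phonetic encoding does, the following is a minimal pure-Python sketch of American Soundex following the commonly published rules: keep the first letter, map consonants to digits, collapse repeated codes, and pad to four characters. It is not Jellyfish's implementation. The name soundex_sketch is hypothetical, and returning an empty string for input without ASCII letters is an assumption of this sketch.

```python
# Digit assigned to each consonant group under American Soundex.
_SOUNDEX_CODES = {}
for _letters, _digit in (
    ("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
    ("L", "4"), ("MN", "5"), ("R", "6"),
):
    for _letter in _letters:
        _SOUNDEX_CODES[_letter] = _digit


def soundex_sketch(name: str) -> str:
    """Return a four-character Soundex code (illustrative sketch only)."""
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""  # assumption of this sketch: nothing to encode
    result = letters[0]
    last_code = _SOUNDEX_CODES.get(letters[0], "")
    for c in letters[1:]:
        if c in "HW":
            continue  # H and W do not separate letters with equal codes
        code = _SOUNDEX_CODES.get(c, "")
        if code and code != last_code:
            result += code
            if len(result) == 4:
                break
        last_code = code  # a vowel resets this to "", allowing a repeat
    return result.ljust(4, "0")  # pad short codes with zeros


print(soundex_sketch("Jellyfish"))
```

Names that sound alike tend to share a code, which is why such encodings are often used to group candidate matches before a more expensive string comparison.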

Example Usage

>>> import jellyfish
>>> jellyfish.levenshtein_distance('jellyfish', 'smellyfish')
2
>>> jellyfish.jaro_distance('jellyfish', 'smellyfish')
0.8962962962962964
>>> jellyfish.damerau_levenshtein_distance('jellyfish', 'jellyfihs')
1

>>> jellyfish.metaphone('Jellyfish')
'JLFX'
>>> jellyfish.soundex('Jellyfish')
'J412'
>>> jellyfish.nysiis('Jellyfish')
'JALYF'
>>> jellyfish.match_rating_codex('Jellyfish')
'JLLFSH'

Running Tests

If you are interested in contributing to Jellyfish, you may want to run the tests locally. Jellyfish uses tox to run its tests, which you can set up and run as follows:

pip install tox
# cd jellyfish/
tox

Owner

James Turk
