kawadi is a versatile tool that used as a form of weapon and is used to cut, shape and split wood.

Overview

kawadi

Build Status Code Coverage kawadi

kawadi (કવાડિ in Gujarati) (Axe in English) is a versatile tool that used as a form of weapon and is used to cut, shape and split wood.

kawadi is collection of small tools that I found useful for me more often. Currently it contains text search which searches a string inside another string.

Text Search

Text search in kawadi uses sliding window technique to search for a word or phrase in another text. The step size in the sliding window is 1 and the window size is the size of the word/phrase we are interested in.

For example, if the text we are interested in searching is "The big brown fox jumped over the lazy dog" and the word that we want to search is "brown fox".

len(["brown", ["fox"]]) slides = sliding_window(text, window_size) -> ['The', 'big']['big', 'brown']['brown', 'fox']['fox', 'jumped']['jumped', 'over']['over', 'the']['the', 'lazy']['lazy', 'dog'] for each slide in slides score(" ".join(slide), interested_word) if score >= threshold then select slide else continue ">
text = "The big brown fox jumped over the lazy dog"
interested_word = "brown fox"
window_size = len(interested.split()) -> len(["brown", ["fox"]])

slides = sliding_window(text, window_size) -> ['The', 'big']['big', 'brown']['brown', 'fox']['fox', 'jumped']['jumped', 'over']['over', 'the']['the', 'lazy']['lazy', 'dog']

for each slide in slides
  score(" ".join(slide), interested_word)
  if score >= threshold then
    select slide
  else
    continue

Currently, there are 3 similarity scores are calculated and averaged to calculate the final score. These similarity scores are Cosine, JaroWinkler and Normalized Levinstine similarities.

In development

  • Add functionality to accept custom user similarity metrics.
  • [] Generate documentation.
  • [] Write the custom counter

You can follow the project development in the Projects tab.

Quick Start

from kawadi.text_search import SearchInText

search = SearchInText()

text_to_find = "String distance algorithm"
text_to_search = """SIFT4 is a general purpose string distance algorithm inspired by JaroWinkler and Longest Common Subsequence. It was developed to produce a distance measure that matches as close as possible to the human perception of string distance. Hence it takes into account elements like character substitution, character distance, longest common subsequence etc. It was developed using experimental testing, and without theoretical background."""

result = search.find_in_text(text_to_find, text_to_search)

print(result)
[
    {
        "sim_score": 1.0,
        "searched_text": "string distance algorithm",
        "to_find": "string distance algorithm",
        "start": 27,
        "end": 52,
    }
]

If the text that needs to be searched is big, SearchInText can utilize multiprocessing to make the search fast.

from kawadi.text_search import SearchInText

search = SearchInText(multiprocessing=True, max_workers=8)

Custom user defined score calculation.

Its often the case that the provided string similarity score is not enough for the use case that you may have. For this very case, you can add, your own score calculation.

from kawadi.text_search import SearchInText


def my_custom_fun(**kwargs):

  slide_of_text:str = kwargs["slide_of_text"]
  text_to_find:str = kwargs["text_to_find"]

  # Here you can then go on to do preprocessing if you like,
  # or use them to count char based n-gram string matching scores.

  return score: float

search = SearchInText(search_threshold=0.9, custom_score_func= your custom func)

This custom score function will have access to two things slide_of_text for every slide in text (From the example above, "The big", "big brown" and so on...) and text_to_find.

Note: The return type of this custom function should be same as the type of search_threshold as you can see from the above example.

Installation

Stable Release: pip install kawadi
Development Head: pip install git+https://github.com/jdvala/kawadi.git

Development

See CONTRIBUTING.md for information related to developing the code.

Free software: MIT license

You might also like...
A simple tool to extract python code from a Jupyter notebook, and then run pylint on it for static analysis.

Jupyter Pylinter A simple tool to extract python code from a Jupyter notebook, and then run pylint on it for static analysis. If you find this tool us

A python tool give n number of inputs and parallelly you will get a output by separetely

http-status-finder Hello Everyone!! This is kavisurya, In this tool you can give n number of inputs and parallelly you will get a output by separetely

NetConfParser is a tool that helps you analyze the rpcs coming and going from a netconf client to a server

NetConfParser is a tool that helps you analyze the rpcs coming and going from a netconf client to a server

Basic loader is a small tool that will help you generating Cloudflare cookies

Basic Loader Cloudflare cookies loader This tool may help some people getting valide cloudflare cookies Installation 🔌 : pip install -r requirements.

Python tool to check a web applications compliance with OWASP HTTP response headers best practices

Check Your Head A quick and easy way to check a web applications response headers!

A tool for testing improper put method vulnerability
A tool for testing improper put method vulnerability

Putter-CUP A tool for testing improper put method vulnerability Usage :- python3 put.py -f live-subs.txt Result :- The result in txt file "result.txt"

Tool for generating Memory.scan() compatible instruction search patterns

scanpat Tool for generating Frida Memory.scan() compatible instruction search patterns. Powered by r2. Examples $ ./scanpat.py arm.ks:64 'sub sp, sp,

Stubmaker is an easy-to-use tool for generating python stubs.

Stubmaker is an easy-to-use tool for generating python stubs. Requirements Stubmaker is to be run under Python 3.7.4+ No side effects during

PyHook is an offensive API hooking tool written in python designed to catch various credentials within the API call.
PyHook is an offensive API hooking tool written in python designed to catch various credentials within the API call.

PyHook is the python implementation of my SharpHook project, It uses various API hooks in order to give us the desired credentials. PyHook Uses

Comments
  • :sparkles: Accept custom user defined score functions

    :sparkles: Accept custom user defined score functions

    This pull request adds feature to accept user defined custom score function for calculating string similarity.

    Pull request recommendations:

    • [x] Name your pull request your-development-type/short-description. Ex: feature/read-tiff-files
    • [ ] Link to any relevant issue in the PR description. Ex: Resolves [gh-12], adds tiff file format support
    • [x] Provide context of changes.
    • [x] Provide relevant tests for your feature or bug fix.
    • [x] Provide or update documentation for any feature added by your pull request.

    Thanks for contributing!

    opened by jdvala 1
Releases(0.0.2)
  • 0.0.2(Oct 31, 2021)

    What's Changed

    • :memo: Update README.md by @jdvala in https://github.com/jdvala/kawadi/pull/1
    • :memo: Fix README.md by @jdvala in https://github.com/jdvala/kawadi/pull/2
    • :memo: Rename repo to Kawadi by @jdvala in https://github.com/jdvala/kawadi/pull/3
    • :sparkles: Accept custom user defined score functions by @jdvala in https://github.com/jdvala/kawadi/pull/4
    • :memo: update setup.py to read README for long description. by @jdvala in https://github.com/jdvala/kawadi/pull/5

    New Contributors

    • @jdvala made their first contribution in https://github.com/jdvala/kawadi/pull/1

    Full Changelog: https://github.com/jdvala/kawadi/commits/0.0.2

    Source code(tar.gz)
    Source code(zip)
  • 0.0.1(Oct 31, 2021)

    What's Changed

    • :memo: Update README.md by @jdvala in https://github.com/jdvala/kawadi/pull/1
    • :memo: Fix README.md by @jdvala in https://github.com/jdvala/kawadi/pull/2
    • :memo: Rename repo to Kawadi by @jdvala in https://github.com/jdvala/kawadi/pull/3
    • :sparkles: Accept custom user defined score functions by @jdvala in https://github.com/jdvala/kawadi/pull/4
    • :memo: update setup.py to read README for long description. by @jdvala in https://github.com/jdvala/kawadi/pull/5

    New Contributors

    • @jdvala made their first contribution in https://github.com/jdvala/kawadi/pull/1

    Full Changelog: https://github.com/jdvala/kawadi/commits/0.0.1

    Source code(tar.gz)
    Source code(zip)
Owner
Jay Vala
Data Scientist at scoutbee
Jay Vala
A python script to generate wallpaper

wallpaper eits Warning You need to set the path to Robot Mono font in the source code. (Settings are in the main function) Usage A script that given a

Henrique Tsuyoshi Yara 5 Dec 02, 2021
convert a dict-list object from / to a typed object(class instance with type annotation)

objtyping 带类型定义的对象转换器 由来 Python不是强类型语言,开发人员没有给数据定义类型的习惯。这样虽然灵活,但处理复杂业务逻辑的时候却不够方便——缺乏类型检查可能导致很难发现错误,在IDE里编码时也没

Song Hui 15 Dec 22, 2022
Implicit hierarchical a posteriori error estimates in FEniCSx

FEniCSx Error Estimation (FEniCSx-EE) Description FEniCSx-EE is an open source library showing how various error estimation strategies can be implemen

Jack S. Hale 1 Dec 08, 2021
Package that allows for validate and sanitize of string values.

py.validator A library of string validators and sanitizers Insipired by validator.js Strings only This library validates and sanitizes strings only. P

Sanel Hadzini 22 Nov 08, 2022
PyGMT - A Python interface for the Generic Mapping Tools

PyGMT A Python interface for the Generic Mapping Tools Documentation (development version) | Contact | Try Online Why PyGMT? A beautiful map is worth

The Generic Mapping Tools (GMT) 564 Dec 28, 2022
This script allows you to retrieve all functions / variables names of a Python code, and the variables values.

Memory Extractor This script allows you to retrieve all functions / variables names of a Python code, and the variables values. How to use it ? The si

Venax 2 Dec 26, 2021
A collection of common regular expressions bundled with an easy to use interface.

CommonRegex Find all times, dates, links, phone numbers, emails, ip addresses, prices, hex colors, and credit card numbers in a string. We did the har

Madison May 1.5k Dec 31, 2022
This project is a set of programs that I use to create a README.md file.

This project is a set of programs that I use to create a README.md file.

Tom Dörr 223 Dec 24, 2022
A python module for extract domains

A python module for extract domains

Fayas Noushad 4 Aug 10, 2022
A simple package for handling variables in string.

A simple package for handling string variables. Welcome! This is a simple package for handling variables in string, You can add or remove variables wi

1 Dec 31, 2021
Find version automatically based on git tags and commit messages.

GIT-CONVENTIONAL-VERSION Find version automatically based on git tags and commit messages. The tool is very specific in its function, so it is very fl

0 Nov 07, 2021
Runes - Simple Cookies You Can Extend (similar to Macaroons)

Runes - Simple Cookies You Can Extend (similar to Macaroons) is a paper called "Macaroons: Cookies with Context

Rusty Russell 22 Dec 11, 2022
✨ Un code pour voir les disponibilités des vaccins contre le covid totalement fait en Python par moi, et en français.

Vaccine Notifier ❗ Un chois aléatoire d'un article sur Wikipedia totalement fait en Python par moi, et en français. 🔮 Grâce a une requète API, on peu

MrGabin 3 Jun 06, 2021
A sys-botbase client for remote control automation of Nintendo Switch consoles. Based on SysBot.NET, written in python.

SysBot.py A sys-botbase client for remote control automation of Nintendo Switch consoles. Based on SysBot.NET, written in python. Setup: Download the

7 Dec 16, 2022
A python module to validate input.

A python module to validate input.

Matthias 6 Sep 13, 2022
JavaScript to Python Translator & JavaScript interpreter written in 100% pure Python🚀

Pure Python JavaScript Translator/Interpreter Everything is done in 100% pure Python so it's extremely easy to install and use. Supports Python 2 & 3.

Piotr Dabkowski 2.1k Dec 30, 2022
jsoooooooon derulo - Make sure your 'jason derulo' is featured as the first part of your json data

jsonderulo Make sure your 'jason derulo' is featured as the first part of your json data Install: # python pip install jsonderulo poetry add jsonderul

jesse 3 Sep 13, 2021
HeadHunter parser

HHparser Description Program for finding work at HeadHunter service Features Find job Parse vacancies Dependencies python pip geckodriver firefox Inst

memphisboy 1 Oct 30, 2021
Script to rename and resize folders of images

script to rename and resize folders of images

Tega Brain 2 Oct 29, 2021
Check username

Checker-Oukee Check username It checks the available usernames and creates a new account for them Doesn't need proxies Create a file with usernames an

4 Jun 05, 2022