Measure file similarity in a many-to-many fashion

Overview

Mesi

Lint and Test codecov PyPI PyPI - Downloads License


Mesi is a tool to measure the similarity in a many-to-many fashion of long-form documents like Python source code or technical writing. The output can be useful in determining which of a collection of files are the most similar to each other.

Installation

Python 3.9+ and pipx are recommended, although Python 3.6+ and/or pip will also work.

pipx install mesi

If you'd like to test out Mesi before installing it, use the remote execution feature of pipx, which will temporarily download Mesi and run it in an isolated virtual environment.

pipx run mesi --help

Usage

For a directory structure that looks like:

lab-one
├── StudentOne
│   ├── pyproject.toml
│   ├── deliverables
│   │   └── python_program.py
│   └── README.md
├── StudentTwo
│   ├── pyproject.toml
│   ├── deliverables
│   │   └── python_program.py
│   └── README.md
│

where similarity should be measured between each student's deliverables/python_program.py file, run the command:

mesi lab-one/*/deliverables/python_program.py

A lower distance in the produced table equates to a higher degree of similarity.

See the help menu (mesi --help) for additional options and configuration.

Algorithms

There are many algorithms to choose from when comparing string similarity! Mesi implements all the algorithms provided by TextDistance. In general levenshtein is never a bad choice, which is why it is the default.

Bugs/Requests

Please use the GitHub issue tracker to submit bugs or request new features, options, or algorithms.

Dependencies

Mesi uses two primary dependencies for text similarity calculation: polyleven, and TextDistance. Polyleven is the default, as its singular implementation of Levenshtein distance can be faster in most situations. However, if a different edit distance algorithm is requested, TextDistance's implementations will be used.

License

Distributed under the terms of the GPL v3 license, mesi is free and open source software.

You might also like...
File-manager - A basic file manager, written in Python

File Manager A basic file manager, written in Python. Installation Install Pytho

Two scripts help you to convert csv file to md file by template

Two scripts help you to convert csv file to md file by template. One help you generate multiple md files with different filenames from the first colume of csv file. Another can generate one md file with several blocks.

A simple Python code that takes input from a csv file and makes it into a vcf file.

Contacts-Maker A simple Python code that takes input from a csv file and makes it into a vcf file. Imagine a college or a large community where each y

This program can help you to move and rename many files at once
This program can help you to move and rename many files at once

This program can help you to rename and save many files in a folder in seconds, but don't give the same name to files, it can delete both files.

Object-oriented file system path manipulation

path (aka path pie, formerly path.py) implements path objects as first-class entities, allowing common operations on files to be invoked on those path

An object-oriented approach to Python file/directory operations.

Unipath An object-oriented approach to file/directory operations Version: 1.1 Home page: https://github.com/mikeorr/Unipath Docs: https://github.com/m

File support for asyncio

aiofiles: file support for asyncio aiofiles is an Apache2 licensed library, written in Python, for handling local disk files in asyncio applications.

Object-oriented file system path manipulation

path (aka path pie, formerly path.py) implements path objects as first-class entities, allowing common operations on files to be invoked on those path

A platform independent file lock for Python
A platform independent file lock for Python

py-filelock This package contains a single module, which implements a platform independent file lock in Python, which provides a simple way of inter-p

Releases(v1.1.0)
  • v1.1.0(Dec 8, 2021)

    This release adds an --average option, which prints the average distance computed against the given files after the individual comparison distances. An unimplemented --distribution option, intended for use in the future, was also added.

    Source code(tar.gz)
    Source code(zip)
  • v1.0.2(Oct 26, 2021)

    This release adds more documentation, a --version option, and removes the bright white table styling from previous versions that were hard to read on light-themed terminals.

    Full Changelog: https://github.com/Michionlion/mesi/compare/v1.0.1...v1.0.2

    Source code(tar.gz)
    Source code(zip)
  • v1.0.1(Oct 26, 2021)

    This release fixes a bug where in some situations, some distinct parts of compared file names were not displayed. Additionally, a new option, --table-format, was added. This allows configuration of the output table format, using tabulate's table formats.

    Full Changelog: https://github.com/Michionlion/mesi/compare/v1.0.0...v1.0.1

    Source code(tar.gz)
    Source code(zip)
  • v1.0.0(Oct 25, 2021)

    Mesi is a tool to check the similarity between many different files; this is its first release, with support for many basic features and lots of text similarity/distance algorithms. You can even try out Mesi without downloading it!

    pipx run mesi --help
    

    If you don't have pipx installed, you may need to install it, but that's something you should do anyways. :smile:

    Source code(tar.gz)
    Source code(zip)
Owner
GatorEducator
Software tools developed at Allegheny College for computer science courses
GatorEducator
ZipFly is a zip archive generator based on zipfile.py

ZipFly is a zip archive generator based on zipfile.py. It was created by Buzon.io to generate very large ZIP archives for immediate sending out to clients, or for writing large ZIP archives without m

Buzon 506 Jan 04, 2023
File-manager - A basic file manager, written in Python

File Manager A basic file manager, written in Python. Installation Install Pytho

Samuel Ko 1 Feb 05, 2022
Extract longest transcript or longest CDS transcript from GTF annotation file or gencode transcripts fasta file.

Extract longest transcript or longest CDS transcript from GTF annotation file or gencode transcripts fasta file.

laojunjun 13 Nov 23, 2022
shred - A cross-platform library for securely deleting files beyond recovery.

shred Help the project financially: Donate: https://smartlegion.github.io/donate/ Yandex Money: https://yoomoney.ru/to/4100115206129186 PayPal: https:

4 Sep 04, 2021
Measure file similarity in a many-to-many fashion

Mesi Mesi is a tool to measure the similarity in a many-to-many fashion of long-form documents like Python source code or technical writing. The outpu

GatorEducator 3 Feb 02, 2022
Lumar - Smart File Creator

Lumar is a free tool for creating and managing files. With Lumar you can quickly create any type of file, add a file content and file size. With Lumar you can also find out if Photoshop or other imag

Paul - FloatDesign 3 Dec 10, 2021
BREP : Binary Search in plaintext and gzip files

BREP : Binary Search in plaintext and gzip files Search large files in O(log n) time using binary search. We support plaintext and Gzipped files. Benc

Arnaud de Saint Meloir 5 Dec 24, 2021
Ini adalah program python untuk mengubah background foto dalam 1 folder, tidak perlu satu satu

Myherokuapp my web drive You can see my web drive and can request film/Application do you want in here my blog you can visit my blog RemBg ini adalah

XnuxersXploitXen 13 Dec 01, 2022
Sheet Data Image/PDF-to-CSV Converter

Sheet Data Image/PDF-to-CSV Converter

Quy Truong 5 Nov 22, 2021
A small Python module for determining appropriate platform-specific dirs, e.g. a "user data dir".

the problem What directory should your app use for storing user data? If running on macOS, you should use: ~/Library/Application Support/AppName If

ActiveState Software 948 Dec 31, 2022
A platform independent file lock for Python

py-filelock This package contains a single module, which implements a platform independent file lock in Python, which provides a simple way of inter-p

Benedikt Schmitt 497 Jan 05, 2023
This project is a set of programs that I use to create a README.md file.

🤖 codex-readme 📜 codex-readme What is it? This project is a set of programs that I use to create a README.md file. How does it work? It reads progra

Tom Dörr 224 Jan 07, 2023
The best way to convert files on your computer, be it .pdf to .png, .pdf to .docx, .png to .ico, or anything you can imagine.

The best way to convert files on your computer, be it .pdf to .png, .pdf to .docx, .png to .ico, or anything you can imagine.

JareBear 2 Nov 20, 2021
ValveVMF - A python library to parse Valve's VMF files

ValveVMF ValveVMF is a Python library for parsing .vmf files for the Source Engi

pySourceSDK 2 Jan 02, 2022
Publicly Open Amazon AWS S3 Bucket Viewer

S3Viewer Publicly open storage viewer (Amazon S3 Bucket, Azure Blob, FTP server, HTTP Index Of/) s3viewer is a free tool for security researchers that

Sharon Brizinov 377 Dec 02, 2022
Python function to stream unzip all the files in a ZIP archive: without loading the entire ZIP file or any of its files into memory at once

Python function to stream unzip all the files in a ZIP archive: without loading the entire ZIP file or any of its files into memory at once

Department for International Trade 206 Jan 02, 2023
FUSE filesystem Python scripts for Nintendo console files

ninfs (formerly fuse-3ds) is a FUSE program to extract data from Nintendo game consoles. It works by presenting a virtual filesystem with the contents of your games, NAND, or SD card contents, and yo

Ian Burgwin 343 Jan 02, 2023
dotsend is a web application which helps you to upload your large files and share file via link

dotsend is a web application which helps you to upload your large files and share file via link

Devocoe 0 Dec 03, 2022
A python script to pull the transactions of an Algorand wallet and put them into a CSV file.

AlgoCSV A python script to pull the transactions of an Algorand wallet and put them into a CSV file. Dependancies: Requests Main features: Groups: Com

21 Jun 25, 2022
A tool written in python to generate basic repo files from github

A tool written in python to generate basic repo files from github

Riley 7 Dec 02, 2021