Query multiple MongoDB database collections easily

Overview

leakscoop

Perform queries across multiple MongoDB databases and collections, where the field names and the field content structure in each database may vary.

The Problem

Suppose you've got two database collections, "leak1" and "leak2".

In leak1, the schema looks like this:

FIRST_NAME: "JOHN"
LAST_NAME: "DOE"

and in leak2, the schema looks like this:

FName: "John"
LName: "Doe"

A simple program to iterate through all your collections and perform queries wouldn't work, because:

  • the field names are different. Notice that in leak1, the first name field is FIRST_NAME, while in leak2, the first name field is named FName.
  • the field values might be structured differently. In leak1, everything is upper-case. In leak2, it's all title-case.

This program lets you write a configuration for each collection, specifying, in JSON, how to query each field.
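To make the idea concrete, here is a minimal sketch of how a per-collection configuration could turn one generic search into a different MongoDB filter for each collection. The key names ("collection", "fields", "name", "transform") and the helper build_filter are hypothetical and chosen for illustration only; they are not leakscoop's actual config schema, which is described in the wiki.

```python
# Hypothetical per-collection configs. The key names used here are
# assumptions for illustration, not leakscoop's real schema.
CONFIGS = [
    {
        "collection": "leak1",
        "fields": {
            "firstname": {"name": "FIRST_NAME", "transform": "upper"},
            "lastname": {"name": "LAST_NAME", "transform": "upper"},
        },
    },
    {
        "collection": "leak2",
        "fields": {
            "firstname": {"name": "FName", "transform": "title"},
            "lastname": {"name": "LName", "transform": "title"},
        },
    },
]

# How a search value is reshaped to match the way a collection stores it.
TRANSFORMS = {"upper": str.upper, "title": str.title, "none": lambda s: s}


def build_filter(config, search):
    """Translate generic search terms into a filter dict for one collection."""
    query = {}
    for term, value in search.items():
        spec = config["fields"].get(term)
        if spec is None:
            continue  # this collection has no equivalent field
        transform = TRANSFORMS[spec.get("transform", "none")]
        query[spec["name"]] = transform(value)
    return query


search = {"firstname": "John", "lastname": "Doe"}
filters = {c["collection"]: build_filter(c, search) for c in CONFIGS}
for name, mongo_filter in filters.items():
    print(name, mongo_filter)
```

Each resulting dict has the shape of a filter that could be passed to a MongoDB find call on the matching collection; no database connection is needed to run the sketch itself.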

It's a work in progress, but so far, it works pretty well. It'll probably be easier to understand if you take a look at the config files under ./collections/. Each JSON file under ./collections/ should be an array of objects. The program automatically processes all JSON files under that directory.
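As a rough sketch of what "automatically processes all JSON files" could look like, the function below gathers the objects from every JSON file in a directory and rejects files that are not arrays. The name load_configs is hypothetical, and the choice to read only the top level of the directory (no subfolders) is an assumption; the program's real loader may behave differently.

```python
import json
from pathlib import Path


def load_configs(directory="./collections"):
    """Collect the objects from every *.json file directly under directory."""
    configs = []
    for path in sorted(Path(directory).glob("*.json")):
        with path.open(encoding="utf-8") as handle:
            entries = json.load(handle)
        # Each config file is expected to hold an array of objects.
        if not isinstance(entries, list):
            raise ValueError(f"{path} must contain a JSON array of objects")
        configs.extend(entries)
    return configs
```

Calling load_configs() with no arguments reads ./collections/ relative to the current working directory.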

More information on how the configurations work can be found in the wiki.

Example Usage:

Find all records of a guy named John Doe.

python3 -m dev --firstname John --lastname Doe

Each database is searched, and the results are written to a new file under ./results/.

Find all records for someone with an address of "1234 NW Long St".

python3 -m dev --address "1234 NW long st"

Adding a zip code or a state/province to the end of the address might speed up the query, depending on how your databases are indexed.

Screenshot

(information redacted for this person's privacy)

Configured fields will print to the console, while all the other fields in a result will be saved under ./results/.
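The split between console output and saved output might be sketched as follows. This is an illustration under assumptions, not the program's actual code: the helper name report, the JSON-lines output format, and the choice to save the whole document (which covers the fields that were not printed) are all hypothetical.

```python
import json
from pathlib import Path


def report(document, configured, out_path):
    """Print the configured fields and append the full document to out_path."""
    shown = {key: value for key, value in document.items() if key in configured}
    print(shown)
    out_path = Path(out_path)
    out_path.parent.mkdir(parents=True, exist_ok=True)
    with out_path.open("a", encoding="utf-8") as handle:
        # default=str keeps non-JSON values (for example ObjectId) writable.
        handle.write(json.dumps(document, default=str) + "\n")
    return shown
```

For example, report(doc, {"FIRST_NAME", "LAST_NAME"}, "./results/leak1.jsonl") would print only the two name fields and keep everything else in the file.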

Owner
bagel