MongoDB utility to inflate the contents of small collection to a new larger collection

Overview

MongoDB Data Inflater ("data-inflater")

The data-inflater tool is a MongoDB utility to automate the creation of a new large database collection using data sourced from an existing smaller database collection.

By default, the utility will use the Atlas 'sample data set' database collection sample_mflix.movies as the source collection. However, most users will provide parameters to the utility to specify the use of their own database and source collection. If you do want to use the Atlas sample data set, see the sample data manual page for more information.

The data-inflater utility issues multiple concurrent aggregation processes, each copying batches of records in parallel for increased performance. The resulting collection will contain documents with duplicated data but with new unique _id field values. The variance ratio of data in the new collection will approximately reflect the variance ratio of the source collection. Therefore, you should ensure you have supplied at least a few different documents (if not a few hundred or thousand) in the source collection.

If you are running a sharded cluster, the utility will ensure the target collection is sharded with a shard key, and where it can, it will pre-split the chunks to avoid subsequent needless balancer overhead. For example, if you specify the --shardkey parameter for this utility to reference a field (e.g. product_name) as the range based shard key, before creating the target collection, the utility will introspect the spread of values for the shard key field (e.g. product_name). The utility will then create pre-split chunks in the new empty target collection before any data is copied to it, to maximise performance.

How To Run

In a running MongoDB cluster (self-managed or running in Atlas), ensure you have created and populated a source collection with at least one sample record in it (ideally more with varying values for the fields across the different documents to reflect the shape and variance you desire).

Ensure Python3 (version 3.8 or greater) and the MongoDB Python Driver (PyMongo) are already installed on your workstation. Example to install PyMongo:

pip3 install --user pymongo

Ensure the .py script is executable and then execute the following to view the utility's help instructions and the full list of parameters that you can provide:

./data-inflater.py -h

Execute the following to connect to a locally running single server database (default port) to copy and expand the data from an existing source collection, mydb.mySrcColl, to an a new collection, mydb.myDestColl, which will contain 1 million records:

./data-inflater.py --url 'mongodb://localhost:27017' -d 'mydb' -c 'mySrcColl' -t 'myDestColl' -s 1000000

Execute the following to connect to an Atlas cluster (ensure you've already loaded the Atlas sample data set), to inflate the data from the source movies collection to the new movies_big collection, which will contain 100 million records (note, first change the URL username, password and hostname shown, to match the URL of your Atlas cluster):

./data-inflater.py --url 'mongodb+srv://usr:[email protected]/'
Owner
Paul Done
Paul Done
Use generator for range function

Use the generator for the range function! installation method: pip install yrange How to use: First import yrange in your application. You can then wo

1 Oct 28, 2021
JavaScript to Python Translator & JavaScript interpreter written in 100% pure Python🚀

Pure Python JavaScript Translator/Interpreter Everything is done in 100% pure Python so it's extremely easy to install and use. Supports Python 2 & 3.

Piotr Dabkowski 2.1k Dec 30, 2022
pydsinternals - A Python native library containing necessary classes, functions and structures to interact with Windows Active Directory.

pydsinternals - Directory Services Internals Library A Python native library containing necessary classes, functions and structures to interact with W

Podalirius 36 Dec 14, 2022
Python based utilities for interacting with digital multimeters that are built on the FS9721-LP3 chipset.

Python based utilities for interacting with digital multimeters that are built on the FS9721-LP3 chipset.

Fergus 1 Feb 02, 2022
A meme error handler for python

Pwython OwO what's this? Pwython is project aiming to fill in one of the biggest problems with python, which is that it is slow lacks owoified text. N

SystematicError 23 Jan 15, 2022
Every 2 minutes, check for visa slots at VFS website

vfs-visa-slot-germany Every 2 minutes, check for visa slots at VFS website. If there are any, send a call and a message of the format: Sent from your

12 Dec 15, 2022
Protect your eyes from eye strain using this simple and beautiful, yet extensible break reminder

Protect your eyes from eye strain using this simple and beautiful, yet extensible break reminder

Gobinath 1.2k Jan 01, 2023
A repo for working with and building daos

DAO Mix DAO Mix About How to DAO No Code Tools Getting Started Prerequisites Installation Usage On-Chain Governance Example Off-Chain governance Examp

Brownie Mixes 86 Dec 19, 2022
jsoooooooon derulo - Make sure your 'jason derulo' is featured as the first part of your json data

jsonderulo Make sure your 'jason derulo' is featured as the first part of your json data Install: # python pip install jsonderulo poetry add jsonderul

jesse 3 Sep 13, 2021
We provide useful util functions. When adding a util function, please add a description of the util function.

Utils Collection Motivation When we implement codes, we often search for util functions that are already implemented. Here, we are going to share util

6 Sep 09, 2021
A script copies movie and TV files to your GD drive, or create Hard Link in a seperate dir, in Emby-happy struct.

torcp A script copies movie and TV files to your GD drive, or create Hard Link in a seperate dir, in Emby-happy struct. Usage: python3 torcp.py -h Exa

ccf2012 105 Dec 22, 2022
A simple gpsd client and python library.

gpsdclient A small and simple gpsd client and library Installation Needs Python 3 (no other dependencies). If you want to use the library, use pip: pi

Thomas Feldmann 33 Nov 24, 2022
Hide new MacBook Pro notch with black wallpaper.

Hide new MacBook Pro notch with black wallpaper.

Wang Chao 1 Oct 27, 2021
Python library to decorate and beautify strings

outputformat Python library to decorate and beautify your standard output 💖 Ins

Felipe Delestro Matos 259 Dec 13, 2022
a simple function that randomly generates and applies console text colors

ChangeConsoleTextColour a simple function that randomly generates and applies console text colors This repository corresponds to my Python Functions f

Mariya 6 Sep 20, 2022
MITRE ATT&CK Lookup Tool

MITRE ATT&CK Lookup Tool attack-lookup is a tool that lets you easily check what Tactic, Technique, or Sub-technique ID maps to what name, and vice ve

Curated Intel 33 Nov 22, 2022
JavaScript-style async programming for Python.

promisio JavaScript-style async programming for Python. Examples Create a promise-based async function using the promisify decorator. It works on both

Miguel Grinberg 191 Dec 30, 2022
The git for the Python Story Utility Package library.

SUP The git for the Python Story Utility Package library. Installation: Install SUP by simply running pip install psup in your terminal. Check out our

Enoki 6 Nov 27, 2022
A python package for your Kali Linux distro that find the fastest mirror and configure your apt to use that mirror

Kali Mirror Finder Using Single Python File A python package for your Kali Linux distro that find the fastest mirror and configure your apt to use tha

MrSingh 6 Dec 12, 2022
A collection of utility functions to prototype geometry processing research in python

gpytoolbox This repo is a work in progress and contains general utility functions I have needed to code while trying to work on geometry process resea

Silvia Sellán 73 Jan 06, 2023