MongoDB utility to inflate the contents of small collection to a new larger collection

Overview

MongoDB Data Inflater ("data-inflater")

The data-inflater tool is a MongoDB utility to automate the creation of a new large database collection using data sourced from an existing smaller database collection.

By default, the utility will use the Atlas 'sample data set' database collection sample_mflix.movies as the source collection. However, most users will provide parameters to the utility to specify the use of their own database and source collection. If you do want to use the Atlas sample data set, see the sample data manual page for more information.

The data-inflater utility issues multiple concurrent aggregation processes, each copying batches of records in parallel for increased performance. The resulting collection will contain documents with duplicated data but with new unique _id field values. The variance ratio of data in the new collection will approximately reflect the variance ratio of the source collection. Therefore, you should ensure you have supplied at least a few different documents (if not a few hundred or thousand) in the source collection.

If you are running a sharded cluster, the utility will ensure the target collection is sharded with a shard key, and where it can, it will pre-split the chunks to avoid subsequent needless balancer overhead. For example, if you specify the --shardkey parameter for this utility to reference a field (e.g. product_name) as the range based shard key, before creating the target collection, the utility will introspect the spread of values for the shard key field (e.g. product_name). The utility will then create pre-split chunks in the new empty target collection before any data is copied to it, to maximise performance.

How To Run

In a running MongoDB cluster (self-managed or running in Atlas), ensure you have created and populated a source collection with at least one sample record in it (ideally more with varying values for the fields across the different documents to reflect the shape and variance you desire).

Ensure Python3 (version 3.8 or greater) and the MongoDB Python Driver (PyMongo) are already installed on your workstation. Example to install PyMongo:

pip3 install --user pymongo

Ensure the .py script is executable and then execute the following to view the utility's help instructions and the full list of parameters that you can provide:

./data-inflater.py -h

Execute the following to connect to a locally running single server database (default port) to copy and expand the data from an existing source collection, mydb.mySrcColl, to an a new collection, mydb.myDestColl, which will contain 1 million records:

./data-inflater.py --url 'mongodb://localhost:27017' -d 'mydb' -c 'mySrcColl' -t 'myDestColl' -s 1000000

Execute the following to connect to an Atlas cluster (ensure you've already loaded the Atlas sample data set), to inflate the data from the source movies collection to the new movies_big collection, which will contain 100 million records (note, first change the URL username, password and hostname shown, to match the URL of your Atlas cluster):

./data-inflater.py --url 'mongodb+srv://usr:[email protected]/'
Owner
Paul Done
Paul Done
Script to decrypt / import chromium (edge/chrome) cookies

Cloonie Script to decrypt / import chromium (edge/chrome) cookies. Requirements Install the python dependencies via pip: pip install -r requirements.t

Lorenzo Bernardi 5 Sep 13, 2022
A toolkit for writing and executing automation scripts for Final Fantasy XIV

XIV Scripter This is a tool for scripting out series of actions in FFXIV. It allows for custom actions to be defined in config.yaml as well as custom

Jacob Beel 1 Dec 09, 2021
🌲 A simple BST (Binary Search Tree) generator written in python

Tree-Traversals (BST) 🌲 A simple BST (Binary Search Tree) generator written in python Installation Use the package manager pip to install BST. Usage

Jan Kupczyk 1 Dec 12, 2021
A library from RCTI+ to handle RabbitMQ tasks (connect, send, receive, etc) in Python.

Introduction A library from RCTI+ to handle RabbitMQ tasks (connect, send, receive, etc) in Python. Requirements Python =3.7.3 Pika ==1.2.0 Aio-pika

Dali Kewara 1 Feb 05, 2022
This code renames subtitle file names to your video files names, so you don't need to rename them manually.

Rename Subtitle This code renames your subtitle file names to your video file names so you don't need to do it manually Note: It only works for series

Mostafa Kazemi 4 Sep 12, 2021
BOLT12 Lightning Address Format

BOLT12 Address Support (DRAFT!) Inspired by the awesome lightningaddress.com, except for BOLT12: Supports BOLT12 Allows BOLT12 vendor string authentic

Rusty Russell 28 Sep 14, 2022
Let's renew the puzzle collection. We'll produce a collection of new puzzles out of the lichess game database.

Let's renew the puzzle collection. We'll produce a collection of new puzzles out of the lichess game database.

Thibault Duplessis 96 Jan 03, 2023
Automatically Generate Rulesets for IIS for Intelligent HTTP/S C2 Redirection

Automatically Generate Rulesets for IIS for Intelligent HTTP/S C2 Redirection This project converts a Cobalt Strike profile to a functional web.config

Jesse 99 Dec 13, 2022
Course-parsing - Parsing Course Info for NIT Kurukshetra

Parsing Course Info for NIT Kurukshetra Overview This repository houses code for

Saksham Mittal 3 Feb 03, 2022
✨ Un pierre feuille ciseaux totalement fait en Python par moi, et en français.

Pierre Feuille Ciseaux ❗ Un pierre feuille ciseaux totalement fait en Python par moi. 🔮 Avec l'utilisation du module "random", j'ai pu faire un choix

MrGabin 3 Jun 06, 2021
EVE-NG tools, A Utility to make operations with EVE-NG more friendly.

EVE-NG tools, A Utility to make operations with EVE-NG more friendly. Also it support different snapshot operations with same style as Libvirt/KVM

Bassem Aly 8 Jan 05, 2023
Pampy: The Pattern Matching for Python you always dreamed of.

Pampy: Pattern Matching for Python Pampy is pretty small (150 lines), reasonably fast, and often makes your code more readable and hence easier to rea

Claudio Santini 3.5k Jan 06, 2023
Definitely legit social credit generator with python

definitely-legit-social-credit-generator I made this simple GUI program for a meme, no cap. Video: https://youtu.be/RmjxKtoli04 How to run: Clone this

Joshua Malabanan 8 Nov 01, 2021
[P]ython [w]rited [B]inary [C]onverter

pwbinaryc [P]ython [w]rited [Binary] [C]onverter You have rights to: Modify the code and use it private (friends are allowed too) Make a page and redi

0 Jun 21, 2022
Group imports from Windows binaries

importsort This is a tool that I use to group imports from Windows binaries. Sometimes, you have a gigantic folder full of executables, and you want t

【☆ ゆう ☆ 】 15 Aug 27, 2022
Color box that provides various colors‘ rgb decimal code.

colorbox Color box that provides various colors‘ rgb decimal code

1 Dec 07, 2021
Helper script to bootstrap a Python environment with the tools required to build and install packages.

python-bootstrap Helper script to bootstrap a Python environment with the tools required to build and install packages. Usage $ python -m bootstrap.bu

Filipe Laíns 7 Oct 06, 2022
ZX Spectrum Utilities: (zx-spectrum-utils)

Here are a few utility programs that can be used with the zx spectrum. The ZX Spectrum is one of the first home computers from the early 1980s.

Graham Oakes 4 Mar 07, 2022
A Python class for checking the status of an enabled Minecraft server

mcstatus provides an easy way to query Minecraft servers for any information they can expose. It provides three modes of access (query, status and ping), the differences of which are listed below in

Nathan Adams 1.1k Jan 06, 2023
一款不需要买代理来减少扫网站目录被封概率的扫描器,适用于中小规格字典。

PoorScanner使用说明书 -工具在不同环境下可能不怎么稳定,如果有什么问题恳请大家反馈。说明书有什么错误的地方也大家欢迎指正。 更新记录 2021.8.23 修复了云函数主程序 gitee上传文件接口写错了的BUG(之前把自己的上传地址写死进去了,没从配置文件里读) 更新了说明书 PoorS

14 Aug 02, 2022