create custom test databases that are populated with fake data

About

Generate databases filled with fake but valid data for test purposes, using the most popular patterns (AFAIK). Currently supported: sqlite, mysql, postgresql, mongodb, redis, couchdb.

Installation

Installing from PyPI pulls in 'fake-factory' as the main dependency.

pip install fake2db

Optional requirements

PostgreSQL
pip install psycopg2

psycopg2 needs pg_config on your system in order to install.

On Mac, the solution is to install postgresql:

brew install postgresql

On CentOS, the solution is to install postgresql-devel:

sudo yum install postgresql-devel

MongoDB

pip install pymongo

Redis

pip install redis

MySQL

MySQL Connector/Python is needed for mysql database generation:

http://dev.mysql.com/downloads/connector/python/

CouchDB

pip install couchdb
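
Since the drivers above are optional, it can help to check which ones are importable before picking a --db value. The following is a minimal sketch and not part of fake2db: the helper names are hypothetical, and the mapping assumes the usual import names of the packages listed above (`mysql.connector` for MySQL Connector/Python).

```python
import importlib.util

# Import names of the optional drivers listed above, keyed by --db value.
# Assumption: these are the usual import names of those packages.
OPTIONAL_DRIVERS = {
    "postgresql": "psycopg2",
    "mongodb": "pymongo",
    "redis": "redis",
    "mysql": "mysql.connector",
    "couchdb": "couchdb",
}


def driver_available(module_name):
    """Return True if module_name can be imported in this environment."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # find_spec raises ModuleNotFoundError when a parent package
        # (for example "mysql") is not installed.
        return False


def missing_drivers():
    """Return the --db values whose driver is not installed (hypothetical helper)."""
    return sorted(db for db, mod in OPTIONAL_DRIVERS.items()
                  if not driver_available(mod))


if __name__ == "__main__":
    for db in missing_drivers():
        print("driver for --db %s is not installed" % db)
```

sqlite is left out of the mapping because the sqlite3 module ships with Python.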

Usage

--rows argument takes an integer: the number of rows to generate.

--db argument takes one of 6 options: sqlite, mysql, postgresql, mongodb, redis, couchdb

--name argument is OPTIONAL. When it is absent, fake2db names the databases randomly.

--host argument is OPTIONAL. Hostname to use for the database connection. Not used for sqlite.

--port argument is OPTIONAL. Port to use for the database connection. Not used for sqlite.

--username argument is OPTIONAL. Username for the database user.

--password argument is OPTIONAL. Password for the database user. Only supported for mysql and postgresql.

--locale argument is OPTIONAL. The locale of the data to be generated ('en_US' by default).

--seed argument is OPTIONAL. Integer used to seed the random generator so that runs produce the same data set. Note: uuid4 values are still generated randomly.

fake2db --rows 200 --db sqlite

fake2db --rows 1500 --db postgresql --name test_database_postgre

fake2db --db postgresql --rows 2500 --host container.local --password password --username docker

fake2db --rows 200 --db sqlite --locale cs_CZ --seed 1337
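
The behaviour described for --seed (repeatable data, except for uuid4 values) can be illustrated with the standard library alone. This is a sketch under that assumption and does not show fake2db's own seeding code; the `sample` function is a hypothetical stand-in for a data generator.

```python
import random
import uuid


def sample(seed, count=3):
    """Return `count` pseudo-random integers drawn after seeding with `seed`."""
    rng = random.Random(seed)
    return [rng.randint(0, 9999) for _ in range(count)]


# The same seed gives the same values on every run.
assert sample(1337) == sample(1337)

# uuid.uuid4() draws from the operating system's entropy source,
# so seeding the random module does not make it repeatable.
random.seed(1337)
first = uuid.uuid4()
random.seed(1337)
second = uuid.uuid4()
assert first != second
```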

In addition to the databases supported in the db argument, you can also run fake2db with FoundationDB SQL Layer. Once SQL Layer is installed, simply use the postgresql generator and specify the SQL Layer port. For example:

fake2db --rows 200 --db postgresql --port 15432

Custom Database Generation

If you want to create a custom db/table, provide the --custom parameter followed by the column items you want. All the column items you can currently use are mapped here:

https://github.com/emirozer/fake2db/blob/master/fake2db/custom.py

Feed any keys you want to the custom flag:

fake2db --rows 250 --db mysql --username mysql --password somepassword --custom name date country

fake2db --rows 1500 --db mysql --password randompassword --custom currency_code credit_card_full credit_card_provider

fake2db --rows 20 --db mongodb --custom name date country
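
When the tool is driven from a test script, the command line can be assembled programmatically. The helper below is a hypothetical sketch, not part of fake2db; it only uses the flags documented in the Usage section and returns an argument list without running anything.

```python
def build_fake2db_command(rows, db, name=None, host=None, port=None,
                          username=None, password=None, locale=None,
                          seed=None, custom=None):
    """Assemble an argument list for the fake2db CLI (hypothetical helper)."""
    command = ["fake2db", "--rows", str(rows), "--db", db]
    optional = [
        ("--name", name), ("--host", host), ("--port", port),
        ("--username", username), ("--password", password),
        ("--locale", locale), ("--seed", seed),
    ]
    for flag, value in optional:
        if value is not None:
            command += [flag, str(value)]
    if custom:
        # --custom is followed by one or more column items.
        command += ["--custom"] + list(custom)
    return command


if __name__ == "__main__":
    args = build_fake2db_command(20, "mongodb", custom=["name", "date", "country"])
    print(" ".join(args))
    # fake2db --rows 20 --db mongodb --custom name date country
```

The returned list can be handed to `subprocess.run(args, check=True)` once fake2db and the matching driver are installed.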

Sample output - SQLite

Screenshot


Issues
  • Removed psycopg/pymongo dependency

    Fixes #19

    opened by mauricioabreu 11
  • [WIP] Docker support

    The makefile helps with the build and publish. 😄

    opened by brunowego 7
  • Change import paths to fix #31

    The code works again after I changed the import paths as shown here. This fixes #31 for me.

    opened by vzhong 6
  • FoundationDB SQL Layer Support

    Modified to parameterize the hostname and port for PostgreSQL, SQLite, and MongoDB.

    opened by jenschelkopf 6
  • Allow faster data loading.

    I would like to create a table with a million rows or more to stress test a service I am evaluating.

    The target is a server hosted on AWS (RDS MySQL 5.6). My MacBook uses no CPU during the load, so it is definitely network bound.

    I'm able to load about 3000 rows per 5 minutes. This is really slow when you want to generate a very large table (e.g. many millions of rows).

    Some possible suggestions:

    1. bulk insert
    2. save to a csv, and then call a sql command to load from csv
    3. create a pool of connections and insert through each connection
    opened by thomasdziedzic 5
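
The first suggestion in the issue above, bulk insert, can be sketched with the standard DB-API `executemany` call. This is an illustration only and not fake2db's loading code: it uses an in-memory sqlite3 database so that it runs anywhere, and the `customer` table and its columns are hypothetical.

```python
import sqlite3


def bulk_insert(conn, rows, batch_size=1000):
    """Insert (name, country) tuples in batches, committing once per batch."""
    conn.execute("CREATE TABLE IF NOT EXISTS customer (name TEXT, country TEXT)")
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        with conn:  # the connection context manager commits the batch
            conn.executemany(
                "INSERT INTO customer (name, country) VALUES (?, ?)", batch)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    rows = [("name_%d" % i, "country_%d" % (i % 5)) for i in range(3000)]
    bulk_insert(conn, rows)
    print(conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0])  # 3000
```

Sending rows in batches reduces the number of statements and commits compared with inserting and committing one row at a time.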
  • Error when running cli

    I tried with and without a virtualenv for dependency isolation, using the following command:

    fake2db --db postgresql --host 0.0.0.0 --port 5632 --password password --username user --rows 10 --name mydb

    Which gave me:

      File "/usr/local/bin/fake2db", line 7, in <module>
        from fake2db.fake2db import main
      File "/usr/local/lib/python3.5/site-packages/fake2db/fake2db.py", line 6, in <module>
        from custom import faker_options_container
    ImportError: No module named 'custom'
    

    But the custom.py file exists, it's just not finding it. Installed using pip 9.0.1 and python 2.7.10.

    sudo seems to work to bypass the error. Not sure why it needs it, but this is probably unrelated to the specific package, so I'll close this.

    opened by christabor 5
  • Allow custom schemata

    Let us specify a custom schema (via a schema.json file or something similar). Something compatible with https://github.com/topliceanu/mongoose-gen would be great.

    opened by Californian 5
  • Missing faker_options_container

    SUMMARY: Followed README, attempted to run fake2db.py but encounter missing dependencies. The following replicates my steps to include confirmation of dependencies:

    ERROR:

    Traceback (most recent call last):
      File "fake2db.py", line 6, in <module>
        from .custom import faker_options_container
    ValueError: Attempted relative import in non-package

    STEPS:

    # pip install fake2db
    Requirement already satisfied: fake2db in /Users/jason/Documents/Scripts/fake2db
    Requirement already satisfied: fake-factory>=0.5.3 in /usr/local/lib/python2.7/site-packages (from fake2db)

    # pip install psycopg2
    Requirement already satisfied: psycopg2 in /usr/local/lib/python2.7/site-packages

    # cd /Users/jason/Documents/Scripts/fake2db
    # ls
    LICENSE docs fake2db.egg-info setup.py README.md fake2db requirements.txt

    # pip install -r requirements.txt
    Obtaining file:///Users/jason/Documents/Scripts/fake2db (from -r requirements.txt (line 3))
    Requirement already satisfied: fake-factory>=0.5.3 in /usr/local/lib/python2.7/site-packages (from fake2db==0.5.2->-r requirements.txt (line 3))
    Installing collected packages: fake2db
    Found existing installation: fake2db 0.5.2
    Uninstalling fake2db-0.5.2:
    Successfully uninstalled fake2db-0.5.2
    Running setup.py develop for fake2db
    Successfully installed fake2db

    # cd fake2db
    # ls
    __init__.py custom.py mongodb_handler.py redis_handler.py base_handler.py fake2db.py mysql_handler.py sqlite_handler.py couchdb_handler.py helpers.py postgresql_handler.py

    # python fake2db.py --db postgresql --host 127.0.0.1 --port 5432 --password fakepassword --username postgres --name TESTDB --rows 100 --custom address

    Traceback (most recent call last):
      File "fake2db.py", line 6, in <module>
        from .custom import faker_options_container
    ValueError: Attempted relative import in non-package

    opened by JasonBrannon 4
  • Update README.md

    opened by tunavargi 4
  • AttributeError: 'NoneType' object has no attribute 'execute'

    Cool idea; wanted to play with this.

    OSX 10.9.5 MySQL 5.5.38 Python 2.7.5

    Installed http://dev.mysql.com/downloads/connector/python/

    Opened new terminal (iTerm2) session

    Ran $ fake2db --rows 200 --db mysql

    Got:

    Traceback (most recent call last):
      File "/usr/local/bin/fake2db", line 8, in <module>
        load_entry_point('fake2db==0.1.5', 'console_scripts', 'fake2db')()
      File "/Library/Python/2.7/site-packages/fake2db/fake2db.py", line 106, in main
        fake_mysql_handler.fake2db_mysql_initiator(host, port, int(args.rows))
      File "/Library/Python/2.7/site-packages/fake2db/mysql_handler.py", line 42, in fake2db_mysql_initiator
        cursor.execute(tables[key])
    AttributeError: 'NoneType' object has no attribute 'execute'
    

    Any thoughts? Thanks!

    opened by ericdorsey 3
  • --database option

    When using against postgresql the database name is expected to be the same as the --username

    opened by SteveChurch 0
  • Allow multiple columns with same faker key OR allow column naming

    I found myself needing to do the following to replicate a table structure:

    fake2db --rows 1500 --db mysql --name=test_speed --username root --password secret --custom date_time_this_year random_digit_not_null random_digit_not_null uuid4 word boolean boolean boolean random_number random_number word last_name word word word word last_name year
    

    Then I found out that I couldn't use duplicate faker keys for my columns, so using random_digit_not_null twice is not possible.

    I wrote some code in the mysql handler to append keys to the columns in order to allow duplicates (see this commit https://github.com/denitsa-cm/fake2db/commit/73df91998d6c3929e90d5c211f61b765c560510d)

    I do think the mysql_handler is probably not the best place for this -> the unique columns should somehow be formatted further up and passed to all handlers. But it's the first time I'm touching python so I just hacked it a bit for my purposes.

    Then the command above would result in a table structure like this:

    image

    Would be nice if something like this was supported out of the box for all database engines. Or perhaps even better, the option to name the columns in addition to providing the faker keys. This would alleviate the issue altogether.

    opened by denitsa-md 0
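
The idea raised in the issue above, making repeated faker keys usable as distinct column names, can be sketched independently of any database handler. This is a hypothetical helper and not the code from the linked commit; it assumes a numeric suffix is an acceptable naming scheme and does not guard against a suffixed name colliding with a key that is already present.

```python
from collections import Counter


def unique_column_names(keys):
    """Append a running number to repeated keys so every column name is distinct."""
    totals = Counter(keys)
    seen = Counter()
    names = []
    for key in keys:
        if totals[key] == 1:
            # Keys that occur once keep their original name.
            names.append(key)
        else:
            seen[key] += 1
            names.append("%s_%d" % (key, seen[key]))
    return names


if __name__ == "__main__":
    print(unique_column_names(["uuid4", "boolean", "word", "boolean", "boolean"]))
    # ['uuid4', 'boolean_1', 'word', 'boolean_2', 'boolean_3']
```

Doing this once, before the keys reach a handler, would give every database engine the same column names.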
  • database already exists

    I inserted 10000 rows into the database named fake. When I tried to insert another 10000, an error occurred.

    fake2db --rows 10000 --db postgresql --name fake --username postgres
    2019-03-18 09:55:19,907 www Rows argument : 20000
    2019-03-18 09:55:20,036 www database "fake" already exists

    Traceback (most recent call last):
      File "/home/www/pyenv/aws/bin/fake2db", line 11, in <module>
        sys.exit(main())
      File "/home/www/pyenv/aws/local/lib/python2.7/site-packages/fake2db/fake2db.py", line 167, in main
        number_of_rows=args.rows, name=args.name, custom=custom)
      File "/home/www/pyenv/aws/local/lib/python2.7/site-packages/fake2db/postgresql_handler.py", line 18, in fake2db_initiator
        cursor, conn = self.database_caller_creator(number_of_rows, **connection_kwargs)
      File "/home/www/pyenv/aws/local/lib/python2.7/site-packages/fake2db/postgresql_handler.py", line 46, in database_caller_creator
        cur.execute('CREATE DATABASE %s;' % dbname)
    psycopg2.ProgrammingError: database "fake" already exists

    Thanks for your great work.

    opened by jjuu 1
  • custom table name and number

    It would be good if --custom let us create several custom tables and name them.

    opened by noraj 0
  • How to get auto-increment ID and specify custom column names?

    Hi,

    I just ran this command:

    $ fake2db --rows 3 --db sqlite --custom name date country
    

    And the result that I got:

    db

    I have 2 questions here:

    1. How do I get an auto-increment ID? For example, Francis gets id 1 and Robert gets id 3?
    2. How do I rename the custom columns? For example, full_name instead of name and birth_date instead of date.

    Thank you in advance for your help. This project is very cool and helpful. 🙂

    opened by zulhfreelancer 0
  • Populate an existing database + schema?

    Hey there – is there any way we could run this against an existing database (with a flag or something is fine) and not get errors? We'd like to add this to our migrations in our dev environments, but we end up having to create a new database, then import it into the primary db.

    We'd be happy to make adjustments in a PR if you want to outline the easiest / best way to do so. Thanks :)

    opened by dwelch2344 1
  • mongodb hanging on 1 million rows

    trying to insert 1 million rows into mongodb but it hangs

    opened by deeTEEcee 5
Releases(1.0.0)
Owner
Emir Ozer
polyglot programmer & distributed systems engineer.
Faker is a Python package that generates fake data for you.

Faker is a Python package that generates fake data for you. Whether you need to bootstrap your database, create good-looking XML documents, fill-in yo

Daniele Faraglia 13.6k Jan 23, 2022
Mixer -- Is a fixtures replacement. Supported Django, Flask, SqlAlchemy and custom python objects.

The Mixer is a helper to generate instances of Django or SQLAlchemy models. It's useful for testing and fixture replacement. Fast and convenient test-

Kirill Klenov 798 Jan 26, 2022
Green is a clean, colorful, fast python test runner.

Green -- A clean, colorful, fast python test runner. Features Clean - Low redundancy in output. Result statistics for each test is vertically aligned.

Nathan Stocks 718 Jan 12, 2022
splinter - python test framework for web applications

splinter - python tool for testing web applications splinter is an open source tool for testing web applications using Python. It lets you automate br

Cobra Team 2.5k Jan 23, 2022
A test fixtures replacement for Python

factory_boy factory_boy is a fixtures replacement based on thoughtbot's factory_bot. As a fixtures replacement tool, it aims to replace static, hard t

FactoryBoy project 2.7k Jan 21, 2022
create custom test databases that are populated with fake data

About Generate fake but valid data filled databases for test purposes using most popular patterns(AFAIK). Current support is sqlite, mysql, postgresql

Emir Ozer 2.1k Jan 27, 2022
Fully Automated YouTube Channel ▶️ with Added Extra Features.

Fully Automated Youtube Channel

sam-sepiol 57 Jan 23, 2022
Mimesis is a high-performance fake data generator for Python, which provides data for a variety of purposes in a variety of languages.

Mimesis - Fake Data Generator Description Mimesis is a high-performance fake data generator for Python, which provides data for a variety of purposes

Isaak Uchakaev 3.5k Jan 25, 2022
dbd is a database prototyping tool that enables data analysts and engineers to quickly load and transform data in SQL databases.

dbd: database prototyping tool dbd is a database prototyping tool that enables data analysts and engineers to quickly load and transform data in SQL d

Zdenek Svoboda 20 Jan 20, 2022
A library for generating fake data and populating database tables.

Knockoff Factory A library for generating mock data and creating database fixtures that can be used for unit testing. Table of content Installation Ch

Nike Inc. 28 Oct 20, 2021
FakeDataGen is a Full Valid Fake Data Generator.

FakeDataGen is a Full Valid Fake Data Generator. This tool helps you to create fake accounts (in Spanish format) with fully valid data. Within this in

Joel GM 32 Jan 22, 2022
Ethereum ETL lets you convert blockchain data into convenient formats like CSVs and relational databases.

Python scripts for ETL (extract, transform and load) jobs for Ethereum blocks, transactions, ERC20 / ERC721 tokens, transfers, receipts, logs, contracts, internal transactions.

Blockchain ETL 1.5k Jan 21, 2022
A discord bot consuming Notion API to add, retrieve data to Notion databases.

Notion-DiscordBot A discord bot consuming Notion API to add and retrieve data from Notion databases. Instructions to use the bot: Pre-Requisites: a)In

Servatom 33 Jan 18, 2022
Anomaly detection on SQL data warehouses and databases

With CueObserve, you can run anomaly detection on data in your SQL data warehouses and databases. Getting Started Install via Docker docker run -p 300

Cuebook 139 Jan 27, 2022
The OHSDI OMOP Common Data Model allows for the systematic analysis of healthcare observational databases.

The OHSDI OMOP Common Data Model allows for the systematic analysis of healthcare observational databases.

Bell Eapen 9 Jan 4, 2022
Estoult - a Python toolkit for data mapping with an integrated query builder for SQL databases

Estoult Estoult is a Python toolkit for data mapping with an integrated query builder for SQL databases. It currently supports MySQL, PostgreSQL, and

halcyon[nouveau] 11 Nov 5, 2021
Django project starter on steroids: quickly create a Django app AND generate source code for data models + REST/GraphQL APIs (the generated code is auto-linted and has 100% test coverage).

Create Django App ?? We're a Django project starter on steroids! One-line command to create a Django app with all the dependencies auto-installed AND

imagine.ai 69 Jan 10, 2022
Buy early bsc gems with custom gas fee, slippage, amount. Auto approve token after buy. Sell buyed token with custom gas fee, slippage, amount. And more.

Pancakeswap Sniper bot Full version of Pancakeswap sniping bot used to snipe during fair coin launches. With advanced options and a graphical user int

Jesus Crypto 185 Jan 25, 2022
Custom Weapons 3 attribute support for Custom Weapons X

CW3toX Allows use of Custom Weapons 3 attributes in Custom Weapons X. Requiremen

null 1 Dec 29, 2021
Nautobot-custom-jobs - Custom jobs for Nautobot

nautobot-custom-jobs This repo contains custom jobs for Nautobot. Installation P

Dan Peachey 3 Jan 18, 2022
a plugin for py.test that changes the default look and feel of py.test (e.g. progressbar, show tests that fail instantly)

pytest-sugar pytest-sugar is a plugin for pytest that shows failures and errors instantly and shows a progress bar. Requirements You will need the fol

Teemu 818 Jan 23, 2022
This is our ARTS test set, an enriched test set to probe Aspect Robustness of ABSA.

This is the repository for our 2020 paper "Tasty Burgers, Soggy Fries: Probing Aspect Robustness in Aspect-Based Sentiment Analysis". Data We provide

null 31 Oct 26, 2021
Pynguin, The PYthoN General UnIt Test geNerator is a test-generation tool for Python

Pynguin, the PYthoN General UnIt test geNerator, is a tool that allows developers to generate unit tests automatically.

Chair of Software Engineering II, Uni Passau 861 Jan 23, 2022
Official Pytorch implementation of Test-Agnostic Long-Tailed Recognition by Test-Time Aggregating Diverse Experts with Self-Supervision.

This repository is the official Pytorch implementation of Test-Agnostic Long-Tailed Recognition by Test-Time Aggregating Diverse Experts with Self-Supervision.

vanint 54 Jan 19, 2022