Picka: A Python module for data generation and randomization.

Last update: Nov 30, 2021

Author:	Anthony Long
Version:	1.0.1 - Fixed the broken image stuff. Whoops

What is Picka?

Picka generates randomized data for testing.

Data comes from two sources: a bundled database of known-good values, and realistic, valid values that are generated behind the scenes using string formatting.
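The two sources described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the name list, the phone-number format, and both function names are hypothetical stand-ins and are not taken from Picka's code.

```python
import random

# Hypothetical stand-in for Picka's bundled database of known-good values.
KNOWN_FIRST_NAMES = ["Jack", "Maria", "Chen", "Aisha"]


def name_from_database(rng=random):
    """Strategy 1: pick a value that is already known to be valid."""
    return rng.choice(KNOWN_FIRST_NAMES)


def phone_from_template(rng=random):
    """Strategy 2: build a realistic value by filling a format string."""
    return "({:03d}) {:03d}-{:04d}".format(
        rng.randint(200, 999), rng.randint(200, 999), rng.randint(0, 9999)
    )


print(name_from_database())
print(phone_from_template())
```

The first strategy can never produce an invalid value, while the second trades that guarantee for far more variety.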

Picka has a function for most fields you would need to fill in. With Selenium, something like the following would populate the "field-name-here" box 100 times with random names:

for x in range(100):
        self.selenium.type('field-name-here', picka.male_name())

This is just the beginning. Another approach is to collect the generated values in a dict:

user_information = {
        "first_name": picka.male_name(),
        "last_name": picka.last_name(),
        "email_address": picka.email(10, extension='example.org'),
        "password": picka.password_numerical(6),
}

This would provide:

{
        "first_name": "Jack",
        "last_name": "Logan",
        "email_address": "[email protected]",
        "password": "485444"
}

Because all of the data is considered "clean" (valid), you can also use it to fill selects and other form fields that only accept predefined values. For example, picka.state() returns a state name such as "Alabama", which you can use directly to select a state in an address drop-down box.
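A minimal sketch of that idea, without a browser: the option list, random_state, and select_by_label are all hypothetical names used for illustration, with random_state standing in for picka.state().

```python
import random

# Hypothetical option list, standing in for both the generator's data
# and the <option> labels of an address form's state drop-down.
STATE_OPTIONS = ["Alabama", "Alaska", "Arizona", "Arkansas"]


def random_state(rng=random):
    """Stand-in for picka.state(): always returns a real option label."""
    return rng.choice(STATE_OPTIONS)


def select_by_label(options, label):
    """Mimic choosing a drop-down entry; fail loudly on unknown labels."""
    if label not in options:
        raise ValueError("no such option: %r" % label)
    return options.index(label)


index = select_by_label(STATE_OPTIONS, random_state())
print(STATE_OPTIONS[index])
```

With Selenium WebDriver, the equivalent step would likely go through the Select helper and its select_by_visible_text method, passing the generated state as the label.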

Examples:

Selenium

def search_for_garbage():
        selenium.open('http://yahoo.com')
        selenium.type('id=search_box', picka.random_string(10))
        selenium.submit()

def test_search_for_garbage_results():
        search_for_garbage()
        selenium.wait_for_page_to_load('30000')
        assert selenium.get_xpath_count('id=results') == 0

Webdriver

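from selenium import webdriver
import picka

driver = webdriver.Firefox()
driver.get("http://somesite.com")
# Map each field to its CSS selector and the value to type into it.
x = {
        "name": [
                "#name",
                picka.name()
        ]
}
selector, value = x["name"]
driver.find_element_by_css_selector(selector).send_keys(value)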

Funcargs / pytest

def pytest_generate_tests(metafunc):
        if "test_string" in metafunc.funcargnames:
                for i in range(10):
                        metafunc.addcall(funcargs=dict(test_string=picka.random_string(20)))

def test_func(test_string):
        assert test_string.isalpha()
        assert len(test_string) == 20
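Newer pytest releases favor metafunc.parametrize and metafunc.fixturenames for this kind of hook. The sketch below shows the same test in that style; random_alpha is a hypothetical stand-in for picka.random_string and assumes the generated string contains letters only.

```python
import random
import string


def random_alpha(length):
    """Hypothetical stand-in for picka.random_string: letters only."""
    return "".join(random.choice(string.ascii_letters) for _ in range(length))


def pytest_generate_tests(metafunc):
    # Parametrize any test that asks for a "test_string" argument.
    if "test_string" in metafunc.fixturenames:
        values = [random_alpha(20) for _ in range(10)]
        metafunc.parametrize("test_string", values)


def test_func(test_string):
    assert test_string.isalpha()
    assert len(test_string) == 20
```

The hook runs at collection time, so test_func is collected ten times, once per generated string.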

MySQL / SQLite

first, last, age = picka.first_name(), picka.last_name(), picka.age()
cursor.execute(
   "insert into user_data (first_name, last_name, age) VALUES (?, ?, ?)",
   (first, last, age)
)
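For a self-contained version of the insert above, here is a sketch against an in-memory SQLite database. The name lists and the age range are hypothetical stand-ins for picka.first_name(), picka.last_name() and picka.age(); the table layout is assumed from the column names in the query.

```python
import random
import sqlite3

# Hypothetical stand-ins for picka.first_name(), picka.last_name(), picka.age().
first = random.choice(["Jack", "Maria"])
last = random.choice(["Logan", "Okafor"])
age = random.randint(18, 90)

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute("create table user_data (first_name text, last_name text, age integer)")
conn.execute(
    "insert into user_data (first_name, last_name, age) values (?, ?, ?)",
    (first, last, age),
)
row = conn.execute("select first_name, last_name, age from user_data").fetchone()
print(row)
```

The ? placeholder is the style used by sqlite3. Common MySQL drivers such as MySQLdb and PyMySQL expect %s instead, so the query string needs adjusting there.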

HTTP

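import httplib  # named http.client on Python 3

def post(host, path, data):
        # Open a connection to the host and POST the data to the given path.
        conn = httplib.HTTPConnection(host)
        conn.request("POST", path, data)
        return conn.getresponse()

def test_post_result():
        post("www.spam.egg", "/bacon.htm", picka.random_string(10))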
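On Python 3, a request carrying random data can be prepared with the standard library alone. The sketch below builds the request without sending it; random_payload is a hypothetical stand-in for picka.random_string, and build_post is a name made up for this example.

```python
import random
import string
import urllib.request


def random_payload(length):
    """Hypothetical stand-in for picka.random_string."""
    return "".join(random.choice(string.ascii_letters) for _ in range(length))


def build_post(url, data):
    """Prepare, but do not send, a POST request carrying the payload."""
    return urllib.request.Request(url, data=data.encode("utf-8"), method="POST")


request = build_post("http://www.spam.egg/bacon.htm", random_payload(10))
print(request.get_method(), request.full_url, len(request.data))
```

Passing the prepared request to urllib.request.urlopen would actually send it, which is left out here so the example runs without a network.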

Comments

No test suite

Slightly ironic: a test data generation toolkit that doesn't have a test suite.

Also, setup.py doesn't declare Python 3 support, so a test suite is needed to validate that the module works correctly there.

opened by jayvdb 1
Additional Functionality for Testers to Add Their Own Data

Picka provides general-purpose data for testing. Building on that work, testers could supply their own custom data, so that tests are not limited to the preconfigured values. Custom data could then be accessed sequentially, randomly, or all at once.
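One way the request could look in practice is sketched below. CustomData and its three methods are hypothetical and not part of Picka; they only illustrate the three access modes named in the issue.

```python
import random


class CustomData:
    """Hypothetical container for tester-supplied values (not part of Picka)."""

    def __init__(self, values):
        self._values = list(values)
        self._position = 0

    def next(self):
        """Sequential access: cycle through the values in order."""
        value = self._values[self._position % len(self._values)]
        self._position += 1
        return value

    def random(self):
        """Random access: pick any one value."""
        return random.choice(self._values)

    def all(self):
        """Complete access: return a copy of every value."""
        return list(self._values)


plans = CustomData(["free", "pro", "enterprise"])
print(plans.next(), plans.next(), plans.random(), plans.all())
```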

opened by bkuehlhorn 1
Fixed test file, added alternative sentence maker
Fixed usage of number in tests (it takes one arg, not two)

Added sentence_actual, which returns an actual sentence from the Sherlock text.

Added _picka._Book class to hold the text and split sentences read from Sherlock. Users can call sentence() without reading the entire file again and again.

Added test of sentence_actual to picka.tests

The sentence_actual function has some nice features:

You're much less likely to get a sentence fragment

You can specify a minimum and maximum number of words

It should be relatively efficient, because the split sentences are cached by the _Book class.

The sentences aren't always perfect, but I think that has to do with the source. A book other than Sherlock Holmes, preferably one with less dialog, would give more "normal" sentences.
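The caching and word-count bounds described in this pull request can be sketched as follows. This is not the code from the pull request: the Book class, its naive sentence splitter, and the sample text are assumptions made for illustration.

```python
import random
import re


class Book:
    """Hypothetical analogue of the _Book idea: split once, reuse after."""

    def __init__(self, text):
        self._text = text
        self._sentences = None  # filled on first use

    @property
    def sentences(self):
        if self._sentences is None:
            # Naive split on ., ! or ? followed by whitespace.
            parts = re.split(r"(?<=[.!?])\s+", self._text.strip())
            self._sentences = [p for p in parts if p]
        return self._sentences

    def sentence(self, min_words=1, max_words=None):
        """Return a random cached sentence within the word-count bounds."""
        matches = [
            s for s in self.sentences
            if len(s.split()) >= min_words
            and (max_words is None or len(s.split()) <= max_words)
        ]
        if not matches:
            raise ValueError("no sentence within the requested bounds")
        return random.choice(matches)


book = Book("It was late. Holmes said nothing at all! Why would he?")
print(book.sentence(min_words=4))
```

Because the split result is stored on the instance, repeated calls only filter the cached list and never re-read or re-split the text.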
opened by TadLeonard 1
Library does not take locale into account
The library assumes an English locale is used (e.g., English-language hardcoded month names). Ideally the library would use locale-dependent constants so that computations are done correctly (e.g., the duration of a month in month_and_day):

>>> locale.setlocale(locale.LC_ALL, 'it_IT')
'it_IT'
>>> picka.month()
'Marzo'
>>> picka.month_and_day()
'Maggio 2'
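A sketch of one locale-independent approach: pick the month by number, compute its length from the number, and only then look up the display name. The function below is a hypothetical variant, not Picka's implementation, and the fixed default year is an assumption.

```python
import calendar
import random


def month_and_day(year=2021, rng=random):
    """Hypothetical locale-aware variant, not Picka's implementation."""
    month = rng.randint(1, 12)
    # monthrange returns (weekday of the 1st, number of days in the month).
    days_in_month = calendar.monthrange(year, month)[1]
    day = rng.randint(1, days_in_month)
    # calendar.month_name follows the current LC_TIME locale.
    return "{} {}".format(calendar.month_name[month], day)


print(month_and_day())
```

Since the day range depends only on the month number, the result stays valid whichever language the month name is rendered in.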
opened by svisser 0
picka.age will return ages outside of the bounds

If I call picka.age(1, 1) repeatedly, I get both 1 and 2 as results. I would have expected it to always return 1. This situation can occur when variables are passed to picka.age; I don't expect people to write such a call literally in their code.

I can also get ages outside of the bounds when I call picka.age(0, 1) which resorts to using the default values and can therefore return any age within the default values.
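The behavior this report expects can be sketched as below. bounded_age is a hypothetical function, not Picka's code, and the default range is made up. The sketch assumes, without confirmation from the report, that the 0 case arises from treating a falsy argument as "not provided", so it checks for None explicitly.

```python
import random


def bounded_age(min_age=None, max_age=None, default_min=1, default_max=99):
    """Hypothetical stand-in for picka.age with strict, inclusive bounds."""
    # Compare against None so that 0 is honored as a real bound.
    low = default_min if min_age is None else min_age
    high = default_max if max_age is None else max_age
    if low > high:
        raise ValueError("min_age must not exceed max_age")
    # random.randint includes both endpoints.
    return random.randint(low, high)


print(bounded_age(1, 1), bounded_age(0, 1))
```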

opened by svisser 0
Module name means "cunt"

I'm not sure if this is a real issue, but when I look at this module I cannot do so with a straight face. "Picka" is "cunt" in Serbian, Macedonian, Bosnian, Croatian, and I'm unsure as to whether there are other languages where this holds.

While not grounds for any specific action, I find this largely amusing and just wanted to share.

opened by geomaster 2

Releases(v0.96)

v0.96(Jan 17, 2014)

hex, rbg, image and more.
picka-0.9.6.tar.gz(8.13 MB)
picka-0.9.6.zip(8.18 MB)

Owner

Anthony

GitHub Repository http://antlong.com
