Tools for test-driven data-wrangling and data validation.

Overview

datatest: Test-driven data-wrangling and data validation

Datatest helps to speed up and formalize data-wrangling and data validation tasks. It implements a system of validation methods, difference classes, and acceptance managers. Datatest can help you:

  • Clean and wrangle data faster and more accurately.
  • Maintain a record of checks and decisions regarding important data sets.
  • Distinguish between ideal criteria and acceptable deviation.
  • Validate the input and output of data pipeline components.
  • Measure progress of data preparation tasks.
  • On-board new team members with an explicit and structured process.

Datatest can be used directly in your own projects or as part of a testing framework like pytest or unittest. It has no hard dependencies; it's tested on Python 2.6, 2.7, 3.2 through 3.10, PyPy, and PyPy3; and is freely available under the Apache License, version 2.
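Inside unittest or pytest, a failed data check should be reported as a test failure rather than an unexpected error. The standard-library sketch below illustrates that idea with a hypothetical `check_all()` helper and a toy `ToyValidationError` that subclasses `AssertionError`, which is the exception type unittest counts as a failure. These names are illustrative stand-ins, not datatest's API.

```python
import unittest


class ToyValidationError(AssertionError):
    """Hypothetical stand-in for a validation failure (not datatest's class).

    Subclassing AssertionError means unittest records a FAIL, not an ERROR.
    """

    def __init__(self, differences):
        self.differences = differences
        super().__init__('{} difference(s): {!r}'.format(len(differences), differences))


def check_all(values, predicate):
    """Hypothetical helper: raise if any value does not satisfy predicate."""
    bad = [v for v in values if not predicate(v)]
    if bad:
        raise ToyValidationError(bad)


class ExampleTests(unittest.TestCase):
    def test_lowercase_values(self):
        check_all(['foo', 'bar', 'baz'], str.islower)  # passes silently

    def test_failure_is_an_assertion(self):
        # Without assertRaises, this call would make the test FAIL (not ERROR).
        with self.assertRaises(self.failureException):
            check_all(['foo', 'EMPTY'], str.islower)
```

Running this module with `python -m unittest` or `pytest` reports both tests as passing; removing the `assertRaises` wrapper would turn the second one into a reported failure.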

Code Examples

Validating a Dictionary of Lists

from datatest import validate, accepted, Invalid


data = {
    'A': [1, 2, 3, 4],
    'B': ['x', 'y', 'x', 'x'],
    'C': ['foo', 'bar', 'baz', 'EMPTY']
}

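# Column names must match the set {'A', 'B', 'C'}.
validate(data.keys(), {'A', 'B', 'C'})

# Every value in column 'A' must be an int.
validate(data['A'], int)

# Values in column 'B' must match the set {'x', 'y'}.
validate(data['B'], {'x', 'y'})

# Values in column 'C' must be lowercase; the known 'EMPTY' placeholder is accepted.
with accepted(Invalid('EMPTY')):
    validate(data['C'], str.islower)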

Validating a Pandas DataFrame

import pandas as pd
from datatest import register_accessors, accepted, Invalid


register_accessors()
df = pd.read_csv('data.csv')

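# Column labels must match the set {'A', 'B', 'C'}.
df.columns.validate({'A', 'B', 'C'})

# Values in column 'A' are checked against the int type requirement.
df['A'].validate(int)

# Values in column 'B' must match the set {'x', 'y'}.
df['B'].validate({'x', 'y'})

# Values in column 'C' must be lowercase; the known 'EMPTY' placeholder is accepted.
with accepted(Invalid('EMPTY')):
    df['C'].validate(str.islower)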

Installation

The easiest way to install datatest is to use pip:

pip install datatest

If you are upgrading from version 0.11.0 or newer, use the --upgrade option:

pip install --upgrade datatest
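Before choosing between a plain `--upgrade` and the staged upgrade described in the next section, it can help to confirm which version is installed. The following is a minimal sketch using the standard library's `importlib.metadata` (Python 3.8 and newer). The helper names are hypothetical, the version parsing assumes plain numeric `X.Y.Z` strings, and treating every release older than 0.10.0 as needing the staged path is an assumption drawn from the 0.9.6 instructions.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name='datatest'):
    """Hypothetical helper: return the installed version string, or None."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def upgrade_command(current):
    """Hypothetical helper: suggest a pip command for a version like '0.9.6'."""
    if current is None:
        return 'pip install datatest'
    # Assumes a plain numeric version such as '0.11.1'.
    parts = tuple(int(p) for p in current.split('.')[:3])
    if parts < (0, 10, 0):
        # Older code bases: step through 0.10.0 first to surface DeprecationWarnings.
        return 'pip install --force-reinstall datatest==0.10.0'
    return 'pip install --upgrade datatest'


print(upgrade_command(installed_version()))
```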

Upgrading From Version 0.9.6

If you have an existing codebase of older datatest scripts, you should upgrade using the following steps:

  • Install datatest 0.10.0 first:

    pip install --force-reinstall datatest==0.10.0
  • Run your existing code and check for DeprecationWarnings.

  • Update the parts of your code that use deprecated features.

  • Once your code is running without DeprecationWarnings, install the latest version of datatest:

    pip install --upgrade datatest
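DeprecationWarnings are hidden by default unless they are triggered directly from `__main__` or a test runner enables them, so it can help to surface them explicitly while checking your code. One option is `python -W error::DeprecationWarning your_script.py` (where `your_script.py` is a placeholder), which turns them into exceptions. The sketch below instead records them with the standard `warnings` module; `run_and_collect_deprecations` and `old_helper` are hypothetical names, with `old_helper` standing in for a deprecated API.

```python
import warnings


def run_and_collect_deprecations(func, *args, **kwargs):
    """Hypothetical helper: call func, return (result, DeprecationWarning messages)."""
    with warnings.catch_warnings(record=True) as caught:
        # Record every DeprecationWarning, including repeats.
        warnings.simplefilter('always', DeprecationWarning)
        result = func(*args, **kwargs)
    messages = [str(w.message) for w in caught
                if issubclass(w.category, DeprecationWarning)]
    return result, messages


def old_helper(x):
    """Hypothetical function standing in for a deprecated API."""
    warnings.warn('old_helper() is deprecated', DeprecationWarning, stacklevel=2)
    return x * 2


result, messages = run_and_collect_deprecations(old_helper, 21)
print(result, messages)  # 42 ['old_helper() is deprecated']
```

Once the collected list comes back empty for your own test entry points, the code is in a good position for the final `--upgrade` step.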

Stuntman Mike

If you need bug-fixes or features that are not available in the current stable release, you can "pip install" the development version directly from GitHub:

pip install --upgrade https://github.com/shawnbrown/datatest/archive/master.zip

All of the usual caveats for a development install apply. Only use this version if you can risk some instability or if you know exactly what you're doing. While care is taken never to break the build, it can happen.

Safety-first Clyde

If you need to review and test packages before installing, you can install datatest manually.

Download the latest source distribution from the Python Package Index (PyPI):

https://pypi.org/project/datatest/#files

Unpack the file (replacing X.Y.Z with the appropriate version number) and review the source code:

tar xvfz datatest-X.Y.Z.tar.gz

Change to the unpacked directory and run the tests:

cd datatest-X.Y.Z
python setup.py test

Don't worry if some of the tests are skipped. Tests for optional data sources (like pandas DataFrames or NumPy arrays) are skipped when the related third-party packages are not installed.

If the source code and test results are satisfactory, install the package:

python setup.py install

Supported Versions

Tested on Python 2.6, 2.7, 3.2 through 3.10, PyPy, and PyPy3. Datatest is pure Python and may also run on other implementations (check using "setup.py test" before installing).

Backward Compatibility

If you have existing tests that use API features which have changed since 0.9.0, you can still run your old code by adding the following import to the beginning of each file:

from datatest.__past__ import api09

To maintain existing test code, this project makes a best-effort attempt to provide backward compatibility support for older features. The API will be improved in the future but only in measured and sustainable ways.

All of the data used at the National Committee for an Effective Congress has been checked with datatest for several years, so there is already a large and growing codebase that relies on current features and must be maintained into the future.

Soft Dependencies

Datatest has no hard, third-party dependencies. But if you want to interface with pandas DataFrames, NumPy arrays, or other optional data sources, you will need to install the relevant packages (pandas, numpy, etc.).
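A common pattern for soft dependencies, shown here as a generic sketch rather than datatest's own code, is to attempt the optional import once and skip dependent tests when it fails. This is the same behavior described in the manual-installation notes, where pandas and NumPy tests are skipped if those packages are missing. The `DataFrameTests` class is hypothetical.

```python
import unittest

try:
    import pandas  # optional: only needed for DataFrame-specific checks
except ImportError:
    pandas = None


class DataFrameTests(unittest.TestCase):
    # Skipped (not failed) when pandas is not installed.
    @unittest.skipUnless(pandas, 'requires pandas')
    def test_columns(self):
        df = pandas.DataFrame({'A': [1, 2], 'B': ['x', 'y']})
        self.assertEqual(set(df.columns), {'A', 'B'})
```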

Development Repository

The development repository for datatest is hosted on GitHub.


Freely licensed under the Apache License, Version 2.0

Copyright 2014 - 2021 National Committee for an Effective Congress, et al.

Comments
  • validation errors Extra(nan) or Invalid(nan)

    Shawn, I am trying your package to see if I can validate a CSV file by reading it with pandas. I am getting Extra(nan) from dt.validate.superset() or Invalid(nan) from dt.validate(). Is there a way I can include those nan values in my validation sets?

    Error looks like

    E     ValidationError: may contain only elements of given superset (10000 differences): [
                Extra(nan),
                Extra(nan),
                Extra(nan),
    

    Note: I am reading this particular column as str

    E       ValidationError: does not satisfy 'str' (10000 differences): [
                Invalid(nan),
                Invalid(nan),
                Invalid(nan),
                Invalid(nan),
    

    Let me know if you find a solution or can help me debug

    opened by upretip 5
  • Crashes pytest-xdist processes (NOTE: See comments for fix.)

    Hi, all! I've run into a problem when starting my tests with pytest-xdist.

    macOS (also checked on Debian), Python 3.8.2

    pytest==5.4.3 pytest-xdist==1.33.0 datatest==0.9.6

    from datatest import accepted, Extra, validate as __validate
    
    
    def test_should_passed():
        with accepted(Extra):
            __validate({"qwe": 1}, {"qwe": 1}, "")
    
    
    def test_should_failed():
        with accepted(Extra):
            __validate({"qwe": 1}, {"qwe": 2}, "")
    
    
    if __name__ == '__main__':
        import sys, pytest
        sys.exit(pytest.main(['/Users/qa/PycharmProjects/qa/test123.py', '-vvv', '-n', '1', '-s']))
    

    Output:

    test123.py::test_should_passed 
    [gw0] PASSED test123.py::test_should_passed 
    test123.py::test_should_failed !!!!!!!!!!!!!!!!!!!! <ExceptionInfo RuntimeError('\'----------------------------------------------------------------------------------------------------\'.../issues\'\n\'----------------------------------------------------------------------------------------------------\'\n') tblen=14>
    
    INTERNALERROR> Traceback (most recent call last):
    INTERNALERROR>   File "/Users/qa/PycharmProjects/qa/venv/lib/python3.8/site-packages/xdist/workermanage.py", line 334, in process_from_remote
    INTERNALERROR>     rep = self.config.hook.pytest_report_from_serializable(
    INTERNALERROR>   File "/Users/qa/PycharmProjects/qa/venv/lib/python3.8/site-packages/pluggy/hooks.py", line 286, in __call__
    INTERNALERROR>     return self._hookexec(self, self.get_hookimpls(), kwargs)
    INTERNALERROR>   File "/Users/qa/PycharmProjects/qa/venv/lib/python3.8/site-packages/pluggy/manager.py", line 93, in _hookexec
    INTERNALERROR>     return self._inner_hookexec(hook, methods, kwargs)
    INTERNALERROR>   File "/Users/qa/PycharmProjects/qa/venv/lib/python3.8/site-packages/pluggy/manager.py", line 84, in <lambda>
    INTERNALERROR>     self._inner_hookexec = lambda hook, methods, kwargs: hook.multicall(
    INTERNALERROR>   File "/Users/qa/PycharmProjects/qa/venv/lib/python3.8/site-packages/pluggy/callers.py", line 208, in _multicall
    INTERNALERROR>     return outcome.get_result()
    INTERNALERROR>   File "/Users/qa/PycharmProjects/qa/venv/lib/python3.8/site-packages/pluggy/callers.py", line 80, in get_result
    INTERNALERROR>     raise ex[1].with_traceback(ex[2])
    INTERNALERROR>   File "/Users/qa/PycharmProjects/qa/venv/lib/python3.8/site-packages/pluggy/callers.py", line 187, in _multicall
    INTERNALERROR>     res = hook_impl.function(*args)
    INTERNALERROR>   File "/Users/qa/PycharmProjects/qa/venv/lib/python3.8/site-packages/_pytest/reports.py", line 355, in pytest_report_from_serializable
    INTERNALERROR>     return TestReport._from_json(data)
    INTERNALERROR>   File "/Users/qa/PycharmProjects/qa/venv/lib/python3.8/site-packages/_pytest/reports.py", line 193, in _from_json
    INTERNALERROR>     kwargs = _report_kwargs_from_json(reportdict)
    INTERNALERROR>   File "/Users/qa/PycharmProjects/qa/venv/lib/python3.8/site-packages/_pytest/reports.py", line 485, in _report_kwargs_from_json
    INTERNALERROR>     reprtraceback = deserialize_repr_traceback(
    INTERNALERROR>   File "/Users/qa/PycharmProjects/qa/venv/lib/python3.8/site-packages/_pytest/reports.py", line 468, in deserialize_repr_traceback
    INTERNALERROR>     repr_traceback_dict["reprentries"] = [
    INTERNALERROR>   File "/Users/qa/PycharmProjects/qa/venv/lib/python3.8/site-packages/_pytest/reports.py", line 469, in <listcomp>
    INTERNALERROR>     deserialize_repr_entry(x) for x in repr_traceback_dict["reprentries"]
    INTERNALERROR>   File "/Users/qa/PycharmProjects/qa/venv/lib/python3.8/site-packages/_pytest/reports.py", line 464, in deserialize_repr_entry
    INTERNALERROR>     _report_unserialization_failure(entry_type, TestReport, reportdict)
    INTERNALERROR>   File "/Users/qa/PycharmProjects/qa/venv/lib/python3.8/site-packages/_pytest/reports.py", line 206, in _report_unserialization_failure
    INTERNALERROR>     raise RuntimeError(stream.getvalue())
    INTERNALERROR> RuntimeError: '----------------------------------------------------------------------------------------------------'
    INTERNALERROR> 'INTERNALERROR: Unknown entry type returned: DatatestReprEntry'
    INTERNALERROR> "report_name: <class '_pytest.reports.TestReport'>"
    INTERNALERROR> {'$report_type': 'TestReport',
    INTERNALERROR>  'duration': 0.002020120620727539,
    INTERNALERROR>  'item_index': 1,
    INTERNALERROR>  'keywords': {'qa': 1, 'test123.py': 1, 'test_should_failed': 1},
    INTERNALERROR>  'location': ('test123.py', 8, 'test_should_failed'),
    INTERNALERROR>  'longrepr': {'chain': [({'extraline': None,
    INTERNALERROR>                           'reprentries': [{'data': {'lines': ['    def '
    INTERNALERROR>                                                               'test_should_failed():',
    INTERNALERROR>                                                               '        with '
    INTERNALERROR>                                                               'accepted(Extra):',
    INTERNALERROR>                                                               '>           '
    INTERNALERROR>                                                               '__validate({"qwe": '
    INTERNALERROR>                                                               '1}, {"qwe": 2}, '
    INTERNALERROR>                                                               '"")',
    INTERNALERROR>                                                               'E           '
    INTERNALERROR>                                                               'datatest.ValidationError: '
    INTERNALERROR>                                                               'does not '
    INTERNALERROR>                                                               'satisfy 2 (1 '
    INTERNALERROR>                                                               'difference): {',
    INTERNALERROR>                                                               'E               '
    INTERNALERROR>                                                               "'qwe': "
    INTERNALERROR>                                                               'Deviation(-1, '
    INTERNALERROR>                                                               '2),',
    INTERNALERROR>                                                               'E           }'],
    INTERNALERROR>                                                     'reprfileloc': {'lineno': 11,
    INTERNALERROR>                                                                     'message': 'ValidationError',
    INTERNALERROR>                                                                     'path': 'test123.py'},
    INTERNALERROR>                                                     'reprfuncargs': {'args': []},
    INTERNALERROR>                                                     'reprlocals': None,
    INTERNALERROR>                                                     'style': 'long'},
    INTERNALERROR>                                            'type': 'DatatestReprEntry'}],
    INTERNALERROR>                           'style': 'long'},
    INTERNALERROR>                          {'lineno': 11,
    INTERNALERROR>                           'message': 'datatest.ValidationError: does not '
    INTERNALERROR>                                      'satisfy 2 (1 difference): {\n'
    INTERNALERROR>                                      "    'qwe': Deviation(-1, 2),\n"
    INTERNALERROR>                                      '}',
    INTERNALERROR>                           'path': '/Users/qa/PycharmProjects/qa/test123.py'},
    INTERNALERROR>                          None)],
    INTERNALERROR>               'reprcrash': {'lineno': 11,
    INTERNALERROR>                             'message': 'datatest.ValidationError: does not '
    INTERNALERROR>                                        'satisfy 2 (1 difference): {\n'
    INTERNALERROR>                                        "    'qwe': Deviation(-1, 2),\n"
    INTERNALERROR>                                        '}',
    INTERNALERROR>                             'path': '/Users/qa/PycharmProjects/qa/test123.py'},
    INTERNALERROR>               'reprtraceback': {'extraline': None,
    INTERNALERROR>                                 'reprentries': [{'data': {'lines': ['    def '
    INTERNALERROR>                                                                     'test_should_failed():',
    INTERNALERROR>                                                                     '        '
    INTERNALERROR>                                                                     'with '
    INTERNALERROR>                                                                     'accepted(Extra):',
    INTERNALERROR>                                                                     '>           '
    INTERNALERROR>                                                                     '__validate({"qwe": '
    INTERNALERROR>                                                                     '1}, '
    INTERNALERROR>                                                                     '{"qwe": '
    INTERNALERROR>                                                                     '2}, "")',
    INTERNALERROR>                                                                     'E           '
    INTERNALERROR>                                                                     'datatest.ValidationError: '
    INTERNALERROR>                                                                     'does not '
    INTERNALERROR>                                                                     'satisfy 2 '
    INTERNALERROR>                                                                     '(1 '
    INTERNALERROR>                                                                     'difference): '
    INTERNALERROR>                                                                     '{',
    INTERNALERROR>                                                                     'E               '
    INTERNALERROR>                                                                     "'qwe': "
    INTERNALERROR>                                                                     'Deviation(-1, '
    INTERNALERROR>                                                                     '2),',
    INTERNALERROR>                                                                     'E           '
    INTERNALERROR>                                                                     '}'],
    INTERNALERROR>                                                           'reprfileloc': {'lineno': 11,
    INTERNALERROR>                                                                           'message': 'ValidationError',
    INTERNALERROR>                                                                           'path': 'test123.py'},
    INTERNALERROR>                                                           'reprfuncargs': {'args': []},
    INTERNALERROR>                                                           'reprlocals': None,
    INTERNALERROR>                                                           'style': 'long'},
    INTERNALERROR>                                                  'type': 'DatatestReprEntry'}],
    INTERNALERROR>                                 'style': 'long'},
    INTERNALERROR>               'sections': []},
    INTERNALERROR>  'nodeid': 'test123.py::test_should_failed',
    INTERNALERROR>  'outcome': 'failed',
    INTERNALERROR>  'sections': [],
    INTERNALERROR>  'testrun_uid': 'c913bf205a874a50a237dcf40d482d06',
    INTERNALERROR>  'user_properties': [],
    INTERNALERROR>  'when': 'call',
    INTERNALERROR>  'worker_id': 'gw0'}
    INTERNALERROR> 'Please report this bug at https://github.com/pytest-dev/pytest/issues'
    INTERNALERROR> '----------------------------------------------------------------------------------------------------'
    [gw0] node down: <ExceptionInfo RuntimeError('\'----------------------------------------------------------------------------------------------------\'.../issues\'\n\'----------------------------------------------------------------------------------------------------\'\n') tblen=14>
    [gw0] FAILED test123.py::test_should_failed 
    
    replacing crashed worker gw0
    [gw1] darwin Python 3.8.3 cwd: /Users/qa/PycharmProjects/qa
    INTERNALERROR> Traceback (most recent call last):
    INTERNALERROR>   File "/Users/qa/PycharmProjects/qa/venv/lib/python3.8/site-packages/_pytest/main.py", line 191, in wrap_session
    INTERNALERROR>     session.exitstatus = doit(config, session) or 0
    INTERNALERROR>   File "/Users/qa/PycharmProjects/qa/venv/lib/python3.8/site-packages/_pytest/main.py", line 247, in _main
    INTERNALERROR>     config.hook.pytest_runtestloop(session=session)
    INTERNALERROR>   File "/Users/qa/PycharmProjects/qa/venv/lib/python3.8/site-packages/pluggy/hooks.py", line 286, in __call__
    INTERNALERROR>     return self._hookexec(self, self.get_hookimpls(), kwargs)
    INTERNALERROR>   File "/Users/qa/PycharmProjects/qa/venv/lib/python3.8/site-packages/pluggy/manager.py", line 93, in _hookexec
    INTERNALERROR>     return self._inner_hookexec(hook, methods, kwargs)
    INTERNALERROR>   File "/Users/qa/PycharmProjects/qa/venv/lib/python3.8/site-packages/pluggy/manager.py", line 84, in <lambda>
    INTERNALERROR>     self._inner_hookexec = lambda hook, methods, kwargs: hook.multicall(
    INTERNALERROR>   File "/Users/qa/PycharmProjects/qa/venv/lib/python3.8/site-packages/pluggy/callers.py", line 208, in _multicall
    INTERNALERROR>     return outcome.get_result()
    INTERNALERROR>   File "/Users/qa/PycharmProjects/qa/venv/lib/python3.8/site-packages/pluggy/callers.py", line 80, in get_result
    INTERNALERROR>     raise ex[1].with_traceback(ex[2])
    INTERNALERROR>   File "/Users/qa/PycharmProjects/qa/venv/lib/python3.8/site-packages/pluggy/callers.py", line 187, in _multicall
    INTERNALERROR>     res = hook_impl.function(*args)
    INTERNALERROR>   File "/Users/qa/PycharmProjects/qa/venv/lib/python3.8/site-packages/xdist/dsession.py", line 112, in pytest_runtestloop
    INTERNALERROR>     self.loop_once()
    INTERNALERROR>   File "/Users/qa/PycharmProjects/qa/venv/lib/python3.8/site-packages/xdist/dsession.py", line 135, in loop_once
    INTERNALERROR>     call(**kwargs)
    INTERNALERROR>   File "/Users/qa/PycharmProjects/qa/venv/lib/python3.8/site-packages/xdist/dsession.py", line 263, in worker_runtest_protocol_complete
    INTERNALERROR>     self.sched.mark_test_complete(node, item_index, duration)
    INTERNALERROR>   File "/Users/qa/PycharmProjects/qa/venv/lib/python3.8/site-packages/xdist/scheduler/load.py", line 151, in mark_test_complete
    INTERNALERROR>     self.node2pending[node].remove(item_index)
    INTERNALERROR> KeyError: <WorkerController gw0>
    

    But if I change the second test like this, everything works fine:

    def test_should_failed():
        try:
            with accepted(Extra):
                __validate({"qwe": 1}, {"qwe": 2}, "")
        except:
            raise ValueError
    

    I don't know exactly where I should create a bug/issue about this :)

    bug 
    opened by VasilyevAA 3
  • AcceptedExtra not working as expected with dicts

    I expected with AcceptedExtra(): to ignore missing keys in dicts, but instead it raises a Deviation from None.

    Here is an example:

    actual = {'a': 1, 'b': 2}
    expected = {'b': 2}
    with AcceptedExtra():
        validate(actual, requirement=expected)
    

    The output is:

    E           ValidationError: does not satisfy mapping requirements (1 difference): {
                    'a': Deviation(+1, None),
                }
    

    Thanks for the cool package, by the way!

    opened by TheOtherDude 3
  • Add pytest framework trove classifier

    Adding the trove classifier will signal that datatest also acts as a pytest plugin. This will also help https://plugincompat.herokuapp.com to find it and list it as a plugin and do regular installation checks.

    For further details see this recently merged PR from hypothesis: https://github.com/HypothesisWorks/hypothesis/pull/1306

    opened by obestwalter 3
  • Magic Reduction

    Issue #7 exposes the degree of magic that is currently present in the DataTestCase methods. Removing (or at least reducing) magic where possible would make the behavior easier to understand and explain.

    In cases where small amounts of magic are useful, methods should be renamed to better reflect what's happening.

    Illustrating the Problem

    This "magic" version:

    def test_active(self):
        self.assertDataSet('active', {'Y', 'N'})
    

    ...is roughly equivalent to:

    def test_active(self):
        subject = self.subject.set('active')
        self.assertEqual(subject, {'Y', 'N'})
    

    The magic version requires detailed knowledge about the method before a newcomer can guess what's happening. The latter example is more explicit and easier to reason about.

    Having said this, the magic versions of DataTestCase's methods can save a lot of typing. So what I plan to do is:

    1. Fully implement assertEqual() integration (see issue #7) as well as other standard unittest methods (assertGreater(), etc.).
    2. Rename the existing methods to clearly denote that they run on the subject data (e.g., assertDataSum() becomes assertSubjectSum(), etc.).
    enhancement 
    opened by shawnbrown 3
  • Unique Method

    Hey Shawn - one of the problems you were speaking about at PyCon 2016 was looking to guarantee that all integers in a list were unique, in an efficient way for large sets of data?

    enhancement 
    opened by RyPeck 3
  • Fix syntax of `python_requires`

    >=2.6.* isn't valid syntax for python_requires (see PEP 440).

    This was causing an alpha release of Poetry to fail to install this package. I think they're going to fix it in future releases, but regardless it'd be helpful if this syntax was fixed.

    opened by ajhynes7 2
  • pytest_runtest_makereport crashes on test exceptions

    If an exception is thrown within a test that uses the test_db_engine fixture, the pytest_runtest_makereport function crashes. The reason is that it uses Node's deprecated get_marker function, instead of the new get_closest_marker function. See details about this change in pytest here: https://docs.pytest.org/en/latest/mark.html#updating-code

    opened by avshalomt2 2
  • Explore ways to optimize validation and allowance flow.

    Once major pieces are in place, explore ways of optimizing the validation/allowance process. Look to implement the following possible improvements:

    • Use lazy evaluation in validate and assertion functions by returning generators instead of fully calculated containers.
    • Create optimized _validate...() functions for faster testing (short-circuit evaluation and Boolean return values) rather than using _compare...() functions in all cases.
    opened by shawnbrown 2
  • Squint objects not handled properly when used as requirements.

    Squint objects are not being evaluated properly by datatest.validate() function:

    import datatest
    import squint
    
    # Create a Select object.
    select = squint.Select([['A', 'B'], ['x', 1], ['y', 2], ['z', 3]])
    
    # Compare data to itself--passes as expected.
    datatest.validate(
    	select({'A': {'B'}}),
    	select({'A': {'B'}}).fetch(),  # <- Shouldn't be necessary.
    )
    
    # Compare data to itself--fails, unexpectedly.
    datatest.validate(
    	select({'A': {'B'}}),
    	select({'A': {'B'}}),  # <- Not properly handled!
    )
    

    In the code above, the second call to datatest.validate() should pass but, instead, fails with the following message:

    Traceback (most recent call last):
      File "<input>", line 3, in <module>
    	select({'A': {'B'}}),  # <- Not properly handled!
      File "~/datatest-project/datatest/validation.py", line 291, in __call__
    	raise err
    datatest.ValidationError: does not satisfy mapping requirements (3 differences): {
    	'x': [Invalid(1)],
    	'y': [Invalid(2)],
    	'z': [Invalid(3)],
    }
    
    bug 
    opened by shawnbrown 1
  • Selector.load_data() silently fails on missing file.

    The following should raise an error:

    >>> import datatest
    >>> select = datatest.Selector()
    >>> select = select.load_data('nonexistent_file.csv')
    
    bug 
    opened by shawnbrown 1
  • How to validate Pandas data type "Int64"?

    Pandas recently introduced IntegerArrays which allow integer types to also store a NaN-like value pandas.NA.

    Is there a way to use datatest to validate that a pandas.DataFrame's column is of type Int64, i.e. all values are of that type.

    I tried df["mycolumn"].validate(pd.arrays.IntegerArray) and df["mycolumn"].validate(pd.Int64Dtype) to no avail.

    opened by PanCakeConnaisseur 0
  • Understanding Pandas validation

    Hello, apologies if this is the wrong place to ask this question.

    I am stumped on how datatest's validation mechanism is passing the following example:

    dt.validate(pd.DataFrame(), pd.DataFrame({"A": [1]}))
    

    The documentation states:

    For validation, DataFrame objects using the default index type are treated as sequences.

    Shouldn't I be getting the same result as dt.validate([], [1])? What am I missing?

    opened by schlich 1
  • Improve existing or create another Deviation-like difference

    Hello @shawnbrown It would be nice to also show actual value along with deviation and expected value. It would also be nice to be able to see the percentage deviation along with the absolute deviation. Thanks!

    opened by a-chernov 0
  • Improve error message for @working_directory decorator

    If working_directory() is used as a decorator but the developer forgets to call it with a path, the error message can be confusing because the function is passed in implicitly (via decorator handling):

    >>> from datatest import working_directory
    >>>
    >>> @working_directory
    ... def foo():
    ...     return True
    ...
    TypeError: stat: path should be string, bytes, os.PathLike or integer, not function
    

    This misuse is easily detectable in the code and it would be good to improve the error message to help users understand their mistake.

    opened by shawnbrown 0
  • NaT issue

    Greetings, @shawnbrown

    In short, my pd.Series looks like this:

    Date
    0          NaT
    1          NaT
    2          NaT
    3   2010-12-31
    4   2010-12-31
    Name: Date, dtype: datetime64[ns]

    The type of NaT is <class 'pandas._libs.tslibs.nattype.NaTType'>. When I use the following code:

    with accepted(Extra(pd.NaT)):
        validate(data, requirement)

    I found that the NaTs are not recognized. I tried many kinds of Extra and also tried using a function, but all failed.

    I need your help here. Thanks for your work.

    opened by Belightar 5
  • Investigate Support for DataFrame-Protocol

    Keep an eye on wesm/dataframe-protocol#1 and see if it makes sense to change datatest's normalization to support a DataFrame-protocol instead of Dataframes specifically.

    enhancement 
    opened by shawnbrown 0
Releases (0.11.1)
  • 0.11.1 (Jan 4, 2021)

    • Fixed validation, predicate, and difference handling of non-comparable objects.
    • Fixed bug in normalization of Queries from squint package.
    • Changed failure output to improve error reporting with pandas accessors.
    • Changed predicate failure message to quote code objects using backticks.
  • 0.11.0 (Dec 18, 2020)

    • Removed deprecated decorators: skip(), skipIf(), skipUnless() (use unittest.skip(), etc. instead).
    • Removed deprecated aliases Selector and ProxyGroup.
    • Removed the long-deprecated allowed interface.
    • Removed deprecated acceptances: "specific", "limit", etc.
    • Removed deprecated Select, Query, and Result API. Use squint instead:
      • https://pypi.org/project/squint/
    • Removed deprecated get_reader() function. Use get-reader instead:
      • https://pypi.org/project/get-reader/
  • 0.10.0 (Dec 17, 2020)

    • Fixed bug where ValidationErrors were crashing pytest-xdist workers.

    • Added tighter Pandas integration using Pandas' extension API.

      After calling the new register_accessors() function, your existing DataFrame, Series, Index, and MultiIndex objects will have a validate() method that can be used instead of the validate() function:

      import pandas as pd
      import datatest as dt
      
      dt.register_accessors()  # <- Activate Pandas integration.
      
      df = pd.DataFrame(...)
      df[['A', 'B']].validate((str, int))  # <- New accessor method.
      
    • Changed Pandas validation behavior:

      • DataFrame and Series: These objects are treated as sequences when they use a RangeIndex (the default index type assigned when no index is specified). They are treated as dictionaries when they use an index of any other type; the index values become the dictionary keys.

      • Index and MultiIndex: These objects are treated as sequences.

    • Changed repr behavior of Deviation to make timedeltas more readable.

    • Added Predicate matching support for NumPy types np.character, np.integer, np.floating, and np.complexfloating.

    • Added improved NaN handling:

      • Added NaN support to accepted.keys(), accepted.args(), and validate.interval().
      • Improved existing NaN support for difference comparisons.
      • Added how-to documentation for NaN handling.
    • Added data handling support for squint.Select objects.

    • Added deprecation warnings for soon-to-be-removed functions and classes:

      • Added DeprecationWarning to get_reader function. This function is now available from the get-reader package on PyPI:

        https://pypi.org/project/get-reader/

      • Added DeprecationWarning to Select, Query, and Result classes. These classes will be removed in the next release but are already available from the squint package on PyPI:

        https://pypi.org/project/squint/

    • Changed validate.subset() and validate.superset() behavior:

      The semantics are now inverted. This behavior was flipped to more closely match user expectations. The previous semantics were used because they reflected the internal structure of datatest more precisely, but those are implementation details and are less important than having a more intuitive API.

    • Added a temporary warning when using validate.subset() or validate.superset() to alert users to the new behavior. This warning will be removed in a future version of datatest.

    • Added Python 3.9 and 3.10 testing and support.

    • Removed Python 3.1 testing and support. If you were still using this version of Python, please email me--this is a story I need to hear.
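
The inverted subset/superset semantics and the NaN handling described above can be illustrated in plain Python. This is a sketch under stated assumptions, not datatest's implementation: the helper names are hypothetical, "subset" is read as "every element of data appears in requirement", "superset" as "data contains every element of requirement", and any two NaN values are treated as equal (ordinary `==` never considers two NaNs equal).

```python
import math

# Hypothetical helpers illustrating the semantics; not datatest's code.


def subset_extras(data, requirement):
    """Elements of data not found in requirement (data should be a subset)."""
    return set(data) - set(requirement)


def superset_missing(data, requirement):
    """Elements of requirement not found in data (data should be a superset)."""
    return set(requirement) - set(data)


def nan_safe_equal(a, b):
    """Compare with ==, except that any two float NaNs count as equal."""
    both_nan = (isinstance(a, float) and isinstance(b, float)
                and math.isnan(a) and math.isnan(b))
    return both_nan or a == b


data = {'A', 'B'}
requirement = {'A', 'B', 'C'}
print(subset_extras(data, requirement))     # set()
print(superset_missing(data, requirement))  # {'C'}

nan = float('nan')
print(nan == float('nan'))                  # False
print(nan_safe_equal(nan, float('nan')))    # True
```

A comparison like `nan_safe_equal` matters when data and requirement are compared position by position: with plain `==`, a NaN present in both would be reported as a difference.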

  • 0.9.6(Jun 3, 2019)

    • Changed acceptance API to make it both less verbose and more expressive:

      • Consolidated specific-instance and class-based acceptances into a single interface.

      • Added a new accepted.tolerance() method that subsumes the behavior of accepted.deviation() by supporting Missing and Extra quantities in addition to Deviation objects.

      • Deprecated old methods:

        Old Syntax                 New Syntax
        ----------                 ----------
        accepted.specific(...)     accepted(...)
        accepted.missing()         accepted(Missing)
        accepted.extra()           accepted(Extra)
        NO EQUIVALENT              accepted(CustomDifferenceClass)
        accepted.deviation(...)    accepted.tolerance(...)
        accepted.limit(...)        accepted.count(...)
        NO EQUIVALENT              accepted.count(..., scope='group')

        Other methods--accepted.args(), accepted.keys(), etc.--remain unchanged.

    • Changed validation to generate Deviation objects for a broader definition of quantitative values (like datetime objects)--not just for subclasses of numbers.Number.

    • Changed handling for pandas.Series objects to treat them as sequences instead of mappings.

    • Added handling for DBAPI2 cursor objects to automatically unwrap single-value rows.

    • Removed acceptance classes from datatest namespace--these were inadvertently added in a previous version but were never part of the documented API. They can still be referenced via the acceptances module:

      from datatest.acceptances import ...
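
Two of the 0.9.6 changes, sketched with the standard library under explicit assumptions. For tolerance, the namedtuples below are hypothetical stand-ins for datatest's Missing, Extra, and Deviation classes, and the convention that a Missing amount counts as a negative deviation and an Extra amount as a positive one is an assumption. For cursor unwrapping, `sqlite3` follows DB-API 2.0, and `unwrap_single_value_rows` is a hypothetical name, not datatest's internal function.

```python
import sqlite3
from collections import namedtuple

# Hypothetical stand-ins for datatest's difference classes.
Missing = namedtuple('Missing', 'value')
Extra = namedtuple('Extra', 'value')
Deviation = namedtuple('Deviation', 'deviation expected')


def signed_amount(diff):
    """Express a difference as a signed quantity (assumed convention)."""
    if isinstance(diff, Missing):
        return -diff.value
    if isinstance(diff, Extra):
        return diff.value
    return diff.deviation


def within_tolerance(diff, tolerance):
    """True if the difference's magnitude is no larger than tolerance."""
    return abs(signed_amount(diff)) <= tolerance


def unwrap_single_value_rows(cursor):
    """Yield bare values for one-column results, full rows otherwise."""
    # cursor.description has one entry per result column (DB-API 2.0).
    single_column = len(cursor.description) == 1
    for row in cursor:
        yield row[0] if single_column else row


diffs = [Deviation(2, 10), Missing(1), Extra(5)]
print([d for d in diffs if not within_tolerance(d, 3)])  # [Extra(value=5)]

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE t (a INTEGER, b TEXT)')
conn.executemany('INSERT INTO t VALUES (?, ?)', [(1, 'x'), (2, 'y')])
print(list(unwrap_single_value_rows(conn.execute('SELECT a FROM t ORDER BY a'))))
# [1, 2] rather than [(1,), (2,)]
print(list(unwrap_single_value_rows(conn.execute('SELECT a, b FROM t ORDER BY a'))))
# [(1, 'x'), (2, 'y')]
```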

  • 0.9.5(May 1, 2019)

    • Changed difference objects to make them hashable (can now be used as set members or as dict keys).
    • Added __slots__ to difference objects to reduce memory consumption.
    • Changed name of Selector class to Select (Selector now deprecated).
    • Changed language and class names from allowed and allowance to accepted and acceptance to bring datatest more in line with manufacturing and engineering terminology. The existing allowed API is now deprecated.
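
A minimal sketch of what hashable, slot-based difference objects make possible. `MissingValue` is a hypothetical class written for illustration, not datatest's `Missing`: it defines `__eq__` and `__hash__` consistently so that equal objects collapse in sets and share dict keys, and `__slots__` removes the per-instance `__dict__`.

```python
class MissingValue(object):
    """Hypothetical difference class: hashable and slot-based."""

    __slots__ = ('value',)  # no per-instance __dict__

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        return type(self) is type(other) and self.value == other.value

    def __ne__(self, other):  # Python 2 does not derive != from ==
        return not self == other

    def __hash__(self):
        # Hash on class name and value so equal objects hash equally.
        return hash((type(self).__name__, self.value))

    def __repr__(self):
        return 'MissingValue({0!r})'.format(self.value)


counts = {MissingValue('A'): 2}
counts[MissingValue('A')] += 1                    # equal objects share a key
print(counts)                                     # {MissingValue('A'): 3}
print({MissingValue('A'), MissingValue('A')})     # {MissingValue('A')}
print(hasattr(MissingValue('A'), '__dict__'))     # False
```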
  • 0.9.4(Apr 21, 2019)

    • Added Python 3.8 testing and support.
    • Added new validate methods (moved from how-to recipes into core module):
      • Added approx() method to require approximate numeric equality.
      • Added fuzzy() method to require strings by approximate match.
      • Added interval() method to require elements within a given interval.
      • Added set(), subset(), and superset() methods for explicit membership checking.
      • Added unique() method to require unique elements.
      • Added order() method to require elements by relative order.
    • Changed default sequence validation to check elements by index position rather than checking by relative order.
    • Added fuzzy-matching allowance to allow strings by approximate match.
    • Added Predicate class to formalize behavior--also provides inverse-matching with the inversion operator (~).
    • Added new methods to Query class:
      • Added unwrap() to remove single-element containers and return their unwrapped contents.
      • Added starmap() to unpack grouped arguments when applying a function to elements.
    • Fixed improper use of assert statements by replacing them with appropriate conditional checks and error behavior.
    • Added requirement class hierarchy (using BaseRequirement). This gives users a cleaner way to implement custom validation behavior and makes the underlying codebase easier to maintain.
    • Changed name of ProxyGroup to RepeatingContainer.
    • Changed "How To" examples to use the new validation methods.
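
Plain-Python models of four of the checks listed above, for readers who want to see the underlying rules. These are hypothetical helpers, not datatest's implementations: `approx_equal` borrows the rounding rule of `unittest.TestCase.assertAlmostEqual` (7 places by default), and `fuzzy_match` uses `difflib.SequenceMatcher.ratio()` with a 0.6 cutoff, the same default that `difflib.get_close_matches` uses.

```python
import difflib


def approx_equal(a, b, places=7):
    """Numbers are equal if their difference rounds to zero at `places`."""
    return round(abs(a - b), places) == 0


def fuzzy_match(value, pattern, cutoff=0.6):
    """Strings match if their similarity ratio (0.0 to 1.0) meets cutoff."""
    return difflib.SequenceMatcher(None, value, pattern).ratio() >= cutoff


def in_interval(value, lower=None, upper=None):
    """True if lower <= value <= upper; a None bound is left open."""
    if lower is not None and value < lower:
        return False
    if upper is not None and value > upper:
        return False
    return True


def duplicates(values):
    """Return elements that appear more than once, in first-repeat order."""
    seen, repeated = set(), []
    for value in values:
        if value in seen and value not in repeated:
            repeated.append(value)
        seen.add(value)
    return repeated


print(approx_equal(0.1 + 0.2, 0.3))           # True
print(fuzzy_match('Washington', 'Washingtn'))  # True
print(fuzzy_match('apple', 'orange'))          # False
print(in_interval(15, lower=0, upper=10))      # False
print(duplicates(['a', 'b', 'a', 'c', 'b']))   # ['a', 'b']
```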
  • 0.9.3(Jan 29, 2019)

    • Changed bundled pytest plugin to version 0.1.3:
      • This update adds testing and support for latest versions of Pytest and Python (now tested using Pytest 3.3 to 4.1 and Python 2.7 to 3.7).
      • Changed handling for 'mandatory' marker to support older and newer Pytest versions.
  • 0.9.2(Aug 8, 2018)

    Improved data handling features and support for Python 3.7:

    • Changed Query class:
      • Added flatten() method to serialize dictionary results.
      • Added to_csv() method to quickly save results as a CSV file.
      • Changed reduce() method to accept initializer_factory as an optional argument.
      • Changed filter() method to support predicate matching.
    • Added True and False as predicates to support "truth value testing" on arbitrary objects (to match on truthy or falsy).
    • Added ProxyGroup class for performing the same operations on groups of objects at the same time (a common need when testing against reference data).
    • Changed Selector class keyword filtering to support predicate matching.
    • Added handling to get_reader() to support datatest's Selector and Result objects.
    • Fixed get_reader() bug that prevented encoding-fallback recovery when reading from StringIO buffers in Python 2.
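
The truth-value predicates above, together with the inversion operator (~) listed under 0.9.4, can be modeled in a few lines. `Pred` is a hypothetical class for illustration, not datatest's `Predicate`; it covers only three kinds of criteria: True/False, callables, and plain values compared with `==`.

```python
class Pred(object):
    """Hypothetical predicate: True/False test truthiness, callables are
    called with the value, and any other object is compared with ==."""

    def __init__(self, criterion, inverted=False):
        self.criterion = criterion
        self.inverted = inverted

    def _matches(self, value):
        if self.criterion is True:
            return bool(value)          # matches any truthy value
        if self.criterion is False:
            return not value            # matches any falsy value
        if callable(self.criterion):
            return bool(self.criterion(value))
        return self.criterion == value

    def __call__(self, value):
        result = self._matches(value)
        return not result if self.inverted else result

    def __invert__(self):
        # ~pred builds a new predicate with the opposite result.
        return Pred(self.criterion, not self.inverted)


values = [0, 1, '', 'abc', None, 'ABC']
print([v for v in values if Pred(True)(v)])     # [1, 'abc', 'ABC']
print([v for v in values if (~Pred(True))(v)])  # [0, '', None]
print([v for v in values if Pred('abc')(v)])    # ['abc']
print([w for w in ['abc', 'ABC'] if Pred(str.isupper)(w)])  # ['ABC']
```

Note the parentheses in `(~Pred(True))(v)`: the call binds more tightly than `~`, so `~Pred(True)(v)` would invert the boolean result instead of the predicate.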
  • 0.9.1(Jun 22, 2018)

    • Added improved docstrings and other documentation.
    • Changed bundled pytest plugin to version 0.1.2:
      • Added handling for a mandatory marker to support incremental testing (stops session early when a mandatory test fails).
      • Added --ignore-mandatory option to continue tests even when a mandatory test fails.
  • 0.9.0(Apr 29, 2018)

    • Added a bundled version of the pytest plugin to the base installation.
    • Added universal composability for all allowances (using UNION and INTERSECTION via "|" and "&" operators).
    • Added allowed factory class to simplify allowance imports.
    • Changed is_valid() to valid().
    • Changed ValidationError to display differences in sorted order.
    • Added Python 2 and 3 compatible get_reader() to quickly load csv.reader-like interface for Unicode CSV, MS Excel, pandas.DataFrame, DBF, etc.
    • Added formal order of operations for allowance resolution.
    • Added formal predicate object handling.
    • Added Sphinx-tabs style docs for clear separation of pytest and unittest style examples.
    • Changed DataSource to Selector, DataQuery to Query, and DataResult to Result.
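
The union and intersection composition described above works like ordinary operator overloading. In this sketch, `Acceptance` is a hypothetical class, not datatest's, and differences are modeled as plain signed numbers for simplicity.

```python
class Acceptance(object):
    """Hypothetical acceptance: wraps a test applied to one difference."""

    def __init__(self, func):
        self.func = func

    def __call__(self, difference):
        return self.func(difference)

    def __or__(self, other):
        # UNION: accept if either acceptance accepts.
        return Acceptance(lambda d: self(d) or other(d))

    def __and__(self, other):
        # INTERSECTION: accept only if both acceptances accept.
        return Acceptance(lambda d: self(d) and other(d))


small = Acceptance(lambda d: abs(d) <= 2)
positive = Acceptance(lambda d: d > 0)

differences = [-5, -1, 1, 4]
print([d for d in differences if (small | positive)(d)])  # [-1, 1, 4]
print([d for d in differences if (small & positive)(d)])  # [1]
```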
  • 0.8.3(Nov 26, 2017)

    • New module-level functions: validate() and is_valid().
    • DataQuery selections now default to a list type when no outer-container is specified.
    • New DataQuery.apply() method for group-wise function application.
    • DataSource.fieldnames attribute is now a tuple (was a list).
    • The ValidationError repr now prints a trailing comma with the last item (for ease of copy-and-paste work flow).
    • Revised sequence validation behavior provides more precise differences.
    • New truncation support for ValidationErrors with long lists of differences.
    • Excess differences in allowed_specific() definitions no longer trigger test failures.
    • New support for user-defined functions to narrow DataSource selections.
    • Better traceback hiding for pytest.
    • Fix bug in DataQuery.map() method--now converts set types into lists.
  • 0.8.2(Jun 11, 2017)

    • Implement Boolean composition for allowed_specific() context manager.
    • Add proper __repr__() support to DataSource and DataQuery.
    • Make sure DataQuery fails early if bad "select" syntax is used or if unknown columns are selected.
    • Add __copy__() method to DataQuery.
    • Change parent class of differences so they no longer inherit from Exception (this confused their intended use).
    • Restructure documentation for ease of reference.
  • 0.8.1(May 31, 2017)

    • Updated DataQuery select behavior to fail immediately when invalid syntax is used (rather than later when attempting to execute the query).
    • Improved error messages to better explain what went wrong.
  • 0.8.0(May 31, 2017)

    • Replaces old assertion methods with a single, smarter assertValid() method.
    • DataQuery implements query optimization and uses a simpler and more expressive syntax.
    • Allowances and errors have been reworked to be more expressive.
    • Allowances are now composable with bitwise "&" and "|" operators.
  • 0.7.0.dev2(Aug 3, 2016)

    • Removes some of the internal magic and renames data assertions to more clearly indicate their intended use.
    • Restructures data allowances to provide more consistent parameters and more flexible usage.
    • Adds new method to assert unique values.
    • Adds full **fmtparams support for CSV handling.
    • Fixes comparison and allowance behavior for None vs. zero.
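
Full **fmtparams support means any formatting keyword that Python's csv.reader accepts, such as delimiter or quotechar, can be forwarded to the CSV loader. A minimal stdlib sketch; `read_rows` is a hypothetical helper, not datatest's API.

```python
import csv
import io


def read_rows(fileobj, **fmtparams):
    """Hypothetical loader that forwards CSV format parameters to csv.reader."""
    return list(csv.reader(fileobj, **fmtparams))


# Semicolon-delimited data with single-quoted fields.
sample = io.StringIO("a;b\n1;'x;y'\n")
print(read_rows(sample, delimiter=';', quotechar="'"))
# [['a', 'b'], ['1', 'x;y']]
```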
  • 0.6.0.dev1(May 29, 2016)

tidevice can be used to communicate with iPhone device

Alibaba 1.8k Jan 08, 2023
show python coverage information directly in emacs

wouter bolsterlee 30 Oct 26, 2022
A browser automation framework and ecosystem.

Selenium Selenium is an umbrella project encapsulating a variety of tools and libraries enabling web browser automation. Selenium specifically provide

Selenium 25.5k Jan 01, 2023
How to Create a YouTube Bot that Increases Views using Python Programming Language

YouTube-Bot-in-Python-Selenium How to Create a YouTube Bot that Increases Views using Python Programming Language. The app is for educational purpose

Edna 14 Jan 03, 2023
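Automatically crawl and test all the endpoints displayed by swagger-ui.html

swagger-hack During testing you occasionally run into exposed Swagger docs. A common exposure looks like the figure: some exposures list so many endpoints that trying each one by hand is impractical, so I wrote a Python script that automatically crawls every endpoint, fills in the parameters, and sends the requests. It works by first fetching http://url/swagger-resources to find out which specs exist and their corresponding documentation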

jayus 534 Dec 29, 2022
Python 3 wrapper of Microsoft UIAutomation. Support UIAutomation for MFC, WindowsForm, WPF, Modern UI(Metro UI), Qt, IE, Firefox, Chrome ...

yin kaisheng 1.6k Dec 29, 2022
Codeforces Test Parser for C/C++ & Python on Windows

Codeforces Test Parser for C/C++ & Python on Windows Installation Run pip instal

Minh Vu 2 Jan 05, 2022
:game_die: Pytest plugin to randomly order tests and control random.seed

pytest-randomly Pytest plugin to randomly order tests and control random.seed. Features All of these features are on by default but can be disabled wi

pytest-dev 471 Dec 30, 2022
Browser reload with uvicorn

uvicorn-browser This project is inspired by autoreload. Installation pip install uvicorn-browser Usage Run uvicorn-browser --help to see all options.

Marcelo Trylesinski 64 Dec 17, 2022
A pytest plugin that enables you to test your code that relies on a running Elasticsearch search engine

pytest-elasticsearch What is this? This is a pytest plugin that enables you to test your code that relies on a running Elasticsearch search engine. It

Clearcode 65 Nov 10, 2022
Instagram unfollowing bot. If this script is executed that specific accounts following will be reduced

Instagram-Unfollower-Bot Instagram unfollowing bot. If this script is executed that specific accounts following will be reduced.

Biswarup Bhattacharjee 1 Dec 24, 2021
Akulaku Create NewProduct Automation using Selenium Python

Akulaku-Create-NewProduct-Automation Akulaku Create NewProduct Automation using Selenium Python Usage: 1. Install Python 3.9 2. Open CMD on Bot Folde

Rahul Joshua Damanik 1 Nov 22, 2021
Python drivers for YeeNet firmware

yeenet-router-driver-python Python drivers for YeeNet firmware This repo is under heavy development. Many or all of these scripts are not likely to wo

Jason Paximadas 1 Dec 26, 2021
Load Testing ML Microservices for Robustness and Scalability

The demo is aimed at getting started with load testing a microservice before taking it to production. We use FastAPI microservice (to predict weather) and Locust to load test the service (locally or

Emmanuel Raj 13 Jul 05, 2022
Aioresponses is a helper for mock/fake web requests in python aiohttp package.

aioresponses Aioresponses is a helper to mock/fake web requests in python aiohttp package. For requests module there are a lot of packages that help u

402 Jan 06, 2023
A web scraping using Selenium Webdriver

Savee - Images Downloader Project using Selenium Webdriver to download images from someone's profile on https:www.savee.it website. Usage The project

Caio Eduardo Lobo 1 Dec 17, 2021
Test python asyncio-based code with ease.

aiounittest Info The aiounittest is a helper library to ease of your pain (and boilerplate), when writing a test of the asynchronous code (asyncio). Y

Krzysztof Warunek 55 Oct 30, 2022
Obsei is a low code AI powered automation tool.

Obsei is a low code AI powered automation tool. It can be used in various business flows like social listening, AI based alerting, brand image analysis, comparative study and more .

Obsei 782 Dec 31, 2022
Pynguin, The PYthoN General UnIt Test geNerator is a test-generation tool for Python

Pynguin, the PYthoN General UnIt test geNerator, is a tool that allows developers to generate unit tests automatically.

Chair of Software Engineering II, Uni Passau 997 Jan 06, 2023
A folder automation made using Watch-dog, it only works in linux for now but I assume, it will be adaptable to mac and PC as well

folder-automation A folder automation made using Watch-dog, it only works in linux for now but I assume, it will be adaptable to mac and PC as well Th

Parag Jyoti Paul 31 May 28, 2021