Handle, manipulate, and convert data with units in Python

Overview

unyt


A package for handling numpy arrays with units.

Writing code that deals with data that has units can often be confusing. A function might return an array, but with plain NumPy arrays there is no way to easily tell what the units of the data are without somehow knowing them a priori.

The unyt package (pronounced like "unit") provides a subclass of NumPy's ndarray class that knows about units. For example, one could do:

>>> import unyt as u
>>> distance_traveled = [3.4, 5.8, 7.2] * u.mile
>>> print(distance_traveled.to('km'))
[ 5.4717696  9.3341952 11.5872768] km

And a whole lot more! See the documentation for installation instructions, more examples, and full API reference.

This package only depends on numpy and sympy. Notably, it does not depend on yt, and it is written in pure Python.

Code of Conduct

The unyt package is part of The yt Project. Participating in unyt development therefore happens under the auspices of the yt community code of conduct. If for any reason you feel that the code of conduct has been violated, please send an e-mail to [email protected] with details describing the incident. All emails sent to this address will be treated with the strictest confidence by an individual who does not normally participate in yt development.

License

The unyt package is licensed under the BSD 3-clause license.

Citation

If you make use of unyt in work that leads to a publication, we would appreciate a mention in the text of the paper or in the acknowledgements, along with a citation to our paper in the Journal of Open Source Software. You can use the following BibTeX:

@article{Goldbaum2018,
  doi = {10.21105/joss.00809},
  url = {https://doi.org/10.21105/joss.00809},
  year  = {2018},
  month = {aug},
  publisher = {The Open Journal},
  volume = {3},
  number = {28},
  pages = {809},
  author = {Nathan J. Goldbaum and John A. ZuHone and Matthew J. Turk and Kacper Kowalik and Anna L. Rosen},
  title = {unyt: Handle, manipulate, and convert data with units in Python},
  journal = {Journal of Open Source Software}
}

Or the following citation format:

Goldbaum et al., (2018). unyt: Handle, manipulate, and convert data with units in Python. Journal of Open Source Software, 3(28), 809, https://doi.org/10.21105/joss.00809
Comments
  • daskified unyt arrays

    daskified unyt arrays

    This PR introduces the unyt_dask_array class, which implements a subclass of standard dask arrays with units attached. Still a work in progress, but it is generally usable now!

    Basic usage (also shown here in a notebook) begins by using the unyt_from_dask function to create a new unyt_dask_array instance from a dask array:

    from unyt.dask_array import unyt_from_dask
    from dask import array as dask_array
    x = unyt_from_dask(dask_array.random.random((10000,10000), chunks=(1000,1000)), 'm')
    x
    Out[2]:  unyt_dask_array<random_sample, shape=(10000, 10000), dtype=float64, chunksize=(1000, 1000), chunktype=numpy.ndarray, units=m>
    

    The array can be manipulated as any other dask array:

    result = (x * 2).mean()
    result
    Out[3]: unyt_dask_array<mean_agg-aggregate, shape=(), dtype=float64, chunksize=(), chunktype=numpy.ndarray, units=m>
    result.compute()
    Out[4]:  unyt_quantity(1.00009275, 'm')
    

    If the return is an array, we get a unyt_array instead:

    (x * 2 + x.to('cm')).mean(1).compute()
    Out[8]: unyt_array([1.50646938, 1.48487083, 1.49774744, ..., 1.49939197,
                1.49462512, 1.48263323], 'm')
    

    Unit conversions:

    x = unyt_from_dask(dask_array.random.random((10000,10000), chunks=(1000,1000)), 'lb')
    x.mean().compute()
    Out[9]:
        unyt_quantity(0.50002619, 'lb')
    x.in_mks().mean().compute()
    Out[10]: unyt_quantity(0.22680806, 'kg')
    x.to('mg').mean().compute()
    Out[11]: unyt_quantity(226808.06379903, 'mg')
    from unyt import g
    x.to(g).mean().compute()
    Out[12]: unyt_quantity(226.8080638, 'g')
    

    The implementation relies primarily on decorators and a hidden unyt_dask_array._unyt_array attribute to track unit conversions, and requires very minimal modifications to the existing unyt codebase. If a user is running a dask client, then all of the above calculations will be executed by that client (see the notebook), but the implementation here only needs the dask array subset (i.e., pip install dask[array]).
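    The decorator-plus-hidden-attribute pattern described above can be illustrated with a minimal, self-contained sketch. This is plain Python/NumPy, not the actual unyt_dask_array code; the names `_track_units` and `UnitArray` are invented for illustration, and a bare unit string stands in for the hidden unyt quantity:

    ```python
    import numpy as np

    def _track_units(method):
        """Wrap an array-returning method so the result keeps the caller's units.

        Illustrative only: the real unyt_dask_array tracks units (and unit
        conversions) through a hidden unyt quantity, not a bare string.
        """
        def wrapper(self, *args, **kwargs):
            result = method(self, *args, **kwargs)
            return UnitArray(result, self.units)
        return wrapper

    class UnitArray:
        """A toy array-with-units wrapper standing in for unyt_dask_array."""

        def __init__(self, data, units):
            self.data = np.asarray(data)
            self.units = units

        @_track_units
        def mean(self):
            # The wrapped method only deals with the bare array; the
            # decorator re-attaches the units on the way out.
            return self.data.mean()

    x = UnitArray([1.0, 2.0, 3.0], "m")
    m = x.mean()
    print(m.data, m.units)  # 2.0 m
    ```

    The appeal of this design is that the decorated methods never need to know about units at all, which is what keeps the changes to the rest of the codebase minimal.
    
    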

    Some remaining known issues:

    • [x] reductions return standard dask arrays when using external functions (see note below)
    • [x] dask is added to _on_demand_imports but haven't added it to the testing environment yet, so new tests will fail
    • [x] haven't yet done flake8/etc checks
    • [x] no new docs yet (update: added to the usage page)
    • [x] new tests could use a bit more tweaking
    • [x] squash commits? I have a lot... but would be easy to squash. let me know. (update: chose not to squash)

    Note on the issue with dask reductions:

    If you do:

    from unyt.dask_array import unyt_from_dask
    from dask import array as dask_array
    
    x = unyt_from_dask(dask_array.random.random((10000,10000), chunks=(1000,1000)), 'm')
    x.min().compute()
    

    You get a unyt_quantity as expected: unyt_quantity(0.50002407, 'm')

    But if you use the daskified equivalent of np.min(ndarray):

    dask_array.min(x).compute()
    

    You get a plain float: 0.50002407. This isn't much of an issue for simple functions like min, but many more complex functions are not implemented as attributes. Not yet sure what the best approach is here...

    Update (8/24) to the dask reductions: I've played around with many approaches focused on manually wrapping all of the dask reductions, but have decided that the added complexity is not worth it. Instead, I added a user-facing function, unyt.dask_array.reduce_with_units, that accepts a dask function handle, the unyt array, and any args and kwargs for the dask function, and internally wraps the dask function handle to track units.
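    The idea behind that wrapper, stripped of the dask machinery, is simple: apply the external reduction to the bare array, then reattach the units to the result. A hedged sketch in plain Python/NumPy (the function name mirrors, but is not, the actual unyt.dask_array.reduce_with_units; np.min stands in for a dask function handle, and a (value, units) tuple stands in for a unyt_quantity):

    ```python
    import numpy as np

    def reduce_with_units_sketch(func, values, units, *args, **kwargs):
        """Apply an external reduction to the bare array, then reattach units.

        Sketch only: the real reduce_with_units wraps a dask function handle
        and a unyt_dask_array, and returns unit-aware dask objects.
        """
        result = func(np.asarray(values), *args, **kwargs)
        return result, units  # stand-in for a unyt_quantity

    value, unit = reduce_with_units_sketch(np.min, [3.0, 1.0, 2.0], "m")
    print(value, unit)  # 1.0 m
    ```
    
    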

    standalone package?

    One final note: while I've been developing this to be incorporated into unyt, the fact that there are very minimal changes to the rest of the codebase means that this could be a standalone package. Happy to go that route if it seems more appropriate!

    enhancement 
    opened by chrishavlin 24
  • Unyt2.5.0 breaks matplotlib's errorbar function

    Unyt2.5.0 breaks matplotlib's errorbar function

    • unyt version: 2.5.0
    • Python version: 3.7.4
    • Operating System: MacOS Catalina, RHEL 7(?)

    Description

    unyt v2.5.0 is unable to create matplotlib plots that have a unyt_quantity as an axis limit when using the errorbar function, when the scatter is provided in the required 2xN format as a list of two unyt arrays.

    What I Did

    Example script (matplotlib3.1.2 and unyt2.5.0):

    import matplotlib.pyplot as plt
    import unyt
    
    x = unyt.unyt_array([8, 9, 10], "cm")
    y = unyt.unyt_array([8, 9, 10], "kg")
    # It is convenient often to supply the 2XN required array
    # in this format
    y_scatter = [
        unyt.unyt_array([0.1, 0.2, 0.3], "kg"),
        unyt.unyt_array([0.1, 0.2, 0.3], "kg"),
    ]
    
    x_lims = (unyt.unyt_quantity(5, "cm"), unyt.unyt_quantity(12, "cm"))
    y_lims = (unyt.unyt_quantity(5, "kg"), unyt.unyt_quantity(12, "kg"))
    
    plt.errorbar(x, y, yerr=y_scatter)
    plt.xlim(*x_lims)
    plt.ylim(*y_lims)
    
    plt.show()
    

    Output:

    python3 test.py
    Traceback (most recent call last):
      File "/private/tmp/env/lib/python3.7/site-packages/matplotlib/axis.py", line 1550, in convert_units
        ret = self.converter.convert(x, self.units, self)
      File "/private/tmp/env/lib/python3.7/site-packages/unyt/mpl_interface.py", line 105, in convert
        return value.to(*unit)
    AttributeError: 'list' object has no attribute 'to'
    
    The above exception was the direct cause of the following exception:
    
    Traceback (most recent call last):
      File "test.py", line 14, in <module>
        plt.errorbar(x, y, yerr=y_scatter)
      File "/private/tmp/env/lib/python3.7/site-packages/matplotlib/pyplot.py", line 2554, in errorbar
        **({"data": data} if data is not None else {}), **kwargs)
      File "/private/tmp/env/lib/python3.7/site-packages/matplotlib/__init__.py", line 1599, in inner
        return func(ax, *map(sanitize_sequence, args), **kwargs)
      File "/private/tmp/env/lib/python3.7/site-packages/matplotlib/axes/_axes.py", line 3430, in errorbar
        barcols.append(self.vlines(xo, lo, uo, **eb_lines_style))
      File "/private/tmp/env/lib/python3.7/site-packages/matplotlib/__init__.py", line 1599, in inner
        return func(ax, *map(sanitize_sequence, args), **kwargs)
      File "/private/tmp/env/lib/python3.7/site-packages/matplotlib/axes/_axes.py", line 1176, in vlines
        x = self.convert_xunits(x)
      File "/private/tmp/env/lib/python3.7/site-packages/matplotlib/artist.py", line 180, in convert_xunits
        return ax.xaxis.convert_units(x)
      File "/private/tmp/env/lib/python3.7/site-packages/matplotlib/axis.py", line 1553, in convert_units
        f'units: {x!r}') from e
    matplotlib.units.ConversionError: Failed to convert value(s) to axis units: [unyt_quantity(8, 'cm'), unyt_quantity(9, 'cm'), unyt_quantity(10, 'cm')]
    

    Even wrapping the list in a call to unyt.unyt_array doesn't save the day.

    opened by JBorrow 22
  • ENH: Provisional support for NEP 18 (__array_function__ protocol)

    ENH: Provisional support for NEP 18 (__array_function__ protocol)

    My initial motivation here was to add some unit representation to error messages when comparing two unyt_array instances via functions from numpy.testing, like so:

    import numpy as np
    import unyt as un
    
    a = [1, 2, 3] * un.cm
    b = [1, 2, 3] * un.km
    np.testing.assert_array_equal(a, b)
    

    which yields, on master:

    ...
    AssertionError:
    Arrays are not equal
    
    Mismatched elements: 3 / 3 (100%)
    Max absolute difference: 299997.
    Max relative difference: 0.99999
     x: unyt_array([1, 2, 3])
     y: unyt_array([1, 2, 3])
    

    and on this branch

    previous version
    ...
    AssertionError:
    Arrays are not equal
    
    Mismatched elements: 3 / 3 (100%)
    Max absolute difference: 299997. cm
    Max relative difference: 0.99999 dimensionless
     x: unyt_array([1, 2, 3] cm)
     y: unyt_array([1, 2, 3] km)
    

    edit:

    AssertionError:
    Arrays are not equal
    
    Mismatched elements: 3 / 3 (100%)
    Max absolute difference: 299997., units='cm'
    Max relative difference: 0.99999, units='dimensionless'
     x: unyt_array([1, 2, 3], units='cm')
     y: unyt_array([1, 2, 3], units='km')
    

    Incidentally, it turns out that fixing this necessitated a kick-off implementation of NEP 18, so this work laid the foundation to solve:

    • [x] #69
    • [x] #130
    • [ ] #50 (most likely out of scope)

    More broadly, implementing NEP 18 for unyt was the topic of #139. Granted, I need to take more time to check that I'm not going against the original intentions here. My current approach is that, since covering the whole numpy public API in one go seems like a gigantic task, I'm implementing unyt_array.__array_function__ with a fallthrough condition: if a special case isn't implemented yet, just fall back to the raw numpy implementation (which is currently the behaviour for all functions subject to NEP 18). This way we can add support for more and more functions progressively. I'm going to set the bar low(ish) for now, and try to fix the already-reported cases, as listed above, as a first step.
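    The fallthrough strategy can be sketched with a minimal `__array_function__` implementation: a registry of specialised handlers, and a default branch that strips the subclass and defers to plain numpy. This is an illustration of the NEP 18 mechanism, not unyt's actual code; `UnitArray`, `_HANDLED`, and `implements` are invented names:

    ```python
    import numpy as np

    _HANDLED = {}

    def implements(numpy_func):
        """Register a specialised implementation for one numpy function."""
        def decorator(func):
            _HANDLED[numpy_func] = func
            return func
        return decorator

    class UnitArray(np.ndarray):
        """Toy ndarray subclass carrying a units string."""

        def __new__(cls, data, units="dimensionless"):
            obj = np.asarray(data).view(cls)
            obj.units = units
            return obj

        def __array_finalize__(self, obj):
            self.units = getattr(obj, "units", "dimensionless")

        def __array_function__(self, func, types, args, kwargs):
            if func in _HANDLED:
                return _HANDLED[func](*args, **kwargs)
            # Fallthrough: strip the subclass and defer to raw numpy.
            stripped = tuple(
                np.asarray(a) if isinstance(a, UnitArray) else a for a in args
            )
            return func(*stripped, **kwargs)

    @implements(np.concatenate)
    def _concatenate(arrays, *args, **kwargs):
        # Specialised case: propagate the first array's units.
        units = arrays[0].units
        result = np.concatenate([np.asarray(arr) for arr in arrays], *args, **kwargs)
        return UnitArray(result, units)

    a = UnitArray([1.0, 2.0], "cm")
    b = UnitArray([3.0], "cm")
    print(np.concatenate([a, b]).units)  # 'cm' (handled case)
    print(np.min(a))                     # plain numpy scalar (fallthrough case)
    ```

    The trade-off discussed below is exactly what to do in that fallthrough branch: stay silent, warn, or error out.
    
    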

    An important question to address is: what should be done in the general case where we don't have a custom implementation for an array function?

    1. Transparently default to the raw numpy implementation without a warning (this is de facto what is done as of unyt 2.8.0, and will remain the case until NEP 18 is at least partially implemented).
    2. Same as 1, but emit a warning (possibly with a whitelist of functions that are known to be perfectly fine without a specialised implementation, for which no warning would be emitted), along the lines of:
    UserWarning: calling `numpy.FUNC` on a unyt_array. Results may hold incorrect units. A future version of unyt will remove this warning, and possibly change the behaviour of this function to be dimensionally correct.
    
    3. Error out.

    Option 1 is the current implementation of this PR because I think it is the least disruptive and noisy one. My personal opinion is that it's probably okay to have incomplete support for NEP 18 for a couple of releases, as long as it is clearly stated in the release notes.

    bug 
    opened by neutrinoceros 19
  • ENH: optimize import time

    ENH: optimize import time

    This is an answer to #27. I shave off about 33% of unyt's import time by making copies of Unit objects shallow by default; the one difference from a deep copy is that the attached UnitRegistry is shallow-copied.

    Using the benchmark I described in #27, the import time goes from 1.6 s to 1.0 s on my machine. I hope this doesn't have undesirable side effects. Another aspect worth considering is that the sheer number of copies performed at import time is probably a sign that something else isn't optimized.
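    The mechanism being described, customising how a unit object deep-copies itself so that only the attached registry is (shallowly) copied, can be sketched with toy classes via the `__deepcopy__` protocol. These are stand-ins, not unyt's actual Unit and UnitRegistry:

    ```python
    import copy

    class UnitRegistry:
        """Toy registry holding a potentially large lookup table."""
        def __init__(self, lut=None):
            self.lut = lut if lut is not None else {}

    class Unit:
        """Toy unit object attached to a registry."""
        def __init__(self, expr, registry):
            self.expr = expr
            self.registry = registry

        def __deepcopy__(self, memo):
            # Shallow-copy everything except the registry, which itself gets
            # only a shallow copy: the expensive lookup table is shared
            # instead of being walked on every copy.
            return Unit(self.expr, copy.copy(self.registry))

    reg = UnitRegistry({"m": 1.0})
    u = Unit("m", reg)
    u2 = copy.deepcopy(u)
    print(u2.registry is reg)          # False: the registry object is new
    print(u2.registry.lut is reg.lut)  # True: its table is shared
    ```

    Skipping the recursive walk of the registry's table on every copy is where the import-time saving would come from under this scheme.
    
    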

    opened by neutrinoceros 16
  • Equality test of equivalent quantities, but with different prefixes, returns False.

    Equality test of equivalent quantities, but with different prefixes, returns False.

    • unyt version: 2.4.1
    • Python version: 3.8.1
    • Operating System: Win10

    Description

    The quantities 1 s and 1000 ms are equal, but unyt says they're not equal.

    What I Did

    >>> from unyt import s, ms
    >>> 1*s == 1000*ms
    array(False)
    

    I also find the rather surprising result:

    >>> 1*s >= 999*ms
    array(True)
    >>> 1*s >= 1000*ms
    array(False)
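    A dimensionally correct comparison would convert both operands to a common unit before comparing raw values. The expected behaviour can be sketched with a toy quantity class (illustrative only; the conversion table and class are invented, not unyt's implementation):

    ```python
    class Quantity:
        """Toy quantity comparing values in a common base unit (seconds)."""

        _TO_BASE = {"s": 1.0, "ms": 1e-3}  # toy conversion table

        def __init__(self, value, unit):
            self.value = value
            self.unit = unit

        def _base(self):
            return self.value * self._TO_BASE[self.unit]

        def __eq__(self, other):
            # Compare in a common base unit instead of comparing raw values.
            return self._base() == other._base()

    print(Quantity(1, "s") == Quantity(1000, "ms"))  # True
    print(Quantity(1, "s") == Quantity(999, "ms"))   # False
    ```
    
    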
    opened by l-johnston 15
  • bugfix: fix commutativity in unyt_array operators

    bugfix: fix commutativity in unyt_array operators

    fix https://github.com/yt-project/yt/issues/874

    Here's an updated version of the script provided by @jzuhone at the time, with updated reference outputs.

    
    >>> import yt
    >>> ds = yt.testing.fake_amr_ds()
    >>> a = yt.YTArray([1,2,3], "cm")
    >>> b = ds.arr([1,2,3], "code_length")
    
    >>> a*b
    old > SymbolNotFoundError: The symbol 'code_length' does not exist in this registry.
    new > unyt_array([1, 4, 9], 'cm*code_length')
    
    >>> b*a
    old > unyt_array([1, 4, 9], 'cm*code_length')
    new > unyt_array([1, 4, 9], 'cm*code_length')
    
    >>> (a*b).in_units("code_length**2")
    old > SymbolNotFoundError: The symbol 'code_length' does not exist in this registry.
    new > unyt_array([1., 4., 9.], 'code_length**2')
    
    >>> (b*a).in_units("code_length**2")
    old > unyt_array([1., 4., 9.], 'code_length**2')
    new > unyt_array([1., 4., 9.], 'code_length**2')
    

    For reference, this issue was referenced in https://github.com/yt-project/yt/issues/2797, hence the fix.

    bug 
    opened by neutrinoceros 14
  • MNT: add explicit support for Python 3.10

    MNT: add explicit support for Python 3.10

    Follow-up to #194. This will likely fail CI at first; it may be a little early to expect that support is already provided by unyt's dependencies, so I'll open this as a draft for now and see what happens.

    enhancement 
    opened by neutrinoceros 13
  • ci: Migrate CI to GitHub Actions workflows

    ci: Migrate CI to GitHub Actions workflows

    • Closes PR #187
    • Requires PR #189

    This PR migrates CI from Travis CI and Appveyor to use GitHub Actions workflows. The GHA CI will run across Ubuntu, MacOS, and Windows environments across CPython runtimes spanning 3.6 to 3.9. To reduce the number of runs (especially on slower runners like MacOS) the test matrix only runs on MacOS and Windows for the edge CPython versions: Python 3.6 and Python 3.9. The CI runs on a variety of event triggers:

    • Pushes to the master branch (PR merges trigger "push" events)
    • Pushes to pull requests
    • As a nightly CRON job (useful for being alerted to dependencies breaking APIs)
    • On demand manual triggers

    Travis CI and Appveyor are dropped in this PR and coverage reporting is switched over to use the Codecov GHA (this will require some follow up from the maintainers as you'll want to get an optional CODECOV_TOKEN to greatly speed up reporting).

    opened by matthewfeickert 13
  • fix: Apply Black and update usage docs code

    fix: Apply Black and update usage docs code

    This PR simply gets the CI passing (in local runs of tox and in GitHub Actions) so that PR #187 can proceed smoothly. It applies Black to the code base to take care of the spacing differences that Black v21.4b0+ now enforces, then in the docs adds a missing import of unyt_array and adds a write option to an h5py.File call (perhaps a somewhat recent change in h5py?).

    I don't think that yt squashes and merges commits like I usually do, but in case that happens:

    Suggested squash and merge commit message

    * Apply Black to codebase to revise docstring whitespace
       - Black v21.4b0 release notes: Black now processes one-line docstrings by stripping leading and trailing spaces, and adding a padding space when needed to break up """"
    * Add missing import of unyt.unyt_array to usage docs
    * Add missing write option to h5py.File call in usage docs
    
    opened by matthewfeickert 12
  • TST: migrate from tox-pyenv to tox-gh-actions

    TST: migrate from tox-pyenv to tox-gh-actions

    Since tox-pyenv looks unmaintained (no response from the maintainer in 3 weeks now), let's experiment with a candidate replacement. Because this makes tox 4 usable in CI for the first time, this may require some tweaks. I'd also need to make changes in the dev guide if this works.

    opened by neutrinoceros 10
  • FEAT: implement unyt.unyt_quantity.from_string

    FEAT: implement unyt.unyt_quantity.from_string

    This adds a from_string method to the unyt_quantity class. I originally wrote it in a separate project where I need to parse quantities from configuration (text) files, then realized it would be useful to have it as part of the library.

    I consider this a draft for now. The implementation works as intended in every case I could think of (valid as well as invalid ones), but I would like to add docstrings (with doctests) to the actual function.
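    Parsing a quantity from text boils down to splitting a leading number from a trailing unit expression. A hedged sketch of the idea (the regex and the `parse_quantity` name are invented for illustration; this is not the actual unyt_quantity.from_string implementation):

    ```python
    import re

    # Leading float (with optional sign and exponent), then an optional unit.
    _QUANTITY_RE = re.compile(
        r"^\s*([+-]?\d+\.?\d*(?:[eE][+-]?\d+)?)\s*(\S*)\s*$"
    )

    def parse_quantity(text):
        """Split '1.5 km' into (1.5, 'km'); raise ValueError otherwise."""
        match = _QUANTITY_RE.match(text)
        if match is None:
            raise ValueError(f"Could not parse {text!r} as a quantity")
        value, unit = match.groups()
        return float(value), unit or "dimensionless"

    print(parse_quantity("1.5 km"))  # (1.5, 'km')
    print(parse_quantity("1e-3"))    # (0.001, 'dimensionless')
    ```

    A real implementation would additionally validate the unit expression against a registry, which is where most of the tricky invalid cases live.
    
    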

    opened by neutrinoceros 10
  • MNT: out of date copyright headers

    MNT: out of date copyright headers

    A bunch of files have a copyright header. Most of them use # Copyright (c) 2018, yt Development Team., but some are on # Copyright (c) 2019 ... and even one on # Copyright (c) 2013 .... The LICENSE file itself uses a different header, Copyright (c) 2018, Nathan Goldbaum. It would be easy to uniformise those and keep them up to date with a pre-commit hook such as insert-license from https://github.com/Lucas-C/pre-commit-hooks (I've been using it for a couple of years on another project and never had any issues with it). I'm happy to do it; I would just like to know if that's desired. If not, should we simply take these headers out?

    opened by neutrinoceros 1
  • Refining exceptions

    Refining exceptions

    To keep track of this important comment from @ngoldbaum:

    I'm not really a fan of UnytError but I also don't think that should block getting the __array_function__ stuff working. I wish this was raising UnitOperationError, or we somehow made UnitOperationError more general since I would guess that's the most common type of exception people would be catching for this sort of thing and it irks me a bit that they'd need to catch more than one kind of exception for different corner cases.

    We probably need to more carefully look at how exceptions work in unyt in general since right now the situation is kind of a hodgepodge, although that might need a deprecation cycle since we'd be doing an API break.

    For now I'm just going to merge this, but I'd like to have a discussion about how to handle exceptions, whether we need to do some sort of deprecation cycle, and how we can make it simpler to deal with exceptions raised by unyt before we do the final release.

    Originally posted by @ngoldbaum in https://github.com/yt-project/unyt/issues/338#issuecomment-1369188611

    opened by neutrinoceros 0
  • Additional metallicity mass fraction conversions

    Additional metallicity mass fraction conversions

    This PR introduces several other common values for the solar metallicity found in the literature as new metallicity units, e.g. "Zsun_angr", etc.

    The default mass fraction in "Zsun" is still the one from Cloudy and has not been touched.

    Explanatory documentation has been added.

    opened by jzuhone 4
  • Type checking unyt ?

    Type checking unyt ?

    This is mostly a question for @ngoldbaum and @jzuhone: how would you guys feel about progressively adding type hints and a type-checking stage to CI? To be clear, I'm thinking about doing it at least partially myself, because numpy is almost 100% "typed" now and IMO it would make sense to follow their lead. This is a long-term goal, as this could be quite an undertaking (though maybe not!), so I wanted to get your sentiment on it first.

    opened by neutrinoceros 4
Releases: v2.9.3

Owner: The yt project (a toolkit for analysis and visualization of volumetric data)