NumPy and Pandas interface to Big Data

Overview


Blaze translates a subset of modified NumPy and Pandas-like syntax to databases and other computing systems. It gives Python users a familiar interface for querying data that lives in other data storage systems.

Example

We point Blaze at a simple dataset in a foreign database (PostgreSQL). Instantly we see results as we would in a Pandas DataFrame.

>>> import blaze as bz
>>> iris = bz.Data('postgresql://localhost::iris')
>>> iris
    sepal_length  sepal_width  petal_length  petal_width      species
0            5.1          3.5           1.4          0.2  Iris-setosa
1            4.9          3.0           1.4          0.2  Iris-setosa
2            4.7          3.2           1.3          0.2  Iris-setosa
3            4.6          3.1           1.5          0.2  Iris-setosa

These results appear immediately. Blaze does not pull data out of Postgres; instead it translates your Python commands into SQL (or another backend's query language).

>>> iris.species.distinct()
           species
0      Iris-setosa
1  Iris-versicolor
2   Iris-virginica

>>> bz.by(iris.species, smallest=iris.petal_length.min(),
...                      largest=iris.petal_length.max())
           species  largest  smallest
0      Iris-setosa      1.9       1.0
1  Iris-versicolor      5.1       3.0
2   Iris-virginica      6.9       4.5
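As a rough illustration of what the `by` aggregation above computes (a hand-rolled sketch, not how Blaze executes it; the rows here are made-up sample values, not the real iris data):

```python
from collections import defaultdict

# Made-up sample rows: (species, petal_length)
rows = [
    ('Iris-setosa', 1.4), ('Iris-setosa', 1.0),
    ('Iris-versicolor', 4.7), ('Iris-versicolor', 3.0),
    ('Iris-virginica', 6.9), ('Iris-virginica', 4.5),
]

# Group rows by the grouper (species) ...
groups = defaultdict(list)
for species, petal_length in rows:
    groups[species].append(petal_length)

# ... then apply the named reductions to each group.
result = {s: {'smallest': min(v), 'largest': max(v)}
          for s, v in groups.items()}
```

A SQL backend would express the same computation as `GROUP BY` with `MIN` and `MAX` aggregates.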

This same example would have worked with a wide range of databases, on-disk text or binary files, or remote data.

What Blaze is not

Blaze does not perform computation. It relies on other systems like SQL, Spark, or Pandas to do the actual number crunching. It is not a replacement for any of these systems.

Blaze does not implement the entire NumPy/Pandas API, nor does it interact with libraries intended to work with NumPy/Pandas. This is the cost of using more and larger data systems.

Blaze is a good way to inspect data living in a large database, perform a small but powerful set of operations to query that data, and then transform your results into a format suitable for your favorite Python tools.

In the Abstract

Blaze separates the computations that we want to perform:

>>> from blaze import Symbol, compute

>>> accounts = Symbol('accounts', 'var * {id: int, name: string, amount: int}')

>>> deadbeats = accounts[accounts.amount < 0].name

from the representation of data:

>>> L = [[1, 'Alice',   100],
...      [2, 'Bob',    -200],
...      [3, 'Charlie', 300],
...      [4, 'Denis',   400],
...      [5, 'Edith',  -500]]

Blaze enables users to solve data-oriented problems:

>>> list(compute(deadbeats, L))
['Bob', 'Edith']

But the separation of expression from data allows us to switch between different backends.

Here we solve the same problem using Pandas instead of pure Python.

>>> from pandas import DataFrame
>>> df = DataFrame(L, columns=['id', 'name', 'amount'])

>>> compute(deadbeats, df)
1      Bob
4    Edith
Name: name, dtype: object

Blaze doesn't compute these results itself; it intelligently drives other projects to compute them. These projects range from simple pure-Python iterators to powerful distributed Spark clusters. Blaze is built to be extended to new systems as they evolve.
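The driving mechanism can be pictured as dispatch on the data's type: the same query runs against whichever backend matches the data it is handed. Below is a toy illustration using `functools.singledispatch`; Blaze's real compute pipeline is built on multipledispatch and is considerably more general.

```python
from functools import singledispatch

@singledispatch
def deadbeats(data):
    """Names of accounts with a negative amount, per backend."""
    raise NotImplementedError(type(data))

@deadbeats.register(list)
def _(data):
    # "pure Python" backend: iterate over rows
    return [name for _id, name, amount in data if amount < 0]

@deadbeats.register(dict)
def _(data):
    # toy columnar backend: operate on whole columns at once
    return [n for n, a in zip(data['name'], data['amount']) if a < 0]
```

Registering a handler for another type adds a new "backend" without touching the query logic.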

Getting Started

Blaze is available on conda and on PyPI:

conda install blaze
pip install blaze

Development builds are also available:

conda install blaze -c blaze
pip install git+https://github.com/blaze/blaze.git --upgrade

You may want to view the docs, the tutorial, some blog posts, or the mailing list archives.

Development setup

The quickest way to install all Blaze dependencies with conda is as follows

conda install blaze spark -c blaze -c anaconda-cluster -y
conda remove odo blaze blaze-core datashape -y

After running these commands, clone odo, blaze, and datashape from GitHub directly; these three projects release together. Run python setup.py develop in each to create development installations.

License

Released under BSD license. See LICENSE.txt for details.

Blaze development is sponsored by Continuum Analytics.

Comments
  • Chunk CSVs using pandas.read_csv

    closes #587

    • [x] performance diffs
    • [x] newline robustness (e.g., appending to the end of a file that has an existing newline)
    • [x] bypass DataDescriptor.__iter__
    enhancement 
    opened by cpcloud 69
  • ENH: Adds datetime arithmetic with timedeltas

    I wanted to be able to add timedeltas to interactive symbols so I made it so.

    In [5]: ds.quandl.vix.knowledge_date
    Out[5]: 
       knowledge_date
    0      1990-01-02
    1      1990-01-03
    2      1990-01-04
    3      1990-01-05
    4      1990-01-08
    5      1990-01-09
    6      1990-01-10
    7      1990-01-11
    8      1990-01-12
    9      1990-01-15
    ...
    
    In [6]: ds.quandl.vix.knowledge_date + datetime.timedelta(days=1)
    Out[6]: 
       knowledge_date
    0      1990-01-03
    1      1990-01-04
    2      1990-01-05
    3      1990-01-06
    4      1990-01-09
    5      1990-01-10
    6      1990-01-11
    7      1990-01-12
    8      1990-01-13
    9      1990-01-16
    ...
    

    datashape already supported this; I just needed to add a _(r)?add and _(r)?sub to isdatelike.

    enhancement 
    opened by llllllllll 38
  • Impala Backend

    Impala is a SQL-on-HDFS solution with Python support.

    Apparently it's fast http://blog.cloudera.com/blog/2014/05/new-sql-choices-in-the-apache-hadoop-ecosystem-why-impala-continues-to-lead/ .

    It also supports Numba via impyla http://blog.cloudera.com/blog/2014/04/a-new-python-client-for-impala/.

    Maybe we should support it as a backend. One approach would be to make a SQLAlchemy dialect for it and then depend on the SQLAlchemy compute layer.

    opened by mrocklin 35
  • str arithmetic

    Closes #1045

    I wanted to try to learn how some of the symbolic stuff worked so I figured this was an easy one to tackle.

    In [1]: import blaze as bz
    
    In [2]: ds = bz.Data([('%s', 'a'), ('%s', 'b')], fields=['a', 'b'])
    
    In [3]: ds.a + ds.b
    Out[3]: 
    
    0  %sa
    1  %sb
    
    In [4]: ds.a * 2
    Out[4]: 
          a
    0  %s%s
    1  %s%s
    
    In [5]: ds.a % ds.b
    Out[5]: 
    
    0  a
    1  b
    
    

    Let me know if this needs more tests or if the tests are in the wrong place.

    enhancement expression core new expression strings 
    opened by llllllllll 32
  • Don't run spark tests if pyspark isn't available

    closes #382

    • [x] py.test enforces its own ideas about testing structure by throwing an error during the collection phase of test running. Is this a deal-breaker?
    • [x] adapt unittest.skip decorators to equivalent py.test
    opened by cpcloud 31
  • ENH: use object identity for isidentical

    This changes isidentical into an object identity check. When profiling blaze, I found that this function is called a lot. This actually was in the top 5 tottime calls even with a very small percall count because of how often it was called.

    This change makes it so that all expression objects are weakly cached by their arguments. This means that any two objects that are identical will have the same object id. This will save memory by reducing copies of the same object and improve performance because is checks are extremely fast.
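The caching idea can be sketched as interning expressions in a weak-value dictionary keyed by their construction arguments, so identical arguments always yield the very same object. This is a simplified illustration of the approach, not Blaze's implementation:

```python
import weakref

class Expr:
    # Weakly cache instances by (class, args): equal construction
    # arguments return the already-existing object, and entries are
    # dropped automatically once no strong references remain.
    _cache = weakref.WeakValueDictionary()

    def __new__(cls, *args):
        key = (cls, args)
        try:
            return cls._cache[key]
        except KeyError:
            self = super().__new__(cls)
            self.args = args
            cls._cache[key] = self
            return self

def isidentical(a, b):
    # Safe because identical arguments produce the same object.
    return a is b
```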

    enhancement expression core 
    opened by llllllllll 28
  • String matching operation

    Many backends support string matching.

    • Python uses the regex module re
    • Pandas also uses re?
    • SQL uses LIKE %text%
    • Mongo I'm sure has something

    This probably warrants a Blaze operation
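One way to picture a backend-neutral matching operation: translate a globstring into a regular expression, as a pure-Python backend could, while a SQL backend would translate the same pattern into LIKE. The `like` helper below is a made-up sketch, not Blaze's API:

```python
import fnmatch
import re

def like(rows, field, pattern):
    # Translate a globstring (e.g. 'Al*') into a regex and filter rows;
    # a SQL backend would instead emit LIKE 'Al%'.
    rx = re.compile(fnmatch.translate(pattern))
    return [row for row in rows if rx.match(row[field])]
```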

    enhancement 
    opened by mrocklin 28
  • MAINT: Fix all tests

    Hello,

    So I really like this project but it seems to have been quite quiet lately.

    I spent some time today un-pinning all the packages and getting the tests to pass again. One caveat is that odo has a problem; for this, I've just used odo from a repository I built. Obviously, this is not ideal.

    Let me know if this is helpful. This is an amazing set of tools.

    opened by thequackdaddy 26
  • Like

    Text matching via globstrings

    Example

    In [1]: from blaze import *
    
    In [2]: L = [[1, 'Alice',   100],
         [2, 'Bob',    -200],
         [3, 'Charlie', 300],
         [4, 'Dennis',  400],
         [5, 'Edith',  -500]]
    
    In [3]: t = Table(L, columns=['id', 'name', 'amount'])
    
    In [4]: t.like(name='Al*')
    Out[4]: 
       id   name  amount
    0   1  Alice     100
    
    In [5]: t.like(name='*e*')
    Out[5]: 
       id     name  amount
    0   1    Alice     100
    1   3  Charlie     300
    2   4   Dennis     400
    
    
    opened by mrocklin 25
  • "ValueError: Not understood" for Mongo after upgrade

    I updated blaze today (0.6.6_212_g4b7c730 -> 0.8.0) and got a new error.

    import blaze as bz
    apps = bz.Data("mongodb://localhost:27017/API::apps", dshape = 'var * {name: string}')
    

    That used to run fine but now I get this traceback:

    Traceback (most recent call last):
    
      File "<ipython-input-1-81a88c5ec220>", line 1, in <module>
        runfile('/Users/Dalton/Desktop/errorTest.py', wdir='/Users/Dalton/Desktop')
    
      File "/Users/Dalton/anaconda/lib/python2.7/site-packages/spyderlib/widgets/externalshell/sitecustomize.py", line 682, in runfile
        execfile(filename, namespace)
    
      File "/Users/Dalton/anaconda/lib/python2.7/site-packages/spyderlib/widgets/externalshell/sitecustomize.py", line 78, in execfile
        builtins.execfile(filename, *where)
    
      File "/Users/Dalton/Desktop/errorTest.py", line 10, in <module>
        apps = bz.Data("mongodb://localhost:27017/API::apps", dshape = 'var * {name: string}')
    
      File "/Users/Dalton/anaconda/lib/python2.7/site-packages/blaze/interactive.py", line 100, in Data
        result = result[field]
    
      File "/Users/Dalton/anaconda/lib/python2.7/site-packages/blaze/expr/expressions.py", line 108, in __getitem__
        raise ValueError("Not understood %s[%s]" % (self, key))
    
    ValueError: Not understood _1[apps]
    

    Let me know if there's anything I can do to be helpful.

    bug 
    opened by TDaltonC 24
  • random error with postgresql data source

    I'm new to blaze, so pardon my ignorance here. I have no idea if I have to report this to odo/datashape or something else.

    I'm using blaze.Data on a postgresql table ("postgresql://"). When I try to get some data off the table with list(head(10)), in 50% of the cases (without any change on the db) I get this error:

      File "/usr/local/lib/python2.7/dist-packages/odo/into.py", line 122, in curried_into
        return into(o, other, **merge(kwargs2, kwargs1))
      File "/usr/local/lib/python2.7/dist-packages/multipledispatch/dispatcher.py", line 164, in __call__
        return func(*args, **kwargs)
      File "/usr/local/lib/python2.7/dist-packages/blaze/interactive.py", line 309, in into
        return into(a, result, **kwargs)
      File "/usr/local/lib/python2.7/dist-packages/multipledispatch/dispatcher.py", line 164, in __call__
        return func(*args, **kwargs)
      File "/usr/local/lib/python2.7/dist-packages/odo/into.py", line 25, in into_type
        return convert(a, b, dshape=dshape, **kwargs)
      File "/usr/local/lib/python2.7/dist-packages/odo/core.py", line 30, in __call__
        return _transform(self.graph, *args, **kwargs)
      File "/usr/local/lib/python2.7/dist-packages/odo/core.py", line 46, in _transform
        x = f(x, excluded_edges=excluded_edges, **kwargs)
      File "/usr/local/lib/python2.7/dist-packages/odo/convert.py", line 21, in dataframe_to_numpy
        dtype = dshape_to_numpy(dshape or discover(df))
      File "/usr/local/lib/python2.7/dist-packages/odo/numpy_dtype.py", line 55, in dshape_to_numpy
        for name, typ in zip(ds.names, ds.types)])
      File "/usr/local/lib/python2.7/dist-packages/odo/numpy_dtype.py", line 26, in unit_to_dtype
        return unit_to_dtype(str(ds).replace('int', 'float').replace('?', ''))
      File "/usr/local/lib/python2.7/dist-packages/odo/numpy_dtype.py", line 22, in unit_to_dtype
        ds = dshape(ds)
      File "/usr/local/lib/python2.7/dist-packages/datashape/util.py", line 49, in dshape
        ds = parser.parse(o, type_symbol_table.sym)
      File "/usr/local/lib/python2.7/dist-packages/datashape/parser.py", line 575, in parse
        dsp.raise_error('Invalid datashape')
      File "/usr/local/lib/python2.7/dist-packages/datashape/parser.py", line 57, in raise_error
        self.ds_str, errmsg)
    datashape.error.DataShapeSyntaxError: 
    
      File <nofile>, line 1
        float16
        ^
    
    DataShapeSyntaxError: Invalid datashape
    

    I actually wonder why this error is not reproducible. It looks like odo is randomly choosing a different conversion/coercion route? In fact, it's so random that I cannot even determine whether a specific column type causes the issue.

    bug 
    opened by wavexx 23
  • BUG:TypeError: Cannot interpret 'CategoricalDtype(categories=['no', 'yes'], ordered=False)' as a data type

    I just upgraded all my Python libraries, and now my previously working code has started to fail. I'm using blaze with pandas. Here is my method code:

    blaze.data(res)
    

    res contains the data below:

                 col1         age  ...                                           col31  year
    0            yes    55-64  ...                                             NaN  2011
    1             no    25-34  ...                                             NaN  2011
    2             no    55-64  ...                                             NaN  2011
    

    I'm using the dependencies below:

    - python=3.6.11=h4d41432_2_cpython
    - blaze=0.11.3=py36h4e06776_0
    - odo=0.5.1=py36h90ed295_0
    - pandas=1.0.5=py36h0573a6f_0
    - numpy=1.19.1=py36h3849536_2
    

    I'm getting the following error

      File "cytoolz/functoolz.pyx", line 667, in cytoolz.functoolz.pipe
      File "cytoolz/functoolz.pyx", line 642, in cytoolz.functoolz.c_pipe
      File "/home/ubuntu/miniconda/envs/my_env/lib/python3.6/site-packages/blaze/interactive.py", line 153, in data
        dshape = discover(data_source)
      File "/home/ubuntu/miniconda/envs/my_env/lib/python3.6/site-packages/multipledispatch/dispatcher.py", line 278, in __call__
        return func(*args, **kwargs)
      File "/home/ubuntu/miniconda/envs/my_env/lib/python3.6/site-packages/odo/backends/pandas.py", line 31, in discover_dataframe
        for k in df.columns])
      File "/home/ubuntu/miniconda/envs/my_env/lib/python3.6/site-packages/odo/backends/pandas.py", line 31, in <listcomp>
        for k in df.columns])
      File "/home/ubuntu/miniconda/envs/my_env/lib/python3.6/site-packages/odo/backends/pandas.py", line 23, in dshape_from_pandas
        dshape = datashape.CType.from_numpy_dtype(col.dtype)
      File "/home/ubuntu/miniconda/envs/my_env/lib/python3.6/site-packages/datashape/coretypes.py", line 779, in from_numpy_dtype
        if np.issubdtype(dt, np.datetime64):
      File "/home/ubuntu/miniconda/envs/my_env/lib/python3.6/site-packages/numpy/core/numerictypes.py", line 388, in issubdtype
        arg1 = dtype(arg1).type
    TypeError: Cannot interpret 'CategoricalDtype(categories=['no', 'yes'], ordered=False)' as a data type
    

    It's failing to parse the second column's values.

    opened by sureshchepuri 0
  • Deprecation warning due to invalid escape sequences in Python 3.8

    find . -iname '*.py'  | xargs -P 4 -I{} python -Wall -m py_compile {} 
    
    ./docs/gh-pages.py:124: DeprecationWarning: invalid escape sequence \#
      branch = re.match('\# On branch (.*)$', status).group(1)
    ./versioneer.py:467: DeprecationWarning: invalid escape sequence \s
      LONG_VERSION_PY['git'] = '''
    ./blaze/expr/collections.py:102: SyntaxWarning: "is" with a literal. Did you mean "=="?
      if self._key is () or self._key is None:
    
    opened by tirkarthi 0
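Both warnings above have mechanical fixes: use a raw string for regexes containing backslashes, and compare empty tuples by value rather than identity. A minimal illustration (the strings are made-up stand-ins for the code referenced in the report):

```python
import re

# '\#' is an invalid escape in a normal string literal, but fine in a
# raw string, and the regex itself is unchanged.
branch = re.match(r'\# On branch (.*)$', '# On branch master').group(1)

# "is" tests object identity; () is not guaranteed to be a singleton,
# so value comparison (==) is what was meant here.
key = ()
empty = key == () or key is None
```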
  • Importing ABC from collections was removed in Python 3.9

    $ rg "from collections.*(Awaitable|Coroutine|AsyncIterable|AsyncIterator|AsyncGenerator|Hashable|Iterable|Iterator|Generator|Reversible|Sized|Container|Callable|Collection|MutableSet|Mapping|MutableMapping|MappingView|KeysView|ItemsView|ValuesView|Sequence|MutableSequence|ByteString)\b"
    
    
    blaze/utils.py
    4:from collections import Iterator
    
    blaze/compute/pandas.py
    24:from collections import defaultdict, Iterable
    
    blaze/compute/chunks.py
    5:from collections import Iterator, Iterable
    
    blaze/compute/sql.py
    20:from collections import Iterable
    
    blaze/compute/csv.py
    8:from collections import Iterator, Iterable
    
    blaze/compute/core.py
    3:from collections import defaultdict, Iterator, Mapping
    
    blaze/compute/bcolz.py
    26:from collections import Iterator, Iterable
    
    blaze/compute/python.py
    14:from collections import Iterable, Mapping
    24:from collections import Iterator
    
    blaze/compute/json.py
    6:from collections import Iterator
    
    blaze/compute/numpy.py
    3:from collections import Iterable
    
    blaze/expr/arrays.py
    3:from collections import Iterable
    
    blaze/expr/literal.py
    3:from collections import Iterator, Mapping
    
    blaze/expr/core.py
    3:from collections import Mapping, OrderedDict
    
    blaze/expr/expressions.py
    3:from collections import Mapping
    
    blaze/tests/test_cached.py
    4:from collections import Iterator
    
    blaze/compute/tests/test_csv_compute.py
    11:from collections import Iterator
    
    blaze/compute/tests/test_python_compute.py
    2:from collections import Mapping
    11:from collections import Iterator, Iterable
    
    blaze/server/serialization/json_dumps_trusted.py
    3:from collections import Callable
    
    opened by tirkarthi 0
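The usual fix for the imports listed above is a compatibility shim that prefers collections.abc and falls back for very old Pythons; a sketch:

```python
try:
    # collections.abc has held these ABCs since Python 3.3; the aliases
    # in collections itself were removed in Python 3.9.
    from collections.abc import Callable, Iterable, Iterator, Mapping
except ImportError:
    # Fallback for Python versions predating collections.abc.
    from collections import Callable, Iterable, Iterator, Mapping
```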
  • Unable to install pyhive from Anaconda Prompt

    Dear All,

    I'm new to the Anaconda/Python environment and wanted to connect to a remote HDFS/Hive server. After googling, I found that we should have pyhive installed in Anaconda. I tried the execution below and got stuck with the following error:

    (C:\ProgramData\Anaconda3) C:\WINDOWS\system32>conda install -c blaze pyhive
    Fetching package metadata .................

    PackageNotFoundError: Package missing in current win-64 channels:

    • pyhive

    Close matches found; did you mean one of these?

    pyhive: pyfive, pydrive
    


    opened by scbhari 1
Releases (0.11.0)
  • 0.11.0(Jul 19, 2016)

    Release 0.11.0


    New Expressions

    
    * Many new string utility expressions were added that follow the Pandas
      vectorized string methods API closely
      `<http://pandas.pydata.org/pandas-docs/stable/text.html#text-string-methods>`_.
      These are gathered under the ``.str`` sub-namespace, allowing the user to
      say::
    
        t.col.str.lower()
    
      to compute a new column with the string contents lowercased.
    
    * Likewise, many new datetime utility expressions were added to the ``.dt``
      sub-namespace, following Pandas vectorized datetime methods API
      `<http://pandas.pydata.org/pandas-docs/stable/timeseries.html>`_.
    
    Improved Expressions
    

    None

    New Backends

    
    None
    
    Improved Backends
    

    None

    Experimental Features

    
    None
    
    API Changes
    
    • The following functions were deprecated in favor of equivalent functions without the str_ name prefix:

      ===================================  ===================================
      deprecated function                  replacement function
      ===================================  ===================================
      :func:~blaze.expr.strings.str_len    :func:~blaze.expr.strings.len
      :func:~blaze.expr.strings.str_upper  :func:~blaze.expr.strings.upper
      :func:~blaze.expr.strings.str_lower  :func:~blaze.expr.strings.lower
      :func:~blaze.expr.strings.str_cat    :func:~blaze.expr.strings.cat
      ===================================  ===================================

    Bug Fixes

    
    None
    
    Miscellaneous
    

    None

    Source code(tar.gz)
    Source code(zip)
  • 0.10.1(May 6, 2016)

    New Expressions

    None

    Improved Expressions

    None

    New Backends

    None

    Improved Backends

    • Blaze server's /add endpoint was enhanced to take a more general payload (:issue:1481).
    • Adds consistency check to blaze server at startup for YAML file and dynamic addition options (:issue:1491).

    Experimental Features

    • The str_cat() expression was added, mirroring Pandas' Series.str.cat() API (:issue:1496).

    API Changes

    None

    Bug Fixes

    • The content type specification parsing was improved to accept more elaborate headers (:issue:1490).
    • The discoverability consistency check is done before a dataset is dynamically added to the server (:issue:1498).

    Miscellaneous

    None

    Source code(tar.gz)
    Source code(zip)
  • 0.10.0(Apr 25, 2016)

    Release 0.10.0

    New Expressions

    • The sample expression allows random sampling of rows to facilitate interactive data exploration (:issue:1410). It is implemented for the Pandas, Dask, SQL, and Python backends.

    • Adds :func:~blaze.expr.expressions.coalesce expression which takes two arguments and returns the first non-missing value. If both are missing then the result is missing. For example: coalesce(1, 2) == 1, coalesce(None, 1) == 1, and coalesce(None, None) == None. This is inspired by the SQL function of the same name (:issue:1409).

    • Adds :func:~blaze.expr.expressions.cast expression to reinterpret an expression's dshape. This is based on C++'s reinterpret_cast, or just normal C casts. For example: symbol('s', 'int32').cast('uint32').dshape == dshape('uint32'). This expression has no effect on the computation; it merely tells Blaze to treat the result of the expression as the new dshape. The compute definition for cast is simply:

      @dispatch(Cast, object)
      def compute_up(expr, data, **kwargs):
          return data

      (:issue:1409).

    Improved Expressions

    • The test suite was expanded to validate proper expression input error handling (:issue:1420).
    • The :func:~blaze.expr.datetime.truncate function was refactored to raise an exception for incorrect inputs, rather than using assertions (:issue:1443).
    • The docstring for :class:~blaze.expr.collections.Merge was expanded to include examples using :class:~blaze.expr.expressions.Label to control the ordering of the columns in the result (:issue:1447).

    Improved Backends

    • Adds :class:~blaze.expr.math.greatest and :class:~blaze.expr.math.least support to the sql backend (:issue:1428).
    • Generalize Field to support :class:collections.Mapping objects (:issue:1467).

    Experimental Features

    • The :class:~blaze.expr.strings.str_upper and :class:~blaze.expr.strings.str_lower expressions were added for the Pandas and SQL backends (:issue:1462). These are marked experimental since their names are subject to change. More string methods will be added in coming versions.

    API Changes

    • The :class:~blaze.expr.strings.strlen expression was deprecated in favor of :class:~blaze.expr.strings.str_len (:issue:1462).
    • Long deprecated :func:~blaze.table.Table and :func:~blaze.table.TableSymbol were removed (:issue:1441). The TableSymbol tests in test_table.py were migrated to test_symbol.py.
    • :func:~blaze.interactive.Data has been deprecated in favor of :func:~blaze.interactive.data. :class:~blaze.interactive.InteractiveSymbol has been deprecated and temporarily replaced by :class:~blaze.interactive._Data. These deprecations will be in place for the 0.10 release. In the 0.11 release, :class:~blaze.interactive._Data will be renamed to Data, calls to :func:~blaze.interactive.data will create Data instances, and :class:~blaze.interactive.InteractiveSymbol will be removed (:issue:1431 and :issue:1421).
    • :func:~blaze.compute.core.compute has a new keyword argument return_type which defaults to 'native' (:issue:1401, :issue:1411, :issue:1417), which preserves existing behavior. In the 0.11 release, return_type will be changed to default to 'core', which will odo non-core backends into core backends as the final step in a call to compute.
    • Due to API instability and on the recommendation of DyND developers, we removed the DyND dependency temporarily (:issue:1379). When DyND achieves its 1.0 release, DyND will be re-incorporated into Blaze. The existing DyND support in Blaze was rudimentary and based on an egregiously outdated and buggy version of DyND. We are aware of no actual use of DyND via Blaze in practice.
    • The :class:~blaze.expr.expressions.Expr __repr__ method's triggering of implicit computation has been deprecated. Using this aspect of Blaze will trigger a DeprecationWarning in version 0.10, and this behavior will be replaced by a standard (boring) __repr__ implementation in version 0.11. Users can explicitly trigger a computation to see a quick view of the results of an interactive expression by means of the :func:~blaze.expr.expressions.Expr.peek method. By setting the :mod:~blaze.interactive.use_new_repr flag to True, users can use the new (boring) __repr__ implementation in version 0.10 (:issue:1414 and :issue:1395).

    Bug Fixes

    • The :class:~blaze.expr.strings.str_upper and :class:~blaze.expr.strings.str_lower schemas were fixed to pass through their underlying _child's schema to ensure option types are handled correctly (:issue:1472).
    • Fixed a bug with Pandas' implementation of compute_up on :class:~blaze.expr.broadcast.Broadcast expressions (:issue:1442). Added tests for Pandas frame and series and dask dataframes on Broadcast expressions.
    • Fixed a bug with :class:~blaze.expr.collections.Sample on SQL backends (:issue:1452 :issue:1423 :issue:1424 :issue:1425).
    • Fixed several bugs relating to adding new datasets to blaze server instances (:issue:1459). Blaze server will make a best effort to ensure that the added dataset is valid and loadable; if not, it will return appropriate HTTP status codes.

    Miscellaneous

    • Adds logging to server compute endpoint. Includes expression being computed and total time to compute. (:issue:1436)
    • Merged the core and all conda recipes (:issue:1451). This simplifies the build process and makes it consistent with the single blaze package provided by the Anaconda distribution.
    • Adds a --yaml-dir option to blaze-server to indicate the server should load path-based yaml resources relative to the yaml file's d
    Source code(tar.gz)
    Source code(zip)
  • 0.8.2(Jul 9, 2015)

  • 0.8.0(Apr 26, 2015)

    Major release

    features

    • improved sql support
    • IsIn expression with pandas semantics
    • sql backend has multicolumn sort
    • group by dates in sql
    • sql backend doesn't generate nested queries when combining transforms, selections and By expressions
    • spark dataframes now join in sparksql land rather than joining as RDDs
    • mongo databases are now first class citizens
    • support for pymongo 3.0
    • start a dask backend

    bug fixes

    • char_length for sql string length rather than length, which counts bytes not characters
    • deterministic ordering for columns in a Merge expression
    • put a lock around numba ufunc generation
    • Fix variability functions on sql databases #1051
    Source code(tar.gz)
    Source code(zip)
  • 0.7.1(Jan 22, 2015)

    Version 0.7.1

    • Better array support to align numpy with dask (dot, transpose, slicing)
    • Support __array__, __iter__, __int__, ... protocols
    • Numba integration with numpy layer
    • Server works on raw datasets, not dicts. Also, support dicts as datasets.
    • SQL
      • Avoid repeated reflection
      • Support computation on metadata instances. Support schemas.
    • CachedDataset
    • pandas.HDFStore support
    • Support NumPy promotion rules
    Source code(tar.gz)
    Source code(zip)
  • 0.6.5(Oct 7, 2014)

  • 0.6.4(Oct 4, 2014)

    Release 0.6.4

    • Back CSV with pandas.read_csv. Better performance and more robust unicode support but less robust missing value support (some regressions) #597
    • Much improved SQL support #626 #650 #652 #662
    • Server supports remote execution of computations, not just indexing #631
    • Better PyTables and datetime support #608 #639
    • Support SparkSQL #592
    Source code(tar.gz)
    Source code(zip)
  • 0.6.3(Aug 31, 2014)

    Release 0.6.3

    • by takes only two arguments, the grouper and apply. the child is inferred using common_subexpression #540
    • Better handling of pandas Series object #536
    • Better printing of empty results in interactive mode #542
    • Regex dispatched resource function bound to Table #541, e.g. Table('/path/to/file.csv')
    Source code(tar.gz)
    Source code(zip)
  • 0.6.2(Aug 28, 2014)

    Release 0.6.2

    • Efficient CSV to SQL migration using native tools #454
    • Dispatched drop and create_index functions #495
    • DPlyr interface at blaze.api.dplyr. #484
    • Various bits borrowed from that interface
      • transform function adopted to main namespace
      • Summary object for named reductions
      • Keyword syntax in by and merge e.g. by(t, t.col, label=t.col2.max(), label2=t.col2.min())
    • New Computation Server #527
    • Better PyTables support #487 #496 #526
    Source code(tar.gz)
    Source code(zip)