Quickly and accurately render even the largest data.

Overview



Turn even the largest data into images, accurately


What is it?

Datashader is a data rasterization pipeline for automating the process of creating meaningful representations of large amounts of data. Datashader breaks the creation of images of data into 3 main steps:

  1. Projection

    Each record is projected into zero or more bins of a nominal plotting grid shape, based on a specified glyph.

  2. Aggregation

    Reductions are computed for each bin, compressing the potentially large dataset into a much smaller aggregate array.

  3. Transformation

    These aggregates are then further processed, eventually creating an image.

Using this very general pipeline, many interesting data visualizations can be created in a performant and scalable way. Datashader contains tools for easily creating these pipelines in a composable manner, using only a few lines of code. Datashader can be used on its own, but it is also designed to work as a pre-processing stage in a plotting library, allowing that library to work with much larger datasets than it would otherwise.

Installation

Datashader supports Python 3.7, 3.8, 3.9, and 3.10 on Linux, Windows, or Mac and can be installed with conda:

conda install datashader

or with pip:

pip install datashader

For the best performance, we recommend using conda so that you are sure to get numerical libraries optimized for your platform. The latest releases are available on the pyviz channel (conda install -c pyviz datashader) and the latest pre-release versions are available on the dev-labelled channel (conda install -c pyviz/label/dev datashader).

Fetching Examples

Once you've installed datashader as above you can fetch the examples:

datashader examples
cd datashader-examples

This will create a new directory called datashader-examples with all the data needed to run the examples.

To run all the examples you will need some extra dependencies. If you installed datashader within a conda environment, activate that environment and run:

conda env update --file environment.yml

Otherwise create a new environment:

conda env create --name datashader --file environment.yml
conda activate datashader

Developer Instructions

  1. Install the Python 3 version of Miniconda or Anaconda, if you don't already have it on your system.

  2. Clone the datashader git repository if you do not already have it:

    git clone https://github.com/holoviz/datashader.git
    
  3. Set up a new conda environment with all of the dependencies needed to run the examples:

    cd datashader
    conda env create --name datashader --file ./examples/environment.yml
    conda activate datashader
    
  4. Put the datashader directory into the Python path in this environment:

    pip install --no-deps -e .
    

Learning more

After working through the examples, you can find additional resources linked from the datashader documentation, including API documentation and papers and talks about the approach.

Some Examples

USA census

NYC races

NYC taxi

Issues
  • ENH: first draft of MPL artist


    Minimal datashader-aware matplotlib artist.

    import datashader as ds
    from datashader.mpl_ext import DSArtist
    import matplotlib.pyplot as plt
    import matplotlib.colors as mcolors

    # df is assumed to be a dataframe with dropoff_x/dropoff_y columns
    fig, ax = plt.subplots()
    da = DSArtist(ax, df, 'dropoff_x', 'dropoff_y', ds.count('passenger_count'),
                  norm=mcolors.LogNorm())
    ax.add_artist(da)
    ax.set_aspect('equal')

    fig.colorbar(da)
    
    


    This is using DS to just do the binning and then re-using mpl's existing normalization and color-mapping tools.

    in progress 
    opened by tacaswell 55
  • ENH: updated draft of MPL artist


    Working on resolving issues with @tacaswell's #200 at the SciPy 2020 sprints along with @manzt.

    The DSArtist now takes in a datashader.Pipeline object and so far can handle the case of a 2D raster with a quantitative colormap, but not yet the 3D categorical case where a color_key is used.

    We currently infer the colormap by applying the datashader pipeline's operations manually rather than using the callable itself. We use the aggregation part of the pipeline (agg, transform_fn) to get a vmin and vmax in order to build and set a matplotlib colormap and norm.


    We'll keep working on the categorical case, but we wanted to share this now and also see if there is still interest in merging this into datashader.

    opened by nvictus 47
  • Add Polygon support


    Overview

    This PR adds support for rasterizing Polygons

    Closes https://github.com/pyviz/datashader/issues/181.

    For example usage, see notebook at https://anaconda.org/jonmmease/datashader_polygons_pr/notebook

    GeomArray ExtensionArrays

    In order to rasterize polygons efficiently, we need a data structure that can store an array of polygon definitions in a form that is directly accessible to numba.

    To accomplish this, I added a new RaggedArray (see https://github.com/pyviz/datashader/pull/687) subclass called PolygonsArray. Each element of this array can store one or more polygons with holes, so elements of a PolygonsArray are roughly equivalent to a shapely Polygon or MultiPolygon. The entire PolygonsArray is roughly equivalent to a geopandas GeometryArray of Polygon/MultiPolygon elements.

    The new PolygonsArray pandas extension array could eventually grow to support many of the operations supported by the geopandas GeoSeries. The advantage would be that these operations could be implemented in vectorized form using numba for potentially improved performance, and Dask DataFrame support would largely come for free.

    To demonstrate the potential, I added length and area properties to the PolygonsArray class. These operations are ~8x faster than the equivalent GeoSeries operations, and they could also be naturally parallelized using Dask for large datasets.

    Canvas methods

    New Canvas.polygons() method has been added to rasterize polygons, and the Canvas.line() method has been updated to support these new geometry arrays, making it easy to draw polygon outlines.

    Examples

    For code and timing, see https://anaconda.org/jonmmease/datashader_polygons_pr/notebook

    (Screenshots: texas, world, world_outline.)

    cc @jbednar @philippjfr

    opened by jonmmease 44
  • Recommended file format for large files


    Datashader is agnostic about file formats, working with anything that can be loaded into a dataframe-like object (currently supporting Pandas and Dask dataframes). But because datashader focuses on having good performance for large datasets, the performance of the file format is a major factor in the usability of the library. Thus we should use examples that serve to guide users towards good solutions for their own problems, recommending and demonstrating approaches that we find to work well.

    Right now, our examples use CSV and castra or HDF5 formats. It is of course important to show a CSV example, since nearly every dataset can be obtained in CSV for import into the library. However, CSV is highly inefficient in both file size and reading speed, and it also truncates floating-point precision in ways that are problematic when zooming in closely to a dataset.

    Castra is a relatively high-performance binary format that works well for the large datasets in the examples, but it is not yet a mature project, and is not available on the main conda channel. Should we invest in making castra be more fully supported? If not, I think we should choose another binary format (HDF5?) to use for our examples.

    opened by jbednar 40
  • Add pandas ExtensionArray for storing homogeneous ragged arrays


    Overview

    This PR introduces a pandas ExtensionArray for storing a column of homogeneous ragged 1D arrays. The Datashader motivation for ragged arrays is to make it possible to store variable-length lines (fixing problems like https://github.com/pyviz/datashader/issues/464) and eventually polygons (https://github.com/pyviz/datashader/issues/181) as elements of a column in a DataFrame. Using one such shape per row makes it simpler to store associated columns of data for use with selections and filtering, hovering, etc.

    This PR currently contains only the extension array and associated testing.

    Implementation

    RaggedArray is a subclass of pandas.api.extension.ExtensionArray with a RaggedDtype that is a subclass of pandas.api.extension.ExtensionDtype. RaggedDtype takes advantage of the @register_extension_dtype decorator introduced in pandas 0.24rc1 to register itself with pandas as a datatype named 'ragged'.

    NOTE: This branch currently requires pandas 0.24rc1

    A ragged array of length n is represented by three numpy arrays:

    • mask: A boolean array of length n where values of True represent missing/NA values
    • flat_array: An array with the same datatype as the ragged array element and with a length equal to the sum of the length of all of the ragged array elements.
    • start_indices: An unsigned integer array of length n of indices into flat_array corresponding to the start of the ragged array element. For space efficiency, the precision of the unsigned integer is chosen to be the smallest available that is capable of indexing the last element in flat_array.

    Example Usage

    In[1]: from datashader.datatypes import RaggedArray
    In[2]: ra = RaggedArray([[1, 2], [], [10, 20], None, [11, 22, 33, 44]])
    In[3]: ra
    Out[3]: 
    <RaggedArray>
    [            array([1., 2.]),    array([], dtype=float64),
               array([10., 20.]),                        None,
     array([11., 22., 33., 44.])]
    Length: 5, dtype: <class 'datashader.datatypes.RaggedDtype'>
    
    In[4]: ra.flat_array
    Out[4]: array([ 1.,  2., 10., 20., 11., 22., 33., 44.])
    
    In[5]: ra.start_indices
    Out[5]: array([0, 2, 2, 4, 4], dtype=uint8)
    
    In[6]: ra.mask
    Out[6]: array([False, False, False,  True, False])
    
    In[7]: pd.array([[1, 2], [], [10, 20], None, [11, 22, 33, 44]], dtype='ragged')
    Out[7]: 
    <RaggedArray>
    [            array([1., 2.]),    array([], dtype=float64),
               array([10., 20.]),                        None,
     array([11., 22., 33., 44.])]
    Length: 5, dtype: <class 'datashader.datatypes.RaggedDtype'>
    
    In[8]: rs = pd.Series([[1, 2], [], [10, 20], None, [11, 22, 33, 44]], dtype='ragged')
    In[9]: rs
    Out[9]: 
    0              [1. 2.]
    1                   []
    2            [10. 20.]
    3                 None
    4    [11. 22. 33. 44.]
    dtype: ragged
    
    In[10]: ragged_subset = rs.loc[[0, 1, 4]]
    In[11]: ragged_subset
    Out[11]: 
    0              [1. 2.]
    1                   []
    4    [11. 22. 33. 44.]
    dtype: ragged
    
    In[12]: ragged_subset.array.mask
    Out[12]: array([False, False, False])
    
    In[13]: ragged_subset.array.flat_array
    Out[13]: array([ 1.,  2., 11., 22., 33., 44.])
    
    In[14]: ragged_subset.array.start_indices
    Out[14]: array([0, 2, 2], dtype=uint8)
    
    opened by jonmmease 39
  • Datashader crashes python with "noisy" timeseries data.

    It appears as if datashader renders low-noise time series data such as the following with no problem:

    (screenshot: low-noise data)

    However, when attempting to zoom in on high-noise data such as the following:

    (screenshot: high-noise data)

    python bombs out with an uncaught win32 exception:

    (screenshot: crash dialog)

    I have attached a python notebook which shows this behavior (at least on my machine):

    Environment:

    • Windows 7
    • new data shader environment (as of 2018-03-12)
      • datashader 0.6.5
      • python 3.6
      • jupyter 1.0.0

    The datashader environment was generated using the attached yml file from the datashader examples directory.

    opened by bardiche98 36
  • WIP: Tiling


    This PR provides tiling for datashader.

    Tiling currently assumes that data is provided in meters.

    The interface for tiling:

    • [x] implement tiles
    • [x] implement supertiles
    • [x] implement zoom level stats scan
    • [x] test stats scan
    • [x] add example notebook
    • [x] add esri metadata output issue (https://github.com/bokeh/datashader/issues/639)
    • [x] add ogc tileset metadata output issue (https://github.com/bokeh/datashader/issues/640)
    opened by brendancol 33
  • Extend InteractiveImage to work with bokeh server


    Awesome library, I really love it. I usually embed bokeh with flask and it seems to work with datashader as well. However, I noticed that it doesn't do very well when it tries to re-render the image when zooming in. That's because the js code needs the IPython.notebook.kernel for it to work. Is there a way to make it work without the use of the IPython kernel?

    opened by DavidVillero 33
  • import error in lidar example


    I'm trying to load the lidar example but I got an import error at:

    from holoviews.operation import datashade
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ImportError: cannot import name 'datashade'
    

    I'm using datashader and holoviews installed from their git master repositories.

    Any clue on what I'm missing?

    opened by epifanio 32
  • Add quadmesh glyph with rectilinear and curvilinear support


    This PR is an alternative implementation of quadmesh rasterization, initially based on the logic from https://github.com/pyviz/datashader/pull/769.

    Unlike that PR, this PR adds quadmesh glyph classes, and supports the standard datashader aggregation framework.

    Overall, the architecture fits well with that of the dataframe-based glyphs (points, line, and area). And relying on the datashader aggregation framework results in a lot less code duplication compared to https://github.com/pyviz/datashader/pull/769.

    Thanks to the variable argument expansion from https://github.com/pyviz/datashader/pull/780, the performance for rectilinear rendering is now on par with the prototype implementation.

    For curvilinear quadmesh aggregation, this PR uses a raycasting algorithm to determine which pixels to fill. I've found the performance of this approach to be ~1.5x slower than the prototype implementation, which uses an area-based point-inclusion approach. The algorithm isn't the only difference between the implementations, and I didn't exhaustively chase down the differences this time.

    I went with the raycasting algorithm because it handles concave and complex quads. It is also very straightforward to extend this algorithm to general polygons (with or without holes), so I think there's a good path here towards adding general polygon support to datashader as well.

    For example usage and benchmarks, see https://anaconda.org/jonmmease/quadmeshcomparisons_pr/notebook (rendered at https://nbviewer.jupyter.org/urls/notebooks.anaconda.org/jonmmease/quadmeshcomparisons_pr/download)

    Future work: This PR does not include any parallelization support, so extending this to work in a multi-threaded or distributed context is left as future work.

    @jbednar @philippjfr


    Outdated performance observations from initial PR:

    But, it's an order of magnitude slower than the implementations in https://github.com/pyviz/datashader/pull/769. Here is a notebook showing some timing results: https://anaconda.org/jonmmease/rectquadmesh_examples/notebook.

    Roughly speaking, this PR is ~13x faster than representing a rectilinear quadmesh with a trimesh. But the specialized implementation from https://github.com/pyviz/datashader/pull/769 is ~13x faster than this PR. Note that I disabled numba parallelization for these tests for consistency.

    I did some performance debugging and found that nearly all of the extra overhead in this PR, compared to the specialized implementation, comes from the use of the aggregation framework. If, in the _extend function, I don't call append but instead implement a single aggregation inline, the performance is comparable to the specialized implementations.

    So the bad news is that right now we need to choose between performance and consistency/maintainability for the quadmesh implementation. The good news is that there may be an order of magnitude speedup to be had across points, line, and area glyphs as well if we can work out how to optimize the aggregation framework.

    opened by jonmmease 31
  • Implements tree reduction in the dask layer


    Following discussions in https://github.com/ratt-ru/shadeMS/issues/29 (in particular, using small chunks in the dask layer would, counterintuitively, explode RAM usage), @sjperkins replaced the current chunk aggregation code in dask.py with a tree reduction.

    This has been tested extensively with https://github.com/ratt-ru/shadeMS, and was found to reduce RAM usage considerably. It has not been tested in a CUDA context at all though, so if somebody more knowledgeable than me can take a look at it, that'd be great.

    opened by o-smirnov 30
  • Implementation of first and last reduction


    For my current work, it is useful for me to make use of the first([column]) and last([column]) reductions, which at the moment are implemented only for rasters.

    For both reductions, I've initialised a numpy array with numpy.nan values (in the create() method). The append() method is implemented as follows:

    • first([column]): check if field is not null and if agg[y,x] is null (has not yet been filled), then set agg[y,x] = field
    • last([column]): check if field is not null, then set agg[y,x] = field

    For both reductions, the finalize() method wraps the numpy array to a xarray DataArray (similar to the count() reduction).

    Please let me know if you think this may be a good fit for the project.

    opened by tselea 3
  • Do not snap trimesh vertices to pixel grid


    Closes #806.

    Previously trimesh snapped each vertex to the nearest pixel center before deciding which pixels were inside each triangle. This worked mostly OK except that it never rendered pixels in the right-hand column or bottom row, and small triangles (just a few pixels wide/high) rendered badly. This fix does not snap vertices to pixel centers to avoid these problems.

    Here are before and after images for the three test cases in issue #806; blue lines show the boundary passed to trimesh, and the last example should completely fill the agg. (Before/after screenshots omitted.)

    Most of the work here is changing and checking all of the trimesh tests. I have passed a human eyeball over all of the integration test results (test_pandas and test_dask), and had to rewrite all of the unit tests (test_glyphs) as the internal draw_triangle functions now take floating-point vertex locations (unsnapped).

    opened by ianthomas23 1
  • raster-aggregation in matplotlib extension


    Is your feature request related to a problem? Please describe.

    I've already stated the problem on discourse (here), but since it does not seem to get a lot of attention there, I thought I'd bring this up in a more general way on github as well.

    I'm the dev of EOmaps, which already provides an integration for datashader via the matplotlib-extension.

    While working on some improvements to the link between EOmaps and datashader, I've noticed that at the moment (at least as I understand it) the dsshow function of the matplotlib-extension only allows for bypixel aggregations, which limits the capabilities quite a bit.

    To be more precise, I've been trying to get "raster"-like aggregations (using for example the "mode" reduction) working, but I could not find a way to do that without monkey-patching parts of the matplotlib-extension (see below).

    Describe the solution you'd like


    At the moment, the aggregation in dsshow() is implemented like this:

    canvas = Canvas(
        plot_width=plot_width,
        plot_height=plot_height,
        x_range=x_range,
        y_range=y_range,
    )
    binned = bypixel(self.df, canvas, self.glyph, self.aggregator)
    

    however, as I understand it, a more flexible way would be to use something like

    canvas = Canvas(
        plot_width=plot_width,
        plot_height=plot_height,
        x_range=x_range,
        y_range=y_range,
    )
    
    # <aggregation-type> = "raster", "point", "line" etc. 
    # <reduction> = "mean", "median", "max" etc.
    agg = canvas.<aggregation-type>(<Dataset>, agg=<reduction>)
    binned = agg.compute()
    

    This is just a sketch to clarify what I mean...

    Describe alternatives you've considered

    Well, I've managed to get raster-aggregation working in EOmaps by monkey-patching the aggregate() function of the matplotlib-extension, but I don't think that this is a proper (and sustainable) way to provide a better datashader-integration.

    Additional context

    Here's what I intend to achieve (ideally without having to tamper with the matplotlib-extension myself). (Screenshot: datashader_raster_aggregation.)

    opened by raphaelquast 0
  • datashader not importable with dask 2022.5.1


    I have set up a fresh python 3.9 conda environment with the holoviz stack, and datashader cannot be imported. I get the error below after running import datashader, which indicates an issue within dask.

    Solution: I have downgraded dask from 2022.5.1 to 2022.2.1 and it now works.

    ALL software version info

    conda list:

    # Name                    Version                   Build  Channel
    argon2-cffi               21.3.0             pyhd3eb1b0_0
    argon2-cffi-bindings      21.2.0           py39h2bbff1b_0
    asttokens                 2.0.5              pyhd3eb1b0_0
    attrs                     21.4.0             pyhd3eb1b0_0
    backcall                  0.2.0              pyhd3eb1b0_0
    beautifulsoup4            4.11.1           py39haa95532_0
    blas                      1.0                         mkl
    bleach                    4.1.0              pyhd3eb1b0_0
    bokeh                     2.4.3                      py_0    bokeh
    bottleneck                1.3.4            py39h080aedc_0
    brotli                    1.0.9                ha925a31_2
    brotlipy                  0.7.0           py39h2bbff1b_1003
    ca-certificates           2022.4.26            haa95532_0
    certifi                   2022.5.18.1      py39haa95532_0
    cffi                      1.15.0           py39h2bbff1b_1
    charset-normalizer        2.0.4              pyhd3eb1b0_0
    click                     8.0.4            py39haa95532_0
    cloudpickle               2.0.0              pyhd3eb1b0_0
    colorama                  0.4.4              pyhd3eb1b0_0
    colorcet                  3.0.0                      py_0    pyviz
    cryptography              37.0.1           py39h21b164f_0
    cycler                    0.11.0             pyhd3eb1b0_0
    cytoolz                   0.11.0           py39h2bbff1b_0
    dask                      2022.5.0         py39haa95532_0
    dask-core                 2022.5.0         py39haa95532_0
    datashader                0.14.0                     py_0    pyviz
    datashape                 0.5.4            py39haa95532_1
    debugpy                   1.5.1            py39hd77b12b_0
    decorator                 5.1.1              pyhd3eb1b0_0
    defusedxml                0.7.1              pyhd3eb1b0_0
    distributed               2022.5.0         py39haa95532_0
    entrypoints               0.4              py39haa95532_0
    executing                 0.8.3              pyhd3eb1b0_0
    fonttools                 4.25.0             pyhd3eb1b0_0
    freetype                  2.10.4               hd328e21_0
    fsspec                    2022.3.0         py39haa95532_0
    heapdict                  1.0.1              pyhd3eb1b0_0
    holoviews                 1.14.9                     py_0    pyviz
    hvplot                    0.8.0                      py_0    pyviz
    icc_rt                    2019.0.0             h0cc432a_1
    icu                       58.2                 ha925a31_3
    idna                      3.3                pyhd3eb1b0_0
    importlib-metadata        4.11.3           py39haa95532_0
    intel-openmp              2021.4.0          haa95532_3556
    ipykernel                 6.9.1            py39haa95532_0
    ipython                   8.3.0            py39haa95532_0
    ipython_genutils          0.2.0              pyhd3eb1b0_1
    jedi                      0.18.1           py39haa95532_1
    jinja2                    3.0.3              pyhd3eb1b0_0
    jpeg                      9e                   h2bbff1b_0
    jsonschema                4.4.0            py39haa95532_0
    jupyter_client            7.2.2            py39haa95532_0
    jupyter_core              4.10.0           py39haa95532_0
    jupyterlab_pygments       0.1.2                      py_0
    kiwisolver                1.4.2            py39hd77b12b_0
    libpng                    1.6.37               h2a8f88b_0
    libtiff                   4.2.0                he0120a3_1
    libwebp                   1.2.2                h2bbff1b_0
    llvmlite                  0.38.0           py39h23ce68f_0
    locket                    1.0.0            py39haa95532_0
    lz4                       3.1.3            py39h2bbff1b_0
    lz4-c                     1.9.3                h2bbff1b_1
    markdown                  3.3.4            py39haa95532_0
    markupsafe                2.0.1            py39h2bbff1b_0
    matplotlib                3.5.1            py39haa95532_1
    matplotlib-base           3.5.1            py39hd77b12b_1
    matplotlib-inline         0.1.2              pyhd3eb1b0_2
    mistune                   0.8.4           py39h2bbff1b_1000
    mkl                       2021.4.0           haa95532_640
    mkl-service               2.4.0            py39h2bbff1b_0
    mkl_fft                   1.3.1            py39h277e83a_0
    mkl_random                1.2.2            py39hf11a4ad_0
    msgpack-python            1.0.3            py39h59b6b97_0
    multipledispatch          0.6.0            py39haa95532_0
    munkres                   1.1.4                      py_0
    nbclient                  0.5.13           py39haa95532_0
    nbconvert                 6.4.4            py39haa95532_0
    nbformat                  5.3.0            py39haa95532_0
    nest-asyncio              1.5.5            py39haa95532_0
    notebook                  6.4.11           py39haa95532_0
    numba                     0.55.1           py39hf11a4ad_0
    numexpr                   2.8.1            py39hb80d3ca_0
    numpy                     1.21.5           py39h7a0a035_2
    numpy-base                1.21.5           py39hca35cd5_2
    openssl                   1.1.1o               h2bbff1b_0
    packaging                 21.3               pyhd3eb1b0_0
    pandas                    1.4.2            py39hd77b12b_0
    pandocfilters             1.5.0              pyhd3eb1b0_0
    panel                     0.13.1                     py_0    pyviz
    param                     1.12.1                     py_0    pyviz
    parso                     0.8.3              pyhd3eb1b0_0
    partd                     1.2.0              pyhd3eb1b0_1
    pickleshare               0.7.5           pyhd3eb1b0_1003
    pillow                    9.0.1            py39hdc2b20a_0
    pip                       22.0.4                   pypi_0    pypi
    plotly                    5.6.0              pyhd3eb1b0_0
    prometheus_client         0.13.1             pyhd3eb1b0_0
    prompt-toolkit            3.0.20             pyhd3eb1b0_0
    psutil                    5.8.0            py39h2bbff1b_1
    pure_eval                 0.2.2              pyhd3eb1b0_0
    pycparser                 2.21               pyhd3eb1b0_0
    pyct                      0.4.8                      py_0    pyviz
    pyct-core                 0.4.8                      py_0    pyviz
    pygments                  2.11.2             pyhd3eb1b0_0
    pyopenssl                 22.0.0             pyhd3eb1b0_0
    pyparsing                 3.0.4              pyhd3eb1b0_0
    pyqt                      5.9.2            py39hd77b12b_6
    pyrsistent                0.18.0           py39h196d8e1_0
    pysocks                   1.7.1            py39haa95532_0
    python                    3.9.12               h6244533_0
    python-dateutil           2.8.2              pyhd3eb1b0_0
    python-fastjsonschema     2.15.1             pyhd3eb1b0_0
    pytz                      2021.3             pyhd3eb1b0_0
    pyviz_comms               2.2.0                      py_0    pyviz
    pywin32                   302              py39h2bbff1b_2
    pywinpty                  2.0.2            py39h5da7b33_0
    pyyaml                    6.0              py39h2bbff1b_1
    pyzmq                     22.3.0           py39hd77b12b_2
    qt                        5.9.7            vc14h73c81de_0
    requests                  2.27.1             pyhd3eb1b0_0
    scipy                     1.7.3            py39h0a974cb_0
    send2trash                1.8.0              pyhd3eb1b0_1
    setuptools                61.2.0           py39haa95532_0
    sip                       4.19.13          py39hd77b12b_0
    six                       1.16.0             pyhd3eb1b0_1
    sortedcontainers          2.4.0              pyhd3eb1b0_0
    soupsieve                 2.3.1              pyhd3eb1b0_0
    sqlite                    3.38.3               h2bbff1b_0
    stack_data                0.2.0              pyhd3eb1b0_0
    tbb                       2021.5.0             h59b6b97_0
    tblib                     1.7.0              pyhd3eb1b0_0
    tenacity                  8.0.1            py39haa95532_0
    terminado                 0.13.1           py39haa95532_0
    testpath                  0.5.0              pyhd3eb1b0_0
    tk                        8.6.11               h2bbff1b_1
    toolz                     0.11.2             pyhd3eb1b0_0
    tornado                   6.1              py39h2bbff1b_0
    tqdm                      4.64.0           py39haa95532_0
    traitlets                 5.1.1              pyhd3eb1b0_0
    typing-extensions         4.1.1                hd3eb1b0_0
    typing_extensions         4.1.1              pyh06a4308_0
    tzdata                    2022a                hda174b7_0
    urllib3                   1.26.9           py39haa95532_0
    vc                        14.2                 h21ff451_1
    vs2015_runtime            14.27.29016          h5e58377_2
    wcwidth                   0.2.5              pyhd3eb1b0_0
    webencodings              0.5.1            py39haa95532_1
    wheel                     0.37.1             pyhd3eb1b0_0
    win_inet_pton             1.1.0            py39haa95532_0
    wincertstore              0.2              py39haa95532_2
    winpty                    0.4.3                         4
    xarray                    0.20.1             pyhd3eb1b0_1
    xz                        5.2.5                h8cc25b3_1
    yaml                      0.2.5                he774522_0
    zict                      2.0.0              pyhd3eb1b0_0
    zipp                      3.8.0            py39haa95532_0
    zlib                      1.2.12               h8cc25b3_2
    

    Complete, minimal, self-contained example code that reproduces the issue

    import datashader
    
    

    Stack traceback and/or browser JavaScript console output

    ~\Anaconda3\lib\site-packages\datashader\__init__.py in <module>
          6 __version__ = str(param.version.Version(fpath=__file__, archive_commit="$Format:%h$",reponame="datashader"))
          7 
    ----> 8 from .core import Canvas                                 # noqa (API import)
          9 from .reductions import *                                # noqa (API import)
         10 from .glyphs import Point                                # noqa (API import)
    
    ~\Anaconda3\lib\site-packages\datashader\core.py in <module>
          6 import numpy as np
          7 import pandas as pd
    ----> 8 import dask.dataframe as dd
          9 import dask.array as da
         10 from xarray import DataArray, Dataset
    
    ~\Anaconda3\lib\site-packages\dask\__init__.py in <module>
          1 from . import config, datasets
          2 from ._version import get_versions
    ----> 3 from .base import annotate, compute, is_dask_collection, optimize, persist, visualize
          4 from .core import istask
          5 from .delayed import delayed
    
    ~\Anaconda3\lib\site-packages\dask\base.py in <module>
         16 
         17 from packaging.version import parse as parse_version
    ---> 18 from tlz import curry, groupby, identity, merge
         19 from tlz.functoolz import Compose
         20 
    
    ~\Anaconda3\lib\site-packages\tlz\__init__.py in <module>
          7 """
          8 
    ----> 9 from . import _build_tlz
    
    ~\Anaconda3\lib\site-packages\tlz\_build_tlz.py in <module>
          1 import sys
          2 import types
    ----> 3 import toolz
          4 from importlib import import_module
          5 
    
    ~\Anaconda3\lib\site-packages\toolz\__init__.py in <module>
         20 from . import curried, sandbox
         21 
    ---> 22 functoolz._sigs.create_signature_registry()
         23 
         24 from ._version import get_versions
    
    NameError: name 'functoolz' is not defined
    
    opened by hyamanieu 0
  • Way to get log axes with datashader dsshow and matplotlib


    Is your feature request related to a problem? Please describe.

    I cannot find a way to get dsshow to return the points with a log axis or convert them to a log axis with matplotlib.

    Describe the solution you'd like

    Better documentation with instructions or feature addition if it is not currently doable.

    Describe alternatives you've considered

    dsshow doesn't appear to be fully linked to other datashader functionality. Matplotlib controls do not work to manipulate the object returned by dsshow.

    Additional context

    I believe this doesn't depend on versions or any other context.

    opened by rotheconrad 1
  • dsshow savefig error with matplotlib pdf and svg export. Shading intensity variation in png export.


    ALL software version info

    • RHEL 7.9 (Linux 3.10.0-1160.49.1.el7.x86_64)
    • Python 3.8.12
    • Matplotlib 3.2.2
    • datashader 0.14.0
    • pandas 1.1.3
    • numpy 1.21.6

    Description of expected behavior and the observed behavior

    The datashader-plotted points should be the same size as the axes and export well, but the scale is off and changes with the dpi export settings. Additionally, the shading intensity is variable for png export.

    Complete, minimal, self-contained example code that reproduces the issue

    import matplotlib
    import matplotlib.pyplot as plt
    import datashader as ds
    from datashader.mpl_ext import dsshow
    import pandas as pd
    import numpy as np
    
    # Fake data for testing
    x = np.random.normal(size=100000)
    y = x * 3 + np.random.normal(size=100000)
    df = pd.DataFrame({"xs": x, "ys": y})  # dataframe was missing from the original snippet
    
    fig, ax = plt.subplots()
    dsartist = dsshow(
                    df,
                    ds.Point("xs", "ys"),
                    ds.count(),
                    vmin=0,
                    vmax=100,
                    norm="linear",
                    aspect="auto",
                    ax=ax
                    )
    plt.title('300 dpi')
    fig.savefig('test_300dpi.pdf', dpi=300)
    fig.savefig('test_300dpi.png', dpi=300)
    fig.savefig('test_300dpi.svg', dpi=300)
    plt.close()
    
    fig, ax = plt.subplots()
    dsartist = dsshow(
                    df,
                    ds.Point("xs", "ys"),
                    ds.count(),
                    vmin=0,
                    vmax=100,
                    norm="linear",
                    aspect="auto",
                    ax=ax
                    )
    plt.title('default dpi')
    fig.savefig('test_defaultdpi.pdf')
    fig.savefig('test_defaultdpi.png')
    fig.savefig('test_defaultdpi.svg')
    plt.close()
    

    Screenshots or screencasts of the bug in action

    (Screenshots omitted.)
    opened by rotheconrad 3
Releases (latest: v0.14.0)
  • v0.14.0(Apr 25, 2022)

    This release has been nearly a year in the making, with major new contributions from Ian Thomas, Thuy Do Thi Minh, Simon Høxbro Hansen, Maxime Liquet, and James Bednar, and additional support from Andrii Oriekhov, Philipp Rudiger, and Ajay Thorve.

    Enhancements:

    • Full support for antialiased lines of specified width (#1048, #1072). Previous antialiasing support was limited to single-pixel lines and certain floating-point reduction functions. Now supports arbitrary widths and arbitrary reduction functions, making antialiasing fully supported. Performance ranges from 1.3x to 14x slower than the simplest zero-width implementation; see benchmarks.
    • Fixed an issue with visibility on zoomed-in points plots and on overlapping line plots that was first reported in 2017, with a new option rescale_discrete_levels for how='eq_hist' (#1055)
    • Added a categorical color_key for 2D (unstacked) aggregates (#1020), for producing plots where each pixel has at most one category value
    • Improved docs:
      • A brand new polygons guide (#1071)
      • A new guide to 3D aggregations using by, now documenting using categorizer objects to do 3D numerical binning (#1071)
      • Moved documentation for spreading to its own section so it can be presented at the right pipeline stage (was mixed up with colormapping before) (#1071)
      • Added rescale_discrete_levels example (#1071)
      • Other misc doc cleanup (#1035, #1037, #1058, #1074, #1077)

    Bugfixes:

    • Fixed details of the raster coordinate calculations to match other primitives, making it simpler to overlay separately rendered results (#959, #1046)
    • Various fixes and extensions for cupy/CUDA, e.g. to use cuda for category_binning, spread, and dynspread, including cupy.interp where appropriate (#1015, #1016, #1044, #1050, #1060)
    • Infrastructure/build/ecosystem fixes (#1022, #1025, #1027, #1036, #1045, #1049, #1050, #1057, #1061, #1062, #1063, #1064)

    Compatibility:

    • Canvas.line() option antialias=True is now deprecated; use line_width=1 (or another nonzero value) instead; see the sketch after this list. (#1048)
    • Removed long-deprecated bokeh_ext.py (#1059)
    • Dropped support for Python 2.7 (actually already dropped from the tests in Datashader 0.12) and 3.6 (no longer supported by many downstream libraries like rioxarray, but several of them are not properly declaring that restriction, making 3.6 much more difficult to support.) (#1033)
    • Now tested on Python 3.7, 3.8, 3.9, and 3.10. (#1033)
    Source code(tar.gz)
    Source code(zip)
  • v0.13.0(Jun 9, 2021)

    Version 0.13.0

    Thanks to Jim Bednar, Nezar Abdennur, Philipp Rudiger, and Jean-Luc Stevens.

    Enhancements:

    • Defined a new dynspread metric based on counting the fraction of non-empty pixels that have non-empty pixels within a given radius (sketched after this list). The resulting dynspread behavior is much more intuitive than the old behavior, which counted already-spread pixels as if they were neighbors (#1001)
    • Added ds.count() as the default reduction for ds.by (#1004)

    Bugfixes:

    • Fixed array-bounds reading error in dynspread (#1001)
    • Fix color_key argument for dsshow (#986)
    • Added Matplotlib output to the 3_Interactivity getting started page. (#1009)
    • Misc docs fixes (#1007)
    • Fix nan assignment to integer array in RaggedArray (#1008)

    Compatibility:

    • Any usage of dynspread with datatypes other than points should be replaced with spread(), which will do what was probably intended by the original dynspread call, i.e. to make isolated lines and shapes visible. Strictly speaking, dynspread could still be useful for other glyph types if that glyph is contained entirely in a pixel, e.g. if a polygon or line segment is located within the pixel bounds, but that seems unlikely.
    • Dynspread may need to have the threshold or max_px arguments updated to achieve the same spreading as in previous releases, though the new behavior is normally going to be more useful than the old.
    Source code(tar.gz)
    Source code(zip)
  • v0.12.1(Mar 22, 2021)

    Major release with new features that should really be considered part of the upcoming 0.13 release; because it was (unintentionally) published as a minor release, please treat all the new features as experimental.

    Massive thanks to these contributors for substantial new functionality:

    • Nezar Abdennur (nvictus), Trevor Manz, and Thomas Caswell for their contributions to the new dsshow() support for using Datashader as a Matplotlib Artist, providing seamless interactive Matplotlib+Datashader plots.
    • Oleg Smirnov for category_modulo and category_binning for by(), making categorical plots vastly more powerful.
    • Jean-Luc Stevens for spread and dynspread support for numerical aggregate arrays and not just RGB images, allowing isolated datapoints to be made visible while still supporting hover, colorbars, and other plot features that depend on the numeric aggregate values.
    • Valentin Haenel for the initial anti-aliased line drawing support (still experimental).

    Thanks to Jim Bednar, Philipp Rudiger, Peter Roelants, Thuy Do Thi Minh, Chris Ball, and Jean-Luc Stevens for maintenance and other contributions.

    New features:

    • Expanded (and transposed) performance guide table (#961)
    • Add category_modulo and category_binning for grouping numerical values into categories using by() (#927)
    • Support spreading for numerical (non-RGB) aggregate arrays (#771, #954)
    • Xiaolin Wu anti-aliased line drawing, enabled by adding antialias=True to the Canvas.line() method call. Experimental; currently restricted to sum and max reductions and only supports a single-pixel line width. (#916)
    • Improve Dask performance issue from #899 using a tree reduction (#926)

    Bugfixes:

    • Fix for xarray 0.17 raster files, supporting various nodata conventions (#991)
    • Fix RaggedArray tests to keep up with Pandas test suite changes (#982, #993)
    • Fix out-of-bounds error on Points aggregation (#981)
    • Fix CUDA issues (#973)
    • Fix Xarray handling (#971)
    • Disable the interactivity warning on the homepage (#983)

    Compatibility:

    • Drop deprecated modules ds.geo (moved to xarray_image) and ds.spatial (moved to SpatialPandas) (#955)
    Source code(tar.gz)
    Source code(zip)
  • v0.11.1(Aug 16, 2020)

    This release is primarily a compatibility release for newer versions of RAPIDS cuDF and Numba, along with a small number of bug fixes. With contributions from @jonmmease, @stuartarchibald, @AjayThorve, @kebowen730, @jbednar and @philippjfr.

    • Fixes support for cuDF 0.13 and Numba 0.48 (#933)
    • Fixes for cuDF support on Numba>=0.51 (#934, #947)
    • Fixes tile generation using aggregators with output of boolean dtype (#949)
    • Fixes for CI and build infrastructure (#935, #948, #951)
    • Updates to docstrings (b1349e3, #950)
    Source code(tar.gz)
    Source code(zip)
  • v0.11.0(May 25, 2020)

    This release includes major contributions from @maihde (generalizing count_cat to by, span for colorize), @jonmmease (Dask quadmesh support), @philippjfr and @jbednar (count_cat/by/colorize/docs/bugfixes), and Barry Bragg, Jr. (TMS tileset speedups).

    New features (see getting_started/2_Pipeline.ipynb for examples):

    • New by() categorical aggregator, extending count_cat to work with other reduction functions, no longer just count. Allows binning of aggregates separately per category value, so that you can compare how that aggregate is affected by category value (see the sketch after this list). (#875, #902, #904, #906). See example in the holoviews docs.
    • Support for negative and zero values in tf.shade for categorical aggregates. (#896, #909, #910, #908)
    • Support for span in _colorize(). (#875, #910)
    • Support for Dask-based quadmesh rendering for rectilinear and curvilinear mesh types (#885, #913)
    • Support for GPU-based raster mesh rendering (via Canvas.quadmesh) (#872)
    • Faster TMS tileset generation (#886)
    • Expanded performance guide (#868)

    Bugfixes:

    • Misc bugfixes and improvements (#874, #882, #888, #889, #890, #891)

    Compatibility (breaking changes and deprecations):

    • To allow negative-valued aggregates, count_cat now weights categories according to how far they are from the minimum aggregate value observed, while previously they were referenced to zero. Previous behavior can be restored by passing color_baseline=0 to count_cat or by.
    • count_cat is now deprecated and removed from the docs; use by(..., count()) instead.
    • Result of a count() aggregation is now uint32, not int32, to distinguish counts from other aggregation types (#910).
    • tf.shade now only treats zero values as missing for count aggregates (uint); zero is otherwise a valid value distinct from NaN (#910).
    • alpha is now respected as the upper end of the alpha range for both _colorize() and _interpolate() in tf.shade; previously only _interpolate respected it.
    • Added new nansum_missing utility for working with Numpy>1.9, where nansum no longer returns NaN for all-NaN values.
    • ds.geo and ds.spatial modules are now deprecated; their contents have moved to xarray_spatial and spatialpandas, respectively. (#894)

    Download and install: https://datashader.org/getting_started

    Source code(tar.gz)
    Source code(zip)
  • v0.10.0(Jan 21, 2020)

    This release includes major contributions from @jonmmease (polygon rendering, spatialpandas), along with contributions from @philippjfr and @brendancol (bugfixes), and @jbednar (docs, warnings, and import times).

    New features:

    • Polygon (and points and lines) rendering for spatialpandas extension arrays (#826, #853)
    • Quadmesh GPU support (#861)
    • Much faster import times (#863)
    • New table in docs listing glyphs supported for each data library (#864,#867)
    • Support for remote Parquet filesystems (#818,#866)

    Bugfixes and compatibility:

    • Misc bugfixes and improvements (#844, #860, #866)
    • Fix warnings and deprecations in tests (#859)
    • Fix Canvas.raster (padding, mode buffers, etc. #862)

    Download and install: https://datashader.org/getting_started

    Source code(tar.gz)
    Source code(zip)
  • v0.9.0(Dec 8, 2019)

    This release includes major contributions from @jonmmease (GPU support), along with contributions from @brendancol (viewshed speedups), @jbednar (docs), and @jsignell (examples, maintenance, website).

    New features:

    • Support for CUDA GPU dataframes (cudf and dask_cudf) (#794, #793, #821, #841, #842)
    • Documented new quadmesh support (renaming user guide section 5_Rasters to 5_Grids to reflect the more-general grid support) (#805)

    Bugfixes and compatibility:

    • Avoid double-counting line segments that fit entirely into a single rendered pixel (#839)
    • Improved geospatial toolbox, including 75X speedups to viewshed algorithm (#811, #824, #844)
    Source code(tar.gz)
    Source code(zip)
  • v0.8.0(Oct 8, 2019)

    This release includes major contributions from @jonmmease (quadmesh and filled-area support), @brendancol (geospatial toolbox, tile previewer), @philippjfr (distributed regridding, dask performance), and @jsignell (examples, maintenance, website).

    New features:

    • Native quadmesh (canvas.quadmesh()) support (for rectilinear and curvilinear grids -- 3X faster than approximating with a trimesh; #779)
    • Filled area (canvas.area()) support (#734)
    • Expanded geospatial toolbox, with support for:
      • Zonal statistics (#782)
      • Calculating viewshed (#781)
      • Calculating proximity (Euclidean and other distance metrics, #772)
    • Distributed raster regridding with Dask (#762)
    • Improved dask performance (#798, #801)
    • tile_previewer utility function (simple Bokeh-based plotting of local tile sources for debugging; #761)

    Bugfixes and compatibility:

    • Compatibility with latest Numba, Intake, Pandas, and Xarray (#763, #768, #791)
    • Improved datetime support (#803)
    • Simplified docs (now built on Travis, and no longer requiring GeoViews) and examples (now on examples.pyviz.org)
    • Skip rendering of empty tiles (#760)
    • Improved performance for point, area, and line glyphs (#780)
    • InteractiveImage and Pipeline are now deprecated; removed from examples (#751)
    Source code(tar.gz)
    Source code(zip)
  • v0.7.0(Apr 8, 2019)

    This release includes major contributions from @jonmmease (ragged array extension, SpatialPointsFrame, row-oriented line storage, dask trimesh support), @jsignell (maintenance, website), and @jbednar (Panel-based dashboard).

    New features:

    • Simplified Panel-based dashboard using new Param features; now only 48 lines with fewer new concepts (#707)
    • Added pandas ExtensionArray and Dask support for storing homogeneous ragged arrays (#687)
    • Added SpatialPointsFrame and updated census, osm-1billion, and osm examples to use it (#702, #706, #708)
    • Expanded 8_Geography.ipynb to document other geo-related functions
    • Added Dask support for trimesh rendering, though computing the mesh initially still requires vertices and simplices to fit into memory (#696)
    • Added zero-copy rendering of row-oriented line coordinates, using a new axis argument (#694)

    Bugfixes and compatibility:

    • Added lnglat_to_meters to geo module; new code should import it from there (#708)
    Source code(tar.gz)
    Source code(zip)
  • v0.6.9(Jan 29, 2019)

    This release includes major contributions from @jonmmease (fixing several long-standing bugs), @jlstevens (updating all example notebooks to use current syntax, #685), @jbednar, @philippjfr, and @jsignell (Panel-based dashboard), and @brendancol (geo utilities).

    New features:

    • Replaced outdated 536-line Bokeh dashboard.py with 71-line Panel+HoloViews dashboard .ipynb (#676)
    • Allow aggregating xarray objects (in addition to Pandas and Dask DataFrames) (#675)
    • Create WMTS tiles from Datashader data (#636)
    • Added various geographic utility functions (ndvi, slope, aspect, hillshade, mean, bump map, Perlin noise) (#661)
    • Made OpenSky data public (#691)

    Bugfixes and compatibility:

    • Fix array bounds error on line glyph (#683)
    • Fixed the span argument to tf.shade (#680)
    • Fixed composite.add (for use in spreading) to clip colors rather than overflow (#689)
    • Fixed gerrymandering shape file (#688)
    • Updated to match Bokeh (#656), Dask (#681, #667), Pandas/Numpy (#697)
    Source code(tar.gz)
    Source code(zip)
  • v0.6.8(Sep 11, 2018)

    Minor, mostly bugfix, release with some speed improvements.

    New features:

    • Added Strange Attractors example (#632)
    • Major speedup: optimized dask datashape detection (#634)

    Bugfixes and compatibility:

    • Silenced inappropriate warnings (#631)
    • Fixed various other bugs, including #644
    • Added handling for zero data and zero range (#612, #648)
    Source code(tar.gz)
    Source code(zip)
  • v0.6.7(Jul 7, 2018)

  • v0.6.6(May 20, 2018)

    Minor bugfix release.

    • Now available to install using pip (pip install datashader) or conda defaults (conda install datashader)
    • InteractiveImage is now deprecated; please use the Datashader support in HoloViews instead.
    • Updated installation and example instructions to use new datashader command.
    • Made package building automatic, to allow more frequent releases
    • Ensured transparent (not black) image is returned when there is no data to plot (thanks to Nick Xie)
    • Simplified getting-started example (thanks to David Jones)
    • Various fixes and compatibility updates to examples
    Source code(tar.gz)
    Source code(zip)
  • 0.6.5(Feb 1, 2018)

    Major release with extensive support for triangular meshes and changes to the raster API.

    New features:

    • Trimesh support: Rendering of irregular triangular meshes using Canvas.trimesh() (see user guide) (#525,#552)
    • Added a new website at datashader.org, with new Getting Started pages and an extensive User Guide, with about 50% new material not previously in example notebooks. Built entirely from Jupyter notebooks, which can be run in the examples/ directory. Website is now complete except for sections on points (see the nyc_taxi example in the meantime).
    • Canvas.raster() now accepts xarray Dataset types, not just DataArrays, with the specific DataArray selectable from the Dataset using the column= argument of a supplied aggregation function.
    • tf.Images() now displays anything with an HTML representation, to allow laying out Pandas dataframes alongside datashader output.

    Bugfixes and compatibility:

    • Changed Raster API to match other glyph types:
      • Now accepts a reduction function via an agg= argument like Canvas.line(), Canvas.points(), etc. The previous downsample_method is still accepted for this release, but is now deprecated.
      • upsample_method is now interpolate, accepting linear=True or linear=False; the previous spelling is now deprecated.
      • The layer= argument previously accepted a 1-based integer index, which was confusing given the standard Python 0-based indexing elsewhere. Changed to accept an xarray coordinate, which can be a 1-based index if that's what is defined on the array, but also works with arbitrary floating-point coordinates (e.g. for a depth parameter in an image stack).
      • Now auto-ranges in x and y when not given explicit ranges, instead of raising an error.
    • Fixed various bugs, including one generating incorrect output in Canvas.raster(agg='mode')
    Source code(tar.gz)
    Source code(zip)
  • 0.6.4(Dec 5, 2017)

    Minor compatibility release to track changes in external packages.

    • Updated imports for bokeh 0.12.11 (fixes #535), though there are issues in 0.12.11 itself and so 0.12.12 should be used instead (to be released shortly).
    • Pinned pillow version on Windows (fixes #534).
    Source code(tar.gz)
    Source code(zip)
  • 0.6.3(Dec 1, 2017)

    Apart from the new website, this is a minor release primarily to catch up with changes in external libraries.

    New features:

    • Reorganized examples directory as the basis for a completely new website at https://bokeh.github.io/datashader-docs (#516).
    • Added tf.Images() class to format multiple labeled Datashader images as a table in a Jupyter notebook, now used extensively in the new website.
    • Added utility function dataframe_from_multiple_sequences(x_values, y_values) to convert large numbers of sequences stored as 2D numpy arrays to a NaN-separated pandas dataframe that can be displayed efficiently (see new example in tseries.ipynb) (#512).
    • Improved streaming support (#520).

    Bugfixes and compatibility:

    • Added support for Dask 0.15 and 0.16 and pandas 0.21 (#523,#529) and declared minimum required Numba version.
    • Improved and fixed issues with various example notebooks, primarily to update for changes in dependencies.
    • Changes in network graph support: ignore id field by default to avoid surprising dependence on column name, rename directly_connect_edges to connect_edges for accuracy and conciseness.
    Source code(tar.gz)
    Source code(zip)
  • 0.6.2(Oct 25, 2017)

    Release with bugfixes, changes to match external libraries, and some new features.

    Backwards compatibility:

    • Minor changes to network graph API, e.g. to ignore weights by default in forceatlas2 (#488)
    • Fix upper-bound bin error for auto-ranged data (#459). Previously, points falling on the upper bound of the plotted area were excluded from the plot, which was consistent with the behavior for individual grid cells but confusing and misleading for the outer boundaries. Points falling on the very outermost boundaries are now folded into the final grid cell, which should be the least surprising behavior (see the sketch below).
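
    A schematic of the new folding rule (illustrative Python, not datashader's internal code):

    ```python
    def bin_index(x, xmin, xmax, n):
        """Map x in [xmin, xmax] to one of n bins, folding the upper
        boundary into the final bin rather than excluding it."""
        i = int(n * (x - xmin) / (xmax - xmin))
        return min(i, n - 1)   # x == xmax now lands in bin n-1 instead of falling out
    ```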

    New or updated examples (.ipynb files in examples/):

    • streaming-aggregation.ipynb: Illustrates combining incoming streams of data for display (also see holoviews streaming).
    • landsat.ipynb: simplified using HoloViews; now includes plots of full spectrum for each point via hovering.
    • Updated and simplified census-hv-dask (now called census-congressional), census-hv, packet_capture_graph.

    New features and improvements

    • Updated Bokeh support to work with new bokeh 0.12.10 release (#505)
    • More options for network/graph plotting (configurable column names, control over weights usage; #488, #494)
    • For line plots (time series, trajectories, network graphs), switched the line-clipping algorithm from Cohen-Sutherland to Liang-Barsky, yielding 50-75% speedups for a million random lines (#495); see the sketch after this list.
    • Added tf.Images class to format a list of images as an HTML table (#492)
    • Faster resampling/regridding operations (#486)
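
    For reference, a self-contained sketch of Liang-Barsky clipping (illustrative only; datashader's internal, Numba-compiled implementation differs in detail):

    ```python
    def clip_liang_barsky(x0, y0, x1, y1, xmin, ymin, xmax, ymax):
        """Clip segment (x0,y0)-(x1,y1) to the box [xmin,xmax] x [ymin,ymax];
        return the clipped endpoints, or None if the segment lies outside."""
        t0, t1 = 0.0, 1.0
        dx, dy = x1 - x0, y1 - y0
        # One (p, q) pair per box edge: left, right, bottom, top
        for p, q in ((-dx, x0 - xmin), (dx, xmax - x0),
                     (-dy, y0 - ymin), (dy, ymax - y0)):
            if p == 0:
                if q < 0:
                    return None        # parallel to this edge and outside it
            else:
                r = q / p
                if p < 0:              # segment enters this half-plane
                    if r > t1:
                        return None
                    t0 = max(t0, r)
                else:                  # segment leaves this half-plane
                    if r < t0:
                        return None
                    t1 = min(t1, r)
        return (x0 + t0 * dx, y0 + t0 * dy,
                x0 + t1 * dx, y0 + t1 * dy)
    ```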

    Known issues:

    • examples/dashboard has not yet been updated to match other libraries, and is thus missing functionality like hovering and legends.
    • A full website with documentation has been started but is not yet ready for deployment.
  • 0.6.1(Sep 13, 2017)

    Minor bugfix release, primarily updating example notebooks to match API changes in external packages.

    Backwards compatibility:

    • Made edge bundling retain edge order, to allow indexing, and absolute coordinates, to allow overlaying on external data.
    • Updated examples to show that xarray now requires dimension names to match before doing arithmetic or comparisons between arrays.

    Known issues:

    • If you use Jupyter notebook 5.0 (earlier or later versions should be ok), you will need to override a setting that prevents visualizations from appearing, e.g.: jupyter notebook --NotebookApp.iopub_data_rate_limit=100000000 census.ipynb &
    • The dashboard needs to be rewritten entirely to match current Bokeh and HoloViews releases, so that hover and legend support can be restored.
  • 0.6.0(Aug 19, 2017)

    New release of features that may still be in progress, but are already usable:

    • Added graph/network plotting support (still may be in flux) (#385, #390, #398, #408, #415, #418, #436)
    • Improved raster regridding based on gridtools and xarray (still may be in flux); no longer depends on rasterio and scikit-image (#383, #389, #423)
    • Significantly improved performance for dataframes with categorical fields

    New examples (.ipynb files in examples/):

    • osm-1billion: 1-billion-point OSM example, for in-core processing on a 16GB laptop.
    • edge_bundling: Plotting graphs using "edgehammer" bundling of edges to show structure.
    • packet_capture_graph: Laying out and visualizing network packets as a graph.

    Backwards compatibility:

    • Remove deprecated interpolate and colorize functions
    • Made raster processing consistently use bin centers to match xarray conventions (requires recent fixes to xarray; only available on a custom channel for now) (#422)
    • Fixed various limitations and quirks for NaN values
    • Made alpha scaling respect min_alpha consistently (#371)

    Known issues:

    • If you use Jupyter notebook 5.0 (earlier or later versions should be ok), you will need to override a setting that prevents visualizations from appearing, e.g.: jupyter notebook --NotebookApp.iopub_data_rate_limit=100000000 census.ipynb &
    • The dashboard needs updating to match current Bokeh releases; most parts other than hover and legends should be functional, but it needs a rewrite to use currently recommended approaches.
  • 0.5.0(May 12, 2017)

    Major release with extensive optimizations and new plotting-library support, incorporating 9 months of development from 5 main contributors:

    • Extensive optimizations for speed and memory usage, providing at least 5X improvements in speed (using the latest Numba versions) and 2X improvements in peak memory requirements. Outlined in #313 and #129.
    • Added HoloViews support for flexible, composable, dynamic plotting, making it simple to switch between datashaded and non-datashaded versions of a Bokeh or Matplotlib plot.
    • Added examples/environment.yml to make it easy to install dependencies needed to run the examples.
    • Updated examples to use the now-recommended, supported, and fast Apache Parquet file format
    • Added support for variable alpha for non-categorical aggregates, by specifying a single color rather than a list or colormap (#345)
    • Added datashader.utils.lnglat_to_meters utility function for working in Web Mercator coordinates with Bokeh (see the sketch after this list)
    • Added discussion of why you should be using uniform colormaps, and examples of using uniform colormaps from the new colorcet package
    • Numerous bug fixes and updates, mostly in the examples and Bokeh extension
    • Updated reference manual and documentation
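
    A short sketch of the new utility (the coordinate values are made up for illustration):

    ```python
    import pandas as pd
    from datashader.utils import lnglat_to_meters

    df = pd.DataFrame({'longitude': [-74.00, -73.95], 'latitude': [40.71, 40.78]})
    # Convert to Web Mercator easting/northing in meters, matching Bokeh's tile sources
    df['x'], df['y'] = lnglat_to_meters(df.longitude, df.latitude)
    ```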

    New examples (.ipynb files in examples/):

    • holoviews_datashader: Using HoloViews to create dynamic Datashader plots easily
    • census-hv-dask: Using GeoViews for overlaying shape files, demonstrating gerrymandering by race
    • nyc_taxi-paramnb: Using ParamNB to make a simple dashboard
    • lidar: Visualizing point clouds
    • solar: Visualizing solar radiation data
    • Dynamic 1D histogram example (last code cell in examples/nyc_taxi-nongeo.ipynb)
    • dashboard: Now includes opensky example (python dashboard/dashboard.py -c dashboard/opensky.yml)

    Backwards compatibility:

    • To improve consistency with Numpy and Python data structures and eliminate issues with an empty column and row at the edge of the aggregated raster, the provided xrange,yrange bounds are now treated as upper exclusive. Results will thus differ between 0.5.0 and earlier versions. See #259 for discussion.

    Known issues:

    • If you use Jupyter notebook 5.0 (earlier or later versions should be ok), you will need to override a setting that prevents visualizations from appearing, e.g.: jupyter notebook --NotebookApp.iopub_data_rate_limit=100000000 census.ipynb &
    • Legend and hover support is currently disabled for the dashboard, due to ongoing development of a simpler approach.
  • 0.4.0(Aug 18, 2016)

    Minor bugfix release to support Bokeh 0.12.1, with some API and defaults changes.

    • Added examples() function to obtain the notebooks and other examples corresponding to the installed datashader version; see examples/README.md.
    • Updated dashboard example to match changes in Bokeh
    • Added default color cycle with distinguishable colors for shading categorical data; now tf.shade(agg) with no other arguments should give a usable plot for both categorical and non-categorical data.

    Backwards compatibility:

    • Replaced the confusing tf.interpolate() and tf.colorize() functions with a single shading function, tf.shade() (see the sketch after this list). The previous names are still supported but give deprecation warnings. Calls to the previous functions using keyword arguments can simply be renamed to use tf.shade, as all the same keywords are accepted; calls to colorize that used a positional argument (e.g. for the color_key) will need to use a keyword when calling shade().
    • Increased default threshold for tf.dynspread() to improve visibility of sparse dots
    • Increased default min_alpha for tf.shade() (formerly tf.colorize()) to avoid undersaturation
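
    A minimal sketch of the migration (the data here is synthetic):

    ```python
    import numpy as np
    import pandas as pd
    import datashader as ds
    import datashader.transfer_functions as tf

    df = pd.DataFrame({'x': np.random.randn(10000), 'y': np.random.randn(10000)})
    agg = ds.Canvas(plot_width=300, plot_height=300).points(df, 'x', 'y')

    # Formerly tf.interpolate(agg, ...) / tf.colorize(agg, ...):
    img = tf.shade(agg, cmap=['lightblue', 'darkblue'])
    img = tf.dynspread(img)   # note the raised default threshold for sparse dots
    ```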

    Known issues:

    • For Bokeh 0.12.1, some notebooks will give warnings for Bokeh plots when used with Jupyter's "Run All" command. Bokeh 0.12.2 will fix this problem when it is released, but for now you can either downgrade to 0.12.0 or use single-cell execution.
    • There are some Bokeh compatibility issues with the dashboard example that are still being investigated and may require a new Bokeh or datashader release in this series.
  • 0.3.2(Jul 18, 2016)

    Minor bugfix release to support Bokeh 0.12:

    • Fixed InteractiveImage zooming to work with Bokeh 0.12.
    • Added more responsive event throttling for DynamicImage; throttle parameter no longer needed and is now deprecated
    • Fixed datashader-download-data command
    • Improved non-geo Taxi example
    • Temporarily disabled dashboard legends; will re-enable in future release
  • 0.3.0(Jun 23, 2016)

    The major feature of this release is support for raster data via Canvas.raster. To use this feature, you must install the optional dependencies via conda install rasterio scikit-image. rasterio relies on gdal, whose conda package has some known bugs, including a missing dependency (install it via conda install krb5). InteractiveImage in this release requires bokeh 0.11.1 or earlier and will not work with bokeh 0.12.

    • PR #160 #187 Improved example notebooks and dashboard
    • PR #186 #184 #178 Add datashader-download-data cli command for grabbing example datasets
    • PR #176 #177 Changed census example data to use HDF5 format (slower but more portable)
    • PR #156 #173 #174 Added Landsat8 and race/ethnicity vs. elevation example notebooks
    • PR #172 #159 #157 #149 Added support for images using Canvas.raster (requires rasterio and scikit-image)
    • PR #169 Added legends notebook demonstrating create_categorical_legend and create_ramp_legend
    • PR #162 Added notebook example for datashader.bokeh_ext.HoverLayer
    • PR #152 Added alpha arg to tf.interpolate
    • PR #151 #150, etc. Small bugfixes
    • PR #146 #145 #144 #143 Added streaming example
    • Added hold decorator to utils, summarize_aggregate_values helper function
    • Added FAQ to docs

    Backwards compatibility:

    • Removed memoize_method
    • Renamed datashader.callbacks --> datashader.bokeh_ext
    • Renamed examples/plotting_problems.ipynb --> examples/plotting_pitfalls.ipynb
  • 0.2.0(Apr 1, 2016)

    A major release with significant new functionality and some small backwards-incompatible changes.

    New features:

    • PR #124, census: New census notebook example, showing how to work with categorical data.
    • PR #79, tseries, trajectory: Added line glyph and .any() reduction, used in new time series and trajectory notebook examples.
    • PR #76, #77, #131, etc.: Updated all of the other notebooks in examples/, including nyc_taxi.
    • PR #100, #125: Improved dashboard example: added categorical data support, census and osm datasets, legend and hover support, better performance, out of core option, and more
    • PR #109, #111: Add full colormap support via a new cmap argument to interpolate and colorize; supports color ranges as lists, plus Bokeh palettes and matplotlib colormaps
    • PR #98: Added set_background to make it easier to work with images having a background color different from the default white of Jupyter notebooks
    • PR #119, #121: Added eq_hist option for how in interpolate, performing histogram equalization on the data to reveal structure at every intensity level (see the sketch after this list)
    • PR #80, #83, #128: Greatly improved InteractiveImage performance and responsiveness
    • PR #74, #123: Added operators for spreading pixels (to make individual datapoints visible, as circles, squares, or arbitrary mask shapes) and compositing (for simple and flexible composition of images)
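
    A schematic of what how='eq_hist' does (illustrative only, not datashader's implementation):

    ```python
    import numpy as np

    def eq_hist(data, nbins=256):
        """Rank-transform data to [0, 1] via its cumulative histogram,
        so that structure is visible at every intensity level."""
        hist, edges = np.histogram(data[~np.isnan(data)], bins=nbins)
        cdf = hist.cumsum().astype(float)
        cdf /= cdf[-1]
        return np.interp(data, edges[1:], cdf)
    ```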

    Backwards compatibility:

    • The low and high color options to interpolate and colorize are now deprecated and will be removed in the next release; use cmap=[low,high] instead (see the sketch after this list).
    • The transfer function merge has been removed to avoid confusion. stack and others can be used instead, depending on the use case.
    • The default how for interpolate and colorize is now eq_hist, to reveal the structure automatically regardless of distribution.
    • Pipeline now has a default dynspread step, to make isolated points visible when zooming in, and the default sizes have changed.
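
    A sketch of the cmap migration, reusing a hypothetical non-categorical aggregate agg such as the one built in the 0.4.0 sketch above:

    ```python
    import datashader.transfer_functions as tf

    # Deprecated (removed in the next release):
    #   tf.interpolate(agg, low='lightblue', high='darkblue')
    img = tf.interpolate(agg, cmap=['lightblue', 'darkblue'])  # how now defaults to 'eq_hist'
    img = tf.set_background(img, 'black')                      # PR #98: non-default background
    ```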
  • 0.1.0(Apr 1, 2016)
