
Overview


DaCe - Data-Centric Parallel Programming

Decoupling domain science from performance optimization.

DaCe is a parallel programming framework that takes code in Python/NumPy and other programming languages, and maps it to high-performance CPU, GPU, and FPGA programs, which can be optimized to achieve state-of-the-art performance. Internally, DaCe uses the Stateful DataFlow multiGraph (SDFG) data-centric intermediate representation: a transformable, interactive representation of code based on data movement. Since the input code and the SDFG are separate, it is possible to optimize a program without changing its source, so that it stays readable. On the other hand, transformations are customizable and user-extensible, so they can be written once and reused in many applications. With data-centric parallel programming, we enable direct knowledge transfer of performance optimization, regardless of the application or the target processor.

DaCe generates high-performance programs for:

  • Multi-core CPUs (tested on Intel and IBM POWER9)
  • NVIDIA GPUs
  • AMD GPUs (with HIP)
  • Xilinx FPGAs
  • Intel FPGAs

DaCe can be written inline in Python and transformed from the command line or Jupyter notebooks, or SDFGs can be interactively modified using the Data-centric Interactive Optimization Development Environment (DIODE, currently experimental).

For more information, see our paper.

See an example SDFG in the standalone viewer (SDFV).

Tutorials

Installation and Dependencies

To install: pip install dace

Runtime dependencies:

  • A C++14-capable compiler (e.g., gcc 5.3+)
  • Python 3.6 or newer
  • CMake 3.15 or newer

Running

Python scripts: Run DaCe programs (in implicit or explicit syntax) using Python directly.
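For example, a minimal program in the implicit (NumPy) syntax (the function and array names here are illustrative):

import dace
import numpy as np

N = dace.symbol('N')

@dace.program
def add_one(A: dace.float64[N]):
    return A + 1  # implicit dataflow, compiled through an SDFG

a = np.random.rand(20)
b = add_one(a)  # runs the generated high-performance program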

SDFV (standalone SDFG viewer): To view SDFGs separately, run the sdfv installed script with the .sdfg file as an argument. Alternatively, you can use the link or open diode/sdfv.html directly and choose a file in the browser.

Visual Studio Code plugin: Install from the VSCode marketplace or open an .sdfg file for interactive SDFG viewing and transformation.

DIODE interactive development (experimental): Either run the installed script diode, or call python3 -m diode from the shell. Then, follow the printed instructions to enter the web interface.

The sdfgcc tool: Compile .sdfg files with sdfgcc program.sdfg. Interactive command-line optimization is possible with the --optimize flag.

Jupyter Notebooks: DaCe is Jupyter-compatible. If a result is an SDFG or a state, it will show up directly in the notebook. See the tutorials for examples.

Octave scripts (experimental): .m files can be run using the installed script dacelab, which will create the appropriate SDFG file.

Note for Windows/Visual C++ users: If compilation fails in the linkage phase, try setting the following environment variable to force Visual C++ to use Multi-Threaded linkage:

X:\path\to\dace> set _CL_=/MT

Publication

If you use DaCe, cite us:

@inproceedings{dace,
  author    = {Ben-Nun, Tal and de~Fine~Licht, Johannes and Ziogas, Alexandros Nikolaos and Schneider, Timo and Hoefler, Torsten},
  title     = {Stateful Dataflow Multigraphs: A Data-Centric Model for Performance Portability on Heterogeneous Architectures},
  year      = {2019},
  booktitle = {Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis},
  series = {SC '19}
}

Configuration

DaCe creates a file called .dace.conf in the user's home directory. It provides useful settings that can be modified either directly in the file (YAML), within DIODE, or overridden on a case-by-case basis using environment variables that begin with DACE_ and specify the setting (where categories are separated by underscores). The full configuration schema is located here.
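For example, the compiler cache setting can be overridden for a single run through its environment variable, or inspected programmatically (a sketch; Config is assumed to be importable from dace.config):

import os
# Category and setting name are joined by underscores:
# compiler -> use_cache becomes DACE_compiler_use_cache.
os.environ['DACE_compiler_use_cache'] = '1'

import dace
from dace.config import Config
print(Config.get('compiler', 'use_cache'))  # reads the effective setting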

Useful environment variable configurations include:

  • DACE_CONFIG (default: ~/.dace.conf): Override DaCe configuration file choice.

General configuration:

  • DACE_debugprint (default: False): Print debugging information.
  • DACE_compiler_use_cache (default: False): Uses DaCe program cache instead of re-optimizing and compiling programs.
  • DACE_compiler_default_data_types (default: Python): Chooses default types for integer and floating-point values. If Python is chosen, int and float are both 64-bit wide. If C is chosen, int and float are 32-bit wide.

GPU programming and debugging:

  • DACE_compiler_cuda_backend (default: cuda): Chooses the GPU backend to use (can be cuda for NVIDIA GPUs or hip for AMD GPUs).
  • DACE_compiler_cuda_syncdebug (default: False): If True, calls device-synchronization after every GPU kernel and checks for errors. Good for checking crashes or invalid memory accesses.

FPGA programming:

  • DACE_compiler_fpga_vendor: (default: xilinx): Can be xilinx for Xilinx FPGAs, or intel_fpga for Intel FPGAs.

SDFG interactive transformation:

  • DACE_optimizer_transform_on_call (default: False): Uses the transformation command line interface every time a @dace function is called.
  • DACE_optimizer_interface (default: dace.transformation.optimizer.SDFGOptimizer): Controls the SDFG optimization process if transform_on_call is enabled. By default, uses the transformation command line interface.
  • DACE_optimizer_automatic_simplification (default: True): If False, skips automatic simplification in the Python frontend (see transformations tutorial for more information).

Profiling:

  • DACE_profiling (default: False): Enables profiling measurement of the DaCe program runtime in milliseconds. Produces a log file and prints out median runtime.
  • DACE_treps (default: 100): Number of repetitions to run a DaCe program when profiling is enabled.

Contributing

DaCe is an open-source project. We are happy to accept Pull Requests with your contributions! Please follow the contribution guidelines before submitting a pull request.

License

DaCe is published under the New BSD license, see LICENSE.

Comments
  • Variable shadowing issue after applying FPGA transform in implicit notation


    Running this code:

    import dace
    import numpy as np
    
    
    n = dace.symbol("n")
    
    @dace.program
    def dot(x: dace.float32[n], y: dace.float32[n], result: dace.float32[1]):
    
        @dace.map(_[0:n])
        def product(i):
            x_in << x[i]
            y_in << y[i]
    
            result_out >> result(1, lambda a, b: a + b)
            result_out = x_in * y_in
    
    # ----------
    # MAIN
    # ----------
    if __name__ == "__main__":
        a = np.array([1,2,3,4,5,6], dtype=np.float32)
        b = np.array([1,2,3,4,5,6], dtype=np.float32)
        c = np.array([0], dtype=np.float32)
    
        dot_sdfg = dot.to_sdfg()
    
        dot_sdfg(x=a, y=b, result=c, n=a.shape[0])
        print("Vec a: ", a)
        print("Vec b: ", b)
        print(c)
    

    After applying "FPGATransformSDFG", the tasklet's in-connector and the inner state's source memlet have a name clash, i.e., they produce a shadowing issue. See also the attached image of the SDFG generated by the code after applying the FPGA transformation.

    Last lines of error output:

      File "/home/burgerm/dace/dace/codegen/targets/cpu.py", line 464, in _emit_copy
        "    " + self.memlet_definition(sdfg, memlet, False, vconn),
      File "/home/burgerm/dace/dace/codegen/targets/cpu.py", line 975, in memlet_definition
        allow_shadowing=allow_shadowing)
      File "/home/burgerm/dace/dace/codegen/targets/target.py", line 226, in add
        raise dace.codegen.codegen.CodegenError(err_str)
    dace.codegen.codegen.CodegenError: Shadowing variable x_in from type DefinedType.Pointer to DefinedType.Scalar
    
    bug transformations 
    opened by manuelburger 10
  • Fix stream allocation scoping


    There was an issue where streams would be allocated globally (to a state) and locally. This should not happen. The expected behaviour is to never allocate streams locally to a scope.

    This PR fixes this issue by never including streams in the scope transient analysis.

    opened by komplexon3 9
  • Parallelize Xilinx tests


    Translate xilinx_test.sh into xilinx_test.py so we can run multiprocessing.starmap on our tests.

    Time for running Xilinx tests reduced from ~27 minutes to ~11 minutes.

    opened by definelicht 9
  • Unroll PEs in FPGA Codegen


    Unroll maps with schedule Unrolled as part of the FPGA codegen in order to detect them as processing elements.

    This is potentially fishy, since we are applying a transformation during code generation (!!) that modifies the SDFG. It is, however, a neat way of avoiding manually handling this in the FPGA codegen when detecting and generating modules.

    opened by definelicht 8
  • General unroller


    Unrolled scheduler now supports unrolling maps anywhere in the SDFG, including maps that contain nested SDFGs. Adds two tests: one that checks nested unrolling with nested SDFGs, and a much simpler one that unrolls a map containing a single tasklet. The general concept of unrolling (sketched below) is to back up all the fields that might be affected by calls to replace, then replace all the map parameters and generate the scope, and finally restore the saved fields.
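    A plain-Python sketch of that backup/replace/restore pattern (all names hypothetical, not the actual DaCe implementation):

    import copy
    import itertools

    def unroll_scope(scope_fields, params, ranges, generate_scope):
        # scope_fields: dict of fields that parameter replacement mutates
        # params: map parameter names, e.g. ['i', 'j']
        # ranges: one list of concrete values per parameter
        backup = copy.deepcopy(scope_fields)          # 1. back up affected fields
        for values in itertools.product(*ranges):
            for param, value in zip(params, values):  # 2. replace map parameters...
                scope_fields[param] = value
            generate_scope(scope_fields)              # ...and generate the scope
        scope_fields.update(backup)                   # 3. restore the saved fields

    # Example: generating the body of a 2x2 map for each (i, j) combination
    unroll_scope({'i': 'i', 'j': 'j'}, ['i', 'j'], [[0, 1], [0, 1]],
                 lambda fields: print('generate body with', fields))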

    opened by jnice-81 8
  • Allow transforming @dace.program


    Currently, in order to apply transformations to a @dace.program, you have to first convert it to an SDFG.

    This is sometimes suboptimal because it changes the arguments required to call the program. For example:

    matmul(A, B, C)
    sdfg = matmul.to_sdfg()
    sdfg(A=A, B=B, C=C, N=N, K=K, M=M)
    

    It would be convenient to not have to change the program arguments like this when converting to an SDFG, perhaps by allowing transformations to be applied to the underlying SDFG of a @dace.program while maintaining the program interface/API.
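    For instance, a hypothetical interface (SomeTransformation is a placeholder, not existing API) could look like:

    sdfg = matmul.to_sdfg()
    sdfg.apply_transformations(SomeTransformation)  # optimize the underlying SDFG
    matmul(A, B, C)  # the call still uses the original program interface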

    frontend 
    opened by definelicht 8
  • Generate Duplicated NestedSDFGs only once


    PR for #392

    First commit, so that we can iterate.

    Goal: avoid generating the same code two or more times for a NestedSDFG that is used multiple times (which also includes LibraryNodes after expansion)

    Current implementation:

    • we need to unequivocally identify an SDFG. For the moment, I've added an additional property unique_name to the SDFG (type string, default empty)
    • this is used in the code generator (CPU) to keep track of the already generated nested SDFGs. If we try to generate an already seen NestedSDFG, it will be skipped
    • there are two additional tests that check that it works on CPU and FPGA (under the assumption that the topmost SDFG is scheduled on the CPU)

    So, up to now, it is up to the user to specify the SDFG's unique name. We would probably need something (in the configuration file?) to disable this.

    I don't know why code coverage fails: the relative difference is 100%

    opened by TizianoDeMatteis 8
  • Calling A @ B as a dace.program inside a function gives the wrong result


    Describe the bug Using dace.program, A @ B returns a different result than just calling A @ B. This issue doesn't seem to happen when I call it from the main method, but when it's nested inside a more complex function, the results are wrong.

    To Reproduce I have a method that calls a matrix-matrix multiplication like so:

    C[m1:m, 0:B_dim] += A[m1:m, 0:m1] @ B[0:m1, 0:B_dim]

    And I attempted to replace it with:

    C[m1:m, 0:B_dim] += matmul_lib(A[m1:m, 0:m1], B[0:m1, 0:B_dim])

    Where:

    @dace.program
    def matmul_lib(A: dtype[M, K], B: dtype[K, N]):
        return A @ B

    Expected behavior They should return the same numerical result, but for some reason they do not. When looking at the numbers produced, the first row corresponds but the rows after are all wrong. Example: this is what I get from simply calling A @ B, which gives me the correct result:

    0.99134411 2.37927192 0.6220935  1.92701032 0.96556958 1.42080484 0.83607334 0.86882378
    1.82525592 2.89202545 1.35001469 2.37230364 1.44825839 1.81533188 1.29120714 1.11907193
    1.80647289 1.72074369 1.32760667 1.88492459 1.67942782 1.52714228 1.53037621 0.79207197
    1.88622792 2.82863798 1.24272828 2.70113389 2.19127038 2.11175294 1.71630729 1.28087929

    This is what I get from calling it through matmul_lib:

    0.99134411 2.37927192 0.6220935  1.92701032 0.96556958 1.42080484 0.83607334 0.86882378
    0.80634312 1.64045609 0.5430692  1.04235601 0.9985718  1.39916937 0.73351313 0.86935447
    1.95304681 2.74536479 1.56036938 2.40742907 1.69156976 1.96556436 1.48281472 1.27695084
    0.90018296 2.24500806 0.64027737 1.62140514 1.39162286 1.60093125 0.99937141 0.96864924

    Desktop (please complete the following information):

    • OS: Windows 10
    • DaCe on commit: 4f36b20e602a0320ce8303aafcfd9d430d1614e7
    • Python 3.7.9
    bug frontend 
    opened by Simeonedef 7
  • Code style issue: decorators


    Google style guide

    Use decorators judiciously when there is a clear advantage. https://google.github.io/styleguide/pyguide.html#217-function-and-method-decorators

    Examples

    https://github.com/spcl/dace/blob/22fd2d20b896b58a39f5dedc698d0807296767a5/dace/transformation/transformation.py#L15 https://github.com/spcl/dace/blob/22fd2d20b896b58a39f5dedc698d0807296767a5/dace/transformation/dataflow/tiling.py#L12

    Cons

    Requires keeping in mind that something extra will be done with the class/function. Languages provide more conventional features to support a registry, which people can understand faster.

    Possible solution

    Keep the registry inside its own class. Define a classmethod for this class that creates an instance of the class and fills it with default transformations; this classmethod should import each transformation itself. The advantage of such a design is that users can extend it by using their own class instance filled with user-provided transformations. Another advantage is that there is no global registry.
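    A minimal sketch of this proposal (TransformationRegistry is a hypothetical name, not existing DaCe API):

    class TransformationRegistry:
        """Registry kept in its own class instead of a global one."""

        def __init__(self):
            self.transformations = []

        def register(self, xform_cls):
            self.transformations.append(xform_cls)

        @classmethod
        def with_defaults(cls):
            # The classmethod imports each default transformation itself,
            # so there is no global, import-time registry
            from dace.transformation.dataflow import MapTiling
            reg = cls()
            reg.register(MapTiling)
            return reg

    # Users extend their own instance with user-provided transformations:
    registry = TransformationRegistry.with_defaults()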

    Good uses

    Making annotations in the Python/NumPy interface.

    opened by and-ivanov 7
  • Serialize patch


    • [x] As a start, don't catch all exceptions, and don't fail silently if a field is missing.
    • [x] Tolerate string inputs in set_properties_from_json
    • [x] There's a monstrous list of string-to-type mappings in dace.properties.known_types. This seems brittle and hideous. We can replace this by calling an optionally implemented method.
    • [x] Naming scheme is off (should be to_json, not toJSON)
    • [x] Move set_properties_from_json out of Property class
    • [x] Right now every implementation of toJSON/fromJSON needs to call json.dumps and json.loads. Reduce this to receive the JSON object.
    opened by definelicht 7
  • Reductions are broken on Xilinx FPGAs


    Describe the bug When using a reduction (either a manual dace.reduce or a detected one, e.g. np.max), DaCe generates incorrect FPGA code which fails to compile with the error:

    /lustre/home/nx08/nx08/jquinn/dace_iterative_solvers/.dacecache/broken_reduction_sym_1/src/xilinx/device/broken_reduction_sym_0_0.cpp: In function 'void broken_reduction_sym_0_0_0(const double*, double&, int)':
    /lustre/home/nx08/nx08/jquinn/dace_iterative_solvers/.dacecache/broken_reduction_sym_1/src/xilinx/device/broken_reduction_sym_0_0.cpp:40:34: error: invalid initialization of reference of type 'double*&' from expression of type 'double'
       40 |         reduce_1_0_2(&__A_in[0], __result_out, N);
          |                                  ^~~~~~~~~~~~
    /lustre/home/nx08/nx08/jquinn/dace_iterative_solvers/.dacecache/broken_reduction_sym_1/src/xilinx/device/broken_reduction_sym_0_0.cpp:10:47: note: in passing argument 2 of 'void reduce_1_0_2(const double*, double*&, int)'
       10 | void reduce_1_0_2(const double* _in, double*& _out, int N) {
          |                                      ~~~~~~~~~^~~~
    gmake[2]: *** [CMakeFiles/broken_reduction_sym_1.dir/lustre/home/nx08/nx08/jquinn/dace_iterative_solvers/.dacecache/broken_reduction_sym_1/src/xilinx/device/broken_reduction_sym_0_0.cpp.o] Error 1
    gmake[1]: *** [CMakeFiles/broken_reduction_sym_1.dir/all] Error 2
    gmake: *** [all] Error 2
    

    To Reproduce Minimal example:

    import dace
    import numpy as np
    from dace.transformation.interstate import FPGATransformSDFG
    
    N = dace.symbol("N")
    
    @dace.program
    def broken_reduction_sym(A: dace.float64[N]):
        # result = np.min(A)
        result = dace.reduce(lambda a, b: a+b, A)
    
    broken_reduction_sdfg = broken_reduction_sym.to_sdfg()
    broken_reduction_sdfg.apply_transformations(FPGATransformSDFG)
    broken_reduction = broken_reduction_sdfg.compile()
    

    Expected behavior Reductions should produce code that compiles.

    Additional context

    • DaCe version: 0.13.2
    • Vitis version: 2021.2
    • XRT version: 2.11.634
    • Python version: 3.9.7
    • CMake version: 3.19.3
    • G++ version: 10.2.0

    bug 
    opened by JamieJQuinn 6
  • auto_optimize now properly chooses GPUAuto expansion for reduce nodes


    This PR adds very little code to auto_optimize.py. The added code ensures that the GPUAuto expansion gets used for reduce nodes when auto_optimize is used to optimize the produced SDFG.

    Before, it would always choose CUDA (device), even though the GPUAuto expansion is higher in implementation_prio.

    opened by hodelcl 0
  • Warn user when calling `to_sdfg` on a function that shouldn't be reparsed


    When using the reparse_sdfg or recompile keyword arguments:

    1. sometimes recompile shows up as a constant expression
    2. if calling to_sdfg(), the user should be warned that these arguments will be ignored.
    opened by tbennun 0
  • csdfg can not handle torch.rand() tensor in getting_started.ipynb


    Describe the bug I replace [12] tester = np.random.rand(2000, 4000) with:

    import torch
    tester = torch.rand(2000,4000)
    tester
    

    and %timeit csdfg(A=tester, N=np.int32(2000)) with %timeit csdfg(A=tester, N=2000), i.e., using a torch tensor instead of a NumPy array as input, but we almost always get a "Kernel Restarting" error:

    To Reproduce Change the code as mentioned above, then run all cells.

    Expected behavior Outputs the time of csdfg(A=tester, N=2000).

    Screenshots Most of the time we get an error; sometimes we get the expected result (without any code changes).

    Desktop (please complete the following information):

    • OS: Linux
    • Browser: Chrome
    • Version: 106.0.52

    Additional context The error also occurs on Windows OS.

    opened by Weigaa 0
  • Python `with` statement code generation identifier is not unique enough


    Describe the bug The with statement generates, in the C code, a pair of __with_XXX___enter, __with_XXX___exit statements, with XXX the line number in the original source. This is also how those symbols are earmarked in the SDFG. Unfortunately, this can cause nasty clashes when:

    • with statements from two different files are at the same line
    • code changes outside the DaCe-handled code path end up changing the line of a with statement that is considered by DaCe, which invalidates the running .so (bad symbol) when technically nothing changed in the code DaCe should care about.

    To Reproduce

    It would be a two-file reproducer with with statements sharing a line number.

    Expected behavior

    The Python frontend handling the with statement properly is a very good feature and shouldn't be discarded. A more robust sanitization of the with statement identifier should be found.

    Proposal: with util.timer.clock("mainloop") to be sanitized as __with_util_timer_clock_mainloop_X___enter with X a global counter on with statements to keep ordering consistent.
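    A sketch of what that sanitization could look like (hypothetical helper, not DaCe's actual implementation):

    import itertools
    import re

    _with_counter = itertools.count()

    def sanitize_with(source_expr: str):
        # Builds stable enter/exit identifiers from the with-expression text,
        # plus a global counter that keeps ordering consistent
        base = re.sub(r'\W+', '_', source_expr).strip('_')
        uid = f'__with_{base}_{next(_with_counter)}'
        return f'{uid}___enter', f'{uid}___exit'

    # sanitize_with('util.timer.clock("mainloop")')
    # -> ('__with_util_timer_clock_mainloop_0___enter',
    #     '__with_util_timer_clock_mainloop_0___exit')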

    opened by FlorianDeconinck 0
Releases (v0.14.1)
  • v0.14.1(Oct 14, 2022)

    This release of DaCe offers mostly stability fixes for the Python frontend, transformations, and callbacks.

    Full Changelog: https://github.com/spcl/dace/compare/v0.14...v0.14.1

  • v0.14(Aug 26, 2022)

    What's Changed

    This release brings forth a major change to how SDFGs are simplified in DaCe, using the Simplify pass pipeline. This both improves the performance of DaCe's transformations and introduces new types of simplification, such as dead dataflow elimination.

    Please let us know if there are any regressions with this new release.

    Features

    • Breaking change: The experimental dace.constant type hint has now achieved stable status and was renamed to dace.compiletime
    • Major change: Only modified configuration entries are now stored in ~/.dace.conf. The SDFG build folders still include the full configuration file. Old .dace.conf files are detected and migrated automatically.
    • Detailed, multi-platform performance counters are now available via native LIKWID instrumentation (by @lukastruemper in https://github.com/spcl/dace/pull/1063). To use, set .instrument to dace.InstrumentationType.LIKWID_Counters
    • GPU Memory Pools are now supported through CUDA's mallocAsync API. To enable, set desc.pool = True on any GPU data descriptor.
    • Map schedule and array storage types can now be annotated directly in Python code (by @orausch in https://github.com/spcl/dace/pull/1088). For example:
    import dace
    from dace.dtypes import StorageType, ScheduleType
    
    N = dace.symbol('N')
    
    @dace
    def add_on_gpu(a: dace.float64[N] @ StorageType.GPU_Global,
                   b: dace.float64[N] @ StorageType.GPU_Global):
      # This map will become a GPU kernel
      for i in dace.map[0:N] @ ScheduleType.GPU_Device:
        b[i] = a[i] + 1.0
    
    • Customizing GPU block dimension and OpenMP threading properties per map is now supported
    • Optional arrays (i.e., arrays that can be None) can now be annotated in the code. The simplification pipeline also infers non-optional arrays from their use and can optimize code by eliminating branches. For example:
    from typing import Optional
    import dace

    @dace
    def optional(maybe: Optional[dace.float64[20]], always: dace.float64[20]):
      always += 1  # "always" is always used, so it will not be optional
      if maybe is None:  # This condition will stay in the code
        return 1
      if always is None:  # This condition will be eliminated in simplify
        return 2
      return 3
    

    Minor changes

    • Miscellaneous fixes to transformations and passes
    • Fixes for string literal ("string") use in the Python frontend
    • einsum is now a library node
    • If CMake is already installed, it is now detected and will not be installed through pip
    • Add kernel detection flag by @TizianoDeMatteis in https://github.com/spcl/dace/pull/1061
    • Better support for __array_interface__ objects by @gronerl in https://github.com/spcl/dace/pull/1071
    • Replacements look up base classes by @tbennun in https://github.com/spcl/dace/pull/1080

    Full Changelog: https://github.com/spcl/dace/compare/v0.13.3...v0.14

  • v0.13.3(Jun 30, 2022)

    What's Changed

    • Better integration with Visual Studio Code: Calling sdfg.view() inside a VSCode console or debug session will open the file directly in the editor!
    • Code generator for the Snitch RISC-V architecture (by @noah95 and @am-ivanov)
    • Minor hotfixes to Python frontend, transformations, and code generation (with @orausch)

    Full Changelog: https://github.com/spcl/dace/compare/v0.13.2...v0.13.3

  • v0.13.2(Jun 22, 2022)

    What's Changed

    • New API for SDFG manipulation: Passes and Pipelines. More about that in the next major release!
    • Various fixes to frontend, type inference, and code generation.
    • Support for more numpy and Python functions: arange, round, etc.
    • Better callback support:
      • Support callbacks with keyword arguments
      • Support literal lists, tuples, sets, and dictionaries in callbacks
    • New transformations: move loop into map, on-the-fly-recomputation map fusion
    • Performance improvements to frontend
    • Better Docker container compatibility via fixes for config files without a home directory
    • Add an interface to check whether code is in a DaCe parsing context in https://github.com/spcl/dace/pull/998:
    import dace

    def potentially_parsed_by_dace():
        if not dace.in_program():
            print('Called by Python interpreter!')
        else:
            print('Compiled with DaCe!')
    
    • Support compressed (gzipped) SDFGs. Loads normally, saves with:
    sdfg.save('myprogram.sdfgz', compress=True)  # or just run gzip on your old SDFGs
    
    • SDFV: Add web serving capability by @orausch in https://github.com/spcl/dace/pull/1013. Use for interactively debugging SDFGs on remote nodes with: sdfg.view(8080) (or any other port)

    Full Changelog: https://github.com/spcl/dace/compare/v0.13.1...v0.13.2

  • v0.13.1(Apr 26, 2022)

    What's Changed

    • Python frontend: Bug fixes for closures and callbacks in nested scopes
    • Bug fixes for several transformations (StateFusion, RedundantSecondArray)
    • Fixes for issues with FORTRAN ordering of numpy arrays
    • Python object duplicate reference checks in SDFG validation

    Full Changelog: https://github.com/spcl/dace/compare/v0.13...v0.13.1

  • v0.13(Feb 28, 2022)

    New Features

    Cutout:

    Cutout allows developers to take large DaCe programs and reliably cut out subgraphs to create a runnable sub-program. This sub-program can then be used to check for correctness, benchmark, and transform a part of a program without having to run the full application.

    • Example usage from Python:

    import dace
    from dace.sdfg.analysis import cutout  # module path assumed for this release

    def my_method(sdfg: dace.SDFG, state: dace.SDFGState):
        nodes = [n for n in state if isinstance(n, dace.nodes.LibraryNode)]  # Cut every library node
        cut_sdfg: dace.SDFG = cutout.cutout_state(state, *nodes)
        # The cut SDFG now includes each library node and all the necessary arrays to call it with
    

    This is also available in the SDFG editor.

    Data Instrumentation:

    Just like node instrumentation for performance analysis, data instrumentation allows users to set access nodes to be saved to an instrumented data report, and loaded later for exactly reproducible runs.

    • Data instrumentation natively works with CPU and GPU global memory, so there is no need to copy data back
    • Combined with Cutout, this is a powerful interface to perform local optimizations in large applications with ease!
    • Example use:

        import dace
        import numpy as np
        from dace import nodes

        @dace.program
        def tester(A: dace.float64[20, 20]):
            tmp = A + 1
            return tmp + 5
    
        sdfg = tester.to_sdfg()
        for node, _ in sdfg.all_nodes_recursive():  # Instrument every access node
            if isinstance(node, nodes.AccessNode):
                node.instrument = dace.DataInstrumentationType.Save
    
        A = np.random.rand(20, 20)
        result = sdfg(A)
    
        # Get instrumented data from report
        dreport = sdfg.get_instrumented_data()
        assert np.allclose(dreport['A'], A)
        assert np.allclose(dreport['tmp'], A + 1)
        assert np.allclose(dreport['__return'], A + 6)
    

    Logical Groups:

    SDFG elements can now be grouped by any criteria, and they will be colored during visualization by default (by @phschaad).

    Changes and Bug Fixes

    • Samples and tutorials have now been updated to reflect the latest API
    • Constants (added with sdfg.add_constant) can now be used as access nodes in SDFGs. The constants are hard-coded into the generated program, so you can run code with the best performance possible (see the sketch after this list).
    • View nodes can now use the views connector to disambiguate which access node is being viewed
    • Python frontend: else clause is now handled in for and while loops
    • Scalars have been removed from the __dace_init generated function signature (by @orausch)
    • Multiple clock signals in the RTL codegen (by @carljohnsen)
    • Various fixes to frontends, transformations, and code generators
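    A minimal sketch of registering a constant (assuming the standard SDFG API):

    import dace
    import numpy as np

    sdfg = dace.SDFG('with_constant')
    # The value is hard-coded into the generated program at compile time
    sdfg.add_constant('cst', np.int32(42))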

    Full Changelog available at https://github.com/spcl/dace/compare/v0.12...v0.13

  • v0.12(Jan 22, 2022)

    API Changes

    Important: Pattern-matching transformation API has been significantly simplified. Transformations using the old API must be ported! Summary of changes:

    • Transformations now extend either the SingleStateTransformation or MultiStateTransformation classes instead of using decorators
    • Patterns must be registered as class variables of type PatternNode
    • Nodes in matched patterns can then be accessed in can_be_applied and apply directly, using self.nodename
    • The name strict is now replaced with permissive (False by default). Permissive mode allows transformations to match in more cases, but may be dangerous to apply (e.g., create race conditions).
    • can_be_applied is now a method of the transformation
    • The apply method accepts a graph and the SDFG.

    Example of using the new API:

    import dace
    from dace import nodes
    from dace.sdfg import utils as sdutil
    from dace.transformation import transformation as xf
    
    class ExampleTransformation(xf.SingleStateTransformation):
        # Define pattern nodes
        map_entry = xf.PatternNode(nodes.MapEntry)
        access = xf.PatternNode(nodes.AccessNode)
    
        # Define matching subgraphs
        @classmethod
        def expressions(cls):
            # MapEntry -> Access
            return [sdutil.node_path_graph(cls.map_entry, cls.access)]
    
        def can_be_applied(self, graph: dace.SDFGState, expr_index: int, sdfg: dace.SDFG, permissive: bool = False) -> bool:
            # Returns True if the transformation can be applied on a subgraph
            if permissive:  # In permissive mode, we will always apply this transformation
                return True
            return self.map_entry.schedule == dace.ScheduleType.CPU_Multicore
    
        def apply(self, graph: dace.SDFGState, sdfg: dace.SDFG):
            # Apply the transformation using the SDFG API
            pass
    

    Simplifying SDFGs is renamed from sdfg.apply_strict_transformations() to sdfg.simplify()

    AccessNodes no longer have an AccessType field.

    Other changes

    • More nested SDFG inlining opportunities by default with the multi-state inline transformation
    • Performance optimizations of the DaCe framework (parsing, transformations, code generation) for large graphs
    • Support for Xilinx Vitis 2021.2
    • Minor fixes to transformations and deserialization

    Full Changelog: https://github.com/spcl/dace/compare/v0.11.4...v0.12

  • v0.11.4(Dec 17, 2021)

    What's Changed

    • If a Python call cannot be parsed into a data-centric program, DaCe will automatically generate a callback into Python. Supports CPU arrays and GPU arrays (via CuPy) without copying! (See the sketch after this list.)
    • Python 3.10 support
    • CuPy arrays are supported when calling @dace.programs in JIT mode
    • Fix various issues in Python frontend and code generation
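    A sketch of how such a callback might look from user code (illustrative only; which calls actually fall back depends on the frontend):

    import dace
    import numpy as np

    def unparsable(x):  # pretend DaCe cannot parse this into dataflow
        return float(np.linalg.cond(x))

    @dace.program
    def prog(A: dace.float64[20, 20]):
        return unparsable(A)  # generates an automatic callback into Python

    r = prog(np.random.rand(20, 20))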

    Full Changelog: https://github.com/spcl/dace/compare/v0.11.3...v0.11.4

  • v0.11.3(Nov 23, 2021)

  • v0.11.2(Nov 12, 2021)

  • v0.11.1(Oct 18, 2021)

    What's Changed

    • More flexible Python frontend: you can now call functions and object methods, use fields and globals in @dace programs! Some examples:
      • There is no need to annotate called functions
      • @dataclass and general object field support
      • Loop unrolling: implicit and explicit (with the dace.unroll generator)
      • Constant folding and explicit constant arguments (with dace.constant as a type hint)
      • Debuggability: all functions (e.g. dace.map, dace.tasklet) work in pure Python as well
      • and many more features
    • NumPy semantics are followed more closely, e.g., subscripts create array views
    • Direct CuPy and torch.tensor integration in @dace program arguments
    • Auto-optimization (preview): use @dace.program(auto_optimize=True, device=dace.DeviceType.CPU) to automatically run some transformations, such as turning loops into parallel maps.
    • ARM SVE code generation support by @sscholbe (#705)
    • Support for MLIR tasklets by @Berke-Ates in (#747)
    • Source Mapping by @benibenj in https://github.com/spcl/dace/pull/756
    • Support for HBM on Xilinx FPGAs by @jnice-81 (#762)

    Miscellaneous:

    • Various performance optimizations to calling @dace programs
    • Various bug fixes to transformations, code generator, and frontends

    Full Changelog: https://github.com/spcl/dace/compare/v0.10.8...v0.11.1

  • v0.10.8(Apr 14, 2021)

    What's New?

    • Various bug fixes and more stable Python/NumPy frontend
    • Support for running DaCe programs within the Python interpreter
    • (experimental) Support for automatic optimization passes (more coming soon!)
  • v0.10.0(Oct 4, 2020)

    What's New?

    • Python frontend improvements: More Python features are supported, such as return values, tuples, and numpy broadcasting. @dace.programs can now call other programs or SDFGs.
    • AMD GPU (HIP) Support: AMD GPUs are now fully supported with HIP code generation.
    • Easy-to-use transformation APIs: Apply transformation compositions with one call, enumerate subgraph matches manually, and many more functions now available as part of the dace API. See the new tutorial for examples.
    • Faster code generation: Backends now generate lower-level code that is more compiler-friendly.
    • Instrumentation interface: Setting the instrument property for SDFG nodes and states enables easy-to-use, localized performance reporting with timers, GPU events, and PAPI performance counters.
    • DaCe VSCode plugin: Interactive SDFG viewer and optimizer as part of Visual Studio Code. Download the plugin here.
    • Type inference and connector types: In addition to automatic type inference, connectors on nodes can now be defined with explicit types, giving more fine-grained control over type reinterpreting and vector types.
    • Subgraph transformations: New transformation type that can work on arbitrary subgraphs. For example, fuse any computation within a state with SubgraphFusion.
    • Persistent GPU kernel schedule: Launch persistent kernels with the change of a property! The proportion of GPU multiprocessors to use is configurable.
    • More transformations: Loop manipulation and other new transformations now available with DaCe. Some transformations (such as Vectorization) made more robust to corner cases.
    • More tools: Use sdfgcc to quickly compile and optimize .sdfg files from the command line, generating header and library files. Great for interoperability and Makefiles.
    • Short DaCe annotation: Data-centric functions can now be annotated with @dace.
    • Many minor fixes and additions: More library nodes (such as einsum) and new properties added, enabling faster performance and more productive high-performance coding than ever.
  • v0.9.5(Jan 6, 2020)

    What's New?

    • Intel FPGA backend: Generates and compiles Intel FPGA OpenCL code from SDFGs.
    • Renderer: Many improvements to the scalability of drawing large SDFGs, touch/mobile support, and code view upon zooming into Tasklets.
    • SDFV: Now includes a sidebar with information about clicked nodes/edges/states.
    • GPU reduction: Now supports Reduce nodes where the output array contains multiple dimensions (if contiguous). In other cases, use the ReduceExpansion transformation.
    • Faster compilation: Improved CMake usage to speed up compilation time if files were not changed.
    • Stability: Various fixes to the Python frontend, transformations, code generation, and DIODE (on Linux and Windows).
    • Generated programs now include header (.h) file and an example C program that invokes the compiled SDFG.
  • v0.9.0(Oct 22, 2019)

    What's New

    • NumPy syntax for Python: Wrap Python functions that work on numpy arrays with @dace.program and create SDFGs from implicit dataflow.
    • DIODE 2.0: DIODE has been reworked to operate in the browser, and works natively on Windows. Note that it is currently experimental, and some features may cause errors. We are happy to fix bugs if you find and report issues!
    • Standalone SDFG renderer (SDFV) and improved Jupyter support: Contextual, optimized SDFG drawing with collapsible scopes (double-click a map, a state, or a nested SDFG). Fully integrated into Jupyter notebooks.
    • Transformations: Improvements to scalability of subgraph pattern matching and memlet propagation.
    • Improvements to the TensorFlow frontend.
    • Many minor bug fixes and several API improvements.
  • v0.8.1(Aug 24, 2019)
