Ray provides a simple, universal API for building distributed applications.

Overview

Ray is packaged with the following libraries for accelerating machine learning workloads:

  • Tune: Scalable Hyperparameter Tuning
  • RLlib: Scalable Reinforcement Learning
  • RaySGD: Distributed Training Wrappers
  • Ray Serve: Scalable and Programmable Serving

There are also many community integrations with Ray, including Dask, MARS, Modin, Horovod, Hugging Face, Scikit-learn, and others. Check out the full list of Ray distributed libraries here.

Install Ray with: pip install ray. For nightly wheels, see the Installation page.

Quick Start

Execute Python functions in parallel.

import ray
ray.init()

@ray.remote
def f(x):
    return x * x

futures = [f.remote(i) for i in range(4)]
print(ray.get(futures))

To use Ray's actor model:

import ray
ray.init()

@ray.remote
class Counter(object):
    def __init__(self):
        self.n = 0

    def increment(self):
        self.n += 1

    def read(self):
        return self.n

counters = [Counter.remote() for i in range(4)]
[c.increment.remote() for c in counters]
futures = [c.read.remote() for c in counters]
print(ray.get(futures))

Ray programs can run on a single machine, and can also seamlessly scale to large clusters. To execute the above Ray script in the cloud, just download this configuration file, and run:

ray submit [CLUSTER.YAML] example.py --start

Read more about launching clusters.

Tune Quick Start

Tune is a library for hyperparameter tuning at any scale.

To run this example, you will need to install the following:

$ pip install "ray[tune]"

This example runs a parallel grid search to optimize an example objective function.

from ray import tune


def objective(step, alpha, beta):
    return (0.1 + alpha * step / 100)**(-1) + beta * 0.1


def training_function(config):
    # Hyperparameters
    alpha, beta = config["alpha"], config["beta"]
    for step in range(10):
        # Iterative training function - can be any arbitrary training procedure.
        intermediate_score = objective(step, alpha, beta)
        # Feed the score back to Tune.
        tune.report(mean_loss=intermediate_score)


analysis = tune.run(
    training_function,
    config={
        "alpha": tune.grid_search([0.001, 0.01, 0.1]),
        "beta": tune.choice([1, 2, 3])
    })

print("Best config: ", analysis.get_best_config(metric="mean_loss", mode="min"))

# Get a dataframe for analyzing trial results.
df = analysis.results_df
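
analysis.results_df is a regular pandas DataFrame, so trials can be ranked directly. A minimal usage sketch (the mean_loss column name assumes the metric reported above):

# Show the trials with the lowest reported mean_loss.
print(df.sort_values("mean_loss").head())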

If TensorBoard is installed, automatically visualize all trial results:

tensorboard --logdir ~/ray_results

RLlib Quick Start

RLlib is an open-source library for reinforcement learning built on top of Ray that offers both high scalability and a unified API for a variety of applications.

$ pip install tensorflow  # or tensorflow-gpu
$ pip install "ray[rllib]"

import gym
from gym.spaces import Discrete, Box
from ray import tune

class SimpleCorridor(gym.Env):
    def __init__(self, config):
        self.end_pos = config["corridor_length"]
        self.cur_pos = 0
        self.action_space = Discrete(2)
        self.observation_space = Box(0.0, self.end_pos, shape=(1, ))

    def reset(self):
        self.cur_pos = 0
        return [self.cur_pos]

    def step(self, action):
        if action == 0 and self.cur_pos > 0:
            self.cur_pos -= 1
        elif action == 1:
            self.cur_pos += 1
        done = self.cur_pos >= self.end_pos
        return [self.cur_pos], 1 if done else 0, done, {}

tune.run(
    "PPO",
    config={
        "env": SimpleCorridor,
        "num_workers": 4,
        "env_config": {"corridor_length": 5}})

Ray Serve Quick Start

Ray Serve is a scalable model-serving library built on Ray. It is:

  • Framework Agnostic: Use the same toolkit to serve everything from deep learning models built with frameworks like PyTorch, TensorFlow, and Keras to Scikit-Learn models or arbitrary business logic.
  • Python First: Configure your model serving with pure Python code - no more YAMLs or JSON configs.
  • Performance Oriented: Turn on batching, pipelining, and GPU acceleration to increase the throughput of your model.
  • Composition Native: Allow you to create "model pipelines" by composing multiple models together to drive a single prediction.
  • Horizontally Scalable: Serve can linearly scale as you add more machines. Enable your ML-powered service to handle growing traffic.

To run this example, you will need to install the following:

$ pip install scikit-learn
$ pip install "ray[serve]"

This example serves a scikit-learn gradient boosting classifier.

from ray import serve
import pickle
import requests
from sklearn.datasets import load_iris
from sklearn.ensemble import GradientBoostingClassifier

# Train model
iris_dataset = load_iris()
model = GradientBoostingClassifier()
model.fit(iris_dataset["data"], iris_dataset["target"])

# Define a Ray Serve model.
class BoostingModel:
    def __init__(self):
        self.model = model
        self.label_list = iris_dataset["target_names"].tolist()

    def __call__(self, flask_request):
        payload = flask_request.json["vector"]
        print("Worker: received flask request with data", payload)

        prediction = self.model.predict([payload])[0]
        human_name = self.label_list[prediction]
        return {"result": human_name}


# Deploy model
client = serve.start()
client.create_backend("iris:v1", BoostingModel)
client.create_endpoint("iris_classifier", backend="iris:v1", route="/iris")

# Query it!
sample_request_input = {"vector": [1.2, 1.0, 1.1, 0.9]}
response = requests.get("http://localhost:8000/iris", json=sample_request_input)
print(response.text)
# Result:
# {
#  "result": "versicolor"
# }

Getting Involved

Issues
  • [WIP] Implement Ape-X distributed prioritization

    What do these changes do?

    This implements https://openreview.net/forum?id=H1Dy---0Z for testing. The main ideas from Ape-X are:

    • Worker-side prioritization: rather than assigning new samples the maximum priority, prioritize them in the workers. This scales experience gathering.
    • Per-worker exploration: rather than using a single exploration schedule, assign each worker a different exploration value ranging from 0.4 to ~0.0.

    WIP: evaluation on Pong. This implementation probably doesn't scale to very high sample throughputs, but we should be able to see some gains with a couple dozen workers.
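
    As a minimal sketch of the per-worker exploration idea above, the Ape-X paper assigns worker i the value eps ** (1 + i / (N - 1) * alpha) with eps = 0.4 and alpha = 7; the helper below is illustrative and not part of this PR:

    def per_worker_epsilons(num_workers, eps=0.4, alpha=7):
        # Worker 0 explores the most (0.4); the last worker explores the least (~0.0007).
        if num_workers == 1:
            return [eps]
        return [eps ** (1 + i / (num_workers - 1) * alpha) for i in range(num_workers)]

    print(per_worker_epsilons(8))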

    opened by ericl 199
  • Start integrating new GCS APIs

    This is work in progress!

    opened by pcmoritz 161
  • [ray-core] Initial addition of performance integration testing files

    • A Dockerfile specific to this test
      • This is needed because we will eventually upload these numbers to S3
    • Addition of a simple performance test that measures the time it takes to run a variable number of tasks with a variable number of CPUs
    • A couple of bash scripts to set up the Docker environment and run the tests

    What do these changes do?

    Related issue number

    opened by devin-petersohn 134
  • Make Bazel the default build system

    What do these changes do?

    This switches the build system from CMake to Bazel for developers.

    The wheels, valgrind tests, and Jenkins are currently still run with CMake and will be switched in follow-up PRs.

    Related issue number

    opened by pcmoritz 130
  • Streaming data transfer and python integration

    Why are these changes needed?

    This is the minimal implementation of streaming data transfer mentioned in the doc, consisting of three parts:

    • writer/reader, implemented in C++ to transfer data between streaming workers
    • streaming queue, the transport layer based on Ray's direct actor calls and the C++ Core Worker APIs
    • an adaptation layer for Python, implemented in Cython to adapt the writer/reader for Python

    To integrate Python with the streaming C++ data transfer, the following changes were made:

    • We moved the Python code from python/ray/experimental/streaming/ to streaming/python/ray/streaming, and added a soft link at python/ray/streaming, just like rllib.
    • We removed batched_queue and added a Cython-based streaming queue implementation.
    • We moved execution graph related logic from Environment into ExecutionGraph.
    • We refactored operator_instance into a processor, and added a JobWorker actor to execute processors.

    The Java part will be submitted in follow-up PRs.

    Related issue number

    #6184

    Checks

    • [x] I've run scripts/format.sh to lint the changes in this PR.
    • [ ] I've included any doc changes needed for https://ray.readthedocs.io/en/latest/.
    • [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failure rates at https://ray-travis-tracker.herokuapp.com/.
    opened by chaokunyang 125
  • Pull Plasma from Apache Arrow and remove Plasma store from Ray.

    This PR replaces the version of plasma that is in Ray with the one that was merged into the Apache Arrow project.

    This completes #611.

    opened by pcmoritz 122
  • [core worker] Python core worker object interface

    What do these changes do?

    This change adds a new Cython extension for the core worker and calls into it for all object-store-related operations.

    To support this, it also adds two new methods to the core worker ObjectInterface that resemble the plasma store interface (Create and Seal). These allow us to directly write from Python memory into the object store via Arrow's SerializedPyObject without having to compile the Ray and Arrow Cython code together or add a dependency to Arrow in the core worker.

    Related issue number

    Linter

    • [x] I've run scripts/format.sh to lint the changes in this PR.
    opened by edoakes 117
  • Discussion on batch Garbage Collection.

    Hi @robertnishihara @pcmoritz, we are planning to add batch Garbage Collection to Ray.

    We have a concept called batchId (int64_t) that is used for Garbage Collection. For example, a job will use this batchId to generate all of its objectIds and taskIds, and all of these objectIds and taskIds will be stored under the batchId in a Garbage Collection table in GCS. When the job is finished, we can simply pass the batchId to the garbage collector, which will look up the Garbage Collection table in GCS and collect all of the related tasks and objects.

    In the current id.h implementation, the lowest 32 bits of an ObjectId are used for the Object Index. We can use the higher 64 bits next to the Object Index as the batchId and add a new GC Table in GCS.

    This GC mechanism will help release memory resources in GCS and plasma. What do you think of this change?
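
    As a rough illustration of the proposed layout (the helpers and the exact byte offsets are hypothetical; the real layout lives in id.h):

    def set_batch_id(object_id: bytes, batch_id: int) -> bytes:
        # Leave the lowest 32 bits (the Object Index) untouched and write the
        # 64-bit batchId into the next 8 bytes of the id.
        body = bytearray(object_id)
        body[4:12] = batch_id.to_bytes(8, "little")
        return bytes(body)

    def get_batch_id(object_id: bytes) -> int:
        return int.from_bytes(object_id[4:12], "little")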

    opened by guoyuhong 112
  • [tune] Cluster Fault Tolerance

    What do these changes do?

    A redo of #3165 with extraneous cleanup changes removed.

    This currently does not use the same restoring code-path as #3238, but this can change later when component FT is implemented... (i.e., this doesn't notify components that some trials go RUNNING -> PENDING).

    This adds the following functionality:

    • pickleable trials and TrialRunner.
    • checkpointing/restoring functionality for Trial runner
    • user endpoints for experiment checkpointing

    Example:

    
    In [6]: import time
       ...: import ray
       ...: from ray import tune
       ...:
       ...: ray.init()
       ...:
       ...: kwargs = dict(
       ...:     run="__fake",
       ...:     stop=dict(training_iteration=5),
       ...:     checkpoint_freq=1,
       ...:     max_failures=1)
       ...:
       ...: # This will save the experiment state to disk on each step
       ...: tune.run_experiments(
       ...:     dict(experiment1=kwargs),
       ...:     raise_on_failed_trial=False)
       ...:
    

    TODO:

    • [x] User endpoints implemented.
    • [x] NODE FT: Add test for scheduler notification when nodes die and trials running -> pending

    NOTE: this should be a lot easier to review after #3414 is merged.

    opened by richardliaw 110
  • GCS-Based actor management implementation

    Why are these changes needed?

    Please see the <Design Document> first.

    This PR implements the creation and reconstruction of actors based on gcs server.

    Changes on gcs server side

    Several important classes are added: GcsActor, GcsActorManager, GcsActorScheduler.

    • GcsActor: An abstraction of an actor on the GcsServer side, which wraps the ActorTableData and provides simple interfaces to access its fields.
    • GcsActorManager: It is responsible for managing the lifecycle of all registered actors.
    • GcsActorScheduler: It is responsible for scheduling actors registered to GcsActorManager. It also contains an inner class called GcsLeasedWorker, which is an abstraction of a remote leased worker in the raylet.

    In addition, this PR also makes some changes to GcsNodeManager, which is responsible for monitoring and managing nodes.

    Changes on raylet side

    • In the old actor management scheme, the raylet was responsible for updating ActorTableData; in the new GCS-based actor management scheme, GCS is responsible for updating all ActorTableData. So all of the raylet logic for updating ActorTableData has been removed.
    • Besides, the raylet should cache the relationship between actors and leased workers, because the raylet needs to reply quickly to the GCS server, without leasing anything, when the GCS server rebuilds actors after a restart. Please see the <Design Document>.

    Changes on worker side

    • Invoke gcs_rpc_client.CreateActor in the callback of ResolveDependencies.
    • Reply quickly to the GCS server, without creating anything, if the worker is already bound to an actor when the GCS server rebuilds actors after a restart.

    Related issue number

    Checks

    • [ ] I've run scripts/format.sh to lint the changes in this PR.
    • [ ] I've included any doc changes needed for https://ray.readthedocs.io/en/latest/.
    • [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failure rates at https://ray-travis-tracker.herokuapp.com/.
    opened by wumuzi520 107
  • [DON'T MERGE]

    Just running some experiments.

    opened by DmitriGekhtman 0
  • Update dataset.rst

    Just added a link to the PyData 2021 talk on Ray Datasets so people can find it more easily.

    Why are these changes needed?

    Related issue number

    Checks

    • [ ] I've run scripts/format.sh to lint the changes in this PR.
    • [ ] I've included any doc changes needed for https://docs.ray.io/en/master/.
    • [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
    • Testing Strategy
      • [ ] Unit tests
      • [ ] Release tests
      • [ ] This PR is not tested :(
    opened by mGalarnyk 0
  • [data](deps): Bump moto[s3] from 2.3.1 to 3.0.0 in /python/requirements/data_processing

    Bumps moto[s3] from 2.3.1 to 3.0.0.

    Changelog

    Sourced from moto[s3]'s changelog.

    3.0.0

    This is a major release, and as such contains some breaking changes.
    
    • Removed:

      • All deprecated decorators have been removed
    • Changes:

      • The behaviour of the class-decorator has been reworked - the state is now reset before every test-method.
      • ECS ARNs now use the long format.
    • Rebranded:

      • The new mock_s3control-decorator has been introduced. The existing S3control methods (get/put/delete_public_access_block) are no longer available via mock_s3, only via mock_s3control.
    • General:

      • Python 3.5 support has been removed
      • Python 3.10 is now supported

    2.3.2

    General:

    • Compatible with the latest `responses` release (0.17.0)

    New Services:

    • Appsync:
      • create_api_key()
      • create_graphql_api()
      • delete_api_key()
      • delete_graphql_api()
      • get_graphql_api()
      • get_schema_creation_status()
      • get_type()
      • list_api_keys()
      • list_graphql_apis()
      • list_tags_for_resource()
      • start_schema_creation()
      • tag_resource()
      • untag_resource()
      • update_api_key()
      • update_graphql_api()

    Miscellaneous:

    • AWSLambda: invoke() now throws an error when trying to return an oversized payload (>6MB)
    • EC2: describe_instances() now supports filtering by dns-name
    • EC2: describe_managed_prefix_lists() now supports filtering by tags
    • SQS: delete_message_batch() now correctly deals with invalid receipt handles

Commits

Dependabot compatibility score

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

  • @dependabot rebase will rebase this PR
  • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
  • @dependabot merge will merge this PR after your CI passes on it
  • @dependabot squash and merge will squash and merge this PR after your CI passes on it
  • @dependabot cancel merge will cancel a previously requested merge and block automerging
  • @dependabot reopen will reopen this PR if it is closed
  • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
  • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
dependencies python 
opened by dependabot[bot] 0
  • [Bug] placement_group not fulfilled in FIFO order

    Search before asking

    • [X] I searched the issues and found no similar issues.

    Ray Component

    Ray Core

    What happened + What you expected to happen

    Multiple placement_groups with the same resource requirement are not fulfilled in FIFO order.

    Versions / Dependencies

    • ray==1.9.2
    • python==3.6.9
    • OS: Ubuntu 18.04

    Reproduction script

    To reproduce, please run the following code on a 1 GPU machine.

    import time
    
    import ray
    import ray.util as ray_util
    
    ray.init('auto', namespace='test', log_to_driver=True)
    
    @ray.remote
    def create_placement_group(name):
        pg = ray_util.placement_group([{
            'CPU': 1,
            'GPU': 1,
        }])
        ray.get(pg.ready())
        print(f'{name} ready')
        time.sleep(5)
        ray_util.remove_placement_group(pg)
        print(f"removed pg {name}")
    
    actors = []
    for i in range(3):
        actors.append(create_placement_group.remote(f'pg{i}'))
        time.sleep(1)
    
    ray.get(actors)
    

    The expected output is the following:

    pg0 pg ready
    removed pg pg0
    pg1 pg ready
    removed pg pg1
    pg2 pg ready
    removed pg pg2
    

    However, the actual behavior is that pg2 will be fulfilled first, leading to the following output:

    pg0 pg ready
    removed pg pg0
    pg2 pg ready
    removed pg pg2
    pg1 pg ready
    removed pg pg1
    

    Anything else

    No response

    Are you willing to submit a PR?

    • [X] Yes I am willing to submit a PR!
    bug triage 
    opened by Michaelvll 3
  • [tune](deps): Bump flaml from 0.6.7 to 0.9.5 in /python/requirements/ml

    Bumps flaml from 0.6.7 to 0.9.5.

    Release notes

    Sourced from flaml's releases.

    v0.9.5

    What's Changed

    New Contributors

    Full Changelog: https://github.com/microsoft/FLAML/compare/v0.9.4...v0.9.5

    v0.9.4

    This release enables regression models for time series forecasting. It also fixes bugs in nlp tasks, such as serialization of transformer models and automatic metrics.

    What's Changed

    Full Changelog: https://github.com/microsoft/FLAML/compare/v0.9.3...v0.9.4

    v0.9.3

    What's Changed

    New Contributors

    Full Changelog: https://github.com/microsoft/FLAML/compare/v0.9.2...v0.9.3

    v0.9.2

    New Features:

    • New task: text summarization
    • Reproducibility of hyperparameter search sequence

    ... (truncated)

    Commits
    • 1c911da Sklearn api x (#405)
    • a6d70ef Bump shelljs from 0.8.4 to 0.8.5 in /website (#402)
    • cb9c7b0 adding logging of training loss (#406)
    • 8f0737c update browser icon (#407)
    • ac16c31 Logo (#399)
    • 3164518 Update flaml/nlp/README.md (#404)
    • dda4ac9 moving intermediate_results logging from model.py to huggingface/trainer.py (...
    • 569908f fix issues in logging, bug in space.py, constraint sign, and improve code cov...
    • c1b5cb5 fixing default metric for regression + change verbosity for transformers (#397)
    • 8e72904 postcss version update (#385)
    • Additional commits viewable in compare view

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    dependencies python 
    opened by dependabot[bot] 0
  • [autoscaler] Fix ray.autoscaler.sdk import issue

    Why are these changes needed?

    This PR moves the SDK to its own folder, then includes everything under import ray.autoscaler.sdk in Ray's import path.

    Note that naively doing this introduced circular dependencies, because Ray core now uses constants that were defined in the autoscaler for internal KV operations (and the autoscaler similarly calls into Ray core). The solution was to move those internal KV keys into Ray core constants so the imports flow (more) one way.

    Related issue number

    closes #19840

    Checks

    • [ ] I've run scripts/format.sh to lint the changes in this PR.
    • [ ] I've included any doc changes needed for https://docs.ray.io/en/master/.
    • [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
    • Testing Strategy
      • [ ] Unit tests
      • [ ] Release tests
      • [ ] This PR is not tested :(
    opened by wuisawesome 2
  • [Feature] [Serve] Graceful failure for `handle.remote()` if not deployed yet

    Search before asking

    • [X] I had searched in the issues and found no similar feature requirement.

    Description

    import ray
    from ray import serve
    
    ray.init()
    
    serve.start()
    
    @serve.deployment
    def f():
        return True
    
    # Let's say we forgot to call f.deploy().
    
    handle = f.get_handle()
    
    # Hangs
    handle.remote()
    

    We should instead print a graceful error message.

    Use case

    No response

    Related issues

    No response

    Are you willing to submit a PR?

    • [X] Yes I am willing to submit a PR!
    enhancement P2 
    opened by architkulkarni 0
  • [RFC] distributed Python coroutines

    Overview

    Python coroutines are Python functions that can be suspended and resumed. They are built from Python generators, including those declared with the async / await syntax. With https://github.com/llllllllll/cloudpickle-generators for generator serialization and Ray ObjectRef as a distributed future, we can build a runtime that executes Python coroutines across Ray nodes, potentially with checkpointing.

    [Diagram: coroutine f() suspending at await load_from_s3() and await classifier_actor.remote(images) as it moves across Ray nodes]

    In the diagram above, Ray would run the coroutine as follows:

    1. The coroutine f() first yields at await load_from_s3(). Assuming this is an async function not using Ray, the output will be local, so there is no sensible way to serialize and deserialize the coroutine here. The Ray runtime steps the coroutine on the local asyncio event loop.
    2. Next, f() yields at await classifier_actor.remote(images). Assuming this is a Ray remote method call, the output ObjectRef can be used anywhere in the Ray cluster. Serializing the coroutine, sending the serialized data to a different node, checkpointing, and deserializing the coroutine are all possible here.
    3. Then f() returns its result to Ray, which handles it by returning it to the coroutine caller or persisting it.

    Potential Use Cases

    Specifying Workflow in Python

    Workflows can be implemented as a Python coroutine, instead of in a special API / DSL. A hypothetical trip-booking workflow could be:

    async def book_trip(request_id: str):
        undo = []
        failed = False
        try:
            car_reservation_id = await book_car.remote(request_id)
            undo.append(lambda: cancel.remote(car_reservation_id))

            hotel_reservation_id = await book_hotel.remote(request_id)
            undo.append(lambda: cancel.remote(hotel_reservation_id))

            flight_reservation_id = await book_flight.remote(request_id)
            undo.append(lambda: cancel.remote(flight_reservation_id))
        except Exception:
            failed = True
        if failed:
            print("Canceling finished tasks ...")
            for callback in undo:
                # Each callback launches a cancel task (assumes cancel is also a Ray task).
                await callback()
            return
        print(f"Booking finished: {car_reservation_id} {hotel_reservation_id} "
              f"{flight_reservation_id}")
    

    book_car, book_hotel, and book_flight are Ray tasks defined with @ray.remote. The Ray workflow runtime can turn the book_trip() coroutine above into a persisted workflow by checkpointing it at each await, so the workflow avoids re-running successful tasks and retries failed ones.

    There is more complexity if we want to run tasks in parallel, as with asyncio.gather(). We may have to checkpoint at each .remote() call. This is being investigated.

    Optimizing Request Processing Flow

    Suppose there are Ray actors for specific tasks, e.g. english_speech_to_text, search and english_text_to_speech, and we want to combine them for a voice search feature (request is processed via english_speech_to_text -> search -> english_text_to_speech). Usually we have to make each Ray actor aware of the next Ray actor to continue request processing. We may need to add voice search specific logic into the Ray actors. This breaks encapsulation. Another alternative is to gather the result from each actor in a request handler, which may increase latency and cost in data transfer. Instead, we can describe how the request flows through the actors with a Python coroutine:

    async def voice_search(question_speech):
        text = await english_speech_to_text.run.remote(question_speech)
    
        if "what" in text:
            answer = await question_and_answer.run.remote(text)
            return answer
    
        search_result = await search.run.remote(text)
    
        result_speech = await english_text_to_speech.run.remote(search_result)
        return result_speech
    

    The coroutine can be developed and tested locally. It can also be executed with Ray from the request handler. Ray can suspend the coroutine at each remote call (e.g. english_speech_to_text.run.remote(question_speech)), then serialize and forward the coroutine to the node that will produce the result (e.g. the node running the english_speech_to_text actor). This makes the actors' code cleaner, and reduces data movement to the same level as forwarding to a specific actor with reasonable optimizations.

    Status

    Prototype: https://github.com/ray-project/ray/pull/21783

    Known Issues

    • Handling more complex parallel processing patterns, e.g. similar to those implemented by asyncio.gather() or anyio, is not fully fleshed out.
    • When serializing the coroutine, all stack variables are implicitly captured. This can be inefficient if the user has a local variable referencing a large block of data. Calling del on variables that are not used later in the coroutine is a workaround, as shown in the sketch below. We may also be able to automatically avoid serializing variables that are not used later in the coroutine.
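
    A minimal sketch of that workaround (the coroutine and helper are hypothetical; score is assumed to be a Ray task, and the coroutine would be suspended and serialized at the await):

    import ray

    @ray.remote
    def score(features):
        return sum(features) / len(features)

    async def process(path):
        rows = open(path).read().splitlines()      # large local object
        features = [float(r) for r in rows[:100]]  # small derived value
        del rows                                   # drop the large object before the await, so it
                                                   # is not captured if the coroutine is serialized
                                                   # at the suspension point
        return await score.remote(features)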

    Next Steps

    First we want to gather feedback from the community.

    • For the mentioned use cases, especially the workflow use case which we will be focusing on first, do you think they will be useful? (Btw we are also exploring alternative non-coroutine ways to describe workflows with Python syntax and Ray core API)
    • What are the other potential use cases for Ray distributed coroutines?

    Please let us know what you think!

    cc @ericl @richardliaw @iycheng @simon-mo

    opened by mwtian 0
  • [nit] remove decorator in test_cli.py

    Why are these changes needed?

    For full context, see https://github.com/ray-project/ray/issues/21791

    pytest works for "some" environments for this test and on CI master, but this decorator is unnecessary and was introduced by mistake. So just remove it and see what happens with the original issue.

    Related issue number

    Closes #21791

    Checks

    • [ ] I've run scripts/format.sh to lint the changes in this PR.
    • [ ] I've included any doc changes needed for https://docs.ray.io/en/master/.
    • [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
    • Testing Strategy
      • [ ] Unit tests
      • [ ] Release tests
      • [ ] This PR is not tested :(
    opened by jiaodong 0
  • [Bug] `test_cli.py` might not run in the master

    Search before asking

    • [X] I searched the issues and found no similar issues.

    Ray Component

    Ray Core

    What happened + What you expected to happen

    https://github.com/ray-project/ray/blob/45eebdd6e348aadea9c41a86023247b4751bbbef/dashboard/modules/job/tests/test_cli.py#L84

    This line of code fails the test on master because await is not called within test_address. But in the master CI this failure does not seem to be caught, meaning the test is probably not running on master.
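
    For illustration only (this is not the actual test code), the failure mode is the classic un-awaited coroutine:

    import asyncio

    async def check_address(address):
        await asyncio.sleep(0)             # stand-in for an async request
        assert address.startswith("http")

    async def test_address():
        check_address("bad-address")                  # bug: the coroutine is never awaited,
                                                      # so the assertion inside it never runs
        await check_address("http://localhost:8265")  # the awaited call actually runs the check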

    cc @jiaodong

    Versions / Dependencies

    N/A

    Reproduction script

    N/A

    Anything else

    No response

    Are you willing to submit a PR?

    • [ ] Yes I am willing to submit a PR!
    bug P1 dashboard 
    opened by rkooo567 3
  • Releases
    • ray-1.9.2(Jan 11, 2022)

    • ray-1.9.1(Dec 22, 2021)

      Patch release to bump the log4j2 version from 2.14 to 2.16. This resolves the security vulnerabilities https://nvd.nist.gov/vuln/detail/CVE-2021-44228 and https://nvd.nist.gov/vuln/detail/CVE-2021-45046.

      No library or core changes included.

      Thanks @seonggwonyoon and @ijrsvt for contributing the fixes!

    • ray-1.9.0(Dec 3, 2021)

      Highlights

      • Ray Train is now in beta! If you are using Ray Train, we’d love to hear your feedback here!
      • Ray Docker images for multiple CUDA versions are now provided (#19505)! You can specify a -cuXXX suffix to pick a specific version.
        • ray-ml:cpu images are now deprecated. The ray-ml images are only built for GPU.
      • Ray Datasets now supports groupby and aggregations! See the groupby API and GroupedDataset docs for usage, and the sketch after this list.
      • We are making continuing progress in improving Ray stability and usability on Windows. We encourage you to try it out and report feedback or issues at https://github.com/ray-project/ray/issues.
      • We are launching a Ray Job Submission server + CLI & SDK clients to make it easier to submit and monitor Ray applications when you don’t want an active connection using Ray Client. This is currently in alpha, so the APIs are subject to change, but please test it out and file issues / leave feedback on GitHub & discuss.ray.io!
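
      A minimal sketch of the new groupby API (assuming the built-in count aggregation; see the linked docs for the full set of aggregations):

      import ray

      ds = ray.data.range(1000)
      # Group the integers by their remainder mod 3 and count each group.
      counts = ds.groupby(lambda x: x % 3).count()
      print(counts.take(3))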

      Ray Autoscaler

      💫Enhancements:

      • Graceful termination of Ray nodes prior to autoscaler scale down (#20013)
      • Ray Clusters on AWS are colocated in one Availability Zone to reduce costs & latency (#19051)

      Ray Client

      🔨 Fixes:

      • ray.put on a list of objects now returns a single object ref (#19737)

      Ray Core

      🎉 New Features:

      • Support remote file storage for runtime_env (#20280, #19315); see the sketch after this list
      • Added ray job submission client, cli and rest api (#19567, #19657, #19765, #19845, #19851, #19843, #19860, #19995, #20094, #20164, #20170, #20192, #20204)
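
      A sketch of what remote storage for runtime_env looks like in use (the bucket URI and the packaged file name are placeholders):

      import ray

      # working_dir can now point at a remote zip archive instead of a local directory.
      ray.init(runtime_env={"working_dir": "s3://my-bucket/my_job_files.zip"})

      @ray.remote
      def read_packaged_file():
          # Workers download and unpack the archive and run inside it.
          return open("config.yaml").read()

      print(ray.get(read_packaged_file.remote()))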

      💫Enhancements:

      • Garbage collection for runtime_env (#20009, #20072)
      • Improved logging and error messages for runtime_env (#19897, #19888, #18893)

      🔨 Fixes:

      • Fix runtime_env hanging issues (#19823)
      • Fix specifying runtime env in @ray.remote decorator with Ray Client (#19626)
      • Threaded actor / core worker / named actor race condition fixes (#19751, #19598, #20178, #20126)

      📖Documentation:

      • New page “Handling Dependencies”
      • New page “Ray Job Submission: Going from your laptop to production”

      Ray Java

      API Changes:

      • Fully supported namespace APIs. (Check out the namespace for more information.) #19468 #19986 #20057
      • Removed global named actor APIs and global placement group APIs. #20219 #20135
      • Added timeout parameter for Ray.Get() API. #20282

      Note:

      • Use Ray.getActor(name, namespace) API to get a named actor between jobs instead of Ray.getGlobalActor(name).
      • Use PlacementGroup.getPlacementGroup(name, namespace) API to get a placement group between jobs instead of PlacementGroup.getGlobalPlacementGroup(name).

      Ray Datasets

      🎉 New Features:

      • Added groupby and aggregations (#19435, #19673, #20010, #20035, #20044, #20074)
      • Support custom write paths (#19347)

      🔨 Fixes:

      • Support custom CSV write options (#19378)

      🏗 Architecture refactoring:

      • Optimized block compaction (#19681)

      Ray Workflow

      🎉 New Features:

      • Workflows now support events (#19239)
      • Allow users to specify metadata for workflows and steps (#19372)
      • Allow running a step in-place if the resources match (#19928)

      🔨 Fixes:

      • Fix the s3 path issue (#20115)

      RLlib

      🏗 Architecture refactoring:

      • “framework=tf2” + “eager_tracing=True” is now (almost) as fast as “framework=tf”. A check for tf2.x eager re-traces has been added making sure re-tracing does not happen outside the initial function calls. All CI learning tests (CartPole, Pendulum, FrozenLake) are now also run as framework=tf2. (#19273, #19981, #20109)
      • Prepare deprecation of build_trainer/build_(tf_)?policy utility functions. Instead, use sub-classing of Trainer or Torch|TFPolicy. POCs done for PGTrainer, PPO[TF|Torch]Policy. (#20055, #20061)
      • V-trace (APPO & IMPALA): Not dropping the last timestep can now optionally be switched on. The default is still to drop it, but this may change in a future release. (#19601)
      • Upgrade to gym 0.21. (#19535)

      🔨 Fixes:

      • Minor bugs/issues fixes and enhancements: #19069, #19276, #19306, #19408, #19544, #19623, #19627, #19652, #19693, #19805, #19807, #19809, #19881, #19934, #19945, #20095, #20128, #20134, #20144, #20217, #20283, #20366, #20387

      📖Documentation:

      • RLlib main page (“RLlib in 60sec”) overhaul. (#20215, #20248, #20225, #19932, #19982)
      • Major docstring cleanups in preparation for complete overhaul of API reference pages. (#19784, #19783, #19808, #19759, #19829, #19758, #19830)
      • Other documentation enhancements. (#19908, #19672, #20390)

      Tune

      💫Enhancements:

      • Refactored and improved experiment analysis (#20197, #20181)
      • Refactored cloud checkpointing API/SyncConfig (#20155, #20418, #19632, #19641, #19638, #19880, #19589, #19553, #20045, #20283)
      • Remove magic results (e.g. config) before calculating trial result metrics (#19583)
      • Removal of tech debt (#19773, #19960, #19472, #17654)
      • Improve testing (#20016, #20031, #20263, #20210, #19730)
      • Various enhancements (#19496, #20211)

      🔨Fixes:

      • Documentation fixes (#20130, #19791)
      • Tutorial fixes (#20065, #19999)
      • Drop 0 value keys from PGF (#20279)
      • Fix shim error message for scheduler (#19642)
      • Avoid looping through _live_trials twice in _get_next_trial. (#19596)
      • clean up legacy branch in update_avail_resources. (#20071)
      • fix Train/Tune integration on Client (#20351)

      Train

      Ray Train is now in Beta! The beta version includes various usability improvements for distributed PyTorch training and checkpoint management, support for Ray Client, and an integration with Ray Datasets for distributed data ingest.

      Check out the docs here, and the migration guide from Ray SGD to Ray Train here. If you are using Ray Train, we’d love to hear your feedback here!

      🎉 New Features:

      • New train.torch.prepare_model(...) and train.torch.prepare_data_loader(...) API to automatically handle preparing your PyTorch model and DataLoader for distributed training (#20254); see the sketch after this list.
      • Checkpoint management and support for custom checkpoint strategies (#19111).
      • Easily configure what and how many checkpoints to save to disk.
      • Support for Ray Client (#20123, #20351).
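
      A condensed sketch of the new preparation utilities (assuming the beta Trainer API shipped in this release; the model and data are toy placeholders):

      import torch
      import torch.nn as nn
      from torch.utils.data import DataLoader, TensorDataset
      from ray import train
      from ray.train import Trainer

      def train_func():
          # prepare_model wraps the model for distributed training and
          # prepare_data_loader adds a distributed sampler and device placement.
          model = train.torch.prepare_model(nn.Linear(4, 1))
          loader = train.torch.prepare_data_loader(
              DataLoader(TensorDataset(torch.randn(64, 4), torch.randn(64, 1)), batch_size=16))
          loss_fn = nn.MSELoss()
          optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
          for x, y in loader:
              optimizer.zero_grad()
              loss_fn(model(x), y).backward()
              optimizer.step()

      trainer = Trainer(backend="torch", num_workers=2)
      trainer.start()
      trainer.run(train_func)
      trainer.shutdown()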

      💫Enhancements:

      • Simplify workflow for training with a single worker (#19814).
      • Ray Placement Groups are used for scheduling the training workers (#20091).
      • PACK strategy is used by default but can be changed by setting the TRAIN_ENABLE_WORKER_SPREAD environment variable.
      • Automatically unwrap Torch DDP model and convert to CPU when saving a model as checkpoint (#20333).

      🔨Fixes:

      • Fix HorovodBackend to automatically detect NICs- thanks @tgaddair! (#19533).

      📖Documentation:

      • Denote public facing APIs with beta stability (#20378)
      • Doc updates (#20271)

      Serve

      We would love to hear from you! Fill out the Ray Serve survey here.

      🔨Fixes:

      • Serve deployment functions or classes can take no parameters (#19708)
      • Replica slow start message is improved. You can now see whether it is slow to allocate resources or slow to run constructor. (#19431)
      • pip install ray[serve] will now install ray[default] as well. (#19570)

      🏗 Architecture refactoring:

      • The terminology of “backend” and “endpoint” are officially deprecated in favor of “deployment”. (#20229, #20085, #20040, #20020, #19997, #19947, #19923, #19798).
      • Progress towards Java API compatibility (#19463).

      Dashboard

      • Ray Dashboard is now enabled on Windows! (#19575)

      Thanks

      Many thanks to all those who contributed to this release! @krfricke, @stefanbschneider, @ericl, @nikitavemuri, @qicosmos, @worldveil, @triciasfu, @AmeerHajAli, @javi-redondo, @architkulkarni, @pdames, @clay4444, @mGalarnyk, @liuyang-my, @matthewdeng, @suquark, @rkooo567, @mwtian, @chenk008, @dependabot[bot], @iycheng, @jiaodong, @scv119, @oscarknagg, @Rohan138, @stephanie-wang, @Zyiqin-Miranda, @ijrsvt, @roireshef, @tkaymak, @simon-mo, @ashione, @jovany-wang, @zenoengine, @tgaddair, @11rohans, @amogkam, @zhisbug, @lchu-ibm, @shrekris-anyscale, @pcmoritz, @yiranwang52, @mattip, @sven1977, @Yard1, @DmitriGekhtman, @ckw017, @WangTaoTheTonic, @wuisawesome, @kcpevey, @kfstorm, @rhamnett, @renos, @TeoZosa, @SongGuyang, @clarkzinzow, @avnishn, @iasoon, @gjoliver, @jjyao, @xwjiang2010, @dmatrix, @edoakes, @czgdp1807, @heng2j, @sungho-joo, @lixin-wei

    • ray-1.8.0(Nov 2, 2021)

      Highlights

      • Ray SGD has been rebranded to Ray Train! The new documentation landing page can be found here.
      • Ray Datasets is now in beta! The beta release includes a new integration with Ray Train yielding scalable ML ingest for distributed training. Check out the docs here, try it out for your ML ingest and batch inference workloads, and let us know how it goes!
      • This Ray release supports Apple Silicon (M1 Macs). Check out the installation instructions for more information!

      Ray Autoscaler

      🎉 New Features:

      • Fake multi-node mode for autoscaler testing (#18987)

      💫Enhancements:

      • Improve unschedulable task warning messages by integrating with the autoscaler (#18724)

      Ray Client

      💫Enhancements

      • Use async rpc for remote call and actor creation (#18298)

      Ray Core

      💫Enhancements

      • Eagerly install job-level runtime_env (#19449, #17949)

      🔨 Fixes:

      • Fixed resource demand reporting for infeasible 1-CPU tasks (#19000)
      • Fixed printing Python stack trace in Python worker (#19423)
      • Fixed macOS security popups (#18904)
      • Fixed thread safety issues for coreworker (#18902, #18910, #18913 #19343)
      • Fixed placement group performance and resource leaking issues (#19277, #19141, #19138, #19129, #18842, #18652)
      • Improve unschedulable task warning messages by integrating with the autoscaler (#18724)
      • Improved Windows support (#19014, #19062, #19171, #19362)
      • Fix runtime_env issues (#19491, #19377, #18988)

      Ray Data

      Ray Datasets is now in beta! The beta release includes a new integration with Ray Train yielding scalable ML ingest for distributed training. It supports repeating and rewindowing pipelines, zipping two pipelines together, better cancellation of Datasets workloads, and many performance improvements. Check out the docs here, try it out for your ML ingest and batch inference workloads, and let us know how it goes!

      🎉 New Features:

      • Ray Train integration (#17626)
      • Add support for repeating and rewindowing a DatasetPipeline (#19091)
      • .iter_epochs() API for iterating over epochs in a DatasetPipeline (#19217)
      • Add support for zipping two datasets together (#18833)
      • Transformation operations are now cancelled when one fails or the entire workload is killed (#18991)
      • Expose from_pandas()/to_pandas() APIs that accept/return plain Pandas DataFrames (#18992)
      • Customize compression, read/write buffer size, metadata, etc. in the IO layer (#19197)
      • Add spread resource prefix for manual round-robin resource-based task load balancing

      💫Enhancements:

      • Minimal rows are now dropped when doing an equalized split (#18953)
      • Parallelized metadata fetches when reading Parquet datasets (#19211)

      🔨 Fixes:

      • Tensor columns now properly support table slicing (#19534)
      • Prevent Datasets tasks from being captured by Ray Tune placement groups (#19208)
      • Empty datasets are properly handled in most transformations (#18983)

      🏗 Architecture refactoring:

      • Tensor dataset representation changed to a table with a single tensor column (#18867)

      RLlib

      🎉 New Features:

      • Allow n-step > 1 and prioritized replay for R2D2 and RNNSAC agents. (#18939)

      🔨 Fixes:

      • Fix memory leaks in TF2 eager mode. (#19198)
      • Faster worker spaces inference if specified through configuration. (#18805)
      • Fix bug for complex obs spaces containing Box([2D shape]) and discrete components. (#18917)
      • Torch multi-GPU stats not protected against race conditions. (#18937)
      • Fix SAC agent with dict space. (#19101)
      • Fix A3C/IMPALA in multi-agent setting. (#19100)

      🏗 Architecture refactoring:

      • Unify results dictionary returned from Trainer.train() across agents regardless of (tf or pytorch, multi-agent, multi-gpu, or algos that use >1 SGD iterations, e.g. ppo) (#18879)

      Ray Workflow

      🎉 New Features:

      • Introduce workflow.delete (#19178)

      🔨Fixes:

      • Fix the bug which allowed a workflow step to be executed multiple times (#19090)

      🏗 Architecture refactoring:

      • Object reference serialization is decoupled from workflow storage (#18328)

      Tune

      🎉 New Features:

      • PBT: Add burn-in period (#19321)

      💫Enhancements:

      • Optional forcible trial cleanup, return default autofilled metrics even if Trainable doesn't report at least once (#19144)
      • Use queue to display JupyterNotebookReporter updates in Ray client (#19137)
      • Add resume="AUTO" and enhance resume error messages (#19181)
      • Provide information about resource deadlocks, early stopping in Tune docs (#18947)
      • Fix HEBOSearch installation docs (#18861)
      • OptunaSearch: check compatibility of search space with evaluated_rewards (#18625)
      • Add save and restore methods for searchers that were missing it & test (#18760)
      • Add documentation for reproducible runs (setting seeds) (#18849)
      • Deprecate max_concurrent in TuneBOHB (#18770)
      • Add on_trial_result to ConcurrencyLimiter (#18766)
      • Ensure arguments passed to tune remote_run match (#18733)
      • Only disable ipython in remote actors (#18789)

      🔨Fixes:

      • Only try to sync driver if sync_to_driver is actually enabled (#19589)
      • sync_client: Fix delete template formatting (#19553)
      • Force no result buffering for hyperband schedulers (#19140)
      • Exclude trial checkpoints in experiment sync (#19185)
      • Fix how durable trainable is retained in global registry (#19223, #19184)
      • Ensure loc column in progress reporter is filled (#19182)
      • Deflake PBT Async test (#19135)
      • Fix Analysis.dataframe() documentation and enable passing of mode=None (#18850)

      Ray Train (SGD)

      Ray SGD has been rebranded to Ray Train! The new documentation landing page can be found here. Ray Train is integrated with Ray Datasets for distributed data loading while training, documentation available here.

      🎉 New Features:

      • Ray Datasets Integration (#17626)

      🔨Fixes:

      • Improved support for multi-GPU training (#18824, #18958)
      • Make actor creation async (#19325)

      📖Documentation:

      • Rename Ray SGD v2 to Ray Train (#19436)
      • Added migration guide from Ray SGD v1 (#18887)

      Serve

      🎉 New Features:

      • Add ability to recover from a checkpoint on cluster failure (#19125)
      • Support kwargs to deployment constructors (#19023)

      🔨Fixes:

      • Fix asyncio compatibility issue (#19298)
      • Catch spurious ConnectionErrors during shutdown (#19224)
      • Fix error with uris=None in runtime_env (#18874)
      • Fix shutdown logic with exit_forever (#18820)

      🏗 Architecture refactoring:

      • Progress towards Serve autoscaling (#18793, #19038, #19145)
      • Progress towards Java support (#18630)
      • Simplifications for long polling (#19154, #19205)

      Dashboard

      🎉 New Features:

      • Basic support for the dashboard on Windows (#19319)

      🔨Fixes:

      • Fix healthcheck issue causing the dashboard to crash under load (#19360)
      • Work around aiohttp 4.0.0+ issues (#19120)

      🏗 Architecture refactoring:

      • Improve dashboard agent retry logic (#18973)

      Thanks

      Many thanks to all those who contributed to this release! @rkooo567, @lchu-ibm, @scv119, @pdames, @suquark, @antoine-galataud, @sven1977, @mvindiola1, @krfricke, @ijrsvt, @sighingnow, @marload, @jmakov, @clay4444, @mwtian, @pcmoritz, @iycheng, @ckw017, @chenk008, @jovany-wang, @jjyao, @hauntsaninja, @franklsf95, @jiaodong, @wuisawesome, @odp, @matthewdeng, @duarteocarmo, @czgdp1807, @gjoliver, @mattip, @richardliaw, @max0x7ba, @Jasha10, @acxz, @xwjiang2010, @SongGuyang, @simon-mo, @zhisbug, @ccssmnn, @Yard1, @hazeone, @o0olele, @froody, @robertnishihara, @amogkam, @sasha-s, @xychu, @lixin-wei, @architkulkarni, @edoakes, @clarkzinzow, @DmitriGekhtman, @avnishn, @liuyang-my, @stephanie-wang, @Chong-Li, @ericl, @juliusfrost, @carlogrisetti

    • ray-1.7.0(Oct 7, 2021)

      Highlights

      • Ray SGD v2 is now in alpha! The v2 version introduces APIs that focus on ease of use and composability. Check out the docs here, and the migration guide from v1 to v2 here.
        • If you are using Ray SGD v2, we’d love to hear your feedback here!
      • Ray Workflows is now in alpha! Check out the docs here and try it out for your large-scale data science, ML, and long-running business workflows. Thanks to our early adopters for the feedback so far and the ongoing contributions from IBM Research.
      • We have made major enhancements to the C++ API! While we are still busy hardening the feature for production usage, please check out the docs here, try it out, and help provide feedback!

      Ray Autoscaler

      💫Enhancements:

      • Improvement to logging and code structure #18180
      • Default head node type to 0 max_workers #17757
      • Modifications to accommodate custom node providers #17312

      🔨 Fixes:

      • Helm chart configuration fixes #17678 #18123
      • GCP autoscaler config fix #18653
      • Allow attaching to uninitialized head node for debugging #17688
      • Syncing files with Docker head node fixed #16515

      Ray Client

      🎉 New Features:

      • ray.init() args can be forwarded to remote server (#17776)
      • Allow multiple client connections from one driver (#17942)
      • gRPC channel credentials can now be configured from ray.init (#18425, #18365)
      • Ray Client will attempt to recover connections on certain gRPC failures (#18329)

      💫Enhancements

      • Less confusing client RPC errors (#18278)
      • Use a single RPC to fetch ClientObjectRefs passed in a list (#16944)
      • Increase timeout for ProxyManager.get_channel (#18350)

      🔨 Fixes:

      • Fix mismatched debug log ID formats (#17597)
      • Fix confusing error messages when client scripts exit (#17969)

      Ray Core

      🎉 New Features:

      • Major enhancements in the C++ API!
        • This API library enables you to build a C++ distributed system easily, just like the Python API and the Java API.
        • Run pip install -U ray[cpp] to install Ray with C++ API support.
        • Run ray cpp --help to learn how to use it.
        • For more details, check out the docs here and see the tab “C++”.

      🔨 Fixes:

      • Bug fixes for thread-safety / reference count issues / placement group (#18401, #18746, #18312, #17802, #18526, #17863, #18419, #18463, #18193, #17774, #17772, #17670, #17620, #18584, #18646, #17634, #17732)
      • Better format for object loss errors / task & actor logs (#18742, #18577, #18105, #18292, #17971, #18166)
      • Improved the ray status output for placement groups (#18289, #17892)
      • Improved the function export performance (#18284)
      • Support more Ray core metrics such as RPC call latencies (#17578)
      • Improved error messages and logging for runtime environments (#18451, #18092, #18088, #18084, #18496, #18083)

      Ray Data Processing

      🎉 New Features:

      • Add support for reading partitioned Parquet datasets (#17716)
      • Add dataset unioning (#17793)
      • Add support for splitting a dataset at row indices (#17990)
      • Add from_numpy() and to_numpy() APIs (#18146)
      • Add support for splitting a dataset pipeline at row indices (#18243)
      • Add Modin integration (from_modin() and to_modin()) (#18122)
      • Add support for datasets with tensor columns (#18301)
      • Add RayDP (Spark-on-Ray) integration (from_spark() and to_spark()) (#17340)

      💫Enhancements

      • Drop empty tables when reading Parquet fragments, in order to properly support filter expressions when reading partitioned Parquet datasets (#18098)
      • Retry application-level errors in Datasets (#18296)
      • Create a directory on write if it doesn’t exist (#18435)
      • URL encode paths if they are URLs (#18440)
      • Guard against a dataset pipeline being read multiple times by accident (#18682)
      • Reduce working set size during random shuffles by eagerly destroying intermediate datasets (#18678)
      • Add manual round-robin resource-based load balancing option to read and shuffle stages (#18678)

      🔨 Fixes:

      • Fix JSON writing so IO roundtrip works (#17691)
      • Fix schema subsetting on column selection during Parquet reads (#18361)
      • Fix Dataset.iter_batches() dropping batches when prefetching (#18441)
      • Fix filesystem inference on path containing space (#18644)

      🏗 Architecture refactoring:

      • Port write side of IO layer to use file-based datasources (#18135)

      RLlib

      🎉 New Features:

      • Replay buffers: Add config option to store contents in checkpoints (store_buffer_in_checkpoints=True). (#17999)
      • Add support for multi-GPU to DDPG. (#17789)

      💫Enhancements:

      • Support for running evaluation and training in parallel, thereby only evaluating as many episodes as the training loop takes (evaluation_num_episodes=”auto”). (#18380)
      • Enhanced stability: Started nightly multi-GPU (2) learning tests for most algos (tf + torch), including LSTM and attention net setups.

      🏗 Architecture refactoring:

      • Make MultiAgentEnv inherit gym.Env to avoid direct class type manipulation (#18156)
      • SampleBatch: Add support for nested data (+Docstring- and API cleanups). (#17485)
      • Add policies arg to callback: on_episode_step (already exists in all other episode-related callbacks) (#18119)
      • Add worker arg (optional) to policy_mapping_fn. (#18184)

      🔨 Fixes:

      • Fix Atari learning test regressions (2 bugs) and 1 minor attention net bug. (#18306)
      • Fix n-step > 1 postprocessing bug (issues 17844, 18034). (#18358)
      • Fix crash when using StochasticSampling exploration (most PG-style algos) w/ tf and numpy version > 1.19.5 (#18366)
      • Strictly run evaluation_num_episodes episodes each evaluation run (no matter the other eval config settings). (#18335)
      • Issue 17706: AttributeError: 'numpy.ndarray' object has no attribute 'items' on certain turn-based MultiAgentEnvs with Dict obs space. (#17735)
      • Issue 17900: Set seed in single vectorized sub-envs properly, if num_envs_per_worker > 1 (#18110)
      • Fix R2D2 (torch) multi-GPU issue. (#18550)
      • Fix final_scale's default value to 0.02 (see OrnsteinUhlenbeck exploration). (#18070)
      • Ape-X doesn't take the value of prioritized_replay into account (#17541)
      • Issue 17653: Torch multi-GPU (>1) broken for LSTMs. (#17657)
      • Issue 17667: CQL-torch + GPU not working (due to simple_optimizer=False; must use simple optimizer!). (#17742)
      • Add locking to PolicyMap in case it is accessed by a RolloutWorker and the same worker's AsyncSampler or the main LearnerThread. (#18444)
      • Other fixes and enhancements: #18591, #18381, #18670, #18705, #18274, #18073, #18017, #18389, #17896, #17410, #17891, #18368, #17778, #18494, #18466, #17705, #17690, #18254, #17701, #18544, #17889, #18390, #18428, #17821, #17955, #17666, #18423, #18040, #17867, #17583, #17822, #18249, #18155, #18065, #18540, #18367, #17960, #17895, #18467, #17928, #17485, #18307, #18043, #17640, #17702, #15849, #18340

      Tune

      💫Enhancements:

      • Usability improvements for trials that appear to be stuck in the PENDING state forever because the cluster has insufficient resources. (#18611, #17957, #17533)
      • Searchers and Tune Callbacks now have access to some experiment settings information. (#17724, #17794)
      • Improve HyperOpt KeyError message when metric was not found. (#18549)
      • Allow users to configure bootstrap for docker syncer. (#17786)
      • Allow users to update trial resources on resume. (#17975)
      • Add max_concurrent_trials argument to tune.run. (#17905)
      • Type hint TrialExecutor. Use Abstract Base Class. (#17584)
      • Add developer/stability annotations. (#17442)

      🔨Fixes:

      • Placement group stability issues. (#18706, #18391, #18338)
      • Fix a DurableTrainable checkpointing bug. (#18318)
      • Fix a trial reset bug if a RLlib algorithm with default resources is used. (#18209)
      • Fix hyperopt points to evaluate for nested lists. (#18113)
      • Correctly validate initial points for random search. (#17282)
      • Fix local mode. Add explicit concurrency limiter for local mode. (#18023)
      • Sanitize trial checkpoint filename. (#17985)
      • Explicitly instantiate skopt categorical spaces. (#18005)

      SGD (v2)

      Ray SGD v2 is now in Alpha! The v2 version introduces APIs that focus on ease of use and composability. Check out the docs here, and the migration guide from v1 to v2 here. If you are using Ray SGD v2, we’d love to hear your feedback here!
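
      As a rough illustration of the v2 style, here is a minimal training sketch. The import path (ray.util.sgd.v2) and exact Trainer signature are assumptions for this release and may differ slightly; the docs linked above are authoritative.

      import torch

      # Assumed import path for SGD v2 in this release; later releases expose
      # the same concepts under ray.train.
      from ray.util.sgd.v2 import Trainer

      def train_func():
          # Any ordinary training loop can go here; it runs once per worker.
          model = torch.nn.Linear(4, 1)
          optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
          for _ in range(5):
              loss = model(torch.randn(8, 4)).sum()
              optimizer.zero_grad()
              loss.backward()
              optimizer.step()
          return loss.item()

      trainer = Trainer(backend="torch", num_workers=2)
      trainer.start()
      results = trainer.run(train_func)  # one result per worker
      trainer.shutdown()
      print(results)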

      🎉 New Features:

      • Ray SGD v2
        • Horovod Backend (#18047)
        • JSON Callback (#17619) and Tensorboard Callback (#17824)
        • Checkpointing Support (#17632, #17807)
        • Fault Tolerance (#18090)
        • Integration with Ray Tune (#17839, #18179)
        • Custom resources per worker (#18327)
        • Low-level Stateful Class API (#18728)

      Serve

      ↗️Deprecation and API changes:

      • serve.start(http_host=..., http_port=..., http_middlewares=...) has been deprecated since Ray 1.2.0. These arguments are now removed in favor of serve.start(http_options={"host": ..., "port": ..., "middlewares": ...}) (see the sketch after this list). (#17762)
      • Remove deprecated ServeRequest API (#18120)
      • Remove deprecated endpoints API (#17989)
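
      For reference, a minimal sketch of the new-style call from the first item above; the host and port values are illustrative:

      from ray import serve

      # One http_options dict replaces the removed http_host / http_port /
      # http_middlewares keyword arguments.
      serve.start(http_options={"host": "127.0.0.1", "port": 8000})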

      🎉 New Features:

      • Serve checkpoint with cluster failure recovery from disk and S3 (#17622, #18293, #18657)

      🔨Fixes:

      • Better serve constructor failure handling (#16922, #18402)
      • Fix get_handle execution from threads (#18198)
      • Remove requirement to specify namespace for serve.start(detached=True) (#17470)

      🏗 Architecture refactoring:

      • Progress towards replica autoscaling (#18658)

      Dashboard

      🎉 New Features:

      • Ray system events are now published in the experimental dashboard (#18330, #18698)
      • Actor page will now show actors with PENDING_CREATION status (#18666)

      Thanks

      Many thanks to all those who contributed to this release! @scottsun94, @hngenc, @iycheng, @asm582, @jkterry1, @ericl, @thomasdesr, @ryanlmelvin, @ellimac54, @Bam4d, @gjoliver, @juliusfrost, @simon-mo, @ashione, @RaphaelCS, @simonsays1980, @suquark, @jjyao, @lixin-wei, @77loopin, @Ivorforce, @DmitriGekhtman, @dependabot[bot], @souravraha, @robertnishihara, @richardliaw, @SongGuyang, @rkooo567, @edoakes, @jsuarez5341, @zhisbug, @clarkzinzow, @triciasfu, @architkulkarni, @akern40, @liuyang-my, @krfricke, @amogkam, @Jingyu-Peng, @xwjiang2010, @nikitavemuri, @hauntsaninja, @fyrestone, @navneet066, @ijrsvt, @mwtian, @sasha-s, @raulchen, @holdenk, @qicosmos, @Yard1, @yuduber, @mguarin0, @MissiontoMars, @stephanie-wang, @stefanbschneider, @sven1977, @AmeerHajAli, @matthewdeng, @chenk008, @jiaodong, @clay4444, @ckw017, @tchordia, @ThomasLecat, @Chong-Li, @jmakov, @jovany-wang, @tdhopper, @kfstorm, @wgifford, @mxz96102, @WangTaoTheTonic, @lada-kunc, @scv119, @kira-lin, @wuisawesome

    • ray-1.6.0(Aug 23, 2021)

      Highlights

      • Runtime Environments are ready for general use! This feature enables you to dynamically specify per-task, per-actor and per-job dependencies, including a working directory, environment variables, pip packages and conda environments. Install it with pip install -U 'ray[default]'.
      • Ray Dataset is now in alpha! Dataset is an interchange format for distributed datasets, powered by Arrow. You can also use it for a basic Ray native data processing experience. Check it out here.
      • Ray Lightning v0.1 has been released! You can install it via pip install ray-lightning. Ray Lightning is a library of PyTorch Lightning plugins for distributed training using Ray.
      • pip install ray now has a significantly reduced set of dependencies. Features such as the dashboard, the cluster launcher, runtime environments, and observability metrics may require pip install -U 'ray[default]' to be enabled. Please report any issues on Github if this is an issue!

      Ray Autoscaler

      🎉 New Features:

      • The Ray autoscaler now supports TPUs on GCP. Please refer to this example for spinning up a simple TPU cluster. (#17278)

      💫Enhancements:

      • Better AWS networking configurability (#17236 #17207 #14080)
      • Support for running autoscaler without NodeUpdaters (#17194, #17328)

      🔨 Fixes:

      • Code clean up and corrections to downscaling policy (#17352)
      • Docker file sync fix (#17361)

      Ray Client

      💫Enhancements:

      • Updated docs for client server ports and ray.init(ray://) (#17003, #17333)
      • Better error handling for deserialization failures (#17035)

      🔨 Fixes:

      • Fix for server proxy not working with non-default redis passwords (#16885)

      Ray Core

      🎉 New Features:

      • Runtime Environments are ready for general use!
        • Specify a working directory to upload your local files to all nodes in your cluster.
        • Specify different conda and pip dependencies for your tasks and actors and have them installed on the fly (see the sketch below).
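
      A minimal sketch of the per-task form is shown below; the package and environment variable are illustrative, and the decorator-level runtime_env option plus the exact set of supported fields in this release should be checked against the docs.

      import ray

      ray.init()

      # Per-task runtime environment (illustrative values): install a pip
      # package and set an environment variable on the fly for this task only.
      @ray.remote(runtime_env={"pip": ["requests"], "env_vars": {"MY_FLAG": "1"}})
      def probe():
          import os
          import requests  # available thanks to the pip field above
          return os.environ.get("MY_FLAG")

      print(ray.get(probe.remote()))  # expected: "1"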

      🔨 Fixes:

      • Fix plasma store bugs for better data processing stability (#16976, #17135, #17140, #17187, #17204, #17234, #17396, #17550)
      • Fix a placement group bug where CUDA_VISIBLE_DEVICES were not properly detected (#17318)
      • Improved Ray stacktrace messages. (#17389)
      • Improved GCS stability and scalability (#17456, #17373, #17334, #17238, #17072)

      🏗 Architecture refactoring:

      • Plasma store refactor for better testability and extensibility. (#17332, #17313, #17307)

      Ray Data Processing

      Ray Dataset is now in alpha! Dataset is an interchange format for distributed datasets, powered by Arrow. You can also use it for a basic Ray native data processing experience. Check it out here.
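
      For a quick feel of the alpha API, a tiny sketch using toy data (illustrative only):

      import ray

      ray.init()

      # Create a small distributed dataset, transform it, and pull rows back.
      ds = ray.data.range(1000)
      doubled = ds.map(lambda x: x * 2)
      print(doubled.take(5))   # [0, 2, 4, 6, 8]
      print(doubled.count())   # 1000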

      RLlib

      🎉 New Features:

      • Support for RNN/LSTM models with SAC (new agent: "RNNSAC"). Shoutout to ddworak94! (#16577)
      • Support for ONNX model export (tf and torch). (#16805)
      • Allow Policies to be added to/removed from a Trainer on-the-fly. (#17566)

      🔨 Fixes:

      • Fix for view requirements captured during compute actions test pass. Shoutout to Chris Bamford (#15856)

      • Issues 17397, 17425, 16715, 17174: When on the driver, Torch|TFPolicy should not use ray.get_gpu_ids() (because no GPUs are assigned by Ray). (#17444)

      • Other bug fixes: #15709, #15911, #16083, #16716, #16744, #16896, #16999, #17010, #17014, #17118, #17160, #17315, #17321, #17335, #17341, #17356, #17460, #17543, #17567, #17587

      🏗 Architecture refactoring:

      • CV2 to Skimage dependency change (CV2 still supported). Shoutout to Vince Jankovics. (#16841)
      • Unify tf and torch policies wrt. multi-GPU handling: PPO-torch is now 33% faster on Atari with 1 GPU. (#17371)
      • Implement all policy maps inside RolloutWorkers to be LRU-caches so that a large number of policies can be added on-the-fly w/o running out of memory. (#17031)
      • Move all tf static-graph code into DynamicTFPolicy, such that policies can be deleted and their tf-graph is GC'd. (#17169)
      • Simplify multi-agent configs: In most cases, creating dummy envs (only to retrieve spaces) is no longer necessary. (#16565, #17046)

      📖Documentation:

      • Example scripts do-over (shoutout to Stefan Schneider for this initiative).
      • Example script: League-based self-play with "open spiel" env. (#17077)
      • Other doc improvements: #15664 (shoutout to kk-55), #17030, #17530

      Tune

      🎉 New Features:

      • Dynamic trial resource allocation with ResourceChangingScheduler (#16787)
      • It is now possible to use a define-by-run function to generate a search space with OptunaSearcher (#17464)

      💫Enhancements:

      • String names of searchers/schedulers can now be used directly in tune.run (#17517)
      • Filter placement group resources if not in use (progress reporting) (#16996)
      • Add unit tests for flatten_dict (#17241)

      🔨Fixes:

      • Fix HDFS sync down template (#17291)
      • Re-enable TensorboardX without Torch installed (#17403)

      📖Documentation:

      • LightGBM integration (#17304)
      • Other documentation improvements: #17407 (shoutout to amavilla), #17441, #17539, #17503

      SGD

      🎉 New Features:

      • We have started initial development on a new RaySGD v2! We will be rolling it out in a future version of Ray. See the documentation here. (#17536, #17623, #17357, #17330, #17532, #17440, #17447, #17300, #17253)

      💫Enhancements:

      • Placement Group support for TorchTrainer (#17037)

      Serve

      🎉 New Features:

      • Add Ray API stability annotations to Serve, marking many serve.* APIs as Stable (#17295)
      • Support runtime_env's working_dir for Ray Serve (#16480)

      🔨Fixes:

      • Fix FastAPI's response_model not added to class based view routes (#17376)
      • Replace backend with deployment in metrics & logging (#17434)

      🏗Stability Enhancements:

      • Run Ray Serve with multi & single deployment large scale (1K+ cores) test running nightly (#17310, #17411, #17368, #17026, #17277)

      Thanks

      Many thanks to all who contributed to this release:

      @suquark, @xwjiang2010, @clarkzinzow, @kk-55, @mGalarnyk, @pdames, @Souphis, @edoakes, @sasha-s, @iycheng, @stephanie-wang, @antoine-galataud, @scv119, @ericl, @amogkam, @ckw017, @wuisawesome, @krfricke, @vakker, @qingyun-wu, @Yard1, @juliusfrost, @DmitriGekhtman, @clay4444, @mwtian, @corentinmarek, @matthewdeng, @simon-mo, @pcmoritz, @qicosmos, @architkulkarni, @rkooo567, @navneet066, @dependabot[bot], @jovany-wang, @kombuchafox, @thomasjpfan, @kimikuri, @Ivorforce, @franklsf95, @MissiontoMars, @lantian-xu, @duburcqa, @ddworak94, @ijrsvt, @sven1977, @kira-lin, @SongGuyang, @kfstorm, @Rohan138, @jamesmishra, @amavilla, @fyrestone, @lixin-wei, @stefanbschneider, @jiaodong, @richardliaw, @WangTaoTheTonic, @chenk008, @Catch-Bull, @Bam4d

    • ray-1.5.2(Aug 12, 2021)

    • ray-1.5.1(Jul 31, 2021)

    • ray-1.5.0(Jul 26, 2021)

      Ray 1.5.0 Release Notes

      Highlights

      • Ray Datasets is now in alpha (https://docs.ray.io/en/master/data/dataset.html)
      • LightGBM on Ray is now in beta (https://github.com/ray-project/lightgbm_ray).
        • enables multi-node and multi-GPU training
        • integrates seamlessly with distributed hyperparameter optimization library Ray Tune
        • comes with fault tolerance handling mechanisms, and
        • supports distributed dataframes and distributed data loading

      Ray Autoscaler

      🎉 New Features:

      • Aliyun support (#15712)

      💫 Enhancements:

      • [Kubernetes] Operator refactored to use Kopf package (#15787)
      • Flag to control config bootstrap for rsync (#16667)
      • Prometheus metrics for Autoscaler (#16066, #16198)
      • Allow launching in subnets where public IP assignment is off by default (#16816)

      🔨 Fixes:

      • [Kubernetes] Fix GPU=0 resource handling (#16887)
      • [Kubernetes] Release docs updated with K8s test instructions (#16662)
      • [Kubernetes] Documentation update (#16570)
      • [Kubernetes] All official images set to rayproject/ray:latest (#15988 #16205)
      • [Local] Fix bootstrapping ray at a given static set of ips (#16202, #16281)
      • [Azure] Fix Azure Autoscaling Failures (#16640)
      • Handle node type key change / deletion (#16691)
      • [GCP] Retry GCP BrokenPipeError (#16952)

      Ray Client

      🎉 New Features:

      • Client integrations with major Ray Libraries (#15932, #15996, #16103, #16034, #16029, #16111, #16301)
      • Client Connect now returns a context that has a disconnect method and can be used as a context manager (#16021)

      💫 Enhancements:

      • Better support for multi-threaded client-side applications (#16731, #16732)
      • Improved error messages and warnings when misusing Ray Client (#16454, #16508, #16588, #16163)
      • Made Client Object & Actor refs a subclass of their non-client counterparts (#16110)

      🔨 Fixes:

      • dir() Works for client-side Actor Handles (#16157)
      • Avoid server-side time-outs (#16554)
      • Various fixes to the client-server proxy (#16040, #16038, #16057, #16180)

      Ray Core

      🎉 New Features:

      • Ray dataset alpha is available!

      🔨 Fixes:

      • Fix various Ray IO layer issues that caused hanging & high memory usage (#16408, #16422, #16620, #16824, #16791, #16487, #16407, #16334, #16167, #16153, #16314, #15955, #15775)
      • Namespace now properly isolates placement groups (#16000)
      • More efficient object transfer for spilled objects (#16364, #16352)

      🏗 Architecture refactoring:

      • Starting with Ray 1.5.0, the liveness of Ray jobs is guaranteed as long as there is enough disk space on the machines, thanks to the “fallback allocator” mechanism, which allocates plasma objects directly on disk when objects cannot be created in memory or spilled to disk.

      RLlib

      🎉 New Features:

      • Support for adding/deleting Policies to a Trainer on-the-fly (#16359, #16569, #16927).
      • Added new “input API” for customizing offline datasets (shoutout to Julius F.). (#16957)
      • Allow for external env PolicyServer to listen on n different ports (given n rollout workers); No longer require creating an env on the server side to get env’s spaces. (#16583).

      🔨 Fixes:

      • CQL: Bug fixes and clean-ups (fixed iteration count). (#16531, #16332)
      • D4RL: #16721
      • Ensure curiosity exploration actions are passed in as tf tensors (shoutout to Manny V.). (#15704)
      • Other bug fixes and cleanups: #16162 and #16309 (shoutout to Chris B.), #15634, #16133, #16860, #16813, #16428, #16867, #16354, #16218, #16118, #16429, #16427, #16774, #16734, #16019, #16171, #16830, #16722

      📖 Documentation and testing:

      • #16311, #15908, #16271, #16080, #16740, #16843

      🏗 Architecture refactoring:

      • All RLlib algos operating on Box action spaces now operate on normalized actions by default (ranging from -1.0 to 1.0). This enables PG-style algos to learn in skewed action spaces. (#16531)

      Tune

      🎉 New Features:

      • New integration with LightGBM via Tune callbacks (#16713).
      • New cost-efficient HPO searchers (BlendSearch and CFO) available from the FLAML library (https://github.com/microsoft/FLAML). (#16329)

      💫 Enhancements:

      • Pass in configurations that have already been evaluated separately to Searchers. This is useful for warm-starting or for meta-searchers, for example (#16485)
      • Sort trials in reporter table by metric (#16576)
      • Add option to keep random values constant over grid search (#16501)
      • Read trial results from json file (#15915)

      🔨 Fixes:

      • Fix infinite loop when using Searcher that limits concurrency internally in conjunction with a ConcurrencyLimiter (#16416)
      • Allow custom sync configuration with DurableTrainable (#16739)
      • Logger fixes. W&B: #16806, #16674, #16839. MLflow: #16840
      • Various bug fixes: #16844, #16017, #16575, #16675, #16504, #15811, #15899, #16128, #16396, #16695, #16611

      📖 Documentation and testing:

      • Use BayesOpt for quick start example (#16997)
      • #16793, #16029, #15932, #16980, #16450, #16709, #15913, #16754, #16619

      SGD

      🎉 New Features:

      • Torch native mixed precision is now supported! (#16382)

      🔨 Fixes:

      • Use target label count for training batch size (#16400)

      📖 Documentation and testing:

      • #15999, #16111, #16301, #16046

      Serve

      💫 Enhancements:

      • UX improvements (#16227, #15909)
      • Improved logging (#16468)

      🔨 Fixes:

      • Fix shutdown logic (#16524)
      • Assorted bug fixes (#16647, #16760, #16783)

      📖 Documentation and testing:

      • #16042, #16631, #16759, #16786

      Thanks

      Many thanks to all who contributed to this release:

      @Tonyhao96, @simon-mo, @scv119, @Yard1, @llan-ml, @xcharleslin, @jovany-wang, @ijrsvt, @max0x7ba, @annaluo676, @rajagurunath, @zuston, @amogkam, @yorickvanzweeden, @mxz96102, @chenk008, @Bam4d, @mGalarnyk, @kfstorm, @crdnb, @suquark, @ericl, @marload, @jiaodong, @thexiang, @ellimac54, @qicosmos, @mwtian, @jkterry1, @sven1977, @howardlau1999, @mvindiola1, @stefanbschneider, @juliusfrost, @krfricke, @matthewdeng, @zhuangzhuang131419, @brandonJY, @Eleven1Liu, @nikitavemuri, @richardliaw, @iycheng, @stephanie-wang, @HuangLED, @clarkzinzow, @fyrestone, @asm582, @qingyun-wu, @ckw017, @yncxcw, @DmitriGekhtman, @benjamindkilleen, @Chong-Li, @kathryn-zhou, @pcmoritz, @rodrigodelazcano, @edoakes, @dependabot[bot], @pdames, @frenkowski, @loicsacre, @gabrieleoliaro, @achals, @thomasjpfan, @rkooo567, @dibgerge, @clay4444, @architkulkarni, @lixin-wei, @ConeyLiu, @WangTaoTheTonic, @AnnaKosiorek, @wuisawesome, @gramhagen, @zhisbug, @franklsf95, @vakker, @jenhaoyang, @liuyang-my, @chaokunyang, @SongGuyang, @tgaddair

    • ray-1.4.1(Jun 30, 2021)

      Release 1.4.1 Notes

      Ray Python Wheels

      Python 3.9 wheels (Linux / MacOS / Windows) are available (#16347 #16586)

      Ray Autoscaler

      🔨 Fixes: On-prem bug resolved (#16281)

      Ray Client

      💫Enhancements:

      • Add warnings when many tasks scheduled (#16454)
      • Better error messages (#16163)

      🔨 Fixes:

      • Fix gRPC Timeout Options (#16554)
      • Disconnect on dataclient error (#16588)

      Ray Core

      🔨 Fixes:

      • Runtime Environments: Fix race condition leading to failed imports (#16278)
      • Don't broadcast empty resources data (#16104)
      • Fix async actor lost object bug (#16414)
      • Always report job timestamps in milliseconds (#16455, #16545, #16548)
      • Multi-node placement group and job config bug fixes (#16345)
      • Fix bug in task dependency management for duplicate args (#16365)
      • Unify Python and core worker ids (#16712)

      Dask

      💫Enhancements: Dask 2021.06.1 support (#16547)

      Tune

      💫Enhancements: Support object refs in with_params (#16753)

      Serve

      🔨Fixes: Ray serve shutdown goes through Serve controller (#16524)

      Java

      🔨Fixes: Upgrade dependencies to fix CVEs (#16650, #16657)

      Documentation

      • Runtime Environments (#16290)
      • Feature contribution [Tune] (#16477)
      • Ray design patterns and anti-patterns (#16478)
      • PyTorch Lightning (#16484)
      • Ray Client (#16497)
      • Ray Deployment (#16538)
      • Dask version compatibility (#16595)

      CI

      Move wheel and Docker image upload from Travis to Buildkite (#16138 #16241)

      Thanks

      Many thanks to all those who contributed to this release!

      @rkooo567, @clarkzinzow, @WangTaoTheTonic, @ckw017, @stephanie-wang, @Yard1, @mwtian, @jovany-wang, @jiaodong, @wuisawesome, @krfricke, @architkulkarni, @ijrsvt, @simon-mo, @DmitriGekhtman, @amogkam, @richardliaw

    • ray-1.4.0(Jun 7, 2021)

      Release 1.4.0 Notes

      Ray Autoscaler

      🎉 New Features:

      • Support Helm Chart for deploying Ray on Kubernetes
      • Key Autoscaler metrics are now exported via Prometheus!

      💫Enhancements

      • Better error messages when a node fails to come online

      🔨 Fixes:

      • Stability and interface fixes for Kubernetes deployments.
      • Fixes to Azure NodeProvider

      Ray Client

      🎉 New Features:

      • Complete API parity with non-client mode
      • Experimental ClientBuilder API (docs here)
      • Full Asyncio support

      💫Enhancements

      • Keep Alive for Messages for long lived connections
      • Improved pickling error messages

      🔨 Fixes:

      • Client Disconnect can be called multiple times
      • Client Reference Equality Check
      • Many bug fixes and tests for the complete ray API!

      Ray Core

      🎉 New Features:

      • Namespaces (check out the docs)! Note: this may be a breaking change if you’re using detached actors (set ray.init(namespace="") for backwards compatible behavior). See the sketch below.
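
      A minimal sketch of the namespace-aware pattern (the namespace and actor names are arbitrary):

      import ray

      # Connect under an explicit namespace; named (especially detached) actors
      # are now scoped to this namespace rather than being global.
      ray.init(namespace="my_project")

      @ray.remote
      class Counter:
          def __init__(self):
              self.n = 0

          def increment(self):
              self.n += 1
              return self.n

      counter = Counter.options(name="shared_counter", lifetime="detached").remote()
      print(ray.get(counter.increment.remote()))

      # A later driver that also calls ray.init(namespace="my_project") can
      # retrieve the same actor with ray.get_actor("shared_counter").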

      🔨 Fixes:

      • Support increment by arbitrary number with ray.util.metrics.Counter
      • Various bug fixes for the placement group APIs including the GPU assignment bug (#15049).

      🏗 Architecture refactoring:

      • Increase the efficiency and robustness of resource reporting

      Ray Data Processing

      🔨 Fixes:

      • Various bug fixes for better stability (#16063, #14821, #15669, #15757, #15431, #15426, #15034, #15071, #15070, #15008, #15955)
      • Fixed a critical bug where the driver uses excessive memory usage when there are many objects in the cluster (#14322).
      • Dask on Ray and Modin can now be run with Ray client

      🏗 Architecture refactoring:

      • Ray 100TB shuffle results: https://github.com/ray-project/ray/issues/15770
      • More robust memory management subsystem is in progress (#15157, #15027)

      RLlib

      🎉 New Features:

      • PyTorch multi-GPU support (#14709, #15492, #15421).
      • CQL TensorFlow support (#15841).
      • Task-settable Env/Curriculum Learning API (#15740).
      • Support for native tf.keras Models (no ModelV2 required) (#14684, #15273).
      • Trainer.train() and Trainer.evaluate() can run in parallel (optional) (#15040, #15345).

      💫Enhancements and documentation:

      • CQL: Bug fixes and confirmed MuJoCo benchmarks (#15814, #15603, #15761).
      • Example for differentiable neural computer (DNC) network (#14844, #15939).
      • Added support for int-Box action spaces. (#15012)
      • DDPG/TD3/A[23]C/MARWIL/BC: Code cleanup and type annotations. (#14707).
      • Example script for restoring 1 agent out of n
      • Examples for fractional GPU usage. (#15334)
      • Enhanced documentation page describing example scripts and blog posts (#15763).
      • Various enhancements/test coverage improvements: #15499, #15454, #15335, #14865, #15525, #15290, #15611, #14801, #14903, #15735, #15631

      🔨 Fixes:

      • Memory Leak in multi-agent environment (#15815). Shoutout to Bam4d!
      • DDPG PyTorch GPU bug. (#16133)
      • Simple optimizer should not be used by default for tf+MA (#15365)
      • Various bug fixes: #15762, #14843, #15042, #15427, #15871, #15132, #14840, #14386, #15014, #14737, #15015, #15733, #15737, #15736, #15898, #16118, #15020, #15218, #15451, #15538, #15610, #15326, #15295, #15762, #15436, #15558, #15937

      🏗 Architecture refactoring:

      • Remove atari dependency (#15292).
      • Trainer._evaluate() renamed to Trainer.evaluate() (backward compatible); Trainer.evaluate() can be called even w/o evaluation worker set, if create_env_on_driver=True (#15591).

      Tune

      🎉 New Features:

      • ASHA scheduler now supports save/restore. (#15438)
      • Add HEBO to search algorithm shim function (#15468)
      • Add SkoptSearcher/Bayesopt Searcher restore functionality (#15075)

      💫Enhancements:

      • We now document scalability best practices (k8s, scalability thresholds). You can find this here (#14566)
      • You can now set the result buffer_length via tune.run - this helps with trials that report too frequently. (#15810)
      • Support numpy types in TBXlogger (#15760)
      • Add max_concurrent option to BasicVariantGenerator (#15680)
      • Add seed parameter to OptunaSearch (#15248)
      • Improve BOHB/ConfigSpace dependency check (#15064)

      🔨Fixes:

      • Reduce default number of maximum pending trials to max(16, cluster_cpus) (#15628)
      • Return normalized checkpoint path (#15296)
      • Escape paths before globbing in TrainableUtil.get_checkpoints_paths (#15368)
      • Optuna Searcher: Set correct Optuna TrialState on trial complete (#15283)
      • Fix type annotation in tune.choice (#15038)
      • Avoid system exit error by using del when cleaning up actors (#15687)

      Serve

      🎉 New Features:

      • As of Ray 1.4, Serve has a new API centered around the concept of “Deployments.” Deployments offer a more streamlined API and can be declaratively updated, which should improve both development and production workflows. The existing APIs have not changed from Ray 1.4 and will continue to work until Ray 1.5, at which point they will be removed (see the package reference if you’re not sure about a specific API). Please see the migration guide for details on how to update your existing Serve application to use this new API.
      • New serve.deployment API: @serve.deployment, serve.get_deployments, serve.list_deployments (#14935, #15172, #15124, #15121, #14953, #15152, #15821) (see the sketch after this list)
      • New serve.ingress(fastapi_app) API (#15445, #15441, #14858)
      • New @serve.batch decorator in favor of legacy max_batch_size in backend config (#15065)
      • serve.start() is now idempotent (#15148)
      • Added support for handle.method_name.remote() (#14831)
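
      A minimal sketch of the deployment-centric workflow described above (the class and its response are illustrative):

      from ray import serve

      serve.start()

      # Declare a deployment; redeploying an updated class performs a rolling update.
      @serve.deployment
      class Greeter:
          def __call__(self, request):
              return "Hello from a Serve deployment!"

      Greeter.deploy()
      print(serve.list_deployments())  # {"Greeter": Deployment(...)}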

      🔨Fixes:

      • Rolling updates for redeployments (#14803)
      • Latency improvement by using pickle (#15945)
      • Controller and HTTP proxy uses num_cpus=0 by default (#15000)
      • Health checking in the controller instead of using max_restarts (#15047)
      • Use longest prefix matching for path routing (#15041)

      Dashboard

      🔨Fixes:

      • Add object store memory column (#15697)
      • Add object store stats to dashboard API. (#15677)
      • Remove disk data from the dashboard when running on K8s. (#14676)
      • Fix reported dashboard ip when using 0.0.0.0 (#15506)

      Thanks

      Many thanks to all those who contributed to this release!

      @clay4444, @Fabien-Couthouis, @mGalarnyk, @smorad, @ckw017, @ericl, @antoine-galataud, @pleiadesian, @DmitriGekhtman, @robertnishihara, @Bam4d, @fyrestone, @stephanie-wang, @kfstorm, @wuisawesome, @rkooo567, @franklsf95, @micahtyong, @WangTaoTheTonic, @krfricke, @hegdeashwin, @devin-petersohn, @qicosmos, @edoakes, @llan-ml, @ijrsvt, @richardliaw, @Sertingolix, @ffbin, @simjay, @AmeerHajAli, @simon-mo, @tom-doerr, @sven1977, @clarkzinzow, @mxz96102, @SebastianBo1995, @amogkam, @iycheng, @sumanthratna, @Catch-Bull, @pcmoritz, @architkulkarni, @stefanbschneider, @tgaddair, @xcharleslin, @cthoyt, @fcardoso75, @Jeffwan, @mvindiola1, @michaelzhiluo, @rlan, @mwtian, @SongGuyang, @YeahNew, @kathryn-zhou, @rfali, @jennakwon06, @Yeachan-Heo

    • ray-1.3.0(Apr 22, 2021)

      Release v1.3.0 Notes

      Highlights

      • We are now testing and publishing Ray's scalability limits with each release, see: https://github.com/ray-project/ray/tree/releases/1.3.0/benchmarks
      • Ray Client is now usable by default with any Ray cluster started by the Ray Cluster Launcher.

      Ray Cluster Launcher

      💫Enhancements:

      • Observability improvements (#14816, #14608)
      • Worker nodes no longer killed on autoscaler failure (#14424)
      • Better validation for min_workers and max_workers (#13779)
      • Auto detect memory resource for AWS and K8s (#14567)
      • On autoscaler failure, propagate error message to drivers (#14219)
      • Avoid launching GPU nodes when the workload only has CPU tasks (#13776)
      • Autoscaler/GCS compatibility (#13970, #14046, #14050)
      • Testing (#14488, #14713)
      • Migration of configs to multi-node-type format (#13814, #14239)
      • Better config validation (#14244, #13779)
      • Node-type max workers now defaults to infinity (#14201)

      🔨 Fixes:

      • AWS configuration (#14868, #13558, #14083, #13808)
      • GCP configuration (#14364, #14417)
      • Azure configuration (#14787, #14750, #14721)
      • Kubernetes (#14712, #13920, #13720, #14773, #13756, #14567, #13705, #14024, #14499, #14593, #14655)
      • Other (#14112, #14579, #14002, #13836, #14261, #14286, #14424, #13727, #13966, #14293, #14293, #14718, #14380, #14234, #14484)

      Ray Client

      💫Enhancements:

      • Version checks for Python and client protocol (#13722, #13846, #13886, #13926, #14295)
      • Validate server port number (#14815)
      • Enable Ray client server by default (#13350, #13429, #13442)
      • Disconnect ray upon client deactivation (#13919)
      • Convert Ray objects to Ray client objects (#13639)
      • Testing (#14617, #14813, #13016, #13961, #14163, #14248, #14630, #14756, #14786)
      • Documentation (#14422, #14265)

      🔨 Fixes:

      • Hook runtime context (#13750)
      • Fix mutual recursion (#14122)
      • Set gRPC max message size (#14063)
      • Monitor stream errors (#13386)
      • Fix dependencies (#14654)
      • Fix ray.get ctrl-c (#14425)
      • Report error deserialization errors (#13749)
      • Named actor refcounting fix (#14753)
      • RayTaskError serialization (#14698)
      • Multithreading fixes (#14701)

      Ray Core

      🎉 New Features:

      • We are now testing and publishing Ray's scalability limits with each release. Check out https://github.com/ray-project/ray/tree/releases/1.3.0/benchmarks.
      • [alpha] Ray-native Python-based collective communication primitives for Ray clusters with distributed CPUs or GPUs.

      🔨 Fixes:

      • Ray is now using C++14.
      • Fixed high CPU breaking raylets with heartbeat missing errors (#13963, #14301)
      • Fixed high CPU issues from raylet during object transfer (#13724)
      • Improvements in placement group APIs, including better Java support (#13821, #13858, #13582, #15049)

      Ray Data Processing

      🎉 New Features:

      • Object spilling is turned on by default. Check out the documentation.
      • Dask-on-Ray and Spark-on-Ray are fully ready to use. Please try them out and give us feedback!
      • Dask-on-Ray is now compatible with Dask 2021.4.0.
      • Dask-on-Ray now works natively with dask.persist().

      🔨 Fixes:

      • Various improvements in object spilling and memory management layer to support large scale data processing (#13649, #14149, #13853, #13729, #14222, #13781, #13737, #14288, #14578, #15027)
      • The lru_evict flag is now deprecated. The recommended solution is to use object spilling.

      🏗 Architecture refactoring:

      • Various architectural improvements in object spilling and memory management. For more details, check out the whitepaper.
      • Locality-aware scheduling is turned on by default.
      • Moved from centralized GCS-based object directory protocol to decentralized owner-to-owner protocol, yielding better cluster scalability.

      RLlib

      🎉 New Features:

      • R2D2 implementation for torch and tf. (#13933)
      • PlacementGroup support (all RLlib algos now return PlacementGroupFactory from Trainer.default_resource_request). (#14289)
      • Multi-GPU support for tf-DQN/PG/A2C. (#13393)

      💫Enhancements:

      • Documentation: Update documentation for Curiosity's support of continuous actions (#13784); CQL documentation (#14531)
      • Attention-wrapper works with images and supports prev-n-actions/rewards options. (#14569)
      • rllib rollout runs in parallel by default via Trainer’s evaluation worker set. (#14208)
      • Add env rendering (customizable) and video recording options (for non-local mode; >0 workers; +evaluation-workers) and episode media logging. (#14767, #14796)
      • Allow SAC to use custom models as Q- or policy nets and deprecate "state-preprocessor" for image spaces. (#13522)
      • Example Scripts: Add coin game env + matrix social dilemma env + tests and examples (shoutout to Maxime Riché!). (#14208); Attention net (#14864); Serve + RLlib. (#14416); Env seed (#14471); Trajectory view API (enhancements and tf2 support). (#13786); Tune trial + checkpoint selection. (#14209)
      • DDPG: Add support for simplex action space. (#14011)
      • Others: on_learn_on_batch callback allows custom metrics. (#13584); Add TorchPolicy.export_model(). (#13989)

      🔨 Fixes:

      • Trajectory View API bugs (#13646, #14765, #14037, #14036, #14031, #13555)
      • Test cases (#14620, #14450, #14384, #13835, #14357, #14243)
      • Others (#13013, #14569, #13733, #13556, #13988, #14737, #14838, #15272, #13681, #13764, #13519, #14038, #14033, #14034, #14308, #14243)

      🏗 Architecture refactoring:

      • Remove all non-trajectory view API code. (#14860)
      • Obsolete UsageTrackingDict in favor of SampleBatch. (#13065)

      Tune

      🎉 New Features:

      • We added a new searcher HEBOSearcher (#14504, #14246, #13863, #14427)
      • Tune is now natively compatible with the Ray Client (#13778, #14115, #14280)
      • Tune now uses Ray’s Placement Groups underneath the hood. This will enable much faster autoscaling and training (for distributed trials) (#13906, #15011, #14313)

      💫Enhancements:

      • Checkpointing improvements (#13376, #13767)
      • Optuna Search Algorithm improvements (#14731, #14387)
      • tune.with_parameters now works with Class API (#14532)

      🔨Fixes:

      • BOHB & Hyperband fixes (#14487, #14171)
      • Nested metrics improvements (#14189, #14375, #14379)
      • Fix non-deterministic category sampling (#13710)
      • Type hints (#13684)
      • Documentation (#14468, #13880, #13740)
      • Various issues and bug fixes (#14176, #13939, #14392, #13812, #14781, #14150, #14850, #14118, #14388, #14152, #13825, #13936)

      SGD

      • Add fault tolerance during worker startup (#14724)

      Serve

      🎉 New Features:

      • Added metadata to default logger in backend replicas (#14251)
      • Added more metrics for ServeHandle stats (#13640)
      • Deprecated system-level batching in favor of @serve.batch (#14610, #14648)
      • Beta support for Serve with Ray client (#14163)
      • Use placement groups to bypass autoscaler throttling (#13844)
      • Deprecate client-based API in favor of process-wide singleton (#14696)
      • Add initial support for FastAPI ingress (#14754)

      🔨 Fixes:

      • Fix ServeHandle serialization (#13695)

      🏗 Architecture refactoring:

      • Refactor BackendState to support backend versioning and add more unit testing (#13870, #14658, #14740, #14748)
      • Optimize long polling to be per-key (#14335)

      Dashboard

      🎉 New Features:

      • Dashboard now supports being served behind a reverse proxy. (#14012)
      • Disk and network metrics are added to prometheus. (#14144)

      💫Enhancements:

      • Better CPU & memory information on K8s. (#14593, #14499)
      • Progress towards a new scalable dashboard. (#13790, #11667, #13763, #14333)

      Thanks

      Many thanks to all those who contributed to this release: @geraint0923, @iycheng, @yurirocha15, @brian-yu, @harryge00, @ijrsvt, @wumuzi520, @suquark, @simon-mo, @clarkzinzow, @RaphaelCS, @FarzanT, @ob, @ashione, @ffbin, @robertnishihara, @SongGuyang, @zhe-thoughts, @rkooo567, @Ezra-H, @acxz, @clay4444, @QuantumMecha, @jirkafajfr, @wuisawesome, @Qstar, @guykhazma, @devin-petersohn, @jeroenboeye, @ConeyLiu, @dependabot[bot], @fyrestone, @micahtyong, @javi-redondo, @Manuscrit, @mxz96102, @EscapeReality846089495, @WangTaoTheTonic, @stanislav-chekmenev, @architkulkarni, @Yard1, @tchordia, @zhisbug, @Bam4d, @niole, @yiranwang52, @thomasjpfan, @DmitriGekhtman, @gabrieleoliaro, @jparkerholder, @kfstorm, @andrew-rosenfeld-ts, @erikerlandson, @Crissman, @raulchen, @sumanthratna, @Catch-Bull, @chaokunyang, @krfricke, @raoul-khour-ts, @sven1977, @kathryn-zhou, @AmeerHajAli, @jovany-wang, @amogkam, @antoine-galataud, @tgaddair, @randxie, @ChaceAshcraft, @ericl, @cassidylaidlaw, @TanjaBayer, @lixin-wei, @lena-kashtelyan, @cathrinS, @qicosmos, @richardliaw, @rmsander, @jCrompton, @mjschock, @pdames, @barakmich, @michaelzhiluo, @stephanie-wang, @edoakes

    • ray-1.2.0(Feb 13, 2021)

      Release v1.2.0 Notes

      Highlights

      • Ray client is now in beta! Check out more details here: https://docs.ray.io/en/master/ray-client.html
      • XGBoost-Ray is now in beta! Check out more details about this project at https://github.com/ray-project/xgboost_ray.
      • Check out the Serve migration guide: https://docs.google.com/document/d/1CG4y5WTTc4G_MRQGyjnb_eZ7GK3G9dUX6TNLKLnKRAc/edit
      • Ray’s C++ support is now in beta: https://docs.ray.io/en/master/#getting-started-with-ray
      • An alpha version of object spilling is now available: https://docs.ray.io/en/master/memory-management.html#object-spilling

      Ray Autoscaler

      🎉 New Features:

      • A new autoscaler output format in monitor.log (#12772, #13561)
      • Piping autoscaler events to driver logs (#13434)

      💫Enhancements

      • Full support of the ray.autoscaler.sdk.request_resources() API (https://docs.ray.io/en/master/cluster/autoscaling.html?highlight=request_resources#ray.autoscaler.sdk.request_resources); see the sketch after this list.
      • Make placement groups bypass max launch limit (#13089)
      • [K8s] Retry getting home directory in command runner. (#12925)
      • [docker] Pull if image is not present (#13136)
      • [Autoscaler] Ensure ubuntu is owner of docker host mount folder (#13579)
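
      A minimal sketch of the SDK call referenced in the first item above (the resource amounts are illustrative):

      from ray.autoscaler.sdk import request_resources

      # Ask the autoscaler to scale the cluster to fit at least 64 CPUs...
      request_resources(num_cpus=64)

      # ...or to fit a specific set of resource bundles, e.g. four 1-GPU slots.
      request_resources(bundles=[{"GPU": 1}] * 4)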

      🔨 Fixes:

      • Many autoscaler bug fixes (#12952, #12689, #13058, #13671, #13637, #13588, #13505, #13154, #13151, #13138, #13008, #12980, #12918, #12829, #12714, #12661, #13567, #13663, #13623, #13437, #13498, #13472, #13392, #12514, #13325, #13161, #13129, #12987, #13410, #12942, #12868, #12866, #12865, #12098, #12609)

      RLlib

      🎉 New Features:

      • Fast Attention Nets (using the trajectory view API) (#12753).
      • Attention Nets: Full PyTorch support (#12029).
      • Attention Nets: Support auto-wrapping around default or custom models by specifying "use_attention=True" in the model's config. This now works completely analogously to "use_lstm=True". (#11698)
      • New Offline RL Algorithm: CQL (based on SAC) (#13118).
      • MAML: Discrete actions support (added CartPole mass test case).
      • Support Atari framestacking via the trajectory view API (#13315).
      • Support for D4RL environments/benchmarks (#13550).
      • Preliminary work on JAX support (#13077, #13091).

      💫 Enhancements:

      • Rollout lengths: Allow unit to be configured as “agent_steps” in multi-agent settings (default: “env_steps”) (#12420).
      • TFModelV2: Soft-deprecate register_variables and unify var names wrt TorchModelV2 (#13339, #13363).

      📖 Documentation:

      • Added documentation on Model building API (#13260, #13261).
      • Added documentation for the trajectory view API. (#12718)
      • Added documentation for SlateQ (#13266).
      • Readme.md documentation for almost all algorithms in rllib/agents (#12943, #13035).
      • Type annotations for the “rllib/execution” folder (#12760, #13036).

      🔨 Fixes:

      • MARWIL and BC: Add grad-clipping config option to stabilize learning (#13455).
      • A3C: Solve PyTorch- and TF-eager async race condition between calling model and its value function (#13467).
      • Various issue and bug fixes (#12619, #12682, #12704, #12706, #12708, #12765, #12786, #12787, #12793, #12832, #12844, #12846, #12915, #12941, #13039, #13040, #13064, #13083, #13121, #13126, #13237, #13238, #13308, #13332, #13397, #13459, #13553).

      🏗 Architecture refactoring:

      • The env directory has been cleaned up and is now divided into a core part (rllib/env) with all basic env classes, and rllib/env/wrappers containing third-party wrapper classes (Atari, Unity3D, etc.). (#13082)

      Tune

      💫 Enhancements

      • Ray Tune now uses ray.cloudpickle underneath the hood, allowing you to checkpoint large models (>4GB) (#12958).
      • Using the 'reuse_actors' flag can now speed up training for general Trainable API usage. (#13549)
      • Ray Tune will now automatically buffer results from trainables, allowing you to use an arbitrary reporting frequency on your training functions. (#13236)
      • Ray Tune now has a variety of experiment stoppers (#12750)
      • Ray Tune now supports an integer loguniform search space distribution (#12994)
      • Ray Tune now has initial support for the Ray placement group API. (#13370)
      • The Weights and Bias integration (WandbLogger) now also accepts wandb.data_types.Video (#13169)
      • The Hyperopt integration (HyperoptSearch) can now directly accept category variables instead of indices (#12715)
      • Ray Tune now supports experiment checkpointing when using grid search (#13357)

      🔨Fixes and Updates

      • The Optuna integration was updated to support the 2.4.0 API while maintaining backwards compatibility (#13631)
      • All search algorithms now support points_to_evaluate (#12790, #12916)
      • PBT Transformers example was updated and improved (#13174, #13131)
      • The scikit-optimize integration was improved (#12970)
      • Various bug fixes (#13423, #12785, #13171, #12877, #13255, #13355)

      SGD

      🔨Fixes and Updates

      • Fix Docstring for as_trainable (#13173)
      • Fix process group timeout units (#12477)
      • Disable Elastic Training by default when using with Tune (#12927)

      Serve

      🎉 New Features:

      • Ray Serve backends now accept a Starlette request object instead of a Flask request object (#12852). This is a breaking change, so please read the migration guide.
      • Ray Serve backends now have the option of returning a Starlette Response object (#12811, #13328). This allows for more customizable responses, including responses with custom status codes.
      • [Experimental] The new Ray Serve MLflow plugin makes it easy to deploy your MLflow models on Ray Serve. It comes with a Python API and a command-line interface.
      • Using “ImportedBackend” you can now specify a backend based on a class that is installed in the Python environment that the workers will run in, even if the Python environment of the driver script (the one making the Serve API calls) doesn’t have it installed (#12923).

      💫 Enhancements:

      • Dependency management using conda no longer requires the driver script to be running in an activated conda environment (#13269).
      • Ray ObjectRef can now be used as argument to serve_handle.remote(...). (#12592)
      • Backends are now shut down gracefully. You can set the graceful timeout in BackendConfig. (#13028)

      📖 Documentation:

      • A tutorial page has been added for integrating Ray Serve with your existing FastAPI web server or with your existing AIOHTTP web server (#13127).
      • Documentation has been added for Ray Serve metrics (#13096).
    • ray-1.1.0(Dec 24, 2020)

      Ray 1.1.0

      Ray Core

      🎉 New Features:

      • Progress towards supporting a Ray client
      • Descendant tasks are cancelled when the calling task is cancelled

      🔨 Fixes:

      • Improved object broadcast robustness
      • Improved placement group support

      🏗 Architecture refactoring:

      • Progress towards the new scheduler backend

      RLlib

      🎉 New Features:

      • SUMO simulator integration (rllib/examples/simulators/sumo/). Huge thanks to Lara Codeca! (#11710)
      • SlateQ Algorithm added for PyTorch. Huge thanks to Henry Chen! (#11450)
      • MAML extension for all Models, except recurrent ones. (#11337)
      • Curiosity Exploration Module for tf1.x/2.x/eager. (#11945)
      • Minimal JAXModelV2 example. (#12502)

      🔨 Fixes:

      • Fix RNN learning for tf2.x/eager. (#11720)
      • LSTM prev-action/prev-reward settable separately and prev-actions are now one-hot’d. (#12397)
      • PyTorch LR schedule not working. (#12396)
      • Various PyTorch GPU bug fixes. (#11609)
      • SAC loss not using prio. replay weights in critic’s loss term. (#12394)
      • Fix epsilon-greedy Exploration for nested action spaces. (#11453)

      🏗 Architecture refactoring:

      • Trajectory View API on by default (faster PG-type algos by ~20% (e.g. PPO on Atari)). (#11717, #11826, #11747, and #11827)

      Tune

      🎉 New Features:

      • Loggers can now be passed as objects to tune.run. The new ExperimentLogger abstraction was introduced for all loggers, making it much easier to configure logging behavior. (#11984, #11746, #11748, #11749)
      • The tune verbosity was refactored into four levels: 0: Silent, 1: Only experiment-level logs, 2: General trial-level logs, 3: Detailed trial-level logs (default) (#11767, #12132, #12571)
      • Docker and Kubernetes autoscaling environments are detected automatically, automatically utilizing the correct checkpoint/log syncing tools (#12108)
      • Trainables can now easily leverage Tensorflow DistributedStrategy! (#11876)

      💫 Enhancements

      • Introduced a new serialization debugging utility (#12142)
      • Added a new lightweight Pytorch-lightning example (#11497, #11585)
      • The BOHB search algorithm can be seeded with a random state (#12160)
      • The default anonymous metrics can be used automatically if a mode is set in tune.run (#12159).
      • Added HDFS as Cloud Sync Client (#11524)
      • Added xgboost_ray integration (#12572)
      • Tune search spaces can now be passed to search algorithms on initialization, not only via tune.run (#11503)
      • Refactored and added examples (#11931)
      • Callable accepted for register_env (#12618)
      • Tune search algorithms can handle/ignore infinite and NaN numbers (#11835)
      • Improved scalability for experiment checkpointing (#12064)
      • Nevergrad now supports points_to_evaluate (#12207)
      • Placement group support for distributed training (#11934)

      🔨 Fixes:

      • Fixed with_parameters behavior to avoid serializing large data in scope (#12522)
      • TBX logger supports None (#12262)
      • Better error when metric or mode unset in search algorithms (#11646)
      • Better warnings/exceptions for fail_fast='raise' (#11842)
      • Removed some bottlenecks in trialrunner (#12476)
      • Fix file descriptor leak by syncer and Tensorboard (#12590, #12425)
      • Fixed validation for search metrics (#11583)
      • Fixed hyperopt randint limits (#11946)

      Serve

      🎉 New Features:

      • You can start backends in different conda environments! See more in the dependency management doc. (#11743)
      • You can add an optional reconfigure method to your Servable to allow reconfiguring backend replicas at runtime. (#11709)

      🔨Fixes:

      • Set serve.start(http_host=None) to disable HTTP servers. If you are only using ServeHandle, this option lowers resource usage. (#11627)
      • Flask requests will no longer create reference cycles. This means peak memory usage should be lower for high traffic scenarios. (#12560)

      🏗 Architecture refactoring:

      • Progress towards a goal-state-driven Serve controller. (#12369, #11792, #12211, #12275, #11533, #11822, #11579, #12281)
      • Progress towards faster and more efficient ServeHandles. (#11905, #12019, #12093)

      Ray Cluster Launcher (Autoscaler)

      🎉 New Features:

      • A new Kubernetes operator: https://docs.ray.io/en/master/cluster/k8s-operator.html

      💫 Enhancements

      • Containers do not run with root user as the default (#11407)
      • SHM-Size is auto-populated when using the containers (#11953)

      🔨 Fixes:

      • Many autoscaler bug fixes (#11677, #12222, #11458, #11896, #12123, #11820, #12513, #11714, #12512, #11758, #11615, #12106, #11961, #11674, #12028, #12020, #12316, #11802, #12131, #11543, #11517, #11777, #11810, #11751, #12465, #11422)

      SGD

      🎉 New Features:

      • Easily customize your torch.DistributedDataParallel configurations by passing in a ddp_args field into TrainingOperator.register (#11771).

      🔨 Fixes:

      • TorchTrainer now properly scales up to more workers if more resources become available (#12562)

      📖 Documentation:

      • The new callback API for using Ray SGD with Tune is now documented (#11479)
      • Pytorch Lightning + Ray SGD integration is now documented (#12440)

      Dashboard

      🔨 Fixes:

      • Fixed bug that prevented viewing the logs for cluster workers
      • Fixed bug that caused "Logical View" page to crash when opening a list of actors for a given class.

      🏗 Architecture refactoring:

      • Dashboard runs on a new backend architecture that is more scalable and well-tested. The dashboard should work on ~100 node clusters now, and we're working on lifting scalability constraints to support even larger clusters.

      Thanks

      Many thanks to all those who contributed to this release: @bartbroere, @SongGuyang, @gramhagen, @richardliaw, @ConeyLiu, @weepingwillowben, @zhongchun, @ericl, @dHannasch, @timurlenk07, @kaushikb11, @krfricke, @desktable, @bcahlit, @rkooo567, @amogkam, @micahtyong, @edoakes, @stephanie-wang, @clay4444, @ffbin, @mfitton, @barakmich, @pcmoritz, @AmeerHajAli, @DmitriGekhtman, @iamhatesz, @raulchen, @ingambe, @allenyin55, @sven1977, @huyz-git, @yutaizhou, @suquark, @ashione, @simon-mo, @raoul-khour-ts, @Leemoonsoo, @maximsmol, @alanwguo, @kishansagathiya, @wuisawesome, @acxz, @gabrieleoliaro, @clarkzinzow, @jparkerholder, @kingsleykuan, @InnovativeInventor, @ijrsvt, @lasagnaphil, @lcodeca, @jiajiexiao, @heng2j, @wumuzi520, @mvindiola1, @aaronhmiller, @robertnishihara, @WangTaoTheTonic, @chaokunyang, @nikitavemuri, @kfstorm, @roireshef, @fyrestone, @viotemp1, @yncxcw, @karstenddwx, @hartikainen, @sumanthratna, @architkulkarni, @michaelzhiluo, @UWFrankGu, @oliverhu, @danuo, @lixin-wei

    • ray-1.0.1.post1(Nov 19, 2020)

      Patch release containing the following changes:

      • https://github.com/ray-project/ray/commit/bcc92f59fdcd837ccc5a560fe37bdf0619075505 Fix dashboard crashing on multi-node clusters.
      • https://github.com/ray-project/ray/pull/11600 Add the cluster_name to docker file mounts directory prefix.
    • ray-1.0.1(Nov 10, 2020)

      Ray 1.0.1

      Ray 1.0.1 is now officially released!

      Highlights

      • If you're migrating from Ray < 1.0.0, be sure to check out the 1.0 Migration Guide.
      • The Autoscaler now uses Docker by default.
      • RLlib features multiple new environments.
      • Tune supports population-based bandits, checkpointing in Docker, and multiple usability improvements.
      • SGD supports PyTorch Lightning.
      • All of Ray's components and libraries have improved performance, scalability, and stability.

      Core

      • 1.0 Migration Guide.
      • Many bug fixes and optimizations in GCS.
      • Polishing of the Placement Group API.
      • Improved Java language support

      RLlib

      • Added documentation for Curiosity exploration module (#11066).
      • Added RecSym environment wrapper (#11205).
      • Added Kaggle’s football environment (multi-agent) wrapper (#11249).
      • Multiple bug fixes: GPU related fixes for SAC (#11298), MARWIL, all example scripts run on GPU (#11105), lifted limitation on 2^31 timesteps (#11301), fixed eval workers for ES and ARS (#11308), fixed broken no-eager-no-workers mode (#10745).
      • Support custom MultiAction distributions (#11311).
      • No environment is created on driver (local worker) if not necessary (#11307).
      • Added simple SampleCollector class for Trajectory View API (#11056).
      • Code cleanup: Docstrings and type annotations for Exploration classes (#11251), DQN (#10710), MB-MPO algorithm, SAC algorithm (#10825).

      Serve

      • API: Serve will error when serve_client is serialized. (#11181)
      • Performance: serve_client.get_handle("endpoint") will now get a handle to the nearest node, increasing scalability in distributed mode. (#11477)
      • Doc: Added FAQ page and updated architecture page (#10754, #11258)
      • Testing: New distributed tests and benchmarks are added (#11386)
      • Testing: Serve now runs on Windows (#10682)

      SGD

      • Pytorch Lightning integration is now supported (#11042)
      • Support num_steps to continue training (#11142)
      • Callback API for SGD+Tune (#11316)

      Tune

      • New Algorithm: Population-based Bandits (#11466)
      • tune.with_parameters(), a wrapper function to pass arbitrary objects through the object store to trainables (#11504) (see the sketch after this list)
      • Strict metric checking - by default, Tune will now error if a result dict does not include the optimization metric as a key. You can disable this with TUNE_DISABLE_STRICT_METRIC_CHECKING (#10972)
      • Syncing checkpoints between multiple Docker containers on a cluster is now supported with the DockerSyncer (#11035)
      • Added type hints (#10806)
      • Trials are now dynamically created (instead of created up front) (#10802)
      • Use tune.is_session_enabled() in the Function API to toggle between Tune and non-tune code (#10840)
      • Support hierarchical search spaces for hyperopt (#11431)
      • Tune function API now also supports yield and return statements (#10857)
      • Tune now supports callbacks with tune.run(callbacks=...) (#11001)
      • By default, the experiment directory will be dated (#11104)
      • Tune now supports reuse_actors for the function API, which can greatly accelerate tuning jobs.
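
      A minimal sketch of tune.with_parameters() from the list above; the dataset and scoring logic are placeholders:

      import numpy as np
      from ray import tune

      # A large object we do not want baked into every trial's config.
      data = np.random.rand(10_000, 16)

      def train_fn(config, data=None):
          # "data" arrives via the object store, not via the trial config.
          score = float(data.mean()) * config["lr"]
          tune.report(score=score)

      tune.run(
          tune.with_parameters(train_fn, data=data),
          config={"lr": tune.grid_search([0.01, 0.1])},
      )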

      Thanks

      We thank all the contributors for their contribution to this release!

      @acxz, @Gekho457, @allenyin55, @AnesBenmerzoug, @michaelzhiluo, @SongGuyang, @maximsmol, @WangTaoTheTonic, @Basasuya, @sumanthratna, @juliusfrost, @maxco2, @Xuxue1, @jparkerholder, @AmeerHajAli, @raulchen, @justinkterry, @herve-alanaai, @richardliaw, @raoul-khour-ts, @C-K-Loan, @mattearllongshot, @robertnishihara, @internetcoffeephone, @Servon-Lee, @clay4444, @fangyeqing, @krfricke, @ffbin, @akotlar, @rkooo567, @chaokunyang, @PidgeyBE, @kfstorm, @barakmich, @amogkam, @edoakes, @ashione, @jseppanen, @ttumiel, @desktable, @pcmoritz, @ingambe, @ConeyLiu, @wuisawesome, @fyrestone, @oliverhu, @ericl, @weepingwillowben, @rkube, @alanwguo, @architkulkarni, @lasagnaphil, @rohitrawat, @ThomasLecat, @stephanie-wang, @suquark, @ijrsvt, @VishDev12, @Leemoonsoo, @scottwedge, @sven1977, @yiranwang52, @carlos-aguayo, @mvindiola1, @zhongchun, @mfitton, @simon-mo

    • ray-1.0.0(Sep 30, 2020)

      Ray 1.0

      We're happy to announce the release of Ray 1.0, an important step towards the goal of providing a universal API for distributed computing.

      To learn more about Ray 1.0, check out our blog post and whitepaper.

      Ray Core

      • The ray.init() and ray start commands have been cleaned up to remove deprecated arguments
      • The Ray Java API is now stable
      • Improved detection of Docker CPU limits
      • Add support and documentation for Dask-on-Ray and MARS-on-Ray: https://docs.ray.io/en/master/ray-libraries.html
      • Placement groups for fine-grained control over scheduling decisions: https://docs.ray.io/en/latest/placement-group.html.
      • New architecture whitepaper: https://docs.ray.io/en/master/whitepaper.html

      Autoscaler

      • Support for multiple instance types in the same cluster: https://docs.ray.io/en/master/cluster/autoscaling.html
      • Support for specifying GPU/accelerator type in @ray.remote
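
      A hedged sketch of requesting an accelerator type in @ray.remote; the string "V100" and the scheduling behavior described in the comment are assumptions for illustration:

        import ray

        ray.init()

        # Assumption: "V100" is an accepted accelerator-type string in this release
        # (constants are also expected under ray.util.accelerators). A task declared
        # this way is only scheduled on a node with a matching accelerator.
        @ray.remote(num_gpus=1, accelerator_type="V100")
        def train_step():
            return "ran on a V100 node"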

      Dashboard & Metrics

      • Improvements to the memory usage tab and machine view
      • The dashboard now supports visualization of actor states
      • Support for Prometheus metrics reporting: https://docs.ray.io/en/latest/ray-metrics.html

      RLlib

      • Two model-based RL algorithms were added: MB-MPO (“Model-based meta-policy optimization”) and “Dreamer”. Both algorithms were benchmarked and perform comparably to the results reported in their respective papers.
      • A “Curiosity” (intrinsic motivation) module was added via RLlib’s Exploration API and benchmarked on a sparse-reward Unity3D environment (Pyramids).
      • Added documentation for the Distributed Execution API.
      • Removed (already soft-deprecated) APIs: the Model(V1) class, Trainer config keys, and some methods/functions. Where these previously produced a deprecation warning, they now raise an error.
      • Added DeepMind Control Suite examples.

      Tune

      Breaking changes:

      • Multiple tune.run parameters have been deprecated: ray_auto_init, run_errored_only, global_checkpoint_period, with_server (#10518)
      • The tune.run parameters upload_dir, sync_to_cloud, sync_to_driver, and sync_on_checkpoint have been moved to tune.SyncConfig [docs] (#10518)

      New APIs:

      • mode, metric, time_budget parameters for tune.run (#10627, #10642); a short sketch follows this list
      • Search Algorithms now share a uniform API: (#10621, #10444). You can also use the new create_scheduler/create_searcher shim layer to create search algorithms/schedulers via string, reducing boilerplate code (#10456).
      • Native callbacks for: MXNet, Horovod, Keras, XGBoost, PytorchLightning (#10533, #10304, #10509, #10502, #10220)
      • PBT runs can be replayed with PopulationBasedTrainingReplay scheduler (#9953)
      • Search Algorithms are saved/resumed automatically (#9972)
      • New Optuna Search Algorithm docs (#10044)
      • Tune now can sync checkpoints across Kubernetes pods (#10097)
      • Failed trials can be rerun with tune.run(resume="run_errored_only") (#10060)
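
      A short sketch of the new mode/metric parameters on tune.run, assuming a hypothetical trainable and that analysis.best_config picks up the metric/mode passed here:

        from ray import tune

        def trainable(config):
            # Report the value Tune should optimize.
            tune.report(mean_loss=(config["x"] - 1) ** 2)

        # metric and mode can now be passed to tune.run directly, so downstream
        # calls such as analysis.best_config need no extra arguments.
        analysis = tune.run(
            trainable,
            config={"x": tune.uniform(-2, 2)},
            metric="mean_loss",
            mode="min",
            num_samples=4)
        print(analysis.best_config)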

      Other Changes:

      • Trial outputs can be saved to file via tune.run(log_to_file=...) (#9817)
      • Trial directories can be customized, and default trial directory now includes trial name (#10608, #10214)
      • Improved Experiment Analysis API (#10645)
      • Support for Multi-objective search via SigOpt Wrapper (#10457, #10446)
      • BOHB Fixes (#10531, #10320)
      • Wandb improvements + RLlib compatibility (#10950, #10799, #10680, #10654, #10614, #10441, #10252, #8521)
      • Updated documentation for FAQ, Tune+serve, search space API, lifecycle (#10813, #10925, #10662, #10576, #9713, #10222, #10126, #9908)

      RaySGD:

      • Creator functions are subsumed by the TrainingOperator API (#10321)
      • Training happens on actors by default (#10539)

      Serve

      • The serve.client API makes it easy to manage the lifetimes of multiple Serve clusters. (#10460)
      • Serve APIs are fully typed. (#10205, #10288)
      • Backend configs are now typed and validated via Pydantic. (#10559, #10389)
      • Progress towards application level backend autoscaler. (#9955, #9845, #9828)
      • New architecture page in documentation. (#10204)

      Thanks

      We thank all the contributors for their contribution to this release!

      @MissiontoMars, @ijrsvt, @desktable, @kfstorm, @lixin-wei, @Yard1, @chaokunyang, @justinkterry, @pxc, @ericl, @WangTaoTheTonic, @carlos-aguayo, @sven1977, @gabrieleoliaro, @alanwguo, @aryairani, @kishansagathiya, @barakmich, @rkube, @SongGuyang, @qicosmos, @ffbin, @PidgeyBE, @sumanthratna, @yushan111, @juliusfrost, @edoakes, @mehrdadn, @Basasuya, @icaropires, @michaelzhiluo, @fyrestone, @robertnishihara, @yncxcw, @oliverhu, @yiranwang52, @ChuaCheowHuan, @raphaelavalos, @suquark, @krfricke, @pcmoritz, @stephanie-wang, @hekaisheng, @zhijunfu, @Vysybyl, @wuisawesome, @sanderland, @richardliaw, @simon-mo, @janblumenkamp, @zhuohan123, @AmeerHajAli, @iamhatesz, @mfitton, @noahshpak, @maximsmol, @weepingwillowben, @raulchen, @09wakharet, @ashione, @henktillman, @architkulkarni, @rkooo567, @zhe-thoughts, @amogkam, @kisuke95, @clarkzinzow, @holli, @raoul-khour-ts

      Source code(tar.gz)
      Source code(zip)
    • ray-0.8.7(Aug 13, 2020)

      Highlight

      • Ray is moving towards 1.0! It has had several important naming changes.
        • ObjectIDs are now called ObjectRefs because they are not just IDs.
        • The Ray Autoscaler is now called the Ray Cluster Launcher. The autoscaler will be a module of the Ray Cluster Launcher.
      • The Ray Cluster Launcher now has a much cleaner and more concise output style. Try it out with ray up --log-new-style. The new output style will be enabled by default (with opt-out) in a later release.
      • Windows is now officially supported by RLlib. Multi node support for Windows is still in progress.

      Cluster Launcher/CLI (formerly autoscaler)

      • Highlight: This release contains a new colorful, concise output style for ray up and ray down, available with the --log-new-style flag. It will be enabled by default (with opt-out) in a later release. Full output style coverage for Cluster Launcher commands will also be available in a later release. (#9322, #9943, #9960, #9690)
      • Documentation improvements (with guides and new sections) (#9687)
      • Improved Cluster launcher docker support (#9001, #9105, #8840)
      • Ray now has Docker images available on Docker hub. Please check out the ray image (#9732, #9556, #9458, #9281)
      • Azure improvements (#8938)
      • Improved on-prem cluster autoscaler (#9663)
      • Add option for continuous sync of file mounts (#9544)
      • Add ray status debug tool and ray --version (#9091, #8886).
      • ray memory now also supports redis_password (#9492)
      • Bug fixes for the Kubernetes cluster launcher mode (#9968)
      • Various improvements: disabling the cluster config cache (#8117), the Python API now requires keyword arguments (#9256), removed fingerprint checking for SSH (#9133), initial support for multiple worker types (#9096), and various changes to the internal node provider interface (#9340, #9443)

      Core

      • Support Python type checking for Ray tasks (#9574)
      • Rename ObjectID => ObjectRef (#9353)
      • New GCS Actor manager on by default (#8845, #9883, #9715, #9473, #9275)
      • Work towards placement groups (#9039)
      • Plasma store process is merged with raylet (#8939, #8897)
      • Option to automatically reconstruct objects stored in plasma after a failure. See the documentation for more information. (#9394, #9557, #9488)
      • Many bug fixes.

      RLlib

      • New algorithm: “Model-Agnostic Meta-Learning” (MAML). An algo that learns and generalizes well across a distribution of environments.
      • New algorithm: “Model-Based Meta-Policy-Optimization” (MB-MPO). Our first model-based RL algo.
      • Windows is now officially supported by RLlib.
      • Native TensorFlow 2.x support. Use framework=”tf2” in your config to tap into TF2’s full potential. Also: SAC, DDPG, DQN Rainbow, ES, and ARS now run in TF1.x Eager mode.
      • DQN PyTorch support for full Rainbow setup (including distributional DQN).
      • Python type hints for Policy, Model, Offline, Evaluation, and Env classes.
      • Deprecated “Policy Optimizer” package (in favor of new distributed execution API).
      • Enhanced test coverage and stability.
      • Flexible multi-agent replay modes and replay_sequence_length. We now allow storing sequences (over time) in replay buffers and retrieving “lock-stepped” multi-agent samples.
      • Environments: Unity3D soccer game (tuned example/benchmark) and DM Control Suite wrapper and examples.
      • Various Bug fixes: QMIX not learning, DDPG torch bugs, IMPALA learning rate updates, PyTorch custom loss, PPO not learning MuJoCo due to action clipping bug, DQN w/o dueling layer error.

      Tune

      • API Changes:
        • The Tune Function API now supports checkpointing and is now usable with all search and scheduling algorithms! (#8471, #9853, #9517)
        • The Trainable class API has renamed many of its methods to be public (#9184)
      • You can now stop experiments upon convergence with Bayesian Optimization (#8808)
      • DistributedTrainableCreator, a simple wrapper for distributed parameter tuning with multi-node DistributedDataParallel models (#9550, #9739)
      • New integration and tutorial for using Ray Tune with Weights and Biases (Logger and native API) (#9725)
      • Tune now provides a Scikit-learn compatible wrapper for hyperparameter tuning (#9129)
      • New tutorials for integrations like XGBoost (#9060), multi GPU PyTorch (#9338), PyTorch Lightning (#9151, #9451), and Huggingface-Transformers (#9789)
      • CLI Progress reporting improvements (#8802, #9537, #9525)
      • Various bug fixes: handling of NaN values (#9381), Tensorboard logging improvements (#9297, #9691, #8918), enhanced cross-platform compatibility (#9141), re-structured testing (#9609), documentation reorganization and versioning (#9600, #9427, #9448)

      RaySGD

      • Variable worker CPU requirements (#8963)
      • Simplified cuda visible device setting (#8775)

      Serve

      • Horizontal scalability: Serve will now start one HTTP server per Ray node. (#9523)
      • Various performance improvements to bring Serve's performance in line with FastAPI (#9490, #8709, #9531, #9479, #9225, #9216, #9485)
      • API changes
        • serve.shadow_traffic(endpoint, backend, fraction) duplicates and sends a fraction of the incoming traffic to a specific backend. (#9106)
        • serve.shutdown() cleans up the current Serve instance in the Ray cluster. (#8766)
        • An exception is now raised if num_replicas exceeds the maximum resources in the cluster (#9005)
      • Added doc examples for how to perform metric monitoring and model composition.

      Dashboard

      • Configurable Dashboard Port: The port the dashboard runs on is now configurable using the command-line argument --dashboard-port and the dashboard_port argument to ray.init (a minimal sketch follows this list)
      • GPU monitoring improvements
        • For machines with more than one GPU, the GPU and GRAM utilization is now broken out on a per-GPU basis.
        • Assignments to physical GPUs are now shown at the worker level.
      • Sortable Machine View: It is now possible to sort the machine view by almost any of its columns by clicking next to the title. In addition, whereas the workers are normally grouped by node, you can now ungroup them if you only want to see details about workers.
      • Actor Search Bar: You can now search for actors by their title (the actor's Python class name plus the arguments it received).
      • Logical View UI Updates: This includes things like color-coded names for each of the actor states, a more grid-like layout, and tooltips for the various data.
      • Sortable Memory View: Like the machine view, the memory view now has sortable columns and can be grouped / ungrouped by node.
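
      A minimal sketch of the configurable dashboard port, assuming 8266 is an arbitrary free port (the default remains 8265):

        import ray

        # The dashboard port can now be chosen explicitly; the equivalent
        # command-line flag mentioned above is --dashboard-port.
        ray.init(dashboard_port=8266)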

      Windows Support

      • Improve GPU detection (#9300)
      • Work around msgpack issue on PowerPC64LE (#9140)

      Others

      • Ray Streaming Library Improvements (#9240, #8910, #8780)
      • Java Support Improvements (#9371, #9033, #9037, #9032, #8858, #9777, #9836, #9377)
      • Parallel Iterator Improvements (#8964, #8978)

      Thanks

      We thank the following contributors for their work on this release: @jsuarez5341, @amitsadaphule, @krfricke, @williamFalcon, @richardliaw, @heyitsmui, @mehrdadn, @robertnishihara, @gabrieleoliaro, @amogkam, @fyrestone, @mimoralea, @edoakes, @andrijazz, @ElektroChan89, @kisuke95, @justinkterry, @SongGuyang, @barakmich, @bloodymeli, @simon-mo, @TomVeniat, @lixin-wei, @alanwguo, @zhuohan123, @michaelzhiluo, @ijrsvt, @pcmoritz, @LecJackS, @sven1977, @ashione, @JerryLeeCS, @raphaelavalos, @stephanie-wang, @ruifangChen, @vnlitvinov, @yncxcw, @weepingwillowben, @goulou, @acmore, @wuisawesome, @gramhagen, @anabranch, @internetcoffeephone, @Alisahhh, @henktillman, @deanwampler, @p-christ, @Nicolaus93, @WangTaoTheTonic, @allenyin55, @kfstorm, @rkooo567, @ConeyLiu, @09wakharet, @piojanu, @mfitton, @KristianHolsheimer, @AmeerHajAli, @pdames, @ericl, @VishDev12, @suquark, @stefanbschneider, @raulchen, @dcfidalgo, @chappers, @aaarne, @chaokunyang, @sumanthratna, @clarkzinzow, @BalaBalaYi, @maximsmol, @zhongchun, @wumuzi520, @ffbin

      Source code(tar.gz)
      Source code(zip)
    • ray-0.8.6(Jun 24, 2020)

      Highlight

      • Experimental support for Windows is now available for single node Ray usage. Check out the Windows section below for known issues and other details.
      • Have you had troubles monitoring GPU or memory usage while you used Ray? The Ray dashboard now supports the GPU monitoring and a memory view.
      • Want to use RLlib with Unity? RLlib officially supports the Unity3D adapter! Please check out the documentation.
      • Ray Serve is ready for feedback! We've gotten feedback from many users, and Ray Serve is already being used in production. Please reach out to us with your use cases, ideas, documentation improvements, and feedback. We'd love to hear from you. Please do so on the Ray Slack and join #serve! Please see the Serve section below for more details.

      Core

      • We’ve introduced a new feature to automatically retry failed actor tasks after an actor has been restarted by Ray (by specifying max_restarts in @ray.remote). Try it out with max_task_retries=-1 where -1 indicates that the system can retry the task until it succeeds.

      API Change

      • To enable automatic restarts of a failed actor, you must now use max_restarts in the @ray.remote decorator instead of max_reconstructions. You can use -1 to indicate infinity, i.e., the system should always restart the actor if it fails unexpectedly.
      • We’ve merged the named and detached actor APIs. To create an actor that will survive past the duration of its job (a “detached” actor), specify name=<str> in its remote constructor (Actor.options(name='<str>').remote()). To delete the actor, you can use ray.kill.
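
      A minimal sketch combining the two API changes above (restart/retry options plus a named, "detached" actor); the Counter class is hypothetical:

        import ray

        ray.init()

        # Restart this actor indefinitely on failure and retry its failed tasks.
        @ray.remote(max_restarts=-1, max_task_retries=-1)
        class Counter:
            def __init__(self):
                self.n = 0

            def incr(self):
                self.n += 1
                return self.n

        # Naming the actor makes it "detached", so it can outlive this job.
        c = Counter.options(name="my_counter").remote()
        print(ray.get(c.incr.remote()))
        ray.kill(c)  # explicitly delete the actor when done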

      RLlib

      • PyTorch: IMPALA PyTorch version and all rllib/examples scripts now work for either TensorFlow or PyTorch (--torch command line option).
      • Switched to using distributed execution API by default (replaces Policy Optimizers) for all algorithms.
      • Unity3D adapter (supports all Env types: multi-agent, external env, vectorized) with example scripts for running locally or in the cloud.
      • Added support for variable length observation Spaces ("Repeated").
      • Added support for arbitrarily nested action spaces.
      • Added experimental GTrXL (Transformer/Attention net) support to RLlib + learning tests for PPO and IMPALA.
      • QMIX now supports complex observation spaces.

      API Change

      • Retire use_pytorch and eager flags in configs and replace these with framework=[tf|tfe|torch].
      • Deprecate PolicyOptimizers in favor of the new distributed execution API.
      • Retired support for Model(V1) class. Custom Models should now only use the ModelV2 API. There is still a warning when using ModelV1, which will be changed into an error message in the next release.
      • Retired TupleActions (in favor of arbitrarily nested action Spaces).

      Ray Tune / RaySGD

      • There is now a Dataset API for handling large datasets with RaySGD. (#7839)
      • You can now filter by an average of the last results using the ExperimentAnalysis tool (#8445).
      • BayesOptSearch received numerous contributions, enabling preliminary random search and warm starting. (#8541, #8486, #8488)

      API Changes

      • tune.report is now the right way to use the Tune function API. tune.track is deprecated (#8388)
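
      A minimal sketch of the function API with tune.report, assuming a hypothetical trainable:

        from ray import tune

        def trainable(config):
            for step in range(5):
                # tune.report replaces the deprecated tune.track for sending
                # intermediate results back to Tune from the function API.
                tune.report(mean_loss=config["lr"] * (5 - step))

        tune.run(trainable, config={"lr": tune.grid_search([0.01, 0.1])})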

      Serve

      • New APIs to inspect and manage Serve objects:
        • serve.list_backends and serve.list_endpoints (#8737)
        • serve.delete_backend and serve.delete_endpoint (#8252, #8256)
      • serve.create_endpoint now requires specifying the backend directly. You can remove serve.set_traffic if there's only one backend per endpoint. (#8764)
      • serve.init API cleanup, the following options were removed:
        • blocking, ray_init_kwargs, start_server (#8747, #8447, #8620)
      • serve.init now supports namespacing with name. You can run multiple serve clusters with different names on the same ray cluster. (#8449)
      • You can specify session affinity when splitting traffic with backends using X-SERVE-SHARD-KEY HTTP header. (#8449)
      • Various documentation improvements. Highlights:
        • A new section on how to perform A/B testing and incremental rollout (#8741)
        • Tutorial for batch inference (#8490)
        • Instructions for specifying GPUs and resources (#8495)

      Dashboard / Metrics

      • The Machine View of the dashboard now shows information about GPU utilization such as:
        • Average GPU/GRAM utilization at a node and cluster level
        • Worker-level information about how many GPUs each worker is assigned as well as its GRAM use.
      • The dashboard has a new Memory View tab that should be very useful for debugging memory issues. It has:
        • Information about objects in the Ray object store, including size and call-site
        • Information about reference counts and what is keeping an object pinned in the Ray object store.

      Small changes

      • IDLE workers get automatically sorted to the end of the worker list in the Machine View

      Autoscaler

      • Improved logging output. Errors are more clearly propagated and excess output has been reduced. (#7198, #8751, #8753)
      • Added support for k8s services.

      API Changes

      • ray up accepts remote URLs that point to the desired cluster YAML. (#8279)

      Windows support

      • Windows wheels are now available for basic experimental usage (via ray.init()).
      • Windows support is currently unstable. Unusual, unattended, or production usage is not recommended.
      • Various functionality may still lack support, including Ray Serve, Ray SGD, the autoscaler, the dashboard, non-ASCII file paths, etc.
      • Please check the latest nightly wheels & known issues (#9114), and let us know if any issue you encounter has not yet been addressed.
      • Wheels are available for Python 3.6, 3.7, and 3.8. (#8369)
      • redis-py has been patched for Windows sockets. (#8386)

      Others

      • Moving towards highly available Ray (#8650, #8639, #8606, #8601, #8591, #8442)
      • Java Support (#8730, #8640, #8637)
      • Ray streaming improvements (#8612, #8594, #7464)
      • Parallel iterator improvements (#8140, #7931, #8712)

      Thanks

      We thank the following contributors for their work on this release: @pcmoritz, @akharitonov, @devanderhoff, @ffbin, @anabranch, @jasonjmcghee, @kfstorm, @mfitton, @alecbrick, @simon-mo, @konichuvak, @aniryou, @wuisawesome, @robertnishihara, @ramanNarasimhan77, @09wakharet, @richardliaw, @istoica, @ThomasLecat, @sven1977, @ceteri, @acxz, @iamhatesz, @JarnoRFB, @rkooo567, @mehrdadn, @thomasdesr, @janblumenkamp, @ujvl, @edoakes, @maximsmol, @krfricke, @amogkam, @gehring, @ijrsvt, @internetcoffeephone, @LucaCappelletti94, @chaokunyang, @WangTaoTheTonic, @fyrestone, @raulchen, @ConeyLiu, @stephanie-wang, @suquark, @ashione, @Coac, @JosephTLucas, @ericl, @AmeerHajAli, @pdames

      Source code(tar.gz)
      Source code(zip)
    • ray-0.8.5(May 7, 2020)

      Highlight

      Core

      • Task cancellation is now available for locally submitted tasks (#7699); see the sketch after this list.
      • Experimental support for recovering objects that were lost from the Ray distributed memory store. You can try this out by setting lineage_pinning_enabled: 1 in the internal config. (#7733)
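
      A hedged sketch of task cancellation, assuming ray.cancel is the entry point added by #7699; the long-running task is hypothetical:

        import time
        import ray

        ray.init()

        @ray.remote
        def long_running():
            time.sleep(600)

        ref = long_running.remote()
        # Assumption: cancelling via ray.cancel stops the in-flight task so that
        # ray.get(ref) would raise instead of blocking forever.
        ray.cancel(ref)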

      RLlib

      • PyTorch support has now reached parity with TensorFlow. (#7926, #8188, #8120, #8101, #8106, #8104, #8082, #7953, #7984, #7836, #7597, #7797)
      • Improved callbacks API. (#6972)
      • Enable Ray distributed reference counting. (#8037)
      • Work towards customizable distributed training workflows. (#7958, #8077)

      Tune

      • Documentation has improved with a new format. (#8083, #8201, #7716)
      • Search algorithms are refactored to make them easier to extend, deprecating max_concurrent argument. (#7037, #8258, #8285)
      • TensorboardX errors are now handled safely. (#8174)
      • Bug fix in PBT checkpointing. (#7794)
      • New ZOOpt search algorithm added. (#7960)

      Serve

      • Improved APIs.
        • Add delete_endpoint and delete_backend. (#8252, #8256)
        • Use dictionary to update backend config. (#8202)
      • Added overview section to the documentation.
      • Added tutorials for serving models in Tensorflow/Keras, PyTorch, and Scikit-Learn.
      • Made Serve clusters tolerant to process failures. (#8116, #8008, #7970, #7936)

      SGD

      • New Semantic Segmentation and HuggingFace GLUE Fine-tuning Examples. (#7792, #7825)
      • Fix GPU Reservations in SLURM usage. (#8157)
      • Update learning rate scheduler stepping parameter. (#8107)
      • Make serialization of data creation optional. (#8027)
      • Automatic DDP wrapping is now optional. (#7875)

      Other Projects

      • Progress towards the highly available and fault tolerant control plane. (#8144, #8119, #8145, #7909, #7949, #7771, #7557, #7675)
      • Progress towards the Ray streaming library. (#8044, #7827, #7955, #7961, #7348)
      • Autoscaler improvement. (#8178, #8168, #7986, #7844, #7717)
      • Progress towards Java support. (#8014)
      • Progress towards Windows compatibility. (#8237, #8186)
      • Progress towards cross language support. (#7711)

      Thanks

      We thank the following contributors for their work on this release:

      @simon-mo, @robertnishihara, @BalaBalaYi, @ericl, @kfstorm, @tirkarthi, @nflu, @ffbin, @chaokunyang, @ijrsvt, @pcmoritz, @mehrdadn, @sven1977, @iamhatesz, @nmatthews-asapp, @mitchellstern, @edoakes, @anabranch, @billowkiller, @eisber, @ujvl, @allenyin55, @yncxcw, @deanwampler, @DavidMChan, @ConeyLiu, @micafan, @rkooo567, @datayjz, @wizardfishball, @sumanthratna, @ashione, @marload, @stephanie-wang, @richardliaw, @jovany-wang, @MissiontoMars, @aannadi, @fyrestone, @JarnoRFB, @wumuzi520, @roireshef, @acxz, @gramhagen, @Servon-Lee, @ClarkZinzow, @mfitton, @maximsmol, @janblumenkamp, @istoica

      Source code(tar.gz)
      Source code(zip)
    • ray-0.8.4(Apr 2, 2020)

      Highlight

      • Add Python 3.8 support. (#7754)

      Core

      • Fix asyncio actor deserialization. (#7806)
      • Fix a symbol collision segfault caused by importing pyarrow. (#7568)
      • ray memory will collect statistics from all nodes. (#7721)
      • Pin lineage of plasma objects that are still in scope. (#7690)

      RLlib

      • Add contextual bandit algorithms. (#7642)
      • Add parameter noise exploration API. (#7772)
      • Add scaling guide. (#7780)
      • Enable restore keras model from h5 file. (#7482)
      • Store tf-graph by default when doing Policy.export_model(). (#7759)
      • Fix default policy overriding the torch policy. (#7756, #7769)

      RaySGD

      • BREAKING: Add new API for tuning TorchTrainer using Tune. (#7547)
      • BREAKING: Convert the head worker to a local model. (#7746)
      • Added a new API for save/restore. (#7547)
      • Add tqdm support to TorchTrainer. (#7588)

      Tune

      • Add sorted columns and TensorBoard to Tune tab. (#7140)
      • Tune experiments can now be cancelled via the REST client. (#7719)
      • fail_fast enables experiments to fail quickly. (#7528)
      • Ability to override the IP retrieval process if needed. (#7705)
      • TensorBoardX nested dictionary support. (#7705)

      Serve

      • Performance improvements:
        • Push route table updates to HTTP proxy. (#7774)
        • Improve serialization. (#7688)
      • Add async methods support for serve actors. (#7682)
      • Add multiple method support for serve actors. (#7709)
        • You can specify HTTP methods in serve.create_backend(..., methods=["GET", "POST"]).
        • The ability to specify which actor method to execute in HTTP through X-SERVE-CALL-METHOD header or in RayServeHandle through handle.options("method").remote(...).

      Others

      • Progress towards highly available control plane. (#7822, #7742)
      • Progress towards Windows compatibility. (#7740, #7739, #7657)
      • Progress towards Ray Streaming library. (#7813)
      • Progress towards metrics export service. (#7809)
      • Basic C++ worker implementation. (#6125)

      Thanks

      We thank the following contributors for their work on this release:

      @carlbalmer, @BalaBalaYi, @saurabh3949, @maximsmol, @SongGuyang, @istoica, @pcmoritz, @aannadi, @kfstorm, @ijrsvt, @richardliaw, @mehrdadn, @wumuzi520, @cloudhan, @edoakes, @mitchellstern, @robertnishihara, @hhoke, @simon-mo, @ConeyLiu, @stephanie-wang, @rkooo567, @ffbin, @ericl, @hubcity, @sven1977

      Source code(tar.gz)
      Source code(zip)
    • ray-0.8.3(Mar 25, 2020)

      Highlights

      • Autoscaler has added Azure Support. (#7080, #7515, #7558, #7494)
        • Ray autoscaler helps you launch a distributed ray cluster using a single command line call!
        • It works on Azure, AWS, GCP, Kubernetes, Yarn, Slurm and local nodes.
      • Distributed reference counting is turned on by default. (#7628, #7337)
        • This means all ray objects are tracked and garbage collected only when all references go out of scope. It can be turned off with: ray.init(_internal_config=json.dumps({"distributed_ref_counting_enabled": 0})).
        • When the object store is full with objects that are still in scope, you can turn on least-recently-used eviction to force remove objects using ray.init(lru_evict=True).
      • A new command ray memory is added to help debug memory usage: (#7589)
        • It shows all object IDs that are in scope, their reference types, sizes and creation site.
          • Read more in the docs: https://ray.readthedocs.io/en/latest/memory-management.html.
      > ray memory
      -----------------------------------------------------------------------------------------------------
       Object ID                                Reference Type       Object Size   Reference Creation Site
      =====================================================================================================
      ; worker pid=51230
      ffffffffffffffffffffffff0100008801000000  PINNED_IN_MEMORY            8231   (deserialize task arg) __main__..sum_task
      ; driver pid=51174
      45b95b1c8bd3a9c4ffffffff010000c801000000  USED_BY_PENDING_TASK           ?   (task call) memory_demo.py:<module>:13
      ffffffffffffffffffffffff0100008801000000  USED_BY_PENDING_TASK        8231   (put object) memory_demo.py:<module>:6
      ef0a6c221819881cffffffff010000c801000000  LOCAL_REFERENCE                ?   (task call) memory_demo.py:<module>:14
      -----------------------------------------------------------------------------------------------------
      

      API change

      • Change actor.__ray_kill__() to ray.kill(actor). (#7360)
      • Deprecate use_pickle flag for serialization. (#7474)
      • Remove experimental.NoReturn. (#7475)
      • Remove experimental.signal API. (#7477)

      Core

      • Add Apache 2 license header to C++ files. (#7520)
      • Reduce per worker memory usage to 50MB. (#7573)
      • Option to fallback to LRU on OutOfMemory. (#7410)
      • Reference counting for actor handles. (#7434)
      • Reference counting for returning object IDs created by a different process. (#7221)
      • Use prctl(PR_SET_PDEATHSIG) on Linux instead of reaper. (#7150)
      • Route asyncio plasma through raylet instead of direct plasma connection. (#7234)
      • Remove static concurrency limit from gRPC server. (#7544)
      • Remove get_global_worker(), RuntimeContext. (#7638)
      • Fix known issues from 0.8.2 release:
        • Fix passing duplicate by-reference arguments. (#7306)
        • Raise the gRPC message size limit to 100MB. (#7269)

      RLlib

      • New features:
        • Exploration API improvements. (#7373, #7314, #7380)
        • SAC: add discrete action support. (#7320, #7272)
        • Add high-performance external application connector. (#7641)
      • Bug fix highlights:
        • PPO torch memory leak and unnecessary torch.Tensor creation and gc'ing. (#7238)
        • Rename sample_batch_size => rollout_fragment_length. (#7503)
        • Fix bugs and speed up SegmentTree.

      Tune

      • Integrate Dragonfly optimizer. (#5955)
      • Fix HyperBand errors. (#7563)
      • Access Trial Name, Trial ID inside trainable. (#7378)
      • Add a new repeater class for high variance trials. (#7366)
      • Prevent deletion of checkpoint from user-initiated restoration. (#7501)

      Libraries

      • [Parallel Iterators] Allow for operator chaining after repartition. (#7268)
      • [Parallel Iterators] Repartition functionality. (#7163)
      • [Serve] @serve.route returns a handle, add handle.scale, handle.set_max_batch_size. (#7569)
      • [RaySGD] PyTorchTrainer --> TorchTrainer. (#7425)
      • [RaySGD] Custom training API. (#7211)
      • [RaySGD] Breaking User API changes: (#7384)
        • data_creator fed to TorchTrainer now must return a dataloader rather than datasets.
        • TorchTrainer automatically sets "DistributedSampler" if a DataLoader is returned.
        • data_loader_config and batch_size are no longer parameters for TorchTrainer.
        • TorchTrainer parallelism is now set by num_workers.
        • All TorchTrainer args now must be named parameters.

      Java

      • New Java actor API (#7414)
        • @RayRemote annotation is removed.
        • Instead of Ray.call(ActorClass::method, actor), the new API is actor.call(ActorClass::method).
      • Allow passing internal config from raylet to Java worker. (#7532)
      • Enable direct call by default. (#7408)
      • Pass large object by reference. (#7595)

      Others

      • Progress towards Ray Streaming, including a Python API. (#7070, #6755, #7152, #7582)
      • Progress towards GCS Service for GCS fault tolerance. (#7292, #7592, #7601, #7166)
      • Progress towards cross language call between Java and Python. (#7614, #7634)
      • Progress towards Windows compatibility. (#7529, #7509, #7658, #7315)
      • Improvement in K8s Operator. (#7521, #7621, #7498, #7459, #7622)
      • New documentation for Ray Dashboard. (#7304)

      Known issues

      • Ray currently doesn't work on Python 3.5.0, but works on 3.5.3 and above.

      Thanks

      We thank the following contributors for their work on this release: @rkooo567, @maximsmol, @suquark, @mitchellstern, @micafan, @ClarkZinzow, @Jimpachnet, @mwbrulhardt, @ujvl, @chaokunyang, @robertnishihara, @jovany-wang, @hyeonjames, @zhijunfu, @datayjz, @fyrestone, @eisber, @stephanie-wang, @allenyin55, @BalaBalaYi, @simon-mo, @thedrow, @ffbin, @amogkam, @TisonKun, @richardliaw, @ijrsvt, @wumuzi520, @mehrdadn, @raulchen, @landcold7, @ericl, @edoakes, @sven1977, @ashione, @jorenretel, @gramhagen, @kfstorm, @anthonyhsyu, @pcmoritz

      Source code(tar.gz)
      Source code(zip)
    • ray-0.8.2(Feb 24, 2020)

      Highlights

      • Pyarrow is no longer vendored. Ray directly uses the C++ Arrow API. You can use any version of pyarrow with ray. (#7233)
      • The dashboard is turned on by default. It shows node and process information, actor information, and Ray Tune trials information. You can also use ray.show_in_webui to display custom messages for actors. Please try it out and send us feedback! (#6705, #6820, #6822, #6911, #6932, #6955, #7028, #7034)
      • We have made progress on distributed reference counting (behind a feature flag). You can try it out with ray.init(_internal_config=json.dumps({"distributed_ref_counting_enabled": 1})). It is designed to help manage memory using precise distributed garbage collection. (#6945, #6946, #7029, #7075, #7218, #7220, #7222, #7235, #7249)

      Breaking changes

      • Many experimental Ray libraries are moved to the util namespace (#7100); see the import sketch after this list.
        • ray.experimental.multiprocessing => ray.util.multiprocessing
        • ray.experimental.joblib => ray.util.joblib
        • ray.experimental.iter => ray.util.iter
        • ray.experimental.serve => ray.serve
        • ray.experimental.sgd => ray.util.sgd
      • Tasks and actors are cleaned up if their owner process dies. (#6818)
      • The OMP_NUM_THREADS environment variable defaults to 1 if unset. This improves training performance and reduces resource contention. (#6998)
      • We now vendor psutil and setproctitle to support turning the dashboard on by default. Running import psutil after import ray will use the version of psutil that ships with Ray. (#7031)
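
      A minimal sketch of the renamed multiprocessing integration, assuming a trivial square function for illustration:

        import ray
        from ray.util.multiprocessing import Pool  # formerly ray.experimental.multiprocessing

        ray.init()

        def square(x):
            return x * x

        # Pool mirrors the standard library's multiprocessing.Pool API,
        # but runs the work as Ray tasks.
        pool = Pool(processes=4)
        print(pool.map(square, range(8)))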

      Core

      • The Python raylet client is removed. All raylet communication now goes through the core worker. (#6018)
      • Calling delete() will not delete objects in the in-memory store. (#7117)
      • Removed vanilla pickle serialization for task arguments. (#6948)
      • Fix bug passing empty bytes into Python tasks. (#7045)
      • Progress toward next generation ray scheduler. (#6913)
      • Progress toward service based global control store (GCS). (#6686, #7041)

      RLlib

      • Improved PyTorch support, including a PyTorch version of PPO. (#6826, #6770)
      • Added distributed SGD for PPO. (#6918, #7084)
      • Added an exploration API for controlling epsilon greedy and stochastic exploration. (#6974, #7155)
      • Fixed schedule values going negative past the end of the schedule. (#6971, #6973)
      • Added support for histogram outputs in TensorBoard. (#6942)
      • Added support for parallel and customizable evaluation step. (#6981)

      Tune

      • Improved Ax Example. (#7012)
      • Trial saves are now processed asynchronously. (#6912)
      • Default to tensorboardx and include it in requirements. (#6836)
      • Added experiment stopping api. (#6886)
      • Expose progress reporter to users. (#6915)
      • Fix directory naming regression. (#6839)
      • Handle NaN case for AsyncHyperBand. (#6916)
      • Prevent memory checkpoints from breaking trial fault tolerance. (#6691)
      • Remove keras dependency. (#6827)
      • Remove unused tf loggers. (#7090)
      • Set correct path when deleting checkpoint folder. (#6758)
      • Support callable objects in variant generation. (#6849)

      Autoscaler

      • Ray nodes now respect docker limits. (#7039)
      • Add --all-nodes option to rsync-up. (#7065)
      • Add port-forwarding support for attach. (#7145)
      • For AWS, default to latest deep learning AMI. (#6922)
      • Added 'ray dashboard' command to proxy ray dashboard in remote machine. (#6959)

      Utility libraries

      • Support of scikit-learn with Ray joblib backend. (#6925)
      • Parallel iterator support local shuffle. (#6921)
      • [Serve] Support headless services (no HTTP). (#7010)
      • [Serve] refactor router to use Ray asyncio support. (#6873)
      • [Serve] support composing arbitrary dags. (#7015)
      • [RaySGD] support fp16 via PyTorch apex. (#7061)
      • [RaySGD] refactor PyTorch sgd documentation. (#6910)
      • Improvement in Ray Streaming. (#7043, #6666, #7071)

      Other improvements

      • Progress toward Windows compatibility. (#6882, #6823)
      • Ray Kubernetes operator improvements. (#6852, #6851, #7091)
      • Java support for concurrent actor calls API. (#7022)
      • Java support for direct call for normal tasks. (#7193)
      • Java support for cross language Python invocation. (#6709)
      • Java support for cross language serialization for actor handles. (#7134)

      Known issue

      • Passing the same ObjectID multiple times as an argument currently doesn't work. (#7296)
      • Tasks can exceed gRPC max message size. (#7263)

      Thanks

      We thank the following contributors for their work on this release: @mitchellstern, @hugwi, @deanwampler, @alindkhare, @ericl, @ashione, @fyrestone, @robertnishihara, @pcmoritz, @richardliaw, @yutaizhou, @istoica, @edoakes, @ls-daniel, @BalaBalaYi, @raulchen, @justinkterry, @roireshef, @elpollouk, @kfstorm, @Bassstring, @hhbyyh, @Qstar, @mehrdadn, @chaokunyang, @flying-mojo, @ujvl, @AnanthHari, @rkooo567, @simon-mo, @jovany-wang, @ijrsvt, @ffbin, @AmeerHajAli, @gaocegege, @suquark, @MissiontoMars, @zzyunzhi, @sven1977, @stephanie-wang, @amogkam, @wuisawesome, @aannadi, @maximsmol

      Source code(tar.gz)
      Source code(zip)
    • ray-0.8.1(Jan 27, 2020)

      Ray 0.8.1 Release Notes

      Highlights

      • ObjectIDs corresponding to ray.put() objects and task returns are now reference counted locally in Python and when passed into a remote task as an argument. ObjectIDs that have a nonzero reference count will not be evicted from the object store. Note that references for ObjectIDs passed into remote tasks inside of other objects (e.g., f.remote((ObjectID,)) or f.remote([ObjectID])) are not currently accounted for. (#6554)
      • asyncio actor support: actors can now define async def methods, and Ray will run multiple method invocations in the same event loop. The maximum concurrency level can be adjusted with ActorClass.options(max_concurrency=2000).remote(). A short sketch follows this list.
      • asyncio ObjectID support: Ray ObjectIDs can now be directly awaited using the Python API. await my_object_id is similar to ray.get(my_object_id), but allows context switching to make the operation non-blocking. You can also convert an ObjectID to a asyncio.Future using ObjectID.as_future().
      • Added experimental parallel iterators API (#6644, #6726): ParallelIterators can be used to more conveniently load and process data into Ray actors. See the documentation for details.
      • Added multiprocessing.Pool API (#6194): Ray now supports the multiprocessing.Pool API out of the box, so you can scale existing programs up from a single node to a cluster by only changing the import statement. See the documentation for details.
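
      A minimal sketch of an asyncio actor, assuming a hypothetical AsyncActor class and that max_concurrency bounds how many invocations share the event loop:

        import asyncio
        import ray

        ray.init()

        @ray.remote
        class AsyncActor:
            async def process(self, i):
                # While this coroutine awaits, other invocations of the same
                # actor can run in the same event loop.
                await asyncio.sleep(0.1)
                return i

        actor = AsyncActor.options(max_concurrency=4).remote()
        print(ray.get([actor.process.remote(i) for i in range(4)]))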

      Core

      • Deprecated Python 2 (#6581, #6601, #6624, #6665)
      • Fixed bug when failing to import remote functions or actors with args and kwargs (#6577)
      • Many improvements to the dashboard (#6493, #6516, #6521, #6574, #6590, #6652, #6671, #6683, #6810)
      • Progress towards Windows compatibility (#6446, #6548, #6653, #6706)
      • Redis now binds to localhost and has a password set by default (#6481)
      • Added actor.__ray_kill__() to terminate actors immediately (#6523)
      • Added 'ray stat' command for debugging (#6622)
      • Added documentation for fault tolerance behavior (#6698)
      • Treat static methods as class methods instead of instance methods in actors (#6756)

      RLlib

      • DQN distributional model: Replace all legacy tf.contrib imports with tf.keras.layers.xyz or tf.initializers.xyz (#6772)
      • SAC site changes (#6759)
      • PG unify/cleanup tf vs torch and PG functionality test cases (tf + torch) (#6650)
      • SAC for Mujoco Environments (#6642)
      • Tuple action dist tensors not reduced properly in eager mode (#6615)
      • Changed foreach_policy to foreach_trainable_policy (#6564)
      • Wrapper for the dm_env interface (#6468)

      Tune

      • Get checkpoints paths for a trial after tuning (#6643)
      • Async restores and S3/GCP-capable trial FT (#6376)
      • Usability errors PBT (#5972)
      • Demo exporting trained models in pbt examples (#6533)
      • Avoid duplication in TrialRunner execution (#6598)
      • Update params for optimizer in reset_config (#6522)
      • Support Type Hinting for py3 (#6571)

      Other Libraries

      • [serve] Pluggable Queueing Policy (#6492)
      • [serve] Added BackendConfig (#6541)
      • [sgd] Fault tolerance support for pytorch + revamp documentation (#6465)

      Thanks

      We thank the following contributors for their work on this release:

      @chaokunyang, @Qstar, @simon-mo, @wlx65003, @stephanie-wang, @alindkhare, @ashione, @harrisonfeng, @JingGe, @pcmoritz, @zhijunfu, @BalaBalaYi, @kfstorm, @richardliaw, @mitchellstern, @michaelzhiluo, @ziyadedher, @istoica, @EyalSel, @ffbin, @raulchen, @edoakes, @chenk008, @frthjf, @mslapek, @gehring, @hhbyyh, @zzyunzhi, @zhu-eric, @MissiontoMars, @sven1977, @walterddr, @micafan, @inventormc, @robertnishihara, @ericl, @ZhongxiaYan, @mehrdadn, @jovany-wang, @ujvl, @bharatpn

      Source code(tar.gz)
      Source code(zip)
    • ray-0.8.0(Dec 18, 2019)

      Ray 0.8.0 Release Notes

      This is the first release with gRPC direct calls enabled by default for both tasks and actors, which substantially improves task submission performance.

      Highlights

      • Enable gRPC direct calls by default (#6367). In this mode, actor tasks are sent directly from actor to actor over gRPC; the Raylet only coordinates actor creation. Similarly, with tasks, tasks are submitted directly from worker to worker over gRPC; the Raylet only coordinates the scheduling decisions. In addition, small objects (<100KB in size) are no longer placed in the object store. They are inlined into task submissions and returns when possible.

      Note: in some cases, reconstruction of large evicted objects is not possible with direct calls. To revert to the 0.7.7 behaviour, you can set the environment variable RAY_FORCE_DIRECT=0.

      Core

      • [Dashboard] Add remaining features from old dashboard (#6489)
      • Ray Kubernetes Operator Part 1: readme, structure, config, and CRD related files (#6332)
      • Make sure numpy >= 1.16.0 is installed for fast pickling support (#6486)
      • Avoid workers starting with the same random seed (#6471)
      • Properly handle a forwarded task that gets forwarded back (#6271)

      RLlib

      • (Bug Fix): Remove the extra 0.5 in the Diagonal Gaussian entropy (#6475)
      • AlphaZero and Ranked reward implementation (#6385)

      Tune

      • Add example and tutorial for DCGAN (#6400)
      • Report trials by state fairly (#6395)
      • Fixed bug in PBT where initial trial result is empty. (#6351)

      Other Libraries

      • [sgd] Add support for multi-model multi-optimizer training (#6317)
      • [serve] Added deadline awareness (#6442)
      • [projects] Return parameters for a command (#6409)
      • [streaming] Streaming data transfer and python integration (#6185)

      Thanks

      We thank the following contributors for their work on this release:

      @zplizzi, @istoica, @ericl, @mehrdadn, @walterddr, @ujvl, @alindkhare, @timgates42, @chaokunyang, @eugenevinitsky, @kfstorm, @Maltimore, @visatish, @simon-mo, @AmeerHajAli, @wumuzi520, @robertnishihara, @micafan, @pcmoritz, @zhijunfu, @edoakes, @sytelus, @ffbin, @richardliaw, @Qstar, @stephanie-wang, @Coac, @mitchellstern, @MissiontoMars, @deanwampler, @hhbyyh, @raulchen

      Source code(tar.gz)
      Source code(zip)
    • ray-0.7.7(Dec 16, 2019)

      Ray 0.7.7 Release Notes

      Highlights

      • Remote functions and actors now support kwargs and positionals (#5606).
      • ray.get now supports a timeout argument (#6107). If the object isn't available before the timeout passes, a RayTimeoutError is raised; see the sketch after this list.
      • Ray now supports detached actors (#6036), which persist beyond the lifetime of the script that creates them and can be referred to by a user-defined name.
      • Added documentation for how to deploy Ray on YARN clusters using Skein (#6119, #6173).
      • The Ray scheduler now attempts to schedule tasks fairly to avoid starvation (#5851).
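
      A short sketch of keyword arguments and the new ray.get timeout, assuming RayTimeoutError lives in ray.exceptions in this release and using a hypothetical slow task:

        import time
        import ray
        from ray.exceptions import RayTimeoutError  # assumed location of the exception

        ray.init()

        @ray.remote
        def slow(x, delay=2.0):
            time.sleep(delay)
            return x

        # Keyword arguments are now supported in remote calls.
        ref = slow.remote(42, delay=10.0)
        try:
            ray.get(ref, timeout=1)  # raises if the result isn't ready in time
        except RayTimeoutError:
            print("result not ready within 1 second")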

      Core

      • Progress towards a new backend architecture where tasks and actor tasks are submitted directly between workers. #5783, #5991, #6040, #6054, #6075, #6088, #6122, #6147, #6171, #6177, #6118, #6188, #6259, #6277
      • Progress towards Windows compatibility. #6071, #6204, #6205, #6282
      • Now using cloudpickle_fast for serialization by default, which supports more types of Python objects without sacrificing performance. #5658, #5805, #5960, #5978
      • Various bugfixes. #5946, #6175, #6176, #6231, #6253, #6257, #6276

      RLlib

      • Now using PyTorch's built-in function to check whether a GPU is available. #5890
      • Fixed APEX priorities returning zero all the time. #5980
      • Fixed leak of TensorFlow assign operations in DQN/DDPG. #5979
      • Fixed choosing the wrong neural network model for Atari in 0.7.5. #6087
      • Added large scale regression test for RLlib. #6093
      • Fixed and added test for LR annealing config. #6101
      • Reduced log verbosity. #6154
      • Added a microbatch optimizer with an A2C example. #6161

      Tune

      • Search algorithms now use early stopped trials for optimization. #5651
      • Metrics are now output in a tabular format, and errors are output in a separate table. #5822
      • In the distributed setting, checkpoints are now deleted automatically post-sync using an rsync flag. Checkpoints on the driver are garbage collected according to the policy defined by the user. #5877
      • A much faster ExperimentAnalysis tool. #5962
      • Trial executor callbacks now take in a “Runner” parameter. #5868
      • Fixed queue_trials to enable cluster autoscaling with a CPU-only head node. #5900
      • Added a TensorBoardX logger. #6133

      Other Libraries

      • Serving: Progress towards a new Ray serving library. #5854, #5886, #5894, #5929, #5937, #5961, #6051

      Thanks

      We thank the following contributors for their amazing contributions:

      @zhuohan123, @jovany-wang, @micafan, @richardliaw, @waldroje, @mitchellstern, @visatish, @mehrdadn, @istoica, @ericl, @adizim, @simon-mo, @lsklyut, @zhu-eric, @pcmoritz, @hhbyyh, @suquark, @sotte, @hershg, @pschafhalter, @stackedsax, @edoakes, @mawright, @stephanie-wang, @ujvl, @ashione, @couturierc, @AdamGleave, @robertnishihara, @DaveyBiggers, @daiyaanarfeen, @danyangz, @AmeerHajAli, @mimoralea

      Source code(tar.gz)
      Source code(zip)
    • ray-0.7.6(Oct 24, 2019)

      Ray 0.7.6 Release Notes

      Highlights

      • The Ray autoscaler now supports Kubernetes as a backend (#5492). This makes it possible to start a Ray cluster on top of your existing Kubernetes cluster with a simple shell command.

        • Please see the Kubernetes section of the autoscaler documentation to get started.
        • This is a new feature and may be rough around the edges. If you run into problems or have suggestions for how to improve Ray on Kubernetes, please file an issue.
      • The Ray cluster dashboard has been revamped (#5730, #5857) to improve the UI and include logs and error messages. More improvements will be coming in the near future.

        • You can try out the dashboard by starting Ray with ray.init(include_webui=True) or ray start --include-webui.
        • Please let us know if you have suggestions for what would be most useful to you in the new dashboard.

      Core

      • Progress towards refactoring the Python worker on top of the core worker. #5750, #5771, #5752
      • Fix an issue in local mode where multiple actors didn't work properly. #5863
      • Fix class attributes and methods for actor classes. #5802
      • Improvements in error messages and handling. #5782, #5746, #5799
      • Serialization improvements. #5841, #5725
      • Various documentation improvements. #5801, #5792, #5414, #5747, #5780, #5582

      RLlib

      • Added a link to BAIR blog posts in the documentation. #5762
      • Tracing for eager tensorflow policies with tf.function. #5705

      Tune

      • Improved MedianStoppingRule. #5402
      • Add PBT + Memnn example. #5723
      • Add support for function-based stopping condition. #5754
      • Save/Restore for Suggestion Algorithms. #5719
      • TensorBoard HParams for TF2.0. #5678

      Other Libraries

      • Serving: Progress towards a new Ray serving library. #5849, #5850, #5852

      Thanks

      We thank the following contributors for their amazing contributions:

      @hershg, @JasonWayne, @kfstorm, @richardliaw, @batzner, @vakker, @robertnishihara, @stephanie-wang, @gehring, @edoakes, @zhijunfu, @pcmoritz, @mitchellstern, @ujvl, @simon-mo, @ecederstrand, @mawright, @ericl, @anthonyhsyu, @suquark, @waldroje

      Source code(tar.gz)
      Source code(zip)
    • ray-0.7.5(Sep 25, 2019)

      Ray 0.7.5 Release Notes

      Ray API

      • Objects created with ray.put() are now reference counted. #5590
      • Add internal pin_object_data() API. #5637
      • Initial support for pickle5. #5611
      • Warm up Ray on ray.init(). #5685
      • redis_address passed to ray.init is now just address. #5602

      Core

      • Progress towards a common C++ core worker. #5516, #5272, #5566, #5664
      • Fix log monitor stall with many log files. #5569
      • Print warnings when tasks are unschedulable. #5555
      • Take resource queue lengths into account when autoscaling. #5702, #5684

      Tune

      • TF2.0 TensorBoard support. #5547, #5631
      • tune.function() is now deprecated. #5601

      RLlib

      • Enhancements for TF eager support. #5625, #5683, #5705
      • Fix DDPG regression. #5626

      Other Libraries

      • Complete rewrite of experimental serving library. #5562
      • Progress toward Ray projects APIs. #5525, #5632, #5706
      • Add TF SGD implementation for training. #5440
      • Many documentation improvements and bugfixes.
      Source code(tar.gz)
      Source code(zip)
    • ray-0.7.4(Sep 5, 2019)

      Ray 0.7.4 Release Notes

      Highlights

      • There were many documentation improvements (#5391, #5389, #5175). As we continue to improve the documentation we value your feedback through the “Doc suggestion?” link at the top of the documentation. Notable improvements:

        • We’ve added guides for best practices using TensorFlow and PyTorch.
        • We’ve revamped the Walkthrough page for Ray users, providing a better experience for beginners.
        • We’ve revamped guides for using Actors and inspecting internal state.
      • Ray now supports memory limits to ensure memory-intensive applications run predictably and reliably. You can activate them through the ray.remote decorator:

        @ray.remote(
            memory=2000 * 1024 * 1024,
            object_store_memory=200 * 1024 * 1024)
        class SomeActor(object):
            def __init__(self, a, b):
                pass
        

        You can set limits for the heap and the object store, see the documentation.

      • There is now preliminary support for projects, see the project documentation. Projects allow you to package your code and easily share it with others, ensuring a reproducible cluster setup. To get started, you can run

        # Create a new project.
        ray project create <project-name>
        # Launch a session for the project in the current directory.
        ray session start
        # Open a console for the given session.
        ray session attach
        # Stop the given session and all of its worker nodes.
        ray session stop
        

        Check out the examples. This is an actively developed new feature so we appreciate your feedback!

      Breaking change: The redis_address parameter was renamed to address (#5412, #5602) and the former will be removed in the future.

      Core

      • Move Java bindings on top of the core worker #5370
      • Improve log file discoverability #5580
      • Clean up and improve error messages #5368, #5351

      RLlib

      • Support custom action space distributions #5164
      • Add TensorFlow eager support #5436
      • Add autoregressive KL #5469
      • Autoregressive Action Distributions #5304
      • Implement MADDPG agent #5348
      • Port Soft Actor-Critic on Model v2 API #5328
      • More examples: Add CARLA community example #5333 and rock paper scissors multi-agent example #5336
      • Moved RLlib to top level directory #5324

      Tune

      • Experimental Implementation of the BOHB algorithm #5382
      • Breaking change: Nested dictionary results are now flattened for CSV writing: {“a”: {“b”: 1}} => {“a/b”: 1} #5346
      • Add Logger for MLFlow #5438
      • TensorBoard support for TensorFlow 2.0 #5547
      • Added examples for XGBoost and LightGBM #5500
      • HyperOptSearch now has warmstarting #5372

      Other Libraries

      • SGD: Tune interface for Pytorch MultiNode SGD #5350
      • Serving: The old version of ray.serve was deprecated #5541
      • Autoscaler: Fix ssh control path limit #5476
      • Dev experience: Ray CI tracker online at https://ray-travis-tracker.herokuapp.com/

      Various fixes: log monitor issues (#4382, #5221, #5569); the top-level ray directory was cleaned up (#5404).

      Thanks

      We thank the following contributors for their amazing contributions:

      @jon-chuang, @lufol, @adamochayon, @idthanm, @RehanSD, @ericl, @michaelzhiluo, @nflu, @pengzhenghao, @hartikainen, @wsjeon, @raulchen, @TomVeniat, @layssi, @jovany-wang, @llan-ml, @ConeyLiu, @mitchellstern, @gregSchwartz18, @jiangzihao2009, @jichan3751, @mhgump, @zhijunfu, @micafan, @simon-mo, @richardliaw, @stephanie-wang, @edoakes, @akharitonov, @mawright, @robertnishihara, @lisadunlap, @flying-mojo, @pcmoritz, @jredondopizarro, @gehring, @holli, @kfstorm

      Source code(tar.gz)
      Source code(zip)
    • ray-0.7.3(Aug 4, 2019)

      Ray 0.7.3 Release Note

      Highlights

      • The RLlib ModelV2 API is ready to use. It improves support for Keras and RNN models and allows object-oriented reuse of variables. The ModelV1 API is deprecated; no migration is needed.
      • ray.experimental.sgd.pytorch.PyTorchTrainer is ready for early adopters. Check out the documentation here. We welcome your feedback!
        model_creator = lambda config: YourPyTorchModel()
        data_creator = lambda config: (YourTrainingSet(), YourValidationSet())
        
        trainer = PyTorchTrainer(
            model_creator,
            data_creator,
            optimizer_creator=utils.sgd_mse_optimizer,
            config={"lr": 1e-4},
            num_replicas=2,
            resources_per_replica=Resources(num_gpus=1),
            batch_size=16,
            backend="auto")
        
        for i in range(NUM_EPOCHS):
            trainer.train()
        
      • You can query all the clients that have performed ray.init to connect to the current cluster with ray.jobs(). #5076
        >>> ray.jobs()
        [{'JobID': '02000000',
          'NodeManagerAddress': '10.99.88.77',
          'DriverPid': 74949,
          'StartTime': 1564168784,
          'StopTime': 1564168798},
         {'JobID': '01000000',
          'NodeManagerAddress': '10.99.88.77',
          'DriverPid': 74871,
          'StartTime': 1564168742}]
        

      Core

      • Improvement on memory storage handling. #5143, #5216, #4893
      • Improved workflow:
        • Debugging tool local_mode now behaves more consistently. #5060
        • Improved KeyboardInterrupt Exception Handling, stack trace reduced from 115 lines to 22 lines. #5237
      • Ray core:
        • Experimental direct actor call. #5140, #5184
        • Improvement in core worker, the shared module between Python and Java. #5079, #5034, #5062
        • GCS (global control store) was refactored. #5058, #5050

      RLlib

      • Finished port of all major RLlib algorithms to builder pattern #5277, #5258, #5249
      • learner_queue_timeout can be configured for async sample optimizer. #5270
      • reproducible_seed can be used for reproducible experiments. #5197
      • Added entropy coefficient decay to IMPALA, APPO and PPO #5043

      Tune:

      • Breaking: ExperimentAnalysis is now returned by default from tune.run. To obtain a list of trials, use analysis.trials (see the sketch after this list). #5115
      • Breaking: Syncing behavior between head and workers can now be customized (sync_to_driver). Syncing behavior (upload_dir) between cluster and cloud is now separately customizable (sync_to_cloud). This changes the structure of the uploaded directory - now local_dir is synced with upload_dir. #4450
      • Introduce Analysis and ExperimentAnalysis objects. Analysis object will now return all trials in a folder; ExperimentAnalysis is a subclass that returns all trials of an experiment. #5115
      • Add missing argument tune.run(keep_checkpoints_num=...). Enables only keeping the last N checkpoints. #5117
      • Trials on failed nodes will be prioritized in processing. #5053
      • Trial Checkpointing is now more flexible. #4728
      • Add system performance tracking for gpu, ram, vram, cpu usage statistics - toggle with tune.run(log_sys_usage=True). #4924
      • Experiment checkpointing frequency is now less frequent and can be controlled with tune.run(global_checkpoint_period=...). #4859
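
      A hedged sketch combining the new Tune options above; the toy trainable, the results-folder path, and the reporting call follow Tune versions from around this era and are illustrative rather than exact 0.7.3 API.

        from ray import tune
        from ray.tune import Analysis

        def my_trainable(config):
            # Toy trainable: report a decreasing loss for a few steps.
            for step in range(5):
                tune.report(mean_loss=config["lr"] / (step + 1))

        analysis = tune.run(
            my_trainable,
            config={"lr": tune.grid_search([0.01, 0.1])},
            keep_checkpoints_num=3,           # keep only the last 3 checkpoints
            log_sys_usage=True,               # track CPU/GPU/RAM/VRAM usage
            global_checkpoint_period=600)     # checkpoint the experiment every 10 minutes

        # tune.run now returns an ExperimentAnalysis; analysis.trials lists the trials.
        print(analysis.get_best_config(metric="mean_loss", mode="min"))
        print(len(analysis.trials))

        # Analysis loads every trial found under a local results folder.
        folder_analysis = Analysis("~/ray_results/my_trainable")
        df = folder_analysis.dataframe()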

      Autoscaler

      • Added a request_cores function for manual autoscaling; you can now manually request resources from the autoscaler. #4754
      • Local cluster:
        • More readable example YAML with comments. #5290
        • Multiple cluster names are now supported. #4864
      • Improved logging with the AWS NodeProvider; create_instance calls are now logged. #4998

      Other Libraries

      • SGD:
        • Added a training example. #5292
        • Deprecated the old distributed SGD implementation. #5160
      • Kubernetes: a Ray namespace was added for k8s. #4111
      • Dev experience: added a linting pre-push hook. #5154

      Thanks

      We thank the following contributors for their amazing contributions:

      @joneswong, @1beb, @richardliaw, @pcmoritz, @raulchen, @stephanie-wang, @jiangzihao2009, @LorenzoCevolani, @kfstorm, @pschafhalter, @micafan, @simon-mo, @vipulharsh, @haje01, @ls-daniel, @hartikainen, @stefanpantic, @edoakes, @llan-ml, @alex-petrenko, @ztangent, @gravitywp, @MQQ, @dulex123, @morgangiraud, @antoine-galataud, @robertnishihara, @qxcv, @vakker, @jovany-wang, @zhijunfu, @ericl
