ClearML - Auto-Magical Suite of tools to streamline your ML workflow. Experiment Manager, MLOps and Data-Management

Overview


Formerly known as Allegro Trains

ClearML is an ML/DL development and production suite. It contains three main modules:

  • Experiment Manager - Automagical experiment tracking, environments and results
  • ML-Ops - Automation, Pipelines & Orchestration solution for ML/DL jobs (K8s / Cloud / bare-metal)
  • Data-Management - Fully differentiable data management & version control solution on top of object-storage (S3/GS/Azure/NAS)

Underpinning these components is the ClearML Server; see Self-Hosting & Free tier Hosting


Sign up & Start using in under 2 minutes


ClearML Experiment Manager

Adding only 2 lines to your code gets you the following:

  • Complete experiment setup log
    • Full source control info including non-committed local changes
    • Execution environment (including specific packages & versions)
    • Hyper-parameters
      • ArgParser/Click for command line parameters with currently used values
      • Explicit parameters dictionary
      • Tensorflow Defines (absl-py)
      • Hydra configuration and overrides
    • Initial model weights file
  • Full experiment output automatic capture
    • stdout and stderr
    • Resource Monitoring (CPU/GPU utilization, temperature, IO, network, etc.)
    • Model snapshots (With optional automatic upload to central storage: Shared folder, S3, GS, Azure, Http)
    • Artifacts log & store (Shared folder, S3, GS, Azure, Http)
    • Tensorboard/TensorboardX scalars, metrics, histograms, images, audio and video samples
    • Matplotlib & Seaborn
    • ClearML Logger interface for complete flexibility (see the sketch after this list)
  • Extensive platform support and integrations
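
A minimal sketch of explicit reporting through the Logger interface mentioned above (project/task names and values are illustrative):

    from clearml import Task

    task = Task.init(project_name='examples', task_name='logger demo')
    logger = task.get_logger()
    # report a point on a scalar series: (title, series, value, iteration)
    logger.report_scalar(title='loss', series='train', value=0.25, iteration=1)
    # report free-form text to the console log
    logger.report_text('epoch finished')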

Start using ClearML

  1. Sign up for free to the ClearML Hosted Service (alternatively, you can set up your own server, see here).

    ClearML Demo Server: ClearML no longer uses the demo server by default. To enable the demo server, set the CLEARML_NO_DEFAULT_SERVER=0 environment variable. Credentials aren't needed, but experiments launched to the demo server are public, so make sure not to launch sensitive experiments if using the demo server.

  2. Install the clearml python package:

    pip install clearml
  3. Connect the ClearML SDK to the server by creating credentials, then execute the command below and follow the instructions:

    clearml-init
  4. Add two lines to your code:

    from clearml import Task
    task = Task.init(project_name='examples', task_name='hello world')

You are done; everything your process outputs is now automagically logged into ClearML.

Next step, automation! Learn more about ClearML's two-click automation here.

ClearML Architecture

The ClearML run-time components:

  • The ClearML Python Package for integrating ClearML into your existing scripts by adding just two lines of code, and optionally extending your experiments and other workflows with ClearML's powerful and versatile set of classes and methods.
  • The ClearML Server storing experiment, model, and workflow data, and supporting the Web UI experiment manager, and ML-Ops automation for reproducibility and tuning. It is available as a hosted service and open source for you to deploy your own ClearML Server.
  • The ClearML Agent for ML-Ops orchestration, experiment and workflow reproducibility, and scalability.

(clearml architecture diagram)

Additional Modules

  • clearml-session - Launch remote JupyterLab / VSCode-server inside any docker, on Cloud/On-Prem machines
  • clearml-task - Run any codebase on remote machines with full remote logging of Tensorboard, Matplotlib & Console outputs
  • clearml-data - CLI for managing and versioning your datasets, including creating / uploading / downloading of data from S3/GS/Azure/NAS (see the sketch after this list)
  • AWS Auto-Scaler - Automatically spin up EC2 instances based on your workloads, with a preconfigured budget! No need for K8s!
  • Hyper-Parameter Optimization - Optimize any code with a black-box approach and state-of-the-art Bayesian optimization algorithms
  • Automation Pipeline - Build pipelines based on existing experiments / jobs; supports building pipelines of pipelines!
  • Slack Integration - Report experiment progress / failure directly to Slack (fully customizable!)
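
For example, versioning a local folder with the clearml-data CLI typically follows a create / add / close flow (project name and path are illustrative):

    clearml-data create --project examples --name my_dataset
    clearml-data add --files ./data
    clearml-data close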

Why ClearML?

ClearML is our solution to a problem we share with countless other researchers and developers in the machine learning/deep learning universe: Training production-grade deep learning models is a glorious but messy process. ClearML tracks and controls the process by associating code version control, research projects, performance metrics, and model provenance.

We designed ClearML specifically to require effortless integration so that teams can preserve their existing methods and practices.

  • Use it on a daily basis to boost collaboration and visibility in your team
  • Create a remote job from any experiment with a click of a button
  • Automate processes and create pipelines to collect your experimentation logs, outputs, and data
  • Store all your data on any object-storage solution, with the simplest interface possible
  • Make your data transparent by cataloging it all on the ClearML platform

We believe ClearML is ground-breaking. We wish to establish new standards of truly seamless integration between experiment management, ML-Ops, and data management.

Who We Are

ClearML is supported by the team behind allegro.ai, where we build deep learning pipelines and infrastructure for enterprise companies.

We built ClearML to track and control the glorious but messy process of training production-grade deep learning models. We are committed to vigorously supporting and expanding the capabilities of ClearML.

We promise to always be backward compatible, making sure all your logs, data, and pipelines will always upgrade with you.

License

Apache License, Version 2.0 (see the LICENSE for more information)

If ClearML is part of your development process / project / publication, please cite us ❤️ :

@misc{clearml,
  title = {ClearML - Your entire MLOps stack in one open-source tool},
  year = {2019},
  note = {Software available from http://github.com/allegroai/clearml},
  url = {https://clear.ml/},
  author = {ClearML},
}

Documentation, Community & Support

More information in the official documentation and on YouTube.

For examples and use cases, check the examples folder and corresponding documentation.

If you have any questions: post on our Slack channel, or tag your questions on Stack Overflow with the 'clearml' tag (previously the 'trains' tag).

For feature requests or bug reports, please use GitHub issues.

Additionally, you can always find us at [email protected]

Contributing

PRs are always welcome ❤️ See more details in the ClearML Guidelines for Contributing.

May the force (and the goddess of learning rates) be with you!

Comments
  • trains causes ETA for epoch to be 3 times slower

    Hey, we are using trains with TF 2.3.0 and tf.keras. We noticed that with trains, the ETA for a single epoch is about 7.5 hours:

    with trains: ETA 7:46:59
    without trains: ETA 2:14:54

    Any ideas/solution for this trains bottleneck?

    Thanks, Alon

    EDIT:

    trains server version is 0.13, trains package version (trains.__version__) is 0.16.1

    opened by oak-tree 28
  • Scrolling log problem when using tqdm as training progress bar

    I was using tqdm to show training progress in the command line, but when I tried to use Trains together with it, some scrolling problems occurred.

    Log in Trains: (screenshot not preserved)

    Log in the command line: (screenshot not preserved)

    opened by huihui-v 27
  • remote execution with hydra

    Hello trains team,

    I'm trying to use trains and Hydra. Hydra is nice for managing configuration, and it helps the user build one from the command line with auto-completion and preset features.

    Nevertheless, I'm struggling with remote execution. As explained here, Hydra changes the working dir, which defeats the trains script information detector. At runtime, Hydra creates a directory to compose the configuration and sets the working dir inside that untracked directory. If we try to remotely execute such a task, trains-agent complains that the working dir does not exist.

    I came up with two unsatisfactory partial solutions, demonstrated here.

    First, the defective script: the trains task is created inside Hydra's main, so working_dir and entrypoint are wrong.

    First fix attempt: the trains task is created before entering Hydra's main function (see the sketch below). The script info is indeed correct, but I'd like to modify the task name and project according to the application config; I did not find a way to set the project name once the task is created. Anyway, I don't find this solution elegant.
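
    A minimal sketch of that first fix attempt, creating the task before Hydra's main so script info is detected from the original working directory (all names here are illustrative):

    from clearml import Task
    import hydra

    # create the task before hydra switches the working directory,
    # so the repository and entry point are detected correctly
    task = Task.init(project_name='examples', task_name='hydra app')

    @hydra.main(config_path='conf', config_name='config')
    def main(cfg):
        ...

    if __name__ == '__main__':
        main()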

    Second fix attempt: The task is created inside hydra's context and I try to update the working_dir and entrypoint. I tested this solution in my closed source application. It looks very fragile as the script info is filled by a trains thread and I didn't find a way to porperly synchronize with it although this minimal example seems to work.

    Any suggestions?

    opened by elinep 26
  • [Feature] Support individual scalar reporting without plots

    When using logger.report_scalar, a plot is automatically generated to accommodate a time-series axis. There is currently no way for the user to report a single named scalar (that is not a time series) and have it both aesthetically pleasing and visible enough in the WebUI. Using iteration=0 just wastes space by creating a scatter plot with a single datum.

    It would be great to have e.g. logger.report_scalar(name="MAE", value=mae), and have it visible as a table (or similar) with e.g.:

    | Scalar name        | value                      |
    |--------------------|----------------------------|
    | MAE                | 0.123                      |
    | NRMSE              | 0.4                        |
    | My favorite scalar | and it's actually a string |

    Even better, once these are in place, they can automatically be aligned between multiple experiments for comparison.
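
    For reference, single-value reporting later landed in v1.5.0 ("Add support for single value metric reporting" in the release notes below); a minimal sketch of that API, reusing the example name above:

    from clearml import Task

    task = Task.init(project_name='examples', task_name='single value demo')
    # report one named scalar without an iteration axis
    task.get_logger().report_single_value(name='MAE', value=0.123)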

    opened by idantene 23
  • Auto logging/scalar detection stopped working in new version

    I had previously been using clearml version 0.17.4 with pytorch and pytorch lightning, both at the latest release. I recently updated clearml to the latest version, 0.17.5. However, it appears that the auto logging/iteration detection does not work now. Before, the iterations and scalars (via the tensorboard file) would be picked up automatically; now it only works once in a while or not at all, usually reporting no scalars and no iterations.

    My script does a lot of caching up front before iterations actually start; it takes maybe 3 minutes. Perhaps this is part of the issue. I can't share the code, but other than that it is standard pytorch/pytorch lightning.

    Reverting to 0.17.4 seems to fix the issue.

    opened by ndalton12 23
  • It seems *api_server* is misconfigured.

    After setting up the trains Docker image together with the trains package in my local environment, I keep getting issues when trying to connect to the trains API server.

    Specifically, after running docker-compose:

    The following error message can be found in the console:

        trains-apiserver | raise ConnectionError(e, request=request)
        trains-apiserver | requests.exceptions.ConnectionError: HTTPSConnectionPool(host='updates.trains.allegro.ai', port=443): Max retries exceeded with url: /updates (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f390d528da0>: Failed to establish a new connection: [Errno -2] Name or service not known',))

    When I import Task from trains there is no error, until I run this:

    task = Task.init(project_name="my project", task_name="my task")

    Resulting in the following error message:

    InsecureRequestWarning: Certificate verification is disabled! Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
    Traceback (most recent call last):
      File "C:\ProgramData\Miniconda3\envs\jurica\lib\site-packages\trains\backend_api\session\session.py", line 545, in _do_refresh_token
        return resp["data"]["token"]
    KeyError: 'data'

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "C:\ProgramData\Miniconda3\envs\jurica\lib\site-packages\trains\task.py", line 260, in init
        reuse_last_task_id,
      File "C:\ProgramData\Miniconda3\envs\jurica\lib\site-packages\trains\task.py", line 1009, in _create_dev_task
        log_to_backend=True,
      File "C:\ProgramData\Miniconda3\envs\jurica\lib\site-packages\trains\task.py", line 112, in init
        super(Task, self).init(**kwargs)
      File "C:\ProgramData\Miniconda3\envs\jurica\lib\site-packages\trains\backend_interface\task\task.py", line 81, in init
        super(Task, self).init(id=task_id, session=session, log=log)
      File "C:\ProgramData\Miniconda3\envs\jurica\lib\site-packages\trains\backend_interface\base.py", line 129, in init
        super(IdObjectBase, self).init(session, log, **kwargs)
      File "C:\ProgramData\Miniconda3\envs\jurica\lib\site-packages\trains\backend_interface\base.py", line 34, in init
        self._session = session or self._get_default_session()
      File "C:\ProgramData\Miniconda3\envs\jurica\lib\site-packages\trains\backend_interface\base.py", line 103, in _get_default_session
        secret_key=ENV_SECRET_KEY.get(),
      File "C:\ProgramData\Miniconda3\envs\jurica\lib\site-packages\trains\backend_api\session\session.py", line 144, in init
        self.refresh_token()
      File "C:\ProgramData\Miniconda3\envs\jurica\lib\site-packages\trains\backend_api\session\token_manager.py", line 95, in refresh_token
        self._set_token(self._do_refresh_token(self.__token, exp=self.req_token_expiration_sec))
      File "C:\ProgramData\Miniconda3\envs\jurica\lib\site-packages\trains\backend_api\session\session.py", line 552, in _do_refresh_token
        'Is this the TRAINS API server {} ?'.format(self.get_api_server_host()))
    ValueError: It seems api_server is misconfigured. Is this the TRAINS API server http://localhost:8008 ?

    I have set up the trains.conf file to contain the following:

    # TRAINS SDK configuration file
    api {
        # API server on port 8008
        api_server: "http://localhost:8008"

        # web_server on port 8080
        web_server: "http://localhost:8080"

        # file server on port 8081
        files_server: "http://localhost:8081"
        verify_certificate: False
    }

    As I am behind a corporate proxy, I set that up, and it should be working fine. Is the issue on my side, maybe due to DNS, or is there another explanation? Much appreciated!

    opened by Zeko403 22
  • Sub process logger

    Hi,

    Here is the description of the problem:

    I have process A, which creates a logger using this command:

        task = Task.init(
            project_name=args.clearml_proj_base + "/training",
            task_name=args.clearml_task,
            tags=[args.loss, 'patch size' + str(args.patch_size),
                  str(args.num_features) + '_' + str(args.max_features) + '_' + str(args.unet_depth),
                  'two channels', 'lr_default'],
            continue_last_task=False)
        logger = task.get_logger()

    Then this process submits a new job/process B (for inference) in our cluster, and this job runs on a different computer. The new job creates a logger using:

        task = Task.init(project_name=project_name, task_name=task_name)

    or

        task = Task.init(project_name=project_name, task_name=task_name,
                         continue_last_task=False, reuse_last_task_id=False)

    or

        task = Task.init(project_name=project_name, task_name=task_name,
                         continue_last_task=False)

    Different project_name and task_name.

    The problem is that all of B's logs are created in A's task. Also, there is no entry for the new project_name/task_name.

    Thanks, Ophir Azulai IBM Research AI

    opened by ophirazulai 21
  • Bug fix: Auto-scaler should not spin a new instance with task_id = None

    Hi, after a discussion on the clearml-community Slack channel, I figured out the solution and decided to make this PR. Thanks in advance for your feedback.

    opened by tienduccao 20
  • ClearML is not saving scalars/images when using Tensorflow Object detection API - TF2.2

    This issue is related to this thread: https://clearml.slack.com/archives/CTK20V944/p1610457717141400

    To reproduce:

        git clone https://github.com/glemarivero/raccoon_dataset.git

    Set up a virtualenv and run:

        sh autoinstall.sh

    Generate tfrecords:

        python generate_tfrecord.py --output_path images.tfrecord --csv_input raccoon_labels.csv --image_dir images
    

    Run training:

    python model_main_tf2.py --model_dir=models/ --pipeline_config_path=pipeline.config
    
    opened by glemarivero 20
  • Task.init() registers main process on all available GPUs

    The issue

    When I run experiments I set CUDA_VISIBLE_DEVICES to some integer to only make that device available to the main process (as is common). I can verify that this is in fact the case with sudo fuser -v /dev/nvidia* which shows that a single process has been created on the single device I chose.

    However, I observe that a subsequent call to Task.init() in the Python script somehow overrides this and "registers" the main process on all GPU devices of the node. This can be seen by inspecting sudo fuser -v /dev/nvidia* after the call to Task.init(). The original process ID, registered on the device initially chosen with CUDA_VISIBLE_DEVICES, is now registered on all GPU devices on the node:

    /dev/nvidia0:        jdh        2448 F.... python
    /dev/nvidia1:        je          315 F...m python3
                         jdh        2448 F.... python
    /dev/nvidia2:        jdh        2448 F.... python
    

    I can only see this process on the other devices when I use sudo fuser, not with gpustat or nvidia-smi. Nor can I see any memory being allocated on the other devices.

    I’ve verified that CUDA_VISIBLE_DEVICES doesn’t get changed during the Task.init call or anywhere else during the script.

    I’m running manual mode and I only see one GPU tracked in the resource monitoring. I’m using trains 0.16.4.

    Reproducing

    You can reproduce simply by taking the ClearML PyTorch MNIST example https://github.com/allegroai/clearml/blob/master/examples/frameworks/pytorch/pytorch_mnist.py.

    To clearly see it happening, it's easiest to get the GPU allocated before calling task = Task.init(…); to avoid crashing because you're missing the task variable, you can embed just before and after Task.init(…) using IPython. You also need the process ID of the main process to check against sudo fuser -v /dev/nvidia*.

    Summarizing, I move task = Task.init(…) to just before the for epoch in range(…) loop and replace it with

    import psutil
    current_process_pid = psutil.Process().pid
    print(current_process_pid)  # e.g 12971
    import IPython; IPython.embed()
    task = Task.init(project_name='examples', task_name='pytorch mnist train')
    import IPython; IPython.embed()
    

    You can then run the example until it reaches the first embed and check that the main process printed is only visible on your designated device. Then you can quit the embed to see Task.init cause the problem, after which you are waiting in the second embed. You can then quit that one to see training work fine.

    You can then try the whole thing again without Task.init but you need to remove reporting in that case - otherwise you get

    Logger.current_logger().report_scalar(
    AttributeError: 'NoneType' object has no attribute 'report_scalar'
    

    I haven’t tested on any other versions than trains 0.16.4 so I don’t know if it happens in the new clearml package.

    opened by JakobHavtorn 20
  • Frozen worker with large num_workers

    Hi,

    I'm running a modified version of your pytorch mnist train example (attached below as a txt file). When I set the number of workers to a large number (e.g. 40) and save my model multiple times, the worker freezes after a random number of epochs, but always right after completing a model upload:

    2020-09-10 08:08:32    Test set: Average loss: 0.0268, Accuracy: 9915/10000 (99%)
    2020-09-10 08:08:33    2020-09-10 08:08:33,292 - trains.Task - INFO - Completed model upload to file:///data/trains/saved/examples/pytorch mnist train.20e2ad4c4f4247c89d7b7681a0c51d78/models/mnist_cnn.pt
    2020-09-10 08:08:34    Train Epoch: 13 [0/60000 (0%)] Loss: 0.019223
    

    It happens both when running from trains-agent and when running from the terminal.

    I'm using trains==0.16.1, trains-agent==0.16.0, and Python 3.6. I'm not using docker mode.

    Any idea why?

    trains_mnist.txt

    opened by nirraviv 20
  • Task.update_output_model's name parameter is not used correctly

    Describe the bug

    Task.update_output_model's name parameter is not used as described in the documentation. Currently, it is only used to set model_name when model_name is not provided.

    To reproduce

    task.update_output_model('my_path/my_model.pt',
                             name='my_model_name',
                             model_name='my_model_artifact')

    Called as above, this results in the model name being set to my_model (the model weights filename without the extension) instead of my_model_name. That means that name is not recognized.

    Detailed problem: I need to store 2 models, .pt and .onnx files, as follows:

    task.update_output_model('my_path/my_model.pt',
                             name='my_model',
                             model_name='my_model')

    task.update_output_model('my_path/my_model.onnx',
                             name='my_model_onnx',
                             model_name='my_model_onnx')

    I am expecting to have 2 output models:

    1. my_model with the artifact name my_model
    2. my_model_onnx with the artifact name my_model_onnx

    However, there is only one output model my_model with the artifact name my_model_onnx. That means that the second model's artifact overwrote the first model's artifact.

    Changing the code as

    task.update_output_model('my_path/my_model.pt',
                             name='my_model',
                             model_name='my_model')

    task.update_output_model('my_path/my_model_onnx.onnx',
                             name='my_model_onnx',
                             model_name='my_model_onnx')

    does the job, but the purpose of the name parameter is lost.

    Expected behaviour

    • name - the name of the output model
    • model_name - the name of the output model's artifact itself

    Environment

    Version: clearml==1.9.0. The problem also seems to be present under the latest commit, 0950757d442360997b3e8d0f706dc971c4f98b33.

    opened by Vladimir-125 0
  • [Feature] AssumeRoleWithWebIdentity / OIDC support for accessing S3 buckets

    Proposal Summary

    Currently, only "regular" EC2 credentials, i.e. key and secret, can be supplied: https://clear.ml/docs/latest/docs/integrations/storage/#configuring-aws-s3

    I was wondering whether your code and config allow for OIDC credentials and the other required settings to be applied. Apparently boto3 is ready for it:

    • https://boto3.amazonaws.com/v1/documentation/api/latest/guide/credentials.html#assume-role-with-web-identity-provider

    Motivation

    Enterprises tend to use their own, existing authentication system to also manage access to AWS resources (or Ceph RADOSGW or MinIO if they run that themselves).

    • MinIO - https://min.io/docs/minio/linux/operations/external-iam/configure-openid-external-identity-management.html
    • Ceph RADOSGW - https://docs.ceph.com/en/quincy/radosgw/STS

    AWS's capabilities also extend to using those external identities within roles and access policies (see https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_providers_oidc_user-id.html); similar things can be done with straight bucket policies.

    Having support for OIDC / WebIdentities would allow for ClearML to be used in such a context.


    opened by frittentheke 2
  • UniformParameterRange tolist throws when no step size defined

    Describe the bug

    When creating a UniformParameterRange with no step size, to_list() throws an exception. The fix is obvious (screenshot of the proposed fix not preserved).

    To reproduce

    (Screenshot of the reproduction not preserved; a sketch follows below.)
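
    A minimal reproduction along the lines described above (parameter name and values are illustrative):

    from clearml.automation import UniformParameterRange

    # no step_size supplied, so to_list() has nothing to step by
    param = UniformParameterRange('lr', min_value=0.001, max_value=0.1)
    values = param.to_list()  # raises on versions affected by this bug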

    Expected behaviour

    Return a proper list.

    Environment

    All environments, all versions.

    opened by davyx8 2
  • parse queue name

    Related issue / discussion

    https://clearml.slack.com/archives/CTK20V944/p1671538941447289

    Patch Description

    Now one may specify things like ${pipeline.queue_name} for queue_name and control the execution queue of each task via pipeline parameters.


    opened by Anton-Cherepkov 1
  • Allow setting a docker image for the PipelineController

    Proposal Summary

    I understand I can change the docker image for a component in the pipeline, but for the pipeline controller this is not possible. You can only change the queue it runs on, but not the docker image, nor its parameters, nor the requirements. As the workarounds seem hackish, I’d definitely prefer the ability to set a docker image/docker args/requirements config for the pipeline controller too.

    Related Discussion

    https://clearml.slack.com/archives/CTK20V944/p1671060074061609?thread_ts=1670948054.971189&cid=CTK20V944

    opened by klaramarijan 1
Releases(v1.9.0)
  • v1.9.0(Dec 23, 2022)

    New Features and Improvements

    • Add r prefix to re.match() strings (#837, thanks @daugihao!)
    • Add path_substitution to clearml.conf example file (#842)
    • Clarify deferred_init usage in Task.init() (#855)
    • Add pipeline decorator argument to control docker image (#856)
    • Add StorageManager.set_report_upload_chunk_size() and StorageManager.set_report_download_chunk_size() to set chunk size for upload and download
    • Add allow_archived argument in Task.get_tasks()
    • Support querying model metadata in Model.query_models()
    • Add Dataset.set_metadata() and Dataset.get_metadata()
    • Add delete_from_storage (default True) to Task.delete_artifacts()
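
    A minimal sketch of the new delete_from_storage argument from the last bullet above (the task ID and artifact name are illustrative):

        from clearml import Task

        task = Task.get_task(task_id='<task-id>')  # hypothetical task ID
        # remove the artifact entry and also delete the underlying file from storage
        task.delete_artifacts(artifact_names=['my_artifact'], delete_from_storage=True)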

    Bug Fixes

    • Fix jsonargparse and pytorch lightning integration broken for remote execution (#403)
    • Fix error when using TaskScheduler with 'limit_execution_time' (#648)
    • Fix dataset not synced if the changes are only modified files (#835, thanks @fjean!)
    • Fix StorageHelper.delete() does not respect path substitutions (#838)
    • Fix can't write more than 2 GB to a file
    • Fix StorageManager.get_file_size_bytes() returns ClientError instead of None for invalid S3 links
    • Fix Dataset lineage view is broken with multiple dataset dependencies
    • Fix tensorflow_macos support
    • Fix crash when calling task.flush(wait_for_uploads=True) while executing remotely
    • Fix None values get casted to empty strings when connecting a dictionary
    Source code(tar.gz)
    Source code(zip)
    clearml-1.9.0-py2.py3-none-any.whl(943.80 KB)
  • v1.8.3(Dec 4, 2022)

  • v1.8.2(Dec 1, 2022)

  • v1.8.1(Nov 21, 2022)

    New Features and Improvements

    • Raise error on failed uploads (#820, thanks @shpigi!)
    • Add hyperdataset examples (#823)
    • Change report_event_flush_threshold default to 100
    • Add ModelInfo.weights_object() to allow storage callbacks to access the actual model object being stored (valid for both pre/post save calls, otherwise None)
    • Support num_workers in dataset operations
    • Support a max connections setting for Azure storage using the sdk.azure.storage.max_connection configuration option (see the sketch below)
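
    A sketch of what that setting might look like in clearml.conf (the nesting follows the option path above; the value is illustrative):

        sdk {
          azure {
            storage {
              max_connection: 10
            }
          }
        }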

    Bug Fixes

    • Fix clearml logger default level cannot be changed (#741)
    • Fix Hydra does not get overridden information from ClearML (#751)
    • Fix StorageManager.list("s3://..", with_metadata=True) doesn't work
    • Fix ModelsList.keys() is missing
    • Fix CLEARML_DEFERRED_TASK_INIT=1 doesn't work
    • Fix default API method does not work when set in configuration
    Source code(tar.gz)
    Source code(zip)
    clearml-1.8.1-py2.py3-none-any.whl(935.82 KB)
  • v1.8.0(Nov 13, 2022)

    New Features and Improvements

    • Add tarfile member sanitization to extractall() (#803, thanks @TrellixVulnTeam!)
    • Add Task.delete_artifacts() with raise_on_errors argument (#806, thanks @frolovconst!)
    • Add CI/CD example (#815, thanks @thepycoder!)
    • Limit number of _serialize requests when adding list of links with add_external_files() (#813)
    • Add support for connecting Enum values as parameters
    • Improve CoLab integration (store entire colab, not history)
    • Add clearml.browser_login to authenticate browser online sessions such as CoLab, Jupyter Notebooks etc. (see the sketch after this list)
    • Remove import_bind from stack trace of import errors
    • Add sdk.development.worker.report_event_flush_threshold configuration option to control the number of events to trigger a report
    • Return stub object from Task.init() if no clearml.conf file is found
    • Improve manual model uploading example
    • Remove deprecated demo server
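
    A minimal sketch of the clearml.browser_login entry above, for notebook/Colab sessions:

        import clearml

        # opens a browser-based login flow when no local credentials are configured
        clearml.browser_login()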

    Bug Fixes

    • Fix passing compression=ZIP_STORED (or 0) to Dataset.upload() uses ZIP_DEFLATED and overrides the user-supplied argument (#812, thanks @doronser!)
    • Fix unique_selector is not applied properly on batches after the first batch. Remove default selector value since it does not work for all event types (and we always specify it anyway)
    • Fix clearml-init colab detection
    • Fix cloning pipelines run with start_locally() doesn't work
    • Fix if project has a default output uri there is no way to disable it in development mode (manual), allow passing output_uri=False to disable it
    • Fix git remote repository detection when remote is not "origin"
    • Fix reported images might not all be reported when waiting to complete the task
    • Fix Dataset.get_local_copy() deletes the source archive if it is stored locally
    • Fix too many parts will cause preview to inflate Task object beyond its 16MB limit - set a total limit of 320KB
    • Fix media preview is created instead of a table preview
    • Fix task.update_output_model() should always upload local models to a remote server
    • Fix broken pip package might mess up requirements detection
    Source code(tar.gz)
    Source code(zip)
    clearml-1.8.0-py2.py3-none-any.whl(933.56 KB)
  • v1.7.2(Oct 23, 2022)

    New Features and Improvements

    • Support running jupyter notebook inside a git repository (the repository will be referenced without uncommitted changes, and the jupyter notebook will be stored as plain code as uncommitted changes)
    • Add jupyter notebook fail warning
    • Allow pipeline steps to return string paths without them being treated as a folder artifact and zipped (#780)
    • Remove future from Python 3 requirements

    Bug Fixes

    • Fix exception raised when using ThreadPool (#790)
    • Fix Pyplot/Matplotlib binding reports incorrect line labels and colors (#791)
    • Pipelines
      • Fix crash when running cloned pipeline that invokes a step twice (#770, related to #769, thanks @tonyd!)
      • Fix pipeline argument becomes None if default value is not set
      • Fix retry_on_failure callback does nothing when specified on PipelineController.add_step()
      • Fix pipeline clone logic
    • Jupyter Notebook
      • Fix support for multiple jupyter servers running on the same machine
      • Fix issue with old/new notebook packages installed
    • Fix local cache with access rules disabling partial local access
    • Fix Task.upload_artifact() fails uploading pandas DataFrame
    • Fix relative paths in examples (#787, thanks @mendrugory!)
    Source code(tar.gz)
    Source code(zip)
    clearml-1.7.2-py2.py3-none-any.whl(928.30 KB)
  • v1.7.1(Sep 30, 2022)

    New Features and Improvements

    • Add callback option for pipeline step retry

    Bug Fixes

    • Fix Python Fire binding
    • Fix Dataset failing to load helper packages should not crash
    • Fix Dataset.get_local_copy() is allowed for a non-finalized dataset
    • Fix Task.upload_artifact() does not upload empty lists/tuples
    • Fix pipeline retry mechanism interface
    • Fix Python <3.5 compatibility
    • Fix local cache warning (should be a debug message)
    Source code(tar.gz)
    Source code(zip)
    clearml-1.7.1-py2.py3-none-any.whl(923.79 KB)
  • v1.7.0(Sep 15, 2022)

    New Features and Improvements

    • ClearML Data: Support providing list of links
    • Upload artifacts with a custom serializer (#689)
    • Allow user to specify extension when using custom serializer functions (for artifacts)
    • Skip server URL verification in clearml-init wizard process
    • When calling Dataset.get() without an "alias" field, tell the user that they can use an alias to log it in the UI
    • Add mmcv support for logging models
    • Add support for Azure and GCP storage in Task.setup_upload()
    • Support pipeline retrying tasks which are failing on suspected non-stable failures
    • Better storage (AWS, GCP) internal load balancing and configurations
    • Add Task.register_abort_callback (see the sketch below)
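
    A minimal sketch of Task.register_abort_callback, assuming it accepts a callable invoked when the task is externally aborted (project/task names are illustrative):

        from clearml import Task

        task = Task.init(project_name='examples', task_name='abort demo')

        def cleanup():
            # assumption: called by ClearML when the task is aborted externally
            print('task is being aborted, flushing state')

        task.register_abort_callback(cleanup)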

    Bug Fixes

    • Allow getting datasets with non-semantic versioning (#776)
    • Fix interactive plots (instead of a generated png)
    • Fix Python 2.7 support
    • Fix clearml datasets list functionality
    • Fix Dataset.init() modifies task (moved to Dataset.create())
    • Fix failure with large files upload on HTTPS
    • Fix 3D plots created with plt showing as 2D plots on the task results page
    • Fix uploading files with project's default_upload_destination (#734)
    • Fix broken reporting of Matplotlib - Using logarithmic scale breaks reporting
    • Fix supporting of wildcards in clearml-data CLI
    • Fix report_histogram - does not show "horizontal" orientation (#699)
    • Fix table reporting 'series' arg does not appear on UI when using logger.report_table(title, series, iteration...) (#684)
    • Fix artifacts (and models) use task original name and not new name
    • Fix very long filenames from S3 can't be downloaded (with get_local_copy())
    • Fix overwrite of existing output models on pipeline task with monitor_models (#758)
    Source code(tar.gz)
    Source code(zip)
    clearml-1.7.0-py2.py3-none-any.whl(922.70 KB)
  • v1.6.4(Aug 10, 2022)

  • v1.6.3(Aug 9, 2022)

    New Features and Improvements

    • Add option to specify an endpoint URL when creating S3 resource service (#679, thanks @AndolsiZied!)
    • Add support for providing ExtraArgs to boto3 when uploading files using the sdk.aws.s3.extra_args configuration option
    • Add support for Server API 2.20
    • Add Task.get_num_enqueued_tasks() to get the number of tasks enqueued in a specific queue
    • Add support for updating model metadata using Model.set_metadata(), Model.get_metadata(), Model.get_all_metadata(), Model.get_all_metadata_casted() and Model.set_all_metadata()
    • Add Task.get_reported_single_value()
    • Add a retry mechanism for models and artifacts upload
    • Pipelines with empty configuration take it from code
    • Add support for running pipeline steps on preemptible instances
    • Datasets
      • Add description to Datasets
      • Add wild-card support in clearml-data

    Bug Fixes

    • Fix dataset download (#713, thanks @dankirsdot!)
    • Fix lock is not released after dataset cache is downloaded (#708, thanks @mralgos!)
    • Fix deadlock might occur when using a process pool with a large number of processes (#674)
    • Fix 'series' not appearing on UI when using logger.report_table() (#684)
    • Fix Task.init() docstring to include behavior when executing remotely (#737, thanks @mmiller-max!)
    • Fix KeyError when running remotely and no params were passed to click (https://github.com/allegroai/clearml-agent/issues/111)
    • Fix full path is stored when uploading a single artifact file
    • Fix passing non-alphanumeric filename in sdk.development.detect_with_pip_freeze
    • Fix Python 3.6 and 3.10 support
    • Fix mimetype cannot be None when uploading to S3
    • Pipelines
      • Fix pipeline DAG
      • Add support for pipelines with spot instances
      • Fix pipeline proxy object is always resolved in main pipeline logic
      • Fix pipeline steps with empty configuration should try and take it from code
      • Fix wait for jobs based on local/remote pool frequency
      • Fix UniformIntegerParameterRange.to_list() ignores min value
      • Fix pipeline component returning a list of length 1
    • Datasets
      • Fix Dataset.get() does not respect auto_create
      • Fix getting datasets fails with new ClearML Server v1.6
      • Fix datasets can't be queried by project/name alone
      • Fix adding child dataset to older parent dataset without stats
    • Fix error when connecting an input model
    • Fix deadlocks, including:
      • Change thread Event/Lock to a process fork safe threading objects
      • Use file lock instead of process lock to avoid future deadlocks, since the Python process lock is not process safe (killing a process holding a lock will not release the lock)
    • Fix StorageManager.list() on a local Windows path
    • Fix model not created in the current project
    • Fix keras_tuner_cifar example raises DeprecationWarning and ValueError
    Source code(tar.gz)
    Source code(zip)
    clearml-1.6.3-py2.py3-none-any.whl(903.47 KB)
  • v1.6.2(Jul 4, 2022)

  • v1.6.1(Jul 1, 2022)

  • v1.6(Jun 29, 2022)

    New Features and Improvements

    • New HyperParameter Optimization CLI clearml-param-search
    • Improvements to ClearML Data
      • Add support for a new ClearML Data UI in the ClearML WebApp
      • Add clearml-data new options set-description and rename
    • Add random seed control using Task.set_random_seed() allowing to set a new random seed for task initialization or to disable it
    • Improve error messages when failing to download an artifact
    • Improve error messages when testing for permissions

    Bug Fixes

    • Fix axis range settings when logging plots
    • Fix Task.get_project() to return more than 500 entries (#612)
    • Fix pipeline progress calculation
    • Fix StorageManager.upload_folder() returns None for both successful and unsuccessful uploads
    • Fix script path capturing stores a relative path and not an absolute path
    • Fix HTML debug samples are saved incorrectly on S3
    • Fix Hydra deprecation warning in examples
    • Fix missing requirement for tensorboardx example

    Known issues

    • When removing an image from a Dataset, its preview image won't be removed
    • Moving Datasets between projects still shows the Dataset in the old project
    Source code(tar.gz)
    Source code(zip)
    clearml-1.6.0-py2.py3-none-any.whl(793.49 KB)
  • v1.5.0(Jun 16, 2022)

    New Features and Improvements

    • Add support for single value metric reporting (#400)
    • Add support for specifying parameter sections in PipelineDecorator (#629)
    • Add support for parallel uploads and downloads (upload \ download and zip \ unzip of artifacts) (ClearML Slack)
    • Add support for specifying execution details (repository, branch, commit, packages, image) in PipelineDecorator
    • Bump PyJWT version due to "Key confusion through non-blocklisted public key formats" vulnerability
    • Add support for AWS Session Token (using boto3's aws_session_token argument)

    Bug Fixes

    • Fix Task.get_projects() retrieves only the first 500 results (#612)
    • Fix failure to delete artifacts stored in Azure (#660)
    • Fix Process Pool hangs at exit (#674)
    • Fix number of unpacked values when syncing a dataset (#682)
    • Fix FastAI DeprecationWarning (#683)
    • Fix StorageManager.download_folder() crash
    • Fix pipelines can't handle None return value
    • Fix pre-existing pipeline raises an exception
    • Fix deprecation warning in the image_reporting example
    • Fix patches are kept bound after Task.close() is called
    • Fix running pipeline code remotely without first running it locally (i.e. no configuration on the Task)
    • Fix local task execution with empty working directory
    • Fix permission check fails when using local storage folder that does not exist
    • Fix pipeline add_function_step breaks in remote execution
    • Fix wrong mimetype used for any file or folder uploaded to S3 using StorageManager
    • Add missing default default_cache_manager_size in configuration files
    Source code(tar.gz)
    Source code(zip)
    clearml-1.5.0-py2.py3-none-any.whl(780.89 KB)
  • v1.4.1(May 17, 2022)

  • v1.4.0(May 5, 2022)

    New Features

    • Add OpenMMLab example #655 (thanks @zhouzaida!)
    • Add support for saving artifacts with different formats #634
    • Add support for setting reported values for NaN and Inf #604
    • Support more than 500 results in Task.get_tasks() using the fetch_only_first_page argument #612
    • Support links in clearml-data #585
    • Support deferred task initialization using Task.init() argument deferred_init (beta feature; see the sketch after this list)
    • Support resuming experiments when importing an Offline session
    • Add --import-offline-session command line option to clearml-task
    • Support automatically logging Tensorboard Hparams
    • Add wildcard support for model auto-logging, see Task.init() (ClearML Slack)
    • Add support for Lightning CLI
    • Support None values in Task.connect()
    • Add Model.project getter/setter
    • Add support for Task progress indication
    • Datasets
      • Improve Dataset version table
      • Add warning to Dataset creation on current Task
    • Examples and documentation
      • Add manual seaborn logging example #628
      • Change package author
      • Change pipeline example to run locally #642
      • Update Pytorch Lightning example for pytorch-lightning>=v1.6.0 #650
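
    A minimal sketch of the deferred_init argument from the bullet above (project/task names are illustrative):

        from clearml import Task

        # returns quickly and completes backend initialization in the background
        task = Task.init(project_name='examples', task_name='deferred demo', deferred_init=True)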

    Bug Fixes

    • Fix Keras model config serialization in PatchKerasModelIO #616 (thanks @bzamecnik!)
    • Fix task.get_parameters_as_dict(cast=True) casts False to True #622 (thanks @bewt85!)
    • Fix Fire integration is not compatible with typing library #610
    • Fix remote execution with argparse mutually exclusive groups raises "required" error even when no argument is required
    • Fix Hydra tasks never fail and are only set to completed (fix handling return code)
    • Fix clearml-data wildcard support
    • Fix HPO randomly aborts running tasks before the time limit
    • Fix matplotlib capture
    • Fix issue with accessing images in projects containing /
    • AutoScaler
      • Fix resource name with a prefix matching a resource type may cause the auto-scaler to avoid spinning down idle instances
      • Fix Idle workers should contain resource name and not instance type
    • Fix backwards compatibility issue when using abstractmethod
    • Matplotlib
      • Fix uploading 3D plots with Matplotlib's plt showing a 2D plot on the task results page
      • Fix wrong histogram plotting when using Matplotlib
    • Fix PyTorch ScriptModule autobind
    • Fix PyTorch auto-magic logging torchscript models
    • Fix forked process will not call _at_exit and flush all outstanding reports
    • Fix matplotlib to plotly conversion fails on subplots (convert as image if figure has subplots)
    • Fix Windows sub process might end up waiting forever for uploads to finish if the subprocess is very short-lived
    • Fix StorageManager.get_local_copy() returning None for a valid path in Windows
    • Fix Jupyter notebook cannot be detected
    • Fix PipelineController does not change node Task name, only pipeline step name
    • Fix Task.query_tasks() specifying page size or page number
    Source code(tar.gz)
    Source code(zip)
    clearml-1.4.0-py2.py3-none-any.whl(783 bytes)
  • v1.3.2(Mar 29, 2022)

    New Features and Improvements

    • Add support for setting reported values for NaN and Inf #604
    • Add reserved OS environments warning
    • Add git credentials to colab example #621 (thanks @thepycoder!)
    • Add jsonargparse support #403 (thanks @ajecc and @mauvilsa!)
    • Update autokeras example

    Bug Fixes

    • Fix sub-project separators are incorrectly quoted in generated URLs #584
    • Revert Optuna deprecation fix #613
    • Fix HPO randomly aborts running tasks before the time limit
    • Fix cloud driver overwrites agent.extra_docker_arguments
    • Fix Pipeline Controller auto-magic framework connect
    • Fix unused scroll is not cleared in Task.get_reported_plots()
    Source code(tar.gz)
    Source code(zip)
    clearml-1.3.2-py2.py3-none-any.whl(762.36 KB)
  • v1.3.1(Mar 16, 2022)

    Features

    • Add Python 3.10 support

    Bug Fixes

    • Update Slack SDK requirement #597 (thanks @mmiller-max!)
    • Fix fork after task.close() is called #605
    • Fix Azure storage upload #598
    • Fix offline mode crash
    • Fix task delete response not checked
    • Fix pipeline controller kwargs with list
    • Fix PipelineDecorator.debug_pipeline()
    • Fix PipelineDecorator example
    • Fix Python 3.10 issues
    • Fix handling of legacy fileserver (files.community.clear.ml)
    • Fix cloud driver may use None credentials
    • Fix APIClient worker raises exception when accessing .name attribute
    • Fix minimum/default API version setting
    Source code(tar.gz)
    Source code(zip)
    clearml-1.3.1-py2.py3-none-any.whl(761.12 KB)
  • v1.3.0(Mar 6, 2022)

    Features and Bug Fixes

    • Add new pipeline visualization support (requires ClearML Server v1.3)
    • Support IAM Instance Profile in AWS auto-scaler
    • Remove old server API versions support (pre-ClearML Server)
    • Restructure FastAI examples
    • Fix failed catboost bind on GPU (#592)
    • Fix Optuna n_jobs deprecation warning
    • Fix invalid method called on delete() error
    Source code(tar.gz)
    Source code(zip)
    clearml-1.3.0-py2.py3-none-any.whl(761.07 KB)
  • v1.2.1(Mar 2, 2022)

  • v1.2.0(Feb 26, 2022)

    Features

    • Add fastai v2 support (#571)
    • Add catboost support (#542)
    • Add Python Fire support (#550)
    • Add new Azure Storage driver support (#548)
    • Add requirements file support in Task.add_requirements (#575)
    • Allow overriding auto_delete_file in Task.update_output_model() (#554)
    • Support artifact_object empty string
    • Add skip_zero_size_check to StorageManager.download_folder()
    • Add support for extra HTTP retry codes (see here or use CLEARML_API_EXTRA_RETRY_CODES)
    • Add Task.get_parameters() cast back to original type
    • Add callback support to Task.delete()
    • Add autoscaler CPU-only support
    • Add AWS autoscaler IAM instance profile support
    • Update examples
      • Edit HTML reporting examples (#546)
      • Add model reporting examples (#553)

    Bug Fixes

    • Fix nargs="?" without type does not properly cast the default value (#531)
    • Fix using invalid configurations (#544)
    • Fix extra_layout not passed to report_matrix (#559)
    • Fix group arguments in click (#561)
    • Fix no warning when failing to patch argparse (#576)
    • Fix crash in Dataset.upload() when there is nothing to upload (#579)
    • Fix requirements, refactor and reformat examples (#567, #573, #582)
    • Auto-scaler
      • Change confusing log message
      • Fix AWS tags support
      • Fix instance startup script fails on any command (should only fail on the agent failing to launch)
      • Fix spin down stuck machine, ignore unknown stale workers
    • Fix pandas object passed as Task.upload_artifact() preview object
    • Fix incorrect timeout used for stale workers
    • Fix clearml-task calls Task.init() in the wrong place when a single local file is used
    • Fix ArgumentParser SUPPRESS as default should be resolved at remote execution in the same way (i.e. empty string equals SUPPRESS)
    • Upgrade six version (in case pathlib2>2.3.7 is installed)
    • Fix connected object base class members are not used
    • Fix clearml-init changing web host after pasting full credentials
    • Fix fileserver upload does not support path in URL
    • Fix crash on semaphore acquire error
    • Fix docs and docstrings (#558, #560)

    Thanks @eugen-ajechiloae-clearml, @pollfly and @Rizwan-Hasan for contributing!

    Source code(tar.gz)
    Source code(zip)
    clearml-1.2.0-py2.py3-none-any.whl(1.04 MB)
  • v1.1.6(Jan 20, 2022)

    Features

    • Add Task.force_store_standalone_script() to force storing standalone script instead of a Git repository reference (#340)
    • Add Logger.set_default_debug_sample_history() and Logger.get_default_debug_sample_history() to allow controlling maximum debug samples programmatically
    • Populate now stores function arg types as part of the hyperparameters
    • Add status_message argument to Task.mark_stopped()
    • Change HTTP driver timeout and retry codes (connection timeout will now trigger a retry)

    Bug Fixes

    • Fix and upgrade the SlackMonitor (#533)
    • Fix network issues causing Task to stop on status change when no status change has occurred (#535)
    • Fix Pipeline controller function support for dict as input argument
    • Fix uploading the same metric/variant from multiple processes in threading mode should create a unique file per process (since global counter is not passed between the subprocesses)
    • Fix resource monitoring should only run in the main process when using threaded logging mode
    • Fix fork patching so that the signal handler (at_exit) will be called on time
    • Fix fork (process pool) hangs or drops reports when reports are at the end of the forked function in both threaded and subprocess mode reporting
    • Fix Multi-pipeline support
    • Fix delete artifacts after upload
    • Fix artifact preview has no truth value
    • Fix storage cache cleanup does not remove all entries on a silent fail
    • Fix always store session cache in ~/.clearml (regardless of the cache folder)
    • Fix StorageManager.download_folder() fails on Windows path
    Source code(tar.gz)
    Source code(zip)
    clearml-1.1.6-py2.py3-none-any.whl(1.03 MB)
  • v1.1.5(Jan 1, 2022)

    Features

    • Add support for jsonargparse (#403)
    • Add HyperParameterOptimizer.get_top_experiments_details() returns the hparams and metrics of the top performing experiments of an HPO (#473)
    • Allow overriding initial iteration offset using environment variable (CLEARML_SET_ITERATION_OFFSET) or Task.init(continue_last_task=<offset>) (#496)
    • Add better input handling for clearml-init in colab (#515)
    • Add environment variable for default request method (#521)
    • Add LocalClearmlJob as possible option for HPO (#525)
    • Add convenience functionality to clearml-data (#526)
    • Add support for vscode-jupyter (https://github.com/microsoft/vscode-jupyter/pull/8531)
    • Improve detection of running reporting subprocess (including zombie state)
    • Support controlling S3/Google Cloud Storage _stream_download_pool_connections using the stream_connections configuration setting in clearml.conf (default 128)
    • Add warning when losing the reporting subprocess
    • Add Model.remove() to allow removing a model from the model repository
    • Add HTTP download timeout control (change default connection timeout to 30 seconds)
    • Add initial setup callback to monitoring class
    • Add Task.get_reported_plots()
    • Allow Monitor.get_query_parameters to override defaults
    • Add support for Google Cloud Storage pool_connections and pool_maxsize overrides
    • Add last worker time to AutoScaler
    • Add warning when opening an aborted Dataset
    • Store multi-pipeline execution plots on the master pipeline Task
    • Support pipeline return value stored on pipeline Task
    • Add PipelineDecorator.multi_instance_support
    • Add PipelineDecorator to clearml and clearml.automation namespaces
    • Documentation and examples
      • Update docstrings (#501)
      • Add Markdown in pipeline jupyter notebooks (#502)
      • Update pipeline example (#494)
      • Add abseil example (#509)
      • Change README to dark theme (#513)
      • Update XGBoost example (#524)
      • Change example name (#528)

    Bug Fixes

    • Fix TriggerScheduler on Dataset change (#491)
    • Fix links in Jupyter Notebooks (#505)
    • Fix pandas delta datetime conversion (#510)
    • Fix matplotlib auto-magic detect bar graph series name (#518)
    • Fix path limitation on storage services (posix, object storage) when storing target artifacts by limiting length of project name (full path) and task name used for object path (#516)
    • Fix multi-processing context block catching exception
    • Fix Google Cloud Storage with no default project causes a crash
    • Fix main process's reporting subprocess lost, switch back to thread mode
    • Fix forked StorageHelper should use its own ThreadExecuter
    • Fix local StorageHelper.delete() raising exception on non existing file instead of returning false
    • Fix StorageHelper rename partial file throwing errors on multiple access
    • Fix resource monitor fails on permission issues (skip over parts)
    • Fix reusing Task does not reset it
    • Fix support clearml PyCharm Plugin 1.0.2 (support partial pycharm git repo sync)
    • Fix Task.reset() force argument ineffective
    • Fix PY3.5 compatibility
    • Fix validation error causes infinite loop
    • Fix tasks schema prevents sending null container parts
    • Fix missing CLEARML_SET_ITERATION_OFFSET definition
    • Fix Model.get_weights_package() returns None on error
    • Fix download progress bar based on sdk.storage.log.report_download_chunk_size_mb configuration
    • Fix Conda lists the CudaToolkit version installed (for the agent to reproduce)
    • Fix Jupyter kernel shutdown causing nested atexit callbacks leaving Task in running state
    • Fix multi-subprocess can cause Task to hang at close
    • Fix TF 2.7 support (get logdir with multiple TB writers)
    Source code(tar.gz)
    Source code(zip)
    clearml-1.1.5-py2.py3-none-any.whl(1.03 MB)
  • v1.1.4(Nov 8, 2021)

    Bug Fixes

    • Fix duplicate keyword argument (affects clearml-data Dataset.get()) https://github.com/allegroai/clearml/issues/490, ClearML Slack Channel #1, #2, #3, #4
    • Fix session raises missing host error when in offline mode #489
    • Fix Task.get_task() does not load output_uri from stored Task
    • Fix Task.get_models()['input'] returns string instead of clearml.Model
    • Fix tf.saved_model.load() binding for TensorFlow >=2.0
    • Fix hyperparams with None value converted to empty string causes inferred type to change to str in consecutive Task.connect() calls
    Source code(tar.gz)
    Source code(zip)
    clearml-1.1.4-py2.py3-none-any.whl(1.02 MB)
  • v1.1.3(Oct 25, 2021)

    Features

    • Add support for MegEngine with examples (#455)
    • Add TaskTypes to main namespace (#453)
    • Add LogUniformParameterRange for hyperparameter optimization with Optuna (#462)
    • Add joblib (equivalent to scikit) to Task.init(auto_connect_frameworks) argument
    • Log environment variables starting with * in environ_bind.py (#459)
    • Pipeline
      • Add eager decorated pipeline execution
      • Support pipeline monitoring for scalers/models/artifacts
      • Add PipelineController.upload_model()
      • Add PipelineController.add_step(configuration_overrides) argument allowing to override Task configuration objects
      • Change PipelineController.start_locally() default run_pipeline_steps_locally=False
      • Add PipelineController.stop(mark_failed, mark_aborted) arguments
      • Add PipelineController.run_locally decorator
      • Add PipelineController.continue_on_fail property
      • Add PipelineController.__init__(abort_on_failure) argument
      • Add ClearmlJob state cache (refresh every second)
    • Datasets
      • Add clearml-data multi-chunk support
      • Change clearml-data default chunk size to 512MB
      • Change Dataset.create() now automatically reverts to using current Task if no project/name provided
    • Add Optimizer.get_top_experiments_id_metrics_pair() for top performing experiments
    • Add support for setting default value to auto connected argparse arguments
    • Add Task.get_script() and Task.set_script() for getting and setting the task's script properties for execution
    • Add Task.mark_completed() force and status_message arguments
    • Add Task.stopped() reason argument
    • Add Task.query_tasks(), Task.get_task() and Task.get_tasks() tags argument
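
    A minimal sketch combining two of the additions above: disabling the new joblib binding through Task.init(auto_connect_frameworks), and Dataset.create() reverting to the current Task when no project/name is given (project and task names are hypothetical):

        from clearml import Task, Dataset

        # Keep framework auto-logging, but opt out of the new joblib binding
        task = Task.init(project_name='examples', task_name='joblib demo',
                         auto_connect_frameworks={'joblib': False})

        # With no project/name provided, the dataset is derived from the current Task
        ds = Dataset.create()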

    Bug Fixes

    • Fix PyJWT resiliency support
    • Fix xgb train overload (#456)
    • Fix http:// throws OSError in Windows by using pathlib2 instead of os (#463)
    • Fix local diff should include staged commits, otherwise applying git diff fails (#457)
    • Fix task.upload_artifact non-standard dictionary will now revert to pickle (#452)
    • Fix S3BucketConfig.is_valid() for EC2 environments with use_credentials_chain (#478)
    • Fix audio classifier example when training with a custom dataset (#484)
    • Fix clearml-task diff was corrupted by Windows drive letter and separator (#483)
    • Fix TQDM "line cleanup" not using CR but rather arrow-up escape sequence (#181)
    • Fix task.connect(dict) value casting - if None is the default value, use the backend-stored type (see the sketch after this list)
    • Fix Jupyter notebook should always set Task as completed/stopped, never failed (exceptions are caught in interactive session)
    • Fix Pipeline support
      • Fix LocalClearmlJob setting failed status
      • Fix pipeline stopping all running steps
      • Fix nested pipeline component parent point to pipeline Task
      • Fix PipelineController.start() should not kill the process when done
      • Fix pipeline failing to create Step Task should cause the pipeline to be marked failed
      • Fix nested pipeline components missing pipeline tags
    • Fix images reported over history size were not sent if frequency was too high
    • Fix git detectors missing git repository without origin
    • Fix support for uploading LazyEvalWrapper artifacts
    • Fix duplicate task dataset tags
    • Fix FileLock create target folder
    • Fix crash inside forked subprocess might leave SafeQueue in a locked state, causing task.close() to hang
    • Fix PyTorch distributed example TimeoutSocket issue in Windows
    • Fix broken Dataset.finalize()
    • Fix Python 3.5 compatibility
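
    A minimal sketch of the task.connect(dict) casting fix above; the parameter names are hypothetical:

        from clearml import Task

        task = Task.init(project_name='examples', task_name='connect demo')

        # 'scheduler' defaults to None; when running remotely, the backend-stored
        # type is now used for casting instead of coercing the value to str
        params = {'lr': 0.001, 'scheduler': None}
        params = task.connect(params)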
    Source code(tar.gz)
    Source code(zip)
    clearml-1.1.3-py2.py3-none-any.whl(1.02 MB)
  • v1.1.2(Oct 7, 2021)

  • v1.1.1(Sep 20, 2021)

  • v1.1.0(Sep 19, 2021)

    Breaking Changes

    • New PipelineController v2 (note: new constructor is not backwards compatible)
    • Disable default demo server (available by setting the CLEARML_NO_DEFAULT_SERVER=0 environment variable)
    • Deprecate Task.completed() (use Task.mark_completed() instead; see the migration sketch below)
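
    A one-line migration sketch for the Task.completed() deprecation (project and task names are hypothetical):

        from clearml import Task

        task = Task.init(project_name='examples', task_name='migration demo')
        # before 1.1.0 (now deprecated): task.completed()
        task.mark_completed()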

    Features

    • Add Task Trigger Scheduler
    • Add Task Cron Scheduler
    • Add PipelineController from function
    • Add PipelineDecorator (PipelineDecorator.pipeline and PipelineDecorator.component decorators for full custom pipeline logic; see the sketch after this list)
    • Add xgboost auto metric logging #381
    • Add sdk.storage.log.report_upload_chunk_size_mb and sdk.storage.log.report_download_chunk_size_mb configuration options to control upload/download log reporting #424
    • Add new optional auto_connect_frameworks argument value to Task.init() (e.g. auto_connect_frameworks={'tfdefines':False}) to allow disabling TF defines #408
    • Add support for CLEARML_CONFIG_VERBOSE environment variable to allow external control over verbosity of the configuration loading process
    • Add support for uploading artifacts with a list of files using Task.upload_artifact(name, [Path(), Path()])
    • Add missing clearml-task parameters --docker_args, --docker_bash_setup_script and --output-uri
    • Change CreateAndPopulate to auto-list packages that are imported but not installed locally
    • Add clearml.task.populate.create_task_from_function() to create a Task from a function, wrapping function input arguments into hyper-parameter section as kwargs and storing function results as named artifacts
    • Add support for Task serialization (e.g. for pickle)
    • Add Task.get_configuration_object_as_dict()
    • Add docker_image argument to Task.set_base_docker() (deprecate docker_cmd)
    • Add auto_version_bump argument to PipelineController
    • Add sdk.development.detailed_import_report configuration option to provide a detailed report of all python package imports
    • Set current Task as Dataset parent when creating dataset
    • Add support for deferred configuration
    • Examples
      • Add Pipeline v2 examples
      • Add TaskScheduler and TriggerScheduler examples
      • Add pipeline controller callback example
      • Improve existing examples and docstrings
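
    A minimal sketch of the new decorator-based pipeline API; names, arguments and the toy logic are illustrative:

        from clearml.automation.controller import PipelineDecorator

        @PipelineDecorator.component(return_values=['doubled'])
        def double(x):
            # each component becomes its own Task when executed remotely
            return x * 2

        @PipelineDecorator.pipeline(name='demo pipeline', project='examples', version='1.0.0')
        def run_pipeline(seed=1):
            print(double(seed))

        if __name__ == '__main__':
            # debug the full pipeline logic inside the local process
            PipelineDecorator.run_locally()
            run_pipeline(seed=3)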

    Bug Fixes

    • Fix plotly plots converting NaN to nan instead of null #373
    • Fix deprecation warning #376
    • Fix plotly multi-index without index names #399
    • Fix click support #437
    • Fix docstring #438
    • Fix passing task-type to clearml-task #422
    • Fix clearml-task --version throws an error #422
    • Fix clearml-task ssh repository links are not detected as remote repositories #423
    • Fix getattr throws an exception #426
    • Fix encoding while saving notebook preview #443
    • Fix poetry toml file without requirements.txt #444
    • Fix PY3.x fails calling SemLock._after_fork with forkserver context, forking while lock is acquired https://github.com/allegroai/clearml-agent/issues/73
    • Fix wrong download path in StorageManager.download_folder()
    • Fix jupyter notebook display(...) convert to print(...)
    • Fix Tensorflow add_image() with description='text'
    • Fix Task.close() should remove current_task() reference
    • Fix TaskScheduler weekdays, change default execute_immediately to False
    • Fix Python2 compatibility
    • Fix clearml-task exit with error when failing to verify output_uri (output warning instead)
    • Fix unsafe Google Storage delete object
    • Fix multi-process spawning wait-for-uploads can create a deadlock in very rare cases
    • Fix task.set_parent() fails when passing Task object
    • Fix PipelineController skipping queued Tasks
    • Remove humanfriendly dependency (unused)
    Source code(tar.gz)
    Source code(zip)
    clearml-1.1.0-py2.py3-none-any.whl(1018.39 KB)
  • v1.0.5(Aug 5, 2021)

    Features

    • Add Click support and examples #386
    • Add progress bar to SHA2 generation #396
    • Add prefix to Task reported runtime info: cpu_cores, gpu_driver_version and gpu_driver_cuda_version
    • Add support for Logger.report_text() explicit log-level reporting (see the sketch after this list)
    • Add return_full_path argument to StorageManager.list()
    • Support Task.get_tasks() passing multiple project names
    • Add TaskScheduler
    • Add task_filter argument to Objective.get_top_tasks(), allow name as a task_filter field
    • Add --output-uri command-line option to clearml-task
    • Add requirements_file argument to Task.force_requirements_env_freeze() to allow specifying a local requirements file
    • Add support for list type argument in Task.connect_configuration() (previously only dict type was supported)
    • Rename TrainsTuner to ClearmlTuner
    • Update documentation links
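
    A minimal sketch of a few of the additions above; the bucket and project names are hypothetical:

        import logging

        from clearml import StorageManager, Task

        task = Task.init(project_name='examples', task_name='v1.0.5 demo')

        # explicit log-level reporting
        task.get_logger().report_text('a debug-level note', level=logging.DEBUG)

        # list remote objects, returning full paths instead of relative ones
        files = StorageManager.list('s3://my-bucket/some/prefix', return_full_path=True)

        # query tasks across multiple projects at once
        tasks = Task.get_tasks(project_name=['examples', 'staging'])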

    Bug Fixes

    • Fix Pandas with multi-index #399
    • Fix check permissions fail in HTTPDriver #394
    • Fix Dataset not setting system tag on existing data_processing Tasks
    • Fix: disable redundant resource monitoring in the pipeline controller
    • Fix ClearMLJob when both project and target_project are specified
    • Fix ClearMLJob docker container info is not cached
    • Fix no print logging after Python logging handlers are cleared
    • Fix PipelineController callback returning False
    • Fix machine specs when GPU is not supported
    • Fix internal logging.Logger can't be pickled (only applicable to Python 3.6 or lower)
    • Wait for reported events to flush to ensure Task.flush() with wait_for_uploads=True awaits background processes
    Source code(tar.gz)
    Source code(zip)
    clearml-1.0.5-py2.py3-none-any.whl(993.35 KB)
  • v1.0.4(Jun 22, 2021)

    Features

    • Add Google Colab notebook tutorial #368 #374
    • Add support for GIF images in Tensorboard #372
    • Add a tensorboardX example for add_video (creates GIFs in tensorboard) #372
    • Add auto scaler customizable boot bash script
    • Add Task.ignore_requirements (see the sketch after this list)
    • Deprecate Logger.tensorboard_single_series_per_graph() as it is now controlled from the UI 🙂
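
    A minimal sketch of Task.ignore_requirements; the package name is illustrative, and (like Task.add_requirements()) the call is assumed to precede Task.init():

        from clearml import Task

        # skip a package when auto-logging the execution environment
        Task.ignore_requirements('pwd')

        task = Task.init(project_name='examples', task_name='requirements demo')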

    Bug Fixes

    • Fix default_output_uri for Dataset creation #371
    • Fix clearml-task failing without a docker script #378
    • Fix PyTorch DDP sub-process spawn multi-process
    • Fix Task.execute_remotely() on created Task (not initialized Task)
    • Fix auto scaler custom bash script should be called last before starting agent
    • Fix auto scaler spins too many instances at once then kills the idle ones (spin time is longer than poll time)
    • Fix multi-process spawn context using ProcessFork kills sub-process before parent process ends
    Source code(tar.gz)
    Source code(zip)
    clearml-1.0.4-py2.py3-none-any.whl(981.43 KB)