Or perhaps the complementary scenario: a continue_on_failed_steps parameter, which could be a list containing only the steps whose failure can be ignored.
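To make the idea concrete, something like this (continue_on_failed_steps is hypothetical, not part of the current API; project/task names are made up):
` from clearml import PipelineController

# Hypothetical: only the listed steps may fail without aborting the pipeline.
pipe = PipelineController(
    name="my_pipeline",
    project="examples",
    version="0.1",
    continue_on_failed_steps=["optional_augmentation"],  # proposed parameter
)
pipe.add_step(name="load_data", base_task_project="examples", base_task_name="load")
pipe.add_step(name="optional_augmentation", parents=["load_data"],
              base_task_project="examples", base_task_name="augment")
pipe.start() `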
I totally agree with the pipelinecontroller/decorator part. Regarding the proposal for the component parameter, I also think it would be a good feature, although it might obscure the fact that there will be times when the pipeline fails because the failed step is intrinsically crucial, so it doesn't matter whether 'continue_pipeline_on_failure' is set to True or False. Anyway, I can't think of a better way to deal with that right now.
Yep, you were absolutely right. What Dask did not like was the object self.preprocesser inside read_and_process_file, not Task.init. Since the dask.distributed.Client is initialized in that same class, maybe it's something that Dask doesn't allow. Sorry for blaming ClearML without solid evidence x)
I see, but I don't understand the part where you talk about passing the task ID to the child processes. Sorry if it's something trivial. I recently started working with ClearML.
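Just to check whether I'm on the right track, is this roughly what you mean? (A minimal sketch; process_chunk, the data, and the project/task names are made up.)
` from clearml import Task
from dask.distributed import Client

def process_chunk(chunk, task_id):
    # Re-attach to the parent task inside the Dask worker instead of
    # shipping the task object (or self) to the worker.
    task = Task.get_task(task_id=task_id)
    task.get_logger().report_text(f"processing {len(chunk)} items")
    return sum(chunk)

task = Task.init(project_name="examples", task_name="dask-demo")
client = Client()
# Only the task ID (a plain, picklable string) crosses the process boundary.
futures = client.map(process_chunk, [[1, 2], [3, 4]], task_id=task.id)
print(client.gather(futures)) `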
For sure! Excluding some parts related to preprocessing, this is the code I would like to parallelize with dask.distributed.Client.
` from typing import Any, Dict, List, Tuple, Union
from pathlib import Path

import xarray as xr
from clearml import Task
from dask.distributed import Client, LocalCluster


def start_dask_client(
    n_workers: int = None, threads_per_worker: int = None, memory_limit: str = "2Gb"
) -> Client:
    cluster = LocalCluster(
        n_workers=n_workers,
        ...
Currently I'm working with v1.0.5. Anyway, I found that it is possible to connect the new argument if I store the arguments returned by task.connect(args) in a variable. I expected that, since it is a mutable object, it would not be necessary to overwrite args, but apparently it is required in this version of ClearML.
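In other words, this is the pattern I ended up with (minimal sketch; the argument names are made up):
` from clearml import Task

task = Task.init(project_name="examples", task_name="connect-demo")
args = {"lr": 0.01, "epochs": 10, "new_arg": "foo"}
# Keeping the returned object is what made the new argument stick for me;
# mutating the original dict in place was not enough in v1.0.5.
args = task.connect(args)
print(args["new_arg"]) `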
But this path actually does not exist in my system, so how should I fix that?
Sure, but I mean: apart from labeling it as a local path, what's the point of renaming the original path if my goal is to access it later using the name I gave it?
I currently deal with that by skipping the first 5 characters of the path, i.e. the 'file:' part. But I'm sure there is a cleaner way to proceed.
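By cleaner I mean something like this, instead of hard-coding the offset (a sketch using only the standard library; the path is a made-up example):
` from urllib.parse import urlparse

uri = "file:///home/user/clearml/artifacts/data.csv"
local_path = urlparse(uri).path
print(local_path)  # /home/user/clearml/artifacts/data.csv `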
My idea is to take advantage of the ability to read, from one task, the parameters connected to another task, so I can look up the path where the artifacts are stored locally without defining it again in each script corresponding to a different task.
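Roughly this pattern (project/task names and the parameter key are placeholders):
` from clearml import Task

# Read the parameters connected to the producer task from a consumer task,
# so the local artifacts path is defined in a single place.
producer = Task.get_task(project_name="examples", task_name="data-producer")
params = producer.get_parameters()
artifacts_path = params["General/artifacts_path"]  # assuming it was connected under 'General' `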
Now it's okay. I have found a more intuitive way to get around it. I was facing the classic 'xy' problem :)
Oh, I see. I guess somehow I can retrieve that information via Task.logger, since it is stored in JSON format? Thanks!
Of course it's always a good idea to have that extra option just in case 🙂
Nevermind, I've already found a cleaner way to address this problem. I really appreciate your help!
Yes, before removing the 'default' queue I was able to shut down agents without specifying further options after the --stop command. I just had to run clearml-agent daemon --stop as many times as there were agents. Of course, I will open the issue as soon as possible :D
Well, just as you can pass the 'task_type' argument in PipelineDecorator.component, it might be a good option to pass the rest of the Task.init arguments as they are passed in the original method (without using a dictionary).
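Something along these lines (hypothetical signature; the extra keyword arguments are the proposal, not the current API):
` from clearml import TaskTypes
from clearml.automation.controller import PipelineDecorator

@PipelineDecorator.component(
    name="Wind data creator",
    task_type=TaskTypes.data_processing,    # already supported today
    output_uri="s3://my-bucket/artifacts",  # hypothetical pass-through to Task.init
    auto_connect_frameworks=False,          # hypothetical pass-through to Task.init
)
def generate_wind(start_date: str):
    ... `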
BTW, let's say I accidentally removed the 'default' queue from the queue list. As a result, when I try to stop an agent using clearml-agent daemon --stop, I get the following error:
clearml_agent: ERROR: APIError: code 400/707: No queue is tagged as the default queue for this company
I have already created another queue, also called 'default', but it had no effect :/
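My understanding is that the server identifies the default queue by a system tag rather than by its name, so re-creating a queue called 'default' may not be enough. A sketch of what I'd try (assuming queues.update accepts system_tags):
` from clearml.backend_api.session.client import APIClient

client = APIClient()
queue = client.queues.get_all(name="default")[0]
# Assumption: the 'default' system tag, not the name, marks the default queue.
client.queues.update(queue=queue.id, system_tags=["default"]) `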
For some reason I can't get the values back in their original types. The keys of the raw nested dictionary are returned correctly, but the values remain cast.
Hi AgitatedDove14, it's nice to know you've already pinpointed the problem! I think the solution you propose is a good one, but does that mean I have to unpack all the dictionary values as parameters of the pipeline function? Wouldn't that make the function too "dirty"? Or do you mean you will soon push a commit that will allow me to keep passing a dictionary, with ClearML automatically flattening it?
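To be clear, by flattening I mean something like this (a generic sketch, not ClearML code):
` from typing import Any, Dict

def flatten(d: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    # Turn {'a': {'b': 1}} into {'a/b': 1} so every value can be passed
    # as an individual pipeline parameter.
    flat: Dict[str, Any] = {}
    for key, value in d.items():
        full_key = f"{prefix}/{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, full_key))
        else:
            flat[full_key] = value
    return flat

print(flatten({"data": {"path": "/tmp", "batch": 32}}))
# {'data/path': '/tmp', 'data/batch': 32} `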
AgitatedDove14 The pipelines are executed by the agents that are listening to the queue given by pipeline_execution_queue="controllers"
AgitatedDove14 So did you get the same results without step duplication?
That's right! run_locally() does just what I was expecting.
What exactly do you mean by that? From VS Code I execute the following script, and then the agents take care of executing the code remotely:
` import pandas as pd
from clearml import Task, TaskTypes
from clearml.automation.controller import PipelineDecorator

CACHE = False


@PipelineDecorator.component(
    name="Wind data creator",
    return_values=["wind_series"],
    cache=CACHE,
    execution_queue="data_cpu",
    task_type=TaskTypes.data_processing,
)
def generate_wind(start_date: st...
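And this is roughly how I trigger it, with run_locally() added when I want to debug (a sketch; the pipeline name, project and default argument are made up):
` @PipelineDecorator.pipeline(
    name="Wind pipeline", project="examples", version="0.1",
    pipeline_execution_queue="controllers",
)
def main(start_date: str = "2021-01-01"):
    wind_series = generate_wind(start_date)

if __name__ == "__main__":
    # PipelineDecorator.run_locally()  # uncomment to run every step locally
    main() `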
AgitatedDove14 I ended up with two pipelines being executed until they completed the workflow but duplicating each of their steps. You can check it here:
https://clearml.slack.com/files/U02A5DGPMPU/F02SR3G9RDK/image.png
Well, this is just a mock example 🙂 . In the real application I'm working on there will be more than one configuration file (in principle one for the data and one for the DL model). Regarding the fix, I am not in a hurry at the moment. I'll happily wait for tomorrow (or the day after) when the commit is pushed!
AgitatedDove14 After checking, I discovered that apparently it doesn't matter if each pipeline is executed by a different worker, the error persists. Honestly this has me puzzled. I'm really looking forward to getting this functionality right because it's an aspect that would make ClearML shine even more.
Well, I need to write boilerplate parsing code if I want to use the original values after I connect the dictionary to the task, so it's a bit messy.
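This is the kind of parsing boilerplate I mean (a sketch, assuming the connected values come back as strings; the keys and values are made up):
` import ast

raw = {"General/lr": "0.01", "General/layers": "[64, 32]"}  # as read back from the task
parsed = {}
for key, value in raw.items():
    try:
        parsed[key] = ast.literal_eval(value)  # recover ints, floats, lists...
    except (ValueError, SyntaxError):
        parsed[key] = value  # keep plain strings as-is `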
Currently I'm using clearml v1.0.5 and clearml-agent v1.0.0
Great AgitatedDove14 , I tested it on the mock example and it worked just as I expected 🎉
Hey AgitatedDove14 ! Any news on this? 🙂