AgitatedDove14 The pipelines are executed by the agents that are listening to the queue given by pipeline_execution_queue="controllers"
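In case it helps, this is roughly how I set the controller queue (project and pipeline names are made up for the example):

```python
from clearml.automation.controller import PipelineDecorator

@PipelineDecorator.pipeline(
    name="wind_pipeline",                    # hypothetical name
    project="wind_project",                  # hypothetical project
    version="0.1",
    pipeline_execution_queue="controllers",  # queue the pipeline controller is enqueued to
)
def executing_pipeline(config_path: str = "config.json"):
    # ... call the pipeline components here ...
    pass

if __name__ == "__main__":
    executing_pipeline()
```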
AgitatedDove14 After checking, I discovered that apparently it doesn't matter whether each pipeline is executed by a different worker; the error persists either way. Honestly, this has me puzzled. I'm really looking forward to getting this functionality right, because it's an aspect that would make ClearML shine even more.
Great AgitatedDove14 , I tested it on the mock example and it worked just as I expected 🎉
Okay! I'll keep an eye out for updates.
AgitatedDove14 I have a strong feeling it must be an agent issue, because when I place PipelineDecorator.run_locally() before calling the pipeline, everything works perfectly. See:
AgitatedDove14 By adding PipelineDecorator.run_locally() everything seems to work perfectly. This is what I expect the experiment listing to look like when the agents are the ones running the code. With this, I'm pretty sure the error search can be narrowed down to the agents' code.
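To be concrete, this is where I place the call (a minimal sketch; the real pipeline definition is the script below):

```python
from clearml.automation.controller import PipelineDecorator

@PipelineDecorator.pipeline(name="wind_pipeline", project="wind_project", version="0.1")
def executing_pipeline():
    pass  # pipeline steps go here

if __name__ == "__main__":
    # With this line everything runs in the local process and behaves as expected;
    # removing it (so the agents execute the steps remotely) reproduces the problem.
    PipelineDecorator.run_locally()
    executing_pipeline()
```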
Hi! Not really. It's rather random :/
What exactly do you mean by that? From VS Code I execute the following script, and then the agents take care of executing the code remotely:
```python
import pandas as pd

from clearml import Task, TaskTypes
from clearml.automation.controller import PipelineDecorator

CACHE = False

@PipelineDecorator.component(
    name="Wind data creator",
    return_values=["wind_series"],
    cache=CACHE,
    execution_queue="data_cpu",
    task_type=TaskTypes.data_processing,
)
def generate_wind(start_date: st...
```
Well, this is just a mock example 🙂 . In the real application I'm working on there will be more than one configuration file (in principle one for the data and one for the DL model). Regarding the fix, I am not in a hurry at the moment. I'll happily wait for tomorrow (or the day after) when the commit is pushed!
AgitatedDove14 Exactly, I've run into the same problem
AgitatedDove14 I ended up with two pipelines that ran until they completed the workflow, but each of their steps was duplicated. You can check it here:
https://clearml.slack.com/files/U02A5DGPMPU/F02SR3G9RDK/image.png
Hi AgitatedDove14 , it's nice to know you've already pinpointed the problem! I think the solution you propose is a good one, but does that mean I have to unpack all the dictionary values as parameters of the pipeline function? Wouldn't that make the function too "dirty"? Or do you mean you will soon push a commit that will allow me to keep passing a dictionary and have ClearML flatten it automatically?
AgitatedDove14 So did you get the same results without step duplication?
Hey AgitatedDove14 ! Any news on this? 🙂
Oddly enough I didn't run into this problem today 🤔 If it happens to me again, I'll return to this thread 🙂
Yes, although I use both terms interchangeably. The information will actually be contained in JSON files.
AgitatedDove14 BTW, I got the notification from GitHub telling me you had committed the fix, so I went ahead and tested the code again. I see the task parameter dictionary has been removed properly (it is now broken down into flat parameters). However, I still have the same problem with duplicated tasks, as you can see in the image.
But maybe another solution would be to pass the configuration file paths as function arguments, then read and parse them inside the pipeline.
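Something along these lines (file names are made up for the example):

```python
import json

from clearml.automation.controller import PipelineDecorator

@PipelineDecorator.pipeline(name="wind_pipeline", project="wind_project", version="0.1")
def executing_pipeline(
    data_config_path: str = "data_config.json",
    model_config_path: str = "model_config.json",
):
    # Parse the JSON configuration files inside the pipeline body instead of
    # passing the already-parsed dictionaries as pipeline parameters.
    with open(data_config_path) as f:
        data_config = json.load(f)
    with open(model_config_path) as f:
        model_config = json.load(f)
    # ... launch the pipeline steps with the parsed configurations ...
```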
So I assume that you mean to report not only the agent's memory usage, but also that of all the subprocesses the agent spawns(?)
My idea is to take advantage of the ability to read, from one task, the parameters connected to another task, so I can get the path where the artifacts are stored locally without having to define it again in each script corresponding to a different task.
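Roughly what I have in mind (task and parameter names are hypothetical):

```python
from clearml import Task

# Look up the task that connected the path as a parameter (names are made up)
# and read it from here, instead of re-defining the path in this script.
creator_task = Task.get_task(project_name="wind_project", task_name="Wind data creator")
params = creator_task.get_parameters()
artifacts_path = params.get("General/artifacts_path")
print(artifacts_path)
```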
Sure, but I mean, apart from labelling it as a local path, what's the point of renaming the original path if my goal is to access it later using the name I gave it?
But this path actually does not exist in my system, so how should I fix that?
I currently deal with that by skipping the first 5 characters of the path, i.e. the 'file:' part. But I'm sure there is a cleaner way to proceed.
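For instance, something like this instead of slicing characters (the URI below is just an example):

```python
from urllib.parse import urlparse
from urllib.request import url2pathname

uri = "file:///home/user/clearml/artifacts/dataset.csv"  # example value
local_path = url2pathname(urlparse(uri).path)  # -> /home/user/clearml/artifacts/dataset.csv
```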
Now it's okay. I have found a more intuitive way to work around it. I was facing the classic 'XY problem' :)
But how can I reference that exact daemon execution? I tried with the ID but it fails:
clearml-agent daemon AGENT_ID --stop
I'm using the latest commit. I'm just fitting a scikit-learn MinMaxScaler object to a dataset of type tf.data.Dataset inside a function (which represents the model training step) decorated with PipelineDecorator.component. The function does not even return the scaler object as an artifact. However, the scaler object is logged as an artifact of the task, as shown in the image below.
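A minimal sketch of the kind of component I mean (queue name and data sizes are made up; the real training step is of course more involved):

```python
from clearml.automation.controller import PipelineDecorator

@PipelineDecorator.component(name="Model training", cache=False, execution_queue="model_gpu")
def train_model(n_points: int = 128):
    # Imports inside the component so it can run as a standalone task on an agent
    import numpy as np
    import tensorflow as tf
    from sklearn.preprocessing import MinMaxScaler

    # Build a small tf.data.Dataset and materialize its values as a NumPy array
    dataset = tf.data.Dataset.from_tensor_slices(
        np.random.rand(n_points, 3).astype("float32")
    )
    values = np.stack(list(dataset.as_numpy_iterator()))

    # Fit the scaler; it is only used here and is never returned or uploaded explicitly,
    # yet it still shows up as an artifact of the step's task.
    scaler = MinMaxScaler()
    scaler.fit(values)
```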