
Sure, but I mean, apart from labeling it as a local path, what's the point of renaming the original path if my goal is to access it later using the name I gave it?
Well I tried several things but none of them have worked. I'm a bit lost
AgitatedDove14 After checking, I discovered that apparently it doesn't matter if each pipeline is executed by a different worker; the error persists. Honestly, this has me puzzled. I'm really looking forward to getting this functionality right because it's an aspect that would make ClearML shine even more.
Well, this is just a mock example 🙂 . In the real application I'm working on there will be more than one configuration file (in principle one for the data and one for the DL model). Regarding the fix, I am not in a hurry at the moment. I'll happily wait for tomorrow (or the day after) when the commit is pushed!
But maybe another solution would be to pass the configuration files paths as function arguments, then read and parse them inside the pipeline
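Just to illustrate what I mean, a rough sketch (the step names, the YAML parsing and the config.yaml path are placeholders, not my actual code):
```python
import yaml  # assuming YAML configs; json/toml would work the same way

from clearml.automation.controller import PipelineDecorator


@PipelineDecorator.component(return_values=["forecast"], cache=False)
def predict(config: dict):
    # The step receives the already-parsed configuration
    return config.get("horizon", 1)


@PipelineDecorator.pipeline(name="Mock pipeline", project="Mocks", version="0.1")
def prediction_service(config_path: str = "config.yaml"):
    # The path travels as a plain pipeline argument and the file is
    # read and parsed inside the pipeline body, not before launching it
    with open(config_path) as f:
        config = yaml.safe_load(f)
    return predict(config=config)
```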
What exactly do you mean by that? From VS Code I execute the following script, and then the agents take care of executing the code remotely:
` import pandas as pd
from clearml import Task, TaskTypes
from clearml.automation.controller import PipelineDecorator

CACHE = False

@PipelineDecorator.component(
    name="Wind data creator",
    return_values=["wind_series"],
    cache=CACHE,
    execution_queue="data_cpu",
    task_type=TaskTypes.data_processing,
)
def generate_wind(start_date: st...
AgitatedDove14 The pipelines are executed by the agents that are listening to the queue given by pipeline_execution_queue="controllers"
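i.e., roughly like this (pipeline_execution_queue is the relevant bit; the rest of the names come from the mock example):
```python
from clearml.automation.controller import PipelineDecorator


@PipelineDecorator.pipeline(
    name="Prediction service",
    project="Mocks",
    version="0.1",
    # The controller task itself is enqueued here, where those agents listen
    pipeline_execution_queue="controllers",
)
def prediction_service(config: dict):
    ...
```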
Great AgitatedDove14 , I tested it on the mock example and it worked just as I expected 🎉
Okay! I'll keep an eye out for updates.
AgitatedDove14 Exactly, I've run into the same problem
AgitatedDove14 I ended up with two pipelines being executed, both running the workflow to completion but duplicating each of their steps. You can check it here:
https://clearml.slack.com/files/U02A5DGPMPU/F02SR3G9RDK/image.png
Now it's okay. I have found a more intuitive way to work around it. I was facing the classic 'XY problem' :)
AgitatedDove14 I have the strong feeling it must be an agent issue, because when I place PipelineDecorator.run_locally() before calling the pipeline, everything works perfectly. See:
Indeed it does! But what still puzzles me so much is why I get the path below when running dataset.get_local_copy() on one of the machines of my cluster:
/home/user/.clearml/cache/storage_manager/datasets/.lock.000.ds_61ff8d4335dd4b74bd78c3576fa44131.clearml
Why is it pointing to a .lock file?
Perfect, that's exactly what I was looking for 🙂 Thanks!
AgitatedDove14 By adding PipelineDecorator.run_locally() everything seems to work perfectly. This is what I expect the experiment listing to look like when the agents are the ones running the code. With this, I'm pretty sure the error search can be narrowed down to the agents' code.
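For reference, this is the only change between the working and the failing runs (the rest of the script stays exactly as posted above):
```python
from clearml.automation.controller import PipelineDecorator

# ... component and pipeline definitions as in the script above ...

# With this single line everything runs in the local process and works;
# removing it (so the agents pick up the tasks) reproduces the problem.
PipelineDecorator.run_locally()

forecast = prediction_service(config=default_config)
```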
Where can I find this documentation?
Or perhaps the complementary scenario with a continue_on_failed_steps parameter, which could be a list containing only the steps that can be ignored in case of failure.
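Roughly the usage I have in mind (to be clear, continue_on_failed_steps does not exist today; this is just how I imagine it):
```python
from clearml.automation.controller import PipelineDecorator


@PipelineDecorator.pipeline(
    name="Prediction service",
    project="Mocks",
    version="0.1",
    # Hypothetical parameter: only these steps would be allowed to fail
    # without aborting the rest of the pipeline
    continue_on_failed_steps=["Wind data creator"],
)
def prediction_service(config: dict):
    ...
```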
AgitatedDove14 BTW, I got the notification from GitHub telling me you had committed the fix and I went ahead. After testing the code again, I see the task parameter dictionary has been removed properly (now it has been broken down into flat parameters). However, I still have the same problem with duplicate tasks, as you can see in the image.
Or maybe you could bundle some parameters that belong to PipelineDecorator.component into a high-level configuration variable (something like PipelineDecorator.global_config (?))
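Something along these lines (PipelineDecorator.global_config is hypothetical, just to illustrate the idea):
```python
from clearml.automation.controller import PipelineDecorator

# Hypothetical API: defaults shared by every component,
# instead of repeating them in each @PipelineDecorator.component call
PipelineDecorator.global_config(
    cache=False,
    execution_queue="data_cpu",
)


@PipelineDecorator.component(name="Wind data creator", return_values=["wind_series"])
def generate_wind(start_date: str):
    ...
```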
Thanks, I'd appreciate it if you let me know when it's fixed :D
AgitatedDove14 So did you get the same results without step duplication?
AgitatedDove14 Oops, something still seems to be wrong. When trying to retrieve the dataset using get_local_copy() I get the following error:
` Traceback (most recent call last):
  File "/home/user/myproject/lab.py", line 27, in <module>
    print(dataset.get_local_copy())
  File "/home/user/.conda/envs/myenv/lib/python3.9/site-packages/clearml/datasets/dataset.py", line 554, in get_local_copy
    target_folder = self._merge_datasets(
  File "/home/user/.conda/envs/myenv/lib/python3.9/site-p...
So, for example, if I have a machine with 64 CPU cores, I will be able to run up to 64 agents (in case my system consists of only this machine), right?
Ok! I'll try to spin up an agent with the --service-mode option and I will give you feedback
Sure, here is a trivial example:
from clearml import Dataset

dataset = Dataset.create(dataset_name="Dataset_v1.1.3", dataset_project="Mocks")
dataset.finalize()
loaded_dataset = Dataset.get(dataset_id=dataset.id)
Hi TimelyPenguin76
No errors with this new version!
Hey AgitatedDove14 ! Any news on this? 🙂
CostlyOstrich36 Yes, it happens on the following line, at the time of calling the pipeline:
forecast = prediction_service(config=default_config)
Were you able to reproduce the example?
By adding the slash I have been able to see that indeed the dataset is stored in output_url. However, when calling finalize, I get the same error. And yes, I have installed the version corresponding to the last commit :/
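In case it helps to reproduce, this is roughly the flow I'm running (the local folder and the bucket path are placeholders):
```python
from clearml import Dataset

dataset = Dataset.create(dataset_name="Dataset_v1.1.3", dataset_project="Mocks")
dataset.add_files("data/")  # placeholder local folder
# Trailing slash added to the destination, as discussed above
dataset.upload(output_url="s3://my-bucket/datasets/")
dataset.finalize()  # <- this call still raises the same error
```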