Sure, it's already enabled. I noticed in the ClearML agent configuration another parameter related to environment caching, named venv_update (I believe it's still in beta). Do you think enabling this parameter significantly helps to build environments faster?
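For reference, the relevant bit of the agent section in clearml.conf looks roughly like this on my machine; I'm sketching it from memory, so the exact field names may differ between clearml-agent versions:

```
# Sketch of the agent section of clearml.conf (field names taken from the
# default template shipped with clearml-agent; please double-check against
# your installed version).
agent {
    # cache fully built virtual environments and reuse them across runs
    venvs_cache: {
        path: ~/.clearml/venvs-cache
    },
    # experimental (beta): update a cached venv incrementally instead of
    # rebuilding it from scratch
    venv_update {
        enabled: true,
    },
}
```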
Yes, I guess. Since pipelines are designed to be executed remotely, it may be pointless to enable an output_uri parameter in PipelineDecorator.component. Anyway, could another task be initialized in the same scr...
Well I tried several things but none of them have worked. I'm a bit lost
But how can I reference that exact daemon execution? I tried with the ID but it fails:
clearml-agent daemon AGENT_ID --stop
My guess is to manually read and parse the string that clearml-agent list returns, but I'm pretty sure there's a cleaner way to do it, isn't there?
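In case it makes the question clearer, this is roughly the kind of "cleaner way" I was hoping for: querying the workers through the API client instead of parsing CLI output. The attributes printed on the returned entries are my assumption here:

```python
# Sketch: list the registered workers/daemons via the ClearML API client
# instead of parsing the output of `clearml-agent list`.
# Assumes a configured clearml.conf; the attributes printed below are an
# assumption on my side.
from clearml.backend_api.session.client import APIClient

client = APIClient()
for worker in client.workers.get_all():
    print(worker.id, getattr(worker, "last_activity_time", None))
```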
I think it could be a convenient approach. The new parameter abort_on_failed_steps could be a list containing the names of the steps for which the pipeline will stop its execution if any of them fail (so that we can ignore other steps that are not crucial to continue the pipeline execution).
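Just to make the proposal concrete, this is how I picture it being used; abort_on_failed_steps does not exist today, the signature below is only the suggestion:

```python
# Hypothetical usage of the proposed (not yet existing) parameter.
from clearml.automation.controller import PipelineDecorator


@PipelineDecorator.pipeline(
    name="inference_pipeline",
    project="examples",
    version="0.1",
    # proposed: abort the whole pipeline only if one of these steps fails;
    # failures in any other (non-critical) step would be ignored
    abort_on_failed_steps=["create_dataset", "predict"],
)
def my_pipeline():
    ...
```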
Hi ExasperatedCrab78,
Sure! Sorry for the delay. I'm using Chrome Version 98.0.4758.102 (Official Build) (64-bit)
AgitatedDove14 Oops, something still seems to be wrong. When trying to retrieve the dataset using get_local_copy() I get the following error:
```
Traceback (most recent call last):
  File "/home/user/myproject/lab.py", line 27, in <module>
    print(dataset.get_local_copy())
  File "/home/user/.conda/envs/myenv/lib/python3.9/site-packages/clearml/datasets/dataset.py", line 554, in get_local_copy
    target_folder = self._merge_datasets(
  File "/home/user/.conda/envs/myenv/lib/python3.9/site-p...
```
Mmm, but what if the dataset size is too large to be stored in the .cache path? Will it be stored there anyway?
In fact, the datasets directory does not even exist.
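If it matters, my assumption is that the cache location itself can be moved to a bigger disk from the SDK section of clearml.conf; the key below is taken from the default template, so please correct me if this is not the intended way:

```
# clearml.conf (sketch): move the local cache used for dataset copies
# to a volume with more free space
sdk {
    storage {
        cache {
            # default is ~/.clearml/cache
            default_base_dir: "/mnt/big_disk/clearml_cache"
        }
    }
}
```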
Great! Thanks for the heads up!
Is there any git redundancy on your network? Maybe you could configure a fallback server?
I will ask the IT team about this.
Hi AgitatedDove14, so isn't it ClearML best practice to create a draft pipeline to have the task on the server so that it can be cloned, modified and executed at any time?
BTW, let's say I accidentally removed the 'default' queue from the queue list. As a result, when I try to stop an agent using clearml-agent daemon --stop, I get the following error:
clearml_agent: ERROR: APIError: code 400/707: No queue is tagged as the default queue for this company
I have already created another queue, also called 'default', but it had no effect :/
I am aware of the option to enable virtual environment caching, but that is still very time consuming.
My idea is to take advantage of the ability to access, from one task, the parameters connected to another task, so I can read the path where the artifacts are stored locally without having to define it again in each script corresponding to a different task.
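Roughly what I mean, as a sketch (the task ID and artifact name below are placeholders):

```python
# Sketch: read an artifact registered by a previous task from the current
# script, instead of hard-coding the local path again.
# "abc123" and "preprocessed_data" are placeholder values.
from clearml import Task

producer_task = Task.get_task(task_id="abc123")
local_path = producer_task.artifacts["preprocessed_data"].get_local_copy()
print(local_path)
```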
I'm getting a NameError because the 'Optional' type hint is not defined in the global scope.
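For context, a minimal reproduction of what I think is happening; the signature annotation refers to Optional, which is only imported at the top of my original script, so when the extracted component function is executed standalone the name is not available:

```python
# Minimal reproduction sketch of the NameError described above.
from typing import Optional

from clearml.automation.controller import PipelineDecorator


# The component function is extracted and run standalone, so the
# module-level import above is not reproduced there and the annotation
# cannot be resolved, raising NameError: name 'Optional' is not defined.
@PipelineDecorator.component(return_values=["result"])
def my_step(value: Optional[int] = None):
    return value if value is not None else 0
```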
Indeed it does! But what still really puzzles me is why I get the path below when running dataset.get_local_copy() on one of the machines of my cluster:
/home/user/.clearml/cache/storage_manager/datasets/.lock.000.ds_61ff8d4335dd4b74bd78c3576fa44131.clearml
Why is it pointing to a .lock file?
Hi! Not really. It's rather random :/
In my use case I have a pipeline that executes inference tasks with several models simultaneously. Each inference task is actually a component that acts as a pipeline, since it executes the required steps to generate the predictions (dataset creation, preprocessing and prediction). For this, I'm using the new pipeline functionality (PipelineDecorator).
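A stripped-down sketch of that structure (model names, queue name and step contents are made up for illustration):

```python
# Sketch of the setup described above: one component per model, each one
# internally running dataset creation, preprocessing and prediction.
from clearml.automation.controller import PipelineDecorator


@PipelineDecorator.component(return_values=["predictions"], execution_queue="inference")
def run_inference(model_name: str):
    # dataset creation, preprocessing and prediction would live here
    predictions = f"predictions generated by {model_name}"
    return predictions


@PipelineDecorator.pipeline(name="multi_model_inference", project="examples", version="0.1")
def inference_pipeline():
    # launch one inference component per model
    results = [run_inference(model_name=m) for m in ("model_a", "model_b")]
    print(results)


if __name__ == "__main__":
    # run everything in the local process while debugging
    PipelineDecorator.run_locally()
    inference_pipeline()
```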
AnxiousSeal95 I see. That's why I was thinking of storing the model inside a task, just like with the Dataset class, so that you can either use just the model via InputModel, or the model and all its artifacts via Task.get_task using the ID of the task where the model is located.
I would like my cleanup service to remove all tasks older than two weeks, but not the models. Right now, if I delete all tasks the model does not work (as it needs the training tasks). For now, I ...
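What I have in mind, sketched with placeholder IDs:

```python
# Sketch of the two access paths mentioned above, with placeholder IDs.
from clearml import InputModel, Task

# Option 1: only the model, referenced by its model ID
model = InputModel(model_id="model_id_placeholder")
weights_path = model.get_weights()

# Option 2: the model plus everything else logged on the training task,
# referenced by the training task's ID
training_task = Task.get_task(task_id="task_id_placeholder")
artifacts = training_task.artifacts
```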
Yes, before removing the 'default' queue I was able to shut down agents without specifying further options after the --stop command. I just had to run clearml-agent daemon --stop as many times as there were agents. Of course, I will open the issue as soon as possible :D
Okay, so the idea behind the new decorator is not to group all the defined steps under the same script so that they share the same environment, but rather to simplify the process of creating scripts for each step and avoid manually calling Task.init in those scripts.
Regarding virtual environment creation from caching, I will keep running benchmarks (from what you say, it might be due to high workload on the servers we use).
So far I've been unlucky in the attempt of clearml recog...
Hi AgitatedDove14, great, glad it was fixed quickly!
By the way, before releasing version 1.1.3 you might want to take a look at this mock example. I'm trying to run the same pipeline (with different configurations) in a single for loop, as you can see below:
```
from clearml import Task
from clearml.automation.controller import PipelineDecorator


@PipelineDecorator.component(return_values=["msg"], execution_queue="myqueue1")
def step_1(msg: str):
    msg += "\nI've survived step 1!"
    re...
```
Or perhaps the complementary scenario, with a continue_on_failed_steps parameter which could be a list containing only the steps that can be ignored in case of failure.
Okay, so I could signal to the main pipeline the exception raised in any of the pipeline components and it should halt the whole pipeline. However, are you thinking of including these callback features in the new pipelines as well?
AgitatedDove14 I specified that information in the configuration file. But I think this error has only appeared since I upgraded to version 1.1.4rc0.