Reputation
Badges 1
183 × Eureka!Sure, it would be very intuitive if the command to stop an agent would be as easy as:clearml-agent daemon --stop AGENT_PID
Hi AgitatedDove14 ,
I have already developed a mock test that can be somewhat similar to the pipeline we are developing. The same problem arises. Only the task is created for the first set of parameters in the for loop. Here, only the configuration text file is created for user 1. Can you reproduce it?
` from clearml import Task
from clearml.automation.controller import PipelineDecorator
@PipelineDecorator.component(
return_values=["admin_config_path"], cache=False, task_type=Task.Task...
In my use case I have a pipeline that executes inference tasks with several models simultaneously. Each inference task is actually a component that acts as a pipeline, since it executes the required steps to generate the predictions (dataset creation, preprocessing and prediction). For this, I'm using the new pipeline functionality ( PipelineDecorator )
I'm totally agree with the pipelinecontroller/decorator part. Regarding the proposal for the component parameter, I also think it would be a good feature, although it might mislead the fact that there will be times when the pipeline will fail because it is an intrinsically crucial step, so it doesn't matter whether 'continue_pipeline_on_failure' is set to True or False. Anyway, I can't think a better way to deal with that right now.
I see, but I don't understand the part where you talk about passing the task ID to the child processes. Sorry if it's something trivial. I recently started working with ClearML.
Oddly enough I didn't run into this problem today 🤔 If it happens to me again, I'll return to this thread 🙂
They share the same code (i.e. the same decorated functions), but using a different configuration.
So, for example, if I have a machine with 64 CPU cores, I will be able to run up to 64 agents (in case my system consists of only this machine), right?
AgitatedDove14 By adding PipelineDecorator.run_locally() everything seems to work perfectly. This is what I expect the experiment listing to look like when the agents are the ones running the code. With this, I'm pretty sure the error search can be narrowed down to the agents' code.
Mmm I see. So the agent is taking the parameters from the base task registered in the server. Then if I call task.get_parameter_as_dict for a task that has not been executed by an agent, should I get the original types of the values?
So ClearML will scan all the repository code searching for package dependencies? Is that right?
Okey, so I could signal to the main pipeline the exception raised in any of the pipeline components and it should halt the whole pipeline. However, are you thinking of including this callbacks features in the new pipelines as well?
The thing is I don't know in advance how many models there will be in the inference stage. My approach is to read from a database the configurations of the operational models through a for loop, and in that loop all the inference tasks would be enqueued (one task for each deployed model). For this I need the system to be able to run several pipelines at the same time. As you told me for now this is not possible, as pipelines are based on singletons, my alternative is to use components
Hi AgitatedDove14 , gotcha. So how can I temporarily fix it? I'm not able to find something like task.set_output_uri() in the official docs. Or maybe do you plan to solve this problem in the very short term?
Thanks, I'd appreciate it if you let me know when it's fixed :D
AgitatedDove14 In the 'status.json' file I could see the 'is_dirty' flag is set to True
Indeed it does! But what still puzzles me so badly is why I get below path when running dataset.get_local_copy() on one of the machines of my cluster:/home/user/.clearml/cache/storage_manager/datasets/.lock.000.ds_61ff8d4335dd4b74bd78c3576fa44131.clearml
Why is it pointing to a .lock file?
I can't figure out what might be going on
Thanks to you for fixing it so quickly!
The scheme is similar to the following:
` main_pipeline
(PipelineDecorator.pipeline)
|
|----------------------------------|
| |
inference_orchestrator_1 inference_orchestrator_2
(PipelineDecorator.component, (PipelineDecorator.component,
acting as a pipeline) acting as a pipeline)
| |
...
Well, just as you can pass the 'task_type' argument in PipelineDecorator.component , it might be a good option to pass the rest of the 'Task.init' arguments as they are passed in the original method (without using a dictionary)
Or perhaps the complementary scenario with a continue_on_failed_steps parameter which may be a list containing only the steps that can be ignored in case of failure.
AgitatedDove14 The pipelines are executed by the agents that are listening to the queue given by pipeline_execution_queue="controllers"
But I cannot go back to version v1.1.3 because there is another bug related to the Dataset tags
So great! It would be a feature that would make the work much easier instead of having to clone the task and launch it with different parameters. It could even be considered more pythonic. Do you have an immediate solution in mind to keep moving forward before the new release is ready? :)
My guess is to manually read and parse the string that clearml-agent list returns, but I'm pretty sure there's a cleaner way to do it, isn't there?
Nice, in the meantime as a workaround I will implement a temporary parsing code at the beginning of step functions