Sure, converting pipelines into components also works for me (ignoring that I still have to fix the problem with LazyEvalWrapper return values). But this way some interesting features of the pipeline are missing, such as displaying the step execution DAG in the PLOTS tab.
To sum up, we agree that it would be nice to enable tags for nested components. I will keep playing with the capabilities of nested components and report bugs as I come across them!
I am aware of the option to enable virtual environment caching, but that is still very time-consuming.
Hi TimelyPenguin76
No errors with this new version!
Sure, it's already enabled. I noticed another parameter related to environment caching in the ClearML agent configuration, named venv_update (I believe it's still in beta). Do you think enabling this parameter significantly helps to build environments faster?
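For reference, this is roughly what I have in my clearml.conf (writing it from memory, so the exact keys and defaults may differ):
` agent {
    # reuse fully built virtual environments across tasks
    venvs_cache: {
        max_entries: 10
        free_space_threshold_gb: 2.0
        path: ~/.clearml/venvs-cache
    }
    # the (beta) parameter I mentioned: update an existing venv in place
    venv_update: {
        enabled: true
    }
} `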
Yes, I guess so. Since pipelines are designed to be executed remotely, it may be pointless to enable an output_uri parameter in PipelineDecorator.component. Anyway, could another task be initialized in the same scr...
Hey CostlyOstrich36 AgitatedDove14 ! Any news on this? Should I open an issue?
I mean that I have a script for a data preprocessing task where I need the following dependencies:
` import sys
from pathlib import Path
from contextlib import contextmanager

import numpy as np
from clearml import Task

with add_temporary_module_search_path("/home/user/myclearML/"):
    from helpers import (
        read_netcdf_dataset,
        write_records,
    ) `
However, the xarray package is a dependency of the helpers module, which is required by the read_netcdf_dataset ...
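For context, add_temporary_module_search_path is just a small helper of mine, not part of ClearML; it looks roughly like this:
` @contextmanager
def add_temporary_module_search_path(path: str):
    # temporarily prepend a directory to sys.path so local modules can be imported
    sys.path.insert(0, str(Path(path)))
    try:
        yield
    finally:
        sys.path.remove(str(Path(path))) `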
Ok! I'll try to spin up an agent with the --services-mode flag and I will give you feedback
Mmm well, I can think of a pipeline that saves its state just before the error occurs, so that, using some crontab/scheduler, the pipeline could be resumed at the point where it stopped if it had not completed. Is there any functionality like this? Something like PipelineDecorator/PipelineController.resume_from(state_filepath)?
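To illustrate what I mean (purely hypothetical API, none of this exists as far as I know):
` from clearml.automation.controller import PipelineDecorator

@PipelineDecorator.pipeline(name="my_pipeline", project="Examples", version="1.0")
def my_pipeline():
    ...

# hypothetical: restore the saved state and re-run only the pending steps
PipelineDecorator.resume_from("/path/to/pipeline_state.json") `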
Thanks AgitatedDove14 ! Wow, I was definitely not expecting that behavior 🤣 I will check it out tomorrow. Just one more thing, what do you mean by "my_task_id_that_i_generated_before_here"?
When you said "clearml-agent initial setup", are you talking about the agent section in clearml.conf or the CLI instructions? If it is the second case, I am starting the agent with the basic command: clearml-agent daemon --queue default. Are there any other settings I should specify for the agent?
How can I tell ClearML that I will use the same virtual environment in all steps, so there is no need to waste time reinstalling all packages for each step?
Hi AgitatedDove14, just one last thing before closing the thread. I was wondering what the use of PipelineController.create_draft is if you can't use it to clone and run tasks, as we have seen.
Nested pipelines do not depend on each other. You can think of it as several models being trained or doing inference at the same time, but each one delivering results for a different client. So you don't use the output from one nested pipeline to feed another one running concurrently, if that's what you mean.
AgitatedDove14 I specified that information in the configuration file. But I think this error has only appeared since I upgraded to version 1.1.4rc0
Yes, when the parameters that are connected do not have nested dictionaries, everything works fine. The problem comes when I try to do something like this:
` from clearml import Task
task = Task.init(project_name="Examples", task_name="task with connected dict")
args = {}
args["period"] = {"start": "2020-01-01 00:00", "end": "2020-12-31 23:00"}
task.connect(args) `
and the clone task is like this:
` from clearml import Task
template_task = Task.get_task(task_id="<Your template task id>"...
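(The rest of that script just clones the template and overrides the nested values; schematically it does something like this, where the queue name and the nested key layout are my guesses:)
` cloned_task = Task.clone(source_task=template_task, name="clone with nested dict")
# overriding a nested value is exactly where things break for me
cloned_task.set_parameters({"General/period/start": "2021-01-01 00:00"})
Task.enqueue(cloned_task, queue_name="default") `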
Okay, so the idea behind the new decorator is not to group all the defined steps under the same script so that they share the same environment, but rather to simplify the process of creating scripts for each step and avoid manually calling Task.init in those scripts.
Regarding virtual environment creation from the cache, I will keep running benchmarks (from what you say, it might be due to high workload on the servers we use)
So far I've been unlucky in the attempt of clearml recog...
While I'm at it, I would like to report another minor bug, related to the 'add_pipeline_tags' parameter of PipelineDecorator.pipeline. It turns out that when the pipeline consists of components that in turn use other components (via 'helper_functions'), these nested components are not tagged with 'pipe: <pipeline_task_id>'. I assume it shouldn't be like that, right?
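The structure I am talking about is roughly this (names and bodies are just for illustration):
` from clearml.automation.controller import PipelineDecorator

@PipelineDecorator.component(return_values=["x"])
def inner_step(x: int):
    return x + 1

# inner_step is made available to outer_step via helper_functions
@PipelineDecorator.component(return_values=["y"], helper_functions=[inner_step])
def outer_step(x: int):
    y = inner_step(x)  # this nested component misses the 'pipe: <pipeline_task_id>' tag
    return y

@PipelineDecorator.pipeline(name="tag_test", project="Examples", version="1.0", add_pipeline_tags=True)
def main():
    outer_step(1) `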
I have found that it is not possible to start a pipeline B after a pipeline A. Following the previous example, I have added one more pipeline to the script:
` from clearml import Task
from clearml.automation.controller import PipelineDecorator

@PipelineDecorator.component(return_values=["msg"], execution_queue="model_trainings")
def step_1(msg: str):
    msg += "\nI've survived step 1!"
    return msg

@PipelineDecorator.component(return_values=["msg"], execution_queue="model_trainings")
def st...
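Schematically (exact names aside), the script then ends like this:
` @PipelineDecorator.pipeline(name="pipeline_A", project="Examples", version="1.0")
def pipeline_A():
    msg = step_1("Hello!")

@PipelineDecorator.pipeline(name="pipeline_B", project="Examples", version="1.0")
def pipeline_B():
    msg = step_2("Hi again!")

if __name__ == "__main__":
    pipeline_A()
    pipeline_B()  # in my runs this second pipeline never starts `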
I mean to use a function decorated with PipelineDecorator.pipeline inside another pipeline decorated in the same way.
In the traceback attached below you can see that I am trying to use a component named user_config_creation inside the create_user_configs sub-pipeline. I have imported user_config_creation inside create_user_configs, but a KeyError is raised (however, I assume the function has been imported correctly because no ImportError or ` ModuleNo...
Of course it's always a good idea to have that extra option just in case 🙂
Nevermind, I've already found a cleaner way to address this problem. I really appreciate your help!
In fact, the datasets directory does not even exist
I mean that the agent that runs the function (which represents a pipeline step) should clone the repo in order to find the location of the project modules that are required for the function to be executed. Also, I have found that clearml does not automatically detect the imports specified within the function decorated with PipelineDecorator.component (even though I followed a scheme similar to the one in the example https://github.com/allegroai/clearml/blob/master/examples/pipeline/pi...
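To be concrete, this is the kind of thing that does not get picked up (function and package names are just for illustration):
` from clearml.automation.controller import PipelineDecorator

@PipelineDecorator.component(return_values=["total"], execution_queue="default")
def preprocess(n: int):
    # the import lives inside the function body, as in the example,
    # but the agent does not add numpy to the step's requirements
    import numpy as np
    return int(np.arange(n).sum()) `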
Hi AgitatedDove14 ,
I have already developed a mock test that is somewhat similar to the pipeline we are developing, and the same problem arises: the task is only created for the first set of parameters in the for loop. Here, only the configuration text file for user 1 is created. Can you reproduce it?
` from clearml import Task
from clearml.automation.controller import PipelineDecorator

@PipelineDecorator.component(
    return_values=["admin_config_path"], cache=False, task_type=Task.Task...
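The relevant part is the loop in the pipeline body, which schematically looks like this (the component and user names are dummies):
` @PipelineDecorator.pipeline(name="mock_pipeline", project="Examples", version="1.0")
def main():
    for user in ("user_1", "user_2", "user_3"):
        # only the first iteration actually spawns a component task
        create_user_config(user) `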
In my use case I have a pipeline that executes inference tasks with several models simultaneously. Each inference task is actually a component that acts as a pipeline, since it executes the required steps to generate the predictions (dataset creation, preprocessing and prediction). For this, I'm using the new pipeline functionality ( PipelineDecorator )
I totally agree with the PipelineController/decorator part. Regarding the proposal for the component parameter, I also think it would be a good feature, although it might obscure the fact that there will be times when the pipeline fails anyway because the failing step is intrinsically crucial, so it doesn't matter whether 'continue_pipeline_on_failure' is set to True or False. Anyway, I can't think of a better way to deal with that right now.
I see, but I don't understand the part where you talk about passing the task ID to the child processes. Sorry if it's something trivial; I recently started working with ClearML.