Perfect, that's exactly what I was looking for 🙂 Thanks!
Or maybe you could bundle some parameters that belong to PipelineDecorator.component into a high-level configuration variable (something like PipelineDecorator.global_config?)
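Just to make the idea concrete, this is roughly how I imagine it; PipelineDecorator.global_config does not exist today, so the call and its parameter names are purely a hypothetical sketch of the proposal:
```python
from clearml import PipelineDecorator

# Hypothetical API: bundle the settings that are currently repeated on every
# PipelineDecorator.component call into one global configuration.
# Neither global_config nor this exact usage exists today.
PipelineDecorator.global_config(
    repo="https://github.com/my-org/my-repo",   # placeholder values
    packages=["pandas", "scikit-learn"],
    execution_queue="services",
)

@PipelineDecorator.component(return_values=["data"])
def create_dataset():
    # The component would inherit repo/packages/queue from the global config
    ...
```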
From what I understood, ClearML creates a virtual environment from scratch for each task it runs. To detect the dependencies of each script, it apparently inspects the script for its imports plus the packages specified via Task.add_requirements. Do you mean that's not the recommended way for ClearML to create the environments for each task? What is the right way to proceed in this case?
Hi AnxiousSeal95 !
That's it. My idea is that artifacts can be linked to the model. These artifacts are typically links to serialized objects (such as datasets or scalers). They are usually directories or temporary files in mount units that I want to be loaded as artifacts of the task and then removed (as they are temporary), so that later I can get a new local copy via task.artifacts["scalers"].get_local_copy(). I think this way the model's dependence on the task that created it could be re...
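To make it concrete, the flow I have in mind is roughly this; the path, artifact name and project/task names are placeholders, but the calls themselves are the standard ClearML artifact API:
```python
from clearml import Task

# Task that produces the scalers in a temporary directory (placeholder path)
task = Task.init(project_name="examples", task_name="training")
task.upload_artifact(name="scalers", artifact_object="/tmp/scalers")
# ...remove the temporary directory here, since it is no longer needed locally...

# Later, from whatever process needs the scalers again
source_task = Task.get_task(task_id=task.id)
local_path = source_task.artifacts["scalers"].get_local_copy()
```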
I think it could be a convenient approach. The new parameter abort_on_failed_steps could be a list containing the names of the steps for which the pipeline will stop its execution if any of them fail (so that we can ignore other steps that are not crucial for the pipeline to keep running).
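Just to illustrate the proposal; abort_on_failed_steps is not an existing parameter, this is only how I imagine it could look:
```python
from clearml import PipelineDecorator

# Hypothetical: abort only if 'preprocess' or 'predict' fail; a failure in any
# other step would be ignored and the pipeline would keep running.
@PipelineDecorator.pipeline(
    name="inference_pipeline",
    project="examples",
    version="0.1",
    abort_on_failed_steps=["preprocess", "predict"],  # proposed parameter, does not exist today
)
def inference_pipeline():
    ...
```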
Hi! Not really. It's rather random :/
In my use case I have a pipeline that executes inference tasks with several models simultaneously. Each inference task is actually a component that acts as a pipeline, since it executes the required steps to generate the predictions (dataset creation, preprocessing and prediction). For this, I'm using the new pipeline functionality (PipelineDecorator).
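Schematically, the decorated functions look something like this; the step names, project name and bodies are simplified placeholders for my actual code:
```python
from clearml import PipelineDecorator

@PipelineDecorator.component(return_values=["dataset"])
def create_dataset(config):
    ...  # build the inference dataset for one model

@PipelineDecorator.component(return_values=["features"])
def preprocess(dataset):
    ...  # clean / transform the dataset

@PipelineDecorator.component(return_values=["predictions"])
def predict(features, config):
    ...  # run the model and return its predictions

@PipelineDecorator.pipeline(name="prediction_service", project="examples", version="0.1")
def prediction_service(config):
    dataset = create_dataset(config)
    features = preprocess(dataset)
    return predict(features, config)
```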
Yes, before removing the 'default' queue I was able to shut down agents without specifying further options after the --stop command. I just had to run clearml-agent daemon --stop as many times as there were agents. Of course, I will open the issue as soon as possible :D
Okay, so I could signal to the main pipeline an exception raised in any of the pipeline components and it should halt the whole pipeline. However, are you thinking of including these callback features in the new pipelines as well?
But I cannot go back to v1.1.3 because there is another bug related to Dataset tags
Nice, in the meantime, as a workaround, I will add some temporary parsing code at the beginning of the step functions
I have tried it and it depends on the context. When I call the method inside a function decorated with PipelineDecorator.component, I get the component's task, while if I call it inside PipelineDecorator.pipeline, I get the task corresponding to the pipeline. However, as you said, that is not the expected behavior, although I think it makes sense.
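For clarity, this is the kind of test I ran; I'm assuming the method we are talking about is Task.current_task(), and the step and pipeline names are placeholders:
```python
from clearml import Task, PipelineDecorator

@PipelineDecorator.component(return_values=["name"])
def step():
    # Inside a component I get the component's own task
    return Task.current_task().name

@PipelineDecorator.pipeline(name="debug_pipeline", project="examples", version="0.1")
def debug_pipeline():
    # Inside the pipeline body I get the task of the pipeline itself
    print(Task.current_task().name)
    print(step())
```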
But what is the name of that API library, so I can access those commands from the Python SDK?
Oddly enough I didn't run into this problem today 🤔 If it happens to me again, I'll return to this thread 🙂
Nested pipelines do not depend on each other. You can think of it as several models being trained or doing inference at the same time, but each one delivering results for a different client. So you don't use the output from one nested pipeline to feed another one running concurrently, if that's what you mean.
They share the same code (i.e. the same decorated functions), but use a different configuration.
To sum up, we agree that it would be nice to enable tags for nested components. I will continue playing with the capabilities of nested components and keep reporting bugs as I come across them!
The thing is, I don't know in advance how many models there will be in the inference stage. My approach is to read the configurations of the operational models from a database in a for loop, and in that loop enqueue all the inference tasks (one task for each deployed model). For this I need the system to be able to run several pipelines at the same time. As you told me, this is not possible for now because pipelines are based on singletons, so my alternative is to use components.
Can you think of any other way to launch multiple pipelines concurrently? We have already seen it is only possible to run a single PipelineController in a single Python process.
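Conceptually, this is what I'm trying to do; the database helper and config fields are placeholders, and the pipeline body stands in for the decorated steps I showed earlier:
```python
from clearml import PipelineDecorator

@PipelineDecorator.pipeline(name="prediction_service", project="examples", version="0.1")
def prediction_service(config):
    ...  # the decorated inference steps shown above

model_configs = load_configs_from_database()  # placeholder for the DB query
for config in model_configs:
    # This is the part that does not work today: each call would need its own
    # pipeline controller, but only one can exist per Python process.
    prediction_service(config=config)
```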
CostlyOstrich36 Yes, it happens on the following line, at the time of calling the pipeline: forecast = prediction_service(config=default_config)
Were you able to reproduce the example?
I totally agree with the PipelineController/decorator part. Regarding the proposal for the component parameter, I also think it would be a good feature, although it might obscure the fact that there will be times when the pipeline will fail anyway because the failed step is intrinsically crucial, so it doesn't matter whether 'continue_pipeline_on_failure' is set to True or False. Anyway, I can't think of a better way to deal with that right now.
BTW, I would like to mention another problem related to this that I have encountered. It seems that arguments of type 'int', 'float' or 'list' (it may also happen with other types) are transformed to 'str' when passed to a function decorated with PipelineDecorator.component at the time of calling it in the pipeline itself. Again, is this intentional?
I have also tried with type hints and it still casts them to strings. Very weird...
Exactly, when 'extra' has a default value (in this case, 43), the argument preserves its original type. However, when 'extra' is passed as a positional argument, it is transformed to 'str'.
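This is roughly the minimal reproduction I used, simplified and with placeholder names:
```python
from clearml import PipelineDecorator

@PipelineDecorator.component(return_values=["number"])
def check_types(number: int, extra: int = 43):
    # In my runs 'extra' keeps its int type when left at its default,
    # while 'number' always arrives as a string
    print(type(number), type(extra))
    return number

@PipelineDecorator.pipeline(name="type_check", project="examples", version="0.1")
def type_check_pipeline():
    check_types(7)      # 'number' shows up as '7' (str) inside the component
    check_types(7, 43)  # passing 'extra' explicitly also turns it into a str

if __name__ == "__main__":
    PipelineDecorator.run_locally()
    type_check_pipeline()
```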