I have tried it and it depends on the context. When I call the method inside a function decorated with PipelineDecorator.component, I get the component task, while if I call it inside PipelineDecorator.pipeline, I get the task corresponding to the pipeline. However, as you said, that is not the expected behavior, although I think it makes sense.
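A minimal sketch of the pattern I mean, using Task.current_task() as the method in question (project, queue and names here are just placeholders):
```python
from clearml import Task
from clearml.automation.controller import PipelineDecorator

@PipelineDecorator.component(return_values=["task_id"])
def which_task():
    # Called from inside a component: returns the component's own task
    return Task.current_task().id

@PipelineDecorator.pipeline(name="current-task-check", project="debug", version="0.0.1")
def my_pipeline():
    component_task_id = which_task()
    # Called from the pipeline body: returns the pipeline controller's task
    controller_task_id = Task.current_task().id
    print(component_task_id, controller_task_id)

if __name__ == "__main__":
    PipelineDecorator.run_locally()  # run the steps as local sub-processes
    my_pipeline()
```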
But what is the name of that API library, so I can access those commands from the Python SDK?
Oddly enough I didn't run into this problem today 🤔 If it happens to me again, I'll return to this thread 🙂
By the way, where can I change the default artifacts location (output_uri) if I have a script similar to this example (I mean, from the code, not the agent's config)?
https://github.com/allegroai/clearml/blob/master/examples/pipeline/pipeline_from_decorator.py
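For reference, this is what I had in mind for a plain script (I'm not sure whether the decorator pipeline exposes it directly); the bucket URI is just a placeholder:
```python
from clearml import Task

# Set the artifacts destination when the task is created
task = Task.init(
    project_name="examples",
    task_name="artifact destination demo",
    output_uri="s3://my-bucket/clearml-artifacts",  # placeholder destination
)

# Or change it afterwards on an already-initialized task
task.output_uri = "s3://my-bucket/clearml-artifacts"
```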
Nested pipelines do not depend on each other. You can think of it as several models being trained or doing inference at the same time, but each one delivering results for a different client. So you don't use the output from one nested pipeline to feed another one running concurrently, if that's what you mean.
While I'm still on time, I would like to report another minor bug related to the 'add_pipeline_tags' parameter of PipelineDecorator.pipeline. It turns out that when the pipeline consists of components that in turn use other components (via 'helper_functions'), these nested components are not tagged with 'pipe: <pipeline_task_id>'. I assume it should not be like that, right?
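This is roughly the shape of the code that reproduces it (names, project and values are made up):
```python
from clearml.automation.controller import PipelineDecorator

@PipelineDecorator.component(return_values=["value"])
def inner_step(x: int):
    # Nested component: launched from inside outer_step
    return x * 2

@PipelineDecorator.component(return_values=["value"], helper_functions=[inner_step])
def outer_step(x: int):
    # This spawns inner_step as its own task, which does not get the
    # 'pipe: <pipeline_task_id>' tag even though add_pipeline_tags=True
    return inner_step(x) + 1

@PipelineDecorator.pipeline(
    name="nested-tags-repro", project="debug", version="0.0.1", add_pipeline_tags=True
)
def pipeline():
    print(outer_step(3))

if __name__ == "__main__":
    PipelineDecorator.run_locally()
    pipeline()
```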
They share the same code (i.e. the same decorated functions), but using a different configuration.
Please let me know as soon as you have something :)
To sum up, we agree that it would be nice to enable tags on nested components. I will continue playing with the capabilities of nested components and keep reporting bugs as I come across them!
The thing is, I don't know in advance how many models there will be in the inference stage. My approach is to read the configurations of the operational models from a database through a for loop, and in that loop enqueue all the inference tasks (one task for each deployed model). For this I need the system to be able to run several pipelines at the same time. Since you told me this is not possible for now, as pipelines are based on singletons, my alternative is to use components.
Can you think of any other way to launch multiple pipelines concurrently? We have already seen it is only possible to run a single PipelineController in a single Python process.
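Something like this sketch is what I'm aiming for (load_model_configs and the queue name are placeholders standing in for the real database query and setup):
```python
from clearml.automation.controller import PipelineDecorator

def load_model_configs():
    # Placeholder for the real database query returning one config per deployed model
    return [{"model_id": i} for i in range(3)]

@PipelineDecorator.component(return_values=["forecast"], execution_queue="inference_cpu")
def run_inference(model_config: dict):
    # Placeholder inference step; in practice this loads the model and predicts
    return {"model_id": model_config["model_id"], "forecast": []}

@PipelineDecorator.pipeline(name="inference-fan-out", project="debug", version="0.0.1")
def inference_pipeline():
    # Each call enqueues an independent component task, so the inference jobs
    # for the different deployed models can run concurrently
    forecasts = [run_inference(model_config=cfg) for cfg in load_model_configs()]
    return forecasts

if __name__ == "__main__":
    inference_pipeline()
```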
CostlyOstrich36 Yes, it happens on the following line, at the time of calling the pipeline: forecast = prediction_service(config=default_config)
Were you able to reproduce the example?
Thanks, I'd appreciate it if you let me know when it's fixed :D
AgitatedDove14 In the 'status.json' file I could see the 'is_dirty' flag is set to True
Mmmm, you are right. Even if I had 1000 components spread across different project modules, only those components that are imported in the script where the pipeline is defined would be included in the DAG plot, is that right?
Mmm, I see. However, I think that only the components used in that pipeline should be shown, as it may be the case that you have defined, say, 1000 components and you only use 10 in a pipeline. Listing them all would just clutter up the results tab for that pipeline task.
I totally agree with the PipelineController/decorator part. Regarding the proposal for the component parameter, I also think it would be a good feature, although it might be misleading: there will be times when the pipeline fails because a step is intrinsically crucial, so it doesn't matter whether 'continue_pipeline_on_failure' is set to True or False. Anyway, I can't think of a better way to deal with that right now.
So ClearML will scan all the repository code searching for package dependencies? Is that right?
That's right. There is no such package, it's just a custom module.
But that module uses tensorflow, and ClearML does not add it to the list of packages to install. The only solution available so far is to include it manually via Task.add_requirements?
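In case it helps anyone reading this later, this is what I ended up doing (as far as I understand, it has to run before the task is created):
```python
from clearml import Task

# Must be called before Task.init / before the pipeline starts,
# otherwise the extra requirement is not picked up
Task.add_requirements("tensorflow")
# Task.add_requirements("tensorflow", "2.8.0")  # optionally pin a version
```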
Thanks for helping. You and your team are doing a great job for the ML community.
Anyway, is there any way to retrieve the information stored in the RESULTS tab of ClearML Web UI?
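Something like this is what I was picturing, in case these calls already cover it (the task id is a placeholder):
```python
from clearml import Task

task = Task.get_task(task_id="<task_id>")  # placeholder id; project/task names also work

scalars = task.get_reported_scalars()         # data behind the SCALARS sub-tab
console = task.get_reported_console_output()  # data behind the CONSOLE sub-tab
print(scalars)
```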
Well the 'state.json' file is actually removed after the exception is raised
By the way, I would like to mention another problem related to this that I have encountered. It seems that arguments of type 'int', 'float' or 'list' (maybe it also happens with other types) are transformed to 'str' when passed to a function decorated with PipelineDecorator.component, at the time of calling it in the pipeline itself. Again, is this intentional?
I have also tried with type hints and it still converts them to string. Very weird...
Exactly: when 'extra' has a default value (in this case, 43), the argument preserves its original type. However, when 'extra' is passed explicitly as a positional argument, it is transformed to 'str'.
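Here is a minimal snippet showing what I observe (project and names are placeholders):
```python
from clearml.automation.controller import PipelineDecorator

@PipelineDecorator.component()
def check_types(num, extra=43):
    # What I observe: 'num' (passed explicitly) arrives as 'str', while
    # 'extra' keeps its int type as long as the default value is used
    print(type(num).__name__, type(extra).__name__)

@PipelineDecorator.pipeline(name="type-cast-repro", project="debug", version="0.0.1")
def pipeline():
    check_types(7)       # prints: str int
    check_types(7, 10)   # prints: str str

if __name__ == "__main__":
    PipelineDecorator.run_locally()
    pipeline()
```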
Makes sense, thanks!
Yes, from archived experiments
What exactly do you mean by that? From VS Code I execute the following script, and then the agents take care of executing the code remotely:
```python
import pandas as pd

from clearml import Task, TaskTypes
from clearml.automation.controller import PipelineDecorator

CACHE = False

@PipelineDecorator.component(
    name="Wind data creator",
    return_values=["wind_series"],
    cache=CACHE,
    execution_queue="data_cpu",
    task_type=TaskTypes.data_processing,
)
def generate_wind(start_date: st...
```