I don't know if you remember the need I had some time ago to launch the same pipeline through configuration. I've been thinking about it and I think PipelineController fits my needs better than PipelineDecorator in that respect.
Mmmm you are right. Even if I had 1000 components spread across different project modules, only the components that are imported in the script where the pipeline is defined would be included in the DAG plot, is that right?
BTW, how can I run 'execute_orchestrator' concurrently? That is, launch it for several configurations at the same time? The way it's implemented now, it doesn't start the next configuration until the current one is finished.
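Roughly what I have in mind is something like this (just a sketch; the config dicts are made up and execute_orchestrator stands in for my own launcher function):
```
from multiprocessing import Process

def execute_orchestrator(config):
    # stand-in for the launcher function mentioned above
    ...

configs = [{"dataset": "a"}, {"dataset": "b"}]  # made-up configurations

# Start one process per configuration instead of waiting for each run to finish
processes = [Process(target=execute_orchestrator, args=(cfg,)) for cfg in configs]
for p in processes:
    p.start()
for p in processes:
    p.join()
```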
I tried specifying helper functions but it still gives the same error. If I define a component through the following code:
```
from typing import Optional
from clearml.automation.controller import PipelineDecorator

@PipelineDecorator.component(...)
def step_data_loading(path: str, target_dir: Optional[str] = None):
    pass
```
Then in the automatically created script I find the following code:
```
from clearml.automation.controller import PipelineDecorator

def step_data_loading(path: str, target...
```
Mmm I see. However, I think that only the components used for that pipeline should be shown, since you might have defined, say, 1000 components and only use 10 in a pipeline. Listing them all would just clutter up the results tab for that pipeline task.
Where can I find this documentation?
Hi SuccessfulKoala55
So, how can I get the ID of the requested project through the resp object? I tried with resp["id"] but it didn't work.
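For context, this is roughly how I'm getting resp (a sketch assuming it comes from the APIClient; the project name is a placeholder):
```
from clearml.backend_api.session.client import APIClient

client = APIClient()
# get_all returns a list of project objects rather than dicts,
# so attribute access (resp[0].id) may be what's needed instead of resp["id"]
resp = client.projects.get_all(name="my_project")  # placeholder name
if resp:
    print(resp[0].id)
```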
Well, I am thinking in the case that there are several pipelines in the system and that when filtering a task by its name and project I can get several tasks. How could I build a filter for Task.get_task(task_filter=...) that returns only the task whose parent task is the pipeline task?
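To make the idea concrete, something along these lines is what I'm after (a sketch; the names are placeholders and I'm assuming 'parent' is a supported filter field):
```
from clearml import Task

# Placeholder names: first locate the pipeline (controller) task...
pipeline_task = Task.get_task(project_name="my_project", task_name="my_pipeline")

# ...then keep only the tasks whose parent is that pipeline task
steps = Task.get_tasks(task_filter={"parent": pipeline_task.id})
```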
AnxiousSeal95 I see. That's why I was thinking of storing the model inside a task, just like with the Dataset class, so that you can either use just the model via InputModel, or the model and all its artifacts via Task.get_task using the ID of the task where the model is stored.
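Just to illustrate the two access patterns I mean (a rough sketch; the IDs are placeholders):
```
from clearml import InputModel, Task

# Option 1: just the model itself
model = InputModel(model_id="<model-id>")        # placeholder id
weights_path = model.get_weights()

# Option 2: the task that wraps the model, together with all its artifacts
model_task = Task.get_task(task_id="<task-id>")  # placeholder id
artifacts = model_task.artifacts
```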
I would like my cleanup service to remove all tasks older than two weeks, but not the models. Right now, if I delete all tasks the model does not work (as it needs the training tasks). For now, I ...
SuccessfulKoala55 I have not tried with argparse yet, but maybe I will encounter the same problem.
AgitatedDove14 Oops, something still seems to be wrong. When trying to retrieve the dataset using get_local_copy() I get the following error:
```
Traceback (most recent call last):
  File "/home/user/myproject/lab.py", line 27, in <module>
    print(dataset.get_local_copy())
  File "/home/user/.conda/envs/myenv/lib/python3.9/site-packages/clearml/datasets/dataset.py", line 554, in get_local_copy
    target_folder = self._merge_datasets(
  File "/home/user/.conda/envs/myenv/lib/python3.9/site-p...
```
I have tried it and it depends on the context. When I call the method inside a function decorated with PipelineDecorator.component, I get the component task, while if I call it inside PipelineDecorator.pipeline, I get the task corresponding to the pipeline. However, as you said, that is not the expected behavior, although I think it makes sense.
I see the point. The reason I'm using PipelineController now is that I've realised that in the code I only send IDs from one step of the pipeline to another, and not artefacts as such. So I think it makes more sense in this case to work with the former.
After doing so, the agent is removed from the list provided by ps -ef | grep clearml-agent, but it is still visible in the ClearML UI and also when I run clearml-agent list.
But what is the name of that API library, so I can access those commands from the Python SDK?
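In other words, I'd expect to be able to do something like this from Python (a sketch assuming the APIClient is the library in question):
```
from clearml.backend_api.session.client import APIClient

client = APIClient()
# Roughly what `clearml-agent list` reports, but from the SDK
for worker in client.workers.get_all():
    print(worker.id)
```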
Well, I need to write boilerplate parsing code if I want to use the original values after I connect the dictionary to the task, so it's a bit messy.
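To show what I mean by boilerplate (a made-up example; the parameter names and values are arbitrary):
```
from clearml import Task

task = Task.init(project_name="demo", task_name="connect-example")  # placeholder names
params = {"epochs": 10, "lr": 0.001, "layers": [64, 32]}
task.connect(params)

# When the task is later executed by an agent, the connected values may come
# back as strings, so I end up re-casting them by hand before using them:
epochs = int(params["epochs"])
lr = float(params["lr"])
```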
Currently I'm using clearml v1.0.5 and clearml-agent v1.0.0
Yes, I'm working with the latest commit. Anyway, I have tried to run dataset.get_local_copy() on another machine and it works. I have no idea why this happens. However, on the new machine get_local_copy() does not return the path I expect. If I have this code:

dataset.upload(output_url="/home/user/server_local_storage/mock_storage")

I would expect the dataset to be stored under the path specified in output_url. But what I get with get_local_copy() is the follo...
Mmm, but what if the dataset is too large to be stored in the .cache path? Will it be stored there anyway?
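What I was hoping is possible is to point the copy at a disk with enough space, something like this (a sketch, assuming get_mutable_local_copy accepts a target folder; the id and path are made up):
```
from clearml import Dataset

dataset = Dataset.get(dataset_id="<dataset-id>")  # placeholder id
# Copy the dataset to a large local disk instead of the default cache folder
local_path = dataset.get_mutable_local_copy(target_folder="/mnt/big_disk/my_dataset")
```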
Well, I tried several things but none of them worked. I'm a bit lost.
AgitatedDove14 BTW, I got the notification from GitHub telling me you had committed the fix and I went ahead. After testing the code again, I see the task parameter dictionary has been removed properly (now it has been broken down into flat parameters). However, I still have the same problem with duplicate tasks, as you can see in the image.
Sure, but I mean, apart from labelling it as a local path, what's the point of renaming the original path if my goal is to access it later using the name I gave it?
Is there any git redundancy on your network? Maybe you could configure a fallback server?
I will ask the IT team about this.
By the way, where can I change the default artifacts location (output_uri) if I have a script similar to this example (I mean, from the code, not the agent's config):
https://github.com/allegroai/clearml/blob/master/examples/pipeline/pipeline_from_decorator.py
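For a plain (non-pipeline) script, this is the setting I mean (placeholder names and URI); I'm asking where its equivalent goes in the decorator-based example above:
```
from clearml import Task

# For a regular script the artifacts destination can be set from code like this
task = Task.init(
    project_name="demo",                    # placeholder
    task_name="example",                    # placeholder
    output_uri="s3://my-bucket/artifacts",  # placeholder destination
)
```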
Exactly!! That's what I was looking for: creating the pipeline without launching it. Thanks again AgitatedDove14
BTW, I would like to mention another problem related to this that I have encountered. It seems that arguments of type 'int', 'float' or 'list' (it may also happen with other types) are transformed to 'str' when passed to a function decorated with PipelineDecorator.component at the point where it is called in the pipeline itself. Again, is this intentional?
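A minimal repro of what I mean (placeholder names; just a sketch):
```
from clearml.automation.controller import PipelineDecorator

@PipelineDecorator.component()
def check_types(num_epochs: int, lr: float, layers: list):
    # When the pipeline runs, these print <class 'str'> instead of int/float/list,
    # which is the behaviour I'm describing above
    print(type(num_epochs), type(lr), type(layers))

@PipelineDecorator.pipeline(name="type-check", project="demo", version="0.1")
def my_pipeline():
    check_types(10, 0.001, [64, 32])
```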
Now it's okay. I have found a more intuitive way around it. I was facing the classic 'xy' problem :)
Well, I see the same utility as in the first generation of pipelines. After all, isn't the new decorator about keeping the same functionality while saving the user some boilerplate code?