I don't know if you remember the need I had some time ago to launch the same pipeline through configuration. I've been thinking about it, and I think PipelineController fits my needs better than PipelineDecorator in that respect.
Mmmm you are right. Even if I had 1000 components spread in different project modules, only those components that are imported in the script where the pipeline is defined would be included in the DAG plot, is that right?
BTW, how can I run 'execute_orchestrator' concurrently? That is, launch it for several configurations at the same time? The way it's implemented now, it doesn't start the next configuration until the current one is finished.
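One way to get concurrency is to launch one process per configuration instead of calling the orchestrator in a loop. This is a minimal sketch under stated assumptions: the pipeline entry point is a script that accepts a hypothetical `--config <path>` argument (execute_orchestrator itself is not shown in the thread). A separate process per run is used because ClearML likely ties one main task to each running process.

```python
import subprocess
import sys
from typing import List, Sequence


def launch_concurrently(script: str, config_paths: Sequence[str]) -> List[int]:
    """Start one child process per configuration, then wait for all of them.

    Each child runs `python <script> --config <path>` (hypothetical CLI),
    so every pipeline run gets its own interpreter.
    """
    # Popen returns immediately, so all configurations start right away
    procs = [
        subprocess.Popen([sys.executable, script, "--config", path])
        for path in config_paths
    ]
    # Block only after everything has been started; returns the exit codes
    return [proc.wait() for proc in procs]
```

For example, `launch_concurrently("run_pipeline.py", ["configs/a.yaml", "configs/b.yaml"])`, where both the script name and the config files are placeholders.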
I tried specifying helper functions, but it still gives the same error. If I define a component with the following code:
```python
from typing import Optional
from clearml.automation.controller import PipelineDecorator

@PipelineDecorator.component(...)
def step_data_loading(path: str, target_dir: Optional[str] = None):
    pass
```
Then in the automatically created script I find the following code:
```python
from clearml.automation.controller import PipelineDecorator
def step_data_loading(path: str, target...
```
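Assuming the error is a NameError for `Optional`, this sketch shows why: on Python versions before 3.14, annotations are evaluated when `def` runs, so a generated script that copies the function but not `from typing import Optional` fails at definition time. The two source strings are a hypothetical reconstruction of the generated script, not ClearML output. Writing the hint as a string (`"Optional[str]"`) is one possible workaround, since string annotations are not evaluated at `def` time.

```python
# Hypothetical reconstruction of the generated step script: only the
# function source is copied, so `Optional` is never imported.
GENERATED_WITH_BARE_HINT = """
def step_data_loading(path: str, target_dir: Optional[str] = None):
    pass
"""

# Same function, but the hint is a string, so it is not evaluated at `def` time.
GENERATED_WITH_STRING_HINT = """
def step_data_loading(path: str, target_dir: "Optional[str]" = None):
    pass
"""


def defines_cleanly(source: str) -> bool:
    """Return True if `source` executes in a fresh namespace without a NameError."""
    try:
        exec(source, {})
    except NameError:
        return False
    return True


if __name__ == "__main__":
    print(defines_cleanly(GENERATED_WITH_BARE_HINT))    # False on Python < 3.14
    print(defines_cleanly(GENERATED_WITH_STRING_HINT))  # True
```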
Mmm, I see. However, I think only the components used by that pipeline should be shown, since you may have defined, say, 1000 components and only use 10 in a given pipeline. Listing them all would just clutter up the results tab for that pipeline task.
Where can I find this documentation?
Hi SuccessfulKoala55
So, how can I get the ID of the requested project through the resp object? I tried with resp["id"] but it didn't work.
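A likely reason resp["id"] fails is that the response is a list of project objects with attribute access rather than a dict. This sketch assumes that shape (based on how the APIClient `projects.get_all` call is commonly used; not verified here) and reads `.id` from each entry.

```python
from typing import Any, Iterable, List


def project_ids(resp: Iterable[Any]) -> List[str]:
    """Collect the `id` attribute of every project object in `resp`.

    Assumes `resp` is a list of response objects (attribute access),
    not a dict, which would explain why resp["id"] fails.
    """
    return [project.id for project in resp]


# Hypothetical usage with the REST client (assumed, not verified here):
#   from clearml.backend_api.session.client import APIClient
#   resp = APIClient().projects.get_all(name="my_project")
#   project_id = project_ids(resp)[0]
```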
Well, I am thinking of the case where there are several pipelines in the system, so filtering a task by its name and project can return several tasks. How could I build a filter for Task.get_task(task_filter=...) that returns only the task whose parent task is the pipeline task?
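If a server-side filter is not available, one fallback is to fetch the candidates by name and project and keep only those whose parent is the pipeline task. This sketch assumes each task object exposes a `parent` attribute holding the parent task ID; that attribute, and the `Task.get_tasks` call in the comment, are assumptions to check against the SDK.

```python
from typing import Any, Iterable, List


def children_of(tasks: Iterable[Any], pipeline_task_id: str) -> List[Any]:
    """Keep only the tasks whose `parent` attribute equals the pipeline task ID."""
    return [t for t in tasks if getattr(t, "parent", None) == pipeline_task_id]


# Hypothetical usage (assumed, not verified here):
#   from clearml import Task
#   candidates = Task.get_tasks(project_name="my_project", task_name="step_data_loading")
#   steps = children_of(candidates, pipeline_task_id)
```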
AnxiousSeal95 I see. That's why I was thinking of storing the model inside a task, just like with the Dataset class, so that you could either use just the model via InputModel, or the model and all its artifacts via Task.get_task, using the ID of the task where the model is stored.
I would like my cleanup service to remove all tasks older than two weeks, but not the models. Right now, if I delete all tasks, the models stop working (they need the training tasks). For now, I ...
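The selection rule described here can be sketched as: delete a task only if it is older than two weeks and did not produce a model. `TaskRecord` is a hypothetical, simplified view of a task; how its fields are filled from ClearML is not shown and would need to be mapped onto the real task properties.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional


@dataclass
class TaskRecord:
    """Hypothetical, simplified view of a task for the cleanup decision."""
    task_id: str
    last_update: datetime
    has_output_model: bool


def tasks_to_delete(
    tasks: List[TaskRecord],
    now: Optional[datetime] = None,
    max_age: timedelta = timedelta(weeks=2),
) -> List[str]:
    """Return IDs of tasks older than `max_age` that produced no model."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - max_age
    # Tasks that own a model are kept, since the model depends on them
    return [t.task_id for t in tasks
            if t.last_update < cutoff and not t.has_output_model]
```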
SuccessfulKoala55 I haven't tried argparse yet, but I may run into the same problem.
AgitatedDove14 Oops, something still seems to be wrong. When I try to retrieve the dataset using get_local_copy(), I get the following error:
```
Traceback (most recent call last):
  File "/home/user/myproject/lab.py", line 27, in <module>
    print(dataset.get_local_copy())
  File "/home/user/.conda/envs/myenv/lib/python3.9/site-packages/clearml/datasets/dataset.py", line 554, in get_local_copy
    target_folder = self._merge_datasets(
  File "/home/user/.conda/envs/myenv/lib/python3.9/site-p...
```
I have tried it, and it depends on the context. When I call the method inside a function decorated with PipelineDecorator.component, I get the component task, while if I call it inside PipelineDecorator.pipeline, I get the task corresponding to the pipeline. However, as you said, that is not the expected behavior, although I think it makes sense.
I see the point. The reason I'm using PipelineController now is that I've realized my code only passes IDs from one pipeline step to the next, not the artifacts themselves. So I think it makes more sense to work with PipelineController in this case.
After doing so, the agent is removed from the list returned by ps -ef | grep clearml-agent, but it is still visible in the ClearML UI and when I run clearml-agent list.
Hi AnxiousSeal95!
Yes, the main reason is to declutter the ClearML Web UI, but also to free up space on our server (mainly because of the large size of the datasets). Once the models are trained, I want to retrain them periodically, and to do so I would like all the data specifications and artifacts generated during training to be linked to the model listed under the "Models" section.
What I propose is somewhat similar to the functionality of clearml.Dataset. These datasets are themselves a task t...
But how can I tell whether an agent is up or not? Can I check that from the CLI or from the SDK?
But what is the name of the API library that gives access to those commands from the Python SDK?
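As a rough sketch of checking an agent from Python, this assumes the client exposes `workers.get_all()` returning objects with an `id` attribute (an assumption about the APIClient REST wrapper, not verified here). Note that this only tells you whether the server still lists the worker, which may differ from whether the agent process is actually running.

```python
from typing import Any


def is_agent_registered(client: Any, worker_id: str) -> bool:
    """Return True if `worker_id` is in the server's current worker list.

    `client` is expected to expose `client.workers.get_all()`, returning
    objects with an `id` attribute (an assumption about APIClient).
    """
    return any(worker.id == worker_id for worker in client.workers.get_all())


# Hypothetical usage (assumed, not verified here):
#   from clearml.backend_api.session.client import APIClient
#   print(is_agent_registered(APIClient(), "my-host:gpu0"))
```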
Well, I need to write boilerplate parsing code if I want to use the original values after connecting the dictionary to the task, so it's a bit messy.
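The kind of boilerplate meant here can be kept to one helper. This sketch assumes the connected values come back as strings under slash-separated keys (for example `"model/lr": "0.1"`); it rebuilds the nested dict and turns literals back into Python objects with `ast.literal_eval`, leaving plain strings untouched.

```python
import ast
from typing import Any, Dict


def unflatten(flat: Dict[str, Any], sep: str = "/") -> Dict[str, Any]:
    """Rebuild a nested dict from 'a/b/c' keys, parsing literal values."""
    nested: Dict[str, Any] = {}
    for key, raw in flat.items():
        node = nested
        *parents, leaf = key.split(sep)
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = _parse(raw)
    return nested


def _parse(raw: Any) -> Any:
    """Turn '3', '0.1', '[1, 2]', 'True' back into Python objects."""
    if not isinstance(raw, str):
        return raw
    try:
        return ast.literal_eval(raw)
    except (ValueError, SyntaxError):
        return raw  # plain strings such as 'adam' stay as they are
```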
Currently I'm using clearml v1.0.5 and clearml-agent v1.0.0.
Yes, I'm working with the latest commit. Anyway, I tried running dataset.get_local_copy() on another machine and it works there. I have no idea why this happens. However, on the new machine get_local_copy() does not return the path I expect. If I have this code:
```python
dataset.upload(output_url="/home/user/server_local_storage/mock_storage")
```
I would expect the dataset to be stored under the path specified in output_url. But what I get with get_local_copy() is the follo...
Mmm, but what if the dataset is too large to be stored in the .cache path? Will it be stored there anyway?
Well, I've tried several things, but none of them have worked. I'm a bit lost.
AgitatedDove14 BTW, I got the notification from GitHub telling me you had committed the fix, so I went ahead and tested the code again. The task parameter dictionary has been removed as expected (it is now broken down into flat parameters). However, I still have the same problem with duplicate tasks, as you can see in the image.
Sure, but I mean, apart from labeling it as a local path, what's the point of renaming the original path if my goal is to access it later using the name I gave it?
Is there any git redundancy on your network? Maybe you could configure a fallback server?
I will ask the IT team about this.
Mmm, that's weird, because I can see the type hints in the function arguments of the automatically generated script. So either I'm doing something wrong or it's a bug, since they have been passed to the created step (I'm using clearml 1.1.2 and clearml-agent 1.1.0).
Or maybe you could bundle some of the parameters that belong to PipelineDecorator.component into a high-level configuration variable (something like PipelineDecorator.global_config?).
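Until something like that exists (PipelineDecorator.global_config is only a proposal here), a similar effect could be approximated on the user side by wrapping the decorator factory so shared keyword arguments are applied to every component. This is a sketch, not a ClearML feature: `with_defaults` is a hypothetical helper, and it only supports the `@component(...)` call form, not the bare `@component` form.

```python
from typing import Any, Callable, Dict


def with_defaults(decorator_factory: Callable[..., Any],
                  defaults: Dict[str, Any]) -> Callable[..., Any]:
    """Wrap a decorator factory so shared kwargs are applied to every call.

    Per-component kwargs override the shared ones.
    """
    def factory(**kwargs: Any) -> Any:
        return decorator_factory(**{**defaults, **kwargs})
    return factory


# Hypothetical usage (assumed, not verified here):
#   component = with_defaults(PipelineDecorator.component,
#                             {"cache": True, "execution_queue": "default"})
#   @component(return_values=["data"])
#   def step_data_loading(path: str): ...
```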
By the way, where can I change the default artifacts location (output_uri) from the code (not the agent's config) if I have a script similar to this example:
https://github.com/allegroai/clearml/blob/master/examples/pipeline/pipeline_from_decorator.py