
Great! Thanks for the heads up!
Indeed it does! But what still puzzles me is why I get the path below when running dataset.get_local_copy() on one of the machines of my cluster:
/home/user/.clearml/cache/storage_manager/datasets/.lock.000.ds_61ff8d4335dd4b74bd78c3576fa44131.clearml
Why is it pointing to a .lock file?
By the way, where can I change the default artifacts location (output_uri) if I have a script similar to this example (I mean, from the code, not from the agent's config)?
https://github.com/allegroai/clearml/blob/master/examples/pipeline/pipeline_from_decorator.py
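In a plain script I would just do something like the sketch below (the project, task name, and bucket URI are placeholders), but with the decorator example the Task.init call is generated for me, so I don't know where this would go:

from clearml import Task

# Minimal sketch: set a custom artifacts location from code by passing
# output_uri to Task.init (all values below are placeholders).
task = Task.init(
    project_name="examples",
    task_name="my_task",
    output_uri="s3://my-bucket/artifacts",
)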
I am aware of the option to enable virtual environment caching, but that is still very time consuming.
So I assume you mean reporting not only the agent's memory usage, but also that of all the subprocesses the agent spawns(?)
Thanks for the background. I now have the big picture of the process ClearML goes through. It was helpful in clarifying some of the questions that I didn't know how to ask properly. So, the idea is that a base task is already stored on the ClearML server for later use in a production environment, since such a task will always be created during the model development process.
Going back to my initial question: as far as I understood, if the environment caching option is ena...
Sure, it would be very intuitive if the command to stop an agent were as easy as:
clearml-agent daemon --stop AGENT_PID
But I was actually asking about accessing the Pipeline task ID, not the tasks corresponding to the components.
Well, instead of plain functions or files I use components because I need some of those steps to run on one machine and some on another. And it works perfectly fine (ignoring some minor bugs like this one). So I'm actually passing component-decorated functions in the 'helper_functions' parameter.
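Roughly what I mean (the queue names and function bodies are made up):

from clearml.automation.controller import PipelineDecorator

@PipelineDecorator.component(execution_queue="cpu_queue")
def preprocess(data):
    # this step should run on one machine...
    return [x * 2 for x in data]

@PipelineDecorator.component(
    execution_queue="gpu_queue",
    helper_functions=[preprocess],  # a component-decorated function
)
def train(data):
    # ...and this one on another, reusing the component above
    processed = preprocess(data)
    return sum(processed)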
Mmm, but what if the dataset size is too large to be stored in the .cache path? Will it be stored there anyway?
Oh, I see. This explains the surprising behavior. But what if the Task.init call is created automatically by PipelineDecorator.component? How can I pass arguments to the init method in that case?
Exactly. When 'extra' has a default value (in this case, 43), the argument preserves its original type. However, when 'extra' is passed as a positional argument, it is transformed to 'str'.
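A minimal reproduction of what I am seeing (the names are made up):

from clearml.automation.controller import PipelineDecorator

@PipelineDecorator.component(return_values=["extra"])
def step(extra=43):
    print(type(extra))
    return extra

@PipelineDecorator.pipeline(name="types_demo", project="examples", version="0.1")
def pipeline():
    step()    # default is used -> 'extra' keeps its int type
    step(42)  # passed positionally -> 'extra' arrives as the string '42'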
By adding the slash I have been able to see that the dataset is indeed stored in output_url. However, when calling finalize, I get the same error. And yes, I have installed the version corresponding to the last commit :/
Yes, before removing the 'default' queue I was able to shut down agents without specifying further options after the --stop command. I just had to run clearml-agent daemon --stop as many times as there were agents. Of course, I will open the issue as soon as possible :D
While I'm still at it, I would like to report another minor bug related to the 'add_pipeline_tags' parameter of PipelineDecorator.pipeline. It turns out that when the pipeline consists of components that in turn use other components (via 'helper_functions'), these nested components are not tagged with 'pipe: <pipeline_task_id>'. I assume it should not be like that, right?
Hi AnxiousSeal95!
That's it. My idea is that artifacts can be linked to the model. These artifacts are typically links to serialized objects (such as datasets or scalers): usually directories or temporary files in mount units that I want to be uploaded as artifacts of the task and then removed (as they are temporary), so that later I can get a new local path via task.artifacts["scalers"].get_local_copy(). I think this way the model's dependence on the task that created it could be re...
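Roughly the flow I have in mind (the path and task ID are placeholders):

from clearml import Task

# during training: upload the temporary scaler file as a task artifact,
# then remove the local temporary copy
task = Task.init(project_name="examples", task_name="train_model")
task.upload_artifact(name="scalers", artifact_object="/tmp/scalers.pkl")

# later, in production: fetch the task and pull a fresh local copy
train_task = Task.get_task(task_id="<training_task_id>")
scalers_path = train_task.artifacts["scalers"].get_local_copy()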
AgitatedDove14 In the 'status.json' file I could see that the 'is_dirty' flag is set to True.
To sum up, we agree that it would be nice to enable tags for nested components. I will continue playing with the capabilities of nested components and keep reporting bugs as I come across them!
Currently I'm working with v1.0.5. Anyway, I found that it is possible to connect the new argument if I store the object returned by task.connect(args) in a variable. I expected that, since it is a mutable object, it would not be necessary to overwrite args, but apparently it is required in this version of ClearML.
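i.e., what works for me now is:

from clearml import Task

task = Task.init(project_name="examples", task_name="connect_demo")

args = {"epochs": 10, "new_arg": "value"}
# I have to keep the object returned by connect(); mutating the
# original dict in place was not enough in v1.0.5.
args = task.connect(args)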
AnxiousSeal95 I see. That's why I was thinking of storing the model inside a task, just like with the Dataset class. So you can either use just the model via InputModel, or the model and all its artifacts via Task.get_task, using the ID of the task where the model is located.
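Something like this (the IDs are placeholders):

from clearml import InputModel, Task

# option 1: just the model
model = InputModel(model_id="<model_id>")
weights_path = model.get_weights()

# option 2: the model plus all artifacts of the task that created it
train_task = Task.get_task(task_id="<task_id>")
scalers_path = train_task.artifacts["scalers"].get_local_copy()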
I would like my cleanup service to remove all tasks older than two weeks, but not the models. Right now, if I delete all tasks the model does not work (as it needs the training tasks). For now, I ...
I have also tried with type hints and it is still cast to string. Very weird...
Hi ExasperatedCrab78,
Sure! Sorry for the delay. I'm using Chrome Version 98.0.4758.102 (Official Build) (64-bit)
Hi AnxiousSeal95!
Yes, the main reason is to unclutter the ClearML Web UI, but also to free up space on our server (mainly due to the large size of the datasets). Once the models are trained, I want to retrain them periodically, and to do so I would like all the data specifications and artifacts generated during training to be linked to the model found under the "Models" section.
What I propose is somewhat similar to the functionality of clearml.Dataset. These datasets are themselves a task t...
Can you think of any other way to launch multiple pipelines concurrently? Since we have already seen that it is only possible to run a single PipelineController in a single Python process.
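The only workaround that comes to my mind is one OS process per pipeline, e.g. (the module and parameter names are hypothetical):

from multiprocessing import Process

def launch(pipeline_kwargs):
    # each pipeline gets its own interpreter, since a single Python
    # process seems to be limited to one PipelineController
    from my_pipelines import executing_pipeline  # hypothetical module
    executing_pipeline(**pipeline_kwargs)

if __name__ == "__main__":
    configs = [{"mock_parameter": "a"}, {"mock_parameter": "b"}]
    processes = [Process(target=launch, args=(cfg,)) for cfg in configs]
    for p in processes:
        p.start()
    for p in processes:
        p.join()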
Sure, it's already enabled. I noticed another parameter in the ClearML agent configuration related to environment caching, named venv_update (I believe it's still in beta). Do you think enabling this parameter helps significantly to build environments faster?
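For reference, this is the part of clearml.conf I am looking at (the values are the defaults, as far as I recall):

agent {
    # virtual environment caching (already enabled on my side)
    venvs_cache: {
        max_entries: 10
        free_space_threshold_gb: 2.0
        path: ~/.clearml/venvs-cache
    }
    # the beta flag I am asking about
    venv_update: {
        enabled: true
    }
}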
Yes, I guess. Since pipelines are designed to be executed remotely, it may be pointless to enable an output_uri parameter in PipelineDecorator.component. Anyway, could another task be initialized in the same scr...
Nice, in the meantime as a workaround I will implement some temporary parsing code at the beginning of the step functions.
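The parsing I have in mind looks roughly like this inside each step (the argument names are made up):

from ast import literal_eval

from clearml.automation.controller import PipelineDecorator

@PipelineDecorator.component()
def step(batch_size, rate, layers):
    # temporary workaround: cast the stringified arguments back to
    # their intended types at the top of the step
    batch_size = int(batch_size)
    rate = float(rate)
    layers = literal_eval(layers) if isinstance(layers, str) else layers
    ...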
Yes, although I use both terms interchangeably. The information will actually be contained in JSON files.
BTW, I would like to mention another problem related to this that I have encountered. It seems that arguments of type 'int', 'float' or 'list' (it may also happen with other types) are transformed to 'str' when passed to a function decorated with PipelineDecorator.component, at the time of calling it in the pipeline itself. Again, is this something intentional?