Hi GiganticTurtle0
I have found that clearml does not automatically detect the imports specified within the function decorated
The pipeline decorator will automatically detect the imports inside the function, but not outside it (i.e. global imports), to allow better control of packages (think, for example, of one step that needs the huge torch package while another does not).
Make sense?
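To make that concrete, here is a minimal sketch (not taken from the ClearML examples) of a step that keeps its import inside the function body. The step name `compute_mean` and the `return_values` label are made up for illustration, and the `try`/`except` fallback is only there so the snippet still loads on a machine without clearml installed.

```python
try:
    from clearml.automation.controller import PipelineDecorator

    component = PipelineDecorator.component
except ImportError:
    # Illustration-only stand-in (an assumption, not a ClearML API):
    # a no-op decorator so the sketch can be loaded without clearml.
    def component(**_options):
        def decorate(func):
            return func

        return decorate


@component(return_values=["mean"])
def compute_mean(values):
    # Imported inside the function body, so it belongs to this step only.
    import statistics

    return statistics.mean(values)
```

Moving `import statistics` to the top of the file would make it a global import, which the decorator would not pick up for this step.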
How can I tell clearml I will use the same virtual environment in all steps and there is no need to waste time re-installing all packages for each step?
Well, each step is standalone (the idea being that steps are automatically spread across the cluster of machines, as opposed to running as just another subprocess).
This means each step is a standalone Task, which means "sharing" a venv is not a "thing".
I am aware of the option to enable virtual environment caching, but that is still very time consuming.
Venv caching cuts the setup step down to a few seconds; do notice that it is disabled by default.
What do you mean by "that is still very time consuming"? This is basically just copying files, so it should not take more than a few seconds.
Am I missing something here?
Of course it's always a good idea to have that extra option just in case 🙂
Never mind, I've already found a cleaner way to address this problem. I really appreciate your help!
GiganticTurtle0, do you mean the repo for the function itself?
The default assumes the function is "standalone". You can specify a repo with: @PipelineDecorator.component(..., repo='.')
This will take the current folder's repo (i.e. the local one).
You can also specify a repo url/commit etc. (repo='https://github/user/repo/repo.git' ...)
See here:
https://github.com/allegroai/clearml/blob/dd3d4cec948c9f6583a0b69b05043fd60d8c103a/clearml/automation/controller.py#L1931
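As a rough sketch of the `repo` argument described above: the step name `build_report` and the commented-out `my_project` import are hypothetical, and the `try`/`except` fallback only keeps the snippet loadable where clearml is not installed.

```python
try:
    from clearml.automation.controller import PipelineDecorator

    component = PipelineDecorator.component
except ImportError:
    # Illustration-only stand-in (an assumption, not a ClearML API):
    # a no-op decorator so the sketch can be loaded without clearml.
    def component(**_options):
        def decorate(func):
            return func

        return decorate


# repo="." asks for the repository of the current folder (the local one).
@component(return_values=["report"], repo=".")
def build_report(rows):
    # Once the agent has cloned the repo, project modules can be imported
    # here, e.g. (hypothetical module): from my_project.reporting import render
    return {"row_count": len(rows)}
```

For a remote repository, pass the URL to `repo` instead. I believe the decorator also accepts `repo_branch` and `repo_commit`, but check the linked source for the exact parameter names in your version.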
By the way, where can I change the default artifacts location (output_uri) if I have a script similar to this example (I mean, from the code, not the agent's config):
https://github.com/allegroai/clearml/blob/master/examples/pipeline/pipeline_from_decorator.py
named as venv_update (I believe it's still in beta). Do you think enabling this parameter significantly helps to build environments faster?
This is deprecated... it was a test of using a package that can update pip venvs, but it was never stable. We will remove it in the next version.
Yes, I guess. Since pipelines are designed to be executed remotely it may be pointless to enable an output_uri parameter in the PipelineDecorator.component
Regardless, I think it is good practice to add it, so we will 🙂
Anyway, could another task be initialized in the same script where the pipeline is called, so it would be the main task and the output_uri could be retrieved from all steps via task = Task.current_task(); task.get_output_destination() combo?
Could you elaborate on how one would use it exactly, and for what purpose?
GiganticTurtle0 , are you using ClearML 1.1.1 or 1.1.0?
I mean the agent that will run the function (which represents a pipeline step) should clone the repo in order to find the location of the project modules that are required for the function to be executed. Also, I have found that clearml does not automatically detect the imports specified within the function decorated with PipelineDecorator.component (even though I followed a scheme similar to the one in the example https://github.com/allegroai/clearml/blob/master/examples/pipeline/pipeline_from_decorator.py )
Okay, so the idea behind the new decorator is not to group all the defined steps under the same script so that they share the same environment, but rather to simplify the process of creating scripts for each step and avoid manually calling Task.init on those scripts.
Regarding virtual environment creation from caching, I will keep running benchmarks (from what you say it might be due to high workload in the servers we use)
So far I've been unlucky in getting clearml to recognize packages within the decorated function, but I'll keep trying
How can I tell clearml I will use the same virtual environment in all steps and there is no need to waste time re-installing all packages for each step?
Sure, it's already enabled. I noticed in the ClearML agent configuration another parameter related to environment caching, named as venv_update (I believe it's still in beta). Do you think enabling this parameter significantly helps to build environments faster?
Yes, I guess. Since pipelines are designed to be executed remotely it may be pointless to enable an output_uri parameter in the PipelineDecorator.component. Anyway, could another task be initialized in the same script where the pipeline is called, so it would be the main task and the output_uri could be retrieved from all steps via task = Task.current_task(); task.get_output_destination() combo?
Okay, so the idea behind the new decorator is not to group all the defined steps under the same script so that they share the same environment, but rather to simplify the process of creating scripts for each step and avoid manually calling Task.init on those scripts.
Correct, and allow users to more easily create Tasks from code.
Regarding virtual environment creation from caching, I will keep running benchmarks (from what you say it might be due to high workload in the servers we use)
Notice you need to enable this line:
https://github.com/allegroai/clearml-agent/blob/e17246d8ea1a113474af96d9274c42c749fe66db/docs/clearml.conf#L109
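For reference, a sketch of what the venv cache section of the agent's clearml.conf looks like with that line enabled. This is written from memory of the default file, so treat the key names and default values as assumptions and check them against the linked clearml.conf before copying.

```
agent {
    venvs_cache: {
        # maximum number of cached venvs
        max_entries: 10
        # minimum free space (GB) required to allow a new cache entry
        free_space_threshold_gb: 2.0
        # commented out by default; uncommenting it enables venv caching
        path: ~/.clearml/venvs-cache
    }
}
```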
So far I've been unlucky in getting clearml to recognize packages within the decorated function, but I'll keep trying
This is odd, if you can create a toy example, we can probably test that
By the way, where can I change the default artifacts location (output_uri) if I have a script similar to this example (I mean, from the code, not the agent's config):
That is a good point. I guess the assumption is that you configure it in the clearml-agent configuration, as it makes sense that all remote Tasks would have the same output_uri, no?
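A sketch of what that could look like in the clearml.conf on the agent machine, assuming the `sdk.development.default_output_uri` key; the bucket path below is a made-up placeholder, so swap in your own destination.

```
sdk {
    development {
        # hypothetical destination; every Task run by this agent would
        # upload its artifacts and models here by default
        default_output_uri: "s3://my-bucket/clearml-artifacts"
    }
}
```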