Answered
Hi All, I Am Testing The New

Hi all, I am testing the new PipelineDecorator feature. Is there any way to automatically detect the Git repository in the pipeline step decorated with PipelineDecorator.component without specifying the 'repo' argument?

  
  
Posted 3 years ago

Answers 14


GiganticTurtle0, are you using ClearML 1.1.1 or 1.1.0?

  
  
Posted 3 years ago

I'm using the latest version (1.1.1)

  
  
Posted 3 years ago

GiganticTurtle0, do you mean the repo for the function itself?
By default the function is assumed to be "standalone". You can specify a repo with:
@PipelineDecorator.component(..., repo='.')
This takes the current folder's repo (i.e. the local one).
You can also specify a repo URL, commit, etc. (repo='https://github/user/repo/repo.git' ...)
See here:
https://github.com/allegroai/clearml/blob/dd3d4cec948c9f6583a0b69b05043fd60d8c103a/clearml/automation/controller.py#L1931
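As a minimal sketch of the repo argument described above (not a definitive recipe): the step names, the branch name and the second repository URL are hypothetical, and the small fallback class only exists so the snippet can be read and imported on a machine without clearml installed. It is not part of ClearML.

```python
try:
    from clearml.automation.controller import PipelineDecorator
    HAVE_CLEARML = True
except ImportError:
    HAVE_CLEARML = False

    class PipelineDecorator:
        """Hypothetical pass-through stand-in, NOT the ClearML API."""

        @staticmethod
        def component(**_kwargs):
            def wrap(func):
                return func
            return wrap


# repo="." asks for the Git repository of the current folder, so the
# agent running this step clones it before executing the function.
@PipelineDecorator.component(return_values=["numbers"], repo=".")
def load_numbers(count):
    return list(range(count))


# A remote repository can be given instead. The URL and branch below are
# placeholders for illustration only.
@PipelineDecorator.component(
    return_values=["total"],
    repo="https://github.com/user/repo.git",
    repo_branch="main",
)
def sum_numbers(numbers):
    return sum(numbers)
```

Without repo, the step is treated as standalone and no repository is cloned, which matches the default behaviour mentioned in the reply.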

  
  
Posted 3 years ago

I mean that the agent running the function (which represents a pipeline step) should clone the repo in order to locate the project modules the function needs. Also, I have found that clearml does not automatically detect the imports specified within the function decorated with PipelineDecorator.component (even though I followed a scheme similar to the one in the example https://github.com/allegroai/clearml/blob/master/examples/pipeline/pipeline_from_decorator.py )

  
  
Posted 3 years ago

How can I tell clearml I will use the same virtual environment in all steps and there is no need to waste time re-installing all packages for each step?

  
  
Posted 3 years ago

I am aware of the option to enable virtual environment caching, but that is still very time consuming.

  
  
Posted 3 years ago

Hi GiganticTurtle0

I have found that clearml does not automatically detect the imports specified within the function decorated

The pipeline decorator automatically detects the imports inside the function, but not outside it (i.e. global ones). This allows better control over packages (think, for example, of one step that needs the huge torch package while another does not).
Does that make sense?
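To illustrate the point about imports, here is a small sketch under a few assumptions: the step name is hypothetical, the standard-library statistics module stands in for a heavy third-party package such as torch, and the fallback class is only a pass-through so the snippet runs without clearml installed (it is not ClearML code).

```python
try:
    from clearml.automation.controller import PipelineDecorator
    HAVE_CLEARML = True
except ImportError:
    HAVE_CLEARML = False

    class PipelineDecorator:
        """Hypothetical pass-through stand-in, NOT the ClearML API."""

        @staticmethod
        def component(**_kwargs):
            def wrap(func):
                return func
            return wrap


@PipelineDecorator.component(return_values=["mean"])
def compute_mean(values):
    # The import lives inside the function body, which is where the
    # decorator looks for the step's requirements. A module-level
    # (global) import would not be picked up for this step.
    import statistics

    return statistics.mean(values)
```

Keeping each import next to the step that needs it means a step that never touches a large package does not have to install it.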

How can I tell clearml I will use the same virtual environment in all steps and there is no need to waste time re-installing all packages for each step?

Well, each step is standalone (the idea being that steps are automatically spread across the cluster of machines, as opposed to running as just another subprocess).
This means each step is a standalone Task, so "sharing" a venv is not a "thing".

I am aware of the option to enable virtual environment caching, but that is still very time consuming.

venv caching brings the setup step down to a few seconds. Do notice that it is disabled by default.
What do you mean by "that is still very time consuming"? This is basically just copying files, so it should not take more than a few seconds.
Am I missing something here?

  
  
Posted 3 years ago

Okay, so the idea behind the new decorator is not to group all the defined steps under the same script so that they share the same environment, but rather to simplify the process of creating scripts for each step and avoid manually calling Task.init on those scripts.

Regarding virtual environment creation from caching, I will keep running benchmarks (from what you say it might be due to high workload in the servers we use)

So far I've had no luck getting clearml to recognize packages within the decorated function, but I'll keep trying.

  
  
Posted 3 years ago

By the way, where can I change the default artifacts location (output_uri) if I have a script similar to this example (I mean from the code, not the agent's config):
https://github.com/allegroai/clearml/blob/master/examples/pipeline/pipeline_from_decorator.py

  
  
Posted 3 years ago

Okay, so the idea behind the new decorator is not to group all the defined steps under the same script so that they share the same environment, but rather to simplify the process of creating scripts for each step and avoid manually calling Task.init on those scripts.

Correct, and allow users to more easily create Tasks from code.

Regarding virtual environment creation from caching, I will keep running benchmarks (from what you say it might be due to high workload in the servers we use)

Notice you need to enable this line:
https://github.com/allegroai/clearml-agent/blob/e17246d8ea1a113474af96d9274c42c749fe66db/docs/clearml.conf#L109
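For reference, a sketch of what the relevant section of the agent's clearml.conf might look like once caching is enabled. The values shown are assumed to be the shipped defaults and may differ between agent versions; the path line is the one that needs uncommenting.

```
agent {
    venvs_cache: {
        # maximum number of cached venvs
        max_entries: 10
        # minimum free space (GB) required to allow a new cache entry
        free_space_threshold_gb: 2.0
        # uncommented to enable virtual environment caching
        path: ~/.clearml/venvs-cache
    }
}
```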

So far I've been unlucky in the attempt of clearml recognizing packages within the decorator function, but I'll keep trying it

This is odd. If you can create a toy example, we can probably test it.

By the way, where can I change the default artifacts location (output_uri) if I have a script similar to this example (I mean, from the code, not agent's config):

That is a good point. I guess the assumption is that you configure it in the clearml-agent configuration, as it makes sense that all remote Tasks would have the same output_uri, no?

  
  
Posted 3 years ago

Sure, it's already enabled. I noticed another parameter related to environment caching in the ClearML agent configuration, named venv_update (I believe it's still in beta). Do you think enabling this parameter would significantly speed up environment builds?

Yes, I guess. Since pipelines are designed to be executed remotely, it may be pointless to add an output_uri parameter to PipelineDecorator.component. Anyway, could another task be initialized in the same script where the pipeline is called, so that it would be the main task and the output_uri could be retrieved from all steps via the task = Task.current_task(); task.get_output_destination() combo?

  
  
Posted 3 years ago

named as venv_update (I believe it's still in beta). Do you think enabling this parameter significantly helps to build environments faster?

This is deprecated. It was a test of a package that can update pip venvs, but it was never stable, so we will remove it in the next version.

Yes, I guess. Since pipelines are designed to be executed remotely it may be pointless to enable an output_uri parameter in the PipelineDecorator.component

Regardless, I think it is good practice to add it, so we will 🙂

Anyway, could another task be initialized in the same script where the pipeline is called, so it would be the main task and the output_uri could be retrieved from all steps via task = Task.current_task(); task.get_output_destination() combo?

Could you elaborate on how exactly one would use it, and for what purpose?

  
  
Posted 3 years ago

Of course it's always a good idea to have that extra option just in case 🙂

Never mind, I've already found a cleaner way to address this problem. I really appreciate your help!

  
  
Posted 3 years ago

👍

  
  
Posted 3 years ago