Hi All, I Am Testing The New

Answered

Hi all, I am testing the new PipelineDecorator feature. Is there any way to automatically detect the Git repository in the pipeline step decorated with PipelineDecorator.component without specifying the 'repo' argument?

  				
Posted 3 years ago by GiganticTurtle0


Answers 14

GiganticTurtle0 , are you using ClearML 1.1.1 or 1.1.0?

  				
Posted 3 years ago by CostlyOstrich36

I'm using the latest version (1.1.1)

  				
Posted 3 years ago by GiganticTurtle0

GiganticTurtle0, do you mean the repo for the function itself?
By default the function is assumed to be "standalone". You can specify a repo with:
@PipelineDecorator.component(..., repo='.')
This takes the repository of the current folder (i.e. the local one).
You can also specify a repo URL, commit, etc. (repo='https://github/user/repo/repo.git', ...)
See here:
https://github.com/allegroai/clearml/blob/dd3d4cec948c9f6583a0b69b05043fd60d8c103a/clearml/automation/controller.py#L1931
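
A minimal sketch of the repo argument mentioned above, assuming the clearml package is installed and the script lives inside a Git checkout. The names add_numbers and build_step are made up for illustration, and the clearml import is deferred into the helper so the plain function can be read and run on its own.

```python
def add_numbers(a, b):
    """Plain Python function that will become a pipeline step."""
    # Imports used by the step live inside the function body.
    import math

    return math.fsum([a, b])


def build_step():
    """Wrap add_numbers as a pipeline component tied to the local repo."""
    # Deferred import: this line needs the clearml package.
    from clearml.automation.controller import PipelineDecorator

    # repo="." asks for the Git repository of the current folder;
    # a remote repository URL can be passed instead.
    decorate = PipelineDecorator.component(return_values=["total"], repo=".")
    return decorate(add_numbers)
```

Calling build_step() before the pipeline function runs would register the step; whether the agent then clones the repository as expected is worth confirming against the ClearML version in use.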

  				
Posted 3 years ago by AgitatedDove14

I mean that the agent running the function (which represents a pipeline step) should clone the repo so that it can find the project modules the function needs in order to execute. Also, I have found that clearml does not automatically detect the imports specified within the function decorated with PipelineDecorator.component (even though I followed a scheme similar to the one in the example https://github.com/allegroai/clearml/blob/master/examples/pipeline/pipeline_from_decorator.py )

  				
Posted 3 years ago by GiganticTurtle0

How can I tell clearml I will use the same virtual environment in all steps and there is no need to waste time re-installing all packages for each step?

  				
Posted 3 years ago by GiganticTurtle0

I am aware of the option to enable virtual environment caching, but that is still very time consuming.

  				
Posted 3 years ago by GiganticTurtle0

Hi GiganticTurtle0

> I have found that clearml does not automatically detect the imports specified within the function decorated

The pipeline decorator automatically detects the imports inside the function, but not those outside it (i.e. global ones). This allows better control of packages (think, for example, of one step that needs the huge torch package while another does not).
Does that make sense?

> How can I tell clearml I will use the same virtual environment in all steps and there is no need to waste time re-installing all packages for each step?

Each step is standalone (the idea being that steps are automatically spread across the cluster of machines, as opposed to being just another subprocess).
This means each step is a standalone Task, so "sharing" a venv is not a "thing".

> I am aware of the option to enable virtual environment caching, but that is still very time consuming.

venv caching cuts the setup step down to a few seconds; do notice that it is disabled by default.
What do you mean by "that is still very time consuming"? This is basically just copying files, so it should not take more than a few seconds.
Am I missing something here?
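
To check which imports sit inside a step's body rather than at module level, a rough standard-library sketch may help. It only illustrates the inside/outside distinction described in this answer and is not ClearML's own detection code; STEP_SOURCE and local_imports are hypothetical names.

```python
import ast

STEP_SOURCE = '''
import json  # module-level (global) import

def step(text):
    import collections  # import inside the step body
    return collections.Counter(json.loads(text))
'''


def local_imports(source, func_name):
    """List top-level package names imported inside the named function."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef) and node.name == func_name:
            # Only look at import statements nested in this function.
            for inner in ast.walk(node):
                if isinstance(inner, ast.Import):
                    found.update(a.name.split(".")[0] for a in inner.names)
                elif isinstance(inner, ast.ImportFrom) and inner.module:
                    found.add(inner.module.split(".")[0])
    return sorted(found)


print(local_imports(STEP_SOURCE, "step"))  # prints ['collections']
```

Here json is imported globally, so it does not appear in the result; under the behaviour described above, a step relying on it would need the import moved inside the function.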

  				
Posted 3 years ago by AgitatedDove14

Okay, so the idea behind the new decorator is not to group all the defined steps under the same script so that they share the same environment, but rather to simplify the process of creating scripts for each step and avoid manually calling Task.init on those scripts.

Regarding virtual environment creation from caching, I will keep running benchmarks (from what you say it might be due to high workload in the servers we use)

So far I have had no luck getting clearml to recognize packages within the decorated function, but I'll keep trying.

  				
Posted 3 years ago by GiganticTurtle0

By the way, where can I change the default artifacts location (output_uri) if I have a script similar to this example (I mean from the code, not from the agent's config):
https://github.com/allegroai/clearml/blob/master/examples/pipeline/pipeline_from_decorator.py

  				
Posted 3 years ago by GiganticTurtle0

> Okay, so the idea behind the new decorator is not to group all the defined steps under the same script so that they share the same environment, but rather to simplify the process of creating scripts for each step and avoid manually calling Task.init on those scripts.

Correct, and to allow users to create Tasks from code more easily.

> Regarding virtual environment creation from caching, I will keep running benchmarks (from what you say it might be due to high workload in the servers we use)

Notice you need to enable this line:
https://github.com/allegroai/clearml-agent/blob/e17246d8ea1a113474af96d9274c42c749fe66db/docs/clearml.conf#L109

> So far I've been unlucky in the attempt of clearml recognizing packages within the decorator function, but I'll keep trying it

This is odd. If you can create a toy example, we can probably test that.

> By the way, where can I change the default artifacts location (output_uri) if I have a script similar to this example (I mean, from the code, not agent's config):

That is a good point. I guess the assumption is that you configure it in the clearml-agent configuration, as it makes sense that all remote Tasks would have the same output_uri, no?

  				
Posted 3 years ago by AgitatedDove14

Sure, it's already enabled. I noticed another parameter related to environment caching in the ClearML agent configuration, named venv_update (I believe it's still in beta). Do you think enabling it would significantly speed up building environments?

Yes, I guess. Since pipelines are designed to be executed remotely it may be pointless to enable an output_uri parameter in the PipelineDecorator.component . Anyway, could another task be initialized in the same script where the pipeline is called, so it would be the main task and the output_uri could be retrieved from all steps via task = Task.current_task(); task.get_output_destination() combo?

  				
Posted 3 years ago by GiganticTurtle0

> named as venv_update (I believe it's still in beta). Do you think enabling this parameter significantly helps to build environments faster?

This is deprecated. It was a test of a package that can update pip venvs, but it was never stable; we will remove it in the next version.

> Yes, I guess. Since pipelines are designed to be executed remotely it may be pointless to enable an output_uri parameter in the PipelineDecorator.component

Regardless, I think it is good practice to add it, so we will 🙂

> Anyway, could another task be initialized in the same script where the pipeline is called, so it would be the main task and the output_uri could be retrieved from all steps via task = Task.current_task(); task.get_output_destination() combo?

Could you elaborate on how one would use it exactly, and for what purpose?

  				
Posted 3 years ago by AgitatedDove14

Of course it's always a good idea to have that extra option just in case 🙂

Never mind, I've already found a cleaner way to address this problem. I really appreciate your help!

  				
Posted 3 years ago by GiganticTurtle0

👍

  				
Posted 3 years ago by AgitatedDove14
