Thanks for the extensive response! As the solution seems a bit hackish, I’d definitely prefer the ability to set a docker image/docker args/requirements config for the pipeline controller too
btw, when running pipelines, a common pattern is that two consecutive components end up on the same node. The current implementation will upload the result of the first component, and then the first thing the next component will do is download it. I assume the get_local_copy method is used and the output is stored in the local cache. Wouldn’t it be more performant for the first component to store its result in the local cache in addition to uploading it to the file server? That way, the next component, if run on the same node, wouldn’t need to download it from the file server.
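Roughly this pattern (component names, paths, and the cache flags are just illustrative):

from clearml.automation.controller import PipelineDecorator


@PipelineDecorator.component(return_values=["data_path"], cache=False)
def step_one():
    # the returned value is uploaded to the file server as an artifact when the component ends
    return "/tmp/data.csv"


@PipelineDecorator.component(cache=False)
def step_two(data_path):
    # even if this runs on the same node as step_one, the artifact is currently
    # fetched again from the file server instead of being read from a local cache
    print("processing", data_path)


@PipelineDecorator.pipeline(name="example pipeline", project="examples", version="0.0.1")
def pipeline_logic():
    data_path = step_one()
    step_two(data_path)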
I’d definitely prefer the ability to set a docker image/docker args/requirements config for the pipeline controller too
That makes sense. Any chance you can open a GitHub issue with the feature request so that we do not forget?
The current implementation will upload the result of the first component, and then the first thing the next component will do is download it.
If they are on the same machine, it should be cached when accessed the 2nd time
Wouldn’t it be more performant for the first component to store its result in the local cache in addition to uploading it to the file server? That way, the next component, if run on the same node, wouldn’t need to download it from the file server.
I think you are correct, since the first time it will not pass through the cache...
Not sure if there is an easy "path" to tell the cache "put this file in the cache"...
Thanks for the response!
I understand I can change the docker image for a component in the pipeline, but for the https://github.com/allegroai/clearml/blob/90f30e8d9a5ca9a1afa6b2e5ffccb96b0afe9c78/examples/pipeline/pipeline_from_decorator.py#L77 it isn’t possible. I see that you can only change the queue it runs on, not the docker image, its params, or the requirements. Thanks for this! I’ll try it!
Hi DizzyPelican17
I’d like to configure a requirements file, docker image, and docker command for my pipeline controller, but it seems I cannot set it up. Am I missing something?
The decorator itself accepts those as arguments:
https://clear.ml/docs/latest/docs/references/sdk/automation_controller_pipelinecontroller#pipelinedecoratorcomponent
https://github.com/allegroai/clearml/blob/90f30e8d9a5ca9a1afa6b2e5ffccb96b0afe9c78/examples/pipeline/pipeline_from_decorator.py#L8
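For example, something along these lines should work for a component (the image name, docker args, and package pins are just placeholders):

from clearml.automation.controller import PipelineDecorator


@PipelineDecorator.component(
    return_values=["model_path"],
    docker="python:3.9-bullseye",               # placeholder image
    docker_args="--shm-size=1g",                # placeholder docker arguments
    packages=["pandas>=1.3", "scikit-learn"],   # per-component requirements
    execution_queue="default",
)
def train(data_path):
    ...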
I’d like to set up uploading pipeline artifacts / outputs of pipeline steps to a GCP bucket. By default they are uploaded to the file server, which seems suboptimal, but there seems to be no option to set a GCP bucket as the default. Am I missing something?
Sure, you can configure the file_server so every artifact is uploaded to GCP instead of the default file server:
https://github.com/allegroai/clearml/blob/90f30e8d9a5ca9a1afa6b2e5ffccb96b0afe9c78/docs/clearml.conf#L10
just put gs://bucket/folder there. Do not forget to configure your credentials:
https://github.com/allegroai/clearml/blob/90f30e8d9a5ca9a1afa6b2e5ffccb96b0afe9c78/docs/clearml.conf#L126
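In clearml.conf it would look roughly like this (bucket, project, and credentials path are placeholders; see the linked sections for the exact fields):

api {
    # artifact / debug sample uploads go here instead of the default file server
    files_server: "gs://my-bucket/clearml"
}

sdk {
    google.storage {
        # default credentials used for any gs:// destination
        project: "my-gcp-project"
        bucket: "my-bucket"
        credentials_json: "/path/to/gcp-credentials.json"
    }
}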
I understand I can change the docker image for a component in the pipeline, but for the pipeline controller itself it isn’t possible.
you can always call Task.current_task().connect()
from the pipeline function itself to connect more configuration arguments. Basically, all the arguments of the pipeline logic function become pipeline arguments, it's kind of neat 🙂 Regarding docker, the idea is that you use a very basic python docker (the default for the services queue) for all the pipeline logic. That said, inside the pipeline function you can call Task.current_task().set_base_docker()
and set the base docker to be used. The only caveat is that you first have to run it locally.
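For example (the image name is a placeholder, and the exact set_base_docker signature may differ a bit between clearml versions):

from clearml import Task
from clearml.automation.controller import PipelineDecorator


@PipelineDecorator.pipeline(name="custom pipeline logic", project="examples", version="0.0.1")
def executing_pipeline(pickle_url, mock_parameter="mock"):
    # every argument of this function automatically becomes a pipeline parameter;
    # extra configuration can still be attached explicitly:
    Task.current_task().connect({"extra_param": 123}, name="extra")

    # store the docker image the controller should run in when it is later cloned/enqueued
    # (this only takes effect after the first local run, as noted above)
    Task.current_task().set_base_docker("python:3.9-bullseye")

    # ... call the pipeline components here ...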
It might be a good idea to add a docker option to the decorator itself regardless, like we have in the component, wdyt?