Hey There, Is It Possible For A Clearml Pipeline Step To Log A Folder Instead Of Numpy/Pickle Objects? Looking At The Docs,

Answered

Hey there, is it possible for a ClearML pipeline step to log a folder instead of numpy/pickle objects? Looking at the docs, monitor_artifacts could be what I am searching for, but I am not sure.

  				
Posted 3 years ago by JitteryCoyote63


Answers 13

Yes exactly

  				
Posted 3 years ago by JitteryCoyote63

In every step I want to store the folders as artifacts on S3, because it's very convenient.
The last step should merge them all, i.e. it needs access to the artifacts of all the previous steps.

  				
Posted 3 years ago by JitteryCoyote63

I think it depends on your code and the pipeline setup. You can also cache steps - avoiding the entire need to worry about artifacts.

  				
Posted 3 years ago by CostlyOstrich36

CostlyOstrich36 super, thanks for confirming! I then have a follow-up question: are the artifacts duplicated (copied), or just referenced?

  				
Posted 3 years ago by JitteryCoyote63

So in which scenario do you want to keep those folders as artifacts and where would you like to store them?

  				
Posted 3 years ago by CostlyOstrich36

So if all artifacts are logged in the pipeline controller task, I need the last task to access all the artifacts from the pipeline task. I need to execute something like PipelineController.get_artifact() in the last step task

  				
Posted 3 years ago by JitteryCoyote63

https://github.com/allegroai/clearml/blob/master/examples/reporting/artifacts.py

  				
Posted 3 years ago by CostlyOstrich36

I guess I can have a workaround by passing the pipeline controller task id to the last step, so that the last step can download all the artifacts from the controller task.
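
A minimal sketch of that workaround, assuming the controller (or any other) task ID is passed to the last step as a parameter. The helper name `fetch_artifact_folders`, its `get_task` hook, and the artifact names are hypothetical; the only ClearML calls relied on are `Task.get_task(task_id=...)`, the task's `artifacts` mapping, and `get_local_copy()`.

```python
def fetch_artifact_folders(task_id, artifact_names, get_task=None):
    """Return {artifact name: local path} for artifacts stored on one task.

    `get_task` is a hypothetical hook for injecting a stub; by default the
    task is looked up through clearml.Task.get_task.
    """
    if get_task is None:
        # Imported lazily so the helper can be exercised without a ClearML server.
        from clearml import Task

        def get_task(tid):
            return Task.get_task(task_id=tid)

    source = get_task(task_id)
    # get_local_copy() downloads the artifact and returns a local path; for a
    # folder artifact this is expected to be the extracted directory.
    return {name: source.artifacts[name].get_local_copy() for name in artifact_names}
```

In the last step this could be called as `fetch_artifact_folders(controller_task_id, ["step_a_output", "step_b_output"])`, where both the ID variable and the artifact names are placeholders.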

  				
Posted 3 years ago by JitteryCoyote63

I would also like to avoid any copies of these artifacts on S3 (to avoid paying for storage twice, since some folders might be big).

  				
Posted 3 years ago by JitteryCoyote63

What's the use case?

  				
Posted 3 years ago by CostlyOstrich36

So in my use case each step would create a folder (potentially big) and store it as an artifact. The last step should “merge” all the previous folders. The idea is to split the work among multiple machines (in parallel). I would like to avoid having these potentially big folder artifacts also stored in the pipeline task, because that task will be running on the services queue in the clearml-server instance, which will definitely not have enough space to handle all of them.
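
The merge itself is not specified in the thread. As a rough sketch, assuming "merge" simply means combining the contents of several already-downloaded local folders into one directory tree, the last step could do something like the following. `merge_folders` is a hypothetical helper and uses only the standard library.

```python
import shutil
from pathlib import Path


def merge_folders(sources, destination):
    """Copy the contents of every folder in `sources` into `destination`."""
    destination = Path(destination)
    destination.mkdir(parents=True, exist_ok=True)
    for src in sources:
        # dirs_exist_ok (Python 3.8+) lets successive copies share one tree;
        # files with the same relative path are overwritten by later sources.
        shutil.copytree(src, destination, dirs_exist_ok=True)
    return destination
```

If the step outputs can contain files with identical relative paths, a per-step subfolder inside the destination would avoid silent overwrites.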

  				
Posted 3 years ago by JitteryCoyote63

JitteryCoyote63 , heya, yes it is :)
You can save the entire folder as an artifact.
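
A minimal sketch of what this could look like, assuming the step has access to its own ClearML task. `log_folder_as_artifact` and the artifact name are hypothetical; the ClearML call relied on is `Task.upload_artifact`, which zips a folder path and uploads the archive.

```python
from pathlib import Path


def log_folder_as_artifact(task, folder, name="step_output"):
    """Upload a whole folder as a single artifact of `task`.

    `task` is assumed to be a clearml.Task (or anything exposing the same
    upload_artifact method); `name` is a hypothetical artifact name.
    """
    folder = Path(folder)
    if not folder.is_dir():
        raise NotADirectoryError(f"{folder} is not a directory")
    # Given a folder path, ClearML zips the folder and uploads the archive.
    task.upload_artifact(name=name, artifact_object=str(folder))
    return name


# Typical use inside a step (needs a configured ClearML environment):
#   from clearml import Task
#   task = Task.current_task()
#   log_folder_as_artifact(task, "outputs/", name="step_output")
```

Where the archive ends up depends on the task's output destination (for example an S3 URI passed as `output_uri`), which is configuration outside this sketch.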

  				
Posted 3 years ago by CostlyOstrich36

Do you mean if they are shared between steps or if each step creates a duplicate?

  				
Posted 3 years ago by CostlyOstrich36
